Regular Low-Density Parity-Check Code Decoder

EURASIP Journal on Applied Signal Processing 2003:6, 530–542 c 2003 Hindawi Publishing Corporation An FPGA Implementation of (3, 6)-Regular Low-Density Parity-Check Code Decoder Tong Zhang Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA Email: [email protected] Keshab K. Parhi Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA Email: [email protected] Received 28 February 2002 and in revised form 6 December 2002 Because of their excellent error-correcting performance, low-density parity-check (LDPC) codes have recently attracted a lot of attention. In this paper, we are interested in the practical LDPC code decoder hardware implementations. The direct fully parallel decoder implementation usually incurs too high hardware complexity for many real applications, thus partly parallel decoder design approaches that can achieve appropriate trade-offs between hardware complexity and decoding throughput are highly desirable. Applying a joint code and decoder design methodology, we develop a high-speed (3,k)-regular LDPC code partly parallel decoder architecture based on which we implement a 9216-bit, rate-1/2(3, 6)-regular LDPC code decoder on Xilinx FPGA device. This partly parallel decoder supports a maximum symbol throughput of 54 Mbps and achieves BER 10−6 at 2 dB over AWGN channel while performing maximum 18 decoding iterations. Keywords and phrases: low-density parity-check codes, error-correcting coding, decoder, FPGA. 1. INTRODUCTION decoding messages are iteratively computed on each variable In the past few years, the recently rediscovered low-density node and check node and exchanged through the edges be- parity-check (LDPC) codes [1, 2, 3] have received a lot of at- tween the neighboring nodes. tention and have been widely considered as next-generation Recently, tremendous efforts have been devoted to ana- error-correcting codes for telecommunication and magnetic lyze and improve the LDPC codes error-correcting capabil- storage. Defined as the null space of a very sparse M × N ity, see [4, 5, 6, 7, 8, 9, 10, 11] and so forth. Besides their parity-check matrix H, an LDPC code is typically represented powerful error-correcting capability, another important rea- by a bipartite graph, usually called Tanner graph, in which son why LDPC codes attract so many attention is that the one set of N variable nodes corresponds to the set of code- iterative BP decoding algorithm is inherently fully parallel, word, another set of M check nodes corresponds to the set thus a great potential decoding speed can be expected. of parity-check constraints and each edge corresponds to The high-speed decoder hardware implementation is ob- a nonzero entry in the parity-check matrix H. (A bipartite viously one of the most crucial issues determining the extent graph is one in which the nodes can be partitioned into two of LDPC applications in the real world. The most natural so- sets, X and Y, so that the only edges of the graph are be- lution for the decoder architecture design is to directly in- tween the nodes in X and the nodes in Y.) An LDPC code stantiate the BP decoding algorithm to hardware: each vari- is known as (j, k)-regular LDPC code if each variable node able node and check node are physically assigned their own has the degree of j and each check node has the degree of processors and all the processors are connected through an k, or in its parity-check matrix each column and each row interconnection network reflecting the Tanner graph con- have j and k nonzero entries, respectively. The code rate of a nectivity. By completely exploiting the parallelism of the BP (j, k)-regular LDPC code is 1 − j/k provided that the parity- decoding algorithm, such fully parallel decoder can achieve check matrix has full rank. The construction of LDPC codes very high decoding speed, for example, a 1024-bit, rate-1/2 is typically random. LDPC codes can be effectively decoded LDPC code fully parallel decoder with the maximum symbol by the iterative belief-propagation (BP) algorithm [3] that, throughput of 1 Gbps has been physically implemented us- as illustrated in Figure 1, directly matches the Tanner graph: ing ASIC technology [12]. The main disadvantage of such An FPGA Implementation of (3, 6)-Regular LDPC Code Decoder 531 Check nodes Check-to-variable message Variable-to-check message Variable nodes Figure 1: Tanner graph representation of an LDPC code and the decoding messages flow. fully parallel design is that with the increase of code length, complexity is significantly reduced. Exploiting the good min- typically the LDPC code length is very large (at least several imum distance property of LDPC codes, this decoder em- thousands), the incurred hardware complexity will become ploys parity check as the earlier decoding stopping criterion more and more prohibitive for many practical purposes, to achieve adaptive decoding for energy reduction. With the for example, for 1-K code length, the ASIC decoder imple- maximum 18 decoding iterations, this FPGA partly parallel mentation [12]consumes1.7M gates. Moreover, as pointed decoder supports a maximum of 54 Mbps symbol through- out in [12], the routing overhead for implementing the en- put and achieves BER (bit error rate) 10−6 at 2 dB over tire interconnection network will become quite formidable AWGN channel. due to the large code length and randomness of the Tan- This paper begins with a brief description of the LDPC ner graph. Thus high-speed partly parallel decoder de- code decoding algorithm in Section 2.InSection 3,webriefly sign approaches that achieve appropriate trade-offsbetween describe the joint code and decoder design methodology for hardware complexity and decoding throughput are highly (3,k)-regular LDPC code partly parallel decoder design. In desirable. Section 4, we present the detailed high-speed partly parallel For any given LDPC code, due to the randomness of its decoder architecture design. Finally, an FPGA implementa- Tanner graph, it is nearly impossible to directly develop a tion of a (3, 6)-regular LDPC code partly parallel decoder is high-speed partly parallel decoder architecture. To circum- discussed in Section 5. vent this difficulty, Boutillon et al. [13]proposedadecoder- first code design methodology: instead of trying to conceive the high-speed partly parallel decoder for any given ran- 2. DECODING ALGORITHM dom LDPC code, use an available high-speed partly par- Since the direct implementation of BP algorithm will incur allel decoder to define a constrained random LDPC code. too high hardware complexity due to the large number of We may consider it as an application of the well-known multiplications, we introduce some logarithmic quantities “Think in the reverse direction” methodology. Inspired by to convert these complicated multiplications into additions, the decoder-first code design methodology, we proposed which lead to the Log-BP algorithm [2, 15]. a joint code and decoder design methodology in [14]for Before the description of Log-BP decoding algorithm, ,k (3 )-regular LDPC code partly parallel decoder design. By we introduce some definitions as follows. Let H denote the jointly conceiving the code construction and partly paral- M × N sparse parity-check matrix of the LDPC code and ,k lel decoder architecture design, we presented a (3 )-regular Hi,j denote the entry of H at the position (i, j). We de- LDPC code partly parallel decoder structure in [14], which fine the set of bits n that participate in parity-check m as ,k not only defines very good (3 )-regular LDPC codes but ᏺ(m) ={n : Hm,n = 1}, and the set of parity-checks m in also could potentially achieve high-speed partly parallel which bit n participates as ᏹ(n) ={m : Hm,n = 1}.Wede- decoding. note the set ᏺ(m) with bit n excluded by ᏺ(m) \ n, and the In this paper, applying the joint code and decoder design set ᏹ(n) with parity-check m excluded by ᏹ(n) \ m. methodology, we develop an elaborate (3,k)-regular LDPC code high-speed partly parallel decoder architecture based Algorithm 1 (Iterative Log-BP Decoding Algorithm). on which we implement a 9216-bit, rate-1/2(3, 6)-regular LDPC code decoder using Xilinx Virtex FPGA (Field Pro- Input grammable Gate Array) device. In this work, we significantly p0 = P x = p1 = P x = = modify the original decoder structure [14] to improve the de- The prior probabilities n ( n 0) and n ( n 1) − p0 n = ,...,N coding throughput and simplify the control logic design. To 1 n, 1 ; achieve good error-correcting capability, the LDPC code decoder architecture has to possess randomness to some extent, Output which makes the FPGA implementations more challenging Hard decision x ={x1,...,xN }; since FPGA has fixed and regular hardware resources. We propose a novel scheme to realize the random connectivity Procedure by concatenating two routing networks, where all the ran- (1) Initialization: For each n, compute the intrinsic (or 0 1 dom hardwire routings are localized and the overall routing channel) message γn = log pn/pn and for each (m, n) ∈ 532 EURASIP Journal on Applied Signal Processing High-speed partly Constrained random parallel decoder parameters Random input H3 (3,k)-regular LDPC code Deterministic Construction of ensemble defined by input H Selected code = 1 H H H H = H2 H3 Figure 2: Joint design flow diagram. {(i, j) | Hi,j = 1},compute Moreover, by delivering the hard decision xi from each VNU to its neighboring CNUs, the parity-check H · x can be eas- 1+e−|γn| ily performed by all the CNUs. Thanks to the good min- αm,n = sign γn log , (1) 1 − e−|γn| imum distance property of LDPC code, such adaptive decoding scheme can effectively reduce the average energy con- where sumption of the decoder without performance degradation.

Regular Low-Density Parity-Check Code Decoder

Design of 8-Bit Arithmetic Processor Unit Based on Reversible Logic by A

Computer Architecture (BCA Part-I)

Address Decoding Large-Size Binary Decoder: 28-To-268435456 Binary Decoder for 256Mb Memory

Modeling and Implementation of a 6-Bit, 50Mhz Pipelined ADC in CMOS

Unit-4 Combinational Logic Design

School of Electrical & Computer Engineering Purdue University

Certification of an Instruction Set Simulator Xiaomu Shi

Unit Iii Combinational Logic Design

Ece421) Digital Electronics Fundamental (Ece422

Multivalued Circuits and Interconnect Issues

Lecture Summary Notes

(12) United States Patent (Io) Patent No.: US 9,831,873 B2 Stoica Et Al