An Ultimate-Shannon-Limit-Approaching Gbps Throughput Encoder/Decoder System S
Total Page:16
File Type:pdf, Size:1020Kb
1 An Ultimate-Shannon-Limit-Approaching Gbps Throughput Encoder/Decoder System S. Jiang, P. W. Zhang, F. C. M. Lau and C.-W. Sham Abstract—A turbo-Hadamard code (THC) is a type of low-rate component of the encoding/decoding system is fully opti- channel code with capacity approaching the ultimate Shannon mized to achieve high hardware-utilization rate. Arrange- − limit, i.e., 1.59 dB. In this paper, we investigate the hardware ment is made to ensure the data corresponding to different design of a turbo-Hadamard encoder/decoder system. The entire system has been implemented on an FPGA board. A bit error codewords are in an orderly manner such that they can be −5 rate (BER) of 10 can be achieved at Eb/N0 = 0.04 dB with received/stored/retrieved/processed efficiently at the decoder. a throughput of 3.2 Gbps or at Eb/N0 = −0.45 dB with a Last but not the least, we construct rate-adaptive THCs by throughput of 1.92 Gbps. puncturing the parity bits of the code. Sect. II briefly reviews the THC. Sect. III describes our overall design of the THC encoder/decoder system and evaluates the latency of each sub- I. INTRODUCTION decoder. Sect. IV shows the FPGA implementation results, Turbo code, low-density parity-check (LDPC) code, and including hardware utilization, throughput and bit error rate polar code are the most widely studied and implemented performance. Finally, Sect. ?? provides some concluding re- error-correction codes over the last two-and-a-half decades marks. because of their capacity-approaching capabilities [1], [2], [3], [4], [5], [6]. Both turbo code and LDPC code, when II. TURBO-HADAMARD CODE (THC) used in conjunction with Hadamard code, have been shown to achieve performance very close to the ultimate Shannon A Hadamard code of order-r can be constructed from a limit, i.e., −1.59 dB [7], [8]. Another code with comparable Hadamard matrix of the same order. An order-r Hadamard r performance is the concatenated zig-zag Hadamard code [9]. matrix Hn where n =2 can be constructed recursively using Such codes can be used in a multi-user environment such as +Hn/2 +Hn/2 in a code-division multiple-access or an interleave-division Hn = (1) +Hn/2 −Hn/2 multiple-access (IDMA) [10] system. The codes can also be used to carry embedded messages in point-to-point wire- with H1 = [+1]. The codeword set of an order-r Hadamard less/wired communications. In [7], [8], [9], simulation results code is denoted by {±hj : j = 0, 1, 2,..., 2r − 1}, of the aforementioned codes with code lengths of 3.5 Mbits where +hj and −hj represent the (j + 1)-th columns of and 22 Mbits have been performed. However, the hardware +Hn and −Hn, respectively. Denoting the code-bit positions implementation of the codes has never been investigated. One by {0, 1, 2,..., 2r − 1}, the information bits are located at − reason is that the code lengths quoted above are extremely {0, 1, 2, 4, 8,..., 2r 1} and the parity-check bits are at the long, making the hardware complexity prohibitively high. remaining positions. Fig. 1 shows the encoder block diagram In this paper, we investigate the hardware implementation and code structure of a convolutional-Hadamard code and of turbo-Hadamard code (THC) using FPGA. Our aim is to Fig. 2 shows the code structure of a THC [7]. A convolutional- explore an efficient implementation of an encoder/decoder Hadamard code is a concatenation of a single-parity-check system that can accomplish high throughput with reasonable code, an S-state recursive convolutional code and a Hadamard complexity. To achieve our goal, we propose a multiple code. A THC is the combination of a number of convolutional- sub-decoders system. By allowing the sub-decoders in the Hadamard codes, say M convolutional-Hadamard codes, car- system to decode different codewords simultaneously, we can rying the same but interleaved message bits. increase the throughput multiple times. Another challenge We refer to Fig. 1(b). In a convolutional-Hadamard code, to overcome is to design interleavers that can minimize the each block message D contains L bits and is segmented into decoding latency without sacrificing bit error rate (BER) K sub-blocks where each sub-block dk (k = 1, 2,...,K) ′ performance. For this issue, we propose applying a fixed contains r bits, i.e., L = rK. The parity bit qk of dk is inter-window shuffle (FIWS) interleaver [11]. Moreover, each computed and sent through an S-state rate-1/2 systematic recursive convolutional encoder producing convolutional codes S. Jiang, P. W. Zhang and F. C. M. Lau are with the Department ′ denoted as (qk , qk). Finally (dk, qk) is encoded into an order- of Electronic and Information Engineering, The Hong Kong Polytech- nic University, Hong Kong (email: [email protected], peng- r Hadamard code ck = (dk, qk, pk), where pk represents [email protected], [email protected]). The work de- the parity bits in the Hadamard code. The code length and scribed in this paper was supported by a grant from the RGC of the Hong code rate of an order-r THC are therefore given by l = Kong SAR, China (Project No. PolyU 152088/15E). r rK r C.-W. Sham is with Department of Computer Science, The University of rK + MK(2 − r) and rc = rK+MK(2r −r) = r+M(2r −r) , Auckland, New Zealand (email: [email protected]). respectively. 2 qk qk S c ę hj Pseudo-random Turbo-Hadamard Turbo-Hadamard 63& VWDWH5&( +DGDPDQG k ^ ` Channel dk HQFRGHU number generator encoder decoder dk (a) Fig. 3: Data flow of the THC encoder/decoder system. D q P the last THC decoding iteration, replaces the corresponding 2 values in 2xk/σ and becomes the input of the current K component code. r III. DECODER DESIGN r+ r Fig. 3 shows the data flow of our THC encoder/decoder (b) system. As the implementations of the encoder and channel Fig. 1: Convolutional-Hadamard code. (a) Encoder block diagram and (b) code simulator are relatively straightforward, we only focus on structure. SPC: single-parity check; RCE: recursive convolutional encoder. the decoder design. Our THC decoder design consists of M convolutional-Hadamard decoder (each called a sub-decoder). D D Moreover, consecutive sub-decoders (also the Mth one with the first one) are connected via an appropriate interleaver. ʌ &RQYROXWLRQDO+DGDPDQGHQFRGHU q , p Compared with the design using only one convolutional- Hadamard decoder, our design can decode M THC codewords ʌ &RQYROXWLRQDO+DGDPDQGHQFRGHU q , p simultaneously in a pipeline manner using the same number of interleavers, thus increasing the throughput by M times. M M M Moreover, the latency can be reduced because simpler control ʌ &RQYROXWLRQDO+DGDPDQGHQFRGHU q , p logics can be used. Fig. 4 shows an M =3 THC decoder. Fig. 2: Structure of turbo-Hadamard code. Referring to lower part of Fig. 4, a sub-decoder consists of three main processing units: fast-Hadamard-transform (FHT), BCJR and dual FHT (DFHT). We use a fast Hadamard trans- Suppose a convolutional-Hadamard codeword c = form (FHT) to compute the a priori information {γ (±hj )} c c c c is transmitted in an additive white Gaus- k [ 1, 2, 3, ..., K] and a DFHT (also called a-posteriori-probability-FHT (APP- sian noise (AWGN) channel with noise variance σ2 and FHT)) to compute the a posteriori LLR of the information bits a vector x = [x1, x2, x3, ..., xK] is received. Considering the a-posteriori-probability (APP) decoding of convolutional- because both transforms [7] are simple to be implemented on Hadamard code, the logarithm-likelihood ratio (LLR) of the hardware. Details of each processing unit is described below. th bit in the th sub-block is given by [7] i k 1) (i) The output LLRs of the message bits from the previous j P γk(±h )α(sk)β(sk+1) sub-decoder (after interleaving), (ii) the extrinsic LLRs H[i,j]=±1 of the message bits produced by the current sub-decoder Lk[i] = log j (2) P γk(±h )α(sk)β(sk+1) H[i,j]=∓1 in the previous iteration, and (iii) channel LLRs of the parity bits of the current sub-decoder (stored in RAM), j j where γk(±h ) = Pr(xk|ck = ±h ) is the a priori informa- are input to the sub-decoder. The extrinsic LLRs of the tion and is calculated based on the channel log-likelihood ratio current sub-decoder are subtracted from the output LLRs 2xk (LLR) Lk = σ2 ; α(sk) and β(sk+1) are calculated using from the previous sub-decoder. Then together with the the forward-backward recursion algorithm in the Bahl-Cocke- channel LLRs of the parity bits, these signals are input Jelinek-Raviv (BCJR) decoder. Consequently, a convolutional- in a pipeline manner to the FHT unit. The FHT unit Hadamard decoder consists of three main stages. j calculates the a priori information {γk(±h )}. Note that j 1) Generate the a priori information {γk(±h )} based on the FHT unit contains r stages. After each stage, the 2 the input 2xk/σ . number of quantization bits in FHT unit is increased by j 2) Perform BCJR decoding based on {γk(±h )}. 1 to avoid data overflow. 3) Update the a posteriori LLR of the information bits based 2) According to the encoding of the convolutional- on the results from the first two stages. Hadamard code, the outputs of the FHT unit represent The minimum hardware requirement of a THC decoder con- the probabilities of a transition between the states in a sists of one convolutional-Hadamard decoder, M interleavers, convolutional code based on the trellis of the code.