
Article Reducing the Overhead of BCH Codes: New Double Error Correction Codes

Luis-J. Saiz-Adalid * , Joaquín Gracia-Morán , Daniel Gil-Tomás, J.-Carlos Baraza-Calvo and Pedro-J. Gil-Vicente

Institute of Information and Communication Technologies (ITACA), Universitat Politècnica de València, Camino de Vera s/n, 46022 Valencia, Spain; [email protected] (J.G.-M.); [email protected] (D.G.-T.); [email protected] (J.-C.B.-C.); [email protected] (P.-J.G.-V.) * Correspondence: [email protected]

 Received: 15 October 2020; Accepted: 7 November 2020; Published: 11 November 2020 

Abstract: The Bose-Chaudhuri-Hocquenghem (BCH) codes are a well-known class of powerful error correction cyclic codes. BCH codes can correct multiple errors with minimal redundancy. Primitive BCH codes only exist for some word lengths, which do not frequently match those employed in digital systems. This paper focuses on double error correction (DEC) codes for word lengths that are powers of two (8, 16, 32, and 64), which are commonly used in memories. We also focus on hardware implementations of the encoder and decoder circuits for very fast operations. This work proposes new low redundancy and reduced overhead (LRRO) DEC codes, with the same redundancy as the equivalent BCH DEC codes, but whose encoder and decoder circuits present a lower overhead (in terms of propagation delay, silicon area usage, and power consumption). We used a methodology to search parity check matrices, based on error patterns, in order to design the new codes. We implemented and synthesized them, and compared their results with those obtained for the BCH codes. Our implementation of the decoder circuits achieved reductions between 2.8% and 8.7% in the propagation delay, between 1.3% and 3.0% in the silicon area, and between 15.7% and 26.9% in the power consumption. Therefore, we propose LRRO codes as an alternative for protecting information against multiple errors.

Keywords: reliability; fault tolerance; error control codes; double error correction; BCH codes

1. Introduction

Error control codes (ECCs) are frequently employed for fault tolerance in dependable systems. Error control coding has been studied for over half a century, and it is still going strong [1,2], because of its extensive application in computing, data storage, communications, etc. The Bose-Chaudhuri-Hocquenghem (BCH) codes are one of the best-known ECCs. They were discovered in 1959 by Hocquenghem [3] and, independently, in 1960 by Bose and Ray-Chaudhuri [4]. Since then, BCH codes have been extensively employed in several applications, namely: flash memories in solid-state drives (SSDs) [5], optical storage like compact disks (CDs) or digital versatile disks (DVDs) [6], ethernet [7], video codecs [8], digital video broadcasting (DVB) [9], and satellite communications [10]. Their algebraic features make BCH codes very useful in a wide range of situations, and their error coverage is achieved with minimum redundancy. Nevertheless, when applied to computers, these codes have two main weaknesses. First, primitive BCH codes (those generated according to their strict definition) only exist for a limited number of word lengths [11]. This problem can be easily solved by shortening a longer code. However, although they maintain their error coverage, shortened BCH codes may lose other properties, such as cyclicity or minimal redundancy, as stated later.


The second problem with these codes is the latency of the decoding process. Although their algebraic structure is useful in simplifying the encoding and decoding procedures, most of the decoding algorithms have a sequential structure that requires several clock cycles to complete the decoding [11]. Commonly, this is not a problem in software implementations, or even in hardware when the speed requirements are not very high. However, the usefulness of BCH codes is limited when very fast encoding and decoding operations are required.

The information stored in key elements of a computer system, such as registers and memories, may be perturbed by different physical mechanisms [12–14]. As technology increases in integration scale, single error correction (SEC) or single error correction-double error detection (SEC-DED) codes may not be enough for present and future computers. Multiple error correction codes, like BCH, become an interesting alternative, but designers have to deal with the two problems described previously. First, the lengths of data words are commonly a power of two (8, 16, 32, etc.), and they do not match the block sizes of primitive BCH codes. Second, their “slow” decoding may reduce the system performance.

This paper focuses on binary codes for double error correction (DEC), applied to common data word lengths in computers (i.e., 8, 16, 32, and 64). We propose new low redundancy and reduced overhead (LRRO) DEC codes, which maintain the same error coverage and redundancy as the equivalent BCH codes, but reduce the overhead introduced by the encoder and decoder circuits in terms of the propagation delay, silicon area, and power consumption. Different encoder and decoder circuits have been implemented and synthesized in order to validate the error coverage and to measure and compare those parameters. The results validate the improvements achieved by our proposal, and confirm LRRO codes as an interesting alternative to BCH codes in high-speed applications, like random access memories (RAM) and processor registers. These codes are especially well suited for RAM memories, where low redundancy, high-speed operations, and low power consumption are mandatory.

This paper is organized as follows. Section 2 introduces basic concepts about coding theory and BCH codes. The proposed LRRO codes are described in Section 3. Section 4 includes the evaluation of the proposal and a comparison with the BCH codes. Finally, Section 5 presents some conclusions and ideas for future work.

2. Coding Theory and BCH Codes

2.1. Basics on Error Control Coding

An (n, k) binary linear block ECC encodes a k-bit input word in an n-bit output word [15]. The input word, u = (u_0, u_1, ..., u_{k−1}), is a k-bit vector that represents the original data. The code word, b = (b_0, b_1, ..., b_{n−1}), is an n-bit vector, where the (n − k) added bits are called the parity, code, or redundant bits. b is transmitted through an unreliable channel that delivers the received word, r = (r_0, r_1, ..., r_{n−1}). The error vector, e = (e_0, e_1, ..., e_{n−1}), models the error induced by the channel. If no error has occurred in the i-th bit, then e_i = 0; otherwise, e_i = 1. Therefore, r can be interpreted as r = b ⊕ e. Figure 1 synthesizes this encoding, channel crossing, and syndrome decoding processes.

Figure 1. Encoding, channel crossing, and syndrome decoding processes.

The generator matrix, G_{k×n}, of a linear code (together with its related parity check matrix, H_{(n−k)×n}) defines the code [1]. For the encoding process, b = u·G. The encoded word, b, meets the requirement H·b^T = 0, which means that it is a correct code word. For syndrome decoding, the syndrome is defined as s^T = H·r^T, and it exclusively depends on e, as follows:

s^T = H·r^T = H·(b ⊕ e)^T = H·b^T ⊕ H·e^T = H·e^T.   (1)

There must be a different syndrome, s, for each correctable error vector, e. Syndrome decoding is done through a lookup table that relates each syndrome to the decoded error vector, ê. If s = 0, we can assume that ê = 0, and hence r is correct. Otherwise, an error has occurred. The decoded code word, b̂, is calculated as b̂ = r ⊕ ê. From b̂, it is easy to obtain û by discarding the parity bits. If the fault hypothesis used to design the ECC is consistent with the behavior of the channel, û and u must be equal with a very high probability.

For a binary word, the term Hamming weight, w, denotes the number of ones in that word. As explained later, the Hamming weights of the rows and columns of the parity check matrix determine the complexity of a code. The Hamming distance between two binary words is the number of bits in which they differ. The minimum Hamming distance of a code (d_min) is the minimum of the distances between all pairs of valid code words. This parameter determines the error coverage of a code.

Code shortening is a code construction where, starting from a longer linear code, the number of data bits is reduced, while maintaining the same number of parity bits [15]. In this way, we can construct a code for shorter data words, maintaining the same Hamming distance and, therefore, the same error coverage.

As stated above, the encoding process in linear block codes can be easily implemented. However, the decoding process (mainly the lookup table implementation) may be difficult. Cyclic codes form an important subclass of linear block codes [11]. Encoding and syndrome computation can be implemented easily using linear feedback shift register circuits operated sequentially. Because of their algebraic structure, there are different practical methods for decoding. Cyclic codes are linear block codes with the additional property that, for each code word, all cyclically shifted words are also valid code words. Polynomial representation [11] is frequently employed to work with these codes. For example, the input word, u, can be represented as a (k − 1) or lower degree polynomial with the variable X, where the power of X is used to locate the bit u_i in the following word:

u(X) = u_{k−1}·X^{k−1} + u_{k−2}·X^{k−2} + ... + u_1·X + u_0   (2)

Cyclic codes can be represented using a generator polynomial, g(X). The code words, b(X), are computed as b(X) = u(X)·g(X). A parity check polynomial can be obtained from the generator polynomial. In fact, h(X)·g(X) = X^n + 1 [11]. The generator and parity check matrices can be obtained from these polynomials. BCH codes are one of the best-known cyclic codes. Reed–Solomon codes are the most important subclass of non-binary BCH codes [11]. However, we will focus on binary BCH codes from now on.
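As a brief illustration of this polynomial view (an illustrative sketch, not part of the tools used in this work), the following Python lines represent GF(2) polynomials as integer bit masks and check the relations just described for the (7, 4) cyclic Hamming code introduced in Section 2.3; the data word u is an arbitrary choice.

def gf2_mul(a, b):
    """Multiply two GF(2) polynomials given as bit masks (bit i corresponds to X^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def gf2_mod(a, m):
    """Remainder of the GF(2) polynomial a divided by m."""
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

g = 0b1011                            # g(X) = X^3 + X + 1
h = 0b10111                           # h(X) = X^4 + X^2 + X + 1
assert gf2_mul(g, h) == 0b10000001    # h(X)·g(X) = X^7 + 1, as stated above

u = 0b1001                            # u(X) = X^3 + 1 (arbitrary data word 1001)
b = gf2_mul(u, g)                     # non-systematic cyclic encoding b(X) = u(X)·g(X)
assert gf2_mod(b, g) == 0             # every code word is a multiple of g(X)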

2.2. Binary BCH Codes

For any positive integers m ≥ 3 and t < 2^{m−1}, there exists a binary BCH code with code word length n = 2^m − 1, number of parity bits (n − k) ≤ mt, and minimum distance d_min ≥ 2t + 1. Therefore, this code can correct t errors [11].

For a t-error correction BCH code, its generator polynomial is given by g(X) = m_1(X) × m_3(X) × ... × m_{2t−1}(X), where m_i(X) is the minimal polynomial of α^i (i = 1, 3, ..., 2t − 1) and α is an element of GF(2^m) of order n. GF refers to Galois fields (or finite fields). If m_1(X) is a primitive polynomial of degree m over GF(2), the code is called a primitive binary BCH code. Then, α is a primitive element and its order is n = 2^m − 1. The minimal polynomials m_i(X) and/or the generator polynomials for binary primitive BCH codes can be found in the literature [11]. More information about algebra for coding theory can be found, for example, in the literature [1,11,15].

There are different algorithms to calculate the error-location polynomial (the key step of the decoding): Berlekamp–Massey (mainly used to implement software decoders), Euclidean (mostly employed in hardware implementations), etc. [11,16]. These methods are sequential algorithms with iterative steps. Therefore, they require several clock cycles to perform the decoding process. A combinational decoder (sometimes referred to as a parallel decoder) is proposed in the literature [17]. Because of its importance in this work, it is briefly explained in Section 2.4. It is first necessary to understand how to obtain the generator and parity check matrices from the equivalent polynomials.
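For t = 2 (the DEC case studied in this paper), these bounds directly yield the primitive codes used later in Section 3.2. The following lines are only a numerical check of the expressions above; for these particular codes the parity-bit bound (n − k) ≤ mt is met with equality.

t = 2                                  # double error correction
for m in (5, 6, 7):
    n = 2 ** m - 1
    parity = m * t                     # upper bound on n - k, reached by these codes
    print(f"m = {m}: ({n}, {n - parity}) BCH DEC code, d_min >= {2 * t + 1}")
# m = 5: (31, 21), m = 6: (63, 51), m = 7: (127, 113)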

2.3. Matrices and Polynomials

As an example, let us consider the binary BCH code with m = 3 and t = 1. It is a (7, 4) BCH SEC code. It is equivalent to the cyclic Hamming code. Its generator polynomial is g(X) = X^3 + X + 1 [11]. As stated in Section 2.1, h(X) = (X^7 + 1)/g(X) = X^4 + X^2 + X + 1. Combinational encoders and decoders may require the equivalent generator and parity check matrices. The binary representation of the generator polynomial is 1011. Arranging and shifting this word in the matrix, we obtain the following:

 1011000       0101100  G =  . (3)  0010110     0001011 

As can be observed, it is not in a systematic form. Obtaining the remainder of dividing X^{n−1}, X^{n−2}, ..., X^{n−k} by the generator polynomial, we get the matrix in a systematic form [16]:

 1000101       0100111  Gs =  . (4)  0010110     0001011 

The same procedure can be followed to obtain the parity check matrix:

      [ 1011100 ]           [ 1001011 ]
H  =  [ 0101110 ]  →  H_s = [ 0101110 ]   (5)
      [ 0010111 ]           [ 0010111 ]

This method is applied to BCH DEC codes in the literature [17], as presented in the following.
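The construction of (4) can be checked mechanically. The short Python sketch below (an illustration under the stated convention, not the authors' tool) computes the remainders of X^{n−1}, ..., X^{n−k} modulo g(X) and appends them to the identity, reproducing G_s for the (7, 4) code.

def gf2_mod(a, m):
    """Remainder of the GF(2) polynomial a divided by m (bit i corresponds to X^i)."""
    while a and a.bit_length() >= m.bit_length():
        a ^= m << (a.bit_length() - m.bit_length())
    return a

def systematic_generator(n, k, g):
    """Row i of Gs = [I_k | P], where P_i is the remainder of X^(n-1-i) modulo g(X)."""
    rows = []
    for i in range(k):
        rem = gf2_mod(1 << (n - 1 - i), g)                      # degree < n - k
        identity = [1 if j == i else 0 for j in range(k)]
        parity = [(rem >> (n - k - 1 - j)) & 1 for j in range(n - k)]
        rows.append(identity + parity)
    return rows

for row in systematic_generator(7, 4, 0b1011):                  # g(X) = X^3 + X + 1
    print("".join(map(str, row)))
# prints 1000101, 0100111, 0010110 and 0001011, i.e. the rows of Gs in (4)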

2.4. Combinational Circuits for BCH Codes

The work presented in the literature [17] proposes combinational encoder and decoder circuits for BCH DEC codes for SRAM protection. As stated before, BCH DEC codes have not found favorable application in SRAMs because of the non-alignment of their block sizes to typical memory word lengths, and particularly because of the large multi-cycle latency of traditional iterative decoding algorithms. The authors propose a solution based on the generator and parity check matrices. Shortened codes can be easily obtained from the matrices of the primitive codes. The encoder circuit can be simply designed from the generator matrix. Syndrome computation (the first step of the decoding process) is generated using the parity check matrix.

The key element in this proposal is the error pattern decoder, equivalent to the lookup table shown in Figure 1. Combinational logic is employed to map the syndromes for correctable error patterns. This mapping is precomputed by multiplying all correctable error patterns with the parity check matrix, H. Once the estimated error vector is determined, an erroneous bit is corrected by complementing it; hence, the error corrector circuit (the final step of the decoding process) is simply a stack of XOR gates. The authors of this work implemented BCH DEC codes for 16, 32, and 64 bits, and synthesized them using a 90-nm standard cell library. They reported, for example, decoding latencies ranging from 1.4 ns (for 16 bits) to 2.2 ns (for 64 bits).
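The syndrome-to-error mapping at the heart of this combinational decoder can be sketched in software as follows. This is an illustrative model of the idea (precompute H·e^T for every correctable pattern, then correct by XOR), not the gate-level circuit of [17]; the matrix H and the received word r are assumed to be given.

from itertools import combinations

def syndrome(H, v):
    """s = H·v^T over GF(2); H is a list of rows, each a list of 0/1 values."""
    return tuple(sum(h * x for h, x in zip(row, v)) % 2 for row in H)

def error_pattern_decoder(H, t=2):
    """Precompute the syndrome -> error-pattern map by multiplying every
    correctable error pattern (here: all single and double errors) with H."""
    n = len(H[0])
    table = {}
    for weight in range(1, t + 1):
        for bits in combinations(range(n), weight):
            e = [1 if i in bits else 0 for i in range(n)]
            table[syndrome(H, e)] = e
    return table

def correct(H, table, r):
    """Decoder datapath: syndrome computation, pattern lookup, XOR correction."""
    e_hat = table.get(syndrome(H, r), [0] * len(r))   # zero syndrome -> no error
    return [ri ^ ei for ri, ei in zip(r, e_hat)]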

3. Low Redundancy and Reduced Overhead (LRRO) Double Error Correction Codes

The Hamming weight of a row of a parity check matrix determines the number of terms to be XORed in order to calculate the corresponding parity and syndrome bits. Therefore, in a hardware implementation, the heaviest row defines the logic depth of the XOR tree which implements the encoder circuit and the syndrome computation in the decoder circuit. It influences the delay introduced by the ECC. In the same way, the Hamming weight of the whole parity check matrix determines the number of logic gates required in the encoder circuit and the syndrome computation in the decoder circuit. This number has an important influence on both the silicon area occupied and the power consumed by those circuits. Finally, the complexity of the ECC is mainly determined by its error coverage, and affects all overheads (delay, silicon area, and power consumption). Nevertheless, as the error coverage is determined by the design requirements, nothing can be done about this parameter. Therefore, to reduce the overhead introduced by the encoder and decoder circuits, we must focus on the Hamming weights of the heaviest row and of the whole matrix.

Frequently, ECC designers can achieve these objectives by increasing the number of parity bits. Depending on the application, it could be an interesting alternative (e.g., [18]). However, this is not a good idea when protecting memories, as all redundant bits must be stored per individual word in the whole memory, leading to a much higher silicon area being occupied and more power being consumed by the memory circuitry.

Is it possible to find ECCs with the same redundancy as BCH codes, but with a reduced overhead? For this, we employed a search methodology based on the errors to be corrected and/or detected [19]. Although this methodology was initially employed to design flexible unequal error control (FUEC) codes, it has been successfully used to design different families of codes (e.g., [18,20,21]). First, let us briefly describe this methodology. Later, we describe the procedure needed in order to obtain matrices for the existing BCH DEC codes. Next, we present the new LRRO DEC codes. Finally, some important considerations have been included at the end of this section.
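Both figures of merit are trivial to extract from a candidate parity check matrix, which is how the comparison of Section 4.1 is obtained. A minimal helper (illustrative only, with the matrix given as bit strings) could be:

def matrix_weights(rows):
    """Return (heaviest-row weight, total weight) of a parity check matrix,
    i.e. the XOR-tree depth driver and the XOR gate-count driver."""
    weights = [row.count("1") for row in rows]
    return max(weights), sum(weights)

Applied to the matrices of Sections 3.2 and 3.3, this helper yields the values plotted in Figures 3 and 4.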

3.1. Searching Parity Check Matrices

Searching for a parity check matrix that achieves the required error coverage may end up being very complex. We used a methodology based on searching for matrices that can correct and/or detect a given set of error vectors. This methodology was first presented in the literature [19] to design flexible unequal error control (FUEC) codes. Although a detailed explanation of the methodology is out of the scope of this paper, it is briefly summarized in Figure 2, and is described in the following. After determining the values of n and k for the code to be designed, the error patterns to be corrected must be selected. These error patterns can be represented using error vectors, according to the definition given in Section 2.1. Then, the parity check matrix, H, that satisfies (6) is searched for, where E+ represents the set of error vectors to be corrected.

H·e_i^T ≠ H·e_j^T,   ∀ e_i, e_j ∈ E+ | e_i ≠ e_j,   (6)

that is, each correctable error must have a different syndrome. In this case, E+ must include the vectors for the single and double errors. Single errors are represented with vectors of the form (...1...), and double errors with vectors of the form (...1...1...), where the dots represent zero or more zeroes.

(a) Flow chart: determine n and k; fill the set of error vectors to be corrected (E+); search H; implement the encoder and decoder circuits.

(b) Recursive backtracking algorithm:

Procedure CheckPartialMatrix(partial_H (n-k) × ncols)   /* ncols ∈ [1..n] */
  SyndromeSet = {}
  For each error vector e in E+
    partial_e = (e0, e1, ..., e_ncols−1)
    If HammingWeight(e) = HammingWeight(partial_e)
      newSyndrome = CalculateSyndrome(partial_H × Transpose(partial_e))
      If newSyndrome in SyndromeSet then
        Return   /* Wrong partial matrix */
      Else
        Add newSyndrome to SyndromeSet
      End if
    End if
  End for
  If ncols = n
    Add partial_H to SolutionsSet
    Return
  Else
    For each possible new_column   /* n-k bits, excluding the all 0 combination */
      CheckPartialMatrix([partial_H | new_column] (n-k) × (ncols+1))
    End for
  End if
End procedure


Figure 2. Searching parity check matrices methodology based on the set of error patterns to be corrected: (a) flow chart and (b) recursive backtracking algorithm (extracted from [19]).

To find the matrix, the recursive backtracking algorithm shown in Figure 2b is used. It checks partial matrices and adds a new column only if the previous matrix satisfies the requirements. The added columns must be non-zero, so there are 2^{n−k} − 1 combinations for each column. In order to find matrices with a low Hamming weight, columns with a lower number of ones are checked first. Once the H matrix is selected, it is easy to determine the logic equations in order to calculate each parity and syndrome bit, as well as the syndrome lookup table. They are required for encoder and decoder implementation.

In addition, we can improve the matrix generation so as to reduce the number of 1 s in those rows of the parity check matrix with a higher number of 1 s, as well as reducing the total number of 1 s in the whole matrix. As said before, these reductions will lead to faster, smaller, and less power-consuming circuits.

A detailed explanation of this algorithm, as well as a code design example, can be found in the literature [19].
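To make the search procedure of Figure 2 concrete, the following simplified Python sketch mirrors the backtracking idea: columns (stored as integers of n − k bits) are added one at a time, lighter columns first, and a partial matrix is kept only while every error pattern fully contained in it still has a distinct syndrome. It is a toy illustration, shown here for a single-error-correcting example so that it runs instantly; it is not the optimized tool of [19], and for a DEC code the error set would also contain all double-error patterns.

def syndromes_distinct(columns, error_set):
    """Equation (6) restricted to the columns chosen so far."""
    seen = set()
    ncols = len(columns)
    for positions in error_set:
        if max(positions) >= ncols:
            continue                          # pattern not fully inside the partial matrix
        s = 0
        for p in positions:
            s ^= columns[p]                   # syndrome = XOR of the selected columns
        if s in seen:
            return False
        seen.add(s)
    return True

def search(n, r, error_set, columns=()):
    """Recursive backtracking over candidate columns, lighter columns first."""
    if not syndromes_distinct(columns, error_set):
        return None
    if len(columns) == n:
        return columns
    for c in sorted(range(1, 2 ** r), key=lambda col: bin(col).count("1")):
        found = search(n, r, error_set, columns + (c,))
        if found is not None:
            return found
    return None

single_errors = [(i,) for i in range(7)]      # SEC example: n = 7, r = 3 parity bits
print(search(7, 3, single_errors))            # prints (1, 2, 4, 3, 5, 6, 7)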

3.2. BCH DEC Codes for 8, 16, 32, and 64 Bits

As stated above, in most cases, the block sizes of the primitive BCH codes do not align with the sizes employed in memories. Hence, if we want to use BCH DEC codes with data lengths of 8, 16, 32, or 64 bits, they must be shortened from primitive BCH DEC codes of longer data lengths. As described in Section 2.1, code shortening allows for the design of codes with any block size using longer block size codes. The primitive BCH DEC codes required for our purpose are (31, 21), (63, 51), and (127, 113). From the (31, 21) code, we can obtain (18, 8) and (26, 16) BCH DEC codes. From the (63, 51) code, a (44, 32) code can be designed. The (127, 113) code allows for the construction of a (78, 64) BCH DEC code.

As an example, let us consider the primitive (31, 21) BCH DEC code. Its generator polynomial, obtained by multiplying two primitive polynomials [11], is as follows:

g(X) = (X^5 + X^2 + 1)·(X^5 + X^4 + X^3 + X^2 + 1) = X^10 + X^9 + X^8 + X^6 + X^5 + X^3 + 1.   (7)



Following the steps described in Section 2.3 to get the matrices in a systematic form, and shortening the code as described in the literature [17], the parity check matrix for the (26, 16) BCH DEC code can be obtained as follows:

H_BCH 26,16 =
  [ 10000000001101010111100100 ]
  [ 01000000000110101011110010 ]
  [ 00100000000011010101111001 ]
  [ 00010000001100111101011000 ]
  [ 00001000000110011110101100 ]
  [ 00000100001110011000110010 ]   (8)
  [ 00000010001010011011111101 ]
  [ 00000001000101001101111110 ]
  [ 00000000101111110001011011 ]
  [ 00000000011010101111001001 ]

In the same way, the matrices for all of the aforementioned codes have been obtained. These matrices and their resulting encoding and decoding circuits will be compared with our proposal in Section 4.

3.3. LRRO DEC Codes for 8, 16, 32, and 64 Bits

In this paper, we propose new low redundancy and reduced overhead (LRRO) codes for double error correction, equivalent (in terms of redundancy and error coverage) to the BCH DEC codes described in Section 3.2. That is, as primitive BCH codes offer the minimum possible redundancy, the new codes have been designed to maintain the same redundancy and coverage as the BCH DEC codes mentioned previously, while attempting to reduce the overhead induced by such codes. In Section 3.4, some comments about the redundancy of non-primitive BCH codes, as well as a new code, have been included.

Using the methodology described in Section 3.1, and taking the values n = 18 and k = 8, we have found an (18, 8) LRRO DEC code. This is its parity check matrix:

H_LRRO 18,8 =
  [ 100000000011100000 ]
  [ 010000000011010000 ]
  [ 001000000010100000 ]
  [ 000100000010011100 ]
  [ 000010000001101010 ]   (9)
  [ 000001000001010110 ]
  [ 000000100000111001 ]
  [ 000000010000000101 ]
  [ 000000001000001011 ]
  [ 000000000100000111 ]

In the same way, varying the values of n and k, matrices for the new (26, 16), (44, 32), and (78, 64) LRRO DEC codes have been found. They are shown in (10), (11), and (12), respectively:

  10000000001110000110010100      01000000001101100000111000     00100000001010110001100001       00010000001001011010001100     00001000000110101011001000  =   HLRRO 26,16  , (10)  00000100000101010101000011     00000010000011001100100110       00000001000000111100010001     00000000100000000011110011     00000000010000000000001111    10000000000011100001100101000110101000000010      01000000000011011000001110100000100010000101     00100000000010101100011000001000011000000011       00010000000010010110100011001101000101000000     00001000000001101010110010001000000100011100       00000100000001010101010000100100011100101000  HLRRO 44,32 =  , (11)  00000010000000110011001001010010000011100001       00000001000000001111000100010001000010010011     00000000100000000000111100110011000001001011       00000000010000000000000011110000110000101111     00000000001000000000000000001111110000010101     00000000000100000000000000000000001111111110    100000000000001110000110010100011010100000000010001100000000000110111000011010      010000000000001101100000111010000010001000100001011001010001001000100100100100     001000000000001010110001100000100001100000100000000010010100011000011100111001       000100000000001001011010001100110100010100001001000110001000000001100001110000     000010000000000110101011001000100000010001010110010001100000010000000100111010       000001000000000101010101000010010001110010111000000000000000100010100101000111     000000100000000011001100100101001000001110010000001000000001001101010001010011  =   HLRRO 78,64  . (12)  000000010000000000111100010001000100001001000100000100100000100101101010101101     000000001000000000000011110011001100000100001100100011000000000000011010001111       000000000100000000000000001111000011000010000010100000100011100010010111110011     000000000010000000000000000000111111000001000001100000001101011110000010101111       000000000001000000000000000000000000111111000000010000011011110000101010011000     000000000000100000000000000000000000000000111111110000000110110001011001000111     000000000000010000000000000000000000000000000000001111111110001111000111000100  It must be considered that these are just some examples of word sizes frequently employed in digital systems. Codes for different sizes (e.g., 128) can also be found using this methodology. From the parity check matrices, it is easy to obtain the logic equations for the encoder circuits and the syndrome computation in the decoder circuits. An example can be found in the literature [19]. Later, it is necessary to relate each syndrome with the corresponding error vector, implementing a lookup table for each code. In fact, these are the truth tables of a group of logic functions where the inputs are the syndrome bits, and the outputs are the bits of the estimated error vectors. Each logic function is a sum of minterms representing the syndromes that affect each bit. A good explanation of how to generate the lookup table and the logic equations for the bits of the estimated error vector can be found in the literature [18].

3.4. Redundancy and Non-Primitive BCH Codes

As stated before, although shortened (or non-primitive) BCH codes maintain the error coverage of the primitive codes, they may lose other properties. For example, do they maintain the minimum redundancy property for a given data block size? The answer is no in the general case, although it depends on the number of columns shortened.

Using our methodology, we did not find LRRO DEC codes with a lower redundancy than the equivalent BCH DEC codes for 16, 32, and 64 bits. This does not mean that such codes do not exist, as a complete search is unfeasible nowadays; but if they exist, they are not easy to find. Using our methodology, if a code exists, it is frequently found after a short searching time. However, when the (31, 21) BCH DEC code is shortened to get the (18, 8) code, 13 columns are discarded, more than half of the data columns. In this case, we found a (16, 8) LRRO DEC code, whose parity check matrix is as follows:

H_LRRO 16,8 =
  [ 1000000011100001 ]
  [ 0100000011011000 ]
  [ 0010000010101100 ]
  [ 0001000010010110 ]   (13)
  [ 0000100001101010 ]
  [ 0000010001010101 ]
  [ 0000001000110011 ]
  [ 0000000100001111 ]

This code has a better redundancy than the equivalent BCH code. This is important when protecting memories, as the redundant bits are included in all of the words in the whole memory. In any case, this code is not included in the comparisons performed in the next section. A deeper study of codes with lower redundancy than the equivalent BCH codes, as well as the implementation and synthesis of the code shown in (13), are out of the scope of this paper, and will form part of a future work.

4. Evaluation and Comparison

4.1. Theoretical Study

The real values of the overhead induced by the encoder and decoder circuits of an ECC depend on each implementation: the technology, the synthesis process, etc. From a practical point of view, a logic synthesis is required to obtain these values. However, it is possible to do a more generic, theoretical study. As stated above, the Hamming weight of the heaviest row and the Hamming weight of the whole parity check matrix are important parameters that influence the overhead.

Figure 3 shows the Hamming weights of the heaviest row (i.e., the maximum number of 1 s in a row) of all matrices for the codes considered. As can be observed, all LRRO codes reduce this number with respect to the equivalent BCH codes, and the difference increases as k increases.

Weight of the heaviest row:

  Code        BCH   LRRO
  (18, 8)       7      5
  (26, 16)     12      8
  (44, 32)     23     13
  (78, 64)     39     23

Figure 3. Hamming weights of the heaviest row.

Figure 4 shows the Hamming weights of the whole matrix (i.e., the total number of 1 s of each parity check matrix) for the codes considered. Again, all LRRO codes reduce this number with respect to the equivalent BCH codes, and the difference increases with k.

Weight of the whole matrix:

  Code        BCH   LRRO
  (18, 8)      58     42
  (26, 16)    104     75
  (44, 32)    229    147
  (78, 64)    472    309

Figure 4. Hamming weights of the whole matrix.

The results shown here indicate that the proposed LRRO codes improve, in all cases, the attributes of their parity check matrices when compared with the equivalent BCH codes, maintaining the same redundancy and error coverage. From a theoretical point of view, these improvements should result in encoder and decoder circuits with a lower overhead. The question now is to confirm these good results when the circuits are implemented. This analysis is performed in the following.

4.2. Implementation and Logic Synthesis

All LRRO DEC codes presented in this paper, as well as the considered BCH DEC codes, have been implemented and simulated to check if they achieve the expected error coverage. We implemented a simulation-based error injection tool [21] with which we could analyze the error coverage of the different ECCs. This tool can inject diverse error models. Particularly, single and multiple random errors were injected. These errors emulate bit-flips in memory cells.

The basic scheme of our error injector is shown in Figure 5. This tool can verify whether the injected error provokes an incorrect or a correct decoding. To do this, the simulation-based error injector tool compares the input and output words. As expected, all of the studied codes corrected 100% of single and double errors.

Figure 5. Block diagram of the error injector [21].
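A software analogue of this injection campaign is easy to set up. The sketch below assumes generic encode and decode callables (for instance, built from a parity check matrix as in the previous sections) and hypothetical word widths; it flips one or two random bits of each encoded word, emulating bit-flips in memory cells, and counts miscorrections, which should be zero for a DEC code under this error model.

import random

def inject_and_check(encode, decode, k, n, trials=10000, max_flips=2):
    """Simulation-based error injection: encode, flip bits, decode, compare."""
    failures = 0
    for _ in range(trials):
        u = [random.randint(0, 1) for _ in range(k)]
        b = list(encode(u))                       # code word as a mutable list of bits
        for pos in random.sample(range(n), random.randint(1, max_flips)):
            b[pos] ^= 1                           # injected bit-flip
        if list(decode(b)) != u:
            failures += 1
    return failures

# Example call with hypothetical helpers: inject_and_check(encode_26_16, decode_26_16, 16, 26)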


On the other hand, the encoder and decoder circuits for all ECCs were implemented in Very High speed integrated circuit Hardware Description Language (VHDL). For example, Figure 6 shows an excerpt of the VHDL implementation of the (26, 16) LRRO DEC code. Then, using CADENCE software [22], we carried out a logic synthesis for 45-nm technology using the NanGate FreePDK45 Open Cell Library [23,24]. Standard cells were based on Scalable Complementary Metal Oxide Semiconductor (SCMOS) design rules. The power voltage and temperature conditions were 1.1 V and 25 °C, respectively.

entity Encoder is
  port(u : in  std_logic_vector(0 to 15);  -- Original data
       b : out std_logic_vector(0 to 25)   -- Encoded word
      );
end;

architecture behavioral of Encoder is
begin
  b(0)  <= u(0) XOR u(1) XOR u(2) XOR u(7) XOR u(8) XOR u(11) XOR u(13);
  ...                                      -- Parity bits (according to H)
  b(9)  <= u(12) XOR u(13) XOR u(14) XOR u(15);
  b(10) <= u(0);
  ...                                      -- Data bits
  b(25) <= u(15);
end;

(a)

entity Decoder is
  port(r : in  std_logic_vector(0 to 25);  -- Received word
       u : out std_logic_vector(0 to 15)   -- Estimated original data
      );
end;

architecture behavioral of Decoder is
  signal e : std_logic_vector(10 to 25);   -- Estimated error vector
  signal s : std_logic_vector(0 to 9);     -- Syndrome
begin
  s(0) <= r(0) XOR r(10) XOR r(11) XOR r(12) XOR r(17) XOR r(18) XOR r(21) XOR r(23);
  ...                                      -- Syndrome bits (according to H): s(1) ... s(9)
  e(25) <= ( NOT s(9) AND NOT s(8) AND s(7) AND s(6) AND NOT s(5) AND NOT s(4) AND
             NOT s(3) AND s(2) AND NOT s(1) AND NOT s(0) )
        OR ( NOT s(9) AND s(8) AND s(7) AND NOT s(6) AND s(5) AND NOT s(4) AND
             NOT s(3) AND s(2) AND NOT s(1) AND NOT s(0) )
        OR ...                             -- All syndromes affecting bit 25
  ...                                      -- Estimated error bits e(10) ...
  u(0)  <= r(10) XOR e(10);
  ...
  u(15) <= r(25) XOR e(25);
end;

(b)

Figure 6. Excerpt of the VHDL implementation for the (26, 16) low redundancy and reduced overhead (LRRO) double error correction (DEC) code: (a) encoder and (b) decoder.

Although less generic than the theoretical study presented previously, logic synthesis allows for obtaining more realistic information about the overhead induced by different ECCs for a given technology. The values considered for comparison are the propagation delay of the circuits, the silicon area occupied, and the power consumption.

Figure 7 shows the delay induced by the encoder circuits. As can be observed, the proposed LRRO codes greatly reduce this delay for all block sizes.

Encoder delay (ps):

  Code        BCH   LRRO
  (18, 8)     153     74
  (26, 16)    226    146
  (44, 32)    260    183
  (78, 64)    329    220

Figure 7. Propagation delay of the encoder circuits (in ps).

Figure 8 depicts the propagation delay of the decoder circuits. In this case, the reduction achieved by the LRRO codes is less important than for the encoders. In any case, all of the LRRO decoders reduce the delay with respect to the equivalent BCH circuits, for all block sizes.

Decoder delay (ps):

  Code        BCH   LRRO
  (18, 8)     422    410
  (26, 16)    462    422
  (44, 32)    547    508
  (78, 64)    648    613

Figure 8. Propagation delay of the decoder circuits (in ps).

When protecting memories using an ECC, encoding and decoding delays may influence the clock cycle and the working frequency of the processor. Therefore, the reduction achieved by LRRO codes makes them a better option than BCH codes.

The silicon areas occupied by the encoder and decoder circuits (in Figures 9 and 10, respectively) show the same trends observed with the propagation delay. All LRRO circuits use less area than the equivalent BCH circuits, and the reduction is proportionally more important in the encoder circuits. Because of the high integration density of the nanometric manufacturing processes, the silicon area is not a problem nowadays, in most cases. However, LRRO codes are superior to the equivalent BCH codes in this comparison.

Encoder silicon area (µm2):

  Code        BCH   LRRO
  (18, 8)      61     45
  (26, 16)    125    101
  (44, 32)    302    214
  (78, 64)    627    473

Figure 9. Silicon area occupied by the encoder circuits (in µm2).

Decoder silicon area (µm2):

  Code         BCH    LRRO
  (18, 8)      861     835
  (26, 16)    2060    2028
  (44, 32)    6410    6327
  (78, 64)   21821   21162

Figure 10. Silicon area occupied by the decoder circuits (in µm2).

Finally, the power consumed by the encoder circuits is shown in Figure 11, while Figure 12 presents the power consumed by the decoder circuits. Again, the same trends can be observed. All LRRO circuits consume less power than the equivalent BCH circuits. The reduction is proportionally higher in the encoder circuits but, conversely to previous comparisons, the reductions observed in the decoder circuits are also high. The static power, more related to the silicon area, represents a small fraction of the total power consumed. In this case, the dynamic power consumption has been greatly reduced, leading to the results shown.

Power consumed by encoder (µW):

  Code         BCH     LRRO
  (18, 8)     12.88     7.18
  (26, 16)    33.41    21.84
  (44, 32)    96.11    55.68
  (78, 64)    230.9   144.59

Figure 11. Power consumed by the encoder circuits (in µW).

Power consumed by decoder (µW):

  Code         BCH    LRRO
  (18, 8)      137     110
  (26, 16)     434     366
  (44, 32)    2202    1701
  (78, 64)   11580    8460

Figure 12. Power consumed by the decoder circuits (in µW).

Nowadays, a low power consumption is a must in most digital systems. Therefore, the reductions achieved by the LRRO codes make them a clearly better option compared with BCH codes.

To sum up, the low redundancy and reduced overhead DEC codes presented in this paper are a better option than the equivalent BCH DEC codes, as they maintain the same redundancy and error coverage, while inducing a reduced overhead. Comparing the codes for the 8-, 16-, 32-, and 64-bit data lengths against BCH, LRRO circuits achieve the following:

• Lower delay: at least 30% reduction in the encoder, and up to 8.7% in the decoder circuits.
• Less silicon area: reductions of up to 29% in the encoder and 3.0% in the decoder circuits.
• Lower power consumption: about 40% in the encoder and up to 26.9% in the decoder.
5. Conclusions

The Bose–Chaudhuri–Hocquenghem (BCH) codes are one of the best-known error control codes. They are extensively employed in several fields, but they have not found favorable application in memories. The non-alignment of their block sizes to typical memory word lengths and the large multi-cycle latency of traditional iterative decoding algorithms are the main reasons for this.

This paper presents low redundancy and reduced overhead (LRRO) double error correction (DEC) codes as an alternative to the BCH DEC codes for protecting registers and memories, i.e., for hardware implementations and block sizes that are powers of 2 (8, 16, 32, 64, etc.). The objective is to maintain the redundancy and the error coverage of BCH DEC codes, while reducing the overhead induced by the encoder and decoder circuits.

We have demonstrated that all overhead comparisons are favorable to the LRRO DEC codes. The theoretical study shows that every LRRO code has a lower Hamming weight in the heaviest row of its parity check matrix, as well as a lower Hamming weight for the whole matrix, compared with the equivalent BCH DEC code. Moreover, the logic synthesis shows that, for all word lengths, the LRRO DEC circuits have a lower propagation delay, occupy less silicon area, and consume less power than the corresponding BCH DEC circuits. Therefore, we propose the LRRO DEC codes as an interesting alternative to BCH DEC codes for protecting registers and memories against single and double errors. Finding ECCs with a lower redundancy, while maintaining affordable overheads, is part of our ongoing and future work.
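To make the link between matrix weight and circuit overhead concrete, here is a small illustrative Python sketch; the parity check matrix below is the textbook Hamming(7,4) matrix, used only as a stand-in, since the actual LRRO and BCH DEC matrices are not reproduced in this section. Each one in a row selects a received bit that is XORed into the corresponding syndrome bit, so the heaviest row bounds the depth of the slowest XOR tree, and the total weight tracks the number of two-input XOR gates.

import math

# Illustrative parity check matrix: Hamming(7,4), used here only as a small example.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

# A row of weight w needs w - 1 two-input XOR gates and a tree of depth ceil(log2(w))
# to compute its syndrome bit from the received word.
row_weights = [sum(row) for row in H]
heaviest = max(row_weights)
total = sum(row_weights)

print("row weights:", row_weights)                          # [4, 4, 4]
print("heaviest row:", heaviest,
      "-> XOR-tree depth", math.ceil(math.log2(heaviest)))  # depth 2
print("total weight:", total,
      "->", total - len(H), "two-input XOR gates for the syndrome")

Lower values for these two quantities are what translate into the delay, area, and power advantages reported above.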

Author Contributions: Conceptualization, L.-J.S.-A. and P.-J.G.-V.; methodology, L.-J.S.-A. and J.G.-M.; software, L.-J.S.-A. and J.G.-M.; validation, L.-J.S.-A., J.G.-M., D.G.-T., and J.-C.B.-C.; formal analysis, L.-J.S.-A., D.G.-T., and P.-J.G.-V.; investigation, L.-J.S.-A. and J.G.-M.; resources, J.-C.B.-C. and P.-J.G.-V.; data curation, L.-J.S.-A. and J.G.-M.; writing—original draft preparation, L.-J.S.-A.; writing—review and editing, J.G.-M., D.G.-T., J.-C.B.-C., and P.-J.G.-V.; visualization, L.-J.S.-A.; supervision, P.-J.G.-V.; project administration, J.G.-M. and P.-J.G.-V.; funding acquisition, J.G.-M. and P.-J.G.-V. All authors have read and agreed to the published version of the manuscript.

Funding: This research was supported in part by the Spanish Government, project TIN2016-81075-R, by Primeros Proyectos de Investigación (PAID-06-18), Vicerrectorado de Investigación, Innovación y Transferencia de la Universitat Politècnica de València (UPV), project 20190032, and by the Institute of Information and Communication Technologies (ITACA).

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Fujiwara, E. Code Design for Dependable Systems; John Wiley & Sons: Chichester, UK, 2006. [CrossRef]
2. Zhang, X. VLSI Architectures for Modern Error-Correcting Codes; CRC Press: Boca Raton, FL, USA, 2016. [CrossRef]
3. Hocquenghem, A. Codes correcteurs d'erreurs. Chiffres 1959, 2, 147–156.
4. Bose, R.C.; Ray-Chaudhuri, D.K. On a Class of Error Correcting Binary Group Codes. Inf. Control 1960, 3, 68–79. [CrossRef]
5. Chen, P.; Zhang, C.; Jiang, H.; Wang, Z.; Yue, S. High performance low complexity BCH error correction circuit for SSD controllers. In Proceedings of the IEEE International Conference on Electron Devices and Solid-State Circuits (EDSSC), Singapore, 1–4 June 2015; pp. 217–220. [CrossRef]
6. Nguyen, J.P. Applications of Reed Solomon Codes on Optical Media Storage. Master's Thesis, San Diego State University, San Diego, CA, USA, October 2011.
7. IEEE 802.3-2018 - IEEE Standard for Ethernet. Available online: https://standards.ieee.org/standard/802_3-2018.html (accessed on 14 October 2020).
8. H.263: Video Coding for Low Bit Rate Communication. Available online: https://www.itu.int/rec/T-REC-H.263/en (accessed on 14 October 2020).
9. Vangelista, L.; Benvenuto, N.; Tomasin, S.; Nokes, C.; Stott, J.; Filippi, A.; Vlot, M.; Mignone, V.; Morello, A. Key technologies for next-generation terrestrial digital television standard DVB-T2. IEEE Commun. Mag. 2009, 47, 146–153. [CrossRef]
10. McEliece, R.J.; Swanson, L. Reed–Solomon codes and the exploration of the solar system. In Reed–Solomon Codes and Their Applications; Wicker, S.B., Bhargava, V.K., Eds.; IEEE Press: Piscataway, NJ, USA, 1994; pp. 25–40.
11. Lin, S.; Costello, D.J. Error Control Coding, 2nd ed.; Pearson Prentice-Hall: Upper Saddle River, NJ, USA, 2004.
12. 2013 ITRS—International Technology Roadmap for Semiconductors. Available online: http://www.itrs2.net/2013-itrs.html (accessed on 14 October 2020).
13. Ibe, E.; Taniguchi, H.; Yahagi, Y.; Shimbo, K.; Toba, T. Impact of scaling on neutron-induced soft error in SRAMs from a 250 nm to a 22 nm design rule. IEEE Trans. Electron Devices 2010, 57, 1527–1538. [CrossRef]
14. Gil-Tomás, D.; Gracia-Morán, J.; Baraza-Calvo, J.C.; Saiz-Adalid, L.J.; Gil-Vicente, P.J. Studying the effects of intermittent faults on a microcontroller. Microelectron. Reliab. 2012, 52, 2837–2846. [CrossRef]

15. Neubauer, A.; Freudenberger, J.; Kühn, V. Coding Theory: Algorithms, Architectures and Applications; John Wiley & Sons: Chichester, UK, 2007. [CrossRef]
16. Morelos-Zaragoza, R.H. The Art of Error Correcting Coding, 2nd ed.; John Wiley & Sons: Chichester, UK, 2006. [CrossRef]
17. Naseer, R.; Draper, J. DEC ECC design to improve memory reliability in sub-100nm technologies. In Proceedings of the IEEE International Conference on Electronics, Circuits and Systems, St. Julien's, Malta, 31 August–3 September 2008; pp. 586–589. [CrossRef]
18. Saiz-Adalid, L.J.; Gracia-Morán, J.; Gil-Tomás, D.; Baraza-Calvo, J.C.; Gil-Vicente, P.J. Ultrafast Codes for Multiple Adjacent Error Correction and Double Error Detection. IEEE Access 2019, 7, 151131–151143. [CrossRef]
19. Saiz-Adalid, L.J.; Gil-Vicente, P.J.; Ruiz, J.C.; Gil-Tomás, D.; Baraza, J.C.; Gracia-Morán, J. Flexible unequal error control codes with selectable error detection and correction levels. In Lecture Notes in Computer Science, vol. 8153, Proceedings of the 2013 Computer Safety, Reliability, and Security Conference (SAFECOMP 2013), Toulouse, France, 24–27 September 2013; Bitsch, F., Guiochet, J., Kaâniche, M., Eds.; Springer: Berlin, Germany, 2013; pp. 178–189. [CrossRef]
20. Saiz-Adalid, L.J.; Reviriego, P.; Gil, P.; Pontarelli, S.; Maestro, J.A. MCU Tolerance in SRAMs Through Low-Redundancy Triple Adjacent Error Correction. IEEE Trans. VLSI Syst. 2015, 23, 2332–2336. [CrossRef]
21. Gracia-Moran, J.; Saiz-Adalid, L.J.; Gil-Tomás, D.; Gil-Vicente, P.J. Improving Error Correction Codes for Multiple Cell Upsets in Space Applications. IEEE Trans. VLSI Syst. 2018, 26, 2132–2142. [CrossRef]
22. Cadence: Computational Software for Intelligent System Design. Available online: https://www.cadence.com (accessed on 14 October 2020).
23. Stine, J.E.; Castellanos, I.; Wood, M.; Henson, J.; Love, F.; Davis, W.R.; Franzon, P.D.; Bucher, M.; Basavarajaiah, S.; Oh, J.; et al. FreePDK: An open-source variation-aware design kit. In Proceedings of the IEEE International Conference on Microelectronic Systems Education (MSE), San Diego, CA, USA, 3–4 June 2007; pp. 173–174. [CrossRef]
24. NanGate FreePDK45 Open Cell Library. Available online: http://www.nangate.com/?page_id=2325 (accessed on 28 May 2018).

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).