FFT Algorithm for Binary Extension Finite Fields and its Application to Reed-Solomon Codes

Sian-Jheng Lin, Member, IEEE, Tareq Y. Al-Naffouri, Member, IEEE, and Yunghsiang S. Han, Fellow, IEEE

arXiv:1503.05761v3 [cs.IT] 14 Aug 2016

Abstract: Recently, a new polynomial basis over binary extension fields was proposed such that the fast Fourier transform (FFT) over such fields can be computed with complexity of order $O(n \lg(n))$, where $n$ is the number of points evaluated in the FFT. In this work, we reformulate this FFT algorithm so that it is easier to understand and can be extended to develop frequency-domain decoding algorithms for $(n = 2^m, k)$ systematic Reed-Solomon (RS) codes over $\mathbb{F}_{2^m}$, $m \in \mathbb{Z}^+$, with $n - k$ a power of two. First, the basis of syndrome polynomials is reformulated in the decoding procedure so that the new transforms can be applied to the decoding procedure. A fast extended Euclidean algorithm is developed to determine the error locator polynomial. The computational complexity of the proposed decoding algorithm is $O(n \lg(n-k) + (n-k) \lg^2(n-k))$, improving upon the best currently available decoding complexity $O(n \lg^2(n) \lg\lg(n))$, and reaching the best known complexity bound that was established by Justesen in 1976. However, Justesen's approach applies only to codes over some specific fields, to which Cooley-Tukey FFTs can be applied. As revealed by computer simulations, the proposed decoding algorithm is 50 times faster than the conventional one for the $(2^{16}, 2^{15})$ RS code over $\mathbb{F}_{2^{16}}$.

This work was supported in part by CAS Pioneer Hundred Talents Program and the National Science of Council (NSC) of Taiwan under Grants NSC 102-2221-E-011-006-MY3, NSC 101-2221-E-011-069-MY3. S.-J. Lin is with the School of Information Science and Technology, University of Science and Technology of China (USTC), Hefei, China and the Electrical Engineering Department, King Abdullah University of Science and Technology (KAUST), Kingdom of Saudi Arabia (e-mail: [email protected]). Tareq Y. Al-Naffouri is with the Electrical Engineering Department at King Abdullah University of Science and Technology (KAUST), Thuwal, Makkah Province, Kingdom of Saudi Arabia (e-mail: [email protected]). Y. S. Han is with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan (e-mail: [email protected]).

I. INTRODUCTION

Reed-Solomon (RS) codes are a class of block error-correcting codes that were invented by Reed and Solomon [1] in 1960. An $(n, k)$ RS code is constructed over $\mathbb{F}_q$, for $n = q - 1$. Its extended version, called extended Reed-Solomon codes [2], admits a codeword length of up to $n = q$ or $n = q + 1$. The systematic version of an $(n, k)$ RS code appends $n - k$ parity symbols to the $k$ message symbols, forming a codeword of length $n$. RS codes are maximum distance separable (MDS). $(n, k)$ RS codes can correct up to $\lfloor (n-k)/2 \rfloor$ erroneous symbols. Nowadays, RS codes have numerous important applications, including barcodes (such as QR codes), storage devices (such as Blu-ray Discs), digital television (such as DVB and ATSC), and data transmission technologies (such as DSL and WiMAX). RS codes are also used to design other forward error correction codes, such as regenerating codes [3][4] and local reconstruction codes [5]. The wide range of applications of RS codes raises an important issue concerning their computational complexity. More specifically, since practical implementations of RS codes are typically over binary extension finite fields, the complexity of RS codes over those fields has received more attention than that over others [6][7].

The conventional syndrome-based RS decoding algorithm has quadratic complexity. Some fast approaches [8][9] are based on FFTs or fast polynomial arithmetic techniques. However, the structures of FFTs over finite fields vary with the sizes of the fields $\mathbb{F}_q$. When $q - 1$ is a smooth number, meaning that $q - 1$ can be factorized into many small primes, the Cooley-Tukey FFT in $O(n \lg(n))$ field additions and field multiplications can be applied. A conventional case involves choosing Fermat primes $q \in \{2^m + 1 \mid m = 1, 2, 4, 8, 16\}$. Based on such FFTs, Justesen [8] gave an $O(n \lg^2(n))$ approach for decoding $(n, k)$ RS codes over $\mathbb{F}_{2^m+1}$. Another approach to solve the key equations of BCH codes was proposed by Pan [10], and it removes a factor of $\lg n$ when the characteristic of the field is large enough. However, the algorithm [10] gives no improvement for codes over binary extension fields. If $q - 1$ is not smooth, Cooley-Tukey FFTs are inapplicable. In this case, the FFTs over arbitrary fields [11][12] can be applied, requiring $O(n \lg(n) \lg\lg(n))$ field operations. Gao [9] presented an $O(n \lg^2(n) \lg\lg(n))$ RS decoding algorithm over arbitrary fields, by utilizing fast polynomial multiplications [13]. Further, for codes over $\mathbb{F}_{2^m}$, the additive FFT [14], which requires $O(n \lg(n) \lg\lg(n))$ operations, can be applied to reduce the leading constant further. To the authors' knowledge, the additive FFT [14] is the fastest algorithm over $\mathbb{F}_{2^m}$ so far.

As RS codes are typically constructed over binary extension fields, we consider this case in this paper. Clearly, if one wants to remove the extra factor $\lg\lg(n)$ in the RS algorithms over binary extension fields, FFTs in $O(n \lg(n))$ are required. Recently, Lin et al. [15] showed a new way to solve the aforementioned FFT problem. The paper [15] defined a new polynomial basis based on subspace polynomials over $\mathbb{F}_{2^m}$. For a polynomial of degree less than $h$ in this new basis, the $h$-point multipoint evaluations can be made in $O(h \lg(h))$ field operations. Based on the multipoint evaluation algorithm, encoding/erasure decoding algorithms for $(n, k)$ RS codes [15] were proposed to achieve $O(n \lg(n))$. However, the error-correction RS decoding algorithm based on the new basis was not yet provided.

This paper develops an error correction decoding algorithm for $(n = 2^m, k)$ RS codes over $\mathbb{F}_{2^m}$, for $k/n \geq 0.5$ and $(n - k)$ a power of two.$^1$ In practice RS codes usually have rates $k/n \geq 0.5$. The complexity of the proposed algorithm is given by $O(n \lg(n-k) + (n-k) \lg^2(n-k))$. Holding the code rate $k/n$ constant yields a complexity $O(n \lg^2(n))$, which is better than the best existing complexity of $O(n \lg^2(n) \lg\lg(n))$ that was achieved by Gao [9] in 2002. The algorithm is based on the non-standard polynomial basis [15]. To embed the new basis into the decoding algorithm, we reformulate the decoding formulas such that all arithmetic is performed on the new basis. The key equation is solved by the Euclidean algorithm, and thus fast polynomial division, as well as the Euclidean algorithm in the new basis, are proposed. Finally, we combine those algorithms, resulting in a fast error-correction RS decoding algorithm. The major contributions of this paper are summarized as follows.

1) An alternative description of the algorithms [15] for the new polynomial basis is presented.
2) An $O(h \lg(h))$ fast polynomial division in the new basis is derived.
3) An $O(h \lg^2(h))$ fast half-GCD algorithm in the new basis is presented.
4) An $O(n \lg(n-k))$ RS encoding algorithm is presented, for $n - k$ a power of two.
5) A syndrome-based RS decoding algorithm that is based on the new basis is demonstrated.
6) An $O(n \lg(n-k) + (n-k) \lg^2(n-k))$ RS decoding algorithm is presented, for $n - k$ a power of two.

Notably, [15] gave the encoding algorithms for RS codes with complexity $O(n \lg(k))$, for $k$ a power of two. The encoding algorithm [15] is suitable for coding rate $k/n \leq 0.5$; however, the encoding algorithm proposed in this work is suitable for $k/n \geq 0.5$.

The rest of this paper is organized as follows. Section II reviews the definitions of the polynomial basis. The multipoint evaluation algorithm is provided in Sec. III. Section IV provides an alternative polynomial basis that is constructed using monic polynomials. The polynomial operations that are used in the encoding/decoding of RS codes are explicated. Section V presents the fast extended Euclidean algorithm that is based on the half-GCD method. Section VI and Section VII introduce the algorithms for encoding and decoding RS codes. Section VIII presents simulations and draws conclusions.

$^1$ Many choices of $(n, k)$ are available when $n = 2^m$, $k = 2^m - 2^t$, where $t < m$.

II. POLYNOMIAL BASIS IN $\mathbb{F}_{2^m}[x]/(x^{2^m} - x)$

This section reviews the subspace polynomials over $\mathbb{F}_{2^m}$, and the polynomial basis defined in [15].

A. Subspace polynomial

Let $\mathbb{F}_{2^m}$ denote an extension finite field with dimension $m$ over $\mathbb{F}_2$. Let $\mathbf{v} = (v_0, v_1, \ldots, v_{m-1})$ denote a basis of $\mathbb{F}_{2^m}$. That is, all $v_i \in \mathbb{F}_{2^m}$ are linearly independent over $\mathbb{F}_2$. A $k$-dimensional space $V_k$ of $\mathbb{F}_{2^m}$ is defined as

$$V_k = \mathrm{Span}(\mathbf{v}_k) = \{ i_0 \cdot v_0 + i_1 \cdot v_1 + \cdots + i_{k-1} \cdot v_{k-1} \mid \forall i_j \in \{0, 1\} \}, \tag{1}$$

where $\mathbf{v}_k = (v_0, v_1, \ldots, v_{k-1})$ is a basis of the space $V_k$, and $k \leq m$. We can form a strictly ascending chain of subspaces given by

$$\{0\} = V_0 \subset V_1 \subset V_2 \subset \cdots \subset V_m = \mathbb{F}_{2^m}.$$

Let $\{\omega_i\}_{i=0}^{2^m-1}$ denote the elements of $\mathbb{F}_{2^m}$. Each element is defined as

$$\omega_i = i_0 \cdot v_0 + i_1 \cdot v_1 + \cdots + i_{m-1} \cdot v_{m-1},$$

where $i_j \in \{0, 1\}$ is the binary representation of $i$. That is,

$$i = i_0 + i_1 \cdot 2 + \cdots + i_{m-1} \cdot 2^{m-1}, \quad \forall i_j \in \{0, 1\}.$$

This implies that $V_k = \{\omega_i\}_{i=0}^{2^k-1}$, for $k = 0, 1, \ldots, m$. Note that $\omega_0 = 0$ is the additive identity in the field. In this work, $\omega_0$ and $0$ will be used interchangeably when there is no confusion.

The subspace polynomial [14, 16, 17] of $V_k$ is defined as

$$s_k(x) = \prod_{a \in V_k} (x - a), \tag{2}$$

and it is clear that $\deg(s_k(x)) = 2^k$. For example, $s_0(x) = x$, and $s_2(x) = x(x - v_0)(x - v_1)(x - v_0 - v_1)$. The properties of $s_k(x)$ are given in [16, 18].

Theorem 1 ([16, 18]). (i) $s_k(x)$ is an $\mathbb{F}_2$-linearized polynomial for which

$$s_k(x) = \sum_{i=0}^{k} s_{k,i} x^{2^i}, \tag{3}$$

with each $s_{k,i} \in \mathbb{F}_{2^m}$. This implies that

$$s_k(x + y) = s_k(x) + s_k(y), \quad \forall x, y \in \mathbb{F}_{2^m}. \tag{4}$$

(ii) The formal derivative of $s_k(x)$ is a constant

$$s_k'(x) = \prod_{a \in V_k \setminus \{0\}} a. \tag{5}$$

The recursive form [16] of subspace polynomials is given by

$$s_0(x) = x; \tag{6}$$

$$s_j(x) = s_{j-1}(x)\, s_{j-1}(x - v_{j-1}) = (s_{j-1}(x))^2 - s_{j-1}(v_{j-1})\, s_{j-1}(x), \quad j = 1, 2, \ldots, m. \tag{7}$$

B. Polynomial basis

Let $\bar{\mathbb{X}} = \{\bar{X}_0(x), \bar{X}_1(x), \ldots, \bar{X}_{2^m-1}(x)\}$ denote a basis of $\mathbb{F}_{2^m}[x]/(x^{2^m} - x)$. Each $\bar{X}_i(x)$ is defined as

$$\bar{X}_i(x) = X_i(x)/p_i, \tag{8}$$

where

$$X_i(x) = \prod_{j=0}^{m-1} (s_j(x))^{i_j}, \qquad p_i = \prod_{j=0}^{m-1} (s_j(v_j))^{i_j}, \tag{9}$$

and each $i_j \in \{0, 1\}$ is the binary representation of $i$. Notice that $(s_j(x))^0 = (s_j(v_j))^0 = 1$. For example, $\bar{X}_0(x) = 1$, and $\bar{X}_3(x) = X_3(x)/p_3 = (s_0(x) s_1(x))/(s_0(v_0) s_1(v_1))$. It can be seen that $\deg(\bar{X}_i(x)) = i$, and thus the basis $\bar{\mathbb{X}}$ can represent all elements in $\mathbb{F}_{2^m}[x]/(x^{2^m} - x)$.
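The definition (2), the recursion (6)-(7), and the linearity property (4) can be checked numerically on a small field. The following is a minimal sketch, not taken from the paper: it assumes the example field $\mathbb{F}_{2^4}$ with modulus $x^4 + x + 1$ and the basis $v_j = x^j$ (so that $\omega_i$ is simply the integer $i$); the helper names `gf_mul`, `s_direct`, and `s_rec` are hypothetical.

```python
# Assumed example field: GF(2^4) with modulus x^4 + x + 1 (not from the paper).
M, POLY = 4, 0b10011
# Assumed basis v_j = x^j, so omega_i is the integer i and V_k = {0, ..., 2^k - 1}.
V = [1 << j for j in range(M)]

def gf_mul(a, b):
    """Multiply two field elements (shift-and-add with modular reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def s_direct(k, x):
    """s_k(x) from definition (2): the product of (x - a) over a in V_k."""
    out = 1
    for a in range(1 << k):
        out = gf_mul(out, x ^ a)  # subtraction is XOR in characteristic 2
    return out

def s_rec(k, x):
    """s_k(x) from the recursion (6)-(7)."""
    if k == 0:
        return x
    t = s_rec(k - 1, x)
    c = s_rec(k - 1, V[k - 1])    # the constant s_{k-1}(v_{k-1})
    return gf_mul(t, t) ^ gf_mul(c, t)

field = range(1 << M)
definitions_agree = all(s_direct(k, x) == s_rec(k, x)
                        for k in range(M + 1) for x in field)
is_linear = all(s_rec(k, x ^ y) == s_rec(k, x) ^ s_rec(k, y)
                for k in range(M + 1) for x in field for y in field)
```

Under these assumptions both flags evaluate to true, and $s_m(x)$ vanishes on every field element, which is consistent with working modulo $x^{2^m} - x$.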

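Algorithm 1 Transform of the basis $\bar{\mathbb{X}}$
Input: $\mathrm{FFT}_{\bar{\mathbb{X}}}(\bar{D}_{2^k}, k, \beta)$: $\bar{D}_{2^k} = (\bar{d}_0, \bar{d}_1, \ldots, \bar{d}_{2^k-1})$, $k$ is the binary logarithm of the size, and $\beta \in \mathbb{F}_{2^m}$
Output: $2^k$ evaluations $D_{2^k} = (d_0, d_1, \ldots, d_{2^k-1})$, where each $d_i = \bar{D}_{2^k}(\omega_i + \beta)$
 1: if $k = 0$ then return $\bar{d}_0$
 2: end if
 3: for $i = 0, \ldots, 2^{k-1} - 1$ do
 4:   $g_i^{(0)} \leftarrow \bar{d}_i + \frac{s_{k-1}(\beta)}{s_{k-1}(v_{k-1})} \bar{d}_{i+2^{k-1}}$
 5:   $g_i^{(1)} \leftarrow g_i^{(0)} + \bar{d}_{i+2^{k-1}}$
 6: end for
 7: Call $V_0 \leftarrow \mathrm{FFT}_{\bar{\mathbb{X}}}(\bar{D}^{(0)}_{2^{k-1}}, k-1, \beta)$, where $\bar{D}^{(0)}_{2^{k-1}} = (g_0^{(0)}, \ldots, g_{2^{k-1}-1}^{(0)})$ and $V_0 = (d_0, \ldots, d_{2^{k-1}-1})$
 8: Call $V_1 \leftarrow \mathrm{FFT}_{\bar{\mathbb{X}}}(\bar{D}^{(1)}_{2^{k-1}}, k-1, v_{k-1} + \beta)$, where $\bar{D}^{(1)}_{2^{k-1}} = (g_0^{(1)}, \ldots, g_{2^{k-1}-1}^{(1)})$ and $V_1 = (d_{2^{k-1}}, \ldots, d_{2^k-1})$
 9: return $D_{2^k} = (d_0, d_1, \ldots, d_{2^k-1})$

Algorithm 2 Inverse transform of the basis $\bar{\mathbb{X}}$
Input: $\mathrm{IFFT}_{\bar{\mathbb{X}}}(D_{2^k}, k, \beta)$: $D_{2^k} = (d_0, d_1, \ldots, d_{2^k-1})$, where each $d_i = \bar{D}_{2^k}(\omega_i + \beta)$, $k$ is the binary logarithm of the size, and $\beta \in \mathbb{F}_{2^m}$
Output: $\bar{D}_{2^k} = (\bar{d}_0, \bar{d}_1, \ldots, \bar{d}_{2^k-1})$, the coefficients of $\bar{D}_{2^k}(x)$
 1: if $k = 0$ then return $d_0$
 2: end if
 3: Call $\bar{D}^{(0)}_{2^{k-1}} \leftarrow \mathrm{IFFT}_{\bar{\mathbb{X}}}(V_0, k-1, \beta)$, where $V_0 = (d_0, \ldots, d_{2^{k-1}-1})$ and $\bar{D}^{(0)}_{2^{k-1}} = (g_0^{(0)}, \ldots, g_{2^{k-1}-1}^{(0)})$
 4: Call $\bar{D}^{(1)}_{2^{k-1}} \leftarrow \mathrm{IFFT}_{\bar{\mathbb{X}}}(V_1, k-1, v_{k-1} + \beta)$, where $V_1 = (d_{2^{k-1}}, \ldots, d_{2^k-1})$ and $\bar{D}^{(1)}_{2^{k-1}} = (g_0^{(1)}, \ldots, g_{2^{k-1}-1}^{(1)})$
 5: for $i = 0, \ldots, 2^{k-1} - 1$ do
 6:   $\bar{d}_{i+2^{k-1}} \leftarrow g_i^{(0)} + g_i^{(1)}$
 7:   $\bar{d}_i \leftarrow g_i^{(0)} + \frac{s_{k-1}(\beta)}{s_{k-1}(v_{k-1})} \bar{d}_{i+2^{k-1}}$
 8: end for
 9: return $\bar{D}_{2^k} = (\bar{d}_0, \bar{d}_1, \ldots, \bar{d}_{2^k-1})$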
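Algorithms 1 and 2 translate almost line by line into code. The sketch below is illustrative only: it assumes the example field $\mathbb{F}_{2^4}$ with modulus $x^4 + x + 1$ and the basis $v_j = x^j$ (so $\omega_i$ is the integer $i$), evaluates $s_k$ by the naive product (2) rather than by precomputed tables, and uses hypothetical helper names (`gf_mul`, `gf_inv`, `twiddle`, `eval_bar`). It compares the transform with a direct evaluation of $\sum_i \bar{d}_i \bar{X}_i(x)$.

```python
M, POLY = 4, 0b10011            # assumed example field GF(2^4), x^4 + x + 1
V = [1 << j for j in range(M)]  # assumed basis v_j = x^j, so omega_i == i

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def gf_inv(a):
    """Inverse of a nonzero element, computed as a^(2^M - 2)."""
    r = 1
    for _ in range((1 << M) - 2):
        r = gf_mul(r, a)
    return r

def s(k, x):
    """Subspace polynomial s_k evaluated at x (naive product over V_k)."""
    out = 1
    for a in range(1 << k):
        out = gf_mul(out, x ^ a)
    return out

def twiddle(k, beta):
    """The factor s_{k-1}(beta) / s_{k-1}(v_{k-1}) used by both algorithms."""
    return gf_mul(s(k - 1, beta), gf_inv(s(k - 1, V[k - 1])))

def fft(d, k, beta):
    """Algorithm 1: evaluations at omega_i + beta, i = 0..2^k - 1."""
    if k == 0:
        return [d[0]]
    half, c = 1 << (k - 1), twiddle(k, beta)
    g0 = [d[i] ^ gf_mul(c, d[i + half]) for i in range(half)]
    g1 = [g0[i] ^ d[i + half] for i in range(half)]
    return fft(g0, k - 1, beta) + fft(g1, k - 1, V[k - 1] ^ beta)

def ifft(vals, k, beta):
    """Algorithm 2: coefficients in the basis X-bar from the evaluations."""
    if k == 0:
        return [vals[0]]
    half, c = 1 << (k - 1), twiddle(k, beta)
    g0 = ifft(vals[:half], k - 1, beta)
    g1 = ifft(vals[half:], k - 1, V[k - 1] ^ beta)
    hi = [g0[i] ^ g1[i] for i in range(half)]
    lo = [g0[i] ^ gf_mul(c, hi[i]) for i in range(half)]
    return lo + hi

def eval_bar(d, x):
    """Direct evaluation of sum_i d_i * Xbar_i(x), for comparison only."""
    total = 0
    for i, coeff in enumerate(d):
        term = coeff
        for j in range(M):
            if (i >> j) & 1:
                term = gf_mul(term, gf_mul(s(j, x), gf_inv(s(j, V[j]))))
        total ^= term
    return total

coeffs = [(7 * i + 3) % 16 for i in range(16)]  # arbitrary test coefficients
values = fft(coeffs, M, 0)
```

A practical implementation would presumably precompute the factors $s_{k-1}(\beta)/s_{k-1}(v_{k-1})$ instead of recomputing them in every call; the naive version is kept here only to stay close to the listing.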

A polynomial $\bar{D}_h(x)$ of degree less than $h$ in the basis $\bar{\mathbb{X}}$ is represented as

$$\bar{D}_h(x) = \sum_{i=0}^{h-1} \bar{d}_i \bar{X}_i(x), \tag{10}$$

with each $\bar{d}_i \in \mathbb{F}_{2^m}$. Throughout this paper, $\bar{D}_h = (\bar{d}_0, \bar{d}_1, \ldots, \bar{d}_{h-1})$ is used to indicate the vector of the coefficients of $\bar{D}_h(x)$. Due to the fact that $\deg(\bar{X}_i(x)) = i$, the new basis possesses the following properties.

Corollary 1. Given a polynomial $f(x) \in \mathbb{F}_{2^m}[x]/(x^{2^m} - x)$ of degree $h$, expressed respectively in the monomial basis and in $\bar{\mathbb{X}}$ as

$$f(x) = \sum_{i=0}^{2^m-1} f_i^{(0)} x^i = \sum_{i=0}^{2^m-1} f_i^{(1)} \bar{X}_i(x),$$

the following properties hold.
1) $f_i^{(0)} = f_i^{(1)} = 0$, for $i \geq h + 1$.
2) $f_h^{(1)} = f_h^{(0)} \cdot p_h$.
3) For $0 \leq j \leq h$, $(f_j^{(1)}, f_{j+1}^{(1)}, \ldots, f_h^{(1)})$ is determined by $(f_j^{(0)}, f_{j+1}^{(0)}, \ldots, f_h^{(0)})$, and vice versa.

III. MULTIPOINT EVALUATIONS AT $V_k$

For any polynomial $f(x)$ and a set $V$, let the notation $f(V)$ denote the set of evaluation values $f(V) = \{f(a) \mid \forall a \in V\}$. [15] gave a recursive algorithm in $O(2^k \lg(2^k))$ to calculate $\bar{D}_{2^k}(V_k + \beta)$, where

$$V_k + \beta = \{a + \beta \mid a \in V_k\} \quad \text{for any } \beta \in \mathbb{F}_{2^m}.$$

In this section, we describe the algorithm [15] from another viewpoint, which helps us to develop encoding/decoding algorithms for RS codes.

The set of evaluation points can be divided into two individual subsets

$$V_k + \beta = (V_{k-1} + \beta) \cup (V_{k-1} + v_{k-1} + \beta),$$

where $(V_{k-1} + v_{k-1} + \beta)$ is the coset of $(V_{k-1} + \beta)$ obtained by adding $v_{k-1}$. Accordingly, the set of polynomial evaluations can be divided into two subsets

$$\bar{D}_{2^k}(V_k + \beta) = \bar{D}_{2^k}(V_{k-1} + \beta) \cup \bar{D}_{2^k}(V_{k-1} + v_{k-1} + \beta). \tag{11}$$

The algorithm relies on the following lemma.

Lemma 1. Given $\gamma \in \mathbb{F}_{2^m}$ and a polynomial $\bar{D}_{2^k}(x) \in \mathbb{F}_{2^m}[x]/(x^{2^m} - x)$ in the basis $\bar{\mathbb{X}}$, we have

$$\bar{D}_{2^k}(a + \gamma) = \sum_{i=0}^{2^{k-1}-1} \left( \bar{d}_i + \frac{s_{k-1}(\gamma)}{s_{k-1}(v_{k-1})} \bar{d}_{i+2^{k-1}} \right) \bar{X}_i(a + \gamma), \tag{12}$$

for each $a \in V_{k-1}$.

Based on Lemma 1, the algorithm to compute (11) is described below. By substituting $\gamma = \beta$ into (12), we obtain

$$\bar{D}_{2^k}(a + \beta) = \sum_{i=0}^{2^{k-1}-1} \left( \bar{d}_i + \frac{s_{k-1}(\beta)}{s_{k-1}(v_{k-1})} \bar{d}_{i+2^{k-1}} \right) \bar{X}_i(a + \beta) = \sum_{i=0}^{2^{k-1}-1} g_i^{(0)} \bar{X}_i(a + \beta) = \bar{D}^{(0)}_{2^{k-1}}(a + \beta), \quad \forall a \in V_{k-1}, \tag{13}$$

where each

$$g_i^{(0)} = \bar{d}_i + \frac{s_{k-1}(\beta)}{s_{k-1}(v_{k-1})} \bar{d}_{i+2^{k-1}}, \quad i = 0, 1, \ldots, 2^{k-1} - 1. \tag{14}$$

This converts $\bar{D}_{2^k}(V_{k-1} + \beta)$ into $\bar{D}^{(0)}_{2^{k-1}}(V_{k-1} + \beta)$. Furthermore, by substituting $\gamma = v_{k-1} + \beta$ into (12), we obtain (15), where each

$$g_i^{(1)} = g_i^{(0)} + \bar{d}_{i+2^{k-1}}, \quad i = 0, 1, \ldots, 2^{k-1} - 1. \tag{16}$$

This converts $\bar{D}_{2^k}(V_{k-1} + v_{k-1} + \beta)$ into $\bar{D}^{(1)}_{2^{k-1}}(V_{k-1} + v_{k-1} + \beta)$. From (13) and (15), the set of evaluation points (11) can be expressed as

$$\bar{D}_{2^k}(V_k + \beta) = \bar{D}^{(0)}_{2^{k-1}}(V_{k-1} + \beta) \cup \bar{D}^{(1)}_{2^{k-1}}(V_{k-1} + v_{k-1} + \beta). \tag{17}$$

By comparing (11) and (17), the degrees of both polynomials are reduced by one-half (the number of terms is reduced from $2^k$ to $2^{k-1}$). The complexity of obtaining both polynomials is discussed below. In (13), each coefficient $g_i^{(0)}$ takes an addition and a multiplication, except that if $s_{k-1}(\beta) = 0$, then $g_i^{(0)} = \bar{d}_i$ without any arithmetic operations. However, we do not consider this exception here, because the reduction from those exceptions is limited. As $\bar{D}^{(0)}_{2^{k-1}}(x)$ has $2^{k-1}$ coefficients, it takes a total of $2^{k-1}$ additions and $2^{k-1}$ multiplications to obtain them. In (15), calculating each coefficient $g_i^{(0)} + \bar{d}_{i+2^{k-1}}$ takes an addition, so it takes a total of $2^{k-1}$ additions to obtain the coefficients of $\bar{D}^{(1)}_{2^{k-1}}(x)$.

This procedure can be applied recursively to each set $\bar{D}^{(0)}_{2^{k-1}}(V_{k-1} + \beta)$ and $\bar{D}^{(1)}_{2^{k-1}}(V_{k-1} + v_{k-1} + \beta)$ until the size of each set is one. With the divide-and-conquer strategy, the additive complexity and the multiplicative complexity are respectively written as

$$A(h) = 2 \times A(h/2) + h, \qquad M(h) = 2 \times M(h/2) + h/2,$$

and the result is $A(h) = h \lg(h)$ and $M(h) = (h/2) \lg(h)$. Algorithm 1 depicts the details of the recursive approach, denoted as $\mathrm{FFT}_{\bar{\mathbb{X}}}(\bullet, k, \beta)$.

The inverse FFT can be obtained by backtracking the FFT given above. As opposed to (17), the inverse transform is given the coefficients of $\bar{D}^{(0)}_{2^{k-1}}(x)$ and $\bar{D}^{(1)}_{2^{k-1}}(x)$, and the objective is to find the coefficients of $\bar{D}_{2^k}(x)$. We reformulate (16) and (14) as

$$\bar{d}_{i+2^{k-1}} = g_i^{(0)} + g_i^{(1)}, \qquad \bar{d}_i = g_i^{(0)} + \frac{s_{k-1}(\beta)}{s_{k-1}(v_{k-1})} \bar{d}_{i+2^{k-1}}, \quad i = 0, 1, \ldots, 2^{k-1} - 1. \tag{18}$$

From (18), we can compute the coefficients of $\bar{D}_{2^k}(x)$. The coefficients of $\bar{D}^{(0)}_{2^{k-1}}(x)$ and $\bar{D}^{(1)}_{2^{k-1}}(x)$ can be obtained by applying the inverse transform recursively. The details are shown in Algorithm 2. Note that $\mathrm{IFFT}_{\bar{\mathbb{X}}}(\bullet, k, \beta)$ denotes the inverse transform. Algorithms 1 and 2 use the same notation so that one can follow them easily. It is clear that both algorithms have the same number of arithmetic operations. Figure 1 shows an example of the proposed algorithm and its inversion. The input polynomial is defined as $\bar{D}(x) = \sum_{i=0}^{3} \bar{d}_i \bar{X}_i(x)$, and the output is given by $D_i = \bar{D}(\omega_i + \beta)$, for $i = 0, 1, 2, 3$.

[Figure 1: two data flow diagrams, (a) the transform, mapping $\bar{d}_0, \ldots, \bar{d}_3$ to $D_0, \ldots, D_3$, and (b) the inverse transform, mapping $D_0, \ldots, D_3$ to $\bar{d}_0, \ldots, \bar{d}_3$; the edge multipliers are $s_1(\beta)/s_1(v_1)$, $s_0(\beta)/s_0(v_0)$, and $s_0(\beta + v_1)/s_0(v_0)$.]

Fig. 1. Data flow diagram of the proposed 4-point transform and its inversion.
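The closed forms $A(h) = h \lg(h)$ and $M(h) = (h/2) \lg(h)$ stated in this section follow by unrolling the two recurrences. As a quick sanity check, the sketch below (with a hypothetical helper `op_counts`, and the assumed base case of zero operations for $h = 1$) evaluates the recurrences directly.

```python
def op_counts(h):
    """Return (additions, multiplications) for an h-point transform,
    following A(h) = 2A(h/2) + h and M(h) = 2M(h/2) + h/2,
    with A(1) = M(1) = 0 assumed as the base case."""
    if h == 1:
        return (0, 0)
    adds, muls = op_counts(h // 2)
    return (2 * adds + h, 2 * muls + h // 2)

# Tabulate the counts for h = 2, 4, ..., 1024.
counts = {k: op_counts(1 << k) for k in range(1, 11)}
```

For every $k$ in this range the pair agrees with $(k \cdot 2^k,\ k \cdot 2^{k-1})$, that is, $h \lg(h)$ additions and $(h/2) \lg(h)$ multiplications.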

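$$\begin{aligned}
\bar{D}_{2^k}(a + v_{k-1} + \beta)
&= \sum_{i=0}^{2^{k-1}-1} \left( \bar{d}_i + \frac{s_{k-1}(v_{k-1} + \beta)}{s_{k-1}(v_{k-1})} \bar{d}_{i+2^{k-1}} \right) \bar{X}_i(a + v_{k-1} + \beta) \\
&= \sum_{i=0}^{2^{k-1}-1} \left( \bar{d}_i + \frac{s_{k-1}(\beta)}{s_{k-1}(v_{k-1})} \bar{d}_{i+2^{k-1}} + \bar{d}_{i+2^{k-1}} \right) \bar{X}_i(a + v_{k-1} + \beta) \\
&= \sum_{i=0}^{2^{k-1}-1} \left( g_i^{(0)} + \bar{d}_{i+2^{k-1}} \right) \bar{X}_i(a + v_{k-1} + \beta) \\
&= \sum_{i=0}^{2^{k-1}-1} g_i^{(1)} \bar{X}_i(a + v_{k-1} + \beta) = \bar{D}^{(1)}_{2^{k-1}}(a + v_{k-1} + \beta), \quad \forall a \in V_{k-1}.
\end{aligned} \tag{15}$$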

IV. POLYNOMIAL BASIS WITH MONIC POLYNOMIALS AND ITS OPERATIONS

In this section, we define an alternative version of the polynomial basis, and its algorithms to perform multiplications, formal derivatives, and divisions on the new basis. All these operations will be used in the coding algorithms. The alternative basis is defined as

$$\mathbb{X} = \{X_0(x), X_1(x), \ldots, X_{2^m-1}(x)\}$$

in $\mathbb{F}_{2^m}[x]/(x^{2^m} - x)$, where each $X_i(x)$ is given in (9). This implies that each $X_i(x)$ is a monic polynomial. For any $D_{h-1}(x) \in \mathbb{F}_{2^m}[x]/(x^{2^m} - x)$, the basis conversion between $\mathbb{X}$ and $\bar{\mathbb{X}}$ requires only $h$ multiplications/divisions:

$$D_{h-1}(x) = \sum_{i=0}^{h-1} \bar{d}_i \cdot \bar{X}_i(x) = \sum_{i=0}^{h-1} \frac{\bar{d}_i}{p_i} X_i(x). \tag{19}$$

With the linear-time basis conversion, the multipoint evaluation in $\bar{\mathbb{X}}$ (Algorithm 1) can also be applied on $\mathbb{X}$, and the complexity is unchanged.

To simplify the notation, in the rest of this paper, the polynomials are represented in $\mathbb{X}$. For $D_{2^k}(x)$ in $\mathbb{X}$, the evaluations at $V_k + \beta = \{\omega_i + \beta\}_{i=0}^{2^k-1}$ are denoted as

$$\mathrm{FFT}_{\mathbb{X}}(D_{2^k}, k, \beta) = (D_{2^k}(\omega_0 + \beta), D_{2^k}(\omega_1 + \beta), \ldots, D_{2^k}(\omega_{2^k-1} + \beta)), \tag{20}$$

and the inversion is denoted as $\mathrm{IFFT}_{\mathbb{X}}(D_{2^k}, k, \beta)$. Based on Algorithm 1, the transforms are defined as

$$\mathrm{FFT}_{\mathbb{X}}(D_{2^k}, k, \beta) = \mathrm{FFT}_{\bar{\mathbb{X}}}(D_{2^k} \otimes P_{2^k}, k, \beta), \qquad \mathrm{IFFT}_{\mathbb{X}}(D_{2^k}, k, \beta) = \mathrm{IFFT}_{\bar{\mathbb{X}}}(D_{2^k}, k, \beta) \oslash P_{2^k}, \tag{21}$$

where $P_{2^k} = (p_0, p_1, \ldots, p_{2^k-1})$. The operation $\otimes$ is the pairwise multiplication of two vectors, and the operation $\oslash$ is the pairwise division. Since the multiplication and formal derivative in $\mathbb{X}$ are similar to those given in [15], we summarize them in Appendix A for completeness. Next we present the algorithm for polynomial division that is essential for the decoding of RS codes.

A. Polynomial Division

In this subsection, we propose an $O(h \lg(h))$ polynomial division in the basis $\mathbb{X}$. The proposed algorithm is based on the Newton iteration approach that is used by the fast division algorithms in the standard basis [13] with $O(h \lg(h))$, if an $O(h \lg(h))$ FFT exists. However, since our basis is different from the standard basis, some moderate modifications are required.

As compared with the conventional fast division [13], the proposed approach has two major differences. First, the conventional fast division reverses the coefficients of the divisor $B(x)$ before performing the Newton iteration. However, in our basis $\mathbb{X}$, the polynomial reversion cannot be applied. Thus, the proposed algorithm does not reverse the polynomials, and all operations are performed on the polynomials without reversions. Second, the proposed algorithm includes some specific multiplications that are not required in the conventional approach, such as $X_y(x)$ in (28) and $s_1(x)$ in (45). The objective of these multiplications is to align the results such that the desired polynomial can be extracted properly.

Let $\mathcal{Q}(A(x), i)$ denote the quotient of dividing $A(x)$ by $s_i(x)$, where $A(x)$ is in the basis $\mathbb{X}$ and $\deg(A(x)) < 2^{i+1}$. Precisely, for a polynomial of degree $h < 2^{i+1}$,

$$A(x) = \sum_{l=0}^{h-1} a_l X_l(x) = \sum_{l=0}^{2^i-1} a_l X_l(x) + \sum_{l=2^i}^{h-1} a_l X_l(x) = \sum_{l=0}^{2^i-1} a_l X_l(x) + s_i(x) \sum_{l=2^i}^{h-1} a_l X_{l-2^i}(x). \tag{22}$$

The quotient of dividing $A(x)$ by $s_i(x)$ is then

$$\mathcal{Q}(A(x), i) = \sum_{l=0}^{h-1-2^i} a_{l+2^i} X_l(x).$$

In general, given a dividend $a(x)$ and a divisor $b(x)$, the division is to determine the quotient $Q(x)$ and the remainder $r(x)$ such that

$$a(x) = Q(x) \cdot b(x) + r(x), \tag{23}$$

where $\deg(r(x)) \leq \deg(b(x)) - 1$. Without loss of generality, we consider the case

$$\deg(a(x)) > \deg(b(x)) \geq 0. \tag{24}$$

The proposed algorithm first finds the quotient $Q(x)$, and then the remainder is calculated by

$$r(x) = a(x) - Q(x) \cdot b(x). \tag{25}$$

In the following, we focus on the algorithm to determine $Q(x)$. Let

$$y = 2^{D_\ell} - \deg(b(x)) - 1, \tag{26}$$

where

$$D_\ell = \lceil \lg(\deg(a(x)) + 1) \rceil. \tag{27}$$

To begin with, (25) is multiplied by $X_y(x)$ to obtain

$$r(x) \cdot X_y(x) = a(x) \cdot X_y(x) - Q(x) \cdot b(x) \cdot X_y(x). \tag{28}$$

To simplify the notation, let

$$R(x) = r(x) \cdot X_y(x), \qquad A(x) = a(x) \cdot X_y(x), \qquad B(x) = b(x) \cdot X_y(x). \tag{29}$$

Then we have

$$R(x) = A(x) - Q(x) \cdot B(x). \tag{30}$$

Next we present a method to determine $Q(x)$ from (30). Assume that there exists a polynomial $\Lambda(x)$ such that

$$\Lambda(x) \cdot s_1(x) \cdot B(x) = s_{D_a}(x) + H(x), \tag{31}$$

where

$$\deg(H(x)) \leq \deg(B(x)) + 1 = 2^{D_\ell}, \tag{32}$$

and

$$D_a = \lceil \lg(\deg(A(x)) + 1) \rceil. \tag{33}$$
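The coefficient-shift rule behind $\mathcal{Q}(A(x), i)$ in (22) can be checked by expanding everything into the monomial basis. The following is a minimal sketch under assumed parameters (the example field $\mathbb{F}_{2^4}$ with modulus $x^4 + x + 1$, basis $v_j = x^j$, and $i = 2$); all helper names are hypothetical, and polynomials are stored as coefficient lists with the lowest degree first.

```python
M, POLY = 4, 0b10011            # assumed example field GF(2^4), x^4 + x + 1
V = [1 << j for j in range(M)]  # assumed basis v_j = x^j

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def p_add(f, g):
    n = max(len(f), len(g))
    f, g = f + [0] * (n - len(f)), g + [0] * (n - len(g))
    return [a ^ b for a, b in zip(f, g)]

def p_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] ^= gf_mul(a, b)
    return out

def p_eval(f, x):
    acc = 0
    for c in reversed(f):       # Horner's rule
        acc = gf_mul(acc, x) ^ c
    return acc

def trim(f):
    """Drop trailing zero coefficients."""
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def s_poly(k):
    """Monomial coefficients of s_k(x), built with the recursion (6)-(7)."""
    sk = [0, 1]
    for j in range(1, k + 1):
        c = p_eval(sk, V[j - 1])
        sk = p_add(p_mul(sk, sk), [gf_mul(c, t) for t in sk])
    return sk

def to_monomial(coeffs):
    """Expand sum_l coeffs[l] * X_l(x), with X_l as in (9)."""
    out = [0]
    for l, a in enumerate(coeffs):
        x_l = [1]
        for j in range(M):
            if (l >> j) & 1:
                x_l = p_mul(x_l, s_poly(j))
        out = p_add(out, [gf_mul(a, t) for t in x_l])
    return out

i = 2
a = [3, 7, 1, 12, 9, 0, 5, 14]  # coefficients of A(x) in the basis X
low, quotient = a[:1 << i], a[1 << i:]
lhs = trim(to_monomial(a))
rhs = trim(p_add(to_monomial(low), p_mul(s_poly(i), to_monomial(quotient))))
```

Here `lhs` and `rhs` coincide, and the low part has degree below $2^i = \deg(s_i(x))$, so the upper half of the coefficient vector is indeed the quotient and no arithmetic is needed to extract it.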

The algorithm to find $\Lambda(x)$ will be addressed in Section IV-B. Before determining $Q(x)$, we first present two lemmas whose proofs are given in Appendix B.

Lemma 2.

$$D_a = D_\ell + 1. \tag{34}$$

From (31) and Lemma 2, the degree of $\Lambda(x)$ is thus

$$\deg(\Lambda(x)) = \deg(s_{D_a}(x)) - \deg(B(x)) - \deg(s_1(x)) = 2^{D_a} - (2^{D_\ell} - 1) - 2 = 2^{D_\ell} - 1. \tag{35}$$

After obtaining $\Lambda(x)$, (30) is multiplied by $\Lambda(x) \cdot s_1(x)$ to obtain

$$R(x) \cdot \Lambda(x) \cdot s_1(x) = A(x) \cdot \Lambda(x) \cdot s_1(x) - Q(x) \cdot B(x) \cdot \Lambda(x) \cdot s_1(x). \tag{36}$$

By (31), we have

$$R(x) \cdot \Lambda(x) \cdot s_1(x) = A(x) \cdot \Lambda(x) \cdot s_1(x) - Q(x) \cdot (s_{D_a}(x) + H(x)),$$

and then

$$Q(x) \cdot H(x) + R(x) \cdot \Lambda(x) \cdot s_1(x) = A(x) \cdot \Lambda(x) \cdot s_1(x) - Q(x) \cdot s_{D_a}(x). \tag{37}$$

Lemma 3. The left-hand side of (37) has degree

$$\deg(Q(x) \cdot H(x) + R(x) \cdot \Lambda(x) \cdot s_1(x)) \leq 2^{D_a} - 1. \tag{38}$$

In (37), $Q(x) \cdot s_{D_a}(x)$ is a polynomial in which the coefficients of $Q(x)$ start from $X_{2^{D_a}}(x) = s_{D_a}(x)$. By Lemma 3, the degree of the left-hand side is no more than $2^{D_a} - 1$. Thus, $A(x) \cdot \Lambda(x) \cdot s_1(x)$ has the quotient $Q(x)$ starting at degree $2^{D_a}$, and hence the quotient can be obtained by

$$Q(x) = \mathcal{Q}(A(x) \cdot \Lambda(x) \cdot s_1(x), D_a). \tag{39}$$

In (39), we have

$$\deg(A(x) \cdot \Lambda(x) \cdot s_1(x)) \leq 2^{D_a} + 2^{D_\ell} - 1 + 2 = 3 \cdot 2^{D_\ell} + 1. \tag{40}$$

Algorithm 3 Polynomial divisions in $\mathbb{X}$
Input: A dividend $a(x)$ and a divisor $b(x)$, with $\deg(a(x)) > \deg(b(x)) \geq 0$
Output: A quotient $Q(x)$ and a remainder $r(x)$, such that $a(x) = Q(x) \cdot b(x) + r(x)$.
 1: Compute $A(x) = a(x) \cdot X_y(x)$ and $B(x) = b(x) \cdot X_y(x)$, where $y$ is defined in (26).
 2: Find $\Lambda(x)$ such that (31) holds.
 3: Compute $Q(x)$ by (39).
 4: Compute $r(x)$ by (25).
 5: return $Q(x)$ and $r(x)$.

Algorithm 3 shows the steps of the division algorithm. The complexity is analyzed below. In Step 1, as $\deg(A(x)) = \deg(a(x)) + \deg(X_y(x)) < 2^{D_\ell+1}$ and $\deg(B(x)) = \deg(b(x)) + \deg(X_y(x)) = 2^{D_\ell} - 1$, the complexity is $O(2^{D_\ell+1} \lg(2^{D_\ell+1})) = O(2^{D_\ell} \lg(2^{D_\ell}))$. In Step 2, we will show in Section IV-B that $O(2^{D_\ell} \lg(2^{D_\ell}))$ suffices. In Step 3, (40) shows that the complexity is $O(2^{D_\ell} \lg(2^{D_\ell}))$. In Step 4, as the degrees of the polynomials are less than $2^{D_\ell}$, the complexity is no more than $O(2^{D_\ell} \lg(2^{D_\ell}))$. In summary, Algorithm 3 has the complexity $O(2^{D_\ell} \lg(2^{D_\ell})) = O(\deg(a(x)) \lg(\deg(a(x))))$.

B. Determining $\Lambda(x)$ given in (31)

Given $B(x) = \sum_{j=0}^{d_B} b_j X_j(x)$ with $b_{d_B} \neq 0$, this subsection presents a method to find $\Lambda(x)$ in (31). Notice that $d_B = \deg(\Lambda(x)) = 2^{D_\ell} - 1$. The proposed method can be seen as a modified version of the division with Newton iterations [13][19].

The method iteratively computes the coefficients of $\Lambda(x)$ from the highest degree to the lowest degree. For $i = 0, 1, \ldots, D_\ell$, the updated polynomial $\Lambda_i(x)$ of degree $2^i - 1$ is calculated from $\Lambda_{i-1}(x)$. The initial polynomial is

$$\Lambda_0(x) = b_{d_B}^{-1}. \tag{41}$$

Let $B_{D_\ell}(x) = B(x)$, and

$$B_i(x) = \mathcal{Q}(B_{i+1}(x), i), \quad i = 0, 1, \ldots, D_\ell - 1. \tag{42}$$

(42) can be rewritten as

$$B_{i+1}(x) = B_i(x) \cdot s_i(x) + \bar{B}_i(x), \tag{43}$$

where $\bar{B}_i(x)$, with $\deg(\bar{B}_i) \leq 2^i - 1$, is the residual. Clearly, $\deg(B_i(x)) = 2^i - 1$.

For $i = 1, 2, \ldots, D_\ell$, $\Lambda_i(x)$ is defined as

$$\Lambda_i(x) = \mathcal{Q}((s_{i-1}(x))^2 \cdot \bar{\Lambda}_i(x), i + 1), \tag{44}$$

where

$$\bar{\Lambda}_i(x) = (\Lambda_{i-1}(x))^2 \cdot B_i(x) \cdot s_1(x). \tag{45}$$

It can be verified that $\deg(\bar{\Lambda}_i(x)) = 2^{i+1} - 1$ and $\deg(\Lambda_i(x)) = 2^i - 1$ hold. The validity of $\Lambda(x) = \Lambda_{D_\ell}(x)$ is supported as follows, where all proofs are given in Appendix B.

Lemma 4. $\Lambda_i(x)$ possesses the following equality:

$$\Lambda_i(x) \cdot B_i(x) \cdot s_1(x) = s_{i+1}(x) + \bar{r}_i(x), \tag{46}$$

with $\deg(\bar{r}_i(x)) \leq 2^i$.

The following reformulation of (44), which contains no polynomial multiplications, can be used to determine the complexity of calculating (44).

Lemma 5. (44) can be rewritten as

$$\Lambda_i(x) = \Lambda_i^{(1)}(x) + \mathcal{Q}(\Lambda_i^{(1)}(x), i - 1) \cdot s_{i-1}(v_{i-1}), \tag{47}$$

where

$$\Lambda_i^{(1)}(x) = \mathcal{Q}(\bar{\Lambda}_i(x), i).$$

Algorithm 4 depicts the steps. The algorithm repeatedly performs (45) and (44) (or (47)) to obtain $\Lambda_{D_\ell}(x)$, which is the desired output $\Lambda(x)$. For the complexity, each iteration (lines 3-4) calculates (45) and (44). In (45), as $\deg(\Lambda_{i-1}(x)) =$

$2^{i-1} - 1$, $\deg(B_i(x)) = 2^i - 1$ and $\deg(s_1(x)) = 2$, the multiplications in (45) require $O(2^i \lg(2^i))$. In (44), Lemma 5 shows that the computation can be reduced to $O(2^i)$ without polynomial multiplications. Thus, each iteration takes $O(2^i \lg(2^i))$ operations, and the loop (lines 2-5) takes

$$\sum_{i=1}^{D_\ell} O(2^i \lg(2^i)) = O(2^{D_\ell} \lg(2^{D_\ell})).$$

Algorithm 4 $\Lambda(x)$ computation
Input: A polynomial $B(x)$
Output: A polynomial $\Lambda(x)$ such that (31) holds, where $\deg(\Lambda(x)) = \deg(B(x)) = 2^{D_\ell} - 1$.
 1: Let $\Lambda_0(x) = b_{d_B}^{-1}$.
 2: for $i = 1, 2, \ldots, D_\ell$ do
 3:   Compute (45).
 4:   Compute (44) (or (47), equivalently).
 5: end for
 6: return $\Lambda_{D_\ell}(x)$.

V. EXTENDED EUCLIDEAN ALGORITHM BASED ON HALF-GCD APPROACH

This section introduces the extended Euclidean algorithm that will be used in the decoding of RS codes. Given two polynomials $a(x) = r_{-1}(x)$, $b(x) = r_0(x)$, and

$$\deg(b(x)) \leq \deg(a(x)) < 2^g,$$

the Euclidean algorithm is a procedure that recursively divides $r_{k-2}(x)$ by $r_{k-1}(x)$ to get

$$r_{k-2}(x) = q_k(x) \cdot r_{k-1}(x) + r_k(x),$$

with $\deg(r_k) < \deg(r_{k-1})$. The procedure stops at $r_N(x) = 0$, and $r_{N-1}(x)$ is the greatest common divisor (gcd) of $a(x)$ and $b(x)$. An extended version, namely the extended Euclidean algorithm, calculates $r_k(x)$ with a pair of polynomials $(u_k(x), v_k(x))$ in each iteration such that

$$a(x) \cdot u_k(x) + b(x) \cdot v_k(x) = r_k(x).$$

The $(k-1)$-th step of the extended Euclidean algorithm can be expressed in matrix form as

$$\begin{bmatrix} r_{k-2}(x) \\ r_{k-1}(x) \end{bmatrix} = \begin{bmatrix} u_{k-2}(x) & v_{k-2}(x) \\ u_{k-1}(x) & v_{k-1}(x) \end{bmatrix} \cdot \begin{bmatrix} a(x) \\ b(x) \end{bmatrix}. \tag{53}$$

The next step is shown as

$$\begin{bmatrix} r_{k-1}(x) \\ r_k(x) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & -q_k(x) \end{bmatrix} \cdot \begin{bmatrix} u_{k-2}(x) & v_{k-2}(x) \\ u_{k-1}(x) & v_{k-1}(x) \end{bmatrix} \cdot \begin{bmatrix} a(x) \\ b(x) \end{bmatrix}. \tag{54}$$

The half-GCD algorithm [20][13] calculates the intermediate result of the extended Euclidean algorithm at the $s$-th step such that

$$\deg(r_s(x)) \leq 2^{g-1} - 1. \tag{55}$$

In this section, we present a half-GCD algorithm in the basis $\mathbb{X}$. This approach will be used to solve for the error locator polynomial (see (76)) in the decoding procedure of RS codes.

Algorithm 5 Half-GCD algorithm
Input: $\mathrm{HGCD}(a(x), b(x), g)$, where $a(x), b(x) \in \mathbb{F}_{2^m}[x]/(x^{2^m} - x)$ in the basis $\mathbb{X}$, and $\deg(b(x)) \leq \deg(a(x))$, $2^{g-1} \leq \deg(a(x)) \leq 2^g - 1$
Output: Two matrices $(Z, M)$ given in (58)
 1: if $\deg(b(x)) < 2^{g-1}$ then return
    $$Z = \begin{bmatrix} a(x) \\ b(x) \end{bmatrix}, \qquad M = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \tag{48}$$
 2: end if
 3: $(Z_H, M_H) \leftarrow \mathrm{HGCD}(a_H(x), b_H(x), g - 1)$
 4: Compute
    $$\begin{bmatrix} z_{M0}(x) \\ z_{M1}(x) \end{bmatrix} = Z_H \cdot s_{g-1}(x) + M_H \begin{bmatrix} a_L(x) \\ b_L(x) \end{bmatrix}. \tag{49}$$
 5: if $\deg(z_{M1}(x)) \leq 2^{g-1} - 1$ then return $\left( Z_M = \begin{bmatrix} z_{M0}(x) \\ z_{M1}(x) \end{bmatrix}, M_H \right)$.
 6: end if
 7: $z_{M0}(x)$ is divided by $z_{M1}(x)$ to get
    $$z_{M0}(x) = q_M(x) \cdot z_{M1}(x) + r_M(x) \tag{50}$$
    with $\deg(r_M(x)) < \deg(z_{M1}(x)) \leq 2^{g-1} + 2^{g-2} - 1$.
 8: $z_{M1}(x)$ and $r_M(x)$ are divided into three polynomials, denoted as
    $$z_{M1}(x) = z_{M1LL}(x) + s_{g-2}(x) z_{M1LH}(x) + s_{g-1}(x) z_{M1H}(x), \qquad r_M(x) = r_{MLL}(x) + s_{g-2}(x) r_{MLH}(x) + s_{g-1}(x) r_{MH}(x). \tag{52}$$
    Compute
    $$z_{M1M}(x) = z_{M1LH}(x) + (s_{g-2}(x) + s_{g-2}(v_{g-2})) z_{M1H}(x), \qquad r_{MM}(x) = r_{MLH}(x) + (s_{g-2}(x) + s_{g-2}(v_{g-2})) r_{MH}(x).$$
 9: $(Y_M, M_M) \leftarrow \mathrm{HGCD}(z_{M1M}(x), r_{MM}(x), g - 1)$
10: return $(Z_R, M_R)$, where
    $$M_R = M_M \begin{bmatrix} 0 & 1 \\ 1 & -q_M(x) \end{bmatrix} M_H, \qquad Z_R = Y_M \cdot s_{g-2}(x) + M_M \begin{bmatrix} z_{M1LL}(x) \\ r_{MLL}(x) \end{bmatrix}. \tag{51}$$

For polynomials in the monomial basis, there exist fast approaches in $O(M(h) \lg(h))$ operations, where $M(h)$ denotes the complexity of multiplying two polynomials of degree $h/2$ (see [13, Algorithm 11.6] or [19, Figure 8.3]). The idea comes from the observation that the quotient $q_k(x)$ in (54) is determined by the upper-degree parts of $r_{k-1}(x)$ and $r_k(x)$, and the lower-degree parts of $r_{k-1}(x)$ and $r_k(x)$ are not necessary. Fortunately, this observation is also applicable to our basis $\mathbb{X}$.

From the observation, we partition the inputs $a(x)$ (and $b(x)$) into several portions, so that the procedure can be applied

on the portions of higher degrees. For algorithms on the standard monomial basis, it is simple to make such partitions. For the basis $\mathbb{X}$, we have to choose the partition points at degrees $X_{2^{g-2}}(x)$ and $X_{2^{g-1}}(x)$. Precisely, $a(x)$ is divided into three polynomials $a_{LL}(x)$, $a_{LH}(x)$ and $a_H(x)$ at $s_{g-2}(x)$ and $s_{g-1}(x)$, respectively. The representation is given by
\[
\begin{aligned}
a(x) &= a_L(x) + s_{g-1}(x)\,a_H(x) \\
     &= a_{LL}(x) + s_{g-2}(x)\,a_{LH}(x) + s_{g-1}(x)\,a_H(x).
\end{aligned} \tag{56}
\]
Similarly, $b(x)$ is partitioned in the same manner:
\[
\begin{aligned}
b(x) &= b_L(x) + s_{g-1}(x)\,b_H(x) \\
     &= b_{LL}(x) + s_{g-2}(x)\,b_{LH}(x) + s_{g-1}(x)\,b_H(x).
\end{aligned} \tag{57}
\]
Algorithm 5 depicts the proposed algorithm HGCD$(a(x), b(x), g)$, with $\deg(b(x)) \le \deg(a(x))$ and $2^{g-1} \le \deg(a(x)) \le 2^g - 1$. The algorithm outputs two matrices
\[
Z = \begin{bmatrix} z_0(x) \\ z_1(x) \end{bmatrix}
\quad\text{and}\quad
M = \begin{bmatrix} m_{00}(x) & m_{01}(x) \\ m_{10}(x) & m_{11}(x) \end{bmatrix} \tag{58}
\]
such that
1)
\[
Z = M \cdot \begin{bmatrix} a(x) \\ b(x) \end{bmatrix}; \tag{59}
\]
2)
\[
\deg(z_0(x)) \ge 2^{g-1}, \qquad \deg(z_1(x)) \le 2^{g-1} - 1; \tag{60}
\]
3)
\[
\deg(m_{11}(x)) \le \deg(a(x)) - \deg(z_0(x)); \tag{61}
\]
4)
\[
\deg(m_{i0}(x)) \le \deg(m_{i1}(x)), \qquad \deg(m_{0i}(x)) \le \deg(m_{1i}(x)), \qquad i = 0, 1. \tag{62}
\]
Before proving the validity of Algorithm 5, we give the following lemmas, whose proofs are given in Appendix B.

Lemma 6. Algorithm 5 always outputs $Z$ and $M$ given in (58) that satisfy (59).

Lemma 7. The recursive calls in HGCD$(a(x), b(x), g)$ meet the requirements $\deg(b(x)) < \deg(a(x))$ and $2^{g-1} \le \deg(a(x)) \le 2^g - 1$.

Lemma 8. Algorithm 5 always outputs $Z$ and $M$ given in (58) that satisfy (60).

Lemma 9. Algorithm 5 always outputs $Z$ and $M$ given in (58) that satisfy (61) and (62).

By the above lemmas, we have

Theorem 2. Algorithm 5 is valid. That is, Algorithm 5 always outputs $Z$ and $M$ given in (58) that satisfy the above four conditions.

We determine the computational complexity as follows. Let $T(h)$ denote the complexity of the algorithm for polynomial degrees $h = 2^g$. In line 3 and line 9, the algorithm calls the routine recursively, twice in total, which takes $2 \cdot T(h/2)$. Line 7 is a polynomial division, which requires $O(h \lg(h))$ by using the fast division approach in Sec. IV. Line 4 and line 10 consist of polynomial additions and polynomial multiplications. As those polynomials have degrees less than $h$, the complexity is $O(h \lg(h))$ by the results given in Appendix A. In summary, the overall complexity is
\[
T(h) = 2T(h/2) + O(h \lg(h)), \quad\text{and}\quad T(h) = O(h \lg^2(h)).
\]

VI. REED-SOLOMON ENCODING ALGORITHM

This section introduces an $O(n \lg(n-k))$ encoding algorithm for $(n = 2^m, k)$ RS codes over $\mathbb{F}_{2^m}$, with $T = 2^t = n - k$ a power of two. There exist two viewpoints on the construction of RS codes, termed the polynomial evaluation approach and the generator polynomial approach. In the polynomial evaluation approach, the message is interpreted as a polynomial $u(x) \in \mathbb{F}_{2^m}[x]/(x^{2^m} - x)$ of degree less than $k$. The codeword $\mathbf{v} = (v_0, v_1, \dots, v_{n-1})$ is defined as the evaluations of $u(x)$ at $n$ distinct points.

Assume $u(x)$ is in the basis $\bar{\mathbb{X}}$, and thus $u(x) = \sum_{i=0}^{k-1} u_i \bar{X}_i(x)$. The vector of coefficients is denoted as
\[
\mathbf{u} = (u_0, u_1, \dots, u_{k-1}, \underbrace{\omega_0, \omega_0, \dots, \omega_0}_{T}), \tag{63}
\]
with $T$ $\omega_0$s in the high degree part. Then the codeword can be computed via Algorithm 1:
\[
\mathbf{v} = \mathrm{FFT}_{\bar{\mathbb{X}}}(\mathbf{u}, m, \omega_0). \tag{64}
\]
However, (64) requires $O(n \lg(n))$ operations, and the generated codeword is not systematic. In the following, another formula with complexity $O(n \lg(n-k))$ is given, and the generated codeword is systematic. The inversion of (64) is given by
\[
\mathbf{u} = \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}, m, \omega_0). \tag{65}
\]
Note that, in (65), $\mathbf{u}$ has $T$ $\omega_0$s in the high degree part (see (63)). To begin with, $\mathbf{v}$ is divided into a number of sub-vectors
\[
\mathbf{v} = (\mathbf{v}_0, \mathbf{v}_1, \dots, \mathbf{v}_{n/T-1}), \tag{66}
\]
where each $\mathbf{v}_i$ has $T$ elements defined as
\[
\mathbf{v}_i = (v_{i \cdot T}, v_{1 + i \cdot T}, \dots, v_{T-1+i \cdot T}), \qquad i = 0, 1, \dots, n/T - 1.
\]
Those sub-vectors possess the equality given in the following lemma, whose proof is given in Appendix B.

Lemma 10. The following equality holds:
\[
\boldsymbol{\omega}_0 = \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_0, t, \omega_0) + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_1, t, \omega_T) + \dots + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_{n/T-1}, t, \omega_k), \tag{67}
\]
where $+$ is the addition for vectors.

(67) is the core transform of the proposed algorithm. Assume $\mathbf{v}_0$ contains the parity symbols, and the others $\{\mathbf{v}_i\}_{i=1}^{n/T-1}$ are the message symbols. From (67), the parity is computed via
\[
\begin{aligned}
\mathbf{v}_0' &= \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_1, t, \omega_T) + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_2, t, \omega_{2T}) + \dots + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_{n/T-1}, t, \omega_k), \\
\mathbf{v}_0 &= \mathrm{FFT}_{\bar{\mathbb{X}}}(\mathbf{v}_0', t, \omega_0).
\end{aligned} \tag{68}
\]
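The parity computation (68) can be tried out on a toy code. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the field $\mathbb{F}_{2^4}$ with the polynomial $x^4 + x + 1$, the basis $v_j = x^j$ (so that $\omega_i$ is simply the integer $i$), a brute-force evaluation of $s_k(x)$, and a butterfly recursion written from the splitting used in (83)-(85) and the proof of Lemma 10 rather than copied from Algorithms 1 and 2. All function names are hypothetical.

```python
M = 4              # toy field GF(2^4); an assumption made for illustration
POLY = 0b10011     # x^4 + x + 1, assumed field polynomial

def gf_mul(a, b):
    """Multiply two field elements (shift-and-add with modular reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> M:
            a ^= POLY
        b >>= 1
    return r

def gf_inv(a):
    """Inverse as a^(2^M - 2); a must be non-zero."""
    r = 1
    for _ in range(2**M - 2):
        r = gf_mul(r, a)
    return r

def s(k, x):
    """Subspace polynomial s_k(x) = prod over a in V_k of (x + a), brute force."""
    r = 1
    for a in range(1 << k):
        r = gf_mul(r, x ^ a)
    return r

def ratio(k, beta):
    """s_{k-1}(beta) / s_{k-1}(v_{k-1}), the twiddle factor of one butterfly level."""
    return gf_mul(s(k - 1, beta), gf_inv(s(k - 1, 1 << (k - 1))))

def fft(d, k, beta):
    """Evaluate sum_i d[i] * Xbar_i(x) at the 2^k points omega_c + beta."""
    if k == 0:
        return list(d)
    half, r = 1 << (k - 1), ratio(k, beta)
    g0 = [d[i] ^ gf_mul(r, d[i + half]) for i in range(half)]
    g1 = [g0[i] ^ d[i + half] for i in range(half)]
    return fft(g0, k - 1, beta) + fft(g1, k - 1, beta ^ half)

def ifft(v, k, beta):
    """Inverse of fft: recover the coefficients from the 2^k evaluations."""
    if k == 0:
        return list(v)
    half, r = 1 << (k - 1), ratio(k, beta)
    g0 = ifft(v[:half], k - 1, beta)
    g1 = ifft(v[half:], k - 1, beta ^ half)
    hi = [x ^ y for x, y in zip(g0, g1)]          # pointwise additions only
    lo = [g0[i] ^ gf_mul(r, hi[i]) for i in range(half)]
    return lo + hi

def encode(message, t):
    """Systematic encoding following (68): parity block first, then the message."""
    T = 1 << t
    acc = [0] * T
    for i in range(len(message) // T):
        block = message[i * T:(i + 1) * T]         # this is v_{i+1}
        acc = [x ^ y for x, y in zip(acc, ifft(block, t, (i + 1) * T))]
    return fft(acc, t, 0) + list(message)
```

For a $(16, 12)$ code, `encode(list(range(1, 13)), 2)` returns 16 symbols whose last 12 entries are the message itself, and applying `ifft(codeword, 4, 0)` yields coefficients whose top four entries are zero, which is the content of (63) with $\omega_0 = 0$.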

This algorithm requires one $T$-point FFT and $(n/T - 1)$ $T$-point IFFTs. Hence, the complexity of the encoding algorithm is
\[
O(T \lg(T)) + (n/T - 1)\,O(T \lg(T)) = O(n \lg(n-k)).
\]

VII. REED-SOLOMON DECODING ALGORITHM

This section shows a decoding algorithm for $(n = 2^m, k)$ RS codes over $\mathbb{F}_{2^m}$, where the codeword $\mathbf{v} = (v_0, \dots, v_{n-1}) = (u(\omega_0), \dots, u(\omega_{n-1}))$ is generated as in Section VI. The proposed algorithm follows the syndrome-based decoding process. Let $\mathbf{r} = (r_0, r_1, \dots, r_{n-1}) = \mathbf{v} + \mathbf{e}$ denote the received vector with error pattern $\mathbf{e} = (e_0, e_1, \dots, e_{n-1})$. Hence,
\[
r_i = u(\omega_i) + e_i. \tag{69}
\]
If $e_i \ne 0$, $r_i$ is an erroneous symbol. Suppose $\mathbf{e}$ contains $v \le (n-k)/2 = T/2$ non-zero symbols. Let
\[
E = \{\omega_i \in \mathbb{F}_{2^m} \mid e_i \ne 0\} \tag{70}
\]
denote the set of $\omega_i$ corresponding to the locations of errors. Then the error-locator polynomial is defined as
\[
\lambda(x) = \prod_{\omega_i \in E} (x - \omega_i). \tag{71}
\]
Let $\bar{r}(x)$ denote a polynomial of degree less than $2^m$, with $\bar{r}(\omega_i) = r_i$, $\forall \omega_i \in \mathbb{F}_{2^m}$. It is clear that
\[
\bar{r}(\omega_i) \cdot \lambda(\omega_i) =
\begin{cases}
0 & \text{if } \omega_i \in E; \\
r_i \cdot \lambda(\omega_i) & \text{if } \omega_i \in \mathbb{F}_{2^m} \setminus E.
\end{cases}
\]
The above formula leads to
\[
\begin{aligned}
& u(\omega_i) \cdot \lambda(\omega_i) = \bar{r}(\omega_i) \cdot \lambda(\omega_i) \\
\Rightarrow\; & u(x) \cdot \lambda(x) \equiv \bar{r}(x) \cdot \lambda(x) \pmod{x - \omega_i} \quad \forall \omega_i \in \mathbb{F}_{2^m}.
\end{aligned} \tag{72}
\]
Due to
\[
\prod_{i=0}^{2^m - 1} (x - \omega_i) = x^{2^m} - x = s_m(x),
\]
(72) implies that
\[
\begin{aligned}
& u(x) \cdot \lambda(x) \equiv \bar{r}(x) \cdot \lambda(x) \pmod{s_m(x)} \\
\Rightarrow\; & u(x) \cdot \lambda(x) = \bar{r}(x) \cdot \lambda(x) + q(x) \cdot s_m(x),
\end{aligned} \tag{73}
\]
with $\deg(q(x)) < v \le T/2$. Given $\bar{r}(x)$, (73) is the key equation [21][9] to find $\lambda(x)$, by applying the Euclidean algorithm on $s_m(x)$ and $\bar{r}(x)$. However, though (73) is similar to the key equation of syndrome decoding, $\bar{r}(x)$ is not the syndrome polynomial. To obtain the syndrome decoding, the new key formula is the quotients of dividing $\lambda(x)$ and $s_m(x)$ by $X_k(x)$. In this case, $\bar{r}(x)$ is divided into two parts
\[
\bar{r}(x) = \bar{r}_0(x) + X_k(x)\,s(x), \tag{74}
\]
where $\bar{r}_0(x)$ denotes the residual. Notably, if no error occurs, $\bar{r}(x) = u(x)$ has degree less than $k$, and hence $s(x) = 0$. Thus we can take $s(x)$ as the syndrome polynomial. For $s_m(x)$, the polynomial is recursively decomposed by (7) to obtain (75):
\[
\begin{aligned}
s_m(x) &= s_{m-1}(x)\bigl(s_{m-1}(v_{m-1}) + s_{m-1}(x)\bigr) \\
&= s_{m-1}(x)\Bigl(s_{m-1}(v_{m-1}) + s_{m-2}(x)\bigl(s_{m-2}(v_{m-2}) + s_{m-2}(x)\bigr)\Bigr) = \cdots \\
&= s_{m-1}(x)\,s_{m-1}(v_{m-1}) + s_{m-1}(x)\,s_{m-2}(x)\,s_{m-2}(v_{m-2}) + \dots \\
&\quad + s_{m-1}(x)\,s_{m-2}(x) \cdots s_t(x)\,s_t(v_t) + s_{m-1}(x)\,s_{m-2}(x) \cdots s_t(x)\,s_t(x) \\
&= X_{2^m - 2^{m-1}}(x)\,s_{m-1}(v_{m-1}) + X_{2^m - 2^{m-2}}(x)\,s_{m-2}(v_{m-2}) + \cdots \\
&\quad + X_{2^m - 2^t}(x)\,s_t(v_t) + X_{2^m - 2^t}(x)\,s_t(x).
\end{aligned} \tag{75}
\]
In (75), the degree of each term is less than $k = n - T = 2^m - 2^t$, except for the last term $X_{2^m - 2^t}(x)\,s_t(x)$. Thus, the quotient of dividing $s_m(x)$ by $X_k(x)$ is $s_t(x)$. Based on the above results, the new key formula is
\[
z_0(x) = s(x)\,\lambda(x) + q(x)\,s_t(x), \tag{76}
\]
with $\deg(z_0(x)) \le T/2$. (76) is the key equation to find the error locator polynomial.

To find $\lambda(x)$, the extended Euclidean algorithm is applied on $s_t(x)$ and $s(x)$. The extended Euclidean algorithm stops when the remainder has degree less than $T/2$. After obtaining $\lambda(x)$, the next step is to find the locations of errors $E$ defined in (70), that is, the set of roots of $\lambda(x)$.

After obtaining $E$, the final step is to calculate the error values. The formal derivative of (73) is
\[
u'(x) \cdot \lambda(x) + u(x) \cdot \lambda'(x) = \bar{r}'(x) \cdot \lambda(x) + \bar{r}(x) \cdot \lambda'(x) + q'(x) \cdot s_m(x) + q(x). \tag{77}
\]
By substituting $\omega_i \in E$ into (77), the error value is given by
\[
\begin{aligned}
& u(\omega_i) \cdot \lambda'(\omega_i) = \bar{r}(\omega_i) \cdot \lambda'(\omega_i) + q(\omega_i) \\
\Rightarrow\; & u(\omega_i) - \bar{r}(\omega_i) = \frac{q(\omega_i)}{\lambda'(\omega_i)}, \quad \forall \omega_i \in E.
\end{aligned} \tag{78}
\]
Notice that (78) uses $q(x)$ to compute the error values, rather than $z_0(x)$ as used in Forney's formula.

In summary, the decoding algorithm consists of four steps:
1) Calculate the syndrome polynomial $s(x)$.
2) Determine the error-locator polynomial $\lambda(x)$ from (76) by the extended Euclidean algorithm.
3) Find the error locations $E$.
4) Calculate the error values via (78).

The details of each step are described below. In the first step, $s(x)$ is the high degree part of the result of applying the IFFT to the received codeword $\mathbf{r}$. However, since only the high degree part is required, we follow the same idea as the encoding formula (67). In particular, the received codeword is divided into several individual parts $\mathbf{r} = (\mathbf{r}_0, \mathbf{r}_1, \dots, \mathbf{r}_{n/T-1})$, where each $\mathbf{r}_i$ has $T = 2^t = n - k$ elements. Then the syndrome polynomial is calculated by
\[
\mathbf{s} = \mathrm{IFFT}_{\mathbb{X}}(\mathbf{r}_0, t, \omega_0) + \mathrm{IFFT}_{\mathbb{X}}(\mathbf{r}_1, t, \omega_T) + \dots + \mathrm{IFFT}_{\mathbb{X}}(\mathbf{r}_{n/T-1}, t, \omega_k).
\]
In the second step, the fast Euclidean algorithm (Algorithm 5) is applied on $s_t(x)$ and $s(x)$. Before calling the Euclidean algorithm, we first carry out one division step, dividing $s_t(x)$ by $s(x)$, resulting in
\[
s_t(x) = q_t(x) \cdot s(x) + r_t(x).
\]
Then we call Algorithm 5 with inputs $s(x)$ and $r_t(x)$ to obtain
\[
\left( \begin{bmatrix} z_0(x) \\ z_1(x) \end{bmatrix},
\begin{bmatrix} u_0(x) & v_0(x) \\ u_1(x) & v_1(x) \end{bmatrix} \right)
\leftarrow \mathrm{HGCD}(s(x), r_t(x), T).
\]
Then we have
\[
\begin{aligned}
& z_1(x) = u_1(x)\,s(x) + v_1(x)\,r_t(x) \\
\Rightarrow\; & z_1(x) = u_1(x)\,s(x) + v_1(x)\bigl(s_t(x) - q_t(x) \cdot s(x)\bigr) \\
\Rightarrow\; & z_1(x) = v_1(x)\,s_t(x) + \bigl(u_1(x) - v_1(x)\,q_t(x)\bigr)\,s(x),
\end{aligned} \tag{79}
\]
and thus the error locator polynomial is given by
\[
\lambda(x) = u_1(x) - v_1(x)\,q_t(x).
\]
In the third step, the roots of $\lambda(x)$ can be searched via FFTs. The transform
\[
\mathrm{FFT}_{\mathbb{X}}(\lambda, T, \omega_{i \cdot T}) \tag{80}
\]
evaluates $\lambda(x)$ at $V_t + \omega_{i \cdot T}$. If the result vector contains zeros, then $\lambda(x)$ has roots at the corresponding points. (80) is performed for $i = 0, 1, \dots, n/T - 1$ to search the roots in $\mathbb{F}_{2^m}$. Notably, if $\deg(\lambda(x))$ is larger than the number of roots found, the decoding procedure shall be terminated. This situation occurs when the number of errors exceeds $T/2$.

In the final step, we compute $\mathrm{FFT}_{\mathbb{X}}(q, T, \omega_{i \cdot T})$ and $\mathrm{FFT}_{\mathbb{X}}(\lambda', T, \omega_{i \cdot T})$ (computing $\lambda'(x)$ is described in Appendix A), for $i = 0, 1, \dots, n/T - 1$. Then the error values are calculated via (78).

To determine the computational complexity, the first step requires $n/T$ $T$-point IFFTs, so its complexity is $n/T \cdot O(T \lg(T)) = O(n \lg(n-k))$. The second step takes $O((n-k) \lg^2(n-k))$ operations. The third step requires $n/T$ $T$-point FFTs, and thus its complexity is $O(n \lg(n-k))$. The final step requires a formal derivative of a polynomial of degree $T$, and at most $2(n/T)$ $T$-point FFTs. Thus, its complexity is $O(n \lg(n-k))$. In summary, the proposed decoding algorithm requires $O(n \lg(n-k) + (n-k) \lg^2(n-k))$ operations.

VIII. CONCLUDING REMARKS

In the simulations, we implemented the algorithm in C and compiled it with the 64-bit GCC compiler on an Intel Xeon X5650 and Windows 7 platform. For $(n, k) = (2^{16}, 2^{15})$ RS codes over $\mathbb{F}_{2^{16}}$, the program took about $2.22 \times 10^{-3}$ seconds to produce a codeword. We tested a codeword with $(n-k)/2$ errors, and the decoding took about 0.401 seconds. As a comparison, we also ran the standard RS decoding algorithm [22], which took about 22.014 seconds to decode a codeword. Thus, the proposed decoding is around 50 times faster than the traditional approach under the parameter configuration described above. In our simulations, the proposed RS algorithm is suitable for long RS codes.

In this paper, we developed fast decoding algorithms for $(n = 2^m, k)$ systematic Reed-Solomon (RS) codes over fields $\mathbb{F}_{2^m}$, $m \in \mathbb{Z}^+$. The proposed algorithms are built on a new basis $\mathbb{X}$ [15]. We reformulated the formulas of the syndrome-based decoding algorithm such that the FFTs for the new basis can be applied. Further, a fast polynomial division algorithm is proposed. We made some modifications such that the Newton iteration can be applied to the new basis. A fast Euclidean algorithm was also given in this paper. Combining these algorithms, a fast RS decoding algorithm is proposed that achieves the complexity $O(n \lg(n-k) + (n-k) \lg^2(n-k))$. By letting $k/n$ be a constant, the complexity can be written as $O(n \lg^2(n))$, which improves upon the best currently available decoding complexity $O(n \lg^2(n) \lg\lg(n))$ of [9]. Although Justesen [8] had given an algorithm with the same complexity in 1976, it does not cover the field $\mathbb{F}_{2^m}$, which can be regarded as the most important case in real applications. In the following we address some potential future work:
1) To remove the constraint that $(n-k)$ is a power of two in the encoding/decoding algorithms. This will increase the number of values of $n$ and $k$ that can be selected.
2) To generalize the algorithm to handle both errors and erasures.
3) To reduce the leading constant of the FFT approach. This will make the proposed algorithm more competitive for short codes.

REFERENCES

[1] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields,” Journal of the Society for Industrial and Applied Mathematics, vol. 8, no. 2, pp. 300–304, 1960.
[2] J. K. Wolf, “Adding two information symbols to certain nonbinary BCH codes and some applications,” Bell System Technical Journal, vol. 48, no. 7, pp. 2405–2424, Sept 1969.
[3] K. V. Rashmi, N. Shah, and P. Kumar, “Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction,” IEEE Trans. Inf. Theory, vol. 57, no. 8, pp. 5227–5239, Aug 2011.
[4] S.-J. Lin, W.-H. Chung, Y. S. Han, and T. Y. Al-Naffouri, “A unified form of exact-MSR codes via product-matrix frameworks,” IEEE Trans. Inf. Theory, vol. 61, no. 2, pp. 873–886, Feb 2015.
[5] C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, and S. Yekhanin, “Erasure coding in Windows Azure storage,” in Presented as part of the 2012 USENIX Annual Technical Conference (USENIX ATC 12). Boston, MA: USENIX, 2012, pp. 15–26.
[6] N. Chen and Z. Yan, “Complexity analysis of Reed-Solomon decoding over GF($2^m$) without using syndromes,” EURASIP J. Wirel. Commun. Netw., vol. 2008, pp. 16:1–16:11, Jan. 2008.
[7] T. Truong, P. Chen, L. Wang, and T. Cheng, “Fast transform for decoding both errors and erasures of Reed-Solomon codes over GF($2^m$) for $8 \le m \le 10$,” IEEE

Trans. Veh. Commun., vol. 54, no. 2, pp. 181–186, Feb 2006.
[8] J. Justesen, “On the complexity of decoding Reed-Solomon codes (corresp.),” IEEE Trans. Inf. Theory, vol. 22, no. 2, pp. 237–238, Mar 1976.
[9] S. Gao, “A new algorithm for decoding Reed-Solomon codes,” in Communications, Information and Network Security. Kluwer, 2002, pp. 55–68.
[10] V. Y. Pan, “Faster solution of the key equation for decoding BCH error-correcting codes,” in Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing, ser. STOC ’97. New York, NY, USA: ACM, 1997, pp. 168–175.
[11] A. Schönhage, “Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2,” Acta Informatica, vol. 7, no. 4, pp. 395–398, 1977. [Online]. Available: http://dx.doi.org/10.1007/BF00289470
[12] D. G. Cantor and E. Kaltofen, “On fast multiplication of polynomials over arbitrary algebras,” Acta Informatica, vol. 28, no. 7, pp. 693–701, 1991.
[13] J. V. Z. Gathen and J. Gerhard, Modern Computer Algebra, 3rd ed. New York, NY, USA: Cambridge University Press, 2013.
[14] S. Gao and T. Mateer, “Additive fast Fourier transforms over finite fields,” IEEE Trans. Inf. Theory, vol. 56, no. 12, pp. 6265–6272, Dec 2010.
[15] S. J. Lin, W. H. Chung, and Y. S. Han, “Novel polynomial basis and its application to Reed-Solomon erasure codes,” in Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, Oct 2014, pp. 316–325.
[16] O. Ore, “On a special class of polynomials,” Trans. Amer. Math. Soc., vol. 35, no. 11, pp. 559–584, Nov 1933.
[17] D. G. Cantor, “On arithmetical algorithms over finite fields,” Journal of Combinatorial Theory, Series A, vol. 50, no. 2, pp. 285–300, 1989.
[18] J. von zur Gathen and J. Gerhard, “Arithmetic and factorization of polynomial over $\mathbb{F}_2$,” in Proceedings of the 1996 International Symposium on Symbolic and Algebraic Computation, Zurich, Switzerland, 1996, pp. 1–9.
[19] T. Mateer, “Fast Fourier transform algorithms with applications,” Ph.D. dissertation, Clemson, SC, USA, 2008.
[20] R. T. Moenck, “Fast computation of GCDs,” in ACM Symposium on Theory of Computing (STOC), 1973, pp. 142–151.
[21] A. Shiozaki, “Decoding of redundant residue polynomial codes using Euclid’s algorithm,” IEEE Trans. Inf. Theory, vol. 34, no. 5, pp. 1351–1354, Sep 1988.
[22] S. Rockliff. (1989) Reed-Solomon (RS) codes. [Online]. Available: http://www.eccpage.com/

APPENDIX A
POLYNOMIAL MULTIPLICATION AND FORMAL DERIVATIVE ON NEW BASIS

[15] showed the polynomial multiplication and formal derivative in $\bar{\mathbb{X}}$. We follow a similar procedure to show the corresponding operations in $\mathbb{X}$.

A. Multiplication

To multiply two polynomials, there exists a well-known fast approach based on FFT techniques. This approach can also be applied on the basis $\mathbb{X}$ over finite fields $\mathbb{F}_{2^m}$. Let $a(x) = \sum_{i=0}^{h-1} a_i \cdot X_i(x)$ and $b(x) = \sum_{i=0}^{h-1} b_i \cdot X_i(x)$ denote the two polynomials in $\mathbb{X}$. Their product $a(x) \cdot b(x) \pmod{s_i(x)}$ can be computed as
\[
\mathrm{IFFT}_{\mathbb{X}}\bigl(\mathrm{FFT}_{\mathbb{X}}(\mathbf{a}, \beta) \otimes \mathrm{FFT}_{\mathbb{X}}(\mathbf{b}, \beta), \beta\bigr),
\]
where $\mathbf{a} = (a_0, a_1, \dots, a_{h-1}, 0, \dots, 0)$ is a $2^i$-point vector representing the coefficients of $a(x)$ up to degree $2^i - 1$, and $\mathbf{b}$ is defined accordingly. The operation $\otimes$ performs pairwise multiplication on two vectors. This requires one $2^i$-point IFFT, two $2^i$-point FFTs and $2^i$ multiplications, and thus the complexity is $O(2^i \lg(2^i))$.

B. Formal derivative

For a polynomial $D_{2^k}(x)$ in $\mathbb{X}$, we have
\[
\begin{aligned}
D_{2^k}(x) &= \sum_{i=0}^{2^k - 1} d_i X_i(x) \\
&= \sum_{i=0}^{2^{k-1} - 1} d_i X_i(x) + \sum_{i=2^{k-1}}^{2^k - 1} d_i X_i(x) \\
&= \sum_{i=0}^{2^{k-1} - 1} d_i X_i(x) + s_{k-1}(x) \sum_{i=0}^{2^{k-1} - 1} d_{i + 2^{k-1}} X_i(x) \\
&= D^{(0)}_{2^{k-1}}(x) + s_{k-1}(x)\,D^{(1)}_{2^{k-1}}(x).
\end{aligned} \tag{81}
\]
The formal derivative of $D_{2^k}(x)$ is given by
\[
D'_{2^k}(x) = [D^{(0)}_{2^{k-1}}]'(x) + s'_{k-1}(x)\,D^{(1)}_{2^{k-1}}(x) + s_{k-1}(x)\,[D^{(1)}_{2^{k-1}}]'(x). \tag{82}
\]
From Theorem 1, $s'_{k-1}(x)$ is a constant. $[D^{(0)}_{2^{k-1}}]'(x)$ and $s_{k-1}(x)\,[D^{(1)}_{2^{k-1}}]'(x)$ can be computed recursively. Let $h = 2^k$. The recursive form of the complexity is $T(h) = 2 \cdot T(h/2) + O(h)$, and then $T(h) = O(h \lg(h))$.
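The two divide-and-conquer recurrences used in this paper, $T(h) = 2T(h/2) + O(h)$ for the formal derivative and $T(h) = 2T(h/2) + O(h \lg(h))$ for the half-GCD routine, can be checked numerically. The sketch below is only an illustration: it assumes unit constants, $T(1) = 0$, and $h$ a power of two, and the helper names `recurrence` and `lg` are hypothetical.

```python
def lg(h):
    """Base-2 logarithm of a power of two, computed with integers only."""
    return h.bit_length() - 1

def recurrence(h, work):
    """Exact value of T(h) = 2*T(h/2) + work(h) with T(1) = 0."""
    if h == 1:
        return 0
    return 2 * recurrence(h // 2, work) + work(h)

# Per-level work h gives h*lg(h); per-level work h*lg(h) gives h*lg(h)*(lg(h)+1)/2.
linear = recurrence(1 << 10, lambda x: x)
loglinear = recurrence(1 << 10, lambda x: x * lg(x))
```

With unit constants the closed forms are $h \lg(h)$ and $h \lg(h)(\lg(h) + 1)/2$, which match the stated orders $O(h \lg(h))$ and $O(h \lg^2(h))$.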

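APPENDIX B
PROOF OF LEMMAS

A. Proof of Lemma 1

Proof. From the definition, $\bar{D}_{2^k}(x)$ can be reformulated as
\[
\begin{aligned}
\bar{D}_{2^k}(x) &= \sum_{i=0}^{2^k - 1} \bar{d}_i \bar{X}_i(x) \\
&= \sum_{i=0}^{2^{k-1} - 1} \bar{d}_i \bar{X}_i(x) + \sum_{i=2^{k-1}}^{2^k - 1} \bar{d}_i \bar{X}_i(x) \\
&= \sum_{i=0}^{2^{k-1} - 1} \bar{d}_i \bar{X}_i(x) + \frac{s_{k-1}(x)}{s_{k-1}(v_{k-1})} \sum_{i=0}^{2^{k-1} - 1} \bar{d}_{i + 2^{k-1}} \bar{X}_i(x) \\
&= \sum_{i=0}^{2^{k-1} - 1} \Bigl(\bar{d}_i + \frac{s_{k-1}(x)}{s_{k-1}(v_{k-1})}\,\bar{d}_{i + 2^{k-1}}\Bigr) \bar{X}_i(x).
\end{aligned} \tag{83}
\]
From Theorem 1, given $\gamma \in \mathbb{F}_{2^m}$, we have
\[
s_{k-1}(a + \gamma) = s_{k-1}(a) + s_{k-1}(\gamma) = s_{k-1}(\gamma) \quad \forall a \in V_{k-1}. \tag{84}
\]
From (83) and (84), we have
\[
\begin{aligned}
\bar{D}_{2^k}(a + \gamma) &= \sum_{i=0}^{2^{k-1} - 1} \Bigl(\bar{d}_i + \frac{s_{k-1}(a + \gamma)}{s_{k-1}(v_{k-1})}\,\bar{d}_{i + 2^{k-1}}\Bigr) \bar{X}_i(a + \gamma) \\
&= \sum_{i=0}^{2^{k-1} - 1} \Bigl(\bar{d}_i + \frac{s_{k-1}(\gamma)}{s_{k-1}(v_{k-1})}\,\bar{d}_{i + 2^{k-1}}\Bigr) \bar{X}_i(a + \gamma),
\end{aligned} \tag{85}
\]
for each $a \in V_{k-1}$. This completes the proof.

B. Proof of Lemma 2

Proof. From (33), we have
\[
\begin{aligned}
D_a &= \lceil \lg(\deg(a(x)) + y + 1) \rceil \\
&= \lceil \lg(\deg(a(x)) + 2^{D_\ell} - \deg(b(x))) \rceil \\
&\ge \lceil \lg(2^{D_\ell} + 1) \rceil \quad (\text{From (24)}) \\
&= D_\ell + 1.
\end{aligned} \tag{86}
\]
Moreover,
\[
\begin{aligned}
D_a &= \lceil \lg(\deg(a(x)) + 2^{D_\ell} - \deg(b(x))) \rceil \\
&\le \lceil \lg(\deg(a(x)) + 2^{D_\ell}) \rceil \\
&\le \lceil \lg(2^{D_\ell} + 2^{D_\ell}) \rceil = D_\ell + 1.
\end{aligned} \tag{87}
\]
(86) and (87) together give $D_a = D_\ell + 1$. This completes the proof.

C. Proof of Lemma 3

Proof. (38) is a summation of two terms. For the first term, we have
\[
\begin{aligned}
\deg(Q(x) \cdot H(x)) &\le \deg(Q(x)) + \deg(B(x)) + 1 \quad (\text{From (32)}) \\
&= \deg(A(x)) + 1 \quad (\text{From (30)}) \\
&= \deg(a(x)) + y + 1 \quad (\text{From (29)}) \\
&= \deg(a(x)) + 2^{D_\ell} - \deg(b(x)) \quad (\text{From (26)}) \\
&\le \deg(a(x)) + 2^{D_\ell} \\
&\le 2^{D_\ell} - 1 + 2^{D_\ell} \quad (\text{From (27)}) \\
&= 2^{D_a} - 1.
\end{aligned} \tag{88}
\]
For the second term, we have
\[
\begin{aligned}
\deg(R(x) \cdot \Lambda(x) \cdot s_1(x)) &= \deg(r(x)) + y + 2^{D_\ell} + 1 \\
&= \deg(r(x)) + (2^{D_\ell} - \deg(b(x)) - 1) + 2^{D_\ell} + 1 \quad (\text{From (26)}) \\
&= 2^{D_a} + \deg(r(x)) - \deg(b(x)) \\
&\le 2^{D_a} - 1.
\end{aligned} \tag{89}
\]
This completes the proof.

D. Proof of Lemma 4

Proof. The proof is by mathematical induction. For the base case $i = 0$, (41) shows that the following holds:
\[
\Lambda_0(x) \cdot B_0(x) \cdot s_1(x) = s_1(x),
\]
and $\bar{r}_0(x) = 0$.
Assume (46) holds at $i = j$. That is,
\[
\Lambda_j(x) \cdot B_j(x) \cdot s_1(x) = s_{j+1}(x) + \bar{r}_j(x), \tag{90}
\]
which is multiplied by $(s_j(x))^2$ to get
\[
(s_j(x))^2 \cdot \Lambda_j(x) \cdot B_j(x) \cdot s_1(x) = (s_j(x))^2 \cdot s_{j+1}(x) + (s_j(x))^2 \cdot \bar{r}_j(x). \tag{91}
\]
From (7), we have
\[
\begin{aligned}
& (s_j(x))^2 \cdot s_{j+1}(x) + (s_j(x))^2 \cdot \bar{r}_j(x) \\
&= (s_{j+1}(x))^2 + s_j(v_j)\,s_j(x)\,s_{j+1}(x) + (s_j(x))^2 \cdot \bar{r}_j(x) \\
&= s_{j+2}(x) + s_{j+1}(v_{j+1})\,s_{j+1}(x) + s_j(v_j)\,s_j(x)\,s_{j+1}(x) + (s_j(x))^2 \cdot \bar{r}_j(x).
\end{aligned} \tag{92}
\]
By (92) and (43), (91) can be rewritten as
\[
s_j(x) \cdot \Lambda_j(x) \cdot B_{j+1}(x) \cdot s_1(x) = s_{j+2}(x) + \hat{r}_j(x), \tag{93}
\]
where
\[
\begin{aligned}
\hat{r}_j(x) &= s_{j+1}(v_{j+1})\,s_{j+1}(x) + s_j(v_j)\,s_j(x)\,s_{j+1}(x) \\
&\quad + (s_j(x))^2 \cdot \bar{r}_j(x) + s_j(x) \cdot \Lambda_j(x) \cdot \bar{B}_j(x) \cdot s_1(x).
\end{aligned} \tag{94}
\]
The degree of each term of $\hat{r}_j(x)$ is
\[
\begin{aligned}
\deg(s_{j+1}(v_{j+1})\,s_{j+1}(x)) &= 2^{j+1}, \\
\deg(s_j(v_j)\,s_j(x)\,s_{j+1}(x)) &= 2^j + 2^{j+1},
\end{aligned}
\]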

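\[
\begin{aligned}
\deg((s_j(x))^2 \cdot \bar{r}_j(x)) &\le 2^{j+1} + 2^j, \\
\deg(s_j(x) \cdot \Lambda_j(x) \cdot \bar{B}_j(x) \cdot s_1(x)) &\le 2^j + (2^j - 1) + (2^j - 1) + 2.
\end{aligned}
\]
Thus, we have $\deg(\hat{r}_j(x)) \le 2^{j+1} + 2^j$.
When $i = j + 1$, from (45) and (44), we have
\[
\Lambda_{j+1}(x) = \mathcal{Q}\bigl((s_j(x) \cdot \Lambda_j(x))^2 \cdot B_{j+1}(x) \cdot s_1(x),\, j + 2\bigr).
\]
The above equation can be rewritten as
\[
\Lambda_{j+1}(x) \cdot s_{j+2}(x) + \check{r}_{j+2}(x) = (s_j(x) \cdot \Lambda_j(x))^2 \cdot B_{j+1}(x) \cdot s_1(x), \tag{95}
\]
where $\deg(\check{r}_{j+2}(x)) \le 2^{j+2} - 1$. We then multiply (95) by $B_{j+1}(x) \cdot s_1(x)$ to obtain
\[
\begin{aligned}
& \Lambda_{j+1}(x) \cdot B_{j+1}(x) \cdot s_1(x) \cdot s_{j+2}(x) + \check{r}_{j+2}(x) \cdot B_{j+1}(x) \cdot s_1(x) \\
&= (s_j(x) \cdot \Lambda_j(x) \cdot B_{j+1}(x) \cdot s_1(x))^2 \\
&= (s_{j+2}(x) + \hat{r}_j(x))^2 \quad (\text{By (93)}) \\
&= (s_{j+2}(x))^2 + (\hat{r}_j(x))^2.
\end{aligned} \tag{96}
\]
(96) is then divided by $s_{j+2}(x)$ to get
\[
\Lambda_{j+1}(x) \cdot B_{j+1}(x) \cdot s_1(x) = s_{j+2}(x) + \bar{r}_{j+1}(x), \tag{97}
\]
where
\[
\bar{r}_{j+1}(x) = \frac{(\hat{r}_j(x))^2 - \check{r}_{j+2}(x) \cdot B_{j+1}(x) \cdot s_1(x)}{s_{j+2}(x)}.
\]
In (97), the degrees of the terms of $\bar{r}_{j+1}(x)$ are as follows:
\[
\begin{aligned}
\deg((\hat{r}_j(x))^2) &\le 2 \cdot (2^{j+1} + 2^j), \\
\deg(\check{r}_{j+2}(x) \cdot B_{j+1}(x) \cdot s_1(x)) &\le (2^{j+2} - 1) + (2^{j+1} - 1) + 2, \\
\deg(s_{j+2}(x)) &= 2^{j+2}.
\end{aligned}
\]
Thus, $\deg(\bar{r}_{j+1}(x)) \le 2^{j+1}$. This completes the proof.

E. Proof of Lemma 5

Proof. From (44), we have
\[
\begin{aligned}
\Lambda_i(x) &= \mathcal{Q}\bigl((s_{i-1}(x))^2 \cdot \bar{\Lambda}_i(x),\, i + 1\bigr) \\
&= \mathcal{Q}\bigl((s_i(x) + s_{i-1}(v_{i-1})\,s_{i-1}(x)) \cdot \bar{\Lambda}_i(x),\, i + 1\bigr) \quad (\text{From (7)}) \\
&= \mathcal{Q}\bigl(s_i(x)\,\bar{\Lambda}_i(x),\, i + 1\bigr) + s_{i-1}(v_{i-1}) \cdot \mathcal{Q}\bigl(s_{i-1}(x)\,\bar{\Lambda}_i(x),\, i + 1\bigr).
\end{aligned} \tag{98}
\]
(98) has two terms, and we recall that $\deg(\bar{\Lambda}_i(x)) = 2^{i+1} - 1$. Let
\[
\bar{\Lambda}_i(x) = \Lambda_i^{(0)}(x) + s_i(x)\,\Lambda_i^{(1)}(x),
\]
where both $\Lambda_i^{(0)}(x)$ and $\Lambda_i^{(1)}(x)$ have degrees no more than $2^i - 1$. Then
\[
\begin{aligned}
s_i(x)\,\bar{\Lambda}_i(x) &= s_i(x)\bigl(\Lambda_i^{(0)}(x) + s_i(x)\,\Lambda_i^{(1)}(x)\bigr) \\
&= s_i(x)\,\Lambda_i^{(0)}(x) + (s_i(x))^2\,\Lambda_i^{(1)}(x) \\
&= s_i(x)\,\Lambda_i^{(0)}(x) + s_i(v_i)\,s_i(x)\,\Lambda_i^{(1)}(x) + s_{i+1}(x)\,\Lambda_i^{(1)}(x).
\end{aligned} \tag{99}
\]
From (99), the first term in (98) can be reformulated as
\[
\mathcal{Q}\bigl(s_i(x)\,\bar{\Lambda}_i(x),\, i + 1\bigr) = \Lambda_i^{(1)}(x) = \mathcal{Q}\bigl(\bar{\Lambda}_i(x),\, i\bigr).
\]
With a similar step, it can be shown that the second term can be formulated as
\[
s_{i-1}(v_{i-1})\,\mathcal{Q}\bigl(s_{i-1}(x)\,\bar{\Lambda}_i(x),\, i + 1\bigr) = \mathcal{Q}\bigl(\Lambda_i^{(1)}(x),\, i - 1\bigr) \cdot s_{i-1}(v_{i-1}).
\]
This completes the proof.

F. Proof of Lemma 6

Proof. For the base case $\deg(b(x)) < 2^{g-1}$ (see Algorithm 5, line 1), it is clear that (48) holds. Assume Algorithm 5 is valid for HGCD$(a(x), b(x), j)$ with $j \le g - 1$. When $j = g$, the degree of $a(x)$ satisfies $2^{g-1} \le \deg(a(x)) \le 2^g - 1$. In this case, both $a(x)$ and $b(x)$ are divided into three individual polynomials as expressed in (56) and (57). In line 3, HGCD$(a_H(x), b_H(x), g - 1)$ is called to obtain $(Z_H, M_H)$, which satisfies
\[
Z_H = M_H \cdot \begin{bmatrix} a_H(x) \\ b_H(x) \end{bmatrix}. \tag{100}
\]
Multiplying (100) by $s_{g-1}(x)$ gives
\[
Z_H \cdot s_{g-1}(x) = M_H \cdot s_{g-1}(x) \begin{bmatrix} a_H(x) \\ b_H(x) \end{bmatrix}, \tag{101}
\]
which is equivalent to
\[
Z_H \cdot s_{g-1}(x) + M_H \begin{bmatrix} a_L(x) \\ b_L(x) \end{bmatrix}
= M_H \begin{bmatrix} s_{g-1}(x) \cdot a_H(x) + a_L(x) \\ s_{g-1}(x) \cdot b_H(x) + b_L(x) \end{bmatrix}. \tag{102}
\]
By (56) and (57), (102) becomes
\[
Z_H \cdot s_{g-1}(x) + M_H \begin{bmatrix} a_L(x) \\ b_L(x) \end{bmatrix} = M_H \begin{bmatrix} a(x) \\ b(x) \end{bmatrix}. \tag{103}
\]
Then we have
\[
Z_M = \begin{bmatrix} z_{M0}(x) \\ z_{M1}(x) \end{bmatrix} = M_H \begin{bmatrix} a(x) \\ b(x) \end{bmatrix}. \tag{104}
\]
Note that $z_{M0}(x)$ and $z_{M1}(x)$ are computed in line 4. (104) shows that $(Z_M, M_H)$ satisfies the equality, and thus the return in line 5 is valid.
In line 7, $z_{M0}(x)$ is divided by $z_{M1}(x)$ to get
\[
z_{M0}(x) = q_M(x) \cdot z_{M1}(x) + r_M(x), \tag{105}
\]
with
\[
\deg(r_M(x)) < \deg(z_{M1}(x)). \tag{106}
\]
The matrix form of (105) can be written as
\[
\begin{bmatrix} z_{M1}(x) \\ r_M(x) \end{bmatrix}
= \begin{bmatrix} 0 & 1 \\ 1 & -q_M(x) \end{bmatrix}
\begin{bmatrix} z_{M0}(x) \\ z_{M1}(x) \end{bmatrix}. \tag{107}
\]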

Then $z_{M1}(x)$ and $r_M(x)$ are decomposed into several polynomials as
\[
\begin{aligned}
z_{M1}(x) &= z_{M1LL}(x) + s_{g-2}(x)\,z_{M1LH}(x) + s_{g-1}(x)\,z_{M1H}(x) \\
&= z_{M1LL}(x) + s_{g-2}(x)\,z_{M1LH}(x) + s_{g-2}(x)\bigl(s_{g-2}(x) + s_{g-2}(v_{g-2})\bigr)\,z_{M1H}(x) \\
&= z_{M1LL}(x) + s_{g-2}(x)\,z_{M1M}(x),
\end{aligned} \tag{108}
\]
where
\[
z_{M1M}(x) = z_{M1LH}(x) + \bigl(s_{g-2}(x) + s_{g-2}(v_{g-2})\bigr)\,z_{M1H}(x). \tag{109}
\]
Similarly,
\[
r_M(x) = r_{MLL}(x) + s_{g-2}(x)\,r_{MM}(x), \tag{110}
\]
where
\[
r_{MM}(x) = r_{MLH}(x) + \bigl(s_{g-2}(x) + s_{g-2}(v_{g-2})\bigr)\,r_{MH}(x). \tag{111}
\]
$z_{M1M}(x)$ (and $r_{MM}(x)$) can be treated as the quotient of dividing $z_{M1}(x)$ (and $r_M(x)$) by $s_{g-2}(x)$. By (106), this implies
\[
\deg(r_{MM}(x)) < \deg(z_{M1M}(x)).
\]
Line 9 calls HGCD$(z_{M1M}(x), r_{MM}(x), g - 1)$ to obtain $(Y_M, M_M)$ satisfying
\[
Y_M = M_M \begin{bmatrix} z_{M1M}(x) \\ r_{MM}(x) \end{bmatrix}. \tag{112}
\]
Multiplying (112) by $s_{g-2}(x)$ gives
\[
Y_M \cdot s_{g-2}(x) = M_M \cdot s_{g-2}(x) \begin{bmatrix} z_{M1M}(x) \\ r_{MM}(x) \end{bmatrix}. \tag{113}
\]
By adding $M_M \begin{bmatrix} z_{M1LL}(x) \\ r_{MLL}(x) \end{bmatrix}$ to both sides of (113), we have
\[
Y_M \cdot s_{g-2}(x) + M_M \begin{bmatrix} z_{M1LL}(x) \\ r_{MLL}(x) \end{bmatrix}
= M_M \begin{bmatrix} z_{M1LL}(x) + z_{M1M}(x) \cdot s_{g-2}(x) \\ r_{MLL}(x) + r_{MM}(x) \cdot s_{g-2}(x) \end{bmatrix}, \tag{114}
\]
which is equivalent to
\[
Y_M \cdot s_{g-2}(x) + M_M \begin{bmatrix} z_{M1LL}(x) \\ r_{MLL}(x) \end{bmatrix}
= M_M \begin{bmatrix} z_{M1}(x) \\ r_M(x) \end{bmatrix}. \tag{115}
\]
Substituting (107) and (104) into (115) gives
\[
Y_M \cdot s_{g-2}(x) + M_M \begin{bmatrix} z_{M1LL}(x) \\ r_{MLL}(x) \end{bmatrix}
= M_M \begin{bmatrix} 0 & 1 \\ 1 & -q_M(x) \end{bmatrix} M_H \begin{bmatrix} a(x) \\ b(x) \end{bmatrix}. \tag{116}
\]
Hence, we have
\[
Z_R = M_R \begin{bmatrix} a(x) \\ b(x) \end{bmatrix}, \tag{117}
\]
where
\[
M_R = M_M \begin{bmatrix} 0 & 1 \\ 1 & -q_M(x) \end{bmatrix} M_H \tag{118}
\]
and
\[
Z_R = Y_M \cdot s_{g-2}(x) + M_M \begin{bmatrix} z_{M1LL}(x) \\ r_{MLL}(x) \end{bmatrix} \tag{119}
\]
are the results returned in line 10.

G. Proof of Lemma 7

Proof. Assume HGCD$(a(x), b(x), i)$ is valid for $i \le g - 1$, i.e., the recursive calls in line 3 (and line 9) are valid. Assume $i = q$. It is clear that the call at line 3 satisfies the condition, since $a_H(x)$ and $b_H(x)$ are the high degree portions of $a(x)$ and $b(x)$, respectively.
For the call at line 9, we first consider the degree of $z_{M1}(x)$. For simplicity, $(Z_H, M_H)$ is denoted as
\[
Z_H = \begin{bmatrix} z_{H0}(x) \\ z_{H1}(x) \end{bmatrix}, \qquad
M_H = \begin{bmatrix} m_{H00}(x) & m_{H01}(x) \\ m_{H10}(x) & m_{H11}(x) \end{bmatrix}.
\]
Because $\deg(a_L(x)) \le 2^{q-2} - 1$, $\deg(b_L(x)) \le 2^{q-2} - 1$, and $\deg(z_{H1}(x)) \le 2^{q-2} - 1$, from the assumption, we have
\[
\deg(z_{M1}(x)) \le \max\bigl\{(2^{q-2} - 1) + 2^{g-1},\; \deg(m_{H10}(x)) + (2^{q-2} - 1),\; \deg(m_{H11}(x)) + (2^{q-2} - 1)\bigr\}. \tag{120}
\]
From the assumption,
\[
\deg(m_{H10}(x)) \le \deg(m_{H11}(x)) \le \deg(a_H(x)) - \deg(z_{H0}(x)).
\]
As $\deg(a_H(x)) \le 2^{q-1} - 1$ and $\deg(z_{H0}(x)) \ge 2^{q-2}$, we have
\[
\deg(m_{H10}(x)) \le \deg(m_{H11}(x)) \le 2^{q-2} - 1.
\]
Then (120) gives
\[
\begin{aligned}
\deg(z_{M1}(x)) &\le \max\bigl\{(2^{g-2} - 1) + 2^{g-1},\; (2^{q-2} - 1) + (2^{q-2} - 1)\bigr\} \\
&= (2^{g-2} - 1) + 2^{g-1}.
\end{aligned} \tag{121}
\]
Thus, the inequality
\[
\deg(r_M(x)) < \deg(z_{M1}(x)) \le 2^{g-1} + 2^{g-2} - 1
\]
in line 7 is valid. In line 8, $z_{M1M}(x)$ and $r_{MM}(x)$ are the quotients of dividing $z_{M1}(x)$ and $r_M(x)$ by $s_{g-2}(x)$, and then
\[
\deg(r_{MM}(x)) < \deg(z_{M1M}(x)) \le 2^{g-1} - 1. \tag{122}
\]
Further, due to the if condition in line 5, we have
\[
\deg(z_{M1}(x)) \ge 2^{g-1}
\]
after line 7. This implies
\[
\deg(z_{M1M}(x)) \ge 2^{g-2}. \tag{123}
\]
By (122) and (123), the requirements of the call in line 9 are verified.

H. Proof of Lemma 8

Proof. Algorithm 5 has three returns, at lines 1, 5 and 10. Assume that the recursive calls to HGCD in line 3 and line 10 output valid results. For line 1, the claim is clear. For line 5, (101)-(104) show that the degree of $z_{M0}(x)$ is at least
\[
\deg(z_{M0}(x)) \ge \deg(z_{H0}) + 2^{g-1}.
\]
By the assumption, $\deg(z_{H0}) \ge 2^{g-2}$ and we have
\[
\deg(z_{M0}(x)) \ge 2^{g-2} + 2^{g-1}. \tag{124}
\]

By (124) and the if condition in line 5, the first condition holds.
Let us consider line 10. $Z_R$, $Y_M$ and $M_M$ can be denoted as
\[
Z_R = \begin{bmatrix} z_{R0}(x) \\ z_{R1}(x) \end{bmatrix}, \qquad
Y_M = \begin{bmatrix} y_{M0}(x) \\ y_{M1}(x) \end{bmatrix}, \qquad
M_M = \begin{bmatrix} m_{M00}(x) & m_{M01}(x) \\ m_{M10}(x) & m_{M11}(x) \end{bmatrix}.
\]
The degree of $z_{R0}(x)$ is at least
\[
\deg(z_{R0}(x)) \ge \deg(y_{M0}(x)) + 2^{g-2} \ge 2^{g-2} + 2^{g-2} = 2^{g-1}.
\]
Further,
\[
\deg(z_{R1}(x)) = \max\bigl\{\deg(y_{M1}(x)) + 2^{g-2},\; \deg(m_{M10}(x)\,z_{M1LL}(x)),\; \deg(m_{M11}(x)\,r_{MLL}(x))\bigr\}. \tag{125}
\]
By assumptions, we have
\[
\begin{aligned}
\deg(m_{M10}(x)) &\le \deg(m_{M11}(x)) \\
&\le \deg(z_{M1M}(x)) - \deg(y_{M0}(x)) \\
&\le (2^{q-1} - 1) - 2^{q-2} = 2^{q-2} - 1.
\end{aligned}
\]
Then (125) becomes
\[
\begin{aligned}
\deg(z_{R1}(x)) &\le \max\bigl\{(2^{g-2} - 1) + 2^{g-2},\; (2^{q-2} - 1) + (2^{q-2} - 1)\bigr\} \\
&= 2^{g-1} - 1,
\end{aligned} \tag{126}
\]
as, by assumptions, $\deg(z_{M1LL}(x)) \le 2^{q-2} - 1$, $\deg(r_{MLL}(x)) \le 2^{q-2} - 1$, and $\deg(y_{M1}(x)) \le 2^{g-2} - 1$. This verifies (60).

I. Proof of Lemma 9

Proof. Assume that the recursive calls to HGCD in line 3 and line 10 output valid results. For the base case in line 1, it is clear that the condition holds. Notice that $\deg(0)$ is a special case; we can treat $\deg(0) = 0$ here. For line 5, the objective is to prove
\[
\deg(m_{H11}(x)) \le \deg(a(x)) - \deg(z_{M0}(x)), \tag{127}
\]
and
\[
\deg(m_{Hi0}(x)) \le \deg(m_{Hi1}(x)), \qquad \deg(m_{H0i}(x)) \le \deg(m_{H1i}(x)), \qquad i = 0, 1. \tag{128}
\]
By assumptions, line 3 of the algorithm gives
\[
\deg(m_{H11}(x)) \le \deg(a_H(x)) - \deg(z_{H0}(x)), \tag{129}
\]
and
\[
\deg(m_{Hi0}(x)) \le \deg(m_{Hi1}(x)), \qquad \deg(m_{H0i}(x)) \le \deg(m_{H1i}(x)), \qquad i = 0, 1. \tag{130}
\]
(130) verifies that (128) is true. Further, (101)-(104) show that (129) can be rewritten as
\[
\begin{aligned}
\deg(m_{H11}(x)) &\le \deg(a_H(x)\,s_{g-1}(x)) - \deg(z_{H0}(x)\,s_{g-1}(x)) \\
&= \deg(a(x)) - \deg(z_{M0}(x)).
\end{aligned} \tag{131}
\]
This verifies (127).
Let us consider line 10. $Z_R$, $Y_M$, $M_M$, and $M_R$ can be denoted as
\[
Z_R = \begin{bmatrix} z_{R0}(x) \\ z_{R1}(x) \end{bmatrix}, \qquad
Y_M = \begin{bmatrix} y_{M0}(x) \\ y_{M1}(x) \end{bmatrix},
\]
\[
M_M = \begin{bmatrix} m_{M00}(x) & m_{M01}(x) \\ m_{M10}(x) & m_{M11}(x) \end{bmatrix}, \qquad
M_R = \begin{bmatrix} m_{R00}(x) & m_{R01}(x) \\ m_{R10}(x) & m_{R11}(x) \end{bmatrix}.
\]
The objective is to prove
\[
\deg(m_{R11}(x)) \le \deg(a(x)) - \deg(z_{R0}(x)), \tag{132}
\]
and
\[
\deg(m_{Ri0}(x)) \le \deg(m_{Ri1}(x)), \qquad \deg(m_{R0i}(x)) \le \deg(m_{R1i}(x)), \qquad i = 0, 1. \tag{133}
\]
By assumptions, line 9 of the algorithm gives
\[
\deg(m_{M11}(x)) \le \deg(z_{M1M}(x)) - \deg(y_{M0}(x)), \tag{134}
\]
and
\[
\deg(m_{Mi0}(x)) \le \deg(m_{Mi1}(x)), \qquad \deg(m_{M0i}(x)) \le \deg(m_{M1i}(x)), \qquad i = 0, 1. \tag{135}
\]
To verify (133), (118) can be rewritten as
\[
M_R = \begin{bmatrix} m_{M00}(x) & m_{M01}(x) \\ m_{M10}(x) & m_{M11}(x) \end{bmatrix}
\begin{bmatrix} m_{H10}(x) & m_{H11}(x) \\ m_{H00}(x) & m_{H01}(x) \end{bmatrix}
- q_M(x) \begin{bmatrix} m_{M01}(x) \\ m_{M11}(x) \end{bmatrix}
\begin{bmatrix} m_{H10}(x) & m_{H11}(x) \end{bmatrix}. \tag{136}
\]
Based on assumptions (130) and (133), it can be seen that the degrees of the elements of $M_R$ are determined by the second term. Precisely,
\[
\begin{aligned}
\deg(m_{R00}(x)) &= \deg(q_M(x)) + \deg(m_{M01}(x)) + \deg(m_{H10}(x)), \\
\deg(m_{R01}(x)) &= \deg(q_M(x)) + \deg(m_{M01}(x)) + \deg(m_{H11}(x)), \\
\deg(m_{R10}(x)) &= \deg(q_M(x)) + \deg(m_{M11}(x)) + \deg(m_{H10}(x)), \\
\deg(m_{R11}(x)) &= \deg(q_M(x)) + \deg(m_{M11}(x)) + \deg(m_{H11}(x)).
\end{aligned} \tag{137}
\]
From the assumptions $\deg(m_{M01}(x)) \le \deg(m_{M11}(x))$ and $\deg(m_{H10}(x)) \le \deg(m_{H11}(x))$, (133) can be verified.
The verification of (132) is as follows. From (111), (134) can be rewritten as
\[
\begin{aligned}
\deg(m_{M11}(x)) &\le \deg(z_{M1M}(x)\,s_{g-2}(x)) - \deg(y_{M0}(x)\,s_{g-2}(x)) \\
&= \deg(z_{M1}(x)) - \deg(z_{R0}(x)).
\end{aligned} \tag{138}
\]
Summing (138) and (131) results in (139), and thus (132) is verified.

J. Proof of Lemma 10

Proof. The proof is by mathematical induction. We pick $T = n/2$ as the base case. Then from (63), $\mathbf{u} = (\mathbf{u}_0, \boldsymbol{\omega}_0)$ has $n/2$ $\omega_0$s in the high degree part. From (66), $\mathbf{v} = (\mathbf{v}_0, \mathbf{v}_1)$ is divided into two equal sub-vectors. Then (65) can be written as
\[
(\mathbf{u}_0, \boldsymbol{\omega}_0) = \mathrm{IFFT}_{\bar{\mathbb{X}}}((\mathbf{v}_0, \mathbf{v}_1), m, \omega_0).
\]

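In Algorithm 2, line 3 computes $D^{(0)} = \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_0, m - 1, \omega_0)$, and line 4 computes $D^{(1)} = \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_1, m - 1, \omega_{n/2})$. The vector $\boldsymbol{\omega}_0$ is calculated by line 6, and $\mathbf{u}_0$ is computed by line 7. Line 6 only requires pointwise additions, which can be written as a vector addition:
\[
\begin{aligned}
\boldsymbol{\omega}_0 &= D^{(0)} + D^{(1)} \\
&= \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_0, m - 1, \omega_0) + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_1, m - 1, \omega_{n/2}).
\end{aligned} \tag{140}
\]
Assume (67) holds at $T = S = 2^s$, and thus
\[
\boldsymbol{\omega}_0 = \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_0, s, \omega_0) + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_1, s, \omega_S) + \dots + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_{n/S-1}, s, \omega_{n-S}). \tag{141}
\]
When $T = S/2 = 2^{s-1}$, (141) becomes
\[
(\mathbf{u}_{n/(S/2)-2}, \boldsymbol{\omega}_0) = \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_0, s, \omega_0) + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_1, s, \omega_S) + \dots + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_{n/S-1}, s, \omega_{n-S}). \tag{142}
\]
We can extract the computations regarding $\boldsymbol{\omega}_0$ in (142). As in the base case, this decomposes each $S$-point IFFT into two $(S/2)$-point IFFTs, resulting in
\[
\boldsymbol{\omega}_0 = \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_0, s - 1, \omega_0) + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_1, s - 1, \omega_{S/2}) + \dots + \mathrm{IFFT}_{\bar{\mathbb{X}}}(\mathbf{v}_{n/S-1}, s - 1, \omega_{n-S/2}). \tag{143}
\]
This completes the proof.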
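Several proofs in this appendix rest on two properties of the subspace polynomials: the recursion (7), $s_{j+1}(x) = s_j(x)\,(s_j(x) + s_j(v_j))$, and the additivity (84). The sketch below checks both by brute force. It is only an illustration under stated assumptions: the field $\mathbb{F}_{2^4}$ with the polynomial $x^4 + x + 1$, the basis $v_j = x^j$ (so that $V_k$ is the set of integers below $2^k$), and hypothetical helper names.

```python
M = 4              # toy field GF(2^4); an assumption made for illustration
POLY = 0b10011     # x^4 + x + 1, assumed field polynomial

def gf_mul(a, b):
    """Multiply two field elements (shift-and-add with modular reduction)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> M:
            a ^= POLY
        b >>= 1
    return r

def s(k, x):
    """Subspace polynomial s_k(x) = prod over a in V_k of (x + a), brute force."""
    r = 1
    for a in range(1 << k):
        r = gf_mul(r, x ^ a)
    return r

def check_identities():
    """Return True if (7) and (84) hold for every element of the toy field."""
    field = range(1 << M)
    for j in range(M):
        vj = 1 << j
        for x in field:
            # Recursion (7): s_{j+1}(x) = s_j(x) * (s_j(x) + s_j(v_j)).
            if s(j + 1, x) != gf_mul(s(j, x), s(j, x) ^ s(j, vj)):
                return False
    for k in range(M + 1):
        for gamma in field:
            # Additivity: s_k(x + gamma) = s_k(x) + s_k(gamma).
            if any(s(k, x ^ gamma) != s(k, x) ^ s(k, gamma) for x in field):
                return False
            # Consequence (84): s_k(a + gamma) = s_k(gamma) for a in V_k.
            if any(s(k, a ^ gamma) != s(k, gamma) for a in range(1 << k)):
                return False
    return True
```

Calling `check_identities()` exercises every element of the toy field; the brute-force product in `s` is chosen for clarity and is not how the transforms would evaluate $s_k$ in practice.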

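\[
\begin{aligned}
& \deg(m_{M11}(x)) + \deg(m_{H11}(x)) \le \deg(z_{M1}(x)) - \deg(z_{R0}(x)) + \deg(a(x)) - \deg(z_{M0}(x)) \\
\Rightarrow\; & \deg(m_{M11}(x)) + \deg(m_{H11}(x)) + \deg(z_{M0}(x)) - \deg(z_{M1}(x)) \le \deg(a(x)) - \deg(z_{R0}(x)) \\
\Rightarrow\; & \deg(m_{M11}(x)) + \deg(m_{H11}(x)) + \deg(q_M(x)) \le \deg(a(x)) - \deg(z_{R0}(x)) \quad (\text{By (105)}) \\
\Rightarrow\; & \deg(m_{R11}(x)) \le \deg(a(x)) - \deg(z_{R0}(x)) \quad (\text{By (137)}).
\end{aligned} \tag{139}
\]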
