Decoding Concatenated Codes using Soft Information

Venkatesan Guruswami∗
University of California at Berkeley
Computer Science Division
Berkeley, CA 94720.
[email protected]

Madhu Sudan†
MIT Laboratory for Computer Science
200 Technology Square
Cambridge, MA 02139.
[email protected]

Abstract

We present a decoding algorithm for concatenated codes when the outer code is a Reed-Solomon code and the inner code is arbitrary. "Soft" information on the reliability of various symbols is passed by the inner decodings and exploited in the Reed-Solomon decoding. This is the first analysis of such a soft algorithm that works for arbitrary inner codes; prior analyses could only handle some special inner codes. Crucial to our analysis is a combinatorial result on the coset weight distribution of a code, given only its minimum distance. Our result enables us to decode essentially up to the "Johnson radius" of a concatenated code when the outer distance is large (the Johnson radius is the "a priori list decoding radius" of a code as a function of its distance). As a consequence, we are able to present simple and efficient constructions of q-ary linear codes that are list decodable up to a fraction (1 − 1/q − ε) of errors and have rate Ω(ε^6). Codes that can correct such a large fraction of errors have found numerous complexity-theoretic applications. The previous constructions of linear codes with a similar rate used algebraic-geometric codes and thus suffered from a complicated construction and slow decoding.

1. Introduction

Concatenation of codes presents a simple, yet powerful tool to construct good codes over small (e.g., binary) alphabets by combining an "outer" code over a large alphabet, often an algebraic code with nice properties, with a good "inner" code over a small alphabet. The basic idea behind code concatenation is to first encode the message using the outer code and then encode each of the symbols of the outer codeword further using the inner code. The size of the inner code is small enough to permit efficient constructions of good codes and efficient encoding/decoding, and this translates into similar claims for the concatenated code.

Since its discovery by Forney [For66a], code concatenation has become one of the most widely used tools in all of coding theory. Forney's original motivation was to prove that one could approach Shannon capacity with polynomial decoding complexity, and he did so by using suitable concatenated codes with an outer Reed-Solomon code. One of the most fundamental problems in coding theory is the construction of asymptotically good code families whose rate and relative distance are both positive, and until the recent work of Sipser and Spielman [SS96], concatenated codes comprised the only known explicit (or even polynomial time) constructions of asymptotically good codes. The explicit binary code constructions with the current best known trade-offs between rate and minimum distance are still based on concatenation.

In light of the pervasiveness of concatenated codes, the problem of decoding them efficiently is an important one which has received considerable attention; a thorough discussion of some of the basic results in this area can be found in [Dum98]. A concatenated code whose outer and inner codes have distance at least D and d respectively has minimum distance at least Dd. Forney [For66b] presented an elegant and general algorithm based on Generalized Minimum Distance (GMD) decoding to correct such a code from fewer than Dd/2 errors (this bound is called the product bound). The GMD algorithm is efficient as long as the inner code has polynomially many codewords and the outer code has an efficient errors-and-erasures decoding algorithm.

Recently, with the spurt of activity on the subject of list decoding, there is renewed interest in decoding important families of codes beyond the half-the-distance bound. Specifically, the goal is to find a list of all codewords that differ from the received word in at most e places, for some bound e which is much bigger than d/2 (but is still small enough so that the list that the decoding algorithm has to output is guaranteed to be small). Following the powerful list decoding algorithms for Reed-Solomon and algebraic-geometric codes [Sud97, GS99, SW99], list decoding algorithms with good error-correction performance have been proposed for certain concatenated codes with outer Reed-Solomon or AG-codes [GS00, STV01, Nie00, GHSZ00, KV01].

In this paper, we present a list decoding algorithm for concatenated codes when the outer code is a Reed-Solomon code and the inner code is arbitrary. Quantitatively, ours is the first algorithm which decodes up to essentially the Johnson radius of a concatenated code when the inner code is arbitrary and the outer code is a Reed-Solomon code with large distance. (The Johnson radius is the "a priori list decoding potential" of a code and is the largest general bound on the number of errors for which list decoding with "small" lists is possible.) To explain what is new in our result compared to the several previous algorithms and analyses, we next elaborate on the basic operational structure of the known list decoding algorithms for concatenated codes.

∗Supported by a Miller Postdoctoral Fellowship.
†Supported by MIT-NTT Award MIT2001-04, and NSF grants CCR-9875511 and CCR-9912342.

Let n_0 be the block length of the outer code. The received word of a concatenated code is broken up into n_0 blocks which correspond to the inner encodings of the n_0 outer codeword positions. Each of these n_0 blocks is decoded by an inner decoder to produce some information about the i'th symbol of the outer codeword, for every i, 1 ≤ i ≤ n_0. This information from the inner decodings is then passed to the outer decoder, which uses it to list decode the outer code and produce a final list of answers. This is the general scheme which all known list decoding algorithms for concatenated codes, including ours in this paper, follow. The main difference between the various algorithms is in the exact inner decoding used and the nature of the information passed to the outer decoding stage.

Our goal is to correct as many errors as possible by a careful choice of the nature of information passed by the inner decoder. In most previously known decoding algorithms, the inner decodings have either passed a single symbol corresponding to each position of the outer code, possibly together with a "weight" which is a quantitative estimate of the decoder's confidence in that symbol being the correct one (e.g., in [For66b, Nie00]), or they have passed a list of potential symbols (without any confidence information to rank them) for each position (e.g., in [STV01, GS00]). The exceptions to this are the algorithms in [GS00] for the case when the inner code is a Hadamard code, where every possible symbol is returned by the inner decoder along with an associated weight. Such an algorithm is referred to as decoding with soft information in coding theory parlance, since the inner decoder qualifies its vote with "soft" reliability information. More recently, Koetter and Vardy [KV01] gave such an algorithm when the inner code is the extended binary Golay code. In both these cases, the special nature of the inner code (specifically, its so-called coset weight distribution) was crucially exploited in setting the weights and analyzing the performance of the overall algorithm. Our algorithm follows the spirit of these algorithms in [GS00, KV01], and each inner decoding returns a list of symbols, each with its associated weight. These weights are then passed to the outer decoder, which, as is by now standard, is the weighted (or soft) Reed-Solomon list decoding algorithm due to the authors [GS99]. The novelty in our approach is that we are able to present a suitable setting of weights in the inner decoding and quantitatively analyze the performance of the entire algorithm for every choice of the inner code, using only the distance of the Reed-Solomon and inner codes in the analysis. Along the way, we also prove a combinatorial property concerning the coset weight distribution of a code given only information about its minimum distance.

We now discuss a concrete goal which motivated much of the prior work (e.g., those in [STV01, GS00, GHSZ00]) on decoding concatenated codes, and where our result yields a quantitative improvement over the previous results. The goal is to construct simple and explicit code families over small finite fields F_q that can be efficiently list decoded from up to a fraction (1 − 1/q − ε) of errors (think of ε as an arbitrarily small positive constant). This represents the "high-noise" regime and is information-theoretically the largest fraction of errors one can hope to decode from (since a random received word will differ from each codeword in an expected fraction (1 − 1/q) of positions). Having fixed the desired error-resilience of the code, the goal is to achieve good rate together with fast construction and decoding times.

One of the main motivations for the study of such codes which can correct a fraction (1 − 1/q − ε) of errors comes from complexity theory, where they have applications in hardness amplification (and as a consequence in derandomization) [STV01], constructions of generic hardcore predicates from one-way permutations (see [Sud00]), constructions of pseudorandom generators [SU01, Uma02], constructions of efficiently decodable extractor codes and hardcore functions that output several bits [TZ01], hardness of approximation results [MU01], etc. (see [Sud00] or [Gur01, Chap. 12] for a discussion of these applications). Though qualitatively the codes necessary for all of these applications are now known, these diverse complexity-theoretic applications nevertheless motivate the pursuit of constructions of such codes with better quantitative parameters, and in particular constructions that approach the optimal rate of Ω(ε^2) (which is achieved by random codes and exponential time decoding).

We first review the known results concerning this problem that appear in [GS00, GHSZ00]. For the binary case, the construction in [GHSZ00] gives the currently best known rate of Ω(ε^4); their construction, however, is not explicit and takes deterministic n^{O(1/ε)} time. It also applies only to binary linear codes. An earlier result by [GS00] achieves a rate of Ω(ε^6) and works over larger alphabets as well, but the outer code has to be an algebraic-geometric (AG) code, which makes the construction complicated and the decoding slow and caveat-filled. The previous best construction of reasonable complexity that worked for all alphabet sizes had a rate of Ω(ε^8) (the construction was a Reed-Solomon code concatenated with a large distance inner code). Though the deterministic construction time is high (namely, n^{O(1/ε)}), these codes have a fast probabilistic construction (either a logarithmic time Monte Carlo construction that will have the claimed list decodability property with high probability, or a near-linear time Las Vegas construction that is guaranteed to have the claimed list decodability property). If this is not good enough, and one needs completely explicit constructions,¹ then this can be achieved by lowering the rate to Ω(ε^10).

As a corollary of our decoding algorithm for concatenated codes with arbitrary inner codes, we get a fast probabilistic construction that achieves a rate of Ω(ε^6). This matches the best known bound that works for all alphabet sizes, and further, unlike the previous AG-codes based construction, has a very fast probabilistic construction and a near-quadratic time list decoding algorithm. The construction can be made completely explicit by lowering the rate to Ω(ε^8), and once again this is the best known for completely explicit codes (which do not even involve a search for constant-sized objects).

Organization of the paper: We begin in Section 2 with some basic definitions. Section 3 states and proves the combinatorial result that will be used in decoding the inner codes in our algorithms. In Section 4 we use the combinatorial result to design an algorithm for decoding concatenated codes that makes good use of soft information. In Section 5, we focus on the "high-noise" situation when we wish to correct a large (namely, (1 − 1/q − ε)) fraction of errors, and state and prove what our algorithm implies in this situation. Finally, we conclude by mentioning a few open questions in Section 6.

2. Preliminaries and Notation

A linear code C over a q-ary alphabet is simply a subspace of F_q^n (F_q being the finite field with q elements) for some n; we call n the block length of the code. We identify the elements of a q-ary alphabet with the integers 1, 2, ..., q in some canonical way. Let [q] = {1, 2, ..., q}. For x, y ∈ [q]^n, the Hamming distance between x and y, denoted ∆(x, y), is the number of positions where x and y differ. The minimum distance, or simply distance, of a code C is defined to be the minimum Hamming distance between a pair of distinct codewords of C. A q-ary linear code of block length n, dimension k and minimum distance d is usually denoted as an [n, k, d]_q code. The relative distance of such a code is defined to be the normalized quantity d/n. For r ∈ [q]^n and 0 ≤ e ≤ n, the Hamming ball of radius e around r is defined by B_q(r, e) = {x ∈ [q]^n : ∆(r, x) ≤ e}. For a pair of real vectors v and w, we denote by ⟨v, w⟩ their usual dot product over the reals.

Given an [N, K, D]_{q^m} linear code C_1 and an [n, m, d]_q linear code C_2, their concatenation C̃ = C_1 ⊕ C_2 is a code which first encodes the message according to C_1 and then encodes each of the symbols of the codeword of C_1 further using C_2 (since each symbol is an element of F_{q^m} and C_2 is a q-ary code of dimension m, this encoding is well-defined). The concatenated code C̃ is an [Nn, Km, ≥ Dd]_q linear code. In particular, its distance is at least the product of the outer and inner distances (Dd is called the designed distance of the concatenated code). The codes C_1 and C_2 as above are respectively referred to as the outer and inner codes of the concatenation.

A Reed-Solomon code is a linear code whose messages are polynomials of degree at most k over a finite field F_q, and a message is encoded by its evaluations at n distinct elements of the field. This gives an [n, k + 1, n − k]_q linear code. Note that the definition of the code requires that q ≥ n; thus these are codes over a large alphabet. By concatenating these codes with, say, a binary code of dimension log_2 q, we can get binary codes.
3. A combinatorial result

In this section we prove a combinatorial result that will be used in the analysis of the error-correction performance of our decoding algorithm for concatenated codes. To motivate the exact statement of the combinatorial result, we jump ahead to give a hint of how the inner codes will be decoded in our algorithms. When presented with a received word r, the inner decoder will simply search for and output all codewords which lie in a Hamming ball of a certain radius R around r. The weight associated with a codeword c at a distance e_c = ∆(r, c) ≤ R from r will be set to (R − e_c). (This is a fairly natural choice for the weights, since the larger the distance between r and c, intuitively the less likely it is that r will be received when c is transmitted.) These weights, for each of the outer codeword positions, will be passed to a soft decoding algorithm for Reed-Solomon codes, which will then use the weights to complete the list decoding. We now state and prove a combinatorial result that gives an upper bound on the sum of squares of the weights (R − e_c).

Proposition 1 Let C ⊆ [q]^n be a q-ary code (not necessarily linear), let d be the minimum distance of C, and let δ = d/n be its relative distance. Let r ∈ [q]^n be arbitrary, and let

    R = n\left(1 - \frac{1}{q}\right)\left(1 - \sqrt{1 - \frac{\delta}{1 - 1/q}}\right)    (1)

(this quantity is the so-called q-ary Johnson radius of the code, the maximum bound up to which we are guaranteed to have a "small" number of codewords within distance R of r). Then we have

    \sum_{c \in C} \left(\max\{R - \Delta(r, c),\ 0\}\right)^2 \le \delta n^2 .    (2)

¹While we do not try to define the term "explicit" formally, a construction would be considered explicit if it can be specified by a mathematical formula of an acceptable form.
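Before turning to the proof, Proposition 1 can be checked exhaustively on a small code. The sketch below (our addition, not part of the paper) brute-forces bound (2) for the [7,4,3] binary Hamming code over all 2^7 received words.

```python
from itertools import product
from math import sqrt

# Generator matrix of the [7,4,3] binary Hamming code.
G = [[1,0,0,0,0,1,1],
     [0,1,0,0,1,0,1],
     [0,0,1,0,1,1,0],
     [0,0,0,1,1,1,1]]

def encode(m):
    return tuple(sum(a * b for a, b in zip(m, col)) % 2 for col in zip(*G))

codewords = {encode(m) for m in product((0, 1), repeat=4)}

n, q = 7, 2
d = min(sum(a != b for a, b in zip(c1, c2))     # minimum distance (= 3)
        for c1 in codewords for c2 in codewords if c1 != c2)
delta = d / n
R = n * (1 - 1/q) * (1 - sqrt(1 - delta / (1 - 1/q)))   # Equation (1)

# Exhaustive check of Equation (2) over every received word r.
worst = max(sum(max(R - sum(a != b for a, b in zip(r, c)), 0) ** 2
                for c in codewords)
            for r in product((0, 1), repeat=n))
print(worst <= delta * n ** 2)   # True: bound (2) holds for every r
```

Here R ≈ 2.18 and δn^2 = 21; the worst case over all r is far below the bound, since the Hamming code's codewords are pairwise far apart.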

Proof: The proof follows essentially the same approach as in the proof of the "Johnson bound" (see, for example, [Gur01, Chap. 3]), which gives an upper bound on the number of codewords within a distance R from r. We now require an upper bound on the sum of squares of linear functions of the distance over all such codewords.

We identify elements of [q] with vectors in R^q by replacing the symbol i (1 ≤ i ≤ q) by the unit vector of length q with a 1 in position i. We then associate elements of [q]^n with vectors in R^{nq} by writing down the vectors for each of the n symbols in sequence. This allows us to embed the codewords of C as well as the received word r into R^{nq}. Let c_1, c_2, ..., c_M be all the codewords that satisfy ∆(r, c_i) ≤ R, where R is a parameter that will be set shortly (it will end up being set as in Equation (1)). By abuse of notation, let us denote by c_i also the nq-dimensional real vector associated with the codeword c_i, for 1 ≤ i ≤ M (using the above mentioned identification), and by r the vector corresponding to r ∈ [q]^n. Let 1 ∈ R^{nq} be the all 1's vector. Now define v = αr + ((1 − α)/q)·1 for a parameter 0 ≤ α ≤ 1 to be specified later in the proof.

The idea behind the rest of the proof is the following. We will pick α so that the nq-dimensional vectors d_i = (c_i − v), for 1 ≤ i ≤ M, have all pairwise dot products less than 0. Geometrically speaking, we shift the origin by v to a new point relative to which the vectors corresponding to the codewords have pairwise angles which are greater than 90 degrees. We will then exploit the geometric fact that for such vectors d_i, for any vector w, the sum of the squares of its projections along the d_i's is at most ⟨w, w⟩ (this is proved in Lemma 2). This will then give us the required bound (2).

For 1 ≤ i ≤ M, let e_i = ∆(r, c_i) be the Hamming distance between c_i and r. Note that by the way we associate vectors with elements of [q]^n, we have ⟨c_i, r⟩ = n − e_i. Now, straightforward calculations yield:

    \langle c_i, v \rangle = \alpha(n - e_i) + (1 - \alpha)\frac{n}{q}    (3)

    \langle v, v \rangle = \frac{n}{q} + \alpha^2 \left(1 - \frac{1}{q}\right) n    (4)

    \langle c_i, c_j \rangle = n - \Delta(c_i, c_j) \le n - d .    (5)

Using (3), (4) and (5), we get for i ≠ j

    \langle d_i, d_j \rangle = \langle c_i - v, c_j - v \rangle \le \alpha e_i + \alpha e_j - d + \left(1 - \frac{1}{q}\right)(1 - \alpha)^2 n \le 2\alpha R - d + \left(1 - \frac{1}{q}\right)(1 - \alpha)^2 n .    (6)

Hence we have ⟨d_i, d_j⟩ ≤ 0 as long as

    R \le \left(1 - \frac{1}{q}\right) n - \left( \frac{(1 - 1/q)\alpha n}{2} + \frac{(1 - 1/q)n - d}{2\alpha} \right) .

Picking α = √(1 − (d/n)/(1 − 1/q)) = √(1 − δ/(1 − 1/q)) maximizes the "radius" R for which our bound will apply. Hence we pick

    \alpha = \left(1 - \frac{\delta}{1 - 1/q}\right)^{1/2}    (7)

and

    R = n\left(1 - \frac{1}{q}\right)\left(1 - \sqrt{1 - \frac{\delta}{1 - 1/q}}\right) = n\left(1 - \frac{1}{q}\right)(1 - \alpha) .    (8)

For this choice of α and R, we have ⟨d_i, d_j⟩ ≤ 0 for every 1 ≤ i < j ≤ M. Now a simple geometric fact, proved in Lemma 2 at the end of this proof, implies that for any vector x ∈ R^{nq} that satisfies ⟨x, d_i⟩ ≥ 0 for i = 1, 2, ..., M, we have

    \sum_{i=1}^{M} \frac{\langle x, d_i \rangle^2}{\langle d_i, d_i \rangle} \le \langle x, x \rangle .    (9)

We will apply this to the choice x = r. Straightforward computations show that

    \langle r, r \rangle = n    (10)

    \langle d_i, d_i \rangle = \langle c_i - v, c_i - v \rangle = 2\alpha e_i + (1 - \alpha)^2 \left(1 - \frac{1}{q}\right) n    (11)

    \langle r, d_i \rangle = (1 - \alpha)\left(1 - \frac{1}{q}\right) n - e_i = R - e_i .    (12)

Since each e_i ≤ R, we have ⟨r, d_i⟩ ≥ 0 for each i, 1 ≤ i ≤ M, and therefore we can apply Equation (9) above. For 1 ≤ i ≤ M, define

    W_i = \frac{\langle r, d_i \rangle}{\sqrt{\langle d_i, d_i \rangle}} = \frac{R - e_i}{\sqrt{2\alpha e_i + (1 - \alpha)R}}    (13)

(the second step follows using (8), (11) and (12)). Since each e_i ≤ R, we have

    W_i = \frac{R - e_i}{\sqrt{2\alpha e_i + (1 - \alpha)R}} \ge \frac{R - e_i}{\sqrt{(1 + \alpha)R}} = \frac{R - e_i}{\sqrt{\delta n}} ,    (14)

where the last equality follows by substituting the values of α and R from (7) and (8).

Now combining (10), (11) and (12), and applying Equation (9) to the choice x = r, we get Σ_{i=1}^{M} W_i^2 ≤ n. Together with Equation (14), this gives

    \sum_{i=1}^{M} (R - \Delta(r, c_i))^2 \le \delta n^2 .    (15)

This clearly implies the bound (2) claimed in the statement of the proposition, since the codewords c_i, 1 ≤ i ≤ M, include all codewords c that satisfy ∆(r, c) ≤ R, and the remaining codewords contribute zeroes to the left hand side of Equation (2). □

The geometric fact that was used in the above proof is stated below. A proof may be found in Appendix A.

Lemma 2 Let v_1, v_2, ..., v_M be distinct unit vectors in R^N such that ⟨v_i, v_j⟩ ≤ 0 for 1 ≤ i < j ≤ M. Further, suppose x ∈ R^N is a vector such that ⟨x, v_i⟩ ≥ 0 for each i, 1 ≤ i ≤ M. Then

    \sum_{i=1}^{M} \langle x, v_i \rangle^2 \le \langle x, x \rangle .    (16)

4. The formal decoding algorithm and its analysis

We are now ready to state and prove our main result about decoding concatenated codes with a general inner code.

Theorem 3 Consider a family of linear q-ary concatenated codes where the outer codes belong to a family of Reed-Solomon codes of relative distance ∆ over a field of size at most polynomial in the block length, and the inner codes belong to any family of q-ary linear codes of relative distance δ. Then, there is a polynomial time decoding procedure to list decode codes from such a family up to a fraction

    \left(1 - \frac{1}{q}\right)\left(1 - \sqrt{1 - \frac{q\delta}{q - 1}}\right) - \sqrt{\delta(1 - \Delta)}    (17)

of errors.

Proof: (Sketch) Consider a concatenated code C with outer code a Reed-Solomon code over GF(q^m) of block length n_0, relative distance ∆ and dimension (1 − ∆)n_0 + 1. We assume q^m ≤ n_0^{O(1)}, so that the field over which the Reed-Solomon code is defined is of size polynomial in the block length. Let the inner code C_in be any q-ary linear code of dimension m, block length n_1 and relative distance δ. Messages of C correspond to polynomials of degree at most k_0 = (1 − ∆)n_0 over GF(q^m), and a polynomial p is encoded as ⟨C_in(p(x_1)), ..., C_in(p(x_{n_0}))⟩, where x_1, x_2, ..., x_{n_0} are distinct elements of GF(q^m) that are used to define the Reed-Solomon encoding.

We now present and analyze the algorithm that decodes the concatenated code up to the claimed fraction (17) of errors. Let y ∈ F_q^n be a received word. For 1 ≤ i ≤ n_0, denote by y_i the portion of y in block i of the codeword (namely, the portion corresponding to the encoding by C_in of the ith symbol of the outer Reed-Solomon code).

We now perform the "decoding" of each of the n_0 blocks y_i as follows. Let

    R_1 = n_1 \left(1 - \frac{1}{q}\right)\left(1 - \sqrt{1 - \frac{q\delta}{q - 1}}\right)    (18)

be the "Johnson radius" of the inner code C_in. For 1 ≤ i ≤ n_0 and α ∈ GF(q^m), compute the Hamming distance e_{i,α} between y_i and the codeword C_in(α), and then compute the weight w_{i,α} as:

    w_{i,\alpha} \stackrel{\rm def}{=} \max\{(R_1 - e_{i,\alpha}),\ 0\} .    (19)

Note that the computation of all these weights can be done by a straightforward brute-force computation in O(n_0 n_1 q^m) = O(n_1 n_0^{O(1)}) = poly(n) time. Thus all the inner decodings can be performed efficiently in polynomial time.

By Proposition 1 applied to the y_i's, for 1 ≤ i ≤ n_0, we know that the above weights have the crucial combinatorial property

    \sum_{\alpha} w_{i,\alpha}^2 \le \delta n_1^2 ,    (20)

for i = 1, 2, ..., n_0. We will now run the soft decoding algorithm for Reed-Solomon codes from [GS99] for this choice of weights. In the form necessary to us, which is stated for example in [Gur01, Chap. 6], this result states the following for decoding a Reed-Solomon code over GF(Q) of block length n_0: Let ε > 0 be an arbitrary constant. For each i ∈ [n_0] and α ∈ GF(Q), let w_{i,α} be a non-negative rational number. Then, there exists a deterministic algorithm with run-time poly(n_0, Q, 1/ε) that, when given as input the weights w_{i,α} for i ∈ [n_0] and α ∈ GF(Q), finds a list of all polynomials p ∈ GF(Q)[x] of degree at most k that satisfy

    \sum_{i=1}^{n_0} w_{i,p(x_i)} \ge \sqrt{k \sum_{i=1}^{n_0} \sum_{\alpha} w_{i,\alpha}^2} + \epsilon \max_{i,\alpha} w_{i,\alpha} .    (21)

Now, combining the above result with Equation (20) and the definition of the weights from Equation (19), we conclude that we can find, in time polynomial in n and 1/ε, a list of all polynomials p over GF(q^m) of degree at most k_0 for which the condition

    \sum_{i=1}^{n_0} (R_1 - e_{i,p(x_i)}) \ge \sqrt{k_0 n_0 \delta}\, n_1 + \epsilon n_1    (22)

holds. Recalling the definition of R_1 (Equation (18)) and using k_0 = (1 − ∆)n_0, we conclude that we can find a list of all codewords that are at a Hamming distance of at most

    n \left(1 - \frac{1}{q}\right)\left(1 - \sqrt{1 - \frac{q\delta}{q - 1}}\right) - n\sqrt{\delta(1 - \Delta)} - \epsilon n_1

from y. Picking ε < 1/n_1, we get decoding up to the claimed fraction of errors. □

4.1. Comment on the error-correction performance of Theorem 3

The bound of (17) is attractive only for very large values of ∆, or in other words when the rate of the outer Reed-Solomon code is rather small. For example, for the binary case q = 2, even for ∆ = 3/4, the bound does not even achieve the product bound (namely, ∆δ/2) for any value of δ in the range 0 < δ < 1/2 (in fact, the bound as stated in (17) is negative unless ∆ is quite large). However, the merit of the bound is that as ∆ gets very close to 1, the bound (17) approaches the quantity (1 − 1/q)(1 − √(1 − qδ/(q − 1))), and since the relative designed distance of the concatenated code is ∆·δ → δ, it approaches the so-called Johnson bound on list decoding radius. This bound states that for a q-ary code of relative distance γ and block length n, every Hamming ball of radius at most

    e_J(\gamma, q, n) \stackrel{\rm def}{=} n\left(1 - \frac{1}{q}\right)\left(1 - \sqrt{1 - \frac{\gamma}{1 - 1/q}}\right)    (23)

has at most a polynomial number of codewords (in fact, at most O(nq) codewords; cf. [Gur01, Chap. 3]). Therefore, for ∆ → 1, the result of Theorem 3 performs very well and decodes almost up to the Johnson bound, and hence beyond the product bound, for almost the entire range of inner code distances 0 < δ < (1 − 1/q). In particular, for ∆ → 1 and δ → (1 − 1/q), the bound tends to (1 − 1/q), permitting us to list decode up to close to the maximum possible fraction (1 − 1/q) of errors (this will be expanded upon in Section 5).
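The inner decoding stage of the algorithm above is simple enough to spell out in code. The following Python sketch (ours, not from the paper) uses the [7,4,3] binary Hamming code as a stand-in for C_in, computes the distances e_{i,α} and the weights of Equation (19) for one received block, and checks the guarantee (20); the analysis only uses the distance of C_in, so any linear code would do.

```python
from itertools import product
from math import sqrt

# Toy stand-in for the inner code C_in: the [7,4,3] binary Hamming code.
G = [[1,0,0,0,0,1,1],
     [0,1,0,0,1,0,1],
     [0,0,1,0,1,1,0],
     [0,0,0,1,1,1,1]]

def encode(m):
    return tuple(sum(a * b for a, b in zip(m, col)) % 2 for col in zip(*G))

inner = {m: encode(m) for m in product((0, 1), repeat=4)}

n1, q, d = 7, 2, 3
delta = d / n1
# Johnson radius of the inner code, Equation (18)
R1 = n1 * (1 - 1/q) * (1 - sqrt(1 - q * delta / (q - 1)))

def inner_decode(y):
    """Weights w_{i,alpha} of Equation (19) for one received block y."""
    weights = {}
    for alpha, c in inner.items():
        e = sum(a != b for a, b in zip(y, c))   # e_{i,alpha}
        weights[alpha] = max(R1 - e, 0)
    return weights

y = (1, 0, 1, 1, 0, 0, 1)          # one received inner block
w = inner_decode(y)
# the combinatorial property (20), supplied by Proposition 1
assert sum(v * v for v in w.values()) <= delta * n1 ** 2
```

The n_0 weight vectors produced this way are exactly what the soft Reed-Solomon decoder consumes via condition (21).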
4.2. Alternative decoding bound

By slightly modifying the analysis used in proving the combinatorial bound of Proposition 1, one can prove the following alternative bound instead of (2):

    \sum_{c \in C} \left(\max\left\{1 - \frac{\Delta(r, c)}{\tilde{R}},\ 0\right\}\right)^2 \le \frac{q}{q - 1} ,    (24)

where we use the same notation as in the statement of Proposition 1 and R̃ is defined as

    \tilde{R} \stackrel{\rm def}{=} \left(1 - \sqrt{1 - \frac{q\delta}{q - 1}}\right)^2 \left(1 - \frac{1}{q}\right) n .

(The only change required in the proof is to replace the lower bound on W_i from Equation (14) with the alternative lower bound W_i ≥ (1 − e_i/R̃)·√(n(q − 1)/q), which follows easily from the definition of W_i in Equation (13).)

Now, replacing the choice of weights in Equation (19) in the proof of Theorem 3 by

    w_{i,\alpha} \stackrel{\rm def}{=} \max\left\{1 - \frac{e_{i,\alpha}}{\tilde{R}},\ 0\right\} ,

and then using (24), we obtain a decoding algorithm to decode up to a fraction

    \left(1 - \frac{1}{q}\right)\left(1 - \sqrt{1 - \frac{q\delta}{q - 1}}\right)^2 \left(1 - \sqrt{\frac{1 - \Delta}{1 - 1/q}}\right)    (25)

of errors. This bound is positive whenever ∆ > 1/q, and in general appears incomparable to that of (17). However, note that even for ∆ very close to 1, the bound (25) does not approach the Johnson bound, except for δ very close to (1 − 1/q). But as with the bound (17), for ∆ → 1 and δ → (1 − 1/q), the above tends to a fraction (1 − 1/q) of errors. In particular, it can also be used, instead of (17), to obtain the results outlined in the next section for highly list decodable codes.

5. Consequence for highly list decodable codes

We now apply Theorem 3 with a suitable choice of parameters to obtain an alternative construction of codes list decodable up to a fraction (1 − 1/q − ε) of errors and which have rate Ω(ε^6). Compared to the construction of [GS00] that was based on a concatenation of AG-codes with Hadamard codes, the rate is slightly worse, namely by a factor of O(log(1/ε)). But the following construction offers several advantages compared to the AG+Hadamard based construction in [GS00]. Firstly, it is based on outer Reed-Solomon codes, and hence does not suffer from the high construction and decoding complexity of AG-codes. In particular, the claim of polynomial time decoding is unconditional and does not depend on having access to precomputed advice information about the outer code, which is necessary for decoding AG-codes; see [GS01]. Secondly, the inner code can be any linear code of large minimum distance, and not necessarily the Hadamard code. In fact, picking a random code as inner code will give a highly efficient probabilistic construction of the code that has the desired list decodability properties with high probability.

For binary linear codes, a construction of codes list decodable from a fraction (1/2 − ε) of errors and having rate Ω(ε^4) is known [GHSZ00]. Even with this substantially better rate, the result of [GHSZ00] does not strictly subsume the result proved in this section. This is for two reasons. Firstly, the stated result from [GHSZ00] applies only to binary linear codes, whereas the result below applies to linear codes over any finite field F_q. Secondly, while the deterministic construction complexities of both the construction in this section and the one with rate Ω(ε^4) are almost similar (both of them being fairly high), the codes of this section have very efficient probabilistic constructions, whereas we do not know a faster probabilistic construction for the rate Ω(ε^4) codes of [GHSZ00].
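To get a numerical feel for how the two decoding radii compare, the following small script (our illustration) evaluates the error fractions of bounds (17) and (25), with the block length n factored out, for a few settings of the outer distance ∆ at a fixed inner distance δ.

```python
from math import sqrt

# Error fractions decoded by bound (17) and the alternative bound (25),
# as functions of q and the outer/inner relative distances Delta, delta.

def bound17(q, Delta, delta):
    return (1 - 1/q) * (1 - sqrt(1 - q * delta / (q - 1))) \
           - sqrt(delta * (1 - Delta))

def bound25(q, Delta, delta):
    return (1 - 1/q) * (1 - sqrt(1 - q * delta / (q - 1))) ** 2 \
           * (1 - sqrt((1 - Delta) / (1 - 1/q)))

q, delta = 2, 0.45
for Delta in (0.6, 0.9, 0.99):
    print(Delta, round(bound17(q, Delta, delta), 4),
                 round(bound25(q, Delta, delta), 4))
```

Consistent with the discussion above, the two bounds are incomparable: for moderate ∆ (e.g. ∆ = 0.6 here) bound (17) is negative while (25) is still positive, whereas for ∆ close to 1 bound (17) pulls ahead toward the Johnson radius.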
Theorem 4 For every fixed prime power q, the following holds: For every small enough ε > 0, there exists a family of linear codes over F_q with the following properties:

(i) A description of a code of block length, say n, in the family can be constructed deterministically in n^{O(1/ε^4)} time. For probabilistic constructions, a Las Vegas construction can be obtained in time which with high probability will be O(n log^2 n/ε^4), or a Monte Carlo construction that has the claimed properties with high probability can be obtained in O(log n/ε^4) time.

(ii) Its rate is Ω(ε^6) and its relative minimum distance is (1 − 1/q − O(ε^2)).

(iii) There is a polynomial time list decoding algorithm for every code in the family to perform list decoding up to a fraction (1 − 1/q − ε) of errors.

Proof: We will use Theorem 3 with the choice of parameters ∆ = 1 − O(ε^2) and δ = 1 − 1/q − O(ε^2). Substituting in the bound (17), the fraction of errors corrected by the decoding algorithm from Section 4 will be (1 − 1/q − ε), which handles Property (iii) claimed above. Also, the relative distance of the code is at least ∆·δ, and is thus (1 − 1/q − O(ε^2)), verifying the distance claim in (ii) above. The outer Reed-Solomon code has rate 1 − ∆ = Ω(ε^2). For the inner code, if we pick a random linear code, then it will meet the Gilbert-Varshamov bound (R = 1 − H_q(δ)) with high probability (cf. [vL99, Chapter 5]). Therefore, a random inner code of rate Ω(ε^4) will have relative distance δ = 1 − 1/q − O(ε^2), exactly as we desire. The overall rate of the concatenated code is just the product of the rates of the Reed-Solomon code and the inner code, and is thus Ω(ε^2 · ε^4) = Ω(ε^6), proving Property (ii).

We now turn to Property (i) about the complexity of constructing the code. We may pick the outer Reed-Solomon code over a field of size at most O(n). Hence, the inner code has at most O(n) codewords and thus dimension at most O(log_q n). The inner code can be specified by its O(log_q n) × O(log_q n/ε^4) generator matrix G. To construct an inner code that has relative distance (1 − 1/q − O(ε^2)), we can pick such a generator matrix G at random, and then check, by a brute-force search over the at most O(n) codewords, that the code has the desired distance. Since the distance property holds with high probability, we conclude that the generator matrix of an inner code with the required rate and distance property can be found in O(n log^2 n/ε^4) time with high probability. Allowing for a small probability of error, a Monte Carlo construction can be obtained in O(log^2 n/ε^4) probabilistic time by picking a random linear code as inner code (the claimed distance and list decodability properties (ii), (iii) will then hold with high probability). As the outer Reed-Solomon code is explicitly specified, this implies that the description of the concatenated code can be found within the same time bound.

A naive derandomization of the above procedure will require time which is quasi-polynomial in n. But the construction time can be made polynomial by reducing the size of the sample space from which the inner code is picked. For this, we note that, for every prime power q, there is a small sample space of q-ary linear codes of any desired rate, called a "Wozencraft ensemble" in the literature, with the properties that: (a) a random code can be drawn from this family using a linear (in the block length) number of random elements from F_q, and (b) such a code will meet the Gilbert-Varshamov bound with high probability. We record this fact as Proposition 6 at the end of this section. Applying Proposition 6 for the choice of parameters b = O(ε^{-4}), k = O(log_q n), and using the fact that for small γ, H_q^{-1}(1 − O(γ^2)) is approximately (1 − 1/q − O(γ)), we obtain a sample space of linear codes of size q^{O(log_q n/ε^4)} = n^{O(1/ε^4)} which includes a code of rate Ω(ε^4) and relative distance (1 − 1/q − O(ε^2)). One can simply perform a brute-force search for the desired code in such a sample space. Thus one can find an inner code of rate Ω(ε^4) and relative distance (1 − 1/q − O(ε^2)) deterministically in n^{O(1/ε^4)} time. Moreover, picking a random code from this sample space, which works just as well as picking a general random linear code, takes only O(log n/ε^4) time. This reduces the probabilistic construction times claimed earlier by a factor of log n. Hence a description of the overall concatenated code can be obtained within the claimed time bounds. This completes the verification of Property (i) as well. □

Obtaining an explicit construction: The high deterministic construction complexity or the probabilistic nature of construction in Theorem 4 can be removed at the expense of a slight worsening of the rate of the code. One can pick for inner code an explicitly specified q-ary code of relative distance (1 − 1/q − O(ε^2)) and rate Ω(ε^6). A fairly simple explicit construction of such codes is known [ABN+92] (see also [She93]). This will give an explicit construction of the overall concatenated code with rate Ω(ε^8). We record this below.

Theorem 5 For every fixed prime power q, the following holds: For every ε > 0, there exists a family of explicitly specified linear codes over F_q with the following properties:
One promising We now turn to the result about a small space of linear route to achieving this would be to prove that inner codes codes meeting the Gilbert-Varshamov bound. Such an en- with the property required in the construction of [GHSZ00] semble of codes is referred to as a “Wozencraft ensemble” (which achieves a rate of Ω(ε4) for the binary case) exist in in the literature. Recall that we made use of such a result in “abundance” for every alphabet size. This will imply that a the proof of Theorem 4. The proof of the following result is random code picked from an appropriate ensemble can be implicit in [Wel73]. used as the inner code, thereby yielding a fast probabilistic construction with the same properties as in [GHSZ00]. Proposition 6 For every prime power q, and every integer b ≥ 1, the following holds. For all large enough k, there def References exists a sample space, denoted Sq(b, n) where n = (b + 1)k, consisting of [n, k] linear codes of rate 1/(b+1) such q + that: [ABN 92] Noga Alon, Jehoshua Bruck, Joseph Naor, Moni Naor, and Ronny Roth. Construction of asymp- (i) There are at most qbn/(b+1) codes in S (b, n). In par- q totically good low-rate error-correcting codes through ticular, one can pick a code at random from S (b, n) q pseudo-random graphs. IEEE Transactions on Infor- using at most O(n log q) random bits. mation Theory, 38:509–516, 1992. (ii) A random code drawn from Sq(b, n) meets the Gilbert- Varshamov bound, i.e. has minimum distance n · [Dum98] Ilya I. Dumer. Concatenated codes and their mul- −1 b  tilevel generalizations. In V. S. Pless and W. C. Huff- Hq b+1 − o(1) , with overwhelming (i.e. 1 − o(1)) probability. man, editors, Handbook of Coding Theory, volume 2, pages 1911–1988. North Holland, 1998. 6. Concluding Remarks [For66a] G. David Forney. Concatenated Codes. MIT Press, Cambridge, MA, 1966. The Johnson bound on list decoding radius (defined in Equation (23)) gives a lower bound on the number of errors [For66b] G. David Forney. 
Generalized Minimum Dis- one can hope to correct from with small lists, in any q-ary tance decoding. IEEE Transactions on Information code of a certain relative distance. Accordingly, for a q-ary Theory, 12:125–131, 1966. concatenated code whose outer (say, Reed-Solomon) and inner codes have relative distance at least ∆ and δ respec- [Gur01] Venkatesan Guruswami. List Decoding of Error- tively, a natural target would be to decode up to a fraction Correcting Codes. Ph.D thesis, Massachusetts Insti- q ∆δ  tute of Technology, August 2001. (1 − 1/q) 1 − 1 − 1−1/q of errors. In this paper, we presented an algorithm whose decoding performance ap- [GHSZ00] Venkatesan Guruswami, Johan Hastad,˚ Madhu proached this bound for the case when the outer distance Sudan, and David Zuckerman. Combinatorial bounds ∆ was very close to 1. This is ideally suited for the situ- for list decoding. Proceedings of the 38th Annual ation when we wish to tolerate a very large fraction of er- Allerton Conference on Communication, Control and rors, as in such a case one has to use an outer codes with Computing, pages 603–612, October 2000. large relative distance. However, it is an extremely interest- [GS99] Venkatesan Guruswami and Madhu Sudan. Im- ing question to decode up to the Johnson bound for every proved decoding of Reed-Solomon and algebraic- choice of outer and inner distances ∆, δ. A good first tar- geometric codes. IEEE Transactions on Information get, one that already appears quite challenging, would be to Theory, 45:1757–1767, 1999. decode beyond the “product bound”, namely beyond a frac- tion ∆δ/2 of errors, for every choice of ∆, δ. Decoding up [GS00] Venkatesan Guruswami and Madhu Sudan. List to the product bound will result in a unique codeword and decoding algorithms for certain concatenated codes. 
can be accomplished in polynomial time using Generalized Proceedings of the 32nd Annual ACM Symposium on Minimum Distance (GMD) decoding for most interesting Theory of Computing, pages 181–190, 2000. [GS01] Venkatesan Guruswami and Madhu Sudan. On [vL99] J. H. van Lint. Introduction to Coding Theory. representations of algebraic-geometric codes. IEEE Graduate Texts in Mathematics 86, (Third Edition) Transactions on Information Theory, 47(4):1610– Springer-Verlag, Berlin, 1999. 1613, May 2001. [Wel73] Edward J. Weldon, Jr. Justesen’s construction — [KV01] Ralf Koetter and Alexander Vardy. Decoding the low-rate case. IEEE Transactions on Information of Reed-Solomon codes for additive cost functions. Theory, 19:711–713, 1973. Manuscript, October 2001.

[MU01] Elchanan Mossel and Chris Umans. On the complexity of approximating the VC dimension. Proceedings of the 16th IEEE Conference on Computational Complexity, pages 220–225, 2001.

[Nie00] Rasmus R. Nielsen. Decoding concatenated codes using Sudan's algorithm. Manuscript submitted for publication, May 2000.

[SU01] Ronen Shaltiel and Christopher Umans. Simple extractors for all min-entropies and a new pseudo-random generator. Proceedings of the 42nd Annual Symposium on Foundations of Computer Science, pages 648–657, October 2001.

[She93] Ba-Zhong Shen. A Justesen construction of binary concatenated codes that asymptotically meet the Zyablov bound for low rate. IEEE Transactions on Information Theory, 39:239–242, 1993.

[SW99] M. Amin Shokrollahi and Hal Wasserman. List decoding of algebraic-geometric codes. IEEE Transactions on Information Theory, 45(2):432–437, 1999.

[SS96] Michael Sipser and Daniel Spielman. Expander codes. IEEE Transactions on Information Theory, 42(6):1710–1722, 1996.

[Sud97] Madhu Sudan. Decoding of Reed-Solomon codes beyond the error-correction bound. Journal of Complexity, 13(1):180–193, 1997.

[Sud00] Madhu Sudan. List decoding: Algorithms and applications. SIGACT News, 31:16–27, 2000.

[STV01] Madhu Sudan, Luca Trevisan, and Salil Vadhan. Pseudorandom generators without the XOR lemma. Journal of Computer and System Sciences, 62(2):236–266, March 2001.

[TZ01] Amnon Ta-Shma and David Zuckerman. Extractor codes. Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pages 193–199, July 2001.

[Uma02] Chris Umans. Pseudo-random generators for all hardnesses. Proceedings of the 34th ACM Symposium on Theory of Computing, to appear, May 2002.

A. Proof of a geometric fact

Proof of Lemma 2: Note that if ⟨v_i, v_j⟩ = 0 for every i ≠ j, then the v_i's form a linearly independent set of pairwise orthogonal unit vectors. They may thus be extended to an orthonormal basis. The bound (16) then holds since the sum of squares of the projections of a vector onto the vectors of an orthonormal basis equals the square of its norm, and hence the sum of squares restricted to the v_i's cannot be larger than ⟨x, x⟩. We need to show that the bound holds even if the v_i's are more than 90 degrees apart.

Firstly, we can assume ⟨x, v_i⟩ > 0 for i = 1, 2, ..., M. This is because if ⟨x, v_i⟩ = 0, then it does not contribute to the left hand side of Equation (16) and may therefore be discarded. In particular, this implies that we may assume v_i ≠ −v_j for any 1 ≤ i, j ≤ M. Since the v_i's are distinct unit vectors, this means that |⟨v_i, v_j⟩| < 1 for all i ≠ j.

We will prove the claimed bound (16) by induction on M. When M = 1 the result is obvious. For M > 1, we will project the vectors v_1, ..., v_{M−1}, and also x, onto the space orthogonal to v_M. We will then apply the induction hypothesis to the projected vectors and conclude our final bound using the analog of (16) for the set of projected vectors. The formal details follow.

For 1 ≤ i ≤ M − 1, define v_i' = v_i − ⟨v_i, v_M⟩ v_M. Since v_i is different from v_M and −v_M, each v_i' is a non-zero vector. Let u_i be the unit vector in the direction of v_i'. Let us also define x' = x − ⟨x, v_M⟩ v_M. We wish to apply the induction hypothesis to the vectors u_1, ..., u_{M−1} and x'.

Now, for 1 ≤ i < j ≤ M − 1, we have ⟨v_i', v_j'⟩ = ⟨v_i, v_j⟩ − ⟨v_i, v_M⟩⟨v_j, v_M⟩ ≤ ⟨v_i, v_j⟩ ≤ 0, since all pairwise dot products between the v_i's are non-positive. Hence the pairwise dot products ⟨u_i, u_j⟩, 1 ≤ i < j ≤ M − 1, are all non-positive. To apply the induction hypothesis we should also verify that ⟨x', u_i⟩ > 0 for i = 1, 2, ..., M − 1. It will be enough to verify that ⟨x', v_i'⟩ > 0 for each i. But this is easy to check, since

    ⟨x', v_i'⟩ = ⟨x, v_i⟩ − ⟨x, v_M⟩ · ⟨v_i, v_M⟩ ≥ ⟨x, v_i⟩ > 0,    (26)

where the inequality (26) follows since ⟨x, v_M⟩ > 0 and ⟨v_i, v_M⟩ ≤ 0.
We can therefore apply the induction hypothesis to the (M − 1) unit vectors u_1, u_2, ..., u_{M−1} and the vector x'. This gives

    Σ_{i=1}^{M−1} ⟨x', u_i⟩² ≤ ⟨x', x'⟩.    (27)

Now, ||v_i'||² = ⟨v_i', v_i'⟩ = ⟨v_i, v_i⟩ − ⟨v_i, v_M⟩² ≤ ||v_i||² = 1 = ||u_i||². This implies that ⟨x', v_i'⟩ ≤ ⟨x', u_i⟩ for 1 ≤ i ≤ M − 1. Also, by (26), ⟨x', v_i'⟩ ≥ ⟨x, v_i⟩, and therefore

    ⟨x, v_i⟩ ≤ ⟨x', u_i⟩    (28)

for i = 1, 2, ..., M − 1. Also, we have

    ⟨x', x'⟩ = ⟨x, x⟩ − ⟨x, v_M⟩².    (29)

The claimed result now follows by using (28) and (29) together with the inequality (27). □
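The geometric fact proved above can be sanity-checked numerically. The Python sketch below (an illustration added for this presentation, not part of the original argument) uses the vertices of a regular simplex, a standard family of M unit vectors in R^M whose pairwise inner products all equal −1/(M−1) < 0, and verifies on random vectors x that the sum of the squares of the positive inner products ⟨x, v_i⟩ never exceeds ⟨x, x⟩:

```python
import math
import random


def simplex_vectors(m):
    """m unit vectors in R^m with pairwise inner product -1/(m-1) < 0.

    Construction: take the standard basis vectors e_1, ..., e_m,
    subtract their centroid, and normalize each result.
    """
    vecs = []
    for i in range(m):
        v = [-1.0 / m] * m
        v[i] += 1.0
        norm = math.sqrt(sum(c * c for c in v))
        vecs.append([c / norm for c in v])
    return vecs


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


random.seed(1)
for m in (3, 5, 10, 25):
    vs = simplex_vectors(m)
    # The lemma's hypothesis: all pairwise inner products are non-positive.
    assert all(dot(vs[i], vs[j]) <= 1e-9 for i in range(m) for j in range(i))
    for _ in range(200):
        x = [random.gauss(0, 1) for _ in range(m)]
        # Only vectors with positive inner product contribute to the bound.
        lhs = sum(dot(x, v) ** 2 for v in vs if dot(x, v) > 0)
        assert lhs <= dot(x, x) + 1e-9
print("Lemma 2 bound verified on all random trials")
```

The simplex family is the extreme case for the hypothesis (maximally many mutually obtuse unit vectors); any other family with pairwise non-positive inner products could be substituted in `simplex_vectors` without changing the check.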