15-853: Algorithms in the Real World

Error Correcting Codes I – Overview, Hamming Codes, Linear Codes
Error Correcting Codes II – Reed-Solomon Codes, Concatenated Codes
Error Correcting Codes III – LDPC/Expander Codes
Error Correcting Codes IV – Decoding RS, Number Theory

Block Codes

message (m) → coder → codeword (c) → noisy channel → codeword' (c') → decoder → message or error

Each message and codeword is of fixed size.
  Σ = codeword alphabet
  k = |m|    n = |c|    q = |Σ|
  C ⊆ Σ^n  (the set of codewords)
  Δ(x,y) = number of positions i s.t. x_i ≠ y_i
  d = min{ Δ(x,y) : x,y ∈ C, x ≠ y }  (minimum distance)

A code with these parameters is described as: (n, k, d)_q

15-853, pages 1-2
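As a small sketch (not from the slides), the definitions above can be checked directly on a toy code; the function names and the length-3 repetition code are my own illustration.

```python
# Toy example: computing the parameters (n, k, d)_q of a small block code
# given explicitly as a set of codewords.

from itertools import combinations

def hamming_distance(x, y):
    """Delta(x, y) = number of positions where x and y differ."""
    assert len(x) == len(y)
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

def min_distance(code):
    """d = min Delta(x, y) over distinct codewords x, y in C."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

# The binary repetition code of length 3: a (3, 1, 3)_2 code.
code = ["000", "111"]
n = len(code[0])          # block length |c|
k = 1                     # one message bit (two codewords)
d = min_distance(code)    # minimum distance
print((n, k, d))          # (3, 1, 3)
```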

Linear Codes

If Σ is a field, then Σ^n is a vector space.

Definition: C is a linear code if it is a linear subspace of Σ^n of dimension k.

This means that there is a set of k independent vectors v_i ∈ Σ^n (1 ≤ i ≤ k) that span the subspace, i.e., every codeword can be written as:
  c = a_1 v_1 + a_2 v_2 + … + a_k v_k,  where a_i ∈ Σ

"Linear": the sum of two codewords is a codeword.
Minimum distance = weight of the least-weight nonzero codeword.

Generator and Parity Check Matrices

Generator Matrix:
  A k × n matrix G such that C = { xG | x ∈ Σ^k }.
  Made by stacking the spanning vectors.

Parity Check Matrix:
  An (n−k) × n matrix H such that C = { y ∈ Σ^n | Hy^T = 0 }.
  (Codewords are the null space of H.)

These always exist for linear codes.

  mesg (1 × k)  ×  G (k × n)  =  codeword (1 × n)
  H ((n−k) × n)  ×  recv'd word (n × 1)  =  syndrome ((n−k) × 1)

If syndrome = 0, the received word is a codeword; else we have to use the syndrome to get back the codeword.

Relationship of G and H

For linear codes, if G is in standard form [I_k A], then H = [A^T I_{n−k}] (over GF(2); in general, H = [−A^T I_{n−k}]).

Example of a (7,4,3)_2 code:

  G = [ 1 0 0 0 1 1 1 ]        H = [ 1 1 1 0 1 0 0 ]
      [ 0 1 0 0 1 1 0 ]            [ 1 1 0 1 0 1 0 ]
      [ 0 0 1 0 1 0 1 ]            [ 1 0 1 1 0 0 1 ]
      [ 0 0 0 1 0 1 1 ]
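A sketch (my own, not from the slides) checking the (7,4,3)_2 example above: codewords xG lie in the null space of H, and a single bit flip produces a syndrome equal to the corresponding column of H, which identifies the error position.

```python
# Verify G = [I_4 | A], H = [A^T | I_3] for the (7,4,3)_2 Hamming code,
# and demonstrate syndrome decoding of a single error, all over GF(2).

G = [[1,0,0,0,1,1,1],
     [0,1,0,0,1,1,0],
     [0,0,1,0,1,0,1],
     [0,0,0,1,0,1,1]]

H = [[1,1,1,0,1,0,0],
     [1,1,0,1,0,1,0],
     [1,0,1,1,0,0,1]]

def encode(mesg):
    """codeword = mesg * G over GF(2)."""
    return [sum(m * g for m, g in zip(mesg, col)) % 2 for col in zip(*G)]

def syndrome(word):
    """H * word^T over GF(2); zero iff word is a codeword."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]

c = encode([1, 0, 1, 1])
assert syndrome(c) == [0, 0, 0]   # codewords are in the null space of H

# Flip one bit: the syndrome equals that position's column of H.
r = list(c)
r[5] ^= 1
s = syndrome(r)
cols = list(zip(*H))
print(s, cols.index(tuple(s)))    # [0, 1, 0] 5 -> error at position 5
```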

Two Codes

Hamming codes are binary (2^r − 1, 2^r − 1 − r, 3) codes. Basically (n, n − log n, 3).
Hadamard codes are binary (2^r, r, 2^{r−1}) codes. Basically (n, log n, n/2).

The first has great rate, small distance. The second has poor rate, great distance.
Can we get Ω(n) rate and Ω(n) distance?

Yes. One way is to use a random linear code. Let's see some direct, intuitive ways.

Reed-Solomon Codes

Irving S. Reed and Gustave Solomon
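The random-linear-code route mentioned above can be tried empirically; this is my own small experiment (the parameters n = 16, k = 8 and the seed are arbitrary), not a claim from the slides. By linearity, the minimum distance equals the weight of the least-weight nonzero codeword.

```python
# Sample a random k x n binary generator matrix, enumerate the code,
# and measure its minimum distance via minimum nonzero weight.

import itertools
import random

random.seed(0)
n, k = 16, 8
G = [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]

def encode(m):
    """codeword = m * G over GF(2)."""
    return tuple(sum(mi * gi for mi, gi in zip(m, col)) % 2 for col in zip(*G))

codewords = {encode(m) for m in itertools.product([0, 1], repeat=k)}
nonzero = [c for c in codewords if any(c)]
d = min(sum(c) for c in nonzero)   # min weight = min distance, by linearity
print(len(codewords), d)
```

Random rate-1/2 codes like this typically have distance a constant fraction of n, which is the Ω(n)/Ω(n) behavior the slide asks for.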

Reed-Solomon Codes in the Real World

(204, 188, 17)_256 : ITU J.83(A)
(128, 122, 7)_256 : ITU J.83(B)
(255, 223, 33)_256 : common in practice
– Note that they are all byte-based (i.e., symbols are from GF(2^8)).

Decoding rate on a 1.8 GHz Pentium 4:
– (255, 251) = 89 Mbps
– (255, 223) = 18 Mbps
Dozens of companies sell hardware cores that operate 10x faster (or more):
– (204, 188) = 320 Mbps (Altera decoder)

PDF417, QR code, Aztec code, DataMatrix code: all 2-dimensional Reed-Solomon bar codes. (images: wikipedia)

Applications of Reed-Solomon Codes

• Storage: CDs, DVDs, "hard drives"
• Wireless: cell phones, wireless links
• Satellite and Space: TV, Mars rover, Voyager
• Digital Television: DVD, MPEG-2 layover
• High-Speed Modems: ADSL, DSL, …

Good at handling burst errors. Other codes are better for random errors.
– e.g., Gallager codes, Turbo codes

Viewing Messages as Polynomials

A (n, k, n−k+1)_q code: consider the polynomial of degree k−1
  p(x) = a_{k−1} x^{k−1} + … + a_1 x + a_0
Message: (a_{k−1}, …, a_1, a_0)
Codeword: (p(1), p(2), …, p(n))

We work in a finite field with q^r elements (q a prime, r an integer).
To keep the p(i) fixed size, we use a_i ∈ GF(q^r).
To make the i distinct, n ≤ q^r.

For simplicity, imagine that n = q^r. So we have a (n, k, n−k+1)_n code.
The alphabet size increases with the codeword length. A little awkward. (But we can still do things.)
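The evaluation view above can be made concrete; this is my own sketch, using the small prime field GF(17) for readability (the slides' real-world codes use GF(2^8) instead), with an arbitrary example message.

```python
# Reed-Solomon encoding as polynomial evaluation over GF(17):
# message (a_{k-1}, ..., a_1, a_0) -> codeword (p(1), p(2), ..., p(n)).

P = 17  # field size; evaluation points are 1..n with n < P

def poly_eval(coeffs, x, p=P):
    """Evaluate a_{k-1} x^{k-1} + ... + a_1 x + a_0 at x (Horner, mod p)."""
    acc = 0
    for a in coeffs:          # coeffs = (a_{k-1}, ..., a_1, a_0)
        acc = (acc * x + a) % p
    return acc

def rs_encode(message, n, p=P):
    """Codeword = (p(1), ..., p(n)) for the degree-(k-1) polynomial whose
    coefficients are the k message symbols."""
    return [poly_eval(message, i, p) for i in range(1, n + 1)]

msg = [2, 5, 11]             # k = 3 symbols: p(x) = 2x^2 + 5x + 11
cw = rs_encode(msg, n=7)     # a (7, 3, 5)_17 code: d = n - k + 1 = 5
print(cw)                    # [1, 12, 10, 12, 1, 11, 8]
```

Any two distinct messages give codewords differing in at least d = 5 positions, since two distinct degree-2 polynomials can agree on at most 2 of the 7 points.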

Which field to use

Polynomial-Based Code

A (n, k, 2s+1) code, where n = k + 2s:

  |---- k message symbols ----|---- 2s check symbols ----|

– Can detect 2s errors
– Can correct s errors
– Generally, can correct α erasures and β errors if α + 2β ≤ 2s
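The erasure case (β = 0) of the claim above can be illustrated directly: any k surviving evaluations determine the degree-(k−1) polynomial by Lagrange interpolation, so up to 2s = n − k erasures can be filled in. This sketch is mine, continuing the GF(17), (7, 3, 5)_17 example with an arbitrary erasure pattern.

```python
# Erasure correction for a polynomial-based code via Lagrange interpolation
# over GF(17): rebuild the whole codeword from k = 3 surviving symbols.

P = 17

def interpolate(points, x, p=P):
    """Value at x of the unique degree-(k-1) polynomial through the k given
    (xi, yi) points, over GF(p). Inverses via Fermat: a^(p-2) = a^-1."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

# Codeword of p(x) = 2x^2 + 5x + 11 with 4 of 7 symbols erased;
# only positions 2, 4, 6 survive.
survivors = [(2, 12), (4, 12), (6, 11)]
recovered = [interpolate(survivors, x) for x in range(1, 8)]
print(recovered)   # [1, 12, 10, 12, 1, 11, 8] -- erasures filled in
```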

Polynomials and their degrees

Why is the distance n−k+1?

Direct proof:
1. RS is a linear code: if we add two codewords corresponding to P(x) and Q(x), we get the codeword corresponding to the polynomial P(x) + Q(x).
2. So look at the least-weight nonzero codeword. It is the evaluation of a polynomial of degree ≤ k−1 at n points, so it can be zero on only k−1 points, hence nonzero on at least n − (k−1) points.
3. This means the distance is at least d = n−k+1.

Correcting Errors

Correcting s errors (where 2s+1 = n−k+1):
1. Find k+s symbols that agree on a degree-(k−1) polynomial p(x). These must exist, since originally k+2s symbols agreed and only s are in error.
2. There are no k+s symbols that agree on a wrong degree-(k−1) polynomial p'(x):
   – Any subset of k of those symbols defines p'(x).
   – Since at most s of the k+s symbols are in error, p'(x) agrees with p(x) on at least k points, so p'(x) = p(x).
We correct s errors, so the distance is ≥ 2s+1 = n−k+1.

This suggests a brute-force approach, which is very inefficient. Better algorithms exist (discussed on the board).
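The brute-force idea above can be written out directly; this is my own sketch on the GF(17), (7, 3, 5)_17 example (s = 2), searching over subsets of k+s = 5 received symbols for one consistent with a single degree-(k−1) polynomial.

```python
# Brute-force RS decoding over GF(17): find k+s symbols agreeing on one
# degree-(k-1) polynomial; by the argument above, that polynomial is p(x).

from itertools import combinations

P, n, k, s = 17, 7, 3, 2

def interp_at(points, x, p=P):
    """Lagrange interpolation at x through the given (xi, yi) points."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, p - 2, p)) % p
    return total

def brute_force_decode(received):
    """received: list of (position, symbol) pairs, at most s in error.
    Returns the corrected codeword, or None."""
    for subset in combinations(received, k + s):
        base, rest = subset[:k], subset[k:]
        if all(interp_at(base, x) == y for x, y in rest):
            return [interp_at(base, x) for x in range(1, n + 1)]
    return None

# Codeword of 2x^2 + 5x + 11, with s = 2 symbols corrupted:
cw = [1, 12, 10, 12, 1, 11, 8]
rcvd = list(enumerate(cw, start=1))
rcvd[0] = (1, 3)    # error at position 1
rcvd[3] = (4, 0)    # error at position 4
print(brute_force_decode(rcvd))   # [1, 12, 10, 12, 1, 11, 8]
```

The subset search is exponential in s, which is exactly why the slides call this inefficient.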

The Berlekamp-Welch Decoder

An efficient way to decode Reed-Solomon codes. We may do this in coding lecture #4 this time.

RS and "burst" errors

Let's compare to Hamming codes (which are also optimal).

  code                                       bits    check bits
  RS (255, 253, 3)_256                       2040    16
  Hamming (2^11 − 1, 2^11 − 11 − 1, 3)_2     2047    11

They can both correct 1 error, but not 2 random errors.
– The Hamming code does this with fewer check bits.
However, RS can fix 8 contiguous bit errors in one byte.
– Much better than the lower bound for 8 arbitrary errors, which needs about
    log₂(1 + C(n,1) + … + C(n,8)) ≈ 8 log₂(n − 7) ≈ 88 check bits
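The Berlekamp-Welch decoder mentioned above can be sketched in full; this is my own implementation over the small prime field GF(17) used in the earlier examples (real systems use GF(2^8)). The idea: find an error-locator polynomial E(x) of degree s (monic) and a Q(x) of degree < k+s with Q(x_i) = y_i·E(x_i) at every received point; this is a linear system, and then p(x) = Q(x)/E(x).

```python
# Berlekamp-Welch decoding for the (7, 3, 5)_17 example: s = 2 errors.

P, n, k, s = 17, 7, 3, 2

def solve_mod(A, b, p=P):
    """Gauss-Jordan elimination: solve A x = b over GF(p)."""
    m = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(m):
        piv = next(r for r in range(col, m) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], p - 2, p)
        M[col] = [v * inv % p for v in M[col]]
        for r in range(m):
            if r != col and M[r][col]:
                M[r] = [(vr - M[r][col] * vc) % p
                        for vr, vc in zip(M[r], M[col])]
    return [M[r][m] for r in range(m)]

def poly_div(num, den, p=P):
    """Polynomial quotient num/den over GF(p), lowest-degree coeff first."""
    num, den = num[:], den[:]
    q = [0] * (len(num) - len(den) + 1)
    inv = pow(den[-1], p - 2, p)
    for i in range(len(q) - 1, -1, -1):
        q[i] = num[i + len(den) - 1] * inv % p
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - q[i] * d) % p
    return q   # remainder (left in num) is zero when E divides Q

def berlekamp_welch(points):
    """points: list of (x, y) with at most s errors. Returns the message
    polynomial's coefficients, lowest degree first."""
    # Unknowns: q_0..q_{k+s-1} and e_0..e_{s-1} (E is monic, e_s = 1).
    # Equation per point: sum_j q_j x^j - y * sum_{j<s} e_j x^j = y x^s.
    A, b = [], []
    for x, y in points:
        row = [pow(x, j, P) for j in range(k + s)]
        row += [(-y * pow(x, j, P)) % P for j in range(s)]
        A.append(row)
        b.append(y * pow(x, s, P) % P)
    sol = solve_mod(A, b)
    Q, E = sol[:k + s], sol[k + s:] + [1]
    return poly_div(Q, E)

# Same corrupted word as before: p(x) = 2x^2 + 5x + 11, errors at x = 1, 4.
rcvd = [(1, 3), (2, 12), (3, 10), (4, 0), (5, 1), (6, 11), (7, 8)]
print(berlekamp_welch(rcvd))   # [11, 5, 2], i.e. a_0, a_1, a_2
```

One linear solve replaces the exponential subset search, which is the whole point of the algorithm.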

CONCATENATED CODES

David Forney
One way to achieve linear rate and linear distance. (image: Wikipedia)

Concatenated Codes

Take an RS code, a (n, k, n−k+1)_{q=n} code.
We can encode each alphabet symbol using another code.

Concatenated Codes

Take an RS code, a (n, k, n−k+1)_{q=n} code. We can encode each alphabet symbol of k' = log q = log n bits using another code.

E.g., use a ((k' + log k'), k', 3)_2 Hamming code. Now we can correct one error per alphabet symbol with little rate loss. (Good for sparse periodic errors.)

Or use a (2^{k'}, k', 2^{k'−1})_2 Hadamard code. (Say k = n/2.)
Then we get a (n², (n/2) log n, n²/4)_2 code.
Much better than a plain Hadamard code in rate; the distance is worse only by a factor of 2.
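The inner Hadamard code above is easy to exhibit; this sketch is mine, using r = 3 as an arbitrary small case. A message m ∈ GF(2)^r is encoded as the truth table of the linear map x ↦ ⟨m, x⟩ over all x ∈ {0,1}^r, and any two distinct linear maps differ on exactly half the inputs.

```python
# The (2^r, r, 2^{r-1})_2 Hadamard code: encode r bits as the truth table
# of the inner-product map, then verify the minimum distance.

from itertools import combinations

r = 3

def hadamard_encode(m):
    """Codeword bit at x = parity of the bitwise AND of m and x."""
    return tuple(bin(m & x).count("1") % 2 for x in range(2 ** r))

code = [hadamard_encode(m) for m in range(2 ** r)]
d = min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(code, 2))
print(len(code[0]), r, d)   # 8 3 4 -> a (2^r, r, 2^{r-1})_2 code
```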

Concatenated Codes

Take an RS code, a (n, k, n−k+1)_{q=n} code. We can encode each alphabet symbol of k' = log q = log n bits using another code.

Or, since k' is O(log n), we could choose an inner code that requires exhaustive search to find, but is good. Random linear codes give ((1 + f(δ)) k', k', δk')_2 codes. Composing with RS (with k = n/2), we get a

  ( (1 + f(δ)) n log n,  (n/2) log n,  δ(n/2) log n )_2 code.

This gives constant rate, constant (relative) distance, and a binary alphabet! And poly-time encoding and decoding.