
3. Coding theory

3.1. Basic concepts

In this chapter we will discuss briefly some aspects of error correcting codes. The main problem is that if a message is sent via a noisy channel, then the received message may contain errors which need to be detected and corrected.

Example The lectures are given by a non-native speaker of English. Any mispronunciation due to his accent will be corrected by your brain to the correct English pronunciation.

The idea is to assign to each message a codeword. If not too many errors are introduced during the transmission, then the received word might not be a codeword but will be near to a unique codeword which, hopefully, was the original one.

Example One way of encoding YES and NO is by YES = 1111111 and NO = 0000000. The sequence 0010010 is not a codeword and is likely to mean NO (there were two errors) rather than YES (there were five errors).

Definition A q-ary code C is a given set of sequences of symbols, where each symbol is chosen from a set F of q elements. The set F is called the alphabet. If q = 2, the code will be called binary. We limit ourselves to the situation where F is a field of order q. IFq^n will denote the set of all ordered n-tuples a = a1a2 . . . an with ai ∈ IFq. The elements of IFq^n are called vectors and n is called the length of a. Observe that the set IFq^n has q^n elements. A q-ary code C of length n is a subset of IFq^n.

Definition We say that a code C of length n is s error detecting if changing up to s digits of a codeword does not produce a codeword. We say that a code C of length n is t error correcting if from a given string of length n which differs in at most t places from some codeword one can deduce the codeword.

Definition The Hamming distance between two vectors x and y of IFq^n is the number of places in which they differ. It is denoted by d(x, y).

Example In IF2^5 we have d(00111, 11001) = 4. In IF3^4 we have d(0122, 1220) = 3.

Lemma 3.1.1 The Hamming distance is a distance function, that is, it satisfies the following three conditions:

i.) d(x, y) = 0 if and only if x = y.

ii.) d(x, y) = d(y, x), for all x, y ∈ IFq^n.

iii.) d(x, y) ≤ d(x, z) + d(z, y), for all x, y, z ∈ IFq^n.

Proof. The first two conditions are easy to verify. The third, known as the triangle inequality, is verified as follows. Note that if u, v ∈ IFq^n, then d(u, v) is the minimum number of changes of digits required to change u into v. But we can change x into y by first changing x into z and then z into y. Hence d(x, y) ≤ d(x, z) + d(z, y). 2
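
To make this concrete, here is a small Python sketch (an illustration, not part of the notes); the function name hamming_distance is our own. It reproduces the two example values given above.

```python
# Hamming distance between two words of equal length, given as strings or sequences.
def hamming_distance(x, y):
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(1 for a, b in zip(x, y) if a != b)

# The examples from the text:
assert hamming_distance("00111", "11001") == 4   # in IF_2^5
assert hamming_distance("0122", "1220") == 3     # in IF_3^4
```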

The problem of decoding we consider is as follows. Suppose a codeword x, unknown to us, has been transmitted to us and we have received the vector y. This vector y may have been distorted by noise. It seems reasonable to decode y as that codeword z such that d(z, y) is as small as possible. This is called nearest neighbour decoding.

Definition Given a code C the minimum distance, denoted d(C), is the smallest distance between distinct codewords. That is, d(C) = min{d(x, y) | x, y ∈ C, x ≠ y}.
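
Both d(C) and nearest neighbour decoding are easy to state in code. The following Python sketch is purely illustrative; the helper d and the two function names are our own.

```python
from itertools import combinations

def d(x, y):
    # Hamming distance of two equal-length words.
    return sum(a != b for a, b in zip(x, y))

def minimum_distance(code):
    # d(C): smallest distance between distinct codewords.
    return min(d(x, y) for x, y in combinations(code, 2))

def nearest_decode(code, received):
    # Nearest neighbour decoding: pick a codeword closest to the received word.
    return min(code, key=lambda c: d(c, received))

C = ["1111111", "0000000"]           # YES / NO from the example above
print(minimum_distance(C))           # 7
print(nearest_decode(C, "0010010"))  # '0000000', i.e. NO
```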

Lemma 3.1.2 Let C be a code.

(i) One can detect up to s errors in any codeword if d(C) ≥ s + 1.

(ii) One can correct up to t errors in any codeword if d(C) ≥ 2t + 1.

Proof. Suppose d(C) ≥ s + 1. Suppose a codeword x is transmitted and s or fewer errors are introduced. Then the received vector cannot be a different codeword and so the errors can be detected.
Suppose d(C) ≥ 2t + 1. Suppose a codeword x is transmitted and the vector y is received in which t or fewer errors have occurred, so d(x, y) ≤ t. Suppose z is a codeword with d(z, y) ≤ t. Then d(x, z) ≤ d(x, y) + d(y, z) ≤ 2t < d(C), hence z = x. So x is the nearest codeword to y. 2

Corollary 3.1.3 Let C be a code with minimum distance d. Then up to d − 1 errors can be detected and up to ⌊(d − 1)/2⌋ errors can be corrected.

Proof. We have d ≥ s + 1 if and only if s ≤ d − 1, and d ≥ 2t + 1 if and only if t ≤ (d − 1)/2. 2

Example Let YES = 1111 and NO = 0000. Then one error can be corrected, but the two errors in 1001 cannot be corrected: it is unclear whether the correct word is YES or NO.

Definition An (n, M, d)-code is a code of length n, containing M codewords and having minimum distance d.

A good (n, M, d)-code has small n (for fast transmission), large M (for a large variety of messages) and large d (to correct many errors). These are conflicting aims. One of the main problems is to optimize one parameter given the other two. The usual version of the problem is to find the largest M, given the length n and the minimum distance d. Note that M ≤ q^n.

Example The q-ary repetition code of length n is defined as follows: C = {a1a2 . . . an | a1 = ... = an, a1 ∈ IFq}. It has length n, q codewords and minimum distance n. Note that any q-ary code with minimum distance n can have at most q codewords, as any two codewords must differ in all positions.

Definition Two q-ary codes are called equivalent if one can be obtained from the other by a combination of operations of the following types:

(A) a permutation of the positions of the code;

(B) a permutation of the symbols appearing in a fixed position.

Example The code C = {00100, 00011, 11111, 11000} is equivalent to D = {00000, 01101, 10110, 11011}. Indeed, first permute 0 ↔ 1 in the third position and then interchange positions 2 and 4.

If a code is displayed as an M × n matrix whose rows are the codewords, then the operations of type (A) correspond to a permutation of the columns of the matrix. The operations of type (B) correspond to re-labeling the symbols in a given column. Observe that these operations do not change the Hamming distance between two codewords. Hence they do not change the minimum distance.

Example The codes

    0 1 2         0 0 0
    1 2 0   and   1 1 1
    2 0 1         2 2 2

are equivalent: first apply 0 ↦ 2 ↦ 1 ↦ 0 to the second column and then 0 ↦ 1 ↦ 2 ↦ 0 to the third column.

Lemma 3.1.4 Any q-ary (n, M, d)-code over an alphabet containing 0 is equivalent to a q-ary (n, M, d)-code containing 00 ··· 0.

Proof. Choose a codeword a1a2 ··· an and sequentially apply the operations of type (B) of the form 0 ↔ aj if aj ≠ 0 to the code. 2

We will denote by 0 the all-zero codeword. From now on we will assume that 0 ∈ C.

Definition The weight of a codeword x, denoted by w(x), is the number of non-zero entries. That is, w(x) = d(x, 0).

Lemma 3.1.5 Suppose that C is a binary code of length n. Let x, y ∈ C with w(x) and w(y) even; then d(x, y) is even too.

Proof. Let x = x1x2 ··· xn and y = y1y2 ··· yn. Let A = {i | xi = 1, 1 ≤ i ≤ n} and B = {i | yi = 1, 1 ≤ i ≤ n}. Then |A| = w(x), |B| = w(y) and d(x, y) = |(A ∪ B) \ (A ∩ B)|. Hence d(x, y) = w(x) + w(y) − 2|A ∩ B|, which is even. 2

Theorem 3.1.6 Suppose d is odd. A binary (n, M, d)-code exists if and only if a binary (n + 1, M, d + 1)-code exists.

Proof. Suppose C is a binary (n + 1, M, d + 1)-code, with d odd. Choose two codewords x and y at minimal distance, that is, d(x, y) = d + 1. Choose a position in which x and y differ and delete this position from all codewords. The codewords obtained from x and y are now at distance d and any two codewords will still differ in at least d positions. Hence the resulting code is an (n, M, d)-code.
Suppose D is a binary (n, M, d)-code, with d odd. For a codeword x ∈ D, with x = x1x2 ··· xn, we define x̂ = x1x2 ··· xnxn+1, where xn+1 ≡ w(x) (mod 2). Let D̂ be the code of length n + 1 defined as follows: D̂ = {x̂ | x ∈ D}. Clearly d ≤ d(D̂) ≤ d + 1. Observe that w(x̂) is even for all x̂ ∈ D̂. By the previous lemma the distance between two codewords of D̂ is even, hence d(D̂) is even too. Since d is odd, this forces d(D̂) = d + 1. Hence D̂ is a binary (n + 1, M, d + 1)-code. 2
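
The extension step in the second half of the proof (appending an overall parity digit) can be carried out mechanically. An illustrative Python sketch, using the binary repetition code of length 3 as input; the function names are our own.

```python
from itertools import combinations

def extend_with_parity(code):
    # Append to each binary codeword x a digit x_{n+1} = w(x) mod 2,
    # so that every extended codeword has even weight.
    return [x + str(x.count("1") % 2) for x in code]

def minimum_distance(code):
    return min(sum(a != b for a, b in zip(x, y)) for x, y in combinations(code, 2))

# A (3, 2, 3)-repetition code becomes a (4, 2, 4)-code:
D = ["000", "111"]
D_hat = extend_with_parity(D)
print(D_hat)                    # ['0000', '1111']
print(minimum_distance(D_hat))  # 4
```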

Example Consider the Fano plane, that is, the geometry with 7 points and 7 lines such that any two points are on a unique line and any two lines meet in a unique point. Let A be the incidence matrix of the Fano plane (rows indexed by the points, columns by the lines) and B the matrix obtained from A by swapping 0 ↔ 1. Let C be the code consisting of the rows of A, the rows of B, the all-zero vector and the all-one vector. The codewords are

   1 0 0 0 1 0 1  0 1 1 1 0 1 0    1 1 0 0 0 1 0  0 0 1 1 1 0 1  0 1 1 0 0 0 1  1 0 0 1 1 1 0  0 0 0 0 0 0 0   rows of A 1 0 1 1 0 0 0 and rows of B 0 1 0 0 1 1 1 1 1 1 1 1 1 1    0 1 0 1 1 0 0  1 0 1 0 0 1 1    0 0 1 0 1 1 0  1 1 0 1 0 0 1  0 0 0 1 0 1 1  1 1 1 0 1 0 0

This code has minimum distance 3, hence it is a binary (7, 16, 3)-code. The all-zero vector and the first row of A are at minimal distance and differ in the first entry. Deleting the first entry of all vectors gives a binary (6, 16, 2)-code.
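
For concreteness, the following Python sketch (illustrative only) rebuilds this code from the cyclic shifts of the first row 1000101 of A and their complements, and checks that the result is a (7, 16, 3)-code.

```python
from itertools import combinations

def cyclic_shifts(word):
    # All cyclic shifts of a binary word (as strings).
    return [word[-i:] + word[:-i] for i in range(len(word))]

rows_A = cyclic_shifts("1000101")                        # the rows of A
rows_B = ["".join("1" if c == "0" else "0" for c in r)   # complements: the rows of B
          for r in rows_A]
C = rows_A + rows_B + ["0000000", "1111111"]

dist = lambda x, y: sum(a != b for a, b in zip(x, y))
print(len(set(C)))                                        # 16 codewords
print(min(dist(x, y) for x, y in combinations(C, 2)))     # minimum distance 3
```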

3.2. Linear Codes

In this section V (n, q) denotes the vector space of dimension n over the field IFq, where vectors will be written as row-vectors. Thus

V (n, q) = {(x1, x2, . . . , xn) | x1, x2, . . . xn ∈ IFq}

A vector x = (x1, x2, . . . , xn) will often simply be written as x1x2 ··· xn.

Definition A linear code C is a subspace of V (n, q). That is:

1. (0, . . . , 0) ∈ C;

2. If x, y ∈ C, then x + y ∈ C;

3. If x ∈ C and a ∈ IFq, then ax ∈ C.

If C has dimension k, then the linear code C is called an [n, k]-code. If it has minimum distance d it is also called an [n, k, d]-code. So a q-ary [n, k, d]-code is a q-ary (n, q^k, d)-code. If C is a linear code, then the weight of C, denoted by w(C), is the smallest of the weights of the non-zero codewords. That is,

w(C) = min {w(x) | x ∈ C, x ≠ 0}.

Lemma 3.2.1 Let x, y ∈ V (n, q). Then d(x, y) = w(x − y).

Proof. The vector x − y has a non-zero entry precisely in those places where x and y differ. 2

Lemma 3.2.2 Let C be a linear code. Then d(C) = w(C).

Proof. Let x and y be two codewords with d(x, y) = d(C). Then d(C) = d(x, y) = w(x − y) ≥ w(C), since x − y ∈ C. On the other hand, for some codeword z we have w(C) = w(z) = d(z, 0) ≥ d(C), since 0 ∈ C. Hence d(C) = w(C). 2

Definition A k × n matrix whose rows form a basis of a linear [n, k]-code is called a generator matrix for the code.

Example

The code constructed from the Fano plane is a [7, 4]-code with generator matrix

        [ 1 1 1 1 1 1 1 ]
    G = [ 1 0 0 0 1 0 1 ]
        [ 1 1 0 0 0 1 0 ]
        [ 0 1 1 0 0 0 1 ].

The notion of equivalence between linear codes is slightly different from the one defined before. The second operation on the codewords is more restrictive.

Definition Two linear q-ary codes are called equivalent if one can be obtained from the other by combining operations of the following types:

(A.) permutations of the positions of the code;

(B.) multiplication of the symbols appearing in a fixed position by a non-zero element of IFq.

Theorem 3.2.3 Two k × n matrices generate equivalent linear [n, k]-codes over IFq if one matrix can be obtained from the other by a sequence of operations of the following types:

(R1) Permutation of rows.

(R2) Multiplication of a row by a non-zero scalar.

(R3) Addition of a scalar multiple of one row to another row.

(C1) Permutation of columns.

(C2) Multiplication of a column by a non-zero scalar.

Proof. The row operations (R1), (R2) and (R3) preserve the subspace C and simply replace one basis of C by another. The operations (C1) and (C2) convert the generator matrix to one of an equivalent code. 2

Theorem 3.2.4 Let G be a generator matrix of an [n, k]-code. Then by performing operations of the types (R1), (R2), (R3), (C1) and (C2), G can be transformed into the standard form

[Ik | A], where Ik is the k × k identity matrix and A is a k × (n − k) matrix.

Proof. Note that n ≥ k. Using row operations (R1), (R2) and (R3) we can transform G into reduced echelon form and then use (C1) to move the columns containing the pivots to the left. 2

On the vector space V (n, q) we define an inner-product. Let x, y ∈ V (n, q) with x = x1x2 ··· xn and y = y1y2 ··· yn, then the inner-product of x and y is defined by

x · y = x1y1 + x2y2 + ··· + xnyn. If x · y = 0, then the vectors are called orthogonal. For a matrix A, the transpose will be denoted by A^T.

Definition Let C be an [n, k]-code. The dual code C⊥ is defined as the set of vectors of V (n, q) that are orthogonal to every codeword of C, that is, C⊥ = {x ∈ V (n, q) | x · y = 0 for all y ∈ C}.

Lemma 3.2.5 Suppose C is an [n, k]-code with generator matrix G. Let x ∈ V (n, q). Then x ∈ C⊥ if and only if xG^T = 0. That is, if and only if x is orthogonal to all the rows of G.

Proof. Note that the rows of G generate the code. So a vector is orthogonal to all codewords in C if and only if it is orthogonal to each row of G. 2

Theorem 3.2.6 Suppose C is an [n, k]-code. Then C⊥ is a linear [n, n − k]-code and (C⊥)⊥ = C.

Proof. This can be shown using arguments of linear algebra. 2

Definition Let C be an [n, k]-code. A parity-check matrix H for C is a generator matrix of C⊥. Thus H is an (n − k) × n matrix satisfying GH^T = Ok,n−k, where Ok,n−k is the all-zero matrix. A parity-check matrix H is said to be in standard form if H = [B | In−k]. Since (C⊥)⊥ = C it follows that the code C can also be described via a parity-check matrix, as C = {x ∈ V (n, q) | xH^T = 0}.

Theorem 3.2.7 Let C be an [n, k]-code with generator matrix G = [Ik | A] in standard form. Then a parity-check matrix for C is H = [−A^T | In−k].

Proof. Indeed the matrix H has the right size and [Ik | A]([−A^T | In−k])^T = −A + A = 0. 2
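
Over IF2 the identity GH^T = 0 can be checked numerically. The sketch below is illustrative; the [5, 2]-code used as an example is made up for the occasion, and it uses that −A^T = A^T over IF2.

```python
import numpy as np

def parity_check_from_standard_form(G):
    # G = [I_k | A] over IF_2; return H = [-A^T | I_{n-k}] (over IF_2, -A^T = A^T).
    k, n = G.shape
    A = G[:, k:]
    return np.hstack([A.T % 2, np.eye(n - k, dtype=int)])

# Generator matrix of a [5, 2]-code in standard form (a made-up example).
G = np.array([[1, 0, 1, 1, 0],
              [0, 1, 0, 1, 1]])
H = parity_check_from_standard_form(G)
print(H)
print((G @ H.T) % 2)   # the zero matrix, confirming G H^T = 0
```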

3.2.1. Coding and syndrome decoding of linear codes

Let C be an [n, k]-code over IFq with generator matrix G. We identify messages with vectors of V (k, q).

Definition Suppose H is a parity-check matrix for an [n, k]-code C. Then for any vector y ∈ V (n, q) the row vector S(y) = yH^T, which is of length n − k, is called the syndrome of y. Observe that S(y) = 0 if and only if y ∈ C.

Definition Let z ∈ V (n, q), then the C-coset of z is defined as z + C = {z + x | x ∈ C}.

Lemma 3.2.8 Suppose C is an [n, k]-code. Then

(i) every vector of V (n, q) is in some C-coset.

(ii) every coset contains exactly q^k vectors.

(iii) two cosets are either disjoint or coincide.

Proof. Straightforward. 2

The lemma tells us that there are q^{n−k} different C-cosets in V (n, q). In the following lemma we will see that the syndrome is a way to decide quickly if two vectors give rise to the same coset.

Lemma 3.2.9 Let x, y ∈ V (n, q). Then x + C = y + C if and only if S(x) = S(y).

Proof. We have x + C = y + C if and only if x − y ∈ C. But x − y ∈ C if and only if (x − y)H^T = 0, which holds if and only if xH^T = yH^T. That is, if and only if S(x) = S(y). 2

Definition Let x + C be a coset. Among the vectors in x + C we choose a vector z of minimal weight; if the vector of minimal weight in the coset is not unique, we choose z to be one of them. Thus w(z) ≤ w(y) for all y ∈ x + C. The vector z is called the coset leader of the coset z + C.

Observe that if w(C) = d, then any non-zero vector of C has at least d non-zero entries. So if y is a vector with w(y) < d/2, then any vector of y + C different from y has at least d/2 non-zero entries. Hence y is the coset leader of y + C.

Given an [n, k]-code we first produce a table of coset leaders 0, a1, a2, ..., as, where s = q^{n−k} − 1, together with their syndromes S(0), S(a1), S(a2), ..., S(as). This table is called the syndrome look-up table.

Encoding
A message x = x1x2 ··· xk is encoded by multiplying x on the right with G, that is, the encoded message is xG. Note that this is a vector of length n and is a linear combination of the rows of G, hence a codeword. To obtain the message x from a given codeword y, one needs to solve the system of linear equations xG = y.

In case G is in standard form the encoding gets even simpler. Since G is of the form [Ik | A], with A = (aij), and x = x1x2 ··· xk, we have xG = y1y2 ··· yk yk+1 ··· yn, where yi = xi for 1 ≤ i ≤ k and yk+j = x1a1j + x2a2j + ··· + xkakj for 1 ≤ j ≤ n − k. The digits y1y2 ··· yk are called the message digits and the digits yk+1 ··· yn are the check digits. The check digits are added to the message to give protection against noise. Note that yk+1 ··· yn = xA.

Syndrome decoding
The decoding procedure is as follows:

1. For a received vector y we calculate the syndrome S(y) = yH^T.

2. In the syndrome look-up table locate S(y) and the corresponding coset leader z.

3. Correct y as y − z.

4. Since y − z is a codeword, compute the corresponding message from it.

Suppose a codeword x is sent through a noisy channel and the vector received is y. We define the error vector e to be e = y − x, so that y = x + e. The decoding problem now becomes: given y, find e and subtract it from y to obtain x. Note that e ∈ y + C, for x is a codeword. Hence e ∈ z + C, as y + C = z + C. If the error was small then the weight w(e) is small and so it is likely that e = z and thus y − z = x. In particular, if w(e) < w(C)/2, then e is the vector of smallest weight in e + C, thus e is the coset leader of the coset with syndrome S(e) = S(y). Hence e = z, and so the procedure corrects up to ⌊(w(C) − 1)/2⌋ errors. If G is in standard form, then the message is the first k digits of y − z.
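
Putting the pieces together, the following Python sketch illustrates encoding and syndrome decoding for a small made-up binary [5, 2]-code in standard form; the look-up table of coset leaders is built by brute force over all of V (5, 2). It is an illustration, not part of the notes.

```python
import numpy as np
from itertools import product

# A [5, 2]-code over IF_2 in standard form: G = [I_2 | A], H = [A^T | I_3].
G = np.array([[1, 0, 1, 1, 0],
              [0, 1, 0, 1, 1]])
H = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 1, 0, 0, 1]])
n, k = 5, 2

def syndrome(y):
    return tuple(y @ H.T % 2)

# Syndrome look-up table: for each syndrome, a coset leader (a vector of minimal weight).
table = {}
for v in product([0, 1], repeat=n):
    v = np.array(v)
    s = syndrome(v)
    if s not in table or v.sum() < table[s].sum():
        table[s] = v

def encode(x):
    return x @ G % 2

def decode(y):
    z = table[syndrome(y)]       # coset leader with the same syndrome as y
    codeword = (y - z) % 2       # corrected word
    return codeword[:k]          # message digits (G is in standard form)

x = np.array([1, 1])
y = encode(x)
y[3] ^= 1                        # introduce a single error
print(decode(y))                 # [1 1], the original message
```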

3.2.2. Binary Hamming codes

In this subsection we briefly discuss binary Hamming codes. It should be noted that these codes also exist over any finite field, but we will only study the case where the field is IF2, the field with two elements.

Definition Let r be a positive integer, and let H be the r × (2^r − 1) matrix whose columns are the distinct non-zero vectors of V (r, 2). The code having H as parity-check matrix is called the binary Hamming code and is denoted by Ham(r, 2). Since the columns can be taken in any order, the code Ham(r, 2) is any of a number of equivalent codes.

Example Let r = 2. If we choose H and G in standard form:

    H = [ 1 1 0 ]    and    G = [ 1 1 1 ].
        [ 1 0 1 ]

Hence Ham(2, 2) is just the binary triple repetition code.

Theorem 3.2.10 The binary Hamming code Ham(r, 2), with r ≥ 2, is a [2^r − 1, 2^r − 1 − r]-code and has minimum distance 3, hence is single-error correcting.

Proof. By definition the dual code Ham(r, 2)⊥ is a [2^r − 1, r]-code, hence Ham(r, 2) is a [2^r − 1, 2^r − 1 − r]-code. Since Ham(r, 2) is a linear code, the minimum weight is equal to the minimum distance. We show that there are no codewords of weight 1 or 2, but there are codewords of weight 3. Suppose that x is a codeword of weight 1. Then x = 0 ··· 010 ··· 0, with a 1 in the i-th place. Since xH^T = 0, we have that the i-th column of H is the all-zero vector, a contradiction. Suppose that x is a codeword of weight 2. Then x = 0 ··· 010 ··· 010 ··· 0, with a 1 in the i-th and j-th places. Since xH^T = 0, we have that the sum of the i-th and j-th columns of H is the all-zero vector. Since these are vectors of V (r, 2), this means that the i-th and j-th columns of H are the same, a contradiction. It remains to show that there are codewords of weight 3. We can suppose that the first 3 rows of H^T are 0 ··· 001, 0 ··· 010 and 0 ··· 011, showing that 1110 ··· 0 is a codeword. 2

The code has k = 2^r − r − 1 and n = 2^r − 1, so there are 2^{n−k} = 2^r cosets. The vectors of weight at most one in V (n, 2) are coset leaders; since there are n + 1 = 2^r of them, they are all the coset leaders. The syndrome of the vector ej = 0 ··· 010 ··· 0, with a 1 in place j, is 0 ··· 010 ··· 0 H^T, which is the transpose of the j-th column of H. If the columns are arranged in order of increasing binary numbers (that is, the j-th column of H is the binary representation of j), then the syndrome of the coset leader ej is the binary representation of j. The decoding becomes very nice:

1. For a received vector y we calculate the syndrome S(y) = yH^T.

2. If S(y) = 0, then y is a codeword and no error was made.

3. If S(y) ≠ 0, then S(y) gives the binary representation of the position in which the error was made, and so the error can be corrected.

Example Consider Ham(3, 2) with parity-check matrix

 0 0 0 1 1 1 1  H =  0 1 1 0 0 1 1  , 1 0 1 0 1 0 1

If y = 1101011, then S(y) = 110, which is 6. The error is in place 6 and we correct y to the codeword 1101001.
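
The decoding rule for Ham(3, 2) amounts to reading the syndrome as a binary number. An illustrative Python sketch with the parity-check matrix above; the function name is our own.

```python
import numpy as np

# Columns of H are the numbers 1..7 in binary, most significant bit on top.
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def decode_hamming(y):
    y = np.array(list(y), dtype=int)
    s = y @ H.T % 2
    pos = int("".join(map(str, s)), 2)   # syndrome read as a binary number
    if pos:                              # non-zero syndrome: flip the bit at that position
        y[pos - 1] ^= 1
    return "".join(map(str, y))

print(decode_hamming("1101011"))         # '1101001', the error in place 6 is corrected
```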

3.3. Cyclic codes

Definition A code C is called cyclic if 1.) C is a linear code;

2.) any cyclic shift of a codeword is also a codeword, that is, if a0a1 ··· an−1 is a codeword, then so is an−1a0a1 ··· an−2.

Example The code constructed from the Fano plane is cyclic.

When considering cyclic codes we number the coordinate positions 0, 1, . . . , n − 1. This is because we will identify the codewords with polynomials. Let F = IFq and f(x) = X^n − 1 ∈ F[X]. We will consider the ring Rn = F[X]/(f), a ring of q^n elements. In Rn we have that X^n = 1. A vector a0a1 ··· an−1 ∈ V (n, q) will be identified with the polynomial a0 + a1X + ··· + an−1X^{n−1}. Adding vectors in V (n, q) now corresponds to adding the corresponding polynomials in Rn. The cyclic shift a0a1 ··· an−1 ↦ an−1a0a1 ··· an−2 corresponds to multiplying the polynomial by X.
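
This identification is easy to implement. In the illustrative Python sketch below an element of Rn is stored as its list of coefficients (a0, . . . , an−1), reduction modulo X^n − 1 is just wrapping exponents around, and multiplying by X indeed performs the cyclic shift; the function name poly_mult is our own.

```python
def poly_mult(a, b, n, q=2):
    # Multiply two elements of R_n = IF_q[X]/(X^n - 1), given as coefficient lists
    # (a[i] is the coefficient of X^i); exponents wrap around since X^n = 1.
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % n] = (c[(i + j) % n] + ai * bj) % q
    return c

n = 7
a = [1, 0, 1, 1, 0, 0, 0]       # a_0 a_1 ... a_6 = 1011000, i.e. 1 + X^2 + X^3
X = [0, 1, 0, 0, 0, 0, 0]       # the polynomial X
print(poly_mult(a, X, n))       # [0, 1, 0, 1, 1, 0, 0] = the cyclic shift 0101100
```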

Lemma 3.3.1 A code C in Rn is a cyclic code if and only if C satisfies the following two conditions (i) if a(x), b(x) ∈ C, then a(x) + b(x) ∈ C.

(ii) if a(x) ∈ C and r(x) ∈ Rn, then r(x)a(x) ∈ C.

Proof. Suppose C is a cyclic code in Rn. Then C is linear, so (i) holds. Let a(x) ∈ C. Since multiplication by X corresponds to a cyclic shift, multiplication by X^i corresponds to a cyclic shift over i positions, and since C is cyclic we have that X^i a(x) ∈ C. Since C is linear we also have λX^i a(x) ∈ C, for all λ ∈ IFq. If r(x) = r0 + r1X + ··· + rn−1X^{n−1}, then r(x)a(x) = r0a(x) + r1Xa(x) + ··· + rn−1X^{n−1}a(x). Since each summand is in C, we have that (ii) holds.
Suppose (i) and (ii) hold. Taking r(x) a scalar implies that the code is linear. Taking r(x) = X shows that C is cyclic. 2

There is an easy way to describe and construct cyclic codes. For f(x) ∈ Rn we define

⟨f(x)⟩ = {r(x)f(x) | r(x) ∈ Rn},

the set of all multiples of f(x) in Rn.

Theorem 3.3.2 For any f(x) ∈ Rn, the set ⟨f(x)⟩ is a cyclic code: it is called the cyclic code generated by f(x).

Proof. This follows immediately from the previous lemma. 2

The following theorem shows that all cyclic codes can be constructed that way.

Theorem 3.3.3 Let C be a non-zero cyclic code in Rn. Then

(i) there exists a unique monic polynomial g(x) of smallest degree in C;

(ii) ⟨g(x)⟩ = C;

(iii) in F[X] the polynomial g(x) is a divisor of X^n − 1.

Proof. (i) Suppose g(x) and h(x) are both monic polynomials in C of smallest degree. Then g(x) − h(x) ∈ C. If g(x) ≠ h(x), then the degree of g(x) − h(x) is smaller than that of g(x) and a suitable scalar multiple of g(x) − h(x) is monic and will be in C, a contradiction. Thus g(x) = h(x).
(ii) Suppose a(x) ∈ C. View a(x) as a polynomial in F[X]. Then a(x) = q(x)g(x) + r(x), with deg(r(x)) < deg(g(x)) or r = 0. Since C is cyclic we have q(x)g(x) ∈ C, hence r(x) = a(x) − q(x)g(x) ∈ C, and by the minimality of the degree of g we have r = 0.
(iii) By the division algorithm X^n − 1 = q(x)g(x) + r(x), with deg(r(x)) < deg(g(x)) or r = 0. In Rn we have r(x) = −q(x)g(x), hence r(x) ∈ C. By the minimality of the degree of g(x) we have r(x) = 0.

2

Example The code constructed from the Fano plane is cyclic. The polynomial of minimal degree is 1 + X^2 + X^3, corresponding to the fourth row of A. It is the generator polynomial. The rows of A are obtained by multiplying it with monomials. The all-one vector is obtained by multiplying the polynomial by X^3 + X + 1 (equivalently, it is the sum of the rows of A), and the rows of B are obtained by multiplying with X + 1 to obtain the last row of B and then with monomials to obtain the other rows of B.

Example In IF2[X], we have X^3 − 1 = (X − 1)(X^2 + X + 1). Thus there are 4 cyclic codes in V (3, 2):

    Generator polynomial    Code in R3                        Corresponding code in V (3, 2)
    1                       all of R3                         all of V (3, 2)
    X + 1                   {0, 1 + X, X + X^2, 1 + X^2}      {000, 110, 011, 101}
    X^2 + X + 1             {0, 1 + X + X^2}                  {000, 111}
    X^3 − 1                 {0}                               {000}
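
The table can be reproduced by listing, for each divisor g(x) of X^3 − 1 over IF2, all multiples of g(x) in R3. An illustrative Python sketch (the divisors are entered by hand, as in the text):

```python
from itertools import product

def poly_mult(a, b, n):
    # Multiplication in R_n = IF_2[X]/(X^n - 1), coefficient lists of length n.
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[(i + j) % n] ^= ai & bj
    return tuple(c)

n = 3
divisors = {                      # the divisors of X^3 - 1 over IF_2
    "1":            (1, 0, 0),
    "X + 1":        (1, 1, 0),
    "X^2 + X + 1":  (1, 1, 1),
    "X^3 - 1":      (0, 0, 0),    # zero in R_3, generates the zero code
}
for name, g in divisors.items():
    code = {poly_mult(list(r), list(g), n) for r in product([0, 1], repeat=n)}
    print(name, sorted("".join(map(str, c)) for c in code))
```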

Definition The monic polynomial of least degree is called the generator polynomial of C.

Theorem 3.3.4 Suppose C is a cyclic code with generator polynomial

g(x) = g0 + g1X + ··· + grX^r of degree r. Then g0 ≠ 0, dim(C) = n − r and a generator matrix for C is

    [ g0 g1 g2 ··· gr 0  0  ··· 0  ]
    [ 0  g0 g1 g2 ··· gr 0  ··· 0  ]
    [ 0  0  g0 g1 g2 ··· gr ··· 0  ]
    [ ·                          · ]
    [ 0  0  ··· 0  g0 g1 g2 ··· gr ]

Proof. Suppose g0 = 0; then X^{n−1}g(x) = X^{−1}g(x) is a codeword of degree r − 1, contradicting the minimality of the degree of g(x). The n − r rows of the matrix are linearly independent, as the matrix is in echelon form, and are codewords. It suffices to show that they generate the code. Let a(x) be a codeword; we may assume that deg(a(x)) < n. Since a(x) = g(x)q(x), for some q(x) ∈ Rn, and deg(a(x)) < n, this equality also holds in IFq[X]. It follows that deg(q(x)) < n − r.

Let q(x) = q0 + q1X + ··· + qn−r−1X^{n−r−1}; then a(x) = q0g(x) + q1Xg(x) + ··· + qn−r−1X^{n−r−1}g(x), which gives the desired linear combination. 2
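
The generator matrix of Theorem 3.3.4 consists of g(x) and its first n − r − 1 shifts. An illustrative Python sketch, applied to the Fano code with g(x) = 1 + X^2 + X^3 and n = 7; the function name is our own.

```python
import numpy as np

def cyclic_generator_matrix(g, n):
    # Rows are g(x), X g(x), ..., X^{n-r-1} g(x), written as coefficient vectors.
    r = len(g) - 1                       # degree of g
    rows = []
    for i in range(n - r):
        row = [0] * n
        for j, gj in enumerate(g):
            row[i + j] = gj
        rows.append(row)
    return np.array(rows)

g = [1, 0, 1, 1]                         # 1 + X^2 + X^3
G = cyclic_generator_matrix(g, 7)
print(G)
# [[1 0 1 1 0 0 0]
#  [0 1 0 1 1 0 0]
#  [0 0 1 0 1 1 0]
#  [0 0 0 1 0 1 1]]
```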

Definition Let C be a cyclic [n, k]-code with generator polynomial g(x); then X^n − 1 = g(x)h(x) for some monic polynomial h(x). Since g(x) has degree n − k, h(x) has degree k. This polynomial h(x) is called the check polynomial of C.

Theorem 3.3.5 Suppose C is a cyclic code in Rn with generator polynomial g(x) and check polynomial h(x). Then an element c(x) of Rn is a codeword of C if and only if c(x)h(x) = 0.

Proof. Observe that in Rn we have g(x)h(x) = 0. If c(x) ∈ C, then c(x) = a(x)g(x), for some a(x) ∈ Rn, hence c(x)h(x) = a(x)g(x)h(x) = 0. On the other hand, by the division algorithm, c(x) = q(x)g(x) + r(x) with deg(r(x)) < deg(g(x)) or r(x) = 0. If c(x)h(x) = 0, then r(x)h(x) = 0, so viewed as polynomials in F[X], r(x)h(x) is a multiple of X^n − 1. If r(x) ≠ 0, then deg(r(x)h(x)) < n − k + k = n, contradicting the fact that it is a multiple of X^n − 1. Hence r(x) = 0 and c(x) ∈ ⟨g(x)⟩. 2

Theorem 3.3.6 Suppose C is a cyclic [n, k]-code with check polynomial h(x) = h0 + h1X + ··· + hkX^k. Then

(i) a parity-check matrix for C is

    [ hk hk−1 ··· h0 0  0  ··· 0  ]
    [ 0  hk hk−1 ··· h0 0  ··· 0  ]
    [ ·                        ·  ]
    [ 0  0  ··· 0  hk hk−1 ··· h0 ]

(ii) C⊥ is a cyclic code generated by the polynomial h̄(x) = hk + hk−1X + ··· + h0X^k.

Proof. A polynomial c(x) = c0 + c1X + ··· + cn−1X^{n−1} is a codeword if and only if c(x)h(x) = 0. In particular, in c(x)h(x) = 0 the coefficients in front of X^k, X^{k+1}, ..., X^{n−1} must be zero. Hence c0c1 ··· cn−1 must be orthogonal to hkhk−1 ··· h00 ··· 0 and its cyclic shifts. This shows that the rows of the above matrix are codewords of C⊥. Since hk ≠ 0, these rows are linearly independent. Thus they generate a subspace of C⊥ of dimension n − k. But since the dimension of C⊥ is n − k, these rows must be a basis of C⊥. Hence it is a parity-check matrix.
Observe that if the polynomial h̄(x) is a divisor of X^n − 1, then the generator matrix of ⟨h̄(x)⟩ is the above matrix, which is the generator matrix of C⊥. So h̄(x) is the generator polynomial of C⊥. We need to show that h̄(x) is a divisor of X^n − 1. Note that h̄(x) = X^k h(X^{−1}) and X^{n−k} g(X^{−1}) are polynomials. Moreover (X^k h(X^{−1}))(X^{n−k} g(X^{−1})) = X^n (X^{−n} − 1) = 1 − X^n, showing that h̄(x) is a divisor of X^n − 1. 2

Example In IF2[X] we have that X^23 − 1 = (X − 1)(X^11 + X^10 + X^6 + X^5 + X^4 + X^2 + 1)(X^11 + X^9 + X^7 + X^6 + X^5 + X + 1) is a factorization into irreducible polynomials. The cyclic code generated by X^11 + X^10 + X^6 + X^5 + X^4 + X^2 + 1 is a [23, 12]-code with minimum distance 7. It is known as the binary Golay code.
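
As a closing illustration, these parameters can be verified by brute force: the sketch below builds a generator matrix from the generator polynomial, runs over all 2^12 messages and computes the minimum weight, which equals the minimum distance since the code is linear. It is a sketch only, not part of the notes.

```python
import numpy as np
from itertools import product

n, g_exponents = 23, [0, 2, 4, 5, 6, 10, 11]    # g(x) = 1 + X^2 + X^4 + X^5 + X^6 + X^10 + X^11
g = [1 if i in g_exponents else 0 for i in range(12)]
k = n - (len(g) - 1)                            # k = 23 - 11 = 12

# Generator matrix: g(x) and its first k - 1 shifts.
G = np.array([[0] * i + g + [0] * (n - len(g) - i) for i in range(k)])

# Minimum weight over all non-zero codewords (2^12 - 1 of them).
min_weight = min(
    int((np.array(m) @ G % 2).sum())
    for m in product([0, 1], repeat=k) if any(m)
)
print(k, min_weight)                            # 12 7
```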