
Error-correcting linear codes

Motivation

• CDs get scratched.
• Unreliable or noisy communication channels (Wi-Fi, Ethernet, …).
• Redundancy measures the number of extra bits, $r = n - k$.

Hamming (7,4) – A practical approach

• Given a stream of bits, split it into 4-bit blocks and for each of those form a 7-bit block (codeword), e.g. 01010001011101101001… → 0101 0001 0111 0110 1001 …

• For each block $d_1 d_2 d_3 d_4$, append 3 redundant bits $p_1 p_2 p_3$ given by (addition mod 2):

$$p_1 = d_1 + d_2 + d_4 \qquad p_2 = d_1 + d_3 + d_4 \qquad p_3 = d_2 + d_3 + d_4$$

• The receiver then tries to detect and correct errors with the help of the redundant bits.
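As an illustration of the encoding rule above, here is a minimal Python sketch (the function name `hamming74_encode` is ours, not from the slides); it reproduces the examples on the next slide:

```python
def hamming74_encode(block):
    """Encode a 4-bit block [d1, d2, d3, d4] into a 7-bit Hamming(7,4) codeword."""
    d1, d2, d3, d4 = block
    p1 = (d1 + d2 + d4) % 2   # the three parity equations, arithmetic mod 2
    p2 = (d1 + d3 + d4) % 2
    p3 = (d2 + d3 + d4) % 2
    return [d1, d2, d3, d4, p1, p2, p3]

print(hamming74_encode([0, 1, 0, 0]))  # -> [0, 1, 0, 0, 1, 0, 1], i.e. 0100 101
```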

Hamming (7,4) – Examples

Codewords are written as $d_1 d_2 d_3 d_4 \mid p_1 p_2 p_3$:

• $d = 0100$: $p_1 = 0 + 1 + 0 = 1$, $p_2 = 0 + 0 + 0 = 0$, $p_3 = 1 + 0 + 0 = 1$, giving the codeword 0100 101.
• $d = 1000$: $p_1 = 1 + 0 + 0 = 1$, $p_2 = 1 + 0 + 0 = 1$, $p_3 = 0 + 0 + 0 = 0$, giving the codeword 1000 110. If one bit is flipped and 1000 010 is received, the mismatch with the recomputed parities reveals the error and the word is corrected back to 1000 110.
• If two bits are flipped and 1000 011 is received, the nearest valid codeword is 1100 011, so the word is "corrected" to the wrong message.
• If 0010 100 is received, the parities recomputed from 0010 are $p_1 = 0 + 0 + 0 = 0$, $p_2 = 0 + 1 + 0 = 1$, $p_3 = 0 + 1 + 0 = 1$, which do not match 100; the word is decoded to the nearest codeword, 0011 100.
• If 0110 001 is received, the parities recomputed from 0110 are $p_1 = 0 + 1 + 0 = 1$, $p_2 = 0 + 1 + 0 = 1$, $p_3 = 1 + 1 + 0 = 0$, which do not match 001; the word is decoded to the nearest codeword, 0111 001. Wrong, if the transmitted codeword was 0110 110!

Hamming (7,4) – Remarks

• With 3 extra bits we are able to communicate $2^3 = 8$ "states".
• In the previous examples, we saw that we can't correct 2 errors.
• In fact, Hamming(7,4) is only capable of correcting 1 error per 7 bits.
• Since we have 8 possible ways of communicating extra information, the Hamming(7,4) code is able to detect the no-error state and the remaining 7 states where exactly 1 bit got flipped.
• For each block of 4 bits, we send a total of 7 bits, so the efficiency of this code is 4/7.

Challenges

• How can we ensure that this procedure is correct without explicitly checking all possibilities? Does this set of equations make sense?
• What if we wanted a Hamming($x$, $y$)? What are the consequences?
• How can we bound the number of errors that a code can correct?
• What about correcting more than one error?

Error-correcting (linear) code

• Definition: An error-correcting code of length $n$ over a finite alphabet $\Sigma$ is a subset $\mathcal{C} \subseteq \Sigma^n$. The elements of $\mathcal{C}$ are called the valid codewords. If $|\Sigma| = q$ we say that $\mathcal{C}$ is a $q$-ary code. When $q = 2$, we say that $\mathcal{C}$ is a binary code.
• Definition: If $\Sigma$ is a field and $\mathcal{C} \subseteq \Sigma^n$ is a subspace of $\Sigma^n$, then $\mathcal{C}$ is said to be a linear code. More informally, a linear code is an error-correcting code for which any linear combination of codewords is also a codeword.
• Definition: A linear $[n, k]$-code $\mathcal{C}$ is a $k$-dimensional subspace of $\mathbb{F}_2^n$ ($\mathcal{C} \subseteq \mathbb{F}_2^n$).

• As a subspace of $\mathbb{F}_q^n$, the entire code $\mathcal{C}$ may be represented as the span of a set of $k$ codewords (so the order of $\mathcal{C}$ is $q^k$).
• The generator matrix $G$ is a $k \times n$ matrix whose rows form a basis for a linear code $\mathcal{C}$, i.e., the linear code is spanned by the rows of $G$.
• We can think of $G$ as a linear map $T: \mathbb{F}_q^k \to \mathbb{F}_q^n$ ($n > k$).

• When $G$ has the form $G = [I_k \mid P]$, where $I_k$ denotes the $k \times k$ identity matrix and $P$ is some $k \times (n-k)$ matrix, we say $G$ is in standard form.
• The generator matrix encodes our messages by mapping them into a higher-dimensional space (intuitively, this corresponds to redundancy).

Generator Matrix G – Encoding

• Let $b \in \mathbb{F}_q^k$ be a block message, written as a row vector, and $G = [I_k \mid P]$ a generator matrix.

$$b \cdot G = (b \mid \text{redundant bits})$$
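A minimal numpy sketch of this encoding step. The matrix below is a generator matrix for Hamming(7,4) in standard form, chosen to match the parity equations used earlier (this particular $G$ is our choice, not given on the slides):

```python
import numpy as np

# G = [I_4 | P]; the columns of P implement p1 = d1+d2+d4, p2 = d1+d3+d4, p3 = d2+d3+d4.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

b = np.array([0, 1, 0, 0])   # message block as a row vector
codeword = b @ G % 2         # b.G = (b | redundant bits), arithmetic over F_2
print(codeword)              # [0 1 0 0 1 0 1]
```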

Hamming weight, Hamming distance

• Definition: The Hamming weight $w(x)$ is the number of non-zero values in $x$. In the binary case, it is the number of 1s.
• Definition: For $x \in \mathbb{F}_q^n$ and $y \in \mathbb{F}_q^n$ the Hamming distance is given by $d(x, y) = |\{\, i : x_i \neq y_i \,\}|$, i.e., it is the number of coordinate positions in which they differ.
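A small sketch of both definitions in the binary case, which can be used to check the examples that follow:

```python
def hamming_weight(x):
    """Number of non-zero positions of x (given as a string of symbols)."""
    return sum(1 for s in x if s != "0")

def hamming_distance(x, y):
    """Number of coordinate positions in which x and y differ."""
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

print(hamming_weight("0010"))              # 1
print(hamming_distance("01011", "10110"))  # 4
```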

• $w(0) = 0$; $w(0010) = 1$.
• Let $x = 01011$ and $y = 10110$; then $d(x, y) = 4$.
• Let $y \in \mathbb{F}_2^n$; then $d(0, y) = w(y)$.
• Symmetry: $d(x, y) = d(y, x)$ for all $x, y \in \mathbb{F}_2^n$.

• Positive definiteness: $d(x, y) \ge 0$, and $d(x, y) = 0$ iff $x = y$, for all $x, y \in \mathbb{F}_2^n$.

• Triangle inequality: $d(x, y) \le d(x, z) + d(z, y)$ for all $x, y, z \in \mathbb{F}_2^n$.
  • For length $n = 1$: if $x = y$, then LHS $= 0$ and LHS $\le$ RHS. If $x \neq y$, then LHS is 1 and either $x \neq z$ or $y \neq z$, so RHS is at least 1, hence LHS $\le$ RHS.

• For general $n$ we use the fact that $d(x_i, y_i) \le d(x_i, z_i) + d(z_i, y_i)$:

$$d(x, y) = \sum_{i=1}^{n} d(x_i, y_i) \le \sum_{i=1}^{n} \big( d(x_i, z_i) + d(z_i, y_i) \big) = \sum_{i=1}^{n} d(x_i, z_i) + \sum_{i=1}^{n} d(z_i, y_i) = d(x, z) + d(z, y)$$

Minimum distance d

• The minimum distance $d$ is the smallest Hamming distance over all pairs of distinct valid codewords:

$$d = \min_{\substack{x, y \in \mathcal{C} \\ x \neq y}} d(x, y)$$

• Computing the minimum distance of the code requires calculating $\binom{|\mathcal{C}|}{2} \approx \frac{|\mathcal{C}|^2}{2}$ Hamming distances.

• Remark: Sometimes codes are described as $[n, k, d]$ codes.

Computing d

• Observation: the sum of any two messages is a message.

• Let $m_1$ and $m_2$ be messages to be encoded by $G$, and let $m_3$ be the message $m_3 = m_1 + m_2$. The following is true:

$$(m_1 + m_2) G = m_1 G + m_2 G = m_3 G$$

• Important: We conclude that the sum of 2 valid codewords is a codeword.

Computing d

$$d = \min_{\substack{c_1, c_2 \in \mathcal{C} \\ c_1 \neq c_2}} d(c_1, c_2) = \min_{\substack{c_1, c_2 \in \mathcal{C} \\ c_1 \neq c_2}} d(c_1 - c_1, c_2 - c_1) = \min_{\substack{c_1, c_2 \in \mathcal{C} \\ c_1 \neq c_2}} d(0, c_2 - c_1) = \min_{\substack{c_1, c_2 \in \mathcal{C} \\ c_1 \neq c_2}} d(0, c_2 + c_1) = \min_{\substack{c \in \mathcal{C} \\ c \neq 0}} d(0, c) = \min_{\substack{c \in \mathcal{C} \\ c \neq 0}} w(c)$$

(Here we use that the Hamming distance is unchanged when the same vector is added to both arguments, that $-c_1 = c_1$ over $\mathbb{F}_2$, and that $c_2 + c_1$ ranges over all non-zero codewords.)
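A brute-force sketch of this equivalence for a small binary linear code: it enumerates all codewords from a generator matrix and checks that the pairwise minimum distance equals the minimum non-zero weight (the generator matrix below is the Hamming(7,4) one used earlier, so both values are 3):

```python
from itertools import product
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])   # Hamming(7,4) generator matrix

# All 2^k codewords: every message m in F_2^k, encoded as mG mod 2.
codewords = [tuple(np.array(m) @ G % 2) for m in product([0, 1], repeat=G.shape[0])]

d_pairs = min(sum(a != b for a, b in zip(c1, c2))
              for c1 in codewords for c2 in codewords if c1 != c2)
d_weight = min(sum(c) for c in codewords if any(c))

print(d_pairs, d_weight)   # 3 3
```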

Parity-Check Matrix H

• An $(n-k) \times n$ matrix $H$ representing a linear function $\phi: \mathbb{F}_2^n \to \mathbb{F}_2^{n-k}$ whose kernel is $\mathcal{C}$ is called a check matrix for $\mathcal{C}$ (or a parity-check matrix).

$$H c^T = 0 \iff c \text{ is a valid codeword}$$

• If $\mathcal{C}$ is a code with generator matrix $G$ in standard form, $G = [I_k \mid P]$, then $H = [-P^T \mid I_{n-k}]$ is a check matrix for $\mathcal{C}$.

• It gives us a method of detecting errors when they occur.
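A short sketch of this construction for the Hamming(7,4) generator matrix used earlier: it builds $H = [-P^T \mid I_{n-k}]$ (over $\mathbb{F}_2$ the minus sign can be dropped) and verifies that $G H^T = 0$, i.e. every codeword lies in the kernel of $H$:

```python
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])   # G = [I_k | P] for Hamming(7,4)

k, n = G.shape
P = G[:, k:]                                      # the k x (n-k) redundancy part
H = np.hstack([P.T, np.eye(n - k, dtype=int)])    # H = [-P^T | I_{n-k}]; -P^T = P^T mod 2

print(H)
print((G @ H.T % 2 == 0).all())                   # True: H c^T = 0 for every codeword c
```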

• Remark: $H$ contains all the information about $\mathcal{C}$, just as $G$ does. $H c^T = 0$ iff $c$ is a valid codeword.
• Let $c$ be a valid codeword and $e$ an error vector; then we can write

$$H (c + e)^T = H c^T + H e^T = H e^T$$

• Let $q$ be another valid codeword. If we introduce the same error vector $e$:

$$H (q + e)^T = H q^T + H e^T = H e^T$$

• The result $H e^T$ is called a syndrome.

• Remark: syndromes depend only on the error and never on the message.

Syndrome decoding

• Given any valid codeword $c \in \mathbb{F}_2^n$ and an error vector $e \in \mathbb{F}_2^n$, we saw that $H e^T$ is called the syndrome vector. Let $z = c + e$; then

$$H z^T = H c^T + H e^T = H e^T.$$

• Build a map associating all possible syndromes $H e^T$ to their corresponding errors $e \in \mathbb{F}_2^n$.
• Under the assumption that no more than $t$ errors were made during transmission, the receiver can look up the value $H e^T$ in a table of size $\sum_{i=0}^{t} \binom{n}{i}$.
• After finding $e$ in the table, it is easy to correct $z$, i.e., to find $c$: $c = z - e$.
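A minimal sketch of syndrome-table decoding for Hamming(7,4) under the assumption $t = 1$, so the table only contains the zero error and the 7 single-bit error patterns (the matrix $H$ below is the one derived from the earlier generator matrix):

```python
import numpy as np

H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])   # parity-check matrix for Hamming(7,4)
n = H.shape[1]

# Table: syndrome -> error pattern, for the zero error and all single-bit errors.
table = {tuple(np.zeros(3, dtype=int)): np.zeros(n, dtype=int)}
for i in range(n):
    e = np.zeros(n, dtype=int)
    e[i] = 1
    table[tuple(H @ e % 2)] = e

def decode(z):
    """Correct at most one error in the received word z and return the codeword."""
    syndrome = tuple(H @ z % 2)
    return (z - table[syndrome]) % 2      # c = z - e

z = np.array([0, 1, 0, 0, 0, 0, 1])       # codeword 0100 101 with one bit flipped
print(decode(z))                          # [0 1 0 0 1 0 1]
```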

Standard array

• Standard arrays are used to decode linear codes.
• A standard array for an $[n, k]$-code is a $q^{n-k} \times q^k$ array that lists all $q^n$ elements of $\mathbb{F}_q^n$, where:
  1. The first row lists all codewords (with the 0 codeword on the extreme left).
  2. Each row is a coset, with the coset leader (representative) in the first column.
  3. The entry in the $i$-th row and $j$-th column is the sum of the $i$-th coset leader and the $j$-th codeword.
• Recall Lagrange's Theorem: If $G$ is a finite group and $H \subseteq G$ a subgroup, then $|G| = |H| \, |G/H| = |H| \, |H \backslash G|$.

Example

• Let $\phi: \mathbb{F}_2^2 \to \mathbb{F}_2^4$ be the encoder map for a $[4,2]$ code, given by the generator matrix

$$G = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}$$

• Therefore, $\mathcal{C} = \{0000, 1011, 0101, 1110\}$.
• By Lagrange's Theorem, the standard array will have dimensions $4 \times 4$, since the order of $\mathcal{C}$ is $2^k = 4$ and $n = 4$, so we have $2^n / 2^k = 4$ cosets.

Example

Valid codewords:      0000  1011  0101  1110
Coset of leader 1000: 1000  0011  1101  0110
Coset of leader 0100: 0100  1111  0001  1010
Coset of leader 0010: 0010  1001  0111  1100

Example

• Suppose that the word 0110 was received as a message. How do we ensure that it does not contain errors?
• 0110 is not a valid codeword; it sits in the row of the standard array whose coset leader is 1000, so it is decoded to the codeword $1000 + 0110 = 1110$.

0000  1011  0101  1110
1000  0011  1101  0110   (the received word 0110 lies in this coset)
0100  1111  0001  1010
0010  1001  0111  1100
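A brute-force sketch of decoding with a standard array, assuming minimum-weight coset leaders; it reproduces the correction of 0110 to 1110:

```python
from itertools import product

codewords = ["0000", "1011", "0101", "1110"]   # the [4,2] code from above

def add(a, b):
    """Componentwise addition of two bit strings over F_2."""
    return "".join(str((int(x) + int(y)) % 2) for x, y in zip(a, b))

# Build the standard array: group all 16 words of F_2^4 into cosets of the code,
# picking a minimum-weight word of each coset as its leader.
remaining = {"".join(w) for w in product("01", repeat=4)}
rows = []
while remaining:
    leader = min(remaining, key=lambda w: (w.count("1"), w))
    coset = [add(leader, c) for c in codewords]
    rows.append((leader, coset))
    remaining -= set(coset)

def decode(word):
    """Return word + (its coset leader), i.e. the codeword in the same column."""
    for leader, coset in rows:
        if word in coset:
            return add(word, leader)

print(decode("0110"))   # 1110
```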

Standard array

• The quotient group, with respect to coset addition, satisfies:

$$\mathbb{F}_q^n / \mathcal{C} \cong \mathbb{F}_q^{n-k}$$

• The standard array is the correspondence between the syndromes $s \in \mathbb{F}_q^{n-k}$ and the cosets of $\mathcal{C}$ in $\mathbb{F}_q^n$, induced by this isomorphism.
• All elements of a row yield the same syndrome.
• An $[n, k]$ linear code can correct $q^{n-k}$ error patterns.
• In some systems, this may be infeasible to implement. Why?

Perfect Codes

• Definition: Let $x \in \mathbb{F}_q^n$. The ball $B_r(x)$ with radius $r$ and center $x$ is defined by:

$$B_r(x) = \{\, y \in \mathbb{F}_q^n : d(x, y) \le r \,\}$$

• Definition: The covering radius $CR(\mathcal{C})$ of a code $\mathcal{C}$ is defined as the smallest value of $r$ such that every element of $\mathbb{F}_q^n$ is contained in at least one ball of radius $r$ centered at a codeword of $\mathcal{C}$:

$$CR(\mathcal{C}) = \max \{\, d(x, \mathcal{C}) : x \in \mathbb{F}_q^n \,\}$$

where $d(z, \mathcal{C}) = \min \{\, d(z, c) : c \in \mathcal{C} \,\}$.

Perfect Codes

• Definition: The packing radius of a code $\mathcal{C}$ is defined by:

$$PR(\mathcal{C}) = \max \{\, e \in \{0, 1, \dots, n\} : B_e(b) \cap B_e(c) = \emptyset \ \ \forall\, b, c \in \mathcal{C} \text{ with } b \neq c \,\}$$

• Lemma: A code $\mathcal{C}$ has $CR(\mathcal{C}) \le r$ if and only if the balls with radius $r$ centered at the codewords of $\mathcal{C}$ cover the whole space:

$$\bigcup_{c \in \mathcal{C}} B_r(c) = \mathbb{F}_2^n$$

Perfect Codes

• Definition: A code $\mathcal{C}$ with covering radius $r$ is called perfect if the balls $B_r(c)$ on the left-hand side of the following equality are disjoint:

$$\bigcup_{c \in \mathcal{C}} B_r(c) = \mathbb{F}_q^n.$$

• Remark: The inequality $PR(\mathcal{C}) \le CR(\mathcal{C})$ is always true for any code $\mathcal{C}$.

• Remark: The equality $PR(\mathcal{C}) = \lfloor (d-1)/2 \rfloor$ is always true if $|\mathcal{C}| > 1$.
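A brute-force sketch that computes the covering radius and packing radius of a small binary code and checks whether it is perfect; with the Hamming(7,4) codewords it prints $CR = PR = 1$, consistent with the final remarks below, which list Hamming codes among the perfect binary codes:

```python
from itertools import product
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])   # Hamming(7,4) generator matrix

k, n = G.shape
code = [tuple(np.array(m) @ G % 2) for m in product([0, 1], repeat=k)]

def dist(a, b):
    return sum(x != y for x, y in zip(a, b))

space = list(product([0, 1], repeat=n))

# Covering radius: largest distance from any word of the space to the code.
CR = max(min(dist(x, c) for c in code) for x in space)

# Packing radius: largest e such that radius-e balls around codewords are disjoint,
# which equals floor((d - 1) / 2) for the minimum distance d.
d = min(dist(c1, c2) for c1 in code for c2 in code if c1 != c2)
PR = (d - 1) // 2

print(CR, PR, CR == PR)   # 1 1 True -> the code is perfect
```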

Minimum distance d – Example

• Let $\mathcal{C}$ be a $[3,1]$ code given by the generator matrix

$$G = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}$$

• And from there we can deduce its parity-check matrix

$$H = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$$

• Therefore, the mapping is
  • 0 → 000
  • 1 → 111

Repetition Code

• Suppose that we want to send the message 010100.
• Using the $[3,1]$ repetition code, the encoded message is 000 111 000 111 000 000.
• After transmission, the received message was the following: 001 110 010 111 000 011.
• Decoding each block to the nearest codeword (majority vote) gives 000 111 000 111 000 111: the blocks with a single error are corrected, but the last block suffered 2 errors and is decoded incorrectly.
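A small sketch of the majority-vote decoder for the $[3,1]$ repetition code, applied to the received message above (note that the last block, with 2 errors, is decoded to the wrong bit):

```python
def decode_repetition3(received):
    """Decode a stream of 3-bit blocks by majority vote; returns the decoded data bits."""
    blocks = [received[i:i + 3] for i in range(0, len(received), 3)]
    return "".join("1" if block.count("1") >= 2 else "0" for block in blocks)

received = "001110010111000011"
print(decode_repetition3(received))   # 010101  (the sent message was 010100)
```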

General Hamming Codes

• A code whose parity-check matrix $H$ has columns given by every possible combination of 0s and 1s except all 0s.
• If $H$ has $r$ rows, it has $2^r - 1$ columns.
• $r$ of these columns are for the parity-check bits (the columns with a single 1 inside).
• All Hamming codes are $[n = 2^r - 1,\ k = 2^r - 1 - r,\ d = 3]$ codes.
• For $r = 3$, we get Hamming(7,4); for $r = 4$, Hamming(15,11); for $r = 5$, Hamming(31,26).
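A sketch that builds the parity-check matrix of a general Hamming code by taking every non-zero binary vector of length $r$ as a column, and reports the resulting parameters:

```python
from itertools import product
import numpy as np

def hamming_check_matrix(r):
    """Parity-check matrix whose columns are all non-zero binary vectors of length r."""
    cols = [v for v in product([0, 1], repeat=r) if any(v)]
    return np.array(cols).T          # r rows, 2^r - 1 columns

for r in (3, 4, 5):
    H = hamming_check_matrix(r)
    n = H.shape[1]
    k = n - r
    print(f"r={r}: [{n}, {k}, 3] Hamming code")   # (7,4), (15,11), (31,26)
```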

Final Remarks

• All that we discussed works for any $\mathbb{F}_q$ where $q$ is a prime.
• Tietavainen and van Lint proved in 1973 that there are three kinds of perfect binary codes:
  1. Hamming codes
  2. The [23,12,7] Golay code
  3. Trivial codes
• If you liked this topic, the next thing you should look up is the Hamming bound, which is related to perfect codes.

H(7,4) – extras 1

Standard array for Hamming(7,4)

H(7,4) – extras 2