
Error-correcting linear codes

Motivation

• CDs get scratched.
• Unreliable or noisy communication channels (Wi-Fi, Ethernet, …).
• Redundancy measures the number of extra bits, $r = n - k$.

Hamming (7,4) – A practical approach

• Given a stream of bits, split it into 4-bit blocks and for each of those form a 7-bit block (codeword), e.g. 01010001011101101001… → 0101 0001 0111 0110 1001 …

• For each block $d_1 d_2 d_3 d_4$, append 3 redundant bits $p_1 p_2 p_3$ given by (addition mod 2):

$$p_1 = d_1 + d_2 + d_4 \qquad p_2 = d_1 + d_3 + d_4 \qquad p_3 = d_2 + d_3 + d_4$$

• The receiver then tries to detect and correct errors with the help of the redundant bits.
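As an illustration of the encoding rule above, here is a minimal Python sketch (the function name `hamming74_encode` is ours, not from the slides); it reproduces the examples on the next slide:

```python
def hamming74_encode(block):
    """Encode a 4-bit block [d1, d2, d3, d4] into a 7-bit Hamming(7,4) codeword."""
    d1, d2, d3, d4 = block
    p1 = (d1 + d2 + d4) % 2   # the three parity equations, arithmetic mod 2
    p2 = (d1 + d3 + d4) % 2
    p3 = (d2 + d3 + d4) % 2
    return [d1, d2, d3, d4, p1, p2, p3]

print(hamming74_encode([0, 1, 0, 0]))  # -> [0, 1, 0, 0, 1, 0, 1], i.e. 0100 101
```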

Hamming (7,4) – Examples

Codewords are written as $d_1 d_2 d_3 d_4 \mid p_1 p_2 p_3$:

• $d = 0100$: $p_1 = 0 + 1 + 0 = 1$, $p_2 = 0 + 0 + 0 = 0$, $p_3 = 1 + 0 + 0 = 1$, giving the codeword 0100 101.
• $d = 1000$: $p_1 = 1 + 0 + 0 = 1$, $p_2 = 1 + 0 + 0 = 1$, $p_3 = 0 + 0 + 0 = 0$, giving the codeword 1000 110. If one bit is flipped and 1000 010 is received, the mismatch with the recomputed parities reveals the error and the word is corrected back to 1000 110.
• If two bits are flipped and 1000 011 is received, the nearest valid codeword is 1100 011, so the word is "corrected" to the wrong message.
• If 0010 100 is received, the parities recomputed from 0010 are $p_1 = 0 + 0 + 0 = 0$, $p_2 = 0 + 1 + 0 = 1$, $p_3 = 0 + 1 + 0 = 1$, which do not match 100; the word is decoded to the nearest codeword, 0011 100.
• If 0110 001 is received, the parities recomputed from 0110 are $p_1 = 0 + 1 + 0 = 1$, $p_2 = 0 + 1 + 0 = 1$, $p_3 = 1 + 1 + 0 = 0$, which do not match 001; the word is decoded to the nearest codeword, 0111 001. Wrong, if the transmitted codeword was 0110 110!

Hamming (7,4) – Remarks

• With 3 extra bits we are able to communicate $2^3 = 8$ "states".
• In the previous examples, we saw that we can't correct 2 errors.
• In fact, Hamming(7,4) is only capable of correcting 1 error per 7 bits.
• Since we have 8 possible ways of communicating extra information, the Hamming(7,4) code is able to detect the no-error state and the remaining 7 states where exactly 1 bit got flipped.
• For each block of 4 bits, we send a total of 7 bits, so the efficiency of this code is 4/7.

Challenges

• How can we ensure that this procedure is correct without explicitly checking all possibilities? Does this set of equations make sense?
• What if we wanted a Hamming($x$, $y$)? What are the consequences?
• How can we bound the number of errors that a code can correct?
• What about correcting more than one error?

Error-correcting (linear) code

• Definition: An error-correcting code of length $n$ over a finite alphabet $\Sigma$ is a subset $\mathcal{C} \subseteq \Sigma^n$. The elements of $\mathcal{C}$ are called the valid codewords. If $|\Sigma| = q$ we say that $\mathcal{C}$ is a $q$-ary code. When $q = 2$, we say that $\mathcal{C}$ is a binary code.
• Definition: If $\Sigma$ is a field and $\mathcal{C} \subseteq \Sigma^n$ is a subspace of $\Sigma^n$, then $\mathcal{C}$ is said to be a linear code. More informally, a linear code is an error-correcting code for which any linear combination of codewords is also a codeword.
• Definition: A linear $[n, k]$-code $\mathcal{C}$ is a $k$-dimensional subspace of $\mathbb{F}_2^n$ ($\mathcal{C} \subseteq \mathbb{F}_2^n$).

• As a subspace of $\mathbb{F}_q^n$, the entire code $\mathcal{C}$ may be represented as the span of a set of $k$ codewords (so the order of $\mathcal{C}$ is $q^k$).
• The generator matrix $G$ is a $k \times n$ matrix whose rows form a basis for a linear code $\mathcal{C}$, i.e., the linear code is spanned by the rows of $G$.
• We can think of $G$ as a linear map $T: \mathbb{F}_q^k \to \mathbb{F}_q^n$ ($n > k$).

• When $G$ has the form $G = [I_k \mid P]$, where $I_k$ denotes the $k \times k$ identity matrix and $P$ is some $k \times (n-k)$ matrix, we say $G$ is in standard form.
• The generator matrix encodes our messages by mapping them into a higher-dimensional space (intuitively, this corresponds to redundancy).

Generator Matrix G – Encoding

• Let $b \in \mathbb{F}_q^k$ be a block message, written as a row vector, and $G = [I_k \mid P]$ a generator matrix.

$$b \cdot G = (b \mid \text{redundant bits})$$
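A minimal numpy sketch of this encoding step. The matrix below is a generator matrix for Hamming(7,4) in standard form, chosen to match the parity equations used earlier (this particular $G$ is our choice, not given on the slides):

```python
import numpy as np

# G = [I_4 | P]; the columns of P implement p1 = d1+d2+d4, p2 = d1+d3+d4, p3 = d2+d3+d4.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

b = np.array([0, 1, 0, 0])   # message block as a row vector
codeword = b @ G % 2         # b.G = (b | redundant bits), arithmetic over F_2
print(codeword)              # [0 1 0 0 1 0 1]
```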

Hamming weight, Hamming distance

• Definition: The Hamming weight $w(x)$ is the number of non-zero values in $x$. In the binary case, it is the number of 1s.
• Definition: For $x \in \mathbb{F}_q^n$ and $y \in \mathbb{F}_q^n$ the Hamming distance is given by $d(x, y) = |\{\, i : x_i \neq y_i \,\}|$, i.e., it is the number of coordinate positions in which they differ.
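A small sketch of both definitions in the binary case, which can be used to check the examples that follow:

```python
def hamming_weight(x):
    """Number of non-zero positions of x (given as a string of symbols)."""
    return sum(1 for s in x if s != "0")

def hamming_distance(x, y):
    """Number of coordinate positions in which x and y differ."""
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

print(hamming_weight("0010"))              # 1
print(hamming_distance("01011", "10110"))  # 4
```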

• $w(0) = 0$; $w(0010) = 1$.
• Let $x = 01011$ and $y = 10110$; then $d(x, y) = 4$.
• Let $y \in \mathbb{F}_2^n$; then $d(0, y) = w(y)$.
• Symmetry: $d(x, y) = d(y, x)$ for all $x, y \in \mathbb{F}_2^n$.

• Positive definiteness: $d(x, y) \ge 0$, and $d(x, y) = 0$ iff $x = y$, for all $x, y \in \mathbb{F}_2^n$.

• Triangle inequality: $d(x, y) \le d(x, z) + d(z, y)$ for all $x, y, z \in \mathbb{F}_2^n$.
  • For length $n = 1$: if $x = y$, then LHS $= 0$ and LHS $\le$ RHS. If $x \neq y$, then LHS is 1 and either $x \neq z$ or $y \neq z$, so RHS is at least 1, hence LHS $\le$ RHS.

• For general $n$ we use the fact that $d(x_i, y_i) \le d(x_i, z_i) + d(z_i, y_i)$:

$$d(x, y) = \sum_{i=1}^{n} d(x_i, y_i) \le \sum_{i=1}^{n} \big( d(x_i, z_i) + d(z_i, y_i) \big) = \sum_{i=1}^{n} d(x_i, z_i) + \sum_{i=1}^{n} d(z_i, y_i) = d(x, z) + d(z, y)$$

Minimum distance d

• The minimum distance $d$ is the smallest Hamming distance over all pairs of distinct valid codewords:

$$d = \min_{\substack{x, y \in \mathcal{C} \\ x \neq y}} d(x, y)$$

• Computing the minimum distance of the code requires calculating $\binom{|\mathcal{C}|}{2} \approx \frac{|\mathcal{C}|^2}{2}$ Hamming distances.

• Remark: Sometimes codes are described as $[n, k, d]$ codes.

Computing d

• Observation: the sum of any two messages is a message.

• Let $m_1$ and $m_2$ be messages to be encoded by $G$, and let $m_3$ be the message $m_3 = m_1 + m_2$. The following is true:

$$(m_1 + m_2) G = m_1 G + m_2 G = m_3 G$$

• Important: We conclude that the sum of 2 valid codewords is a codeword.

Computing d

$$d = \min_{\substack{c_1, c_2 \in \mathcal{C} \\ c_1 \neq c_2}} d(c_1, c_2) = \min_{\substack{c_1, c_2 \in \mathcal{C} \\ c_1 \neq c_2}} d(c_1 - c_1, c_2 - c_1) = \min_{\substack{c_1, c_2 \in \mathcal{C} \\ c_1 \neq c_2}} d(0, c_2 - c_1) = \min_{\substack{c_1, c_2 \in \mathcal{C} \\ c_1 \neq c_2}} d(0, c_2 + c_1) = \min_{\substack{c \in \mathcal{C} \\ c \neq 0}} d(0, c) = \min_{\substack{c \in \mathcal{C} \\ c \neq 0}} w(c)$$

(Here we use that the Hamming distance is unchanged when the same vector is added to both arguments, that $-c_1 = c_1$ over $\mathbb{F}_2$, and that $c_2 + c_1$ ranges over all non-zero codewords.)
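A brute-force sketch of this equivalence for a small binary linear code: it enumerates all codewords from a generator matrix and checks that the pairwise minimum distance equals the minimum non-zero weight (the generator matrix below is the Hamming(7,4) one used earlier, so both values are 3):

```python
from itertools import product
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])   # Hamming(7,4) generator matrix

# All 2^k codewords: every message m in F_2^k, encoded as mG mod 2.
codewords = [tuple(np.array(m) @ G % 2) for m in product([0, 1], repeat=G.shape[0])]

d_pairs = min(sum(a != b for a, b in zip(c1, c2))
              for c1 in codewords for c2 in codewords if c1 != c2)
d_weight = min(sum(c) for c in codewords if any(c))

print(d_pairs, d_weight)   # 3 3
```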

Parity-Check Matrix H

• An $(n-k) \times n$ matrix $H$ representing a linear function $\phi: \mathbb{F}_2^n \to \mathbb{F}_2^{n-k}$ whose kernel is $\mathcal{C}$ is called a check matrix for $\mathcal{C}$ (or a parity-check matrix).

$$H c^T = 0 \iff c \text{ is a valid codeword}$$

• If $\mathcal{C}$ is a code with generator matrix $G$ in standard form, $G = [I_k \mid P]$, then $H = [-P^T \mid I_{n-k}]$ is a check matrix for $\mathcal{C}$.

• It gives us a method of detecting errors when they occur.
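A short sketch of this construction for the Hamming(7,4) generator matrix used earlier: it builds $H = [-P^T \mid I_{n-k}]$ (over $\mathbb{F}_2$ the minus sign can be dropped) and verifies that $G H^T = 0$, i.e. every codeword lies in the kernel of $H$:

```python
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])   # G = [I_k | P] for Hamming(7,4)

k, n = G.shape
P = G[:, k:]                                      # the k x (n-k) redundancy part
H = np.hstack([P.T, np.eye(n - k, dtype=int)])    # H = [-P^T | I_{n-k}]; -P^T = P^T mod 2

print(H)
print((G @ H.T % 2 == 0).all())                   # True: H c^T = 0 for every codeword c
```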

• Remark: $H$ contains all the information about $\mathcal{C}$, just as $G$ does. $H c^T = 0$ iff $c$ is a valid codeword.
• Let $c$ be a valid codeword and $e$ an error vector; then we can write

$$H (c + e)^T = H c^T + H e^T = H e^T$$

• Let $q$ be another valid codeword. If we introduce the same error vector $e$:

$$H (q + e)^T = H q^T + H e^T = H e^T$$

• The result $H e^T$ is called a syndrome.

• Remark: syndromes depend only on the error and never on the message.

Syndrome decoding

• Given any valid codeword $c \in \mathbb{F}_2^n$ and an error vector $e \in \mathbb{F}_2^n$, we saw that $H e^T$ is called the syndrome vector. Let $z = c + e$; then

$$H z^T = H c^T + H e^T = H e^T.$$

• Build a map associating all possible syndromes $H e^T$ to their corresponding errors $e \in \mathbb{F}_2^n$.
• Under the assumption that no more than $t$ errors were made during transmission, the receiver can look up the value $H e^T$ in a table of size $\sum_{i=0}^{t} \binom{n}{i}$.
• After finding $e$ in the table, it is easy to correct $z$, i.e., to find $c$: $c = z - e$.
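A minimal sketch of syndrome-table decoding for Hamming(7,4) under the assumption $t = 1$, so the table only contains the zero error and the 7 single-bit error patterns (the matrix $H$ below is the one derived from the earlier generator matrix):

```python
import numpy as np

H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])   # parity-check matrix for Hamming(7,4)
n = H.shape[1]

# Table: syndrome -> error pattern, for the zero error and all single-bit errors.
table = {tuple(np.zeros(3, dtype=int)): np.zeros(n, dtype=int)}
for i in range(n):
    e = np.zeros(n, dtype=int)
    e[i] = 1
    table[tuple(H @ e % 2)] = e

def decode(z):
    """Correct at most one error in the received word z and return the codeword."""
    syndrome = tuple(H @ z % 2)
    return (z - table[syndrome]) % 2      # c = z - e

z = np.array([0, 1, 0, 0, 0, 0, 1])       # codeword 0100 101 with one bit flipped
print(decode(z))                          # [0 1 0 0 1 0 1]
```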

Standard array

• Standard arrays are used to decode linear codes.
• A standard array for an $[n, k]$-code is a $q^{n-k} \times q^k$ array that lists all $q^n$ elements of $\mathbb{F}_q^n$, where:
  1. The first row lists all codewords (with the 0 codeword on the extreme left).
  2. Each row is a coset, with the coset leader (representative) in the first column.
  3. The entry in the $i$-th row and $j$-th column is the sum of the $i$-th coset leader and the $j$-th codeword.
• Recall Lagrange's Theorem: If $G$ is a finite group and $H \subseteq G$ a subgroup, then $|G| = |H| \, |G/H| = |H| \, |H \backslash G|$.

Example

• Let $\phi: \mathbb{F}_2^2 \to \mathbb{F}_2^4$ be the encoder map for a $[4,2]$ code, given by the generator matrix

$$G = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}$$

• Therefore, $\mathcal{C} = \{0000, 1011, 0101, 1110\}$.
• By Lagrange's Theorem, the standard array will have dimensions $4 \times 4$, since the order of $\mathcal{C}$ is $2^k = 4$ and $n = 4$, so we have $2^n / 2^k = 4$ cosets.

Example

Valid codewords:      0000  1011  0101  1110
Coset of leader 1000: 1000  0011  1101  0110
Coset of leader 0100: 0100  1111  0001  1010
Coset of leader 0010: 0010  1001  0111  1100

Example

• Suppose that the word 0110 was received as a message. How do we ensure that it does not contain errors?
• 0110 is not a valid codeword; it sits in the row of the standard array whose coset leader is 1000, so it is decoded to the codeword $1000 + 0110 = 1110$.

0000  1011  0101  1110
1000  0011  1101  0110   (the received word 0110 lies in this coset)
0100  1111  0001  1010
0010  1001  0111  1100
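A brute-force sketch of decoding with a standard array, assuming minimum-weight coset leaders; it reproduces the correction of 0110 to 1110:

```python
from itertools import product

codewords = ["0000", "1011", "0101", "1110"]   # the [4,2] code from above

def add(a, b):
    """Componentwise addition of two bit strings over F_2."""
    return "".join(str((int(x) + int(y)) % 2) for x, y in zip(a, b))

# Build the standard array: group all 16 words of F_2^4 into cosets of the code,
# picking a minimum-weight word of each coset as its leader.
remaining = {"".join(w) for w in product("01", repeat=4)}
rows = []
while remaining:
    leader = min(remaining, key=lambda w: (w.count("1"), w))
    coset = [add(leader, c) for c in codewords]
    rows.append((leader, coset))
    remaining -= set(coset)

def decode(word):
    """Return word + (its coset leader), i.e. the codeword in the same column."""
    for leader, coset in rows:
        if word in coset:
            return add(word, leader)

print(decode("0110"))   # 1110
```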

Standard array

• The quotient group, with respect to coset addition, satisfies:

$$\mathbb{F}_q^n / \mathcal{C} \cong \mathbb{F}_q^{n-k}$$

• The standard array is the correspondence between the syndromes $s \in \mathbb{F}_q^{n-k}$ and the cosets of $\mathcal{C}$ in $\mathbb{F}_q^n$, induced by this isomorphism.
• All elements of a row yield the same syndrome.
• An $[n, k]$ linear code can correct $q^{n-k}$ error patterns.
• In some systems, this may be infeasible to implement. Why?

Perfect Codes

• Definition: Let $x \in \mathbb{F}_q^n$. The ball $B_r(x)$ with radius $r$ and center $x$ is defined by:

$$B_r(x) = \{\, y \in \mathbb{F}_q^n : d(x, y) \le r \,\}$$

• Definition: The covering radius $CR(\mathcal{C})$ of a code $\mathcal{C}$ is defined as the smallest value of $r$ such that every element of $\mathbb{F}_q^n$ is contained in at least one ball of radius $r$ centered at a codeword of $\mathcal{C}$:

$$CR(\mathcal{C}) = \max \{\, d(x, \mathcal{C}) : x \in \mathbb{F}_q^n \,\}$$

where $d(z, \mathcal{C}) = \min \{\, d(z, c) : c \in \mathcal{C} \,\}$.

Perfect Codes

• Definition: The packing radius of a code $\mathcal{C}$ is defined by:

$$PR(\mathcal{C}) = \max \{\, e \in \{0, 1, \dots, n\} : B_e(b) \cap B_e(c) = \emptyset \ \ \forall\, b, c \in \mathcal{C} \text{ with } b \neq c \,\}$$

• Lemma: A code $\mathcal{C}$ has $CR(\mathcal{C}) \le r$ if and only if the balls with radius $r$ centered at the codewords of $\mathcal{C}$ cover the whole space:

$$\bigcup_{c \in \mathcal{C}} B_r(c) = \mathbb{F}_2^n$$

Perfect Codes

• Definition: A code $\mathcal{C}$ with covering radius $r$ is called perfect if the balls $B_r(c)$ on the left-hand side of the following equality are disjoint:

$$\bigcup_{c \in \mathcal{C}} B_r(c) = \mathbb{F}_q^n.$$

• Remark: The inequality $PR(\mathcal{C}) \le CR(\mathcal{C})$ is always true for any code $\mathcal{C}$.

• Remark: The equality $PR(\mathcal{C}) = \lfloor (d-1)/2 \rfloor$ is always true if $|\mathcal{C}| > 1$.
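A brute-force sketch that computes the covering radius and packing radius of a small binary code and checks whether it is perfect; with the Hamming(7,4) codewords it prints $CR = PR = 1$, consistent with the final remarks below, which list Hamming codes among the perfect binary codes:

```python
from itertools import product
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])   # Hamming(7,4) generator matrix

k, n = G.shape
code = [tuple(np.array(m) @ G % 2) for m in product([0, 1], repeat=k)]

def dist(a, b):
    return sum(x != y for x, y in zip(a, b))

space = list(product([0, 1], repeat=n))

# Covering radius: largest distance from any word of the space to the code.
CR = max(min(dist(x, c) for c in code) for x in space)

# Packing radius: largest e such that radius-e balls around codewords are disjoint,
# which equals floor((d - 1) / 2) for the minimum distance d.
d = min(dist(c1, c2) for c1 in code for c2 in code if c1 != c2)
PR = (d - 1) // 2

print(CR, PR, CR == PR)   # 1 1 True -> the code is perfect
```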

Minimum distance d – Example

• Let $\mathcal{C}$ be a $[3,1]$ code given by the generator matrix

$$G = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}$$

• And from there we can deduce its parity-check matrix

$$H = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$$

• Therefore, the mapping is
  • 0 → 000
  • 1 → 111

Repetition Code

• Suppose that we want to send the message 010100.
• Using the $[3,1]$ repetition code, the encoded message is 000 111 000 111 000 000.
• After transmission, the received message was the following: 001 110 010 111 000 011.
• Decoding each block to the nearest codeword (majority vote) gives 000 111 000 111 000 111: the blocks with a single error are corrected, but the last block suffered 2 errors and is decoded incorrectly.
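A small sketch of the majority-vote decoder for the $[3,1]$ repetition code, applied to the received message above (note that the last block, with 2 errors, is decoded to the wrong bit):

```python
def decode_repetition3(received):
    """Decode a stream of 3-bit blocks by majority vote; returns the decoded data bits."""
    blocks = [received[i:i + 3] for i in range(0, len(received), 3)]
    return "".join("1" if block.count("1") >= 2 else "0" for block in blocks)

received = "001110010111000011"
print(decode_repetition3(received))   # 010101  (the sent message was 010100)
```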

General Hamming Codes

• A code whose parity-check matrix $H$ has columns given by every possible combination of 0s and 1s except all 0s.
• If $H$ has $r$ rows, it has $2^r - 1$ columns.
• $r$ of these columns are for the parity-check bits (the columns with a single 1 inside).
• All Hamming codes are $[n = 2^r - 1,\ k = 2^r - 1 - r,\ d = 3]$ codes.
• For $r = 3$, we get Hamming(7,4); for $r = 4$, Hamming(15,11); for $r = 5$, Hamming(31,26).
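A sketch that builds the parity-check matrix of a general Hamming code by taking every non-zero binary vector of length $r$ as a column, and reports the resulting parameters:

```python
from itertools import product
import numpy as np

def hamming_check_matrix(r):
    """Parity-check matrix whose columns are all non-zero binary vectors of length r."""
    cols = [v for v in product([0, 1], repeat=r) if any(v)]
    return np.array(cols).T          # r rows, 2^r - 1 columns

for r in (3, 4, 5):
    H = hamming_check_matrix(r)
    n = H.shape[1]
    k = n - r
    print(f"r={r}: [{n}, {k}, 3] Hamming code")   # (7,4), (15,11), (31,26)
```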

Final Remarks

• All that we discussed works for any $\mathbb{F}_q$ where $q$ is a prime.
• Tietavainen and van Lint proved in 1973 that there are three kinds of perfect binary codes:
  1. Hamming codes
  2. The [23,12,7] Golay code
  3. Trivial codes
• If you liked this topic, the next thing you should look up is the Hamming bound, which is related to perfect codes.

H(7,4) – extras 1

Standard array for Hamming(7,4)

H(7,4) – extras 2