Information Theory and Coding

Topic 04/10 Maximum-likelihood decoding

M. Sc. Marko Hennhöfer

Ilmenau University of Technology, Communications Research Laboratory

Winter Semester 2019

Contents

4 Channel coding
4.1 Block codes, asymptotic coding gains
4.2 Convolutional codes, trellis diagram, hard-/soft decision decoding
4.3 Turbo codes
4.4 LDPC codes
4.5 Polar codes (additional slide set)
5 Appendix
5.1 Polynomial representation of convolutional codes
5.2 Cyclic redundancy check (CRC)
5.3 Matlab / Octave example, BER

4 Channel Coding

Overview on different coding schemes:

- Convolutional codes: non-recursive / recursive (e.g., used in GSM and 4G LTE)
- Block codes: non-linear / linear; linear: non-cyclic / cyclic (Hamming, RS, BCH, Golay; e.g., used in the 1977 Voyager mission, QR codes, Audio CD); LDPC codes (e.g., used in 5G, DVB-T2/S2, WiFi, WiMAX); Polar codes (5G)
- Concatenated codes: serial / parallel, e.g., Turbo Convolutional Codes (TCC), used in 4G LTE

(The original figure also marks which schemes are treated in detail, only as basics, or not at all.)

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 3 4 Channel Coding

Channel coding:

Block diagram: info word (e.g., 11, 01, ...) of length k → channel coder (adds useful redundancy, e.g., for FEC) → code word (e.g., 110, 011, ...) of length n.

This defines an (n,k) code with code rate R = k/n < 1.

Example: (3,1) repetition code: the coded bit rate is the info bit rate times the bandwidth expansion factor n/k = 3, which results in an increased data rate.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 4 4 Channel Coding

Code properties: Systematic codes: the info bits occur as a part of the code words.

Code space:

Linear codes: The sum of two code words is again a codeword

bit-by-bit modulo 2 addition without carry

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 5 4 Channel Coding

Code properties: Minimum Hamming distance: a measure of how different the most closely located code words are. Example: compare all combinations of code words.

For linear codes the comparison simplifies to finding the code word with the lowest Hamming weight:
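As an illustration, a minimal Octave sketch (assuming the code set is stored as a matrix A with one code word per row; the (3,1) repetition code serves as a stand-in example):

  A = [0 0 0;                  % code set, one code word per row
       1 1 1];                 % (3,1) repetition code
  w = sum(A, 2);               % Hamming weight of each code word
  d_min = min(w(w > 0))        % for linear codes: d_min = minimum nonzero weight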

Maximum likelihood decoding (MLD): Goal: minimum word error probability.

Block diagram: info words (11, 01, ...) → channel encoder → code words (110, 011, ...) → discrete channel → received words (100, 011, ...) → code word estimator → estimated code words (110, 011, ...) → inverse encoder → estimated info words.

Code word estimator:

The code word estimator is the mapping from all 2^n possible received words to the 2^k valid code words.

Example: (7,4) Hamming code: 2^7 = 128 possible received words, 2^4 = 16 valid code words.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 7 4 Channel Coding

Decoding rule: Assumption: equal a-priori probabilities, i.e., all 2^k code words appear with probability 1/2^k.

Probability for wrong detection if a certain cw was transmitted:

Probability to receive a CW that yields a wrong estimate:

Furthermore:

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 8 4 Channel Coding

Example: (n=3, k=1) repetition code: Assumption: equal a-priori probabilities, i.e., each of the 2^k = 2^1 = 2 code words (111, 000) appears with probability 1/2^k = 1/2^1 = 1/2.

Probability for wrong detection if a certain cw was transmitted:

e.g., assume 111 was transmitted over a BSC:

Consider all received words y that yield a wrong estimate (BSC with bit error probability pe, transmitted CW 111):

received y | decoded | probability
000        | 000     | P(000|111) = pe · pe · pe
001        | 000     | P(001|111) = pe · pe · (1-pe)
010        | 000     | P(010|111) = pe · (1-pe) · pe
011        | 111     | (correct decision)
100        | 000     | P(100|111) = (1-pe) · pe · pe
101        | 111     | (correct decision)
110        | 111     | (correct decision)
111        | 111     | (correct decision)
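The word error probability for this example is the sum of the probabilities of the wrongly decoded rows; a minimal Octave sketch (pe chosen arbitrarily):

  pe  = 0.01;                          % BSC bit error probability (assumed value)
  P_w = pe^3 + 3*pe^2*(1 - pe)         % received 000, plus the three two-error words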

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 9 4 Channel Coding

Probability for a wrong detection (considering all possibly transmitted CWs now):

mean over all transmitted CWs

combining the sums

wrong detection

any detection correct detection

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 10 4 Channel Coding

Probability for wrong detection:

To minimize choose for each received word such that gets maximized

is maximized, if we choose a CW with the minimum distance to the received word .

MLD for hard decision DMC: Find the CW with minimum Hamming distance.

MLD for soft decision AWGN:

Euclidean distance

Find the CW with minimum Euclidean distance.
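A minimal Octave sketch of both decoding rules by exhaustive search over the code set (the (3,1) repetition code and the received values are stand-in examples; BPSK mapping 0 → +1, 1 → -1 assumed):

  A = [0 0 0; 1 1 1];                       % code set, one code word per row
  y = [1 0 1];                              % hard-decided received word
  [~, ih] = min(sum(mod(A + y, 2), 2));     % minimum Hamming distance
  a_hat_hard = A(ih, :)

  r = [0.8 -0.2 1.1];                       % soft received values
  S = 1 - 2*A;                              % BPSK symbols of each code word
  [~, is] = min(sum((S - r).^2, 2));        % minimum Euclidean distance
  a_hat_soft = A(is, :)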

Coding gain: Suitable measure: bit error probability (for the BER after decoding, we consider the errors in the k decoded bits).

Code word error probability:

Example: Transmit 10 CWs and 1 bit error shall occur

With k = 4 info bits per CW, 1 wrong bit will yield 1 wrong code word → 40 info bits have been transmitted in total.

As in general more than one error can occur in a code word, we can only approximate

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 13 4 Channel Coding

If we consider that a decoding error occurs only if bits are wrong:

Comparison of codes considering the AWGN channel: Energy per bit vs. energy per coded bit (for constant transmit power)

Example: (3,1) repetition code: the energy of one info bit is spread over 3 coded bits, i.e., the energy per coded bit is Ec = R · Eb = Eb/3.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 14 4 Channel Coding

Example: BER Performance using the (7,4) Hamming code

(Figure: BER vs. Ec/N0 in dB for the uncoded case and the approximations Pb hard / Pb soft. In the low SNR regime we suffer from the reduced energy per coded bit; at high SNR the curves show the asymptotic coding gain and the hard vs. soft decision gain.)

Analytical calculation of the error probabilities: Hard decision: number of combinations for r errors in a sequence of length n. Example: (3,1) repetition code (info word, code word, received word):

1 combination for 3 errors

3 combinations for 2 errors

3 combinations for 1 error will be corrected

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 16 4 Channel Coding

1 error can be corrected.

3 combinations for 2 errors, 1 combination for 3 errors.

General: CW errors occur if more than t bits are wrong, i.e.,

P_w = sum_{r = t+1}^{n} C(n,r) · pe^r · (1-pe)^(n-r),

where C(n,r) is the number of combinations for r errors in a sequence of length n, pe^r the probability for r errors, and (1-pe)^(n-r) the probability for n-r correct bits.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 17 4 Channel Coding

Approximation for small values of pe: only take the lowest power of pe into account.

general:

Example: (7,4) Hamming code,

for a binary mod. scheme & AWGN channel
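A small Octave sketch of this approximation for the (7,4) Hamming code (t = 1), assuming binary modulation over AWGN with energy per coded bit Ec = R · Eb:

  n = 7; k = 4; t = 1; R = k/n;
  EbN0 = 10^(6/10);                         % example point: Eb/N0 = 6 dB
  Q  = @(x) 0.5*erfc(x/sqrt(2));
  pe = Q(sqrt(2*R*EbN0));                   % error prob. per coded bit
  P_w = nchoosek(n, t+1) * pe^(t+1)         % keep only the lowest power of pe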

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 18 4 Channel Coding

Set of code words for the (7,4) Hamming-Code:

0000000  1000101  0001011  1001110
0010110  1010011  0011101  1011000
0100111  1100010  0101100  1101001
0110001  1110100  0111010  1111111

Example: BER performance using the (7,4) Hamming code

(Figure: BER vs. Eb/N0 in dB, showing the uncoded curve, the simulated Pb and Pw for hard decision, and the calculated approximations as derived before. More bits should have been simulated to get reliable results at high SNR.)

Asymptotic coding gain for hard decision decoding: uncoded: good approximation for high SNR; coded:

constant

Assume constant BER and compare signal-to-noise ratios

in dB
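Evaluated in Octave for the (7,4) Hamming code, assuming the standard result G_a,hard = 10·log10(R·(t+1)) that this derivation leads to:

  n = 7; k = 4; R = k/n;
  d_min = 3; t = floor((d_min - 1)/2);
  G_hard_dB = 10*log10(R*(t + 1))           % about 0.58 dB for the (7,4) Hamming code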

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 21 4 Channel Coding

Example: BER Performance using the (7,4) Hamming code

(Figure: BER vs. Ec/N0 in dB for the uncoded curve and the hard/soft decision approximations; the horizontal shift between the curves at low BER is the asymptotic coding gain.)

Analytical calculation of the error probabilities: Soft decision:

Block diagram: code word → AWGN channel (noise added) → received word.

noise vector: i.i.d.

Example: (3,2) parity check code:

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 23 4 Channel Coding Example continued

ML decoding rule, derived before

Pairwise error probability: Assume has been transmitted. What is the probability that the decoder decides for a different CW ?

The decoder will decide for a different CW if the received word has a smaller Euclidean distance to that CW as compared to the transmitted one.

next: Evaluate the norm by summing the squared components

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 24 4 Channel Coding

For the whole CW we have different bits

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 25 4 Channel Coding

scales standard deviation Gaussian rv with standard deviation sum of Gaussian rvs: The variance of the sum will be the sum of the individual variances.

std. dev.

variance

Gaussian rv with zero mean and variance

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 26 4 Channel Coding

multiplied with -1

Question: What is the probability that our Gaussian r.v. becomes larger than a certain value? Answer: Integral over remaining part of the Gaussian PDF, e.g., expressed via the Q-function.

Q-Function:

normalized Gaussian r.v.

Probability that a normalized Gaussian r.v. becomes larger than a certain value.
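In Octave the Q-function can be expressed via the complementary error function (a one-line helper also used in the later sketches):

  Q = @(x) 0.5*erfc(x/sqrt(2));     % P(normalized Gaussian r.v. > x)
  Q(0)                              % = 0.5
  Q(3)                              % approx. 1.35e-3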

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 27 4 Channel Coding

normalized Gaussian r.v.

Pairwise error probability:

Example continued: e.g., for

transmitted

Number of CW within For we would distance get

The CWs with the minimum Hamming distance to the transmitted CW dominate the CW error probability

(ai & aj within dmin)

Mean over the transmitted CWs (assuming the same A_dmin for all of them).

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 29 4 Channel Coding

(ai & aj within dmin)

Best case: only one CW within d_min; worst case: all CWs within d_min.

For high SNR, or if A_dmin is unknown:

Example: BER performance using the (7,4) Hamming code

(Figure: BER vs. Eb/N0 in dB, showing the uncoded curve, the simulated Pb and Pw for soft decision, and the calculated approximations as derived before.)

Asymptotic coding gain for soft decision decoding:

Derivation analogous to the hard decision case.

uncoded: good approximation for high SNR coded:

Assume constant BER and compare signal-to-noise ratios

in dB

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 32 4 Channel Coding

Example: BER Performance using the (7,4) Hamming code

(Figure: BER vs. Ec/N0 in dB for the uncoded curve and the hard/soft decision approximations; the asymptotic coding gain is indicated.)

Information Theory and Coding

Topic 05/10 Block codes

M. Sc. Marko Hennhöfer

Ilmenau University of Technology Communications Research Laboratory

Winter Semester 2019

Matrix representation of block codes: Example: (7,4) Hamming code. Encoding equation:

systematic code

bitwise modulo 2 sum without carry

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 2 4 Channel Coding

Introducing the generator matrix we can express the encoding process as matrix-vector product.

The identity matrix part is responsible for the code becoming a systematic code: it just copies the info word into the CW. The remaining columns form the parity matrix (the encoding is a multiply-and-sum operation).

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 3 4 Channel Coding

Other interpretation: The code word a is generated by linear combinations of the rows of G. The „1“ entries in u select the rows of G. Thus, the rows of G span the space of the codewords.
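A minimal Octave sketch of the encoding a = u·G (mod 2); the systematic generator matrix below is one choice that is consistent with the (7,4) Hamming code word set listed before:

  G = [1 0 0 0 1 0 1;
       0 1 0 0 1 1 1;
       0 0 1 0 1 1 0;
       0 0 0 1 0 1 1];              % G = [I_4 | P]
  u = [1 0 1 1];                    % info word
  a = mod(u*G, 2)                   % -> 1 0 1 1 0 0 0 (info word copied into the CW)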


M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 4 4 Channel Coding

General: For a (n,k ) block code: info words

code words

Encoding: For systematic codes:

Set of code words:

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 5 4 Channel Coding

Properties of the generator matrix

- the rows of G shall be linearly independent
- the rows of G are code words
- the row rank is the number of linearly independent rows, the column rank is the number of linearly independent columns
- row rank and column rank are equal, i.e., the rank of the matrix
- as G has more columns than rows, the columns must be linearly dependent

Example: (7,4) Hamming code: easy to see that the rows are linearly independent; the last 3 columns can be written as linear combinations of the first 4 columns → rank 4.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 6 4 Channel Coding

Properties of the generator matrix

- rows can be exchanged without changing the code
- multiplication of a row with a (nonzero) scalar doesn't change the code
- the sum of a scaled row with another row doesn't change the code
- exchanging columns will change the set of codewords, but the weight distribution and the minimum Hamming distance will be the same

yields the same code:

each Generator matrix can be brought to the row echelon form, i.e., a systematic encoder

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 7 4 Channel Coding

Properties of the generator matrix

As the all-zero word is a valid code word, and the rows of G are also valid code words, the minimum Hamming distance must be less than or equal to the minimum weight of the rows.

Parity check matrix: The code can also be defined via the parity check matrix.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 8 4 Channel Coding

Interpretation: Simple example for visualization

The rows of H span the null space of the code words, i.e., the product of any code word with H^T is 0. The rows of H are orthogonal to the code words.

H = [0 0 1],  G = [1 0 0; 0 1 0]

Code words: a0 = [1 0 0], a1 = [0 1 0], a2 = [0 0 0], a3 = [1 1 0]

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 9 4 Channel Coding

Parity check matrix: If G is a systematic generator matrix, e.g.,

then

H can be used to check whether a received CW is a valid CW, or to determine what is wrong with the received CW (syndrome).
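A small Octave sketch of the syndrome check s = y·H^T (mod 2), with H = [P^T I_3] matching the generator matrix given in the sketch before:

  H = [1 1 1 0 1 0 0;
       0 1 1 1 0 1 0;
       1 1 0 1 0 0 1];
  a = [1 0 1 1 0 0 0];                    % valid code word
  mod(a*H.', 2)                           % -> 0 0 0
  y = mod(a + [0 0 1 0 0 0 0], 2);        % single bit error in position 3
  s = mod(y*H.', 2)                       % -> 1 1 0, the 3rd column of H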

4 Matlab / Octave example

Simulation of a (7,4) Hamming code transmission (hard decision). Use analytical approximations as reference. Compare the coded and uncoded case.

use the approximation for hard decision here

Comparing the uncoded case with the Hamming (7,4) code

(Figure: Pb vs. Eb/N0 in dB, analytical approximations for the uncoded and the coded case.)

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 12 4 Matlab / Octave example

Extending the simulation by adding the simulation of the uncoded bits

We need another loop here, to loop over the bits which we want to simulate now:
- generate a random bit (+1/-1)
- add the Gaussian distributed noise according to the current Eb/N0 ratio
- decode the bit (greater or smaller than 0?)
- in case of errors, increment the error count
- after the loop, calculate the BER
A minimal sketch of this loop is shown below.
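A minimal Octave sketch of such a loop (one Eb/N0 point, Eb = 1 assumed; in the full simulation this sits inside the loop over the SNR values):

  EbN0_dB = 4;
  N0      = 1/10^(EbN0_dB/10);          % noise spectral density for Eb = 1
  nBits   = 1e5;  nErr = 0;
  for i = 1:nBits
    b = 2*(rand() > 0.5) - 1;           % random bit, mapped to +1 / -1
    y = b + sqrt(N0/2)*randn();         % add Gaussian noise for the current Eb/N0
    if sign(y) ~= b                     % hard decision against 0
      nErr = nErr + 1;                  % count the error
    end
  end
  BER = nErr/nBits                      % bit error ratio after the loop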

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 13 4 Matlab / Octave example

Most simple loop over the bits

You can increase the simulation speed by processing vectors of bits instead of single bits.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 14 4 Matlab / Octave example

Now loop over frames

You can further increase the speed of the simulation if you terminate the loop at a certain error count. For low SNRs it won‘t be necessary to simulate as many bits as for high SNRs

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 15 4 Matlab / Octave example

Exit the loop at a certain error count, to reduce the simulation time.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 16 4 Matlab / Octave example

Extending the simulation by adding the Hamming code

The Hamming code can go in here:
- for each SNR we need to come up with the Hamming encoder for a certain number of bits
- we need to be careful with the energies: Ec = 4/7 Eb
- implement the syndrome decoding

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 17 4 Matlab / Octave example

Encoding and transmission:
- get random bits and multiply them with G (make sure you work in the modulo-2 space) -> code word a
- scale by sqrt(R) to consider the reduced energy per coded bit
- add noise and finally get the hard decision estimate for y

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 18 4 Matlab / Octave example

Decoding:
- calculate the syndrome and find the corresponding error pattern
- add the error pattern to the received bits to correct the errors
A combined sketch of these steps follows below.
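A combined Octave sketch of these encoding, transmission, and syndrome decoding steps for one code word (G and H as in the sketches before; the table E maps each syndrome, read as a decimal address, to its single-bit error pattern for this H):

  G = [1 0 0 0 1 0 1; 0 1 0 0 1 1 1; 0 0 1 0 1 1 0; 0 0 0 1 0 1 1];
  H = [1 1 1 0 1 0 0; 0 1 1 1 0 1 0; 1 1 0 1 0 0 1];
  R  = 4/7;  N0 = 1/10^(6/10);                 % Ec = R*Eb assumed, Eb/N0 = 6 dB
  u  = double(rand(1, 4) > 0.5);
  a  = mod(u*G, 2);                            % encode
  x  = sqrt(R)*(1 - 2*a);                      % BPSK, reduced energy per coded bit
  y  = x + sqrt(N0/2)*randn(1, 7);             % AWGN
  yh = double(y < 0);                          % hard decision back to bits
  s  = mod(yh*H.', 2);                         % syndrome
  I7 = eye(7);
  E  = [zeros(1, 7); I7([7 6 4 5 1 3 2], :)];  % coset leaders, one per syndrome
  a_hat = mod(yh + E(s*[4; 2; 1] + 1, :), 2);  % add the error pattern
  u_hat = a_hat(1:4)                           % systematic: info = first 4 bits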

Comparing the uncoded case with the Hamming (7,4) code (using hard decision decoding)

(Figure: bit error ratio Pb vs. Eb/N0 in dB for the uncoded and the coded transmission.)

Information Theory and Coding

Topic 06/10 Syndrome decoding

M. Sc. Marko Hennhöfer

Ilmenau University of Technology Communications Research Laboratory

Winter Semester 2019

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 1 4 Channel Coding

Decoding: ML decoding is trivial but computationally very complex, as the received CW has to be compared with all possible CWs. Impractical for larger code sets. Therefore, simplified decoding methods shall be considered.

Syndrome decoding using standard arrays (or Slepian arrays): Assume an (n,k) code with the parity check matrix H. The syndrome for a received CW is defined as: with

valid CW + error word, error pattern

For a valid received CW the syndrome will be 0. Otherwise the syndrome only depends on the error pattern.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 2 4 Channel Coding

As we get 2^k valid codewords and 2^n possibly received words, there must be 2^n - 2^k error patterns. The syndrome is only of size n-k, therefore the syndromes are not unique. E.g., (7,4) Hamming code: 16 valid CWs, 128 possibly received CWs, 112 error patterns, 2^(n-k) = 8 syndromes.

Let the different syndromes be given. For each syndrome we'll get a whole set of error patterns (a coset) that yields this syndrome.

Let , i.e., they'll yield the same syndrome.

The difference of two error patterns in the same coset must be a valid CW then.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 3 4 Channel Coding

The set can be expressed as one element plus the code set .

Within each coset, one element can be chosen as coset leader to calculate the rest of the coset.

The coset leader is chosen with respect to the minimum Hamming weight

Example: (5,2) Code

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 4 4 Channel Coding

Syndrome 0 → valid CWs

(Table: coset leader | coset | syndrome)

e.g., all error patterns that yield the syndrome 011;

choose the pattern with minimum Hamming weight as coset leader

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 5 4 Channel Coding

Syndrom decoding The same table as before only considering the coset leaders and the syndroms. syndrom table resort for easier look-up. contains already the address information

As the coset leader was chosen with the minimum Hamming weight, it is the most likely error pattern for a certain syndrome.
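A generic Octave sketch of this construction: enumerate all error patterns, compute their syndromes, and keep the pattern with minimum Hamming weight as coset leader for each syndrome (feasible for small n only):

  function T = coset_leaders(H)
    [nk, n] = size(H);                     % n-k rows, n columns
    T = nan(2^nk, n);                      % one coset leader per syndrome
    for e_dec = 0:2^n - 1
      e = double(dec2bin(e_dec, n)) - '0'; % error pattern
      s = mod(e*H.', 2);                   % its syndrome
      idx = s*2.^(nk-1:-1:0).' + 1;        % syndrome as table address
      if any(isnan(T(idx, :))) || sum(e) < sum(T(idx, :))
        T(idx, :) = e;                     % keep the lightest pattern per coset
      end
    end
  end

The received word is then corrected by adding the table entry addressed by its syndrome.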

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 6 4 Channel Coding

Example: (5,2) Code continued

Assume we receive

Calculate the syndrome ("what is wrong with the received CW?").

Look up the syndrome table at position 3 (011 binary).

Invert the corresponding bit to find the most likely transmitted CW.

Information Theory and Coding

Topic 07/10 Convolutional codes

M. Sc. Marko Hennhöfer

Ilmenau University of Technology Communications Research Laboratory

Winter Semester 2019

Convolutional codes: Features:
- no block processing; a whole sequence is convolved with a set of generator coefficients
- no analytic construction is known → good codes have been found by computer search
- description is easier as compared to block codes
- simple processing of soft decision information → well suited for iterative decoding
- coding gains from simple convolutional codes are similar to the ones from complex block codes
- easy implementation via shift registers

General structure: Example: (n,k), e.g., (3,2) convolutional code with memory m=2 (constraint length K = m+1 = 3)

The output block is computed from the current input / info-block and the m=2 previous info-blocks.

Weights for the linear combination:
0 1 1 0 0 1
1 0 1 1 0 0
0 1 0 0 0 0

Generators, usually given in octal form: [011001], [101100], [010000] → (31, 54, 20)

Formal description as convolution:

Describes the linear combinations, how to compute the n output bits from the k (m+1) input bits.

(Equation annotations: each bit of the output block is formed from the bits of the input blocks, each multiplied with the corresponding weight (0 or 1), summing over the input blocks and over the bits within the input blocks.)

General structure: often used, input blocks of size 1: (n,1), e.g., (3,1) convolutional codes

The output block is computed from the current input / info-bit and the m=2 previous info-bits.

Weights for the linear combination:
1 0 0
1 0 1
1 1 1

Generators in octal form: [100], [101], [111] → (4, 5, 7)
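A minimal Octave sketch of this encoder as a shift register (reg(1) is the current info bit, reg(2:3) the m = 2 previous bits); the input sequence matches the trellis encoding example further below:

  g = [1 0 0;                     % (4)_octal
       1 0 1;                     % (5)_octal
       1 1 1];                    % (7)_octal
  u = [0 1 0 1 1];                % info bits
  reg = [0 0 0];  a = [];
  for i = 1:length(u)
    reg = [u(i) reg(1:2)];        % shift in the current info bit
    a = [a mod(g*reg.', 2).'];    % n = 3 output bits per info bit
  end
  a                               % -> 000 111 001 100 110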

General structure: visualization as a shift register, e.g., (3,1) conv. code with generator (4, 5, 7), constraint length 3.

(Shift register diagram: current input bit X followed by m=2 memory cells, initialized with 0 0. The current state is given by the memory contents: s0 = 00, s1 = 01, s2 = 10, s3 = 11.)

Generation of the Trellis diagram (example continued):

(Table: for each current state (s0 = 00, ..., s3 = 11) and each input X ∈ {0, 1}, the resulting output block and the following state; e.g., starting from the initialized state 00 with input X = 0 the output is 000 and the following state is again 00.)

Trellis diagram (example continued):

(Trellis over several time steps: states s0 = 00, s1 = 01, s2 = 10, s3 = 11; branches are labelled with the output blocks, e.g., 000 for the s0 → s0 transitions and 101 for the s3 self-loops; solid branches correspond to current input 0, dashed branches to current input 1.)

Encoding via the Trellis diagram (example continued): Input seq.: 0 1 0 1 1 ...  Output seq.: 000 111 001 100 110 ...

(Trellis diagram as before; the path corresponding to the input sequence is highlighted. Solid branches: current input 0, dashed branches: current input 1.)

State diagram (example continued): a more compact representation

(State diagram with the four states s0 = 00, s1 = 01, s2 = 10, s3 = 11; the transitions are labelled with the output blocks 000, 001, 010, 011, 100, 101, 110, 111, e.g., 000 for the s0 self-loop and 111 for the s0 → s2 transition. Solid transitions: current input 0, dashed: current input 1.)

Encoding via the state diagram (example continued): Input seq.: 0 1 0 1 1 ...  Output seq.: 000 111 001 100 110 ...

(Same state diagram as before; start at the initialization state s0 = 00 and follow the transitions given by the input bits.)

Viterbi algorithm for hard decision decoding: termination / tail bits
Info bits:  0 1 0 1 0 0
Transm.:    000 111 001 100 001 011
Received:   001 111 011 000 001 010  (with transmission errors)

(Trellis with Viterbi metrics: for each branch the Hamming distance between the possibly decoded block and the received block is computed; the accumulated path metrics are written at the states, and at each state only the survivor, i.e., the path with the minimum accumulated metric, is kept.)

Viterbi for hard decision decoding (continued): termination / tail bits
Info bits:  0 1 0 1 0 0
Transm.:    000 111 001 100 001 011
Received:   001 111 011 000 001 010
ML est.:    000 111 001 100 001 011
Decoded:    0 1 0 1 0 0

(Trellis as before; starting from the terminating zero state, the path with the minimum accumulated metric is traced back to obtain the decoded bits.)

Blank Trellis diagram: termination / tail bits; states 00, 01, 10, 11; solid branches: current input 0, dashed branches: current input 1.

Summary: Viterbi algorithm for hard decision decoding:
- generate the Trellis diagram depending on the code (which is defined by the generator)
- for every branch compute the Viterbi metric, i.e., the Hamming distance between the possibly decoded sequence and the received sequence
- sum up the individual branch metrics through the trellis (path metrics)
- at each point choose the survivor, i.e., the path metric with the minimum weight
- at the end the zero state is reached again (for terminated codes)
- from the end of the Trellis, trace back the path with the minimum metric and get the corresponding decoder outputs
- as the sequence with the minimum Hamming distance is found, this decoding scheme corresponds to maximum likelihood decoding

Sometimes different metrics are used as Viterbi metric, such as the number of equal bits. Then we need the path with the maximum metric. A compact sketch of the algorithm follows below.
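A compact Octave sketch of the algorithm for the (3,1) code with generators (4,5,7)_octal and the received sequence from the example above (hard decision, Hamming distance as branch metric):

  g = [1 0 0; 1 0 1; 1 1 1];
  r = [0 0 1  1 1 1  0 1 1  0 0 0  0 0 1  0 1 0];   % received sequence (with errors)
  L = length(r)/3;
  states = [0 0; 0 1; 1 0; 1 1];                    % [previous bit, bit before that]
  pm = [0 inf inf inf];                             % path metrics, start in state 00
  paths = {[], [], [], []};                         % decoded bits of each survivor
  for i = 1:L
    rb = r(3*i-2:3*i);
    pm_new = inf(1, 4);  paths_new = cell(1, 4);
    for s = 1:4
      if isinf(pm(s)), continue; end
      for u = 0:1
        out  = mod(g*[u states(s, :)].', 2).';      % branch output
        ns   = 2*u + states(s, 1) + 1;              % next state index
        metr = pm(s) + sum(out ~= rb);              % accumulated Hamming distance
        if metr < pm_new(ns)                        % keep only the survivor
          pm_new(ns) = metr;  paths_new{ns} = [paths{s} u];
        end
      end
    end
    pm = pm_new;  paths = paths_new;
  end
  u_hat = paths{1}                                  % terminated code: end in state 00
                                                    % -> 0 1 0 1 0 0 as in the example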

How good are different convolutional codes?
- for block codes it is possible to determine the minimum Hamming distance between the different code words, which is the main parameter that influences the bit error rate
- for convolutional codes a similar measure can be found: the free distance is the number of bits in which two output sequences differ at least; the larger the free distance, the better the code
- a convolutional code is called optimal if its free distance is larger than that of all other codes with the same rate and constraint length
- even though the coding is a sequential process, the decoding is performed in chunks with a finite length (decoding window width)
- as convolutional codes are linear codes, the free distances are the distances between each of the code sequences and the all-zero code sequence
- the minimum free distance is the minimum Hamming weight of all arbitrarily long paths along the trellis that diverge from and remerge to the all-zero path (similar to the minimum Hamming distance for linear block codes)

Free distance (example recalled): (3,1) conv. code with generator (4,5,7).

(Trellis diagram: the Hamming weights of the branches along a path that diverges from and remerges to the all-zero path are summed; the minimum-weight diverging and remerging path has weight 6, i.e., d_free = 6.)

Note: This code is not optimal, as there exists a better code with constraint length 3 that uses the generator (5,7,7) and reaches a free distance of 8.

How good are different convolutional codes? Optimal codes have been found via computer search, e.g.:

Code rate | Constraint length | Generator (octal) | Free distance
1/2       | 3                 | (5,7)             | 5
1/2       | 4                 | (15,17)           | 6
1/2       | 5                 | (23,35)           | 7
1/3       | 3                 | (5,7,7)           | 8
1/3       | 4                 | (13,15,17)        | 10
1/3       | 5                 | (25,33,37)        | 12

Extensive tables: see John G. Proakis, "Digital Communications". As the decoding is done sequentially, e.g., with a large decoding window, the free distance gives only a hint on the number of bits that can be corrected. The higher the minimum distance, the more closely located errors can be corrected. Therefore, interleavers are used to split up burst errors.

Application example: GSM voice transmission (standardization 1982-1992, deployment starting 1992)

The voice coder produces blocks of 260 bits, from which some bits are more or less important for the speech quality:
- Class Ia: 50 bits, most sensitive to bit errors
- Class Ib: 132 bits, moderately sensitive to bit errors
- Class II: 78 bits, least sensitive to bit errors

(Block diagram: the 50 class Ia bits get 3 parity bits (CRC); together with the 132 class Ib bits and 4 termination bits they form 189 bits, which are convolutionally encoded to 378 bits; multiplexing with the 78 unprotected class II bits yields 456 bits.)

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 19 4 Channel Coding

Application example GSM voice transmission (continued):
- the voice samples are taken every 20 ms, i.e., the output of the voice coder has a data rate of 260 bit / 20 ms = 13 kbit/s
- after the encoding we get 456 bits, which means overall a code rate of about 0.57; the data rate increases to 456 bit / 20 ms = 22.8 kbit/s
- the convolutional encoder applies a rate 1/2 code with constraint length 5 (memory 4) and generator (23, 35); the blocks are terminated by appending 4 zero bits (tail bits)
- specific decoding schemes are usually not standardized; in most cases the Viterbi algorithm is used for decoding, with 2^4 = 16 states in the Trellis diagram
- in case 1 of the 3 parity bits is wrong (error in the most sensitive data), the block is discarded and replaced by the last one received correctly
- to avoid burst errors, additionally an interleaver is used at the encoder output

Application example UMTS (standardization 1990-2000, deployment starting 2001):

Example: Broadcast channel (BCH). Convolutional code: rate 1/2, constraint length K=9 (memory m=8), generator (561,753), free distance: 12

→ 2^8 = 256 states in the Trellis diagram!

Also Turbo codes are standardized. From: "Universal Mobile Telecommunications System (UMTS); Channel coding and multiplexing examples (ETSI 3GPP TR 25.944)", an 82-page document.

Application example LTE Release 11 (2014), ..., 14, 15 (2019) (latest standard for 4G / 5G phase 1, June 2019):

Example: Broadcast channel (BCH) Tail biting convolutional code: Rate 1/3 Constraint length K=7 (memory m=6) generator (133,171,165) free distance: 15

Also Turbo codes are standardized

See document: 3GPP TS 36.212 version 11.5.0, 12.8.0, 13.2.0, 14.10.0, 15.6.0

Latest standard for 5G phase 2 (ongoing):

- Turbo codes will be replaced by Low-Density Parity Check (LDPC) codes
- Tail-biting convolutional codes will be replaced by Polar codes

5 Appendix

5.1 Polynomial description of convolutional codes

Alternative description by polynomials: The convolutional coder generates an output sequence as a linear combination of the states. It is a linear time-invariant (LTI) system. In LTI systems the input-output relationship (in the time domain) is given by the convolution. If we use a polynomial notation (similar to the z-transform or Fourier transform), the convolution becomes a multiplication. The so-called D-transform is pretty similar to the z-transform, just substituting z^-1 with D, where D can be understood as a delay operator. The term transform might be a bit misleading, as we stay in the time domain (in contrast to the z-transform, where we get a frequency-domain representation).

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 25 5.1 Polynomial description of convolutional codes

Example: Rate ½ convolutional code with generators (7, 5)o

As multiplication of polynomials As convolution

For a specific input sequence

modulo-2 additions!

the second output sequence is calculated in the same way

Also the interleaving of the outputs can be described in polynomial form.

5.2 Cyclic Redundancy Check (CRC)

Cyclic redundancy check (CRC): Example: CRC length 3 bit
Frame (arbitrarily chosen): 1 0 0 1
Generator: 1 0 1 1 (generator length 4), generator polynomial g_CRC3(D) = D^3 + D^1 + 1
Frame with appended 0-bits (generator length - 1 zeros): 1 0 0 1 0 0 0
Encoding by polynomial division (easy to implement by using shift registers and XOR gates):

1 0 0 1 0 0 0
1 0 1 1
-------------
0 0 1 0 0 0 0
    1 0 1 1
-------------
0 0 0 0 1 1 0   → remainder = parity bits

Transmitted frame: 1 0 0 1 1 1 0

From: 3GPP TS 36.212 version 11.5.0 Release 11

standardization July 2014
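A minimal Octave sketch of this CRC encoding by modulo-2 polynomial division (frame and generator as in the example above):

  frame = [1 0 0 1];  gen = [1 0 1 1];
  rmd = [frame zeros(1, length(gen) - 1)];       % frame with appended zeros
  for i = 1:length(frame)
    if rmd(i) == 1                               % XOR the generator in whenever
      rmd(i:i+length(gen)-1) = mod(rmd(i:i+length(gen)-1) + gen, 2);  % the leading bit is 1
    end
  end
  parity = rmd(end-length(gen)+2:end)            % -> 1 1 0
  tx = [frame parity]                            % -> 1 0 0 1 1 1 0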

Cyclic redundancy check (CRC), decoding:

Example 1: Received word 1 0 0 0 1 1 0, Generator: 1 0 1 1
Decoding by polynomial division:

1 0 0 0 1 1 0
1 0 1 1
-------------
0 0 1 1 1 1 0
    1 0 1 1
-------------
0 0 0 1 0 0 0
      1 0 1 1
-------------
0 0 0 0 0 1 1   → quotient becomes zero, remainder = syndrome

Syndrome ≠ 0 0 0 → transmission error

Example 2: Received word 1 0 0 1 1 1 0, Generator: 1 0 1 1
Decoding by polynomial division:

1 0 0 1 1 1 0
1 0 1 1
-------------
0 0 1 0 1 1 0
    1 0 1 1
-------------
0 0 0 0 0 0 0   → quotient becomes zero, remainder = syndrome

Syndrome = 0 0 0 → valid code word

5.3 Matlab / Octave example

Simulation of a (7,4) Hamming code transmission (hard decision). Use analytical approximations as reference. Compare the coded and uncoded case.

Use the approximation for hard decision here.

Information Theory and Coding

Topic 08/10 Recursive systematic codes (RSC)

M. Sc. Marko Hennhöfer

Ilmenau University of Technology Communications Research Laboratory

Winter Semester 2019

Recursive Systematic Codes (RSC): Example: rate 1/2 RSC. Systematic: the info bit occurs directly as an output bit.

Recursive: feedback path in the shift register (taps [1 1 1] for the feedback and [1 0 1] for the feedforward path, two delay elements).

generators

feedback generator: [111] → (7)octal

feedforward generator: [101] → (5)octal

Example continued:

(Shift register and trellis diagram for the rate 1/2 RSC: states s0 = 00, s1 = 01, s2 = 10, s3 = 11; solid transitions: current input 0, dashed transitions: current input 1; each branch is labelled with the systematic and the parity output bit.)

More detailed:

(Detailed encoder diagram with two delay elements.)

Tail bits for the terminated code? They depend on the state! The tail bits are generated automatically by the encoder, depending on the encoded sequence.

(Trellis diagram of the termination: from each of the states s0 = 00, ..., s3 = 11 the encoder is driven back to the zero state within two steps; solid transitions: current input 0, dashed transitions: current input 1.)

How to terminate the code?

(Encoder diagram with a switch for termination.)

The tail bits are now generated from the state; the input to the shift register will now always be zero, i.e., the state will get filled with zeros.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 6 4 Channel Coding Example: Termination if the last state has been „11“:

As the input is not arbitrary anymore, we get only 4 cases to consider

1 11

0 01

From the state 11 we force the encoder back to the 00 state by generating the tail bits 0 1. The corresponding output sequence would be 01 11. See also the Trellis diagram for the termination.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 7 Information Theory and Coding

Topic 09/10 Turbo Codes

M. Sc. Marko Hennhöfer

Ilmenau University of Technology Communications Research Laboratory

Winter Semester 2019

Turbo codes: developed around 1993; get close to the Shannon limit.

Turbo Convolutional Codes (TCC), used in UMTS and DVB:
- parallel convolutional encoders are used
- one gets a random permutation of the input bits
- the decoder then benefits from two statistically independent encoded bits
- slightly superior to TPC, noticeably superior to TPC for low code rates (~1 dB)

Turbo Product Codes (TPC), used in WLAN, WiMAX:
- serial concatenated codes, based on block codes
- data arranged in a matrix or in a 3-dimensional array, e.g., Hamming codes along the dimensions
- good performance at high code rates
- good coding gains with low complexity

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 2 4 Channel Coding System overview:

System overview:

(Block diagram: info bits → Turbo encoder → mapping from bits to symbols, e.g., BPSK → channel (assume AWGN, noise added) → noisy received values → bit mapping / Turbo decoder with soft outputs.)

Turbo encoder (for Turbo Convolutional Codes, TCC): Structure of a rate 1/3 turbo encoder

(Block diagram: the info bits are fed to convolutional encoder 1 directly and to an identical convolutional encoder 2 via an interleaver, i.e., a pseudo-random permutation; two identical convolutional encoders.)

The Turbo code is a block code, as a certain number of bits need to be buffered first in order to fill the interleaver.

Example: UMTS Turbo encoder: rate 1/3, RSC with feedforward generator (15) and feedback generator (13)

(Block diagram: two identical RSC encoders, each with three delay elements; one is fed with the info bits directly, the other via the interleaver; Parallel Concatenated Convolutional Codes, PCCC.)

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 5 4 Channel Coding

Summary: Info bits are mapped to +1 (for 0) and -1 (for 1).

(Diagram: the encoded sequence contains the info bits directly, due to the fact that we use a systematic code; the sequence is transmitted over the AWGN channel, yielding the noisy received bits / noisy observations. The a-priori information is set to 0.5 → LLR = 0 in the first stage; the decoding provides extrinsic information and yields the LLR and therefore the bit estimate.)

Turbo decoder: Structure of a turbo decoder

(Block diagram: MAP decoder 1 and MAP decoder 2 exchange extrinsic information; the extrinsic information is passed through an interleaver in one direction and a deinterleaver in the other, as the second decoder works on the interleaved sequence.)

The MAP decoders produce a soft output which is a measure for the reliability of their decision for each of the bits. This likelihood is used as soft input for the other decoder (which decodes the interleaved sequence). The process is repeated until there is no significant improvement of the extrinsic information anymore.

MAP (maximum a-posteriori probability) decoding: Difference compared to Viterbi decoding:
- Viterbi decoders decode a whole sequence (maximum likelihood sequence estimation)
- if instead of the Hamming distance the Euclidean distance is used as Viterbi metric, we easily get the Soft-Output Viterbi Algorithm (SOVA)
- the SOVA provides a reliability measure for the decision on the whole sequence
- for the application in iterative decoding schemes, a reliability measure for each of the bits is desirable, as two decoders are used to decode the same bit independently and exchange their reliability information to improve the estimate; the independence is artificially generated by applying an interleaver at the encoding stage
- in the Trellis diagram the MAP decoder uses some bits before and after the current bit to find the most likely current bit
- MAP decoding is used in systems with memory, e.g., convolutional codes or channels with memory

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 8 4 Channel Coding

Consider the transmission over an AWGN channel applying a binary modulation scheme (higher order modulation schemes can be treated by grouping bits).

Mapping: 0 → 1 and 1 → -1

A suitable measure for the reliability is the log-likelihood ratio (LLR).

(Figure: LLR plotted versus the bit probability between 0 and 1; the LLR ranges from about -5 to +5 and is 0 for probability 0.5.)
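A one-line Octave illustration of this reliability measure (p denotes the probability that the bit is +1):

  p   = 0.01:0.01:0.99;
  LLR = log(p ./ (1 - p));                 % 0 for p = 0.5, large magnitude = reliable
  plot(p, LLR); xlabel('P(bit = +1)'); ylabel('LLR');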

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 9 4 Channel Coding

The reliability measure (LLR) for a single bit at time r under the condition that a sequence ranging from 1 to N has been received is:

with Bayes rule (relating the a-posteriori probability, the joint probability, and the a-priori probabilities of A and B):

unknown known, observed

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 10 4 Channel Coding

Example as used before Rate ½ RSC with generators 5 and 7:

The probability that the bit becomes +1 or -1 can be expressed in terms of the starting and ending states in the trellis diagram.

(Trellis segment: states s0 = 00, s1 = 01, s2 = 10, s3 = 11 before and after the current bit; transitions corresponding to input 0 (+1) and input 1 (-1).)

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 11 4 Channel Coding

Joint probability for a pair of starting and ending states: sum the probabilities of all combinations of starting and ending states that will yield a +1, and of all combinations that will yield a -1.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 12 4 Channel Coding

The probability to observe a certain pair of states depends on the past and the future bits. Therefore, we split the sequence of received bits into the past, the current, and the future bits

......

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 13 4 Channel Coding

Using Bayes rule to split up the expression into past, present and future

Looking at the Trellis diagram, we see that the future is independent of the past. It only depends on the current state.

Using again Bayes rule for the last probability

Summarizing

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 14 4 Channel Coding

Identifying the metrics to compute the MAP estimate

probability for a certain probability to observe a probability for a certain future given the certain state and bit given state and a certain past, current state, called the state and the bit before, called Forward metric Backward metric called Transition metric

Now rewrite the LLR in terms of the metrics

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 15 4 Channel Coding

How to calculate the metrics? Forward metric :

probability for a certain state and a certain past, called Forward metric

known from example: r=2 initialization probability to arrive in a certain state and the corresponding sequence that yielded that state

using again Bayes rule and

r-1 r r+1 M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 16 4 Channel Coding

How to calculate the metrics? Backward metric :

probability for a certain future given the current state, called Backward metric

example: r=N known from termination

r -2 r-1 r=N

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 17 4 Channel Coding

How to calculate the metrics? Transition metric :

probability to observe a certain state and bit given the state and the bit before, called Transition metric

for a given state s the transition probability does not depend on the past

prob. to observe a received bit for a given pair of states

r -1 r r+1 prob. for this pair of states, i.e., the a-priori prob. of the input bit

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 18 4 Channel Coding

Now some math: starting with this one expressing the a-priori probability in terms of the Likelihood ratio

with

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 19 4 Channel Coding

now combining the terms in a smart way to one expression

1 for ‘+’ and for ‘-’

we get the a-priori probability in terms of the likelihood ratio as

with

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 20 4 Channel Coding continuing with this one Now some more math:

pair of observed pair of transmitted coded bits, belonging to the bits encoded info bit

example for code rate ½. Can easily be extended noisy observation, disturbed by AWGN

+1 or -1 squared → always 1

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 21 4 Channel Coding

Now the full expression:

a-priori information

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 22 4 Channel Coding

abbreviation

from before:

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 23 4 Channel Coding

unknown at the receiver, but resulting from the corresponding branch in the Trellis diagram s → s’

due to the assumptions positive negative

with

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 24 4 Channel Coding

Interpretation:

a-priori information information about the transmitted provided by the a-posteriori (extrinsic) information. bit, taken from an initial observation. Only Gained from the applied coding estimate before running depending on the scheme the MAP algorithm channel; not on the coding scheme

In a Turbo decoder the extrinsic information of one MAP decoder is used as a-priori information of the second MAP decoder. This exchange of extrinsic information is repeated, until the extrinsic information does not change significantly anymore.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 25 4 Channel Coding

Summary: Info bits are mapped to +1 (for 0) and -1 (for 1).

(Diagram: the encoded sequence contains the info bits directly, due to the fact that we use a systematic code; the sequence is transmitted over the AWGN channel, yielding the noisy received bits / noisy observations. The a-priori information is set to 0.5 → LLR = 0 in the first stage; the decoding provides extrinsic information and yields the LLR and therefore the bit estimate.)

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 26 4 Channel Coding

Iterations:

constant over iterations → K

Iteration #1: first iteration, first decoder, a-priori LLR = 0

Iteration #2: first iteration, second decoder: uses the extrinsic information from the first one as a-priori information

Iteration #3 and further: continuing in the same fashion with further iterations

Reference: see the tutorials at www.complextoreal.com or http://www.vashe.org/. Note: we used a slightly different notation; the first tutorial has some minor errors, but most cancel out.

Turbo code example:
- code rate 1/2 (by puncturing): puncturing pattern a_r,1: oooooooooooo, a_r,2: oxoxoxoxoxox, a_r,3: xoxoxoxoxoxo; for the not transmitted bits, the decoder input is set to 0
- constraint length K = 5
- generators: feedback [1 1 1 1 1] = (37)_octal, feedforward [1 0 0 0 1] = (21)_octal
- 256 x 256 array interleaver, block length 65536 bits

Berrou, C., Glavieux, A., and Thitimajshima, P., "Near Shannon Limit Error-Correcting Coding and Decoding: Turbo Codes," Proc. IEEE Int. Conf. on Communications (ICC '93), Geneva, Switzerland, May 1993, pp. 1064-1070.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 28 Information Theory and Coding

Topic 10/10 Low density parity check (LDPC) codes

M. Sc. Marko Hennhöfer

Ilmenau University of Technology Communications Research Laboratory

Winter Semester 2019

Low-Density Parity Check (LDPC) codes:
- first proposed in 1962 by Gallager; due to computational complexity neglected until the 90s
- LDPC codes outperform Turbo codes slightly (3GPP TSG RAN WG1 #85, Document R1-164007, May 2016)
- reach the Shannon limit within hundredths of a decibel for large block sizes, e.g., size of the parity check matrix 10000 x 20000
- are already used for satellite links (DVB-S2, DVB-T2) and in optical communications
- have been adopted in IEEE wireless local area network standards, e.g., 802.11n or IEEE 802.16e (WiMAX)
- are under consideration for fifth generation wireless systems
- are block codes with parity check matrices containing only a small number of non-zero elements
- complexity and minimum Hamming distance increase linearly with the block length

Low-Density Parity Check (LDPC) codes:
- not different from any other block code (besides the sparse parity check matrix)
- design: find a sparse parity check matrix and determine the generator matrix
- difference to classical block codes: LDPC codes are decoded iteratively

Tanner graph: a graphical representation of the parity check matrix. LDPC codes are often represented by the Tanner graph. Example: (7,4) Hamming code.

(Graph: n bit nodes connected to n-k check nodes, i.e., parity check equations.)

Decoding via the message passing (MP) algorithm: likelihoods are passed back and forth between the check nodes and the bit nodes in an iterative fashion.

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 4 4 Channel Coding

The Hamming code is just used here to explain the principles; obviously it has no sparse parity check matrix. Weight of the rows: 4 4 4. Weight of the columns: 2 3 2 2 1 1 1.

For a sparse matrix: column weights << number of rows and row weights << number of columns

Sparse matrices are beneficial for the decoding process (explained later)

Regular vs. irregular LDPC codes

(Example row and column weights: irregular, e.g., row weights 4 4 4 and column weights 2 3 2 2 1 1 1; regular, e.g., constant row weight 4 and constant column weight 2.)

A code is called "regular" if the column weight is constant for every column and the row weight is constant for every row.

Regular codes can be encoded efficiently (in linear time). There is no analytic method to construct these codes; powerful codes have been found by computer search. They are often pseudo-random and irregular, with high encoding effort; the minimum Hamming distance is hard to determine.

Encoding: use Gaussian elimination to find

construct the generator matrix

calculate the set of code words

Example: length-12 regular LDPC parity check code as introduced by Gallager.

Message Passing (MP) decoding:
- soft- and hard-decision algorithms are used
- often log-likelihood ratios are used (sum-product decoding)

Example: (7,4) Hamming code with a binary symmetric erasure channel.

Initialization:

Check node sums: 1+x+0+1, x+0+1+x, 1+x+1+x. In order to be a valid code word, we want the syndrome to be zero. Therefore, x must be 0 in the first check.

0 1 x 0 1 1 x x 1 x

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 9 4 Channel Coding Message Passing (MP) decoding 1+0+0+1 x+0+1+x 1+x+1+x

1 0 0 1 1 x x

1+0+0+1 0+0+1+x 1+0+1+x in order to be a valid code word, we want the syndrome to be zero. Therefore, x must be 1 and x must also be 1.

1 0 0 1 1 x x

M.Sc. Marko Hennhöfer, Communications Research Lab Information Theory and Coding Slide: 10 4 Channel Coding Message Passing (MP) decoding 1+0+0+1 0+0+1+1 1+0+1+0

1 0 0 1 1 1 0

0 0 0

Decoding result: 1 0 0 1 1 1 0 1 0 0 1 1 1 0
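A small Octave sketch of this erasure-filling message passing (erased bits are marked with NaN; H is any parity check matrix, e.g., the one from the sketches before; whenever a check node has exactly one erased neighbour, that bit is recovered from the modulo-2 sum of the others):

  function y = mp_erasure_decode(H, y)
    changed = true;
    while changed && any(isnan(y))
      changed = false;
      for c = 1:size(H, 1)                   % loop over the check nodes
        idx = find(H(c, :));                 % bit nodes connected to check c
        er  = idx(isnan(y(idx)));            % erased ones among them
        if numel(er) == 1                    % exactly one unknown -> solve it
          y(er) = mod(sum(y(setdiff(idx, er))), 2);
          changed = true;
        end
      end
    end
  end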

Message Passing (MP) decoding, sum-product decoding:
- similar to the MAP Turbo decoding
- observations are used as a-priori information passed to the check nodes to calculate the parity bits, i.e., a-posteriori / extrinsic information
- the information from the parity bits is passed back as a-priori information for the next iteration
- actually, it has been shown that the MAP decoding of Turbo codes is just a special case of the LDPC decoding already presented by Gallager

Robert G. Gallager, Professor Emeritus, Massachusetts Institute of Technology. Among his publications you'll also find his Ph.D. thesis on LDPC codes: http://www.rle.mit.edu/rgallager/

Polar Codes

Polar codes have been introduced by Prof. Erdal Arikan in 2009. Main advantages: Polar codes have been shown to be capacity achieving, while having a low computational complexity.

Main idea: The main idea behind polar codes is the polarization of channels, i.e., a noisy transmission channel can be split up into numerous sub-channels which tend to show either perfect or useless behaviour. The best channels are selected for the information bits, whereas the bad channels get just fixed/"frozen" bits.

Reference: Erdal Arikan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Transactions on Information Theory, July 2009.

Polar Codes: Example

We use u0 as the 1st information bit. x0 is the channel input and y0 is the observation at the channel output. In our example we assume W to be a binary erasure channel (BEC) with erasure probability p.

u0 x0 y0 W

Obviously the physical channel cannot be changed by applying our "channel splitting". The multiple resulting channels can be understood as effective channels along the time domain after applying some precoding.


We just start by adding another time step, using the same channel again. This will just result in a repetition code.

u0 x0 y0 W

x1 y1 W


Now we modify this repetition code slightly by adding another information bit.

u0 x0 y0 + W

u1 x1 y1 W

The mapping between the bits u_i and the coded bits x_i can be described as a vector-matrix product:

[x0 x1] = [u0 u1] * G2,   G2 = [1 0; 1 1],   i.e., x = u * G2


Now let’s see how we can ”split” up the channel into a better one and a worse one. Starting with a look at the mutual information (as we assume the ui’s to be uniformly distributed this is also equivalent to the capacity):

I(W )= I(u0; y0)= I(x0; y0)

As the values xi are also uniformly distributed, we get the same capacity for the second channel. For both channel uses together it turns out to be:

2 I(W) = I(x0, x1; y0, y1) = I(u0, u1; y0, y1)

By applying the chain rule we can rewrite the expression as

2 I(W) = I(u0; y0, y1) + I(u1; y0, y1 | u0) = I(u0; y0, y1) + I(u1; y0, y1, u0),

where the last equality holds because u0 and u1 are independent.

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 5 / 19 Polar Codes: Example

We can interpret the two mutual information terms as belonging to two "new" channels, in some references also called bit channels:

2 I(W) = I(u0; y0, y1) + I(u1; y0, y1, u0) = I(W^-) + I(W^+),

where u0 and u1 are the inputs and (y0, y1) and (y0, y1, u0) are the outputs of the new channels:

W^-: input u0, output (y0, y1)
W^+: input u1, output (y0, y1, u0)

The decoding needs to be done in a successive way. We first decode u0 by using the observations y0 and y1. After that, we can decode u1 by using y0, y1 and the estimate that we got for u0.

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 6 / 19 Polar Codes: Example

Consider W as a BEC with erasure probability p:

W = BEC(p), i.e.,

$Y = \begin{cases} X & \text{with prob. } 1-p \\ ? & \text{with prob. } p \end{cases}$

u0 → ⊕ → x0 → BEC(p) → y0
u1 → x1 → BEC(p) → y1

For the W^- channel the following outputs can occur:

$(y_0, y_1) = \begin{cases} (u_0 + u_1,\ u_1) & \text{with prob. } (1-p)^2 \\ (?,\ u_1) & \text{with prob. } p(1-p) \\ (u_0 + u_1,\ ?) & \text{with prob. } (1-p)p \\ (?,\ ?) & \text{with prob. } p^2 \end{cases}$

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 7 / 19 Polar Codes: Example

Note that only in the first case it will be possible to estimate u0 and u1 from the observations y0 and y1; the remaining cases contain at least one erasure symbol. Overall, W^- can be considered as a BEC with erasure probability p^- = 1 − (1 − p)^2 = 2p − p^2:

W^- = BEC(p^-) = BEC(2p − p^2)

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 8 / 19 Polar Codes: Example

For the W^+ channel we additionally have the estimate of u0 available. The following outputs can occur:

$(y_0, y_1, u_0) = \begin{cases} (u_0 + u_1,\ u_1,\ u_0) & \text{with prob. } (1-p)^2 \\ (?,\ u_1,\ u_0) & \text{with prob. } p(1-p) \\ (u_0 + u_1,\ ?,\ u_0) & \text{with prob. } (1-p)p \\ (?,\ ?,\ u_0) & \text{with prob. } p^2 \end{cases}$

Note that in the first three cases it will be possible to estimate u1 as well; just in the last case with two erasures it won't be possible to get u1. Overall, W^+ can be considered as a BEC with erasure probability p^+ = p^2:

W^+ = BEC(p^+) = BEC(p^2)
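The two erasure probabilities can also be checked numerically. The small simulation below is our own sketch (not part of the lecture material): it sends random bits over two independent BEC(p) uses and counts how often u0 cannot be recovered from (y0, y1), and how often u1 cannot be recovered from (y0, y1, u0).

import random

def simulate_split_channels(p, trials=200000):
    # Monte Carlo estimate of the erasure probabilities of W^- and W^+ for a BEC(p).
    erased_minus = erased_plus = 0
    for _ in range(trials):
        u0, u1 = random.randint(0, 1), random.randint(0, 1)
        x0, x1 = u0 ^ u1, u1                       # length-2 polar transform
        y0 = None if random.random() < p else x0   # BEC: erase with probability p
        y1 = None if random.random() < p else x1
        if y0 is None or y1 is None:               # W^-: u0 needs both observations
            erased_minus += 1
        if y0 is None and y1 is None:              # W^+: u1 is lost only if both are erased
            erased_plus += 1
    return erased_minus / trials, erased_plus / trials

p = 0.2
pm, pp = simulate_split_channels(p)
print("simulated:", round(pm, 4), round(pp, 4))
print("analytic :", 2 * p - p ** 2, p ** 2)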

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 9 / 19 Polar Codes: Example

Now we can already see how the channel polarization works. Before the transformation, we had two equally bad channels,

u0 → ⊕ → x0 → BEC(p) → y0
u1 → x1 → BEC(p) → y1

and after the transformation, a better and a worse (bit) channel:

W^-: u0 → (y0, y1), a BEC(2p − p^2)
W^+: u1 → (y0, y1, u0), a BEC(p^2)

Note that the average erasure probability remained p, since (p^- + p^+)/2 = (2p − p^2 + p^2)/2 = p.

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 10 / 19 Polar Codes: Example

The whole structure is now repeated in order to get a better separation into almost perfect channels and useless channels:

v0 → ⊕ → u0 → ⊕ → x0 → W → y0
v1 → ⊕ → u1 → x1 → W → y1
v2 → u2 → ⊕ → x2 → W → y2
v3 → u3 → x3 → W → y3

(first stage: u0 = v0 ⊕ v2, u1 = v1 ⊕ v3, u2 = v2, u3 = v3; second stage: x0 = u0 ⊕ u1, x1 = u1, x2 = u2 ⊕ u3, x3 = u3)

$[x_0\ x_1\ x_2\ x_3] = [v_0\ v_1\ v_2\ v_3]\, G_4, \qquad G_4 = G_2^{\otimes 2} = \begin{bmatrix} G_2 & 0 \\ G_2 & G_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}$
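Since G4 is just the second Kronecker power of G2, larger generator matrices can be built recursively. A short sketch using numpy (the function names are ours, just for illustration):

import numpy as np

G2 = np.array([[1, 0],
               [1, 1]])

def polar_generator(n):
    # G_N = G2 kron ... kron G2 (n times), N = 2^n
    G = np.array([[1]])
    for _ in range(n):
        G = np.kron(G, G2)
    return G

def polar_encode(v, n):
    # x = v G_N over GF(2)
    return (v @ polar_generator(n)) % 2

print(polar_generator(2))                        # reproduces the matrix G4 above
print(polar_encode(np.array([1, 0, 1, 1]), 2))   # an example code word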

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 11 / 19 Polar Codes: Example

Applying the same idea as before ...

v0 → ⊕ → u0 → W^-
v1 → ⊕ → u1 → W^+
v2 → u2 → W^-
v3 → u3 → W^+

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 12 / 19 Polar Codes: Example

... yields a further polarization of the channels.

v0 → W^{--}
v1 → W^{+-}
v2 → W^{-+}
v3 → W^{++}

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 13 / 19 Polar Codes: Example

If we use for example p = 0.2 we get the following effective/bit-channels.

v0: p^{--} = 0.5904
v1: p^{+-} = 0.0784
v2: p^{-+} = 0.1296
v3: p^{++} = 0.0016
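For the BEC the polarized erasure probabilities can be computed exactly from the recursions p^- = 2p − p^2 and p^+ = p^2. The short sketch below is our own code; the list ordering is chosen so that for n = 2 it matches the labels v0…v3 above.

def polarized_erasure_probs(p, n):
    # Erasure probabilities of the 2^n bit channels of a BEC(p) after n polarization steps.
    probs = [p]
    for _ in range(n):
        minus = [2 * q - q * q for q in probs]   # worse channels (W^-)
        plus = [q * q for q in probs]            # better channels (W^+)
        probs = minus + plus
    return probs

print(polarized_erasure_probs(0.2, 2))   # approximately [0.5904, 0.0784, 0.1296, 0.0016]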

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 14 / 19 Polar Codes: Example

For a rate 1/2 code we would now just use the two good channels to transmit the information bits, whereas the two bad channels just get "frozen" bits, i.e., we can simply set them to 0.

frozen bit 0: p^{--} = 0.5904
info bit u0: p^{+-} = 0.0784
frozen bit 0: p^{-+} = 0.1296
info bit u1: p^{++} = 0.0016

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 15 / 19 Polar Codes: Example

If we continue the same principle, we get closer and closer to the ideally polarized channels. Looking at the histogram of the resulting effective erasure probabilities illustrates this:

[Figure: histograms of the effective erasure probabilities (occurrence vs. erasure probability) for 4, 16, 256, and 1048576 subchannels; with growing block length the erasure probabilities concentrate near 0 and 1.]
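This concentration can be checked numerically with the polarized_erasure_probs function sketched a few slides earlier; the 0.1 and 0.9 thresholds below are an arbitrary choice for this illustration.

# assumes polarized_erasure_probs(p, n) from the earlier sketch
for n in (2, 4, 8, 20):
    probs = polarized_erasure_probs(0.2, n)
    good = sum(q < 0.1 for q in probs) / len(probs)   # fraction of nearly perfect subchannels
    bad = sum(q > 0.9 for q in probs) / len(probs)    # fraction of nearly useless subchannels
    print(f"N = {2 ** n:8d}: {good:.2f} nearly perfect, {bad:.2f} nearly useless")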

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 16 / 19 Polar Codes: Example Decoding:

We start by decoding the W^- channel first, as we need u0 to decode u1, i.e., the W^+ channel, later on.

v0 → ⊕ → u0 → ⊕ → x0 → W → y0
v1 → ⊕ → u1 → x1 → W → y1
v2 → u2 → ⊕ → x2 → W → y2
v3 → u3 → x3 → W → y3

Decoding function:

$D^-(y_0, y_1) = \begin{cases} y_0 \oplus y_1 & \text{for } y_0, y_1 \neq\ ? \\ ? & \text{otherwise} \end{cases}$

y0: 1 1 0 0 ? ? 0 1 ?
y1: 1 0 1 0 0 1 ? ? ?
u0: 0 1 1 0 ? ? ? ? ?

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 17 / 19 Polar Codes: Example Decoding:
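The decoding function D^- just defined is easy to express in code; here erasures are represented by None (our own convention), and the loop reproduces the columns of the table above.

def D_minus(y0, y1):
    # First-level BEC decoder: u0 = y0 XOR y1 if neither observation is erased.
    if y0 is None or y1 is None:
        return None            # erasure
    return y0 ^ y1

# columns of the table: (y0, y1) -> u0
for y0, y1 in [(1, 1), (1, 0), (0, 1), (0, 0), (None, 0),
               (None, 1), (0, None), (1, None), (None, None)]:
    print((y0, y1), "->", D_minus(y0, y1))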

Doing the same for u2 in order to get the estimate for v0 afterwards.

v0 → ⊕ → u0 → ⊕ → x0 → W → y0;  (y0, y1) → D^- → û0
v1 → ⊕ → u1 → x1 → W → y1
v2 → u2 → ⊕ → x2 → W → y2;  (y2, y3) → D^- → û2
v3 → u3 → x3 → W → y3

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 18 / 19 Polar Codes: Example Decoding:

Now we can get the estimate of v0 from uˆ0 and uˆ2 in the same way.

(Same figure as on the previous slide: û0 and û2 are obtained from the two D^- blocks.)

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 19 / 19 Polar Codes: Example Decoding:

Now we can get the estimate of v0 from uˆ0 and uˆ2 in the same way.

v0 → ⊕ → u0 → ⊕ → x0 → W → y0;  (y0, y1) → D^- → û0;  (û0, û2) → D^- → v̂0
v1 → ⊕ → u1 → x1 → W → y1
v2 → u2 → ⊕ → x2 → W → y2;  (y2, y3) → D^- → û2
v3 → u3 → x3 → W → y3

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 20 / 19 Polar Codes: Example Decoding:

After that we decode the W^+ channel, i.e., v2 from û0, û2 and v̂0.

(Same figure as on the previous slide, now used together with the decoding function D^+ defined below.)

$D^+(\hat{u}_0, \hat{u}_2, \hat{v}_0) = \begin{cases} \hat{u}_2 & \text{for } \hat{u}_2 \neq\ ? \\ \hat{u}_0 \oplus \hat{v}_0 & \text{for } \hat{u}_0 \neq\ ? \\ ? & \text{otherwise} \end{cases}$

û0: * * 0 1 0 1 ?
û2: 0 1 ? ? ? ? ?
v̂0: * * 0 0 1 1 ?
v̂2: 0 1 0 1 1 0 ?
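Similarly for D^+; the asterisks in the table are "don't care" entries and are represented by None in the test vectors below (again our own sketch; the second case of the definition of course also needs v̂0 to be known, which is the situation shown in the table).

def D_plus(u0_hat, u2_hat, v0_hat):
    # Second-level BEC decoder: use u2_hat directly if available,
    # otherwise recover v2 (= u2) from u0_hat XOR v0_hat, otherwise erasure.
    if u2_hat is not None:
        return u2_hat
    if u0_hat is not None and v0_hat is not None:
        return u0_hat ^ v0_hat
    return None

# columns of the table: (u0_hat, u2_hat, v0_hat) -> v2_hat, '*' -> None
tests = [(None, 0, None), (None, 1, None), (0, None, 0), (1, None, 0),
         (0, None, 1), (1, None, 1), (None, None, None)]
for t in tests:
    print(t, "->", D_plus(*t))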

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 21 / 19 Polar Codes: Example Decoding:

After that we decode the W^+ channel, i.e., v2 from û0, û2 and v̂0.

v0 → ⊕ → u0 → ⊕ → x0 → W → y0;  (y0, y1) → D^- → û0;  (û0, û2) → D^- → v̂0
v1 → ⊕ → u1 → x1 → W → y1
v2 → u2 → ⊕ → x2 → W → y2;  (y2, y3) → D^- → û2;  (û0, û2, v̂0) → D^+ → v̂2
v3 → u3 → x3 → W → y3

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 22 / 19 Polar Codes: Example Decoding:

Finally:

v0 → ⊕ → u0 → ⊕ → x0 → W → y0;  (y0, y1) → D^- → û0;  (û0, û2) → D^- → v̂0
v1 → ⊕ → u1 → x1 → W → y1;  (y0, y1, û0) → D^+ → û1;  (û1, û3) → D^- → v̂1
v2 → u2 → ⊕ → x2 → W → y2;  (y2, y3) → D^- → û2;  (û0, û2, v̂0) → D^+ → v̂2
v3 → u3 → x3 → W → y3;  (y2, y3, û2) → D^+ → û3;  (û1, û3, v̂1) → D^+ → v̂3

M. Sc., Dipl.-Ing. (FH) Marko Hennhöfer Information Theory and Coding July 25, 2019 23 / 19