Tornado Codes

Tornado codes Scribe notes for 15-853

Introduction

Similar to Hamming codes and Reed-Solomon (RS) codes, Tornado codes belong to the family of forward error correction (FEC) codes. While the former two codes focus on coding efficiency (few redundant bits), Tornado codes serve a different purpose. Designed by Michael Luby, Michael Mitzenmacher, Amin Shokrollahi et al., Tornado codes give up some efficiency in exchange for linear-time encoding and decoding. Software-based implementations of Tornado codes are about 100 times faster on small lengths and about 10,000 times faster on larger lengths than software-based Reed-Solomon erasure codes.

In these notes, we focus on the ability of Tornado codes to correct erasures. Erasures are more common than errors in the Internet environment, because bitstreams are organized into packets, and packets are usually either received or lost (erased). The following discussion is thus well suited to applications such as multicast of streaming media over the Internet.

The notes are organized as follows. We first describe expander graphs, which are the foundation of Tornado codes. We then build Tornado codes on top of expander graphs and show that, under certain assumptions, Tornado codes can correct the erasures with high probability.

Expander Graph

[Figure 1: An expander graph, with n nodes on the left and n/2 nodes on the right; a subset of k nodes on the left and its βk neighbors on the right are marked.]

We consider a special graph that is composed of two sets of nodes, as shown in Figure 1. The left set has $n$ nodes, while the right set has $n/2$ nodes. Given any two constants $0 < \alpha < 1$ and $\beta > 0$, we define $(\alpha, \beta)$-expansion as follows.

Note: These notes are based on the lecture given by Prof. Bruce Maggs on Sep. 23, 2002. The author of the notes is Cha Zhang.

Definition 1: A graph has $(\alpha, \beta)$-expansion if every subset of $k \le \alpha n$ nodes on the left has at least $\beta k$ neighbors on the right, where $\alpha$ and $\beta$ are fixed constants that satisfy $0 < \alpha < 1$ and $\beta > 0$.

Conclusion 1: In the expander graph in Figure 1, $\alpha$ and $\beta$ must satisfy $\alpha\beta \le \frac{1}{2}$.

Proof: From the definition of $(\alpha, \beta)$-expansion, we know that every subset of $k = \alpha n$ nodes on the left has at least $\beta k = \alpha\beta n$ neighbors on the right. However, the right set has at most $n/2$ nodes. Therefore:

$$\beta k = \alpha\beta n \le \frac{n}{2} \;\Rightarrow\; \alpha\beta \le \frac{1}{2}. \qquad \Box$$

Given a graph as in Figure 1, the existence of $(\alpha, \beta)$-expansion can be established by Upfal's theorem [1].

Theorem 1 (Upfal 89): Suppose each node on the left has degree $d$ ($d$ is a positive integer), and each node on the right has degree $2d$. View each node as a collection of $d$ (or $2d$) mini-nodes. Select a random permutation on the $dn \times dn$ bipartite graph of mini-nodes. Collapse mini-nodes back into original nodes. If

$$d > \beta + 1 + \frac{\beta + 1 + \ln 2\beta}{\ln\frac{1}{2\alpha\beta}}$$

then with probability > 0, the graph has $(\alpha, \beta)$-expansion.

We discuss two interesting cases for the above theorem.

Case 1: When we have a very small $\alpha$, i.e., $\alpha \to 0$, we have $\ln\frac{1}{2\alpha\beta} \to \infty$. From the condition in Theorem 1 we know that $d > \beta + 1$. Therefore the maximum $\beta$ we may take satisfies:

$$\beta < d - 1.$$

Case 2: If $\alpha$ is fixed and we have a large enough degree $d$, so that $\beta$ is negligible compared with $d$, the condition in Theorem 1 can be simplified as:

$$d > \frac{\beta + 1 + \ln 2\beta}{\ln\frac{1}{2\alpha\beta}}$$

so that we are allowed to take $\beta$ such that $\alpha\beta \to \frac{1}{2}$.

In this lecture we will need expanders where $\beta > \frac{d}{2}$. To see how we should choose $\alpha$ for $\beta > \frac{d}{2}$, we let $\beta = \frac{d}{2}$ and plug it into the condition in Theorem 1. We have:

$$d > \frac{d}{2} + 1 + \frac{\frac{d}{2} + 1 + \ln d}{\ln\frac{1}{\alpha d}}.$$

Assuming $d \gg 1$ and dropping the constant terms, we get approximately:

$$\frac{d}{2} > \frac{\frac{d}{2} + \ln d}{\ln\frac{1}{\alpha d}} \quad\text{or}\quad \ln\frac{1}{\alpha d} > 1 + \frac{2}{d}\ln d.$$

Therefore:

$$\frac{1}{\alpha d} > e^{1 + \frac{2}{d}\ln d} = e \cdot d^{2/d}, \quad\text{that is,}\quad \alpha < \frac{1}{e \cdot d \cdot d^{2/d}}.$$

Notice that $d^{2/d} \approx 1$ for large $d$ (see footnote 1). We obtain that

$$\alpha < \frac{1}{e \cdot d} \quad\text{or}\quad \alpha\beta = \alpha\frac{d}{2} < \frac{1}{2e}.$$

This is not bad because we know that the maximum value of $\alpha\beta$ is $\frac{1}{2}$ based on Conclusion 1.

Recently, O. Reingold gave a simple deterministic construction of an expander in [2], where $\beta$ can be almost as large as possible. That is, $\beta = (1 - \varepsilon)d$ where $\varepsilon$ is an arbitrarily small number.
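The construction in Theorem 1 and its degree condition can be tried out numerically. The following is a minimal sketch, not a definitive implementation: the function names `upfal_min_degree` and `random_bipartite` are hypothetical, the formula coded is the condition $d > \beta + 1 + (\beta + 1 + \ln 2\beta)/\ln\frac{1}{2\alpha\beta}$, and merging parallel edges when mini-nodes are collapsed is a simplifying assumption.

```python
import math
import random

def upfal_min_degree(alpha, beta):
    """Right-hand side of the degree condition: d must exceed this value."""
    if not (0 < alpha < 1 and beta > 0 and 2 * alpha * beta < 1):
        raise ValueError("need 0 < alpha < 1, beta > 0 and 2*alpha*beta < 1")
    return beta + 1 + (beta + 1 + math.log(2 * beta)) / math.log(1 / (2 * alpha * beta))

def random_bipartite(n, d, seed=0):
    """Random graph with n left nodes (degree d) and n/2 right nodes (degree 2d).

    Each right node is split into 2d mini-nodes, the dn mini-nodes are
    shuffled, and left node i takes the i-th block of d of them.
    Returns, for each left node, the sorted list of distinct right neighbors
    (assumption: parallel edges are merged).
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    rng = random.Random(seed)
    right_minis = [r for r in range(n // 2) for _ in range(2 * d)]
    rng.shuffle(right_minis)
    return [sorted(set(right_minis[i * d:(i + 1) * d])) for i in range(n)]

# Case 1: for tiny alpha the required degree approaches beta + 1.
print(upfal_min_degree(1e-12, 3))
# beta = d/2 with d = 100 and alpha = 1/(2*e*d): the required degree is below 100.
print(upfal_min_degree(1 / (2 * math.e * 100), 50))
```

The theorem only guarantees expansion with positive probability, so a single random graph from `random_bipartite` is not guaranteed to be an expander.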

Footnote 1: This might not be obvious but can be justified as follows. We know that $d^{2/d} = e^{\ln d^{2/d}} = e^{\frac{2}{d}\ln d}$. For large $d$, we have $\frac{2}{d}\ln d \to 0$. Therefore, $d^{2/d}$ should be close to 1.

[Figure 2: Unshared neighbors for a graph, with n nodes on the left (degree d) and n/2 nodes on the right; the unshared neighbors on the right are marked.]

There is one property of expander graphs that is extremely important to Tornado codes. It is called the unshared neighbor property. Given a subset of nodes on the left, an unshared neighbor is a node on the right that is the neighbor of only one node of that subset, as shown in Figure 2. In other words, it is not shared as a neighbor by multiple nodes on the left. The $(\alpha, \delta)$ unshared neighbor property is defined as follows.

Definition 2: A graph has the $(\alpha, \delta)$ unshared neighbor property if for any subset of $k \le \alpha n$ nodes on the left, at least $\delta k$ of them have an unshared neighbor on the right.

We have the following theorem on the choice of $\delta$.

Theorem 2: If a graph with degree $d$ on the left has $(\alpha, \beta)$-expansion, then it has the $(\alpha, \delta)$ unshared neighbor property with

$$\delta \ge \frac{2\beta}{d} - 1.$$

This bound is positive exactly when $\beta > \frac{d}{2}$.

Proof: Given a subset of $k \le \alpha n$ nodes on the left, from the definition of $(\alpha, \beta)$-expansion we know that it has at least $\beta k$ neighbors on the right. Let $\gamma$ be the fraction of these neighbors that have only one neighbor in the subset, so that at least $\gamma\beta k$ of them are unshared neighbors. The remaining $(1-\gamma)\beta k$ have at least 2 neighbors in the subset. If we count the edges between the left subset and its neighbors on the right, the total is at most $dk$, because the graph has degree $d$ on the left. Therefore, we have:

$$\gamma\beta k + 2(1-\gamma)\beta k \le dk$$

where the second term on the left has a factor of 2 because $(1-\gamma)\beta k$ nodes have at least 2 neighbors (therefore 2 edges) in the subset. Simplifying this inequality, we have:

$$\gamma \ge 2 - \frac{d}{\beta} \quad\text{or}\quad \gamma\beta k \ge (2\beta - d)k.$$

Since $\gamma\beta k$ nodes on the right are unshared neighbors, at least $\frac{\gamma\beta k}{d}$ nodes on the left have an unshared neighbor on the right. Here we divide $\gamma\beta k$ by $d$ because a node on the left has at most $d$ neighbors and they can all be unshared neighbors. Based on the definition of the $(\alpha, \delta)$ unshared neighbor property, we can therefore take

$$\delta k \ge \frac{\gamma\beta k}{d} \ge \frac{(2\beta - d)k}{d} \quad\text{or}\quad \delta \ge \frac{2\beta}{d} - 1. \qquad \Box$$
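The edge-counting argument in the proof can be checked on a small example. The sketch below is illustrative only: `unshared_stats` and the toy graph are hypothetical, and the graph is assumed to list distinct right neighbors for each left node. For any subset $S$ of left nodes with degree $d$, the counting argument gives (number of unshared neighbors) $\ge 2|N(S)| - d|S|$, and at least (unshared)/$d$ nodes of $S$ own an unshared neighbor.

```python
from collections import Counter

def unshared_stats(graph, subset):
    """graph[v] lists the right neighbors of left node v.

    Returns a tuple:
      (number of right neighbors of the subset,
       number of unshared neighbors,
       number of subset nodes that own at least one unshared neighbor).
    """
    counts = Counter(r for v in subset for r in set(graph[v]))
    unshared = {r for r, c in counts.items() if c == 1}
    owners = sum(1 for v in subset if unshared & set(graph[v]))
    return len(counts), len(unshared), owners

# Hypothetical toy graph: 4 left nodes of degree d = 2, 4 right nodes.
graph = [[0, 1], [1, 2], [2, 3], [0, 3]]
d = 2

neighbors, unshared, owners = unshared_stats(graph, [0, 1])
# Subset {0, 1}: neighbors {0, 1, 2}; right node 1 is shared, 0 and 2 are not.
print(neighbors, unshared, owners)       # 3 2 2
print(unshared >= 2 * neighbors - d * 2)  # True
```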

Tornado codes

Consider a simple coding method based on the expander graph. As shown in Figure 3, let the left $n$ nodes be the $n$ message bits, and let the right $n/2$ nodes be the parity bits. Denote the message bits as $b_0, b_1, b_2, \ldots, b_{n-1}$, and the parity bits as $p_0, p_1, p_2, \ldots, p_{n/2-1}$. Each parity bit on the right is set to the exclusive or of its neighbors on the left. For example, parity bit $p_0$ can be defined as:

$$p_0 = b_0 \oplus b_2 \oplus b_4$$

where $\oplus$ stands for exclusive or. Such codes are called Tornado codes. Let the graph have degree $d$ on the left. Computing all the parity bits takes fewer than $nd$ exclusive or operations, since each edge contributes at most one. In other words, encoding Tornado codes has complexity $O(nd)$, which is linear in $n$ for a constant $d$.
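The encoding step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `encode`, the adjacency-list representation, and the toy graph are all hypothetical, and the toy graph is not an expander. It only shows that each edge costs one exclusive or.

```python
def encode(message_bits, graph, num_parity):
    """Compute parity bits as the XOR of their left neighbors.

    graph[i] lists the parity nodes adjacent to message bit i
    (assumed distinct), so the loop does one XOR per edge.
    """
    parity = [0] * num_parity
    for i, bit in enumerate(message_bits):
        for r in graph[i]:
            parity[r] ^= bit
    return parity

# Hypothetical toy graph: p0 = b0 ^ b2 ^ b4 and p1 = b1 ^ b3 ^ b5.
graph = [[0], [1], [0], [1], [0], [1]]
message = [1, 0, 1, 1, 0, 0]
print(encode(message, graph, 2))  # [0, 1]
```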

[Figure 3: Tornado codes. The n message bits are the nodes on the left, each with degree d; the n/2 parity bits are the nodes on the right.]

We next work on the decoding of Tornado codes. We assume that the graph in Figure 3 has the $(\alpha, \delta)$ unshared neighbor property, where $\delta > 0$. In the simplest case, suppose no parity bits are lost, and $k \le \alpha n$ message bits are lost. The idea is to decode the message bits recursively by making use of the $(\alpha, \delta)$ unshared neighbor property.

[Figure 4: Decoding Tornado codes when all the parity bits are received. The n message bits on the left are split into lost and received message bits; among the n/2 parity bits on the right, each of degree 2d, the unshared neighbor of a lost bit $b_j^{\mathrm{loss}}$ is denoted $p_j^{\mathrm{loss}}$.]

Given the subset of $k \le \alpha n$ message bits that have been lost, we know that at least $\delta k$ of them have an unshared neighbor on the right. Take any message bit $b_j^{\mathrm{loss}}$ ($0 \le j < \delta k$) among the $\delta k$ ones and consider it jointly with its unshared neighbor $p_j^{\mathrm{loss}}$. As shown in Figure 4, if $p_j^{\mathrm{loss}}$ has degree $2d$, then $p_j^{\mathrm{loss}}$ is the exclusive or of $2d - 1$ received message bits and the lost bit $b_j^{\mathrm{loss}}$. Therefore, $b_j^{\mathrm{loss}}$ is recoverable: it is the exclusive or of $p_j^{\mathrm{loss}}$ and those $2d - 1$ received bits.

After we recover the $\delta k$ lost message bits that have an unshared neighbor on the right, $k - \delta k$ lost message bits are still left. We can apply the above method again and again until all the lost message bits are recovered. The key is that after we recover the $\delta k$ lost message bits, some of the remaining $k - \delta k$ bits will have unshared neighbors that they did not have before.

The above decoding method has linear time complexity, because each time we recover a lost message bit, we need at most $2d$ exclusive or operations.

The above algorithm decodes Tornado codes when no parity bits are lost. However, in practice this assumption does not hold, because parity bits are as likely to be lost as message bits. We need to design a new scheme to encode and decode the message bits. Figure 5 shows such a scheme based on a recursive structure.
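The repeated use of unshared neighbors described above can be sketched as a simple peeling loop. This is one possible way to organize it, not necessarily how the original implementation works: the function name `decode`, the use of `None` for erased bits, and the toy graph are assumptions, all parity bits are assumed received, and each `graph[i]` is assumed to list distinct parity nodes. A parity bit with exactly one lost neighbor plays the role of an unshared neighbor of that lost bit.

```python
def decode(received, parity, graph):
    """Recover erased message bits (None entries) from intact parity bits.

    graph[i] lists the parity nodes adjacent to message bit i.
    Raises ValueError if no lost bit has an unshared neighbor left.
    """
    bits = list(received)
    # residual[r]: parity r XORed with all of its known message neighbors.
    residual = list(parity)
    # missing[r]: erased message bits adjacent to parity r.
    missing = [set() for _ in parity]
    for i, b in enumerate(bits):
        for r in graph[i]:
            if b is None:
                missing[r].add(i)
            else:
                residual[r] ^= b
    ready = [r for r in range(len(parity)) if len(missing[r]) == 1]
    while ready:
        r = ready.pop()
        if len(missing[r]) != 1:
            continue  # the lost bit was already recovered through another parity
        i = next(iter(missing[r]))
        bits[i] = residual[r]  # the only unknown term of parity r
        for q in graph[i]:
            missing[q].discard(i)
            residual[q] ^= bits[i]
            if len(missing[q]) == 1:
                ready.append(q)  # q just became an unshared neighbor
    if any(b is None for b in bits):
        raise ValueError("decoding stuck: no lost bit has an unshared neighbor")
    return bits

# Hypothetical toy graph with 4 message bits and 4 parity bits.
graph = [[0, 1], [1, 2], [2, 3], [0, 3]]
parity = [0, 1, 1, 0]  # parity bits of the message [1, 0, 1, 1]
print(decode([None, None, 1, 1], parity, graph))  # [1, 0, 1, 1]
```

Each message bit is recovered once and touches each of its edges a constant number of times, which matches the linear-time claim for constant degree.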

[Figure 5: Enhanced Tornado codes in a recursive manner. From left to right: n message bits, n/2 parity bits, ..., $2n^{1/4}$ parity bits, $n^{1/4}$ parity bits, and finally a Reed-Muller code (can correct up to a fixed constant fraction of errors).]

The idea is pretty straightforward. To protect the $\frac{n}{2}$ parity bits from erasures, we use another $\frac{n}{4}$ parity bits to protect them. We further protect the $\frac{n}{4}$ parity bits with $\frac{n}{8}$ parity bits, etc. At a certain point, e.g., when the number of nodes reduces to $n^{1/4}$, we stop the recursion and apply a Reed-Muller code to protect the $n^{1/4}$ nodes. The decoding process starts from the Reed-Muller code and propagates to the left until we recover all the lost message bits. Even though the decoding of the Reed-Muller code has quartic complexity, it is applied to only $n^{1/4}$ bits, and $(n^{1/4})^4 = n$, so the overall decoding time is still linear in $n$.

In the above scheme, we used

$$\frac{n}{2} + \frac{n}{4} + \frac{n}{8} + \cdots + 2n^{1/4} + n^{1/4} + \Theta(n^{1/4}) \approx n$$

parity bits to protect the $n$ message bits. Here $\Theta(n^{1/4})$ is the number of parity bits used in the Reed-Muller code to protect $n^{1/4}$ bits.

Since the decoding process goes from right to left, if we fail at a certain level, we will fail to decode the erased message bits. In the worst case, if all the errors are in the Reed-Muller code, only $\Theta(n^{1/4})$ errors can crash the code. If the Reed-Muller code is decoded successfully, due to the $(\alpha, \delta)$ unshared neighbor property we know that at a level with $m$ message or parity bits, we can correct up to $\alpha m$ erasures.

We next study, for a certain level with $m$ message or parity bits, the probability that we will have more than $\alpha m$ erasures. If this probability is low, we can claim that with high probability Tornado codes are able to correct the erasures. Assume that the erasures happen at random positions (see footnote 2). Let the probability of each bit being erased be $p$. Let $X$ be a random variable that denotes the number of erasures among the $m$ bits. Then $X$ follows a binomial distribution. The expectation of $X$ is $E[X] = mp$. Given a certain value $r$, we have:

m r mr PrX  r    p 1 p  r  where PrX  r stands for the probability of random variable X taking value r. The probability that we have more than r erasures can be represented by: m m k mk PrX  r    p 1 p k r  k  It can be proved3 that: m m k mk m r PrX  r    p 1 p    p k r  k   r  r r  m  m  me  Moreover, given the inequality that         , we can have:  r   r   r  r r r m r  me  r  mpe   e  EX  PrX  r    p    p        r   r   r   r  Suppose we choose value r that is greater than 2e  EX , we get: r r r  e  EX   e  EX    1  PrX  r           r   2e  EX   2 

Footnote 2: This is a tricky assumption, and we know that it is not true in a real environment. That is, in practice erasures are most likely to be bursty. However, we can assume that we randomly permute the order of the message bits before sending, so that the erasures are random with respect to the original sequence of message bits.

Footnote 3: We can go through the math to prove this, but here is how to see it conceptually. $\binom{m}{r} p^r$ sums, over every way of choosing $r$ bits out of the $m$ bits, the probability that all $r$ chosen bits are erased. This is an over-count of $\Pr[X \ge r]$ because the same case in $X \ge r$ may be counted many times in the above sum. For example, let $m = 3$, $r = 1$. Let the 3 bits be $b_0, b_1, b_2$. The case where $b_0$ and $b_1$ are both erased ($b_2$ not erased) will be counted twice in the above sum (once when we choose $b_0$ and say it is erased, and again when we choose $b_1$ and say it is erased).

If, in addition, $r \ge \log_2 m$, then $\Pr[X \ge r] \le \frac{1}{m}$, which is a very small number given that $m$ is large. In the Tornado code, at any level, the number of erasures we can correct is of the order $r = \Theta(n^{1/4})$ or more. Therefore, with high probability we will be able to correct the erasures.

The above analysis gives a quick bound on the ability of Tornado codes to correct erasures. We can give a better bound by applying the inequality:

$$\Pr[X \ge (1+\varepsilon)E[X]] \le e^{-\frac{\varepsilon^2 E[X]}{3}}, \quad 0 < \varepsilon \le 1.$$

The probability that Tornado codes fail at a certain level with $m$ message or parity bits is:

$$\Pr[X \ge \alpha m] = \Pr[X \ge (1+\varepsilon)pm] \le e^{-\frac{\varepsilon^2 pm}{3}}$$

where $\varepsilon = \frac{\alpha}{p} - 1$ and $m \ge n^{1/4}$. It can be easily shown that if

$$\frac{\sqrt{6\ln n/\alpha}}{n^{1/8}} \le \varepsilon \le 1$$

then, because $\varepsilon \le 1$ means $p \ge \frac{\alpha}{2}$ and hence $\varepsilon^2 pm \ge 3\ln n$, the failure probability is bounded as:

$$\Pr[X \ge \alpha m] \le e^{-\frac{\varepsilon^2 pm}{3}} \le \frac{1}{n}.$$

Roughly speaking, as the required $\varepsilon$ is very small for large $n$, we can conclude that Tornado codes have a high probability of being successfully decoded when $p < \alpha$.
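To get a feel for how quickly this failure bound shrinks with the level size $m$, it can be evaluated directly. This is a minimal sketch: the function name `level_failure_bound` and the sample values of $m$, $p$ and $\alpha$ are assumptions chosen for illustration, and the bound is only evaluated in the range $0 < \varepsilon \le 1$ where this form of the inequality applies.

```python
import math

def level_failure_bound(m, p, alpha):
    """Upper bound exp(-eps^2 * p * m / 3) on Pr[X >= alpha * m],
    where eps = alpha / p - 1 must satisfy 0 < eps <= 1."""
    eps = alpha / p - 1
    if not 0 < eps <= 1:
        raise ValueError("bound applies only when alpha/2 <= p < alpha")
    return math.exp(-eps**2 * p * m / 3)

# Assumed example: erasure probability 0.08, correctable fraction 0.1.
for m in (600, 6000, 60000):
    print(m, level_failure_bound(m, 0.08, 0.1))
```

With these values $\varepsilon = 0.25$, so the exponent is $-m/600$ and the bound falls off exponentially in $m$.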

References

[1] E. Upfal. An O(log N) deterministic packet routing scheme. In 21st Annual ACM Symposium on Theory of Computing, pages 241-250. ACM, May 1989.
[2] M. Capalbo, O. Reingold, S. Vadhan and A. Wigderson. Randomness Conductors and Constant-Degree Lossless Expanders. To appear in STOC '02, May 19-21, 2002, Montreal, Quebec, Canada.