Generalized sphere-packing upper bounds on the size of codes for combinatorial channels

Daniel Cullina
Dep. of Electrical & Computer Eng.
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
Urbana, IL 61801
[email protected]

Negar Kiyavash
Dep. of Industrial and Enterprise Systems Eng.
Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
Urbana, IL 61801
[email protected]

Abstract—A code for a combinatorial channel is a feasible point in an integer linear program derived from that channel. Sphere-packing upper bounds are closely related to the fractional relaxation of this program. When bounding highly symmetric channels, this formulation can often be avoided, but it is essential in less symmetric cases. We present a few low-complexity upper bounds on the value of the relaxed linear program. We also discuss a more general bound derived from the codeword constraint graph for the channel. This bound is not necessarily computationally tractable. When there is a family of channels with the same constraint graph, tractable bounds can be applied to each channel and the best bound will apply to the whole family.

Sphere-packing upper bounds on the size of a zero-error code are fundamentally related to linear programming. However, many classical combinatorial channels are highly symmetric. For these, it is often possible to get the best possible sphere-packing bound without directly considering any linear programs. For less symmetric channels, it is still possible to obtain many upper bounds without writing down a linear program. Recently a new upper bound, explicitly derived via linear programming, was applied to the deletion channel by Kulkarni and Kiyavash [1]. It was subsequently applied to grain error channels [2], [3] and multipermutation channels [4]. We will refer to this as the local degree bound and present a generalized version.

Sphere-packing and sphere-covering arguments have been applied in an ad hoc fashion throughout the coding theory literature. This work aims at presenting a unifying framework that permits such arguments in their most general form, applicable to both uniform and nonuniform error sphere sizes. More precisely, we derive a series of bounds resulting from approximations to the packing and covering linear programs. We characterize each bound as the solution to a linear program and study the relationships between them. These bounds use varying levels of information about the structure of the error model and consequently make tradeoffs between performance and complexity.

Alternatively, we can directly consider the constraints on codes and forget the particular channel that produced the constraints. This approach offers a better upper bound at the cost of increased complexity. In some cases, there is a family of channels that all have the same codes but each gives a different sphere-packing bound. Cullina and Kiyavash applied this idea to the deletion channel to improve the best known upper bound for a fixed number of errors [5]. We show that the same phenomenon exists for substitution and erasure errors. Channels that perform various mixtures of substitutions and erasures all have the same codes but lead to different bounds: the Hamming bound, the Singleton bound, and a family of intermediate bounds.

In Section I, we discuss the linear programs associated with sphere-packing bounds. In Section II, we discuss various techniques for obtaining nonuniform sphere-packing bounds. In Section III, we discuss families of channels that have the same codes but give different sphere-packing bounds.

I. SPHERE-PACKING BOUNDS AND LINEAR PROGRAMS

In a combinatorial channel, also known as an adversarial channel, for each channel input there is a set of possible outputs. This behavior of the channel can be represented by a bipartite graph which we call the channel graph. The left vertices of this graph are the channel inputs and the right vertices are the channel outputs.

For each output, at most one of the associated inputs can be in a zero-error code. Consequently, a zero-error code is a packing of the neighborhoods of the inputs into the output space. Maximum set packing is NP-hard for general set systems, so we seek efficiently computable bounds. Maximum set packing is naturally expressed as an integer linear program. The relaxed problem, maximum fractional set packing, provides an upper bound on the original packing problem.

A. The linear program

Let U be the set of channel inputs and let V be the set of channel outputs. For u ∈ U, let N(u) ⊆ V be the neighborhood of u in the bipartite channel graph (the set of outputs that can be produced from u). For v ∈ V, let N(v) ⊆ U be the neighborhood of v in the bipartite channel graph (the set of inputs that can produce v). Each neighborhood of an output gives a constraint in the primal linear programming problem.

Definition 1. Let A ∈ {0, 1}^{|U|×|V|} be the bipartite adjacency matrix for the channel graph H. Let 1_S be the indicator vector for the set S. Note that A 1_{{v}} = 1_{N(v)} and 1_{{u}}^T A = 1_{N(u)}^T.

Definition 2. For a channel graph H, the size of the maximum set packing is

p(H) = max{1^T x : x ∈ {0, 1}^{|U|}, A^T x ≤ 1}

and the size of the maximum fractional set packing is

p∗(H) = max{1^T x : x ∈ R^{|U|}, x ≥ 0, A^T x ≤ 1}.

The size of the minimum set covering is

κ(H) = min{1^T y : y ∈ {0, 1}^{|V|}, Ay ≥ 1}

and the size of the minimum fractional set covering is

κ∗(H) = min{1^T y : y ∈ R^{|V|}, y ≥ 0, Ay ≥ 1}.

By strong linear programming duality, p∗(H) = κ∗(H). The channel must output something for each possible input, so N(u) is nonempty for each u ∈ U. This ensures that the primal programs are bounded and the dual programs are feasible.

The value of the linear program can be computed in polynomial time. However, we are usually interested in channels with exponentially large input and output spaces. To analyze such channels, even simpler bounds are desired.

II. FOUR APPROXIMATIONS TO THE MAXIMUM FRACTIONAL SET COVER

In this section we consider four simple upper bounds on the maximum fractional set packing and the minimum fractional set cover number. Each of these bounds is the value of some simplified linear program. They are derived either by relaxing the constraints of the primal program or by tightening the constraints of the dual program. The degree of u in the channel graph is |N(u)|.

Definition 3. For a channel graph H, define the minimum degree bounds
p∗_MD(H) = max{1^T x : x ∈ R^{|U|}, x ≥ 0, 1^T A^T x ≤ |V|},
κ∗_MD(H) = min{|V| z : z ∈ R, z ≥ 0, A1z ≥ 1}
         = min{1^T y : y = 1z, z ∈ R, z ≥ 0, Ay ≥ 1}.

The bounds are equal:

p∗_MD(H) = κ∗_MD(H) = |V| / min_{u∈U} |N(u)|.

The linear program for p∗(H) contains a constraint for each v ∈ V: 1_{N(v)}^T x ≤ 1. In the linear program for p∗_MD we have replaced these constraints with their sum, Σ_{u∈U} |N(u)| x_u ≤ |V|. Thus the feasible space has been strictly increased. The optimal x in the new program for p∗_MD(H) assigns all weight to the input with the smallest degree. By mechanically taking the dual of the program for p∗_MD, we obtain the first program for κ∗_MD. The second program for κ∗_MD is a restriction of the program for κ∗: the same weight must be assigned to each output.

If the minimum degree is far from the average degree, p∗_MD is likely to be a bad approximation of p∗. A better bound comes from considering all of the input degrees.

Definition 4. For a channel graph H, define the degree sequence bound

p∗_DS(H) = max{1^T x : x ∈ R^{|U|}, 0 ≤ x ≤ 1, 1^T A^T x ≤ |V|}.

This is the same as the program for p∗_MD, except that we have added the constraint x ≤ 1. Again, it is easy to see that this bounds p(H). Each u ∈ U is in the neighborhood of at least one v ∈ V. Thus in the linear program for p∗(H), u is included in at least one constraint and x_u ≤ 1. This means that the feasible set for p∗_DS(H) includes the feasible set for p∗(H), so p∗_DS(H) ≥ p∗(H).

The optimum of this program can be found greedily. Sort the points in U by degree from low to high. In each step, take the first point in the list and add weight to it until you hit a constraint. If the global constraint is binding, we are done. If we hit the local constraint x_u ≤ 1 first, move on to the next point in the list. This algorithm finds a maximum set C ⊆ U such that for all c′ ∈ U \ C,

Σ_{c∈C} |N(c)| ≤ |V| ≤ |N(c′)| + Σ_{c∈C} |N(c)|.

In other words, degree information does not rule out the possibility that C is a code. However, if any other input is added to C, the resulting set cannot be a code.

The next bound is intermediate between p∗_MD and p∗_DS.

Definition 5. For a channel graph H, define the degree threshold bound

p∗_DT(H, d) = max{1^T x : x ∈ R^{|U|}, 0 ≤ x ≤ 1, a^T x ≤ |V|}

where a_u = d if |N(u)| ≥ d, and a_u = 0 if |N(u)| < d. Let p∗_DT(H) = min_d p∗_DT(H, d).

This is equivalent to applying the degree sequence bound to a modified degree sequence, one where all degrees less than d have been reduced to 0 and all degrees at least d have been reduced to d. The value of the program equals |S| + |V|/d, where S = {u ∈ U : |N(u)| < d} is the set of members of U with small degree. If we let d = min_{u∈U} |N(u)|, then S is empty and the bound reduces to the minimum degree bound.

A. The local degree bound

Definition 6. For a channel graph H, define the local degree bound

κ∗_LD(H) = min{1^T y : y ∈ R^{|V|}, y ≥ 0, By ≥ 1},

where B ∈ R^{E(H)×|V|} and B_{(u,v),w} = |N(u)| if v = w, and B_{(u,v),w} = 0 if v ≠ w.

To create the program for κ∗_LD, we have replaced each constraint 1_{N(u)}^T y ≥ 1 with |N(u)| constraints: for each v ∈ N(u), we require 1_{{v}}^T y ≥ 1/|N(u)|. The old constraint is the sum of the new constraints, so the new constraints are more restrictive. This results in the program in the above definition. Now each y_v is subject to a constraint for each u ∈ N(v). These can be combined as y_v ≥ max_{u∈N(v)} 1/|N(u)|, or y_v min_{u∈N(v)} |N(u)| ≥ 1. Thus, the optimal assignment is y_v = max_{u∈N(v)} 1/|N(u)|.
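This optimal assignment is easy to evaluate directly. The sketch below (a hypothetical three-input channel and helper names of our choosing, not an example from the paper) computes y_v = max over u ∈ N(v) of 1/|N(u)|, checks that the result is a feasible fractional cover, and sums it over the outputs:

```python
# Hypothetical channel graph: each input u is mapped to its
# neighborhood N(u), the set of outputs it can produce.
N = {'a': {1}, 'b': {1, 2}, 'c': {2, 3, 4}}

def local_degree_bound(N):
    # Invert the channel graph: N_out[v] is the set of inputs that can produce v.
    N_out = {}
    for u, outs in N.items():
        for v in outs:
            N_out.setdefault(v, set()).add(u)
    # y_v = max over inputs u adjacent to v of 1/|N(u)|
    y = {v: max(1.0 / len(N[u]) for u in ins) for v, ins in N_out.items()}
    # Feasibility check: every input's coverage requests are honored.
    assert all(sum(y[v] for v in N[u]) >= 1 - 1e-9 for u in N)
    return sum(y.values())

print(local_degree_bound(N))  # 1 + 1/2 + 1/3 + 1/3 = 13/6
```

Here input 'a' has degree one, so the output it shares with 'b' must carry weight one, while the outputs of the degree-three input 'c' only receive requests of 1/3 unless a smaller-degree input is also adjacent.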

Summing over the outputs gives

κ∗_LD(H) = Σ_{v∈V} max_{u∈N(v)} 1/|N(u)|.

Because we created the program for κ∗_LD by restricting the program for κ∗, κ∗_LD ≥ κ∗. We can also show that κ∗_LD is always a better bound than p∗_DS.

Lemma 1. For a channel graph H, κ∗_LD(H) ≤ p∗_DS(H).

Proof: We construct a point x that is feasible for the primal linear program associated with p∗_DS with value κ∗_LD. The dual program for κ∗_LD(H) is

p∗_LD(H) = max{1^T z : z ∈ R^{E(H)}, z ≥ 0, B^T z ≤ 1}.

We can map the parameter space of this program into the parameter space of the program for p∗_DS in a weight-preserving way: let x_u = Σ_{v∈N(u)} z_{(u,v)}. Now we just need to show that this map sends feasible points in the program for p∗_LD to feasible points in the program for p∗_DS. In the program for p∗_LD, z_{(u,v)} is part of one constraint:

Σ_{t∈N(v)} |N(t)| z_{(t,v)} ≤ 1.

If we sum all the constraints and apply the mapping, we get

Σ_{v∈V} Σ_{t∈N(v)} |N(t)| z_{(t,v)} ≤ |V|
Σ_{u∈U} |N(u)| Σ_{v∈N(u)} z_{(u,v)} ≤ |V|
Σ_{u∈U} |N(u)| x_u ≤ |V|,

which is exactly the global constraint in the program for p∗_DS. If we sum only the constraints involving u, we get

Σ_{v∈N(u)} Σ_{t∈N(v)} |N(t)| z_{(t,v)} ≤ |N(u)|
Σ_{v∈N(u)} (|N(u)| z_{(u,v)} + Σ_{t∈N(v)\{u}} |N(t)| z_{(t,v)}) ≤ |N(u)|
|N(u)| x_u + Σ_{v∈N(u)} Σ_{t∈N(v)\{u}} |N(t)| z_{(t,v)} ≤ |N(u)|.

Thus x_u ≤ 1, which is the local constraint on x_u in the program for p∗_DS.

B. Generalization of the local degree bound

One way to look at the local degree bound is as a distributed algorithm to find a fractional covering. Each input needs coverage totaling one and it requests an equal amount of coverage from each output. Each output receives a list of requests and must honor the largest. More generally, the inputs could request coverage in a non-symmetric manner. If we have a feasible point for the program for κ∗, the following function will run one iteration of our distributed algorithm and give us another feasible point.

Lemma 2. Let y be a feasible vector in the program for κ∗(H) and let

f(y)_v = max_{u∈N(v)} y_v / Σ_{w∈N(u)} y_w.

Then f(y) is also feasible and 1^T f(y) ≤ 1^T y.

Proof: For each input u, we need Σ_{v∈N(u)} f(y)_v ≥ 1:

Σ_{v∈N(u)} max_{t∈N(v)} y_v / Σ_{w∈N(t)} y_w ≥ Σ_{v∈N(u)} y_v / Σ_{w∈N(u)} y_w = 1,

so f(y) is feasible. Because y is feasible, for each t ∈ U, Σ_{w∈N(t)} y_w ≥ 1. Then

1^T f(y) = Σ_{v∈V} max_{t∈N(v)} y_v / Σ_{w∈N(t)} y_w ≤ Σ_{v∈V} max_{t∈N(v)} y_v = 1^T y.

For any channel graph H, 1 is a feasible vector in the program for κ∗(H). The optimum of the program for κ∗_LD(H) is simply f(1).

C. Symmetric channel graphs

If all inputs have the same degree d, then κ∗_LD = p∗_DS = p∗_DT = p∗_MD = |V|/d. However, this is not necessarily equal to κ∗. The binary erasure channel provides an example. The erasure output covers both inputs, so p = κ = κ∗ = 1. Both inputs have degree 2, so all of the approximations equal 3/2.
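This coincidence of the four approximations can be checked numerically. The sketch below is hypothetical verification code (not from the paper); the greedy loop implements the degree sequence program as described above, and the other three bounds are evaluated from their closed forms for the binary erasure channel:

```python
N = {0: {'0', 'e'}, 1: {'1', 'e'}}   # binary erasure channel graph
V = {'0', '1', 'e'}

degs = sorted(len(outs) for outs in N.values())

# Minimum degree bound: |V| / min_u |N(u)|
p_md = len(V) / degs[0]

# Degree sequence bound: greedily add weight in [0, 1] per input,
# from low degree to high, subject to sum_u |N(u)| x_u <= |V|.
budget, p_ds = len(V), 0.0
for d in degs:
    w = min(1.0, budget / d)
    p_ds += w
    budget -= w * d

# Degree threshold bound: min over d of |{u : |N(u)| < d}| + |V|/d
p_dt = min(sum(1 for e in degs if e < d) + len(V) / d for d in degs)

# Local degree bound: sum over outputs of max_{u in N(v)} 1/|N(u)|
k_ld = sum(max(1.0 / len(N[u]) for u in N if v in N[u]) for v in V)

print(p_md, p_ds, p_dt, k_ld)  # all four equal 1.5, while kappa* = 1
```

All four approximations return 3/2, even though covering both inputs with the erasure output shows κ = κ∗ = 1.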
For each of the four upper bounds on κ∗, there is an analogous lower bound on p∗. The values of these lower bounds depend on large-degree outputs rather than small-degree inputs. For example, the local degree lower bound on p∗ uses the following feasible point in the linear program for p∗:

x_u = min_{v∈N(u)} 1/|N(v)|.

The lower bounds on p∗ do not bound the value of the original integer program, p(H). However, they may allow us to verify that our upper bound approximations are good. If all output degrees are d′, then all four of these lower bounds equal |U|/d′. If both the input and output degrees are constant, then |E| = |U|d = |V|d′, so |V|/d = |U|/d′ = |U||V|/|E| = κ∗.

D. Examples

Consider the single-asymmetric-error channel. The input and output of this channel are binary vectors of length n. The channel acts separately on each entry of the vector. A zero input produces a zero output, but a one input can produce either a one or a zero (an error).

Each input with k ones can produce k + 1 outputs. The all-zero input has degree one, so p∗_MD = 2^n. There are Σ_{i=0}^{k−1} C(n, i) inputs with degree strictly less than k + 1. Thus

p∗_DT(H) = min_k Σ_{i=0}^{k−1} C(n, i) + 2^n/(k + 1).

Each output v with k ones is adjacent to an input with k ones and (for k < n) some inputs with k + 1 ones. The minimum degree among these inputs is k + 1, so in the optimal assignment in the program for κ∗_LD, y_v = 1/(k + 1). Thus

κ∗_LD = Σ_{i=0}^{n} C(n, i) · 1/(i + 1) = 1/(n + 1) Σ_{i=0}^{n} C(n + 1, i + 1) = (2^{n+1} − 1)/(n + 1).

To verify that this is a good bound on κ∗, we compute the value of the local degree lower bound on p∗. Let j = n − k. For k ≥ 1, each input u with k ones is adjacent to an output with k − 1 ones. That output has n − k + 1 = j + 1 zeros, so it has degree j + 2. The input with zero ones is adjacent only to the output with zero ones, which has degree n + 1. Thus the value of the local degree lower bound is

1/(n + 1) + Σ_{j=0}^{n−1} C(n, j) · 1/(j + 2) = 2^{n+1}/(n + 1) − (2^{n+2} − 2)/((n + 1)(n + 2)).

In this example, the input degrees are concentrated around the average degree, so the degree threshold bound performs reasonably well.
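For small n, the closed form for κ∗_LD can be cross-checked by building the channel graph explicitly. The following is a hypothetical verification sketch (function names are our own, not the paper's):

```python
from itertools import product

def kappa_ld_asymmetric(n):
    # Inputs and outputs are binary n-tuples; an input u can produce u itself
    # or any vector obtained by flipping a single 1 down to 0.
    inputs = list(product((0, 1), repeat=n))
    N = {}
    for u in inputs:
        outs = {u}
        for i, bit in enumerate(u):
            if bit == 1:
                outs.add(u[:i] + (0,) + u[i + 1:])
        N[u] = outs
    # kappa*_LD = sum over outputs v of max_{u in N(v)} 1/|N(u)|
    outputs = set().union(*N.values())
    return sum(max(1.0 / len(N[u]) for u in N if v in N[u]) for v in outputs)

n = 4
print(kappa_ld_asymmetric(n), (2**(n + 1) - 1) / (n + 1))  # both approximately 6.2
```

An input with k ones has |N(u)| = k + 1, so each output with k ones receives its largest request, 1/(k + 1), from the input equal to itself, matching the derivation above.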
There is very little variation in input degree within the neighborhood of a single output, so the local degree bound performs very well.

Now we give an example where the bounds do not perform as well. Consider the channel with input and output sets [q] = {1, 2, . . . , q}. For each input i, let the possible outputs be all j ≤ i. For this channel, κ∗, κ∗_LD, p∗_DS, p∗_DT, and p∗_MD are all distinct. The output one can be produced by any input, so κ(H) = κ∗(H) = p(H) = 1. The input one has degree one, so p∗_MD(H) = q. If we choose d as the degree threshold, then p∗_DT(H, d) ≈ d + q/d. The best choice is d = √q, so p∗_DT(H) ≈ 2√q. The sum of the smallest k degrees is C(k+1, 2), so p∗_DS(H) is approximately the largest k such that C(k+1, 2) ≤ q, which is approximately √(2q). Finally, each output j can be produced from each input i ≥ j, and input i has degree i, so y_j = 1/j and κ∗_LD = Σ_{j=1}^{q} 1/j, which is approximately log q. In this example, the average input degree is (q + 1)/2, so we might hope to get an upper bound on κ∗ of about 2. However, the input degrees are not concentrated around the average, so none of our four approximations is very good.

III. CONFUSION GRAPHS AND FAMILIES OF CHANNELS WITH THE SAME CODES

We can restate the constraints on a zero-error code in another way. If two channel inputs have a common output, they cannot appear together in a zero-error code. The confusion graph, G, for a channel records these pairwise constraints. The vertices of the confusion graph are the channel inputs and two inputs are adjacent if and only if they have a common output. A zero-error code is an independent set in the confusion graph. The size of the largest independent set in a graph G is denoted α(G).

There are two natural integer linear programs that express the maximum independent set for a graph. The first has a constraint for each edge:

α(G) = max{1^T x : x ∈ {0, 1}^{|U|}, ∀(t, u) ∈ E(G), x_t + x_u ≤ 1}.

However, relaxing the integrality constraint for this program gives a useless upper bound. The vector with x_u = 1/2 for all u ∈ U is feasible, so the optimum value is at least |U|/2 regardless of the structure of the graph.

To create the confusion graph from the channel graph, we added a clique for the neighborhood of each channel output. The confusion graph may contain maximal cliques whose edges came from multiple outputs. In the second integer linear program, we include a constraint for each maximal clique in the confusion graph. Let Ω be the set of maximal cliques in G. Then the second program for maximum independent set is

α(G) = max{1^T x : x ∈ {0, 1}^{|U|}, ∀S ∈ Ω, Σ_{u∈S} x_u ≤ 1}

and its dual, the program for minimum clique cover, is

θ(G) = min{1^T y : y ∈ {0, 1}^{|Ω|}, ∀u ∈ U, Σ_{S∈Ω : u∈S} y_S ≥ 1}.

The fractional versions of these programs, α∗(G) and θ∗(G), give a nontrivial upper bound on α(G). However, there may be exponentially many maximal cliques in G, so θ∗(G) is not guaranteed to be efficiently computable.

Theorem 1. Let H be a bipartite channel graph and let G be the codeword constraint graph derived from H. Then

p(H) = α(G) ≤ θ∗(G) ≤ p∗(H) ≤ κ∗_LD(H) ≤ p∗_DS(H) ≤ p∗_DT(H) ≤ p∗_MD(H).

Proof: The programs for p(H) and α(G) contain the same feasible points, but the feasible space of the program for α∗(G) is contained in the feasible space of the program for p∗(H). The program for p∗ is a maximization, and p∗, p∗_DS, p∗_DT, and p∗_MD form a sequence of relaxations of that program. The program for κ∗ is a minimization and the program for κ∗_LD is a restriction of it. By Lemma 1, κ∗_LD ≤ p∗_DS.

There are many channel graphs that reduce to the same constraint graph. If we are lucky, we can find a relatively nice family of such channels. A more tractable alternative to computing θ∗(G) is to find κ∗(H) for each channel graph in the family and take the best bound.
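The part of this chain that does not require a linear programming solver can be verified numerically on the q-input channel from Section II-D. The sketch below is a hypothetical check (not code from the paper): p(H) is found by brute force, and the four approximations are evaluated as before for q = 5.

```python
from itertools import combinations

q = 5
N = {i: set(range(1, i + 1)) for i in range(1, q + 1)}   # input i can output any j <= i

# p(H) by brute force: largest set of inputs with pairwise-disjoint neighborhoods.
p = max(len(S) for r in range(q + 1) for S in combinations(N, r)
        if all(N[a].isdisjoint(N[b]) for a, b in combinations(S, 2)))

V = set().union(*N.values())
degs = sorted(len(outs) for outs in N.values())

# Local degree bound: here y_j = 1/j, so this is the harmonic number H_q.
k_ld = sum(max(1.0 / len(N[u]) for u in N if v in N[u]) for v in V)

# Degree sequence bound via the greedy algorithm of Section II.
budget, p_ds = len(V), 0.0
for d in degs:
    w = min(1.0, budget / d)
    p_ds += w
    budget -= w * d

p_dt = min(sum(1 for e in degs if e < d) + len(V) / d for d in degs)
p_md = len(V) / degs[0]

assert p <= k_ld <= p_ds <= p_dt <= p_md
print(p, k_ld, p_ds, p_dt, p_md)
```

Every pair of inputs shares output 1, so p(H) = 1, while the approximations spread out exactly as the analysis above predicts.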
However, nice family of such channels. A more tractable alternative to the input degrees are not concentrated around the average, so computing θ∗(G) is to find κ∗(H) for each channel graph in none of our four approximations are very good. the family and take the best bound. 1 ∗ A. Hamming and Singleton Bounds limn→∞ n log κ Consider the channel that takes an q-ary vector of length log 4 n as its input, erases a symbols, and substitutes up to b n n n−a symbols. Thus there are q channel inputs, a q outputs, n Pb n−a i and each input can produce a i=0 i (q − 1) possible outputs. Two inputs share a common output if and only if their Hamming distance is at most s = a+2b. For each choice of q, n, and s, we have a family of channels with identical confusion 1 2 log 3 graphs. Call the bipartite graph for the q-ary n-symbol a- erasure b-substitution channel Hq,n,a,b. These channels are all input and output regular, so it is easy to compute κ∗: n n−a δ ∗ a q 0 1 1 κ (Hq,n,a,b) = 2 n Pb n−a i a i=0 i (q − 1) qn−a Fig. 1. The curved line is the Hamming bound, which is = . lim 1 log κ∗(A ). The upper straight line the Singleton Pb n−a i n→∞ n 4,n,0,s/2 (q − 1) 1 ∗ i=0 i bound, which is limn→∞ n log κ (A4,n,s,0). The straight line run- ning from ( 1 , 1 log 3) to (1, 0) is the best sphere-packing bound, Two special cases give familiar bounds. For even s, if we 2 2 lim 1 log min κ∗(A ). set a = 0 and b = s/2, then we obtain the Hamming bound: n→∞ n 0≤b≤s/2 4,n,s−2b,b n ∗ q κ (Hq,n,0,s/2) = . Ps/2 n i that converges to average input degree while the number of low i=0 i (q − 1) degree inputs is negligible. Recently, Kulkarni and Kiyavash Setting a = s and b = 0, gives us the Singleton bound: applied the local degree bound to the deletion channel [1], ∗ n−s κ (Hq,n,s,0) = q . Levenshtein’s asymptotically but significantly im- proving it for small n. This is not surprising given Lemma 1. 
For q = 2, the Hamming bound is always the best bound in this family. When q is at least 3, each bound in the family is relevant for some range of parameters. An easy calculation shows that κ∗(H_{q,n,a,b}) ≤ κ∗(H_{q,n,a+2,b−1}) when n − 1 ≥ a + qb, or equivalently, when 2(n − 1) ≥ qs − (q − 2)a. This means that the Hamming bound (a = 0) is the best in the family when 2(n − 1) ≥ qs, or s/(n − 1) ≤ 2/q. For a ≥ 2, H_{q,n,a,b} gives the best bound when

(q − 2)(a − 2) + 2(n − 1) ≤ qs ≤ (q − 2)a + 2(n − 1),

or equivalently, when

(q − 2)/q · (a − 2)/(n − 1) + 2/q ≤ s/(n − 1) ≤ (q − 2)/q · a/(n − 1) + 2/q.

Thus each bound is the best in the family in some region of 2/q ≤ s/(n − 1) ≤ 1. This can be translated into an asymptotic bound. For fixed δ with 2/q ≤ δ ≤ 1 and s = δn,

lim_{n→∞} min_{0≤b≤s/2} (1/n) log κ∗(H_{q,n,s−2b,b}) = (1 − δ) log(q − 1).

This family of bounds fills in the convex hull of the Hamming and Singleton bounds. Figure 1 plots this optimized bound, the Hamming bound, and the Singleton bound for q = 4.

B. Applications to deletion insertion channels

Levenshtein applied the degree threshold bound to the deletion channel in 1966 [6]. He considered the asymptotic behavior of the upper bound with n, the length of the input string, going to infinity and s, the number of deletions, fixed. In this regime, the channel graph becomes approximately regular. This makes it possible to choose a degree threshold that converges to the average input degree while the number of low-degree inputs is negligible. Recently, Kulkarni and Kiyavash applied the local degree bound to the deletion channel [1], matching Levenshtein's bound asymptotically but significantly improving it for small n. This is not surprising given Lemma 1.

Any code capable of correcting s deletions can also correct any combination of s total insertions and deletions. Two input strings can appear in an s-deletion-correcting code if and only if the deletion distance between them is more than s. This is exactly analogous to the role Hamming distance plays for erasure and substitution channels. Cullina and Kiyavash applied the degree threshold bound to channels performing a mixture of deletions and insertions [5]. In the asymptotic regime with n going to infinity and s fixed, the best bound comes from a channel that performs approximately qs/(q + 1) deletions and s/(q + 1) insertions, where q is the alphabet size.

REFERENCES

[1] A. A. Kulkarni and N. Kiyavash, "Non-asymptotic upper bounds for deletion correcting codes," IEEE Transactions on Information Theory, 2012. [Online]. Available: http://arxiv.org/abs/1211.3128
[2] N. Kashyap and G. Zémor, "Upper bounds on the size of grain-correcting codes," arXiv preprint arXiv:1302.6154, 2013. [Online]. Available: http://arxiv.org/abs/1302.6154
[3] R. Gabrys, E. Yaakobi, and L. Dolecek, "Correcting grain-errors in magnetic media," in Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on, 2013, pp. 689-693.
[4] S. Buzaglo, E. Yaakobi, T. Etzion, and J. Bruck, "Error-correcting codes for multipermutations," 2013. [Online]. Available: http://authors.library.caltech.edu/36946/
[5] D. Cullina and N. Kiyavash, "An improvement to Levenshtein's upper bound on the cardinality of deletion correcting codes," in IEEE International Symposium on Information Theory Proceedings, Jul. 2013.
[6] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," in Soviet Physics Doklady, vol. 10, 1966, pp. 707-710.