arXiv:1405.1464v2 [cs.IT] 11 Jun 2015

Generalized sphere-packing and sphere-covering bounds on the size of codes for combinatorial channels

Daniel Cullina, Student Member, IEEE, and Negar Kiyavash, Senior Member, IEEE

Abstract—Many of the classic problems of coding theory are highly symmetric, which makes it easy to derive sphere-packing upper bounds and sphere-covering lower bounds on the size of codes. We discuss the generalizations of sphere-packing and sphere-covering bounds to arbitrary error models. These generalizations become especially important when the sizes of the error spheres are nonuniform. The best possible sphere-packing and sphere-covering bounds are solutions to linear programming problems. We derive a series of bounds from approximations to packing and covering problems and study the relationships and trade-offs between them. We compare sphere-covering lower bounds with other graph theoretic lower bounds such as Turán's theorem. We show how to obtain upper bounds by optimizing across a family of channels that admit the same codes. We present a generalization of the local degree bound of Kulkarni and Kiyavash and use it to improve the best known upper bounds on the sizes of single deletion correcting codes and single grain error correcting codes.

This work was supported in part by NSF grants CCF 10-54937 CAR and CCF 10-65022. The material in this paper was presented (in part) at the International Symposium on Information Theory, Honolulu, July 2014 [1].

Daniel Cullina is with the Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801 (email: [email protected]).

Negar Kiyavash is with the Department of Industrial and Enterprise Systems Engineering and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801 (email: [email protected]).

I. INTRODUCTION

The classic problem of coding theory, correcting substitution errors in a vector of q-ary symbols, is highly symmetric. First, if s substitution errors are required to change a vector x into another vector y, then s errors are also required to change y into x. Second, the number of vectors that can be produced from x by making up to s substitutions, the size of the sphere around x, does not depend on x. The sizes of these spheres play a crucial role in both upper and lower bounds on the size of the largest substitution-error-correcting codes. The Hamming bound is a sphere-packing upper bound and the Gilbert-Varshamov bound is a sphere-covering lower bound. The two symmetries we have described make the proofs of the Hamming and Gilbert-Varshamov bounds extremely simple.

Many other interesting error models do not have this degree of symmetry. Substitution errors with a restricted set of allowed substitutions are sometimes of interest. The simplest example is the binary asymmetric error, which can replace a one with a zero but cannot replace a zero with a one. Binary asymmetric errors have neither of the two symmetries. Erasure and deletion errors differ from substitution errors in a more fundamental way: the error operation takes an input from one set and produces an output from another.

In this paper, we will discuss the generalizations of sphere-packing and sphere-covering bounds to arbitrary error models. These generalizations become especially important when the sizes of the error spheres are nonuniform. Sphere-packing bounds are fundamentally related to linear programming and the best possible bounds are solutions to linear programs. In the highly symmetric cases, including many of the classical error models, it is often possible to get the best possible versions of sphere-packing bounds without directly considering any linear programs. For less symmetric channels, the linear programming perspective becomes essential.

In fact, a new bound, explicitly derived via linear programming, was applied by Kulkarni and Kiyavash [2] to find an upper bound on the size of deletion-correcting codes. It was subsequently applied to grain errors [3], [4] and multipermutation errors [5]. The bound constructs a feasible point for the dual of the sphere-packing linear program. We will refer to this as the local degree bound. Because error models like deletion errors act on an exponentially large input space, exact computation of the linear program solution is often intractable, and simplified bounds such as the local degree bound are useful.

Sphere-packing and sphere-covering arguments have been applied in an ad hoc fashion throughout the coding theory literature. We attempt to present a unifying framework such that the arguments in their most general form are applicable to both uniform and nonuniform error models. More precisely, we derive a series of bounds from approximations to packing and covering problems. The local degree bound of [2] is one of the bounds in the series. We associate each bound with an iterative procedure such that the original bound is the result of a single step. This characterization makes it easy to study the relationships between the bounds. We apply our bounds to improve the best known upper bounds on the sizes of single deletion correcting codes and single grain error correcting codes. We use the concept of a combinatorial channel to represent an error model in a fashion that makes the connection to linear programming natural. These bounds make use of varying levels of information about the structure of the error model and consequently trade off performance and complexity. For example, one bound uses the distribution of the sizes of spheres in the space while another uses only the size of the smallest sphere.

In general, there are many different combinatorial channels that admit the same codes. However, each channel gives a different sphere-packing upper bound. We show that the Hamming bound, which can be derived from a substitution error channel, the Singleton bound, which can be derived from an erasure channel, and a family of intermediate bounds provide an example of this phenomenon.

Sphere-packing upper bounds and sphere-covering lower bounds are not completely symmetric. Each of the approximation techniques that we discuss yields a sphere-packing upper bound and a sphere-covering lower bound. We explore the relationship between sphere-covering lower bounds and other graph theoretic lower bounds such as Turán's theorem.

Our contributions can be summarized as follows. We provide a unified framework for describing both upper and lower bounds on code size. This allows us to make very general statements about the relative strengths of the bounds. In particular, our generalization of the local degree bound allows us to improve the best known upper bounds for a few channels. Finally, we demonstrate the power of considering families of channels with the same codes.

In Section II, we discuss the linear programs associated with sphere-packing bounds. In Section III, we present a generalization of the local degree bound that is related to an iterative procedure. We use this to improve the best known upper bounds on the sizes of single deletion correcting codes and single grain error correcting codes. In Section IV, we discuss sphere-packing bounds related to the degree sequence and average degree of a channel. In Section V-B, we discuss families of channels that have the same codes but give different sphere-packing bounds. In Section VI, we discuss sphere-covering lower bounds, lower bounds related to Turán's theorem, and their relationships.

II. SPHERE-PACKING BOUNDS AND LINEAR PROGRAMS

A. Notation

Let X and Y be finite sets. For a semiring R, let R^X denote the set of |X|-dimensional column vectors of elements of R indexed by X. Let R^{X×Y} denote the set of |X| by |Y| matrices of elements of R with the rows indexed by X and the columns indexed by Y. Let 2^X denote the power set of X. Let N denote the set of nonnegative integers and let [n] denote the set of nonnegative integers less than n: {0, 1, ..., n−1}. Let 1 be the column vector of all ones and let 0 be the column vector of all zeros. For a set S ⊆ X, let 1_S ∈ {0,1}^X be the indicator column vector for the set S.

B. Combinatorial channels

We use the concept of a combinatorial channel to formalize a set of possible errors.

Definition 1. A combinatorial channel is a matrix A ∈ {0,1}^{X×Y}, where X is the set of channel inputs and Y is the set of channel outputs. An output y can be produced from an input x by the channel if A_{x,y} = 1. Each row or column of A must contain at least a single one, so each input can produce some output and each output can be produced from some input.

We will often think of a channel as a bipartite graph. In this case, the left vertex set is X, the right vertex set is Y, and A is the bipartite adjacency matrix. We will refer to this bipartite graph as the channel graph.¹ For x ∈ X, let N_A(x) ⊆ Y be the neighborhood of x in the channel graph (the set of outputs that can be produced from x). The degree of x is |N_A(x)|. For y ∈ Y, let N_A(y) ⊆ X be the neighborhood of y in the channel graph (the set of inputs that can produce y). In most cases, the channel involved will be evident and we will drop the subscript on N.

Note that A1_{{y}} = 1_{N(y)} and 1_{{x}}^T A = 1_{N(x)}^T. Thus A1 is the vector of input degrees of the channel graph, 1^T A is the vector of output degrees, and 1^T A 1 is the number of edges.

We are interested in the problem of transmitting information through a combinatorial channel with no possibility of error. To do this, the transmitter only uses a subset of the possible channel inputs in such a way that the receiver can always determine which input was transmitted.

Definition 2. A code for a combinatorial channel A ∈ {0,1}^{X×Y} is a set C ⊆ X such that for all y ∈ Y, |N(y) ∩ C| ≤ 1.

This condition ensures that decoding is always possible: if y is received, the transmitted symbol must have been the unique element of N(y) ∩ C.

C. Sphere-packing

A code is a packing of the neighborhoods of the inputs into the output space. The neighborhoods of the codewords must be disjoint and each neighborhood contains at least min_{x∈X} |N(x)| outputs. Thus the simplest sphere-packing upper bound on the size of a code C is

|C| ≤ |Y| / min_{x∈X} |N_A(x)|.

This is the minimum degree upper bound, because |N_A(x)| is the degree of x in the channel graph of A. The sphere-packing upper bounds discussed in this paper are generalizations of and improvements on this bound.

Maximum input packing and its dual, minimum output covering, are naturally expressed as integer linear programs.

¹An equivalent approach, taken by Kulkarni and Kiyavash [2], is to represent an error model by a hypergraph. A hypergraph consists of a vertex set and a family of hyperedges. Each hyperedge is a nonempty subset of the vertices. A hypergraph H = (V, E) can be described by a vertex-hyperedge incidence matrix H ∈ {0,1}^{V×E}. There are two ways to encode an error model as a hypergraph. Let A ∈ {0,1}^{X×Y} be the combinatorial channel for that error model. The first option is to take H = A to be the incidence matrix of the hypergraph. The hypergraph vertices are the channel inputs and there is a hyperedge for each output. Alternatively, we can let H = A^T and obtain the dual of the previous hypergraph. Now the hypergraph vertices are the channel outputs. This is the option taken by Kulkarni and Kiyavash. Throughout this paper, we use the language of channels and bipartite channel graphs rather than that of hypergraphs. This allows us to refer to channel inputs and outputs using symmetric language and avoids any confusion between a hypergraph and its dual.

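The channel and code definitions above are easy to make concrete. The sketch below uses a small hypothetical channel (the edge set `A`, the helper names, and the example codes are our own illustration, not from the paper): a channel is stored as a set of (input, output) pairs, and Definition 2 is checked directly.

```python
# A toy combinatorial channel on X = {00,01,10,11}, Y = {0,1}
# (hypothetical example): the output is the first input bit, unless
# the two bits disagree, in which case either bit may be produced.
A = {
    ("00", "0"),
    ("01", "0"), ("01", "1"),
    ("10", "0"), ("10", "1"),
    ("11", "1"),
}
X = sorted({x for x, y in A})
Y = sorted({y for x, y in A})

def N_in(x):
    # N_A(x): the set of outputs that can be produced from input x.
    return {y for (xx, y) in A if xx == x}

def N_out(y):
    # N_A(y): the set of inputs that can produce output y.
    return {x for (x, yy) in A if yy == y}

def is_code(C):
    # Definition 2: every output is reachable from at most one codeword.
    return all(len(N_out(y) & set(C)) <= 1 for y in Y)
```

For instance, {"00", "11"} is a code for this channel, while {"00", "01"} is not, because both of its elements can produce the output "0".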
Definition 3. For a channel A ∈ {0,1}^{X×Y}, the size of the largest input packing, or code, is

p(A) = max_{w∈N^X} 1^T w  s.t. A^T w ≤ 1.

The size of the smallest output covering is

κ(A) = min_{z∈N^Y} 1^T z  s.t. Az ≥ 1.

An output covering can be thought of as a strategy for the adversary operating the channel that is independent of the transmitter's choice of code. In Section VI, we use output coverings of an auxiliary channel to obtain a lower bound on p(A).

D. Fractional relaxations

However, the maximum independent set and minimum dominating set problems over general graphs are NP-Hard [6]. The approximate versions of these problems are also hard. The maximum independent set of an n-vertex graph cannot be approximated within a factor of n^{1−ε} for any ε > 0 unless P=NP [7]. We seek efficiently computable bounds. These bounds cannot be good for all graphs, but they will perform reasonably well for many of the graphs that we are interested in. The relaxed problem, maximum fractional input packing, provides an upper bound on the original packing problem.

Definition 4. Let A ∈ {0,1}^{X×Y} be a channel. The size of the maximum fractional input packing in A is

p*(A) = max_{w∈R^X} 1^T w  s.t. w ≥ 0, A^T w ≤ 1.

The size of the minimum fractional output covering is

κ*(A) = min_{z∈R^Y} 1^T z  s.t. z ≥ 0, Az ≥ 1.

The fractional programs have larger feasible spaces, so p(A) ≤ p*(A) and κ*(A) ≤ κ(A). By strong linear programming duality, p*(A) = κ*(A).

Unlike the integer programs, the values of the fractional linear programs can be computed in polynomial time. However, we are usually interested in sequences of channels with exponentially large input and output spaces. In these cases, finding exact solutions to the linear programs is intractable but we would still like to know as much as possible about the behavior of the solutions.

III. THE LOCAL DEGREE ITERATIVE ALGORITHM

Let A ∈ {0,1}^{X×Y} be a channel. We can obtain an upper bound for p*(A) (and consequently p(A)) by finding a feasible point in the program for κ*(A). Given some t ∈ R^Y, consider the smallest c ∈ R such that z = ct is feasible for κ*(A). To satisfy Atc ≥ 1, we need

c ≥ 1/(At)_x

for all x. As long as (At)_x > 0 for all x, the vector

z_y = t_y / min_{x∈X} (At)_x    (1)

is feasible and we have the upper bound p*(A) ≤ 1^T z, which we call κ*_{MDU}(A, t). The special case

κ*_{MDU}(A, 1) = |Y| / min_{x∈X} |N_A(x)|

is the minimum degree upper bound, which explains our choice of notation.

There is an analogous construction of a feasible point for p*(A):

p*_{MDL}(A, t) = 1^T t / max_{y∈Y} (A^T t)_y.

If a channel A is input regular, then A1 = d1 and

κ*_{MDU}(A, 1) = |Y|/d = |X||Y|/|E|,

where E is the edge set of the channel graph. If A is output regular, then 1^T A = d′1^T and

p*_{MDL}(A, 1) = |X|/d′ = |X||Y|/|E|.

Consequently, if A is both input and output regular, then

p*(A) = |X||Y|/|E|.

Each bound for p*(A) or κ*(A) that we present will have a vector parameter t ∈ R^Y or t ∈ R^X. Usually there is some t for which the approximation is exact, but finding this t is no easier than solving the original problem, so this is not our motivation for including the extra parameter. The t parameter allows us to convert a single bound into an iterative approximation procedure. Computation of the bound for some t will also suggest another t. By taking this perspective for all of the bounds we consider, we can make strong comparisons between them.

A. The local degree bound

For channels that are both input and output regular, computation of the bound p* is trivial: the minimum degree bound is exact. However, even a single low degree input will ruin the effectiveness of the minimum degree bound. To obtain a better upper bound on p(A) and p*(A), we will construct a more sophisticated feasible point in the program for κ*(A) by making a small change to (1).

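The construction around equation (1) is short once the channel is in adjacency-list form. Below is a sketch of κ*_{MDU}(A, t) (the function and variable names are our own); exact rational arithmetic avoids floating point concerns.

```python
from fractions import Fraction

def kappa_mdu(X, Y, N_in, t=None):
    # kappa*_MDU(A, t): z = t / min_x (At)_x is feasible for the
    # covering LP, so 1^T z upper bounds p*(A).  With t = 1 this is
    # the minimum degree upper bound |Y| / min_x |N(x)|.
    if t is None:
        t = {y: Fraction(1) for y in Y}
    At = {x: sum(t[y] for y in N_in(x)) for x in X}  # (At)_x
    m = min(At.values())
    assert m > 0, "every input must have positive coverage"
    return sum(t[y] / m for y in Y)

# A cycle-like channel (our example): input x produces x or x+1 mod 5.
X5 = Y5 = list(range(5))
nbr = lambda x: {x, (x + 1) % 5}
bound = kappa_mdu(X5, Y5, nbr)  # |Y| / min degree = 5/2
```

Since every input of this channel has degree 2, the bound is |Y|/2 = 5/2, and no integer code can have more than 2 codewords.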
Definition 5. Let A ∈ {0,1}^{X×Y} be a channel. For z ∈ R^Y such that Az > 0, define ϕ_A(z) ∈ R^Y as follows:

ϕ_A(z)_y = z_y / min_{x∈N(y)} (Az)_x.

Define the local degree upper bound κ*_{LDU}(A, z) = 1^T ϕ_A(z).

Lemma 1. For z ∈ R^Y such that z ≥ 0 and Az > 0, ϕ_A(z) is feasible in the program for κ*(A). If z is feasible for κ*(A), then ϕ_A(z) ≤ z.

Proof: To demonstrate feasibility of ϕ(z), we need ϕ(z) ≥ 0 and Aϕ(z) ≥ 1. The first condition is trivially met. For x ∈ X and y ∈ N(x), we have

ϕ(z)_y = z_y / min_{t∈N(y)} (Az)_t ≥ z_y / (Az)_x,

(Aϕ(z))_x = Σ_{y∈N(x)} ϕ(z)_y ≥ (1/(Az)_x) Σ_{y∈N(x)} z_y = 1,

and ϕ(z) is feasible. If z is feasible, then Az ≥ 1. For all y ∈ Y we have

ϕ(z)_y = z_y / min_{x∈N(y)} (Az)_x ≤ z_y.

More generally, we can view this as a single step in an iterative procedure. Suppose that we have a vector z ∈ R^Y that is feasible in the program for κ*(A). For any channel, we can take z = 1 as an initial vector. At each input x, the total coverage, (Az)_x, is at least one. The input x informs each output in N(x) that it can reduce its value by a factor of (Az)_x^{−1}. Each output y receives such a message for each input in N(y), then makes the largest reduction consistent with the messages.

We can iterate this optimization step. An iteration fails to make progress under the following condition. From the definition, ϕ(z)_y = z_y if and only if min_{x∈N(y)} (Az)_x = 1. Thus ϕ(z) = z if for all y ∈ Y there is some x ∈ N(y) such that (Az)_x = 1. This algorithm is monotonic in each entry of the feasible vector, so it cannot make progress if its input is at the frontier of the feasible space.

Scaling the input by a positive constant does not affect the output of ϕ: for c ∈ R, c > 0, ϕ(A, z) = ϕ(A, cz). We could think of p*_{MDL}(A, t) as an iterative procedure as well. It has the same scaling property. In contrast to the local degree iteration, the maximum degree iteration always stops after a single step because the output vector is a constant multiple of the input. The local degree iteration scales different entries in the initial vector by different amounts, so it is possible for it to make progress for multiple iterations.

B. Application to the single deletion channel

Let A_n be the n-bit 1-deletion channel. The input to the binary single deletion channel is a string x ∈ [2]^n and the output is a substring of x, y ∈ [2]^{n−1}. Each output vertex in A_n has degree n + 1. Thus p*(A_n) ≥ p*_{MDL}(A_n, 1) = 2^n/(n+1).

Levenshtein [8] showed that

κ*(A_n) ≤ (2^n/(n+1))(1 + o(1)).

Kulkarni and Kiyavash computed the local degree upper bound, or equivalently ϕ(1) [2]. This shows that κ*(A_n) is at most

2^n/(n−1) = (2^n/(n+1))(1 + 2/(n−1)) = (2^n/(n+1))(1 + O(n^{−1})).

Recently, Fazeli et al. found a fractional covering for A_n that provides a better upper bound [9]. In this section, we compute ϕ ∘ ϕ(1) for these channels and analyze the values of these points. We show that Fazeli's improved covering is related to the covering ϕ ∘ ϕ(1), but ϕ ∘ ϕ(1) provides a better bound asymptotically.

More precisely, the upper bound from ϕ ∘ ϕ(1), given in Theorem 2, shows that κ*(A_n) is at most

(2^n/(n−1))(1 − 2/(n−1) + O(n^{−2})) = (2^n/(n+1))(1 + O(n^{−2})).

The covering in Fazeli et al. gives an upper bound of

(2^n/(n+1))(1 + 1/(n−1) + O(n^{−2})).

Let r, u, b ∈ N^{[2]*} be vectors such that for all x ∈ [2]*, r_x is the number of runs in x, u_x is the number of length-one runs, or unit runs, in x, and b_x is the number of unit runs at the start or end of x.

Proofs of the theorems and lemmas stated in this section can be found in Appendix A.

Theorem 1. Let

f(r, u, b) = (1/r)(1 + max(2u − b − 2, 0)/((r + 2)(r + 1)))^{−1}.

Then the vector z_y = f(r_y, u_y, b_y) is feasible for κ*(A_n), so κ*(A_n) ≤ 1^T z.

Theorem 2. For n ≥ 2,

κ*(A_n) ≤ (2^n/(n+1))(1 + 26/(n(n−1))).

Now we will compare this bound to the bound corresponding to the cover of Fazeli et al. Let

f′(r, u, b) = (1/r)(1 − (u − b)/r²)  if u − b ≥ 2,
f′(r, u, b) = 1/r                    if u − b ≤ 1.

Fazeli et al. establish that z_y = f′(r_y, u_y, b_y) is feasible for κ*(A_n). Compare this with the cover given by f and note that the coefficient on u is 1 in f′ and 2 in f.

Lemma 2. Let z_y = f′(r_y, u_y, b_y). Then

1^T z ≥ ((2^n − 2)/(n+1))(1 + 1/(n−1) − 3/((n−1)(n−2))).

This shows that the bound of Theorem 2 is asymptotically better than the bound corresponding to the cover of Fazeli et al. We could continue to iterate ϕ to produce even better bounds. The fractional covers produced would depend on more statistics of the strings. For example, the value at a particular output of the cover produced by the third iteration of ϕ would depend on the number of runs of length two in that output string, in addition to the total number of runs and the number of runs of length one.

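The iteration ϕ of Definition 5 is a few lines of code. The sketch below (the channel and all names are our own illustration) uses a small channel with two low-degree inputs, chosen so that a second iteration still makes progress: starting from z = 1 with weight 4, one step reduces the weight to 8/3 and a second step reduces it to 12/5, with each iterate remaining feasible and coordinatewise no larger, as Lemma 1 guarantees.

```python
from fractions import Fraction

def phi(X, Y, N_in, N_out, z):
    # One local degree step (Definition 5): each output divides its
    # weight by the smallest total coverage (Az)_x over the inputs x
    # in its neighborhood.  Requires (Az)_x > 0 for all x.
    Az = {x: sum(z[y] for y in N_in[x]) for x in X}
    return {y: z[y] / min(Az[x] for x in N_out[y]) for y in Y}

# Hypothetical channel: inputs 0 and 3 have degree 1, inputs 1 and 2
# have degree 3.
X = [0, 1, 2, 3]
Y = [0, 1, 2, 3]
N_in = {0: {0}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {3}}
N_out = {y: {x for x in X if y in N_in[x]} for y in Y}

z0 = {y: Fraction(1) for y in Y}   # feasible: (Az0)_x >= 1 for all x
z1 = phi(X, Y, N_in, N_out, z0)    # weight 8/3, down from 4
z2 = phi(X, Y, N_in, N_out, z1)    # weight 12/5: a second step helps
```

The second step makes progress because, after the first step, the outputs covered only by the degree-3 inputs see coverage strictly greater than one.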
The largest known single deletion correcting codes are the Varshamov-Tenengolts (VT) codes. The length-n VT code contains at least 2^n/(n+1) codewords, so these codes are asymptotically optimal. The VT codes are known to be maximum independent sets for n ≤ 10, but this question is open for larger n [10]. Kulkarni and Kiyavash computed the exact value of κ*(A_n) for n ≤ 14 [2]. For 7 ≤ n ≤ 14, the gap between κ*(A_n) and the size of the VT codes was at least one, so it is unlikely that sphere-packing bounds will resolve the optimality of the VT codes for larger n. Despite this, it would be interesting to know whether κ*(A_n) ≤ 2^n/(n+1) + O(2^{cn}) for some constant c < 1.

n    |VT_0|   p*(A)   Thm.1   FVY     KK      Thm.2
5    6        6       7       7       7       12
6    10       10      12      12      12      17
7    16       17      20      20      21      25
8    30       30      35      35      36      41
9    52       53      61      61      63      69
10   94       96      109     109     113     119
11   172      175     196     197     204     211
12   316      321     357     358     372     377
13   586      593     653     657     682     682
14   1096     1104    1205    1212    1260    1248
15   2048     -       2237    2251    2340    2301
16   3856     -       4174    4202    4368    4272
17   7286     -       7825    7882    8191    7977
18   13798    -       14727   14845   15420   14969
19   26216    -       27820   28059   29127   28207
20   49940    -       52720   53202   55188   53348
21   95326    -       100194  101163  104857  101226
22   182362   -       190912  192850  199728  192623
23   349536   -       364621  368478  381300  367485
24   671092   -       697865  705511  729444  702697

Fig. 1. The cardinality of the VT construction and several upper bounds on p(A_n), where A_n is the n-bit single deletion channel. For n ≤ 14, Kulkarni and Kiyavash were able to compute the exact value of p*(A_n) [2]. This requires solving an exponentially large linear program. Kulkarni and Kiyavash also constructed a dual feasible point with weight (2^n − 2)/(n − 1) (column KK). This is equivalent to the first iteration of the local degree algorithm. Fazeli et al. improved on this construction (column FVY) [9]. Our Theorem 1 uses two iterations of the local degree algorithm. Computing the value of the FVY and Thm. 1 columns requires a sum over about n² terms. Our Theorem 2 gives an analytic upper bound on the weight of the feasible point from Theorem 1, which improves on existing bounds for n ≥ 22.

C. Application to the single grain error channel

Recently, there has been a great deal of interest in grain error channels, which are related to high-density encoding on magnetic media. A grain in a magnetic medium has a single polarization. If an encoder attempts to write two symbols to a single grain, only one of them will be retained. Because the locations of grain boundaries are generally unknown to the encoder, this situation can be modeled by a channel.

Mazumdar et al. applied the degree sequence bound to non-overlapping grain error channels [11]. Sharov and Roth applied the degree sequence bound to both non-overlapping and overlapping grain error channels [12]. We discuss the degree sequence bound and its relationship to the local degree bound in Section IV. Kashyap and Zémor applied the local degree bound to improve on Mazumdar et al. for the 1, 2, or 3 error cases [3]. They conjectured an extension for larger numbers of errors. Gabrys et al. applied the local degree bound to improve on Sharov and Roth [4].

The input and output of this channel are strings x, y ∈ [2]^n. To produce an output from an input, select a grain pattern with at most one grain of length two and no larger grains. The grain of length two, if it exists, bridges indices j and j+1 for some 0 ≤ j ≤ n − 2. Then the channel output is

y_i = x_i      for i ≠ j,
y_i = x_{i+1}  for i = j.

If x_j = x_{j+1} or if there is no grain of length two, then y = x.

The degree of an input string x is equal to the number of runs r_x: each of the r_x − 1 run boundaries could be bridged by a grain or there could be no error. A grain error reduces the number of runs by 0, 1, or 2. The number of runs is reduced by 1 if j = 0 and x_0 ≠ x_1, by 2 if j ≥ 1, x_j ≠ x_{j+1}, and x_{j−1} = x_{j+1}, and by 0 otherwise. Equivalently, the number of runs is reduced by 1 if a length-1 run at index 0 is eliminated and by 2 if a length-1 run elsewhere is eliminated.

In the previous section, we let u_x be the number of length-1 runs in x and b_x be the number of length-1 runs appearing at the start or end of x. For the grain channel, we need to distinguish between length-1 runs at the start and at the end, so let b^L_x and b^R_x count these.

Theorem 3. Let A_n be the n-bit 1-grain-error channel. The vector

z_y = (1/r_y)(1 + (2u_y − 2b^R_y − b^L_y − 2)/((r_y + 2)(r_y + 1)))^{−1}

is feasible for κ*(A_n).

By applying the techniques used in the proof of Theorem 2, it can be shown that Theorem 3 implies that κ*(A_n) = (2^{n+1}/(n+1))(1 + O(n^{−2})).

IV. THE DEGREE SEQUENCE UPPER BOUND

The degree sequence upper bound is an important technique for dealing with nonuniform combinatorial channels that predates the local degree bound by many decades. It is a simple generalization of the minimum degree bound and the basic idea behind the bound does not require linear programming. For any code C ⊆ X, Σ_{x∈C} |N(x)| ≤ |Y|. The size of the largest input set C satisfying this inequality is an upper bound on the size of the largest code. This set can be found greedily by repeatedly adding the minimum degree remaining input vertex. This approach is more robust than the minimum degree upper bound because it takes all of the input degrees into account. Call this the degree sequence upper bound.

Levenshtein applied this idea to obtain an upper bound on codes for the deletion channel. Kulkarni and Kiyavash applied the local degree bound to the deletion channel and showed that the resulting bound improved on Levenshtein's result.

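The claims of Section III-B about the single deletion channel A_n can be verified computationally for small n. The sketch below (helper names are our own) builds A_n, checks that every output has degree n + 1, confirms that the first local degree iterate from z = 1 has weight (2^n − 2)/(n − 1), and checks that the VT code VT_0 is a valid 1-deletion code whose size matches the |VT_0| column of Fig. 1.

```python
from fractions import Fraction
from itertools import product

def deletions(x):
    # Distinct strings obtained by deleting one symbol of x; the number
    # of such strings equals the number of runs of x.
    return {x[:i] + x[i + 1:] for i in range(len(x))}

def vt_code(n):
    # Varshamov-Tenengolts code VT_0: sum of i * x_i = 0 mod (n+1).
    return [x for x in ("".join(b) for b in product("01", repeat=n))
            if sum(i * int(c) for i, c in enumerate(x, 1)) % (n + 1) == 0]

def deletion_channel_stats(n):
    X = ["".join(b) for b in product("01", repeat=n)]
    N_in = {x: deletions(x) for x in X}
    N_out = {}
    for x in X:
        for y in N_in[x]:
            N_out.setdefault(y, set()).add(x)
    out_degrees = {len(s) for s in N_out.values()}
    # First local degree iterate starting from z = 1.
    Az = {x: Fraction(len(N_in[x])) for x in X}
    z1 = {y: Fraction(1) / min(Az[x] for x in N_out[y]) for y in N_out}
    return out_degrees, sum(z1.values())
```

For n = 8, every 7-bit output has exactly 9 supersequences, the first iterate has weight 254/7 = (2^8 − 2)/7, and VT_0 has 30 codewords with pairwise disjoint deletion spheres.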
Although its definition does not require a linear program, the degree sequence bound still has a nice linear programming interpretation. Taking this perspective, we compare the performance of the degree sequence bound to the local degree bound for arbitrary channels and show that the local degree bound is always better. The degree sequence bound uses less information about the channel than the local degree bound and achieves a weaker but still nontrivial result. At the end of this section, we discuss bounds that use even less information than the degree sequence bound.

A. Linear programs for the degree sequence bound

While the local degree upper bound is naturally expressed as a feasible point in the program for κ*, the easiest linear programming interpretation of the degree sequence bound works differently. The degree sequence upper bound is the value of a further relaxation of the program for p*. It turns out that the minimum degree upper bound is easily expressed as both a dual feasible point and a primal relaxation. Section III included the former interpretation and the latter is given here.

For a channel A ∈ {0,1}^{X×Y} and a vector t ∈ R^Y, define

p*_{MDU}(A, t) = max_{w∈R^X} 1^T w  s.t. w ≥ 0, t^T A^T w ≤ t^T 1.

The solution to this program puts all of the weight on the minimum degree input argmin_x (At)_x, so p*_{MDU}(A, t) = κ*_{MDU}(A, t).

Recall that A1 is the vector of input degrees of the channel graph of A. Thus the main constraint of the program for p*_{MDU}(A, 1) is Σ_{x∈X} |N(x)| w_x ≤ |Y|. In a code, each vertex can only be included once. We can capture this fact and improve the upper bound by adding the additional constraint w ≤ 1 to the program.

Definition 6. For a channel A ∈ {0,1}^{X×Y} and a vector t ∈ R^Y, define the degree sequence bound

p*_{DSU}(A, t) = max_{w∈R^X} 1^T w  s.t. 0 ≤ w ≤ 1, t^T A^T w ≤ t^T 1.

The degree sequence upper bound is tight: for a given input degree distribution and output space size, there is some channel where the neighborhoods of the small degree inputs are disjoint. For this channel, the degree sequence upper bound is tight. The bound cannot be improved without incorporating more information about the structure of the channel.

The local degree upper bound, which incorporates information about the channel beyond the degree sequence, is always at least as good as the degree sequence bound. To show this, we associate the degree sequence bound with a particular feasible point in the program for κ*(A).

Theorem 4. Let A ∈ {0,1}^{X×Y} be a channel and let t ∈ R^Y. Let d ∈ R be a degree threshold and define the following sets of inputs:

X_− = {x ∈ X : (At)_x < d},
X_0 = {x ∈ X : (At)_x = d}.

If 1^T_{X_−} At ≤ 1^T t ≤ (1_{X_−} + 1_{X_0})^T At, then the point

z_y = t_y (1/d + Σ_{x∈N(y)} max(1/(At)_x − 1/d, 0))

is feasible in κ*(A), p*_{DSU}(A, t) = 1^T z, and ϕ_A(t) ≤ z. Thus κ*_{LDU}(A, t) ≤ p*_{DSU}(A, t).

Proof: First we establish that p*_{DSU}(A, t) = |X_−| + (1^T t − 1^T_{X_−} At)/d by constructing a primal feasible point and a dual feasible point with this value.

The point w = 1_{X_−} + ((1^T t − 1^T_{X_−} At)/(d |X_0|)) 1_{X_0} is feasible for the primal program. This puts the maximum possible weight on each of the inputs with degree below the threshold and fractional weight on inputs with degree equal to the threshold.

The dual program is

min_{z_0∈R, z′∈R^X} 1^T t z_0 + 1^T z′  s.t. (z_0, z′) ≥ 0, At z_0 + z′ ≥ 1.

The point z_0 = 1/d, z′_x = max(0, (d − (At)_x)/d) is feasible in the dual program. Note that z′_x > 0 only for x ∈ X_−. The value of this point is

(1^T t)/d + Σ_{x∈X_−} (d − (At)_x)/d = |X_−| + (1^T t − 1^T_{X_−} At)/d.

We construct z from (z_0, z′). Let

z = diag(t)(1 z_0 + A^T diag(At)^{−1} z′).

Then

1^T z = 1^T diag(t)(1 z_0 + A^T diag(At)^{−1} z′) = t^T 1 z_0 + t^T A^T diag(At)^{−1} z′ = 1^T t z_0 + 1^T z′,

so z has the necessary weight. Substituting the values z_0 = 1/d and z′_x = max((d − (At)_x)/d, 0), we obtain

z_y/t_y = 1/d + Σ_{x∈N(y)} (1/(At)_x) max((d − (At)_x)/d, 0)
        = 1/d + Σ_{x∈N(y)} max(1/(At)_x − 1/d, 0)
        ≥ 1/d + max_{x∈N(y)} max(1/(At)_x − 1/d, 0)
        = max_{x∈N(y)} max(1/(At)_x, 1/d)
        ≥ max_{x∈N(y)} 1/(At)_x
        = (1/t_y) ϕ_A(t)_y.

Because ϕ_A(t) is feasible in κ*(A), z is as well.

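The greedy description of the degree sequence upper bound translates directly into code. Below is a sketch (the function name and the degree profile are our own illustration): it computes the bound from the input degrees alone and, on a profile with one low-degree input among otherwise regular inputs, it is far stronger than the minimum degree bound.

```python
def degree_sequence_bound(degrees, num_outputs):
    # Largest k such that the k smallest input degrees sum to at most
    # |Y|; any code C satisfies sum over x in C of |N(x)| <= |Y|.
    total = count = 0
    for d in sorted(degrees):
        if total + d > num_outputs:
            break
        total += d
        count += 1
    return count

# Hypothetical profile: one degree-1 input among nine degree-4 inputs,
# with |Y| = 12.
degrees = [1] + [4] * 9
ds = degree_sequence_bound(degrees, 12)  # 1 + 4 + 4 <= 12, so ds = 3
md = 12 // min(degrees)                  # minimum degree bound: 12
```

The single low-degree input ruins the minimum degree bound (12, vacuous here since there are only 10 inputs), while the degree sequence bound gives 3.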
∗ Because ϕA(t) is feasible in κ (A), z is as well. 0 This interpretation of the degree sequence bound allows us 0 to iteratively construct dual feasible points, but these points are 1 0 0 1 dominated by those produced by the local degree algorithm. 1 1 0 A =   1 However, this interpretation does give some intuition about 1 0 1  0 1 1  2 the source of the superior performance of the local degree   bound. When multiple low-degree inputs have a common   2 output, the local degree bound takes advantage of this fact. 3 This information is not contained in the degree distribution. The degree sequence upper bound is monotonic and de- 1 1 1 0 0 0 creasing in the degree of each vertex. For a complicated 1 1 1 1 A ◦ AT =   channel, as a tractable alternative to computing the exact 1 1 1 1 degree sequence bound, we can compute the bound for a  0 1 1 1  1 1   simplified degree sequence. Pick a degree threshold d. Treat   each vertex with degree at least d as if it had degree exactly d 2 2 and treat each vertex with degree less than d as if it had degree one. The bound coming from this modified degree sequence 1 3 3 is |X| − |S| p∗ (A, 1) ≤ + |S|, DSU d 0 3 where S = {x ∈ X : |N(x)| < d}, the set of low degree inputs. This approach is effective when the degree distribution concentrates around its mean but still has a few vertices with 2 much lower degree. Fig. 2. At the top, a channel A ∈ {0, 1}[4]×[3] with its corresponding channel graph. In the middle, the channel A ◦ AT , and the composition of T T B. Bounds that use only the number of edges the channel graph of A with the channel graph of A . (A ◦ A )i,j = 1 if there is a path going left to right from i to j in that graph. At the bottom, The degree sequence bound uses the full input degree the confusability graph of A, which has adjacency matrix A ◦ AT − I. distribution of the channel graph and the local degree bounds use the degrees of the endpoints of each edge. 
Suppose that we only know the number of inputs, outputs, and edges in the channel graph. This means that we know the average input degree and the average output degree but nothing else about the degree distributions. Recall from Section III that κ*(A) = |X||Y|/|E| for a channel A ∈ {0,1}^{X×Y} with constant input and output degrees. The point (|Y|/|E|)·1 ∈ R^X is feasible in the primal program for p*_DSU(A, 1), so |X||Y|/|E| ≤ p*_DSU(A, 1).

In other words, a large number of edges is necessary to rule out the existence of a set of input vertices with small degree. To obtain a relationship between κ*(A) and the average input degree of A, we would need an inequality running in the other direction. It turns out that the number of edges in a bipartite graph gives only weak bounds on the packing and covering numbers for the graph.

Lemma 3. Let A ∈ {0,1}^{X×Y} be a channel and let E be the edge set of the channel graph. Then

    κ(A) ≤ |X| − |E|/|Y| + 1.    (2)

For any X, Y, and S ⊆ X such that |S| ≤ |Y|, there is a channel A such that |E| = |Y|(|X| − |S| + 1) and S is an input packing in A. Thus (2) cannot be improved.

Proof: For any output y ∈ Y, we can construct a cover using y together with |X| − |N(y)| other outputs: for each x ∈ X \ N(y), we add an arbitrary member of N(x) to our cover. Because Σ_{y∈Y} |N(y)| = |E|, there is some y with |N(y)| ≥ |E|/|Y|.

We construct the tightness example A as follows. Choose the neighborhoods of the inputs in S so that each is nonempty, they are disjoint, and ∪_{x∈S} N(x) = Y. Meeting the first two conditions is possible because |S| ≤ |Y|. Because the neighborhoods are disjoint, S is a packing. For each x ∈ X \ S, let N(x) = Y. Then the output degrees are all equal to |X| − |S| + 1.

V. CONFUSABILITY GRAPHS AND FAMILIES OF CHANNELS

Sphere packing upper bounds are obtained from combinatorial channels. However, for any channel there is a simpler object that also characterizes the set of codes: the confusability graph. Furthermore, any particular confusability graph arises from many combinatorial channels. To obtain upper bounds on the size of codes for one channel, it can be useful to consider the sphere packing bounds that arise from some other equivalent channel. At the end of this section, we show how the Hamming and Singleton bounds are an example of this phenomenon.

A. Confusability graphs and independent sets

Definition 7. Let A ∈ {0,1}^{X×Y} and B ∈ {0,1}^{Y×Z} be channels. Then define A ∘ B ∈ {0,1}^{X×Z}, the composition of A and B, such that

    N_{A∘B}(x) = ∪_{y∈N_A(x)} N_B(y).
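The composition just defined, and the confusability graph built from it, can be computed directly. Here is a small sketch using the channel of Fig. 2; the nested-list matrix representation is ours:

```python
def compose(A, B):
    # Boolean-semiring matrix product: (A o B)[x][z] = max_y min(A[x][y], B[y][z])
    return [[int(any(A[x][y] and B[y][z] for y in range(len(B))))
             for z in range(len(B[0]))] for x in range(len(A))]

# The 4-input, 3-output channel from Fig. 2.
A = [[1, 0, 0],
     [1, 1, 0],
     [1, 0, 1],
     [0, 1, 1]]
AT = [list(col) for col in zip(*A)]

AAT = compose(A, AT)              # inputs i, j share an output iff AAT[i][j] == 1
conf = [[AAT[i][j] - (i == j)     # confusability graph adjacency: A o A^T - I
         for j in range(len(A))] for i in range(len(A))]
```

For this channel, the first and fourth inputs are the only non-adjacent pair in the confusability graph, so they form the largest code.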

We can characterize A ∘ B in two other ways:

    (A ∘ B)_{x,z} = 1 if N_A(x) ∩ N_B(z) ≠ ∅, and 0 if N_A(x) ∩ N_B(z) = ∅    (3)

    (A ∘ B)_{x,z} = max_{y∈Y} min(A_{x,y}, B_{y,z})    (4)

The characterization (4) states that A ∘ B is the matrix product of A and B in the Boolean semiring. Let I denote the identity matrix.

Definition 8. For a channel A ∈ {0,1}^{X×Y}, define the confusability graph of A to be the graph with vertex set X and adjacency matrix A ∘ A^T − I.

Because A ∘ A^T − I is a zero-one symmetric matrix with zeros on the diagonal, the confusability graph is simple and undirected. From (3), vertices u and v are adjacent in the confusability graph of A if and only if N(u) and N(v) intersect. Figure 2 shows an example of a channel, its composition with its reverse, and its confusability graph.

Definition 9. Let G be an undirected simple graph with vertex set X. A set S ⊆ X is independent in G if and only if for all u, v ∈ S, u and v are not adjacent. The maximum size of an independent set in G is denoted by α(G).

Now we have a second important characterization of codes.

Lemma 4. Let G be the confusability graph for a channel A ∈ {0,1}^{X×Y}. Then a set C ⊆ X is a code for A if and only if it is an independent set in G. Thus α(G) = p(A).

Proof: A set C is not a code if and only if there is some y such that N(y) contains distinct codewords u and v, or equivalently y ∈ N(u) ∩ N(v). This means that (A ∘ A^T)_{u,v} = 1, that u and v are adjacent in the confusability graph, and that C is not independent.

The confusability graph does not contain enough information to recover the original channel graph, but it contains enough information to determine whether a set is a code for the original channel.

B. Families of channels with the same codes

There are many different channels that have G as a confusability graph. A clique in a graph G is a set of vertices S such that for all distinct u, v ∈ S, {u, v} ∈ E(G). If G is the confusability graph for a channel A ∈ {0,1}^{X×Y}, then for each y ∈ Y, N(y) is a clique in G. Let Ω ⊆ 2^X be a family of cliques that covers every edge in G. This means that for all {u, v} ∈ E(G), there is some S ∈ Ω such that u, v ∈ S. Let H ∈ {0,1}^{X×Ω} be the vertex-clique incidence matrix: H_{x,S} = 1 if x ∈ S and H_{x,S} = 0 otherwise. Then α(G) = p(H).

Thus each family of cliques that covers every edge gives us an integer linear program that expresses the maximum independent set problem for G. These programs all contain the same integer points, the indicators of the independent sets of G. However, their polytopes are significantly different, so fractional relaxations of these programs give widely varying upper bounds on α(G).

Each edge in G is a clique, so E(G) is one natural choice for Ω. Then α(G) = p(H_E), where H_E ∈ {0,1}^{X×E(G)} is the vertex-edge incidence matrix for G. However, relaxing the integrality constraint for this program gives a useless upper bound. The vector w = (1/2)·1 is feasible, so p*(H_E) ≥ |X|/2 regardless of the structure of G.

Definition 10. Let Ω be the set of maximal cliques in G and let H_Ω ∈ {0,1}^{X×Ω} be the vertex-clique incidence matrix. Then α(G) = p(H_Ω). Define the minimum clique cover of G, θ(G) = κ(H_Ω), and the minimum fractional clique cover θ*(G) = κ*(H_Ω).

Unlike the program derived from the edge set, θ*(G) gives a nontrivial upper bound on α(G). In fact, θ*(G) is the best sphere packing bound for any channel that has G as its confusability graph.

Lemma 5. Let G be a graph with vertex set X and let Ω₁, Ω₂ ⊆ 2^X be families of cliques that cover every edge in G. Let H₁ and H₂ be the vertex-clique incidence matrices for Ω₁ and Ω₂ respectively. If for each R ∈ Ω₁ there is some S ∈ Ω₂ such that R ⊆ S, then p*(H₂) ≤ p*(H₁).

Proof: A clique R gives the constraint Σ_{x∈R} w_x ≤ 1 in p. If R ∈ Ω₁, S ∈ Ω₂, and R ⊆ S, then the constraint from R is implied by the constraint for S. Any additional cliques in Ω₂ can only reduce the feasible space for p(H₂). Thus the feasible space for p(H₂) is contained in the feasible space for p(H₁).

Corollary 1. Let A ∈ {0,1}^{X×Y} be a channel and let G be the confusability graph for A. Then θ*(G) ≤ κ*(A).

Proof: For each output y ∈ Y, N(y) is a clique in G, and these cliques cover every edge of G. Each clique in G is contained in a maximal clique, so the claim follows immediately from Lemma 5.

Corollary 1 suggests that we should ignore the structure of our original channel A and try to compute θ*(G) instead of κ*(A). However, there is no guarantee that we can efficiently construct the linear program for θ*(G) by starting with G and searching for all of the maximal cliques. We are often interested in graphs with an exponential number of vertices. Even worse, the number of maximal cliques in G can grow exponentially in the number of vertices. To demonstrate this, consider a complete k-partite graph with 2 vertices in each part. If we select one vertex from each part, we obtain a maximal (and also maximum) clique. The graph has 2k vertices, but there are 2^k maximal cliques.

The fractional clique cover number has been considered in the literature in connection with the Shannon capacity of a graph, Θ(G). The capacity of a combinatorial channel A is lim_{n→∞} p(A^n)^{1/n}, the number of possible messages per channel use when the channel can be used many times. Like p(A), the capacity of the channel depends only on its confusability graph. Thus the Shannon capacity of a graph G can be defined as the capacity of a channel with confusability graph G. The Shannon capacity of a graph is at least as large as the maximum independent set and is extremely difficult to compute. Shannon used something equivalent to a clique cover as an upper bound for Shannon capacity [13]. Rosenfeld showed the connection between Shannon's bound and linear programming [14]. Shannon also showed that the feedback capacity of a combinatorial channel A is p*(A). Lovász introduced the Lovász theta function of a graph, ϑ(G), and showed that it is always between the Shannon capacity and the fractional clique cover number [15]. All together, we have

    α(G) ≤ Θ(G) ≤ ϑ(G) ≤ θ*(G).

The Lovász theta function is derived via semidefinite programming and consequently is not a sphere-packing bound.

There are also several connections between these concepts and communication over probabilistic channels. For a combinatorial channel A, the minimum capacity over the probabilistic channels with support A is p*(A). Recently, Dalai has proven upper bounds on the reliability function of a probabilistic channel that are finite for all rates above the (logarithmic) Shannon capacity of the underlying confusion graph, in contrast to previous bounds that were finite only for rates above log p*(A) [16]. The idea of multiple channels with the same confusion graph plays an important role here.

C. Hamming and Singleton Bounds

Sometimes a channel has some special structure that allows us to find an easily described family of channels with the same codes. Then we can optimize over the family by computing a bound for each channel and using the best. This technique has been successfully applied to deletion-insertion channels by Cullina and Kiyavash [1]. Any code capable of correcting s deletions can also correct any combination of s total insertions and deletions. Two input strings can appear in an s-deletion-correcting code if and only if the deletion distance between them is more than s. In the asymptotic regime with n going to infinity and s fixed, each channel in the family becomes approximately regular. Thus the degree threshold bound gives a good approximation to the exact sphere-packing bound for these channels. The best bound comes from a channel that performs approximately qs/(q+1) deletions and s/(q+1) insertions, where q is the alphabet size.

In this section we present a very simple application of this technique. Consider the channel that takes a q-ary vector of length n as its input, erases a symbols, and substitutes up to b symbols. Thus there are q^n channel inputs, C(n, a) q^{n−a} outputs, and each input can produce C(n, a) Σ_{i=0}^{b} C(n−a, i)(q−1)^i possible outputs. Two inputs share a common output if and only if their Hamming distance is at most s = a + 2b. For each choice of n and s, we have a family of channels with identical confusability graphs. Call this the q-ary n-symbol a-erasure b-substitution channel A_{q,n,a,b}. These channels are all input and output regular, so

    κ*(A_{q,n,a,b}) = q^{n−a} / Σ_{i=0}^{b} C(n−a, i)(q−1)^i.

Two special cases give familiar bounds. For even s, setting a = 0 and b = s/2 produces the Hamming bound:

    κ*(A_{q,n,0,s/2}) = q^n / Σ_{i=0}^{s/2} C(n, i)(q−1)^i.

Setting a = s and b = 0 produces the Singleton bound:

    κ*(A_{q,n,s,0}) = q^{n−s}.

For q = 2, the Hamming bound is always the best bound in this family. When q is at least 3, each bound in the family is the best for some region of the parameter space.

Lemma 6. κ*(A_{q,n,a,b}) ≤ κ*(A_{q,n,a+2,b−1}) when a + qb ≤ n − 1.

The proof of Lemma 6 can be found in Appendix A.

Theorem 5. Let q, n, s ∈ N such that q ≥ 3, 0 ≤ s ≤ n − 1, and s even. Then

    argmin_{0≤b≤s/2} κ*(A_{q,n,s−2b,b}) = s/2 if s ≤ (2/q)(n−1),
                                          ⌊(n−1−s)/(q−2)⌋ if s ≥ (2/q)(n−1).

For fixed δ, 2/q ≤ δ ≤ 1, and s = δn,

    lim_{n→∞} (1/n) log min_{0≤b≤s/2} κ*(A_{q,n,s−2b,b}) = (1 − δ) log(q − 1).

Proof: Let a + 2b = s, so a + qb = s + (q−2)b. From Lemma 6, κ*(A_{q,n,0,s/2}) is the smallest in the family when s + (q−2)(s/2) ≤ n − 1, or equivalently s ≤ (2/q)(n−1).

For b ≥ 1 the following are equivalent:

    κ*(A_{q,n,a+2,b−1}) ≥ κ*(A_{q,n,a,b}) ≤ κ*(A_{q,n,a−2,b+1})
    s + (q−2)b ≤ n − 1 ≤ s + (q−2)(b+1)
    b ≤ (n−1−s)/(q−2) ≤ b + 1.

Let b* be the optimal choice of b. Then

    lim_{n→∞} b*/n = (1−δ)/(q−2),
    lim_{n→∞} (n − s + 2b*)/n = 1 − δ + 2(1−δ)/(q−2) = q(1−δ)/(q−2),
    lim_{n→∞} b*/(n − s + 2b*) = 1/q.

Finally,

    lim_{n→∞} (1/n) log [ q^{n−s+2b*} / Σ_{i=0}^{b*} C(n−s+2b*, i)(q−1)^i ]
      = lim_{n→∞} [ ((n−s+2b*)/n) log q − ((n−s+2b*)/n) H₂(b*/(n−s+2b*)) − (b*/n) log(q−1) ]
      = (q(1−δ)/(q−2)) log q − (q(1−δ)/(q−2)) H₂(1/q) − ((1−δ)/(q−2)) log(q−1)
      = ((1−δ)/(q−2)) (q log q − log q − (q−1) log(q/(q−1)) − log(q−1))
      = ((1−δ)/(q−2)) ((q−1) log(q−1) − log(q−1))
      = (1 − δ) log(q−1),

which proves the last claim.

This family of bounds fills in the convex hull of the Hamming and Singleton bounds. Figure 3 plots this optimized bound, the Hamming bound, and the Singleton bound for q = 4.

Fig. 3. Sphere-packing bounds for a channel performing substitution errors and erasures. The curved line is the Hamming bound, which is lim_{n→∞} (1/n) log κ*(A_{4,n,0,s/2}). The upper straight line is the Singleton bound, which is lim_{n→∞} (1/n) log κ*(A_{4,n,s,0}). The straight line running from (1/2, (1/2) log 3) to (1, 0) is the optimized sphere-packing bound, lim_{n→∞} (1/n) log min_{0≤b≤s/2} κ*(A_{4,n,s−2b,b}).

There are several open questions regarding families of channels with the same confusability graphs. Under what conditions can we find these families? What is the relationship between these families and distance metrics? When we have a family of channels that are not input or output regular, what should we do to get the best bounds?

VI. LOWER BOUNDS

In this section, we will discuss two sources of lower bounds on the size of an optimal code: sphere-covering bounds related to the dominating set problem and bounds related to Turán's theorem. The dominating set problem is closely connected to linear programming, so there is a local degree sphere-covering lower bound. This lower bound has some interesting relationships with the Turán type bounds.

A. Sphere-covering and dominating sets

Recall a few facts from Section V-A. Codes for a channel are independent sets in the confusability graph for that channel. For a channel A ∈ {0,1}^{X×Y}, the confusability graph G has adjacency matrix B − I, where B = A ∘ A^T. The size of the largest code is α(G) = p(A). We will use these notational conventions throughout this section.

Definition 11. Let G be a graph with vertex set X. A set S ⊆ X is dominating in G if and only if for all x ∈ X \ S, there is some u ∈ S such that x and u are adjacent. The minimum size of a dominating set in G is denoted by γ(G).

Dominating set is a covering problem. A vertex u ∈ S covers itself and all adjacent vertices. Let G be a simple graph with vertex set X and adjacency matrix B − I. Then S ⊆ X is a dominating set in G if and only if S is an output covering for B. Thus γ(G) = κ(B).

We are interested in dominating sets because of their relationship with independent sets.

Lemma 7. For any graph G, γ(G) ≤ α(G).

Proof: If no additional vertices can be added to an independent set, each vertex of G is either in the independent set or adjacent to a vertex in the independent set. Consequently, any maximal independent set is dominating.

Because dominating set is a minimization problem, its fractional relaxation is a lower bound. Overall, we have

    p*(B) = κ*(B) ≤ κ(B) = γ(G) ≤ α(G) = p(A).

This motivates simple lower bounds for p*(B). In Sections III and IV, we discussed the minimum degree, degree sequence, and local degree upper bounds. Each of these has a lower bound analogue. The maximum degree lower bound is

    p*_MDL(A, t) = 1^T t / max_{y∈Y} (A^T t)_y.

The degree sequence lower bound is

    p*_DSL(A, t) = max_{w∈R^X} 1^T w
                   s.t. 0 ≤ w ≤ 1,
                        t^T A^T w ≤ t^T 1.

The local degree lower bound is

    κ*_LDL(A, t) = Σ_{x∈X} t_x / max_{y∈N(x)} (A^T t)_y.

These satisfy the same inequalities as the upper bound versions but in the reverse order:

    p*_MDL(A, t) ≤ p*_DSL(A, t) ≤ κ*_LDL(A, t) ≤ p*(A).

The quantity p(B) also has combinatorial significance. Let G² be a graph with the same vertices as G. Distinct vertices are adjacent in G² if they are connected by a path of length at most 2 in G. Then the confusability graph of the channel B is G² and α(G²) = p(B).

B. Caro-Wei, Motzkin-Straus, and Turán Theorems

Turán's theorem is

    α(G) ≥ |X| / (1 + d̄(G)),

where d̄(G) = 2|E|/|X| = (1/|X|) Σ_{x∈X} d_G(x) is the average degree of G. In this section, we discuss two strengthenings of Turán's theorem: the Caro-Wei theorem and the Motzkin-Straus theorem. All three of these theorems are often stated in terms of cliques rather than independent sets. To convert from one form to the other, replace the graph with its complement.

Like p*_DSL(B, 1), the Caro-Wei theorem uses the degree sequence of the graph [17]. It states that for a graph G,

    α(G) ≥ Σ_{x∈X} 1/(1 + d_G(x)) = Σ_{x∈X} 1/(B1)_x.
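The Caro-Wei and Turán bounds are simple degree computations, so they are easy to check on small graphs. A sketch using exact rational arithmetic (the adjacency-dictionary representation and function names are ours):

```python
from fractions import Fraction

def caro_wei(adj):
    """Caro-Wei lower bound on the independence number: sum_x 1/(1 + deg(x))."""
    return sum(Fraction(1, 1 + len(adj[x])) for x in adj)

def turan(adj):
    """Turan's bound |X| / (1 + average degree)."""
    n = len(adj)
    avg = Fraction(sum(len(adj[x]) for x in adj), n)
    return Fraction(n, 1) / (1 + avg)

# Star K_{1,4}: center 0 adjacent to leaves 1..4; alpha = 4 (the leaves).
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(caro_wei(star), turan(star))  # 11/5 and 25/13
```

For the star, both bounds hold (α = 4), and the Caro-Wei value 11/5 improves on Turán's 25/13, reflecting that Caro-Wei exploits the full degree sequence rather than only the average degree.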

The following is a slight generalization of the Caro-Wei theorem.

Lemma 8. Let B − I be the adjacency matrix of a graph G with vertex set X. For any t ∈ R^X such that t ≥ 0 and Bt > 0, define

    α_CW(G, t) = Σ_{x∈X} t_x/(Bt)_x.

Then α(G) ≥ α_CW(G, t).

Proof: Let h ∈ R^X be a vector of independent exponentially distributed random variables such that h_x has mean 1/t_x. Order the vertices by the value of their entry in h. Let S(h) = {x ∈ X : h_x < min_{u∈N(x)} h_u}. This set contains the vertices that appear earlier in the ordering than all of their neighbors. For any h, S(h) is an independent set in G. Then

    E|S(h)| = Σ_{x∈X} Pr[x ∈ S(h)] = Σ_{x∈X} t_x/(Bt)_x.

Some independent set must be at least as large as E|S(h)|.

The Motzkin-Straus theorem is an immediate consequence of Lemma 8. It states that for any t ∈ R^X such that t ≥ 0 and Bt > 0,

    α(G) ≥ (1^T t)² / (t^T B t).

Call the quantity in this lower bound α_MS(G, t). The function f(x) = x^{−1} is convex, so by Jensen's inequality

    α_CW(G, t) = 1^T t Σ_{x∈X} (t_x/(1^T t)) (Bt)_x^{−1} ≥ 1^T t ( Σ_{x∈X} (t_x/(1^T t)) (Bt)_x )^{−1} = (1^T t)²/(t^T B t).

Specializing to t = 1, we obtain Turán's theorem:

    (1^T 1)²/(1^T B 1) = |X|²/(|X| + 2|E(G)|) = |X|/(1 + 2|E(G)|/|X|) = |X|/(1 + d̄(G)).

The expression for Turán's theorem is very similar to the expression for the maximum degree bound p*_MDL(B, 1), except that the maximum degree has been replaced with the average degree. Thus Turán's theorem is always stronger. In fact, p*_MDL(B, t) ≤ α_MS(G, t) for all admissible t. Additionally, the vector z = (1/(1+d̄(G)))·1 = (|X|/(1^T B 1))·1 is feasible in the program for p*_DSL(B, 1), so p*_DSL(B, 1) ≤ α_MS(G, 1).

The Caro-Wei lower bound α_CW(G, t) is always at least as large as the degree sequence lower bound p*_DSL(B, t). In fact, the Caro-Wei number is always between the local degree lower and upper bounds on κ*(B). For any x ∈ X,

    min_{y∈N(x)} (Bt)_y ≤ (Bt)_x ≤ max_{y∈N(x)} (Bt)_y,

so

    κ*_LDL(B, t) = Σ_{x∈X} t_x / max_{y∈N(x)} (Bt)_y
                 ≤ α_CW(G, t) = Σ_{x∈X} t_x/(Bt)_x
                 ≤ κ*_LDU(B, t) = Σ_{x∈X} t_x / min_{y∈N(x)} (Bt)_y.

The inequalities among all of these lower bounds on α(G) are summarized in Figure 4.

Fig. 4. Lower bounds on α(G), where B − I is the adjacency matrix of G. For each arrow, the quantity at the tail is at most as large as the quantity at the head. If G is regular, then p*_MDL(B, t) = κ*_LDU(B, t), so all of the fractional sphere-covering bounds and Turán type bounds collapse to that value.

Note that there is no inequality relating the Caro-Wei number to either the domination number or the fractional domination number. The star graph K_{1,k} demonstrates that the Caro-Wei number can be much larger than the domination number: α_CW(K_{1,k}, 1) = k/2 + 1/(k+1) while α(K_{1,k}²) = γ(K_{1,k}) = 1. For the n-vertex path graph P_n, the inequality can go in the other direction:

    G          α_CW(G, 1)    α(G²) = γ(G)
    P_{3k}     k + 1/3       k
    P_{3k+1}   k + 2/3       k + 1
    P_{3k+2}   k + 1         k + 1

For the strong graph product of n copies of P_4, the Caro-Wei number is (5/3)^n and the domination number is 2^n. Thus the gap between the two bounds can be arbitrarily large in either direction. The Caro-Wei number is always larger than any of the approximations to the fractional domination number presented in this paper. If the exact value of the fractional domination number cannot be efficiently computed, the Caro-Wei number is likely to be the best available lower bound.

C. Bounds using only the number of edges

Turán's theorem uses the number of edges in G, or equivalently the number of edges in the channel graph of B, to give a reasonably good lower bound on α(G) = p(A). However, the best lower bound on p(A) that uses only the number of edges in the channel graph of A is very weak.

Lemma 9. Let A ∈ {0,1}^{X×Y} be a channel and let E be the edge set of the channel graph. Then

    |X| + |Y| − |E| ≤ p(A).    (5)

For any X, Y, and R ⊆ Y such that |R| ≤ |X|, there is a channel A such that |E| = |X| + |Y| − |R| and R is an output covering in A. Thus (5) cannot be improved.

Proof: For each y ∈ Y, we select |N(y)| − 1 inputs to forbid from the code. We forbid at most |E| − |Y| total inputs, so our code contains at least |X| + |Y| − |E| inputs.

We construct the tightness example A as follows. Choose the neighborhoods of the outputs in R so that each is nonempty, they are disjoint, and ∪_{y∈R} N(y) = X. Meeting the first two conditions is possible because |R| ≤ |X|. Because the union of the neighborhoods covers all of X, R is a covering and p(A) ≤ |R|. We have included |X| edges so far. For each y ∈ Y \ R, let |N(y)| = 1 and choose the neighbor arbitrarily. Thus |E| = |X| + |Y| − |R|.

Only a few edges are needed to create a small number of output vertices with large degree. Compare this to Lemma 3. For the tightness example A, note that the channel graph of B = A ∘ A^T has Σ_{y∈R} |N(y)|² edges. This is at least |X|²/|R|, which is usually much larger than |X| + |Y| − |R| and is also exactly the number of edges forced by Turán's theorem.

VII. CONCLUSION

We have discussed a wide variety of upper and lower bounds on the size of codes for combinatorial channels. We can summarize the most important points as follows. When the channel is input-regular, the minimum degree, degree sequence, and local degree upper bounds are equivalent, but not necessarily equal to the fractional covering number. When the channel is also output-regular, all of the upper bounds discussed become equivalent. Knowledge of the average input degree alone gives a very weak bound. If the input degrees concentrate around the average, the degree sequence bound will be fairly strong. The local degree bound is always at least as good as the degree sequence bound but uses more information about the structure of the channel. The local degree bound can be iterated to obtain stronger bounds. The best sphere packing bound for a given channel can be much weaker than the best sphere packing bound for some other channel that admits the same codes. Consequently, finding a family of channels equivalent to the channel of interest can be very powerful. If the confusion graph is regular, all lower bounds discussed in this paper are equivalent. In contrast to the situation for upper bounds, knowledge of the average degree alone is sufficient to obtain a good lower bound. The best of the lower bounds discussed are the Caro-Wei bound and the fractional dominating set number. These bounds are incomparable in general, but the Caro-Wei bound is better than the local degree lower bound for fractional dominating set.

REFERENCES

[1] D. Cullina and N. Kiyavash, "An improvement to Levenshtein's upper bound on the cardinality of deletion correcting codes," in IEEE International Symposium on Information Theory Proceedings, Jul. 2013.
[2] A. A. Kulkarni and N. Kiyavash, "Nonasymptotic upper bounds for deletion correcting codes," IEEE Transactions on Information Theory, vol. 59, no. 8, pp. 5115–5130, 2013.
[3] N. Kashyap and G. Zémor, "Upper bounds on the size of grain-correcting codes," arXiv preprint arXiv:1302.6154, 2013. [Online]. Available: http://arxiv.org/abs/1302.6154
[4] R. Gabrys, E. Yaakobi, and L. Dolecek, "Correcting grain-errors in magnetic media," in Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on, 2013, pp. 689–693.
[5] S. Buzaglo, E. Yaakobi, T. Etzion, and J. Bruck, "Error-correcting codes for multipermutations," 2013. [Online]. Available: http://authors.library.caltech.edu/36946/
[6] D. B. West, Introduction to Graph Theory. Prentice Hall Upper Saddle River, 2001, vol. 2.
[7] J. Hastad, "Clique is hard to approximate within n^{1−ε}," in Foundations of Computer Science, 1996. Proceedings., 37th Annual Symposium on. IEEE, 1996, pp. 627–636.
[8] V. I. Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals," in Soviet Physics Doklady, vol. 10, 1966, pp. 707–710.
[9] A. Fazeli, A. Vardy, and E. Yaakobi, "Generalized sphere packing bound," arXiv:1401.6496 [cs, math], Jan. 2014. [Online]. Available: http://arxiv.org/abs/1401.6496
[10] N. J. A. Sloane, Challenge Problems: Independent Sets in Graphs. [Online]. Available: http://neilsloane.com/doc/graphs.html
[11] A. Mazumdar, A. Barg, and N. Kashyap, "Coding for high-density recording on a 1-d granular magnetic medium," IEEE Transactions on Information Theory, vol. 57, no. 11, pp. 7403–7417, 2011.
[12] A. Sharov and R. M. Roth, "Bounds and constructions for granular media coding," in Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, 2011, pp. 2343–2347.
[13] C. Shannon, "The zero error capacity of a noisy channel," IRE Transactions on Information Theory, vol. 2, no. 3, pp. 8–19, 1956.
[14] M. Rosenfeld, "On a problem of C. E. Shannon in graph theory," Proceedings of the American Mathematical Society, vol. 18, no. 2, pp. 315–319, Apr. 1967.
[15] L. Lovász, "On the Shannon capacity of a graph," IEEE Transactions on Information Theory, vol. 25, no. 1, pp. 1–7, Jan. 1979.
[16] M. Dalai, "Lower bounds on the probability of error for classical and classical-quantum channels," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8027–8056, 2013.
[17] N. Alon and J. H. Spencer, "Turán's theorem," in The Probabilistic Method. John Wiley & Sons, 2004, pp. 95–96.

APPENDIX A
PROOFS

Theorem 1. Let

    f(r, u, b) = (1/r) (1 + max(2u − b − 2, 0)/((r+2)(r+1)))^{−1}.

Then the vector z_y = f(r_y, u_y, b_y) is feasible for κ*(A_n), so κ*(A_n) ≤ 1^T z.

Proof: By Lemma 1, ϕ ∘ ϕ(1) is feasible for κ*(A_n). From the definition of ϕ,

    z_y/ϕ(z)_y = min_{x∈N(y)} (A_n z)_x.

Each x ∈ [2]^n has r_x total substrings, so (A_n 1)_x = r_x,

    1/ϕ(1)_y = min_{x∈N(y)} (A_n 1)_x = min_{x∈N(y)} r_x = r_y,

and ϕ(1)_y = 1/r_y. Of the substrings of x, u_x − b_x have r_x − 2 runs, b_x have r_x − 1 runs, and r_x − u_x have r_x runs, so

    (A_n ϕ(1))_x = Σ_{y∈N(x)} 1/r_y
      = (u_x − b_x)/(r_x − 2) + b_x/(r_x − 1) + (r_x − u_x)/r_x
      = 1 + u_x (1/(r_x−2) − 1/r_x) + b_x (1/(r_x−1) − 1/(r_x−2))
      = 1 + (2u_x(r_x − 1) − b_x r_x)/(r_x(r_x−1)(r_x−2))
      = 1 + ((2u_x − b_x)(r_x − 2) + 2(u_x − b_x))/(r_x(r_x−1)(r_x−2))
      ≥ 1 + (2u_x − b_x)/(r_x(r_x − 1)).

The inequality follows from u_x − b_x ≥ 0.

Let y ∈ [2]^{n−1} be a string and let x ∈ [2]^n be a superstring of y. It is possible to create a superstring by extending an existing run, adding a new run at an end of the string, or by splitting an existing run into three new runs, so r_x ≤ r_y + 2. The only way to destroy a unit run in y is to extend it into a run of length two, so u_x ≥ u_y − 1. Similarly, u_x − b_x ≥ u_y − b_y − 1, so 2u_x − b_x ≥ 2u_y − b_y − 2. Applying these inequalities to (A_n ϕ(1))_x, we conclude that

    ϕ(1)_y/(ϕ ∘ ϕ(1))_y = min_{x∈N(y)} (A_n ϕ(1))_x ≥ 1 + max(2u_y − b_y − 2, 0)/((r_y + 2)(r_y + 1)),

    (ϕ ∘ ϕ(1))_y ≤ (1/r_y)(1 + max(2u_y − b_y − 2, 0)/((r_y + 2)(r_y + 1)))^{−1}.

If z_x = f(r_x, u_x, b_x), then

    1^T z = Σ_{x∈[2]^n} f(r_x, u_x, b_x)
          = 2f(1, 0, 0) + 2 Σ_{r=2}^{n} Σ_u Σ_{b=0}^{2} C(n−r−1, n−2r+u) C(r−2, u−b) C(2, b) f(r, u, b)
          = 2^n E_r[E_{u,b}[f(r, u, b)]].    (6)

Analysis of the feasible point constructed in Theorem 1 relies on the following identities. For k ≥ 0,

    E_r[ C(r+k−1, r−1)^{−1} ] = Σ_{r≥1} C(n+k−1, r+k−1) / (2^{n−1} C(n+k−1, n−1)) ≤ 2^k / C(n+k−1, n−1)    (7)

    E_{u,b}[ C(u, u−k) ] = C(r, r−k) C(r−1, r−k−1) / C(n−1, n−k−1)    (8)

    E_{u,b}[b] = 2(r−1)/(n−1).    (9)

Each of these can be easily derived from the binomial theorem and Vandermonde's identity.

Lemma 10. The number of strings in [2]^n with r runs is 2 C(n−1, n−r). The number of strings in [2]^n with r runs and u unit runs is 2 C(n−r−1, n−2r+u) C(r, r−u). For r ≥ 2, the number of strings in [2]^n with r runs, u unit runs, and b external unit runs is 2 C(n−r−1, n−2r+u) C(r−2, u−b) C(2, b).

Proof: For k ≥ 1, there are C(n−1, n−k) ways to partition n identical items into k distinguished nonempty groups. Thus there are C(n−lk+k−1, n−lk) = C(n−(l−1)k−1, n−lk) ways to partition n items into k groups such that each group contains at least l items.

A binary string is uniquely specified by its first symbol and its run length sequence. We have n symbols to distribute among r runs such that each run contains at least one symbol, so there are C(n−1, n−r) arrangements. This proves the first claim.

We can also specify the run sequence of a string by giving the locations of the unit runs and the lengths of the longer runs. The r − u runs of length at least two can appear in C(r, r−u) arrangements. We have n − u symbols to distribute among r − u runs such that each run contains at least 2 symbols, so there are C(n−u−(r−u)−1, n−u−2(r−u)) = C(n−r−1, n−2r+u) arrangements, which proves the second claim. As long as r ≥ 2, the internal unit runs can appear in C(r−2, u−b) positions and the external unit runs can appear in C(2, b) positions, so there are C(r−2, u−b) C(2, b) possible arrangements, which proves the third claim.

Note that Lemma 10 uses the polynomial definition of binomial coefficients, which can be nonzero even when the top entry is negative. For example, the number of strings of length n with n runs, n unit runs, and 2 external unit runs is 2 C(−1, 0) C(n−2, n−2) C(2, 2) = 2.

Theorem 2. For n ≥ 2,

    κ*(A_n) ≤ (2^n/(n+1)) (1 + 26/(n(n−1))).

Proof: For n ≤ 13, this follows from the bound of Kulkarni and Kiyavash [2]. This proof covers n ≥ 10. From Theorem 1 and (6), we have κ*(A_{n+1}) ≤ 2^n E_r[E_{u,b}[f(r, u, b)]] where

    f(r, u, b) = (1/r)(1 + max(2u − b − 2, 0)/((r+2)(r+1)))^{−1}.

For compactness, let

    E[f(r)] = (1/2^{n−1}) Σ_{r≥1} C(n−1, n−r) f(r),

and let E_{u,b}[f(r, u, b)] equal

    (1/C(n−1, n−r)) Σ_u Σ_{b=0}^{2} C(n−r−1, n−2r+u) C(r−2, u−b) C(2, b) f(r, u, b)

for r > 1, and let E_{u,b}[f(1, u, b)] = f(1, 0, 0).

For x > 0, (1 + x)^{−1} ≤ 1 − x + x², so f(r, u, b) is at most

    (1/r)(1 − max(2u−b−2, 0)/((r+2)(r+1)) + max(2u−b−2, 0)²/((r+2)²(r+1)²))
      ≤ 1/r − (2u−b−2)/((r+2)(r+1)r) + 2u(2u−2)/((r+2)²(r+1)²r).

We will bound this term by term using (7), (8), and (9). First,

    2^n E_r[ E_{u,b}[ 1/r − (2u − b − 2)/((r+2)(r+1)r) ] ]
      = 2^n E_r[ 1/r − (1/((r+2)(r+1)r)) (2 r(r−1)/(n−1) − 2(r−1)/(n−1) − 2) ]
      = 2^n E_r[ 1/r − (2/(n−1)) ((r−1)² − n + 1)/((r+2)(r+1)r) ]
      = (2^{n+1}/(n−1)) E_r[ (n−3)/(2r) + 5/((r+1)r) + (n−10)/((r+2)(r+1)r) ]
      ≤ (2^{n+1}/(n−1)) [ (n−3)/n + 20/((n+1)n) + (8n−80)/((n+2)(n+1)n) ]
      = (2^{n+1}/(n−1)) [ (n²−n−6)/((n+2)n) + (28n−40)/((n+2)(n+1)n) ]
      = (2^{n+1}/(n+2)) [ (n²−n−6)/(n(n−1)) + (28n−40)/((n+1)n(n−1)) ]
      = (2^{n+1}/(n+2)) [ 1 + (22n−46)/((n+1)n(n−1)) ]
      < (2^{n+1}/(n+2)) (1 + 22/((n+1)n)).

Second,

    2^n E_r[ E_{u,b}[ 4u(u−1)/((r+2)²(r+1)²r) ] ]
      = 2^n E_r[ (4/((r+2)²(r+1)²r)) · r(r−1)²(r−2)/((n−1)(n−2)) ]
      < 2^n E_r[ (4/((r+2)(r+1))) · ((r−1)(r−2)/((n−1)(n−2))) · (1/r) ]
      ≤ 2^n E_r[ (4/((r+2)(r+1))) · ((r+2)(r+1)/((n+2)(n+1))) · (1/r) ]
      = 2^n (4/((n+2)(n+1))) E_r[1/r]
      ≤ 2^n · 8/((n+2)(n+1)n).

Combining these two terms, we get

    κ*(A_{n+1}) ≤ (2^{n+1}/(n+2)) (1 + 22/((n+1)n) + 4/((n+1)n)).

Lemma 2. Let z_y = f'(r_y, u_y, b_y). Then

    1^T z ≥ ((2^n − 2)/(n+1)) (1 + 1/(n−1) − 3/((n−1)(n−2))).

Proof:

    f'(r, u, b) ≥ (1/r)(1 − (u−b)/r²) ≥ (1/r)(1 − (u−b)/((r−1)(r−2))),

so

    1^T z ≥ 2^n E_r[ E_{u,b}[ (1/r)(1 − (u−b)/((r−1)(r−2))) ] ]
      = 2^n E_r[ (1/r)(1 − (1/((r−1)(r−2))) (r(r−1)/(n−1) − 2(r−1)/(n−1))) ]
      = 2^n E_r[ (1/r)(1 − 1/(n−1)) ]
      = 2^n (1 − 1/(n−1)) (2^n − 1)/(2^{n−1} n)
      = ((2^{n+1} − 2)/(n+2)) · (n+2)(n−2)/(n(n−1))
      = ((2^{n+1} − 2)/(n+2)) (1 + 1/n − 3/(n(n−1))).

Theorem 3. Let A_n be the n-bit 1-grain-error channel. The vector

    z_y = (1/r_y)(1 + (2u_y − 2b_y^R − b_y^L − 2)/((r_y+2)(r_y+1)))^{−1}

is feasible for κ*(A_n).

Proof: By Lemma 1, ϕ ∘ ϕ(1) is feasible for κ*(A). From the definition of ϕ,

    z_y/ϕ(z)_y = min_{x∈N(y)} (A_n z)_x.

Each x ∈ [2]^n has r_x total neighbors, so (A_n 1)_x = r_x,

    1/ϕ(1)_y = min_{x∈N(y)} (A_n 1)_x = min_{x∈N(y)} r_x = r_y,

and ϕ(1)_y = 1/r_y. Of the neighbors of x, u_x − b_x^L − b_x^R have r_x − 2 runs, b_x^L have r_x − 1 runs, and r_x − u_x + b_x^R have r_x runs, so (A_n ϕ(1))_x equals

    Σ_{y∈N(x)} 1/r_y
      = (u_x − b_x^L − b_x^R)/(r_x − 2) + b_x^L/(r_x − 1) + (r_x − u_x + b_x^R)/r_x
      = 1 + (u_x − b_x^R)(1/(r_x−2) − 1/r_x) + b_x^L(1/(r_x−1) − 1/(r_x−2))
      = 1 + (2(u_x − b_x^R)(r_x − 1) − b_x^L r_x)/(r_x(r_x−1)(r_x−2))
      = 1 + ((2u_x − 2b_x^R − b_x^L)(r_x − 2) + 2(u_x − b_x^R − b_x^L))/(r_x(r_x−1)(r_x−2))
      ≥ 1 + (2u_x − 2b_x^R − b_x^L)/(r_x(r_x − 1)).

Let x ∈ [2]^n be an input and let y ∈ N(x). A grain error can leave the number of runs unchanged, destroy a unit run at the start of x, or destroy a unit run in the middle of x, merging the adjacent runs. Thus r_y ≥ r_x − 2. The only way to produce a unit run in y is to shorten a run of length two in x, so

u_x ≥ u_y − 1. Similarly, 2u_x − 2b_x^R − b_x^L ≥ 2u_y − 2b_y^R − b_y^L − 2. Applying these inequalities to (A ϕ(1))_x, we conclude that

    ϕ(1)_y/(ϕ ∘ ϕ(1))_y = min_{x∈N(y)} (A ϕ(1))_x ≥ 1 + (2u_y − 2b_y^R − b_y^L − 2)/((r_y + 2)(r_y + 1)),

    (ϕ ∘ ϕ(1))_y ≤ (1/r_y)(1 + (2u_y − 2b_y^R − b_y^L − 2)/((r_y + 2)(r_y + 1)))^{−1}.
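The counting fact used at the start of this proof, that each input x has exactly r_x distinct neighbors under a single grain error, can be checked exhaustively. A sketch, assuming the standard granular-media model (as in [11]) in which a grain error at position i overwrites x_i with x_{i−1}; the representation and function names are ours:

```python
from itertools import product

def runs(x):
    """Number of runs in a binary string."""
    return 1 + sum(x[i] != x[i - 1] for i in range(1, len(x)))

def grain_neighbors(x):
    """Outputs of the 1-grain-error channel: at most one position i >= 1
    is overwritten as y_i = x_{i-1}; the unchanged string is included."""
    outs = {x}
    for i in range(1, len(x)):
        outs.add(x[:i] + x[i - 1] + x[i + 1:])
    return outs

# A grain error changes x only at a run boundary, and each of the
# r_x - 1 boundaries gives a distinct output, so |N(x)| = r_x.
for bits in product('01', repeat=6):
    x = ''.join(bits)
    assert len(grain_neighbors(x)) == runs(x)
```

The exhaustive check over all 64 strings of length 6 confirms that the neighborhood size equals the run count, which is exactly the regularity used to compute ϕ(1)_y = 1/r_y.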

Lemma 6. κ*(A_{q,n,a,b}) ≤ κ*(A_{q,n,a+2,b−1}) when a + qb ≤ n − 1.

Proof: We can rewrite the claimed inequality as

    κ*(A_{q,n,a+2,b−1}) ≥ κ*(A_{q,n,a,b})

    q^{n−a−2} / Σ_{i=0}^{b−1} C(n−a−2, i)(q−1)^i ≥ q^{n−a} / Σ_{i=0}^{b} C(n−a, i)(q−1)^i

    Σ_{i=0}^{b} C(n−a, i)(q−1)^i ≥ q² Σ_{i=0}^{b−1} C(n−a−2, i)(q−1)^i.    (10)

To simplify (10), we use the following identity:

    Σ_{i=0}^{b} C(n−c+2, i)(q−1)^i
      = Σ_{i=0}^{b} [C(n−c, i−2) + 2 C(n−c, i−1) + C(n−c, i)] (q−1)^i
      = Σ_{i=0}^{b−2} C(n−c, i)(q−1)^{i+2} + 2 Σ_{i=0}^{b−1} C(n−c, i)(q−1)^{i+1} + Σ_{i=0}^{b} C(n−c, i)(q−1)^i
      = C(n−c, b)(q−1)^b − C(n−c, b−1)(q−1)^{b+1} + ((q−1)² + 2(q−1) + 1) Σ_{i=0}^{b−1} C(n−c, i)(q−1)^i
      = C(n−c, b−1)(q−1)^b ((n−c−b+1)/b − q + 1) + q² Σ_{i=0}^{b−1} C(n−c, i)(q−1)^i.

By setting c = a + 2, we can use this to rewrite the left side of (10). Eliminating the common term from both sides of the inequality gives

    C(n−a−2, b−1)(q−1)^b ((n−a−b−1)/b − q + 1) ≥ 0
    (n−a−b−1)/b − q + 1 ≥ 0
    n − a − 1 − qb ≥ 0,

which proves the claim.