On Designing Probabilistic Supports to Map the Entropy Region
John MacLaren Walsh, Ph.D. & Alexander Erick Trofimoff
Dept. of Electrical & Computer Engineering, Drexel University, Philadelphia, PA, USA 19104
[email protected] & [email protected]

Abstract—The boundary of the entropy region has been shown to determine fundamental inequalities and limits in key problems in network coding, streaming, distributed storage, and coded caching. The unknown part of this boundary requires nonlinear constructions, which can, in turn, be parameterized by the support of their underlying probability distributions. Recognizing that the terms in entropy are submodular enables the design of such supports to maximally push out towards this boundary.

Index Terms—entropy region; Ingleton violation

This material is based upon work supported by the National Science Foundation under Grant Nos. 1812965 & 1421828.

I. INTRODUCTION

From an abstract mathematical perspective, as each of the key quantities in information theory (entropy, mutual information, and conditional mutual information) can be expressed as a linear combination of subset entropies, every linear information inequality, i.e., every inequality among different instances of these quantities, specifies a half-space in which the entropy region Γ̄*_N [1], [2] must live, and vice versa. This in turn implies that characterizing the boundary of Γ̄*_N is equivalent to determining all fundamental inequalities in information theory [1], [3], [2]. Other abstract equivalences have shown that determining it amounts to determining all inequalities in the sizes of N subgroups and their intersections [4], as well as all inequalities among Kolmogorov complexities [5], and that the faces of Γ̄*_N encode information about implications among conditional independences [6], which are key to the language of graphical models in machine learning. From a more applied perspective, it has been shown that determining the boundary of Γ̄*_N is equivalent to determining the capacity regions of all networks under network coding [7], [8], which in turn have been shown to be key ingredients in building optimal coding protocols for streaming information with low delay over multipath routed networks [9], [10], limits for secret sharing systems [11], as well as fundamental tradeoffs between the amount of information a large distributed information storage system can store and the amount of network traffic it must consume to repair failed disks [12], [13], [14].

The entropy region can be broadly broken into two parts: the part that can be achieved by time-sharing linear codes, and the part that cannot. Here, linear codes construct the discrete random variables (r.v.s) as vectors created by multiplying vectors of r.v.s uniformly distributed over a finite field by matrices with elements drawn from the same field. For collections of exclusively N ≤ 3 r.v.s, all entropy vectors are achievable by time-sharing linear codes [2]. For N = 4 and N = 5 r.v.s, the region of entropic vectors (e.v.s) reachable by time-sharing linear codes has been fully determined ([5] and [15], [16], respectively), and is furthermore polyhedral, while the full region of entropic vectors remains unknown for all N ≥ 4. As such, for N ≥ 4, and especially for the cases of N = 4 and N = 5 r.v.s, determining Γ̄*_N amounts to determining those e.v.s exclusively reachable with non-linear codes. Of the highest interest in such an endeavor is determining those extreme rays generating points on the boundary of Γ̄*_N requiring such non-linear code based constructions. Once these extremal e.v.s from non-linear codes have been determined, all of the points in Γ̄*_N can be generated through time-sharing their constructions and the extremal linear code constructions.

For N = 4 r.v.s, determining whether an entropic vector can be achieved with the time-sharing of linear codes is equivalent to checking whether it obeys Ingleton's inequality [17], [5], and for N ≥ 5, violation of Ingleton's inequality is a sufficient condition for an entropic point to require nonlinear constructions. These sorts of conditions identify the e.v.s of interest directly through conditions expressed in their entropies, but a method for clearly working these conditions back to structure in the joint probability mass function is yet unknown. Connections with inequalities for the sizes of subgroups of a common finite group [4] have provided one direction for constructions provably sufficient in the limit [18], but the groups must grow arbitrarily large in the proofs for these constructions to suffice, and extremality of scores in various directions is difficult to couple with group properties other than being non-abelian.

On the other hand, in practice, numerical optimization [19], [20] to map Γ̄*_N consistently exhibits limited support of a non-quasi-uniform nature, motivating a study of arbitrary supports for non-uniform distributions [21]. A particularly simple way of encoding nonlinear dependence between r.v.s in a joint probability mass function is through its probabilistic support, or the collection of N-dimensional outcomes for the N discrete r.v.s which have strictly positive probabilities.

Main Contribution: This paper advances a nascent structural theory identifying and organizing which probabilistic supports can yield e.v.s in the part of Γ̄*_N reachable exclusively with non-linear codes. Additionally, in order to push the constructions as close to the boundary of Γ̄*_N as possible, structural theory and methods for ranking supports relative to one another in terms of Ingleton score and other measures of code nonlinearity are investigated. The new theorems provide guidance and explanation for observations that were made exclusively with numerical experiments in [21].
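Ingleton's inequality, used above as the N = 4 test for reachability by time-sharing linear codes, is straightforward to check numerically. The following is a minimal standard-library Python sketch (the function names `subset_entropies` and `ingleton` are illustrative, not from the paper): it computes all subset entropies of a joint PMF and evaluates the Ingleton expression, which is nonnegative for every N = 4 entropy vector achievable by time-sharing linear codes, so a negative value certifies that nonlinear constructions are required.

```python
import math
from itertools import combinations

def subset_entropies(pmf):
    """All subset entropies h(X_A), in bits, of a joint PMF given as a
    dict mapping N-tuples of outcomes to probabilities."""
    N = len(next(iter(pmf)))
    h = {}
    for r in range(1, N + 1):
        for A in combinations(range(N), r):
            # Marginalize the PMF onto the coordinates in A.
            marginal = {}
            for x, q in pmf.items():
                key = tuple(x[i] for i in A)
                marginal[key] = marginal.get(key, 0.0) + q
            h[frozenset(A)] = -sum(q * math.log2(q)
                                   for q in marginal.values() if q > 0)
    return h

def ingleton(h, a, b, c, d):
    """Ingleton expression for roles (a, b; c, d): it equals
    I(a;b|c) + I(a;b|d) + I(c;d) - I(a;b), is >= 0 for entropy vectors
    reachable by time-sharing linear codes, and is negative exactly
    when this instance of Ingleton's inequality is violated."""
    f = lambda *s: h[frozenset(s)]
    return (f(a, b) + f(a, c) + f(a, d) + f(b, c) + f(b, d)
            - f(a) - f(b) - f(c, d) - f(a, b, c) - f(a, b, d))
```

Scanning all six role assignments of (a, b; c, d) over the four variables tests every instance of the inequality; the most negative value over these instances, suitably normalized, underlies the Ingleton score mentioned below.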
II. ENCODING SUPPORTS WITH SETS OF SET PARTITIONS

A collection of N discrete random variables X := (X_1, ..., X_N) taking values in the set 𝒳 := 𝒳_1 × ··· × 𝒳_N is often specified by defining the joint probability mass function (PMF) p_X : 𝒳 → [0, 1], which assigns to each vector x = (x_1, ..., x_N) ∈ 𝒳 its probability p_X(x) := P[X = x]. Perhaps the most familiar representation of a probabilistic support is the set of outcomes mapped to strictly positive probabilities under the PMF,

  𝒳_{>0} = { x ∈ 𝒳 | p_X(x) > 0 }.   (1)

A k-atom probabilistic support, or k-atom support for short, is one with |𝒳_{>0}| = k.

The key object under study is the closure of the set of entropy vectors reachable with a given k-atom support,

  H°_N(𝒳_{>0}) = { h(p_X) | Σ_{x ∈ 𝒳_{>0}} p_X(x) = 1, p_X(x) ≥ 0 ∀ x ∈ 𝒳, p_X(x) = 0 ∀ x ∉ 𝒳_{>0} },   (2)

with the explicit goal of providing a structural characterization of those k-atom supports which can generate e.v.s in the unknown part of Γ̄*_N requiring non-linear codes. In this respect, because Γ̄*_N is invariant to the identification of the labels of the N r.v.s, but identification of a particular k-atom support can break this symmetry, we consider the set of entropy vectors reachable from 𝒳_{>0} to be not only H°_N(𝒳_{>0}) but

  H_N(𝒳_{>0}) = { π(h) | h ∈ H°_N(𝒳_{>0}), π ∈ S_N },   (3)

where π ∈ S_N, a permutation in the symmetric group of order N, acts on the entropy vector h by permuting the subsets of {1, ..., N} that index its dimensions. From the viewpoint of the goal of mapping Γ̄*_N, those supports 𝒳_{>0} generating the same sets H_N(𝒳_{>0}) are functionally equivalent. It is thus of interest to group probabilistic supports into equivalence classes based on those entropy vectors H_N(𝒳_{>0}) that they can reach.

In this regard, an important observation is that a k-atom support induces a collection of N set partitions which directly determine the entropy vectors it can reach: indexing the atoms of 𝒳_{>0} by {1, ..., k}, each random variable X_n induces a set partition ξ_n of {1, ..., k} whose blocks group together the atoms sharing the same value of x_n, and we have defined H(ξ; p) to be the entropy of the block probabilities Σ_{i ∈ B} p_i, B ∈ ξ, of a set partition ξ under an atom distribution p, so that

  H(X_A) = H( ∧_{n ∈ A} ξ_n ; p ),   (7)

and we can formally express the entropic vector as a function

  h(ξ; p) = [ H( ∧_{n ∈ A} ξ_n ; p ) | A ⊆ {1, ..., N} ],   (8)

and the set of entropy vectors reachable with a given equivalence class of probabilistic supports as

  H_N(𝒳_{>0}) = H_N(ξ) := { π(h(ξ; S_k)) | π ∈ S_N },   (9)

where S_k = { p | p ≥ 0, Σ_{i=1}^{k} p_i = 1 }. Two different supports 𝒳_{>0} can generate identical collections of N set partitions ξ, and thus, via (9), exactly the same set of entropy vectors. For this reason, we choose to represent an N-variable k-atom probabilistic support not as a set of vectors of outcomes 𝒳_{>0} but instead as the collection of N set partitions ξ. Additionally, as we are adding all permutations of the N random variable labels back in (9), which also form the index ordering of ξ, we can take ξ to be a multi-set, as two different orderings of the indexing of this multiset will yield identical H_N(ξ)s. We further restrict consideration to ξ being a set of set partitions, as repetition of the same partition is more properly thought of as an N′-variable k-atom support with N′ < N together with a repetition of identical r.v.s. Finally, since the entropies will be independent of the ordering of the labels {1, ..., k} of the elements in (1), from the orbit under the group of permutations of these labels, S_k (the symmetric group of order k), we select the ξ which is minimum under the natural lexicographic ordering.

Definition 1 (Canonical Minimal k-atom Support): A canonical minimal k-atom support is a set of N set partitions ξ of the set {1, ..., k} whose meet is the set of singletons,

  ∧_{n ∈ {1,...,N}} ξ_n = { {i} | i ∈ {1, ..., k} },   (10)

and which is minimal under the lexicographic ordering in its orbit under the atom-relabeling permutations S_k.
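The pipeline from a k-atom support to its entropic vector via eqs. (7)-(8) can be computed directly: each r.v. induces one set partition of the atom indices, the meet of the partitions indexed by A plays the role of the joint variable X_A, and H(ξ; p) is the entropy of the block probabilities. A minimal standard-library Python sketch (function names are illustrative; atoms are 0-indexed rather than the paper's {1, ..., k}):

```python
import math
from itertools import chain, combinations

def support_to_partitions(support):
    """Each r.v. X_n induces a set partition of the atom indices
    {0, ..., k-1}: atoms fall in the same block of xi_n exactly when
    their outcome tuples agree in coordinate n."""
    N = len(support[0])
    xi = []
    for n in range(N):
        blocks = {}
        for i, x in enumerate(support):
            blocks.setdefault(x[n], set()).add(i)
        xi.append(list(blocks.values()))
    return xi

def meet(partitions):
    """Meet (common refinement) of set partitions: atoms share a block
    of the meet exactly when they share a block in every partition."""
    signature = {}
    for part in partitions:
        for b, block in enumerate(part):
            for i in block:
                signature.setdefault(i, []).append(b)
    blocks = {}
    for i, sig in signature.items():
        blocks.setdefault(tuple(sig), set()).add(i)
    return list(blocks.values())

def H(partition, p):
    """Entropy (bits) of the block probabilities of a partition under p."""
    return -sum(q * math.log2(q)
                for q in (sum(p[i] for i in block) for block in partition)
                if q > 0)

def entropy_vector(xi, p):
    """h(xi; p): H(X_A) for every nonempty A, per eqs. (7)-(8)."""
    N = len(xi)
    subsets = chain.from_iterable(combinations(range(N), r)
                                  for r in range(1, N + 1))
    return {A: H(meet([xi[n] for n in A]), p) for A in subsets}
```

For example, the 4-atom support {(0,0), (0,1), (1,0), (1,1)} of two independent uniform bits yields H(X_1) = H(X_2) = 1 and H(X_1, X_2) = 2 under the uniform atom distribution.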
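For small k, the canonical representative of Definition 1 can be found by brute force: check that the meet of the partitions is the set of singletons (eq. (10)), then minimize over the k! atom relabelings under lexicographic ordering. A sketch under those assumptions (illustrative names; atoms 0-indexed; the brute-force search over S_k is only feasible for small k):

```python
from itertools import permutations

def as_sorted(partition):
    """Nested-tuple form of a partition: blocks sorted, atoms sorted,
    so that partitions can be compared lexicographically."""
    return tuple(sorted(tuple(sorted(block)) for block in partition))

def meet_is_singletons(xi, k):
    """Eq. (10): the meet of all partitions in xi is the singletons,
    i.e. no two atoms share a block in every partition."""
    signature = {i: tuple(next(b for b, blk in enumerate(part) if i in blk)
                          for part in xi)
                 for i in range(k)}
    return len(set(signature.values())) == k

def canonical_form(xi, k):
    """Lexicographically minimal representative of xi over its orbit
    under the atom-relabeling permutations S_k."""
    best = None
    for perm in permutations(range(k)):
        relabeled = tuple(sorted(
            as_sorted({perm[i] for i in block} for block in part)
            for part in xi))
        if best is None or relabeled < best:
            best = relabeled
    return best
```

Representing ξ as a sorted tuple of sorted partitions also discards the ordering of both the partitions and the atom labels, matching the multi-set and relabeling invariances discussed above.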