arXiv:1207.5458v1 [cs.IT] 23 Jul 2012

On the Non-robustness of Essentially Conditional Information Inequalities

Tarik Kaced (LIRMM, Univ. Montpellier 2; Email: [email protected])
Andrei Romashchenko (LIRMM, CNRS & Univ. Montpellier 2; on leave from the IITP, Moscow; Email: [email protected])

Abstract. We show that two essentially conditional linear inequalities for Shannon's entropies (including the Zhang–Yeung'97 conditional inequality) do not hold for asymptotically entropic points. This means that these inequalities are non-robust in a very strong sense. This result raises the question of the meaning of these inequalities and of the validity of their use in practice-oriented applications.

I. INTRODUCTION

Following Pippenger [15] we can say that the most basic and general "laws of information theory" can be expressed in the language of information inequalities (inequalities which hold for the Shannon entropies of jointly distributed tuples of random variables, for every distribution). The very first examples of information inequalities were proven (and used) in Shannon's seminal papers in the 1940s. These inequalities have a clear intuitive meaning. For instance, the entropy of a pair of jointly distributed random variables a, b is not greater than the sum of the entropies of the marginal distributions, i.e.,

H(a,b) ≤ H(a) + H(b);

this inequality becomes an equality if and only if a and b are independent in the usual sense of probability theory. This property has a very natural meaning: a pair cannot contain more "uncertainty" than the sum of the "uncertainties" in both components. The same basic statement can be easily explained, e.g., in terms of standard coding theorems: the average length of an optimal code for a distribution (a,b) is not greater than the sum of the average lengths of two separate codes for a and b. Another classic information inequality, I(a:b|c) ≥ 0, is slightly more complicated from the mathematical point of view, but it is also very natural and intuitive: in standard notation it means that the mutual information between a and b (conditional on c) is non-negative. Inequalities of this type are called Shannon's basic inequalities, [19]. We believe that the success of Shannon's information theory in a myriad of applications (in engineering as well as in the natural sciences, mathematics and computer science) is due to the intuitive simplicity and natural interpretations of these basic properties of Shannon's entropy.

Formally, information inequalities are just a dual description of the set of all entropy profiles. That is, for every joint distribution of an n-tuple of random variables we have a vector of 2^n − 1 ordered entropies (the entropies of all single variables involved, of all pairs, all triples, all quadruples, etc., in some fixed order). A vector in R^(2^n−1) is called entropic if it represents the entropy values of some distribution.
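For instance, for n = 2 the entropy profile of (a,b) is the vector (H(a), H(b), H(a,b)) in R^3. The following minimal Python sketch (the toy distribution p_ab is an ad hoc choice, used here only for illustration) computes such a profile and checks the basic inequality above:

from math import log2
from collections import defaultdict

# A toy joint distribution of (a, b); any pmf over pairs would do here.
p_ab = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def H(dist):
    """Shannon entropy, in bits, of a pmf given as {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(dist, idx):
    """Marginal pmf on the coordinates listed in idx."""
    m = defaultdict(float)
    for outcome, p in dist.items():
        m[tuple(outcome[i] for i in idx)] += p
    return m

H_a, H_b, H_ab = H(marginal(p_ab, [0])), H(marginal(p_ab, [1])), H(p_ab)
print((H_a, H_b, H_ab))        # the entropy profile, a point in R^3
print(H_ab <= H_a + H_b)       # True: H(a,b) <= H(a) + H(b)
print(H_a + H_b - H_ab >= 0)   # True: I(a:b) >= 0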
theory”, conditional information less of much laws still inequ of is type another However, interpretations. intuitive variables). of permutations instanc different possible for all inequality satisfy this must (mor polymatroid above a formulated precisely, inequality the Zhang–Yeung the with cardinality satisfies polymatroid of a set that ground Mat´uˇs proved established. polymatr was and t inequalities where information to [11], between paper connection Mat´uˇs his related F. in by explained closely was nection is inequality inequality This e sev- [22]. discussed, have [17], inequality. were we this interpretations of questions, information-theoretic explanations last and in- interpretations the it eral to understand answer to How comprehensive questions mean? challenging inequality tuitively? other this raised does What result unexpected This a of example inequality? a basic represented Shannon’s an be of exist cannot a combination there that convex [5] does inequality specifically, and information More [15] linear vectors?) In a.e. of vectors. cone a.e. of inequalities? set information raised: the was question to e natural is cone inequalities dual information linear the all of class The short. constructible asymptotically nomto nqaiy ic h niels falknown linear all “conditional of a list of entire definition the general since inequality” a give information not do I nrpcvcosi ovxcn in cone convex a is vectors entropic entropic e 2] htfrevery all for for that vectors to entropic [20], is of problem see set difficult) the very probably describe (and fundamental The eogt hscoueaecalled are closure this to belong ( c hs h nqaiyfo 2]hssm xlntosand explanations some has [21] from inequality the Thus, first the with up came Yeung R.W. and Zhang Z. 1998 In : IM,CR nv otele 2 Montpellier Univ. & CNRS LIRMM, d ) [email protected] ≤ fi ersnsetoyvle fsm distribution. some of values entropy represents it if nlaefo IP Moscow IITP, from leave On 2 osrie nomto inequalities information constrained lhuhw tl ontko opeeand complete a know not do still we Although o ak flna pcs 3,[] 1] hscon- This [12]. [6], [3], spaces, linear of ranks for nriRomashchenko Andrei I ( c non-Shannon-type : d | a )+ I ( c : d Euvlnl,hwt eciethe describe to how (Equivalently, 4 n | b hti h ls faluniversal all of class the is What etr,[2,syae etr for vectors a.e. say [12], vectors, sslahsv fadol fit if only and if selfadhesive is )+ h lsr ftesto all of set the of closure the I nomto nqaiy[21]: inequality information ( a smttclyentropic asymptotically : b R )+ 2 n I − ( lna inequalities (linear 1 a h onsthat points The . : n c e 1].We [19]). see , | ti known, is It . d )+ Ingleton’s I ( a g,in .g., xactly alities Some : sof es d oids | a s c ies he or ) y e . : nontrivial inequalities in this class is very short. Here are three Together with [7], where (1–3) are proven to be essentially of them: conditional, our result indicates that (1) and (3) are very fragile (1) [20]: if I(a:b|c)= I(a:b)=0, then and non-robust properties of entropies. We cannot hope that similar inequalities hold when the constraints become soft. For I(c:d) ≤ I(c:d|a)+ I(c:d|b), instance, assuming that I(a:b) and I(a:b|c) are “very small” we cannot say that (2) [9]: if I(a:b|c)= I(b:d|c)=0, then I(c:d) ≤ I(c:d|a)+ I(c:d|b) I(c:d) ≤ I(c:d|a)+ I(c:d|b)+ I(a:b), holds also with only “a small error”; even a negligible devia- (3) [7]: if I(a:b|c)= H(c|a,b)=0, then tion from the conditions in (1) can result in a dramatic effect I(c:d) ≤ I(c:d|a)+ I(c:d|b)+ I(a:b). 
I(c:d) ≫ I(c:d|a)+ I(c:d|b). Conditional information inequalities (in particular, inequal- It is known that (1-3) are “essentially conditional”, i.e., they ity (2)) were used in [9] to describe conditional independences cannot be extended to any unconditional inequalities, [7], e.g., among several jointly distributed random variables. Condi- for (1) this means that for any values of “Lagrange multipliers” tional independence is known to have wide applications in λ1, λ2 the corresponding unconditional extension statistical theory (including methods of parameter identifica- tion, causal inference, data selection mechanisms, etc.), see, I(c:d) ≤ I(c:d|a)+ I(c:d|b)+ λ1I(a:b)+ λ2I(a:b|c) e.g., surveys in [2], [16]. We are not aware of any direct or does not hold for some distributions (a,b,c,d). In other words, implicit practical usage of (1-3), but it would not be surprising (1-3) make some very special kind of “information laws”: to see such usages in the future. However, our results indicate they cannot be represented as “shades” of any unconditional that these inequalities are non-robust and therefore might be inequalities on the subspace corresponding to their linear misleading in practice-oriented applications. constraints. The rest of the paper is organized as follows. We provide A few other nontrivial conditional information inequalities a new proof of why two conditional inequalities (1) and (3) can be obtained from the results of F. Mat´uˇs in [9]. For are essentially conditional. This proof uses a simple algebraic example, Mat´uˇsproved that for every integer k > 0 and for example of random variables. Then, we show that (1) and (3) all (a,b,c,d) are not valid for a.e. vectors, leaving the question for (2) open. 1 I(c:d) ≤ I(c:d|a)+ I(c:d|b)+ I(a:b)+ I(c:d|a) II. WHY “ESSENTIALLYCONDITIONAL” : AN ALGEBRAIC (∗) k COUNTEREXAMPLE k +1 + (I(a:c|d)+ I(a:d|c)) 2 Consider the quadruple (a,b,c,d)q of geometric objects, resp. A, B, C, D, on the affine plane over the finite field F (this is a special case of theorem 2 in [9]). Assume that q defined as follows : I(a:c|b) = I(b:c|a)=0. Then, as k → ∞ we get from (*) another conditional inequality: • First choose a random non-vertical line C defined by the equation y = c + c x (the coefficients c and c are (4) if I(a:c|d)= I(a:d|c)=0, then 0 1 0 1 independent random elements of the field); I(c:d) ≤ I(c:d|a)+ I(c:d|b)+ I(a:b). • pick points A and B on C independently and uniformly at random (these points coincide with probability 1/q); It can be proven that (4) is also an essentially conditional • then pick a parabola D uniformly at random in the set inequality, i.e., whatever are the coefficients λ , λ , 2 1 2 of all non-degenerate parabolas y = d0 + d1x + d2x (where d0, d1, d2 ∈ Fq, d2 6=0) that intersect C at A and I(c:d) ≤ I(c:d|a)+I(c:d|b)+I(a:b)+λ1I(a:c|d)+λ2I(a:d|c) B; (if A = B we require that C is a tangent line to D). does not hold for some distribution (a,b,c,d). When C and A, B are chosen, there exist (q −1) different Since (*) holds for a.e. vectors, (4) is also true for a.e. parabolas D meeting these conditions. vectors. Inequality (4) is robust in the following sense. Assume A typical quadruple is represented on Figure 1. that entropies of all variables involved are bounded by some h. Then for every ε> 0 there exists a δ = δ(h,ε) such that Remark 1. 
II. WHY "ESSENTIALLY CONDITIONAL": AN ALGEBRAIC COUNTEREXAMPLE

Consider the quadruple (a,b,c,d)_q of geometric objects, resp. A, B, C, D, on the affine plane over the finite field F_q, defined as follows:

• first choose a random non-vertical line C defined by the equation y = c0 + c1·x (the coefficients c0 and c1 are independent random elements of the field);
• pick points A and B on C independently and uniformly at random (these points coincide with probability 1/q);
• then pick a parabola D uniformly at random in the set of all non-degenerate parabolas y = d0 + d1·x + d2·x^2 (where d0, d1, d2 ∈ F_q, d2 ≠ 0) that intersect C at A and B (if A = B we require that C be a tangent line to D). When C and A, B are chosen, there exist (q − 1) different parabolas D meeting these conditions.

A typical quadruple is represented on Figure 1.

[Fig. 1. An algebraic example: a line C and a parabola D intersecting at the points A and B.]

Remark 1. This picture is not strictly accurate, since the plane over F_q is discrete, but it helps to grasp the general idea: the relevant properties we use are also valid in the continuous case.

Let us now describe the entropy profile of this quadruple.

• Every single random variable is uniform over its support.
• The line and the parabola share some mutual information (the fact that they intersect), which is approximately one bit. Indeed, C and D intersect iff the discriminant of the corresponding quadratic equation is a quadratic residue, which happens almost half of the time; a computation gives I(c:d) = (q−1)/q.
• When an intersection point is given, the line provides no additional information about the parabola: I(c:d|a) = I(c:d|b) = 0.
• When the line is known, one intersection point does not help knowing the other (by construction): I(a:b|c) = 0.
• The probability that there is only one intersection point is 1/q; in that case, the line can be any line going through this point. This yields I(a:b) = H(c|a,b) = (log2 q)/q.
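These equalities admit a direct computational check: for a small prime q one can enumerate the whole sample space and recompute the profile. Below is a Python sketch of such a check (the parametrization of an elementary outcome by the tuple (c0, c1, xa, xb, d2), and all identifier names, are our own ad hoc choices; we use the fact that the parabola through A and B with leading coefficient d2 is D(x) = C(x) + d2·(x − xa)·(x − xb), tangent to C when A = B):

from itertools import product
from collections import defaultdict
from math import log2

q = 5  # a small prime; F_q is modelled as {0,...,q-1} with arithmetic mod q

# An elementary outcome is (c0, c1, xa, xb, d2): the line y = c0 + c1*x,
# the x-coordinates of the two points picked on it, and the leading
# coefficient of the parabola. All q^4*(q-1) outcomes are equiprobable.
joint = defaultdict(float)
n_outcomes = q**4 * (q - 1)
for c0, c1, xa, xb in product(range(q), repeat=4):
    for d2 in range(1, q):
        a = (xa, (c0 + c1 * xa) % q)  # the point A on the line C
        b = (xb, (c0 + c1 * xb) % q)  # the point B on the line C
        # D(x) = C(x) + d2*(x - xa)*(x - xb), expanded into coefficients:
        d1 = (c1 - d2 * (xa + xb)) % q
        d0 = (c0 + d2 * xa * xb) % q
        joint[(a, b, (c0, c1), (d0, d1, d2))] += 1.0 / n_outcomes

def H(idx):
    """Entropy (bits) of the marginal on coordinates idx; 0=a, 1=b, 2=c, 3=d."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[tuple(outcome[i] for i in idx)] += p
    return -sum(p * log2(p) for p in m.values())

I_cd   = H([2]) + H([3]) - H([2, 3])
I_cd_a = H([0, 2]) + H([0, 3]) - H([0, 2, 3]) - H([0])
I_cd_b = H([1, 2]) + H([1, 3]) - H([1, 2, 3]) - H([1])
I_ab_c = H([0, 2]) + H([1, 2]) - H([0, 1, 2]) - H([2])
I_ab   = H([0]) + H([1]) - H([0, 1])
H_c_ab = H([0, 1, 2]) - H([0, 1])

print(I_cd, (q - 1) / q)           # I(c:d) equals (q-1)/q
print(I_cd_a, I_cd_b, I_ab_c)      # all three vanish (up to float error)
print(I_ab, H_c_ab, log2(q) / q)   # both equal (log2 q)/q

For q = 5 this reproduces the values listed above up to floating-point error.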
Now we plug these values into the inequalities

I(c:d) ≤ I(c:d|a) + I(c:d|b) + λ1·I(a:b) + λ2·I(a:b|c)
and
I(c:d) ≤ I(c:d|a) + I(c:d|b) + I(a:b) + λ1·I(a:b|c) + λ2·H(c|a,b),

which are the "unconditional" counterparts of (1) and (3) respectively. For all constants λ1, λ2 we get

1 − 1/q ≤ (λ1 + λ2)·(log2 q)/q

and conclude that these counterparts cannot hold when q is large. Thus we get the following theorem (originally proven in [7]):

Theorem 1. Inequalities (1) and (3) are essentially conditional.
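To see the scale involved: for any fixed multipliers the right-hand side decays like (log2 q)/q while the left-hand side tends to 1, so the counterexample kicks in once q − 1 exceeds (λ1 + λ2)·log2 q. A quick numeric illustration in Python (the choice λ1 = λ2 = 10 and the sample field sizes are arbitrary):

from math import log2

lam1 = lam2 = 10.0  # an arbitrary fixed choice of "Lagrange multipliers"
for q in (7, 31, 127, 521, 2053):  # sample (prime) field sizes
    lhs = (q - 1) / q                    # I(c:d) for the quadruple (a,b,c,d)_q
    rhs = (lam1 + lam2) * log2(q) / q    # the penalty terms of the extension
    print(q, round(lhs, 3), round(rhs, 3), lhs > rhs)
# The last column turns True (the extension fails) once q - 1 > 20*log2(q).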
III. WHY (1) AND (3) DO NOT HOLD FOR A.E. VECTORS

We are going to use the previous example to show that the conditional inequalities (1) and (3) are not valid for asymptotically entropic vectors. We will use the Slepian–Wolf coding theorem (cf. [18]) as our main tool.

Lemma 1 (Slepian–Wolf). Let (x,y) be jointly distributed random variables and let (X,Y) be N independent copies of this distribution. Then there exists an X′ such that H(X′|X) = 0, H(X′) = H(X|Y) + o(N) and H(X|X′,Y) = o(N).

This lemma constructs a hash of the random variable X which is almost independent of Y and has approximately the entropy of X given Y. We will say that X′ is the Slepian–Wolf hash of X given Y and write X′ = SW(X|Y).

In what follows we call the entropy profile of (x1,...,xn) the vector of the entropies of all non-empty subsets of these random variables, in the lexicographic order. We denote it

H⃗(x1,...,xn) = (H(S)), for ∅ ≠ S ⊆ {x1,...,xn}.

This is a vector in R^(2^n−1) (the dimension is equal to the number of nonempty subsets of a set of n elements).
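This definition transcribes directly into code; a small Python sketch (the function name entropy_profile is ours):

from itertools import chain, combinations
from collections import defaultdict
from math import log2

def entropy_profile(joint, n):
    """The vector (H(S)) over all non-empty subsets S of the n coordinates,
    index tuples in lexicographic order; joint is a pmf {(x1,...,xn): prob}."""
    subsets = chain.from_iterable(combinations(range(n), r) for r in range(1, n + 1))
    profile = []
    for S in sorted(subsets):  # lexicographic order on the index tuples
        m = defaultdict(float)
        for outcome, p in joint.items():
            m[tuple(outcome[i] for i in S)] += p
        profile.append(-sum(p * log2(p) for p in m.values() if p > 0))
    return profile  # a point in R^(2^n - 1)

# Two independent fair bits: the profile (H(x1), H(x1,x2), H(x2)).
bits = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(entropy_profile(bits, 2))  # [1.0, 2.0, 1.0]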
Theorem 2. (1) and (3) are not valid for a.e. vectors.

Proof: For each of the two inequalities, we construct an asymptotically entropic vector which excludes it. The main step is to ensure, via the Slepian–Wolf lemma, that the constraints are met.

a) An a.e. counterexample for (1):
1. Start with the quadruple (a,b,c,d)_q from the previous section, for some fixed q to be defined later. Notice that it does not satisfy the constraints of (1), since I(a:b) = (log2 q)/q > 0.
2. Serialize it: define a new quadruple (A,B,C,D) such that each entropy is N times greater. (A,B,C,D) is obtained by sampling N times independently (ai, bi, ci, di) according to the distribution (a,b,c,d) and letting, e.g., A = (a1, a2, ..., aN).
3. Apply the Slepian–Wolf lemma to get A′ = SW(A|B) such that I(A′:B) = o(N), and replace A by A′ in the quadruple. The entropy profile of (A′,B,C,D) cannot vary much from the profile of (A,B,C,D): more precisely, the entropies for (A′,B,C,D) differ from the corresponding entropies for (A,B,C,D) by at most I(A:B) + o(N) = O((log2 q)/q · N). Notice that I(A′:B|C) = 0, since A′ functionally depends on A and I(a:b|c) = 0.
4. Scale down the entropy profile of (A′,B,C,D) by a factor of 1/N. This operation can be done within a precision of, say, o(N); basically, this is possible because the set of all a.e. points is convex (see, e.g., [19]).
5. Tend N to infinity to define an a.e. vector. This limit vector is not an entropic vector. For this a.e. vector, inequality (1) does not hold when q is large. Indeed, I(A′:B)/N and I(A′:B|C)/N both approach zero as N tends to infinity, so the constraints of (1) are satisfied in the limit. On the other hand, for the resulting limit vector, inequality (1) turns into

1 + O((log2 q)/q) ≤ O((log2 q)/q),

which cannot hold if q is bigger than some constant.

b) An a.e. counterexample for (3): We start with another lemma based on the Slepian–Wolf coding theorem.

Lemma 2. For every distribution (a,b,c,d) and every integer N there exists a distribution (A′,B′,C′,D′) such that
• H(C′|A′,B′) = o(N),
• the difference between the corresponding components of the entropy profile H⃗(A′,B′,C′,D′) and N·H⃗(a,b,c,d) is at most N·H(c|a,b) + o(N).

Proof: First we serialize (a,b,c,d), i.e., we take M i.i.d. copies of the initial distribution. The result of this serialization is a distribution (A,B,C,D) whose entropy profile is exactly the entropy profile of (a,b,c,d) multiplied by M; in particular, if I(a:b|c) = 0 then I(A:B|C) = 0. Then we apply Slepian–Wolf encoding (Lemma 1) and get a Z = SW(C|A,B) such that
• H(Z|C) = 0,
• H(Z) = H(C|A,B) + o(M),
• H(C|A,B,Z) = o(M).
The entropy profile of the conditional distribution of (A,B,C,D) given Z differs from the entropy profile of (A,B,C,D) by at most H(Z) = M·H(c|a,b) + o(M). Also, if in the original distribution I(a:b|c) = 0, then I(A:B|C,Z) = I(A:B|C) = 0.

We would like to "relativize" (A,B,C,D) conditional on Z, i.e., to get a new distribution for a quadruple (A′,B′,C′,D′) whose unconditional entropies are equal to the corresponding entropies of (A,B,C,D) conditional on Z. For different values of Z, the corresponding conditional distributions on (A,B,C,D) can be very different, so there is no well-defined "relativization" of (A,B,C,D) conditional on Z. The simplest way to overcome this obstacle is the method of quasi-uniform distributions suggested by T.H. Chan and R.W. Yeung, see [1].

Definition 1 (Quasi-uniform random variables, [1]). A random variable u distributed on a finite set U is called quasi-uniform if the probability distribution function of u is constant over its support (all values of u have the same probability); that is, there exists a c > 0 such that Prob[u = v] ∈ {0, c} for every v ∈ U. A tuple of random variables (x1,...,xn) is called quasi-uniform if for every non-empty subset {i1,...,is} ⊆ {1,...,n} the joint distribution (x_{i1},...,x_{is}) is quasi-uniform.
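Definition 1 is easy to test mechanically; the following Python sketch (the function name and the two toy distributions are ours) checks that every non-empty marginal is constant on its support:

from itertools import chain, combinations
from collections import defaultdict

def is_quasi_uniform(joint, n, tol=1e-12):
    """Check Definition 1 for a pmf {(x1,...,xn): probability}: every
    non-empty marginal must be constant on its support."""
    subsets = chain.from_iterable(combinations(range(n), r) for r in range(1, n + 1))
    for S in subsets:
        m = defaultdict(float)
        for outcome, p in joint.items():
            m[tuple(outcome[i] for i in S)] += p
        support = [p for p in m.values() if p > tol]
        if max(support) - min(support) > tol:
            return False
    return True

# The uniform distribution on {(x, y, x XOR y)} is quasi-uniform:
xor = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(is_quasi_uniform(xor, 3))                                        # True
print(is_quasi_uniform({(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}, 2))  # False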
In [1, Theorem 3.1] it is proven that for every distribution (A,B,C,D,Z) and every δ > 0 there exist a quasi-uniform distribution (A″,B″,C″,D″,Z″) and an integer k such that

‖ H⃗(A,B,C,D,Z) − (1/k)·H⃗(A″,B″,C″,D″,Z″) ‖ < δ.

For a quasi-uniform distribution, for all values z of Z″ the corresponding conditional distributions (A″,B″,C″,D″) have the same entropies, which are equal to the conditional entropies. That is, the entropies of the distributions of A″, B″, (A″,B″), etc., given Z″ = z are equal to H(A″|Z″), H(B″|Z″), H(A″,B″|Z″) and so on. Thus, for a quasi-uniform distribution we can do the "relativization" as follows. Fix any value z of Z″ and take the conditional distribution on (A″,B″,C″,D″) given Z″ = z. In this conditional distribution the entropy of C″ given (A″,B″) is not greater than

k·(H(C|A,B,Z) + δ) = k·(δ + o(M)).

Also, by letting δ be small enough (e.g., δ = 1/M), all entropies of (A″,B″,C″,D″) given Z″ = z differ from the corresponding components of kM·H⃗(a,b,c,d) by at most H(Z″) ≤ kM·H(c|a,b) + o(kM). Moreover, the entropies of (A″,B″) given (C″,Z″) are the same as the entropies of (A″,B″) given C″, since Z″ functionally depends on C″. If in the original distribution I(a:b|c) = 0, then the mutual information between A″ and B″ given (C″,Z″) is o(kM).

Denote N = kM and let (A′,B′,C′,D′) be the above-defined conditional distribution to get the lemma.

c) Rest of the proof for (3):
1. Start with the distribution (a,b,c,d)_q from the previous section, for some q to be fixed later.
2. Apply the "relativization" Lemma 2 and get (A′,B′,C′,D′) such that H(C′|A′,B′) = o(N). Lemma 2 guarantees that the other entropies are about N times larger than the corresponding entropies for (a,b,c,d), possibly with an overhead of size O(N·H(c|a,b)) = O((log2 q)/q · N). Moreover, since the quadruple (a,b,c,d) satisfies I(a:b|c) = 0, we also have I(A′:B′|C′) = 0 by the construction of the random variables in Lemma 2.
3. Scale down the entropy profile of (A′,B′,C′,D′) by a factor of 1/N, within an o(N) precision.
4. Tend N to infinity to get an a.e. vector; indeed, all entropies from the previous profile converge when N goes to infinity. The conditions of inequality (3) are satisfied, since I(A′:B′|C′) and H(C′|A′,B′) both vanish in the limit. Inequality (3) eventually reduces to

1 + O((log2 q)/q) ≤ O((log2 q)/q),

which cannot hold for large enough q.

Remark 2. In both cases of the proof we constructed an a.e. vector such that the corresponding unconditional inequalities with Lagrange multipliers reduce (as N → ∞) to

1 + O((log2 q)/q) ≤ O((log2 q)/q) + o(λ1 + λ2),

which cannot hold if we choose q appropriately.

Remark 3. Notice that in our proof even one fixed value of q suffices to prove that (1) and (3) do not hold for a.e. points. The choice of the value of q provides some freedom in controlling the gap between the lhs and the rhs of both inequalities.

In fact, we may combine the two above constructions into one to get a single a.e. vector which proves the previous result.

Proposition 1. There exists one a.e. vector which excludes both (1) and (3) simultaneously.

Proof sketch:
1. Generate (A,B,C,D) from (a,b,c,d)_q, with entropies N times greater.
2. Construct A″ = SW(A|B) and C′ = SW(C|A,B) simultaneously (with the same serialization (A,B,C,D)).
3. Since A″ is a Slepian–Wolf hash of A given B, we have
• H(C|A″,B) = H(C|A,B) + o(N) and
• H(C|A″,B,C′) = H(C|A,B,C′) + o(N) = o(N).
4. By inspecting the proof of the Slepian–Wolf theorem, we conclude that A″ can be plugged into the argument of Lemma 2 instead of A. The entropy profile of the quadruple (A′,B′,C′,D′) thus obtained from Lemma 2 is approximately N times the entropy profile of (a,b,c,d)_q, with a possible overhead of O(I(A:B) + H(C|A,B)) + o(N) = O((log2 q)/q · N), and furthermore:
• I(A′:B′|C′) = 0,
• I(A′:B′) = o(N),
• H(C′|A′,B′) = o(N).
5. Scale the corresponding entropy profile by a factor of 1/N and tend N to infinity to define the desired a.e. vector.

IV. CONCLUSION & DISCUSSION

In this paper we discussed the known conditional information inequalities. We presented a simple algebraic example which provides a new proof that two conditional information inequalities are essentially conditional (they cannot be obtained as a direct corollary of any unconditional information inequality). Then we proved a stronger result: these two linear conditional information inequalities are not valid for asymptotically entropic vectors.

This last result has a counterpart in the Kolmogorov complexity framework. It is known that unconditional linear information inequalities for Shannon's entropy can be directly translated into equivalent linear inequalities for Kolmogorov complexity, [4]. For conditional inequalities things are more complicated. Inequalities (1) and (3) can be rephrased in the Kolmogorov complexity setting, but the natural counterparts of these inequalities prove to be not valid for Kolmogorov complexity. The proof of this fact is very similar to the argument in Theorem 2 (we need to use Muchnik's theorem on conditional descriptions [14] instead of the Slepian–Wolf theorem employed in Shannon's framework). We skip the details for lack of space.

Open problem 1: Does (2) hold for a.e. vectors?

Every essentially conditional linear inequality that is valid for a.e. vectors has an interesting geometric interpretation: it provides a proof of Matúš' theorem from [13], which claims that the convex cone of a.e. vectors for 4 variables is not polyhedral.

Open problem 2: Do (1) and (3) (which hold for entropic but not for a.e. vectors) have any geometric or "physical" meaning?

ACKNOWLEDGEMENTS

The authors are grateful to Alexander Shen for many useful discussions. This work was supported in part by the NAFIT ANR-08-EMER-008-01 grant.

REFERENCES

[1] T.H. Chan, R.W. Yeung, On a relation between information inequalities and group theory, IEEE Transactions on Information Theory, 48, July 2002, pp. 1992–1995.
[2] A.P. Dawid, Conditional independence in statistical theory, Journal of the Royal Statistical Society, 41(1), 1979, pp. 1–31.
[3] R. Dougherty, C. Freiling, K. Zeger, Networks, matroids, and non-Shannon information inequalities, IEEE Transactions on Information Theory, 53(6), June 2007, pp. 1949–1969.
[4] D. Hammer, A. Romashchenko, A. Shen, N. Vereshchagin, Inequalities for Shannon entropy and Kolmogorov complexity, Journal of Computer and System Sciences, 60, 2000, pp. 442–464.
[5] T.S. Han, A uniqueness of Shannon's information distance and related nonnegativity problems, J. Comb., Inform. Syst. Sci., 6, 1981, pp. 320–321.
[6] A.W. Ingleton, Representation of matroids, in Combinatorial Mathematics and Its Applications, D.J.A. Welsh, Ed. London, U.K.: Academic, 1971, pp. 149–167.
[7] T. Kaced, A. Romashchenko, On essentially conditional information inequalities, Proc. IEEE ISIT 2011, pp. 1935–1939.
[8] K. Makarychev, Yu. Makarychev, A. Romashchenko, N. Vereshchagin, A new class of non-Shannon-type inequalities for entropies, Communications in Information and Systems, 2(2), 2002, pp. 147–166.
[9] F. Matúš, Conditional independences among four random variables III: final conclusion, Combinatorics, Probability & Computing, 8, 1999, pp. 269–276.
[10] F. Matúš, Piecewise linear conditional information inequality, IEEE Transactions on Information Theory, 2006, pp. 236–238.
[11] F. Matúš, Adhesivity of polymatroids, Discrete Mathematics, 307, 2007, pp. 2464–2477.
[12] F. Matúš, Two constructions on limits of entropy functions, IEEE Transactions on Information Theory, 53(1), Jan. 2007, pp. 320–330.
[13] F. Matúš, Infinitely many information inequalities, Proc. IEEE ISIT 2007, pp. 41–44.
[14] An. Muchnik, Conditional complexity and codes, Theoretical Computer Science, 271(1–2), 2002, pp. 97–109.
[15] N. Pippenger, What are the laws of information theory? 1986 Special Problems on Communication and Computation Conference, Palo Alto, CA.
[16] J. Pearl, Causal inference in statistics: an overview, Statistics Surveys, 3, 2009, pp. 96–146.
[17] A. Romashchenko, Extracting the mutual information for a triple of binary strings, Proc. 18th Annual IEEE Conference on Computational Complexity, Aarhus, Denmark, July 2003, pp. 221–235.
[18] D. Slepian, J.K. Wolf, Noiseless coding of correlated information sources, IEEE Transactions on Information Theory, 19(4), 1973, pp. 471–480.
[19] R.W. Yeung, A First Course in Information Theory, Norwell, MA/New York: Kluwer/Plenum, 2002.
[20] Z. Zhang, R.W. Yeung, A non-Shannon-type conditional information inequality, IEEE Transactions on Information Theory, 43, Nov. 1997, pp. 1982–1985.
[21] Z. Zhang, R.W. Yeung, On characterization of entropy function via information inequalities, IEEE Transactions on Information Theory, 44, 1998, pp. 1440–1450.
[22] Z. Zhang, On a new non-Shannon-type information inequality, Communications in Information and Systems, 3(1), June 2003, pp. 47–60.