
arXiv:2001.01847v1 [cs.IT] 7 Jan 2020

On Bounds and Closed Form Expressions for Capacities of Discrete Memoryless Channels with Invertible Positive Matrices

Thuan Nguyen
School of Electrical and Computer Engineering
Oregon State University
Corvallis, OR 97331
Email: [email protected]

Thinh Nguyen
School of Electrical and Computer Engineering
Oregon State University
Corvallis, OR 97331
Email: [email protected]

Abstract—While capacities of discrete memoryless channels are well studied, it is still not possible to obtain a closed form expression for the capacity of an arbitrary discrete memoryless channel. This paper describes an elementary technique based on the Karush-Kuhn-Tucker (KKT) conditions to obtain (1) a good upper bound on the capacity of a discrete memoryless channel having an invertible positive channel matrix and (2) a closed form expression for the capacity if the channel matrix satisfies certain conditions related to its singular value and its Gershgorin's disk.

Index Terms—Wireless Communication, Convex Optimization, Channel Capacity, Mutual Information.

I. INTRODUCTION

Discrete memoryless channels (DMCs) play a critical role in the early development of information theory and its applications. DMCs are especially useful for studying many well-known modulation/demodulation schemes (e.g., PSK and QAM) in which the continuous symbols are quantized into discrete symbols. Thus, there exists a rich literature on the capacities of DMCs [1], [2], [3], [4], [5], [6], [7]. In particular, the capacities of many channels, such as (weakly) symmetric channels, can be written as elementary formulas of the channel inputs and outputs [1]. However, it is often not possible to express the capacity of an arbitrary DMC in a closed form expression [1]. Recently, several papers have been able to obtain closed form expressions for DMCs with small alphabets. For example, Martin et al. established a closed form expression for a small class of general binary DMCs [8]. Liang showed that the capacity of channels with two inputs and three outputs can be expressed as an infinite series [9]. Cotae et al. found the capacity of channels with two inputs and two outputs in terms of the eigenvalues of the channel matrices [10]. On the other hand, the problem of finding the capacity of a discrete memoryless channel can be formulated as a convex optimization problem [11], [12], so efficient algorithmic solutions exist. There are also iterative methods such as the Arimoto-Blahut algorithm [2], [3], which can be accelerated [13], [14], [15]. In [16], [17], another iterative method was proposed which can yield both upper and lower bounds for the channel capacity.

That said, it is still beneficial to find the channel capacity in a closed form expression, for a number of reasons. These include: (1) a closed form expression can often provide good intuition about the relationship between the capacity and different channel parameters, (2) closed form formulas offer a faster way to determine the capacity than algorithms, and (3) closed form capacity expressions are useful for analytical derivations where capacity formulas are needed in the intermediate steps. To that end, this paper describes an elementary technique based on the theory of convex optimization to find (1) a new closed form upper bound on the capacities of discrete memoryless channels with positive invertible channel matrices and (2) optimality conditions on the channel matrix such that the upper bound is precisely the capacity. In particular, the conditions establish a relationship between the optimal value, the singular values, and the Gershgorin's disk of the channel matrix.

II. PRELIMINARIES

A DMC is characterized by a random variable X ∈ {x_1, x_2, . . . , x_m} for the inputs, a random variable Y ∈ {y_1, y_2, . . . , y_n} for the outputs, and a channel matrix A ∈ R^{m×n}. The matrix entry A_{ij} represents the conditional probability that y_j is received given that x_i is transmitted. Let p = (p_1, p_2, . . . , p_m)^T be the input probability mass vector (pmf) of X, where p_i denotes the probability that x_i is transmitted; then the pmf of Y is q = (q_1, q_2, . . . , q_n)^T = A^T p. The mutual information between X and Y is:

I(X; Y) = H(Y) - H(Y|X),  (1)

where

H(Y) = -\sum_{j=1}^{n} (A^T p)_j \log (A^T p)_j,  (2)

H(Y|X) = -\sum_{i=1}^{m} \sum_{j=1}^{n} p_i A_{ij} \log A_{ij}.  (3)

In this paper, we consider DMCs with an equal number of inputs and outputs, i.e., m = n.

A. Convex Optimization and KKT Conditions

The mutual information function can be written as:

I(X; Y) = -\sum_{j=1}^{n} (A^T p)_j \log (A^T p)_j + \sum_{i=1}^{n} \sum_{j=1}^{n} p_i A_{ij} \log A_{ij},  (4)

where (A^T p)_j denotes the j-th component of the vector q = A^T p. The capacity C associated with a channel matrix A is the theoretical maximum rate at which information can be transmitted over the channel without error [5], [18], [19]. It is obtained using the optimal pmf p* such that I(X; Y) is maximized. For a given channel matrix A, I(X; Y) is a concave function of p [1]. Therefore, maximizing I(X; Y) is equivalent to minimizing -I(X; Y), and finding the capacity can be cast as the following convex problem:

Minimize:
\sum_{j=1}^{n} (A^T p)_j \log (A^T p)_j - \sum_{i=1}^{n} \sum_{j=1}^{n} p_i A_{ij} \log A_{ij}.
Subject to:
0 ⪯ p,
1^T p = 1.

The optimal p* can be found efficiently using various algorithms such as gradient methods [20], but in a few cases p* can be found directly using the Karush-Kuhn-Tucker (KKT) conditions [20]. To explain the KKT conditions, we first state the canonical convex optimization problem below:

Problem P1:
Minimize: f(x)
Subject to:
g_i(x) ≤ 0, i = 1, 2, . . . , n,
h_j(x) = 0, j = 1, 2, . . . , m,

where f(x) and the g_i(x) are convex functions and each h_j(x) is a linear function. Define the Lagrangian function as:

L(x, λ, ν) = f(x) + \sum_{i=1}^{n} λ_i g_i(x) + \sum_{j=1}^{m} ν_j h_j(x),  (5)

then the KKT conditions [20] state that the optimal point x* must satisfy:

g_i(x*) ≤ 0,
h_j(x*) = 0,
dL(x, λ, ν)/dx |_{x=x*, λ=λ*, ν=ν*} = 0,  (6)
λ*_i g_i(x*) = 0,
λ*_i ≥ 0,

for i = 1, 2, . . . , n and j = 1, 2, . . . , m.

B. Elementary Results

Definition 1. Let A ∈ R^{n×n} be an invertible channel matrix and let H(A_i) = -\sum_{k=1}^{n} A_{ik} \log A_{ik} be the entropy of the i-th row. Define

K_j = -\sum_{i=1}^{n} A^{-1}_{ji} \sum_{k=1}^{n} A_{ik} \log A_{ik} = \sum_{i=1}^{n} A^{-1}_{ji} H(A_i),

where A^{-1}_{ji} denotes the entry (j, i) of the inverse matrix A^{-1}. K_max = max_j K_j and K_min = min_j K_j are called the maximum and minimum inverse row entropies of A, respectively.

Definition 2. Let A ∈ R^{n×n} be a square matrix. The Gershgorin radius of the i-th row of A [21] is defined as:

R_i(A) = \sum_{j≠i} |A_{ij}|.  (7)

The Gershgorin ratio of the i-th row of A is defined as:

c_i(A) = A_{ii} / R_i(A),  (8)

and the minimum Gershgorin ratio of A is defined as:

c_min(A) = min_i A_{ii} / R_i(A).  (9)

We note that since the channel matrix is a stochastic matrix, R_i(A) = 1 - A_{ii}, and therefore

c_min(A) = min_i A_{ii} / R_i(A) = min_i A_{ii} / (1 - A_{ii}).  (10)

Definition 3. Let A ∈ R^{n×n} be a square matrix.
(a) A is called a positive matrix if A_{ij} > 0 for ∀ i, j.
(b) A is called a strictly diagonally dominant positive matrix [22] if A is a positive matrix and

A_{ii} > \sum_{j≠i} A_{ij}, ∀ i.  (11)

Lemma 1. Let A ∈ R^{n×n} be a strictly diagonally dominant positive channel matrix. Then (a) it is invertible; (b) the eigenvalues of A^{-1} are 1/λ_i ∀ i, where the λ_i are the eigenvalues of A; (c) A^{-1}_{ii} > 0 and the largest absolute element in the i-th column of A^{-1} is A^{-1}_{ii}, i.e., A^{-1}_{ii} ≥ |A^{-1}_{ji}| for ∀ j.

Proof. The proof is shown in Appendix A.

Lemma 2. Let A ∈ R^{n×n} be a strictly diagonally dominant positive matrix. Then:

c_i(A^{-T}) ≥ (c_min(A) - 1) / (n - 1), ∀ i.  (12)

Moreover, for any rows k and l,

|A^{-1}_{ki}| + |A^{-1}_{li}| ≤ A^{-1}_{ii} c_min(A) / (c_min(A) - 1), ∀ i.  (13)

Proof. The proof is shown in Appendix B.

Lemma 3. Let A ∈ R^{n×n} be a strictly diagonally dominant positive matrix. Then:

max_{i,j} A^{-1}_{ij} ≤ 1 / σ_min(A),  (14)

where max_{i,j} A^{-1}_{ij} is the largest entry in A^{-1} and σ_min(A) is the minimum singular value of A.

Proof. The proof is shown in Appendix C.

Lemma 4. Let A ∈ R^{n×n} be an invertible channel matrix. Then

A^{-1} 1 = 1,

i.e., the sum of any row of A^{-1} equals 1. Furthermore, for any probability mass vector x, the sum of the entries of the vector y = A^{-T} x equals 1.

Proof. The proof is shown in Appendix D.
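To make Definitions 1 and 2 concrete, the short sketch below computes the row entropies H(A_i), the inverse row entropies K_j, and the minimum Gershgorin ratio c_min(A) for the reliable channel matrix used in Example 1 later in the paper. The code is an illustration of ours, not part of the original paper; it assumes numpy, log base 2, and a square positive channel matrix whose rows sum to 1.

```python
import numpy as np

def inverse_row_entropies(A):
    """K_j = sum_i (A^{-1})_{ji} H(A_i), per Definition 1 (log base 2)."""
    H_rows = -np.sum(A * np.log2(A), axis=1)   # H(A_i) for each row i
    return np.linalg.inv(A) @ H_rows           # K_j = sum_i A^{-1}_{ji} H(A_i)

def min_gershgorin_ratio(A):
    """c_min(A) = min_i A_ii / (1 - A_ii) for a stochastic A, per (10)."""
    d = np.diag(A)
    return np.min(d / (1.0 - d))

A = np.array([[0.95, 0.01, 0.04],
              [0.03, 0.95, 0.02],
              [0.02, 0.02, 0.96]])
K = inverse_row_entropies(A)
print("K_j:", K, " K_max - K_min:", K.max() - K.min())
print("c_min(A):", min_gershgorin_ratio(A))   # 19.0 for this matrix
```

For this matrix the printed c_min(A) is 19, matching the value reported in Example 1.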
III. MAIN RESULTS

Our first main result is an upper bound on the capacity of discrete memoryless channels having invertible positive channel matrices.

Proposition 1 (Main Result 1). Let A ∈ R^{n×n} be an invertible positive channel matrix, and let

q*_j = 2^{-K_j} / \sum_{i=1}^{n} 2^{-K_i},  (15)

p* = A^{-T} q*.  (16)

Then the capacity C associated with the channel matrix A is upper bounded by:

C ≤ -\sum_{j=1}^{n} q*_j \log q*_j + \sum_{i=1}^{n} p*_i \sum_{j=1}^{n} A_{ij} \log A_{ij}.  (17)

Proof. Let q be the pmf of the output Y, so that q = A^T p and p = A^{-T} q. Thus,

I(X; Y) = H(Y) - H(Y|X)  (18)
= -\sum_{j=1}^{n} q_j \log q_j + \sum_{i=1}^{n} (A^{-T} q)_i \sum_{k=1}^{n} A_{ik} \log A_{ik}.

We construct the Lagrangian in (5) using -I(X; Y) as the objective function and optimization variables q_j:

L(q_j, λ_j, ν) = -I(X; Y) - \sum_{j=1}^{n} q_j λ_j + ν (\sum_{j=1}^{n} q_j - 1),  (19)

where the constraints g(x) and h(x) in Problem P1 are translated into -q_j ≤ 0 and \sum_{j=1}^{n} q_j = 1, respectively. Using the KKT conditions in (6), the optimal points q*_j, λ*_j, ν*, for all j, must satisfy:

q*_j ≥ 0,  (20)
\sum_{j=1}^{n} q*_j = 1,  (21)
ν* - λ*_j - dI(X; Y)/dq*_j = 0,  (22)
λ*_j ≥ 0,  (23)
λ*_j q*_j = 0.  (24)

Since 0 ≤ p_i ≤ 1 and \sum_{i=1}^{n} p_i = 1, there exists at least one p_i > 0. Since A_{ij} > 0 ∀ i, j, we have:

q*_j = \sum_{i=1}^{n} p_i A_{ij} > 0, ∀ j.  (25)

Based on (24) and (25), we must have λ*_j = 0, ∀ j. Therefore, all five KKT conditions (20)-(24) are reduced to the following two conditions:

\sum_{j=1}^{n} q*_j = 1,  (26)

ν* - dI(X; Y)/dq*_j = 0.  (27)

Next,

dI(X; Y)/dq_j = \sum_{i=1}^{n} A^{-1}_{ji} \sum_{k=1}^{n} A_{ik} \log A_{ik} - (1 + \log q_j) = -K_j - (1 + \log q_j).  (28)

Using (27) and (28), we have:

q*_j = 2^{-K_j - ν* - 1}.  (29)

Plugging (29) into (26), we have:

\sum_{j=1}^{n} 2^{-K_j - ν* - 1} = 1,

ν* = \log \sum_{j=1}^{n} 2^{-K_j} - 1.

From (29),

q*_j = 2^{-K_j - ν* - 1} = 2^{-K_j} / 2^{ν* + 1} = 2^{-K_j} / \sum_{j=1}^{n} 2^{-K_j}, ∀ j.  (30)

If q* is such that p* = A^{-T} q* ⪰ 0 and (A^{-T} q*)^T 1 = \sum_{i=1}^{n} p*_i = 1, then p* is a valid pmf and Proposition 1 will hold with equality by the KKT conditions. However, these two constraints might not hold in general. On the other hand, maximizing I(X; Y) in terms of q while ignoring these constraints is equivalent to enlarging the feasible region, and will necessarily yield a value that is at least equal to the capacity C. Thus, by plugging q* into (18), we obtain the proof for the upper bound.

Next, we present some sufficient conditions on the channel matrix A such that its capacity can be written in a closed form expression. We note that closed form expressions for the channel capacity were also discovered in [4] and [6] using the input distribution variables. However, in both [4] and [6], the sufficient conditions for a closed form expression are not fully characterized.
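Since every quantity in (15)-(17) is available in closed form, the upper bound can be evaluated with a few lines of linear algebra. The sketch below is ours (numpy, log base 2 assumed) and mirrors the proposition directly:

```python
import numpy as np

def capacity_upper_bound(A):
    """Proposition 1: build q* from (15), p* from (16), return the bound (17)."""
    H_rows = -np.sum(A * np.log2(A), axis=1)    # row entropies H(A_i)
    K = np.linalg.inv(A) @ H_rows               # inverse row entropies K_j
    q = 2.0 ** (-K)
    q /= q.sum()                                # (15)
    p = np.linalg.solve(A.T, q)                 # (16): p* = A^{-T} q*
    # (17), using the identity sum_j A_ij log A_ij = -H(A_i)
    return -np.sum(q * np.log2(q)) - p @ H_rows, p, q

A = np.array([[0.95, 0.01, 0.04],
              [0.03, 0.95, 0.02],
              [0.02, 0.02, 0.96]])
C_ub, p, q = capacity_upper_bound(A)
print(C_ub)   # about 1.2715 bits for this channel
```

For the reliable channel of Example 1, this evaluates to approximately 1.2715 bits, which the paper shows is in fact the exact capacity for that matrix.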
Proposition 2 (Main Result 2). Let A ∈ R^{n×n} be a strictly diagonally dominant positive matrix. If, ∀ i,

c_i(A^{-T}) ≥ (n - 1) 2^{K_max - K_min},  (31)

then the capacity of the channel matrix A admits a closed form expression, which is exactly the upper bound in Proposition 1.

Proof. Based on the discussion of the KKT conditions, it is sufficient to show that if p* = A^{-T} q* ⪰ 0 and (A^{-T} q*)^T 1 = 1, then C has a closed form expression. The condition (A^{-T} q*)^T 1 = 1 is always true, as shown in Lemma 4 in Appendix D. Thus, we only need to show that if c_i(A^{-T}) ≥ (n - 1) 2^{K_max - K_min}, then p* = A^{-T} q* ⪰ 0.

Let q*_min = min_j q*_j and q*_max = max_j q*_j. We have:

p*_i = \sum_{j} q*_j A^{-1}_{ji}
= q*_i A^{-1}_{ii} + \sum_{j≠i} q*_j A^{-1}_{ji}
≥ q*_min A^{-1}_{ii} - (\sum_{j≠i} q*_j)(\sum_{j≠i} |A^{-1}_{ji}|)  (32)
≥ q*_min A^{-1}_{ii} - (n - 1) q*_max (\sum_{j≠i} |A^{-1}_{ji}|),  (33)

with (32) due to A^{-1}_{ii} > 0, which follows from Lemma 1-c, and (33) due to q*_max ≥ q*_j ∀ j. Now, if we want p*_i ≥ 0, ∀ i, from (33) it is sufficient to require that, ∀ i,

c_i(A^{-T}) = A^{-1}_{ii} / \sum_{j≠i} |A^{-1}_{ji}| ≥ (n - 1) q*_max / q*_min
= (n - 1) [2^{-K_min} / \sum_{j=1}^{n} 2^{-K_j}] / [2^{-K_max} / \sum_{j=1}^{n} 2^{-K_j}]  (34)
= (n - 1) 2^{K_max - K_min},

with (34) due to (30) and the fact that q*_max and q*_min correspond to K_min and K_max, respectively. Thus, Proposition 2 is proven.

We are now ready to state and prove the third main result, which characterizes sufficient conditions on a channel matrix so that the upper bound in Proposition 1 is precisely the capacity.

Proposition 3 (Main Result 3). Let A ∈ R^{n×n} be a strictly diagonally dominant positive channel matrix and let H_max(A) be the maximum row entropy of A. The capacity C is the upper bound in Proposition 1, i.e., (17) holds with equality, if

(c_min(A) - 1) / (n - 1)^2 ≥ 2^{n H_max(A) V / σ_min(A)},  (35)

where σ_min(A) is the minimum singular value of the channel matrix A, and

V = c_min(A) / (c_min(A) - 1).  (36)

Proof. From (12) in Lemma 2 and Proposition 2, if we can show that

(c_min(A) - 1) / (n - 1) ≥ (n - 1) 2^{K_max - K_min},  (37)

then Proposition 3 is proven. Suppose that K_max and K_min are obtained at rows j = L and j = S, respectively. We note that from (30), q*_max = max_j q*_j and q*_min = min_j q*_j correspond to K_min and K_max, respectively. Thus, from Definition 1, we have:

K_max - K_min = \sum_{i=1}^{n} A^{-1}_{Li} H(A_i) - \sum_{i=1}^{n} A^{-1}_{Si} H(A_i)
≤ |\sum_{i=1}^{n} A^{-1}_{Li} H(A_i)| + |\sum_{i=1}^{n} A^{-1}_{Si} H(A_i)|  (38)
≤ \sum_{i=1}^{n} |A^{-1}_{Li}| |H(A_i)| + \sum_{i=1}^{n} |A^{-1}_{Si}| |H(A_i)|  (39)
≤ H_max(A) \sum_{i=1}^{n} (|A^{-1}_{Li}| + |A^{-1}_{Si}|)  (40)
≤ H_max(A) \sum_{i=1}^{n} A^{-1}_{ii} c_min(A) / (c_min(A) - 1)  (41)
≤ n H_max(A) (max_{i,j} A^{-1}_{ij}) c_min(A) / (c_min(A) - 1)  (42)
≤ n H_max(A) V / σ_min(A),  (43)

where (38) is due to the property of the absolute value function, (39) is due to the Schwarz inequality, (40) is due to H_max(A) being the maximum row entropy of A, (41) is due to (13), (42) is due to max_{i,j} A^{-1}_{ij} being the largest entry in A^{-1}, and (43) is due to Lemma 3. Thus,

(n - 1) 2^{K_max - K_min} ≤ (n - 1) 2^{n H_max(A) V / σ_min(A)}.  (44)

From (37) and (44), if

(c_min(A) - 1) / (n - 1) ≥ (n - 1) 2^{n H_max(A) V / σ_min(A)},  (45)

then the capacity C is the upper bound in Proposition 1. (45) is equivalent to (35). Thus, Proposition 3 is proven.

An easy-to-use version of Proposition 3 is stated in Corollary 1.

Corollary 1. The capacity C is the upper bound in Proposition 1 if

(c_min(A) - 1) / (n - 1)^2 ≥ 2^{2n \log n / σ_min(A)}.  (46)

Proof. Similar to Proposition 3,

K_max - K_min = \sum_{i=1}^{n} A^{-1}_{Li} H(A_i) - \sum_{i=1}^{n} A^{-1}_{Si} H(A_i)
≤ |\sum_{i=1}^{n} A^{-1}_{Li} H(A_i)| + |\sum_{i=1}^{n} A^{-1}_{Si} H(A_i)|  (47)
≤ \sum_{i=1}^{n} |A^{-1}_{Li}| |H(A_i)| + \sum_{i=1}^{n} |A^{-1}_{Si}| |H(A_i)|  (48)
≤ H_max(A) \sum_{i=1}^{n} (|A^{-1}_{Li}| + |A^{-1}_{Si}|)  (49)
≤ H_max(A) n (2 max_{i,j} A^{-1}_{ij})  (50)
≤ 2n \log n / σ_min(A),  (51)

where (47), (48), (49) are similar to (38), (39), (40), respectively, (50) is due to max_{i,j} A^{-1}_{ij} being the largest entry in A^{-1}, and (51) is due to H_max(A) ≤ \log n and Lemma 3.
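The sufficient conditions (31), (45) (equivalent to (35)), and (46) are all directly checkable. Below is a hedged sketch of ours; it presumes a strictly diagonally dominant positive stochastic A, so that c_min(A) > 1:

```python
import numpy as np

def check_sufficient_conditions(A):
    """Evaluate (31), (45) (equivalent to (35)), and (46)."""
    n = A.shape[0]
    H_rows = -np.sum(A * np.log2(A), axis=1)
    K = np.linalg.inv(A) @ H_rows
    spread = K.max() - K.min()                        # K_max - K_min
    d = np.diag(A)
    c_min = np.min(d / (1.0 - d))                     # (10)
    V = c_min / (c_min - 1.0)                         # (36)
    H_max = H_rows.max()
    s_min = np.linalg.svd(A, compute_uv=False).min()  # sigma_min(A)

    Ainv = np.linalg.inv(A)
    # c_i(A^{-T}): diagonal of A^{-1} over off-diagonal column sums, as in (34)
    c_inv = np.array([Ainv[i, i] / (np.abs(Ainv[:, i]).sum() - abs(Ainv[i, i]))
                      for i in range(n)])
    prop2 = bool(np.all(c_inv >= (n - 1) * 2.0 ** spread))                     # (31)
    prop3 = (c_min - 1) / (n - 1) >= (n - 1) * 2.0 ** (n * H_max * V / s_min)  # (45)
    cor1 = (c_min - 1) / (n - 1) ** 2 >= 2.0 ** (2 * n * np.log2(n) / s_min)   # (46)
    return prop2, prop3, cor1

A = np.array([[0.95, 0.01, 0.04], [0.03, 0.95, 0.02], [0.02, 0.02, 0.96]])
print(check_sufficient_conditions(A))   # Proposition 3 holds for this matrix
```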
Thus, by replacing n H_max(A) V / σ_min(A) in (45) with 2n \log n / σ_min(A), Corollary 1 is proven.

A direct consequence of Proposition 3 that does not use the singular value is shown in Corollary 2.

Corollary 2. The capacity C is the upper bound in Proposition 1 if

(c_min(A) - 1) / (n - 1)^2 ≥ 2^{n H*_max(A) V / σ*},  (52)

where

V = c_min(A) / (c_min(A) - 1),  (53)

σ* = (c_min(A) - n/2) / (c_min(A) + 1),  (54)

H*_max(A) = \log (c_min(A) + 1) + (\log (n - 1) - c_min(A) \log c_min(A)) / (c_min(A) + 1).  (55)

Proof. We will construct a lower bound for σ_min(A) and an upper bound for H_max(A). From Lemma 5 in Appendix E,

σ_min(A) ≥ (c_min(A) - n/2) / (c_min(A) + 1) = σ*,  (56)

and

H_max(A) ≤ \log (c_min(A) + 1) + (\log (n - 1) - c_min(A) \log c_min(A)) / (c_min(A) + 1) = H*_max(A).  (57)

Therefore,

n H_max(A) V / σ_min(A) ≤ n H*_max(A) V / σ*.

Thus, by replacing n H_max(A) V / σ_min(A) in (35) with n H*_max(A) V / σ*, Corollary 2 is proven.

We note that when c_min(A) is relatively large compared to the matrix size n, the lower bound on σ_min(A) goes to 1. We also note that (52) can be checked efficiently, without requiring either H_max(A) or σ_min(A), at the expense of a looser condition compared to (35).
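Because (52)-(55) depend only on n and c_min(A), Corollary 2 can be tested without computing A^{-1}, H_max(A), or σ_min(A). A minimal sketch (ours):

```python
import numpy as np

def corollary2_holds(A):
    """Check (52) using only n and c_min(A), via (53)-(55)."""
    n = A.shape[0]
    d = np.diag(A)
    c = np.min(d / (1.0 - d))                              # c_min(A), from (10)
    V = c / (c - 1.0)                                      # (53)
    sigma_star = (c - n / 2.0) / (c + 1.0)                 # (54)
    H_star = np.log2(c + 1.0) + (np.log2(n - 1.0) - c * np.log2(c)) / (c + 1.0)  # (55)
    return (c - 1.0) / (n - 1) ** 2 >= 2.0 ** (n * H_star * V / sigma_star)

A = np.array([[0.95, 0.01, 0.04], [0.03, 0.95, 0.02], [0.02, 0.02, 0.96]])
print(corollary2_holds(A))   # True: Example 1 satisfies Corollary 2
```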

IV. EXAMPLES AND NUMERICAL RESULTS

A. Example 1: Reliable Channels

We illustrate the optimality conditions in Proposition 3 using a reliable channel having the channel matrix:

A = [0.95  0.01  0.04
     0.03  0.95  0.02
     0.02  0.02  0.96].

Here, n = 3, σ_min(A) = 0.92424, σ* = 0.875, H_max(A) = 0.33494, and H*_max(A) = 0.3364. From Definition 2, c_min(A) = 19. The closed form channel capacity can be readily computed by Proposition 1 since the channel matrix satisfies both conditions in Proposition 3 and Corollary 2. The optimal output and input probability mass vectors are:

q^T = [0.33087  0.32806  0.34107],
p^T = [0.33067  0.33480  0.33453],

respectively, and the capacity is 1.2715.

In general, for a good channel with n inputs and n outputs whose symbol error probabilities are small, it is likely that the channel matrix will satisfy the optimality conditions in Proposition 3. This is because the diagonal entries A_{ii} (the probability of receiving the i-th symbol correctly) tend to be larger than the sum of the other entries in their row (the probability of errors), satisfying the property of a diagonally dominant matrix.

B. Example 2: Cooperative Relay-MISO Channels

In this example, we investigate the channel capacity for a class of channels named Relay-MISO (Relay - Multiple Input Single Output). A Relay-MISO channel [23] can be constructed by the combination of a relay channel [24], [25] and a Multiple Input Single Output channel, as illustrated in Fig. 1.

In a Relay-MISO channel, n senders want to transmit data to the same receiver via n relay base station nodes. The uplinks of these senders use wireless links that are prone to transmission errors. Each sender can transmit bit "0" or "1", and each transmitted bit is flipped with probability α, 0 ≤ α ≤ 1. For simplicity, suppose that the n relay channels have the same error probability α. Next, all of the relay base station nodes relay their signals over a reliable channel, such as optical fiber cable, to the same receiver. The receiver adds all the relayed signals (symbols) to produce a single output symbol.

It can be shown that the channel matrix of this Relay-MISO channel [23] is an invertible matrix of size (n + 1) × (n + 1) whose entries A_{ij} can be computed as:

A_{ij} = \sum_{s=\max(i-j,0)}^{\min(n+1-j, i-1)} \binom{n+1-i}{j-i+s} \binom{i-1}{s} α^{j-i+2s} (1-α)^{n-(j-i+2s)}.  (58)

We note that this Relay-MISO channel matrix is invertible, and its inverse matrix has a closed form expression which is characterized in [23]. For example, the channel matrix of a Relay-MISO channel with n = 3 is given as follows:

[(1-α)^3     3(1-α)^2 α           3(1-α) α^2           α^3
 α(1-α)^2    2α^2(1-α)+(1-α)^3    2(1-α)^2 α+α^3       (1-α)α^2
 (1-α)α^2    2(1-α)^2 α+α^3       2α^2(1-α)+(1-α)^3    α(1-α)^2
 α^3         3(1-α)α^2            3(1-α)^2 α           (1-α)^3],

where 0 ≤ α ≤ 1. We note that this channel matrix is a strictly diagonally dominant matrix when α is close to 0 or α is close to 1. In addition, for α values that are close to 0 or 1, it can be shown that the channel matrix A satisfies the conditions in Proposition 3. Thus, the channel capacity admits the closed form expression in Proposition 1. For other values of α, e.g., closer to 0.5, the optimality conditions in Proposition 3 no longer hold. In this case, Proposition 1 can still be used as a good upper bound on the capacity.
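The following sketch (ours) builds the Relay-MISO channel matrix directly from (58); for n = 3 and a given α it reproduces the 4 × 4 matrix displayed above. The function name and the use of math.comb for the binomial coefficients are our choices:

```python
import numpy as np
from math import comb

def relay_miso_matrix(n, alpha):
    """(n+1) x (n+1) Relay-MISO channel matrix with entries given by (58)
    (i, j are 1-indexed in the formula)."""
    A = np.zeros((n + 1, n + 1))
    for i in range(1, n + 2):
        for j in range(1, n + 2):
            for s in range(max(i - j, 0), min(n + 1 - j, i - 1) + 1):
                A[i - 1, j - 1] += (comb(n + 1 - i, j - i + s) * comb(i - 1, s)
                                    * alpha ** (j - i + 2 * s)
                                    * (1 - alpha) ** (n - (j - i + 2 * s)))
    return A

A = relay_miso_matrix(3, 0.05)
print(np.allclose(A.sum(axis=1), 1.0))   # every row is a valid pmf
print(A.round(4))                        # matches the 4 x 4 form above at alpha = 0.05
```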

[Figure 1. Relay-MISO channel]

[Figure 2. Channel capacity and various upper bounds as functions of α]

We show that our upper bound is tighter than existing upper bounds. In particular, Fig. 2 shows the actual capacity and the known upper bounds as functions of the parameter α for Relay-MISO channels having n = 3. The green curve depicts the actual capacity computed using a convex optimization algorithm. The red curve is constructed using our closed form expression in Proposition 1, and the blue dotted curve is constructed using the well-known upper bound on channel capacity in [26], [27]. Specifically, this upper bound is:

C ≤ \log (\sum_{j=1}^{n} \max_i A_{ij}).  (59)

Finally, the red dotted curve shows another well-known upper bound by Arimoto [3], which is:

C ≤ \log n + \max_j [\sum_{i=1}^{n} A_{ji} \log (A_{ji} / \sum_{k=1}^{n} A_{ki})].  (60)

We note that the second term is negative.

Fig. 2 shows that our closed form upper bound is precisely the capacity (the red and green graphs overlap) when α is close to 0 or 1, as predicted by the optimality conditions in Proposition 3. On the other hand, when α is closer to 0.5, our optimality conditions no longer hold. In this case, we can only determine the upper bound. However, it is interesting to note that our upper bound in this case is tighter than both the Boyd-Chiang [26] and Arimoto [3] upper bounds.

C. Example 3: Symmetric and Weakly Symmetric Channels

Our results confirm the capacity of the well-known symmetric and weakly symmetric channel matrices. In particular, when the channel matrix is symmetric and positive definite, our results are applicable. Indeed, since the channel matrix is symmetric and positive definite, the inverse channel matrix exists and is also symmetric. From Definition 1, all values of K_j are the same, since each is the same sum over permuted entries. Therefore, from Proposition 1, the optimal output probability masses

q*_j = 2^{-K_j} / \sum_{i=1}^{n} 2^{-K_i}  (61)

are equal to each other for all j. As a result, the input probability mass function p* = A^{-T} q* is the uniform distribution, and the channel capacity is upper bounded by:

C ≤ -\sum_{j=1}^{n} q*_j \log q*_j + \sum_{i=1}^{n} p*_i \sum_{j=1}^{n} A_{ij} \log A_{ij}  (62)
= \log n - H(A_row).  (63)

Interestingly, our result also yields the capacities of many channels that are not weakly symmetric but admit the closed form formula of weakly symmetric channels. In particular, consider a channel matrix, called semi-weakly symmetric, whose rows are all permutations of each other but whose column sums might not all be the same. Furthermore, if the optimality condition (Proposition 3) is satisfied, then the channel has a closed form capacity which is identical to the capacity of a symmetric or weakly symmetric channel:

C = \log n - H(A_row).  (64)

For example, the following channel matrix:

A = [0.93  0.04  0.03
     0.04  0.93  0.03
     0.04  0.03  0.93]

is not a weakly symmetric channel, even though its rows are permutations of each other, since the column sums are all different. However, this channel matrix satisfies Proposition 3 and Corollary 2 since n = 3, σ_min(A) = 0.88916, σ* = 0.825, H_max(A) = 0.43489, H*_max(A) = 0.43592, and c_min(A) = 13.286. Thus, it has a closed form formula for its capacity, which can be easily shown to be C = \log 3 - H(0.93, 0.04, 0.03) = 1.1501. The optimal output and input probability mass vectors can be shown to be:

q^T = [0.33333  0.33333  0.33333],
p^T = [0.32959  0.33337  0.33704],

respectively.
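For reference, the two competing bounds (59) and (60) are easy to evaluate. The sketch below (ours) computes them for the semi-weakly symmetric matrix above, where our closed form gives C = log 3 - H(0.93, 0.04, 0.03) ≈ 1.1501:

```python
import numpy as np

def boyd_chiang_bound(A):
    """(59): C <= log2( sum_j max_i A_ij )."""
    return np.log2(A.max(axis=0).sum())

def arimoto_bound(A):
    """(60): C <= log2(n) + max_j sum_i A_ji log2( A_ji / sum_k A_ki )."""
    col_sums = A.sum(axis=0)
    return np.log2(A.shape[0]) + np.sum(A * np.log2(A / col_sums), axis=1).max()

A = np.array([[0.93, 0.04, 0.03],
              [0.04, 0.93, 0.03],
              [0.04, 0.03, 0.93]])
print(boyd_chiang_bound(A), arimoto_bound(A))
# compare with the closed form C = log2(3) - H(0.93, 0.04, 0.03) = 1.1501
```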

[Figure 3. Channel capacity of the (semi) weakly symmetric channel as a function of γ]

[Figure 4. Channel capacity and various upper bounds as functions of β]

The following channel matrix is another example of a semi-weakly symmetric channel; its entries are controlled by a parameter γ in the range (0, 1) and it is given by the following form:

[(1-γ)^3      3(1-γ)^2 γ    3(1-γ) γ^2    γ^3
 3(1-γ)^2 γ   (1-γ)^3       γ^3           3(1-γ) γ^2
 γ^3          3(1-γ) γ^2    (1-γ)^3       3(1-γ)^2 γ
 γ^3          3(1-γ) γ^2    3(1-γ)^2 γ    (1-γ)^3].

Fig. 3 shows the capacity upper bound of the semi-weakly symmetric channel and the actual channel capacity as functions of γ. As seen, for most values of γ, the upper bound is identical to the actual channel capacity, which is numerically determined using CVX [11].

D. Example 4: Unreliable Channels

We now consider an unreliable channel whose channel matrix is:

A = [0.6  0.3   0.1
     0.7  0.1   0.2
     0.5  0.05  0.45].

In this case, our optimality conditions are not satisfied, and the Arimoto upper bound is the tightest (0.17083) as compared to our upper bound (0.19282) and the Boyd-Chiang upper bound (0.848).

E. Example 5: Bounds as a Function of Channel Reliability

Since we know that our proposed bounds are tight if the channel is reliable, we want to examine quantitatively how channel reliability affects the various bounds. In this example, we consider a special class of channels whose channel matrix entries are controlled by a reliability parameter β, 0 ≤ β ≤ 1, as shown below:

A = [1-β   0.3β  0.4β  0.3β
     0.4β  1-β   0.3β  0.3β
     0.5β  0.4β  1-β   0.1β
     0.1β  0.2β  0.7β  1-β].

When β is small, the channel tends to be reliable, and when β is large, the channel tends to be unreliable. Fig. 4 shows various upper bounds as functions of β together with the actual capacity. The actual channel capacities for various β are numerically computed using a convex optimization algorithm [11]. As seen, our closed form upper bound expression for the capacity (red curve) from Proposition 1 is much closer to the actual capacity (black dashed curve) than the other bounds for most values of β. When β is small (β ≤ 0.6), i.e., the channel is reliable, the closed form upper bound is precisely the real channel capacity, and we can verify that the optimality conditions in Proposition 3 hold. When the channel becomes unreliable, i.e., β ≥ 0.6, our upper bound is no longer tight; however, it is still the tightest among all the existing upper bounds. We note that when β is small, the channel matrix becomes a nearly diagonally dominant matrix, and our upper bound is tightest.

V. CONCLUSION

In this paper, we describe an elementary technique based on the Karush-Kuhn-Tucker (KKT) conditions to obtain (1) a good upper bound on the capacity of a discrete memoryless channel having an invertible positive channel matrix and (2) a closed form expression for the capacity if the channel matrix satisfies certain conditions related to its singular value and its Gershgorin's disk. We provide a number of channels for which the proposed upper bound becomes precisely the capacity. We also demonstrate that our proposed bounds are tighter than other existing bounds for these channels.
REFERENCES

[1] Thomas M Cover and Joy A Thomas. Elements of information theory. John Wiley & Sons, 2012.
[2] Richard Blahut. Computation of channel capacity and rate-distortion functions. IEEE Transactions on Information Theory, 18(4):460–473, 1972.
[3] Suguru Arimoto. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Transactions on Information Theory, 18(1):14–20, 1972.
[4] Saburo Muroga. On the capacity of a discrete channel, mathematical expression of capacity of a channel which is disturbed by noise in its every one symbol and expressible in one state diagram. Journal of the Physical Society of Japan, 8(4):484–494, 1953.
[5] Claude Shannon. The zero error capacity of a noisy channel. IRE Transactions on Information Theory, 2(3):8–19, 1956.
[6] Robert B. Ash. Information theory. 1990.
[7] Thuan Nguyen and Thinh Nguyen. On closed form capacities of discrete memoryless channels. In 2018 IEEE 87th Vehicular Technology Conference (VTC Spring), pages 1–5. IEEE, 2018.
[8] Keye Martin, Ira S Moskowitz, and Gerard Allwein. Algebraic information theory for binary channels. Theoretical Computer Science, 411(19):1918–1927, 2010.
[9] Xue-Bin Liang. An algebraic, analytic, and algorithmic investigation on the capacity and capacity-achieving input probability distributions of finite-input–finite-output discrete memoryless channels. IEEE Transactions on Information Theory, 54(3):1003–1023, 2008.
[10] Paul Cotae, Ira S Moskowitz, and Myong H Kang. Eigenvalue characterization of the capacity of discrete memoryless channels with invertible channel matrices. In Information Sciences and Systems (CISS), 2010 44th Annual Conference on, pages 1–6. IEEE, 2010.
[11] Michael Grant, Stephen Boyd, and Yinyu Ye. CVX: Matlab software for disciplined convex programming, 2008.
[12] Abhishek Sinha. Convex optimization methods for computing channel capacity. 2014.
[13] Frédéric Dupuis, Wei Yu, and Frans MJ Willems. Blahut-Arimoto algorithms for computing channel capacity and rate-distortion with side information. In Information Theory, 2004. ISIT 2004. Proceedings. International Symposium on, page 179. IEEE, 2004.
[14] Gerald Matz and Pierre Duhamel. Information geometric formulation and interpretation of accelerated Blahut-Arimoto-type algorithms. In Information Theory Workshop, 2004. IEEE, pages 66–70. IEEE, 2004.
[15] Yaming Yu. Squeezing the Arimoto–Blahut algorithm for faster convergence. IEEE Transactions on Information Theory, 56(7):3149–3157, 2010.
[16] Bernd Meister and Werner Oettli. On the capacity of a discrete, constant channel. Information and Control, 11(3):341–351, 1967.
[17] Masakazu Jimbo and Kiyonori Kunisawa. An iteration method for calculating the relative capacity. Information and Control, 43(2):216–223, 1979.
[18] Claude E Shannon and Warren Weaver. The mathematical theory of communication. University of Illinois Press, 1998.
[19] T Cover. An achievable rate region for the broadcast channel. IEEE Transactions on Information Theory, 21(4):399–404, 1975.
[20] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.
[21] Eric W Weisstein. Gershgorin circle theorem. 2003.
[22] Miroslav Fiedler and Vlastimil Pták. Diagonally dominant matrices. Czechoslovak Mathematical Journal, 17(3):420–433, 1967.
[23] Thuan Nguyen and Thinh Nguyen. Relay-MISO channel. Available at http://ir.library.oregonstate.edu/concern/articles/tb09jb69h, 2018.
[24] Thomas Cover and A El Gamal. Capacity theorems for the relay channel. IEEE Transactions on Information Theory, 25(5):572–584, 1979.
[25] Boris Rankov and Armin Wittneben. Achievable rate regions for the two-way relay channel. In Information Theory, 2006 IEEE International Symposium on, pages 1668–1672. IEEE, 2006.
[26] Mung Chiang and Stephen Boyd. Geometric programming duals of channel capacity and rate distortion. IEEE Transactions on Information Theory, 50(2):245–258, 2004.
[27] Stephen Boyd, Seung-Jean Kim, Lieven Vandenberghe, and Arash Hassibi. A tutorial on geometric programming. Optimization and Engineering, 8(1):67, 2007.
[28] Kaare Brandt Petersen, Michael Syskind Pedersen, et al. The matrix cookbook. Technical University of Denmark, 7(15):510, 2008.
[29] Rayleigh quotient and the min-max theorem. Available at http://www.math.toronto.edu/mnica/hermitian2014.pdf, 2014.
[30] Charles R Johnson. A Gersgorin-type lower bound for the smallest singular value. Linear Algebra and its Applications, 112:1–7, 1989.
[31] YP Hong and C-T Pan. A lower bound for the smallest singular value. Linear Algebra and its Applications, 172:27–32, 1992.
APPENDIX

A. Proof of Lemma 1

For claim (a), since the channel matrix is strictly diagonally dominant, the Gershgorin circle theorem [21] implies that every eigenvalue λ_i satisfies:

λ_i ≥ A_{ii} - \sum_{j≠i} |A_{ij}| > 0.

Thus, det(A) = λ_1 λ_2 · · · λ_n > 0. Therefore, A is invertible.

Claim (b) is a well-known linear algebra result [28].

For claim (c), since A A^{-1} = I and A_{ij} > 0 ∀ i, j, for every j there exists at least one i such that A^{-1}_{ij} ≠ 0. Therefore, the largest absolute entry in each column of A^{-1} is nonzero. Claim (c) can then be obtained by contradiction. Suppose that the largest absolute entry in the j-th column of A^{-1} is A^{-1}_{ij} in the i-th row, that is, |A^{-1}_{ij}| ≥ |A^{-1}_{kj}| for ∀ k, and suppose that A^{-1}_{ij} < 0. Then:

\sum_{k=1}^{n} A_{ik} A^{-1}_{kj} ≤ -A_{ii} |A^{-1}_{ij}| + \sum_{k=1, k≠i}^{n} A_{ik} |A^{-1}_{ij}|  (65)
= (-A_{ii} + \sum_{k=1, k≠i}^{n} A_{ik}) |A^{-1}_{ij}|
< 0,  (66)

which contradicts \sum_{k=1}^{n} A_{ik} A^{-1}_{kj} = I_{ij} ≥ 0. Thus, the largest absolute value in each column of A^{-1} is positive. That is, in the j-th column, if |A^{-1}_{ij}| ≥ |A^{-1}_{kj}| for ∀ k, then A^{-1}_{ij} > 0.

Now, suppose that the largest absolute element in the j-th column of A^{-1} is A^{-1}_{ij} with i ≠ j and A^{-1}_{ij} > 0. Then:

0 = \sum_{k=1}^{n} A_{ik} A^{-1}_{kj}
≥ A_{ii} |A^{-1}_{ij}| - \sum_{k=1, k≠i}^{n} A_{ik} |A^{-1}_{ij}|  (67)
= (A_{ii} - \sum_{k=1, k≠i}^{n} A_{ik}) A^{-1}_{ij}
> 0,  (68)

with (67) due to A^{-1}_{ij} being the largest absolute element in the j-th column and (68) due to A being a strictly diagonally dominant matrix. This is a contradiction. Therefore, the largest absolute entry in the j-th column of A^{-1} must be A^{-1}_{jj}, and A^{-1}_{jj} > 0.

B. Proof of Lemma 2

First, let us show that the second largest absolute value in each column of A^{-1} is a negative entry, by contradiction. Suppose that the second largest absolute value in the j-th column of A^{-1} is positive and lies in the k-th row (k ≠ j), i.e., A^{-1}_{kj} ≥ 0. Consider:

0 = \sum_{i=1}^{n} A_{ki} A^{-1}_{ij}
≥ A_{kj} A^{-1}_{jj} + A_{kk} A^{-1}_{kj} - |\sum_{i≠k, i≠j} A_{ki} A^{-1}_{ij}|  (69)
≥ A_{kj} A^{-1}_{jj} + A_{kk} A^{-1}_{kj} - \sum_{i≠k, i≠j} |A_{ki} A^{-1}_{ij}|  (70)
≥ A_{kj} A^{-1}_{jj} + A_{kk} A^{-1}_{kj} - \sum_{i≠k, i≠j} A_{ki} |A^{-1}_{ij}|  (71)
≥ A_{kj} A^{-1}_{jj} + A_{kk} A^{-1}_{kj} - \sum_{i≠k, i≠j} A_{ki} |A^{-1}_{kj}|  (72)
= A_{kj} A^{-1}_{jj} + A^{-1}_{kj} (A_{kk} - \sum_{i≠k, i≠j} A_{ki})  (73)
> 0,  (74)

with (69) due to C ≥ -|C| for ∀ C, (70) due to the triangle inequality, (71) due to A_{ki} being positive, (72) due to A^{-1}_{kj} being the second largest absolute value in the j-th column of A^{-1}, (73) due to the assumption A^{-1}_{kj} ≥ 0, and (74) due to (11), which gives A_{kk} > \sum_{i≠k} A_{ki} ≥ \sum_{i≠k, i≠j} A_{ki}, together with A_{kj} A^{-1}_{jj} > 0. This is a contradiction. Thus, the second largest absolute value in each column of A^{-1} is negative (A^{-1}_{kj} < 0), while by Lemma 1-c, A^{-1}_{jj} is the largest absolute entry and A^{-1}_{jj} > 0.

Similarly, consider:

0 = \sum_{i=1}^{n} A_{ki} A^{-1}_{ij}
≤ A_{kj} A^{-1}_{jj} + A_{kk} A^{-1}_{kj} + |\sum_{i≠k, i≠j} A_{ki} A^{-1}_{ij}|  (75)
≤ A_{kj} A^{-1}_{jj} + A_{kk} A^{-1}_{kj} + \sum_{i≠k, i≠j} |A_{ki} A^{-1}_{ij}|  (76)
≤ A_{kj} A^{-1}_{jj} + A_{kk} A^{-1}_{kj} + \sum_{i≠k, i≠j} A_{ki} |A^{-1}_{ij}|  (77)
≤ A_{kj} A^{-1}_{jj} - A_{kk} |A^{-1}_{kj}| + \sum_{i≠k, i≠j} A_{ki} |A^{-1}_{kj}|,  (78)

with (75) due to C ≤ |C| for ∀ C, (76) due to the triangle inequality, (77) due to A_{ki} ≥ 0 ∀ i, and (78) due to A^{-1}_{kj} < 0 and A^{-1}_{kj} being the second largest absolute value in the j-th column. Hence,

A_{kj} A^{-1}_{jj} ≥ A_{kk} |A^{-1}_{kj}| - \sum_{i≠k, i≠j} A_{ki} |A^{-1}_{kj}|,

A^{-1}_{jj} ≥ |A^{-1}_{kj}| (A_{kk} - \sum_{i≠k, i≠j} A_{ki}) / A_{kj} ≥ |A^{-1}_{kj}| (A_{kk} - A_{kk}/c_min(A)) / (A_{kk}/c_min(A))  (79)
= |A^{-1}_{kj}| (c_min(A) - 1),  (80)

for ∀ j, with (79) due to Definition 2 and (9), which give A_{kk}/c_min(A) ≥ R_k(A) = \sum_{i≠k} A_{ki} ≥ \sum_{i≠k, i≠j} A_{ki} and A_{kj} ≤ R_k(A) ≤ A_{kk}/c_min(A). Thus, we have:

c_j(A^{-T}) = A^{-1}_{jj} / \sum_{k≠j} |A^{-1}_{kj}| ≥ (c_min(A) - 1) / (n - 1).  (81)

Thus, (12) is proven.

Next, we note that from (80),

|A^{-1}_{kj}| ≤ A^{-1}_{jj} / (c_min(A) - 1),  (82)

for ∀ k. Moreover, from Lemma 1, A^{-1}_{jj} ≥ 0 and is the largest absolute entry in the j-th column. Thus, for arbitrary rows L and S,

|A^{-1}_{Lj}| + |A^{-1}_{Sj}| ≤ A^{-1}_{jj} + A^{-1}_{jj} / (c_min(A) - 1) = A^{-1}_{jj} c_min(A) / (c_min(A) - 1),  (83)

for ∀ j. Thus, (13) is proven.

C. Proof of Lemma 3

Consider the matrix B = A^{-1} A^{-T}. B is symmetric, so all of its eigenvalues are real, and they satisfy the Rayleigh quotient [29]. Let λ^max_B be the maximum eigenvalue of B; then from [29],

R(B, x) = x* B x / x* x ≤ λ^max_B.  (84)

Consider the vector e = [0, . . . , 1, . . . , 0]^T with the entry "1" in the i-th position. Letting x = e in (84), we have:

B_{ii} ≤ λ^max_B.  (85)

Thus,

λ^max_B ≥ B_{ii} = \sum_{j=1}^{n} A^{-1}_{ij} A^{-1}_{ij} ≥ (A^{-1}_{ii})^2.  (86)

Now, since B is a symmetric matrix, λ^max_B = σ_max(B) [28]. However, from [28], σ_max(B) = σ_max(A^{-1} A^{-T}) = σ^2_max(A^{-1}), and σ_max(A^{-1}) = 1/σ_min(A). Thus:

1/σ_min(A) ≥ A^{-1}_{ii}.  (87)

From Lemma 1-c, the largest entry in A^{-1} must be a diagonal element. Thus,

max_{i,j} A^{-1}_{ij} ≤ 1/σ_min(A).
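The inequalities (12), (13), and (14) can be sanity-checked numerically. The sketch below is ours; the construction of A is just one convenient way to generate a strictly diagonally dominant positive stochastic matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
off = rng.uniform(0.01, 0.05, size=(n, n))   # small positive off-diagonal entries
np.fill_diagonal(off, 0.0)
A = off + np.diag(1.0 - off.sum(axis=1))     # stochastic, strictly diagonally dominant

Ainv = np.linalg.inv(A)
d = np.diag(A)
c_min = np.min(d / (1.0 - d))
# (12): every Gershgorin ratio of A^{-T} is at least (c_min - 1)/(n - 1)
ratios = [Ainv[j, j] / (np.abs(Ainv[:, j]).sum() - abs(Ainv[j, j])) for j in range(n)]
print(min(ratios) >= (c_min - 1) / (n - 1))
# (13): two entries of a column of A^{-1} are bounded via the diagonal entry
print(abs(Ainv[1, 0]) + abs(Ainv[2, 0]) <= Ainv[0, 0] * c_min / (c_min - 1))
# (14): largest entry of A^{-1} is at most 1/sigma_min(A)
print(Ainv.max() <= 1.0 / np.linalg.svd(A, compute_uv=False).min())
```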

D. Proof of Lemma 4

For the first claim, since A is a stochastic matrix,

A 1 = 1.

Left multiplying both sides by A^{-1} results in 1 = A^{-1} 1. For the second claim, left multiplying y = A^{-T} x by 1^T, we have:

1^T y = 1^T A^{-T} x = (A^{-1} 1)^T x = 1^T x = 1,

where we use A^{-1} 1 = 1 from the previous claim. Thus, we have \sum_i p*_i = 1, since from (30), q* is a probability mass vector.
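Both claims of Lemma 4, together with Lemma 1-c, are one-liners to verify numerically (a sketch, ours, reusing the Example 1 matrix):

```python
import numpy as np

A = np.array([[0.95, 0.01, 0.04],
              [0.03, 0.95, 0.02],
              [0.02, 0.02, 0.96]])
Ainv = np.linalg.inv(A)

print(np.allclose(Ainv.sum(axis=1), 1.0))               # Lemma 4: A^{-1} 1 = 1
x = np.array([0.2, 0.5, 0.3])                           # an arbitrary pmf
print(np.isclose(np.linalg.solve(A.T, x).sum(), 1.0))   # entries of A^{-T} x sum to 1
# Lemma 1-c: the largest absolute entry of each column of A^{-1} is on the diagonal
print(bool(np.all(np.abs(Ainv).max(axis=0) == np.diag(Ainv))))
```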

E. Proof of Corollary 2

Lemma 5. The lower bound of σ_min(A) and the upper bound of H_max(A) are σ* and H*_max(A), respectively:

σ_min(A) ≥ σ* = (c_min(A) - n/2) / (c_min(A) + 1),  (88)

and

H_max(A) ≤ H*_max(A),  (89)

where

H*_max(A) = \log (c_min(A) + 1) + (\log (n - 1) - c_min(A) \log c_min(A)) / (c_min(A) + 1).  (90)

Proof. Since the channel matrix is a strictly diagonally dominant positive matrix, we have

A_{kk} ≥ c_min(A) / (c_min(A) + 1),  (91)

R_k(A) = 1 - A_{kk} ≤ 1 - c_min(A)/(c_min(A) + 1) = 1/(c_min(A) + 1),  (92)

C_k(A) = \sum_{j≠k} A_{jk} ≤ \sum_{j≠k} R_j(A) ≤ (n - 1)/(c_min(A) + 1),  (93)

for ∀ k, with (91) due to (10), (92) due to (91), and (93) due to the fact that ∀ j ≠ k, A_{jk} ≤ \sum_{i≠j} A_{ji} = R_j(A) and each R_j(A) ≤ 1/(c_min(A) + 1), which is proven in (92). Now we are ready to establish the upper bound of H_max(A) and the lower bound of σ_min(A), respectively.

• Suppose that H_max(A) is achieved at the k-th row. Then

H_max(A) = -\sum_{i=1}^{n} A_{ki} \log A_{ki}
= -(A_{kk} \log A_{kk} + \sum_{i≠k} A_{ki} \log A_{ki})
= -A_{kk} \log A_{kk} - (1 - A_{kk}) \sum_{i≠k} [A_{ki}/(1 - A_{kk})] (\log [A_{ki}/(1 - A_{kk})] + \log (1 - A_{kk}))
= -A_{kk} \log A_{kk} - (1 - A_{kk}) \sum_{i≠k} [A_{ki}/(1 - A_{kk})] \log [A_{ki}/(1 - A_{kk})] - (1 - A_{kk}) \log (1 - A_{kk})
≤ -A_{kk} \log A_{kk} + (1 - A_{kk}) \log (n - 1) - (1 - A_{kk}) \log (1 - A_{kk})  (94)
= -(A_{kk} \log A_{kk} + (1 - A_{kk}) \log [(1 - A_{kk})/(n - 1)])
≤ -([c_min(A)/(c_min(A)+1)] \log [c_min(A)/(c_min(A)+1)]
  + (1 - c_min(A)/(c_min(A)+1)) \log [(1 - c_min(A)/(c_min(A)+1))/(n - 1)])  (95)
= \log (c_min(A) + 1) + (\log (n - 1) - c_min(A) \log c_min(A)) / (c_min(A) + 1),

with (94) due to -\sum_{i≠k} [A_{ki}/(1 - A_{kk})] \log [A_{ki}/(1 - A_{kk})] being the entropy of n - 1 elements, which is bounded by \log (n - 1). For (95), we first show that f(x) = -(x \log x + (1 - x) \log [(1 - x)/(n - 1)]) is a monotonically decreasing function for x/(1 - x) ≥ n - 1. Indeed,

df(x)/dx = \log (1 - x) - \log x - \log (n - 1) = -(\log [x/(1 - x)] + \log (n - 1)).

Thus, if x/(1 - x) ≥ n - 1 ≥ 1, then df(x)/dx ≤ 0. However, from (91),

A_{kk}/(1 - A_{kk}) ≥ [c_min(A)/(c_min(A)+1)] / [1 - c_min(A)/(c_min(A)+1)] = c_min(A),  (96)

and from (52),

c_min(A) ≥ 1 + (n - 1)^2 2^{n H*_max(A) V / σ*} ≥ 1 + (n - 1)^2 > n - 1,  (97)

due to n H*_max(A) V / σ* ≥ 0 and n ≥ 2. From (96) and (97), f(x) is decreasing on the relevant range, and (95) follows by plugging the lower bound of A_{kk} in (91) into f.

• Secondly, the lower bound of σ_min(A) can be found in [30] (Theorem 3):

σ_min(A) ≥ min_{1≤k≤n} |A_{kk}| - (1/2)(R_k(A) + C_k(A)),  (98)

or in [31] (Theorem 0):

σ_min(A) ≥ min_{1≤k≤n} (1/2) ({4|A_{kk}|^2 + (R_k(A) - C_k(A))^2}^{1/2} - [R_k(A) + C_k(A)]),  (99)

with R_k(A) = \sum_{j≠k} |A_{kj}| and C_k(A) = \sum_{j≠k} |A_{jk}|, respectively. Thus, if we use the lower bound established in (99),

σ_min(A) ≥ (1/2) ({4 [c_min(A)/(c_min(A)+1)]^2}^{1/2} - [1/(c_min(A)+1) + (n - 1)/(c_min(A)+1)])  (100)
= (c_min(A) - n/2) / (c_min(A) + 1) = σ*,

with (100) due to (91), (92), (93) and the fact that {R_k(A) - C_k(A)}^2 ≥ 0. A similar lower bound can be constructed using (98):

σ_min(A) ≥ c_min(A)/(c_min(A)+1) - (1/2) (1/(c_min(A)+1) + (n - 1)/(c_min(A)+1))  (101)
= (c_min(A) - n/2) / (c_min(A) + 1) = σ*,

with (101) due to (91), (92), and (93). As seen, both approaches yield the same lower bound on σ_min(A); however, (99) is tighter than (98) due to the term {R_k(A) - C_k(A)}^2.
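As a closing sanity check, the bounds of Lemma 5 can be verified numerically for the Example 1 matrix (a sketch, ours; both comparisons should print True given the values σ_min(A) = 0.92424, σ* = 0.875, H_max(A) = 0.33494, H*_max(A) = 0.3364 reported in Example 1):

```python
import numpy as np

A = np.array([[0.95, 0.01, 0.04],
              [0.03, 0.95, 0.02],
              [0.02, 0.02, 0.96]])
n = A.shape[0]
d = np.diag(A)
c = np.min(d / (1.0 - d))                               # c_min(A) = 19
sigma_star = (c - n / 2.0) / (c + 1.0)                  # (88): 0.875
H_star = np.log2(c + 1.0) + (np.log2(n - 1.0) - c * np.log2(c)) / (c + 1.0)  # (90)

sigma_min = np.linalg.svd(A, compute_uv=False).min()    # about 0.92424
H_max = (-np.sum(A * np.log2(A), axis=1)).max()         # about 0.33494
print(sigma_min >= sigma_star, H_max <= H_star)         # True True
```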