On Bounds and Closed Form Expressions for Capacities of Discrete Memoryless Channels with Invertible Positive Matrices

Thuan Nguyen and Thinh Nguyen
School of Electrical and Computer Engineering
Oregon State University
Corvallis, OR, 97331
Email: [email protected], [email protected]

Abstract—While capacities of discrete memoryless channels are well studied, it is still not possible to obtain a closed form expression for the capacity of an arbitrary discrete memoryless channel. This paper describes an elementary technique based on the Karush-Kuhn-Tucker (KKT) conditions to obtain (1) a good upper bound on the capacity of a discrete memoryless channel having an invertible positive channel matrix and (2) a closed form expression for the capacity if the channel matrix satisfies certain conditions related to its singular value and its Gershgorin's disk.

Index Terms—Wireless Communication, Convex Optimization, Channel Capacity, Mutual Information.

I. INTRODUCTION

Discrete memoryless channels (DMC) play a critical role in the early development of information theory and its applications. DMCs are especially useful for studying many well-known modulation/demodulation schemes (e.g., PSK and QAM) in which the continuous inputs and outputs of a channel are quantized into discrete symbols. Thus, there exists a rich literature on the capacities of DMCs [1], [2], [3], [4], [5], [6], [7]. In particular, the capacities of many well-known channels such as (weakly) symmetric channels can be written in elementary formulas [1]. However, it is often not possible to express the capacity of an arbitrary DMC in a closed form expression [1]. Recently, several papers have obtained closed form expressions for small classes of DMCs with small alphabets. For example, Martin et al. established a closed form expression for a general binary channel [8]. Liang showed that the capacity of channels with two inputs and three outputs can be expressed as an infinite series [9]. Paul Cotae et al. found the capacity of two-input, two-output channels in terms of the eigenvalues of the channel matrices [10]. On the other hand, the problem of finding the capacity of a discrete memoryless channel can be formulated as a convex optimization problem [11], [12], so efficient algorithmic solutions exist. There are also other algorithms, such as the Arimoto-Blahut algorithm [2], [3], which can be accelerated as in [13], [14], [15]. In [16], [17], another iterative method yields both upper and lower bounds for the channel capacity.

That said, it is still beneficial to find the channel capacity in closed form expression for a number of reasons. These include: (1) formulas can often provide good intuition about the relationship between the capacity and different channel parameters, (2) formulas offer a faster way to determine the capacity than algorithms do, and (3) formulas are useful for analytical derivations where a closed form expression of the capacity is needed in the intermediate steps. To that end, our paper describes an elementary technique based on the theory of convex optimization to find closed form expressions for (1) a new upper bound on the capacities of discrete memoryless channels with positive invertible channel matrices and (2) the optimality conditions on the channel matrix under which the upper bound is precisely the capacity. In particular, the optimality conditions establish a relationship between the singular value and the Gershgorin's disk of the channel matrix.

II. PRELIMINARIES

A. Convex Optimization and KKT Conditions

A DMC is characterized by a random variable X ∈ {x_1, x_2, ..., x_m} for the inputs, a random variable Y ∈ {y_1, y_2, ..., y_n} for the outputs, and a channel matrix A ∈ R^{m×n}. In this paper, we consider DMCs with an equal number n of inputs and outputs, thus A ∈ R^{n×n}. The matrix entry A_ij represents the conditional probability that y_j is received given that x_i is transmitted. Let p = (p_1, p_2, ..., p_n)^T be the input probability mass vector (pmf) of X, where p_i denotes the probability that x_i is transmitted; then the pmf of Y is q = (q_1, q_2, ..., q_n)^T = A^T p. The mutual information between X and Y is

$$ I(X;Y) = H(Y) - H(Y \mid X), \tag{1} $$

where

$$ H(Y) = -\sum_{j=1}^{n} q_j \log q_j, \tag{2} $$

$$ H(Y \mid X) = -\sum_{i=1}^{n} \sum_{j=1}^{n} p_i A_{ij} \log A_{ij}. \tag{3} $$

The mutual information function can thus be written as

$$ I(X;Y) = -\sum_{j=1}^{n} (A^T p)_j \log (A^T p)_j + \sum_{i=1}^{n} \sum_{j=1}^{n} p_i A_{ij} \log A_{ij}, \tag{4} $$

where (A^T p)_j denotes the j-th component of the vector q = A^T p. The capacity C associated with a channel matrix A is the theoretical maximum rate at which information can be transmitted over the channel without error [5], [18], [19]. It is attained by the optimal pmf p* that maximizes I(X;Y). For a given channel matrix A, I(X;Y) is a concave function of p [1]. Therefore, maximizing I(X;Y) is equivalent to minimizing −I(X;Y), and finding the capacity can be cast as the following convex problem:

Minimize:
$$ \sum_{j=1}^{n} (A^T p)_j \log (A^T p)_j - \sum_{i=1}^{n} \sum_{j=1}^{n} p_i A_{ij} \log A_{ij} $$

Subject to:
$$ 0 \preceq p, \qquad \mathbf{1}^T p = 1. $$

The optimal p* can be found efficiently using various algorithms such as gradient methods [20].
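For concreteness, the convex program above can be solved numerically with a general-purpose solver. The following is a minimal sketch, not part of the paper; the helper names `mutual_information` and `capacity` are ours, and it works in nats using NumPy and SciPy:

```python
import numpy as np
from scipy.optimize import minimize

def mutual_information(p, A):
    """I(X;Y) = H(Y) - H(Y|X) for input pmf p and channel matrix A, in nats."""
    q = np.clip(A.T @ p, 1e-12, None)                   # output pmf q = A^T p
    H_Y = -np.sum(q * np.log(q))
    H_Y_given_X = -p @ np.sum(A * np.log(A), axis=1)    # requires A > 0 entrywise
    return H_Y - H_Y_given_X

def capacity(A):
    """Maximize I(X;Y) over the probability simplex by minimizing -I(X;Y)."""
    n = A.shape[0]
    res = minimize(lambda p: -mutual_information(p, A),
                   x0=np.full(n, 1.0 / n),              # start at the uniform pmf
                   bounds=[(0.0, 1.0)] * n,             # 0 <= p_i <= 1
                   constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}],
                   method="SLSQP")
    return -res.fun, res.x

A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])   # positive, strictly diagonally dominant
C, p_star = capacity(A)
```

For this symmetric example the result can be checked against the elementary formula for symmetric channels mentioned in the introduction: C = log 3 − H(0.8, 0.1, 0.1) ≈ 0.4596 nats, attained at the uniform input pmf.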
In a few cases, however, p* can be found directly using the Karush-Kuhn-Tucker (KKT) conditions [20]. To explain the KKT conditions, we first state the canonical convex optimization problem below:

Problem P1:

Minimize: f(x)
Subject to:
$$ g_i(x) \le 0, \quad i = 1, 2, \ldots, n, $$
$$ h_j(x) = 0, \quad j = 1, 2, \ldots, m, $$

where f(x) and the g_i(x) are convex functions and each h_j(x) is a linear function. Define the Lagrangian function as

$$ L(x, \lambda, \nu) = f(x) + \sum_{i=1}^{n} \lambda_i g_i(x) + \sum_{j=1}^{m} \nu_j h_j(x); \tag{5} $$

then the KKT conditions [20] state that the optimal point x* must satisfy

$$ g_i(x^*) \le 0, \qquad h_j(x^*) = 0, \qquad \left.\frac{dL(x, \lambda, \nu)}{dx}\right|_{x = x^*,\, \lambda = \lambda^*,\, \nu = \nu^*} = 0, \qquad \lambda_i^* g_i(x^*) = 0, \qquad \lambda_i^* \ge 0, \tag{6} $$

for i = 1, 2, ..., n and j = 1, 2, ..., m.

B. Elementary Linear Algebra Results

Definition 1. Let A ∈ R^{n×n} be an invertible channel matrix and let $H(A_i) = -\sum_{k=1}^{n} A_{ik} \log A_{ik}$ be the entropy of its i-th row. Define

$$ K_j = -\sum_{i=1}^{n} A^{-1}_{ji} \sum_{k=1}^{n} A_{ik} \log A_{ik} = \sum_{i=1}^{n} A^{-1}_{ji} H(A_i), $$

where A^{-1}_{ji} denotes the entry (j, i) of the inverse matrix A^{-1}. $K_{\max} = \max_j K_j$ and $K_{\min} = \min_j K_j$ are called the maximum and minimum inverse row entropies of A, respectively.

Definition 2. Let A ∈ R^{n×n} be a square matrix. The Gershgorin radius of the i-th row of A [21] is defined as

$$ R_i(A) = \sum_{j \ne i} |A_{ij}|. \tag{7} $$

The Gershgorin ratio of the i-th row of A is defined as

$$ c_i(A) = \frac{A_{ii}}{R_i(A)}, \tag{8} $$

and the minimum Gershgorin ratio of A is defined as

$$ c_{\min}(A) = \min_i \frac{A_{ii}}{R_i(A)}. \tag{9} $$

We note that since the channel matrix is a stochastic matrix,

$$ c_{\min}(A) = \min_i \frac{A_{ii}}{R_i(A)} = \min_i \frac{A_{ii}}{1 - A_{ii}}. \tag{10} $$

Definition 3. Let A ∈ R^{n×n} be a square matrix.
(a) A is called a positive matrix if A_ij > 0 for all i, j.
(b) A is called a strictly diagonally dominant positive matrix [22] if A is a positive matrix and

$$ A_{ii} > \sum_{j \ne i} A_{ij}, \quad \forall i. \tag{11} $$

Lemma 1. Let A ∈ R^{n×n} be a strictly diagonally dominant positive channel matrix. Then (a) A is invertible; (b) the eigenvalues of A^{-1} are 1/λ_i for all i, where the λ_i are the eigenvalues of A; and (c) A^{-1}_{ii} > 0 and the largest absolute element in the i-th column of A^{-1} is A^{-1}_{ii}, i.e., A^{-1}_{ii} ≥ |A^{-1}_{ji}| for all j.

Proof. The proof is shown in Appendix A.
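The quantities in Definitions 1 and 2 and the claim of Lemma 1(c) are straightforward to evaluate numerically. Below is a small sketch (the function names are ours, not the paper's) that computes the inverse row entropies and the minimum Gershgorin ratio, and checks Lemma 1(c) on the sample channel matrix from the earlier snippet:

```python
import numpy as np

def inverse_row_entropies(A):
    """K_j = sum_i A^{-1}_{ji} H(A_i) from Definition 1, in nats."""
    H_rows = -np.sum(A * np.log(A), axis=1)   # H(A_i), one entry per row i
    return np.linalg.inv(A) @ H_rows          # vector of K_j, j = 1..n

def gershgorin_ratios(A):
    """c_i(A) = A_ii / R_i(A) from Definition 2."""
    R = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))   # Gershgorin radii R_i(A)
    return np.diag(A) / R

A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])

K = inverse_row_entropies(A)
K_max, K_min = K.max(), K.min()
c_min = gershgorin_ratios(A).min()   # stochastic A: min A_ii/(1-A_ii) = 0.8/0.2 = 4

# Lemma 1(c): in each column of A^{-1}, the diagonal entry is positive and
# has the largest absolute value.
Ainv = np.linalg.inv(A)
assert np.all(np.diag(Ainv) > 0)
assert np.allclose(np.abs(Ainv).max(axis=0), np.diag(Ainv))
```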
Lemma 2. Let A ∈ R^{n×n} be a strictly diagonally dominant positive matrix. Then

$$ c_i(A^{-T}) \ge \frac{c_{\min}(A) - 1}{n - 1}, \quad \forall i. \tag{12} $$

Moreover, for any rows k and l,

$$ |A^{-1}_{ki}| + |A^{-1}_{li}| \le \frac{c_{\min}(A)}{c_{\min}(A) - 1} A^{-1}_{ii}, \quad \forall i. \tag{13} $$

Proof. The proof is shown in Appendix B.

Lemma 3. Let A ∈ R^{n×n} be a strictly diagonally dominant positive matrix. Then

$$ \max_{i,j} A^{-1}_{ij} \le \frac{1}{\sigma_{\min}(A)}, \tag{14} $$

where $\max_{i,j} A^{-1}_{ij}$ is the largest entry of A^{-1} and σ_min(A) is the minimum singular value of A.

Proof. The proof is shown in Appendix C.

Lemma 4. Let A ∈ R^{n×n} be an invertible channel matrix. Then

$$ A^{-1} \mathbf{1} = \mathbf{1}, $$

i.e., the sum of every row of A^{-1} equals 1. Furthermore, for any probability mass vector x, the sum of the entries of the vector y = A^{-T} x equals 1.

Proof. The proof is shown in Appendix D.

III. MAIN RESULTS

Our first main result is an upper bound on the capacity of discrete memoryless channels having invertible positive channel matrices.

Based on (24) and (25), we must have λ_j* = 0 for all j. Therefore, all five KKT conditions (20)-(24) are reduced to the following two conditions:

$$ \sum_{j=1}^{n} q_j^* = 1, \tag{26} $$

$$ \nu^* - \left.\frac{dI(X;Y)}{dq_j}\right|_{*} = 0. \tag{27} $$

Next,

$$ \frac{dI(X;Y)}{dq_j} = \sum_{i=1}^{n} A^{-1}_{ji} \sum_{k=1}^{n} A_{ik} \log A_{ik} - (1 + \log q_j) $$
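Lemmas 2-4 can also be sanity-checked numerically. The following self-contained sketch (ours, reusing the sample matrix from the earlier snippets) verifies the bounds (12)-(14) and the row-sum property of Lemma 4:

```python
import numpy as np

A = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])   # strictly diagonally dominant positive, stochastic
n = A.shape[0]
Ainv = np.linalg.inv(A)

R = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))   # Gershgorin radii of A
c_min = np.min(np.diag(A) / R)                       # c_min(A); here 0.8/0.2 = 4

# Lemma 2, eq. (12): c_i(A^{-T}) >= (c_min(A) - 1)/(n - 1) for every row i.
R_inv = np.sum(np.abs(Ainv.T), axis=1) - np.abs(np.diag(Ainv.T))
assert np.all(np.diag(Ainv.T) / R_inv >= (c_min - 1) / (n - 1))

# Lemma 2, eq. (13): for any two rows k, l, |A^{-1}_ki| + |A^{-1}_li| is bounded,
# so it suffices to check the two largest absolute entries in each column i.
for i in range(n):
    two_largest = np.sort(np.abs(Ainv[:, i]))[-2:].sum()
    assert two_largest <= c_min / (c_min - 1) * Ainv[i, i]

# Lemma 3, eq. (14): every entry of A^{-1} is at most 1/sigma_min(A).
sigma_min = np.linalg.svd(A, compute_uv=False).min()
assert Ainv.max() <= 1.0 / sigma_min

# Lemma 4: every row of A^{-1} sums to 1.
assert np.allclose(Ainv @ np.ones(n), np.ones(n))
```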