Capacity of Continuous Channels with Memory via Directed Information Neural Estimator
Ziv Aharoni (1), Dor Tsur (1), Ziv Goldfeld (2), Haim H. Permuter (1)
(1) Ben-Gurion University of the Negev
(2) Cornell University
International Symposium on Information Theory (ISIT), June 21st, 2020
Ziv Aharoni, Capacity via DINE, 1/18

Communication Channel

[Block diagram: M -> Encoder -> X_i -> Channel -> Y_i -> Decoder -> \hat{M}; feedback path Y_{i-1} through a unit delay \Delta]

- Continuous alphabet
- Time-invariant channel with memory
- The channel model is unknown
Capacity

Without feedback:

    C_{FF} = \lim_{n \to \infty} \sup_{P_{X^n}} \frac{1}{n} I(X^n ; Y^n)

With feedback:

    C_{FB} = \lim_{n \to \infty} \sup_{P_{X^n \| Y^{n-1}}} \frac{1}{n} I(X^n \to Y^n)

where I(X^n \to Y^n) is the directed information (DI). Without feedback, I(X^n \to Y^n) = I(X^n ; Y^n), so DI is a unifying measure for both feed-forward (FF) and feedback (FB) capacity.
Talk Outline

- Directed Information Neural Estimator (DINE)
- Neural Distribution Transformer (NDT)
- Capacity estimation

[Block diagram: X_i -> Channel -> Y_i -> Decoder -> \hat{M}, with a gradient signal driving the input and feedback Y_{i-1} through a unit delay \Delta]

Preliminaries: Donsker-Varadhan
Theorem (Donsker-Varadhan representation). The KL divergence between probability measures P and Q can be represented as

    D_{KL}(P \| Q) = \sup_{T : \Omega \to \mathbb{R}} E_P[T] - \log E_Q[e^T],

where T is measurable and both expectations are finite.

For mutual information:

    I(X ; Y) = \sup_{T : \Omega \to \mathbb{R}} E_{P_{XY}}[T] - \log E_{P_X \otimes P_Y}[e^T]
MINE (Y. Bengio keynote, ISIT '19)

Mutual Information Neural Estimator: given samples {(x_i, y_i)}_{i=1}^n.

Approximation (restrict T to a neural network T_\theta, \theta \in \Theta):

    \hat{I}(X ; Y) = \sup_{\theta \in \Theta} E_{P_{XY}}[T_\theta] - \log E_{P_X \otimes P_Y}[e^{T_\theta}]

Estimation (replace expectations by empirical means, with \tilde{y}_i drawn independently of x_i):

    \hat{I}_n(X, Y) = \sup_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^n T_\theta(x_i, y_i) - \log \frac{1}{n} \sum_{i=1}^n e^{T_\theta(x_i, \tilde{y}_i)}
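As a concrete illustration, the empirical DV objective above can be evaluated in a few lines of NumPy. This is only a sketch: the critic T below is a fixed, hand-picked function rather than the trained network T_\theta (training it by gradient ascent over \theta is exactly what MINE adds), and the correlated-Gaussian toy data is an assumption for the demo.

```python
import numpy as np

def dv_bound(T, xy_joint, xy_marginal):
    """Empirical Donsker-Varadhan bound:
    (1/n) sum_i T(x_i, y_i) - log (1/n) sum_i exp(T(x_i, y~_i)),
    where xy_joint holds paired samples and xy_marginal pairs each x
    with an independently drawn (here: shuffled) y~ sample."""
    pos = np.mean([T(a, b) for a, b in xy_joint])
    neg = np.log(np.mean([np.exp(T(a, b)) for a, b in xy_marginal]))
    return pos - neg

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = x + 0.5 * rng.normal(size=n)   # dependent pair; true I(X;Y) = 0.5*log(5) ~ 0.80 nats
y_tilde = rng.permutation(y)       # shuffling destroys the dependence

T = lambda a, b: 0.5 * a * b       # fixed, suboptimal critic (not trained)
bound = dv_bound(T, zip(x, y), zip(x, y_tilde))
```

Any measurable critic yields a lower bound on I(X;Y); here `bound` comes out strictly positive but well below the true value, which is what optimizing over the critic recovers.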
Estimator Derivation

DI as a difference of entropies:

    I(X^n \to Y^n) = h(Y^n) - h(Y^n \| X^n),

where h(Y^n \| X^n) = \sum_{i=1}^n h(Y_i | X^i, Y^{i-1}) is the causally conditioned entropy.

Using a reference measure:

    I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1})
        + \underbrace{D_{KL}(P_{Y^n \| X^n} \,\|\, P_{Y^{n-1} \| X^{n-1}} \otimes P_{\tilde{Y}} \,|\, P_{X^n})}_{D^{(n)}_{Y \| X}}
        - \underbrace{D_{KL}(P_{Y^n} \,\|\, P_{Y^{n-1}} \otimes P_{\tilde{Y}})}_{D^{(n)}_{Y}},

where P_{\tilde{Y}} is a uniform i.i.d. reference measure over the support of the dataset.
Estimator Derivation

DI rate as a difference of KL divergences:

    I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D^{(n)}_{Y \| X} - D^{(n)}_{Y}}_{\text{increment in information at step } n}

    D^{(n)}_{Y \| X} - D^{(n)}_{Y} \xrightarrow{n \to \infty} I(X \to Y)

The limit exists for stationary ergodic processes. The goal: estimate D^{(n)}_{Y \| X} and D^{(n)}_{Y}.

Directed Information Neural Estimator
Apply the DV representation to D^{(n)}_{Y \| X} and D^{(n)}_{Y}:

    \hat{D}^{(n)}_{Y} = \sup_{T : \Omega \to \mathbb{R}} E_{P_{Y^n}}[T(Y^n)] - \log E_{P_{Y^{n-1}} \otimes P_{\tilde{Y}}}[\exp\{T(Y^{n-1}, \tilde{Y})\}],

where the optimal solution is T^* = \log \frac{P_{Y_n | Y^{n-1}}}{P_{\tilde{Y}}}.

Approximate T by a recurrent neural network (RNN) T_{\theta_Y}:

    \hat{D}^{(n)}_{Y} = \sup_{\theta_Y} E_{P_{Y^n}}[T_{\theta_Y}(Y^n)] - \log E_{P_{Y^{n-1}} \otimes P_{\tilde{Y}}}[\exp\{T_{\theta_Y}(Y^{n-1}, \tilde{Y})\}]

Estimate the expectations by empirical means:

    \hat{D}^{(n)}_{Y} = \sup_{\theta_Y} \frac{1}{n} \sum_{i=1}^n T_{\theta_Y}(y_i | y^{i-1}) - \log \frac{1}{n} \sum_{i=1}^n e^{T_{\theta_Y}(\tilde{y}_i | y^{i-1})}

D^{(n)}_{Y \| X} is estimated analogously. Finally,

    \hat{I}(X \to Y) = \hat{D}^{(n)}_{Y \| X} - \hat{D}^{(n)}_{Y}
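The two empirical DV terms share a single functional form, which can be sketched in NumPy as follows. The critic values are passed in as precomputed arrays standing in for the RNN outputs T_{\theta_Y} on the true and reference samples; the supremum over \theta_Y (gradient training) is omitted.

```python
import numpy as np

def empirical_dv_term(T_true, T_ref):
    """One DINE term: (1/n) sum_i T(y_i | y^{i-1})
    - log (1/n) sum_i exp(T(y~_i | y^{i-1})).
    T_true: critic values on the true samples y_i.
    T_ref:  critic values on the i.i.d. reference samples y~_i."""
    return np.mean(T_true) - np.log(np.mean(np.exp(T_ref)))

def dine_estimate(T_true_yx, T_ref_yx, T_true_y, T_ref_y):
    """I^(X -> Y) = D^(n)_{Y||X} - D^(n)_Y, each term computed from
    its own critic's values on true and reference samples."""
    return (empirical_dv_term(T_true_yx, T_ref_yx)
            - empirical_dv_term(T_true_y, T_ref_y))
```

With a zero critic on the reference samples the log-mean-exp term vanishes, so `empirical_dv_term` reduces to the empirical mean of the true-sample scores.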
Consistency

Theorem (DINE consistency). Let \{(X_i, Y_i)\}_{i=1}^{\infty} \sim P be jointly stationary ergodic stochastic processes. Then there exist RNNs F_1 \in \mathrm{RNN}_{d_y, 1}, F_2 \in \mathrm{RNN}_{d_{xy}, 1} such that the DINE \hat{I}_n(F_1, F_2) is a strongly consistent estimator of I(X \to Y), i.e.,

    \lim_{n \to \infty} \hat{I}_n(F_1, F_2) = I(X \to Y)  a.s.

Sketch of proof:
- Represent the optimal solution T^* by a dynamical system.
- Apply universal approximation of dynamical systems by RNNs.
- Estimate the expectations by empirical means.

Implementation
Recall the empirical objective:

    \hat{D}^{(n)}_{Y} = \sup_{\theta_Y} \frac{1}{n} \sum_{i=1}^n T_{\theta_Y}(y_i | y^{i-1}) - \log \frac{1}{n} \sum_{i=1}^n e^{T_{\theta_Y}(\tilde{y}_i | y^{i-1})}

Adjust the RNN to process both inputs while carrying only the state generated by the true samples, so that both sums condition on the same history y^{i-1}.

[Diagram: unrolled modified cell; at each step the cell scores both Y_i and \tilde{Y}_i, but the state S_i is propagated from the true samples only]

[Diagram: complete system layout for computing \hat{D}^{(n)}_{Y}: the input Y_i and a reference generator producing \tilde{Y}_i feed a modified LSTM; dense layers output T_{\theta_Y}(Y_i | Y^{i-1}) and T_{\theta_Y}(\tilde{Y}_i | Y^{i-1}), which a DV layer combines into \hat{D}_Y(\theta_Y, D_n)]
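The state-sharing trick can be sketched as follows. Assumptions are flagged in the comments: a plain tanh recurrent cell stands in for the LSTM and its weights are random rather than trained; the point is only the control flow, i.e. that the true sample Y_i and the reference sample \tilde{Y}_i are both scored against the same state, which is then updated from the true sample alone.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                                        # hidden size (arbitrary choice)
W_h = rng.normal(size=(d, d)) / np.sqrt(d)   # recurrent weights (untrained placeholder)
w_in = rng.normal(size=d)                    # input weights
w_out = rng.normal(size=d)                   # dense head producing the T value

def cell(s, y):
    """Simplified recurrent cell (tanh RNN in place of the LSTM)."""
    return np.tanh(W_h @ s + w_in * y)

def modified_rnn(y_true, y_ref):
    """Score true and reference samples against the SAME history: the
    hidden state is propagated only from the true samples, so both
    T values at step i condition on y^{i-1}."""
    s = np.zeros(d)
    T_true, T_ref = [], []
    for y, y_tilde in zip(y_true, y_ref):
        T_true.append(w_out @ cell(s, y))        # T(y_i | y^{i-1})
        T_ref.append(w_out @ cell(s, y_tilde))   # T(y~_i | y^{i-1})
        s = cell(s, y)                           # state follows true samples only
    return np.array(T_true), np.array(T_ref)
```

A direct consequence of this wiring is that the true-sample scores are unaffected by whatever reference sequence is fed in, since the carried state never sees the reference samples.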
Neural Distribution Transformer (NDT)

[Block diagram: NDT -> X_i -> Channel -> Y_i -> Decoder -> \hat{M}, with a gradient signal into the NDT and feedback Y_{i-1} through a unit delay \Delta]

Model the message M as i.i.d. Gaussian noise \{N_i\}_{i \in \mathbb{Z}}. The NDT is a mapping

- without feedback: \mathrm{NDT} : N^i \mapsto X_i
- with feedback: \mathrm{NDT} : (N^i, Y^{i-1}) \mapsto X_i

The NDT is modeled by an RNN:

[Diagram: (N_i, Y_{i-1}) -> LSTM -> Dense -> Dense -> power-constraint layer -> X_i]
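A minimal NDT forward pass, again as a hedged sketch: a tanh recurrent cell replaces the LSTM, the weights are random placeholders, and the constraint layer is realized here by normalizing the whole output block to the power budget (one plausible implementation; the slide does not specify the exact mechanism).

```python
import numpy as np

rng = np.random.default_rng(2)
d, P = 8, 1.0                                # hidden size and power budget (assumed)
W_h = rng.normal(size=(d, d)) / np.sqrt(d)   # recurrent weights (untrained placeholder)
w_n = rng.normal(size=d)                     # noise-input weights
w_f = rng.normal(size=d)                     # feedback-input weights
w_out = rng.normal(size=d)                   # dense output head

def ndt(noise, feedback=None):
    """Map i.i.d. Gaussian noise N^i (and past outputs Y^{i-1}, when
    feedback is available) to channel inputs X_i with E[X_i^2] <= P."""
    s = np.zeros(d)
    xs = []
    for i, n_i in enumerate(noise):
        fb = feedback[i] if feedback is not None else 0.0
        s = np.tanh(W_h @ s + w_n * n_i + w_f * fb)
        xs.append(w_out @ s)
    x = np.array(xs)
    # Constraint layer: rescale the block so its empirical power equals P.
    return x * np.sqrt(P / max(np.mean(x ** 2), 1e-12))

x = ndt(np.random.default_rng(3).normal(size=200))
```

After normalization the empirical power of the produced input sequence matches the budget exactly.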
Capacity Estimation

Iterate between DINE and NDT:

[Diagram: the NDT (RNN) maps the noise N_i and feedback Y_{i-1} (through a delay \Delta) to X_i; the channel P_{Y_i | X^i, Y^{i-1}} produces Y_i; the DINE (RNN) outputs \hat{I}_n(X \to Y), whose gradient is fed back to the NDT]
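The alternating scheme can be written schematically (Python-shaped pseudocode; `dine_step` and `ndt_step` are hypothetical names standing for gradient updates on the respective RNN parameters, not functions from the paper):

```
for t in range(num_iterations):
    x = ndt(noise, feedback=past_outputs)  # NDT proposes channel inputs
    y = channel(x)                         # sample the unknown channel
    di_hat = dine_step(x, y)               # ascend DINE parameters on the DV objective
    ndt_step(di_hat)                       # ascend NDT parameters on the DI estimate
```

The DINE step sharpens the estimate for the current input distribution, and the NDT step moves the input distribution toward a higher estimated DI rate.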
Results

Channel: MA(1) additive Gaussian noise (AGN),

    Z_i = \alpha U_{i-1} + U_i
    Y_i = X_i + Z_i,

where U_i \sim N(0, 1) i.i.d., X_i is the channel input sequence subject to the power constraint E[X_i^2] \le P, and Y_i is the channel output.
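The channel itself is easy to simulate. A sketch with an i.i.d. Gaussian input at power P (an assumption for the demo only; the capacity-achieving input for this channel with memory is not i.i.d., so this merely generates the kind of input/output samples DINE would consume):

```python
import numpy as np

def ma1_agn_channel(x, alpha, rng):
    """Y_i = X_i + Z_i with MA(1) noise Z_i = alpha*U_{i-1} + U_i,
    where U_i ~ N(0, 1) i.i.d."""
    u = rng.normal(size=len(x) + 1)  # one extra sample for U_{i-1} at i = 0
    z = alpha * u[:-1] + u[1:]       # Z_i = alpha*U_{i-1} + U_i
    return x + z                     # Y_i = X_i + Z_i

rng = np.random.default_rng(4)
P, alpha, n = 1.0, 0.5, 100_000
x = np.sqrt(P) * rng.normal(size=n)  # i.i.d. input meeting E[X_i^2] <= P
y = ma1_agn_channel(x, alpha, rng)
```

Sanity checks on the samples: the output variance should be P + 1 + alpha^2 and the lag-1 output covariance should equal alpha, reflecting the one-step noise memory.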
MA(1) AGN Results

Estimation performance:

[Figure, two panels: estimated rate (vertical axis, roughly 0 to 2) versus a horizontal axis from -20 to 15. (a) Feed-forward capacity. (b) Feedback capacity.]
Conclusion and Future Work

Conclusions:
- An estimation method for both FF and FB capacity.
- Pros: mild assumptions on the channel.
- Cons: lack of provable bounds.

Future work:
- Generalize to more complex scenarios (e.g., multi-user).
- Obtain provable bounds on fundamental limits.

Thank You!