Capacity of Continuous Channels with Memory via Directed Neural Estimator

Ziv Aharoni¹, Dor Tsur¹, Ziv Goldfeld², Haim H. Permuter¹

¹Ben-Gurion University of the Negev

²Cornell University

International Symposium on Information Theory (ISIT), June 21st, 2020

Ziv Aharoni, Capacity via DINE, 1/18

Communication Channel

[Block diagram: message M enters the Encoder, which produces channel input X_i; the Channel outputs Y_i; the Decoder produces the estimate M̂; feedback Y_{i-1} returns to the encoder through a unit delay ∆.]

- Continuous alphabet
- Time-invariant channel with memory
- The channel model is unknown

Capacity

Feedback is not present:

C_{FF} = \lim_{n\to\infty} \sup_{P_{X^n}} \frac{1}{n} I(X^n; Y^n) = \lim_{n\to\infty} \sup_{P_{X^n}} \frac{1}{n} I(X^n \to Y^n)

Feedback is present:

C_{FB} = \lim_{n\to\infty} \sup_{P_{X^n \| Y^{n-1}}} \frac{1}{n} I(X^n \to Y^n)

where I(X^n \to Y^n) is the directed information (DI).

DI is a unifying measure for feed-forward (FF) and feedback (FB) capacity.

Talk Outline

- Directed Information Neural Estimator (DINE)
- Neural Distribution Transformer (NDT)
- Capacity estimation

[Block diagram: channel input X_i, Channel, output Y_i, Decoder, estimate M̂; a gradient signal feeds back from the estimator; feedback Y_{i-1} through a unit delay ∆.]

Preliminaries - Donsker-Varadhan

Theorem (Donsker-Varadhan Representation). The KL divergence between probability measures P and Q can be represented as

D_{KL}(P \| Q) = \sup_{T:\Omega\to\mathbb{R}} E_P[T] - \log E_Q[e^T]

where T is measurable and both expectations are finite.

Applying it to mutual information (P = P_{XY}, Q = P_X \otimes P_Y):

I(X;Y) = \sup_{T:\Omega\to\mathbb{R}} E_{P_{XY}}[T] - \log E_{P_X \otimes P_Y}[e^T]
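As a numerical sanity check of the DV representation (not part of the talk), the sketch below evaluates the DV bound at the optimal witness T* = log dP/dQ for two unit-variance Gaussians, where the bound is tight and the KL divergence is 0.5 nats in closed form; all parameters are illustrative.

```python
import math
import random

random.seed(0)

# P = N(0,1), Q = N(1,1). Closed-form KL(P||Q) = (mu_p - mu_q)^2 / 2 = 0.5 nats.
mu_p, mu_q, n = 0.0, 1.0, 200_000

def log_ratio(x):
    # Optimal DV witness T*(x) = log dP/dQ(x) for these two Gaussians.
    return 0.5 * ((x - mu_q) ** 2 - (x - mu_p) ** 2)

xp = [random.gauss(mu_p, 1.0) for _ in range(n)]  # samples from P
xq = [random.gauss(mu_q, 1.0) for _ in range(n)]  # samples from Q

# DV lower bound: E_P[T] - log E_Q[e^T]; tight at T = T*.
ep = sum(log_ratio(x) for x in xp) / n
eq = math.log(sum(math.exp(log_ratio(x)) for x in xq) / n)
dv = ep - eq
print(f"DV estimate: {dv:.3f} nats (closed form: 0.500)")
```

With a suboptimal T the same expression still gives a valid lower bound, which is what makes the representation usable as a training objective.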

MINE (Y. Bengio Keynote, ISIT '19)

Mutual Information Neural Estimator: given samples \{(x_i, y_i)\}_{i=1}^{n}.

Approximation (restrict T to a neural network class \{T_\theta\}_{\theta\in\Theta}):

I_\Theta(X;Y) = \sup_{\theta\in\Theta} E_{P_{XY}}[T_\theta] - \log E_{P_X \otimes P_Y}[e^{T_\theta}]

Estimation (replace expectations with empirical means):

\hat I_n(X;Y) = \sup_{\theta\in\Theta} \frac{1}{n}\sum_{i=1}^{n} T_\theta(x_i, y_i) - \log \frac{1}{n}\sum_{i=1}^{n} e^{T_\theta(x_i, \tilde y_i)}
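A minimal illustration of the empirical MINE objective (an assumed setup, not from the talk): for jointly Gaussian (X, Y) with correlation ρ the optimal critic is known in closed form, so we can evaluate the empirical objective at T* and compare to the true mutual information. In MINE proper, T_θ is a trained neural network rather than this plug-in critic.

```python
import math
import random

random.seed(1)
rho, n = 0.5, 100_000
true_mi = -0.5 * math.log(1 - rho**2)  # true MI in nats

# Joint samples (X, Y) with correlation rho; shuffling Y gives product samples.
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [rho * x + math.sqrt(1 - rho**2) * random.gauss(0, 1) for x in xs]
ys_shuf = ys[:]
random.shuffle(ys_shuf)

def T(x, y):
    # Closed-form optimal critic T*(x,y) = log p(x,y)/(p(x)p(y)) for the
    # bivariate Gaussian; MINE trains a network T_theta toward this function.
    c = 1 - rho**2
    return -0.5 * math.log(c) + rho * x * y / c - rho**2 * (x * x + y * y) / (2 * c)

joint = sum(T(x, y) for x, y in zip(xs, ys)) / n
prod = math.log(sum(math.exp(T(x, y)) for x, y in zip(xs, ys_shuf)) / n)
mine_obj = joint - prod
print(f"MINE objective at T*: {mine_obj:.3f}, true MI: {true_mi:.3f}")
```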

Estimator Derivation

DI as a difference of entropies:

I(X^n \to Y^n) = h(Y^n) - h(Y^n \| X^n)

where h(Y^n \| X^n) = \sum_{i=1}^{n} h(Y_i \mid X^i, Y^{i-1}) is the causally conditioned entropy.

Using a reference measure:

I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D_{KL}(P_{Y^n \| X^n} \,\|\, P_{Y^{n-1} \| X^{n-1}} \otimes P_{\tilde Y} \mid P_{X^n})}_{D^{(n)}_{Y \| X}} - \underbrace{D_{KL}(P_{Y^n} \,\|\, P_{Y^{n-1}} \otimes P_{\tilde Y})}_{D^{(n)}_{Y}}

where P_{\tilde Y} is a uniform i.i.d. reference measure over the dataset.

Estimator Derivation

DI rate as a difference of KL divergences:

I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D^{(n)}_{Y \| X} - D^{(n)}_{Y}}_{\text{increment in information at step } n}

Taking the limit,

D^{(n)}_{Y \| X} - D^{(n)}_{Y} \xrightarrow{n\to\infty} I(X \to Y)

and the limit exists for stationary ergodic processes.

The goal: estimate D^{(n)}_{Y \| X} and D^{(n)}_{Y}.

Directed Information Neural Estimator

Apply the DV formula to D^{(n)}_{Y \| X} and D^{(n)}_{Y}:

\hat D^{(n)}_{Y} = \sup_{T:\Omega\to\mathbb{R}} E_{P_{Y^n}}[T(Y^n)] - \log E_{P_{Y^{n-1}} \otimes P_{\tilde Y}}\left[\exp T(Y^{n-1}, \tilde Y)\right]

where the optimal solution is T^* = \log \frac{P_{Y_n \mid Y^{n-1}}}{P_{\tilde Y}}.

Approximate T with a recurrent neural network (RNN):

\hat D^{(n)}_{Y} = \sup_{\theta_Y} E_{P_{Y^n}}[T_{\theta_Y}(Y^n)] - \log E_{P_{Y^{n-1}} \otimes P_{\tilde Y}}\left[\exp T_{\theta_Y}(Y^{n-1}, \tilde Y)\right]

Estimate the expectations with empirical means:

\hat D^{(n)}_{Y} = \sup_{\theta_Y} \frac{1}{n}\sum_{i=1}^{n} T_{\theta_Y}(y_i \mid y^{i-1}) - \log\left(\frac{1}{n}\sum_{i=1}^{n} e^{T_{\theta_Y}(\tilde y_i \mid y^{i-1})}\right)

Finally, \hat I_n(X \to Y) = \hat D^{(n)}_{Y \| X} - \hat D^{(n)}_{Y}.

Consistency

Theorem (DINE consistency). Let \{X_i, Y_i\}_{i=1}^{\infty} \sim P be jointly stationary ergodic stochastic processes. Then there exist RNNs F_1 \in \mathrm{RNN}_{d_y,1}, F_2 \in \mathrm{RNN}_{d_{xy},1} such that the DINE \hat I_n(F_1, F_2) is a strongly consistent estimator of I(X \to Y), i.e.,

\lim_{n\to\infty} \hat I_n(F_1, F_2) = I(X \to Y) \quad \text{a.s.}

Sketch of proof:
- Represent the optimal solution T^* by a dynamical system.
- Universal approximation of dynamical systems with RNNs.
- Estimate the expectations with empirical means.

Implementation

Recall the empirical objective

\hat D^{(n)}_{Y} = \sup_{\theta_Y} \frac{1}{n}\sum_{i=1}^{n} T_{\theta_Y}(y_i \mid y^{i-1}) - \log\left(\frac{1}{n}\sum_{i=1}^{n} e^{T_{\theta_Y}(\tilde y_i \mid y^{i-1})}\right)

Adjust the RNN to process both inputs while carrying only the state generated by the true samples.

[Figure: unrolled RNN; at each step the cell F processes the pair (Y_t, \tilde Y_t) together with the state S_{t-1} propagated from the true samples.]

Complete system layout for the calculation of \hat D^{(n)}_{Y}:

[Figure: input Y_i feeds a modified LSTM producing state S_i; a Dense layer outputs T_{\theta_Y}(Y_i \mid Y^{i-1}); a reference generator produces \tilde Y_i, which passes through the same LSTM/Dense path; a DV layer combines both outputs into \hat D_Y(\theta_Y, D_n).]

NDT

Neural Distribution Transformer (NDT)

[Block diagram: as before, with the NDT in place of the encoder, a gradient signal flowing back from the estimator, and feedback Y_{i-1} through a unit delay ∆.]

Model the message M as i.i.d. Gaussian noise \{N_i\}_{i\in\mathbb{Z}}. The NDT is a mapping

- w/o feedback: \mathrm{NDT} : N^i \mapsto X_i
- w/ feedback: \mathrm{NDT} : (N^i, Y^{i-1}) \mapsto X_i

The NDT is modeled by an RNN.

[Figure: inputs Y_{i-1} and N_i feed an LSTM followed by Dense layers and a power-constraint layer, producing the channel input X_i.]
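A toy, non-recurrent stand-in for the NDT's constraint layer (hypothetical parameters w, b; the actual NDT is a trained LSTM with dense layers): map i.i.d. Gaussian noise through an affine layer, then normalize so the empirical power meets the constraint E[X_i^2] \le P.

```python
import math
import random

random.seed(2)
P, n = 1.0, 50_000  # power constraint E[X^2] <= P

# Hypothetical toy NDT: an affine map of i.i.d. Gaussian noise followed by a
# normalization layer enforcing the average power constraint.
w, b = 1.7, 0.3  # arbitrary stand-ins for learned parameters

noise = [random.gauss(0, 1) for _ in range(n)]
raw = [w * z + b for z in noise]
scale = math.sqrt(P / (sum(v * v for v in raw) / n))  # project onto the constraint
x = [scale * v for v in raw]

avg_power = sum(v * v for v in x) / n
print(f"average power: {avg_power:.3f} (constraint P = {P})")
```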

Capacity Estimation

The capacity estimate is obtained by iterating between DINE and NDT optimization steps.

[Figure: the NDT (RNN) maps noise N_i, and fed-back Y_{i-1} through a delay ∆, to the channel input X_i; the channel P_{Y_i \mid X^i, Y^{i-1}} produces Y_i; DINE (RNN) outputs the estimate \hat I_n(X \to Y), whose gradient updates the NDT.]
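A toy version of this alternation (assuming a memoryless AWGN channel, so the DINE step can be replaced by the closed-form rate; nothing here is the paper's actual training loop): the "NDT" is a single gain a with X_i = a N_i, updated by projected gradient ascent on the rate.

```python
import math

P = 1.0  # power constraint
a = 0.1  # NDT stand-in: X_i = a * N_i, so E[X^2] = a^2

# Toy alternation on Y = X + N, N ~ N(0,1). The "DINE" step is the closed-form
# rate I(a) = 0.5*log(1 + a^2); the "NDT" step is a projected gradient ascent
# step on the gain a.
for _ in range(200):
    grad = a / (1 + a * a)                 # d/da of 0.5*log(1 + a^2)
    a = min(a + 0.1 * grad, math.sqrt(P))  # ascend, then project onto constraint

rate = 0.5 * math.log(1 + a * a)
print(f"estimated capacity: {rate:.3f} nats (closed form: {0.5 * math.log(1 + P):.3f})")
```

The loop drives the input to full power, recovering the AWGN capacity 0.5 log(1 + P); in the paper both blocks are RNNs and the gradient flows through the DINE estimate instead of a closed-form rate.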

Results

Channel - MA(1) additive Gaussian noise (AGN):

Z_i = \alpha U_{i-1} + U_i

Y_i = X_i + Z_i

where U_i \overset{i.i.d.}{\sim} \mathcal{N}(0,1), X_i is the channel input sequence subject to the power constraint E[X_i^2] \le P, and Y_i is the channel output.
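A short simulation of this MA(1) AGN channel (α and P chosen arbitrarily for the demo; a white Gaussian input at full power stands in for the learned NDT input):

```python
import math
import random

random.seed(3)
alpha, P, n = 0.5, 1.0, 100_000  # illustrative values

# MA(1) additive Gaussian noise: Z_i = alpha*U_{i-1} + U_i, Y_i = X_i + Z_i.
u = [random.gauss(0, 1) for _ in range(n + 1)]
x = [random.gauss(0, math.sqrt(P)) for _ in range(n)]  # white Gaussian input, E[X^2] = P
y = [x[i] + alpha * u[i] + u[i + 1] for i in range(n)]  # u[i] plays U_{i-1}

# Sanity check against the model: Var(Z) = 1 + alpha^2.
var_z = sum((y[i] - x[i]) ** 2 for i in range(n)) / n
print(f"noise variance: {var_z:.3f} (expected {1 + alpha**2:.3f})")
```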

MA(1) AGN Results

Estimation performance

[Plots: estimated rates over an input SNR sweep from −20 to 15 dB. (a) Feed-forward Capacity, (b) Feedback Capacity.]

Conclusion and Future Work

Conclusions:
- An estimation method for both FF and FB capacity.
- Pros: mild assumptions on the channel.
- Cons: lack of provable bounds.

Future Work:
- Generalize to more complex scenarios (e.g., multi-user).
- Obtain provable bounds on fundamental limits.

Thank You!
