Capacity of Continuous Channels with Memory via Directed Neural Estimator

Ziv Aharoni¹, Dor Tsur¹, Ziv Goldfeld², Haim H. Permuter¹

¹Ben-Gurion University of the Negev

²Cornell University

International Symposium on Information Theory (ISIT), June 21st, 2020

Ziv Aharoni, Capacity via DINE, 1/18

Communication Channel

[Block diagram: message M enters the Encoder, which produces channel input X_i; the Channel outputs Y_i; the Decoder produces the estimate M̂; feedback Y_{i-1} returns to the encoder through a unit delay ∆.]

- Continuous alphabet
- Time-invariant channel with memory
- The channel model is unknown

Capacity

Feedback is not present:

C_{FF} = \lim_{n\to\infty} \sup_{P_{X^n}} \frac{1}{n} I(X^n; Y^n) = \lim_{n\to\infty} \sup_{P_{X^n}} \frac{1}{n} I(X^n \to Y^n)

Feedback is present:

C_{FB} = \lim_{n\to\infty} \sup_{P_{X^n \| Y^{n-1}}} \frac{1}{n} I(X^n \to Y^n)

where I(X^n \to Y^n) is the directed information (DI).

DI is a unifying measure for feed-forward (FF) and feedback (FB) capacity.

Talk Outline

- Directed Information Neural Estimator (DINE)
- Neural Distribution Transformer (NDT)
- Capacity estimation

[Block diagram: channel input X_i, Channel, output Y_i, Decoder, estimate M̂; a gradient signal feeds back from the estimator; feedback Y_{i-1} through a unit delay ∆.]

Preliminaries - Donsker-Varadhan

Theorem (Donsker-Varadhan Representation). The KL divergence between probability measures P and Q can be represented as

D_{KL}(P \| Q) = \sup_{T:\Omega\to\mathbb{R}} E_P[T] - \log E_Q[e^T]

where T is measurable and both expectations are finite.

Applying it to mutual information (P = P_{XY}, Q = P_X \otimes P_Y):

I(X;Y) = \sup_{T:\Omega\to\mathbb{R}} E_{P_{XY}}[T] - \log E_{P_X \otimes P_Y}[e^T]
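As a numerical sanity check of the DV representation (not part of the talk), the sketch below evaluates the DV bound at the optimal witness T* = log dP/dQ for two unit-variance Gaussians, where the bound is tight and the KL divergence is 0.5 nats in closed form; all parameters are illustrative.

```python
import math
import random

random.seed(0)

# P = N(0,1), Q = N(1,1). Closed-form KL(P||Q) = (mu_p - mu_q)^2 / 2 = 0.5 nats.
mu_p, mu_q, n = 0.0, 1.0, 200_000

def log_ratio(x):
    # Optimal DV witness T*(x) = log dP/dQ(x) for these two Gaussians.
    return 0.5 * ((x - mu_q) ** 2 - (x - mu_p) ** 2)

xp = [random.gauss(mu_p, 1.0) for _ in range(n)]  # samples from P
xq = [random.gauss(mu_q, 1.0) for _ in range(n)]  # samples from Q

# DV lower bound: E_P[T] - log E_Q[e^T]; tight at T = T*.
ep = sum(log_ratio(x) for x in xp) / n
eq = math.log(sum(math.exp(log_ratio(x)) for x in xq) / n)
dv = ep - eq
print(f"DV estimate: {dv:.3f} nats (closed form: 0.500)")
```

With a suboptimal T the same expression still gives a valid lower bound, which is what makes the representation usable as a training objective.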

MINE (Y. Bengio Keynote, ISIT '19)

Mutual Information Neural Estimator: given samples \{(x_i, y_i)\}_{i=1}^{n}.

Approximation (restrict T to a neural network class \{T_\theta\}_{\theta\in\Theta}):

I_\Theta(X;Y) = \sup_{\theta\in\Theta} E_{P_{XY}}[T_\theta] - \log E_{P_X \otimes P_Y}[e^{T_\theta}]

Estimation (replace expectations with empirical means):

\hat I_n(X;Y) = \sup_{\theta\in\Theta} \frac{1}{n}\sum_{i=1}^{n} T_\theta(x_i, y_i) - \log \frac{1}{n}\sum_{i=1}^{n} e^{T_\theta(x_i, \tilde y_i)}
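A minimal illustration of the empirical MINE objective (an assumed setup, not from the talk): for jointly Gaussian (X, Y) with correlation ρ the optimal critic is known in closed form, so we can evaluate the empirical objective at T* and compare to the true mutual information. In MINE proper, T_θ is a trained neural network rather than this plug-in critic.

```python
import math
import random

random.seed(1)
rho, n = 0.5, 100_000
true_mi = -0.5 * math.log(1 - rho**2)  # true MI in nats

# Joint samples (X, Y) with correlation rho; shuffling Y gives product samples.
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [rho * x + math.sqrt(1 - rho**2) * random.gauss(0, 1) for x in xs]
ys_shuf = ys[:]
random.shuffle(ys_shuf)

def T(x, y):
    # Closed-form optimal critic T*(x,y) = log p(x,y)/(p(x)p(y)) for the
    # bivariate Gaussian; MINE trains a network T_theta toward this function.
    c = 1 - rho**2
    return -0.5 * math.log(c) + rho * x * y / c - rho**2 * (x * x + y * y) / (2 * c)

joint = sum(T(x, y) for x, y in zip(xs, ys)) / n
prod = math.log(sum(math.exp(T(x, y)) for x, y in zip(xs, ys_shuf)) / n)
mine_obj = joint - prod
print(f"MINE objective at T*: {mine_obj:.3f}, true MI: {true_mi:.3f}")
```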

Estimator Derivation

DI as a difference of entropies:

I(X^n \to Y^n) = h(Y^n) - h(Y^n \| X^n)

where h(Y^n \| X^n) = \sum_{i=1}^{n} h(Y_i \mid X^i, Y^{i-1}) is the causally conditioned entropy.

Using a reference measure:

I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D_{KL}(P_{Y^n \| X^n} \,\|\, P_{Y^{n-1} \| X^{n-1}} \otimes P_{\tilde Y} \mid P_{X^n})}_{D^{(n)}_{Y \| X}} - \underbrace{D_{KL}(P_{Y^n} \,\|\, P_{Y^{n-1}} \otimes P_{\tilde Y})}_{D^{(n)}_{Y}}

where P_{\tilde Y} is a uniform i.i.d. reference measure over the dataset.

Estimator Derivation

DI rate as a difference of KL divergences:

I(X^n \to Y^n) = I(X^{n-1} \to Y^{n-1}) + \underbrace{D^{(n)}_{Y \| X} - D^{(n)}_{Y}}_{\text{increment in information at step } n}

Taking the limit,

D^{(n)}_{Y \| X} - D^{(n)}_{Y} \xrightarrow{n\to\infty} I(X \to Y)

and the limit exists for stationary ergodic processes.

The goal: estimate D^{(n)}_{Y \| X} and D^{(n)}_{Y}.

Directed Information Neural Estimator

Apply the DV formula to D^{(n)}_{Y \| X} and D^{(n)}_{Y}:

\hat D^{(n)}_{Y} = \sup_{T:\Omega\to\mathbb{R}} E_{P_{Y^n}}[T(Y^n)] - \log E_{P_{Y^{n-1}} \otimes P_{\tilde Y}}\left[\exp T(Y^{n-1}, \tilde Y)\right]

where the optimal solution is T^* = \log \frac{P_{Y_n \mid Y^{n-1}}}{P_{\tilde Y}}.

Approximate T with a recurrent neural network (RNN):

\hat D^{(n)}_{Y} = \sup_{\theta_Y} E_{P_{Y^n}}[T_{\theta_Y}(Y^n)] - \log E_{P_{Y^{n-1}} \otimes P_{\tilde Y}}\left[\exp T_{\theta_Y}(Y^{n-1}, \tilde Y)\right]

Estimate the expectations with empirical means:

\hat D^{(n)}_{Y} = \sup_{\theta_Y} \frac{1}{n}\sum_{i=1}^{n} T_{\theta_Y}(y_i \mid y^{i-1}) - \log\left(\frac{1}{n}\sum_{i=1}^{n} e^{T_{\theta_Y}(\tilde y_i \mid y^{i-1})}\right)

Finally, \hat I_n(X \to Y) = \hat D^{(n)}_{Y \| X} - \hat D^{(n)}_{Y}.

Consistency

Theorem (DINE consistency). Let \{X_i, Y_i\}_{i=1}^{\infty} \sim P be jointly stationary ergodic stochastic processes. Then there exist RNNs F_1 \in \mathrm{RNN}_{d_y,1}, F_2 \in \mathrm{RNN}_{d_{xy},1} such that the DINE \hat I_n(F_1, F_2) is a strongly consistent estimator of I(X \to Y), i.e.,

\lim_{n\to\infty} \hat I_n(F_1, F_2) = I(X \to Y) \quad \text{a.s.}

Sketch of proof:
- Represent the optimal solution T^* by a dynamical system.
- Universal approximation of dynamical systems with RNNs.
- Estimate the expectations with empirical means.

Implementation

Recall the empirical objective

\hat D^{(n)}_{Y} = \sup_{\theta_Y} \frac{1}{n}\sum_{i=1}^{n} T_{\theta_Y}(y_i \mid y^{i-1}) - \log\left(\frac{1}{n}\sum_{i=1}^{n} e^{T_{\theta_Y}(\tilde y_i \mid y^{i-1})}\right)

Adjust the RNN to process both inputs while carrying only the state generated by the true samples.

[Figure: unrolled RNN; at each step the cell F processes the pair (Y_t, \tilde Y_t) together with the state S_{t-1} propagated from the true samples.]

Complete system layout for the calculation of \hat D^{(n)}_{Y}:

[Figure: input Y_i feeds a modified LSTM producing state S_i; a Dense layer outputs T_{\theta_Y}(Y_i \mid Y^{i-1}); a reference generator produces \tilde Y_i, which passes through the same LSTM/Dense path; a DV layer combines both outputs into \hat D_Y(\theta_Y, D_n).]

NDT

Neural Distribution Transformer (NDT)

[Block diagram: as before, with the NDT in place of the encoder, a gradient signal flowing back from the estimator, and feedback Y_{i-1} through a unit delay ∆.]

Model the message M as i.i.d. Gaussian noise \{N_i\}_{i\in\mathbb{Z}}. The NDT is a mapping

- w/o feedback: \mathrm{NDT} : N^i \mapsto X_i
- w/ feedback: \mathrm{NDT} : (N^i, Y^{i-1}) \mapsto X_i

The NDT is modeled by an RNN.

[Figure: inputs Y_{i-1} and N_i feed an LSTM followed by Dense layers and a power-constraint layer, producing the channel input X_i.]
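A toy, non-recurrent stand-in for the NDT's constraint layer (hypothetical parameters w, b; the actual NDT is a trained LSTM with dense layers): map i.i.d. Gaussian noise through an affine layer, then normalize so the empirical power meets the constraint E[X_i^2] \le P.

```python
import math
import random

random.seed(2)
P, n = 1.0, 50_000  # power constraint E[X^2] <= P

# Hypothetical toy NDT: an affine map of i.i.d. Gaussian noise followed by a
# normalization layer enforcing the average power constraint.
w, b = 1.7, 0.3  # arbitrary stand-ins for learned parameters

noise = [random.gauss(0, 1) for _ in range(n)]
raw = [w * z + b for z in noise]
scale = math.sqrt(P / (sum(v * v for v in raw) / n))  # project onto the constraint
x = [scale * v for v in raw]

avg_power = sum(v * v for v in x) / n
print(f"average power: {avg_power:.3f} (constraint P = {P})")
```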

Capacity Estimation

The capacity estimate is obtained by iterating between DINE and NDT optimization steps.

[Figure: the NDT (RNN) maps noise N_i, and fed-back Y_{i-1} through a delay ∆, to the channel input X_i; the channel P_{Y_i \mid X^i, Y^{i-1}} produces Y_i; DINE (RNN) outputs the estimate \hat I_n(X \to Y), whose gradient updates the NDT.]
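A toy version of this alternation (assuming a memoryless AWGN channel, so the DINE step can be replaced by the closed-form rate; nothing here is the paper's actual training loop): the "NDT" is a single gain a with X_i = a N_i, updated by projected gradient ascent on the rate.

```python
import math

P = 1.0  # power constraint
a = 0.1  # NDT stand-in: X_i = a * N_i, so E[X^2] = a^2

# Toy alternation on Y = X + N, N ~ N(0,1). The "DINE" step is the closed-form
# rate I(a) = 0.5*log(1 + a^2); the "NDT" step is a projected gradient ascent
# step on the gain a.
for _ in range(200):
    grad = a / (1 + a * a)                 # d/da of 0.5*log(1 + a^2)
    a = min(a + 0.1 * grad, math.sqrt(P))  # ascend, then project onto constraint

rate = 0.5 * math.log(1 + a * a)
print(f"estimated capacity: {rate:.3f} nats (closed form: {0.5 * math.log(1 + P):.3f})")
```

The loop drives the input to full power, recovering the AWGN capacity 0.5 log(1 + P); in the paper both blocks are RNNs and the gradient flows through the DINE estimate instead of a closed-form rate.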

Results

Channel - MA(1) additive Gaussian noise (AGN):

Z_i = \alpha U_{i-1} + U_i

Y_i = X_i + Z_i

where U_i \overset{i.i.d.}{\sim} \mathcal{N}(0,1), X_i is the channel input sequence subject to the power constraint E[X_i^2] \le P, and Y_i is the channel output.
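A short simulation of this MA(1) AGN channel (α and P chosen arbitrarily for the demo; a white Gaussian input at full power stands in for the learned NDT input):

```python
import math
import random

random.seed(3)
alpha, P, n = 0.5, 1.0, 100_000  # illustrative values

# MA(1) additive Gaussian noise: Z_i = alpha*U_{i-1} + U_i, Y_i = X_i + Z_i.
u = [random.gauss(0, 1) for _ in range(n + 1)]
x = [random.gauss(0, math.sqrt(P)) for _ in range(n)]  # white Gaussian input, E[X^2] = P
y = [x[i] + alpha * u[i] + u[i + 1] for i in range(n)]  # u[i] plays U_{i-1}

# Sanity check against the model: Var(Z) = 1 + alpha^2.
var_z = sum((y[i] - x[i]) ** 2 for i in range(n)) / n
print(f"noise variance: {var_z:.3f} (expected {1 + alpha**2:.3f})")
```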

MA(1) AGN Results

Estimation performance

[Plots: estimated rates over an input SNR sweep from −20 to 15 dB. (a) Feed-forward Capacity, (b) Feedback Capacity.]

Conclusion and Future Work

Conclusions:
- An estimation method for both FF and FB capacity.
- Pros: mild assumptions on the channel.
- Cons: lack of provable bounds.

Future Work:
- Generalize to more complex scenarios (e.g., multi-user).
- Obtain provable bounds on fundamental limits.

Thank You!
