arXiv:2005.08670v1 [eess.SY] 15 May 2020

Data Assimilation in Optimal Transport Framework (DRAFT)

Raktim Bhattacharya
Aerospace Engineering, Texas A&M University.

May 19, 2020

1 Mathematical Preliminaries

Definition 1.1 (Wasserstein distance). The Wasserstein distance of order 2, denoted as $W_2$, between two probability measures $p_1, p_2 \in \mathcal{P}(\mathbb{R}^n)$, is defined as
\[
W_2(p_1, p_2) := \left( \inf_{p \in \mathcal{P}_2(p_1, p_2)} \int_{\mathbb{R}^n \times \mathbb{R}^n} \| x_1 - x_2 \|_{\ell_2}^2 \, \mathrm{d}p(x_1, x_2) \right)^{1/2}, \tag{1}
\]
where $\mathcal{P}_2(p_1, p_2)$ denotes the collection of all probability measures $p$ supported on the product space $\mathbb{R}^n \times \mathbb{R}^n$, having finite second moments, with first marginal $p_1$ and second marginal $p_2$.

Remark 1. Intuitively, the Wasserstein distance equals the least amount of work needed to morph one distributional shape to the other, and can be interpreted as the cost of the Monge-Kantorovich optimal transportation plan [1]. The particular choice of the $\ell_2$ norm with order 2 is motivated by [2]. Further, one can prove (see e.g., p. 208, [1]) that $W_2$ defines a metric on the manifold of PDFs.

Proposition 1. The Wasserstein distance $W_2(p_1, p_2)$ between two multivariate Gaussians $p_1 = \mathcal{N}(\mu_1, \Sigma_1)$ and $p_2 = \mathcal{N}(\mu_2, \Sigma_2)$ is given by
\[
W_2^2\left( \mathcal{N}(\mu_1, \Sigma_1), \mathcal{N}(\mu_2, \Sigma_2) \right) = \| \mu_1 - \mu_2 \|_{\ell_2}^2 + \operatorname{tr}\left( \Sigma_1 + \Sigma_2 - 2 \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \right). \tag{2}
\]

Corollary 1. The Wasserstein distance between the Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ and the Dirac delta function $\delta(x - \mu_c)$ is given by
\[
W_2^2\left( \mathcal{N}(\mu, \Sigma), \delta(x - \mu_c) \right) = \| \mu - \mu_c \|_{\ell_2}^2 + \operatorname{tr}(\Sigma) = \operatorname{tr}\left( (\mu - \mu_c)(\mu - \mu_c)^T + \Sigma \right). \tag{3}
\]

Proof. Defining the Dirac delta function as (see e.g., p. 160-161, [3])
\[
\delta(x - \mu_c) := \lim_{c \to 0} \mathcal{N}(\mu_c, c^2 \Sigma_c),
\]
the Wasserstein distance in (3) can also be written as
\[
W_2^2\left( \mathcal{N}(\mu, \Sigma), \delta(x - \mu_c) \right) = \lim_{c \to 0} W_2^2\left( \mathcal{N}(\mu, \Sigma), \mathcal{N}(\mu_c, c^2 \Sigma_c) \right), \tag{4}
\]
and substituting in (2), we get the result.
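For concreteness, (2) and the limiting argument in the proof can be checked numerically. The following is a minimal Python sketch, assuming NumPy and SciPy are available; the helper name `w2_gaussian` and all numerical values are illustrative, not from the text.

```python
# Sketch: numerical check of Proposition 1 and Corollary 1 (illustrative values).
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(mu1, S1, mu2, S2):
    """Squared Wasserstein-2 distance between N(mu1, S1) and N(mu2, S2), per eq. (2)."""
    rS1 = sqrtm(S1)
    cross = sqrtm(rS1 @ S2 @ rS1)
    return np.sum((mu1 - mu2) ** 2) + np.trace(S1 + S2 - 2 * np.real(cross))

mu = np.array([1.0, -2.0])
mu_c = np.zeros(2)
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])

# As c -> 0, W2^2(N(mu, Sigma), N(mu_c, c^2 Sigma)) should approach
# ||mu - mu_c||^2 + tr(Sigma), the Corollary 1 value in eq. (3).
for c in (1e-1, 1e-3, 1e-6):
    print(c, w2_gaussian(mu, Sigma, mu_c, c**2 * Sigma))
print("eq. (3):", np.sum((mu - mu_c) ** 2) + np.trace(Sigma))
```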

2 Data Assimilation by Minimizing Wasserstein Distance

Here we consider the problem of updating the prior state estimate with available measurement data to arrive at the posterior state estimate. We assume $x \in \mathbb{R}^n$ is the true state, a deterministic quantity. The prior estimate of $x$ is denoted by $x^-$, which is a random variable with associated probability density function $p_{x^-}(x^-)$. Similarly, the posterior estimate is denoted by $x^+$, which is also a random variable with associated probability density function $p_{x^+}(x^+)$. We next assume that the measurement $y \in \mathbb{R}^m$ is a function of the true state $x$, but corrupted by an additive noise $n$. It is modeled as

\[
y := g(x) + n. \tag{5}
\]

The noise is assumed to be a random variable with associated probability density function $p_n(n)$. The errors associated with the prior and posterior estimates are defined as

\begin{align}
e^- &:= x^- - x, \tag{6} \\
e^+ &:= x^+ - x, \tag{7}
\end{align}

which are also random variables with probability density functions $p_{e^-}(e^-)$ and $p_{e^+}(e^+)$ respectively. The objective of data assimilation is to determine the mapping $(x^-, y) \mapsto x^+$ such that some performance metric on $e^+$ is optimized. Abstractly, this can be written as

\[
\min_{T(x^-, y)} d(e^+), \tag{8}
\]
where $T : (x^-, y) \mapsto x^+$, and $d(e^+)$ is some cost function to be minimized.

In this paper, we define $d(e^+) := W_2^2\left( p_{e^+}(e^+), \delta(e^+) \right)$, i.e., we minimize the Wasserstein distance of the posterior error's probability density function from the Dirac at the origin, $\delta(e^+)$. The Dirac is a degenerate probability density function that represents zero error in the posterior estimate. The optimization therefore determines the map $T(\cdot, \cdot)$ that takes $p_{e^+}(e^+)$ closest to $\delta(e^+)$ with respect to the Wasserstein distance, resulting in the best posterior estimate of $x$ in this sense. In the next subsections, we present a few specific cases of (8), which are of engineering importance.
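Since the only coupling between a distribution and a Dirac is the product measure, the cost $d(e^+)$ in (8) reduces to $E\|e^+\|^2$, which can be estimated from samples. Below is a minimal Monte Carlo sketch of evaluating this cost for a candidate map $T$; the toy problem and the map itself are illustrative assumptions, not from the text.

```python
# Sketch: Monte Carlo evaluation of d(e+) = W2^2(p_{e+}, delta(e+)) = E||e+||^2
# for a candidate assimilation map T: (x-, y) -> x+. All values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def assimilation_cost(T, x_prior, y, x_true):
    e_plus = np.array([T(xm, yi) for xm, yi in zip(x_prior, y)]) - x_true
    return np.mean(np.sum(e_plus**2, axis=1))  # sample estimate of E||e+||^2

x_true = np.array([1.0, 0.0])                                # deterministic true state
x_prior = x_true + rng.multivariate_normal(np.zeros(2), 0.5 * np.eye(2), 5000)
y = x_true[0] + rng.normal(0.0, 0.1, 5000)                   # scalar measurement y = x_1 + n

# Candidate map: replace the first state with the measurement, keep the second.
T = lambda xm, yi: np.array([yi, xm[1]])
print(assimilation_cost(T, x_prior, y, x_true))
```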

2.1 Linear Measurement with Gaussian Uncertainty

Here we consider the simplest case, where the measurement model is linear and the uncertainty is Gaussian. We show that we recover the classical Kalman update. In this case, the joint probability density function associated with $e^+$ is Gaussian, denoted by $\mathcal{N}(\mu_{e^+}, \Sigma_{e^+})$, where

\[
\mu_{e^+} := E\left[ e^+ \right], \tag{9}
\]
and

\[
\Sigma_{e^+} := E\left[ (e^+ - \mu_{e^+})(e^+ - \mu_{e^+})^T \right] = E\left[ e^+ e^{+T} \right] - \mu_{e^+} \mu_{e^+}^T. \tag{10}
\]
We also note that zero error is associated with the degenerate probability density function, the Dirac delta function $\delta(e^+)$. The Wasserstein distance between $\mathcal{N}(\mu_{e^+}, \Sigma_{e^+})$ and $\delta(e^+)$ is given by

\[
W_2^2\left( \mathcal{N}(\mu_{e^+}, \Sigma_{e^+}), \delta(e^+) \right) = \operatorname{tr}\left( \mu_{e^+} \mu_{e^+}^T + \Sigma_{e^+} \right) = \operatorname{tr} E\left[ e^+ e^{+T} \right]. \tag{11}
\]
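A quick sample-based sanity check of (11), sketched with arbitrary illustrative numbers:

```python
# Sketch: check eq. (11) by sampling (illustrative values).
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 0.5]])

closed_form = np.trace(np.outer(mu, mu) + Sigma)    # tr(mu mu^T + Sigma)
e = rng.multivariate_normal(mu, Sigma, 200000)      # samples of e+
print(closed_form, np.mean(np.sum(e**2, axis=1)))   # both estimate tr E[e+ e+^T]
```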

We first define the sensor model as

\[
y := Cx + n, \tag{12}
\]

where $n$ is a Gaussian random variable defined by $n \sim \mathcal{N}(0, R)$. Consequently, $y$ is also a Gaussian random variable. We next define the random variable $x^+$ to be a linear combination of the random variables $x^-$ and $y$, i.e.

\[
x^+ := Gx^- + Hy, \tag{13}
\]

and determine $G$ and $H$ such that $W_2^2\left( \mathcal{N}(\mu_{e^+}, \Sigma_{e^+}), \delta(e^+) \right)$ is minimized, i.e.

\[
\min_{G, H} W_2^2\left( \mathcal{N}(\mu_{e^+}, \Sigma_{e^+}), \delta(e^+) \right),
\]
where all the posterior quantities are functions of $G$ and $H$. We next express $e^+$ as

\begin{align*}
e^+ &:= x^+ - x, \\
    &= Gx^- + Hy - x, \\
    &= Gx^- + H(Cx + n) - x, \\
    &= Ge^- + (G + HC - I)x + Hn.
\end{align*}

Noting that $E[e^-] = 0$ (for unbiased prior estimate), $E\left[ e^- n^T \right] = 0$ (because $e^-$ and $n$ are uncorrelated), and $R := E\left[ n n^T \right]$, we can express $E\left[ e^+ e^{+T} \right]$ as

\begin{align*}
E\left[ e^+ e^{+T} \right] &= E\left[ \left( Ge^- + (G + HC - I)x + Hn \right) \left( Ge^- + (G + HC - I)x + Hn \right)^T \right], \\
&= G E\left[ e^- e^{-T} \right] G^T + (G + HC - I) x x^T (G + HC - I)^T + H R H^T, \\
&= G \Sigma_{e^-} G^T + (G + HC - I) x x^T (G + HC - I)^T + H R H^T.
\end{align*}

Choosing $G := I - HC$ reduces $E\left[ e^+ e^{+T} \right]$ to
\begin{align*}
E\left[ e^+ e^{+T} \right] &= (I - HC)\, \Sigma_{e^-} (I - HC)^T + H R H^T, \\
&= \Sigma_{e^-} + H \left( C \Sigma_{e^-} C^T + R \right) H^T - H C \Sigma_{e^-} - \Sigma_{e^-} C^T H^T,
\end{align*}
which is further minimized by solving for $H$ satisfying
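The expansion above is straightforward matrix algebra; as a sketch, it can be spot-checked with random matrices (all values illustrative):

```python
# Sketch: spot-check the covariance expansion with random SPD matrices.
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2
A = rng.standard_normal((n, n)); Sigma_minus = A @ A.T   # prior error covariance (SPD)
B = rng.standard_normal((m, m)); R = B @ B.T             # noise covariance (SPD)
C = rng.standard_normal((m, n))
H = rng.standard_normal((n, m))
I = np.eye(n)

lhs = (I - H @ C) @ Sigma_minus @ (I - H @ C).T + H @ R @ H.T
rhs = (Sigma_minus + H @ (C @ Sigma_minus @ C.T + R) @ H.T
       - H @ C @ Sigma_minus - Sigma_minus @ C.T @ H.T)
print(np.allclose(lhs, rhs))  # True
```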

\[
\frac{\partial}{\partial H} \operatorname{tr} E\left[ e^+ e^{+T} \right] = 0,
\]
or
\[
H^* \left( C \Sigma_{e^-} C^T + R \right) - \Sigma_{e^-} C^T = 0,
\]
resulting in the optimal solution

\[
H^* := \Sigma_{e^-} C^T \left( C \Sigma_{e^-} C^T + R \right)^{-1}, \tag{14}
\]

which is also the optimal Kalman gain. Further, the choice $G := I - HC$ also ensures $E[e^+] = 0$, i.e., the posterior error is unbiased. Therefore, we see that for linear measurement and Gaussian uncertainty models, minimization of the Wasserstein distance results in the classical Kalman update.
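As a closing sketch with illustrative values, the gain in (14) can be computed directly and checked against the stationarity condition above; the resulting posterior covariance then matches the familiar Kalman form $(I - H^*C)\,\Sigma_{e^-}$:

```python
# Sketch: compute H* from eq. (14) and verify the optimality conditions.
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2
A = rng.standard_normal((n, n)); Sigma_minus = A @ A.T   # prior error covariance
B = rng.standard_normal((m, m)); R = B @ B.T             # measurement noise covariance
C = rng.standard_normal((m, n))

S = C @ Sigma_minus @ C.T + R                  # innovation covariance
H_star = Sigma_minus @ C.T @ np.linalg.inv(S)  # optimal gain, eq. (14)
G = np.eye(n) - H_star @ C                     # G := I - HC

# Stationarity: H*(C Sigma C^T + R) - Sigma C^T = 0.
print(np.allclose(H_star @ S, Sigma_minus @ C.T))

# Posterior covariance: the Joseph form collapses to (I - H*C) Sigma at the optimum.
joseph = G @ Sigma_minus @ G.T + H_star @ R @ H_star.T
print(np.allclose(joseph, G @ Sigma_minus))
```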

References

[1] C. Villani. Topics in Optimal Transportation, volume 58. American Mathematical Society, 2003.

[2] A. Halder and R. Bhattacharya. Further results on probabilistic model validation in Wasserstein metric. In 51st IEEE Conference on Decision and Control, Maui, 2012.

[3] S. Hassani. Mathematical Physics: A Modern Introduction to Its Foundations. Springer, 1999.
