
arXiv:1611.02989v1 [cs.IT] 9 Nov 2016

Bayesian data assimilation based on a family of outer measures

Jeremie Houssineau and Daniel E. Clark

J. Houssineau is with the Department of Statistics and Applied Probability, National University of Singapore. D.E. Clark is with the School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh.

Abstract—A flexible representation of uncertainty that remains within the standard framework of probabilistic measure theory is presented, along with a study of its properties. This representation relies on a specific type of outer measure that is based on the measure of a supremum, hence combining additive and highly sub-additive components. It is shown that this type of outer measure enables the introduction of intuitive concepts such as pullback and general data assimilation operations.

Index Terms—Outer measure; Data assimilation

INTRODUCTION

Uncertainty can be considered as being inherent to all types of knowledge about any given physical system, and it needs to be quantified in order to obtain meaningful characteristics for the considered system. Uncertainty originates from randomness as well as from lack of precise knowledge, and it is usually modelled by a probability measure on a state space where the knowledge about the system of interest can be represented. However, in situations where the lack of knowledge is acute, probability measures are not always suitable for representing the associated uncertainty, and different approaches have been introduced based on fuzzy sets [33], upper and lower probabilities [10] or possibilities [20]. Various extensions of the concept of plausibility have also been studied [11, 29].

In this article, we study a representation of uncertainty that remains completely within the standard framework of probabilistic measure theory. We do not claim here that the considered approach is more general than any previously introduced representation of uncertainty; rather, the introduced framework will prove to be beneficial when handling uncertainty. Indeed, the proposed approach allows for deriving a Bayesian data assimilation operation that combines independent sources of information. It is shown that this operation yields Dempster's rule of combination [28] when the proposed representation of uncertainty reduces to a plausibility, which furthers the relation between the Bayesian and Dempster-Shafer approaches. The mathematical properties of the proposed Bayesian data assimilation operation are also studied, similarly to [5]. This operation can be used in conjunction with random variables describing different types of physical systems; for instance, it has been used in [32], together with a general representation of stochastic populations [19, Chapt. 2], in order to derive a principled solution to the data association problem [18], which underlies data assimilation for probabilistic multi-object dynamical systems and which is usually handled in an ad-hoc way. This enabled new methods for multi-target filtering to be derived, as in [9] and [19, Chapt. 4]. The proposed representation of uncertainty also enables the definition of a meaningful notion of pullback, i.e. the inverse of the pushforward, which does not require the considered measurable function to be made bi-measurable and hence does not require a modification of the measurable space on which the associated pushforward measure is defined.

The structure of the paper is as follows. Examples of limitations encountered when using probability measures to represent uncertainty are given in Section I. Notations and the concepts of outer measure and of convolution of measures are recalled in Section II, followed by the introduction of the proposed representation of uncertainty in Section III and by examples of it in Section IV. Preliminary results about pushforward and pullback measures are given in Section V, before the statement and use of the main theorems in Section VI.

The set of non-negative integers is denoted N and the set of real (non-negative) numbers is denoted R (R+). The power set of a set E is denoted ℘(E), where "℘" is referred to as "the Weierstrass p". If (E, Σ) is a measurable space, then M(E) and M1(E) denote the set of measures and the set of probability measures on E respectively. If E is a Polish space, then B(E) denotes the Borel σ-algebra on E. The function on E that is everywhere equal to one is denoted 1. For any functions f : E → R and f′ : E′ → R and any functions g, g′ : E → R, we define the mappings

f ⋊ f′ : (x, x′) ↦ f(x)f′(x′)  on E × E′,  and  g ⋉ g′ : x ↦ g(x)g′(x)  on E.

We consider two complete probability spaces (Ω, Σ, P) and (Ω′, Σ′, P′) that are assumed to represent independent sources of uncertainty. If X and X′ are random variables on (Ω, Σ, P) and (Ω′, Σ′, P′) respectively, then the associated joint random variable is denoted X ⋊ X′ and its law is found to be the pushforward of the product measure P × P′. If f and f′ are real random variables, then their product is denoted f ⋉ f′.
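The two product operations introduced above can be illustrated with a minimal sketch (Python; the helper names `outer_product` and `pointwise_product` are ours, not the paper's):

```python
# f ⋊ g : (x, x') ↦ f(x)·g(x') is a function on the product space E × E',
# while f ⋉ g : x ↦ f(x)·g(x) is the pointwise product on a common space E.

def outer_product(f, g):
    """f ⋊ g: tensor-type product on E × E'."""
    return lambda x, xp: f(x) * g(xp)

def pointwise_product(f, g):
    """f ⋉ g: pointwise product on E."""
    return lambda x: f(x) * g(x)

f = lambda x: x + 1.0
g = lambda x: 2.0 * x

h = outer_product(f, g)       # h(x, x') = (x + 1) · 2x'
k = pointwise_product(f, g)   # k(x) = (x + 1) · 2x
print(h(1.0, 3.0))  # (1+1)·(2·3) = 12.0
print(k(2.0))       # (2+1)·(2·2) = 12.0
```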

I. UNCERTAINTY AND PROBABILITY MEASURES

Random experiments are mathematically defined by probability spaces, defining possible events and attributing them a probability. However, the use of a probability space for representing uncertainty that is not induced by pure randomness, i.e. for describing random or non-random experiments about which only partial knowledge is available, can become less intuitive as well as technically challenging in some cases. In the following examples, we study when uncertainty can be encoded into a probability measure as well as cases where the existence of a procedure for doing so becomes less clear.

Example 1. Let X be a random variable from (Ω, Σ, P) to (R, B(R)) and assume that it is only known that X has its image in the Borel set A ∈ B(R) with probability α ∈ [0, 1]. This information can be encoded via the sub-σ-algebra 𝒜 = {∅, A, Ac, R} of B(R), by defining the law p of X on (R, 𝒜) rather than on (R, B(R)) and by setting p(A) = α. Similarly, if nothing is known about X, then this can be encoded via the trivial sub-σ-algebra {∅, R}.

The concept of sub-σ-algebra is useful for describing different levels of knowledge, such as when used for conditional expectations [22, Chapt. 27]. However, we will see in the next example that their use can become challenging in some situations.

Example 2. Let p be a probability measure on (E, ℰ) and let p′ be another probability measure on (E, ℰ′), with ℰ′ ⊂ ℰ; then for any scalar a ∈ (0, 1), the probability measure qa = (1 − a)p + ap′ can only be defined on the coarsest σ-algebra, that is ℰ′. When considering the extreme case where ℰ′ is the trivial σ-algebra {∅, E}, it results that nothing is known about qa, however small a is. One way to bypass this drawback is to single out a finite reference measure λ in M(E) and to define an extended version of p′, denoted p̄′, as a uniform probability measure as follows:

(∀ A ∈ ℰ)  p̄′(A) = λ(A)/λ(E).

In this way, the probability of a given event in ℰ is equal to the probability of any other event of the same "size" with respect to the measure λ. In other words, no area of the space E is preferred over any other. Besides the facts that a reference measure is required and that the size of the space is limited (there is no uniform measure over the whole real line with respect to the Lebesgue measure), this way of modelling the information is not completely equivalent to the absence of information. There exist ways of modelling uncertainty on a probability measure itself, such as with Dirichlet processes [16] and to some extent with Wishart distributions [31]; yet, these solutions do not directly help with the non-informative case since they require additional parameters to be set up.

Example 3. Consider a discrete-time joint process (Xn, Yn)n≥0 assumed to be a hidden Markov model [2] and consider the case where realisations yn of the observation process are not received directly but are instead known to be in some subset An of the observation space. This is in fact what happens in practice since most real sensors are finite-resolution; e.g. if the sensor is a camera then An would be one of the pixels in the received image at time step n. This type of data could be treated as such using non-linear filtering methods, as in [21] for finite-resolution radars; however, the associated uncertainty is often modelled by a Gaussian distribution so that the Kalman filter can be used instead. Yet, replacing information of the form yn ∈ An by a probability distribution can have a great impact on the value of the denominator in Bayes' theorem. Although this is unimportant in Kalman filtering since the numerator is equally affected, the value of the denominator does matter in different contexts, such as in multi-target tracking [3] where it is used as a weighting coefficient for the relative importance of a given track when compared to others [26]. We will see in Section VI that the Gaussian distribution can be replaced by another object that better represents data of the form yn ∈ An while preserving the advantages of Kalman filtering.

Another important aspect of probability theory is the gap between probability measures on countable and uncountable sets, as explained in the following example.

Example 4. Let X be a countable set equipped with its discrete σ-algebra and assume that some physical system can be uniquely characterised by its state in X. Let X be a random variable on (Ω, Σ, P) and X′ be another random variable on another probability space (Ω′, Σ′, P′), and assume that both X and X′ represent some uncertainty about the same physical system. The fact that the two random variables represent the same system can be formalised by the event ∆ = {(x, x) s.t. x ∈ X}, which is the diagonal of X × X. The information contained in the laws p = X∗P and p′ = (X′)∗P′ can then be combined into a conditional probability measure p̂(· | ∆) ∈ M1(X), characterised by

(∀ B ⊆ X)  p̂(B | ∆) = p ⋊ p′((B × B) ∩ ∆) / p ⋊ p′(∆),

where p and p′ are assumed to be compatible, i.e. such that p ⋊ p′(∆) ≠ 0. This result is justified by the transformation of the conditional probability measure p ⋊ p′(· | ∆) with support on ∆ into a probability measure on X (isomorphism mod 0). Let w, w′ : X → [0, 1] be the probability mass functions induced by p and p′ and characterised by

p = Σ_{x∈X} w(x) δ_x  and  p′ = Σ_{x∈X} w′(x) δ_x;

then the probability measure p̂(· | ∆) can be more naturally characterised via its probability mass function ŵ on X, which is found to be

ŵ : x ↦ w(x)w′(x) / Σ_{y∈X} w(y)w′(y).

However, if X is uncountable and equipped with its Borel σ-algebra B(X) and if the probability measures p and p′ are diffuse on X, then they will not be compatible by construction. Indeed, even though the diagonal ∆ can still be defined and is measurable under extremely weak conditions on X, it holds that p ⋊ p′(∆) = 0. (More specifically, the diagonal ∆ of X × X is measurable in a separable metric space [4, Lemma 6.4.2], and remains measurable if X is generalised to a Hausdorff topological space with a countable base; an interesting result from [24], detailed in [27, Chapt. 21], is that the diagonal ∆ is never measurable when the cardinality of X is strictly larger than the cardinality of the continuum.) The fact that p ⋊ p′(∆) = 0 is caused by the strong assumption that the probabilities p(B) and p′(B) are known for all measurable B in B(X) and, because p and p′ are diffuse, tend to zero when B reduces to a singleton. The introduction of an appropriately coarse sub-σ-algebra on X, such as the one generated by a given countable partition, would allow for recovering some of the results that hold for countable spaces. However, such an approach will not be natural or intuitive in most situations. Alternatively, if p and p′ are absolutely continuous with respect to a reference measure λ ∈ M(X), then the probability density f̂ of p̂(· | ∆) can be expressed as a function of the probability densities f and f′ of p and p′ respectively as

f̂ : x ↦ f(x)f′(x) / ∫ f(y)f′(y) λ(dy).

However, resorting to this approach amounts to ignoring the incompatibility between the two random variables, in addition to the loss of meaningful interpretation of the denominator, which becomes dependent on the choice of the reference measure. In a filtering context, one of the laws, say p, would be the prior whereas p′ would represent the observation. Although the prior might be appropriately represented by a probability measure, the uncertainty in the observation is often mostly due to lack of knowledge, for which a probability measure might be ill-suited.

Example 5. Attributing a probability measure to data originating from natural language is often inappropriate. For instance, if an observer locates an object in the real world around position x ∈ R³, then representing this data with a distribution on R³ centred on x, such as a Gaussian distribution with a given variance, highly overstates the given information. The observer is not giving the exact probability for the object to be in any measurable subset of R³; he only gives the fact that the probability should be low for subsets that are far from x in a given sense and high when close to or when containing x.

Overall, there is a need for the introduction of additional concepts that could account for these non-informative types of knowledge, and which would in turn enable data assimilation to be performed in more general spaces. The objective in the next section is to find an alternative way of representing uncertainty while staying in the standard formalism of measure and probability theory.

II. FUNDAMENTAL CONCEPTS

We first recall two concepts of measure theory that will be useful in this article, namely the concepts of outer measure and of convolution of measures.

A. Outer measure

The concept of outer measure is fundamental in measure theory and is defined as follows.

Definition 1. An outer measure on E is a function µ : ℘(E) → [0, ∞] verifying the following conditions:
a) (Null empty set) µ(∅) = 0;
b) (Monotonicity) if A ⊆ B ⊆ E then µ(A) ≤ µ(B);
c) (Countable sub-additivity) for every sequence (An)n∈N of subsets of E,

µ(∪_{n∈N} An) ≤ Σ_{n∈N} µ(An).

Outer measures allow for constructing both σ-algebras and measures on them via Carathéodory's method [17, Sect. 113]: the outer measure µ induces a σ-algebra 𝒳 of subsets of X composed of the sets A verifying

(∀ C ⊆ X)  µ(C) = µ(C ∩ A) + µ(C ∩ Ac),

which are referred to as µ-measurable sets. The measure space (X, 𝒳, µ) is a complete measure space. A classical example of a measure constructed in this way is the Lebesgue measure on the σ-algebra of Lebesgue-measurable subsets of R. If F is a set, µ and µ′ are outer measures on E and F respectively and f : E → F is a function, then

C ⊆ F ↦ µ(f⁻¹[C])  and  A ⊆ E ↦ µ′(f[A])

are outer measures. One way of constructing outer measures from measures is as follows: let (X, 𝒳, m) be a measure space; then m induces an outer measure m* on X defined as

m*(C) = inf{m(A) s.t. A ∈ 𝒳, A ⊇ C},  (3)

for any C ⊆ X.

B. Convolution of measures

An operation that will prove to be of importance is the operation of convolution of measures. We consider that the set X is a Polish space equipped with its Borel σ-algebra B(X).

Definition 2. Let m and m′ be two finite measures on a topological semigroup (X, ·); then the convolution m ∗ m′ of m and m′ is defined as

m ∗ m′(A) = m ⋊ m′({(x, y) s.t. x · y ∈ A}),  (4)

for any A ∈ B(X).

The set function m ∗ m′ defined in (4) is a measure on (X, B(X)). The following properties of the operation of convolution are corollaries of [17, Sect. 444A-D].

Corollary 1. If (X, ·) is a topological semigroup, then it holds that

m ∗ (m′ ∗ m′′) = (m ∗ m′) ∗ m′′

for all finite measures m, m′ and m′′ on X. If (X, ·) is commutative, then it holds that m ∗ m′ = m′ ∗ m for all finite measures m and m′ on X.
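Definition 2 and Corollary 1 can be checked numerically on measures with finite support. The sketch below (an illustration, not part of the paper) represents such a measure as a dict of point masses and uses the commutative semigroup (N, +) as an assumed example of semigroup operation:

```python
from collections import defaultdict
from itertools import product

def convolve(m, mp, op=lambda x, y: x + y):
    """(4): m * m'(A) = (m ⋊ m')({(x, y) : x·y ∈ A}) for the semigroup op."""
    out = defaultdict(float)
    for (x, wx), (y, wy) in product(m.items(), mp.items()):
        out[op(x, y)] += wx * wy
    return dict(out)

m1 = {0: 0.5, 1: 0.5}
m2 = {0: 0.25, 2: 0.75}
m3 = {1: 1.0}

# Corollary 1: associativity, and commutativity since (N, +) is commutative.
assert convolve(convolve(m1, m2), m3) == convolve(m1, convolve(m2, m3))
assert convolve(m1, m2) == convolve(m2, m1)
print(convolve(m1, m2))  # {0: 0.125, 2: 0.375, 1: 0.125, 3: 0.375}
```

The total mass of the convolution is the product of the total masses, as expected from the definition via the product measure m ⋊ m′.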

III. MEASURE CONSTRAINT

Henceforth, we consider a space X which is assumed to be a Polish space. We denote by L0(X, R) the set of all measurable functions from (X, B(X)) to (R, B(R)) and we use the shorthand notation L0(X) = L0(X, R+) for the subset of L0(X, R) made of non-negative functions. The general definition of measure constraint is given below, followed by the introduction of more specific properties and operations.

A. Definition of measure constraint

Using the notion of outer measure defined in the previous section as well as the technical results about the set L0(X, R) detailed in [17, Sect. 245], we introduce the concept of measure constraint as follows.

Definition 3. Let M be a measure on L0(X, R). If it holds that the function µM, defined on the power set ℘(X) of X as

µM : A ↦ M(χ(A, ·)),  (5)

is an outer measure for a given collection {χ(A, ·)}_{A⊆X} of measurable functions on L0(X, R), then M is said to be a measure constraint on X with characteristic function χ. If m is a finite measure on X verifying

m(B) ≤ M(χ(B, ·))  (6)

for any B ∈ B(X) and

m(X) = M(χ(X, ·)),  (7)

then M is said to be dominating m.

The motivation behind the introduction of measure constraints is to partially describe a measure by limiting its mass in some areas while possibly leaving it unconstrained elsewhere. Measure constraints that would bound a measure from below could also be defined using the associated concept of inner measure. In general, a measure on X could be dominated by a measure on the set L0(Y, R) of measurable functions on a different set Y, as long as the associated characteristic function χ is defined accordingly.

Remark 1. In Definition 3, the condition that µM is an outer measure is used to reduce the set of measures on L0(X, R) that would verify (6) to the ones that have natural properties. As explained in [17, Sect. 113B]: "The idea of the outer measure of a set A is that it should be some kind of upper bound for the possible measure of A". In fact, the use of outer measures as a way of dealing with uncertainty was first proposed by [15]. In particular, the condition of monotonicity imposes that if a given mass is allowed in a set A then at least the same mass should be allowed in a larger set B ⊇ A. Similarly, the condition of sub-additivity allows for reaching the maximum mass m(X) in several disjoint sets while still verifying M(χ(X, ·)) = m(X).

Remark 2. A direct consequence of (6) is that a lower bound for m(B) is also available for any B ∈ B(X) and is found to be

m(B) = m(X) − m(Bc) ≥ m(X) − M(χ(Bc, ·)).  (8)

The information provided by this lower bound is limited since µM is sub-additive and might reach m(X) on any given set; for instance, if M(χ(Bc, ·)) = m(X), then (8) only implies that m(B) ≥ 0, which is not informative.

We will be particularly interested in the situation where m is a probability measure, in which case M satisfies M(χ(X, ·)) = m(X) = 1 and is said to be a probabilistic constraint. The advantage of condition (7) is that if m is a finite measure that is not a probability measure, then m and M can be renormalised to be respectively a probability measure and a probabilistic constraint by dividing the inequality (6) by the total mass m(X). When uncertainty is induced by a lack of knowledge about the actual law of a given random experiment, this law should be dominated by the considered probabilistic constraint. However, in cases of uncertain but non-random experiments, there is no such thing as a "true" law and the probabilistic constraint becomes the primary representation of uncertainty.

A useful case is found when M is supported by the set L∞(X) of non-negative bounded measurable functions and, assuming that sup : L∞(X) → R is measurable, when χ is such that

χ : (A, f) ↦ ‖1_A · f‖,  (9)

where 1_A denotes the indicator function of A and where ‖·‖ is the uniform norm on L∞(X). Not all outer measures take the form assumed in (5) with χ as in (9), but this case offers suitably varied configurations by combining a linear part and a very sub-additive part, that is, the measure M and the uniform norm respectively. This case will be understood as the default situation, in the sense that (9) will be considered whenever the characteristic function of a measure constraint is not specified. The subset of probabilistic constraints with such a supremum-based characteristic function is denoted C1(X), and we consider the weighted semi-norm ‖·‖ defined on this set as

‖·‖ : M ↦ ∫ ‖f‖ M(df).

This semi-norm is not a norm since ‖M‖ = 0 does not imply that M is the null measure in general.

B. Properties of measure constraints

Henceforth, ℒ∞(X) will denote the Borel σ-algebra induced on L∞(X) by the topology of convergence in measure on L0(X) presented in [17, Sect. 245]. The characterisation of ℒ∞(X) is highly technical and out of the scope of this article, so that the measurability of the function f ↦ sup f and of the mapping f ↦ f ∘ ξ, for some measurable mapping ξ on X, is assumed rather than demonstrated.

Given the definition of C1(X), it appears that many different measures in this set will dominate exactly the same probability measures in M1(X), since a rescaling of the functions in the support can be compensated for by a rescaling of the measure itself. However, such a multiplicity can be avoided by identifying a canonical form and a rescaling procedure as in the following proposition.
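Before turning to that proposition, the domination conditions (6)–(7) with the default characteristic function (9) can be checked on a small discrete sketch. The finite space X = {0, …, 4}, the representation M = Σᵢ aᵢ δ_{fᵢ}, and all numerical values below are assumptions made for illustration only:

```python
# A probabilistic constraint with finite support: M = Σ_i a_i δ_{f_i}, so that
# M(χ(B, ·)) = Σ_i a_i · max_{x∈B} f_i(x)  (characteristic function (9)).
X = range(5)
f1 = {0: 1.0, 1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0}   # indicator of {0, 1}
f2 = {0: 0.0, 1: 0.0, 2: 1.0, 3: 1.0, 4: 1.0}   # indicator of {2, 3, 4}
M = [(0.6, f1), (0.4, f2)]

def chi_bound(B):
    """M(χ(B, ·)) = Σ_i a_i ‖1_B · f_i‖ with ‖·‖ the uniform norm."""
    return sum(a * max(f[x] for x in B) for a, f in M)

# A law p with p({0,1}) = 0.6 and p({2,3,4}) = 0.4 satisfies (6) and (7):
p = {0: 0.3, 1: 0.3, 2: 0.2, 3: 0.1, 4: 0.1}
for B in ([0], [0, 1], [2, 3], list(X)):
    assert sum(p[x] for x in B) <= chi_bound(B) + 1e-12
print(chi_bound(list(X)))  # total mass (7): 1.0
```

Note how the bound is additive across the weighted family but sub-additive within each function, since the supremum of an indicator over a set only records whether the set is hit.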

Proposition 1. For any probabilistic constraint M ∈ C1(X), there exists a probabilistic constraint M† ∈ C1(X) that is equivalent to M and is also a probability measure, which can be determined via the following rescaling procedure:

(∀ F ∈ ℒ∞(X))  M†(F) = ∫ 1_F(f†) ‖f‖ M(df),  (10)

where the function f† ∈ L∞(X) is defined as

f† = f/‖f‖ if ‖f‖ ≠ 0, and f† = 1 otherwise.

Proof. By construction, the support of M† is included in the subset of L∞(X) made of functions with uniform norm equal to one, so that M†(L∞(X)) = ‖M†‖ = 1. The measure M† is then both a probability measure and a probabilistic constraint. We now have to show that M and M† dominate the same probability measures. Let p ∈ M1(X) be a probability measure dominated by M; then

p(B) ≤ ∫ ‖1_B · f‖ M(df) = ∫ ‖1_B · f†‖ M′(df)

for all B ∈ B(X), where M′(df) = ‖f‖ M(df), so that, by a change of variable,

p(B) ≤ ∫ ‖1_B · f†‖ M′(df) = ∫ ‖1_B · f′‖ M†(df′),

which terminates the proof.

Remark 3. The definition of f† when ‖f‖ = 0 is irrelevant because of the form of (10). Yet, considering f† = 1 when ‖f‖ = 0 implies that f† is always in the subset

L(X) = {f ∈ L∞(X) s.t. ‖f‖ = 1}

of measurable functions with uniform norm equal to one. Probabilistic constraints in the set C1*(X) = M1(L(X)) will be referred to as canonical probabilistic constraints.

The unary operation of Proposition 1 does not affect the norm, i.e. the equality ‖M†‖ = ‖M‖ holds by construction for any M in C1(X). It is also idempotent and it distributes over the product ⋊, that is, (M†)† = M† and (M ⋊ M′)† = M† ⋊ M′† hold for any M, M′ ∈ C1(X). Moreover, any canonical probabilistic constraint P ∈ C1*(X) verifies P† = P.

Remark 4. If M ∈ C1(X) is a probabilistic constraint and if M† is supported by the set I(X) of measurable indicator functions, then the conditions (6) and (7) can be replaced by "m agrees with the measure induced by µM", i.e. m(B) = µM(B) for any B in the σ-algebra of µM-measurable subsets.

Proposition 2. If X, X′ ∈ L0(Ω, X) are two independent random variables on X with respective laws p and p′, and if M and M′ are probabilistic constraints for p and p′ respectively, then the joint law p ⋊ p′ ∈ M1(X × X) verifies

p ⋊ p′(B̂) ≤ ∫ ‖1_B̂ · (f ⋊ f′)‖ M(df) M′(df′)

for any B̂ ∈ B(X) ⊗ B(X).

Proposition 2 is only an implication since a joint probability measure dominated by a probabilistic constraint of the same form might not correspond to independent random variables. This means that another concept is needed to describe this special class of joint probabilistic constraints.

Definition 4. Let X ∈ L0(Ω, X) and X′ ∈ L0(Ω′, X) be two random variables; then X and X′ are said to be independently constrained by M̂ ∈ C1(X × X) if there exist M, M′ ∈ C1(X) such that M̂ can be expressed as a convolution on the semigroup (L∞(X), ⋊) as

M̂ = M ∗ M′,  (11)

recalling that, in this case,

M ∗ M′(F) = ∫ 1_F(f ⋊ f′) M(df) M′(df′)

for any measurable subset F of L∞(X × X).

Definition 4 implies that any function f̂ in the support of M̂ is such that there exist f and f′ in L∞(X) for which f̂ = f ⋊ f′. Functions of this type can be said to be separable. The concept of independently constrained random variables introduced in Definition 4 differs in general from the standard concept of independence since, even if X and X′ were defined on the same probability space, then a) independently constrained random variables might actually be correlated, and b) independent random variables that are not known to be independent might be represented by a probabilistic constraint that does not exclude correlation. However, the two concepts coincide when the involved probabilistic constraints are equivalent to probability measures. In cases where the probability spaces on which these random variables are defined are different, independence or correlation cannot even be defined.

In order to illustrate the use of measure constraints for modelling uncertainty, several examples of probabilistic constraints are given in the next section.

IV. EXAMPLES OF PROBABILISTIC CONSTRAINTS

Throughout this section, X ∈ L0(Ω, X) will denote a random variable with law p ∈ M1(X), and P ∈ C1*(X) will be assumed to dominate p.

A. Uninformative case

If P = δ1 then the only information provided by P on X is that

p(B) ≤ ∫ ‖1_B · f‖ P(df) = 1

for any B ∈ B(X), which is non-informative since p is already known to be a probability measure. In other words, nothing is known about the random variable X. For instance, the σ-algebra of µP-measurable sets is the trivial σ-algebra {∅, X}.
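The canonical rescaling of Proposition 1, and the fact that it preserves the implied upper bounds, can also be illustrated numerically. The finite representation M = Σᵢ aᵢ δ_{fᵢ} and all values below are assumptions made for illustration:

```python
# Rescaling of Proposition 1 for a finitely supported constraint:
# M† = Σ_i (a_i ‖f_i‖) δ_{f_i / ‖f_i‖}, with ‖·‖ the uniform norm.
f1 = [0.5, 0.5, 0.0]
f2 = [0.0, 0.25, 0.25]
M = [(1.0, f1), (2.0, f2)]        # ‖M‖ = 1·0.5 + 2·0.25 = 1

def canonical(M):
    out = []
    for a, f in M:
        norm = max(f)                                   # ‖f‖ on X = {0, 1, 2}
        out.append((a * norm, [v / norm for v in f]))   # (a‖f‖, f†)
    return out

def bound(M, B):
    """Upper bound Σ_i a_i ‖1_B · f_i‖ implied on p(B)."""
    return sum(a * max(f[x] for x in B) for a, f in M)

M_dag = canonical(M)
for B in ([0], [1], [0, 2], [0, 1, 2]):
    assert abs(bound(M, B) - bound(M_dag, B)) < 1e-12   # same dominated measures
print(sum(a for a, _ in M_dag))   # total mass 1: M† is a probability measure
```

This mirrors the change of variable in the proof of Proposition 1: rescaling each function by its norm and compensating in the weights leaves every bound unchanged.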

B. Indicator function

If there exists A ∈ B(X) such that P = δ_{1_A}, then it holds that

p(B) ≤ ‖1_{A∩B}‖ = 1 if A ∩ B ≠ ∅, and 0 otherwise,

for any B ∈ B(X). As a probability measure, p is always less than or equal to 1, so that the only informative part in the previous inequality is

(∀ B ∈ B(X))  (A ∩ B = ∅) ⇒ (p(B) = 0),

that is, all the probability mass of p lies within A. This means that the random variable X is only known to be in A almost surely. A uniform distribution over A, assuming it can be defined, would not model the same type of knowledge, as the corresponding interpretation would be "realisations of X are equally likely everywhere inside A", whereas the interpretation associated with the probabilistic constraint δ_{1_A} is "whatever the law of X, it is only known that realisations will be inside A".

Remark 5. The σ-algebra of µP-measurable subsets is found to be the completion of the σ-algebra 𝒜 = {∅, A, Ac, X} with respect to any measure m on (X, 𝒜) verifying m(A) > 0 as well as m(Ac) = 0.

C. Upper bound

If there exists f ∈ L(X) such that P = δ_f, then for any B ∈ B(X),

p(B) ≤ ∫ ‖1_B · f′‖ P(df′) = ∫ ‖1_B · f′‖ δ_f(df′) = ‖1_B · f‖.

In this case, δ_f can be seen as the simplest non-trivial form of probabilistic constraint. The two previous examples are special cases with f = 1 and f = 1_A.

Remark 6. Probabilistic constraints of this form are equivalent to possibility distributions [25, 12, 13, 14] and are also related to the notion of membership function of a fuzzy set [33]. However, the approach considered in this work does not rely on the notion of fuzzy set and the form considered here is only used as a simple example of probabilistic constraint.

D. Combination of upper bounds

If the probabilistic constraint P is of the form P = Σ_{i=1}^N a_i δ_{f_i}, then

p(B) ≤ ∫ ‖1_B · f′‖ P(df′) = Σ_{i=1}^N a_i ‖1_B · f_i‖ = Σ_{i=1}^N a_i sup_B f_i

for any B ∈ B(X). This combination of upper bounds is not equivalent to a single bound in general and allows for modelling more accurate information. For instance, if P = 0.5 δ_{1_B} + 0.5 δ_{1_B′} with B ∩ B′ = ∅, then one half of the probability mass of p is in B and the other half is in B′.

E. Constraint based on a partition

If there exists a measurable countable partition π of X and if the probabilistic constraint P is of the form

P = Σ_{B∈π} q(B) δ_{1_B},

where q is a probability measure on the sub-σ-algebra generated by π, then

(∀ B ∈ π)  p(B) = q(B),

i.e. the information available about p is the one embedded into q.

Proof. From the definition of probabilistic constraint, we deduce that the inequality p(B) ≤ q(B) holds for any B ∈ π. Assume that there exists B ∈ π such that p(B) < q(B). This assumption implies that

p(X) = Σ_{B∈π} p(B) < Σ_{B∈π} q(B) = q(X) = 1,

which is a contradiction since p is a probability measure.

Following the construction of an outer measure from a measure described in (3), we define the outer measure q* induced by q on X as follows:

(∀ C ⊆ X)  q*(C) = inf{q(B) s.t. B ∈ σ(π), B ⊇ C}.

Since π is a partition of X, it is easy to check that the outer measure µP, verifying

µP(C) = Σ_{B∈π : B∩C≠∅} q(B)

for all subsets C of X, is equal to the outer measure q*. This example shows that probabilistic constraints are versatile enough to model the same level of information as a sub-σ-algebra generated by a partition.

F. Probability measure

Given that the singletons of X are measurable, if the support of P is in the subset Is(X) ⊂ I(X) of indicator functions of singletons, then there exists a measure q in M1(X) such that, for any B ∈ B(X), it holds that

p(B) ≤ ∫_{Is(X)} ‖1_B · f‖ P(df) = ∫ 1_B(x) q(dx) = q(B),

from which we conclude that p = q. The proof of the equality is very similar to the proof for constraints based on a partition.

G. Plausibility

If P has the form

P = Σ_{A∈𝒜} g(A) δ_{1_A}

for some set 𝒜 of measurable subsets and some function g : 𝒜 → R+, then

p(B) ≤ ∫_{I(X)} ‖1_B · f‖ P(df) = Σ_{A∈𝒜 : A∩B≠∅} g(A)

for any $B \in \mathcal{B}(\mathbf{X})$, and the probabilistic constraint $P$ reduces to a plausibility as defined in the context of Dempster-Shafer theory [10, 28]. A non-technical overview of the concepts of this theory is given in [30, Sect. 4.4]. The two previous examples can be seen as special cases of plausibility where the associated measure is defined on $\mathcal{A} = \pi$ or on $\mathcal{A} = I_s(\mathbf{X})$.

The multiple cases given in this section show that probabilistic constraints can model information with various degrees of precision. In this work, we will be mainly interested in the most informative and the most uninformative cases.

V. OPERATIONS ON MEASURE CONSTRAINTS

It is essential to be able to adapt a probabilistic constraint when the underlying probability measure is transformed, for instance when considering the pushforward of the considered measure by a given measurable function. In the following sections, $\mathbf{X}_1$ and $\mathbf{X}_2$ are assumed to be Polish spaces equipped with their Borel $\sigma$-algebras.

A. Pushforward

The operation of pushforwarding measures from one measurable space to another is central to measure theory and should therefore be translated in terms of measure constraints. In other words, if $M$ dominates the measure $m$ and $\xi$ is a measurable function, then what is the measure constraint dominating $\xi_* m$?

The objective is to find sufficient conditions for the existence of a measure constraint $M'$ verifying

    $M'(\chi(B, \cdot)) = M(\chi(\xi^{-1}[B], \cdot))$    (12)

for all $B \in \mathcal{B}(\mathbf{X}_2)$, where $\chi$ is the characteristic function based on the supremum norm defined in (9). It first appears from the properties of outer measures described in Section II-A that $A \mapsto M(\chi(\xi^{-1}[A], \cdot))$ is an outer measure, so that $M'$ possibly dominates $\xi_* m$. The existence of such a measure constraint is proved in the next proposition.

Proposition 3. Let $M$ be a measure constraint on $\mathbf{X}_1$ and let $\xi$ be a measurable mapping from $\mathbf{X}_1$ to $\mathbf{X}_2$, then the measure constraint $M'$ on $\mathbf{X}_2$ defined as the pushforward of $M$ by the mapping $T_\xi$ from $L^\infty(\mathbf{X}_1)$ to $L^\infty(\mathbf{X}_2)$ characterised by

    $(\forall f \in L^\infty(\mathbf{X}_1)) \quad T_\xi(f) : y \mapsto \sup_{\xi^{-1}[\{y\}]} f$

verifies (12).

Proof. The first step is to rewrite the uniform norm over $\mathbf{X}_1$ in a suitable way as

    $\|\mathbb{1}_{\xi^{-1}[B]} \cdot f\| = \sup_{y \in \mathbf{X}_2} \sup_{x \in \xi^{-1}[\{y\}]} \mathbb{1}_{\xi^{-1}[B]}(x) f(x) = \sup_{y \in \mathbf{X}_2} \Big( \mathbb{1}_B(y) \sup_{\xi^{-1}[\{y\}]} f \Big)$

for any $f \in L^\infty(\mathbf{X}_1)$ and any $B \in \mathcal{B}(\mathbf{X}_2)$, where the second equality is explained by the fact that the function $\mathbb{1}_{\xi^{-1}[B]}$ is constant over $\xi^{-1}[\{y\}]$ and is equal to $\mathbb{1}_B(y)$ everywhere on this subset. By [6, Corollary 2.13], it holds that $T_\xi(f) \in L^\infty(\mathbf{X}_2)$ since $\{\xi^{-1}[\{y\}] \times \{y\} : y \in B\}$ is a measurable subset of $\mathbf{X}_1 \times \mathbf{X}_2$ for any $B \in \mathcal{B}(\mathbf{X}_2)$. The fact that $T_\xi$ is measurable follows from the assumption that $\sup : L^\infty(\mathbf{X}) \to \mathbb{R}$ is measurable. It then holds that

    $\int \|\mathbb{1}_B \cdot f'\| \, M'(\mathrm{d}f') = \int \|\mathbb{1}_{\xi^{-1}[B]} \cdot f\| \, M(\mathrm{d}f)$,

where $M'$ is defined as the pushforward $(T_\xi)_* M$.

Remark 7. Proposition 3 can be straightforwardly generalised to characteristic functions of the form $\chi' : (B, f) \mapsto \|(\chi \circ \mathbb{1}_B) \cdot f\|$ for some suitable function $\chi : \mathbb{R} \to \mathbb{R}$.

In the following example, the operation of marginalisation is translated to probabilistic constraints.

Example 6. Let $\mathbf{X}$ be the Cartesian product of $\mathbf{X}_1$ and $\mathbf{X}_2$ and let $p$ be a given probability measure on $\mathbf{X}$. The objective is to project a given probabilistic constraint $P \in \mathcal{C}_1(\mathbf{X})$ for $p$ into a probabilistic constraint $P'$ in $\mathcal{C}_1(\mathbf{X}_2)$ for the corresponding marginal $p' \in \mathcal{M}_1(\mathbf{X}_2)$ characterised by $p'(B) = p(\mathbf{X}_1 \times B)$. Marginalisation can be performed on $P$ by using Proposition 3 with the canonical projection map $\xi : \mathbf{X} \to \mathbf{X}_2$, from which we find that

    $(\forall B \in \mathcal{B}(\mathbf{X}_2)) \quad p'(B) \le \int \|\mathbb{1}_B \cdot f'\| \, (T_\xi)_* P(\mathrm{d}f')$,

where the mapping $T_\xi : L^\infty(\mathbf{X}) \to L^\infty(\mathbf{X}_2)$ is found to be

    $T_\xi(f) : y \mapsto \sup_{\xi^{-1}[\{y\}]} f = \|f(\cdot, y)\|$.

Another case, studied in the next example, is the pushforward of a probabilistic constraint defined as being equivalent to a probability measure on a sub-$\sigma$-algebra.

Example 7. Considering again the case detailed in Section IV where $\pi$ is a measurable countable partition of $\mathbf{X}_1$ and where the probabilistic constraint $P$ is of the form

    $P = \sum_{B \in \pi} p'(B) \, \delta_{\mathbb{1}_B}$,

where $p'$ is a probability measure on the sub-$\sigma$-algebra generated by $\pi$, then for any measurable mapping $\xi : \mathbf{X}_1 \to \mathbf{X}_2$ such that $\sigma(\xi) \subseteq \sigma(\pi)$, it holds that

    $(\forall B \in \pi) \quad T_\xi(\mathbb{1}_B) : y \mapsto \sup_{\xi^{-1}[\{y\}]} \mathbb{1}_B = \mathbb{1}_{\xi[B]}(y)$,

that is, since $\xi[B]$ is a singleton, $T_\xi(\mathbb{1}_B) = \mathbb{1}_{\xi[B]} \in I_s(\mathbf{X}_2)$, so that $(T_\xi)_* P$ has its support in $I_s(\mathbf{X}_2)$ and is therefore equivalent to a probability measure on $\mathbf{X}_2$.

The concepts of pushforward and marginalisation for probabilistic constraints are important and will be useful in practice. We now consider the converse operation, referred to as pullback, which will also contribute to the formulation of the main results in this work.
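The domination property transported by Proposition 3 can be checked by brute force on a finite space. The following sketch is our own illustration (the spaces, the measure $p$ and the dominating function $f$ are arbitrary choices, not taken from the paper): a constraint of the form $P = \delta_f$ amounts to $p(B) \le \sup_B f$ for every $B$, and the pushforward $T_\xi(f) : y \mapsto \sup_{\xi^{-1}[\{y\}]} f$ then dominates the marginal of $p$, as in Example 6.

```python
import itertools

# Our own finite illustration of the domination behind Proposition 3 /
# Example 6: P = delta_f constrains p through p(B) <= sup_B f, and the
# pushforward T_xi(f)(y) = sup over the fibre of y dominates the marginal.

X1, X2 = (0, 1), (0, 1, 2)
X = list(itertools.product(X1, X2))
p = dict(zip(X, (0.30, 0.25, 0.20, 0.15, 0.07, 0.03)))

# Classical probability-to-possibility transform: f(x) is the total mass of
# the states that are not more likely than x; such an f dominates p.
order = sorted(X, key=lambda x: -p[x])
f = {x: sum(p[y] for y in order[i:]) for i, x in enumerate(order)}

def dominated(q, g, space):
    """Check q(B) <= sup_B g for every non-empty subset B of a finite space."""
    return all(sum(q[x] for x in B) <= max(g[x] for x in B) + 1e-12
               for r in range(1, len(space) + 1)
               for B in itertools.combinations(space, r))

assert dominated(p, f, X)            # f dominates p on X1 x X2

# Pushforward along the projection xi(x1, x2) = x2, and the marginal of p.
Tf = {y: max(f[(x1, y)] for x1 in X1) for y in X2}
p2 = {y: sum(p[(x1, y)] for x1 in X1) for y in X2}

assert dominated(p2, Tf, list(X2))   # T_xi(f) dominates the marginal
```

The check on the marginal succeeds for any dominating $f$, since $\sup_{\mathbf{X}_1 \times B} f = \sup_{y \in B} T_\xi(f)(y)$, which is exactly the rewriting used in the proof of Proposition 3.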

B. Pullback

The objective in this section is to understand in which situations it is possible to reverse the operation of pushforwarding measures and measure constraints. We first define what is understood as a pullback.

Definition 5. Let $m$ be a measure on the space $\mathbf{X}_2$ and let $\xi$ be a measurable mapping from $\mathbf{X}_1$ to $\mathbf{X}_2$, then a pullback measure, denoted $\xi^* m$, is a measure in $\mathcal{M}_1(\mathbf{X}_1)$ satisfying $\xi_*(\xi^* m) = m$.

Note that there are possibly many pullback measures for a given measure, so that the concept of measure constraint will be useful in order to represent this multiplicity, as in the following proposition. As with the pushforward, we want to obtain the existence of a measure constraint $M'$ with characteristic function $\chi$ for the pullback of a measure $m \in \mathcal{M}_1(\mathbf{X}_2)$ such that

    $M'(\chi(\xi^{-1}[B], \cdot)) = M(\chi(B, \cdot))$    (14)

holds for any $B \in \mathcal{B}(\mathbf{X}_2)$, where $M$ is a measure constraint dominating $m$. It appears from the properties of outer measures described in Section II-A that $A \mapsto M(\chi(\xi[A], \cdot))$ is an outer measure, so that $M'$ is possibly a measure constraint dominating the pullback measures of $m$.

Proposition 4. Let $M$ be a measure constraint on $\mathbf{X}_2$ and let $\xi$ be a measurable mapping from $\mathbf{X}_1$ to $\mathbf{X}_2$, then the measure constraint $M'$ on $\mathbf{X}_1$ defined as the pushforward of $M$ by the mapping $T'_\xi : L^\infty(\mathbf{X}_2) \to L^\infty(\mathbf{X}_1)$ defined as

    $T'_\xi : f \mapsto f \circ \xi$

verifies (14).

Proof. We rewrite the uniform norm of $\mathbb{1}_B \cdot f$ as

    $\|\mathbb{1}_B \cdot f\| = \|\mathbb{1}_{\xi^{-1}[B]} \cdot (f \circ \xi)\|$

for any $f \in L^\infty(\mathbf{X}_2)$ and any $B \in \mathcal{B}(\mathbf{X}_2)$. Since the composition of measurable mappings is measurable, we verify that the codomain of $T'_\xi$ is $L^\infty(\mathbf{X}_1)$. Since the mapping $T'_\xi$ is assumed to be measurable, we write

    $\int \|\mathbb{1}_B \cdot f\| \, M(\mathrm{d}f) = \int \|\mathbb{1}_{\xi^{-1}[B]} \cdot f'\| \, (T'_\xi)_* M(\mathrm{d}f')$,

which terminates the proof of the proposition.

Remark 8. Other transformations than $T'_\xi$ could lead to a measure on $L^\infty(\mathbf{X}_1)$ that verifies (14). However, these measures would not dominate all the pullback measures of $m$.

Example 8. If $m$ is a known probability measure, then the corresponding canonical probabilistic constraint $M$ would have its support in the set $I_s(\mathbf{X}_2)$ of indicator functions of singletons in $\mathbf{X}_2$. The induced probabilistic constraint $M'$ on $\mathbf{X}_1$ would then be of the form

    $M' = \sum_{B \in \pi} m'(B) \, \delta_{\mathbb{1}_B}$,

where $\pi$ is the partition of $\mathbf{X}_1$ generated by $\xi$ and $m'$ is the probability measure induced by $m$ on the sub-$\sigma$-algebra generated by $\xi$. This example closes a loop started in Example 7, where a probabilistic constraint on a sub-$\sigma$-algebra generated by a partition was shown to induce a probability measure on $\mathbf{X}_2$.

Now equipped with the concepts of pushforward and pullback for measure constraints and measures, we can address the question of data assimilation between different sources of information.

VI. DATA ASSIMILATION WITH PROBABILISTIC CONSTRAINTS

The objective in this section is to introduce a data-assimilation operation for independent sources of information about the same physical system.

A. State-space fusion

We first address the case where the state space satisfies some desirable but stringent conditions before showing that a much larger class of state spaces can be considered under a weak assumption. We assume that there exists a representative set $\mathcal{X}$ in which the state of the system of interest can be uniquely characterised and we denote by $\xi$ the projection map between $\mathcal{X}$ and $\mathbf{X}$. Both $\mathcal{X}$ and $\mathbf{X}$ are assumed to be Polish spaces equipped with their Borel $\sigma$-algebras.

Two probabilistic constraints $P, P' \in \mathcal{C}_1(\mathcal{X})$ are considered and assumed to be compatible, that is, at most one of them corresponds to an unknown probability measure. These probabilistic constraints can be understood as either prior knowledge and observation, or both as observations, which justifies the use of the generic term data to cover the two types of information. Although one or both of these probabilistic constraints do not represent a true law, it is useful to consider two probability measures respectively dominated by $P$ and $P'$ in order to unveil the mechanism behind data assimilation.

Let $X \in L^0(\Omega, \mathcal{X})$ and $X' \in L^0(\Omega', \mathcal{X})$ be two random variables in $\mathcal{X}$ constrained by $P$ and $P'$ respectively and consider the predicate

    "$X$ and $X'$ represent the same physical system."    (#)

The random variables $X$ and $X'$ are defined on different probability spaces as they are assumed to originate from different representations of randomness. As a consequence, the predicate (#) cannot be expressed as $X = X'$. A random variable $X \times X'$ can be defined on the probability space $(\Omega \times \Omega', \Sigma \otimes \Sigma', \mathbb{P} \rtimes \mathbb{P}')$ of joint outcomes/events and the law of $X \times X'$ is the probability measure $p \rtimes p'$, where $p$ and $p'$ are the respective laws of $X$ and $X'$. The predicate (#) cannot be expressed as an event in $\mathcal{X} \times \mathcal{X}$ in general; yet, it can be expressed as the event $\Delta = \{(x, x) \text{ s.t. } x \in \mathcal{X}\}$ in the product $\sigma$-algebra $\mathcal{B}(\mathcal{X}) \otimes \mathcal{B}(\mathcal{X})$ because of the representativity of the set $\mathcal{X}$.

The data-assimilation operation for independent sources of information can now be formalised as an operation between probabilistic constraints as shown in the following theorem.

Theorem 1. Let $\mathcal{X}$ be a representative set and let $P$ and $P'$ be compatible probabilistic constraints on $\mathcal{X}$ assumed to represent the same physical system, then the probabilistic constraint resulting from data assimilation is characterised by the following convolution of measures on $(L^\infty(\mathcal{X}), \cdot)$:

    $P \star P' = \dfrac{(P * P')^\dagger}{\|P * P'\|}$,    (15)

whenever $\|P * P'\| \ne 0$ holds.

The use of the unary operation $\dagger$ in the numerator of (15) has the sole objective of making $P \star P'$ a canonical probabilistic constraint. An equivalent choice would be to define $P \star P'$ as the normalised version of $P * P'$. However, (15) is preferred since it ensures that $P \star P'$ is also a probability measure on $L^\infty(\mathcal{X})$. A similar rule has been proposed without proof in [23] in the context of fuzzy Dempster-Shafer theory [32].

Proof. Let $X \in L^0(\Omega, \mathcal{X})$ and $X' \in L^0(\Omega', \mathcal{X})$ be two random variables whose laws $p = X_* \mathbb{P}$ and $p' = X'_* \mathbb{P}'$ are respectively dominated by $P$ and $P'$. The main idea of the proof is to build the joint law $p \rtimes p'$ on $\mathcal{X} \times \mathcal{X}$ and to study the diagonal $\Delta = \{(x, x) \text{ s.t. } x \in \mathcal{X}\}$. The data-assimilation mechanism for probabilistic constraints can then be deduced from the one for $p$ and $p'$. Let $\Delta(B)$ be the intersection $\Delta \cap (B \times B)$ for any $B \in \mathcal{B}(\mathcal{X})$ and consider the law $p \rtimes p'$ verifying

    $p \rtimes p'(B \times B') \le \int \|\mathbb{1}_{B \times B'} \cdot (f \rtimes f')\| \, P(\mathrm{d}f) P'(\mathrm{d}f')$

for any $B \times B' \in \mathcal{B}(\mathcal{X} \times \mathcal{X})$, so that

    $p \rtimes p'(\Delta(B)) \le \int \|\mathbb{1}_{\Delta(B)} \cdot (f \rtimes f')\| \, P(\mathrm{d}f) P'(\mathrm{d}f') \le \int \|\mathbb{1}_B \cdot (f \cdot f')^\dagger\| \, \|f \cdot f'\| \, P(\mathrm{d}f) P'(\mathrm{d}f') = \int \|\mathbb{1}_B \cdot \hat{f}\| \, M(\mathrm{d}\hat{f})$

holds for any $B \in \mathcal{B}(\mathcal{X})$, with

    $M(F) \doteq \int \mathbb{1}_F\big((f \cdot f')^\dagger\big) \, \|f \cdot f'\| \, P(\mathrm{d}f) P'(\mathrm{d}f') = (P * P')^\dagger(F)$

for any measurable subset $F$ of $L^\infty(\mathcal{X})$. If both $p$ and $p'$ were the actual probability laws depicting the two sources of information, it would be necessary to find a lower bound for $p \rtimes p'(\Delta)$ in order to determine the constraint for the hypothetical posterior law given $\Delta$ (note that such a lower bound would be equal to zero in most situations). On the contrary, it has been assumed that at least one of the probabilistic constraints is a primary representation of uncertainty, so that it is the measure constraint $M$ that has to be normalised. We conclude that the probabilistic constraint $P \star P'$ on $\mathcal{X}$ verifies

    $P \star P'(F) = \frac{1}{C} (P * P')^\dagger(F)$

for any measurable subset $F$ of $L^\infty(\mathcal{X})$, where the normalising constant is found to be

    $C = (P * P')^\dagger(L^\infty(\mathcal{X})) = \int \|f \cdot f'\| \, P(\mathrm{d}f) P'(\mathrm{d}f') = \|P * P'\|$.

The convolution of $P$ and $P'$ is well defined since $(L^\infty(\mathcal{X}), \cdot)$ is a topological semigroup, as shown in [17, Sect. 245D].

Remark 9. Using the notations of Theorem 1 and defining the likelihood function $\ell(\Delta \mid \cdot)$ on $L^\infty(\mathcal{X}) \times L^\infty(\mathcal{X})$ as

    $\ell(\Delta \mid \cdot) : (f, f') \mapsto \|f \cdot f'\|$,

the probability measure $P \star P'$ can be expressed for any measurable subset $F$ of $L^\infty(\mathcal{X})$ as

    $P \star P'(F) = \dfrac{P \rtimes P'(\ell(\Delta \mid \cdot) \, \Phi(\cdot, F))}{P \rtimes P'(\ell(\Delta \mid \cdot))}$,

where $\Phi$ is a Markov kernel from $L^\infty(\mathcal{X}) \times L^\infty(\mathcal{X})$ to $L^\infty(\mathcal{X})$ defined as

    $\Phi : ((f, f'), F) \mapsto \delta_{(f \cdot f')^\dagger}(F)$.

This shows that the canonical version of $P \star P'$ is a proper Bayes posterior probability measure on $L^\infty(\mathcal{X})$. This is a fundamental result since it shows that general data assimilation can be performed in a fully measure-theoretic Bayesian paradigm.

One immediate objection to Theorem 1 is that the assumption that the set $\mathcal{X}$ is representative is highly restrictive. Very often there is no interest in handling sophisticated state spaces, especially when the received information does not allow for such an in-depth representation. However, several systems could have the same state in $\mathbf{X}$ and (#) cannot be expressed as an event in $\mathcal{B}(\mathbf{X}) \otimes \mathcal{B}(\mathbf{X})$, so that a different approach has to be considered. In the next corollary, we show that the result of Theorem 1 can be extended to a simpler state space as long as it verifies a natural assumption.

Corollary 2. Assuming that there exists a representative set in which the physical system can be uniquely characterised, the result of Theorem 1 holds for random variables in the state space $(\mathbf{X}, \mathcal{B}(\mathbf{X}))$.

Proof. Let $\xi$ be a given projection map from $\mathcal{X}$ to $\mathbf{X}$. The operations of pushforward and pullback can be respectively expressed via the following mappings on the space of measurable functions:

    $(\forall f \in L^\infty(\mathcal{X})) \quad T_\xi(f) : x \mapsto \sup_{\xi^{-1}[\{x\}]} f$

and

    $(\forall f \in L^\infty(\mathbf{X})) \quad T'_\xi(f) = f \circ \xi$.

We compute the result of the pushforward of combined pullback functions as follows:

    $(\forall f, f' \in L^\infty(\mathbf{X})) \quad T_\xi\big(T'_\xi(f) \cdot T'_\xi(f')\big) = T_\xi\big((f \cdot f') \circ \xi\big)$,

so that

    $T_\xi\big(T'_\xi(f) \cdot T'_\xi(f')\big) : x \mapsto \sup_{\xi^{-1}[\{x\}]} (f \cdot f') \circ \xi = f(x) \cdot f'(x)$.

This result indicates that the pushforward and the pullback via $\xi$ do not need to be considered when combining probabilistic constraints on the non-representative state space $\mathbf{X}$ and that, as a result, data assimilation with probabilistic constraints on $\mathcal{X}$ and on $\mathbf{X}$ can be performed in the same way.

The following examples make use of the notations of Theorem 1 and detail two simple cases for which only one of the elements in the probabilistic constraint is maintained through the data-assimilation operation.

Example 9. If $P$ has its support in $I_s(\mathbf{X})$ and is therefore equivalent to $p \in \mathcal{M}_1(\mathbf{X})$ and if $P'$ is of the form $P' = \delta_{f'}$, then the probabilistic constraint $P \star P' \in \mathcal{C}_1(\mathbf{X})$ satisfies

    $P \star P'(F) \propto \int \mathbb{1}_F\big((f'(x) \mathbb{1}_{\{x\}})^\dagger\big) \, f'(x) \, p(\mathrm{d}x)$.

Since $f'(x) \mathbb{1}_{\{x\}}$ is either the null function or is supported by a singleton, $P \star P'$ has its support in $I_s(\mathbf{X})$ and is equivalent to the measure $\hat{p} \in \mathcal{M}_1(\mathbf{X})$ defined as

    $\hat{p}(\mathrm{d}x) = \frac{1}{p(f')} \, f'(x) \, p(\mathrm{d}x)$,

that is, the law of the combined random variable is known to be $\hat{p}$. The same type of result holds if the roles of $P$ and $P'$ are interchanged. If $f'$ is actually a likelihood of the form $\ell_z$, then $\hat{p}$ can be expressed as

    $\hat{p}(\mathrm{d}x) = \dfrac{\ell_z(x) \, p(\mathrm{d}x)}{\int \ell_z(x') \, p(\mathrm{d}x')}$,

which is the usual Bayes posterior of $p$ given $z$, where $z$ is interpreted as an observation and $p$ is the prior probability measure. The fact that $\ell_z$ is an element of $L(\mathbf{X})$ rather than a probability density does not affect the result because of the normalisation factor $\int \ell_z(x') \, p(\mathrm{d}x')$. As mentioned in Example 3, one convenient choice for $\ell_z$ is the Gaussian-shaped upper bound defined in the one-dimensional observation case as

    $\ell_z(x) \doteq \exp\Big( -\dfrac{(Hx - z)^2}{2\sigma^2} \Big)$,

where $\sigma$ corresponds to the standard deviation of the corresponding normal distribution and where $H$ is the observation matrix. This choice allows for keeping the Kalman filter recursion while better representing the information received via finite-resolution sensors. Solutions to the multi-target tracking problem in the linear-Gaussian case can be formulated in terms of Kalman filters in interaction [7, 8] and depend on the value of the denominator in Bayes' theorem, which would usually be of the form

    $\dfrac{1}{\sqrt{2\pi}\,\varsigma} \exp\Big( -\dfrac{(Hm - z)^2}{2\varsigma^2} \Big)$,

with $\varsigma^2 = H P H^{\mathrm{T}} + \sigma^2$, where $m$ and $P$ are the mean and variance of the prior distribution and where $\mathrm{T}$ denotes matrix transposition. With the proposed solution, we find that

    $\int \ell_z(x) \, p(\mathrm{d}x) = \dfrac{\sigma}{\varsigma} \exp\Big( -\dfrac{(Hm - z)^2}{2\varsigma^2} \Big)$,

which is dimensionless, takes values in the interval $[0, 1]$ and can be interpreted as the probability for the observation $z$ to belong to the target with distribution $p$.

Example 10. If $P$ and $P'$ have the form $P = \delta_f$ and $P' = \delta_{f'}$, then the probabilistic constraint $P \star P' \in \mathcal{C}_1(\mathbf{X})$ is found to be

    $P \star P' = \dfrac{(P * P')^\dagger}{\|P * P'\|} = \delta_{(f \cdot f')^\dagger}$,

that is, $P \star P'$ is a canonical probabilistic constraint and $(f \cdot f')^\dagger \in L(\mathbf{X})$ can be seen as the resulting upper bound.

The binary operation $\star$ on $\mathcal{C}_1(\mathbf{X})$ introduced in Theorem 1 has some additional properties that are detailed in the following theorem.

Theorem 2. The space $(\mathcal{C}_1(\mathbf{X}), \star)$ is a commutative semigroup with an identity element.

Proof. Theorem 1 together with Corollary 2 shows that $\star$ is a proper binary operation on $\mathcal{C}_1(\mathbf{X})$ in the sense that it is found to be a relation from $\mathcal{C}_1(\mathbf{X}) \times \mathcal{C}_1(\mathbf{X})$ to $\mathcal{C}_1(\mathbf{X})$. In order to prove the result of the theorem, we need to show that $\star$ is also associative, has an identity element in $\mathcal{C}_1(\mathbf{X})$ and is commutative. Let $P$, $P'$ and $P''$ be probabilistic constraints in $\mathcal{C}_1(\mathbf{X})$, then

a) $\star$ is associative: since the convolution of measures is known to be associative, we need to show that $((P * P')^\dagger * P'')^\dagger = (P * (P' * P'')^\dagger)^\dagger$ holds. This can be done by verifying that

    $\|f \cdot f'\| \, \|\hat{f} \cdot f''\| = \|f \cdot f' \cdot f''\|$

and

    $\dfrac{\hat{f} \cdot f''}{\|\hat{f} \cdot f''\|} = \dfrac{f \cdot f' \cdot f''}{\|f \cdot f' \cdot f''\|}$

hold with $\hat{f} = \frac{f \cdot f'}{\|f \cdot f'\|}$ for any $f, f', f'' \in L^\infty(\mathbf{X})$.

b) $\star$ has an identity element: consider $P' = \delta_{\mathbb{1}}$, then for any measurable subset $F$ of $L^\infty(\mathbf{X})$

    $P \star P'(F) \propto \int \mathbb{1}_F\big((f \cdot \mathbb{1})^\dagger\big) \, \|f\| \, P(\mathrm{d}f) = P(F)$,

with the coefficient of proportionality

    $\int \|f \cdot \mathbb{1}\| \, P(\mathrm{d}f) = 1$,

so that $P \star P' = P$ and $P' = \delta_{\mathbb{1}}$ is the identity element of $(\mathcal{C}_1(\mathbf{X}), \star)$. Finally, the commutativity of $\star$ follows directly from the commutativity of the pointwise product on $L^\infty(\mathbf{X})$ and of the convolution. This terminates the proof.

The absence of an inverse element for the operator $\star$ on $\mathcal{C}_1(\mathbf{X})$ is due to the fact that it is not always possible to "forget" what has been learnt. We now study the relation between the proposed operation and Dempster's rule of combination in the next example.

Example 11. Let $P$ and $P'$ be two canonical probabilistic constraints on $\mathbf{X}$ of the form

    $P = \sum_{A \in \mathcal{A}} m(A) \, \delta_{\mathbb{1}_A}$ and $P' = \sum_{A' \in \mathcal{A}'} m'(A') \, \delta_{\mathbb{1}_{A'}}$

for some sets $\mathcal{A}$ and $\mathcal{A}'$ of measurable subsets and some functions $m : \mathcal{A} \to [0, 1]$ and $m' : \mathcal{A}' \to [0, 1]$. The posterior probabilistic constraint $P \star P'$ on $\mathbf{X}$ verifies

    $P \star P'(F) \propto \sum_{A \cap A' \ne \emptyset} \mathbb{1}_F(\mathbb{1}_{A \cap A'}) \, m(A) \, m'(A')$

for any measurable subset $F$ of $L^\infty(\mathbf{X})$, which can be re-expressed as

    $P \star P' = \dfrac{1}{\|P * P'\|} \sum_{A \cap A' \ne \emptyset} m(A) \, m'(A') \, \delta_{\mathbb{1}_{A \cap A'}}$,

with

    $\|P * P'\| = \sum_{A \cap A' \ne \emptyset} m(A) \, m'(A') = 1 - \sum_{A \cap A' = \emptyset} m(A) \, m'(A')$,

so that $P \star P'$ reduces to the result of Dempster's rule of combination between $m$ and $m'$ [11, 29]. A discussion about the connections between Dempster's rule of combination and Bayes' theorem applied to second-order probabilities can be found in [1].

B. General fusion

In Section VI-A, it has been assumed that the set $\mathbf{X}$ on which the probabilistic constraints are defined can be understood as a state space, with the consequence that the system of interest should be represented at a single point of $\mathbf{X}$. The way elements of $L^\infty(\mathbf{X})$ are fused is highly dependent on this assumption. Yet, a different viewpoint must be considered when introducing probabilistic constraints on more general sets such as $\mathcal{M}_1(\mathbf{X})$ and $\mathcal{C}_1(\mathbf{X})$.

Remark 10. In order to have a suitable topology defined on $L^0(\mathcal{C}_1(\mathbf{X}))$, a reference measure must be introduced. As there is no natural reference measure on $\mathcal{C}_1(\mathbf{X})$, a case-by-case reference measure must be defined, e.g. as an integer-valued measure that singles out a countable number of elements of $\mathcal{C}_1(\mathbf{X})$, or as a parametric family of elements of $\mathcal{C}_1(\mathbf{X})$ whose parameter lies in a Euclidean space.

Let $\mathbf{Y}$ be a topological space equipped with its Borel $\sigma$-algebra and assume there is a given way of combining elements of $\mathbf{Y}$ which is characterised by a potential function $\ell : \mathbf{Y} \times \mathbf{Y} \to [0, 1]$ and a surjective mapping $\theta : S \to \mathbf{Y}$, with $S \subseteq \mathbf{Y} \times \mathbf{Y}$, which defines a stochastic kernel $\Phi$ from $\mathbf{Y} \times \mathbf{Y}$ to $\mathbf{Y}$ as

    $\Phi((y, y'), \cdot) \doteq \begin{cases} \delta_{\theta(y, y')} & \text{if } (y, y') \in S \\ 0 & \text{otherwise.} \end{cases}$

We also assume that $\ell$ verifies $\ell(y, y') = 0$ for any $(y, y') \in S^{\mathrm{c}}$. Note that the kernel $\Phi$ becomes a Markov kernel when restricted to $S$. When it exists, the probability measure $\hat{p}$ in $\mathcal{M}_1(\mathbf{Y})$ defined as

    $\hat{p}(B) = \dfrac{p \rtimes p'(\ell \, \Phi(\cdot, B))}{p \rtimes p'(\ell)}$    (17)

for all $B \in \mathcal{B}(\mathbf{Y})$ can be introduced. As in Remark 9, $\hat{p}$ can be seen as the projection of the Bayes posterior corresponding to the prior $p \rtimes p'$ and the likelihood $\ell$.

Example 12. If $\mathbf{Y}$ is equal to $L^\infty(\mathcal{X})$, with $\mathcal{X}$ the representative state space defined in Section VI-A, then the laws $p$ and $p'$ are probabilistic constraints on $\mathcal{X}$. In this case, the natural choice for $\ell$ and $\theta$ is $\ell(f, f') = \|f \cdot f'\|$ and $\theta(f, f') \doteq (f \cdot f')^\dagger$ for any $f, f' \in L^\infty(\mathcal{X})$. The probability measure $\hat{p}$ then takes the form

    $\hat{p}(F) \propto \int \mathbb{1}_F\big((f \cdot f')^\dagger\big) \, \|f \cdot f'\| \, p(\mathrm{d}f) \, p'(\mathrm{d}f')$

for all measurable subsets $F$ of $L^\infty(\mathcal{X})$, that is, $\hat{p} = p \star p'$.

We assume that there is a reference measure on $\mathbf{Y}$ which enables probability measures to be defined on $L^\infty(\mathbf{Y})$. The objective is to extend the fusion of probabilistic constraints to the case where $\mathbf{Y}$ is an arbitrary set.

Theorem 3. Let $P$ and $P'$ be probabilistic constraints on $\mathbf{Y}$ assumed to represent the same physical system, then the probabilistic constraint resulting from data assimilation is characterised by the following convolution of measures on $(L^\infty(\mathbf{Y}), \odot)$:

    $P \star P' = \dfrac{(P * P')^\dagger}{\|P * P'\|}$,    (18)

where the binary operation $\odot$ on $L^\infty(\mathbf{Y})$ characterised by

    $f \odot f' : \hat{y} \mapsto \sup_{(y, y') \in \theta^{-1}[\{\hat{y}\}]} \ell(y, y') \, f(y) \, f'(y')$    (19)

is assumed to be associative.

The level of generality of Theorem 3 is necessary for tackling estimation problems for systems that display hierarchical levels of uncertainty, such as in the modelling of stochastic populations [20].

Proof. Let $p$ and $p'$ be probability measures on $\mathbf{Y}$ dominated by $P$ and $P'$ respectively. Equation (17) shows that the measure $m$ to be dominated is characterised by

    $(\forall B \in \mathcal{B}(\mathbf{Y})) \quad m(B) = p \rtimes p'(\ell \, \Phi(\cdot, B))$,

and we find that

    $m(B) \le \int \sup_{(y, y') : \theta(y, y') \in B} \big( \ell(y, y') \, f(y) \, f'(y') \big) \, P(\mathrm{d}f) \, P'(\mathrm{d}f')$

for all $B \in \mathcal{B}(\mathbf{Y})$. In order to express the right-hand side of the previous inequality as a probabilistic constraint, the argument of the supremum has to be normalised, and we find that

    $P \star P'(F) = \dfrac{1}{C} \int \mathbb{1}_F\big((f \odot f')^\dagger\big) \, \|f \odot f'\| \, P(\mathrm{d}f) \, P'(\mathrm{d}f') \propto (P * P')^\dagger(F)$

for all measurable subsets $F$ of $L^\infty(\mathbf{Y})$, where the operation $\odot$ is defined as in the statement of the theorem. The constant $C$ is found to be

    $C = \int \|f \odot f'\| \, P(\mathrm{d}f) \, P'(\mathrm{d}f') = \|P * P'\|$,

thus proving the result of the theorem.

For the posterior probabilistic constraint of Theorem 3 to be well defined, the mappings $\theta$ and $\ell$ have to have sufficient properties for $(L^\infty(\mathbf{Y}), \odot)$ to be a semigroup. Informally, the mapping $\theta$ can be loosely seen as a binary operation which has to be associative for $\odot$ to be associative. More rigorously, we can define an extension $\bar{\mathbf{Y}} \doteq \mathbf{Y} \cup \{\phi\}$ of the set $\mathbf{Y}$ by an isolated point $\phi$ and extend $\theta$ and $\ell$ as follows: $\theta(y, y') = \phi$ for any $(y, y') \notin S$ and $\ell(y, y') = 0$ if $(y, y') \notin \mathbf{Y} \times \mathbf{Y}$. With these notations, we can formulate the following proposition.

Proposition 5. The binary operation $\odot$ is associative if $\theta$ is associative and if

    $\ell(y, y') \, \ell(\theta(y, y'), y'') = \ell(y, \theta(y', y'')) \, \ell(y', y'')$

holds for any $y, y', y'' \in \mathbf{Y}$.
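The combination (19) and the associativity condition of Proposition 5 can be exercised numerically on a finite set. The following sketch is our own toy example (the set, the functions and the values are arbitrary choices): we take the simplest admissible structure, where $\theta$ is the identity on the diagonal and $\ell(y, y') = \mathbb{1}_{\{y\}}(y')$, for which the conditions of Proposition 5 hold trivially and $\odot$ reduces to the pointwise product.

```python
# Brute-force illustration (our own finite toy example, not from the paper)
# of the combination (19): (f o f')(yh) is the supremum of l(y, y') f(y) f'(y')
# over the fibre of yh under theta.  Here theta is only defined on the
# diagonal, with theta(y, y) = y, and l(y, y') = 1 if y = y' and 0 otherwise.

Y = (0, 1, 2, 3)
S = [(y, y) for y in Y]               # theta is defined on the diagonal only
theta = {s: s[0] for s in S}          # theta(y, y) = y

def ell(y, yp):
    """Potential l(y, y'): indicator of the diagonal."""
    return 1.0 if y == yp else 0.0

def combine(f, fp):
    """Compute f o f' of (19) by enumerating each fibre of theta."""
    return {yh: max(ell(y, yp) * f[y] * fp[yp]
                    for (y, yp) in S if theta[(y, yp)] == yh)
            for yh in Y}

# Dyadic values so that floating-point products are exact.
f1 = {0: 0.25, 1: 1.0, 2: 0.5, 3: 0.75}
f2 = {0: 1.0, 1: 0.5, 2: 0.75, 3: 0.25}
f3 = {0: 0.5, 1: 1.0, 2: 0.25, 3: 0.5}

# The combination reduces to the pointwise product f . f' ...
assert combine(f1, f2) == {y: f1[y] * f2[y] for y in Y}
# ... and is associative, as Proposition 5 predicts for this choice.
assert combine(combine(f1, f2), f3) == combine(f1, combine(f2, f3))
```

With a non-injective $\theta$, the same `combine` routine still implements (19); the supremum then genuinely ranges over several pairs per fibre, and associativity has to be checked through the condition of Proposition 5 rather than read off from the pointwise product.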

Proof. We have to show that $(f \odot f') \odot f'' = f \odot (f' \odot f'')$ for any $f, f', f'' \in L^\infty(\mathbf{Y})$ under the conditions given in the proposition. It holds that for any $\tilde{y} \in \mathbf{Y}$

    $((f \odot f') \odot f'')(\tilde{y}) = \sup_{\substack{(\hat{y}, y'') \in \theta^{-1}[\{\tilde{y}\}] \\ (y, y') \in \theta^{-1}[\{\hat{y}\}]}} \ell(y, y') \, \ell(\hat{y}, y'') \, f(y) \, f'(y') \, f''(y'')$,

which can be expressed as

    $((f \odot f') \odot f'')(\tilde{y}) = \sup_{(y, y', y'') : \theta(\theta(y, y'), y'') = \tilde{y}} \ell(y, y') \, \ell(\theta(y, y'), y'') \, f(y) \, f'(y') \, f''(y'')$.

Obtaining the equivalent expression for $f \odot (f' \odot f'')$ confirms that $\odot$ is associative if

    $\theta(\theta(y, y'), y'') = \theta(y, \theta(y', y''))$
    $\ell(y, y') \, \ell(\theta(y, y'), y'') = \ell(y, \theta(y', y'')) \, \ell(y', y'')$

hold for any $y, y', y'' \in \mathbf{Y}$, the first line being equivalent to the associativity of $\theta$ when seen as a binary operation.

Notice that when the mapping $\theta$ is bijective, the function $f \odot f'$ is also characterised by the relation

    $f \odot f'(\theta(y, y')) = \ell(y, y') \, f(y) \, f'(y')$    (22)

for all $(y, y') \in S$. Indeed, the supremum in (19) is taken over $\theta^{-1}[\{\hat{y}\}]$, which is a singleton when $\theta$ is bijective. The following example makes the connection between state-space fusion and the more general fusion operation introduced in Theorem 3.

Example 13. If $\mathbf{Y}$ is equal to the state space $\mathbf{X}$ defined in Section VI-A, then the natural way to set $\ell$ and $\theta$ is to take $\ell(x, x') = \mathbb{1}_{\{x\}}(x')$ for all $x, x' \in \mathbf{X}$ and to define $\theta$ on the diagonal of $\mathbf{X} \times \mathbf{X}$ only, by $\theta(x, x) = x$ for all $x \in \mathbf{X}$. Since $\theta$ is bijective in this case, we can use (22) to find that

    $f \odot f' : x \mapsto \ell(x, x) \, f(x) \, f'(x) = f(x) \, f'(x)$.

In other words, it holds that $f \odot f' = f \cdot f'$ and the result of Section VI-A about fusion for state spaces is recovered. This confirms that the fusion operation introduced in this section is not different from the one of Section VI-A; it is instead a more general formulation of the same operation.

Example 14. If $\mathbf{Y} = \mathcal{C}_1(\mathbf{X})$, where $\mathbf{X}$ is the state space defined in Section VI-A, then the natural way to set $\ell$ and $\theta$ is $\ell(P, P') = \|P * P'\|$ and $\theta(P, P') = P \star P'$ for any probabilistic constraints $P$ and $P'$ in $\mathcal{C}_1(\mathbf{X})$. In this case, we find that

    $f \odot f' : \hat{P} \mapsto \sup_{(P, P') : P \star P' = \hat{P}} \|P * P'\| \, f(P) \, f'(P')$.

The binary operation $\odot$ is associative since it indeed holds that

    $(P \star P') \star P'' = P \star (P' \star P'')$
    $\|P * P'\| \, \|(P \star P') * P''\| = \|P * (P' \star P'')\| \, \|P' * P''\|$

for any $P, P', P'' \in \mathcal{C}_1(\mathbf{X})$. The set $\mathcal{C}_1(\mathbf{X})$ is one of the useful examples of sets on which the fusion operation takes a general form, since two probabilistic constraints do not have to be equal to represent the same individual. Also, the mapping $\theta$ is surjective but not bijective in general for probabilistic constraints, as opposed to the case of Example 13. This level of generality was required in [21, Chapt. 2] when deriving a principled solution to the problem of data association for multi-object dynamical systems, since these systems are best represented by hierarchical levels of uncertainty, as already identified in [26, 8].

CONCLUSION

Supremum-based outer measures have demonstrated their ability to represent uncertainty in a flexible way and to comply with intuitively appealing operations such as pullback and data assimilation. Future work includes the generalisation of the proposed type of outer measure to product spaces, where correlations between the different components of the product space can take a more involved form.

REFERENCES

[1] J. Baron. Second-order probabilities and belief functions. Theory and Decision, 23(1):25–36, 1987.
[2] L. E. Baum and T. Petrie. Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37(6):1554–1563, 1966.
[3] S. S. Blackman. Multiple-Target Tracking with Radar Applications, volume 1. Artech House, Norwood, 1986.
[4] V. I. Bogachev. Measure Theory. Springer, 2007.
[5] A. K. Brodzik and R. H. Enders. Semigroup structure of singleton Dempster-Shafer evidence accumulation. IEEE Transactions on Information Theory, 55(11):5241–5250, 2009.
[6] H. Crauel. Random Probability Measures on Polish Spaces, volume 11. CRC Press, 2003.
[7] P. Del Moral. Mean Field Simulation for Monte Carlo Integration. Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 2013.
[8] P. Del Moral and J. Houssineau. Particle association measures and multiple target tracking. In Theoretical Aspects of Spatial-Temporal Modeling, pages 1–30. Springer, 2015.
[9] E. Delande, J. Houssineau, and D. E. Clark. Multi-object filtering with stochastic populations. arXiv preprint arXiv:1501.04671v2, 2016.
[10] A. P. Dempster. Upper and lower probability inferences based on a sample from a finite univariate population. Biometrika, 54(3-4):515–528, 1967.
[11] A. P. Dempster. A generalization of Bayesian inference. Journal of the Royal Statistical Society, Series B (Methodological), pages 205–247, 1968.
[12] D. Dubois and H. Prade. Possibility Theory. Wiley Online Library, 1988.
[13] D. Dubois and H. Prade. Possibility theory: qualitative and quantitative aspects. In Quantified Representation of Uncertainty and Imprecision, pages 169–226. Springer, 1998.

[14] D. Dubois and H. Prade. Fundamentals of Fuzzy Sets, volume 7. Springer Science & Business Media, 2000.
[15] R. Fagin, J. Y. Halpern, and N. Megiddo. A logic for reasoning about probabilities. Information and Computation, 87(1):78–128, 1990.
[16] T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, pages 209–230, 1973.
[17] D. H. Fremlin. Measure Theory. 4th edition, 2000.
[18] N. Friedman and J. Y. Halpern. Plausibility measures and default reasoning. Journal of the ACM, 48(4):648–685, 2001.
[19] J. Houssineau. Representation and estimation of stochastic populations. PhD thesis, Heriot-Watt University, 2015.
[20] J. Houssineau and D. E. Clark. On a representation of partially-distinguishable populations. arXiv preprint arXiv:1608.00723, 2016.
[21] J. Houssineau, D. E. Clark, and P. Del Moral. A sequential Monte Carlo approximation of the HISP filter. In European Signal Processing Conference (EUSIPCO), 2015.
[22] M. Loève. Probability Theory, vol. II. Graduate Texts in Mathematics, 46, 1978.
[23] R. P. S. Mahler. Can the Bayesian and Dempster-Shafer approaches be reconciled? Yes. In IEEE International Conference on Information Fusion, 2005.
[24] J. Nedoma. Note on generalized random variables. In Transactions of the First Prague Conference on Information Theory, Statistical Decision Functions, Random Processes (Liblice, 1956), pages 139–141. Publishing House of the Czechoslovak Academy of Sciences, Prague, 1957.
[25] C. V. Negoita, L. A. Zadeh, and H. J. Zimmermann. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1:3–28, 1978.
[26] M. Pace and P. Del Moral. Mean-field PHD filters based on generalized Feynman-Kac flow. IEEE Journal of Selected Topics in Signal Processing, 7(3):484–495, 2013.
[27] E. Schechter. Handbook of Analysis and its Foundations. Academic Press, 1996.
[28] G. Shafer. A Mathematical Theory of Evidence, volume 1. Princeton University Press, Princeton, 1976.
[29] G. Shafer. The combination of evidence. International Journal of Intelligent Systems, 1(3):155–179, 1986.
[30] R. C. Williamson. Probabilistic arithmetic. PhD thesis, University of Queensland, 1989.
[31] J. Wishart. The generalised product moment distribution in samples from a normal multivariate population. Biometrika, pages 32–52, 1928.
[32] J. Yen. Generalizing the Dempster-Shafer theory to fuzzy sets. IEEE Transactions on Systems, Man and Cybernetics, 20(3):559–570, 1990.
[33] L. A. Zadeh. Fuzzy sets. Information and Control, 8:338–353, 1965.