arXiv:1907.08860v2 [math.OC] 24 Mar 2020

McKean–Vlasov optimal control: the dynamic programming principle

Mao Fabrice Djete∗ Dylan Possamaï† Xiaolu Tan‡

March 25, 2020

Abstract

We study the McKean–Vlasov optimal control problem with common noise in various formulations, namely the strong and weak formulations, as well as the Markovian and non–Markovian formulations, and allowing for the law of the control process to appear in the state dynamics. By interpreting the controls as probability measures on an appropriate canonical space with two filtrations, we then develop the classical measurable selection, conditioning and concatenation arguments in this new context, and establish the dynamic programming principle under general conditions.

∗Université Paris–Dauphine, PSL University, CNRS, CEREMADE, 75016 Paris, France, [email protected]. Mao Fabrice Djete gratefully acknowledges support from the région Île–de–France. This work also benefited from support of the ANR project PACMAN, ANR–16–CE05–0027.
†Columbia University, Industrial Engineering & Operations Research, 500 W. 120th Street, New York, NY 10027, [email protected].
‡Department of Mathematics, The Chinese University of Hong Kong, [email protected].

1 Introduction

We propose in this paper to study the optimal control of so–called Mckean–Vlasov stochastic differential equations, also called mean–field stochastic differential equations in the literature. This is a stochastic control problem where the state process is governed by a stochastic differential equation (SDE for short), which has coefficients depending not only on the current time and the paths of the state process, but which are also allowed to be impacted by the distribution of the state process. Similarly, the reward functionals are also allowed to depend on the distribution of the state process (or its conditional distribution in the case with common noise). The McKean–Vlasov equation goes back to Kac [49] and McKean [60], who were interested in uncontrolled SDEs describing the limiting behaviour of large systems of interacting particles, and in establishing general propagation of chaos results. Let us also mention the illuminating notes of Snitzman [75], which give a precise and pedagogical insight into this specific equation. Though many authors have worked on this equation, there has been a drastic surge of interest in the topic in the past decade, following the introduction of the so–called mean–field game (MFG for short) theory, due independently and simultaneously to the initial papers by Lasry and Lions [53; 54; 55] on the one hand, and by Huang, Caines, and Malhamé [44; 45; 46; 47; 48] on the other hand, as well as the pioneering work on McKean–Vlasov equations due to Cardaliaguet, Delarue, Lasry, and Lions [22]. The connection between the two theories appears when one tries to describe the behaviour of many agents, who interact through the empirical distribution of their states, and who seek either a Nash equilibrium (this is the competitive equilibrium case, leading to the MFG theory), or a Pareto equilibrium (this is the cooperative equilibrium case, associated to the optimal control of Mckean–Vlasov stochastic equations, see Carmona and Delarue [24]). Though related, these two problems have some subtle differences, which were investigated thoroughly by Carmona, Delarue, and Lachapelle [27]. Our interest in this paper, as already mentioned, is in the rigorous establishment of the dynamic programming principle (DPP for short), under conditions as general as possible, for the optimal control of McKean–Vlasov stochastic equations.

The optimal control of McKean–Vlasov dynamics is a rather recent problem in the literature. The first approach used to tackle the problem relied on the use of the celebrated Pontryagin stochastic maximum principle. This strategy allows one to derive necessary and/or sufficient conditions characterising the optimal solution of the control problem, through a pair of processes (Y, Z) satisfying a backward stochastic differential equation (BSDE for short), called the adjoint equation in this case, coupled with a forward SDE corresponding to the optimal path. Andersson and Djehiche [3] and Buckdahn, Djehiche, and Li [19] use this approach for a specific case of optimal control of a McKean–Vlasov equation, corresponding to the case where the coefficients of the equation and the reward functions only depend on some moments of the law. We refer to Carmona and Delarue [24] for an analysis in a more general context, made possible thanks to the notion of differentiability in the space of probability measures introduced by Lions in his Collège de France course [59]
(see also the lecture notes of Cardaliaguet [21]). Related results were also obtained by Acciaio, Backhoff-Veraguas, and Carmona [1] for so–called generalised McKean–Vlasov control problems involving the law of the controls, and where a link with causal optimal transport was also highlighted. The stochastic maximum principle approach was also used in a context involving conditional distributions in Buckdahn, Li, and Ma [20] and Carmona and Zhu [26]. Related results have been obtained in the context of a relaxed formulation of the control problem, allowing in addition to obtain existence results, by Chala [29], which were then revisited by Lacker [52], and Bahlali, Mezerdi, and Mezerdi [4; 5; 6]. Readers familiar with the classical theory of stochastic control know that another popular approach to the problem is to use Bellman's optimality principle to obtain the so–called dynamic programming principle. In a nutshell, the idea behind the DPP is that the global optimisation problem can be solved by a recursive resolution of successive local optimisation problems. This fact is an intuitive result, which is often used as some sort of meta–theorem, but it is not so easy to prove rigorously in general. Note also that, in contrast to the Pontryagin maximum principle approach, this approach in general requires fewer assumptions, though it can be applied in fewer situations. Notwithstanding these advantages, the DPP approach has long been unexplored for the control of McKean–Vlasov equations.
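The recursive resolution of local problems can be made concrete in a classical (non–McKean–Vlasov) toy problem. The following sketch is ours and purely illustrative: it solves a three–period control problem both by Bellman's backward recursion and by brute force over all open–loop plans, and checks that the two answers coincide.

```python
import itertools

# Toy finite-horizon control problem (our own illustration, not from the paper):
# the global optimisation over all action plans is solved by a recursive
# resolution of successive local one-step problems (Bellman's backward recursion).

T = 3                          # number of periods
states, actions = (0, 1), (0, 1)

def step(x, a):                # deterministic dynamics: the action flips the state
    return (x + a) % 2

def running_reward(t, x, a):
    return 10 * x - a          # being in state 1 pays, acting costs

def terminal_reward(x):
    return 20 * x

# Backward dynamic programming: V[t][x] = max_a { L(t,x,a) + V[t+1][step(x,a)] }.
V = {T: {x: terminal_reward(x) for x in states}}
for t in reversed(range(T)):
    V[t] = {x: max(running_reward(t, x, a) + V[t + 1][step(x, a)] for a in actions)
            for x in states}

# Brute force: enumerate every open-loop plan from the initial state 0.
def total_reward(x0, plan):
    x, total = x0, 0
    for t, a in enumerate(plan):
        total += running_reward(t, x, a)
        x = step(x, a)
    return total + terminal_reward(x)

best = max(total_reward(0, plan) for plan in itertools.product(actions, repeat=T))
print(V[0][0], best)           # both equal 39: the recursion recovers the global optimum
```

With integer rewards the two computations agree exactly; for a law–dependent objective, as discussed below, this equality breaks down without an extension of the state space.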
One of the main reasons is actually a very bleak one for us: due to the non–linear dependency with respect to the law of the process, the problem is actually a time–inconsistent control problem (like the classical mean–variance optimisation problem in finance, see the recent papers by Björk and Murgoci [14], Björk, Khapko, and Murgoci [15], and [42] for a more thorough discussion of this topic), and Bellman's optimality principle does not hold in this case. However, though the problem itself is time–inconsistent, one can recover some form of the DPP by extending the state space of the problem. This was first achieved by Laurière and Pironneau [56], and later by Bensoussan, Frehse, and Yam [10; 11; 12], who assumed the existence at all times of a density for the marginal distribution of the state process, and reformulated the problem as a deterministic density control problem, with a family of deterministic control terms. Under this reformulation, they managed to prove a DPP and deduce a dynamic programming equation in the space of density functions. Following similar ideas, but without the assumption of the existence of a density, and allowing the coefficients and reward functions to depend not only on the distribution of the state, but on the joint distribution of the state and the control, Pham and Wei [67] also deduced a DPP by looking at a set of closed loop (or feedback) controls, in a non–common noise context. They then extended this strategy to a common noise setting (where the control process is adapted to the common noise filtration) in [66]. Specialisations to linear–quadratic settings were also explored by Pham [65], Li, Sun, and Yong [57], Li, Sun, and Xiong [58], Huang, Li, and Yong [43], Yong [80], and Basei and Pham [8]. Concerning the (general) DPP approach to solving problems involving McKean–Vlasov stochastic equations, let us mention Bouchard, Djehiche, and Kharroubi [16], who study a stochastic target problem for McKean–Vlasov stochastic equations.
Roughly speaking, this means that they are interested in optimally controlling a McKean–Vlasov equation over the interval [0,T], under the target constraint that the marginal law of the controlled process at time T belongs to some Borel subset of the space of probability measures. They establish a general geometric dynamic programming principle, thus extending the seminal results of Soner and Touzi [76] corresponding to the non–McKean–Vlasov case. Another important contribution is due to Bayraktar, Cosso, and Pham [9]. There, the authors build upon the results of Fuhrman and Pham [40] and Bandini, Cosso, Fuhrman, and Pham [7] to obtain a randomised control problem by controlling the intensity of a Poisson random measure. This enables them to allow for general open–loop controls, unlike in [66; 67], though the setting is still Markovian and does not allow for common noise. Our approach to obtaining the DPP is very different. One common drawback of all the results we mentioned above is that they generically require some Markovian1 property of the system or its distribution, as well as strong regularity assumptions on the coefficient and reward functions considered. This should appear somewhat surprising to people familiar with the classical DPP theory. Indeed, for stochastic control problems, it is possible to use measurable selection arguments to obtain the DPP, in settings requiring nothing beyond mild measurability assumptions. As a rule of thumb, one needs two essential ingredients to prove the dynamic programming principle: first, the stability of the controls with respect to conditioning and concatenation, and second, the measurability of the associated value function. The use of measurable selection arguments makes it possible to provide an adequate framework for verifying the conditioning, the concatenation and the measurability requirements of the associated value function without strong assumptions.
This technique was followed by Dellacherie [32], by Bertsekas and Shreve in [13; 72; 73; 74], and by Shreve [69; 70; 71] for discrete–time stochastic control problems. Later, El Karoui, Huu Nguyen, and Jeanblanc-Picqué in [39] presented a framework for stochastic control problems in continuous time (accommodating general Markovian

1An exception is the work of Djehiche and Hamadène [33], which considers optimal control (and also a zero–sum game) of a non–Markovian McKean–Vlasov equation, and obtains both a characterisation of the value function and the optimal control using BSDE techniques, reminiscent of the classical results of Hamadène and Lepeltier [41] and El Karoui and Quenez [35; 36] for the non–McKean–Vlasov case. However, their approach does not allow for common noise, and is limited to control on the drift of the state process only.

processes). Thanks to the notion of relaxed control, that is to say the interpretation of a control as a probability measure on some canonical space, and thanks to the use of the notion of martingale problems, they proved a DPP by simple and clear arguments. El Karoui and Tan [37; 38] extended this approach to the non–Markovian case. Similar results were obtained by several authors, among which we mention Nutz and Soner [63], Neufeld and Nutz [61; 62], Nutz and van Handel [64], Žitković [82], and Possamaï, Tan, and Zhou [68]. Following the framework in [37; 39], we develop in this paper a general analysis based upon the measurable selection argument for the non–Markovian optimal control of McKean–Vlasov equations with common noise. In particular, we investigate the case where the drift and diffusion coefficients, as well as the reward functions, are allowed to depend on the joint conditional distribution of the path of the state process and of the control; see [67] for the case of the joint distribution of the state process and of feedback controls (see also Yong [80] for a more specific situation), in a non–common noise setting. Motivated by the notion of weak solution of classical SDEs, and similarly to the ideas used by El Karoui and Tan [37], and Carmona, Delarue, and Lacker [28] in a mean–field games context, our first task is to provide an appropriate ‘relaxation’ of the problem. We therefore introduce a notion of weak solution of a controlled McKean–Vlasov equation with common noise. Notice that this is by no means a straightforward task. In standard McKean–Vlasov stochastic control problems, the controls (open loop in that case) are adapted with respect to the filtration generated by both the Brownian motions (W, B) (with B being the common noise) and the initial random variable ξ (serving as an initial condition for the problem).
Then, the conditional distributions considered are associated to the filtration of B, in other words the ‘common noise’ filtration, that is, L(X_{t∧·}, α_t | B), where X is the state and α the control. We call this the strong formulation. The strong formulation does not enjoy good stability properties. To see this, it is enough to notice that the conditional distribution is not continuous with respect to the joint distribution: for instance, the map L(X_t, B) ⟼ E[(E[X_t | B])²] is not continuous. To overcome this difficulty, we introduce a notion of weak solution by considering a more general filtration F describing the adaptability of the controls, and an extended common noise filtration G, as in [28]. Nevertheless, more conditions on F and G are needed to ensure that the formulation remains, first, compatible with the notion of strong solutions, then enjoys good stability properties for fixed control processes, and finally ensures that weak controls can be approximated sufficiently well by strong controls. With the help of this notion, we can then provide a weak formulation for McKean–Vlasov control problems with common noise. By interpreting controls as probability measures on an appropriate canonical space, and using measurable selection arguments as in [37; 38], we then move on to prove the universal measurability of the associated value function, derive the stability of controls with respect to conditioning and concatenation, and finally deduce the DPP for the weak formulation under very general assumptions. Our next result addresses the DPP for the classical strong formulation.
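The lack of continuity invoked above can be checked on a fully discrete toy example (our own construction, not from the paper): take X uniform on {−1, +1} and B_n := X/n, so that L(X, B_n) converges weakly to L(X, 0), while the functional E[(E[X|B])²] stays at 1 along the sequence and equals 0 at the limit.

```python
from collections import defaultdict

# Discrete illustration (ours) of the instability of the conditional distribution:
# the map L(X, B) |-> E[(E[X|B])^2] is not continuous for the weak topology.

def functional(joint):
    # joint: list of (prob, x, b); computes E[(E[X|B])^2] exactly on a finite support
    pb, pxb = defaultdict(float), defaultdict(float)
    for p, x, b in joint:
        pb[b] += p               # P[B = b]
        pxb[b] += p * x          # contribution of {B = b} to E[X 1_{B=b}]
    return sum(pb[b] * (pxb[b] / pb[b]) ** 2 for b in pb)

for n in (1, 10, 100):
    joint_n = [(0.5, -1.0, -1.0 / n), (0.5, 1.0, 1.0 / n)]
    print(n, functional(joint_n))    # equals 1 for every n, since E[X|B_n] = X

limit = [(0.5, -1.0, 0.0), (0.5, 1.0, 0.0)]
print("limit", functional(limit))    # equals 0, since E[X|0] = E[X] = 0
```

The joint laws converge weakly while the functional jumps from 1 to 0, which is exactly the failure of stability that motivates the extended filtration G.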
Using the DPP in weak formulation, and by adding standard Lipschitz conditions on the drift and diffusion coefficients, as in [66], but without any regularity assumptions on the reward functions, and in a non–Markovian context, we obtain the DPP for the strong formulation of McKean–Vlasov control problems with common noise, where the control is adapted to the ‘common noise’ filtration (generated by B in this strong formulation). Also, for the general strong formulation, where the control is adapted to ξ, W and B, we obtain the DPP under some additional regularity conditions on the reward functions. These regularity conditions may seem unexpected at first sight, but they seem unavoidable due to the non–linear dependency of the drift and volatility coefficients with respect to the conditional distribution of X (see Remark 3.6 for a more thorough discussion). Finally, the DPP results in the general non–Markovian context induce the same results in the Markovian one. We stress again that our results and techniques are not easy extensions of those in the existing literature, although the initial ideas may seem so. Our DPP results are much more general than those in the literature, and, as far as we can tell, are the most general available. To interpret a control as a probability measure on the canonical space, one needs to formulate an adequate notion of weak solutions, which should enjoy more stability properties than the strong solutions, while at the same time remaining approximable by strong solutions. Finally, because of the presence of two filtrations (a general filtration and a common noise filtration) on the canonical space, and the implicit conditions linking them, it becomes much more delicate to develop the classical measurable selection, conditioning and concatenation arguments in this new context. The rest of the paper is organised as follows.
After briefly recalling some notations and introducing the probabilistic structure needed to give an adequate and precise definition of the tools that are used throughout the paper, we introduce in Section 2 several notions of weak and strong formulations (on a fixed probability space or on the canonical space) for the McKean–Vlasov stochastic control problem with common noise in a non–Markovian framework, and prove some equivalence results. Next, in Section 3, we present the main result of this paper, the DPP for three formulations: weak

formulation, strong formulation, and a B–strong formulation where the control is adapted with respect to the ‘common noise’ filtration. We first provide all our results in the non–Markovian setting, and then in a Markovian framework. Finally, Section 4 is devoted to the proof of our main results.

Notations. (i) Given a metric space (E, ∆) and p ≥ 0, we denote by P(E) the collection of all Borel probability measures on E, and by P_p(E) the subset of Borel probability measures µ such that ∫_E ∆(e, e₀)^p µ(de) < ∞ for some e₀ ∈ E. When p ≥ 1, the space P_p(E) is equipped with the Wasserstein distance W_p defined by

W_p(µ, µ′) := inf_{λ∈Λ(µ,µ′)} ( ∫_{E×E} ∆(e, e′)^p λ(de, de′) )^{1/p},

where Λ(µ, µ′) is the collection of all Borel probability measures λ on E × E such that λ(de, E) = µ(de) and λ(E, de′) = µ′(de′). When E is a Polish space, (P_p(E), W_p) is a Polish space (see Villani [78, Theorem 6.16]). Given another metric space (E′, ∆′), we denote by µ ⊗ µ′ ∈ P(E × E′) the product probability of any (µ, µ′) ∈ P(E) × P(E′).

(ii) Given a measurable space (Ω, F), we denote by P(Ω) the collection of all probability measures on (Ω, F). For any probability measure P ∈ P(Ω), we denote by F^P the P–completion of the σ–field F, and by F^U := ∩_{P∈P(Ω)} F^P the universal completion of F. Let ξ : Ω −→ R ∪ {−∞, +∞} be a random variable and P ∈ P(Ω); we define

E^P[ξ] := E^P[ξ₊] − E^P[ξ₋], where ξ₊ := ξ ∨ 0, ξ₋ := (−ξ) ∨ 0, with the convention ∞ − ∞ := −∞.
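A quick sketch (ours, for intuition only): on the real line, the Wasserstein distance W_p of notation (i) between two empirical measures with the same number of atoms reduces to the classical sorted–samples formula W_p(µ, µ′)^p = (1/N) Σ_i |x_(i) − y_(i)|^p.

```python
# Illustrative computation of W_p for two empirical measures on R (our sketch):
# the optimal coupling on the line is the monotone one, i.e. sorted atom pairing.

def wasserstein_empirical(xs, ys, p):
    assert len(xs) == len(ys) and p >= 1
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(x - y) ** p for x, y in zip(xs, ys)) / len(xs)) ** (1.0 / p)

mu = [0.0, 1.0, 2.0]
nu = [0.5, 1.5, 2.5]                       # mu translated by 0.5
print(wasserstein_empirical(mu, nu, 2))    # a translation by c is at W_p distance exactly c
```

For general Polish spaces the infimum over couplings of course has no such closed form; the one–dimensional case is only meant to make the definition tangible.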

We also use the following notation to denote the expectation of ξ under P:

E^P[ξ] = ⟨P, ξ⟩ = ⟨ξ, P⟩.

When Ω is a Polish space, a subset A ⊆ Ω is called an analytic set if there is another Polish space E, and a Borel subset B ⊆ Ω × E such that A = {ω ∈ Ω : ∃e ∈ E, (ω, e) ∈ B}. A function f : Ω −→ R ∪ {−∞, ∞} is called upper semi–analytic (u.s.a. for short) if {ω ∈ Ω : f(ω) > c} is analytic for every c ∈ R. Any upper semi–analytic function is universally measurable (see e.g. [13, Chapter 7]).

(iii) Let Ω be a metric space, F its Borel σ–field and G ⊂ F be a sub–σ–field which is countably generated. Following [77], we say that (P^G_ω)_{ω∈Ω} is a family of r.c.p.d. (regular conditional probability distributions) of P knowing G if it satisfies:

• the map ω ⟼ P^G_ω is G–measurable, and for all A ∈ F and B ∈ G, one has P[A ∩ B] = ∫_B P^G_ω[A] P(dω);

• P^G_ω[[ω]_G] = 1 for all ω ∈ Ω, where [ω]_G := ∩{A ∈ G : ω ∈ A}.

Let (Ω, F, P, G = (G_t)_{t∈[0,T]}) be a filtered probability space, G ⊂ F be a sub–σ–field, and E a metric space. Then, given a random element ξ : Ω −→ E, we use both the notations L^P(ξ|G)(ω) and P^G_ω ∘ ξ^{−1} to denote the conditional distribution of ξ knowing G under P. Moreover, given a measurable process X : [0,T] × Ω −→ E, we can always define µ_t := L^P(X_t|G_t) to be a P(E)–valued G–optional process (see for instance Lemma A.1).

(iv) Let (E, ∆) and (E′, ∆′) be two Polish spaces. We shall refer to C_b(E, E′) to designate the set of continuous functions f : E −→ E′ such that sup_{e∈E} ∆′(f(e), e′₀) < ∞ for some e′₀ ∈ E′. Let N denote the set of non–negative integers. For (k, n) ∈ N², we denote by C_b^n(R^k; R) the set of bounded continuous maps f : R^k −→ R possessing bounded continuous derivatives up to order n, and by ∂_i f (resp. ∂²_{i,j} f) the partial derivative (resp. crossed second partial derivative) with respect to x_i (resp. (x_i, x_j)), for (i, j) ∈ {1, ..., k} × {1, ..., k} and x := (x₁, ..., x_k) ∈ R^k. Given (k, q) ∈ N × N, we denote by S^{k×q} the collection of all k × q–dimensional matrices with real entries, equipped with the standard Euclidean norm, which we denote by | · | when there is no ambiguity. Let us denote by 0_k, 0_{k×q} and I_k the null matrix of dimension k × k, the null matrix of dimension k × q, and the identity matrix of dimension k × k. Let T > 0, and (E, ∆) be a Polish space; we denote by C([0,T], E) the space of all continuous functions on [0,T] taking values in E, which is also a Polish space under the uniform convergence topology. When E = R^n for some n ∈ N, and ∆ is the usual Euclidean norm on R^n, we simply write C^n := C([0,T], R^n) and denote by ‖·‖ the uniform norm. We also denote by C^n_{s,t} := C([s,t]; R^n) the space of all R^n–valued continuous functions on [s,t], for 0 ≤ s ≤ t ≤ T. When n = 0, the spaces C^n and C^n_{s,t} both degenerate to a singleton.
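On a finite probability space, the r.c.p.d. of notation (iii) is elementary: G is generated by a partition, and P^G_ω is simply P conditioned on the atom [ω]_G. The sketch below (our own toy setting, not from the paper) verifies the two defining properties numerically.

```python
# Toy finite-space r.c.p.d. (ours): G generated by a partition of Omega = {0,...,5}.

omega_space = list(range(6))
P = {w: 1.0 / 6.0 for w in omega_space}            # uniform P
atoms = [{0, 1, 2}, {3, 4}, {5}]                   # partition generating G

def atom_of(w):
    return next(A for A in atoms if w in A)        # the atom [omega]_G

def rcpd(w, A):                                    # P^G_omega[A] = P[A | [omega]_G]
    B = atom_of(w)
    return sum(P[v] for v in A & B) / sum(P[v] for v in B)

A = {1, 3, 5}
# Disintegration property: P[A cap B] = sum over B of P^G_omega[A] P(d omega), B in G.
for B in atoms:
    lhs = sum(P[w] for w in A & B)
    rhs = sum(rcpd(w, A) * P[w] for w in B)
    print(B, lhs, rhs)                             # the two sides coincide (up to rounding)
# And P^G_omega puts full mass on the atom of omega:
print(all(rcpd(w, atom_of(w)) == 1.0 for w in omega_space))
```

The measure–theoretic subtleties of r.c.p.d. only appear on uncountable spaces, but the two bullet points above are exactly these two checks.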

(v) Throughout the paper, we fix a constant p ≥ 0, a nonempty Polish space (U, ρ) and a point u₀ ∈ U. Notice that a Polish space is always isomorphic to a Borel subset of [0, 1]. Let us denote by π such an (isomorphic) bijection between U and π(U) ⊆ [0, 1]. We further extend the definition of π^{−1} to R ∪ {−∞, ∞} by setting π^{−1}(x) := ∂ for all

x ∉ π(U), and let Ū := U ∪ {∂}, where ∂ is the usual cemetery point. Let ν ∈ P(C^k) (resp. ν̄ ∈ P(C^k × U)) be a Borel probability measure on the canonical space C^k (resp. C^k × U), equipped with the canonical process X (resp. (X, α)). We denote, for each t ∈ [0,T],

ν(t) := ν ∘ (X_{t∧·})^{−1} (resp. ν̄(t) := ν̄ ∘ (X_{t∧·}, α)^{−1}).

2 Weak and strong formulations of the McKean–Vlasov control problem

The main objective of this paper is to study the following (non–Markovian) McKean–Vlasov control problem, in both strong and weak formulations, of the form

sup_α E[ ∫_0^T L(t, X^α_{t∧·}, L(X^α_{t∧·}, α_t | G_t), α_t) dt + g(X^α_·, L(X^α_· | G_T)) ],

where G := (G_t)_{0≤t≤T} is a filtration modelling the common noise, supporting a Brownian motion B, L(X^α_{t∧·}, α_t | G_t) denotes the joint conditional distribution of (X^α_{t∧·}, α_t) knowing G_t, and (X^α_t)_{t∈[0,T]} is a McKean–Vlasov type process, controlled by α = (α_t)_{0≤t≤T} and generated by W together with an independent Brownian motion B

dX^α_t = b(t, X^α_{t∧·}, L(X^α_{t∧·}, α_t | G_t), α_t) dt + σ(t, X^α_{t∧·}, L(X^α_{t∧·}, α_t | G_t), α_t) dW_t + σ₀(t, X^α_{t∧·}, L(X^α_{t∧·}, α_t | G_t), α_t) dB_t. (2.1)

We will provide in the following a precise definition of the above controlled SDE, depending on the strong/weak formulation considered. Let us first specify the dimensions and some basic conditions on the coefficient functions. Let (n, ℓ, d) ∈ N × N × N. The coefficient functions

b : [0,T] × C^n × P(C^n × U) × U −→ R^n, σ : [0,T] × C^n × P(C^n × U) × U −→ S^{n×d},

σ₀ : [0,T] × C^n × P(C^n × U) × U −→ S^{n×ℓ}, L : [0,T] × C^n × P(C^n × U) × U −→ R, g : C^n × P(C^n) −→ R,

are all assumed to be Borel measurable, and non–anticipative in the sense that

(b, σ, σ₀, L)(t, x, ν̄, u) = (b, σ, σ₀, L)(t, x_{t∧·}, ν̄(t), u), for all (t, x, ν̄, u) ∈ [0,T] × C^n × P(C^n × U) × U.

2.1 A weak formulation

A weak formulation of the control problem is obtained by considering all weak solutions of the controlled McKean–Vlasov SDE (2.1). Here the word ‘weak’ refers to the fact that the probability space, as well as the Brownian motions it supports, is not assumed to be fixed, but is a part of the solution itself. This is of course consistent with the notion of a weak solution in the classical SDE theory.

Definition 2.1. Let (t, ν) ∈ [0,T] × P(C^n). We say that a term

γ := (Ω^γ, F^γ, P^γ, F^γ = (F^γ_s)_{0≤s≤T}, G^γ = (G^γ_s)_{0≤s≤T}, X^γ, W^γ, B^γ, µ^γ, µ̄^γ, α^γ),

is a weak control associated with the initial condition (t, ν) if the following conditions are satisfied:

(i) (Ω^γ, F^γ, P^γ) is a probability space, equipped with two filtrations F^γ and G^γ such that, for all s ∈ [0,T],

G^γ_s ⊆ F^γ_s, and E^{P^γ}[1_D | G^γ_s] = E^{P^γ}[1_D | G^γ_T], P^γ–a.s., for all D ∈ F^γ_s ∨ σ(W^γ); (2.2)

(ii) X^γ = (X^γ_s)_{s∈[0,T]} is an R^n–valued, F^γ–adapted continuous process, α^γ := (α^γ_s)_{t≤s≤T} is a U–valued, F^γ–predictable process, and with the fixed constant p ≥ 0, one has

E^{P^γ}[ ‖X^γ‖^p ] + E^{P^γ}[ ∫_t^T ρ(α^γ_s, u₀)^p ds ] < ∞; (2.3)

(iii) (W^γ, B^γ) is an R^d × R^ℓ–valued, F^γ–adapted continuous process; the process (W^{γ,t}, B^{γ,t}) := ((W^{γ,t}_s)_{0≤s≤T}, (B^{γ,t}_s)_{0≤s≤T}), defined by W^{γ,t}_s := W^γ_{s∨t} − W^γ_t and B^{γ,t}_s := B^γ_{s∨t} − B^γ_t, s ∈ [t,T], is a standard (F^γ, P^γ)–Brownian motion on [t,T]; B^{γ,t} is G^γ–adapted; F^γ_t ∨ σ(W^γ) is P^γ–independent of G^γ_T; and µ^γ = (µ^γ_s)_{t≤s≤T} (resp. µ̄^γ = (µ̄^γ_s)_{t≤s≤T}) is a G^γ–predictable P(C^n)–valued (resp. P(C^n × U)–valued) process satisfying

µ^γ_s = L^{P^γ}(X^γ_{s∧·} | G^γ_s) (resp. µ̄^γ_s = L^{P^γ}((X^γ_{s∧·}, α^γ_s) | G^γ_s)), for dP^γ ⊗ ds–a.e. (s, ω) ∈ [t,T] × Ω^γ;

(iv) X^γ satisfies P^γ ∘ (X^γ_{t∧·})^{−1} = ν(t) and

X^γ_s = X^γ_t + ∫_t^s b(r, X^γ_·, µ̄^γ_r, α^γ_r) dr + ∫_t^s σ(r, X^γ_·, µ̄^γ_r, α^γ_r) dW^γ_r + ∫_t^s σ₀(r, X^γ_·, µ̄^γ_r, α^γ_r) dB^γ_r, s ∈ [t,T], P^γ–a.s., (2.4)

where the integrals are implicitly assumed to be well–defined. For all (t, ν) ∈ [0,T] × P(C^n), let us denote

Γ_W(t, ν) := {all weak controls γ with initial condition (t, ν)}.

Then, with the reward functions L : [0,T] × C^n × P(C^n × U) × U −→ R and g : C^n × P(C^n) −→ R, we introduce the value function of our McKean–Vlasov optimal control problem by

V_W(t, ν) := sup_{γ∈Γ_W(t,ν)} J(t, γ), where J(t, γ) := E^{P^γ}[ ∫_t^T L(s, X^γ_{s∧·}, µ̄^γ_s, α^γ_s) ds + g(X^γ_{T∧·}, µ^γ_T) ]. (2.5)

Remark 2.2. In a weak control γ, the filtration G^γ is used to model the common noise. In particular, B^{γ,t} is adapted to G^γ, and W^{γ,t} is independent of G^γ. In the classical strong formulation, G^γ is fixed as the filtration F^{B^{γ,t}} generated by B^{γ,t}, but for a general weak control, G^γ may be larger than F^{B^{γ,t}}. This will be the main difference between the strong and weak formulations in our approach. Meanwhile, (G^γ, F^γ) satisfies an (H)–hypothesis type condition in (2.2), which is consistent with the classical strong formulation (see Section 2.2 below). This property will be crucial in our proof of the DPP result for the strong formulation of the control problem, as well as in the limit theory of the McKean–Vlasov control problem in our accompanying paper [34].

Remark 2.3. (i) At this stage, the integrability condition (2.3) could be construed as artificial. Depending on more concrete properties of the coefficient functions (b, σ, σ₀), it would play the role of an admissibility condition for the control process, and ensure that the stochastic integrals in (2.4) are well–defined. For a more concrete example, consider the case where U = R and u₀ = 0. When b, σ, and σ₀ are all uniformly bounded, one can choose p = 0, so that all R–valued predictable processes would then be admissible. When σ(t, x, ν̄, u) = u, one may choose p = 2 to ensure that the stochastic integral ∫_t^T α^γ_s dW^γ_s is well–defined and is a square–integrable martingale. It is also possible to consider more general types of integrability conditions, such as

E^{P^γ}[ Φ( ∫_t^T Ψ(ρ(u₀, α^γ_s)) ds ) ] < ∞,

for given maps Φ : [0, ∞) −→ [0, ∞) and Ψ : [0, ∞) −→ [0, ∞). This would for instance allow one to consider exponential integrability requirements.
For the sake of simplicity, we have chosen the condition in (2.3), but insist that it plays no essential role in the proof of the dynamic programming principle.

(ii) The set Γ_W(t, ν) could be empty, in which case V_W(t, ν) = −∞ by convention. For example, when ∫_{C^n} ‖x‖^p ν(dx) = ∞, then Γ_W(t, ν) = ∅, since (2.3) cannot be satisfied. Nevertheless, Γ_W(t, ν) is non–empty under either one of the following conditions (see for instance Djete, Possamaï, and Tan [34, Theorem A.2] for a brief proof):

• (b, σ, σ₀) are bounded and continuous in (x, ν̄, u), and ν ∈ P(C^n);

• (b, σ, σ₀) are continuous in (x, ν̄, u), and with the fixed constant p and other positive constants C, p′, p̂ such that p′ > p ≥ 1 ∨ p̂ and p′ ≥ 2 ≥ p̂ ≥ 0, we have ν ∈ P_{p′}(C^n) and

|b(t, x, ν̄, u)| ≤ C( 1 + ‖x‖ + ( ∫_{C^n×U} (‖x′‖^{p′} + ρ(u₀, u′)^{p′}) ν̄(dx′, du′) )^{1/p′} + ρ(u₀, u) ),

|(σ, σ₀)(t, x, ν̄, u)|² ≤ C( 1 + ‖x‖^{p̂} + ( ∫_{C^n×U} (‖x′‖^{p′} + ρ(u₀, u′)^{p′}) ν̄(dx′, du′) )^{p̂/p′} + ρ(u₀, u)^{p̂} ),

for all (t, x, ν̄, u) ∈ [0,T] × C^n × P(C^n × U) × U.
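As an aside for intuition (ours, not part of the paper), coefficients satisfying such Lipschitz/growth conditions are exactly those for which the standard particle approximation of (2.1) behaves well: the law L(X_t) is replaced by the empirical measure of N interacting particles, and an Euler–Maruyama step is applied. A minimal simulation sketch, without common noise or control, with the illustrative choices b(t, x, m) = −(x − mean(m)) and a constant σ:

```python
import math, random

# Minimal particle/Euler-Maruyama sketch (ours) for a McKean-Vlasov SDE
# dX_t = -(X_t - E[X_t]) dt + sigma dW_t, simulated via N interacting particles.

random.seed(0)
N, T, steps, sigma = 2000, 1.0, 100, 0.1
dt = T / steps
X = [random.gauss(0.0, 1.0) for _ in range(N)]     # i.i.d. initial condition, law N(0,1)

for _ in range(steps):
    m = sum(X) / N                                  # empirical approximation of E[X_t]
    X = [x - (x - m) * dt + sigma * math.sqrt(dt) * random.gauss(0.0, 1.0) for x in X]

mean = sum(X) / N
var = sum(x * x for x in X) / N - mean ** 2
print(round(mean, 3), round(var, 3))   # mean stays near 0; the spread contracts over time
```

The mean–reverting drift depends on the law only through its mean, which is Lipschitz for the W₂ distance, so both displayed conditions hold with p′ = 2.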

Remark 2.4. For technical reasons, we introduce an R–valued, F^γ–adapted continuous process A^γ := (A^γ_s)_{0≤s≤T} by (recall that π : U −→ [0,1] is a fixed isomorphic bijection between U and π(U))

A^γ_s := ∫_t^{s∨t} π(α^γ_r) dr, s ∈ [0,T], so that π(α^γ_s) = lim_{n→∞} n( A^γ_s − A^γ_{(s−1/n)∨0} ), dP^γ ⊗ ds–a.e. on Ω^γ × [t,T]. (2.6)

We further define a P(C^n × C × C^d × C^ℓ)–valued process µ̂^γ = (µ̂^γ_s)_{0≤s≤T} by

µ̂^γ_s := L^{P^γ}(X^γ_{s∧·}, A^γ_{s∧·}, W^γ, B^γ_{s∧·}) 1_{s∈[0,t]} + L^{P^γ}((X^γ_{s∧·}, A^γ_{s∧·}, W^γ, B^γ_{s∧·}) | G^γ_s) 1_{s∈(t,T]}. (2.7)

The process µ̂^γ can be defined to be a G^γ–adapted and P^γ–a.s. continuous process (equipping P(C^n × C × C^d × C^ℓ) with the weak convergence topology). Indeed, by (2.2), we have

µ̂^γ_s = L^{P^γ}((X^γ_{s∧·}, A^γ_{s∧·}, W^γ, B^γ_{s∧·}) | G^γ_T), P^γ–a.s., for all s ∈ (t,T].

Then by Lemma A.1, µ̂^γ can be defined to be P^γ–a.s. continuous on both [0,t] and (t,T]. Moreover, using the independence property between F^γ_t ∨ σ(W^γ) and G^γ_T, we have, P^γ–a.s.,

lim_{r↘t} µ̂^γ_r = lim_{r↘t} L^{P^γ}((X^γ_{r∧·}, A^γ_{r∧·}, W^γ, B^γ_{r∧·}) | G^γ_T) = L^{P^γ}((X^γ_{t∧·}, A^γ_{t∧·}, W^γ, B^γ_{t∧·}) | G^γ_T) = L^{P^γ}(X^γ_{t∧·}, A^γ_{t∧·}, W^γ, B^γ_{t∧·}) = µ̂^γ_t.

Remark 2.5. It is perfectly possible for us to consider a slightly more general class of control problems, allowing for exponential discounting. More precisely, we could have an additional Borel map k : [0,T] × C^n × P(C^n × U) × U −→ R and consider, for fixed (t, ν) ∈ [0,T] × P(C^n), the problem of maximising over γ ∈ Γ_W(t, ν) the functional

J̃(t, γ) := E^{P^γ}[ ∫_t^T e^{−∫_t^s k(u, X^γ_{u∧·}, µ̄^γ_u, α^γ_u) du} L(s, X^γ_{s∧·}, µ̄^γ_s, α^γ_s) ds + e^{−∫_t^T k(u, X^γ_{u∧·}, µ̄^γ_u, α^γ_u) du} g(X^γ_{T∧·}, µ^γ_T) ].

We refrained from working at that level of generality for notational simplicity, but our results extend directly to this context.
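The identity (2.6) in Remark 2.4 recovers π(α_s) from its running integral by a Lebesgue–differentiation difference quotient. A small numerical check (our own toy control path, with t = 0, on a grid approximation of A_s = ∫_0^s π(α_r) dr):

```python
# Numerical check (ours) of the difference-quotient recovery in (2.6), t = 0.

def pi_alpha(s):                      # a piecewise-constant [0,1]-valued "control" path
    return 0.2 if s < 0.5 else 0.9

steps = 10000                         # grid resolution for the running integral
h = 1.0 / steps
grid = [k * h for k in range(steps + 1)]
A = [0.0]
for k in range(1, len(grid)):
    A.append(A[-1] + pi_alpha(grid[k - 1]) * h)    # left-endpoint Riemann sum

def recover(s, eps):                  # (A_s - A_{s-eps}) / eps  ~  pi(alpha_s)
    i, j = int(round(s / h)), int(round((s - eps) / h))
    return (A[i] - A[j]) / eps

print(recover(0.25, 0.01), recover(0.75, 0.01))    # ~0.2 and ~0.9, the two control values
```

Away from the jump of the path, the quotient reproduces the control value, which is exactly the dP ⊗ ds–a.e. statement in (2.6).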

2.2 A strong formulation

To obtain a strong formulation of the control problem, the usual approach is to consider a fixed probability space, equipped with fixed Brownian motions and fixed Brownian filtrations. In fact, this is equivalent to fixing the filtrations, in the weak control γ, to be the Brownian filtrations (see Proposition 2.10 below). We will therefore present the two equivalent definitions one after the other.

2.2.1 Strong formulation as a special case of weak formulation

Let us start with the main definition.

Definition 2.6. Let (t, ν) ∈ [0,T] × P(C^n). A term γ = (Ω^γ, F^γ, P^γ, F^γ, G^γ, X^γ, W^γ, B^γ, µ^γ, µ̄^γ, α^γ) is called a strong control associated with the initial condition (t, ν) if γ ∈ Γ_W(t, ν) and the filtrations G^γ and F^γ are the P^γ–augmented filtrations of G^{γ,◦} := (G^{γ,◦}_s)_{s∈[0,T]} and F^{γ,◦} := (F^{γ,◦}_s)_{s∈[0,T]}, which are defined by

G^{γ,◦}_s := {∅, Ω^γ} if 0 ≤ s < t, and G^{γ,◦}_s := σ(B^{γ,t}_r : r ∈ [0,s]) if s ∈ [t,T],

F^{γ,◦}_s := σ(X^γ_{s∧·}) if 0 ≤ s < t, and F^{γ,◦}_s := σ(X^γ_{t∧·}) ∨ σ((W^{γ,t}_r, B^{γ,t}_r) : r ∈ [0,s]) if s ∈ [t,T].

We denote by Γ_S(t,ν) the collection of all strong controls associated with (t,ν), and define

Γ^B_S(t,ν) := { γ ∈ Γ_S(t,ν) : α^γ is G^γ–predictable }.

Remark 2.7. For a strong control γ ∈ Γ_S(t,ν), the filtration G^γ is generated by B^{γ,t}, and F^γ is generated by the initial variable X^γ_{t∧·} and the Brownian motions W^{γ,t}_· and B^{γ,t}_·. Consequently, the control process α^γ is adapted to the filtration generated by (X^γ_{t∧·}, W^{γ,t}_·, B^{γ,t}_·), and the common noise comes only from B^{γ,t}. We will show in Proposition 2.10 that this is equivalent to the case of a fixed probability space equipped with fixed Brownian motions and a fixed initial random variable. We define the strong control rules here as special cases of weak control rules in order to avoid repeating all the technical conditions in Definition 2.1.

Our proof of the dynamic programming principle for the strong formulation of the McKean–Vlasov problem relies essentially on its equivalence with the weak formulation, which requires the following standard Lipschitz condition on the coefficient functions. Moreover, this condition ensures the existence and uniqueness of the solution to SDE (2.4) (see e.g. Theorem A.3).
Assumption 2.8. Let the constant in (2.3) be p = 2. There exists a constant C > 0 such that, for all (t, x, x′, ν̄, ν̄′, u) ∈ [0,T]×C^n×C^n×P_2(C^n×U)×P_2(C^n×U)×U, one has

|(b, σ, σ_0)(t, x, ν̄, u) − (b, σ, σ_0)(t, x′, ν̄′, u)| ≤ C( ‖x − x′‖ + W_2(ν̄, ν̄′) ),   (2.8)

|(b, σ, σ_0)(t, x, ν̄, u)|² ≤ C( 1 + ‖x‖² + ∫_{C^n×U} ( ‖y‖² + ρ(û, u_0)² ) ν̄(dy, dû) + ρ(u, u_0)² ).

Under Assumption 2.8, the set Γ_S(t,ν) is nonempty for all (t,ν) ∈ [0,T]×P_2(C^n) (see e.g. Theorem A.3). We then introduce the following strong formulation (resp. B–strong formulation) of the McKean–Vlasov control problem

V_S(t,ν) := sup_{γ∈Γ_S(t,ν)} J(t,γ), and V^B_S(t,ν) := sup_{γ∈Γ^B_S(t,ν)} J(t,γ).   (2.9)

Remark 2.9 (The case without common noise: ℓ = 0). In the literature, the McKean–Vlasov control problem without common noise has also been largely studied. It is contained as a special case in our setting. Indeed, when ℓ = 0, the process B^{γ,t} degenerates to a singleton, and hence G^γ_s = {∅, Ω^γ} for all s ∈ [0,T]. It follows that the process µ̄^γ appearing in (2.4) satisfies

µ̄^γ_s = L^{P^γ}(X^γ_{s∧·}, α^γ_s), for dP^γ ⊗ dt–a.e. (s,ω) ∈ [t,T]×Ω^γ,

and the value function V_S(t,ν) in (2.9) is the standard formulation of the control problem without common noise (see e.g. [25]).

2.2.2 Strong formulation on a fixed probability space

For 0 ≤ s ≤ t ≤ T, let C^n_{s,t} := C([s,t], R^n) denote the space of all R^n–valued continuous paths on [s,t]. We then introduce a first canonical space, for every t ∈ [0,T],

Ω^t := C^n_{0,t} × C^d_{t,T} × C^ℓ_{t,T},  F^t := B(Ω^t),   (2.10)

with corresponding canonical processes ζ := (ζ_s)_{0≤s≤t}, W := (W_s)_{t≤s≤T}, and B := (B_s)_{t≤s≤T}. Let W^t_s := W_{s∨t} − W_t and B^t_s := B_{s∨t} − B_t for all s ∈ [0,T], and define F^{t,◦} = (F^{t,◦}_s)_{0≤s≤T} and G^{t,◦} = (G^{t,◦}_s)_{0≤s≤T} by

F^{t,◦}_s := σ(ζ_{s∧·}) if 0 ≤ s < t, and F^{t,◦}_s := σ(ζ_{t∧·}) ∨ σ((W^t_r, B^t_r) : r ∈ [0,s]) if s ∈ [t,T],

G^{t,◦}_s := {∅, Ω^t} if 0 ≤ s < t, and G^{t,◦}_s := σ(B^t_r : r ∈ [0,s]) if s ∈ [t,T].

Let (t,ν) ∈ [0,T]×P_2(C^n). We fix a probability measure P^t_ν on Ω^t such that L^{P^t_ν}(ζ_{t∧·}) = ν(t) and (W^t, B^t) are standard Brownian motions on [t,T], independent of ζ. Let F^t = (F^t_s)_{0≤s≤T} and G^t = (G^t_s)_{0≤s≤T} be the P^t_ν–augmented filtrations of F^{t,◦} and G^{t,◦}. We denote by A_2(t,ν) (resp. A^B_2(t,ν)) the collection of all U–valued processes α = (α_s)_{t≤s≤T} which are F^t–predictable (resp. G^t–predictable) and such that

E^{P^t_ν}[ ∫_t^T ρ(u_0, α_s)² ds ] < ∞.

Then, given α ∈ A_2(t,ν), let X^α be the unique strong solution (in the sense of Definition A.2) of the SDE with initial condition X^α_{t∧·} = ζ_{t∧·}:

X^α_s = X^α_t + ∫_t^s b(r, X^α_{r∧·}, µ̄^α_r, α_r) dr + ∫_t^s σ(r, X^α_{r∧·}, µ̄^α_r, α_r) dW^t_r + ∫_t^s σ_0(r, X^α_{r∧·}, µ̄^α_r, α_r) dB^t_r, t ≤ s ≤ T, P^t_ν–a.s.,   (2.11)

where µ̄^α_r = L^{P^t_ν}((X^α_{r∧·}, α_r) | G^t_r), dP^t_ν × dr–a.e. on Ω^t×[t,T]. Notice that the existence and uniqueness of a solution to SDE (2.11) is ensured by Assumption 2.8 (see Theorem A.3). Finally, for any α ∈ A_2(t,ν), we denote µ^α_s := L^{P^t_ν}(X^α_{s∧·} | G^t_s), s ∈ [t,T]. We next show that the above strong formulation of the control problem on a fixed probability space is equivalent, as a special case of the weak control rules, to that of Definition 2.6.
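For intuition, SDE (2.11) is commonly approximated by an interacting particle system: the conditional law given the common noise is replaced by the empirical measure of N particles that share one path of B but carry independent copies of W. The Euler–Maruyama sketch below is purely illustrative and not the paper's construction; the state is scalar and all names are ours.

```python
import numpy as np

def mckean_vlasov_euler(b, sigma, sigma0, alpha, x0, T=1.0, N=500, M=100, seed=0):
    """Particle Euler-Maruyama scheme for
        dX = b dt + sigma dW + sigma0 dB,
    where the (conditional) measure argument is replaced by the empirical
    cloud `xs` of the N particles, all driven by the same path of B.
    Illustrative sketch only; scalar state."""
    rng = np.random.default_rng(seed)
    dt = T / M
    xs = np.full(N, x0, dtype=float)
    for m in range(M):
        t = m * dt
        dB = rng.normal(0.0, np.sqrt(dt))           # common noise increment
        dW = rng.normal(0.0, np.sqrt(dt), size=N)   # idiosyncratic increments
        a = alpha(t, xs)
        xs = xs + b(t, xs, xs, a) * dt + sigma(t, xs, xs, a) * dW \
                + sigma0(t, xs, xs, a) * dB
    return xs

# toy mean-field interaction: reversion towards the empirical mean
out = mckean_vlasov_euler(
    b=lambda t, x, cloud, a: cloud.mean() - x + a,
    sigma=lambda t, x, cloud, a: 0.3,
    sigma0=lambda t, x, cloud, a: 0.1,
    alpha=lambda t, x: 0.0,
    x0=1.0)
```

Under the Lipschitz condition of Assumption 2.8, such particle systems are also the standard route to the finite-population limits discussed later in Remark 3.5.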

Proposition 2.10. Let Assumption 2.8 hold true. Then for all (t,ν) ∈ [0,T]×P_2(C^n), one has

V_S(t,ν) = sup_{α∈A_2(t,ν)} J(t,ν,α), and V^B_S(t,ν) = sup_{α∈A^B_2(t,ν)} J(t,ν,α),   (2.12)

where

J(t,ν,α) := E^{P^t_ν}[ ∫_t^T L(s, X^α_{s∧·}, µ̄^α_s, α_s) ds + g(X^α_·, µ^α_T) ].

Proof. We only consider the case of V_S, since the arguments for V^B_S are exactly the same. First, given α ∈ A_2(t,ν), let us define

γ := (Ω^t, F^t, P^t_ν, F^t, G^t, X^α, W^t, B^t, µ^α, µ̄^α, α).

Then it is straightforward to check that γ is a strong control rule (i.e. γ ∈ Γ_S(t,ν)) such that J(t,γ) = J(t,ν,α).
Next, let γ ∈ Γ_S(t,ν). Notice that (X^γ, α^γ) is F^γ–predictable and (µ^γ, µ̄^γ) is G^γ–predictable. Using for instance Claisse, Talay, and Tan [30, Proposition 10] (with a slight extension consisting simply in having a larger F_0), there exist two Borel measurable functions Ψ_1 : [0,T]×Ω^t −→ R^n×U and Ψ_2 : [0,T]×C^ℓ −→ P(C^n)×P(C^n×U) such that

(X^γ_s, α^γ_s) = Ψ_1(s, X^γ_{t∧·}, W^{γ,t}_{s∧·}, B^{γ,t}_{s∧·}), (µ^γ_s, µ̄^γ_s) = Ψ_2(s, B^{γ,t}_{s∧·}), s ∈ [0,T], P^γ–a.s.

Then, on Ω^t, let us define (X^⋆_s, α^⋆_s) := Ψ_1(s, ζ_{t∧·}, W^t_{s∧·}, B^t_{s∧·}) and (µ^⋆_s, µ̄^⋆_s) := Ψ_2(s, B^t_{s∧·}), so that

α^⋆ ∈ A_2(t,ν), and P^t_ν ◦ (X^⋆, W^t, B^t, α^⋆, µ^⋆, µ̄^⋆)^{−1} = P^γ ◦ (X^γ, W^{γ,t}, B^{γ,t}, α^γ, µ^γ, µ̄^γ)^{−1}.

This implies that X^⋆ is the unique strong solution to SDE (2.11) with control α^⋆, and that µ^⋆_s = L^{P^t_ν}(X^⋆_{s∧·} | G^t_s), µ̄^⋆_s = L^{P^t_ν}((X^⋆_{s∧·}, α^⋆_s) | G^t_s) and J(t,ν,α^⋆) = J(t,γ).

3 The dynamic programming principle

The main results of our paper consist in the dynamic programming principle (DPP) for the previously introduced formulations of the McKean–Vlasov control problem. We will first prove the DPP for the general strong and weak control problems introduced in Section 2, and then show how they naturally induce the associated results in the Markovian case. Finally, we also discuss heuristically the Hamilton–Jacobi–Bellman (HJB for short) equations which can be deduced for each formulation.

3.1 The dynamic programming principle in the general case 3.1.1 Dynamic programming principle for the weak control problem To provide the dynamic programming principle of the McKean–Vlasov control problem (2.5), let us introduce another canonical space

Ω^⋆ := C^ℓ × C([0,T], P(C^n×C×C^d×C^ℓ)), with canonical process (B^⋆, µ̂^⋆) := (B^⋆_s, µ̂^⋆_s)_{s∈[0,T]},

and canonical filtration G^⋆ := (G^⋆_s)_{0≤s≤T} defined by G^⋆_s := σ((µ̂^⋆_r, B^⋆_r) : r ∈ [0,s]), s ∈ [0,T]. Then, for every G^⋆–stopping time τ (which can then be written as a function of B^⋆ and µ̂^⋆), for all (t,ν) ∈ [0,T]×P(C^n) and γ ∈ Γ_W(t,ν), we define (recall that µ̂^γ is defined by (2.7))

τ^γ := τ(B^{γ,t}_·, µ̂^γ_·).   (3.1)

Theorem 3.1. The value function V_W : [0,T]×P(C^n) −→ R∪{−∞, ∞} of the weak McKean–Vlasov control problem (2.5) is upper semi–analytic. Moreover, let (t,ν) ∈ [0,T]×P(C^n), let τ^⋆ be a G^⋆–stopping time taking values in [t,T], and let τ^γ be defined by (3.1); one has

V_W(t,ν) = sup_{γ∈Γ_W(t,ν)} E^{P^γ}[ ∫_t^{τ^γ} L(s, X^γ_{s∧·}, µ̄^γ_s, α^γ_s) ds + V_W(τ^γ, µ^γ_{τ^γ}) ].   (3.2)
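The content of (3.2) — the value at t equals the supremum of the running reward up to an intermediate time plus the value restarted there — is easiest to see in a finite caricature with deterministic dynamics, finitely many states and actions, and no measure argument. In that toy setting (ours, not the paper's), backward induction is exactly the DPP, and it can be cross-checked against brute-force enumeration of strategies.

```python
import itertools
import numpy as np

# finite-dimensional caricature of the DPP: deterministic transitions
T, states, actions = 3, [0, 1, 2], [0, 1]

def step(x, a):            # next state
    return min(x + a, 2)

def L(t, x, a):            # running reward
    return float(x - 0.1 * a)

def g(x):                  # terminal reward
    return float(x)

# value by backward induction: this *is* the DPP used as an algorithm
V = {(T, x): g(x) for x in states}
for t in range(T - 1, -1, -1):
    for x in states:
        V[(t, x)] = max(L(t, x, a) + V[(t + 1, step(x, a))] for a in actions)

# brute force over all open-loop plans agrees with the DPP value
def brute(t, x):
    best = -np.inf
    for plan in itertools.product(actions, repeat=T - t):
        y, tot = x, 0.0
        for s, a in enumerate(plan, start=t):
            tot += L(s, y, a)
            y = step(y, a)
        best = max(best, tot + g(y))
    return best

assert all(abs(V[(0, x)] - brute(0, x)) < 1e-12 for x in states)
```

For deterministic dynamics, open-loop and feedback strategies coincide, which is why the brute-force check is valid here; the measurable selection machinery of the paper handles the genuinely stochastic, measure-valued case.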

3.1.2 Dynamic programming for the strong control problems
We now consider the two strong formulations of the control problem introduced in (2.9), or equivalently in (2.12). To formulate the DPP results, we will rather use the fixed probability space setting of (2.12). Recall that, given an initial condition (t,ν) ∈ [0,T]×P_2(C^n), a fixed probability space (Ω^t, F^t, P^t_ν) is defined in and below (2.10). Let us first consider the strong control problem V^B_S.

Theorem 3.2. Let Assumption 2.8 hold. Then the value function V^B_S : [0,T]×P_2(C^n) −→ R∪{−∞, ∞} is upper semi–analytic. Moreover, let (t,ν) ∈ [0,T]×P_2(C^n), and let τ be a G^{t,◦}–stopping time on (Ω^t, F^t, P^t_ν) taking values in [t,T]; one has

V^B_S(t,ν) = sup_{α∈A^B_2(t,ν)} E^{P^t_ν}[ ∫_t^τ L(s, X^α_{s∧·}, µ̄^α_s, α_s) ds + V^B_S(τ, µ^α_τ) ].   (3.3)

For the strong control problem V_S, we need some additional regularity conditions on the coefficient functions.
Assumption 3.3. For all t ∈ [0,T], the functions

(b, σ, σ_0) : (x, ν̄, u) ∈ C^n×P(C^n×U)×U ↦ (b, σ, σ_0)(t, x, ν̄, u) ∈ R^n×S^{n×d}×S^{n×ℓ}

are continuous, and there exists a constant C > 0 such that, for all (t, x, u, ν̄) ∈ [0,T]×C^n×U×P(C^n×U),

|(L, g)(t, x, ν̄, u)|² ≤ C( 1 + ‖x‖² + ∫_{C^n×U} ( ‖y‖² + ρ(u′, u_0)² ) ν̄(dy, du′) + ρ(u, u_0)² ).

Moreover, the map

(x, ν̄, u) ∈ C^n×P_2(C^n×U)×U ↦ (L, g)(t, x, ν̄, u) ∈ R×R

is lower semi–continuous for all t ∈ [0,T].

Theorem 3.4. Let Assumption 2.8 and Assumption 3.3 hold true. Let (t,ν) ∈ [0,T]×P_2(C^n), and let τ be a G^{t,◦}–stopping time on (Ω^t, F^t, P^t_ν) taking values in [t,T]. Then

V_S(t,ν) = V_W(t,ν),

so that the value function V_S : [0,T]×P_2(C^n) −→ R∪{−∞, ∞} is upper semi–analytic, and one has

V_S(t,ν) = sup_{α∈A_2(t,ν)} E^{P^t_ν}[ ∫_t^τ L(s, X^α_{s∧·}, µ̄^α_s, α_s) ds + V_S(τ, µ^α_τ) ].   (3.4)

Remark 3.5. (i) Our dynamic programming results for V_W and V_S in Theorem 3.1 and Theorem 3.4 are new in this general framework. For the result in Theorem 3.2, where the control is adapted to the common noise B, the same DPP has been obtained in Pham and Wei [66, Proposition 3.1]. However, our result is more general for two reasons. First, we do not require any regularity conditions on the reward functions L and g, thanks to our use of measurable selection arguments. Second, we are able to stay in a generic non–Markovian framework with interaction terms given by the law of both the control and the controlled process, while the results of [66] are given in a Markovian context with interaction terms given by the law of the controlled process alone.

(ii) From our point of view, the formulations V_W and V_S in (2.5) and (2.9) seem to be more natural, because they should be the ones arising naturally as limits of finite population control problems (see Lacker [52] for the case without common noise, and Djete, Possamaï, and Tan [34] for the context with common noise). Indeed, for the problem with a finite population N, when the controller observes the evolution of the empirical distribution of (X^1,...,X^N), it is more reasonable to assume that he/she uses the information generated by (X_{t∧·}, W, B) (as in the definition of V_S), rather than just the information from B (as in the definition of V^B_S), to control the system. In this sense, the formulation V^B_S may not be the most natural strong formulation for McKean–Vlasov control problems with common noise.

Remark 3.6. The DPP result for V_S in Theorem 3.4 has been proved under additional regularity conditions, namely the ones given in Assumption 3.3. This may come as a surprise to readers familiar with the measurable selection approach to the DPP for classical stochastic control problems. We will try here to give some intuition on why, at least with our method of proof, there does not seem to be any way to dispense with these additional assumptions.

Let us consider the classical conditioning argument in the proof of the DPP. Given a control process α := (α_s)_{s∈[t,T]} ∈ A_2(t,ν), which is adapted to the filtration generated by (X_{t∧·}, W^t_s, B^t_s)_{s∈[t,T]}, we consider some time t_o ∈ (t,T], and the

filtration G̃ := (G̃_s)_{s∈[t,T]} generated by B^t. Then, under the r.c.p.d. of P^t_ν knowing G̃_{t_o}, the process (α_s)_{s∈[t_o,T]} will be adapted to the filtration generated by (X_{t_o∧·}, W^{t_o}_s, B^{t_o}_s)_{s∈[t_o,T]} together with (W^t_s)_{s∈[t,t_o]}. Because of the randomness of (W^t_s)_{s∈[t,t_o]}, we cannot consider (α_s)_{s∈[t_o,T]} as a 'strong' control process under the r.c.p.d. of P^t_ν knowing G̃_{t_o}. To bypass this difficulty, we need to use the equivalence result V_S = V_W together with the DPP for V_W given by Theorem 3.1. The equivalence result will be proved in our accompanying paper [34] under the integrability and regularity conditions of Assumption 2.8 and Assumption 3.3.

3.2 Dynamic programming principle in the Markovian case
With the DPP results in the general non–Markovian context of Theorem 3.1, Theorem 3.2 and Theorem 3.4, we can easily establish the DPP for the control problems in the Markovian setting. In fact, we will consider a framework which is slightly more general than the classical Markovian formulation, by considering the so–called updating functions, as in Brunick and Shreve [18]. Let E be a non–empty Polish space. A Borel measurable function Φ : C^n −→ C([0,T], E) is called an updating function if it satisfies

Φ_t(x) = Φ_t(x(t∧·)), for all (t,x) ∈ [0,T]×C^n,

and, for all 0 ≤ s ≤ t ≤ T,

(Φ_r(x))_{r∈[s,t]} = (Φ_r(x′))_{r∈[s,t]}, whenever Φ_s(x) = Φ_s(x′) and (x(r) − x(s))_{r∈[s,t]} = (x′(r) − x′(s))_{r∈[s,t]}.

The intuition behind the updating function Φ is the following: the value of Φ_t(x) depends only on the path of x up to time t, and, for 0 ≤ s ≤ t ≤ T, Φ_t(x) is determined by Φ_s(x) together with the increments (x(r) − x(s))_{r∈[s,t]}.
Example 3.7. (i) The first example of an updating function is the running process

Φ_t(x) := x(t), with E = R^n.

(ii) Let M^i_t(x) := max_{0≤s≤t} x^i(s) for i = 1,...,n, t ∈ [0,T], and A_t(x) := ∫_0^t x(s) ds, t ∈ [0,T]. Then the running process, together with the running maximum and running average processes, is also an example of updating function:

Φ_t(x) := (x(t), M_t(x), A_t(x)), with E = R^n×R^n×R^n.
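On discretised paths, the updating function of Example (ii) is simple to implement, and its defining property — Φ_t(x) only sees the stopped path x(t∧·) — can be checked directly. The sketch below is ours and purely illustrative (scalar paths, names hypothetical).

```python
import numpy as np

def phi(ts, x, t):
    """Discretised version of Example (ii):
    Phi_t(x) = (x(t), running max up to t, running integral up to t),
    for a scalar path x sampled on the grid ts."""
    mask = ts <= t
    seg, tseg = x[mask], ts[mask]
    xt = float(seg[-1])
    running_max = float(seg.max())
    # A_t(x) = int_0^t x(s) ds by the trapezoidal rule
    running_int = float(np.sum(0.5 * (seg[1:] + seg[:-1]) * np.diff(tseg)))
    return (xt, running_max, running_int)

ts = np.linspace(0.0, 1.0, 101)
x = np.sin(2 * np.pi * ts)
y = x.copy()
y[ts > 0.5] = -5.0          # perturb the path strictly after t = 0.5
# Phi_t(x) = Phi_t(x(t ^ .)): the value at t ignores the future of the path
assert phi(ts, x, 0.5) == phi(ts, y, 0.5)
```

The second defining property (Φ on [s,t] is recoverable from Φ_s and the increments on [s,t]) holds here because current value, running maximum and running integral all update incrementally.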

Throughout this subsection, we fix an updating function Φ. In this context, one can in fact define the value function on [0,T]×P(E) under some additional conditions. Given ν̄ ∈ P(C^n×U) (resp. ν ∈ P(C^n)), let us consider (X, α) (resp. X) as the canonical element on the canonical space C^n×U (resp. C^n), and then define

[ν̄]^◦_t := ν̄ ◦ (Φ_t(X), α)^{−1} ∈ P(E×U) (resp. [ν]^◦_t := ν ◦ (Φ_t(X))^{−1} ∈ P(E)), t ∈ [0,T].

Assumption 3.8. For the fixed updating function Φ : C^n −→ C([0,T], E), there exist Borel measurable functions

(b^◦, σ^◦, σ^◦_0, L^◦, g^◦) : [0,T]×E×P(E×U)×U −→ R^n×S^{n×d}×S^{n×ℓ}×R×R,

such that

(b, σ, σ_0, L, g)(t, x, ν̄, u) = (b^◦, σ^◦, σ^◦_0, L^◦, g^◦)(t, Φ_t(x), [ν̄]^◦_t, u), for all (t, x, u, ν̄) ∈ [0,T]×C^n×U×P(C^n×U).

Let t ∈ [0,T] and ν^◦ ∈ P(E). We first define the following sets

V(t, ν^◦) := { ν ∈ P(C^n) : [ν]^◦_t = ν^◦ },

Γ^◦_W(t,ν^◦) := ∪_{ν∈V(t,ν^◦)} Γ_W(t,ν), Γ^◦_S(t,ν^◦) := ∪_{ν∈V(t,ν^◦)} Γ_S(t,ν), Γ^{B,◦}_S(t,ν^◦) := ∪_{ν∈V(t,ν^◦)} Γ^B_S(t,ν),

as well as the value functions, with J(t,γ) defined in (2.5),

V^◦_W(t,ν^◦) := sup_{γ∈Γ^◦_W(t,ν^◦)} J(t,γ), V^◦_S(t,ν^◦) := sup_{γ∈Γ^◦_S(t,ν^◦)} J(t,γ) and V^{B,◦}_S(t,ν^◦) := sup_{γ∈Γ^{B,◦}_S(t,ν^◦)} J(t,γ).

Remark 3.9. When the updating function is the running process given by Φ_t(x) := x(t), the problems V^◦_W, V^◦_S and V^{B,◦}_S are of course exactly the classical Markovian formulations of the control problems.

Lemma 3.10. Let Assumption 3.8 hold true, and fix some t ∈ [0,T]. Then, for any (ν_1, ν_2) ∈ P(C^n)×P(C^n) such that [ν_1]^◦_t = [ν_2]^◦_t, one has

V_W(t,ν_1) = V_W(t,ν_2), V_S(t,ν_1) = V_S(t,ν_2) and V^B_S(t,ν_1) = V^B_S(t,ν_2).

Consequently, for all ν ∈ P(C^n), one has

V_W(t,ν) = V^◦_W(t, [ν]^◦_t), V_S(t,ν) = V^◦_S(t, [ν]^◦_t), and V^B_S(t,ν) = V^{B,◦}_S(t, [ν]^◦_t).

Proof. We only consider the equality for V_W; the arguments for V_S and V^B_S are the same. First, we can consider ν_2 as a probability measure defined on the canonical space C^n with canonical process X, and containing the random variable Z_t := Φ_t(X). Then, on a (possibly enlarged) probability space (C^n, B(C^n), ν_2), there exists a Borel measurable function ψ : E×[0,1] −→ C^n, together with a random variable η with uniform distribution on [0,1], independent of Z_t, such that ν_2 ◦ (Z_t, X_·)^{−1} = ν_2 ◦ (Z_t, ψ(Z_t, η))^{−1}. Next, consider an arbitrary γ_1 := (Ω^1, F^1, P^1, F^1, G^1, X^1, W^1, B^1, µ^1, µ̄^1, α^1) ∈ Γ_W(t,ν_1). Without loss of generality (that is, up to an enlargement of the space), we assume that there exists a random variable η^1 with uniform distribution on [0,1] on the probability space (Ω^1, F^1_0, P^1), independent of the random variables (X^1, W^1, B^1, µ^1, µ̄^1, α^1).
We then define γ_2 as follows. Let Z^1_s := Φ_s(X^1) for all s ∈ [0,T], so that, by definition, P^1 ◦ (Z^1_t)^{−1} = [ν_1]^◦_t = [ν_2]^◦_t. Next, let

X^2_s := ψ_s(Z^1_t, η^1), if s ∈ [0,t], and X^2_s := X^2_t + X^1_s − X^1_t, if s ∈ (t,T].

It follows from the properties of ψ and of the updating function Φ that

P^1 ◦ (X^2_{t∧·})^{−1} = ν_2(t), and Φ_s(X^2) = Φ_s(X^1), s ∈ [t,T].   (3.5)

Let µ̄^2_s := L^{P^1}((X^2_{s∧·}, α^1_s) | G^1_s), µ^2_s := L^{P^1}(X^2_{s∧·} | G^1_s), for s ∈ [t,T], and γ_2 := (Ω^1, F^1, F^1, P^1, G^1, X^2, W^1, B^1, µ^2, µ̄^2, α^1). Using Assumption 3.8 and (3.5), we have γ_2 ∈ Γ_W(t,ν_2) and J(t,γ_2) = J(t,γ_1), implying V_W(t,ν_1) ≤ V_W(t,ν_2); exchanging the roles of ν_1 and ν_2 yields the equality.

Now we provide the dynamic programming principle for the Markovian control problem under Assumption 3.8.
Corollary 3.11. Let Assumption 3.8 hold true, t ∈ [0,T] and ν^◦ ∈ P(E). Let τ^⋆ be a G^⋆–stopping time taking values in [t,T] on Ω^⋆, let (τ^γ)_{γ∈Γ^◦_W(t,ν^◦)} be defined from τ^⋆ as in (3.1), and let τ be a G^{t,◦}–stopping time taking values in [t,T] on Ω^t. Then one has the following dynamic programming results.
(i) The function V^◦_W : [0,T]×P(E) −→ R∪{−∞, ∞} is upper semi–analytic and, with Z^γ_s := Φ_s(X^γ_·),

V^◦_W(t,ν^◦) = sup_{γ∈Γ^◦_W(t,ν^◦)} E^{P^γ}[ ∫_t^{τ^γ} L^◦(s, Z^γ_s, [µ̄^γ]^◦_s, α^γ_s) ds + V^◦_W(τ^γ, [µ^γ]^◦_{τ^γ}) ].   (3.6)

(ii) Let Assumption 2.8 hold true; then V^{B,◦}_S : [0,T]×P(E) −→ R∪{−∞, ∞} is upper semi–analytic and, with Z^α_s := Φ_s(X^α_·), one has

V^{B,◦}_S(t,ν^◦) = sup_{α∈A^B_2(t,ν^◦)} E^{P^t_ν}[ ∫_t^τ L^◦(s, Z^α_s, [µ̄^α]^◦_s, α_s) ds + V^{B,◦}_S(τ, [µ^α]^◦_τ) ].   (3.7)

(iii) Let Assumption 2.8 and Assumption 3.3 hold; then V^◦_S(t,ν^◦) = V^◦_W(t,ν^◦) and, with Z^α_s := Φ_s(X^α_·),

V^◦_S(t,ν^◦) = sup_{α∈A_2(t,ν^◦)} E^{P^t_ν}[ ∫_t^τ L^◦(s, Z^α_s, [µ̄^α]^◦_s, α_s) ds + V^◦_S(τ, [µ^α]^◦_τ) ].   (3.8)

Proof. We only consider the case of V^◦_W; the arguments for V^◦_S and V^{B,◦}_S are the same.
Let JVK := { (t, ν, ν^◦) ∈ [0,T]×P(C^n)×P(E) : [ν]^◦_t = ν^◦ }. Notice that Φ : C^n −→ C([0,T], E) is Borel, so that (t,ν) ↦ [ν]^◦_t is also Borel, and hence JVK is a Borel subset of [0,T]×P(C^n)×P(E). Further, one has

V^◦_W(t,ν^◦) = sup_{(t,ν,ν^◦)∈JVK} V_W(t,ν) from Lemma 3.10, and V_W is upper semi–analytic by Theorem 3.1. It follows by the measurable selection theorem (see e.g. [37, Proposition 2.17]) that V^◦_W : (t, ν^◦) ∈ [0,T]×P(E) ↦ V^◦_W(t,ν^◦) ∈ R∪{−∞, ∞} is also upper semi–analytic. Finally, using the DPP results in Theorem 3.1, it follows that

V^◦_W(t,ν^◦) = sup_{ν∈V(t,ν^◦)} V_W(t,ν) = sup_{ν∈V(t,ν^◦)} sup_{γ∈Γ_W(t,ν)} E^{P^γ}[ ∫_t^{τ^γ} L(s, X^γ_{s∧·}, µ̄^γ_s, α^γ_s) ds + V_W(τ^γ, µ^γ_{τ^γ}) ]

= sup_{ν∈V(t,ν^◦)} sup_{γ∈Γ_W(t,ν)} E^{P^γ}[ ∫_t^{τ^γ} L^◦(s, Z^γ_s, [µ̄^γ]^◦_s, α^γ_s) ds + V^◦_W(τ^γ, [µ^γ]^◦_{τ^γ}) ]

= sup_{γ∈Γ^◦_W(t,ν^◦)} E^{P^γ}[ ∫_t^{τ^γ} L^◦(s, Z^γ_s, [µ̄^γ]^◦_s, α^γ_s) ds + V^◦_W(τ^γ, [µ^γ]^◦_{τ^γ}) ].

3.3 Discussion: from dynamic programming to the HJB equation
One of the classical applications of the DPP consists in giving a local characterisation of the value function, for instance by proving that it is a viscosity solution of an HJB equation. This was achieved in Pham and Wei [67] for the control problem V^◦_S (in the setting with σ_0 ≡ 0), and in Pham and Wei [66] for the control problem V^{B,◦}_S (for Φ_t(x) := x(t), with E = R^n). It relies essentially on the notion of differentiability with respect to probability measures due to Lions (see e.g. [59] and Cardaliaguet's notes [21, Section 6]), and on Itô's formula along a flow of measures (see e.g. Carmona and Delarue [23, Proposition 6.5 and Proposition 6.3]). We will now provide some heuristic arguments to derive the HJB equations from our DPP results for both V^{B,◦}_S and V^◦_S, with updating function Φ_t(x) = x(t).
Let us first recall briefly the notion of the derivative, in the sense of Fréchet, ∂_νV(ν) of a function V : P_2(R^n) −→ R. Consider a probability space (Ω, F, P) rich enough so that, for any ν ∈ P_2(R^n), there exists a random variable Z : Ω −→ R^n such that L^P(Z) = ν. We denote by L²(Ω, F, P) the space of square–integrable random variables on (Ω, F, P). Given V : P_2(R^n) −→ R, we consider Ṽ : L²(Ω, F, P) −→ R, the lifted version of V, defined by Ṽ(X) := V(L^P(X)). Recall that Ṽ is said to be continuously Fréchet differentiable if there exists a unique continuous map DṼ : L²(Ω, F, P) −→ L²(Ω, F, P) such that, for all Z ∈ L²(Ω, F, P),

lim_{‖Y‖_2 → 0} ( Ṽ(Z + Y) − Ṽ(Z) − E[Y^⊤ DṼ(Z)] ) / ‖Y‖_2 = 0,

where ‖Y‖_2 := E[|Y|²]^{1/2} for any Y ∈ L²(Ω, F, P). We say that V is of class C¹ if Ṽ is continuously Fréchet differentiable, and denote, for any ν ∈ P_2(R^n), ∂_νV(ν)(Z) := DṼ(Z), P–a.s., for any Z ∈ L²(Ω, F, P) such that L^P(Z) = ν. Notice that one then has ∂_νV(ν) : R^n ∋ y ↦ ∂_νV(ν)(y) ∈ R^n, and this function belongs to L²(R^n, B(R^n), ν).
Besides, the law of DṼ(Z) is independent of the choice of Z. Similarly, we also define the derivatives

P_2(R^n)×R^n ∋ (ν, y) ↦ ∂_y∂_νV(ν)(y) ∈ S^n,

and

P_2(R^n)×R^n×R^n ∋ (ν, y, y′) ↦ ∂²_νV(ν)(y, y′) := ∂_ν(∂_νV(ν)(y))(y′) ∈ S^n.

In the following, we say that V is a 'smooth function' if all the above Fréchet derivatives are well defined and continuous.
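A quick finite–dimensional sanity check of this notion (ours, not from the paper): for the linear functional V(ν) = E^ν[φ], the lift is Ṽ(Z) = E[φ(Z)], so DṼ(Z) = φ′(Z) and ∂_νV(ν)(y) = φ′(y). On an empirical measure ν_N = (1/N)Σ_i δ_{x_i}, viewing V as a function of the atoms gives ∂_{x_i}V(ν_N) = (1/N)∂_νV(ν_N)(x_i), which can be verified by finite differences.

```python
import numpy as np

# V(nu) = E_nu[phi] with phi(y) = y^3, so d_nu V(nu)(y) = 3 y^2.
phi = lambda y: y ** 3
dphi = lambda y: 3 * y ** 2
V = lambda xs: float(np.mean(phi(xs)))   # V at the empirical measure of xs

rng = np.random.default_rng(1)
xs = rng.normal(size=50)
i, h = 7, 1e-6
bumped = xs.copy()
bumped[i] += h                           # perturb one atom of nu_N
fd = (V(bumped) - V(xs)) / h             # finite-difference partial derivative
lions = dphi(xs[i]) / len(xs)            # (1/N) * d_nu V(nu_N)(x_i)
assert abs(fd - lions) < 1e-4
```

This atom-wise identity is exactly why the Lions derivative governs the large-N behaviour of functions of empirical measures, which is the mechanism behind the Itô formula along a flow of measures used below.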

3.3.1 HJB equation for the common noise strong formulation
Let us consider the control problem V^{B,◦}_S and repeat the arguments of [66] in a heuristic way. Given a 'smooth function' V : [0,T]×P_2(R^n) −→ R, (t,ν) ∈ [0,T]×P_2(R^n), and γ ∈ Γ^B_S(t,ν), it follows from Itô's formula that, for s ∈ [t,T],

V(s, µ^γ_s) = V(t,ν) + ∫_t^s ∫_{R^n} ( ∂_tV(r, µ^γ_r) + ∂_νV(r, µ^γ_r)(y) · b(r, y, µ^γ_r⊗δ_{α^γ_r}, α^γ_r) ) µ^γ_r(dy) dr

+ (1/2) ∫_t^s ∫_{R^n} Tr[ ∂_x∂_νV(r, µ^γ_r)(y) (σσ^⊤ + σ_0σ_0^⊤)(r, y, µ^γ_r⊗δ_{α^γ_r}, α^γ_r) ] µ^γ_r(dy) dr

+ (1/2) ∫_t^s ∫_{(R^n)²} Tr[ ∂²_νV(r, µ^γ_r)(y, y′) σ_0(r, y, µ^γ_r⊗δ_{α^γ_r}, α^γ_r) σ_0(r, y′, µ^γ_r⊗δ_{α^γ_r}, α^γ_r)^⊤ ] µ^γ_r(dy) µ^γ_r(dy′) dr

+ ∫_t^s ( ∫_{R^n} ∂_νV(r, µ^γ_r)(y)^⊤ σ_0(r, y, µ^γ_r⊗δ_{α^γ_r}, α^γ_r) µ^γ_r(dy) ) dB^γ_r.   (3.9)

As γ ∈ Γ^B_S(t,ν), for Lebesgue–almost every r ∈ [t,T], α^γ_r is a measurable function of (B_u − B_t)_{u∈[t,r]}. Considering piecewise constant control processes, α^γ would be a deterministic constant on a small time horizon [t, t+ε]. Replacing V in (3.9) by V^{B,◦}_S and taking the supremum as in the DPP (3.3) (but over constant control processes), this leads to the Hamiltonian

H^B[V](t,ν) := sup_{u∈U} { ∫_{R^n} ( L + [V]¹ )(t, y, ν⊗δ_u, u) ν(dy) + ∫_{(R^n)²} [V]²(t, y, u, y′, u, ν⊗δ_u) ν(dy) ν(dy′) },

where, for any (r, y, u, y′, u′, ν̄) ∈ [0,T]×R^n×U×R^n×U×P(R^n×U),

[V]¹(r, y, ν̄, u) := ∂_νV(r, ν)(y) · b(r, y, ν̄, u) + (1/2) Tr[ ∂_x∂_νV(r, ν)(y) (σσ^⊤ + σ_0σ_0^⊤)(r, y, ν̄, u) ],

and

[V]²(r, y, u, y′, u′, ν̄) := (1/2) Tr[ ∂²_νV(r, ν)(y, y′) σ_0(r, y, ν̄, u) σ_0(r, y′, ν̄, u′)^⊤ ].

Heuristically, V^{B,◦}_S should satisfy the HJB equation

−∂_tV^{B,◦}_S(t,ν) − H^B[V^{B,◦}_S](t,ν) = 0, (t,ν) ∈ [0,T)×P_2(R^n), V^{B,◦}_S(T,·) = g(·).

We refer to [66] for a detailed rigorous proof of the fact that V^{B,◦}_S is a viscosity solution of the above HJB equation under some technical regularity conditions.

3.3.2 HJB equation for the general strong formulation

Similarly, for the control problem V^◦_S, we consider a strong control rule γ ∈ Γ^◦_S(t,ν), where, for Lebesgue–almost every r ∈ [t,T], the control process α^γ_r is a measurable function of (B_u − B_t)_{u∈[t,r]}, (W_u − W_t)_{u∈[t,r]} and X_{t∧·}. As the control process α^γ is adapted to the filtration generated by (X_{t∧·}, W^{γ,t}, B^{γ,t}), by considering adapted piecewise constant control processes, the control process on the first small interval [t, t+ε) should be a measurable function of X_{t∧·}. Similarly to Pham and Wei [67] in a setting without common noise, and in view of the Itô formula (3.9), this formally leads to the Hamiltonian

H[V](t,ν) := sup_{a∈L²_ν} { ∫_{R^n} ( L + [V]¹ )(t, y, ν◦(â)^{−1}, a(y)) ν(dy) + ∫_{(R^n)²} [V]²(t, y, a(y), y′, a(y′), ν◦(â)^{−1}) ν(dy) ν(dy′) },

where â : R^n ∋ x ↦ (x, a(x)) ∈ R^n×U, and L²_ν denotes the set of all ν–square–integrable functions a : (R^n, B(R^n), ν) −→ U. Heuristically, V^◦_S should be a solution of the HJB equation

−∂_tV^◦_S(t,ν) − H[V^◦_S](t,ν) = 0, (t,ν) ∈ [0,T)×P_2(R^n), V^◦_S(T,·) = g(·).

As explained above, the difference between the HJB equations for V^{B,◦}_S and V^◦_S comes mainly from the fact that the control process α^γ, for γ ∈ Γ^◦_S(t,ν), depends on the initial random variable, which in turn modifies the Hamiltonian appearing in the PDE. Finally, we also refer to Wu and Zhang [79] for a discussion of the McKean–Vlasov control problem in a non–Markovian framework without common noise.

4 Proofs of the main results

We now provide the proofs of our main DPP results in Theorems 3.1, 3.2 and 3.4, where a key ingredient is the measurable selection argument. We will first reformulate the control problems on an appropriate canonical space in Section 4.1, and then provide some technical lemmata for the problems formulated on the canonical space in Section 4.2, and finally give the proofs of the main results themselves in Section 4.3.

4.1 Reformulation of the control problems on the canonical space
4.1.1 Canonical space
In order to prove the dynamic programming results of Section 3, we first reformulate the controlled McKean–Vlasov SDE problems on an appropriate canonical space. This is going to be achieved in the usual way, that is to say by considering appropriately defined controlled martingale problems. Recall that n, d and ℓ are the dimensions of the spaces in which X, W and B take values, that U = U∪{∂}, and that π^{−1} maps R∪{∞, −∞} to U. Let us introduce a first canonical space by

Ω̂ := C^n×C×C^d×C^ℓ, with canonical process (X̂, Â, Ŵ, B̂), and α̂_t := π^{−1}( lim_{n→+∞} n(Â_t − Â_{0∨(t−1/n)}) ), t ∈ [0,T].

Denote by C([0,T], P(Ω̂)) the space of all continuous paths on [0,T] taking values in P(Ω̂), which is a Polish space for the uniform topology (see e.g. [2, Lemmata 3.97, 3.98, and 3.99]). We introduce a second canonical space by

Ω := Ω̂ × C([0,T], P(Ω̂)), with canonical process (X, A, W, B, µ̂) and canonical filtration F = (F_t)_{0≤t≤T}.

Let F := B(Ω) be the Borel σ–field on Ω. Notice that, for any t ∈ [0,T], µ̂_t is a probability measure on Ω̂; we then define two processes µ̄ = (µ̄_t)_{0≤t≤T} and µ = (µ_t)_{0≤t≤T} on Ω by

µ̄_t := µ̂_t ◦ (X̂_{t∧·}, α̂_t)^{−1}, µ_t := µ̂_t ◦ (X̂_{t∧·})^{−1}.

Define also the U–valued process α = (α_t)_{0≤t≤T} on Ω by

α_t := π^{−1}( lim_{n→+∞} n(A_t − A_{0∨(t−1/n)}) ), t ∈ [0,T].

Finally, for any t ∈ [0,T], we introduce the processes W^t := (W^t_s)_{s∈[0,T]} and B^t := (B^t_s)_{s∈[0,T]} by

B^t_s := B_{s∨t} − B_t, and W^t_s := W_{s∨t} − W_t, s ∈ [0,T],

and the filtration G^t := (G^t_s)_{0≤s≤T} by

G^t_s := {∅, Ω} if s ∈ [0,t), and G^t_s := σ((B^t_r, µ̂_r) : r ∈ [0,s]) if s ∈ [t,T].   (4.1)

4.1.2 Controlled martingale problems on the canonical space
We now reformulate the strong/weak control problems as controlled martingale problems on the canonical space Ω, where a control (term) can be considered as a probability measure on Ω. To this end, let us first introduce the corresponding generator. Given the coefficient functions b, σ and σ_0, for all (t, x, w, b, ν̄, u) ∈ [0,T]×C^n×C^d×C^ℓ×P(C^n×U)×U, let

b̄(t, (x, w, b), ν̄, u) := (b, 0_d, 0_ℓ)(t, x, ν̄, u),   (4.2)

and

ā(t, (x, w, b), ν̄, u) := ( M M^⊤ )(t, x, ν̄, u), with M := [ σ σ_0 ; I_{d×d} 0_{d×ℓ} ; 0_{ℓ×d} I_{ℓ×ℓ} ],   (4.3)

and then introduce the generator L: for all ϕ ∈ C²_b(R^{n+d+ℓ}),

L_tϕ(x, w, b, ν̄, u) := Σ_{i=1}^{n+d+ℓ} b̄_i(t, (x, w, b), ν̄, u) ∂_iϕ(x_t, w_t, b_t) + (1/2) Σ_{i,j=1}^{n+d+ℓ} ā_{i,j}(t, (x, w, b), ν̄, u) ∂²_{i,j}ϕ(x_t, w_t, b_t).
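In the simplest instance of this martingale-problem formulation — zero drift and identity diffusion, so that the canonical process is a Brownian motion — taking ϕ(x) = x² gives Lϕ ≡ 1, and the process S^ϕ_t = B_t² − t must be a martingale. The Monte Carlo sketch below (ours, illustrative only) tests the weaker property that its increments are orthogonal to an F_s–measurable test function.

```python
import numpy as np

# For standard Brownian motion and phi(x) = x^2, the generator gives
# L phi = 1, so S_t := B_t^2 - t should be a martingale.  We test
# E[(S_t - S_s) f(B_s)] = 0 for the bounded F_s-measurable f = sign.
rng = np.random.default_rng(3)
n, s, t = 200_000, 0.5, 1.0
Bs = rng.normal(0.0, np.sqrt(s), size=n)
Bt = Bs + rng.normal(0.0, np.sqrt(t - s), size=n)   # independent increment
S_s, S_t = Bs**2 - s, Bt**2 - t
err = np.mean((S_t - S_s) * np.sign(Bs))
assert abs(err) < 0.02      # zero up to Monte Carlo error
```

The local-martingale requirement in Definition 4.1 below is the same idea applied to the controlled generator L, with the localisation (4.5) handling the possibly unbounded coefficients.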

We next define a process |S| = (|S|_t)_{0≤t≤T} by

|S|_t := ∫_0^t ( |b̄| + |ā| )(s, X, W, B, µ̄_s, α_s) ds,

and then, for all ϕ ∈ C²_b(R^{n+d+ℓ}), let S^ϕ = (S^ϕ_t)_{t∈[0,T]} be defined by

S^ϕ_t := ϕ(X_t, W_t, B_t) − ∫_0^t L_sϕ(X, W, B, µ̄_s, α_s) ds, t ∈ [0,T],   (4.4)

where, for φ : [0,T] → R, ∫_0^t φ(s)ds := ∫_0^t φ⁺(s)ds − ∫_0^t φ⁻(s)ds with the convention ∞ − ∞ = −∞. Notice that on {|S|_T < ∞}, the process S^ϕ is R–valued. To localise the process S^ϕ, we also introduce, for each m ≥ 1,

τ_m := inf{ t : |S|_t ≥ m }, and S^{ϕ,m}_t := S^ϕ_{t∧τ_m} = S^ϕ_t 1_{τ_m≥t} + S^ϕ_{τ_m} 1_{τ_m<t}.   (4.5)

Notice that the process |S| is left–continuous, τ_m is an F–stopping time on Ω, and S^{ϕ,m} is an F–adapted uniformly bounded process.
Definition 4.1. Let (t, ν̂) ∈ [0,T]×P(Ω̂). A probability P on (Ω, F) is called a weak control rule with initial condition (t, ν̂) if
(i) the process α = (α_s)_{t≤s≤T} satisfies

P[α_s ∈ U] = 1, for Lebesgue–a.e. s ∈ [t,T], and E^P[ ∫_t^T ρ(u_0, α_s)^p ds ] < ∞;

(ii) the process µ̂ = (µ̂_s)_{0≤s≤T} satisfies

µ̂_s = P ◦ (X_{s∧·}, A_{s∧·}, W, B_{s∧·})^{−1} 1_{s∈[0,t]} + P^{G^t_T} ◦ (X_{s∧·}, A_{s∧·}, W, B_{s∧·})^{−1} 1_{s∈(t,T]}, P–a.s.,   (4.6)

with P ◦ (X_{t∧·}, A_{t∧·}, W_{t∧·}, B_{t∧·})^{−1} = ν̂(t);
(iii) E^P[‖X‖^p] < ∞, P[|S|_T < ∞] = 1, and the process S^ϕ is an (F, P)–local martingale on [t,T], for all ϕ ∈ C²_b(R^n×R^d×R^ℓ).
Given ν ∈ P(C^n), we denote by V̂(ν) the collection of all probability measures ν̂ ∈ P(Ω̂) such that ν̂ ◦ X̂^{−1} = ν, and let

P̂_W(t, ν̂) := { all weak control rules P with initial condition (t, ν̂) }, and P_W(t,ν) := ∪_{ν̂∈V̂(ν)} P̂_W(t, ν̂).

Remark 4.2. Let P ∈ P_W(t,ν) for some t ∈ [0,T] and ν ∈ P(C^n). Notice that, for s ∈ (t,T], µ̂_s is G^t_s–measurable; then by (4.6), one has

µ̂_s = P^{G^t_s} ◦ (X_{s∧·}, A_{s∧·}, W, B_{s∧·})^{−1}, P–a.s.

Further, as the canonical process (µ̂_s)_{s∈[0,T]} is continuous, it follows that

L^P(X_{t∧·}, A_{t∧·}, W, B_{t∧·}) = µ̂_t = lim_{s↘t} µ̂_s = lim_{s↘t} L^P(X_{s∧·}, A_{s∧·}, W, B_{s∧·} | G^t_T) = L^P(X_{t∧·}, A_{t∧·}, W, B_{t∧·} | G^t_T), P–a.s.

This implies that F_t ∨ σ(W) = σ(X_{t∧·}, A_{t∧·}, W, B_{t∧·}) is independent of G^t_T, which is consistent with the conditions in Definition 2.1.

Finally, under P, (µ̂_s)_{s∈[0,t]} is completely determined by ν̂(t). More precisely, one has

µ̂_t(dx, da, dw, db) = ∫_{Ω̂×C^d} δ_{(x′, a′, w′⊕_t w^⋆, b′)}(dx, da, dw, db) ν̂(t)(dx′, da′, dw′, db′) L^P(W^t)(dw^⋆), P–a.s.,

where W^t is an (F, P)–Brownian motion on [t,T], by the martingale problem property in Definition 4.1.
Definition 4.3. Let (t,ν) ∈ [0,T]×P_2(C^n). A probability P on (Ω, F) is called a strong control rule (resp. B–strong control rule) with initial condition (t,ν) if P ∈ P_W(t,ν) and moreover there exists some Borel measurable function φ : [0,T]×Ω^t −→ U (resp. φ : [0,T]×C^ℓ_{t,T} −→ U) such that

α_s = φ(s, X_{t∧·}, W^t_{s∧·}, B^t_{s∧·}) (resp. α_s = φ(s, B^t_{s∧·})), P–a.s., for all s ∈ [t,T].

Let us then denote by P_S(t,ν) (resp. P^B_S(t,ν)) the collection of all strong (resp. B–strong) control rules with initial condition (t,ν).

4.1.3 Equivalence of the reformulation
We now show that every strong/weak control (term) induces a strong/weak control rule on the canonical space, and that, conversely, any strong/weak control rule on the canonical space is induced by a strong/weak control (term).
Lemma 4.4. (i) Let t ∈ [0,T] and ν ∈ P(C^n). Then for every γ ∈ Γ_W(t,ν), one has

P̄^γ := P^γ ◦ (X^γ, A^γ, W^γ, B^γ, µ̂^γ)^{−1} ∈ P_W(t,ν).   (4.7)

Conversely, given P ∈ P_W(t,ν), there exists some γ ∈ Γ_W(t,ν) such that P^γ ◦ (X^γ, A^γ, W^γ, B^γ, µ̂^γ)^{−1} = P.
(ii) Let t ∈ [0,T], ν ∈ P_2(C^n), and let Assumption 2.8 hold true. Then for every γ ∈ Γ_S(t,ν) (resp. Γ^B_S(t,ν)), one has

P̄^γ := P^γ ◦ (X^γ, A^γ, W^γ, B^γ, µ̂^γ)^{−1} ∈ P_S(t,ν) (resp. P^B_S(t,ν)).   (4.8)

Conversely, given P ∈ P_S(t,ν) (resp. P^B_S(t,ν)), there exists some γ ∈ Γ_S(t,ν) (resp. Γ^B_S(t,ν)) such that P^γ ◦ (X^γ, A^γ, W^γ, B^γ, µ̂^γ)^{−1} = P.
Proof. (i) Let γ ∈ Γ_W(t,ν) and P̄^γ := P^γ ◦ (X^γ, A^γ, W^γ, B^γ, µ̂^γ)^{−1}. First, it is straightforward to check that

P^γ[α^γ_s ∈ U] = 1, for dt–a.e. s ∈ [t,T], E^{P^γ}[‖X^γ‖^p] < ∞ and E^{P^γ}[ ∫_t^T ρ(u_0, α^γ_s)^p ds ] < ∞.

Further, as the integrals in (2.4) are well defined, one has |S|_T < ∞, P̄^γ–a.s. Moreover, by Itô's formula, the process (S^ϕ_s)_{s∈[t,T]} defined in (4.4) is an (F, P̄^γ)–local martingale, for every ϕ ∈ C²_b(R^n×R^d×R^ℓ).
Next, notice that B^{γ,t} and µ̂^γ are adapted to G^γ; one has, for all (s, β, ψ) ∈ (t,T]×C_b(Ω̂)×C_b(C^ℓ×C([0,T]; P(Ω̂))),

E^{P̄^γ}[⟨β, µ̂_s⟩ ψ(B^t_{T∧·}, µ̂_{T∧·})] = E^{P^γ}[⟨β, µ̂^γ_s⟩ ψ(B^{γ,t}_{T∧·}, µ̂^γ_{T∧·})] = E^{P^γ}[⟨β, L^{P^γ}(X^γ_{s∧·}, A^γ_{s∧·}, W^γ, B^γ_{s∧·} | G^γ_T)⟩ ψ(B^{γ,t}_{T∧·}, µ̂^γ_{T∧·})]

= E^{P^γ}[β(X^γ_{s∧·}, A^γ_{s∧·}, W^γ, B^γ_{s∧·}) ψ(B^{γ,t}_{T∧·}, µ̂^γ_{T∧·})] = E^{P̄^γ}[β(X_{s∧·}, A_{s∧·}, W, B_{s∧·}) ψ(B^t_{T∧·}, µ̂_{T∧·})]

= E^{P̄^γ}[⟨β, L^{P̄^γ}(X_{s∧·}, A_{s∧·}, W, B_{s∧·} | G^t_T)⟩ ψ(B^t_{T∧·}, µ̂_{T∧·})].

This implies that µ̂_s = L^{P̄^γ}(X_{s∧·}, A_{s∧·}, W, B_{s∧·} | G^t_T), P̄^γ–a.s. for all s ∈ (t,T]. By the same argument, and using the fact that F^γ_t ∨ σ(W^γ) is independent of G^γ_T, one can easily check that µ̂_s = L^{P̄^γ}(X_{s∧·}, A_{s∧·}, W, B_{s∧·}) for s ∈ [0,t], and that P̄^γ ◦ (X_{t∧·})^{−1} = ν(t). This implies that P̄^γ ∈ P_W(t,ν).
Assume in addition that γ ∈ Γ_S(t,ν), so that α^γ is F^γ–predictable. Then there exists a Borel measurable function φ : [t,T]×Ω^t −→ U such that α^γ_s = φ(s, X^γ_{t∧·}, W^{γ,t}_{s∧·}, B^{γ,t}_{s∧·}), for all s ∈ [t,T], P^γ–a.s. (see e.g. Claisse, Talay, and Tan [30, Proposition 10]). This implies that α_s = φ(s, X_{t∧·}, W^t_{s∧·}, B^t_{s∧·}), P̄^γ–a.s. for all s ∈ [t,T], and it follows that P̄^γ ∈ P_S(t,ν).
(ii) Let P ∈ P_W(t,ν) for some ν ∈ P(C^n). By Stroock and Varadhan [77, Theorem 4.5.2], one knows that (W, B) are (F, P)–Brownian motions on [t,T], and

X_s = X_t + ∫_t^s b(r, X_·, µ̄_r, α_r) dr + ∫_t^s σ(r, X_·, µ̄_r, α_r) dW_r + ∫_t^s σ_0(r, X_·, µ̄_r, α_r) dB_r, P–a.s.

Moreover, with the filtration G^t defined in (4.1), and in view of Remark 4.2, it is straightforward to check that

γ := (Ω, F, F, P, G^t, X, W, B, µ, µ̂, α) ∈ Γ_W(t, ν).

If, in addition, P ∈ P_S(t, ν), then A is a (σ(X_{t∧r∧·}, W_{r∧·}, B_{r∧·}))_{r∈[t,T]}–adapted continuous process. Using Corollary A.4 and the fact that µ̂_s = L^P(X_{s∧·}, A_{s∧·}, W, B_{s∧·} | (B^t_{r∧·}, µ_{r∧·})_{r∈[t,s]}), P–a.s., for all s ∈ [t, T], one can deduce that µ̂_s = L^P(X_{s∧·}, A_{s∧·}, W, B_{s∧·} | (B^t_{r∧·})_{r∈[t,s]}), P–a.s., for all s ∈ [t, T]. Let G̃^t be the filtration generated by B^t, let F̃^t be the filtration generated by (X_{t∧·}, W^t, B^t), and let G̃^{t,P}, F̃^{t,P} be the corresponding P–augmented filtrations. Then µ̂ is

G̃^{t,P}–predictable, and X is F̃^{t,P}–predictable. It then follows that

γ′ := (Ω, F, F̃^{t,P}, P, G̃^{t,P}, X, W^t, B^t, µ, µ̂, α) ∈ Γ_S(t, ν).

(iii) Finally, the results related to P_S^B(t, ν) and Γ_S^B(t, ν) can be deduced by almost the same arguments as for P_S(t, ν) and Γ_S(t, ν).

Remark 4.5. From Lemma 4.4, we can easily deduce that, under Assumption 2.8, for P ∈ P_S(t, ν) or P ∈ P_S^B(t, ν), the canonical process µ̂ satisfies

µ̂_s = L^P(X_{s∧·}, A_{s∧·}, W, B_{s∧·} | B^t_{s∧·}) = L^P(X_{s∧·}, A_{s∧·}, W, B_{s∧·} | B^t), P–a.s., for all s ∈ [0, T].

A direct consequence of Lemma 4.4 is that we can now reformulate equivalently the weak/strong formulation of the McKean–Vlasov control problem on the canonical space.

Corollary 4.6. Let (t, ν) ∈ [0, T] × P(C^n). One has

V_W(t, ν) = sup_{P∈P_W(t,ν)} J(t, P), where J(t, P) := E^P[∫_t^T L(s, X, µ_s, α_s) ds + g(X, µ_T)]. (4.9)

Moreover, when Assumption 2.8 holds true and ν ∈ P_2(C^n), one has

V_S(t, ν) = sup_{P∈P_S(t,ν)} J(t, P), and V_S^B(t, ν) = sup_{P∈P_S^B(t,ν)} J(t, P).
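On the computational side, the reward functional J(t, P) in (4.9) is the expectation of a running reward plus a terminal reward along controlled paths, so once paths can be sampled it is estimated by plain Monte Carlo over a particle system. The sketch below is purely illustrative and not part of the paper's framework: the scalar toy model in which the measure argument enters only through its empirical mean, the constant control, and the quadratic rewards L(s, x, µ, a) = −(x² + a²) and g(x, µ) = −x² are all assumptions made for the example.

```python
import numpy as np

def simulate_particles(n_particles, n_steps, dt, alpha, rng):
    """Euler scheme for a toy mean-field SDE dX = (m_t - X_t + alpha) dt + dW,
    where m_t is replaced by the empirical mean of the particle system."""
    x = np.zeros(n_particles)
    paths = [x.copy()]
    for _ in range(n_steps):
        m = x.mean()  # empirical proxy for the mean of mu_t
        x = x + (m - x + alpha) * dt + np.sqrt(dt) * rng.standard_normal(n_particles)
        paths.append(x.copy())
    return np.array(paths)  # shape (n_steps + 1, n_particles)

def estimate_reward(paths, dt, alpha):
    """Monte Carlo estimate of J with the illustrative choices
    L(s, x, mu, a) = -(x**2 + a**2) and g(x, mu) = -x**2."""
    running = -(paths[:-1] ** 2 + alpha ** 2).sum(axis=0) * dt
    terminal = -(paths[-1] ** 2)
    return (running + terminal).mean()

rng = np.random.default_rng(0)
dt = 0.02
paths = simulate_particles(n_particles=2000, n_steps=50, dt=dt, alpha=0.0, rng=rng)
J_hat = estimate_reward(paths, dt=dt, alpha=0.0)
```

Each particle here stands for one sample of the canonical process X under a candidate P; averaging the pathwise reward over particles estimates J(t, P) for that fixed control, and the supremum in (4.9) would then be approximated by optimising over a parametrised family of controls.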

4.2 Technical lemmata

We provide in this section some technical results related to the sets P̂_W(t, ν̂).

Lemma 4.7. Both graph sets

JP̂_W K := {(t, ν̂, P) : P ∈ P̂_W(t, ν̂)} and JP_W K := {(t, ν, P) : P ∈ P_W(t, ν)}

are analytic subsets of, respectively, [0, T] × P(Ω̂) × P(Ω) and [0, T] × P(C^n) × P(Ω). Moreover, the value function

V_W : [0, T] × P(C^n) → R ∪ {−∞, +∞} is upper semi–analytic.

Proof. For 0 ≤ r ≤ s ≤ T, m ≥ 1, χ ∈ C_b(Ω̂), ϕ ∈ C_b^2(R^n × R^d × R^ℓ), φ ∈ C_b(Ω̂), ψ ∈ C_b(C^ℓ × C([0, T]; P(Ω̂))), we define

ξ_{r∧·} := χ(X_{r∧·}, A_{r∧·}, W_{r∧·}, B_{r∧·}, µ_{r∧·}),

and the following Borel measurable subsets of [0, T] × P(Ω̂) × P(Ω):

K^1 := {(t, ν̂, P) : ∫_t^T P[α_θ ∈ U] dθ = T − t, E^P[‖X‖^p] < ∞, E^P[∫_t^T ρ(u_0, α_θ)^p dθ] < ∞},
K^{2,m}_{r,s}[χ, ϕ] := {(t, ν̂, P) : E^P[S^{ϕ,m}_r ξ_{r∧·}] = E^P[S^{ϕ,m}_s ξ_{r∧·}]},
K^3_s[φ] := {(t, ν̂, P) : E^P[⟨φ, µ̂_{t∧s}⟩] − E^P[φ(X_{[t∧s]∧·}, A_{[t∧s]∧·}, W, B_{[t∧s]∧·})] = E^P[⟨φ, µ̂_t⟩] − ⟨φ, ν̂(t)⟩ = 0},
K^4_s[φ, ψ] := {(t, ν̂, P) : E^P[⟨φ, µ̂_{t∨s}⟩ψ(B^t, µ̂)] = E^P[φ(X_{[t∨s]∧·}, A_{[t∨s]∧·}, W, B_{[t∨s]∧·})ψ(B^t, µ̂)]}.

The above Borel measurable sets allow us to characterise the graph set JP̂_W K. Indeed, K^1 contains the probability measures on Ω such that the canonical element α takes its values in U and not in U ∪ {∂}; K^{2,m}_{r,s}[χ, ϕ] reduces the set P(Ω) to the set of probabilities on Ω that solve a (local) martingale problem; while the probabilities which satisfy the "fixed point property", i.e. for which the canonical process µ̂ is equal to the conditional distribution of the canonical process (X, A, W, B), are contained in K^3_s[φ] and K^4_s[φ, ψ]. Let us consider a countable dense subset X of tuples (r, s, m, χ, ϕ, φ, ψ) in

[0, T]^2 × N × C_b(Ω̂) × C_b^2(R^n × R^d × R^ℓ) × C_b(Ω̂) × C_b(C^ℓ × C([0, T]; P(Ω̂))),

where 0 ≤ r ≤ s ≤ T. By the above remarks, it is straightforward to check that

JP̂_W K = ⋂_X (K^1 ∩ K^{2,m}_{r,s}[χ, ϕ] ∩ K^3_s[φ] ∩ K^4_s[φ, ψ]),

and hence it is a Borel subset of [0, T] × P(Ω̂) × P(Ω). Furthermore, since P(Ω̂) ∋ ν̂ ↦ ν̂ ∘ (X)^{−1} ∈ P(C^n) is continuous, the set

JP_W K = {(t, ν, P) : (t, ν̂, P) ∈ JP̂_W K, ν̂ ∘ (X)^{−1} = ν}

is an analytic subset of [0, T] × P(C^n) × P(Ω). Finally, using the measurable selection theorem (see e.g. El Karoui and Tan [37, Proposition 2.17]), it follows that

V_W(t, ν) = sup_{(t,ν,P)∈JP_W K} J(t, P)

is upper semi–analytic, as desired.

We next prove a stability result w.r.t. the "conditioning" of P̂_W(t, ν̂).

Lemma 4.8. Let (t, ν̂) ∈ [0, T] × P(Ω̂), P ∈ P̂_W(t, ν̂), let τ̄ be a G^t–stopping time taking values in [t, T], and let (P^{G^t_τ̄}_ω̄)_{ω̄∈Ω} be a family of r.c.p.d. of P knowing G^t_τ̄. Then

P^{G^t_τ̄}_ω̄ ∈ P̂_W(τ̄(ω̄), µ̂_{τ̄(ω̄)}(ω̄)), for P–a.e. ω̄ ∈ Ω.

Proof. Let P ∈ P̂_W(t, ν̂). First, it is easy to check that, for P–a.e. ω̄ ∈ Ω, one has P^{G^t_τ̄}_ω̄[α_s ∈ U] = 1 for Lebesgue–almost every s ∈ [τ̄(ω̄), T], and E^{P^{G^t_τ̄}_ω̄}[∫_{τ̄(ω̄)}^T ρ(u_0, α_s)^p ds] < ∞.

Next, notice that for all s ∈ [0, T], β ∈ C_b(C^n × C × C^d × C^ℓ), ψ ∈ C_b(C^ℓ × C([0, T]; P(Ω̂))) and Z ∈ G^t_τ̄,

E^P[⟨β, µ̂_s⟩ψ(B^τ̄, µ̂)1_{Z∩{τ̄≤s}}] = E^P[β(X_{s∧·}, A_{s∧·}, W, B_{s∧·})ψ(B^τ̄, µ̂)1_{Z∩{τ̄≤s}}],

so that, for P–a.e. ω̄ ∈ Ω and any τ̄(ω̄) ≤ s ≤ T,

E^{P^{G^t_τ̄}_ω̄}[⟨β, µ̂_s⟩ψ(B^τ̄_{s∧·}, µ̂_{s∧·})] = E^{P^{G^t_τ̄}_ω̄}[β(X_{s∧·}, A_{s∧·}, W, B_{s∧·})ψ(B^τ̄_{s∧·}, µ̂_{s∧·})].

By considering a countable dense set of maps (β, ψ) ∈ C_b(C^n × C × C^d × C^ℓ) × C_b(C^ℓ × C([0, T]; P(Ω̂))), it follows that

µ̂_s = L^{P^{G^t_τ̄}_ω̄}(X_{s∧·}, A_{s∧·}, W, B_{s∧·} | G^{τ̄(ω̄)}_T), P^{G^t_τ̄}_ω̄–a.s., for all s ≥ τ̄(ω̄),

for P–a.e. ω̄ ∈ Ω. Similarly, one can prove that, for P–a.e. ω̄ and s ≤ τ̄(ω̄),

µ̂_s = L^{P^{G^t_τ̄}_ω̄}(X_{s∧·}, A_{s∧·}, W, B_{s∧·}), P^{G^t_τ̄}_ω̄–a.s.,

and hence, for P–a.e. ω̄ ∈ Ω,

µ̂_s = L^{P^{G^t_τ̄}_ω̄}(X_{s∧·}, A_{s∧·}, W, B_{s∧·})1_{s∈[0,τ̄(ω̄)]} + L^{P^{G^t_τ̄}_ω̄}(X_{s∧·}, A_{s∧·}, W, B_{s∧·} | G^{τ̄(ω̄)}_T)1_{s∈(τ̄(ω̄),T]}, P^{G^t_τ̄}_ω̄–a.s.

Finally, it is clear that for P–a.e. ω̄, one has E^{P^{G^t_τ̄}_ω̄}[‖X‖^p] < ∞ and P^{G^t_τ̄}_ω̄[|S|_T < ∞] = 1. Moreover, let ϕ ∈ C_b^2(R^n × R^d × R^ℓ), so that the localised process S^{ϕ,m} = S^ϕ_{τ_m∧·} is an (F, P)–martingale on [t, T]. Fix T ≥ r > s ≥ t, J ∈ F_s and K ∈ G^t_τ̄; we have

E^P[E^{P^{G^t_τ̄}}[S^ϕ_{τ_m∧r}1_J]1_{K∩{τ̄≤s}}] = E^P[S^ϕ_{τ_m∧r}1_{J∩K∩{τ̄≤s}}] = E^P[S^ϕ_{τ_m∧s}1_{J∩K∩{τ̄≤s}}] = E^P[E^{P^{G^t_τ̄}}[S^ϕ_{τ_m∧s}1_J]1_{K∩{τ̄≤s}}].

This implies that

E^{P^{G^t_τ̄}_ω̄}[S^ϕ_{τ_m∧r}1_J] = E^{P^{G^t_τ̄}_ω̄}[S^ϕ_{τ_m∧s}1_J], for P–a.e. ω̄.

By considering countably many (s, r, J), it follows that S^ϕ is an (F, P^{G^t_τ̄}_ω̄)–local martingale on [τ̄(ω̄), T] for P–a.e. ω̄ ∈ Ω. We hence conclude the proof.

We next provide a stability result for P̂_W under concatenation. For any constant M > 0, let us introduce

P^M_t := {P ∈ P(Ω) : E^P[‖X‖^p] + E^P[∫_t^T ρ(u_0, α_s)^p ds] ≤ M}, P^M_W(t, ν) := P_W(t, ν) ∩ P^M_t, P̂^M_W(t, ν̂) := P̂_W(t, ν̂) ∩ P^M_t,

and

V^M_W(t, ν) := sup_{P∈P^M_W(t,ν)} J(t, P).

Notice that V^M_W(t, ν) ↗ V_W(t, ν) as M ↗ ∞. Moreover, as in Lemma 4.7, the graph set

{(t, ν̂, M, P) : P ∈ P̂^M_W(t, ν̂)} is analytic, and (t, ν, M) ↦ V^M_W(t, ν) ∈ R ∪ {−∞, +∞} is upper semi–analytic. (4.10)

Lemma 4.9. Let t ∈ [0, T], ν̂_1, ν̂_2 ∈ P(Ω̂) and ν ∈ P(C^n) be such that ν̂_1 ∘ (X_{t∧·})^{−1} = ν̂_2 ∘ (X_{t∧·})^{−1} = ν(t). Then for all P_1 ∈ P̂_W(t, ν̂_1), there exists P_2 ∈ P̂_W(t, ν̂_2) satisfying

P_1 ∘ (X, A^t, W^t, B^t)^{−1} = P_2 ∘ (X, A^t, W^t, B^t)^{−1},

where A^t_· := A_{·∨t} − A_t, so that J(t, P_1) = J(t, P_2). Consequently, one has

V_W(t, ν) = sup_{P∈P̂_W(t,ν̂_1)} J(t, P), and V^M_W(t, ν) = sup_{P∈P̂^M_W(t,ν̂_1)} J(t, P).

The proof is almost the same as that of Lemma 3.10, and hence it is omitted.

Lemma 4.10. Let (t, ν) ∈ [0, T] × P(C^n), P ∈ P_W(t, ν), let τ̄ be a G^t–stopping time taking values in [t, T], and let ε > 0. Then there exists a family of probability measures (Q^ε_{t,ν̂,M})_{(t,ν̂,M)∈[0,T]×P(Ω̂)×R_+} such that (t, ν̂, M) ↦ Q^ε_{t,ν̂,M} is universally measurable, and for every (t, ν̂, M) s.t. P̂^M_W(t, ν̂) ≠ ∅, one has

Q^ε_{t,ν̂,M} ∈ P̂^M_W(t, ν̂) and J(t, Q^ε_{t,ν̂,M}) ≥ { V^M_W(t, ν) − ε, when V^M_W(t, ν) < ∞; 1/ε, when V^M_W(t, ν) = ∞ }, with ν := ν̂ ∘ X^{−1}. (4.11)

Moreover, there exists a P–integrable, G^t_τ̄–measurable r.v. M̂ : Ω → R_+ such that, for every constant M ≥ 0, one can find a probability measure P^{M,ε} ∈ P_W(t, ν) satisfying P^{M,ε}|_{F_τ̄} = P|_{F_τ̄} and

(Q^ε_{τ̄(ω̄),µ̂_{τ̄(ω̄)}(ω̄),M+M̂(ω̄)})_{ω̄∈Ω} is a version of the r.c.p.d. of P^{M,ε} knowing G^t_τ̄.

Proof. The existence of the family of probability measures (Q^ε_{t,ν̂,M})_{(t,ν̂,M)∈[0,T]×P(Ω̂)×R_+} satisfying (4.11) follows from (4.10) and Lemma 4.9, together with the measurable selection theorem (see e.g. [37, Proposition 2.21]).

With P ∈ P_W(t, ν), we consider a family of r.c.p.d. (P_ω̄)_{ω̄∈Ω} of P knowing G^t_τ̄, and define

M̂(ω̄) := E^{P_ω̄}[‖X‖^p + ∫_{τ̄}^T ρ(α_s, u_0)^p ds],

so that P_ω̄ ∈ P̂^{M̂(ω̄)}_W(τ̄(ω̄), µ̂_{τ̄(ω̄)}(ω̄)) for P–a.e. ω̄, by Lemma 4.8. In particular, P̂^{M̂(ω̄)}_W(τ̄(ω̄), µ̂_{τ̄(ω̄)}(ω̄)) is nonempty for P–a.e. ω̄ ∈ Ω. For a fixed constant M ≥ 0, let

Q^ε_ω̄ := Q^ε_{τ̄(ω̄),µ̂_{τ̄(ω̄)}(ω̄),M̂(ω̄)+M}.

Notice that, for P–a.e. ω̄ ∈ Ω,

µ̂_{τ̄(ω̄)}(ω̄) = P_ω̄ ∘ (X_{τ̄(ω̄)∧·}, A_{τ̄(ω̄)∧·}, W, B_{τ̄(ω̄)∧·})^{−1} = Q^ε_ω̄ ∘ (X_{τ̄(ω̄)∧·}, A_{τ̄(ω̄)∧·}, W, B_{τ̄(ω̄)∧·})^{−1},

and then

L^{Q^ε_ω̄}(X_{τ̄(ω̄)∧·}, A_{τ̄(ω̄)∧·}, W_{τ̄(ω̄)∧·}, B_{τ̄(ω̄)∧·}, µ̂_{τ̄(ω̄)∧·}) = L^{Q^ε_ω̄}(X_{τ̄(ω̄)∧·}, A_{τ̄(ω̄)∧·}, W_{τ̄(ω̄)∧·}, B_{τ̄(ω̄)∧·}) ⊗ L^{Q^ε_ω̄}(µ̂_{τ̄(ω̄)∧·})

= L^{P_ω̄}(X_{τ̄(ω̄)∧·}, A_{τ̄(ω̄)∧·}, W_{τ̄(ω̄)∧·}, B_{τ̄(ω̄)∧·}, µ̂_{τ̄(ω̄)∧·}). (4.12)

In particular, one has

Q^ε_ω̄[(B^t_{τ̄(ω̄)∧·}, µ̂_{τ̄(ω̄)∧·}) = ((ω̄^b)_{τ̄(ω̄)∧·}, ω̄^µ̂_{τ̄(ω̄)∧·})] = 1, for P–a.e. ω̄ = (ω̄^x, ω̄^a, ω̄^w, ω̄^b, ω̄^µ̂) ∈ Ω. (4.13)

Let us then define a probability measure P^{M,ε} on Ω by

P^{M,ε}[K] := ∫_Ω Q^ε_ω̄(K) P(dω̄), for all K ∈ F.

By (4.12), one has P^{M,ε} = P on F_τ̄, and moreover, (Q^ε_ω̄)_{ω̄∈Ω} is a family of r.c.p.d. of P^{M,ε} knowing G^t_τ̄. To conclude the proof, it is enough to check that P^{M,ε} ∈ P_W(t, ν).

First, it is clear that P^{M,ε}[α_s ∈ U] = 1, for Lebesgue–almost every s ∈ [t, T], and

E^{P^{M,ε}}[∫_t^T ρ(u_0, α_s)^p ds] ≤ E^P[∫_t^{τ̄} ρ(u_0, α_s)^p ds] + E^P[M̂] < ∞.

Next, for each β ∈ C_b(Ω̂), ψ ∈ C_b(C^ℓ × P(Ω̂)), h ∈ C_b(C^ℓ) and s ∈ [t, T], one has

E^{P^{M,ε}}[β(X_{s∧·}, A_{s∧·}, W, B_{s∧·})ψ(B^τ̄, µ̂)h(B^t_{τ̄∧·})] = ∫_Ω E^{Q^ε_ω̄}[β(X_{s∧·}, A_{s∧·}, W, B_{s∧·})ψ(B^τ̄, µ̂)h(B^t_{τ̄∧·})] P(dω̄)
 = ∫_Ω E^{Q^ε_ω̄}[β(X_{s∧·}, A_{s∧·}, W, B_{s∧·})ψ(B^τ̄, µ̂)] h(B^t_{τ̄(ω̄)∧·}(ω̄)) P(dω̄)
 = ∫_Ω E^{Q^ε_ω̄}[E^{Q^ε_ω̄}[β(X_{s∧·}, A_{s∧·}, W, B_{s∧·}) | G^{τ̄(ω̄)}_T]ψ(B^τ̄, µ̂)] h(B^t_{τ̄(ω̄)∧·}(ω̄)) P(dω̄)
 = ∫_Ω E^{Q^ε_ω̄}[⟨β, µ̂_s⟩ψ(B^τ̄, µ̂)] h(B^t_{τ̄(ω̄)∧·}(ω̄)) P(dω̄)
 = ∫_Ω E^{Q^ε_ω̄}[⟨β, µ̂_s⟩ψ(B^τ̄, µ̂)h(B^t_{τ̄∧·})] P(dω̄)
 = E^{P^{M,ε}}[⟨β, µ̂_s⟩ψ(B^τ̄, µ̂)h(B^t_{τ̄∧·})],

where the second and fifth equalities are due to Equation (4.13), and the fourth follows from the fact that Q^ε_ω̄ ∈ P̂_W(τ̄(ω̄), µ̂_{τ̄(ω̄)}(ω̄)). Noticing that B^t_u = B^t_{τ̄∧u} + B^τ̄_u for any u ∈ [t, T], the above equality implies that

µ̂_s = L^{P^{M,ε}}(X_{s∧·}, A_{s∧·}, W, B_{s∧·} | G^t_T), P^{M,ε}–a.s.

Finally, we easily check that P^{M,ε}[|S|_T < ∞] = 1 and E^{P^{M,ε}}[‖X‖^p] ≤ E^P[‖X‖^p] + M < ∞. For a fixed test function ϕ ∈ C_b^2(R^{n+d+ℓ}), we consider the localised stopping times τ_m defined in (4.5) and τ^ω̄_k(ω̄′) := τ̄(ω̄) ∨ τ_k(ω̄′) for each ω̄ ∈ Ω. We know that τ^ω̄_k ≤ τ^ω̄_{k+1} for any k ∈ N, that τ^ω̄_k → ∞ as k → ∞, and that (S^ϕ_{s∧τ^ω̄_k})_{s∈[τ̄(ω̄),T]} is an (F, Q^ε_ω̄)–martingale for all k ∈ N. Notice that for all s ∈ [t, T] and A ∈ F_s, the map

ω̄ ↦ E^{Q^ε_ω̄}[S^ϕ_{s∧τ_m∧τ^ω̄_k}1_A 1_{s>τ̄(ω̄)}] is G^t_τ̄–measurable.

Then for s ≤ r ≤ T,

E^{P^{M,ε}}[S^ϕ_{s∧τ_m}1_A] = E^{P^{M,ε}}[S^ϕ_{s∧τ_m}1_A1_{s≤τ̄}] + E^{P^{M,ε}}[S^ϕ_{s∧τ_m}1_A1_{s>τ̄}]
 = E^P[S^ϕ_{s∧τ_m}1_A1_{s≤τ̄}] + lim_{k→∞} ∫_Ω E^{Q^ε_ω̄}[S^ϕ_{s∧τ_m∧τ^ω̄_k}1_A1_{s>τ̄(ω̄)}] P(dω̄) = E^P[S^ϕ_{s∧τ_m}1_A1_{s≤τ̄}] + E^{P^{M,ε}}[S^ϕ_{r∧τ_m}1_A1_{s>τ̄}]
 = E^P[S^ϕ_{τ̄∧τ_m}1_A1_{s≤τ̄}1_{r<τ̄}] + E^P[S^ϕ_{τ̄∧τ_m}1_A1_{s≤τ̄}1_{τ̄≤r}] + E^{P^{M,ε}}[S^ϕ_{r∧τ_m}1_A1_{s>τ̄}]
 = E^{P^{M,ε}}[S^ϕ_{r∧τ_m}1_A1_{s≤τ̄}1_{r<τ̄}] + E^{P^{M,ε}}[S^ϕ_{τ̄∧τ_m}1_A1_{s≤τ̄}1_{τ̄≤r}] + E^{P^{M,ε}}[S^ϕ_{r∧τ_m}1_A1_{s>τ̄}]
 = E^{P^{M,ε}}[S^ϕ_{r∧τ_m}1_A1_{s≤τ̄}1_{r<τ̄}] + E^{P^{M,ε}}[S^ϕ_{r∧τ_m}1_A1_{s≤τ̄}1_{τ̄≤r}] + E^{P^{M,ε}}[S^ϕ_{r∧τ_m}1_A1_{s>τ̄}] = E^{P^{M,ε}}[S^ϕ_{r∧τ_m}1_A],

which means that (S^ϕ_u)_{u∈[t,T]} is an (F, P^{M,ε})–local martingale, and hence P^{M,ε} ∈ P_W(t, ν).

4.3 Proof of the main results

4.3.1 Proof of Theorem 3.1

First, V_W is upper semi–analytic by Lemma 4.7. Further, let τ̄ be a G^t–stopping time taking values in [t, T]. It follows from Lemma 4.8 that, for every P ∈ P_W(t, ν),

J(t, P) = E^P[∫_t^{τ̄} L(s, X_{s∧·}, µ_s, α_s) ds + E^P[∫_{τ̄}^T L(s, X_{s∧·}, µ_s, α_s) ds + g(X_{T∧·}, µ_T) | G^t_{τ̄}]]
 ≤ E^P[∫_t^{τ̄} L(s, X_{s∧·}, µ_s, α_s) ds + V_W(τ̄, µ_{τ̄})]
 ≤ sup_{P′∈P_W(t,ν)} E^{P′}[∫_t^{τ̄} L(s, X_{s∧·}, µ_s, α_s) ds + V_W(τ̄, µ_{τ̄})].

Notice that a G^t–stopping time on Ω can be considered as a G^⋆–stopping time τ^⋆ on Ω^⋆. Then, by the way τ^γ is defined from τ^⋆ in (3.1), and by Lemma 4.4, we obtain the inequality

V_W(t, ν) ≤ sup_{γ∈Γ_W(t,ν)} E^{P^γ}[∫_t^{τ^γ} L(s, X^γ_{s∧·}, µ^γ_s, α^γ_s) ds + V_W(τ^γ, µ^γ_{τ^γ})]. (4.14)

We now consider the reverse inequality, for which one can assume w.l.o.g. that

V_W(t, ν) < ∞, and sup_{P′∈P_W(t,ν)} E^{P′}[∫_t^{τ̄} L(s, X_{s∧·}, µ_s, α_s) ds + V_W(τ̄, µ_{τ̄})] > −∞. (4.15)

Let P ∈ P_W(t, ν) be a weak control rule. Then by Lemma 4.10, for some G^t_τ̄–measurable, P–integrable r.v. M̂ : Ω → R_+, one has a family of probability measures (P^{M,ε})_{M≥0} in P_W(t, ν) such that

E^P[∫_t^{τ̄} L(s, X_{s∧·}, µ_s, α_s) ds + (V^{M+M̂(ω̄)}_W(τ̄, µ_{τ̄}) − ε)1_{V^{M+M̂(ω̄)}_W(τ̄,µ_{τ̄})<∞}] + (1/ε)P[V^{M+M̂(ω̄)}_W(τ̄, µ_{τ̄}) = ∞]
 ≤ E^P[∫_t^{τ̄} L(s, X_{s∧·}, µ_s, α_s) ds + E^{Q^ε_{τ̄,µ̂_{τ̄},M̂+M}}[∫_{τ̄}^T L(s, X_{s∧·}, µ_s, α_s) ds + g(X_{T∧·}, µ_T)]]
 = E^{P^{M,ε}}[∫_t^T L(s, X_{s∧·}, µ_s, α_s) ds + g(X_{T∧·}, µ_T)] ≤ V_W(t, ν).

If P[V^{M+M̂}_W(τ̄, µ_{τ̄}) = ∞] > 0 for some M ≥ 0, then letting ε → 0 one finds V_W(t, ν) = ∞, in contradiction with (4.15). When P[V^{M+M̂}_W(τ̄, µ_{τ̄}) = ∞] = 0 for all M ≥ 0, let M → ∞ and then take the supremum over all P ∈ P_W(t, ν); it follows that

sup_{P′∈P_W(t,ν)} E^{P′}[∫_t^{τ̄} L(s, X_{s∧·}, µ_s, α_s) ds + V_W(τ̄, µ_{τ̄})] − ε ≤ V_W(t, ν).

Since ε > 0 is arbitrary, and again by the way τ^γ is defined from τ^⋆ (equivalent to τ̄ on Ω) and Lemma 4.4, we can conclude the proof together with (4.14).
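The dynamic programming principle proved here writes the value at time t as a supremum of running reward up to a stopping time plus the value at that time. In a time-discretised setting this becomes a backward recursion, which the following self-contained sketch illustrates on a toy problem; everything in it is an assumption made for illustration (a scalar state on a grid, a finite control set, deterministic dynamics x′ = x + a·dt, and reward −(x² + a²)), not the paper's McKean–Vlasov model.

```python
import numpy as np

# Toy deterministic control problem solved by backward dynamic programming:
# V(t, x) = max_a [ L(x, a) * dt + V(t + dt, x + a * dt) ],  V(T, x) = g(x).
dt, n_steps = 0.1, 10
xs = np.linspace(-2.0, 2.0, 81)        # state grid (includes x = 0 exactly)
actions = np.linspace(-1.0, 1.0, 21)   # finite control set

def L(x, a):
    return -(x ** 2 + a ** 2)          # running reward

def g(x):
    return -x ** 2                     # terminal reward

V = g(xs)                              # value at the terminal time
for _ in range(n_steps):               # backward in time
    Q = np.empty((len(actions), len(xs)))
    for i, a in enumerate(actions):
        x_next = np.clip(xs + a * dt, xs[0], xs[-1])
        Q[i] = L(xs, a) * dt + np.interp(x_next, xs, V)  # one-step DPP
    V = Q.max(axis=0)                  # optimise over the control

value_at_zero = V[len(xs) // 2]        # V(0, x = 0)
```

Starting from x = 0 the running cost can be kept at zero by choosing a = 0 at every step, so the recursion returns V(0, 0) = 0 up to grid interpolation error; in the McKean–Vlasov setting the state x would be replaced by the (conditional) law µ, which is what makes the measurability and selection arguments above necessary.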

4.3.2 Proof of Theorem 3.4

Let (t, ν) ∈ [0, T] × P_2(C^n) and let Assumption 3.3 hold. By [34, Theorem 3.1] (letting p̂ = p = 2 in their Assumption 2.1), one has V_S(t, ν) = V_W(t, ν). Therefore V_S : [0, T] × P_2(C^n) → R ∪ {−∞, +∞} has the same measurability as V_W : [0, T] × P_2(C^n) → R ∪ {−∞, +∞}.

Next, let τ be a G^{t,◦}–stopping time on (Ω^t, F^t, P^t_ν) taking values in [t, T], and denote τ^γ := τ(B^{γ,t}) for γ ∈ Γ_S(t, ν). Then, by the formulation equivalence result in Proposition 2.10, the DPP result (3.4) is equivalent to

V_S(t, ν) = sup_{γ∈Γ_S(t,ν)} E^{P^γ}[∫_t^{τ^γ} L(s, X^γ_{s∧·}, µ^γ_s, α^γ_s) ds + V_S(τ^γ, µ^γ_{τ^γ})].

E^{P^γ}[sup_{s∈[0,T]} |X^γ_s|^2] < ∞.

Then by Lemma 4.8 and the fact that VS = VW , it follows that

V_S(t, ν) = sup_{γ∈Γ_S(t,ν)} J(t, γ) = sup_{γ∈Γ_S(t,ν)} E^{P^γ}[∫_t^{τ^γ} L(s, X^γ_{s∧·}, µ^γ_s, α^γ_s) ds + ∫_{τ^γ}^T L(s, X^γ_{s∧·}, µ^γ_s, α^γ_s) ds + g(X^γ_{T∧·}, µ^γ_T)]
 ≤ sup_{γ∈Γ_S(t,ν)} E^{P^γ}[∫_t^{τ^γ} L(s, X^γ_{s∧·}, µ^γ_s, α^γ_s) ds + V_S(τ^γ, µ^γ_{τ^γ})].

Further, by Theorem 3.1, we have

V_S(t, ν) = V_W(t, ν) = sup_{γ∈Γ_W(t,ν)} E^{P^γ}[∫_t^{τ^γ} L(s, X^γ_{s∧·}, µ^γ_s, α^γ_s) ds + V_W(τ^γ, µ^γ_{τ^γ})]
 ≥ sup_{γ∈Γ_S(t,ν)} E^{P^γ}[∫_t^{τ^γ} L(s, X^γ_{s∧·}, µ^γ_s, α^γ_s) ds + V_S(τ^γ, µ^γ_{τ^γ})],

and hence the proof is concluded.

4.3.3 Proof of Theorem 3.2

In this part, we use the results and techniques of the proof of Theorem 3.1 to show the DPP for V_S^B. We start by proving the universal measurability of V_S^B. To this end, we consider an equivalent formulation of V_S^B, which is more appropriate for our purpose.

4.3.3.1 An equivalent reformulation for V_S^B. Let Ω^⋆ := C^ℓ be the canonical space with canonical process B^⋆, and let P^⋆ be the Wiener measure, under which B^⋆ is an ℓ–dimensional standard Brownian motion. Let F^⋆ = (F^⋆_t)_{t∈[0,T]} be the canonical filtration. Recall that we consider a fixed Borel map π : U ∪ {∂} → R ∪ {−∞, +∞}. We denote by U the set of F^⋆–predictable processes θ taking values in R such that E^{P^⋆}[∫_0^T θ_t^2 dt] < ∞. Define a metric d^⋆ on U by

d^⋆(η, θ) := E^{P^⋆}[∫_0^T (η_t − θ_t)^2 dt], for all (η, θ) ∈ U × U,

so that (U, d^⋆) is a Polish space (see e.g. Brezis [17, Theorems 4.8 and 4.13]). Next, let θ ∈ U, and define A^θ_t := ∫_0^t θ_s ds, t ∈ [0, T]. We then consider the map Υ : U → P(C^ℓ × C) defined by

Υ(θ) := P^⋆ ∘ (B^⋆_·, A^{θ(B^⋆_·)}_·)^{−1}, θ ∈ U.

Let us introduce, for all t ∈ [0, T], ν ∈ P_2(C^n) and ν̂ ∈ P_2(Ω̂) such that ν = ν̂ ∘ X^{−1},

P^⋆_S(t, ν) := {P ∈ P_W(t, ν) : P ∘ (B_·, A_·)^{−1} ∈ Υ(U), and B_{t∧·} is P–independent of (B^t, A)}, and

P̂^⋆_S(t, ν̂) := P̂_W(t, ν̂) ∩ P^⋆_S(t, ν).

Lemma 4.11. Let (t, ν) ∈ [0, T] × P_2(C^n) and ν̂ ∈ P(Ω̂) be such that ν̂ ∘ X^{−1} = ν. Then under Assumption 2.8, one has P^⋆_S(t, ν) ⊆ P_S^B(t, ν) and

V_S^B(t, ν) = sup_{P∈P^⋆_S(t,ν)} J(t, P). (4.16)

Proof. First, take γ ∈ Γ_S^B(t, ν). W.l.o.g., we can assume that there exists an independent Brownian motion B̃ in the space (Ω^γ, F^γ, P^γ), and let B^{⋆,γ}_· := B^γ_{t∨·} − B^γ_t + B̃_{t∧·}; then

γ′ := (Ω^γ, F^γ, P^γ, F^γ, G^γ, X^γ, W^γ, B^{⋆,γ}, µ^γ, µ̂^γ, α^γ) ∈ Γ_S^B(t, ν).

Recall that α^γ is G^γ–predictable and G^γ is the augmented filtration generated by B^{γ,t}; then for some Borel function φ : [t, T] × C^ℓ → U, one has α^γ_s = φ(s, B^{γ,t}_{s∧·}), s ∈ [t, T], P^γ–a.s. Let A^{γ′}_· := ∫_0^· π(φ(s, B^{γ,t}_{s∧·})) ds and let µ̂^{γ′} be defined as in (2.7). It follows that P′ := P^γ ∘ (X^γ, A^{γ′}, W^γ, B^γ, µ̂^{γ′})^{−1} ∈ P_W(t, ν) satisfies P′ ∘ (B, A)^{−1} ∈ Υ(U) and J(t, P′) = J(t, γ). Then J(t, γ) = J(t, P′) ≤ sup_{P∈P^⋆_S(t,ν)} J(t, P), and hence V_S^B(t, ν) ≤ sup_{P∈P^⋆_S(t,ν)} J(t, P).

Next, given P ∈ P^⋆_S(t, ν), since P ∘ (B, A)^{−1} ∈ Υ(U), there exists θ^⋆ ∈ U such that P ∘ (B_·, A_·)^{−1} = P^⋆ ∘ (B^⋆_·, A^{θ^⋆(B^⋆_·)}_·)^{−1}. Thus π(α_s(ω̄)) = θ^⋆_s(B_{s∧·}(ω̄)), for dP ⊗ dt–a.e. (s, ω̄) ∈ [t, T] × Ω. As P ∈ P_W(t, ν), we know that P[α_s ∈ U] = 1 for dt–a.e. s ∈ [0, T]; therefore π(α_s(ω̄)) = θ^⋆_s(B_{s∧·}(ω̄)) ∈ π(U) and α_s(ω̄) = π^{−1}(θ^⋆_s(B_{s∧·}(ω̄))) ∈ U, for dP ⊗ dt–a.e. (s, ω̄) ∈ [0, T] × Ω. Further, since (B^t, A) is P–independent of B_{t∧·}, it follows that there is a Borel measurable function φ : [0, T] × C^ℓ → R such that A_s = φ(s, B^t_{s∧·}), s ∈ [0, T], P–a.s., and therefore P ∈ P_S^B(t, ν). This implies that P^⋆_S(t, ν) ⊆ P_S^B(t, ν), and the equality (4.16).

We are now ready to prove the measurability of V_S^B.

Lemma 4.12. The graph sets

JP^⋆_S K := {(t, ν, P) ∈ [0, T] × P_2(C^n) × P(Ω) : P ∈ P^⋆_S(t, ν)}, and JP̂^⋆_S K := {(t, ν̂, P) : P ∈ P̂^⋆_S(t, ν̂)},

are analytic sets in, respectively, [0, T] × P_2(C^n) × P(Ω) and [0, T] × P_2(Ω̂) × P(Ω). Consequently, V_S^B : [0, T] × P_2(C^n) → R ∪ {−∞, +∞} is upper semi–analytic.

Proof. We will only consider the case of P^⋆_S, as the proof is almost the same for P̂^⋆_S. First, notice that

Υ : U → P(C^ℓ × C)

is continuous and injective, so that Υ(U) is a Borel subset of P(C^ℓ × C) (see e.g. Kechris [50, Theorem 15.1]). It follows that

D^1 := {(t, ν, P) ∈ [0, T] × P_2(C^n) × P(Ω) : P ∘ (B_·, A_·)^{−1} ∈ Υ(U)}

is a Borel subset of [0, T] × P(C^n) × P(Ω), as the map

Γ_1 : [0, T] × P_2(C^n) × P(Ω) ∋ (t, ν, P) ↦ P ∘ (B_·, A_·)^{−1} ∈ P(C^ℓ × C)

is Borel measurable. Similarly,

D^2 := {(t, ν, P) ∈ [0, T] × P_2(C^n) × P(Ω) : B_{t∧·} is P–independent of (B^t, A, µ̂)}

is also a Borel subset of [0, T] × P(C^n) × P(Ω). Indeed, for all (h, ψ) ∈ C_b(C^ℓ) × C_b(C^ℓ × C), the function

Γ_{h,ψ} : [0, T] × P_2(C^n) × P(Ω) ∋ (t, ν, P) ↦ E^P[h(B_{t∧·})ψ(B^t, A)] − E^P[h(B_{t∧·})]E^P[ψ(B^t, A)] ∈ R

is continuous. By considering a countable dense subset R ⊂ C_b(C^ℓ) × C_b(C^ℓ × C), it follows that

D^2 = ⋂_{(h,ψ)∈R} Γ^{−1}_{h,ψ}({0})

is a Borel set. Finally, notice that

JP^⋆_S K = JP_W K ∩ D^1 ∩ D^2,

and we then conclude the proof by Lemma 4.7 and Lemma 4.11.

Recall that for each M > 0, the set P^M_t is defined in Section 4.2. We similarly introduce, for (t, ν) ∈ [0, T] × P_2(C^n) and ν̂ ∈ P_2(Ω̂) such that ν = ν̂ ∘ X^{−1},

P_S^{B,M}(t, ν) := P_S^B(t, ν) ∩ P^M_t, P^{⋆,M}_S(t, ν) := P^⋆_S(t, ν) ∩ P^M_t, P̂_S^{B,M}(t, ν̂) := P̂_S^B(t, ν̂) ∩ P^M_t, P̂^{⋆,M}_S(t, ν̂) := P̂^⋆_S(t, ν̂) ∩ P^M_t.

By Lemma 4.9, it is clear that

V_S^{B,M}(t, ν) := sup_{P∈P_S^{B,M}(t,ν)} J(t, P) = sup_{P∈P^{⋆,M}_S(t,ν)} J(t, P) = sup_{P∈P̂_S^{B,M}(t,ν̂)} J(t, P) = sup_{P∈P̂^{⋆,M}_S(t,ν̂)} J(t, P) ↗ V_S^B(t, ν), as M ↗ ∞.

Lemma 4.13. (i) Let (t, ν) ∈ [0, T] × P_2(C^n), P ∈ P_S^B(t, ν), let τ̄ be a G^t–stopping time taking values in [t, T], and let (P^{G^t_τ̄}_ω̄)_{ω̄∈Ω} be a family of r.c.p.d. of P knowing G^t_τ̄. Then P^{G^t_τ̄}_ω̄ ∈ P_S^B(τ̄(ω̄), µ̂_{τ̄(ω̄)}(ω̄)), for P–a.e. ω̄ ∈ Ω.

(ii) The graph set {(t, ν̂, M, P) : P ∈ P̂^{⋆,M}_S(t, ν̂)} is analytic. Further, let (t, ν) ∈ [0, T] × P_2(C^n), P ∈ P_S^B(t, ν), let τ̄ be a G^t–stopping time taking values in [t, T], and let ε > 0. Then there exists a family of probability measures (Q^ε_{t,ν̂,M})_{(t,ν̂,M)∈[0,T]×P(Ω̂)×R_+} such that (t, ν̂, M) ↦ Q^ε_{t,ν̂,M} is universally measurable, and for every (t, ν̂, M) s.t. P̂_S^{B,M}(t, ν̂) ≠ ∅, one has

Q^ε_{t,ν̂,M} ∈ P̂_S^{B,M}(t, ν̂), and J(t, Q^ε_{t,ν̂,M}) ≥ { V_S^{B,M}(t, ν) − ε, when V_S^{B,M}(t, ν) < ∞; 1/ε, when V_S^{B,M}(t, ν) = ∞ }, for ν = ν̂ ∘ X^{−1}. (4.17)

Moreover, there is a G^t_τ̄–measurable and P–integrable r.v. M̂ : Ω → R_+ such that for every constant M > 0, there exists P^{M,ε} ∈ P_S^B(t, ν) such that P^{M,ε}|_{F_τ̄} = P|_{F_τ̄} and

(Q^ε_{τ̄(ω̄),µ̂_{τ̄(ω̄)}(ω̄),M+M̂(ω̄)})_{ω̄∈Ω} is a version of the r.c.p.d. of P^{M,ε} knowing G^t_τ̄.

Proof. (i) Let P ∈ P_S^B(t, ν); then there exists a Borel measurable function φ : [t, T] × C^ℓ → U such that

α_s = φ(s, B^t_{s∧·}), for all s ∈ [t, T], P–a.s.

Let us consider the concatenated path (ω̄ ⊗_t w̄)_s := ω̄_{t∧s} + w̄_{s∨t} − w̄_t and define a Borel measurable function φ^ω̄ by

φ^ω̄(s, w^b) := φ(s, ω̄^b ⊗_{τ̄(ω̄)} w^b), for s ∈ [τ̄(ω̄), T], ω̄ = (ω̄^x, ω̄^a, ω̄^w, ω̄^b, ω̄^µ̂), w = (w^x, w^a, w^w, w^b, w^µ̂) ∈ Ω.

Then by a classical conditioning argument, it is easy to check that for P–a.e. ω¯b ∈ Ω, b

α_s = φ^ω̄(s, B^{τ̄(ω̄)}_{s∧·}), for all s ∈ [τ̄(ω̄), T], P^{G^t_τ̄}_ω̄–a.s.

Using Lemma 4.8 and Definition 4.3, it follows that P^{G^t_τ̄}_ω̄ ∈ P_S^B(τ̄(ω̄), µ̂_{τ̄(ω̄)}(ω̄)), for P–a.e. ω̄ ∈ Ω.

(ii) Using Lemma 4.12, it is easy to see that the graph set {(t, ν̂, M, P) : P ∈ P̂^{⋆,M}_S(t, ν̂)} is analytic. Then one can apply the same arguments as in Lemma 4.10 to obtain a measurable family (Q^ε_{t,ν,M})_{(t,ν,M)∈[0,T]×P(C^n)×R_+} such that

Q^ε_{t,ν,M} ∈ P^{⋆,M}_S(t, ν) and J(t, Q^ε_{t,ν,M}) ≥ (V_S^{B,M}(t, ν) − ε)1_{V_S^{B,M}(t,ν)<∞} + (1/ε)1_{V_S^{B,M}(t,ν)=∞}.

To proceed, we will define a family (Q^ε_{t,ν̂,M})_{(t,ν̂,M)∈[0,T]×P(Ω̂)×R_+} from the family (Q^ε_{t,ν,M})_{(t,ν,M)∈[0,T]×P(C^n)×R_+} as follows. For all (t, ν̂) ∈ [0, T] × P(Ω̂), let ν := ν̂ ∘ X^{−1}. Then on the probability space (Ω, F_T, Q^ε_{t,ν,M}), we consider an F_t–measurable random element (A′_s, W′_s, B′_s)_{s∈[0,t]} such that

Q^ε_{t,ν,M} ∘ (X_{t∧·}, A′_{t∧·}, W′_{t∧·}, B′_{t∧·})^{−1} = ν̂(t).

Define

A′_s := A′_t + A_s − A_t, W′_s := W′_t + W^t_s, B′_s := B′_t + B^t_s, for s ∈ [t, T],

µ̂′_s := L^{Q^ε_{t,ν,M}}(X_{s∧·}, A′_{s∧·}, W′, B′_{s∧·})1_{s∈[0,t]} + L^{Q^ε_{t,ν,M}}(X_{s∧·}, A′_{s∧·}, W′, B′_{s∧·} | G_T)1_{s∈(t,T]}.

Let

Q^ε_{t,ν̂,M} := Q^ε_{t,ν,M} ∘ (X, A′, W′, B′, µ̂′)^{−1},

so that J(t, Q^ε_{t,ν̂,M}) = J(t, Q^ε_{t,ν,M}), and hence (4.17) is satisfied.

Let P ∈ P_S^B(t, ν). As in Lemma 4.10, for M > 0, we let

M̂(ω̄) := E^P[‖X‖^p + ∫_{τ̄}^T ρ(α_s, u_0)^p ds | G^t_τ̄](ω̄), and Q^ε_ω̄ := Q^ε_{τ̄(ω̄),µ̂_{τ̄(ω̄)}(ω̄),M̂(ω̄)+M}.

Again, as in the proof of Lemma 4.10, Q^ε_ω̄ satisfies (4.12) and (4.13), which allows defining P^{M,ε} by

P^{M,ε}[K] := ∫_Ω Q^ε_ω̄[K] P(dω̄), for all K ∈ F,

so that P^{M,ε} ∈ P_W(t, ν), P^{M,ε} = P on F_τ̄, and (Q^ε_ω̄)_{ω̄∈Ω} is a family of r.c.p.d. of P^{M,ε} knowing G^t_τ̄.

Finally, it is enough to prove that P^{M,ε} ∈ P_S^B(t, ν). Let s ∈ [t, T], h ∈ C_b(R) and ψ ∈ C_b(C^ℓ); then

E^{P^{M,ε}}[A_s h(A_s)ψ(B^t_{s∧·})1_{s>τ̄}] = E^{P^{M,ε}}[E^{Q^ε_·}[A_s h(A_s)ψ(B^t_{s∧·})]1_{s>τ̄}]
 = E^{P^{M,ε}}[E^{Q^ε_·}[E^{Q^ε_·}[A_s | B^t_{s∧·}]h(A_s)ψ(B^t_{s∧·})]1_{s>τ̄}]

= E^{P^{M,ε}}[E^{P^{M,ε}}[E^{P^{M,ε}}[A_s | G^t_τ̄ ∨ σ(B^t_{s∧·})]h(A_s)ψ(B^t_{s∧·}) | G^t_τ̄]1_{s>τ̄}]
 = E^{P^{M,ε}}[E^{P^{M,ε}}[A_s | B^t_{s∧·}]h(A_s)ψ(B^t_{s∧·})1_{s>τ̄}],

where the second equality follows from the fact that Q^ε_ω̄ ∈ P_S^B(τ̄(ω̄), ν′) for some ν′ ∈ P(C^n), and hence h(A_s) = h(φ(B^{τ̄(ω̄)}_{s∧·})), Q^ε_ω̄–a.s., for some Borel measurable function φ, and the last equality follows from the fact that, on {s > τ̄}, E^{P^{M,ε}}[A_s | G^t_τ̄ ∨ σ(B^t_{s∧·})] = E^{P^{M,ε}}[A_s | B^t_{s∧·}]. Further, as P^{M,ε}|_{F_τ̄} = P|_{F_τ̄} and P ∈ P_S^B(t, ν), one can use a similar argument to find that

E^{P^{M,ε}}[A_s h(A_s)ψ(B^t_{s∧·})1_{s≤τ̄}] = E^{P^{M,ε}}[E^{P^{M,ε}}[A_s | B^t_{s∧·}]h(A_s)ψ(B^t_{s∧·})1_{s≤τ̄}].

This implies that

E^{P^{M,ε}}[(A_s − E^{P^{M,ε}}[A_s | B^t_{s∧·}])h(A_s)ψ(B^t_{s∧·})] = 0, and hence A_s = E^{P^{M,ε}}[A_s | B^t_{s∧·}], P^{M,ε}–a.s.

In other words, A is a continuous process, adapted to the P^{M,ε}–augmented filtration generated by B^t; hence there exists a Borel measurable function φ̂ : [t, T] × C^ℓ → U such that A_s = φ̂(s, B^t_{s∧·}), for all s ∈ [t, T], P^{M,ε}–a.s., and therefore P^{M,ε} ∈ P_S^B(t, ν), which concludes the proof.

4.3.3.2 Proof of Theorem 3.2. The proof is almost the same as that of Theorem 3.1. First, one has the measurability of V_S^B by Lemma 4.12. Next, notice that a G^{t,◦}–stopping time τ on Ω^t can be considered as a special G^t–stopping time τ̄ on Ω. Then, using the conditioning argument in Lemma 4.13, it follows that

V_S^B(t, ν) ≤ sup_{P∈P_S^B(t,ν)} E^P[∫_t^τ L(s, X_{s∧·}, µ_s, α_s) ds + V_S^B(τ, µ_τ)].

Finally, it is enough to use the concatenation argument in Lemma 4.13, sending M → ∞, to obtain the reverse inequality

V_S^B(t, ν) ≥ sup_{P∈P_S^B(t,ν)} E^P[∫_t^τ L(s, X_{s∧·}, µ_s, α_s) ds + V_S^B(τ, µ_τ)].

A Some technical results on controlled McKean–Vlasov SDEs

Let us first recall a technical optional projection result.

Lemma A.1. Let E be a Polish space, and let (Ω, F, P) be a complete probability space, equipped with a complete filtration G := (G_t)_{t≥0}.

(i) Given an E–valued measurable process (X_t)_{t∈[0,T]}, there exists a P(E)–valued G–optional process β such that β_τ = L^P(X_τ | G_τ), P–a.s., for all G–stopping times τ.

(ii) Assume in addition that X is a continuous process, and that the G–optional σ–field is identical to the G–predictable

σ–field. Then one can choose β to be an a.s. continuous process.

Proof. (i) The existence of such a process β is ensured by, e.g., Kurtz [51, Theorem A.3] or Yor [81, Proposition 1].

(ii) When X is a continuous process, it follows again by [51, Theorem A.3] (or [81, Proposition 1]) that β is càdlàg, P–a.s. Further, let ϕ ∈ C_b(E) and let (τ_n)_{n≥1} be an increasing sequence of uniformly bounded G–stopping times². One has ⟨ϕ, β_{τ_n}⟩ = E^P[ϕ(X_{τ_n}) | G_{τ_n}], P–a.s., and hence lim_{n→∞} E^P[⟨ϕ, β_{τ_n}⟩] = E^P[⟨ϕ, β_{lim_n τ_n}⟩]. Then it follows by Dellacherie [31, Theorem IV–T24] that (⟨ϕ, β_t⟩)_{t∈[0,T]} is left–continuous, P–a.s. By considering a countable dense family of functions ϕ in C_b(E), one concludes that β is also left–continuous a.s.
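Lemma A.1 provides a measure-valued optional projection: a process β_t representing the conditional law of X_t given G_t. Numerically, when G is generated by a common noise B, this conditional law is typically approximated by the empirical measure of many particles driven by independent idiosyncratic noises but one shared path of B. The sketch below is illustrative only; the toy dynamics X^i_T = B_T + W^i_T are an assumption, chosen because the conditional law of X_T given B is then exactly N(B_T, T).

```python
import numpy as np

rng = np.random.default_rng(42)
n_particles, n_steps, dt = 20_000, 100, 0.01
t_final = n_steps * dt  # = 1.0

# One shared (common-noise) path B and independent idiosyncratic noises W^i:
# with X^i_T = B_T + W^i_T, the conditional law of X_T given B is N(B_T, T).
dB = np.sqrt(dt) * rng.standard_normal(n_steps)            # shared increments
dW = np.sqrt(dt) * rng.standard_normal((n_steps, n_particles))
B_T = dB.sum()
X_T = B_T + dW.sum(axis=0)                                 # particles at time T

# Empirical (particle) estimate of the conditional law of X_T given B:
cond_mean_hat = X_T.mean()   # should be close to B_T
cond_var_hat = X_T.var()     # should be close to t_final = 1.0
```

The empirical measure of (X^i_T)_i plays the role of β_T here: it conditions on the realised common path B while averaging out the idiosyncratic noises W^i.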

Let (Ω, F, P) be a complete probability space, and let F = (F_s)_{s≥0} be a complete filtration, supporting two independent F–Brownian motions W^⋆ and B^⋆, which are respectively R^d– and R^ℓ–valued. Let us fix an R^n–valued, F–adapted continuous process (ξ_s)_{s≥0}, a U–valued F–predictable process (α_s)_{s≥0}, and a complete sub–filtration G = (G_s)_{s≥0} of F, and denote W^{⋆,t}_s := W^⋆_{s∨t} − W^⋆_t and B^{⋆,t}_s := B^⋆_{s∨t} − B^⋆_t. We will study the following SDE with data (t, ξ, α, G): X_s = ξ_s for all s ∈ [0, t], and, with µ_r := L^P(X_{r∧·}, α_r | G_r),

X_s = ξ_t + ∫_t^s b(r, X_{r∧·}, µ_r, α_r) dr + ∫_t^s σ(r, X_{r∧·}, µ_r, α_r) dW^⋆_r + ∫_t^s σ_0(r, X_{r∧·}, µ_r, α_r) dB^⋆_r, for all s ≥ t, P–a.s. (A.1)

Definition A.2. A strong solution of SDE (A.1), with data (t, ξ, α, G), on [0, T], is an R^n–valued F–adapted continuous process X = (X_t)_{t≥0} such that E[sup_{s∈[0,T]} |X_s|^2] < ∞ and (A.1) holds true.

Theorem A.3. Let 0 ≤ t ≤ T, let Assumption 2.8 hold true, and let E[sup_{s∈[0,t]} |ξ_s|^p] < ∞ and E[∫_t^T ρ(u_0, α_s)^p ds] < ∞ for some p ≥ 2. Then

(i) there exists a unique strong solution X^{t,ξ,α} of (A.1) on [0, T] with data (t, ξ, α, G). Moreover, it holds that E[sup_{s∈[0,T]} |X^{t,ξ,α}_s|^p] < ∞;

(ii) assume in addition that (ξ_{t∧·}, W^⋆, B^⋆_{t∧·}) is independent of G, that B^{⋆,t} is G–adapted, and that there exists a Borel measurable function φ : [0, T] × C^n × C^d × C^ℓ → U such that

α_s = φ(s, ξ_{t∧·}, W^{⋆,t}_{s∧·}, B^{⋆,t}_{s∧·}), P–a.s., for all s ∈ [0, T].

Then, with A_s := ∫_t^{s∨t} π(α_r) dr, there exists a continuous process (µ̂_t)_{t∈[0,T]} such that, for all s ∈ [0, T],

µ̂_s = L^P(X^{t,ξ,α}_{s∧·}, A_{s∧·}, W^⋆, B^⋆_{s∧·} | G_T) = L^P(X^{t,ξ,α}_{s∧·}, A_{s∧·}, W^⋆, B^⋆_{s∧·} | G_s) = L^P(X^{t,ξ,α}_{s∧·}, A_{s∧·}, W^⋆, B^⋆_{s∧·} | B^{⋆,t}_{s∧·}), P–a.s.

Proof. (i) We follow [77, Theorem 5.1.1] to prove the existence and uniqueness of a strong solution to (A.1). Let S^p be defined by

S^p := {Y := (Y_s)_{s∈[0,T]} : R^n–valued, F–adapted, continuous process such that E^P[‖Y‖^p_T] < ∞},

where ‖x‖_s := sup_{r∈[0,s]} |x_r| for s ∈ [0, T] and x ∈ C^n. For all Y ∈ S^p, we define, with µ_s := L^P(Y_{s∧·}, α_s | G_s), the process Ψ(Y) := (Ψ(Y)_s)_{0≤s≤T} by

Ψ(Y)_s := ξ_{t∧s} + ∫_t^{t∨s} b(r, Y_{r∧·}, µ_r, α_r) dr + ∫_t^{t∨s} σ(r, Y_{r∧·}, µ_r, α_r) dW^⋆_r + ∫_t^{t∨s} σ_0(r, Y_{r∧·}, µ_r, α_r) dB^⋆_r.

Then, for (Y^1, Y^2) ∈ S^p × S^p, with µ^i_s := L^P(Y^i_{s∧·}, α_s | G_s), i ∈ {1, 2}, one has

E^P[‖Ψ(Y^1) − Ψ(Y^2)‖^p_s] ≤ 3^p E^P[sup_{v∈[t,s]} |∫_t^v (σ(r, Y^1_{r∧·}, µ^1_r, α_r) − σ(r, Y^2_{r∧·}, µ^2_r, α_r)) dW^⋆_r|^p]
 + 3^p E^P[sup_{v∈[t,s]} |∫_t^v (σ_0(r, Y^1_{r∧·}, µ^1_r, α_r) − σ_0(r, Y^2_{r∧·}, µ^2_r, α_r)) dB^⋆_r|^p]
 + 3^p E^P[|∫_t^s (b(r, Y^1_{r∧·}, µ^1_r, α_r) − b(r, Y^2_{r∧·}, µ^2_r, α_r)) dr|^p].

Notice that, for all r ∈ [t, T],

W_2(µ^1_r, µ^2_r)^p ≤ W_p(µ^1_r, µ^2_r)^p = W_p(L^P((Y^1_{r∧·}, α_r) | G_r), L^P((Y^2_{r∧·}, α_r) | G_r))^p ≤ E^P[‖Y^1_{r∧·} − Y^2_{r∧·}‖^p | G_r].

²which is a G–predictable time as soon as the G–optional σ–field is identical to the G–predictable σ–field.

Then by the Burkholder–Davis–Gundy inequality, Jensen's inequality and Assumption 2.8, there is some constant C_T > 0 such that

E^P[‖Ψ(Y^1) − Ψ(Y^2)‖^p_s] ≤ C_T ∫_t^s E^P[‖Y^1 − Y^2‖^p_r] dr. (A.2)

Besides, by Assumption 2.8,

E^P[‖Ψ(0)‖^p_T] ≤ C(1 + E^P[sup_{r∈[0,t]} |ξ_r|^p] + E^P[∫_t^T ρ(u_0, α_r)^p dr]).

Then, by taking Y^2 = 0, (A.2) implies that Ψ(Y) ∈ S^p whenever Y ∈ S^p. Moreover, for any positive integer n,

E^P[‖Ψ^n(Y^1) − Ψ^n(Y^2)‖^p_s] ≤ C_T ∫_t^s E^P[‖Ψ^{n−1}(Y^1) − Ψ^{n−1}(Y^2)‖^p_r] dr
 ≤ (C_T)^2 ∫_t^s ∫_t^r E^P[‖Ψ^{n−2}(Y^1) − Ψ^{n−2}(Y^2)‖^p_v] dv dr
 ≤ (C_T)^n ∫ 1_{s≥v_1≥v_2≥···≥v_n≥t} E^P[‖Y^1 − Y^2‖^p_{v_n}] dv_1 ⋯ dv_n
 ≤ (C_T)^n E^P[‖Y^1 − Y^2‖^p_s] (s − t)^n / n!.

Let Y ∈ S^p, X^0 := Y, and X^n := Ψ^n(Y), for n ≥ 1. It follows that

E^P[‖X^n − X^{n+1}‖^p_s] ≤ (C_T)^n E^P[‖Y − Ψ(Y)‖^p_s] (s − t)^n / n!, and hence ∑_{n≥1} E^P[‖X^n − X^{n+1}‖^p_T] < ∞,

which implies that the sequence (X^n)_{n≥1} converges uniformly, P–a.s., to some X ∈ S^p. Finally, it is straightforward to see that X is the unique strong solution of (A.1) with data (t, ξ, α, G).

(ii) Let ν := P ∘ (ξ_{t∧·})^{−1}. We recall that the canonical space Ω^t := C^n_{0,t} × C^d_{t,T} × C^ℓ_{t,T} was introduced in Section 2.2.2, with corresponding canonical processes ζ = (ζ_s)_{0≤s≤t}, W = (W_s)_{t≤s≤T} and B = (B_s)_{t≤s≤T}, and filtrations F^{t,◦}, G^{t,◦}. Moreover, under P^t_ν, the processes W^t_s := W_{s∨t} − W_t and B^t_s := B_{s∨t} − B_t are standard Brownian motions on [t, T] independent of ζ, and F^t and G^t are the P^t_ν–augmented filtrations of F^{t,◦} and G^{t,◦}.

Define α_s := φ(s, ζ_{t∧·}, W^t_{s∧·}, B^t_{s∧·}), for all s ∈ [t, T]. There exists a unique solution Y^α of Equation (A.1) on Ω^t, associated with (t, ζ_{t∧·}, α, G^t). As Y^α is an F^t–adapted continuous process, there exists a Borel measurable function Ψ : [0, T] × C^n × C^d × C^ℓ → R^n such that

Y^α_s = Ψ_s(ζ_{t∧·}, W^t_{s∧·}, B^t_{s∧·}), s ∈ [0, T], P^t_ν–a.s.

Next, on the probability space (Ω, F, P), let us define

X^α_s := Ψ_s(ξ_{t∧·}, W^{⋆,t}_{s∧·}, B^{⋆,t}_{s∧·}).

Then it is clear that X^α is the unique solution, on (Ω, F, F, P), of Equation (A.1) associated with (t, ξ_{t∧·}, α, F^{B^{⋆,t}}), where F^{B^{⋆,t}} := (F^{B^{⋆,t}}_s)_{s∈[0,T]} is the P–augmented filtration generated by B^{⋆,t}. Moreover, as (ξ_{t∧·}, W^⋆, B^⋆_{t∧·}) is independent of G, and B^{⋆,t} is G–adapted, one has, for all s ∈ [t, T],

µ_s = L^P(X^α_{s∧·}, α_s | B^{⋆,t}_{s∧·}) = L^P((Ψ_{s∧·}(ξ_{t∧·}, W^{⋆,t}_{s∧·}, B^{⋆,t}_{s∧·}), φ(s, ξ_{t∧·}, W^{⋆,t}_{s∧·}, B^{⋆,t}_{s∧·})) | B^{⋆,t}_{s∧·})
 = L^P((Ψ_{s∧·}(ξ_{t∧·}, W^{⋆,t}_{s∧·}, B^{⋆,t}_{s∧·}), φ(s, ξ_{t∧·}, W^{⋆,t}_{s∧·}, B^{⋆,t}_{s∧·})) | G_s)

= L^P(X^α_{s∧·}, α_s | G_s), P–a.s.

This implies that X^α is also a solution of Equation (A.1) associated with (t, ξ_{t∧·}, α, G), and hence X^α = X^{t,ξ,α} by uniqueness of the solution to Equation (A.1).

Further, as A_s is a Borel measurable function of (ξ_{t∧·}, W^{⋆,t}_{s∧·}, B^{⋆,t}_{s∧·}), it follows by the same argument that, for all s ∈ [0, T],

µ̂_s := L^P(X^{t,ξ,α}_{s∧·}, A_{s∧·}, W^⋆, B^⋆_{s∧·} | G_T) = L^P(X^{t,ξ,α}_{s∧·}, A_{s∧·}, W^⋆, B^⋆_{s∧·} | G_s) = L^P(X^{t,ξ,α}_{s∧·}, A_{s∧·}, W^⋆, B^⋆_{s∧·} | B^{⋆,t}_{s∧·}), P–a.s.

Finally, using Lemma A.1, one can choose the process µ̂ to be continuous.
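The fixed-point construction behind Theorem A.3 also suggests the standard numerical treatment of (A.1): replace the conditional law µ_r by the empirical measure of an interacting particle system sharing one path of the common noise, and discretise in time by an Euler scheme. The following sketch is illustrative only; the linear coefficients b(x, m) = m − x, σ = σ₀ = 1, the absence of a control, and the interaction through the empirical mean are all assumptions made for the example, not the paper's general coefficients.

```python
import numpy as np

def euler_mckean_vlasov(n_particles, n_steps, dt, rng):
    """Euler scheme for a toy conditional McKean-Vlasov SDE
        dX_t = (m_t - X_t) dt + dW_t + dB_t,
    where m_t approximates the mean of the conditional law of X_t given the
    common noise B, and the W^i are independent idiosyncratic noises."""
    x = np.zeros(n_particles)
    for _ in range(n_steps):
        m = x.mean()  # empirical proxy for the conditional mean given B
        dW = np.sqrt(dt) * rng.standard_normal(n_particles)  # idiosyncratic
        dB = np.sqrt(dt) * rng.standard_normal()             # common, shared
        x = x + (m - x) * dt + dW + dB
    return x

rng = np.random.default_rng(7)
x_T = euler_mckean_vlasov(n_particles=5000, n_steps=200, dt=0.005, rng=rng)
```

With these coefficients the spread of each particle around the empirical mean is an Ornstein–Uhlenbeck process driven by W alone (the common increments dB cancel in the centred dynamics), so the cross-sectional variance at T = 1 should be close to (1 − e^{−2})/2 ≈ 0.432, which gives a simple sanity check on the scheme.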

b 28 Let us consider the following system of SDE, where a solution is a couple of F–adapted continuous processes (X, µ) n d ℓ P 2 2 such that: for some ν0 ∈P(C ×C×C ×C ), E kXk + W2(µ, ν0) < ∞, Xs = ξs for s ∈ [0,t], and s s  s b ⋆ ⋆ Xs = ξt + b r,b Xr∧·, µr, αr dr + σ r, Xr∧·, µr, αr dWrb+b σ0 r, Xr∧·, µr, αr dBr , s ∈ [t,T ], P–a.s., (A.3) Zt Zt Zt    P ∗,t s∧t P ⋆ ⋆ t,∗ and µr := L (Xr∧·, αr|Br∧·, µr∧·) for all r ≥ t, and with As := t π(αr)dr, µs = L Xs∧·, As∧·, W ,Bs∧· Bs∧·, µs∧· for all s ≥ 0, and finally (ξ , W ⋆,B⋆ ) is independent of (B∗,t, µ). t∧· t∧· R 

Corollary A.4. Let Assumption 2.8 hold true, and assume that there exists a Borel measurable function $\phi:[0,T]\times\mathcal{C}^n\times\mathcal{C}^d\times\mathcal{C}^\ell\longrightarrow U$ such that
\[
\alpha_s=\phi\big(s,\xi_{t\wedge\cdot},W^{\star,t}_{s\wedge\cdot},B^{\star,t}_{s\wedge\cdot}\big),\;\text{for }\mathrm{d}\mathbb{P}\otimes\mathrm{d}t\text{--a.e. }(s,\omega)\in[t,T]\times\Omega,\quad\text{and}\quad\mathbb{E}\bigg[\int_t^T\rho(u_0,\alpha_s)^2\,\mathrm{d}s\bigg]<\infty.
\]
Then, Equation (A.3) has a unique solution $(X,\widehat{\mu})$, where $X$ is the strong solution of Equation (A.1) with data $(t,\xi,\alpha,\mathbb{F}^{B^{\star,t}})$, with $\mathbb{F}^{B^{\star,t}}$ being the $\mathbb{P}$--augmented filtration generated by $B^{\star,t}$, and
\[
\widehat{\mu}_s=\mathcal{L}^{\mathbb{P}}\big(X_{s\wedge\cdot},A_{s\wedge\cdot},W^\star,B^\star_{s\wedge\cdot}\,\big|\,B^{\star,t}_{s\wedge\cdot}\big),\;s\in[t,T],\;\mathbb{P}\text{--a.s.}
\]

Proof. Given a solution $(X,\widehat{\mu})$ to Equation (A.3), we notice that $X$ is a strong solution of Equation (A.1) associated with data $(t,\xi,\alpha,\mathbb{F}^{B^{\star,t},\widehat{\mu}})$, where $\mathbb{F}^{B^{\star,t},\widehat{\mu}}:=(\mathcal{F}^{B^{\star,t},\widehat{\mu}}_s)_{s\in[0,T]}$ with $\mathcal{F}^{B^{\star,t},\widehat{\mu}}_s:=\sigma(B^{\star,t}_{s\wedge\cdot},\widehat{\mu}_{s\wedge\cdot})$. As $(\xi_{t\wedge\cdot},W^\star,B^\star_{t\wedge\cdot})$ is independent of $(B^{\star,t},\widehat{\mu})$, it is then enough to apply Theorem A.3 to conclude that $X$ is the strong solution of Equation (A.1) with data $(t,\xi,\alpha,\mathbb{F}^{B^{\star,t}})$.

References

[1] B. Acciaio, J. Backhoff-Veraguas, and R. Carmona. Generalized McKean–Vlasov (mean field) control: a stochastic maximum principle and a transport perspective. arXiv preprint arXiv:1802.05754, 2018.
[2] C. Aliprantis and K. Border. Infinite dimensional analysis: a hitchhiker’s guide. Springer–Verlag Berlin Heidelberg, 3rd edition, 2006.
[3] D. Andersson and B. Djehiche. A maximum principle for SDEs of mean–field type. Applied Mathematics & Optimization, 63(3):341–356, 2011.
[4] K. Bahlali, M. Mezerdi, and B. Mezerdi. Existence of optimal controls for systems governed by mean–field stochastic differential equations. Afrika Statistika, 9(1):627–645, 2014.
[5] K. Bahlali, M. Mezerdi, and B. Mezerdi. Existence and optimality conditions for relaxed mean–field stochastic control problems. Systems & Control Letters, 102:1–8, 2017.
[6] K. Bahlali, M. Mezerdi, and B. Mezerdi. On the relaxed mean–field stochastic control problem. Stochastics and Dynamics, 18(3):1850024, 2018.
[7] E. Bandini, A. Cosso, M. Fuhrman, and H. Pham. Randomization method and backward SDEs for optimal control of partially observed path–dependent stochastic systems. The Annals of Applied Probability, to appear.
[8] M. Basei and H. Pham. A weak martingale approach to linear–quadratic McKean–Vlasov stochastic control problems. Journal of Optimization Theory and Applications, 181(2):347–382, 2019.
[9] E. Bayraktar, A. Cosso, and H. Pham. Randomized dynamic programming principle and Feynman–Kac representation for optimal control of McKean–Vlasov dynamics. Transactions of the American Mathematical Society, 370(3):2115–2160, 2018.
[10] A. Bensoussan, J. Frehse, and S. Yam. Mean field games and mean field type control theory, volume 101 of SpringerBriefs in mathematics. Springer–Verlag New York, 2013.
[11] A. Bensoussan, J. Frehse, and S. Yam. The master equation in mean field theory. Journal de Mathématiques Pures et Appliquées, 103(6):1441–1474, 2015.

[12] A. Bensoussan, J. Frehse, and S. Yam. On the interpretation of the master equation. Stochastic Processes and their Applications, 127(7):2093–2137, 2017.
[13] D. Bertsekas and S. Shreve. Stochastic optimal control: the discrete time case, volume 139 of Mathematics in science and engineering. Academic Press New York, 1978.
[14] T. Björk and A. Murgoci. A theory of Markovian time–inconsistent stochastic control in discrete time. Finance and Stochastics, 18(3):545–592, 2014.
[15] T. Björk, M. Khapko, and A. Murgoci. On time–inconsistent stochastic control in continuous time. Finance and Stochastics, 21(2):331–360, 2017.
[16] B. Bouchard, B. Djehiche, and I. Kharroubi. Quenched mass transport of particles towards a target. arXiv preprint arXiv:1707.07869, 2017.
[17] H. Brezis. Functional analysis, Sobolev spaces and partial differential equations. Springer–Verlag New York, 2011.
[18] G. Brunick and S. Shreve. Mimicking an Itô process by a solution of a stochastic differential equation. The Annals of Applied Probability, 23(4):1584–1628, 2013.
[19] R. Buckdahn, B. Djehiche, and J. Li. A general stochastic maximum principle for SDEs of mean–field type. Applied Mathematics & Optimization, 64(2):197–216, 2011.
[20] R. Buckdahn, J. Li, and J. Ma. A mean–field stochastic control problem with partial observations. The Annals of Applied Probability, 27(5):3201–3245, 2017.
[21] P. Cardaliaguet. Notes on mean field games (from P.-L. Lions’ lectures at Collège de France). Lecture given at Tor Vergata, April–May 2010, 2010.
[22] P. Cardaliaguet, F. Delarue, J.-M. Lasry, and P.-L. Lions. The master equation and the convergence problem in mean field games, volume 201 of Annals of mathematics studies. Princeton University Press, 2019.
[23] R. Carmona and F. Delarue. The master equation for large population equilibriums. In D. Crisan, B. Hambly, and T. Zariphopoulou, editors, Stochastic analysis and applications 2014: in honour of Terry Lyons, volume 100 of Springer proceedings in mathematics & statistics, pages 77–128. Springer, 2014.
[24] R. Carmona and F. Delarue. Forward–backward stochastic differential equations and controlled McKean–Vlasov dynamics. The Annals of Probability, 43(5):2647–2700, 2015.
[25] R. Carmona and F. Delarue. Probabilistic theory of mean field games with applications I, volume 83 of Probability theory and stochastic modelling. Springer International Publishing, 2018.
[26] R. Carmona and X. Zhu. A probabilistic approach to mean field games with major and minor players. The Annals of Applied Probability, 26(3):1535–1580, 2016.
[27] R. Carmona, F. Delarue, and A. Lachapelle. Control of McKean–Vlasov dynamics versus mean field games. Mathematics and Financial Economics, 7(2):131–166, 2013.
[28] R. Carmona, F. Delarue, and D. Lacker. Mean field games with common noise. The Annals of Probability, 44(6):3740–3803, 2016.
[29] A. Chala. The relaxed optimal control problem for mean–field SDEs systems and application. Automatica, 50(3):924–930, 2014.
[30] J. Claisse, D. Talay, and X. Tan. A pseudo–Markov property for controlled diffusion processes. SIAM Journal on Control and Optimization, 54(2):1017–1029, 2016.
[31] C. Dellacherie. Capacités et processus stochastiques, volume 67 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 2. Folge. Springer–Verlag Berlin Heidelberg, 1972.
[32] C. Dellacherie. Quelques résultats sur les maisons de jeu analytiques. Séminaire de probabilités de Strasbourg, XIX:222–229, 1985.

[33] B. Djehiche and S. Hamadène. Optimal control and zero–sum stochastic differential game problems of mean–field type. Applied Mathematics & Optimization, to appear.
[34] M. Djete, D. Possamaï, and X. Tan. McKean–Vlasov optimal control: limit theory and equivalence between different formulations. In preparation, 2019.
[35] N. El Karoui and M.-C. Quenez. Programmation dynamique et évaluation des actifs contingents en marché incomplet. Comptes rendus de l’Académie des sciences – Série 1 – Mathématique, 313(12):851–854, 1991.
[36] N. El Karoui and M.-C. Quenez. Dynamic programming and pricing of contingent claims in an incomplete market. SIAM Journal on Control and Optimization, 33(1):29–66, 1995.
[37] N. El Karoui and X. Tan. Capacities, measurable selection and dynamic programming part I: abstract framework. arXiv preprint arXiv:1310.3363, 2013.
[38] N. El Karoui and X. Tan. Capacities, measurable selection and dynamic programming part II: application in stochastic control problems. arXiv preprint arXiv:1310.3364, 2013.
[39] N. El Karoui, D. Huu Nguyen, and M. Jeanblanc-Picqué. Compactification methods in the control of degenerate diffusions: existence of an optimal control. Stochastics, 20(3):169–219, 1987.
[40] M. Fuhrman and H. Pham. Randomized and backward SDE representation for optimal control of non–Markovian SDEs. The Annals of Applied Probability, 25(4):2134–2167, 2015.
[41] S. Hamadène and J.-P. Lepeltier. Backward equations, stochastic control and zero–sum stochastic differential games. Stochastics: An International Journal of Probability and Stochastic Processes, 54(3-4):221–231, 1995.
[42] C. Hernández and D. Possamaï. Me, myself and I: a general theory of non–Markovian time–inconsistent stochastic control for sophisticated agents. In preparation, 2019.
[43] J. Huang, X. Li, and J. Yong. A linear–quadratic optimal control problem for mean–field stochastic differential equations in infinite horizon. Mathematical Control and Related Fields, 5(1):97–139, 2015.
[44] M. Huang, P. Caines, and R. Malhamé. Individual and mass behaviour in large population stochastic wireless power control problems: centralized and Nash equilibrium solutions. In C. Abdallah and F. Lewis, editors, Proceedings of the 42nd IEEE conference on decision and control, pages 98–103. IEEE, 2003.
[45] M. Huang, R. Malhamé, and P. Caines. Large population stochastic dynamic games: closed–loop McKean–Vlasov systems and the Nash certainty equivalence principle. Communications in Information & Systems, 6(3):221–252, 2006.
[46] M. Huang, P. Caines, and R. Malhamé. An invariance principle in large population stochastic dynamic games. Journal of Systems Science and Complexity, 20(2):162–172, 2007.
[47] M. Huang, P. Caines, and R. Malhamé. Large–population cost–coupled LQG problems with nonuniform agents: individual–mass behavior and decentralized ε–Nash equilibria. IEEE Transactions on Automatic Control, 52(9):1560–1571, 2007.
[48] M. Huang, P. Caines, and R. Malhamé. The Nash certainty equivalence principle and McKean–Vlasov systems: an invariance principle and entry adaptation. In D. Castanon and J. Spall, editors, 46th IEEE conference on decision and control, pages 121–126. IEEE, 2007.
[49] M. Kac. Foundations of kinetic theory. In J. Neyman, editor, Proceedings of the third Berkeley symposium on mathematical statistics and probability, volume 3: contributions to astronomy and physics, pages 171–197. University of California Press, 1956.
[50] A. Kechris. Classical descriptive set theory, volume 156 of Graduate texts in mathematics. Springer–Verlag New York, 1995.
[51] T. Kurtz. Martingale problems for conditional distributions of Markov processes. Electronic Journal of Probability, 3(9):1–29, 1998.

[52] D. Lacker. Limit theory for controlled McKean–Vlasov dynamics. SIAM Journal on Control and Optimization, 55(3):1641–1672, 2017.
[53] J.-M. Lasry and P.-L. Lions. Jeux à champ moyen. I–Le cas stationnaire. Comptes Rendus Mathématique, 343(9):619–625, 2006.
[54] J.-M. Lasry and P.-L. Lions. Jeux à champ moyen. II–Horizon fini et contrôle optimal. Comptes Rendus Mathématique, 343(10):679–684, 2006.
[55] J.-M. Lasry and P.-L. Lions. Mean field games. Japanese Journal of Mathematics, 2(1):229–260, 2007.
[56] M. Laurière and O. Pironneau. Dynamic programming for mean–field type control. Comptes Rendus Mathématique, 352(9):707–713, 2014.
[57] X. Li, J. Sun, and J. Yong. Mean–field stochastic linear quadratic optimal control problems: closed–loop solvability. Probability, Uncertainty and Quantitative Risk, 1(2):1–24, 2016.
[58] X. Li, J. Sun, and J. Xiong. Linear quadratic optimal control problems for mean–field backward stochastic differential equations. Applied Mathematics & Optimization, 80(1):223–250, 2019.
[59] P.-L. Lions. Théorie des jeux de champ moyen et applications. Cours du Collège de France, http://www.college-de-france.fr/default/EN/all/equder/audiovideo.jsp, 2006–2012.
[60] H. McKean. Propagation of chaos for a class of non–linear parabolic equations. In Lecture series on differential equations. Session 7. Stochastic differential equations, pages 41–57. Fort Belvoir Defense Technical Information Center, 1969.
[61] A. Neufeld and M. Nutz. Superreplication under volatility uncertainty for measurable claims. Electronic Journal of Probability, 18(48):1–14, 2013.
[62] A. Neufeld and M. Nutz. Nonlinear Lévy processes and their characteristics. Transactions of the American Mathematical Society, 369(1):69–95, 2017.
[63] M. Nutz and H. Soner. Superhedging and dynamic risk measures under volatility uncertainty. SIAM Journal on Control and Optimization, 50(4):2065–2089, 2012.
[64] M. Nutz and R. van Handel.
Constructing sublinear expectations on path space. Stochastic Processes and their Applications, 123(8):3100–3121, 2013.
[65] H. Pham. Linear quadratic optimal control of conditional McKean–Vlasov equation with random coefficients and applications. Probability, Uncertainty and Quantitative Risk, 1(7):1–26, 2016.
[66] H. Pham and X. Wei. Dynamic programming for optimal control of stochastic McKean–Vlasov dynamics. SIAM Journal on Control and Optimization, 55(2):1069–1101, 2017.
[67] H. Pham and X. Wei. Bellman equation and viscosity solutions for mean–field stochastic control problem. ESAIM: Control, Optimisation and Calculus of Variations, 24(1):437–461, 2018.
[68] D. Possamaï, X. Tan, and C. Zhou. Stochastic control for a class of nonlinear kernels and applications. The Annals of Probability, 46(1):551–603, 2018.
[69] S. Shreve. Dynamic programming in complete separable spaces. PhD thesis, University of Illinois at Urbana–Champaign, 1977.
[70] S. Shreve. Probability measures and the C–sets of Selivanovskij. Pacific Journal of Mathematics, 79(1):189–196, 1978.
[71] S. Shreve. Resolution of measurability problems in discrete–time stochastic control. In M. Kohlmann and W. Vogel, editors, Stochastic control theory and stochastic differential systems. Proceedings of a workshop of the „Sonderforschungsbereich 72 der Deutschen Forschungsgemeinschaft an der Universität Bonn" which took place in January 1979 at Bad Honnef, volume 16 of Lecture notes in control and information sciences, pages 580–587. Springer Berlin Heidelberg, 1979.

[72] S. Shreve and D. Bertsekas. Alternative theoretical frameworks for finite horizon discrete–time stochastic optimal control. SIAM Journal on Control and Optimization, 16(6):953–978, 1978.
[73] S. Shreve and D. Bertsekas. Dynamic programming in Borel spaces. In M. Puterman, editor, Dynamic programming and its applications. Proceedings of the international conference on dynamic programming and its applications, University of British Columbia, Vancouver, British Columbia, Canada, April 14–16, 1977, pages 115–130. Academic Press New York San Francisco London, 1978.
[74] S. Shreve and D. Bertsekas. Universally measurable policies in dynamic programming. Mathematics of Operations Research, 4(1):15–30, 1979.
[75] A.-S. Sznitman. Topics in propagation of chaos. In P. Hennequin, editor, École d’été de probabilités de Saint–Flour XIX – 1989, number 1464 in Lecture notes in mathematics, pages 165–251. Springer Berlin Heidelberg, 1991.
[76] H. Soner and N. Touzi. Dynamic programming for stochastic target problems and geometric flows. Journal of the European Mathematical Society, 4(3):201–236, 2002.
[77] D. Stroock and S. Varadhan. Multidimensional diffusion processes, volume 233 of Grundlehren der mathematischen Wissenschaften. Springer–Verlag Berlin Heidelberg, 1997.
[78] C. Villani. Optimal transport: old and new, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, 2008.
[79] C. Wu and J. Zhang. Viscosity solutions to parabolic master equations and McKean–Vlasov SDEs with closed–loop controls. arXiv preprint arXiv:1805.02639, 2018.
[80] J. Yong. Linear–quadratic optimal control problems for mean–field stochastic differential equations. SIAM Journal on Control and Optimization, 51(4):2809–2838, 2013.
[81] M. Yor. Sur les théories du filtrage et de la prédiction. Séminaire de probabilités de Strasbourg, XI:257–297, 1977.
[82] G. Žitković. Dynamic programming for controlled Markov families: abstractly and over martingale measures.
SIAM Journal on Control and Optimization, 52(3):1597–1621, 2014.
