arXiv:1507.01494v1 [math.ST] 6 Jul 2015

FUNCTIONAL CRAMER-RAO BOUNDS AND STEIN ESTIMATORS IN SOBOLEV SPACES, FOR BROWNIAN MOTION AND COX PROCESSES

ENI MUSTA, MAURIZIO PRATELLI, AND DARIO TREVISAN

Abstract. We investigate the problems of drift estimation for a shifted Brownian motion and intensity estimation for a Cox process on a finite interval $[0,T]$, when the risk is given by the energy functional associated to some fractional Sobolev space $H^1_0 \subset W^{\alpha,2} \subset L^2$. In both situations, Cramer-Rao lower bounds are obtained, entailing in particular that no unbiased estimators with finite risk in $H^1_0$ exist. By Malliavin calculus techniques, we also study super-efficient Stein type estimators (in the Gaussian case).

The second and third authors are members of the GNAMPA group of the Istituto Nazionale di Alta Matematica (INdAM).

1. Introduction

In this paper we focus on two problems of non-parametric (or, more rigorously, infinite-dimensional parametric) statistical estimation: drift estimation for a shifted Brownian motion and intensity estimation for a Cox process, on a finite time interval $[0,T]$. Our investigation stems from the articles [PR08; PR09], where N. Privault and A. Réveillac developed an original approach to these problems, by employing techniques from Malliavin calculus and the so-called Stein's method [JS61] to study Cramer-Rao bounds and super-efficient "shrinkage" estimators in these infinite-dimensional frameworks. Such a combination of these two powerful techniques fits into a more general picture, which has become clear only in recent years (see the monograph [NP12]) and is currently a very active research area, with impact on statistics (see e.g. [Gob01; CKH11; PR11; Liu13]) and, more generally, on probabilistic approximations.

As in [PR08; PR09], we assume that the unknown function to be estimated belongs to the Hilbert space $H^1_0(0,T)$ (which is a reasonable choice, at least in the case of shifted Brownian motion, because of the Cameron-Martin and Girsanov theorems), but we move further by addressing the following question, which is rather natural but apparently was not considered: what about estimators which also take values in $H^1_0$? Indeed, in [PR08; PR09], estimators are seen as functions with values in $L^2([0,T],\mu)$ (where $\mu$ is any finite measure) or, equivalently, the associated risk is computed with respect to the $L^2$ norm and not the (stronger) $H^1_0$ norm.

To investigate this problem, we first provide Cramer-Rao bounds with respect to different risks, by considering the estimation in the interpolating fractional Sobolev spaces $H^1_0 \subset W^{\alpha,2} \subset L^2$, for $\alpha \in (0,1)$. It turns out that no unbiased estimator exists in $H^1_0$ (Theorem 2.5), nor even in $W^{\alpha,2}$ for $\alpha \ge 1/2$ (Theorem 2.9). Although a bit surprising, these results reconcile with the following intuition: since the estimator is also a function of the realization of the process, whose paths do not belong to $H^1_0$ (nor to $W^{\alpha,2}$, for $\alpha \ge 1/2$), it is "too risky" to estimate (without bias) the parameter in that scale of regularity. Therefore, besides answering a rather natural question, our results highlight the delicate role played by the choice of different norms in such estimation problems, and one might expect that similar phenomena appear in other, technically more demanding, situations (e.g. SDE's).

As a second task, we study super-efficient "shrinkage" estimators in the spaces $W^{\alpha,2}$. It is often intuitively suggested that the ideal situation for an estimation problem is to have an unbiased estimator with low variance; in many situations, however, allowing for a little bias yields estimators with lower risk. This is the purpose of Stein's method, and we rely on its extension to these frameworks, in combination with Malliavin calculus, developed in [PR08; PR09]. With a similar approach, we give sufficient conditions for super-efficient estimators in $W^{\alpha,2}$, for $\alpha < 1/2$, and we give explicit examples of such estimators in the case of Brownian motion (Example 5.1). In the case of Cox processes, although it is possible to define a suitable version of Malliavin calculus and to provide sufficient conditions for Stein estimators as well, we are currently unable to provide explicit examples.

The paper is organized as follows. In Section 2 we deal with drift estimation for a shifted Brownian motion, addressing Cramer-Rao lower bounds with respect to risks computed in $H^1_0$ and in fractional Sobolev spaces. Analogous results on intensity estimators for Cox processes are given in Section 3. In Section 4, we recall notation and results for Malliavin calculus on the Wiener space. Finally, in Section 5, we discuss super-efficient estimators.

2. Drift estimation for a shifted Brownian motion

In this section, we fix $T \ge 0$ and let $X = (X_t)_{t\in[0,T]}$ be a Brownian motion (on the finite interval $[0,T]$), defined on some filtered probability space $(\Omega, \mathcal F, (\mathcal F_t)_{t\in[0,T]}, \mathbb P)$. As an (infinite-dimensional) space of parameters $\Theta$, we consider a set of absolutely continuous, adapted processes $u_t := \int_0^t \dot u_s\,ds$ (for $t\in[0,T]$) such that $(\dot u_t)_{t\in[0,T]}$ satisfies the conditions of Girsanov theorem: indeed, for $u\in\Theta$, we define the probability $\mathbb P^u := L^u\,\mathbb P$, with
\[ L^u := \exp\Big[ \int_0^T \dot u_s\,dX_s - \frac12 \int_0^T \dot u_s^2\,ds \Big], \]
and Girsanov theorem entails that, with respect to the probability $\mathbb P^u$, the process $X^u_t := X_t - u_t$ is a Brownian motion on $[0,T]$.

We address the problem of estimating the drift w.r.t. $\mathbb P^u$ on the basis of a single observation of $X$. This is of interest in different fields of application: for example, we can interpret $X$ as the observed output signal of some unknown input signal $u$, perturbed by a Brownian noise. Such a problem is investigated e.g. in [PR08], where the following definition is given.

Definition 2.1. Any measurable $\xi : \Omega\times[0,T] \to \mathbb R$ is called an estimator of the drift $u$. An estimator of the drift $u$ is said to be unbiased if, for every $u\in\Theta$, $t\in[0,T]$, $\xi_t$ is $\mathbb P^u$-integrable and it holds $\mathbb E^u[\xi_t] = \mathbb E^u[u_t]$.

In this section, we omit the specification "of the drift $u$" and simply refer to estimators. Moreover, we refer to the quantity $\mathbb E^u[\xi_t - u_t]$ as the bias of the estimator $\xi$ (whenever it is well-defined).

By introducing, as a risk associated to any estimator $\xi$, the quantity
\[ \mathbb E^u\big[\|\xi-u\|^2_{L^2(\mu)}\big] = \mathbb E^u\Big[\int_0^T |\xi_t-u_t|^2\,\mu(dt)\Big], \tag{1} \]
where $\mu$ is any finite Borel measure on $[0,T]$, Privault and Réveillac provide the following Cramer-Rao lower bound for adapted and unbiased estimators [PR08, Proposition 2.1], $\Theta$ being the space of all absolutely continuous, adapted processes whose derivatives satisfy the conditions of Girsanov theorem.

Theorem 2.2 (Cramer-Rao inequality in $L^2(\mu)$). For any adapted and unbiased estimator $\xi$ it holds
\[ \mathbb E^u\big[\|\xi-u\|^2_{L^2(\mu)}\big] \ge \int_0^T t\,\mu(dt), \quad \text{for every } u\in\Theta. \tag{2} \]
Equality is attained by the (efficient) estimator $\hat u = X$.

Before giving our results, let us briefly comment on some aspects of this inequality and its proof, in particular with respect to the adaptedness of $\xi$ and the role played by the exponent 2. By direct inspection of the proof in [PR08], the requirement for $\xi$ to be adapted is seen to be unnecessary. Indeed, the argument relies on an application of Cauchy-Schwarz inequality to the right-hand side of the identity
\[ v(t) = \mathbb E^u\Big[(\xi_t-u_t)\int_0^T \dot v(s)\,dX^u_s\Big], \quad \text{for } t\in[0,T], \tag{3} \]
valid for every deterministic process $v\in\Theta$ (thus, $v(t) := \int_0^t \dot v(s)\,ds$), and then on the choice $\dot v(s) = 1_{[0,t]}(s)$. In turn, the proof of (3) uses the fact that, for every $\varepsilon\in\mathbb R$, it holds $u+\varepsilon v\in\Theta$, thus
\[ \mathbb E^{u+\varepsilon v}[\xi_t] = \mathbb E^{u+\varepsilon v}[u_t + \varepsilon v(t)] = \mathbb E^{u+\varepsilon v}[u_t] + \varepsilon v(t), \quad \text{for } t\in[0,T], \]
and differentiates with respect to $\varepsilon$ at $\varepsilon = 0$ (the exchange between differentiation and expectation is justified by the finiteness of the left-hand side in (2); otherwise there is nothing to prove):

\[ \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} \mathbb E^{u+\varepsilon v}[\xi_t-u_t] = \mathbb E\Big[(\xi_t-u_t)\,\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} L_T^{u+\varepsilon v}\Big] = \mathbb E^u\Big[(\xi_t-u_t)\int_0^T \dot v(s)\,dX^u_s\Big]. \]
Let us also notice that it is not necessary for $\Theta$ to be the whole set of drifts $u$ such that Girsanov theorem applies to $\dot u$, and the following condition is sufficient: for every $u\in\Theta$ and deterministic $v\in\Theta$, it holds $u+v\in\Theta$.

Remark 2.3. Back to the problem of adaptedness of $\xi$, it would be desirable to argue that general (not necessarily adapted) estimators cannot perform better than adapted ones. The following argument might seem to go in that direction, but does not allow us to conclude. Let $\xi$ be any unbiased estimator and, for $u\in\Theta$, consider the optional projection $\eta$ of $\xi$ with respect to the probability $\mathbb P^u$, so that $\eta_t := \mathbb E^u[\xi_t \mid \mathcal F_t]$, for $t\in[0,T]$. Then, $\mathbb E^u[\eta_t] = u_t$ and it holds
\[ \mathbb E^u[|\eta_t-u_t|^2] = \mathbb E^u\big[\mathbb E^u[\xi_t-u_t \mid \mathcal F_t]^2\big] \le \mathbb E^u[|\xi_t-u_t|^2]. \]
However, this does not entail that $\eta$ performs better than $\xi$, since $\eta = \eta^u$ depends also on $u$, thus it is not an estimator. On the other hand, if we keep $\bar u\in\Theta$ fixed, then $\eta^{\bar u}$ could be biased, i.e. $\mathbb E^u[\eta^{\bar u}_t] \ne \mathbb E^u[u_t]$ for some $u\in\Theta$, $t\in[0,T]$.

Remark 2.4. Similarly to the mean squared error, one can consider the risk defined by $L^p$ norms, for $p\in(1,\infty)$:
\[ \int_0^T \mathbb E^u[|\xi_t-u_t|^p]\,\mu(dt). \]

Again, by direct inspection of the proof in [PR08], applying Hölder inequality (with conjugate exponents $(p,q)$) instead of Cauchy-Schwarz inequality in (3), we obtain an inequality of the form
\[ \mathbb E^u[|\xi_t-u_t|^p] \ge \frac{|v(t)|^p}{c_q^{p/q}\big(\int_0^t \dot v^2(s)\,ds\big)^{p/2}}, \quad \text{for } t\in[0,T], \]
which, for the choice $\dot v = 1_{[0,t]}$, gives $\mathbb E^u[|\xi_t-u_t|^p] \ge t^{p/2}/c_q^{p/q}$. Here $c_q := \mathbb E[|Y|^q]$ is the $q$-th moment of a $\mathcal N(0,1)$ random variable $Y$. Integration with respect to $\mu$ then provides a Cramer-Rao type lower bound. However, letting $\xi = X$, one has
\[ \mathbb E^u[|X_t-u_t|^p] = \mathbb E^u[|X^u_t|^p] = c_p\,t^{p/2}, \quad \text{for } t\in[0,T], \]
thus $X$ is not an efficient estimator in $L^p(\Omega\times[0,T])$ for $p\ne 2$.

In all that follows, we let $H^1_0$ ($= H^1_0(0,T)$) be the space of (continuous) functions of the form $h(t) = \int_0^t \dot h(s)\,ds$, for $t\in[0,T]$, with $\dot h\in L^2(0,T)$ (usually called, in this context, the Cameron-Martin space), and we assume that, for every $u\in\Theta$, $h\in H^1_0$, it holds $u+h\in\Theta$. The $H^1_0$ "energy" functional, namely $\|h\|_{H^1_0} := \|\dot h\|_{L^2(0,T)}$, provides a Hilbert norm on $H^1_0$. For simplicity of notation, we extend this functional identically to $+\infty$ for any Borel curve $h:[0,T]\to\mathbb R$ which does not belong to $H^1_0$.

We notice that $H^1_0$ is included in $C^{1/2}(0,T)$, the space of $1/2$-Hölder continuous functions: since the paths of the Brownian motion are not $1/2$-Hölder continuous, we deduce that the process $X$ is not $H^1_0$-valued (negligibility of the Cameron-Martin space holds true also for abstract, infinite-dimensional Wiener spaces). However, since the drift $u$ takes values in $H^1_0$, it is natural to look for an estimator $\xi$ sharing this property. Our first result shows that, if we require $\xi$ to be unbiased, this is not possible, i.e. such an estimator $\xi$ necessarily has infinite $H^1_0$ risk.

Theorem 2.5 (Estimators in $H^1_0$). Let $\xi$ be an estimator such that, for some $u\in\Theta$, it holds
\[ \mathbb E^u\big[\|\xi-u\|^2_{H^1_0}\big] < \infty. \]
Then, $\xi$ is not unbiased.
Before we address the proof for general, possibly non-adapted, estimators, we give the following argument, which exploits Ito's formula: it is longer, but we feel that it has a more stochastic flavor.

Proof. (Case of adapted estimators.) Let us assume, by contradiction, that $\xi$ is unbiased, thus, by difference, $\xi\in L^2(\Omega,\mathbb P^u; H^1_0)$. For every (deterministic) $v\in H^1_0$, arguing as above for the deduction of (3), we obtain that
\[ v(t) = \mathbb E^u\Big[\int_0^t (\dot\xi_s-\dot u_s)\,ds \int_0^t \dot v(s)\,dX^u_s\Big], \quad \text{for } t\in[0,T], \]
where the stochastic integration reduces to the interval $[0,t]$ because of the adaptedness assumption. Integrating by parts (i.e., using Ito's formula), we rewrite the random variable above as
\[ \int_0^t \Big(\int_0^s \dot v(r)\,dX^u_r\Big)(\dot\xi_s-\dot u_s)\,ds + \int_0^t \Big(\int_0^s (\dot\xi_r-\dot u_r)\,dr\Big)\dot v(s)\,dX^u_s, \]
obtaining the right analogue of (3) for the study of the $H^1_0$ energy:
\[ v(t) = \mathbb E^u\Big[\int_0^t \Big(\int_0^s \dot v(r)\,dX^u_r\Big)(\dot\xi_s-\dot u_s)\,ds\Big], \quad \text{for } t\in[0,T]. \]

Indeed, Cauchy-Schwarz inequality and Ito's isometry give
\[ v(t)^2 \le \mathbb E^u\Big[\int_0^t \Big(\int_0^s \dot v(r)\,dX^u_r\Big)^2 ds\Big]\, \mathbb E^u\Big[\int_0^t (\dot\xi_s-\dot u_s)^2\,ds\Big] = \int_0^t \Big(\int_0^s \dot v^2(r)\,dr\Big) ds \int_0^t \mathbb E^u[(\dot\xi_s-\dot u_s)^2]\,ds = \int_0^t (t-s)\,\dot v^2(s)\,ds \int_0^t \mathbb E^u[(\dot\xi_s-\dot u_s)^2]\,ds. \]
In particular, choosing $t = T$, we deduce
\[ \mathbb E^u\big[\|\xi-u\|^2_{H^1_0}\big] \ge \frac{v(T)^2}{\int_0^T (T-t)\,\dot v^2(t)\,dt}. \]
To obtain a contradiction, it is enough to prove that, for every constant $c>0$, there exists $\dot v\in L^2(0,T)$ such that the right-hand side above is greater than $c$, i.e.,
\[ \Big(\int_0^T \dot v(t)\,dt\Big)^2 \ge c \int_0^T (T-t)\,\dot v(t)^2\,dt. \tag{4} \]
Indeed, if we let $\dot v(t) = (T-t)^{-\alpha}$ for some $0<\alpha<1$, it holds
\[ \Big(\int_0^T \dot v(t)\,dt\Big)^2 = \Big(\frac{T^{1-\alpha}}{1-\alpha}\Big)^2 \quad\text{and}\quad \int_0^T (T-t)\,\dot v^2(t)\,dt = \frac{T^{2(1-\alpha)}}{2(1-\alpha)} \]
(for $\alpha\ge 1/2$ one replaces $\dot v$ by its truncations $\dot v\,1_{[0,T-\delta]}\in L^2(0,T)$ and lets $\delta\downarrow 0$, by monotone convergence). It is then sufficient to let $\alpha\uparrow 1$ to conclude. $\square$

Remark 2.6. Instead of the explicit construction of $v\in H^1_0$ above, to obtain a contradiction we can also use the following duality result. On a measure space $(E,\mathcal E,\mu)$, if $g\ge 0$ is a measurable function such that, for some constant $c>0$, it holds
\[ \int_E f g\,d\mu \le c\,\Big(\int_E f^2\,d\mu\Big)^{1/2}, \quad \text{for every } f\in L^\infty(\mu),\ f\ge 0, \]
then $g\in L^2(\mu)$ with $\|g\|_{L^2(\mu)}\le c$. The easy proof follows from considering the continuous, linear functional $\varphi$ initially defined on $L^\infty\cap L^2(\mu)$ by $f\mapsto\int_E f g\,d\mu$ and then applying Riesz theorem to its extension to $L^2(\mu)$.

In the proof above, a contradiction immediately follows from (4), letting $\mu(dt) = (T-t)\,dt$ and $g(t) = (T-t)^{-1}$.

We now provide a complete proof of Theorem 2.5.

Proof. (General case.) Arguing by contradiction, we let $\xi\in L^2(\Omega,\mathbb P^u;H^1_0)$. For every (deterministic) $v\in H^1_0$, arguing as above for the deduction of (3), we obtain instead
\[ v(t) = \mathbb E^u\Big[\int_0^t (\dot\xi_s-\dot u_s)\,ds \int_0^T \dot v(s)\,dX^u_s\Big], \quad \text{for } t\in[0,T]. \]
Then, we differentiate with respect to $t\in[0,T]$ (the exchange of derivative and expectation is ensured by the finite risk assumption), and we obtain, for a.e. $t\in[0,T]$,
\[ \dot v(t) = \mathbb E^u\Big[(\dot\xi_t-\dot u_t)\int_0^T \dot v(s)\,dX^u_s\Big]. \]

At this stage, Cauchy-Schwarz inequality and Ito isometry yield
\[ |\dot v(t)|^2 \le \mathbb E^u\big[|\dot\xi_t-\dot u_t|^2\big] \int_0^T |\dot v(s)|^2\,ds, \quad \text{for a.e. } t\in[0,T]. \tag{5} \]
From this inequality, we easily obtain a contradiction, arguing as follows. Let $A\subseteq[0,T]$ be a non-negligible Borel subset such that $\int_A \mathbb E^u[|\dot\xi_t-\dot u_t|^2]\,dt < 1$, which exists because of the finite risk assumption and uniform integrability (notice that $A$ does not depend upon $v$). Then, integrating the above inequality over $t\in A$, we obtain
\[ \int_A |\dot v(t)|^2\,dt \le \int_A \mathbb E^u\big[|\dot\xi_t-\dot u_t|^2\big]\,dt \int_0^T |\dot v(t)|^2\,dt, \]
for every $\dot v\in L^2(0,T)$, in particular for every $\dot v\in L^2(A)$. Simply taking $\dot v = 1_A$, we obtain the required contradiction. $\square$

Actually, the result on the absence of unbiased estimators in $H^1_0$ can be slightly strengthened, allowing for estimators whose bias is sufficiently regular. We state it as a corollary (of the proof), remarking that similar deductions could be performed also in the cases that we consider below.

Corollary 2.7. Let $\xi$ be an estimator such that, for every $u\in\Theta$, $t\in[0,T]$, $\xi_t$ is $\mathbb P^u$-integrable, and it holds, for some $C = (C_t)_{t\in[0,T]}\in L^2(0,T)$ (possibly depending upon $u\in\Theta$),
\[ \Big|\frac{d}{dt}\frac{d}{d\varepsilon}\Big|_{\varepsilon=0} \mathbb E^{u+\varepsilon v}[\xi_t-u_t]\Big| \le C_t\,\|v\|_{H^1_0}, \quad \text{a.e. } t\in[0,T], \text{ for every } v\in H^1_0. \]
Then, the $H^1_0$ risk of the estimator $\xi$ is infinite, i.e.
\[ \mathbb E^u\big[\|\xi-u\|^2_{H^1_0}\big] = \infty, \quad \text{for every } u\in\Theta. \]

Proof. We argue exactly as in the proof above, but we write
\[ \mathbb E^{u+\varepsilon v}[\xi_t] = \mathbb E^{u+\varepsilon v}[u_t] + \varepsilon v(t) + b^{u+\varepsilon v}_t, \]
where $b^u_t := \mathbb E^u[\xi_t-u_t]$ is the bias. After differentiation with respect to $\varepsilon$ and $t$, we obtain (5) with $\mathbb E^u[|\dot\xi_t-\dot u_t|^2] + C_t^2$ in place of $\mathbb E^u[|\dot\xi_t-\dot u_t|^2]$, and we conclude arguing as in the proof above. $\square$

We now address analogous results for the intermediate spaces $H^1_0\subset W^{\alpha,2}\subset L^2$, for $\alpha\in(0,1)$, defined as follows.

Definition 2.8. For $\alpha\in(0,1)$, $p\in(1,\infty)$, the fractional Sobolev space $W^{\alpha,p}$ ($= W^{\alpha,p}(0,T)$) is defined as the space of functions $u\in L^p(0,T)$ such that their "energy" functional
\[ \|u\|^p_{W_0^{\alpha,p}} := \int_0^T\!\!\int_0^T \frac{|u_t-u_s|^p}{|t-s|^{p\alpha+1}}\,dt\,ds \]
is finite.

We refer to [DNPV12] for a survey of the theory of fractional Sobolev spaces, although here we need nothing more than the definition above. The space $W^{\alpha,p}$, endowed with a suitable norm, interpolates (in a sense that could be made precise) between the Sobolev space $W^{1,p}$ and $L^p$; for example, it holds $W^{\alpha',p}\subseteq W^{\alpha,p}$ for $0<\alpha\le\alpha'<1$, and $H^1_0\subseteq W^{\alpha,2}$, with
\[ \|u\|^2_{W_0^{\alpha,2}} \le 2\int_0^T\!\!\int_r^T\!\!\int_0^r |\dot u_r|^2\,\frac{1}{|t-s|^{2\alpha}}\,ds\,dt\,dr \le C_{\alpha,T}\,\|u\|^2_{H^1_0}. \tag{6} \]
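The energy functional of Definition 2.8 can be approximated numerically. The following is only a minimal sketch, assuming a midpoint Riemann sum on a uniform grid; the function names, the grid size `n` and the test function $u(t)=t$ are illustrative choices, not part of the paper. For $u(t)=t$ and $p=2$ the energy equals $\int_0^T\!\int_0^T |t-s|^{1-2\alpha}\,dt\,ds = 2T^{3-2\alpha}/((2-2\alpha)(3-2\alpha))$, which gives a reference value.

```python
def gagliardo_energy(u, T, alpha, p=2, n=200):
    """Midpoint Riemann sum for the W^{alpha,p} energy of u on (0, T).

    Illustrative discretisation only: accuracy degrades as the
    integrand becomes more singular near the diagonal s = t.
    """
    h = T / n
    grid = [(i + 0.5) * h for i in range(n)]
    values = [u(t) for t in grid]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip diagonal cells, where the quotient is 0/0
            gap = abs(grid[i] - grid[j])
            total += abs(values[i] - values[j]) ** p / gap ** (p * alpha + 1)
    return total * h * h


def linear_energy_exact(T, alpha):
    """Closed form of the p = 2 energy for u(t) = t."""
    return 2 * T ** (3 - 2 * alpha) / ((2 - 2 * alpha) * (3 - 2 * alpha))


approx = gagliardo_energy(lambda t: t, T=1.0, alpha=0.25)
exact = linear_energy_exact(T=1.0, alpha=0.25)
print(approx, exact)
```

For rougher inputs (for instance sampled Brownian paths with $\alpha\ge 1/2$) the sum grows without bound as `n` increases, in line with the discussion of this section.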

From this inequality, the above theorem for estimators in $H^1_0$ could also be obtained from the next results. Let us first consider the Cramer-Rao bound in the quadratic case.

Theorem 2.9 (Cramer-Rao inequality in $W^{\alpha,2}$). Let $\xi$ be an unbiased estimator. For every $\alpha\in(0,1)$, it holds
\[ \mathbb E^u\big[\|\xi-u\|^2_{W_0^{\alpha,2}}\big] \ge \int_0^T\!\!\int_0^T \frac{1}{|t-s|^{2\alpha}}\,dt\,ds, \quad \text{for every } u\in\Theta. \]
Equality is attained by the (efficient) estimator $\xi = X$. In particular, if an estimator $\xi$ has finite $W^{\alpha,2}$ risk for some $\alpha\in[1/2,1)$ and $u\in\Theta$, then it is not unbiased.

Proof. We introduce the notation $\Delta_t := \xi_t-u_t$, for $t\in[0,T]$, so that, by Fubini theorem, we write
\[ \mathbb E^u\big[\|\xi-u\|^2_{W_0^{\alpha,2}}\big] = \int_0^T\!\!\int_0^T \frac{\mathbb E^u[|\Delta_t-\Delta_s|^2]}{|t-s|^{2\alpha+1}}\,dt\,ds. \]
If $\xi$ is an unbiased estimator and $v\in H^1_0$, we argue (once again) to obtain (3), and subtract this identity for $s,t\in[0,T]$, thus
\[ v(t)-v(s) = \mathbb E^u\Big[(\Delta_t-\Delta_s)\int_0^T \dot v(r)\,dX^u_r\Big]. \]
Hence, Cauchy-Schwarz inequality and Ito isometry give the lower bound
\[ \mathbb E^u[|\Delta_t-\Delta_s|^2] \ge \frac{|v(t)-v(s)|^2}{\int_0^T \dot v^2(r)\,dr}, \quad \text{for } s,t\in[0,T]. \]
We let $\dot v(r) = 1_{[s\wedge t,\,s\vee t]}(r)$, so that
\[ \mathbb E^u[|\Delta_t-\Delta_s|^2] \ge |t-s|, \quad \text{for } s,t\in[0,T]. \]
The Cramer-Rao bound then follows:
\[ \int_0^T\!\!\int_0^T \frac{\mathbb E^u[|\Delta_t-\Delta_s|^2]}{|t-s|^{2\alpha+1}}\,dt\,ds \ge \int_0^T\!\!\int_0^T \frac{1}{|t-s|^{2\alpha}}\,dt\,ds. \]
Finally, if $\xi = X$, then $X-u = X^u$, thus it holds
\[ \mathbb E^u[|X^u_t-X^u_s|^2] = |t-s|, \quad \text{for } s,t\in[0,T], \]
and the Cramer-Rao lower bound is attained:
\[ \int_0^T\!\!\int_0^T \frac{\mathbb E^u[|X^u_t-X^u_s|^2]}{|t-s|^{2\alpha+1}}\,dt\,ds = \int_0^T\!\!\int_0^T \frac{1}{|t-s|^{2\alpha}}\,dt\,ds. \qquad\square \]
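The lower bound of Theorem 2.9 has an elementary closed form, which makes the threshold $\alpha = 1/2$ explicit. As a small sketch (the function name is ours, not the paper's): for $\alpha<1/2$ one computes $\int_0^T\!\int_0^T |t-s|^{-2\alpha}\,dt\,ds = 2T^{2-2\alpha}/((1-2\alpha)(2-2\alpha))$, while the integral diverges for $\alpha\ge 1/2$.

```python
import math


def cramer_rao_bound_w_alpha_2(alpha, T):
    """Value of the double integral of |t - s|^(-2*alpha) over [0, T]^2.

    Returns math.inf for alpha >= 1/2, where the integral diverges.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    if alpha >= 0.5:
        return math.inf
    return 2 * T ** (2 - 2 * alpha) / ((1 - 2 * alpha) * (2 - 2 * alpha))


# The bound blows up as alpha increases towards 1/2.
for alpha in (0.1, 0.25, 0.4, 0.49, 0.5):
    print(alpha, cramer_rao_bound_w_alpha_2(alpha, T=1.0))
```

Since the bound is attained by $\xi = X$, the same numbers give the $W^{\alpha,2}$ risk of the observation itself.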

In the case of a general exponent $p\in(1,\infty)$ (with $q = p/(p-1)$), arguing similarly, we obtain the following bound in $W^{\alpha,p}$. As above, we let $c_q = \mathbb E[|Y|^q]$ be the $q$-th moment of a standard Gaussian random variable.

Theorem 2.10 (Cramer-Rao inequality in $W^{\alpha,p}$). Let $\xi$ be an unbiased estimator. For every $\alpha\in(0,1)$, $p\in(1,\infty)$, it holds
\[ \mathbb E^u\big[\|\xi-u\|^p_{W_0^{\alpha,p}}\big] \ge \frac{1}{c_q^{p/q}}\,\frac{2\,T^{1-p\alpha+p/2}}{p\,\max\{0,(1/2-\alpha)\}\,(1+p(1/2-\alpha))}. \]

Since
\[ \mathbb E^u[|X^u_t-X^u_s|^p] = c_p\,|t-s|^{p/2}, \]
the risk of the estimator $\xi = X$ is given by
\[ \int_0^T\!\!\int_0^T \frac{\mathbb E^u[|X^u_t-X^u_s|^p]}{|t-s|^{p\alpha+1}}\,dt\,ds = c_p \int_0^T\!\!\int_0^T \frac{1}{|t-s|^{p\alpha+1-p/2}}\,dt\,ds. \]
As in Remark 2.4 above, we conclude that $X$ is not an efficient estimator with respect to the risk in $W^{\alpha,p}$, for $p\ne 2$.

Remark 2.11. Before we conclude this section, we remark that all the bounds above can be generalized (at least) to the case of a continuous Gaussian martingale, with quadratic variation process $\int_0^t \sigma_s^2\,ds$, $t\in[0,T]$, and also by introducing different energies, such as
\[ \int_0^T\!\!\int_0^T \frac{|u(t)-u(s)|^p}{|t-s|^{\alpha p+1}}\,\mu(dt,ds), \]
where $\mu$ is a measure on $[0,T]^2$ (a natural choice would be to take $\mu$ somehow related to $\sigma^2$). However, we choose to restrict the discussion to the case of the Brownian motion, to limit technicalities and to emphasize the role played by the norm chosen to estimate the risk.
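The gap between the risk of $X$ and the lower bound of Theorem 2.10 can be made quantitative. For $\alpha<1/2$, dividing the risk of $X$ by the bound gives the factor $c_p\,c_q^{p/q}$, which by Hölder inequality is at least 1 and equals 1 for $p=2$. The sketch below evaluates this factor using the standard formula $\mathbb E[|Y|^r] = 2^{r/2}\,\Gamma((r+1)/2)/\sqrt\pi$ for absolute moments of a standard Gaussian; the function names are illustrative.

```python
import math


def gaussian_abs_moment(r):
    """E|Y|^r for Y ~ N(0, 1), valid for r > -1."""
    return 2 ** (r / 2) * math.gamma((r + 1) / 2) / math.sqrt(math.pi)


def efficiency_gap(p):
    """Ratio c_p * c_q^(p/q) between the risk of X and the lower bound,
    where q = p / (p - 1) is the conjugate exponent of p."""
    q = p / (p - 1)
    return gaussian_abs_moment(p) * gaussian_abs_moment(q) ** (p / q)


for p in (1.5, 2.0, 3.0, 4.0):
    print(p, efficiency_gap(p))
```

A ratio strictly larger than 1 only shows that the bound is not attained by $X$; it does not by itself identify a better unbiased estimator.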

3. Intensity estimation for the Cox process

Throughout this section, we fix $T\ge 0$ and let $X = (X_t)_{t\in[0,T]}$ be a Poisson process defined on some filtered probability space $(\Omega,\mathcal F,(\mathcal F_t)_{t\in[0,T]},\mathbb P)$, with jump times $(T_k)_{k\ge 1}$ (for $k\ge 1$, we let $T_k(\omega) = T$ in the event that no $k$-th jump occurs). As a space of parameters $\Theta$, we consider the set of all absolutely continuous, (strictly) increasing, $\mathcal F_0$-measurable processes $u = (u_t)_{t\in[0,T]}$ such that their a.e. derivatives $(\dot u_t)_{t\in[0,T]}$ satisfy the assumptions of Girsanov theorem for the Poisson process (the proofs work also for slightly smaller sets). Given $u\in\Theta$, we define the probability $\mathbb P^u := L^u\,\mathbb P$, where
\[ L^u := \prod_{k=1}^{X_T} \dot u_{T_k}\,\exp\Big(-\int_0^T (\dot u_s-1)\,ds\Big). \]
Girsanov theorem entails that, with respect to the probability $\mathbb P^u$, the process $X$ is a Cox process with intensity $(\dot u_t)_{t\in[0,T]}$ (see e.g. [JYC09, Section 8.4] for details on related doubly stochastic Poisson processes). Notice that $\mathbb P^u(A)$ does not depend on $u$ for $A\in\mathcal F_0$; thus, e.g., for $t\in[0,T]$, $v\in\Theta$, $u_t$ is integrable with respect to $\mathbb P^v$ and its expectation $\mathbb E^v[u_t]$ actually does not depend on $v$.

We address the problem of estimating $u$, or equivalently the intensity of $X$ w.r.t. $\mathbb P^u$, based on a single observation of $X$. In the case of a deterministic intensity, i.e. when $X$ is an inhomogeneous Poisson process, this is investigated e.g. in [PR09], and, similarly to the case of shifted Brownian motion, the following definition is given.

Definition 3.1. Any measurable stochastic process $\xi:\Omega\times[0,T]\to\mathbb R$ is called an estimator of the intensity $u$. An estimator of the intensity $u$ is said to be unbiased if, for every $u\in\Theta$, $t\in[0,T]$, $\xi_t$ is integrable and it holds $\mathbb E^u[\xi_t] = \mathbb E[u_t]$.

As in the previous section, we omit the specification "of the intensity $u$" and simply refer to estimators.

Privault and Réveillac studied the estimation problem, in the case of deterministic intensities, w.r.t. the risk in $L^2(\mu)$, defined as in (1), for any finite Borel measure $\mu$ on $[0,T]$. Their set of parameters $\Theta$ consists of all deterministic, absolutely continuous, increasing processes $u$, see [PR09, Definition 2.1]. We briefly show how a similar argument applies as well to the case of stochastic intensities.

Theorem 3.2 (Cramer-Rao inequality in $L^2(\mu)$). For any unbiased estimator $\xi$, it holds
\[ \mathbb E^u\big[\|\xi-u\|^2_{L^2(\mu)}\big] \ge \int_0^T \mathbb E^u[u_t]\,\mu(dt), \quad \text{for every } u\in\Theta, \]
and equality is attained by the (efficient) estimator $\xi = X$.

Proof. For every process $v\in\Theta$, since $\xi$ is unbiased we have
\[ \mathbb E^{u+\varepsilon v}[\xi_t] = \mathbb E^{u+\varepsilon v}[u_t+\varepsilon v_t] = \mathbb E^{u+\varepsilon v}[u_t] + \varepsilon\,\mathbb E^{u+\varepsilon v}[v_t], \quad \text{for } t\in[0,T]. \]
Differentiating w.r.t. $\varepsilon$, as in [PR09, Proposition 2.3], we obtain the identity
\[ \mathbb E^u[v_t] = \frac{d}{d\varepsilon}\Big|_{\varepsilon=0} \mathbb E^{u+\varepsilon v}[\xi_t-u_t] = \mathbb E^u\Big[(\xi_t-u_t)\int_0^T \frac{\dot v_s}{\dot u_s}\,(dX_s-\dot u_s\,ds)\Big]. \tag{7} \]
By Cauchy-Schwarz inequality and the fact that $X$ is a Cox process with intensity $\dot u$, we get, for $t\in[0,T]$,
\[ \mathbb E^u[v_t]^2 \le \mathbb E^u[(\xi_t-u_t)^2]\;\mathbb E^u\Big[\int_0^T \frac{\dot v_s^2}{\dot u_s}\,ds\Big], \quad\text{thus}\quad \mathbb E^u[(\xi_t-u_t)^2] \ge \mathbb E^u[u_t], \]
once we let $\dot v = \dot u\,1_{[0,t]}$. The thesis follows by integration w.r.t. $\mu$. $\square$

Differently from the case of Brownian motion, the lower bound depends on the parameter $u\in\Theta$. This is quite natural in view of the classical, finite-dimensional Cramer-Rao lower bound, where the inverse of the Fisher information appears, measuring the local regularity of the densities: when $u$ is small, the density becomes very peaked and the bound becomes trivial.

Since the intensity $u\in\Theta$ is absolutely continuous, also in this case we investigate lower bounds for the $H^1_0$ risk: again, no unbiased estimators exist. In the next result, we also collect the case of fractional Sobolev spaces $W^{\alpha,2}$, for $\alpha\in(0,1)$.

Theorem 3.3. For any unbiased estimator $\xi$, $\alpha\in(0,1)$, it holds
\[ \mathbb E^u\big[\|\xi-u\|^2_{W_0^{\alpha,2}}\big] \ge 2\int_0^T\!\!\int_r^T\!\!\int_0^r \mathbb E^u[\dot u_r]\,\frac{1}{(t-s)^{2\alpha+1}}\,ds\,dt\,dr, \]
for every $u\in\Theta$. There exists no unbiased estimator $\xi$ with finite risk in $W^{\alpha,2}$ for $\alpha\in[1/2,1)$, as well as in $H^1_0$.

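Proof. We subtract (7) for two different times $s,t\in[0,T]$, and apply Cauchy-Schwarz, obtaining
\[ \mathbb E^u[|\Delta_t-\Delta_s|^2] \ge \frac{|\mathbb E^u[v_t-v_s]|^2}{\mathbb E^u\big[\int_0^T \frac{\dot v_r^2}{\dot u_r}\,dr\big]}. \]
Hence, taking $\dot v_r = 1_{[s\wedge t,\,s\vee t]}(r)\,\dot u_r$, we have
\[ \mathbb E^u[|\Delta_t-\Delta_s|^2] \ge \mathbb E^u[|u_t-u_s|], \quad \text{for every } s,t\in[0,T]. \]
If $s<t$, the right-hand side equals $\mathbb E^u\big[\int_s^t \dot u_r\,dr\big]$; dividing by $|t-s|^{2\alpha+1}$, integrating over $[0,T]^2$ and exchanging the order of integration gives the bound in the statement. For $\alpha\ge 1/2$ this bound is $\infty$, since $\mathbb E^u[\dot u_r]>0$ for a.e. $r\in[0,T]$. The case of $H^1_0$ follows at once from inequality (6). $\square$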
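The pointwise inequality $\mathbb E^u[(\xi_t-u_t)^2]\ge\mathbb E^u[u_t]$ from the proof of Theorem 3.2, and its attainment by $\xi = X$, can be checked by simulation in the simplest case. The sketch below assumes a deterministic, constant intensity $\dot u\equiv\lambda$ (so that, under $\mathbb P^u$, $X$ is a homogeneous Poisson process and $u_t = \lambda t$); the function names, sample size and seed are illustrative choices.

```python
import random


def sample_poisson_count(rate, t, rng):
    """Number of jumps in [0, t] of a homogeneous Poisson process,
    simulated through i.i.d. exponential inter-arrival times."""
    count = 0
    clock = rng.expovariate(rate)
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)
    return count


def pointwise_risk_of_x(rate, t, n_samples=100_000, seed=0):
    """Monte Carlo estimate of E[(X_t - u_t)^2] with u_t = rate * t."""
    rng = random.Random(seed)
    u_t = rate * t
    total = 0.0
    for _ in range(n_samples):
        total += (sample_poisson_count(rate, t, rng) - u_t) ** 2
    return total / n_samples


# The estimate should be close to the lower bound u_t = 2.0 * 1.5 = 3.0.
print(pointwise_risk_of_x(rate=2.0, t=1.5))
```

Integrating the pointwise identity against $\mu$ gives equality in Theorem 3.2 for this deterministic example; genuinely random intensities would require sampling $u$ first.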

4. Malliavin calculus of variations

In this section, we briefly recall some results concerning Malliavin calculus on the classical Wiener space (we refer to the monograph [Nua06] for details), limiting ourselves to the essentials for constructing super-efficient estimators.

In the framework of Section 2, i.e. if $X = (X_t)_{t\in[0,T]}$ is a Brownian motion (on the finite interval $[0,T]$), defined on some filtered probability space $(\Omega,\mathcal F,(\mathcal F_t)_{t\in[0,T]},\mathbb P)$, we introduce the space $\mathcal S$ of smooth functionals, as those of the form
\[ F = \varphi(X_{t_1},\dots,X_{t_n}), \]
for some $t_1,\dots,t_n\in[0,T]$ and $\varphi\in C_b^\infty(\mathbb R^n)$ ($n\ge 0$). The Malliavin derivative $DF$ is then defined as the $L^2(0,T)$-valued random variable
\[ D_tF := \sum_{i=1}^n \frac{\partial\varphi}{\partial x_i}(X_{t_1},\dots,X_{t_n})\,1_{[0,t_i]}(t), \quad \text{for a.e. } t\in[0,T]. \]
For $h\in L^2(0,T)$, we let $D_hF := \int_0^T D_tF\,h(t)\,dt$ (in the classical Wiener space framework, this corresponds to differentiation along the direction in $H^1_0$ given by $\tilde h(t) = \int_0^t h(s)\,ds$, $t\in[0,T]$: differently from the previous sections, we prefer to focus on the space $L^2(0,T)$ instead of $H^1_0$).

The Cameron-Martin theorem entails the following integration by parts formula for smooth functionals.

Proposition 4.1. Let $F\in\mathcal S$ and $h\in L^2(0,T)$. Then, it holds
\[ \mathbb E[D_hF] = \mathbb E[F\,h^*], \tag{8} \]
where we let $h^* = \int_0^T h(s)\,dX_s$ be the Ito(-Wiener) integral.

A straightforward consequence of the integration by parts formula above is closability of the operator $D:\mathcal S\subset L^2(\Omega)\to L^2(\Omega\times[0,T])$. The domain of its closure defines the Sobolev-Malliavin space $\mathbb D^{1,2}$, on which the operator $D$ extends continuously.

Proposition 4.2 (chain rule). Let $F_1,\dots,F_n\in\mathbb D^{1,2}$ and $\varphi\in C_b^1(\mathbb R^n)$. Then, it holds $\varphi(F_1,\dots,F_n)\in\mathbb D^{1,2}$ with
\[ D_t\varphi(F_1,\dots,F_n) = \sum_{i=1}^n \frac{\partial\varphi}{\partial x_i}(F_1,\dots,F_n)\,D_tF_i, \quad \text{for a.e. } t\in[0,T]. \]
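The integration by parts formula (8) can be checked by simulation on a single smooth functional. The sketch below assumes the illustrative choices $F = \sin(X_{t_1})$ and $h\equiv 1$ on $[0,T]$ (not taken from the paper), for which $D_hF = t_1\cos(X_{t_1})$ and $h^* = X_T$; both sides of (8) then equal $t_1 e^{-t_1/2}$.

```python
import math
import random


def check_integration_by_parts(t1=0.5, T=1.0, n_samples=200_000, seed=0):
    """Monte Carlo estimates of E[D_h F] and E[F h*] for
    F = sin(X_{t1}) and h = 1 on [0, T] (illustrative choices)."""
    rng = random.Random(seed)
    lhs = 0.0
    rhs = 0.0
    for _ in range(n_samples):
        # Sample (X_{t1}, X_T) through independent Gaussian increments.
        x_t1 = math.sqrt(t1) * rng.gauss(0.0, 1.0)
        x_T = x_t1 + math.sqrt(T - t1) * rng.gauss(0.0, 1.0)
        lhs += t1 * math.cos(x_t1)   # D_h F = cos(X_{t1}) * |[0, t1]|
        rhs += math.sin(x_t1) * x_T  # F * h*, with h* = X_T
    return lhs / n_samples, rhs / n_samples


lhs, rhs = check_integration_by_parts()
print(lhs, rhs, 0.5 * math.exp(-0.25))
```

The agreement of the two averages is only up to Monte Carlo error; the right-hand side has the larger variance because it involves the unbounded factor $X_T$.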

Remark 4.3 (Malliavin calculus for a Cox process). It seems reasonable to develop a theory of differential calculus for Cox processes, akin to that for Poisson processes introduced in [PR09]. In the setting of Section 3, let $(X_t)_{t\in[0,T]}$ be a Cox process on $(\Omega,\mathcal F,(\mathcal F_t)_{t\in[0,T]},\mathbb P)$, with intensity $\lambda = (\lambda_t)_{t\in[0,T]}$ and jump times $(T_k)_{k\ge 1}$. Then, we let $\mathcal S$ be the space of random variables $F$ of the form
\[ F = f_0\,1_{\{X_T=0\}} + \sum_{n=1}^\infty 1_{\{X_T=n\}}\,f_n(T_1,\dots,T_n), \]
where, for $n\ge 0$, $f_n:\Omega\times\mathbb R^n\to\mathbb R$ is bounded, measurable with respect to $\mathcal F_0\times\mathcal B(\mathbb R^n)$ (i.e. its randomness depends only on $\lambda$) and, for every $\omega\in\Omega$, $f_n(\omega;\cdot)$ is $C_b^\infty(\mathbb R^n)$ and symmetric, i.e., $f_n(\omega;t_1,\dots,t_n)$ is left unchanged by any permutation of the coordinates $(t_1,\dots,t_n)$, and, for every $n\ge 0$, it holds $f_n(\omega;t_1,\dots,t_n) = f_{n+1}(\omega;t_1,\dots,t_n,T)$, for $\omega\in\Omega$, $t_1,\dots,t_n\in\mathbb R$. For $F\in\mathcal S$, we may let $DF(\omega)\in L^2(0,T)$ be defined by
\[ D_tF := -\sum_{n=1}^\infty 1_{\{X_T=n\}} \sum_{k=1}^n 1_{[0,T_k]}(t)\,\frac{1}{\lambda_{T_k}}\,\partial_k f_n(T_1,\dots,T_n)\,\lambda_t, \]
for a.e. $t\in[0,T]$.

One can prove the validity of the chain rule and of an integration by parts formula, providing some notion of divergence, thus defining Sobolev-Malliavin spaces in this setting. However, it is presently not clear how to effectively use such calculus to produce super-efficient Stein-type estimators, see Remark 5.2 below.

5. Super-efficient estimators

In this section, we address the problem of Stein type, super-efficient estimators for the drift of a shifted Brownian motion, with respect to risks computed in the Sobolev spaces introduced above.

For $L^2(\mu)$-type risks, super-efficient estimators of the form $X+\xi$ were first studied in [PR08]. Privault and Réveillac consider a process $\xi_t = D_{1_{[0,t]}}\log F$, $t\in[0,T]$, where $F$ is any $\mathbb P$-a.s. non-negative random variable in $\mathbb D^{1,2}$ such that $\sqrt F$ is $\Delta$-superharmonic w.r.t. a suitable "Laplacian" operator, actually related to the structure of the risk considered (which is not, in the Gaussian case, the usual Gross-Malliavin Laplacian). We show that a similar approach leads to super-efficient estimators also in fractional Sobolev spaces $W^{\alpha,2}$, for $\alpha\in[0,1/2)$ (of course, this perturbative approach does not provide any information for larger values of $\alpha$).

Indeed, for every $\xi = (\xi_t)_{t\in[0,T]}$, with $\mathbb E^u[\|\xi\|^2_{W_0^{\alpha,2}}]<\infty$, we write
\[ \mathbb E^u\big[\|X+\xi-u\|^2_{W_0^{\alpha,2}}\big] = \mathbb E^u\big[\|X-u\|^2_{W_0^{\alpha,2}} + \|\xi\|^2_{W_0^{\alpha,2}}\big] + 2\,\mathbb E^u\Big[\int (\xi_t-\xi_s)\,[(X_t-u_t)-(X_s-u_s)]\,d\mu_\alpha(s,t)\Big], \]
where we introduce the Borel measure $\mu_\alpha(ds,dt) = 2(t-s)^{-2\alpha-1}\,1_{\{s<t\}}\,ds\,dt$

\[ \Delta_\alpha F := \int_{[0,T]^2} (\tilde D_{s,t})^2 F\,\mu_\alpha(ds,dt), \tag{9} \]
initially defined on $\mathcal S$. Arguing e.g. as in [PR08, Proposition 4.5], it is possible to show that $\Delta_\alpha:\mathcal S\subseteq L^2(\Omega,\mathbb P^u)\to L^2(\Omega,\mathbb P^u)$ is closable and that the random variables $G\in\mathbb D^{1,2}$, with
\[ \tilde D_{s,t}G\in\mathbb D^{1,2}, \text{ for a.e. } s,t\in[0,T], \quad\text{and}\quad \tilde D^2_{s,t}G\in L^2\big(\Omega\times[0,T]^2,\ \mathbb P\times\mu_\alpha\big), \tag{10} \]
belong to the domain of the closure, so that $\Delta_\alpha G$ is well-defined (actually, by the same expression as in (9)). Moreover, the operator $\Delta_\alpha$ is of diffusion type, i.e., for every $F_1,\dots,F_n\in\mathcal S$, $\varphi\in C_b^2(\mathbb R^n)$, the function $\varphi\circ\mathbf F$ (we write $\mathbf F = (F_1,\dots,F_n)$) belongs to the domain of $\Delta_\alpha$, and it holds
\[ \Delta_\alpha(\varphi\circ\mathbf F) = \sum_{i=1}^n \frac{\partial\varphi}{\partial x_i}(\mathbf F)\,\Delta_\alpha F_i + \sum_{i,j=1}^n \frac{\partial^2\varphi}{\partial x_i\partial x_j}(\mathbf F)\,\Gamma_\alpha(F_i,F_j), \quad \mathbb P\text{-a.e. in } \Omega, \tag{11} \]
with $\Gamma_\alpha(F_i,F_j) = \int_{[0,T]^2} \tilde D_{s,t}F_i\,\tilde D_{s,t}F_j\,\mu_\alpha(ds,dt)$, for $i,j\in\{1,\dots,n\}$ (the Malliavin matrix associated to $(F_i)_{i=1}^n$). This identity, by density, extends under natural integrability assumptions on $\mathbf F$ as well as on $\varphi$.

The operator $\Delta_\alpha$ enters the picture if we assume that the process $\xi$ is of the form $\xi_t = \tilde D_{0,t}\log F^2$, $t\in[0,T]$, for some $\mathbb P$-a.e. positive random variable $F\in\mathbb D^{1,2}$, with $G = \log F^2$ satisfying (10). If we are in a position to apply the chain rule (11), it holds
\[ \Delta_\alpha\log F^2 = 2\,\frac{\Delta_\alpha F}{F} - \frac{2}{F^2}\,\Gamma_\alpha(F,F) = \frac{2\Delta_\alpha F}{F} - \frac12\,\Gamma_\alpha(\log F^2,\log F^2), \]
which can be explicitly written in terms of $\xi$ as
\[ \frac{4\Delta_\alpha F}{F} = \int_{[0,T]^2} \big[\,2\tilde D_{s,t}(\xi_t-\xi_s) + |\xi_t-\xi_s|^2\,\big]\,\mu_\alpha(ds,dt). \]
As a result, we obtain
\[ \mathbb E^u\big[\|X+\xi-u\|^2_{W_0^{\alpha,2}}\big] = \rho + 4\,\mathbb E^u\Big[\frac{\Delta_\alpha F}{F}\Big]. \]
Therefore, in order to find super-efficient estimators, it is enough to prove the existence of some $\xi$ (independent of $u$) that can be written in terms of some $F$ (possibly depending on $u$), with $\Delta^u_\alpha F\le 0$ (i.e., super-harmonic), with strict inequality on a set of positive $\mathbb P^u$ (or equivalently $\mathbb P$) measure. In the case of shifted Brownian motion, we provide the following

Example 5.1. Let $F$ be a r.v. of the form $F = \varphi(X_{t_1}, X_{t_2}-X_{t_1},\dots,X_{t_n}-X_{t_{n-1}})$, for some $0 = t_0 < t_1 < \dots$

Γα(δiX, δj X)= D˜s,tδiXD˜s,tδjXµα(ds, dt), for i, j 1,...,n , 2 ∈ { } Z[0,T ] with the notation δiX = Xti Xti−1 . Before we proceed further,− we have to take into account that, with different probabilities Pu u u , the r.v.’s may have different derivatives DF = D F and Laplacians ∆αF = ∆αF , since the calculus w.r.t. Pu is “modelled” on the process Xu = X u, thus, for h L2(0, T ), t [0, T ], it holds − ∈ ∈ t u DhXt = DhXt + Dhut = h(s)ds + Dhut Z0 and u ∆αXt =∆αXt +∆αut =∆αut, provided that ut is sufficiently regular. To proceed further with computations, we assume 1 that the process u is deterministic i.e. we restrict the space of parameters Θ to H0 only, so that Dhut =∆αut = 0, ruling out the problem of possible dependence upon u of the Malliavin calculus that we consider. Then, (11) reduces to n ∂2φ ∆αF = ai,j, ∂xi∂xj i,jX=1 where, for i, j 1,...,n , with t = 0, ∈ { } 0 t t

\[
a_{i,j} := \int_{[0,T]^2} \int_s^t 1_{[t_{i-1}, t_i]}(r)\, dr \int_s^t 1_{[t_{j-1}, t_j]}(r)\, dr \, \mu_\alpha(dt, ds).
\]
To prove that the symmetric matrix $A := (a_{i,j})_{i,j=1}^n$ is well-defined and invertible, we argue as follows: for every $v = (v_i)_{i=1}^n$, it holds, using the notation $\langle Av, v \rangle := \sum_{i,j} a_{i,j} v_i v_j$,
\[
\begin{aligned}
\langle Av, v \rangle &= \int_{[0,T]^2} \sum_{i,j} v_i v_j \int_s^t 1_{[t_{i-1}, t_i]}(r)\, dr \int_s^t 1_{[t_{j-1}, t_j]}(r)\, dr \, \mu_\alpha(dt, ds) \\
&= \int_{[0,T]^2} \Big( \int_s^t \sum_{i=1}^n v_i 1_{[t_{i-1}, t_i]}(r)\, dr \Big)^2 \mu_\alpha(dt, ds) \\
&= \int_{[0,T]^2} |\tilde v(t) - \tilde v(s)|^2 \, \mu_\alpha(dt, ds) = \| \tilde v \|^2_{W_0^{\alpha,2}},
\end{aligned}
\]
where we let $\tilde v(t) = \int_0^t \sum_{i=1}^n 1_{[t_{i-1}, t_i]}(s)\, v_i \, ds$. From this identity and (6) we deduce that $A$ is well-defined, while non-degeneracy follows from the fact that, if $\| \tilde v \|_{W_0^{\alpha,2}} = 0$, then $\tilde v$ is constant, which cannot happen except when $v = 0$.

We let $B := (b_{i,j})_{i,j=1}^n$ be the inverse matrix of $A$, and consider the function
\[
\varphi(x) := \langle Bx, x \rangle^a, \quad x \in \mathbb{R}^n,
\]
for a suitable choice of $a \in \mathbb{R}$. Then, by formally applying the chain rule in $\mathbb{R}^n$, it holds
\[
\sum_{i,j} \frac{\partial^2 \varphi}{\partial x_i \partial x_j}\, a_{i,j} = 2a\big(2(a-1) + n\big) \langle Bx, x \rangle^{a-1},
\]
which suggests the choice $a \in (1 - n/2, 0)$ (and $n \ge 3$). However, for $a$ in this range, $\varphi$ is not $C^2_b(\mathbb{R}^n)$, and in order to rigorously conclude super-efficiency for an estimator of the form $X_t + \tilde D_{0,t} \log F^2$, $t \in [0,T]$, we have to justify all the applications of the chain rule above. Indeed, the only non-trivial step is to prove the following estimate, for every $u \in H^1_0$:
\[
E^u\big[ \langle B(\delta X), (\delta X) \rangle^{-1} \big] < \infty.
\]
In turn, this holds true because we may pass to the joint law of $\delta X = (\delta_i X)_{i=1}^n$, which is Gaussian non-degenerate (possibly non-centred), and the integrand can then be estimated from above by some constant times the function $x \mapsto |x|^{-2}$ (here the assumption $n \ge 3$ plays a role too).

Next, to prove e.g. that $\log F^2 \in \mathbb{D}^{1,2}$, with
\[
D_t \log F^2 = 2a\, \frac{\sum_{i,j=1}^n b_{i,j}\, \delta_i X \, 1_{[t_{j-1}, t_j]}(t)}{\langle B(\delta X), (\delta X) \rangle}, \quad \text{for a.e. } t \in [0,T],
\]
it is sufficient to notice that, assuming this identity true, we could estimate, by the Cauchy-Schwarz inequality,
\[
\int_0^T E^u\big[ |D_t \log F^2|^2 \big]\, dt \le 4a^2\, T \operatorname{trace}(B)\, E^u\big[ \langle B(\delta X), (\delta X) \rangle^{-1} \big].
\]
This a priori estimate entails $\log F^2 \in \mathbb{D}^{1,2}$, by suitably approximating the function $z \mapsto \log z$ with smooth functions.

Similarly, to estimate $E[\| \xi \|^2_{W_0^{\alpha,2}}]$, we apply Cauchy-Schwarz and deduce, for $s, t \in [0,T]$, with s
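
The algebraic identity used in the example for $\varphi(x) = \langle Bx, x \rangle^a$ can be checked numerically. It follows from $\partial^2_{ij} \varphi = 4a(a-1) q^{a-2} (Bx)_i (Bx)_j + 2a\, q^{a-1} b_{i,j}$ with $q = \langle Bx, x \rangle$, together with $AB = I$. The sketch below is only an illustration under stated assumptions: it takes $\mu_\alpha(dt, ds) = |t-s|^{-(1+2\alpha)}\, dt\, ds$ (a Gagliardo-type kernel; the actual definition is the one in (6), which is not reproduced here), approximates $A$ with a midpoint rule, and compares a finite-difference evaluation of $\sum_{i,j} a_{i,j}\, \partial^2_{ij}\varphi$ with the closed form. The function names, the grid size `m`, the knots and the test point `x` are all hypothetical choices.

```python
import numpy as np

def gram_matrix(knots, alpha, T=1.0, m=200):
    """Midpoint-rule approximation of a_ij = int int I_i(s,t) I_j(s,t) mu_alpha(dt,ds),
    where I_i(s,t) = int_s^t 1_[t_{i-1}, t_i](r) dr.
    ASSUMPTION: mu_alpha(dt,ds) = |t-s|^{-(1+2*alpha)} dt ds."""
    h = T / m
    grid = (np.arange(m) + 0.5) * h                  # midpoints of the grid cells
    lo, hi = knots[:-1, None], knots[1:, None]
    # cdf[i, k] = length of [t_{i-1}, t_i] intersected with [0, grid[k]]
    cdf = np.clip(grid[None, :], lo, hi) - lo
    # inc[i, k, l] = I_i(grid[k], grid[l])
    inc = cdf[:, None, :] - cdf[:, :, None]
    dist = np.abs(grid[None, :] - grid[:, None])
    np.fill_diagonal(dist, np.inf)                   # drop the diagonal s = t
    w = dist ** (-(1.0 + 2.0 * alpha)) * h * h       # quadrature weights
    return np.einsum("ikl,jkl,kl->ij", inc, inc, w)

def phi(x, B, a):
    """phi(x) = <Bx, x>^a."""
    return (x @ B @ x) ** a

def weighted_hessian_sum(x, A, B, a, eps=1e-4):
    """Central finite differences for sum_ij a_ij * d^2 phi / dx_i dx_j at x."""
    n = len(x)
    E = np.eye(n) * eps
    total = 0.0
    for i in range(n):
        for j in range(n):
            d2 = (phi(x + E[i] + E[j], B, a) - phi(x + E[i] - E[j], B, a)
                  - phi(x - E[i] + E[j], B, a) + phi(x - E[i] - E[j], B, a))
            total += A[i, j] * d2 / (4.0 * eps ** 2)
    return total

def closed_form(x, B, a):
    """Right-hand side 2a(2(a-1)+n) <Bx, x>^(a-1)."""
    n = len(x)
    return 2.0 * a * (2.0 * (a - 1.0) + n) * (x @ B @ x) ** (a - 1.0)

knots = np.array([0.0, 0.2, 0.5, 0.7, 1.0])   # 0 = t_0 < t_1 < ... < t_n, here n = 4
A = gram_matrix(knots, alpha=0.25)
B = np.linalg.inv(A)
a = -0.25                                      # lies in (1 - n/2, 0) = (-1, 0)
x = np.array([1.0, -0.5, 0.8, 0.3])
print(weighted_hessian_sum(x, A, B, a), closed_form(x, B, a))
```

With $a$ in the range $(1 - n/2, 0)$ both printed values should be negative, matching the super-harmonicity required of $F$. The discretised matrix is a Gram matrix of increments, so it is positive semi-definite by construction, mirroring the argument via $\| \tilde v \|^2_{W_0^{\alpha,2}}$.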

References

[CKH11] José M. Corcuera and Arturo Kohatsu-Higa. “Statistical inference and Malliavin calculus”. In: Seminar on Stochastic Analysis, Random Fields and Applications VI. Vol. 63. Progr. Probab. Birkhäuser/Springer Basel AG, Basel, 2011, pp. 59–82.
[DNPV12] Eleonora Di Nezza, Giampiero Palatucci, and Enrico Valdinoci. “Hitchhiker’s guide to the fractional Sobolev spaces”. In: Bull. Sci. Math. 136.5 (2012), pp. 521–573.
[Gob01] Emmanuel Gobet. “Local asymptotic mixed normality property for elliptic diffusion: a Malliavin calculus approach”. In: Bernoulli 7.6 (2001), pp. 899–912.
[JS61] W. James and Charles Stein. “Estimation with quadratic loss”. In: Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I. Univ. California Press, Berkeley, Calif., 1961, pp. 361–379.
[JYC09] Monique Jeanblanc, Marc Yor, and Marc Chesney. Mathematical methods for financial markets. Springer Finance. Springer-Verlag London, Ltd., London, 2009, pp. xxvi+732.
[Liu13] Junfeng Liu. “Remarks on parameter estimation for the drift of fractional Brownian sheet”. In: Acta Math. Vietnam. 38.2 (2013), pp. 241–253.
[NP12] Ivan Nourdin and Giovanni Peccati. Normal approximations with Malliavin calculus. Vol. 192. Cambridge Tracts in Mathematics. From Stein’s method to universality. Cambridge University Press, Cambridge, 2012, pp. xiv+239.
[Nua06] David Nualart. The Malliavin calculus and related topics. Second edition. Probability and its Applications (New York). Springer-Verlag, Berlin, 2006.
[PR08] Nicolas Privault and Anthony Réveillac. “Stein estimation for the drift of Gaussian processes using the Malliavin calculus”. In: Ann. Statist. 36.5 (2008), pp. 2531–2550.
[PR09] Nicolas Privault and Anthony Réveillac. “Stein estimation of Poisson process intensities”. In: Stat. Inference Stoch. Process. 12.1 (2009), pp. 37–53.
[PR11] Nicolas Privault and Anthony Réveillac. “Sure shrinkage of Gaussian paths and signal identification”. In: ESAIM Probab. Stat. 15 (2011), pp. 180–196.

Delft University of Technology
Università degli Studi di Pisa
Università degli Studi di Pisa