.

STOCHASTIC DIFFERENTIAL EQUATIONS AND HYPOELLIPTIC OPERATORS

Denis R. Bell Department of Mathematics University of North Florida Jacksonville, FL 32224 U. S. A.

In Real and Stochastic Analysis, 9-42, Trends Math. ed. M. M. Rao, Birkhauser, Boston. MA, 2004.

1 1. Introduction The first half of the twentieth century saw some remarkable developments in analytic . Wiener constructed a rigorous mathematical model of Brownian motion. Kolmogorov discovered that the transition probabilities of a diffusion process define a fundamental solution to the associated heat equation. Itˆo developed a stochas- tic which made it possible to represent a diffusion with a given (infinitesimal) generator as the solution of a stochastic differential equation. These developments cre- ated a link between the fields of partial differential equations and stochastic analysis whereby results in the former area could be used to prove results in the latter. d More specifically, let X0,...,Xn denote a collection of smooth vector fields on R , regarded also as first-order differential operators, and define the second-order differen- tial operator n 1 L ≡ X2 + X . (1.1) 2 i 0 i=1 Consider also the Stratonovich stochastic differential equation (sde)

n dξt = Xi(ξt) ◦ dwi + X0(ξt)dt (1.2) i=1 where w =(w1,...,wn)isann−dimensional standard . Then the solution ξ to (1.2) is a time-homogeneous Markov process with generator L, whose transition probabilities p(t, x, dy) satisfy the following PDE (known as the Kolmogorov forward equation) in the weak sense

∂p = L∗p. ∂t y A differential operator G is said to be hypoelliptic if, whenever Gu is smooth for a distribution u defined on some open subset of the domain of G, then u is smooth. In 1967, H¨ormander proved that an operator of the form L in (1.1) is hypoelliptic if the Lie algebra generated by X0,...,Xn has dimension d at each point*. It follows immediately that the transition probabilities p of ξ in (1.2) have smooth densities − ∗ at all positive times provided the parabolic operator ∂/∂t Ly satisfies H¨ormander’s condition at ξ0. Of course, this is a rather circuitous route for proving a result of a decidedly probabilistic nature, a fact that had hardly escaped the notice of PDE theorists and stochastic analysts alike. In the mid-seventies Malliavin ([Ma1], [Ma2]) outlined a new and innovative method for directly proving the smoothness of the transition probabilities p. The idea is as follows: the measure p(t, x, dy) is the image of Wiener measure under the Itˆo

* This hypothesis will be referred to as H¨ormander’s (Lie bracket) condition.

2 map g : w → ξt. Now the Wiener measure, which can be considered as an infinite- dimensional analogue of standard Gaussian measure on Euclidean space, has a well- understood analytic structure. If the map g were smooth, then regularity properties of p could be studied by a process of integration by parts (cf. Section 2). In fact, this is not the case; g is only defined up to a set of full Wiener measure and is most pathological from the standpoint of classical calculus. Malliavin solved this problem by constructing an extended calculus applicable to solution maps of sde’s. His method (which has since been termed Malliavin calculus) was based on the infinite-dimensional Ornstein-Uhlenbeck semigroup and was rather elaborate. It has since been simplified and extended by many authors and has become a powerful tool in stochastic analysis. Following further pioneering work by Kusuoka and Stroock, it has led to a complete probabilistic proof of H¨ormander’s theorem, as Malliavin originally intended, and has inspired a host of other results. Examples include filtering theorems by Michel [Mi], a deeper understanding of the Skorohod and the development of an anticipating by Nualart and Pardoux [NP], an extension of Clark’s formula by Ocone [O], and Bismut’s probabilistic analysis of the small-time asymptotics of the heat kernel of the Dirac operator on a Riemannian manifold [Bi2]. In this article, I describe my own contributions to the subject. The interested reader is encouraged to also study the aforementioned works. The contents of the article are as follows. In Section 2, we present an elemen- tary derivation of Malliavin’s integration by parts formula by which one establishes smoothness of the transition probabilities discussed above. We also include a result of Bismut that illustrates how this formula is related to H¨ormander’s condition. Much of the notation, together with the basic tools that will be used later, are introduced in this section of the article. The material presented in Sections 3 and 4 is joint work with S. Mohammed. In Section 3, we prove a sharp generalization of H¨ormander’s theorem. This result asserts that the operator L in (1.1) is hypoelliptic under hypotheses that allow H¨ormander’s Lie bracket condition to fail on hypersurfaces in the domain of L. The theorem establishes the hupoellipticity of a large class of operators with points of infinite-type, and is sharp in the category of H¨ormander operators with smooth coefficients. In Section 4, we establish sufficient conditions for the existence of smooth densities for a class of stochastic functional equations of the form

dxt = g(xt−r)dw + B(t, x)dt (1.3) where g denotes a matrix-valued function, r is a positive time-delay, and B is a non- anticipating functional defined on the space of paths. Our hypotheses allow degeneracy of g on a hypersurface in the ambient space of the equation. Probabilistically, there is an important difference between the solution ξ to equation (1.2) and the solution x

3 to (1.3). Namely, while ξ is a Markov process, x is non-Markov. As a consequence, the aforementioned result cannot be proved by appealing to existing results in PDE theory. Finally, in Section 5 we describe some open problems, which we hope will stimulate further study into this interesting subject.

2. Integration by parts and the regularity of induced measures As in Section 1, let ξ denote the solution to the sde (1.2). In this section we will give an elementary proof of a key result in Malliavin’s paper [Ma1], which gives a criterion under which the random variables ξt have absolutely continuous distributions at all positive times t. The material in this section is taken from the author’s doctoral dissertation [Be1]. Following Malliavin, we will make use the following result from harmonic analysis (see [Ma1] for a proof).

Lemma 2.1. Let µ denote a finite Borel measure on Rd. Suppose that for k ∈ N and each set of non-negative integers d1,...,dk, there exists a constant C such that for all test (C∞ compact support) functions φ on Rd

∂d1+...dk ≤ || ||∞ d d φdν C φ . (2.1) d 1 k R ∂x1 ...∂xd

If this condition holds for k =1, then the measure ν is absolutely continuous with respect to the Lebesgue measure on Rd. If the condition holds for every k ≥ 1, then ν has a smooth density.

The following result is included in order to motivate the approach that follows. It treats a finite-dimensional analogue of the infinite-dimensional problem that will be studied later.

Theorem 2.2. Suppose T : Rm → Rd is a C2 map. Let dγ(x)=(2π)−m/2 × exp(−|x|2/2)dx denote the standard Gaussian measure on Rm. Let ν ≡ T (γ) denote the induced measure on Rd, ν(B) ≡ γ(T −1(B) for Borel subsets B ⊆ Rd. Define

N ≡{x ∈ Rd/DT (x) ∈ L(Rm, Rd) is non surjective}.

Then the condition γ(N)=0 (2.2) is necessary and sufficient for absolute continuity of the measure ν.

4 Proof. The necessity of condition (2.2) follows immediately from Sard’s theorem which asserts that the set T (N) has zero Lebesgue measure. To prove sufficiency, we argue as follows. Define a d × d matrix

σ(x)=DT(x)DT(x)∗. (2.3)

Let {ψk} denote a sequence of smooth bump functions from [0, ∞) → [0, 1] such that

(i) ψk(t)=1if0≤ t ≤ k

(ii) ψk(t)=0ift ≥ k +1 ≥ | p | ∞ (iii) For each p 1, supk D ψk(t) < . d d For each k ∈ N, define Rk : R ⊗ R → [0, 1] by ψ (||α−1||),α∈ GL(d) R (α)= k k 0,α/∈ GL(d).

∞ d d Note that Rk is C , since GL(d) is an open subset of R ⊗R . We also define sequences d of measures {γk} on C0 and {νk} on R by

dγ k (x)=R (σ(x)),ν= T (γ ). dγ k k k

Assume now that (2.2) holds. Since N is precisely the set on which σ is degenerate, this is equivalent to the condition σ(x) ∈ GL(d) a.s., which implies νk → ν in variation norm. Hence it suffices to prove absolute continuity of νk for each k. We do this as d follows: let e1,...,ed denote the standard basis of R and define, for any 1 ≤ i ≤ d,a m m function hi : R → R by DT(x)∗σ−1(x)e ,σ∈ GL(d) h (x)= i i 0,σ/∈ GL(d).

d Let φ be a test function on R . By definition of hi,wehave ∂φ −1 dνk = Dφ(T (x)eiψk(||σ (x)||)dγ(x) Rd ∂xi Rm −1 = D(φ ◦ T )(x)hi(x)ψk(||σ (x)||)dγ(x). (2.4) Rm Define the (Gaussian) divergence operator Div acting on smooth functions G : Rn → Rn by Div(G)(x)=−T race DG(x).

Integrating by parts in (2.4) yields ∂φ dνk = φ ◦ T (x)Xi(x)dγ(x) Rd ∂xi Rm

5 where −1 Xi = Div[ψk(||σ ||)hi](2.5) 1 Since Xi is a with compact support, it follows that Xi ∈ L (γ). Thus (2.5) yields ∂φ dνk ≤||φ||∞||Xi||L1(γ) Rd ∂xi and the absolute continuity of νk follows from Lemma 2.1. With the above finite-dimensional case as motivation, we now move to the heart of the matter. Let γ denote the Wiener measure, defined on the space C0 of continuous paths {w :[0, 1] → Rn/w(0) = 0}. Recall that we wish to study the law ν of a random variable ξt, for t>0, where ξ is the solution to a stochastic differential equation. Let A and B denote bounded smooth maps from Rd into Rd ⊗ Rn and Rd respectively d with bounded derivatives of all orders, and let x ∈ R . As before, let w =(w1,...,wn) denote a standard Wiener process and consider the Itˆo sde t t ξt = x + A(ξs)dws + B(ξs)ds, t ≥ 0. (2.6) 0 0

Denoting by g the map w → ξ and by gt the composition of g with evaluation at time t>0, we have ν = gt(γ). We seek conditions under which the measure ν is absolutely continuous with respect to the Lebesgue measure on Rd. The setting looks similar to that of Theorem 2.2. There are, however, two important differences: (i) the measure γ is defined on an infinite-dimensional vector space

(ii) the map g (and hence gt)isnon-differentiable in the classical sense. Point (i) can be handled without too much difficulty. Although there is nothing like a

Lebesgue measure on C0, the usual backdrop for an integration by parts calculation, the Wiener measure has a well-developed analytic structure. In particular, there are formulae for integration by parts on Wiener space*. The second point is more serious. Malliavin developed his stochastic calculus of variations in [Ma1] in order to address this. We shall adopt here a more elementary approach, based on the follwing observa- tion. There is a Hilbert subspace H ⊂ C0 of paths, known as the Cameron-Martin space, canonically associated with the Wiener measure. The space H is the set of Lebesgue a.e. differentiable paths h such that 1 |h(t)|2dt < ∞. 0 equipped with the inner product 1   H ≡ dt, h, k ∈ H. 0 * See, for example, Gross’ divergence theorem for abstract Wiener spaces [G].

6 The point is that, when restricted to H, the map g becomes entirely regular. In fact, it is intuitively clear that the restriction* of g to H, which we denote byg ˜, is the map h ∈ H → k defined by the ordinary integral equation t t  ≥ kt = x + A(ks)hsds + B(ks)ds, t 0(2.7) 0 0 and it is easily seen that the mapg ˜ possesses the same degree of smoothness as the coefficient functions A and B. For example, an equation for the derivative η ≡ Drg˜(h), r ∈ H, is obtained by formally differentiating in (2.7) with respect to h.Thusη satisfies t   ηt = DA(ks)(ηs,hs)+A(ks)rs + DB(ks)ηs ds. 0 Integral equations for higher order derivatives ofg ˜ can be similarly obtained. The foregoing remarks provide an alternative elementary approach to Malliavin’s d work. As before, let φ denote a test function on R and letg ˜t denote the composition ofg ˜ with evaluation at time t. Suppose that for each standard basis vector ei, one can m construct a sequence of paths hi such that

m Dg˜t(Pmw)hi = ei.

Applying the dominated convergence theorem and integrating by parts with respect to the measure γ, we will then have ∂φ dν = lim Dφ(˜gt(Pmw))eidγ d m R ∂xi C0 ◦ m m = lim D(φ gt )(Pmw)hi dγ m C 0 m = lim φ(˜gt(Pmw))Div[hi ]dγ m C0 { m } 1 If we can show that the sequence Div[hi ] m is bounded in L (γ), then we obtain ∂φ ≤|| || || m || dν φ ∞ sup Div[hi ] L1(γ) Rd ∂xi m

* The term “restriction” requires clarification because g is only defined up to a set of γ-measure 1, and γ(H) = 0. It is more correct to say that g stochastically extends g˜ in the following sense. For each m ∈ N, define Pm : C0 → H to be the operator that piecewise linearizes on the uniform partition [0, 1/m, 2/m ..., 1]. It can be shown that, as m →∞, Pm converges strongly to the identity map on C0, andg ˜(Pmw) → g(w) a.s.

7 and the absolute continuity of ν follows from Lemma 2.1 as before. We will actually carry out a modified version of the above procedure, using a sequence of piecewise linear approximations to g. Being finite-rank operators, these approximations finite-dimensionalize the diffrerential analysis in the problem. Thus, at each level of approximation, we need only perform an elementary integration by parts in Euclidean space, as in the proof of Theorem 2.2.

Our approximation scheme is as follows. For each m ∈ N and w ∈ C0, let ∆jw (= m − ∈ d ∆j w) denote w(j +1)t/m) w(tj/m). Define v0,vt/m,...,vt R inductively by v0 = x and

k−1 k−1 t v = x + A(v )∆ w + B(v ),k=1,...,m. (2.8) kt/m jt/m j m jt/m j=0 j=0

m d Let v :[0, 1] → R denote the path piecewise linear between the points (kt/m, vkt/m), k =0,...,m and constant on [t, 1]. m ∞ It is easy to see that for each m, the map g : C0 → H is C . We prove in [Be2, pp 35 - 37]

Theorem 2.3. For every p ∈ N, lim sup E |ξ − vm|p =0 →∞ s s m s∈[0,t] where ξ is the solution to the sde (2.6). We now define an analogue of the matrix σ m ≡ m m ∗ ∈ d ⊗ d appearing in the proof of Theorem 2.2. Let σ (w) Dgt (w)Dgt (w) R R . We prove in [Be2, pp 37 - 39]

Theorem 2.4. As m →∞, the matrix sequence σm converges in probability to a limit σ in Rd ⊗ Rd. Let I denote the d × d identity matrix and consider the d × d matrix-valued equations s s Ys = I + DA(ξu)(Yu,dwu)+ DB(ξu)Yudu 0 0 and s s Zs = I − ZuDA(ξu)(., dwu) − ZuDB(ξu)du. 0 0 Then t ∗ ∗ ∗ σ = Yt ZsA(ξs)A(ξs) Zs ds Yt . (2.9) 0 where ∗ denotes matrix transpose.

8 Remarks

The matrix processes Yt and Zt have a natural interpretation in terms of the d stochastic flow of the sde (2.6), i.e. the random map on R , φt : x → ξt. It can ∞ be shown that φt is a.s. a C map and an easy computation shows that Yt = Dφt. ∈ ≥ −1 Furthermore, Yt GL(d) for all t 0 and Zt = Yt . The matrix σ defined in (2.9) is called the Malliavin covariance matrix associated to the random variable ξt. The next theorem shows that establishing non-degeneracy of this matrix is the key to proving regularity of the distribution of ξt, as might be suspected from Theorem 2.2 and its proof.

Theorem 2.5 (Malliavin). Suppose σ ∈ GL(d) a.s. Then the random variable ξt is absolutely continuous with respect to Lebesgue measure on Rd.

Our proof of Theorem 2.5 will make use of the following technical result, which is tailor-made for the estimations we will need to do later (see [Be2, pp34-35] for its proof).

d Lemma 2.6. Suppose that X, U0,...,Um−1 are R -valued random variables, n d V0,...,Vm−1 and Y0,...,Ym−1 are random linear maps from R to R , and Z0,..., d n d Zm−1 are random bilinear maps from R ×R to R , satisfying the following conditions

(i) For all 0 ≤ i ≤ m − 1,Ui,Vi,Yi, and Zi are measurable with respect to Fit/m, where { } { } Ft is the fitration generated by wt . (ii) max ||X||p, ||Ui||p, ||Vi||p, ||Yi||p, ||Zi||p, 0 ≤ i ≤ m − 1 ≤ M, where ||.||p denotes the Lp norms of the (norms of) the various quantities in their respective spaces.

Let ηkt/m, 0 ≤ k ≤ m, be random variables satisfying the equations

k−1 k−1 k−1 k−1 1 1 η = X + U + V (∆ w)+ Y (η )+ Z (η , ∆ w). kt/m m j j j m j jt/m j jt/m j j=0 j=0 j=0 j=0

Then there exists a constant N, depending only on M and p, such that

||ηkt/m||p ≤ N, ∀ 0 ≤ k ≤ m.

Proof of Theorem 2.5 Let Vm denote the finite-dimensional subspace of C0 consisting of paths that are piecewise linear between the times 0, t/m, . . . , t, and constant on [t, 1]. Following the method used to prove Theorem 2.2, we define

dγ k (w)=R (σ(w)),ν= g (γ ) dγ k k t k where Rk are as defined previously. As before, the assumption σ ∈ GL(d), a.s. implies

νk → ν in variation, so it suffices to prove that each νk is absolutely continuous.

9 d m Let e denote any unit vector in R and define h : C0 → H by Dgm(w)∗σm(w)−1e, σm ∈ GL(d) h (w)= t i 0,σm ∈/ GL(d).

Arguing as before and integrating by parts with respect to the measures Pm(γ) (note these are Gaussian measures on the finite-dimensional vector spaces Vm) yields ◦ m m ◦ m Dyφdνk = lim φ gt Div [h Rk σ ] dγ d m→∞ R C0 where Div is now defined with respect to the Cameron-Martin space H

Div G(w)=H −T race H DG(w).

To complete the proof, we must thus show that

m m sup E|Div [h Rk ◦ σ ]| < ∞ (2.10) m

First consider the inner product term in the divergence. This is non-zero only if σm ∈ GL(d) and ||σm||−1 ≤ k + 1. In this case

| m ◦ m | | m −1 m | E H = E < (σ ) e, Dgt (w)w>

≤ | m| (k +1)E ηt (2.11) where ηm = Dgm(w)w satisfies the equation

k−1 m m m ηkt/m = A(vit/m)∆iw + DA(vit/m)(ηit/m,i w)+t/mDB(vit/m)ηit/m , i=0

k =1,...,m.

| m| Since A, DA, and DB are bounded, Lemma 2.6 implies that E ηt is bounded and it m m follows from (2.11) that E| H | is bounded. It remains to show that the same holds for the second term in the divergence, i.e.

m m sup E|T raceH D[h Rk ◦ σ ]| < ∞. (2.12) m

n Let f1,...,fn be an orthonormal basis of R . For 1 ≤ r ≤ n and 0 ≤ l ≤ m − 1, define f rl ∈ H by   0, 0 ≤ s

10 rl The set Sm ≡{f , 1 ≤ r ≤ n, 0 ≤ l ≤ m−1} is orthonormal in H, thus can be extended ∈ ∩ c m to an orthonormal basis Bm of H. Note that for any f Bm Sm,Dg (w)f =0. Evaluating the Trace in (2.12) on the basis Bm gives

n,m−1 m m m m rl rl T raceH D[h Rk ◦ σ ]= r=1,l=0

n,m−1  ◦ m m −1 2 m rl rl = Rk σ (w) < (σ ) e, D gt (w)(f ,f ) > r=1,l=0 − m −1 m rl m −1 m rl < (σ ) Dσ (w)f (σ ) e, Dgt (w)f  m m rl m −1 m rl +DRk(σ )Dσ (w)f < (σ ) e, Dgt (w)f > .

In view of the definition of Rk, it suffices to show that

− m1 2 m rl rl ∞ sup E D gt (w)(f ,f ) < (2.13) m l=0 and m−1 | m rl|×| m rl| ∞ sup E Dσ (w)f Dgt (w)f < . (2.14) m l=0 √ rl m rl rl Let η denote the path mDg (w)f . Differentiation in (2.8) yields ηjt/m =0if j ≤ l and

j−1 j−1 √ t ηrl = tA(v )f + DA(v )(ηrl , ∆ w)+ DB(v )ηrl ,j≥ l+1 jt/m lt/m r pt/m pt/m p m pt/m pt/m p=l p=l

It follows from Lemma 2.6 that

|| rk || ∞ sup ηjt/m 4 < (2.15) j,k,m

Now let ρrl denote the path mD2gm(w)(f rl,frl). This satisfies the equation

rl rl ρjt/m = tDA(vlt/m)(ηlt/m,fr) j−1 2 rl rl rl + tD A(vpt/m)(ηpt/m,ηpt/m, ∆pw)+DA(vpt/m)(ρpt/m, ∆pw) p=l j−1 1 + tD2B(v )(ηrl ,ηrl )+DB(v )ρrl ,j≥ l +1 m pt/m pt/m pt/m pt/m pt/m p=l

11 rl ≤ (ρjt/m = 0 for j l). Lemma 2.6 together with (2.15) now give

|| rk || ∞ sup ρjt/m 4 < r,j,k,m and this implies (2.13). Condition (2.14) can be established by a similar argument. With this, the proof of the theorem is complete. We now return to the sde (1.2) defined in terms of vector fields. It follows from

(2.9) that the Malliavin covariance matrix σ for ξt now has the form n t ⊗ ∗ ∗ σ = Yt [ZsXi(ξs)] [ZsXi(ξs)] dsYt (2.16) i=1 0 where Z satisfies is the d × d matrix-valued sde n s s Zs = I − ZuDXi(ξu) ◦ dwi − ZuDX0(ξu)du (2.17) i=1 0 0

As before, Yt is the inverse matrix of Zt. Note that the stochastic integral in (2.17) is of Stratonovich type. The following result and its proof (which I first learned from Bismut’s paper [Bi1]) makes transparent the relationship between H¨ormander’s Lie bracket condition and the non-degeneracy of σ.

Theorem 2.7. Suppose the Lie algebra generated by the vector fields X1,...,Xn span d R at ξ0. Then σ ∈ GL(d), a.s. (hence, by Theorem 2.5 ξt is absolutely continuous with respect to Lebesgue measure on Rd). The proof will require the following

Lemma 2.8. Suppose y ∈ Rd, B is a smooth vector field on Rd and τ is a stopping time such that

=0 , ∀s ∈ [0,τ]. (2.18)

Then for i =1,...,n,

=0, ∀s ∈ [0,τ].

Proof. Applying Itˆo’s formula and using (2.17) , we obtain, for s ∈ [0,τ],

0=d<(ZsB(ξs)),y >

 n  = Zs[B,X0]ds + Zs[B,Xi](ξs) ◦ dwi,y . (2.19) i=1

12 Converting the Stratonovich to Itˆo form, we see that the infinitesimal term in (2.19) has the form n G(s)ds + Zs[B,Xi](ξs)dwi i=1 for some continuous adapted process G.Thus n =0, ∀s ∈ [0,τ]. i=1

The conclusion now follows from the computational rules of Itˆo calculus: dwidt =0, while dwidwj = δijdt. Proof of Theorem 2.7. Define V to be the set of vectors

{Xi(ξ0), [Xi,Xj](ξ0), [Xi, [Xj,Xk]](ξ0),...,1 ≤ i,j,k,...,≤ n}.

We will show that V ⊆ σ, which clearly implies the result. For each 0

Rs = span{ ZuXi(ξu/ 0 ≤ u ≤ s, 0 ≤ i ≤ n} and

R = ∩0

Then Rt = Range σ. Furthermore by the Blumenthal zero-one law, there exists a deterministic set R˜ such that R = R˜ a.s. Suppose y ∈ R˜⊥. Then with probability 1, there exists τ>0 such that Rs = R˜ for all s ∈ [0,τ]. Then for i =1,...,n

=0, ∀s ∈ [0,τ]. (2.20)

Iterating Lemma 2.8 on (2.20) gives

,,,...=0

⊥ for s ∈ [0,τ], and setting s = 0 shows that y ∈ (span V ) . Thus span V ⊆ R˜ ⊆ Rt = Range V, as required. In [Ma1], Malliavin proves further Theorem 2.9. If p (det σ) ∈∩p≥1L (γ)(2.21) ∞ then the density of ξt is C . This is also proved in [Be1], by iterating the argument used to prove Theorem 2.5. Kusuoka and Stroock prove in [KS2] that (2.21) holds under H¨ormander’s general condition and they use this to give a complete probabilistic proof of H¨ormander’s theorem. We do not include this work here since we will prove a more general result in the next section.

13 3. A H¨ormander theorem for infinitely degenerate operators

The material in this section comes from [BM3]. Let X0,...,Xn denote smooth vector fields defined on an open subset D of Rd. As before, we consider the second-order differential operator n 1 L ≡ X2 + X . (3.1 2 i 0 i=1

Let Lie(X0,...,Xn) be the Lie algebra generated by the vector fields X0,...,Xn. According to the theorem of H¨ormander ([H], Theorem 1.1), L is hypoelliptic on D if the vector space Lie(X0,...,Xn)(x) has dimension d at every x ∈ D. It can be shown that this is a necessary condition for hypoellipticity for operators of the form (3.1) with analytic coefficients.

This is not the case if the vector fields X0,...,Xn defining L are allowed to be (smooth) non-analytic. This fact was strikingly illustrated by Kusuoka and Stroock, who studied differential operators on R3 of the form

2 2 2 ≡ ∂ 2 ∂ ∂ La 2 + a (x1) 2 + 2 . ∂x1 ∂x2 ∂x3 They assume a to be a C∞ real-valued even function, non-decreasing on [0, ∞), which vanishes (only) at zero. It is shown in ([KS2], Theorem 8.41) that La is hypoelliptic on R3 if and only if a satisfies the condition lim s log a(s) = 0. In particular, in the case a→0+ p a(s) = exp(−|s| ), La is hypoelliptic provided p ∈ (−1, 0). However, it is clear that any such operator fails to satisfy H¨ormander’s condition on the hyperplane x1 =0. In this section we present a sharp criterion for hypoellipticity that implies H¨ormander’s theorem and encompasses the class of superdegenerate hypoelliptic oper- ators of Kusuoka and Stroock. We introduce the following notation. For a positive integer m, let E(m) denote any matrix whose columns consist of X0, ···,Xn together with all (iterated) Lie brackets of the form

n ··· ··· ··· n [Xi1 ,Xi2 ]i1,i2=0 ; ;[Xi1 , [Xi2 , [Xi3 , , [Xim1 ,Xim ]] ]]i1,i2,···,im=0 , For x ∈ D and m ≥ 1, define

µ(m) ≡ smallest eigenvalue of [E(m)E(m)∗].

Observe that µ(m)(x) > 0 for some m ≥ 1 if and only if H¨ormander’s (general) condition holds for the operator L at x ∈ D. In this case we will say that x is a H¨ormander point for L. We denote the set of all such points by H (note that H is an open subset of D). The set D ∩ Hc of non-H¨ormander points of L will be denoted simply by Hc in the sequel. The main result of this section is the following

14 Theorem 3.1. Suppose the non-H¨ormander set Hc of L is contained in a C2 hyper- surface S. Assume that at every point x in Hc

(i) at least one of the vector fields X1, ···,Xn is transversal to S. (ii) There exists an integer m ≥ 1, an open neighborhood U of x, and an exponent p ∈ (−1, 0) such that µ(m)(y) ≥ exp{−[ρ(y, S)]p}, ∀y ∈ U (3.3) where ρ(y, S) denotes the Euclidean distance between y and S. Then L is hypoelliptic on D. We note that hypotheses (ii) in Theorem 3.1 controls the rate at which the H¨ormander condition fails in a neighborhood of non-H¨ormander points. It is clear that some such condition is necessary, since the Kusuoka-Stroock result cited above shows that the operator 2 2 2 ∂ {−| |p} ∂ ∂ Lp = 2 + exp x1 2 + 2 ∂x1 ∂x2 ∂x3 is non-hypoelliptic if p ≤−1. Furthermore, the non-hypoellipticity of the operator L−1 shows that the allowed range (-1, 0) for p in (3.3) is optimal. Hypothesis (ii) has the following probabilistic interpretation in terms of the diffusion process ξt corresponding to L, defined in (1.2). It implies that if ξ starts at a non-H¨ormander point x on S, then it will escape from x at a fast enough rate to acquire a non-singular distribution. One can see that a hypothesis such as (i) is also necessary for the hypoellipticity of L (at least in the case where X0 = 0) by looking at the probabilistic picture. For if Xi is tangential to S for each i =1,...,n then, started from a point x ∈ S, ξ will d stay on S. Hence ξt will not have a density in R at positive times t, which implies that L cannot be hypoelliptic (as we remarked earlier hypoellipticity of L implies the existence of a density for ξt,t>0). The ideas underlying the proof of Theorem 3.1 are as follows. In Section 2 we introduced the Malliavin covariance matrix σ corresponding to the random variable ξt and showed that non-degeneracy of σ implies the existence of a density for ξt. Kusuoka and Stroock have proved that if the inverse moments of σ do not explode too quickly as t ↓ 0, then the parabolic operator L + ∂/∂t is hypoelliptic, they also showed how to deduce hypoellipticity of L from this. More specifically, let σ denote the matrix defined in (2.16) and (2.17) (recall that the process ξ in these formulas is defined by equation

(1.2)) and let ∆(t, ξ0) denote the determinant of σ. The following result is proved in [KS2]. Let D be an open set in Rd. Suppose that for every q ≥ 1 and every x in D, there exists a neighborhood V ⊆ D of x such that  lim t log sup ∆(t, y)−1 =0. (3.4) → q t 0+ y∈V

15 ∂ Then the differential operator L + is hypoelliptic on R × D. We will prove that ∂t (3.4) is satisfied under a (parabolic version of) the hypotheses of Theorem 3.1. There are two stages to this argument: (a) a local parameterization φ of the hypersurface S is introduced. Throughout this section, let ξ denote the diffusion process defined in (1.2). The quantity φ(ξt) measures the distance between ξt and S. We obtain probabilistic lower bounds on the Lq-norms of the paths φ(ξ). These lower bounds are asymptotically sharp as q →∞. (b) We study the way the lower bounds in (a) are degraded under hypothesis (ii) of Theorem 3.1. This allows us to obtain sharp lower bounds on ∆ from which we are able to verify condition (3.4). The proof of Theorem 3.1 uses basic stochastic analytic tools, e.g. Itˆo’s formula, Girsanov’s theorem, and the time-change theorem for stochastic integrals. In particu- lar, this work establishes a precise connection between the maximal class of hypoelliptic operators of the form (1.1) and the space-time scaling property of the Wiener process. This provides new insight into hypoellipticity that is not available through a classical analysis of the problem. Before proving Theorem 3.1, we state and prove a parabolic version of the theorem. (m) To this end, denote by F the matrix obtained by deleting the column X0 from the matrix E(m) defined on Page 14 and define λ(m) to be the smallest eigenvalue of the ∗ matrix [F (m)F (m) ]. Define K ≡{x ∈ D/ λ(m)(x) > 0} Suppose the set Kc is contained in a C2 hypersurface N of D. Assume that at every c point in K , at least one of the vector fields X1, ···,Xn is transversal to N. Assume further that for every x ∈ Kc, there exists an integer m ≥ 1, an open neighborhood U of x, and p ∈ (−1, 0) such that λ(m)(y) ≥ exp{−[ρ(y, N)]p} for all y ∈ U. Then the ∂ × operator L + ∂t is hypoelliptic on R D. The proof of Theorem 3.3 will require a definition and several preliminary results which we now state.

Definition. A non-negative random variable T is exponentially positive if there exist positive constants c1 and c2 (which we will refer to as the characteristics of T ) such that − P (T<)

Lemma 3.4. Let y :[0, 1] × Ω → Rd be an Itˆo process of the form

n dy(t)= ai(t) dWi(t)+b(t) dt, 0 ≤ t ≤ 1, (3.5) i=1

16 d where a1,...,an,b :[0, 1] × Ω → R are measurable adapted processes, all bounded a.s. by a deterministic constant c3. Let r>0 and define

τ ≡ inf{s>0:|ys − y0| = r}∧1. (3.6)

Then τ is an exponentially positive stopping time, and the characteristics of τ depend only on r, c3,n and d.

The next two lemmas are central to our argument (in order to facilitate the exposi- tion, we delay their proofs to the end of the section). The first yields sharp probabilistic lower bounds when applied to diffusion processes with at least one non-zero initial time diffusion coefficient.

Lemma 3.5. Let y :[0, 1] × Ω → Rd be the Itˆo process in (3.5). Suppose that τ ≤ T is an exponentially positive stopping time such that at least one diffusion coefficient ai satisfies the condition: a.s., |ai(s)|≥δ, for all 0 ≤ s ≤ τ, for some deterministic δ>0.

Then for every m ≥ 2, there exist positive constants c4, c5 and T0 such that for all m+1 t ∈ (0,T0) and  ∈ (0,c4 t ), the following holds

 ∧   t τ − m 1 P |y(u)| du <  < exp −c5 m+1 . (3.7) 0

The constants c4 and c5 can be chosen to depend only on m, c3, δ, and the character- istics of τ. The constant T0 depends only on the characteristics of τ.

The following result describes how the estimate (3.7) transforms under composition of the integrand with a function that vanishes at zero, at an appropriate exponential rate.

Lemma 3.6. Let τ be an exponentially positive stopping time. Suppose y is an Itˆo process as in Lemma 3.4. Suppose further that y and τ satisfy an estimate of the form − p ∈ − (3.7) for some m> p+1 , where p ( 1, 0). Then there exist positive constants T1, − 1 c6, c7 and q>1 such that for all t ∈ (0,T1) and all 

Furthermore, the constants T1, c6, c7 and q are completely determined by c3 in Lemma

3.4, c4, c5, and m in (3.7), p, and the characteristics of τ.

Finally, we will need the following two technical Lemmas (since the proofs are straightforward we omit them)

17 Lemma 3.7. For every q ≥ 1 and every bounded set V ⊂ Rd there exists a positive constant c8 such that for all t ∈ (0, 1) and x ∈ V ∞   − 2q − 1  1 ≤ 2dq ∆(t, x) 2q c8 1+ P Q(t, x) du : h ∈ R , |h| =1 . (3.10) i=1 0

Lemma 3.8. Suppose that the hypotheses of Theorem 3.3 are satisfied. Then for every x ∈ D there exists an integer m ≥ 1 such that exactly one of the following two conditions holds: (a) <λ(m)(x) > 0. (b) There exists an open neighborhood U ⊆ D of x,aC2 function φ : U → R, and an exponent p ∈ (−1, 0) such that

(i) φ(x)=0and ∇φ(x) · Xi(x) =0 , for at least one i =1,...,n. (ii) <λ(m)(y) ≥ exp(−|φ(y)|p), for all y ∈ U.

We now assume the hypotheses and notations of Theorem 3.3. Without loss of generality, the vector fields X0, ···,Xn may be supposed to be defined on the whole of Rd and to have compact support (this follows from a simple argument using a partition of unity and the fact that hypoellipticity is a local property). We will be assume this from now on).

Let x0 ∈ D and choose m so that either (a)or(b) of Lemma 3.8 hold for x0. Let t ∈ (0, 1) and suppose x lies in a fixed bounded neighborhood W of x0. Define  ≡ | x − |∨ x −  ∧ τ1 inf s>0: ξs x Zs I =1/2 1 . (3.11)

By Lemma 3.4, τ1 is an exponentially positive stopping time with characteristics inde- pendent of x ∈ W . Let Sd ≡{h ∈ Rd : |h| =1} denote the unit sphere in Rd. Suppose h ∈ Sd and α =1/18. Then   n t x x 2 ≤ ∩ ∩ c P du <  P (A E)+P (A E ), i=1 0 where   n t∧τ1 ≡ x x 2 A du<) i=1 0

18 and   n t∧τ1 n ≡ x x 2 E + i=1 0 j=1    n 1 2 du < α , u i 0 2 i j k u j,k=1 By a lemma of Kusuoka-Stroock and Norris (cf. e.g. [B], Lemma 6.5; cf[KS2],˙ Theorem

A.24), there exist positive constants c9 and c10 such that

c −α P (A ∩ E ) ≤ c9 exp(−c10 ).

d The constants c9 and c10 are independent of h ∈ S . Note that E ⊆ F ∩ G, where   n t∧τ1 ≡ x x 2 α F du <  i,j=1 0    n t∧τ n 1 1 G ≡ 2 du<α . u i 0 2 i j k u i=1 0 j,k=1 Thus   n t x x 2 ≤ − −α ∩ ∩ P du <  c9 exp( c10 )+P (A F G). (3.12) i=1 0 Applying a similar argument to P (A ∩ F ∩ G) gives

−α2 P (A ∩ F ∩ G) ≤ c11 exp(−c12 )+P (A ∩ F ∩ G ∩ H)(3.13) where  ∧  n t τ1 x x 2 α2 H := du< . i,j,k=1 0 It is easy to check that   n t∧τ1 ∩ ⊆ x x 2 r1 G H du <  i=1 0 for some r1 ∈ (0, 1) and sufficiently small >0. Thus A ∩ F ∩ G ∩ H ⊆

 ∧   t τ1 n  n x x 2 x x 2 r2 < Zu Xi(ξu),h> + du< 0 i=1 i,j=0

19 for some r2 ∈ (0, 1) and sufficiently small >0. Combining this with (3.12) and (3.13), one obtains   n t x x 2 ≤ − −α − −α2 P du< c9 exp( c10 )+c11 exp( c12 ) i=1 0    t∧τ1 n n x x 2 x x 2 r2 +P + du< . 0 i=1 i,j=0 (3.14) Iterating the argument used to derive (3.14) yields the following:

For each m ≥ 1, there exist positive constants c13 and c14 and exponents r3 and d r4 ∈ (0, 1), all independent of h ∈ S , such that for all t ∈ (0,T), x ∈ W , and

 ∈ (0,c14), one has   n t x x 2 P du <  i=1 0   N t∧τ1 − ≤ − r3 x x 2 r4 exp( c14 )+P du< . (3.15) j=1 0

(m) Here the vector fields K1,...,KN are the columns of the matrix function X . Apply- ing a straightforward compactness argument (cf[B],˙ Lemma 6.8) to (3.15), one obtains

−r3 P (Q(t, x) <) ≤ exp(−c15 )    N t∧τ1 − d x x 2 r4 | | +c16 sup P du < c17 : h =1 (3.16) j=1 0 for  ∈ (0,c18) and positive constants c15,c16,c17,c18. x Since (3.8) implies Z (u) − I≤ 1/2, for all 0 ≤ u ≤ τ1, it is easy to deduce from (3.16) that   t∧τ1 − −  ≤ − r3 d (m) x r4 P (Q(t, x) <) exp( c15 )+c16  P λ (ξu) du < c18 . (3.17) 0

We now consider each of the two cases (a) and (b) delineated in the conclusion of

Lemma 3.8. Suppose first that (a) holds at x0 for some m ≥ 1. Then by continuity of λ(m) there exist ρ>0 and δ>0 such that

λ(m)(y) ≥ δ (3.18)

20 d for all y ∈ Bρ(x0), where Bρ(x0) denotes the open ball in R with center x0 and radius x ρ. Let V ≡ Bρ/2(x0), assume x ∈ V , and let τ2 denote the first exit time of ξ from V . Then (3.17) and (3.18) imply

P (Q(t, x) <)   r4 − − c18e ≤ exp(−c  r3 )+c  dP τ ∧ τ ∧ t< (3.19) 15 16 1 2 δ

−c21r5 ≤ c19 exp(−c20 )(3.20)

r4 c18 ≡ ∧ provided t> δ , where r5 r3 r4 and c19, c20 and c21 are positive constants, independent of (t, x) ∈ (0,T)×V . Substituting (3.20) into (3.9) yields, for every q ≥ 1, the following inequality

  −2dq  r  −12q ≤ δt 4 ∆(t, x) 2q c8 + A(t) , c18 where ∞ ∞ r6 r6 A(t) ≡ 1+ c19 exp(−c20j ), ≤ 1+ c19 exp(−c20j ) < ∞, j=k j=1

−2dq/r4 where r6 ≡ c21r5/2dq and k is the integer part of (δt/c18) . We conclude that −1 ∆(t, x) 2q grows no faster than a power of t as t ↓ 0, uniformly with respect to x ∈ V . Hence (3.4) is satisfied.

We now turn to the case where (b) of Lemma 3.8 holds at the point x0. By the transversality condition b(i) we may choose ρ>0 small enough to ensure that Bρ(x0)U and such that 1 |∇φ(x).X (x)|≥ |∇φ(x ).X (x )| > 0 i 2 0 i 0 for some 1 ≤ i ≤ n and every x ∈ Bρ(x0). Let V ≡ Bρ/2(x0). Assume x ∈ V and let x τ3 denote the first exit time of ξ from Bρ/2(x). In view of Lemma 3.4 (b)(ii), (3.17) implies   t∧τ1∧τ3 − − ≤ − r3 d −| x|p r4 P (Q(t, x) <) exp( c15 )+c16 P exp( ηu ) du < c18 0 (3.21) x x where η (t) denotes the process φ(ξ (t)), t ≤ τ3. Applying Itˆo’s lemma to compute ηx(t) gives

n x x x x dη (t)= ∇φ(ξ (t)).Xi(ξ (t))dWi(t)+(L − c)φ(ξ (t)) dt. (3.22) i=1 Lemma 3.8 (b)(i), Lemma 3.1, and (3.22) imply that the process y ≡ ηx and the stopping time τ ≡ τ1 ∧ τ2 satisfy the hypotheses of Lemma 3.5. Hence (3.7) is satisfied

21 x for every m>1 with τ = τ1 ∧ τ3 and y = η . Thus, by Lemma 3.6 there exist positive  constants c6, c7, T1 and q > 1, all independent of x ∈ V , such that for all t ∈ (0,T1) 1 − and 

 ∧ ∧  t τ1 τ3 −| x|p {− | |q } P exp( ηu ) du< < exp c7 log  . (3.23) 0

Substituting this into (3.17) gives

−r3 −d r4 q P (Q(t, x) <) ≤ exp(−c15 )+c16 exp(−c7| log  | )(3.24)

1 − for t ∈ (0,T1) and 

1 − 2q −  1 ≤ q ∆(t, x) 2q c8 exp(2dqc6t )+c22 , 0

Note that the constants c6, c8 and c22 can all be chosen to be independent of x ∈ V . The right hand side of (3.25) explodes exponentially fast as t ↓ 0. However, since q > 1 we conclude that (3.4) holds also for this case, and the proof of Theorem 3.3 is complete. Proof of Theorem 3.1. Assume L satisfies the hypotheses of Theorem 3.1. We borrow yet another tech- nique from Kusuoka and Stroock [KS2]. The idea is to imbed the operator L in an another operator L˜ defined on a (d + 1)-dimensional domain and satisfying the hy- potheses of Theorem 3.3. This will prove the hypoellipticity of the parabolic operator ˜ ∂ L + ∂t, which in turn will imply hypoellipticity of L. Choose a smooth non-negative real-valued function ρ on (0, 1) such that both ρ(s) and its derivative ρ(s) are bounded away from zero for all s ∈ (0, 1). Define the operator L˜ on the domain D × (0, 1) by

1 ∂2 L˜ ≡ ρ(s) L + . 2 ∂s2

Then L˜ has the form n+1 1 L˜ ≡ X˜ 2 + X˜ . 2 i 0 i=1

1/2 where X˜0(x, s) ≡ ρ(s)X0(x), X˜i(x, s) ≡ ρ(s) Xi(x), 1 ≤ i ≤ n, and X˜n+1 (x, s) ≡ ∂ ∈ × ˜(m) ˜(m) ˜ ˜ ∂s,(x, s) D (0, 1). Define F , λ , H, and K similarly to before, using the

22  vector fields X˜i in place of the Xi’s. Since ρ and ρ are bounded away from zero on

(0, 1), it is easy to see that there are positive constants δm such that

(m) (m) λ˜ (x, s) ≥ δm µ (x), ∀(x, s) ∈ D × (0, 1). (3.26)

This implies K˜ c ⊆ Hc × (0, 1). Since Hc is contained in a C2 hypersurface S in D, then H˜ c is contained in the C2 hypersurface S × (0, 1) in D × (0, 1). By assumption, at c least one the vector fields X1, ···,Xn is transversal to S at every point of H . Hence c one of the vectors fields X˜1, ···, X˜n+1 is transversal to S × (0, 1) at every point of H˜ . Hypothesis (ii) of Theorem 3.1 together with (3.26) imply that λ˜(m) satisfies the corre- sponding hypothesis in Theorem 3.3 ( with respect to the set K˜ c). We conclude from ˜ ∂ × × Theorem 3.3 that the operator L + ∂t is hypoelliptic on R D (0, 1). Consequently L˜ is hypoelliptic on D × (0, 1), and this implies L is hypoelliptic on D. We conclude this section by proving Lemmas 3.5 and 3.6, which provided the key steps in the preceding argument. The proof of Lemma 3.5 requires two preliminary results which we first state and prove.

Proposition 3.9. Suppose m ≥ 2 and a>0. Let B :[0, ∞) × Ω → R be a one- dimensional Brownian motion. Then there exists a positive constant c23 such that   a √   m 1+ 2 − 2 P |B(u)| du< ≤ 2 exp −c23a m  m 0

−7 for every >0. The constant c23 may be chosen to be 2 . −7 ˙ Proof. The result is known to hold for m = 2, with c23 =2 (cf[IW], Lemma V.10.6, p. 399). For m>2 we apply H¨older’s inequality and use the result for m = 2 to obtain     a a √ m 2 1− 2 2 1+2 −2 P |B(u)| du <  ≤ P |B(u)| du0. This proves the proposition.

Proposition 3.10. Assume the notation and hypotheses of Lemma 3.5. Then for every m ≥ 2 and q>0, there exist positive constants c24 and c25 such that for all >0,     τ − m q q+ 2(q 1) P |yu| du < , τ ≥ 

The constants c24 and c25 depend only on c3 (the bound for the drift and diffusion coefficients of y), δ, m, and q. In particular, they are independent of y0 and the characteristics of τ.

23 Proof. Expressing (3.5) in components, it is sufficient to treat the case d =1,n≥ 1. In this case, the process y may be written in the form t yt = B(τ4(t)) + b(u) du, 0 ≤ t ≤ T, 0 where t 2 τ4(t) ≡ |a(u)| du, 0 ≤ t ≤ T, 0 and B :[0, ∞) × Ω → R is a one-dimensional adapted Brownian motion started at q B(0) = y0. By assumption, we may take |a(u)|≥δ>0 for 0 ≤ u ≤ τ. Hence τ ≥  2 q implies τ4(τ) ≥ δ  . Furthermore, the function τ4(t) is strictly increasing on (0,τ) and changes of the time variable yield τ τ4(τ) | |m ≥ −1 | −1 |m yu du c26 y(τ4 (s)) ds 0 0 τ4(τ) s −1 − b(τ (u)) m = c 1 B(s)+ 4 du ds, 26 | −1 |2 0 0 a(τ4 (u)) ≡ 2 where c26 nc3.Thus   τ m q P |yu| du<,τ≥  0  2 q −  δ  s b(τ 1(u)) m ≤ P B(s)+ 4 du ds < c , τ (τ) ≥ δ2q . (3.27) −1 2 26 4 0 0 a(τ4 (u)) We now define a (bounded) process h :[0, ∞) × Ω → R by  −1  b(τ4 (u)) − , u ∈ (0,τ (τ)) |a(τ 1(u))|2 4 h(u) ≡ 4  b(τ) ≥ |a(τ)|2 , u τ4(τ). and we denote by B the process s  B (s) ≡ B(s)+ h(u)du, 0 ≤ s ≤ τ4(T ). 0 By the Girsanov theorem, B is a Brownian motion on Ω with respect to the measure   τ4(T ) 1 τ4(T ) dP  ≡ exp − h(u) dB(u) − h2(u) du dP. 0 2 0

Denote by Ω the event   δ2q  m Ω ≡ |B (s)| ds < c26 , 0

24 and by G the Girsanov density   τ4(T ) 1 τ4(T ) G ≡ exp − h(u)dB(u) − h2(u) du . 0 2 0

We now apply H¨older’s inequality to (3.27) to obtain   τ  m q −2  P |yu| du < , τ ≥  ≤ P (Ω) ≤ E(G )P (Ω). 0

By Proposition 3.9, we have   √ −  q+ 2(q 1) P (Ω) ≤ 2 exp −2c25 m , (3.28)

− 2 2 ≡ 1 m 2(1+ m ) where c25 2 c23c26 δ . The boundedness of h and τ4(T ) imply the existence of a constant c27, depending only on the bounds of the foregoing quantities, such that   τ4(T ) τ4(T ) −2 2 G ≤ c27 exp 2 h(u) dB(u) − 2 h (u) du . (3.29) 0 0

The desired conclusion follows from (3.28) and (3.29), together with the fact that the exponential on the right hand side of (3.29) is a Girsanov density (note that it is obtained by replacing h by (−2h) in the relation defining G), and therefore has expectation equal to 1. Proof of Lemma 3.2. Note that for every q>0, we may write   t∧τ m P |yu| du <  ≤ P1 + P2, (3.30) 0 where   t∧τ m q P1 ≡ P |yu| du<,t∧ τ ≥  0 and q P2 ≡ P (t ∧ τ<).

By Proposition 3.10,   − q+ 2(q 1) P1 t> q, then −q P2 < exp(−c27 ), (3.32)

25 where c27 and T0 denote the characteristics of τ. Combining (3.30) - (3.32), we obtain

 ∧    t τ − m q+ 2(q 1) −q P |yu| du <  ≤ c24 exp −c25 m + exp(c27 ) 0

1 ∈ q for t (0,T0) and 0 <

Proof of Lemma 3.6.  ≥ − p ≡ −m Choose and fix m max 1+p , 2 and set q p(m+1) ;soq>1. Define a function ψ :[0, ∞) → [0, 1) by p − m ψ(z) ≡ exp( z ) z>0 0 z =0.

Note that (i) ψ is strictly increasing;

(ii) ψ is convex in an interval (0,c28), for some positive constant c28. Furthermore     t∧τ t∧τ p m P exp(−|yu| )du <  = P ψ(|yu| ) du< . 0 0

We break the proof of the lemma into two cases. Firstly, suppose that |y0|≥c29 1 ≡ 1 m ≡ { | − | 1 }∧ ( ( 2 c28) ). Let τ5 inf s>0: ys y0 = 2 c29 τ. Then there exists a positive constant c30, determined by m, p, and δ, such that   t∧τ p P exp(−|yu| )du <  ≤ P (t ∧ τ5 ≤ c30). (3.33) 0

Applying Lemma 3.5 to the right hand side of (3.33), we deduce the existence of positive constants c and c , such that if t ∈ (0,c ) and 0 << t , then 31 32 31 c30     t∧τ p c32 P exp(−|yu| )du< ≤ exp − . 0 

The constants c30,c31,c32 depend only on m, p, c3, and the characteristics of τ.Thus the conclusion of the lemma holds in this case.

On the other hand, suppose |y0|

m τ6 ≡ inf{s>0:|ys| = c28}∧τ.

26 Jensen’s inequality yields      t∧τ t∧τ6 | |m ≤ | |m ≤ ∧ −1  ≤ P ψ( yu )du <  P yu du (t τ6)ψ ∧ P1 + P2 0 0 t τ6 where    P1 ≡ P t ∧ τ6 ≤ c28 and     t∧τ | |m ≤ ∧ −1  ∧  P2 yu du (t τ6)ψ ∧ ,tτ6 > . 0 t τ6 c28

Note that P1 is of the same form as the probability on the right hand side of (4.7), and hence satisfies a similar estimate.

We now consider P2. An elementary argument shows that the convexity of ψ in the interval (0,c ) implies that the function θ(u) ≡ uψ−1  is increasing for u>  . 28 u c28 In particular, if t ∧ τ >  then 6 c28     −1  −1  (t ∧ τ6)ψ ≤ Tψ t ∧ τ6 T where T is any upper bound for t. This implies     t∧τ6 m −1  P2 ≤ P |yu| du ≤ Tψ . (3.34) 0 T

We now apply Lemma 3.5 to estimate the right hand side of (3.34). Thus

  − 1   m+1 P < exp −c Tψ−1 ≤ exp(−c | log |q), (3.35) 2 5 T 7   − 1 for all 0

4. A study of a class of degenerate functional stochastic differential equa- tions In the previous section we derived very sharp sufficient conditions for the existence of smooth densities for the class of Itˆo processes

n dξt = Xi(ξt)dwi + X0(ξt)dt. (4.1) i=1

27 The arguments relied heavily on the existence of an invertible stochastic flow on Rd determined by the equation. Furthermore, as is apparent from Theorem 2.7 and its proof, the Lie bracket condition of H¨ormander appears as a result of the interaction between Z, the derivative of the stochastic flow, and the vector fields Xi. The existence of a stochastic flow is the analytic counterpart of the probabilistic fact that the solution to the classical Itˆo equation (4.1) is a Markov process. In this section (which comes from [BM2]) we study a class of stochastic differential equations with far less analytic and probabilistic structure. In order to describe this class of equations, we introduce the following notation and assumptions. Denote by: η a continuous (deterministic) Rd-valued path defined on a time interval [−r, 0], for a fixed r>0. g, a bounded smooth map from Rd to Rd ⊗ Rn with bounded derivatives of all orders. C the space of continuous paths {x :[−r, ∞) → Rd}, equipped with the uniform norm on compact time sets. B : C → C a smooth bounded map, with bounded derivatives of all orders.

Assume also that B is non-anticipating, i.e. for all t ≥−r and x ∈ C, B(t, x) ≡ B(x)t depends only on {x(u): u ≤ t}.

Hd[0,s], the d-dimensional Cameron-Martin space (cf. Section 2) with paths and norm defined on the time interval [0,s]. Suppose that for all x ∈ C, the map

DB(x)/Hd : Hd → Hd and that

≡ || || ∞ α sup DB(x) Hd[0,s] < . s∈[0,t],x∈C

We consider the following stochastic functional differential equation

dxt = g(xt−r)dwt + B(t, x)dt, t ≥ 0(4.2)

xt = η(t), −r ≤ t ≤ 0 where, as before, w is a standard Wiener process in Rn. Owing to the past-state dependence of the coefficients in the equation, the solution x will not generally be a Markov process. In addition to their theoretical interest, equations of the form (4.2) have important applications. For example, physical systems influenced by noise are often modeled as diffusion processes (4.1). However, this choice of model assumes that the evolution of the system at any instant in time depends only on the position of the system at that instant, together with the noise input. Since physical systems are always subject to some degree of inertia, this assumption is somewhat unrealistic. It would seem

28 that equations of the form (4.2), where the coefficients are allowed to depend on past behaviour, more accurately model such physical systems. We establish the existence of smooth densities for x under hypotheses that allow degeneracy of the matrix-valued function g. The general scheme of Chapter 2 is used.

As before, we construct a Malliavin covariance σ for each xt, (t>0) and deduce the existence of a smooth density for xt by demonstrating the strong invertibility of σ. However the argument for doing this is, by necessity, quite different from that in Chapter 3. Because of its non-Markovian nature*, x does not generate a stochastic flow on Rd. Thus the analysis used in Chapter 3 to study the non-degeneracy of σ breaks down. We salvage the situation by proving and exploiting a property that seems peculiar to equations of the form (4.2); namely we show (cf. Lemma 4.5) that for all times a ≥ 0 and k ≥ 1,x satisfies estimates of the form   a+r 2 k P |xt| dt<ε = o(ε )asε → 0. a

This probabilistic lower bound on x is propagated from the initial condition η by the time delay r in equation 4.2 **. We use this property to deduce non-degeneracy of the Malliavin covariance matrices, under appropriate hypotheses on g and B. The main result of this section is

Theorem 4.1. Suppose there exist positive constants ρ, δ, an integer p ≥ 2 and a function φ : Rd → R such that (i) the Lebesgue measure of the set {s ∈ [−r, 0] : φ(η(s)) = 0} is zero. (ii) g satisfies p ∗ |φ(x)| I, |φ(x)| <ρ gg (x) ≥ (4.3) δI, |φ(x)|≥ρ.

(iii) φ is C2 with bounded first and second derivatives and there is a positive constant c such that ∇φ(x)≥c, whenever |φ(x)|≤ρ. (4.4) √ αt Suppose 0 <α 2te < 1. Then the random variable xt defined by (4.2) is absolutely continuous and has a C∞ density.

* In fact, the process x can be embedded as a Markov process with an infinite- dimensional state space. However, the generator of this process is a highly degenerate operator whose analysis is way beyond the reach of existing PDE techniques. ** There is no reason to believe that similar estimates hold for classical diffusion processes.

29 We note that condition (ii) above allows degeneracy of g on the hypersurface D ⊆ Rd where φ vanishes, while condition (ii)) implies that D does not contain any singular points.

Denote by F the map w → xt. In order to prove the theorem we must compute the Malliavin covariance matrix, defined formally by σ ≡ DF(w)DF(w)∗. This can be done as follows. Let H denote the Cameron-Martin space and Pm the sequence of piecewise linear projections defined in Chapter 2, we consider the natural restriction F˜ of the map F to H, i.e. the map h ∈ H → kt where k ≡ η on [−r, 0] and

  ks = g(ks−r)hs + B(s, k),s>0.

The matrix σ is then obtained as the limit in probability of the matrix sequence

∗ DF˜(Pm)DF˜(Pm) .

The result of this computation is

Lemma 4.2. The Malliavin covariance matrix σ for the random variable xt is t ∗ ∗ σ = Zug(xu−r)g(xu−r) Zudu (4.5) 0 where the process Z satisfies   t . ∗ ∗ Zs = I + Dg(xs−r)(., dws) Zs + DHd B(x) Z . (4.6) (s+r)∧t 0 s It is at this point that the non-Markovian nature of the problem manifests itself analytically; in contrast to the situation in Chapter 3, there is now no reason to suppose that the matrix process Zs is non-degenerate for small values of s. We show in the next Lemma that it is, however, non-degenerate for values of s close enough to t.

Lemma 4.3. Under the hypotheses of Theorem 4.1, Zs ∈ GL(d) for s ∈ [t − r, t] and ∞ || −1|| ≤ ∈ − there exists a deterministic constant c< such that Zs c for s [t r, t]. ∗ ∈ Proof. For notational convenience, write the operator DHd B(x) as K(x). For s [t − r, t] and any unit vector e ∈ Rd, (4.6) gives . Zse = e + K(x)s( Zuedu). (4.7) 0 Now . . K(x)s( Zuedu) ≤ K(x)( Zuedu) Hd[0,s] 0 0 .  s  ≤ || || | |2 1/2 α Zuedu Hd[0,s] = α Zue . (4.8) 0 0

30 Substituting this back into (4.7), we deduce  s  2 1/2 ||Zs|| ≤ 1+α ||Zu|| du 0 which implies s 2 2 ||Zs|| ≤ 2+2α ||Zu|| du. 0 Applying Gronwall’s inequality to this yields √ √ αs αt ||Zs|| ≤ 2e ≤ 2e , ∀s ≤ t.

Combining this with (4.8) gives . √ αt K(x)s( Zu du) ≤ 2te α<1. 0

It follows from (4.7) that Zs ∈ GL(d),   ∞ . j −1 − j Zs = ( 1) K(x)s( Zu du) j=0 0 and √ || −1|| ≤ − αt −1 Zs (1 α 2te ) .

Let λ denote the smallest eigenvalue of σ and set ξs ≡ φ(xs),s≥−r. It follows from (4.5) and Lemma 4.3 that t 1 p λ ≥ (|ξu−r| ∧ δ) du. (4.9) c (t−r)∧0

The following two results are used to produce lower bounds on the Malliavin covariance matrices corresponding to equation 4.2. These lower bounds comprise the essential part of the proof of Theorem 4.1. Lemma 4.4. Fix b>a>0. Let y :[0, ∞) × Ω → R be a measurable such that E sup |y(t)|p is finite for every positive integer p. Suppose that y a≤t≤b satisfies   b P y(t)2dt<ε = o(εk). a Then for every positive constant α,   b P {y(t)2 ∧ α} dt < ε = o(εk). a

31 Proof This result was originally proved as Lemma 3 in ([BM1], pp. 91–94). We give here a simplified proof of the result. Let A and B denote, respectively, the sets {s ∈ [a, b]:y(s)2 ≤ α} and {s ∈ [a, b]: y(s)2 >α}. Then     b P {y(t)2 ∧ α} dt < ε = P y(t)2 dt + αλ(B) <ε a A   ≤ P y(t)2 dt<ε, αλ(B) <ε A   b 2 2 = P y(t) dt < ε + y(t) dt, λ(B) < ε/α ≤ P1 + P2. a B Here   b √ 2 2 2 P1 := P y(t) dt < ε + y(t) dt, λ(B) < ε/α, sup y(t) ≤ 1/ ε a B a≤t≤b and √ 2 P2 := P ( sup y(t) > 1/ ε). a≤t≤b

Note that   b √ 2 k P1 ≤ P y(t) dt < ε + ε/α = o(ε ) a by hypothesis. Using the finite-moment hypothesis on y and applying Markov Chebyshev’s in- k equality to the probability P2, we obtain P2 = o(ε ). This completes the proof of the lemma.

Lemma 4.5 (Propagation lemma). Suppose that, for some −r

Then   b+r 2 k P ξs ds < ε = o(ε ). (4.11) a+r

Proof. Write g =(g1 ...gn), where gi, 1 ≤ i ≤ n, are column d-vectors. Computing

ξs = φ(xs),s>0, by Itˆo’s formula gives

n dξs = ∇φ(xs) · gi(xs−r) dwi(s)+G(s) ds, s > 0(4.12) i=1

32 where G is a bounded adapted real-valued process. We write   b+r 2 P ξs ds<ε = P1 + P2 a+r where   b+r n b+r ≡ 2 ∇ · 2 ≥ 1/18 P1 P ξs ds < ε, [ φ(xs) gi(xs−r)] ds ε a+r i=1 a+r and   b+r n b+r ≡ 2 ∇ · 2 1/18 P2 P ξs ds < ε, [ φ(,xs) gi(xs−r)] ds < ε . a+r i=1 a+r In view of 4.12, an inequality of Kusuoka and Stroock (cf[KS2],˙ Lemma 6.5) implies k that P1 = o(ε ) . Thus it is sufficient to show that P2 also has this property. Write n 2 a(s)= [∇φ(xs) · gi(xs−r)] . i=1 2 p Then by (4.3) and (4.4), it follows that a(s) ≥ c (|ξs−r| ∧ δ)if|ξs|≤ρ. Define

A ≡{s ∈ [a + r, b + r]:|ξs|≤ρ} and ≡{s ∈ [a + r, b + r]:|ξs| >ρ}.

Then   b+r b+r 2 1/18 P2 = P ξs ds<ε, a(s) ds<ε a+r a+r   b+r ≤ 2 2 | |p ∧ 1/18 P ξs ds<ε, c ( ξs−r δ) ds < ε a+r A   b+r b+r 2 2 | |p ∧ 1/18 2 | |p ∧ = P ξs ds < ε, c ( ξs−r δ) ds<ε + c ( ξs−r δ) ds a+r a+r B   b+r b+r ≤ 2 2 | |p ∧ 1/18 2 P ξs ds<ε, c ( ξs−r δ) ds < ε + c δλ(B) . a+r a+r b+r 2 2 However ξs ds < ε implies λ(B) < ε/ρ . Thus the preceding probability is a+r   b+r 2 p 1/18 2 2 ≤ P c (|ξs−r| ∧ δ) ds < ε + c δε/ρ a+r   b p  1/18 ≤ P (|ξs| ∧ δ) ds < c ε a

33 for some positive constant c, and for small enough ε. Assumption 4.10, Lemma 4.4, and Jensen’s inequality allow us to conclude that the probability on the right hand side k k of the above inequality is o(ε ). This implies that P2 = o(ε ), and the proof of Lemma (4.2) is complete. We are now in a position to complete the proof of Theorem 4.1. Recall that we need to verify the condition

−1 p (det σ) ∈∩p≥1L (γ). (4.13)

As before let λ denote the smallest eigenvalue of σ. Let n denote the integer such that t ∈ ((n − 1)r, nr]. First, suppose n = 1. Hypothesis (i) of Theorem 4.1 and (4.9) imply t−r t−r 1 p 1 p λ ≥ (|ξu| ∧ δ) du = (|ηu| ∧ δ) du > 0. c −r c −r

Since the second integral is deterministic, 4.13 trivially holds in this case. On the other hand, suppose n>1. Since [t−(n+1)r, t −nr] ⊂ [−r, 0], hypotheses (i) implies   t−nr 2 k P ξs ds<ε = o(ε ). t−(n+1)r

We now iterate Lemma 4.5 n times on this estimate to obtain   t 2 k P ξs ds < ε = o(ε ). t−r

Applying Lemma 3.4 yields   t 2 ∧ 2/p k P ξs δ ds < ε = o(ε )(4.14) t−r

Using Jensen’s inequality in (4.14) gives   t p k P |ξs| ∧ δds<ε = o(ε ). t−r Combining this with (4.9), we finally have

P (λ<)=o(k).

This implies (4.13) and completes the proof of the theorem.

34 5. Some open problems In this section we describe a few open problems that we hope will serve as a stimulus to further research. 1. It would be interesting to generalize Theorem 4.1 to the class of fully hereditary functional differential equations

dxt = A(t, x)dw + B(t, x)dt (5.1) where both A(t, x) and B(t.x) are allowed to depend on the whole history of the path

{xs :0≤ s ≤ t}. Kusuoka and Stroock [KS1] addressed this problem under a strong ellipticity assumption, i.e. they have shown that xt has a smooth density for all positive t if there exists δ>0 such that A(t, x)A(t, x)∗ ≥ δI,∀(t, x) ∈ [0, ∞) × C. As far as I am aware, Theorem 4.1 is the only result establishing the existence of densities for a general class of non-Markov Itˆo processes under hypotheses that allow degeneracy of the diffusion coefficient. An analogous result in the fully general setting (5.1) would therefore be of considerable interest and importance. 2. The non-hypoellipticity of the operators

2 2 2 ∂ {−| |p} ∂ ∂ ≤− 2 + exp x1 2 + 2 ,p 1 ∂x1 ∂x2 ∂x3 described in Section 2, contrasts strikingly with a result proved by Fedii in 1971. Fedii showed that the operator on R2

∂2 ∂2 + exp(−|x|p) ,p<0 ∂x2 ∂y2 is hypoelliptic for all negative values of p. It would be interesting to gain a deeper understanding, either by classical or probabilistic means, of the role that dimension is playing in these results. 3. The probabilistic methods employed above can also be used to study quasilinear partial differential equations. For example, let ψ denote a smooth non-linear and non- negative function defined on [0, ∞) × Rd × R and consider the initial-value problem  ∂u = Lu + ψ(t, x, u), (t, x) ∈ (0, ∞) × Rd ∂t . ((5.2) u(0,x)=0,x∈ Rd where L is defined in (1.1). In particular, A continuous weak solution to (5.2) is given by the probabilistic representation t x x u(t, x)=E ψ(s, ξt−s,u(s, ξt−s)) ds (5.3) 0

35 where ξx denotes process ξ in (1.2) with initial point x ∈ Rd. It should be possible to use this probabilistic representation together with the ideas in of Section 3 to show that, under suitable conditions, the solution u to (5.2) is smooth for t>0. We note, however, that the problems is considerably more difficult than for the linear case treated in Section 3, owing to the presence of u in the right hand side of (5.3). Quasilinear problems of this type have received a great deal of attention in recent years. Dynkin [D] and others have discovered a remarkable link between operators of the form L + ψ and a collection of stochastic processes called . 4. A related issue is the study of the quasilinear Dirichlet problem. Suppose D is a bounded regular open subset of Rd with a C2 boundary, f is a smooth non-negative function defined on D¯ × R, and g is a smooth non-negative function on ∂D  Lu = f(x, u),x∈ D (5.4) u(x)=g(x),x∈ ∂D

Under mild further conditions, a weak continuous solution of equation (5.4) exists, given implicitly by τ u(x)+E f(ξx(s),u(ξx(s))) ds = E[g(ξx(τ)] (5.5) 0

where τ = τ(x) is the first exit time of the diffusion ξx from D. Again, one wold hope to be able to study the regularity of the solution u to (5.4) via the representation (5.5) by using the methods of Section 3. The goal would be to establish smoothness of u on D¯ under conditions that allow degeneracy of the operator L (e.g. H¨ormander’s condition or even the superdegeneracy condition introduced in Theorem 3.1.).

References

[Be1] Bell, D. R., Some properties of measures induced by solutions of stochastic differ- ential equations, Ph. D. Thesis, Univ. Warwick, 1982.

[Be2] Bell, D. R., Degenerate Stochastic Differential Equations and Hypoellipticity, Pit- man Monographs and Surveys in Pure and , Vol. 79. Long- man, Essex, 1995.

[BM1] Bell, D. R., and Mohammed, S.-E. A., The Malliavin calculus and stochastic delay equations, J. Funct. Anal. 99, No1˙ (1991) 75–99.

[BM2] Bell, D. R., and Mohammed, S.-E. A., Smooth densities for degenerate stochastic delay equations with hereditary drift, Ann. Prob. 23, no. 4, 1875-1894, 1995.

36 [BM3] Bell, D. R. and Mohammed, S.-E. A., An extension of H¨ormander’s theorem for infinitely degenerate second-order operators, Duke Math J., 78, no. 3 (1995), 453- 475.

[Bi1] Bismut, J. M., Martingales, the Malliavin calculus and hypoellipticity under gen- eral H¨ormander’s conditions, Z. Wahrsch. Verw. Gebiete 56 (1981) 529-548.

[Bi2] Bismut, J. M., Large Deviations and the Malliavin calculus. Progress in Mathe- matics, vol. 45, Birkhauser, Boston, 1984.

[D] Dynkin, E. B., Superdiffusions and parabolic nonlinear differential equations, Ann. Prob., 20, no. 2 (1992) 942-962.

[F] Fedii, V. S., On a criterion for hypoellipticity, Math. USSR sb., 14 (1971) 15-45.

[G] L. Gross, on Hilbert space, J. Funct. Anal. No. 1 (1967) 123-181.

[H] H¨ormander, L˙, Hypoelliptic second order differential equations, Acta Math. 119:3– 4 (1967) 147-171.

[IW] Ikeda, N˙, and Watanabe, S˙, Stochastic Differential Equations and Diffusion Pro- cesses, Second Edition, North-Holland-Kodansha, 1989.

[KS1] Kusuoka, S˙, and Stroock, D˙, Applications of the Malliavin calculus, I, Taniguchi SymposSA˙ Katata (1982), 271–306.

[KS2] Kusuoka, S˙, and Stroock, D˙, Applications of the Malliavin calculus, Part II, Jour- nal of Faculty of Science, University of Tokyo, Sec1A,˙ Vol32,˙ No1˙ (1985) 1-76.

[Ma1] Malliavin, P˙, Stochastic calculus of variations and hypoelliptic operators, Proceed- ings of the International Conference on Stochastic Differential Equations, Kyoto, Kinokuniya, 1976, 195-263.

k [Ma2] Malliavin, P˙, C -hypoellipticity with degeneracy, part II, Stochastic Analysis,A. Friedman and M. Pinsky (eds), 1978, 327-340.

[Mi] Michel, D., R´egularit´e des lois conditionnelles en th´eorie du filtrage non-lineaire et calcul des variations stochastique. J. Funct. Anal. 41 No. 1 (1981) 1-36.

[NP] Nualart, D. and Pardoux, E., Stochstic calculus with anticipating integrands. Probab. Theory Rel. Fields 78 (1988) 535-581.

[O] D. Ocone, Malliavin’s calculus and stochastic integral representation of functionals of diffusion processes. Stochastics 12 (1984) no. 3-4, 161-185.

37