
Appendix: A short presentation of stochastic calculus (by P.A. Meyer)

This appendix was written originally as an introduction to the theory for non-probabilists. It may be read independently of the remainder of the book, or as a commentary on Chapter I to provide motivations and examples. In contradistinction to the main text, it isn't restricted to continuous semimartingales. Stochastic differential geometry uses only continuous semimartingales, since the jumps of processes taking values in a manifold cannot be described in local coordinates. On the other hand, for real (or vector valued) semimartingales, the discontinuous case arises almost as often as the continuous one, and it would be a pity not to mention it at all in this book. We are going to present without proofs the main facts about general semimartingales, with detailed comments and references to the second volume of Dellacherie-Meyer, Probability and Potentials (hereafter quoted as PP). More recent references are mentioned at the end.

There are at least two definitions of semimartingales. The older one follows the historical development of the theory. Square integrable martingales are introduced as models for a "noise", superimposed on a "signal" which is a process of integrable total variation. Then stochastic integrals are studied, general semimartingales and their stochastic integrals are defined by localization, and Ito's formula is proved. Finally, it becomes clear at the very end that the class of processes considered is invariant by change of law, that is, semimartingales remain semimartingales if the basic law IP is replaced by an equivalent one (i.e., a law which has the same sets of measure 0), though their signal-plus-noise decomposition is altered by this change. The second method starts with the stochastic integral, taking care to have everything depend only on the equivalence class of IP, and gets down to the concrete decompositions only at the end. The equivalence of these two points of view is far from trivial.
It was proved by Dellacherie-Mokobodzki after much pioneering work by Metivier-Pellaumail, and was also independently discovered by Bichteler (who started from a vector integral approach). We shall follow the second path, since it is more convenient for a concise presentation.

Semimartingales as integrators

1 A stochastic process is a function X(t, ω) (implicitly assumed to be measurable) defined on a product I × Ω, where (Ω, F, IP) is a complete probability space and I is an interval of the line, the parameter t ∈ I representing time. It will simplify things a little to assume that I is closed, and we take I = [0, 1] for definiteness. The usual notation for a stochastic process is (X_t), or simply X. Our aim is to define the stochastic integral ∫ f dX = ∫_0^1 f_t(ω) dX_t(ω) of a process f = (f_t) (the integrand) with respect to a process X = (X_t) (the integrator), the result of this integration being a random variable defined a.e. For simplicity, we shall assume from the start that our integrator is right continuous in its time variable t, and has the value 0 at time 0. We shall say that the integrand is a simple process if there exists a dyadic subdivision t_i = i2^{-k} of [0, 1] such that on each interval ]t_i, t_{i+1}] f depends only on ω : f_t(ω) = f_i(ω). Then the "stochastic integral" has an obvious definition

∫ f dX = Σ_i f_i (X_{t_{i+1}} − X_{t_i})     (1)
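To make definition (1) concrete, here is a minimal numerical sketch in Python (not part of Meyer's text; the grid size k, the random seed, and the choice of a brownian path as integrator are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Dyadic grid t_i = i * 2**-k on [0, 1]; k = 8 is an arbitrary choice.
k = 8
n = 2**k

# Integrator: a brownian path sampled on the grid, with X_0 = 0.
dX = rng.normal(0.0, np.sqrt(1.0 / n), size=n)
X = np.concatenate(([0.0], np.cumsum(dX)))

# Simple integrand: a constant value f_i on each interval ]t_i, t_{i+1}].
f = rng.uniform(-1.0, 1.0, size=n)

# Elementary stochastic integral (1): sum of f_i times the increments of X.
integral = np.sum(f * np.diff(X))
print(integral)
```

The point of the definition is only that (1) is forced once f is piecewise constant; the whole theory concerns extending it beyond simple processes.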

We start with a linear space of uniformly bounded simple processes called S. Processes being functions of two variables (t, ω), S generates a σ-field F(S) on [0, 1] × Ω. Let us make the following definition

The process (X_t) is called an integrator if there is a map (the "stochastic integral") extending to all uniformly bounded F(S)-measurable processes the elementary integral (1), and satisfying the dominated convergence theorem in probability. More explicitly, if a uniformly bounded sequence (f^n) converges pointwise to 0, then the random variables ∫ f^n dX converge to 0 in probability. Convergence in probability has the advantage of not being altered if IP is replaced by an equivalent measure. It has the disadvantage that the corresponding topology isn't locally convex, and the functional analysis involved is rather unusual and delicate.

Let us look for integrators in the deterministic case, Ω consisting of just one point, so that a process is simply a function of time. Take for S the space of all usual step functions on [0, 1]. Then our assumption means that the "stochastic" integral is an ordinary integral, and the integrator X(s) must be a function of bounded variation. As a first attempt to generalize this, we may take some non-degenerate Ω, S being the space of all uniformly bounded simple processes. Then one can show (though the proof isn't quite trivial) that for a.e. ω the sample function X(., ω) is a function of bounded variation, and the stochastic integral is just a pathwise Stieltjes integral. This extension of the usual integral is useful, but of course it isn't particularly exciting.

2 The situation changes radically, however, if a filtration (F_t) is added to the above picture, the integrator (X_t) being adapted and the class S being restricted to that of adapted simple processes (uniformly bounded). For the convenience of the reader, let us recall that (F_t) is an increasing and right-continuous family of sub-σ-fields of F, each one containing all sets of measure 0 in Ω, and that adaptation of a process (X_t) means that X_t is F_t-measurable for each t. Then it turns out that there usually exist (unless the filtration is nearly trivial) integrators (X_t) which are not of bounded variation. The best known of these integrators is brownian motion, for which the stochastic integration theory is the celebrated Ito integral.

Integrators w.r. to S are then called semimartingales, and the σ-field generated by S is the predictable (or previsible) σ-field. Functions of (t, ω) measurable with respect to this σ-field are called predictable processes. All left continuous adapted processes are predictable, but this isn't generally the case for right continuous ones, which are thus "unpredictable". This seems to be just a pun, but in reality it has a deep significance in nature, being a mathematical translation of the impossibility to predict exactly phenomena like radioactive disintegrations (or the exact time your children will allow you to use the phone). A simple, but very useful consequence of the dominated convergence theorem is the following : if f is a uniformly bounded left continuous adapted process, the stochastic integral ∫ f dX is well defined, and is the limit in probability of the natural Riemann sums over dyadic subdivisions (t_i = i2^{-k})

∫_0^1 f dX = lim_k Σ_i f_{t_i} (X_{t_{i+1}} − X_{t_i})     (2)
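The convergence in (2) can be watched numerically. A hedged sketch (path, seed and grid levels are arbitrary choices), taking f = X itself for a simulated brownian path, for which Ito's theory gives ∫_0^1 X_s dX_s = (X_1² − 1)/2:

```python
import numpy as np

rng = np.random.default_rng(1)

def riemann_sum(X, level):
    """Left-endpoint Riemann sum of X dX over the dyadic grid t_i = i 2**-level."""
    n = 2**level
    step = (len(X) - 1) // n
    Xg = X[::step]                       # path sampled on the coarse grid
    return np.sum(Xg[:-1] * np.diff(Xg))

# One fine brownian path on [0, 1]; coarser sums are read off the same path.
N = 2**16
dX = rng.normal(0.0, np.sqrt(1.0 / N), size=N)
X = np.concatenate(([0.0], np.cumsum(dX)))

# Limit predicted by the theory for this integrand.
target = (X[-1]**2 - 1.0) / 2.0
for level in (4, 8, 12, 16):
    print(level, riemann_sum(X, level) - target)
```

The printed differences shrink as the subdivision is refined, in accordance with (2); note that the left endpoints matter, as n° 6 below makes clear.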

3 Let us pause for comments. There are two kinds of integrators for which integration theory is specially easy. The first one is that of processes whose sample paths are functions of bounded variation. The second one is that of square integrable martingales (M_t) (see n° 9 below), for which one can easily extend the classical method Ito used in the case of brownian motion. Thus all square integrable martingales are integrators. We shall see later that all martingales are in fact integrators, but this is a much more difficult result.

Secondly, since convergence in probability depends only on the equivalence class of the law IP, the same is true for the semimartingale notion itself, and for the value of the stochastic integral. This is extremely important for statistical problems in the theory of processes, since in practice the probability law isn't known a priori, but must be deduced from observation.

A less important point concerns the exceptional sets in the dominated convergence theorem : in classical measure theory, we would allow for convergence except on a set of measure 0. The same can be done here, but the exceptional set to be considered is a subset of [0, 1] × Ω while the law IP is on Ω. So we define an evanescent set A ⊂ [0, 1] × Ω as a set whose projection on Ω has measure 0, and we can add the provision "except on an evanescent set" to the statement of the dominated convergence theorem. Also, the usual dominated convergence theorem would allow the sequence to be dominated by an integrable r.v., not just a uniformly bounded one. Extensions of this kind do exist (PP VIII.75), but do not interest us now, since they would refer to a specific semimartingale, while we are concerned with statements valid for all semimartingales. The form of the dominated convergence theorem we have given will suffice us, except for a slight relaxation, to come a little later, of the uniform boundedness requirement.
Finally, let us give a reference to PP for the equivalence of the "integrator" definition of semimartingales and the traditional one. It appears at VIII.79-85, and in fact it assumes less than we did here. Namely, it suffices to ask that the set of all integrals ∫ f dX with f simple and bounded by 1 in absolute value be bounded in probability. Then it is shown that one can replace the law IP by an equivalent one so that X becomes a quasimartingale - a reasonable, easy to handle type of process (see n° 10 below) which can be readily decomposed into a process with integrable variation plus a martingale. An easy consequence of this decomposition is the fact that semimartingales aren't only right continuous processes : they also have left limits (in the technical jargon, they are cadlag processes). The left limit of X at time t > 0 is denoted by X_{t-}, and the jump X_t − X_{t-} is denoted by ΔX_t.

Stopping times and localization

4 The simplest bounded predictable processes are indicator functions of intervals - possibly random ones. The product of a bounded predictable process (f_t) by the indicator function of a (deterministic) interval ]0, s] is again bounded predictable. Let us denote by Y_s the corresponding stochastic integral ∫_0^s f dX (this is the stochastic analogue of an indefinite integral in classical calculus). It turns out that the process (Y_s) is a semimartingale. Note that our language is slightly incorrect : Y_s isn't really a r.v., but rather a class of r.v.'s, and our statement means that one can choose a representative in each class so that the resulting process is right continuous, and is a semimartingale. When thinking of the indefinite stochastic integral as a process, it is convenient to use the notation Y = f·X (think of the notation f·μ in integration theory for the measure with density f w.r.t. μ).
Then comes a very simple and basic property : if Y = f·X, and g is bounded predictable, then g·Y = (gf)·X, a kind of "associativity" for stochastic integrals.

Random intervals come next. For which kind of random variable T with values in [0, 1] can one define the stochastic integral ∫ 1_{]0,T]} dX ? The indicator function of the "random interval" ]0, T] is left continuous, and for such a process "predictable" means the same thing as "adapted". On the other hand adaptation here means that the set {t ≤ T} belongs to F_t for each t ∈ ]0, 1], and this property is the classical definition of a stopping time. The stochastic integral of X over ]0, T] is X_T, the value of the process X at the random time T (X_T(ω) = X_{T(ω)}(ω)), and the indefinite integral 1_{]0,T]}·X is the process X stopped at T, defined by X_s^T = X_{T∧s}.

Stopping times lead to an essential idea in semimartingale theory, that of localization. It turns out that if T is a stopping time, the value of the indefinite integral f·X on the random interval ]0, T] depends only on the values of f and X on this interval. Let now f be a predictable process which is bounded, meaning that for every (or a.e.) ω the sample function f.(ω) is bounded, no longer uniformly in ω. Then one can prove there exists an increasing sequence of stopping times T_n ≤ 1 such that each one of the processes f 1_{]0,T_n]} is uniformly bounded, and such that IP{T_n < 1} tends to 0. The first assumption allows the definition of ∫_0^{T_n} f dX for each n, and the second one implies that these r.v.'s do not depend on n for n sufficiently large. We denote by ∫_0^1 f dX their limit, which can also be shown not to depend on the sequence (T_n). This gives a simple, but extremely convenient enlargement of the domain of stochastic integration. As an example, consider an adapted process (f_t) which is right continuous with (finite) left limits (an adapted cadlag process). Then the process (f_{t-}) of its left limits is left continuous and adapted, hence predictable.
It is also bounded, and in this concrete case the sequence T_n can be made explicit : set indeed

T_n(ω) = inf{t : |f_t(ω)| > n}   (inf ∅ = 1) ;

then T_n is easily shown to be a stopping time, and we have T_n(ω) = 1 for n large. On the other hand the process (f_t) is bounded in absolute value by n on the open interval [0, T_n[ (a jump may occur at time T_n), and therefore the process (f_{t-}) is bounded by n on ]0, T_n]. In particular, the integral ∫ f_{t-} dX_t is meaningful without uniform boundedness condition. It is easy to see that it can be approximated in probability by Riemann sums, as we did previously.

Sample path properties

5 We are going to state an improved version of the dominated convergence theorem, and to deduce from it some useful sample path properties for stochastic integrals. To this end we need a new notation : given a real valued process (f_t), we set f* = sup_t |f_t|. Then we have (PP VIII.14)

THEOREM. Let (f^n) be a sequence of predictable processes, all bounded in absolute value by some finite valued r.v., and converging pointwise to 0. Then (f^n·X)* converges to 0 in probability.

Therefore, the indefinite integrals of these processes tend to 0 in a rather strong sense, implying in particular that there is a subsequence converging a.s. uniformly to 0. It is then easy to prove (PP VIII.3 d))

THEOREM. The jumps of the indefinite stochastic integral Y = f·X are given by ΔY_t = f_t ΔX_t. In particular, if X is continuous so is Y.

The quadratic variation of a semimartingale

6 Let us compute the difference X_t² − 2 ∫_0^t X_{s-} dX_s. We approximate the stochastic integral by its Riemann sums Σ_i X_{t_i}(X_{t_{i+1}} − X_{t_i}) relative to fine enough subdivisions (t_i), and write the first term as Σ_i (X²_{t_{i+1}} − X²_{t_i}). Since there are only countably many discontinuities, we may use subdivisions such that X_{t_i-} = X_{t_i} a.s., and then it appears that the difference is equal to the limit of Σ_i (X_{t_{i+1}} − X_{t_i})². It is therefore positive, and increases with t. A right continuous version of this increasing process is denoted by ([X, X]_t), and called the quadratic variation or the square bracket of the semimartingale X (PP VIII.16). This notation is justified by the fact that the square bracket can be "polarized" into a bilinear functional. Given two semimartingales X and Y, we have the following formula

XS"i = it X s- dYs + it Ys- dXs + [x, Y]t (3) with 2[X, Y] = ([X + Y, X + Y] - [X,X] - [Y, YJ), an adapted process of finite variation. Remember that we assumed for simplicity that all our integrators had the value 0 at time o. Otherwise, the usual term -XoYo would appear on the left side, or the square bracket would be given the initial value XoYo . Since stochastic integrals and adapted processes of finite variation are semimartingales, we see from (3) that a product of two semimartingales is a semimartingale, a simple and important result. The integration by parts formula is easier to remember in differential notation (4) Another useful result about the square bracket is the computation of its jumps (5) with the obvious consequence that the bracket of two semimartingales, one of which is continuous, is itself continuous. The formula of integration by parts deserves some comments. Since ordinary ( deterministic) functions of bounded variation are semimartingales, it contains the classical integration by parts formula (which, however, isn't very well known in the discontinuous case, and isn't often written in the symmetric form (4)). On the other hand, in the deterministic case the bracket is reduced to the sum of its jumps, it never has a continuous part. This continuous part - denoted [X, y]c - is of purely probabilistic origin. For instance, if X is brownian motion, the bracket [X, X] t is equal to t, and its non-vanishing comes from the fluctuations of the brownian path, which make it nowhere differentiable. A semimartingale X with vanishing bracket [X, X] is continuous and its paths are of bounded variation. We use the bracket notation here in spite of its shortcomings. For instance, it is unacceptable to physicists, who understand [X, Y] as a commutator. In similar problems they write dXs dYs for d[ X, Y] s , which is excellent, but doesn't provide a good notation for the process fo• dXs dYs . 
We must also mention the existence of a second "bracket" to come later, the angle bracket (PP VII.39) given formally by

⟨X, Y⟩_t = ∫_0^t IE[dX_s dY_s | F_s] .

Since we are doing probability, not physics, we will use the square bracket notation without worrying. The bracket of two stochastic integrals is computed as follows (PP VIII.22)

[f·X, g·Y]_t = ∫_0^t f_s g_s d[X, Y]_s     (Stieltjes integral).

This formula too is more pleasant in the shorthand of differential notation : d[f·X, g·Y] = (fg)·d[X, Y].

Ito's formula

7 Ito's formula is the cornerstone of stochastic calculus. However, it isn't in itself deeper than the integration by parts formula, of which it is a relatively easy extension (PP VIII.27). Let (X_t) be a semimartingale with values in IR^d (i.e. each component of (X_t) is a real semimartingale), and let f be a twice continuously differentiable function on IR^d. Then the real valued process (f∘X_t) is a semimartingale. The whole subject of stochastic differential geometry takes its roots in this C² stability property, and therefore our only point here is to mention that it extends to non-continuous semimartingales. "Ito's formula" (which in fact is essentially due to Kunita-Watanabe in this general form) gives an explicit representation of the increment f(X_t) − f(X_0) as a sum of three terms (7.1) (7.2) (7.3), which we comment upon and compare with the continuous case. We first have a stochastic integral term (7.1) which is exactly the same as in the continuous case except for an important point : we had to write X_{s-} instead of X_s in the integrand to make it previsible! The second term doesn't exist in the continuous case, since it arises from the jumps. It comes as an absolutely convergent sum over all jumps

Σ_{s≤t} ( f(X_s) − f(X_{s-}) − Σ_i D_i f(X_{s-}) ΔX_s^i )     (7.2)

(we recall that Σ_{s≤t} ||ΔX_s||² is convergent over any finite interval of time). Up to now the formula would be true for a deterministic curve of bounded variation and a function f of class C¹, but now comes the final term involving second derivatives

(1/2) Σ_{i,j} ∫_0^t D_{ij} f(X_{s-}) d[X^i, X^j]^c_s     (7.3)

which is exactly the same as in the continuous case, except for notations : we have used the continuous part of the square brackets instead of the bracket itself (in the continuous case, the result is obviously the same). There is no serious need of using X_{s-} here : this is a Stieltjes integral against a process of bounded variation, so that we get the same result writing X_s instead. The reason was simply one of homogeneity with the preceding formulas.

Let us note that there are more general functions preserving the semimartingale property than the C² functions : convex functions will do (and also of course differences of convex functions), see PP VII.33 and VIII.26. This applies in particular to the absolute value of a real valued semimartingale, and gives rise to the useful theory of local times (PP VIII.29).
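For a continuous path the jump term (7.2) vanishes and Ito's formula can be checked numerically. A minimal sketch (seed, grid and the test function f(x) = x³ are illustrative choices), using that [X, X]_t = t for brownian motion:

```python
import numpy as np

rng = np.random.default_rng(3)

# Brownian path on [0, 1] through its increments.
N = 2**16
dt = 1.0 / N
dX = rng.normal(0.0, np.sqrt(dt), size=N)
X = np.concatenate(([0.0], np.cumsum(dX)))

# Ito's formula for f(x) = x**3, continuous case (no term (7.2)):
# f(X_1) - f(X_0) = int 3 X_s**2 dX_s + (1/2) int 6 X_s d[X, X]_s ,
# with d[X, X]_s = ds here.
lhs = X[-1]**3
rhs = np.sum(3 * X[:-1]**2 * dX) + np.sum(3 * X[:-1] * dt)
print(lhs, rhs)   # the two sides agree up to discretization error
```

Omitting the second-order term (the naive C¹ chain rule) would leave a gap of order ∫ 3 X_s ds, which does not vanish as the grid is refined.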

Stochastic differential equations

8 We end our list of results which depend only on the equivalence class of the law IP by a mention of stochastic differential equations, a subject on which we cannot give many details. Even in the deterministic case, it isn't usual to solve differential equations driven by a discontinuous function of bounded variation! It turns out that it is possible to solve stochastic differential equations of the form

dY_t = f(t, Y_{t-}) dX_t ,   Y_0 prescribed     (8.1)

where the driving process (X_t) is a vector semimartingale as well as the unknown process (Y_t), and f is a matrix of the proper dimensions which satisfies the same Lipschitz conditions as in the theory of ordinary differential equations. Following the path first taken by Ito, one can extend to stochastic differential equations all the fundamental theorems of the deterministic theory : existence and uniqueness (Kazamaki, C. Doleans-Dade, Ph. Protter), and stability (M. Emery, Ph. Protter). Recently, differentiability with respect to initial conditions was developed by P. Malliavin into an extremely powerful tool belonging to his stochastic calculus of variations (a slightly misleading name).

Martingale theory

9 Warning : we are going to work now on the whole half-line IR_+ instead of a finite interval. In this context a semimartingale is understood as a process which satisfies the preceding discussion on any finite interval of IR_+.

We shall return to the beginning, and really start using the probability law IP. Let us first recall the meaning of the filtration (F_t) : all F_t-measurable random variables are considered to be known by time t. Functions which aren't known must be estimated or predicted, and the usual estimator of a r.v. f at time t is its conditional expectation IE[f | F_t], which is meaningful provided f is integrable, but whose meaning is particularly clear when f is square-integrable, since then it is simply the projection of f on the space L²(F_t) of all known r.v.'s. Conditional expectations may be used to define the average trend of a stochastic process (X_t) with integrable r.v.'s. It is natural to consider that X has a downward (upward) trend if for every pair s < t, the conditional expectation IE[X_t | F_s] is below (above) the present value X_s. Processes which have no trend at all are pure fluctuations, and in probability theory they are called martingales.
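Returning for a moment to equation (8.1) of n° 8 : the simplest numerical scheme replaces dX by finite increments. A hedged sketch (the helper `euler`, the seed and the grid are illustrative assumptions, not from the text), applied to the linear equation dY = Y_- dX, whose solution is the stochastic (Doleans-Dade) exponential exp(X_t − [X, X]_t / 2):

```python
import numpy as np

rng = np.random.default_rng(4)

def euler(f, y0, dX):
    """Euler scheme for dY = f(Y_-) dX: advance Y by f(Y) times each
    increment of the driving semimartingale (illustrative helper)."""
    y = y0
    for dx in dX:
        y = y + f(y) * dx
    return y

# Driving process: increments of a brownian path on a grid of [0, 1].
N = 4096
dX = rng.normal(0.0, np.sqrt(1.0 / N), size=N)

# Linear equation dY = Y_- dX with Y_0 = 1.
Y1 = euler(lambda y: y, 1.0, dX)
X1, QV1 = np.sum(dX), np.sum(dX**2)
print(Y1, np.exp(X1 - 0.5 * QV1))   # scheme vs. stochastic exponential
```

Note the appearance of the quadratic variation in the exponent: the discrete scheme computes the product of the (1 + ΔX_i), not exp(X_1).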
Curiously enough, processes with a downward trend are called supermartingales (this name comes from the superharmonic functions of potential theory, but we don't pause to give a detailed explanation), and of course processes with an upward trend are submartingales. Let us insist that being a martingale isn't a property of the individual random curves X.(ω) (though it has some general implications on the behaviour of these random curves, as we shall see below). It is a statistical property, depending on the probability law IP, and also on the filtration which describes at each time the observer's knowledge. Real martingales are used to model many kinds of random curves, some of which may be trajectories of natural objects (like a particle in thermal motion in a fluid), and others being abstract curves like a graph on a sheet of paper or a computer screen, plotting the fortune of a gambler in a fair game, or the electrical fluctuations of an amplifier.

One of the important examples of martingales are the processes X_t = IE[X | F_t] predicting at time t a fixed (integrable) random variable X. In fact, the definition of martingales X_s = IE[X_t | F_s] for s < t shows that every martingale can be reduced to this example on any finite interval, so we aren't losing much generality if we restrict the discussion to these "prediction martingales". This will allow some simplifications in the statement of results. Note that each X_t is defined up to a.s. equality, and it is very satisfactory that a choice can be made so that the resulting process is right continuous with left limits, i.e. has no discontinuities worse than jumps (PP VI.2-4). Why jumps at all? Since we are predicting something which remains fixed, it is clear that the jumps come from the filtration, that is, from discontinuities in the information we are using to predict X.
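Prediction martingales are easy to simulate. A small sketch (the coin-flip setting, sample sizes and seed are illustrative assumptions): take X to be the total of 64 fair ±1 coin flips, with F_t revealing the first t flips, so that X_t = IE[X | F_t] is the sum of the flips seen so far (the unseen flips have conditional mean 0):

```python
import numpy as np

rng = np.random.default_rng(5)

# 10000 independent scenarios of 64 fair +/-1 coin flips.
flips = rng.choice([-1, 1], size=(10000, 64))
X = flips.sum(axis=1)              # the r.v. being predicted
Xt = np.cumsum(flips, axis=1)      # one prediction-martingale path per row

print(Xt[:, 9].mean(), Xt[:, 39].mean())  # no trend: means stay near 0
print(bool((Xt[:, -1] == X).all()))       # the prediction reaches X at the end
```

The paths here jump at every step: as the text says, the jumps come from the discontinuous arrival of information, not from X itself.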
Once a good version has been chosen, the martingale property X_s = IE[X_t | F_s] isn't only valid at fixed times, but also at stopping times ("Doob's stopping theorem", PP VI.10), a basic property since nature may provide us with a natural time scale, but never with a natural time origin, and we are always measuring time onward from some random origin. It is also satisfactory that the prediction X_t, under very mild restrictions, a.s. converges to X ("martingale convergence theorem", PP VI.6), and that the supremum X* = sup_t |X_t| of the whole prediction process can be effectively controlled if X isn't too large ("Doob's inequalities", PP VI.1). On the other hand, randomness is there, and the paths of (X_t) are quite irregular. Except in trivial cases they won't converge to X either from below or from above, but rather by reducing slowly their fluctuations around X. Above all, like all semimartingales they are quite often of infinite variation (a continuous martingale path cannot be of finite variation on any interval, unless of course it is constant on it), but have a finite quadratic variation [X, X]_t. In the case of martingales, this quadratic variation process is of special importance, because it controls the size of the martingale itself. The "Burkholder-Davis-Gundy", or BDG, inequalities (PP VII.92) imply in particular that the norms in L^p of the random variables X* and √([X, X]_∞) are equivalent for 1 ≤ p < ∞. These inequalities are deep, and are at the root of many applications of martingale theory to analysis.

After this loose sketch of martingale theory, let us return to stochastic integration, and therefore restrict ourselves to martingales X such that X_0 = 0. A general idea to keep in mind is that martingales form a subclass of semimartingales which is stable under stochastic integration. However, it needs to be made precise.
The correct statement requires the use of the notation X* introduced above, and of the class H^p (1 ≤ p < ∞) of martingales X such that X* ∈ L^p. Then the stochastic integral f·X of a bounded predictable process f with respect to a martingale X ∈ H^p still is a martingale of the class H^p. This isn't an easy result, especially for p = 1 : it requires the use of the BDG inequalities mentioned above, to allow H^p to be defined via the square bracket, which behaves very simply under stochastic integration.

To avoid all integrability restrictions, we introduce the class of local martingales, first defined by Ito and S. Watanabe (PP VI.27-29). A right continuous, adapted process X with X_0 = 0 is a local martingale if there exist stopping times T_n increasing to infinity, such that the stopped processes X_t^{T_n} = X_{t∧T_n} are martingales. This class includes all martingales X such that X_0 = 0, and is stable under stochastic integration by any locally bounded predictable process f (PP VIII.3 e)). The restriction X_0 = 0 may be removed : one then says that X is a local martingale if X − X_0 satisfies the preceding definition. This is quite good, but on the other hand one needs a simple criterion to decide whether a local martingale X is a true martingale. The simplest is the following, a special case of PP VI.30 f) : it is sufficient that X* be integrable. On the other hand, this is equivalent (BDG) to √([X, X]_∞) ∈ L¹, which is usually the most practical condition to test.

We end this section with a remark. In the application of probabilistic models to the real world, it is standard to replace a random variable by its expectation in a first approximation (the variance giving a rough measure of the correctness of this operation), while fluctuations around the mean must be taken into account for a second approximation. From this point of view, (local) martingales X such that X_0 = 0 appear as a class of processes negligible in first approximation.
Thus we are led to the idea of a "calculus modulo local martingales", which will be taken up in the next section.

Compensation

11 Martingales, and more generally local martingales, have just been interpreted as processes without a trend. Our first aim now will be defining the instantaneous trend of a process (X_t). A natural idea consists in defining a kind of "conditional velocity"

V_t = lim_{h↓0} (1/h) IE[X_{t+h} − X_t | F_t]     (11.1)

assuming of course that the random variables are integrable, so that the conditional expectation has a meaning. Such a definition has been used by Kolmogorov at the very beginning of diffusion theory. The (adapted) processes which have a velocity equal everywhere to 0, and therefore correspond to the constants in the deterministic case, can be shown (under mild regularity conditions) to be the martingales (with arbitrary initial value). Therefore the process X_t − ∫_0^t V_s ds is a martingale. This may be considered the most elementary example of a compensation : subtracting from a given process a smoother one (here, differentiable), to get a martingale.

It turns out that if we replace the requirement of differentiability by that of bounded variation, we can get much more general results (PP VII.12). We first recall that submartingales are processes (X_t) with integrable random variables, such that IE[X_t | F_s] ≥ X_s for s < t. Under suitable integrability conditions, one can subtract from such a process a unique predictable increasing process (A_t) with A_0 = 0, given by

A_t = lim_{h↓0} ∫_0^t (1/h) IE[X_{s+h} − X_s | F_s] ds     (11.2)

and the difference X − A is a true martingale.

For positive submartingales the last result is true without any additional integrability condition.

In particular, if (X_t) is an adapted increasing process with IE[X_t] < ∞ for every t, one can subtract from it a unique predictable increasing process (compensator) A_t with A_0 = 0 so that X − A is a martingale, and the computation (11.2) is applicable. The basic example illustrating this situation is that of a Poisson process (X_t) of intensity λ, for which A_t = λt. Another example is that of the square bracket [X, X] of a square integrable martingale X, for which the compensator is called the angle bracket ⟨X, X⟩. The angle bracket can be computed in many cases by the same kind of trick as (11.2)

⟨X, X⟩_t = lim_{h↓0} ∫_0^t (1/h) IE[(X_{s+h} − X_s)² | F_s] ds .     (11.3)
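The Poisson example above is easy to test by simulation. A minimal sketch (intensity, horizon, sample size and seed are arbitrary choices): subtracting the compensator λt removes the trend, and the variance of the compensated process at time T matches ⟨M, M⟩_T = λT:

```python
import numpy as np

rng = np.random.default_rng(6)

lam, T, paths = 3.0, 1.0, 20000

# Poisson counts N_T of intensity lam; the compensator is A_t = lam * t.
NT = rng.poisson(lam * T, size=paths)
M = NT - lam * T          # compensated process at time T: no trend left

print(M.mean())           # near 0: the martingale has mean 0
print(M.var())            # near lam * T: the angle bracket <M, M>_T
```

This is the simplest instance of compensation: the "signal" λt is smooth and predictable, and what remains is pure fluctuation.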

The existence of the (increasing, predictable) compensator of an increasing process (X_t) with X_0 = 0 doesn't demand the integrability of the r.v.'s X_t : local integrability suffices, in the sense that there should exist stopping times T_n increasing to infinity such that X_{t∧T_n} is integrable.

Martingales constructed by compensation of processes with bounded variation have some special properties. Observe that a process of bounded variation is the sum of its jump part, plus a continuous part, which is predictable and hence disappears in the compensation procedure. Therefore the martingale itself is the compensated sum of its jumps, and its square bracket is a purely discontinuous increasing process. Conversely, any (local) martingale with a purely discontinuous square bracket can be shown to be the "compensated sum of its jumps", though the interpretation isn't so simple minded, the sum of the jumps being usually divergent. What is true is that if you sum the jumps of a local martingale X which are larger than c in absolute value, you get something which you can compensate, and you let c tend to 0 after compensating. Then you get a limit X^d which is called the compensated sum of the jumps of X and contributes exactly the discontinuous part of the square bracket. There may be a remainder X^c = X − X^d which is called the continuous (local) martingale part of X (the word "local" is generally omitted here) and contributes the continuous part [X, X]^c of the square bracket (PP VIII.42-45). This extends to martingales the celebrated analysis, due to Paul Levy, of the sample paths of processes with independent increments. Note that eventually the notion of Levy measure itself was extended to local martingales by M. Yor.

A third example where compensators exist is that of "quasimartingales", or processes with bounded stochastic variation.
These are adapted processes X with integrable random variables such that for each t the expectation (similar to a total variation in the deterministic case)

IE[ Σ_i | X_{t_i} − IE[X_{t_{i+1}} | F_{t_i}] | ]

remains bounded over the set of all subdivisions (t_i) of [0, t]. In this case, the (predictable) compensator is a process of bounded variation, but it is no longer increasing as in the case of submartingales. We give no details here, and refer the reader to PP VI.38 and the appendix on quasimartingales.

Finally, last but not least, the processes X which can be decomposed into a sum X = X_0 + M + A, M being a local martingale and A a process of bounded variation (and M_0 = A_0 = 0), are exactly the semimartingales, i.e. the integrators of n° 1. If such a decomposition exists with A predictable, then it is unique, and is called the canonical decomposition of X, while X itself is called a special semimartingale - but it doesn't necessarily exist if the jumps of X are too large (PP VII.23-25). On the other hand, if we replace the probability IP by an equivalent law, the class of special semimartingales isn't necessarily preserved, and even if a given semimartingale is special under both laws its canonical decomposition changes (as also does the compensator of a given process). The way the decomposition changes isn't too complicated, however. It is given by the so called "Girsanov theorem", PP VII.49.

We hope that our reader, who had the patience to follow us up to this point, has been convinced that the theory of semimartingales deals with rather simple and intuitive ideas, even in the discontinuous case. While the techniques of proof are rather straightforward, the details are somewhat lengthy, involving possibly too much measure theory for non specialists. So why not leave them to the specialists and go ahead? In the remainder of this appendix we shall "go ahead" and illustrate the potentialities of stochastic calculus with a few classical and beautiful examples.

Brownian motion

12 Consider a local martingale X with X_0 = 0, satisfying the following "structure equation"

[X, X]_t = t .     (12.1)

We are going to describe completely the law of this process.
First of all, since the square bracket is continuous, the process X itself must be continuous. Let us then apply Ito's formula, in its simpler version for continuous semimartingales, to the function f(x) = e^{iux}. We have

e^{iuX_t} = 1 + iu ∫_0^t e^{iuX_s} dX_s − (u²/2) ∫_0^t e^{iuX_s} ds .   (12.2)

The second term on the right is a stochastic integral with respect to a local martingale, hence a local martingale. Writing it as a combination of all the remaining terms, we see that it is uniformly bounded on finite intervals. Hence it is a true martingale of expectation 0. Taking expectations and setting h(t) = E[e^{iuX_t}] we get for h the equation

h(t) = 1 − (u²/2) ∫_0^t h(s) ds

whose solution is h(t) = e^{−tu²/2}. Hence the random variable X_t is gaussian with mean 0 and variance t. Taking a little more care, we might have studied the conditional law of the increment X_t − X_s (s < t) given some event A ∈ F_s, with the result that it is gaussian of mean 0 and variance t − s. Since this doesn't depend on A, our martingale has independent increments, and is a standard brownian motion. This celebrated characterization of brownian motion is due to Levy, and this beautiful proof to Kunita-Watanabe. Note that we haven't proved the existence of a martingale satisfying (12.1), which is Wiener's theorem. Levy's theorem has a d-dimensional version, with essentially the same proof: local martingales (X^i) such that [X^i, X^j]_t = δ_{ij} t are the components of a d-dimensional brownian motion. Levy's unique characterization of the law of brownian motion is intimately related (PP VIII.62) to another important property of brownian motion. Namely, any square integrable r.v. f measurable w.r.t. the σ-field generated by all the r.v.'s X_t has a stochastic integral representation

f = E[f] + ∫_0^∞ H_s dX_s   (12.3)

where (H_t) is some predictable process such that E[∫_0^∞ H_s² ds] < ∞. This is called the predictable representation property (PRP) of brownian motion, and can be extended to d-dimensional brownian motion, provided the stochastic integral is understood as ∫_0^∞ H_s · dX_s for a vector process H. Note that (12.3) implies in this case the continuity of all square integrable martingales (even of all local martingales, but this requires a little more work). Though the predictable representation property is rather exceptional (it can be interpreted as an extremality property), many examples are known besides brownian motion - for instance, the compensated Poisson process considered in the next section. Let us give an idea of the proof. We denote by J the subspace of L² consisting of all r.v.'s f representable as a stochastic integral (12.3) with E[f] = 0. It isn't difficult to prove that J is a closed subspace, so that the PRP amounts to showing that J^⊥ consists only of the constants. We shall only prove the weaker result that every bounded r.v. h orthogonal to J is a constant. Adding a constant if necessary we may assume h to be positive; then we normalize it to have expectation 1 and we set Q = h·ℙ, a new probability law. Our original brownian motion X still is a continuous semimartingale under Q, with the same square bracket [X, X]_t = t as before. Let us prove that X is a martingale under Q. To this end, choose s < t, A ∈ F_s, and observe that the r.v. 1_A (X_t − X_s) is a stochastic integral, and hence is orthogonal to h. According to Levy's theorem, X then is a brownian motion under Q, meaning that Q and ℙ coincide on the σ-field generated by X, and finally that h = 1 a.s.. This remark (due to Dellacherie) has been the starting point of much work on the PRP, leading to definitive results by Jacod-Yor.
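Levy's characterization above lends itself to a quick numerical sanity check. The sketch below is not from the text; the discretization, sample sizes, seed and the value of u are arbitrary choices. It builds paths from gaussian increments and verifies that the quadratic variation approximates t, and that the empirical characteristic function approximates the solution h(t) = e^{−tu²/2} found above:

```python
import random, math

# Monte-Carlo sanity check of (12.1) and of h(t) = E[e^{iuX_t}] = e^{-tu^2/2}
# for a Brownian path built from gaussian increments.  All numerical
# parameters below are arbitrary choices, not taken from the text.
random.seed(0)
n_steps, n_paths, t, u = 1000, 2000, 1.0, 1.7
dt = t / n_steps

qv_sum = 0.0            # accumulates [X, X]_t over paths
cf_sum = 0.0 + 0.0j     # accumulates e^{iuX_t} over paths
for _ in range(n_paths):
    x, qv = 0.0, 0.0
    for _ in range(n_steps):
        dx = random.gauss(0.0, math.sqrt(dt))
        x += dx
        qv += dx * dx   # square bracket as a limit of sums of squared increments
    qv_sum += qv
    cf_sum += complex(math.cos(u * x), math.sin(u * x))

print(abs(qv_sum / n_paths - t) < 0.01)                        # [X,X]_t ≈ t
print(abs(cf_sum / n_paths - math.exp(-t * u**2 / 2)) < 0.08)  # h(t) ≈ e^{-tu²/2}
```

Both checks pass comfortably within Monte-Carlo error at these sample sizes.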
There is a bridge between Levy's theorem and the PRP, and another bridge between the PRP and another celebrated result of probability theory, the Wiener-Ito expansion of random variables into a sum of multiple stochastic integrals. Let us do a formal computation, assuming that X has the PRP. We start from (12.3), and expand again each r.v. H_s of the process (H_s)

H_s = E[H_s] + ∫_0^∞ H_{sr} dX_r .

Since H_s is F_s-measurable, the interval ]0, ∞[ can be replaced by ]0, s], and we have

f = E[f] + ∫_0^∞ h(s) dX_s + ∫_0^∞ ( ∫_0^s H_{sr} dX_r ) dX_s ,  where h(s) = E[H_s] .
We iterate this procedure, expanding the r.v. H_{sr}, taking out its expectation h(s,r) and writing the remainder as a triple integral, etc. One can turn these formal manipulations into correct statements, and show that the successive terms of the expansion are mutually orthogonal. Essentially, this depends only on X being a martingale with mean 0 and angle bracket t. Under these hypotheses a non-trivial mathematical problem arises, for which no general approach is known: does this expansion always converge to f? The answer is known to be positive for brownian motion, for the compensated Poisson processes below, and very recently (Emery, 1988) for a family of simple and interesting martingales first investigated by Azema. It isn't very likely that it is always positive, though no explicit counterexample is known.

Poisson process

13 Consider now the following "structure equation", just a little more complicated than (12.1): X is a real valued local martingale, and

[X, X]_t = t + c X_t   (13.1)

where c ≠ 0 is a constant. At any jump time t we have Δ[X,X]_t = ΔX_t² = c ΔX_t, and therefore ΔX_t = c. Since the sum of the squares of the jumps of X is a.s. convergent on finite intervals, only finitely many jumps can occur on [0, t]. The equation (13.1) tells us that cX_t is a finite variation process, and therefore X has no continuous martingale part. Then the bracket on the left is reduced to the sum of its jumps, and must be constant between them, which means on the right that between jumps X_t is a linear function of slope −1/c. Setting N_t = X_t/c + t/c² defines a counter N_t, i.e. an increasing step process with jumps equal to 1, and X_t/c = N_t − λt is a local martingale, where λ = 1/c². By the local martingale property there exist stopping times T_k ↑ ∞ such that X stopped at T_k is a true martingale. Therefore X_{t∧T_k} is integrable of mean 0 and E[N_{t∧T_k}] ≤ λt. By monotone convergence N_t is integrable, of expectation λt. Finally, from |X_s|/|c| ≤ N_s + λs (s ≤ t) we get that X*_t ∈ L¹. We might now apply the Ito formula as in the preceding proof. It is simpler, however, to apply it to (N_t) instead of (X_t) - in fact, it is a trivial formula, N being a step process - and it is easy to rewrite it as

e^{iuN_t} = 1 + (e^{iu} − 1) ∫_0^t e^{iuN_{s−}} dN_s .

We now take expectations, replacing in the last integral dN_t by its compensator λ dt (this amounts to saying that a stochastic integral w.r.t. X has expectation 0). Then we get for the function h(t) = E[e^{iuN_t}] the equation

h(t) = 1 + α ∫_0^t h(s) ds ,  with α = λ(e^{iu} − 1) ,

whose solution is h(t) = e^{αt}, thus showing that N_t has a Poisson law of expectation λt. As in the case of brownian motion, we can deal with the increments of the process instead of the process itself and prove that (N_t) has independent increments, and is a Poisson process with unit jumps and intensity λ. So finally (X_t) is a compensated Poisson process, with jump size c and intensity 1/c². This proof is due to S. Watanabe, but our presentation has a rather novel feature: the use of structure equations to describe both brownian motion and Poisson processes. For structure equations, see a paper by M. Emery in Seminaire de Probabilites XXIII (1989). Here too, we have a several dimensional version, involving local martingales (X^i) such that [X^i, X^i]_t = t + c_i X^i_t, while the off-diagonal brackets are equal to 0. Then what we get is a family of independent compensated Poisson processes. This independence is crucial in the proof of Levy's decomposition for processes with independent increments.

Brownian motion and harmonic functions

14 Let (X_t) be an ℝ^d-valued standard brownian motion, such that X_0 = 0. The process B_t = x + X_t is called brownian motion starting from x (more generally, one could take for B_0 any r.v. independent of the process X). Since [X^i, X^j]_t = δ_{ij} t, we can write Ito's formula, applied to a function f of class C², as follows

f(B_t) = f(B_0) + ∫_0^t ∇f(B_s) · dB_s + (1/2) ∫_0^t Δf(B_s) ds .   (14.1)

Therefore the process M_t = f(B_t) − (1/2)∫_0^t Δf(B_s) ds is a local martingale, whose bracket is [M,M]_t = ∫_0^t ‖∇f(B_s)‖² ds. In particular, if f is harmonic (Δf = 0), the process (f∘B_t) is a local martingale, and if f is superharmonic (f ≥ 0, Δf ≤ 0) then (f∘B_t) is a positive supermartingale. This old remark (Kakutani, Paul Levy, ...) may be considered as the starting point of the relations between probability and potential theory. Doob removed the restriction f ∈ C² from the statement concerning superharmonic functions, and proved that for an arbitrary superharmonic function (typically lower semicontinuous, not continuous), the process f(B_t) still is a continuous supermartingale, a discovery which started a new era in the theory of Markov processes. He also applied martingale inequalities, convergence theorems, etc. to potential theory - but we won't give details here, because the most interesting results concern harmonic or superharmonic functions in an open set, and this localization procedure involves some probabilistic technicalities. Let us just make a remark concerning a case where no localization is necessary: a closed set F is said to be polar if the brownian motion (B_t) starting from an arbitrary point x ∉ F never hits F. Trivial examples of closed polar sets are a discrete countable set in ℝ², a smooth curve in ℝ⁴, ... It is then clear that all the above results hold in the complement of a closed polar set instead of ℝ^d.
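The harmonic case invites a quick numerical illustration (a sketch, not from the text; the harmonic function x² − y², the starting point, the horizon and the sample size are all arbitrary choices): since f∘B is a martingale, E[f(B_t)] should stay equal to f(B_0).

```python
import random, math

# For the harmonic function f(x,y) = x^2 - y^2, the process f(B_t) is a
# martingale, so E[f(B_t)] should stay equal to f(B_0).
# Starting point, horizon t and sample size are arbitrary choices.
random.seed(1)
x0, y0, t, n_paths = 1.0, 0.5, 2.0, 50000
f = lambda x, y: x * x - y * y

total = 0.0
for _ in range(n_paths):
    bx = x0 + random.gauss(0.0, math.sqrt(t))  # the coordinates of B_t are
    by = y0 + random.gauss(0.0, math.sqrt(t))  # independent gaussians
    total += f(bx, by)

print(abs(total / n_paths - f(x0, y0)) < 0.1)  # E[f(B_t)] ≈ f(B_0) = 0.75
```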

Let us now take d = 2, and instead of B_t as above, denote the 2-dimensional brownian motion by Z_t = X_t + iY_t, in conformity with the standard notation z = x + iy for a complex variable. This process is also called complex brownian motion. Extending the definition of the square bracket to complex processes, and using the relations [X,X]_t = [Y,Y]_t = t, [X,Y]_t = 0, we get

[Z, Z]_t = [Z̄, Z̄]_t = 0 ,  [Z, Z̄]_t = 2t .   (14.2)

A continuous complex local martingale M such that [M, M] = 0 is called a conformal martingale. By conjugation, it is clear that M̄ is a conformal martingale too, and the only non-zero bracket associated with a conformal martingale is [M, M̄], which is real and increasing. Levy's theorem for conformal martingales states: any conformal martingale M with [M, M̄]_t = 2t has the same law as complex brownian motion. Examples of conformal martingales are provided by images of complex brownian motion under conformal mappings, i.e. holomorphic or antiholomorphic functions. Indeed, Ito's formula can be applied as well to the complex process (Z_t) and a complex C² function f, so let us take f to be holomorphic (or just meromorphic: the set of poles of f is closed polar). An easy computation using the Cauchy-Riemann equations shows that Ito's formula can then be written as

f(Z_t) = f(Z_0) + ∫_0^t f'(Z_s) dZ_s   (14.3)

where f' denotes the derivative of f in the complex sense. In particular, the process M_t = f∘Z_t is a complex local martingale (in accordance with the general result about harmonic functions). It is also a conformal martingale, since we have

[M, M]_t = ∫_0^t f'(Z_s)² d[Z, Z]_s = 0 ,  [M, M̄]_t = ∫_0^t |f'(Z_s)|² d[Z, Z̄]_s = 2 ∫_0^t |f'(Z_s)|² ds .   (14.4)

Products of conformal martingales can easily be shown to be conformal, but sums aren't necessarily conformal, as the trivial example of M + M̄ shows. The main property of conformal martingales consists in following the same paths as complex brownian motion, with a different speed. Indeed, let M be a conformal martingale, and let (A_t) denote its associated increasing process [M, M̄]. Assume for simplicity that (A_t) is strictly increasing and that its sample paths increase to infinity as t → ∞ (one can show that these conditions are satisfied in the concrete case (14.4) unless f is constant). Then the stopping times

σ_t(ω) = inf {s : A_s(ω) > t}   (14.5)

are finite valued, and the paths σ_·(ω) are increasing and continuous. The time-changed process M_{σ_t} can be shown to be a local martingale with respect to a suitably time-changed filtration, and it has now the same bracket structure as complex brownian motion. So by Levy's theorem it must have the same law. This fact (already known to Levy) allows the application of martingale methods in the theory of analytic functions of one complex variable (see Durrett's book mentioned in the bibliography). The extension of martingale methods to several complex variables has also attracted much interest, but the problems are quite difficult.

Martingale problems and the gamma operator

15 Let L be some linear mapping from the space 𝒟 of C^∞ functions on ℝ^d with compact support to, say, bounded continuous functions on ℝ^d. According to Stroock and Varadhan, we say that a semimartingale (X_t) taking values in ℝ^d is a solution of the martingale problem associated with L if, for any function f ∈ 𝒟, the following process M(f) is a local martingale:

M_t(f) = f(X_t) − f(X_0) − ∫_0^t Lf(X_s) ds .   (15.1)

For instance, by (14.1) brownian motion in ℝ^d with an arbitrary initial point is a solution of the martingale problem associated with L = ½Δ. Since f and Lf in (15.1) are bounded, this local martingale is in fact a true martingale with initial value 0, and therefore has zero expectation. Assume for a moment that X_0 = x, and for any bounded Borel function g denote by P_t(x, g) the expectation of g∘X_t; integrating (15.1) we get for f ∈ 𝒟

P_t(x, f) = f(x) + ∫_0^t P_s(x, Lf) ds .   (15.2)

If we can find a solution of the martingale problem for an arbitrary starting point x, we may consider P_t(·, f) as a function on ℝ^d, and g(t, x) = P_t(x, f) looks like the solution of the Cauchy problem for the equation ∂g/∂t = Lg, g(0, ·) = f, except that we have P_s Lf instead of L P_s f. However, the two problems are closely related, and solving martingale problems gives a lot of information about the heat equation. For instance, Stroock and Varadhan have solved the martingale problem when L is a second order elliptic operator with merely continuous coefficients, and their probabilistic construction also solves a Cauchy problem which apparently doesn't come within reach of the usual functional analytic methods. Since the local martingale M(f) is uniformly bounded on finite intervals, it is also square integrable. Then its square bracket is integrable, and it has a compensator, the angle bracket, which we are going to compute. This will illustrate how the general theorems about compensation are used.
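The computation that follows produces the operator Γ(f,g) = L(fg) − f·Lg − g·Lf, often called the "carré du champ". For L = ½Δ it reduces to ∇f·∇g, which a finite-difference sketch can confirm (the test functions and the evaluation point below are arbitrary choices, not from the text):

```python
# Finite-difference sketch: for L = (1/2)Δ on R^2, the operator
# Γ(f,g) = L(fg) - f·Lg - g·Lf reduces to ∇f·∇g.  The test functions
# f, g and the point (x0, y0) below are arbitrary choices.
h = 1e-4  # finite-difference step

def L(F, x, y):  # L = half the Laplacian, by a 5-point stencil
    return 0.5 * (F(x + h, y) + F(x - h, y) + F(x, y + h) + F(x, y - h)
                  - 4.0 * F(x, y)) / h**2

def grad(F, x, y):
    return ((F(x + h, y) - F(x - h, y)) / (2 * h),
            (F(x, y + h) - F(x, y - h)) / (2 * h))

f = lambda x, y: x * x * y
g = lambda x, y: x + y * y

x0, y0 = 0.7, -0.3
gamma = (L(lambda x, y: f(x, y) * g(x, y), x0, y0)
         - f(x0, y0) * L(g, x0, y0) - g(x0, y0) * L(f, x0, y0))
fx, fy = grad(f, x0, y0)
gx, gy = grad(g, x0, y0)
print(abs(gamma - (fx * gx + fy * gy)) < 1e-6)  # Γ(f,g) = ∇f·∇g here
```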

We shall compute modulo local martingales, using the symbol ∼ to mean that two semimartingales differ by a local martingale. We shall also use heavily the differential notation shorthand. Denoting by f, g two elements of 𝒟, the definition of the martingale M(f) can be written df(X_t) ∼ Lf(X_t) dt = dA_t, where A_t = ∫_0^t Lf(X_s) ds; similarly dg(X_t) ∼ Lg(X_t) dt = dB_t, and since 𝒟 is an algebra, d(f(X_t)g(X_t)) ∼ L(fg)(X_t) dt = dC_t. On the other hand, the formula of integration by parts can be written

d(f(X)g(X)) = f(X_−) dg(X) + g(X_−) df(X) + d[f(X), g(X)] ,  hence

dC ∼ f(X) dB + g(X) dA + d⟨f(X), g(X)⟩ .

Since the equivalent terms are predictable and of bounded variation, they are equal. We have thus computed the angle bracket of df(X) and dg(X). This is the same as the angle bracket of the martingales M(f) and M(g), since they differ from the preceding processes by continuous processes of bounded variation, which never contribute to brackets. Finally

d⟨M(f), M(g)⟩_t = Γ(f, g)(X_t) dt ,  where Γ(f, g) = L(fg) − f Lg − g Lf ,

from which we deduce easily Γ(f, f̄) for f = Σ_j a_j e_{u_j}, when L is translation invariant and acts on the exponentials e_u(x) = e^{iu·x} by Le_u = −ψ(u) e_u. Since Γ(f, f̄) must be a positive function, we get the inequality

Σ_{j,k} ( ψ(u_j − u_k) − ψ(u_j) − ψ(−u_k) ) a_j ā_k e_{u_j − u_k}(x) ≤ 0

valid unless x belongs to some exceptional set. Forgetting about exceptional sets and taking x = 0, we get exactly the usual definition of a function ψ of negative type, which therefore turns out, rather unexpectedly, to have a probabilistic meaning. Let us add a few comments on the case of a smooth elliptic second order differential operator L on a manifold. First of all, the gamma operator Γ(f, g) is a derivation in f for fixed g. This seems to be the correct axiom to introduce if one wishes to define a diffusion (X_t) taking values in a space E without differentiable structure (there exist interesting examples of this situation, the first one being Malliavin's Ornstein-Uhlenbeck process on Wiener space, and more recent ones). Next, assuming that L maps the algebra A into itself, Bakry has defined the higher order gammas, the first of which is

2 Γ₂(f, g) = L Γ(f, g) − Γ(f, Lg) − Γ(Lf, g) .   (15.3)

Just as the first gamma is a generalized metric, the second one has been interpreted by Bakry and Emery as a generalized Ricci curvature. We can do no better than referring the reader to their papers in the volumes XIX, XXI and XXII of the Seminaire de Probabilites.
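To make the curvature interpretation concrete, here is a standard flat-space computation (a supplement, not from the text): with L = ½Δ on ℝ^d one has Γ(f, g) = ∇f·∇g, and Bochner's identity shows that Γ₂ is then half the squared Hessian, the Ricci term being absent precisely because ℝ^d is flat.

```latex
% Flat-case computation: with L = \tfrac12\Delta on \mathbb{R}^d and
% \Gamma(f,g) = \nabla f\cdot\nabla g, definition (15.3) gives
2\,\Gamma_2(f,f) = L\,\Gamma(f,f) - 2\,\Gamma(f,Lf)
                 = \tfrac12\,\Delta|\nabla f|^2 - \nabla f\cdot\nabla(\Delta f) .
% Bochner's identity
%   \tfrac12\,\Delta|\nabla f|^2 = \|\nabla^2 f\|^2 + \nabla f\cdot\nabla(\Delta f)
% then yields
\Gamma_2(f,f) = \tfrac12\,\|\nabla^2 f\|^2 \;\ge\; 0 .
```

In the manifold case the same computation picks up the extra term Ric(∇f, ∇f), which is the source of the Bakry-Emery interpretation.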

The Doléans exponential

16 In this section we give some applications of the integration by parts formula to linear stochastic differential equations. Let (C_t) be a semimartingale taking values in the space of (d, d)-matrices, such that C_0 = 0. Then the right and left stochastic exponentials of this process are defined as the solutions of the stochastic differential equations

R_t = I + ∫_0^t R_{s−} dC_s ,  L_t = I + ∫_0^t dC_s L_{s−} .   (16.1)

Note that if C is a local martingale, the same is true for L and R. Of course, the two exponentials are exchanged by transposition. These processes are multiplicative stochastic integrals, heuristically ∏_{s≤t} (I + dC_s). In dimension one, we have the explicit formula

L_t = exp( C_t − ½ [C, C]^c_t ) ∏_{s≤t} (1 + ΔC_s) e^{−ΔC_s}   (16.2)

[C, C]^c denoting as usual the continuous part of the square bracket. Note that the absolute convergence of Σ_s ΔC_s² implies the absolute convergence of the infinite product as it stands, but e^{−ΔC_s} is also hidden in the first term exp(C_t), and thus appears twice in this formula with opposite signs. We shall not prove this formula in full generality (see Sem. Prob. X for details). Let us just prove that if C is a continuous semimartingale (no product, and no superscript c necessary on the bracket) we have dL_t = L_t dC_t. Set U = e^C, V = e^{−½[C,C]}. Ito's formula for the function e^x gives us

dU_t = U_t dC_t + ½ U_t d[C, C]_t .

As V is continuous and of bounded variation, we have the ordinary formula d(UV) = U dV + V dU, from which the result follows. Let for instance B_t be standard brownian motion. Then ℰ(λB) = Z^λ is the solution of the differential equation

dZ^λ_t = λ Z^λ_t dB_t ,  Z^λ_0 = 1 .

The formula (16.2) gives an explicit expression for Z^λ:

Z^λ_t = exp( λB_t − λ²t/2 )   (16.3)

which we compare with the generating function for the Hermite polynomials

exp( λx − λ²/2 ) = Σ_n (λⁿ/n!) H_n(x) .

Setting h_n(t, x) = t^{n/2} H_n(x/√t) we get

Z^λ_t = Σ_n (λⁿ/n!) h_n(t, B_t) .   (16.4)
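The identity (16.4) can be spot-checked numerically. The sketch below generates the Hermite polynomials by the standard recurrence H_{n+1}(x) = x·H_n(x) − n·H_{n−1}(x); the sample values of λ, t, x and the truncation order are arbitrary choices, not from the text:

```python
import math

# Spot-check of exp(λx - λ²t/2) = Σ_n (λ^n/n!) h_n(t, x) at a sample point,
# with h_n(t, x) = t^{n/2} H_n(x/√t).  The H_n are the (probabilists')
# Hermite polynomials, built from H_{n+1}(x) = x·H_n(x) - n·H_{n-1}(x).
# The values lam, t, x and the truncation order are arbitrary choices.
def hermite(n, x):
    h0, h1 = 1.0, x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def h(n, t, x):
    return t ** (n / 2) * hermite(n, x / math.sqrt(t))

lam, t, x = 0.5, 2.0, 1.3
lhs = math.exp(lam * x - lam * lam * t / 2)
rhs = sum(lam ** n / math.factorial(n) * h(n, t, x) for n in range(30))
print(abs(lhs - rhs) < 1e-9)  # the truncated series matches the exponential
```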

Since Z^λ is a local martingale, we expect the coefficients of its Taylor expansion in λ to be local martingales too. This is true, since h_n satisfies the heat equation

∂h_n/∂t + ½ ∂²h_n/∂x² = 0 .

The first local martingales we get in this way are (B_t), (B_t² − t), (B_t³ − 3tB_t), ... On the other hand, the equation for Z^λ can also be solved by the Picard method

Z^λ_t = 1 + λ ∫_0^t dB_s + λ² ∫_0^t dB_s ∫_0^s dB_r + ⋯

so that by identification with (16.4) we get the explicit value of the iterated integrals. This isn't a quite rigorous proof, but it indicates the correct result. Finally, let us mention that the (scalar) Doléans exponential plays an important role in statistical problems, because of the Girsanov theorem (PP VII.49). Let ℙ be a probability, chosen as a reference for a class of equivalent probability laws Q, which is supposed to contain the "real" law of some stochastic process under observation. In classical statistics we would characterize Q by its density dQ/dℙ, but in our situation time plays a role, and we use instead the density martingale Q_t = dQ/dℙ computed on the σ-field F_t.

It has expectation 1, and it can be shown to be strictly positive - more precisely, a.e. sample path is bounded from below (away from 0) on [0, ∞]. Usually F_0 is the trivial σ-field, so that Q_0 = 1. We know that the class of semimartingales is invariant under the change of law, but we need explicit formulas to transfer information from the theory of processes under Q to that under ℙ and backwards. It is very easy to see that a process X is a Q-martingale if and only if XQ is a ℙ-martingale, but there is a better way: the Girsanov transformation is a mapping from Q-(local)martingales to ℙ-(local)martingales which involves only an additive correction by a predictable process of bounded variation, namely the mapping

X ↦ X + ⟨X, M⟩  where  M_t = ∫_0^t dQ_s / Q_{s−} ,

and we have used the (predictable) angle bracket under ℙ, which of course must be assumed to exist. Note that M is the "stochastic logarithm" of Q, i.e. the stochastic exponential of M is Q. To prove this, we recall that X is a semimartingale under ℙ, and use the symbol ∼ to denote equivalence modulo ℙ-local martingales. Since X is assumed to be a Q-martingale, XQ is a ℙ-martingale, and the integration by parts formula gives

0 ∼ d(XQ) = X_− dQ + Q_− dX + d[X, Q] ∼ Q_− dX + d⟨X, Q⟩ = Q_− ( dX + d⟨X, M⟩ ) ,

and we are finished. Conversely, it is easy to see that if X + ⟨X, M⟩ is a local martingale under ℙ then X is a local martingale under Q; this provides the inverse mapping, from ℙ-martingales to Q-martingales. For instance, if X is a brownian motion under ℙ and ⟨X, M⟩_t = ∫_0^t H_s ds, then under Q the process X̃_t = X_t − ∫_0^t H_s ds is a local martingale, and since its square bracket is still equal to t it is a brownian motion. Thus the effect on the brownian motion X of the replacement of ℙ by