Thursday, March 3, 2011, 2:00 PM

Readings: Shreve, Ch. 2; Grigoriu, Sec. 2.17 (which contains several mistakes).

Homework 2 to be posted tonight, due Friday, March 11 at 5 PM.

We discussed last time the relatively simple case where the σ-algebra of information corresponds to the information generated by a discrete random variable Y. For this simple case, the abstract machinery is not so necessary, but it becomes very useful for more complicated situations where the information is more difficult to describe concisely, e.g., the behavior of the stock market over the next month.

The key properties of conditional expectations that are used in calculations are:

Independence: if the random variable $X$ is independent of the σ-algebra $\mathcal{G}$, then $E[X \mid \mathcal{G}] = E[X]$. See the references for the technical general definition of independence. Intuitively, the information contained in $\mathcal{G}$ has no relation with the random variable $X$, so conditioning on it does not change the expectation.

Taking out what is known: if $Y$ is $\mathcal{G}$-measurable, then $E[XY \mid \mathcal{G}] = Y \, E[X \mid \mathcal{G}]$. (Intuitively, that means that when $Y$ is completely determined by the information in $\mathcal{G}$, then $Y$ behaves deterministically in the conditional expectation.) Shreve calls this Taking Out What is Known (TOWK). This generalizes the elementary notion $E[cX] = c\,E[X]$ for a deterministic constant $c$.

Multi-level coarse-graining (iterated conditioning): if $\mathcal{H} \subseteq \mathcal{G}$, then

$$E\big[\, E[X \mid \mathcal{G}] \mid \mathcal{H} \,\big] = E[X \mid \mathcal{H}].$$

An important special case is when $\mathcal{H} = \{\emptyset, \Omega\}$ is the trivial σ-algebra, so that the outer conditional expectation becomes an ordinary expectation:

$$E\big[\, E[X \mid \mathcal{G}] \,\big] = E[X].$$

This is just the law of total expectation, generalized.
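As a quick numerical sanity check of multi-level coarse-graining, here is a small Python sketch (the variables $Y_1$, $Y_2$, $X$ are invented for illustration, not from the lecture). Take $\mathcal{H} = \sigma(Y_1) \subseteq \mathcal{G} = \sigma(Y_1, Y_2)$; conditioning on discretely generated information is just averaging over level sets:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# Coarse information H = sigma(Y1); finer information G = sigma(Y1, Y2).
y1 = rng.integers(0, 2, size=n)
y2 = rng.integers(0, 3, size=n)
x = 2.0 * y1 + y2 + rng.standard_normal(n)

def cond_exp(values, labels):
    """E[values | labels]: replace each sample by the mean over its level set."""
    out = np.empty_like(values)
    for lab in np.unique(labels):
        mask = labels == lab
        out[mask] = values[mask].mean()
    return out

fine = cond_exp(x, 3 * y1 + y2)        # E[X | G]
coarse = cond_exp(x, y1)               # E[X | H]
tower = cond_exp(fine, y1)             # E[ E[X|G] | H ]

# Multi-level coarse-graining: E[E[X|G] | H] = E[X|H]; here the agreement is
# exact, since both sides average the same samples within each level set of Y1.
print(np.abs(tower - coarse).max())    # ~ 1e-15
```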

In an elementary probability class, where for example $\mathcal{G} = \sigma(Y)$ is generated by a discrete random variable $Y$, one would write

$$E[X \mid \sigma(Y)] = g(Y), \quad \text{where } g(y) = E[X \mid Y = y],$$

so $E[X \mid \sigma(Y)]$ is a random variable which we can treat as a function of the random variable $Y$.
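For instance (a made-up dice example, not from the notes): let $Y$ be the first of two fair dice and $X$ their sum. Elementary conditioning gives $g(y) = E[X \mid Y = y] = y + 3.5$, so $E[X \mid \sigma(Y)] = Y + 3.5$. A short sketch in Python:

```python
from itertools import product

# Toy model: two fair dice; Y = first die, X = sum of both.
omega = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def g(y):
    """g(y) = E[X | Y = y], computed by elementary conditioning."""
    level_set = [d1 + d2 for d1, d2 in omega if d1 == y]
    return sum(level_set) / len(level_set)

print([g(y) for y in range(1, 7)])   # [4.5, 5.5, ..., 9.5], i.e. g(y) = y + 3.5

# E[X | sigma(Y)] is the random variable g(Y); check E[g(Y)] = E[X] = 7.
print(sum(g(d1) for d1, d2 in omega) / len(omega))
```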

One silly mistake in Grigoriu is that he gives a nonsense formula for this quantity.

In this abstract setting, conditional probability is actually defined through conditional expectation:

$$P(A \mid \mathcal{G}) = E[\mathbf{1}_A \mid \mathcal{G}],$$

where $\mathbf{1}_A$ is the indicator random variable of the event $A$.
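Continuing the dice sketch above (again purely illustrative): with $A = \{X \ge 10\}$, $P(A \mid \sigma(Y)) = E[\mathbf{1}_A \mid \sigma(Y)]$ is computed by averaging the indicator over each level set of $Y$:

```python
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # two fair dice, as above

def cond_prob(y):
    """P(A | Y = y) = E[1_A | Y = y] for the event A = {X >= 10}."""
    level_set = [w for w in omega if w[0] == y]
    return sum(1 for d1, d2 in level_set if d1 + d2 >= 10) / len(level_set)

# Averaging the indicator over each level set of Y:
print([cond_prob(y) for y in range(1, 7)])   # [0, 0, 0, 1/6, 1/3, 1/2]
```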

Examples for computing with conditional probability:

Consider a random sum:

$$S = \sum_{n=1}^{N} X_n,$$

where $N$ and the $X_n$ are independent random variables, $N$ taking nonnegative integer values, and all the $X_n$ have the same probability distribution (they are iid: independent and identically distributed).

By the law of total expectation,

$$E[S] = E\big[\, E[S \mid \sigma(N)] \,\big].$$

This is a trick (widely useful in stochastic processes) for moving a random variable from an index of a summation (or a limit of an integral) into a form where it can be taken out of the conditional expectation: conditioned on $\sigma(N)$, the random sum becomes a deterministic sum of random variables.

Now that the sum is over a deterministic set of indices, we can use linearity of expectation (which holds for conditional expectation as well), together with the independence of the $X_n$ from $N$:

$$E\Big[\sum_{n=1}^{N} X_n \,\Big|\, \sigma(N)\Big] = \sum_{n=1}^{N} E[X_n \mid \sigma(N)] = \sum_{n=1}^{N} E[X_n] = N \, E[X_1].$$

So by TOWK,

$$E[S] = E\big[\, E[S \mid \sigma(N)] \,\big] = E\big[N \, E[X_1]\big] = E[N] \, E[X_1].$$
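A Monte Carlo check of the conclusion $E[S] = E[N]\,E[X_1]$ (often called Wald's identity), with distributions invented for illustration, $N \sim \text{Poisson}(4)$ and $X_n$ exponential with mean 2:

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 100_000

# N ~ Poisson(4); given N = k, sum k iid Exponential(mean 2) variables.
n = rng.poisson(4.0, size=trials)
s = np.array([rng.exponential(2.0, size=k).sum() for k in n])

print(s.mean())        # Monte Carlo estimate of E[S], approximately 8
print(n.mean() * 2.0)  # E[N] * E[X_1] = 4 * 2 = 8
```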

As another example, let's derive the law of total variance, which is analogous to the law of total expectation and is worth knowing in its own right:

$$\operatorname{Var}(X) = E\big[\operatorname{Var}(X \mid \mathcal{G})\big] + \operatorname{Var}\big(E[X \mid \mathcal{G}]\big).$$

The first term characterizes the level of uncertainty that remains in $X$ once the information in $\mathcal{G}$ is known, whereas the second term characterizes the uncertainty coming from the underlying information $\mathcal{G}$ itself.
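A quick Monte Carlo illustration (with a model chosen just for this sketch): take $Y$ uniform on $\{0,1,2\}$ and $X \mid Y \sim N(Y, 1)$, so $E[\operatorname{Var}(X \mid Y)] = 1$, $\operatorname{Var}(E[X \mid Y]) = \operatorname{Var}(Y) = 2/3$, and hence $\operatorname{Var}(X) = 5/3$:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

y = rng.integers(0, 3, size=n)          # Y uniform on {0, 1, 2}
x = y + rng.standard_normal(n)          # X | Y ~ Normal(Y, 1)

# Conditional mean and variance of X on each level set of Y.
cond_mean = np.array([x[y == k].mean() for k in range(3)])[y]
cond_var = np.array([x[y == k].var() for k in range(3)])[y]

print(x.var())                           # Var(X)                   ~ 1 + 2/3
print(cond_var.mean() + cond_mean.var()) # E[Var(X|Y)] + Var(E[X|Y]), the same
```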

Derivation:

We now establish a lemma:

Proof of Lemma:

This establishes the lemma.

So now we use this lemma to continue our derivation of the law of total variance:
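For reference, here is a sketch of one standard route to the identity (the in-class lemma presumably corresponds to one of these steps; the key facts used are $\operatorname{Var}(X \mid \mathcal{G}) = E[X^2 \mid \mathcal{G}] - (E[X \mid \mathcal{G}])^2$ and the law of total expectation):

\begin{align*}
E\big[\operatorname{Var}(X \mid \mathcal{G})\big]
  &= E[X^2] - E\big[(E[X \mid \mathcal{G}])^2\big], \\
\operatorname{Var}\big(E[X \mid \mathcal{G}]\big)
  &= E\big[(E[X \mid \mathcal{G}])^2\big] - \big(E[X]\big)^2,
\end{align*}

using $E\big[E[X \mid \mathcal{G}]\big] = E[X]$ in the second line. Adding the two lines, the $E\big[(E[X \mid \mathcal{G}])^2\big]$ terms cancel:

$$E\big[\operatorname{Var}(X \mid \mathcal{G})\big] + \operatorname{Var}\big(E[X \mid \mathcal{G}]\big) = E[X^2] - (E[X])^2 = \operatorname{Var}(X).$$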

Filtrations of sigma-algebras and connections to stochastic processes

Suppose we are given a stochastic process $X(t)$, $t \ge 0$.

We can define for each moment of time $t$ a σ-algebra $\mathcal{F}_t$ corresponding to the information generated by the stochastic process up to time $t$. Mathematically, we define cylinder sets:

$$\{\omega \in \Omega : X(s, \omega) \in B\}, \qquad 0 \le s \le t, \quad B \text{ a Borel subset of the state space},$$

and define $\mathcal{F}_t$ to be the σ-algebra generated by all these cylinder sets.

This family of σ-algebras has the following property: $\mathcal{F}_s \subseteq \mathcal{F}_t$ whenever $s \le t$; that is, information accumulates as time advances.

This is a special case of a filtration.

A filtration is a collection of σ-algebras $\{\mathcal{F}_t\}_{t \ge 0}$

on a common measurable space $(\Omega, \mathcal{F})$, with each $\mathcal{F}_t \subseteq \mathcal{F}$,

such that: $\mathcal{F}_s \subseteq \mathcal{F}_t$ whenever $s \le t$.

We say that a stochastic process $X(t)$ is adapted to a filtration $\{\mathcal{F}_t\}_{t \ge 0}$

provided that, for all $t$, the random variable $X(t)$ is $\mathcal{F}_t$-measurable.

This means that the information in the filtration is enough to completely determine the stochastic process as we go along in time.
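A concrete discrete-time sketch (an illustrative toy, not from the notes): take $\Omega$ to be the eight outcomes of three coin flips and $X(t)$ the partial sums of $\pm 1$ steps. The atoms of $\mathcal{F}_t$ are the groups of outcomes agreeing in the first $t$ flips, and adaptedness shows up as $X(t)$ being constant on each atom of $\mathcal{F}_t$:

```python
from itertools import product

# Sample space: all sequences of three coin flips (+1 or -1 steps).
omega = list(product((+1, -1), repeat=3))

def X(t, w):
    """Random walk at time t along the path w."""
    return sum(w[:t])

def atoms(t):
    """Atoms of F_t: outcomes grouped by their first t flips."""
    groups = {}
    for w in omega:
        groups.setdefault(w[:t], []).append(w)
    return list(groups.values())

# Adaptedness: X(t) is constant on every atom of F_t, for each t.
for t in range(4):
    assert all(len({X(t, w) for w in atom}) == 1 for atom in atoms(t))

# But X(2) is NOT constant on the atoms of F_1: F_1 is too coarse,
# so the process is not adapted to the slower filtration (F_{t-1}).
print(any(len({X(2, w) for w in atom}) > 1 for atom in atoms(1)))  # True
```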
