Measure theory and probability

Alexander Grigoryan
University of Bielefeld
Lecture Notes, October 2007 - February 2008

Contents

1 Construction of measures
  1.1 Introduction and examples
  1.2 σ-additive measures
  1.3 An example of using probability theory
  1.4 Extension of measure from a semi-ring to a ring
  1.5 Extension of measure to a σ-algebra
    1.5.1 σ-rings and σ-algebras
    1.5.2 Outer measure
    1.5.3 Symmetric difference
    1.5.4 Measurable sets
  1.6 σ-finite measures
  1.7 Null sets
  1.8 Lebesgue measure in $\mathbb{R}^n$
    1.8.1 Product measure
    1.8.2 Construction of measure in $\mathbb{R}^n$
  1.9 Probability spaces
  1.10 Independence

2 Integration
  2.1 Measurable functions
  2.2 Sequences of measurable functions
  2.3 The Lebesgue integral for finite measures
    2.3.1 Simple functions
    2.3.2 Positive measurable functions
    2.3.3 Integrable functions
  2.4 Integration over subsets
  2.5 The Lebesgue integral for σ-finite measures
  2.6 Convergence theorems
  2.7 Lebesgue spaces $L^p$
    2.7.1 The $p$-norm
    2.7.2 Spaces $L^p$
  2.8 Product measures and Fubini's theorem
    2.8.1 Product measure
    2.8.2 Cavalieri principle
    2.8.3 Fubini's theorem

3 Integration in Euclidean spaces and in probability spaces
  3.1 Change of variables in Lebesgue integral
  3.2 Random variables and their distributions
  3.3 Functionals of random variables
  3.4 Random vectors and joint distributions
  3.5 Independent random variables
  3.6 Sequences of random variables
  3.7 The weak law of large numbers
  3.8 The strong law of large numbers
  3.9 Extra material: the proof of the Weierstrass theorem using the weak law of large numbers

1 Construction of measures

1.1 Introduction and examples

The main subject of this lecture course is the notion of measure (Maß). The rigorous definition of measure will be given later; for now, let us recall the familiar notions from elementary mathematics, which are all particular cases of measure:

1. Length of intervals in $\mathbb{R}$: if $I$ is a bounded interval with the endpoints $a, b$ (that is, $I$ is one of the intervals $(a,b)$, $[a,b]$, $[a,b)$, $(a,b]$) then its length is defined by

$$\ell(I) = |b - a|.$$
A useful property of the length is the additivity: if an interval $I$ is a disjoint union of a finite family $\{I_k\}_{k=1}^n$ of intervals, that is, $I = \bigsqcup_{k=1}^n I_k$, then
$$\ell(I) = \sum_{k=1}^n \ell(I_k).$$
Indeed, let $\{a_i\}_{i=0}^N$ be the set of all distinct endpoints of the intervals $I, I_1, \dots, I_n$ enumerated in the increasing order. Then $I$ has the endpoints $a_0, a_N$, while each interval $I_k$ necessarily has the endpoints $a_i, a_{i+1}$ for some $i$ (indeed, if the endpoints of $I_k$ are $a_i$ and $a_j$ with $j > i + 1$, then the point $a_{i+1}$ is an interior point of $I_k$, which means that $I_k$ must intersect some other interval $I_m$). Conversely, any couple $a_i, a_{i+1}$ of consecutive points are the endpoints of some interval $I_k$ (indeed, the interval $(a_i, a_{i+1})$ must be covered by some interval $I_k$; since the endpoints of $I_k$ are consecutive numbers in the sequence $\{a_j\}$, it follows that they are $a_i$ and $a_{i+1}$). We conclude that

$$\ell(I) = a_N - a_0 = \sum_{i=0}^{N-1} (a_{i+1} - a_i) = \sum_{k=1}^{n} \ell(I_k).$$

2. Area of domains in $\mathbb{R}^2$. The full notion of area will be constructed within the general measure theory later in this course. However, for rectangular domains the area is defined easily. A rectangle $A$ in $\mathbb{R}^2$ is defined as the direct product of two intervals $I, J$ from $\mathbb{R}$:
$$A = I \times J = \left\{ (x, y) \in \mathbb{R}^2 : x \in I,\ y \in J \right\}.$$
Then set
$$\operatorname{area}(A) = \ell(I)\,\ell(J).$$
We claim that the area is also additive: if a rectangle $A$ is a disjoint union of a finite family of rectangles $A_1, \dots, A_n$, that is, $A = \bigsqcup_k A_k$, then
$$\operatorname{area}(A) = \sum_{k=1}^n \operatorname{area}(A_k).$$
For simplicity, let us restrict the consideration to the case when all sides of all rectangles are semi-open intervals of the form $[a, b)$. Consider first a particular case, when the

rectangles $A_1, \dots, A_n$ form a regular tiling of $A$; that is, let $A = I \times J$ where $I = \bigsqcup_i I_i$ and $J = \bigsqcup_j J_j$, and assume that each rectangle $A_k$ has the form $I_i \times J_j$. Then
$$\operatorname{area}(A) = \ell(I)\,\ell(J) = \Big( \sum_i \ell(I_i) \Big) \Big( \sum_j \ell(J_j) \Big) = \sum_{i,j} \ell(I_i)\,\ell(J_j) = \sum_k \operatorname{area}(A_k).$$

Now consider the general case when $A$ is an arbitrary disjoint union of rectangles $A_k$. Let $\{x_i\}$ be the set of all $X$-coordinates of the endpoints of the rectangles $A_k$, put in the increasing order, and let $\{y_j\}$ similarly be the set of all the $Y$-coordinates, also in the increasing order. Consider the rectangles

$$B_{ij} = [x_i, x_{i+1}) \times [y_j, y_{j+1}).$$

Then the family $\{B_{ij}\}_{i,j}$ forms a regular tiling of $A$ and, by the first case,

$$\operatorname{area}(A) = \sum_{i,j} \operatorname{area}(B_{ij}).$$

On the other hand, each $A_k$ is a disjoint union of some of the $B_{ij}$; moreover, those $B_{ij}$ that are subsets of $A_k$ form a regular tiling of $A_k$, which implies that

$$\operatorname{area}(A_k) = \sum_{B_{ij} \subset A_k} \operatorname{area}(B_{ij}).$$

Combining the previous two lines and using the fact that each $B_{ij}$ is a subset of exactly one set $A_k$, we obtain

$$\sum_k \operatorname{area}(A_k) = \sum_k \sum_{B_{ij} \subset A_k} \operatorname{area}(B_{ij}) = \sum_{i,j} \operatorname{area}(B_{ij}) = \operatorname{area}(A).$$

3. Volume of domains in $\mathbb{R}^3$. The construction is similar to that of the area. Consider all boxes in $\mathbb{R}^3$, that is, the domains of the form $A = I \times J \times K$ where $I, J, K$ are intervals in $\mathbb{R}$, and set
$$\operatorname{vol}(A) = \ell(I)\,\ell(J)\,\ell(K).$$
Then volume is also an additive functional, which is proved in a similar way. Later on, we will give the detailed proof of a similar statement in an abstract setting.

4. Probability is another example of an additive functional. In probability theory, one considers a set $\Omega$ of elementary events, and certain subsets of $\Omega$ are called events (Ereignisse). To each event $A \subset \Omega$, one assigns a probability, which is denoted by $P(A)$ and which is a real number in $[0, 1]$. A reasonably defined probability must satisfy the additivity: if the event $A$ is a disjoint union of a finite sequence of events $A_1, \dots, A_n$ then
$$P(A) = \sum_{k=1}^n P(A_k).$$
The fact that $A_i$ and $A_j$ are disjoint, when $i \neq j$, means that the events $A_i$ and $A_j$ cannot occur at the same time.

The common feature of all the above examples is the following. We are given a non-empty set $M$ (which in the above examples was $\mathbb{R}$, $\mathbb{R}^2$, $\mathbb{R}^3$, $\Omega$), a family $S$ of its subsets (the

families of intervals, rectangles, boxes, events), and a functional $\mu : S \to \mathbb{R}_+ := [0, +\infty)$ (length, area, volume, probability) with the following property: if $A \in S$ is a disjoint union of a finite family $\{A_k\}_{k=1}^n$ of sets from $S$, then

$$\mu(A) = \sum_{k=1}^n \mu(A_k).$$
A functional $\mu$ with this property is called a finitely additive measure. Hence, length, area, volume, and probability are all finitely additive measures.
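The telescoping argument for the additivity of length is easy to check numerically. The following sketch is an added illustration (not part of the original notes): it compares $\ell(I)$ with $\sum_k \ell(I_k)$ and with the telescoping sum over the sorted endpoints.

```python
# Illustration (not from the notes): additivity of length over a finite
# disjoint partition, via the telescoping sum over sorted endpoints.

def length(interval):
    """Length of a bounded interval given by its endpoints (a, b)."""
    a, b = interval
    return abs(b - a)

def check_additivity(I, pieces):
    """Return (length of I, sum of piece lengths, telescoping sum)."""
    # all distinct endpoints a_0 < a_1 < ... < a_N
    endpoints = sorted({e for piece in pieces for e in piece} | set(I))
    # a_N - a_0 equals the sum of the consecutive gaps a_{i+1} - a_i
    telescoping = sum(b - a for a, b in zip(endpoints, endpoints[1:]))
    return length(I), sum(length(p) for p in pieces), telescoping

# I = [0, 10) split into [0, 2), [2, 5), [5, 10): all three numbers agree
print(check_additivity((0, 10), [(0, 2), (2, 5), (5, 10)]))  # → (10, 10, 10)
```

The same comparison works for any finite disjoint partition of a bounded interval, since the sorted endpoints always produce the telescoping sum from the proof.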

1.2 σ-additive measures

As above, let $M$ be an arbitrary non-empty set and let $S$ be a family of subsets of $M$.

Definition. A functional $\mu : S \to \mathbb{R}_+$ is called a σ-additive measure if whenever a set $A \in S$ is a disjoint union of an at most countable sequence $\{A_k\}_{k=1}^N$ of sets from $S$ (where $N$ is either finite or $N = \infty$), then

$$\mu(A) = \sum_{k=1}^N \mu(A_k).$$
If $N = \infty$ then the above sum is understood as a series. If this property holds only for finite values of $N$, then $\mu$ is a finitely additive measure.

Clearly, a σ-additive measure is also finitely additive (but the converse is not true). At first, the difference between finitely additive and σ-additive measures might look insignificant, but the σ-additivity provides many more possibilities for applications and is one of the central issues of measure theory. On the other hand, it is normally more difficult to prove σ-additivity.

Theorem 1.1 The length is a σ-additive measure on the family of all bounded intervals in R.

Before we prove this theorem, let us consider a simpler property.

Definition. A functional $\mu : S \to \mathbb{R}_+$ is called σ-subadditive if whenever $A \subset \bigcup_{k=1}^N A_k$, where $A$ and all $A_k$ are elements of $S$ and $N$ is either finite or infinite, then

$$\mu(A) \le \sum_{k=1}^N \mu(A_k).$$
If this property holds only for finite values of $N$, then $\mu$ is called finitely subadditive.

Lemma 1.2 The length is σ-subadditive.

Proof. Let $I$, $\{I_k\}_{k=1}^\infty$ be intervals such that $I \subset \bigcup_{k=1}^\infty I_k$, and let us prove that

$$\ell(I) \le \sum_{k=1}^\infty \ell(I_k)$$

(the case $N < \infty$ follows from the case $N = \infty$ by adding the empty interval). Let us fix some $\varepsilon > 0$ and choose a bounded closed interval $I' \subset I$ such that
$$\ell(I) \le \ell(I') + \varepsilon.$$
For any $k$, choose an open interval $I_k' \supset I_k$ such that
$$\ell(I_k') \le \ell(I_k) + \frac{\varepsilon}{2^k}.$$

Then the bounded closed interval $I'$ is covered by the sequence $\{I_k'\}$ of open intervals. By the Borel-Lebesgue lemma, there is a finite subfamily $\{I_{k_j}'\}_{j=1}^n$ that also covers $I'$. It follows from the finite additivity of length that it is finitely subadditive, that is,

$$\ell(I') \le \sum_j \ell(I_{k_j}')$$
(see Exercise 7), which implies that

$$\ell(I') \le \sum_{k=1}^\infty \ell(I_k').$$
This yields
$$\ell(I) \le \varepsilon + \sum_{k=1}^\infty \left( \ell(I_k) + \frac{\varepsilon}{2^k} \right) = 2\varepsilon + \sum_{k=1}^\infty \ell(I_k).$$
Since $\varepsilon > 0$ is arbitrary, letting $\varepsilon \to 0$ we finish the proof.

Proof of Theorem 1.1. We need to prove that if $I = \bigsqcup_{k=1}^\infty I_k$, then

$$\ell(I) = \sum_{k=1}^\infty \ell(I_k).$$
By Lemma 1.2, we have immediately

$$\ell(I) \le \sum_{k=1}^\infty \ell(I_k),$$
so we are left to prove the opposite inequality. For any fixed $n \in \mathbb{N}$, we have
$$I \supset \bigsqcup_{k=1}^n I_k.$$
It follows from the finite additivity of length that

$$\ell(I) \ge \sum_{k=1}^n \ell(I_k)$$
(see Exercise 7). Letting $n \to \infty$, we obtain
$$\ell(I) \ge \sum_{k=1}^\infty \ell(I_k),$$
which finishes the proof.

1.3 An example of using probability theory

Probability theory deals with random events and their probabilities. A classical example of a random event is a coin tossing. The outcome of each tossing may be heads or tails: H or T. If the coin is fair then after $N$ trials, H occurs approximately $N/2$ times, and so does T. It is natural to believe that if $N \to \infty$ then $\frac{\#H}{N} \to \frac{1}{2}$, so that one says that H occurs with probability $1/2$ and writes $P(H) = 1/2$. In the same way, $P(T) = 1/2$. If a coin is biased (= not fair), then it could be the case that $P(H)$ is different from $1/2$, say,

$$P(H) = p \quad \text{and} \quad P(T) = q := 1 - p. \tag{1.1}$$
We present here an amusing argument showing how random events H and T satisfying (1.1) can be used to prove the following purely algebraic inequality:

$$(1 - p^n)^m + (1 - q^m)^n \ge 1, \tag{1.2}$$
where $0 < p < 1$, $q = 1 - p$, and $n, m$ are positive integers. Consider a sequence of $nm$ independent tossings of such a coin and write the outcomes in a table of $n$ rows and $m$ columns, for example:
$$n \left\{ \underbrace{\begin{matrix}
H & T & T & H & T \\
T & T & H & H & H \\
H & H & T & H & T \\
T & H & T & T & T
\end{matrix}}_{m} \right.$$
Then, using the independence of the events, we obtain:

$$p^n = P\{\text{a given column contains only } H\},$$
$$1 - p^n = P\{\text{a given column contains at least one } T\},$$
whence
$$(1 - p^n)^m = P\{\text{every column contains at least one } T\}. \tag{1.3}$$
Similarly,
$$(1 - q^m)^n = P\{\text{every row contains at least one } H\}. \tag{1.4}$$
Let us show that one of the events (1.3) and (1.4) always takes place, which implies that the sum of their probabilities is at least 1, and hence proves (1.2). Indeed, assume that the event (1.3) does not take place, that is, some column contains only H, say, as below:
$$\begin{matrix} H \\ H \\ H \\ H \end{matrix}$$
Then one easily sees that H occurs in every row, so that the event (1.4) takes place, which was to be proved.
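Both the inequality (1.2) and the dichotomy used in its proof can be checked by a short computation. The sketch below is an added illustration, not part of the notes; the tolerance `1e-12` only absorbs floating-point rounding at the equality case $n = m = 1$.

```python
# Illustration (not from the notes): check (1.2) on a grid of parameters and
# verify the dichotomy behind it by brute force over all 3x3 H/T tables.

from itertools import product

def check_inequality(p, n, m):
    q = 1 - p
    # small tolerance for floating-point rounding at the equality case n = m = 1
    return (1 - p**n)**m + (1 - q**m)**n >= 1 - 1e-12

assert all(check_inequality(p / 10, n, m)
           for p in range(1, 10) for n in range(1, 5) for m in range(1, 5))

# Dichotomy: either every column contains a T, or some column is all H,
# in which case every row contains an H.
n = m = 3
for cells in product("HT", repeat=n * m):
    table = [cells[i * m:(i + 1) * m] for i in range(n)]
    every_column_has_T = all(any(table[i][j] == "T" for i in range(n))
                             for j in range(m))
    every_row_has_H = all("H" in row for row in table)
    assert every_column_has_T or every_row_has_H

print("checks passed")
```

The exhaustive loop over the $2^{9}$ tables mirrors exactly the case analysis in the text: if not every column contains a T, some column is all H, and that column supplies an H to every row.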

Is this proof rigorous? It may leave the impression of a rather empirical argument than of a mathematical proof. The point is that we have used in this argument the existence of events with certain properties: firstly, H should have probability $p$, where $p$ is a given number in $(0, 1)$, and secondly, there must be enough independent events like that. Certainly, mathematics cannot rely on the existence of biased coins (or even fair coins!), so in order to make the above argument rigorous, one should have a mathematical notion of events and their probabilities, and prove the existence of events with the required properties. This can and will be done using measure theory.

1.4 Extension of measure from a semi-ring to a ring

Let $M$ be any non-empty set.

Definition. A family $S$ of subsets of $M$ is called a semi-ring if

• $S$ contains $\emptyset$;
• $A, B \in S \;\Rightarrow\; A \cap B \in S$;
• $A, B \in S \;\Rightarrow\; A \setminus B$ is a disjoint union of a finite family of sets from $S$.

Example. The family of all intervals in $\mathbb{R}$ is a semi-ring. Indeed, the intersection of two intervals is an interval, and the difference of two intervals is either an interval or the disjoint union of two intervals. In the same way, the family of all intervals of the form $[a, b)$ is a semi-ring. The family of all rectangles in $\mathbb{R}^2$ is also a semi-ring (see Exercise 6).

Definition. A family $S$ of subsets of $M$ is called a ring if

• $S$ contains $\emptyset$;
• $A, B \in S \;\Rightarrow\; A \cup B \in S$ and $A \setminus B \in S$.

It follows that the intersection $A \cap B$ also belongs to $S$, because
$$A \cap B = B \setminus (B \setminus A)$$
is also in $S$. Hence, a ring is closed under the set-theoretic operations $\cap$, $\cup$, $\setminus$. It also follows that a ring is a semi-ring.

Definition. A ring $S$ is called a σ-ring if the union of any countable family $\{A_k\}_{k=1}^\infty$ of sets from $S$ is also in $S$.

It follows that the intersection $A = \bigcap_k A_k$ is also in $S$. Indeed, let $B$ be any of the sets $A_k$, so that $B \supset A$. Then

$$A = B \setminus (B \setminus A) = B \setminus \Big( \bigcup_k (B \setminus A_k) \Big) \in S.$$
Trivial examples of rings are $S = \{\emptyset\}$, $S = \{\emptyset, M\}$, or $S = 2^M$, the family of all subsets of $M$. On the other hand, the set of all intervals in $\mathbb{R}$ is not a ring.
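The two set identities above can be sanity-checked on finite sets. This is an added toy illustration, not part of the notes:

```python
# Illustration (not from the notes): A ∩ B = B \ (B \ A), and the intersection
# of a finite family expressed through unions and differences as above.

from functools import reduce

A = {1, 2, 3, 4}
B = {3, 4, 5}
assert A & B == B - (B - A)

# intersection of {A_k} as B \ (∪_k (B \ A_k)), with B one of the A_k
family = [{1, 2, 3, 4}, {2, 3, 4, 5}, {0, 2, 4, 6}]
base = family[0]
inter = base - set().union(*(base - Ak for Ak in family))
assert inter == reduce(set.__and__, family)
print(inter)  # → {2, 4}
```

The second identity is the one used for σ-rings: only unions and differences appear on the right-hand side, yet the result is the intersection of the whole family.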

Observe that if $\{S_\alpha\}$ is a family of rings (σ-rings), then the intersection $\bigcap_\alpha S_\alpha$ is also a ring (resp., σ-ring), which trivially follows from the definition.

Definition. Given a family $S$ of subsets of $M$, denote by $R(S)$ the intersection of all rings containing $S$.

Note that at least one ring containing $S$ always exists: $2^M$. The ring $R(S)$ is hence the minimal ring containing $S$.

Theorem 1.3 Let $S$ be a semi-ring.
(a) The minimal ring $R(S)$ consists of all finite disjoint unions of sets from $S$.
(b) If $\mu$ is a finitely additive measure on $S$, then $\mu$ extends uniquely to a finitely additive measure on $R(S)$.
(c) If $\mu$ is a σ-additive measure on $S$, then the extension of $\mu$ to $R(S)$ is also σ-additive.

For example, if $S$ is the semi-ring of all intervals on $\mathbb{R}$, then the minimal ring $R(S)$ consists of finite disjoint unions of intervals. The notion of length then extends to all such sets and becomes a measure there. Clearly, if a set $A \in R(S)$ is a disjoint union of the intervals $\{I_k\}_{k=1}^n$, then $\ell(A) = \sum_{k=1}^n \ell(I_k)$. This formula follows from the fact that the extension of $\ell$ to $R(S)$ is a measure. On the other hand, this formula can be used to explicitly define the extension of $\ell$.

Proof. (a) Let $S'$ be the family of sets that consists of all finite disjoint unions of sets from $S$, that is,
$$S' = \left\{ \bigsqcup_{k=1}^n A_k : A_k \in S,\ n \in \mathbb{N} \right\}.$$
We need to show that $S' = R(S)$. It suffices to prove that $S'$ is a ring. Indeed, if we know that already, then $S' \supset R(S)$, because the ring $S'$ contains $S$ and $R(S)$ is the minimal ring containing $S$. On the other hand, $S' \subset R(S)$, because $R(S)$, being a ring, contains all finite unions of elements of $S$ and, in particular, all elements of $S'$.

The proof of the fact that $S'$ is a ring will be split into steps. If $A$ and $B$ are elements of $S'$, then we can write $A = \bigsqcup_{k=1}^n A_k$ and $B = \bigsqcup_{l=1}^m B_l$, where $A_k, B_l \in S$.

Step 1. If $A, B \in S'$ and $A, B$ are disjoint, then $A \sqcup B \in S'$. Indeed, $A \sqcup B$ is the disjoint union of all the sets $A_k$ and $B_l$, so that $A \sqcup B \in S'$.

Step 2. If $A, B \in S'$ then $A \cap B \in S'$. Indeed, we have

$$A \cap B = \bigsqcup_{k,l} (A_k \cap B_l).$$
Since $A_k \cap B_l \in S$ by the definition of a semi-ring, we conclude that $A \cap B \in S'$.

Step 3. If $A, B \in S'$ then $A \setminus B \in S'$. Indeed, since

$$A \setminus B = \bigsqcup_k (A_k \setminus B),$$
it suffices to show that $A_k \setminus B \in S'$ (and then use Step 1). Next, we have

$$A_k \setminus B = A_k \setminus \bigsqcup_l B_l = \bigcap_l (A_k \setminus B_l).$$
By Step 2, it suffices to prove that $A_k \setminus B_l \in S'$. Indeed, since $A_k, B_l \in S$, by the definition of a semi-ring we conclude that $A_k \setminus B_l$ is a finite disjoint union of elements of $S$, whence $A_k \setminus B_l \in S'$.

Step 4. If $A, B \in S'$ then $A \cup B \in S'$. Indeed, we have
$$A \cup B = (A \setminus B) \sqcup (B \setminus A) \sqcup (A \cap B).$$

By Steps 2 and 3, the sets $A \setminus B$, $B \setminus A$, $A \cap B$ are in $S'$, and by Step 1 we conclude that their disjoint union is also in $S'$.

By Steps 3 and 4, we conclude that $S'$ is a ring, which finishes the proof of (a).

(b) Now let $\mu$ be a finitely additive measure on $S$. The extension to $S' = R(S)$ is unique: if $A \in R(S)$ and $A = \bigsqcup_{k=1}^n A_k$ where $A_k \in S$, then necessarily
$$\mu(A) = \sum_{k=1}^n \mu(A_k). \tag{1.5}$$
Now let us prove the existence of $\mu$ on $S'$. For any set $A \in S'$ as above, let us define $\mu$ by (1.5) and prove that $\mu$ is indeed finitely additive. First of all, let us show that $\mu(A)$ does not depend on the choice of the splitting $A = \bigsqcup_{k=1}^n A_k$. Let us have another splitting $A = \bigsqcup_l B_l$, where $B_l \in S$. Then
$$A_k = \bigsqcup_l (A_k \cap B_l),$$
and since $A_k \cap B_l \in S$ and $\mu$ is finitely additive on $S$, we obtain

$$\mu(A_k) = \sum_l \mu(A_k \cap B_l).$$
Summing up in $k$, we obtain

$$\sum_k \mu(A_k) = \sum_k \sum_l \mu(A_k \cap B_l).$$
Similarly, we have

$$\sum_l \mu(B_l) = \sum_l \sum_k \mu(A_k \cap B_l),$$
whence $\sum_k \mu(A_k) = \sum_l \mu(B_l)$.

Finally, let us prove the finite additivity of the measure (1.5). Let $A, B$ be two disjoint sets from $R(S)$ and assume that $A = \bigsqcup_k A_k$ and $B = \bigsqcup_l B_l$, where $A_k, B_l \in S$. Then $A \sqcup B$ is the disjoint union of all the sets $A_k$ and $B_l$, whence by the previous argument

$$\mu(A \sqcup B) = \sum_k \mu(A_k) + \sum_l \mu(B_l) = \mu(A) + \mu(B).$$
If there is a finite family of disjoint sets $C_1, \dots, C_n \in R(S)$, then, using the fact that a union of sets from $R(S)$ is also in $R(S)$, we obtain by induction in $n$ that

$$\mu\Big( \bigsqcup_k C_k \Big) = \sum_k \mu(C_k).$$
(c) Let $A = \bigsqcup_{l=1}^\infty B_l$, where $A, B_l \in R(S)$. We need to prove that
$$\mu(A) = \sum_{l=1}^\infty \mu(B_l). \tag{1.6}$$
Represent the given sets in the form $A = \bigsqcup_k A_k$ and $B_l = \bigsqcup_m B_{lm}$, where the unions in $k$ and $m$ are finite and the sets $A_k$ and $B_{lm}$ belong to $S$. Set also

$$C_{klm} = A_k \cap B_{lm}$$
and observe that $C_{klm} \in S$. Also, we have

$$A_k = A_k \cap A = A_k \cap \bigsqcup_{l,m} B_{lm} = \bigsqcup_{l,m} (A_k \cap B_{lm}) = \bigsqcup_{l,m} C_{klm}$$
and
$$B_{lm} = B_{lm} \cap A = B_{lm} \cap \bigsqcup_k A_k = \bigsqcup_k (A_k \cap B_{lm}) = \bigsqcup_k C_{klm}.$$
By the σ-additivity of $\mu$ on $S$, we obtain

$$\mu(A_k) = \sum_{l,m} \mu(C_{klm})$$
and
$$\mu(B_{lm}) = \sum_k \mu(C_{klm}).$$
It follows that
$$\sum_k \mu(A_k) = \sum_{k,l,m} \mu(C_{klm}) = \sum_{l,m} \mu(B_{lm}).$$
On the other hand, we have by the definition of $\mu$ on $R(S)$ that

$$\mu(A) = \sum_k \mu(A_k)$$
and
$$\mu(B_l) = \sum_m \mu(B_{lm}),$$
whence
$$\sum_l \mu(B_l) = \sum_{l,m} \mu(B_{lm}).$$
Combining the above lines, we obtain

$$\mu(A) = \sum_k \mu(A_k) = \sum_{l,m} \mu(B_{lm}) = \sum_l \mu(B_l),$$
which finishes the proof.
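For the semi-ring of half-open intervals, the extension formula (1.5) is easy to experiment with. The following added sketch (not part of the notes) checks on an example that the value does not depend on the chosen splitting:

```python
# Illustration (not from the notes): μ(A) defined by (1.5) on finite disjoint
# unions of half-open intervals [a, b) is independent of the splitting.

def measure(splitting):
    """Total length of a finite disjoint union of intervals [a, b)."""
    return sum(b - a for a, b in splitting)

# two different splittings of the same set A = [0, 1) ∪ [2, 4)
split1 = [(0, 1), (2, 4)]
split2 = [(0, 0.5), (0.5, 1), (2, 3), (3, 4)]
assert measure(split1) == measure(split2) == 3
print(measure(split1))  # → 3
```

This is exactly the well-definedness step in part (b) of the proof: refining both splittings by their common intersections leaves the total sum unchanged.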

1.5 Extension of measure to a σ-algebra

1.5.1 σ-rings and σ-algebras

Recall that a σ-ring is a ring that is closed under taking countable unions (and intersections). For a σ-additive measure, a σ-ring would be a natural domain. Our main aim in this section is to extend a σ-additive measure $\mu$ from a ring $R$ to a σ-ring.

So far we have only trivial examples of σ-rings: $S = \{\emptyset\}$ and $S = 2^M$. Implicit examples of σ-rings can be obtained using the following approach.

Definition. For any family $S$ of subsets of $M$, denote by $\Sigma(S)$ the intersection of all σ-rings containing $S$.

At least one σ-ring containing $S$ always exists: $2^M$. Clearly, the intersection of any family of σ-rings is again a σ-ring. Hence, $\Sigma(S)$ is the minimal σ-ring containing $S$.

Most of the examples of rings that we have seen so far are not σ-rings. For example, the ring $R$ that consists of all finite disjoint unions of intervals is not a σ-ring, because it does not contain the countable union of intervals

$$\bigcup_{k=1}^\infty \left( k,\ k + 1/2 \right).$$
In general, it would be good to have an explicit description of $\Sigma(R)$, similar to the description of $R(S)$ in Theorem 1.3. One might conjecture that $\Sigma(R)$ consists of disjoint countable unions of sets from $R$. However, this is not true. Let $R$ be again the ring of finite disjoint unions of intervals. Consider the following construction of the Cantor set $C$: it is obtained from $[0,1]$ by removing the interval $\left(\frac{1}{3}, \frac{2}{3}\right)$, then removing the middle thirds of the remaining intervals, etc.:
$$C = [0,1] \setminus \left(\tfrac{1}{3}, \tfrac{2}{3}\right) \setminus \left(\tfrac{1}{9}, \tfrac{2}{9}\right) \setminus \left(\tfrac{7}{9}, \tfrac{8}{9}\right) \setminus \left(\tfrac{1}{27}, \tfrac{2}{27}\right) \setminus \left(\tfrac{7}{27}, \tfrac{8}{27}\right) \setminus \left(\tfrac{19}{27}, \tfrac{20}{27}\right) \setminus \left(\tfrac{25}{27}, \tfrac{26}{27}\right) \setminus \dots$$
Since $C$ is obtained from intervals by using countable unions of intervals and set-theoretic differences, we have $C \in \Sigma(R)$. However, the Cantor set is uncountable and contains no intervals except for single points (see Exercise 13). Hence, $C$ cannot be represented as a countable union of intervals.

The constructive procedure of obtaining the σ-ring $\Sigma(R)$ from a ring $R$ is as follows. Denote by $R_\sigma$ the family of all countable unions of elements from $R$. Clearly, $R \subset R_\sigma \subset \Sigma(R)$. Then denote by $R_{\sigma\delta}$ the family of all countable intersections of sets from $R_\sigma$, so that

$$R \subset R_\sigma \subset R_{\sigma\delta} \subset \Sigma(R).$$

Define similarly $R_{\sigma\delta\sigma}$, $R_{\sigma\delta\sigma\delta}$, etc. We obtain an increasing sequence of families of sets, all being subfamilies of $\Sigma(R)$. One might hope that their union is $\Sigma(R)$, but in general this is not true either. In order to exhaust $\Sigma(R)$ by this procedure, one has to apply it uncountably many times, using transfinite induction. This is, however, beyond the scope of this course.

As a consequence of this discussion, we see that the simple procedure used in Theorem 1.3 for the extension of a measure from a semi-ring to a ring is not going to work in the present setting, and, hence, the procedure of extension is more complicated.

At the first step of extending a measure to a σ-ring, we assume that the full set $M$ is also an element of $R$ and, hence, $\mu(M) < \infty$. Consider the following example: let $M$ be an interval in $\mathbb{R}$, let $R$ consist of all bounded subintervals of $M$, and let $\mu$ be the length of an interval. If $M$ is bounded, then $M \in R$. However, if $M$ is unbounded (for example, $M = \mathbb{R}$), then $M$ does not belong to $R$, while $M$ clearly belongs to $\Sigma(R)$, since $M$ can be represented as a countable union of bounded intervals. It is also clear that for any reasonable extension of length, $\mu(M)$ in this case must be $\infty$.

The assumption $M \in R$ and, hence, $\mu(M) < \infty$ simplifies the situation; the opposite case will be treated later on.

Definition. A ring containing $M$ is called an algebra. A σ-ring in $M$ that contains $M$ is called a σ-algebra.
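As a side remark on the Cantor set construction above: at stage $k$ one removes $2^{k-1}$ open intervals of length $3^{-k}$, and the total removed length is $\sum_{k=1}^\infty 2^{k-1} 3^{-k} = 1$. The following added computation (an illustration, not part of the notes) confirms this numerically:

```python
# Illustration (not from the notes): the intervals removed in the Cantor
# construction have total length 1, so any reasonable extension of length
# should assign measure 0 to the Cantor set C itself.

removed = sum(2**(k - 1) / 3**k for k in range(1, 60))
print(removed)  # ≈ 1.0
assert abs(removed - 1) < 1e-9
```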

1.5.2 Outer measure

Henceforth, assume that $R$ is an algebra. It follows from the definition of an algebra that if $A \in R$ then $A^c := M \setminus A \in R$. Also, let $\mu$ be a σ-additive measure on $R$. For any set $A \subset M$, define its outer measure $\mu^*(A)$ by

$$\mu^*(A) = \inf \left\{ \sum_{k=1}^\infty \mu(A_k) : A_k \in R \text{ and } A \subset \bigcup_{k=1}^\infty A_k \right\}. \tag{1.7}$$
In other words, we consider all coverings $\{A_k\}_{k=1}^\infty$ of $A$ by sequences of sets from the algebra $R$, and define $\mu^*(A)$ as the infimum of the sum of all $\mu(A_k)$, taken over all such coverings.

It follows from (1.7) that $\mu^*$ is monotone in the following sense: if $A \subset B$ then $\mu^*(A) \le \mu^*(B)$. Indeed, any sequence $\{A_k\} \subset R$ that covers $B$ will also cover $A$, which implies that the infimum in (1.7) in the case of the set $A$ is taken over a larger family than in the case of $B$, whence $\mu^*(A) \le \mu^*(B)$ follows.

Let us emphasize that $\mu^*(A)$ is defined on all subsets $A$ of $M$, but $\mu^*$ is not necessarily a measure on $2^M$. Eventually, we will construct a σ-algebra containing $R$ where $\mu^*$ will be a σ-additive measure. This will be preceded by a sequence of lemmas.

Lemma 1.4 For any set $A \subset M$, $\mu^*(A) < \infty$. If in addition $A \in R$, then $\mu^*(A) = \mu(A)$.

Proof. Note that $\emptyset \in R$ and $\mu(\emptyset) = 0$, because
$$\mu(\emptyset) = \mu(\emptyset \sqcup \emptyset) = \mu(\emptyset) + \mu(\emptyset).$$
For any set $A \subset M$, consider the covering $\{A_k\} = \{M, \emptyset, \emptyset, \dots\}$ of $A$. Since $M, \emptyset \in R$, it follows from (1.7) that

$$\mu^*(A) \le \mu(M) + \mu(\emptyset) + \mu(\emptyset) + \dots = \mu(M) < \infty.$$
Assume now $A \in R$. Considering the covering $\{A_k\} = \{A, \emptyset, \emptyset, \dots\}$ and using that $A, \emptyset \in R$, we obtain in the same way that

$$\mu^*(A) \le \mu(A) + \mu(\emptyset) + \mu(\emptyset) + \dots = \mu(A). \tag{1.8}$$
On the other hand, for any sequence $\{A_k\}$ as in (1.7), we have by the σ-subadditivity of $\mu$ (see Exercise 6) that
$$\mu(A) \le \sum_{k=1}^\infty \mu(A_k).$$
Taking inf over all such $\{A_k\}$, we obtain
$$\mu(A) \le \mu^*(A),$$
which together with (1.8) yields $\mu^*(A) = \mu(A)$.

Lemma 1.5 The outer measure $\mu^*$ is σ-subadditive on $2^M$.

Proof. We need to prove that if

$$A \subset \bigcup_{k=1}^\infty A_k, \tag{1.9}$$
where $A$ and $A_k$ are subsets of $M$, then

$$\mu^*(A) \le \sum_{k=1}^\infty \mu^*(A_k). \tag{1.10}$$
By the definition of $\mu^*$, for any set $A_k$ and for any $\varepsilon > 0$ there exists a sequence $\{A_{kn}\}_{n=1}^\infty$ of sets from $R$ such that
$$A_k \subset \bigcup_{n=1}^\infty A_{kn} \tag{1.11}$$
and
$$\mu^*(A_k) \ge \sum_{n=1}^\infty \mu(A_{kn}) - \frac{\varepsilon}{2^k}.$$
Adding up these inequalities for all $k$, we obtain

$$\sum_{k=1}^\infty \mu^*(A_k) \ge \sum_{k,n=1}^\infty \mu(A_{kn}) - \varepsilon. \tag{1.12}$$
On the other hand, by (1.9) and (1.11), we obtain that

$$A \subset \bigcup_{k,n=1}^\infty A_{kn}.$$
Since $A_{kn} \in R$, we obtain by (1.7)
$$\mu^*(A) \le \sum_{k,n=1}^\infty \mu(A_{kn}).$$
Comparing with (1.12), we obtain

$$\mu^*(A) \le \sum_{k=1}^\infty \mu^*(A_k) + \varepsilon.$$
Since this inequality is true for any $\varepsilon > 0$, it is also true for $\varepsilon = 0$, which finishes the proof.
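Definition (1.7) is easy to compute in a finite toy model. In the added sketch below (an illustration, not part of the notes), $M = \{0, \dots, 5\}$, $R$ is the algebra generated by the partition $\{0,1\}, \{2,3\}, \{4,5\}$, and $\mu$ is the counting measure. Since this $R$ is finite and closed under unions, any cover may be replaced by the single set equal to its union, so the infimum in (1.7) becomes a minimum over supersets in $R$:

```python
# Illustration (not from the notes): the outer measure (1.7) in a finite model.
# R = algebra generated by the partition {0,1}, {2,3}, {4,5} of M = {0,...,5},
# μ = counting measure. μ*(A) = measure of the union of the blocks meeting A.

from itertools import combinations

blocks = [{0, 1}, {2, 3}, {4, 5}]

# the algebra R consists of all unions of blocks (including the empty union)
R = [set().union(*chosen)
     for r in range(len(blocks) + 1)
     for chosen in combinations(blocks, r)]

mu = len  # counting measure

def outer(A):
    # R is finite and closed under unions, so the infimum over covers
    # reduces to a minimum over single supersets B in R
    return min(mu(B) for B in R if A <= B)

assert outer(set()) == 0
assert outer({0}) == 2       # best cover: the whole block {0, 1}
assert outer({0, 2}) == 4    # blocks {0, 1} and {2, 3}
print([outer({x}) for x in range(6)])  # → [2, 2, 2, 2, 2, 2]
```

Note that here $\mu^*(\{0\}) + \mu^*(\{1\}) = 4 > 2 = \mu^*(\{0, 1\})$, illustrating that $\mu^*$ on all of $2^M$ is subadditive but need not be additive.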

1.5.3 Symmetric difference

Definition. The symmetric difference of two sets $A, B \subset M$ is the set
$$A \,\triangle\, B := (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B).$$

Clearly, $A \triangle B = B \triangle A$. Also, $x \in A \triangle B$ if and only if $x$ belongs to exactly one of the sets $A, B$; that is, either $x \in A$ and $x \notin B$, or $x \notin A$ and $x \in B$.

Lemma 1.6 (a) For arbitrary sets $A_1, A_2, B_1, B_2 \subset M$,

$$(A_1 \circ A_2) \,\triangle\, (B_1 \circ B_2) \subset (A_1 \triangle B_1) \cup (A_2 \triangle B_2), \tag{1.13}$$
where $\circ$ denotes any of the operations $\cup$, $\cap$, $\setminus$.

(b) If $\mu^*$ is an outer measure on $M$, then

$$|\mu^*(A) - \mu^*(B)| \le \mu^*(A \triangle B) \tag{1.14}$$
for arbitrary sets $A, B \subset M$.

Proof. (a) Set
$$C = (A_1 \triangle B_1) \cup (A_2 \triangle B_2)$$
and verify (1.13) in the case when $\circ$ is $\cup$, that is,

$$(A_1 \cup A_2) \,\triangle\, (B_1 \cup B_2) \subset C. \tag{1.15}$$

By definition, $x \in (A_1 \cup A_2) \triangle (B_1 \cup B_2)$ if and only if $x \in A_1 \cup A_2$ and $x \notin B_1 \cup B_2$, or conversely $x \in B_1 \cup B_2$ and $x \notin A_1 \cup A_2$. Since these two cases are symmetric, it suffices to consider the first case. Then either $x \in A_1$ or $x \in A_2$, while $x \notin B_1$ and $x \notin B_2$. If $x \in A_1$, then
$$x \in A_1 \setminus B_1 \subset A_1 \triangle B_1 \subset C,$$
that is, $x \in C$. In the same way one treats the case $x \in A_2$, which proves (1.15).

For the case when $\circ$ is $\cap$, we need to prove

$$(A_1 \cap A_2) \,\triangle\, (B_1 \cap B_2) \subset C. \tag{1.16}$$

Observe that $x \in (A_1 \cap A_2) \triangle (B_1 \cap B_2)$ means that $x \in A_1 \cap A_2$ and $x \notin B_1 \cap B_2$, or conversely. Again, it suffices to consider the first case. Then $x \in A_1$ and $x \in A_2$, while either $x \notin B_1$ or $x \notin B_2$. If $x \notin B_1$ then $x \in A_1 \setminus B_1 \subset C$, and if $x \notin B_2$ then $x \in A_2 \setminus B_2 \subset C$.

For the case when $\circ$ is $\setminus$, we need to prove

$$(A_1 \setminus A_2) \,\triangle\, (B_1 \setminus B_2) \subset C. \tag{1.17}$$

Observe that $x \in (A_1 \setminus A_2) \triangle (B_1 \setminus B_2)$ means that $x \in A_1 \setminus A_2$ and $x \notin B_1 \setminus B_2$, or conversely. Consider the first case, when $x \in A_1$, $x \notin A_2$, and either $x \notin B_1$ or $x \in B_2$. If $x \notin B_1$ then, combining with $x \in A_1$, we obtain $x \in A_1 \setminus B_1 \subset C$. If $x \in B_2$ then, combining with $x \notin A_2$, we obtain
$$x \in B_2 \setminus A_2 \subset A_2 \triangle B_2 \subset C,$$
which finishes the proof of (a).

(b) Note that
$$A \subset B \cup (A \setminus B) \subset B \cup (A \triangle B),$$
whence by the subadditivity of $\mu^*$ (Lemma 1.5)

$$\mu^*(A) \le \mu^*(B) + \mu^*(A \triangle B),$$
whence
$$\mu^*(A) - \mu^*(B) \le \mu^*(A \triangle B).$$
Switching $A$ and $B$, we obtain the similar estimate

$$\mu^*(B) - \mu^*(A) \le \mu^*(A \triangle B),$$
whence (1.14) follows.

Remark. The only property of $\mu^*$ used here was the finite subadditivity. So, inequality (1.14) holds for any finitely subadditive functional.
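Lemma 1.6 is convenient to test by brute force on random finite sets. In the added sketch below (an illustration, not part of the notes), part (b) is checked for the counting measure, which is finitely subadditive, as the Remark allows:

```python
# Illustration (not from the notes): randomized check of (1.13) for all three
# operations and of (1.14) with the counting measure |.| in place of μ*.

import random

random.seed(0)
universe = range(10)

def random_subset():
    return {x for x in universe if random.random() < 0.5}

for _ in range(1000):
    A1, A2, B1, B2 = (random_subset() for _ in range(4))
    C = (A1 ^ B1) | (A2 ^ B2)              # ^ is the symmetric difference
    for op in (set.__or__, set.__and__, set.__sub__):
        assert op(A1, A2) ^ op(B1, B2) <= C        # inclusion (1.13)
    assert abs(len(A1) - len(B1)) <= len(A1 ^ B1)  # inequality (1.14)

print("Lemma 1.6 checks passed")
```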

1.5.4 Measurable sets

We continue considering the case when $R$ is an algebra on $M$ and $\mu$ is a σ-additive measure on $R$. Recall that the outer measure $\mu^*$ is defined by (1.7).

Definition. A set $A \subset M$ is called measurable (with respect to the algebra $R$ and the measure $\mu$) if, for any $\varepsilon > 0$, there exists $B \in R$ such that
$$\mu^*(A \,\triangle\, B) < \varepsilon. \tag{1.18}$$

In other words, a set $A$ is measurable if it can be approximated by sets from $R$ arbitrarily closely, in the sense of (1.18). Now we can state one of the main theorems of this course.

Theorem 1.7 (Carathéodory's extension theorem) Let $R$ be an algebra on a set $M$ and let $\mu$ be a σ-additive measure on $R$. Denote by $\mathcal{M}$ the family of all measurable subsets of $M$. Then the following is true.
(a) $\mathcal{M}$ is a σ-algebra containing $R$.
(b) The restriction of $\mu^*$ to $\mathcal{M}$ is a σ-additive measure (that extends the measure $\mu$ from $R$ to $\mathcal{M}$).
(c) If $\widetilde{\mu}$ is a σ-additive measure defined on a σ-algebra $\Sigma$ such that

$$R \subset \Sigma \subset \mathcal{M},$$
then $\widetilde{\mu} = \mu^*$ on $\Sigma$.

Hence, parts (a) and (b) of this theorem ensure that a σ-additive measure $\mu$ can be extended from the algebra $R$ to the σ-algebra $\mathcal{M}$ of all measurable sets. Moreover, applying (c) with $\Sigma = \mathcal{M}$, we see that this extension is unique.

Since the minimal σ-algebra $\Sigma(R)$ is contained in $\mathcal{M}$, it follows that the measure $\mu$ can be extended from $R$ to $\Sigma(R)$. Applying (c) with $\Sigma = \Sigma(R)$, we obtain that this extension is also unique.

Proof. We split the proof into a series of claims.

Claim 1. The family $\mathcal{M}$ of all measurable sets is an algebra containing $R$.

If $A \in R$ then $A$ is measurable, because

$$\mu^*(A \,\triangle\, A) = \mu^*(\emptyset) = \mu(\emptyset) = 0,$$
where $\mu^*(\emptyset) = \mu(\emptyset)$ by Lemma 1.4. Hence, $R \subset \mathcal{M}$. In particular, the entire set $M$ is also measurable.

In order to verify that $\mathcal{M}$ is an algebra, it suffices to show that if $A_1, A_2 \in \mathcal{M}$, then also $A_1 \cup A_2$ and $A_1 \setminus A_2$ are measurable. Let us prove this for $A = A_1 \cup A_2$. By definition, for any $\varepsilon > 0$ there are sets $B_1, B_2 \in R$ such that
$$\mu^*(A_1 \,\triangle\, B_1) < \varepsilon \quad \text{and} \quad \mu^*(A_2 \,\triangle\, B_2) < \varepsilon. \tag{1.19}$$

Setting $B = B_1 \cup B_2 \in R$, we obtain by Lemma 1.6

$$A \,\triangle\, B \subset (A_1 \,\triangle\, B_1) \cup (A_2 \,\triangle\, B_2),$$
and by the subadditivity of $\mu^*$ (Lemma 1.5)

$$\mu^*(A \,\triangle\, B) \le \mu^*(A_1 \,\triangle\, B_1) + \mu^*(A_2 \,\triangle\, B_2) < 2\varepsilon. \tag{1.20}$$
Since $\varepsilon > 0$ is arbitrary and $B \in R$, we obtain that $A$ satisfies the definition of a measurable set.

The fact that $A_1 \setminus A_2 \in \mathcal{M}$ is proved in the same way.

Claim 2. $\mu^*$ is σ-additive on $\mathcal{M}$.

Since $\mathcal{M}$ is an algebra and $\mu^*$ is σ-subadditive by Lemma 1.5, it suffices to prove that $\mu^*$ is finitely additive on $\mathcal{M}$ (see Exercise 9).

Let us prove that, for any two disjoint measurable sets $A_1$ and $A_2$, we have

$$\mu^*(A) = \mu^*(A_1) + \mu^*(A_2), \quad \text{where } A = A_1 \sqcup A_2.$$
By Lemma 1.5, we have the inequality

$$\mu^*(A) \le \mu^*(A_1) + \mu^*(A_2),$$
so we are left to prove the opposite inequality

$$\mu^*(A) \ge \mu^*(A_1) + \mu^*(A_2).$$

As in the previous step, for any $\varepsilon > 0$ there are sets $B_1, B_2 \in R$ such that (1.19) holds. Set $B = B_1 \cup B_2 \in R$ and apply Lemma 1.6, which gives

$$|\mu^*(A) - \mu^*(B)| \le \mu^*(A \,\triangle\, B) < 2\varepsilon,$$
where in the last inequality we have used (1.20). In particular, we have

$$\mu^*(A) \ge \mu^*(B) - 2\varepsilon. \tag{1.21}$$
On the other hand, since $B \in R$, we have by Lemma 1.4 and the additivity of $\mu$ on $R$ that
$$\mu^*(B) = \mu(B) = \mu(B_1 \cup B_2) = \mu(B_1) + \mu(B_2) - \mu(B_1 \cap B_2). \tag{1.22}$$
Next, we will estimate here $\mu(B_i)$ from below via $\mu^*(A_i)$, and show that $\mu(B_1 \cap B_2)$ is small enough. Indeed, using (1.19) and Lemma 1.6, we obtain, for any $i = 1, 2$,

$$|\mu^*(A_i) - \mu^*(B_i)| \le \mu^*(A_i \,\triangle\, B_i) < \varepsilon,$$
whence
$$\mu(B_1) \ge \mu^*(A_1) - \varepsilon \quad \text{and} \quad \mu(B_2) \ge \mu^*(A_2) - \varepsilon. \tag{1.23}$$
On the other hand, by Lemma 1.6 and using $A_1 \cap A_2 = \emptyset$, we obtain

$$B_1 \cap B_2 = (A_1 \cap A_2) \,\triangle\, (B_1 \cap B_2) \subset (A_1 \,\triangle\, B_1) \cup (A_2 \,\triangle\, B_2),$$
whence by (1.20)

$$\mu(B_1 \cap B_2) = \mu^*(B_1 \cap B_2) \le \mu^*(A_1 \,\triangle\, B_1) + \mu^*(A_2 \,\triangle\, B_2) < 2\varepsilon. \tag{1.24}$$
It follows from (1.21)-(1.24) that

$$\mu^*(A) \ge (\mu^*(A_1) - \varepsilon) + (\mu^*(A_2) - \varepsilon) - 2\varepsilon - 2\varepsilon = \mu^*(A_1) + \mu^*(A_2) - 6\varepsilon.$$
Letting $\varepsilon \to 0$, we finish the proof.

Claim 3. $\mathcal{M}$ is a σ-algebra.

Assume that $\{A_n\}_{n=1}^\infty$ is a sequence of measurable sets and let us prove that $A := \bigcup_{n=1}^\infty A_n$ is also measurable. Note that
$$A = A_1 \cup (A_2 \setminus A_1) \cup (A_3 \setminus A_2 \setminus A_1) \cup \dots = \bigsqcup_{n=1}^\infty \widetilde{A}_n,$$
where
$$\widetilde{A}_n := A_n \setminus A_{n-1} \setminus \dots \setminus A_1 \in \mathcal{M}$$
(here we use the fact that $\mathcal{M}$ is an algebra; see Claim 1). Therefore, renaming $\widetilde{A}_n$ to $A_n$, we see that it suffices to treat the case of a disjoint union: given $A = \bigsqcup_{n=1}^\infty A_n$ where $A_n \in \mathcal{M}$, prove that $A \in \mathcal{M}$.

Note that, for any fixed $N$,

$$\mu^*(A) \ge \mu^*\Big( \bigsqcup_{n=1}^N A_n \Big) = \sum_{n=1}^N \mu^*(A_n),$$
where we have used the monotonicity of $\mu^*$ and the additivity of $\mu^*$ on $\mathcal{M}$ (Claim 2). Letting $N \to \infty$, this implies
$$\sum_{n=1}^\infty \mu^*(A_n) \le \mu^*(A) < \infty$$

(cf. Lemma 1.4), so that the series $\sum_{n=1}^\infty \mu^*(A_n)$ converges. In particular, for any $\varepsilon > 0$ there is $N \in \mathbb{N}$ such that
$$\sum_{n=N+1}^\infty \mu^*(A_n) < \varepsilon.$$
Setting
$$A' = \bigsqcup_{n=1}^N A_n \quad \text{and} \quad A'' = \bigsqcup_{n=N+1}^\infty A_n,$$
we obtain by the σ-subadditivity of $\mu^*$

$$\mu^*(A'') \le \sum_{n=N+1}^\infty \mu^*(A_n) < \varepsilon.$$
By Claim 1, the set $A'$ is measurable as a finite union of measurable sets. Hence, there is $B \in R$ such that
$$\mu^*(A' \,\triangle\, B) < \varepsilon.$$

Since $A = A' \cup A''$, we have
$$A \,\triangle\, B \subset (A' \,\triangle\, B) \cup A''. \tag{1.25}$$
Indeed, $x \in A \triangle B$ means that $x \in A$ and $x \notin B$, or $x \notin A$ and $x \in B$. In the first case, we have $x \in A'$ or $x \in A''$. If $x \in A'$, then together with $x \notin B$ this gives
$$x \in A' \,\triangle\, B \subset (A' \,\triangle\, B) \cup A''.$$
If $x \in A''$, then the inclusion is obvious. In the second case, we have $x \notin A'$, which together with $x \in B$ implies $x \in A' \triangle B$, which finishes the proof of (1.25).

It follows from (1.25) that

$$\mu^*(A \,\triangle\, B) \le \mu^*(A' \,\triangle\, B) + \mu^*(A'') < 2\varepsilon.$$
Since $\varepsilon > 0$ is arbitrary and $B \in R$, we conclude that $A \in \mathcal{M}$.

Claim 4. Let $\Sigma$ be a σ-algebra such that $R \subset \Sigma \subset \mathcal{M}$, and let $\widetilde{\mu}$ be a σ-additive measure on $\Sigma$ such that $\widetilde{\mu} = \mu$ on $R$. Then $\widetilde{\mu} = \mu^*$ on $\Sigma$.

We need to prove that $\widetilde{\mu}(A) = \mu^*(A)$ for any $A \in \Sigma$. By the definition of $\mu^*$, we have
$$\mu^*(A) = \inf \left\{ \sum_{n=1}^\infty \mu(A_n) : A_n \in R \text{ and } A \subset \bigcup_{n=1}^\infty A_n \right\}.$$
Applying the σ-subadditivity of $\widetilde{\mu}$ (see Exercise 7), we obtain

$$\widetilde{\mu}(A) \le \sum_{n=1}^\infty \widetilde{\mu}(A_n) = \sum_{n=1}^\infty \mu(A_n).$$
Taking inf over all such sequences $\{A_n\}$, we obtain
$$\widetilde{\mu}(A) \le \mu^*(A). \tag{1.26}$$
On the other hand, since $A$ is measurable, for any $\varepsilon > 0$ there is $B \in R$ such that
$$\mu^*(A \,\triangle\, B) < \varepsilon. \tag{1.27}$$
By Lemma 1.6(b),
$$|\mu^*(A) - \mu^*(B)| \le \mu^*(A \,\triangle\, B) < \varepsilon. \tag{1.28}$$
Note that the proof of Lemma 1.6(b) uses only the subadditivity of $\mu^*$. Since $\widetilde{\mu}$ is also subadditive, we obtain that

$$|\widetilde{\mu}(A) - \widetilde{\mu}(B)| \le \widetilde{\mu}(A \,\triangle\, B) \le \mu^*(A \,\triangle\, B) < \varepsilon,$$
where we have also used (1.26) and (1.27). Combining with (1.28) and using $\widetilde{\mu}(B) = \mu(B) = \mu^*(B)$, we obtain

$$|\widetilde{\mu}(A) - \mu^*(A)| \le |\mu^*(A) - \mu^*(B)| + |\widetilde{\mu}(A) - \widetilde{\mu}(B)| < 2\varepsilon.$$
Letting $\varepsilon \to 0$, we conclude that $\widetilde{\mu}(A) = \mu^*(A)$, which finishes the proof.

1.6 σ-finite measures

Recall Theorem 1.7: any σ-additive measure on an algebra $R$ can be uniquely extended to the σ-algebra of all measurable sets (and to the minimal σ-algebra containing $R$). Theorem 1.7 has a certain restriction for applications: it requires that the full set $M$ belong to the initial ring $R$ (so that $R$ is an algebra). For example, if $M = \mathbb{R}$ and $R$ is the ring of bounded intervals, then this condition is not satisfied. Moreover, for any reasonable extension of the notion of length, the length of the entire line $\mathbb{R}$ must be $\infty$. This suggests that we should extend the notion of a measure to include also the value $\infty$.

So far, we have not yet defined the term "measure" itself, having used only "finitely additive measures" and "σ-additive measures". Now we define the notion of a measure as follows.

Definition. Let $M$ be a non-empty set and $S$ be a family of subsets of $M$. A functional $\mu : S \to [0, +\infty]$ is called a measure if, for all sets $A, A_k \in S$ such that $A = \bigsqcup_{k=1}^N A_k$ (where $N$ is either finite or infinite), we have

$$\mu(A) = \sum_{k=1}^{N} \mu(A_k).$$
Hence, a measure is always $\sigma$-additive. The difference with the previous definition of "$\sigma$-additive measure" is that we allow the measure to take the value $+\infty$. In the summation, we use the convention that $(+\infty) + x = +\infty$ for any $x \in [0, +\infty]$.

Example. Let $S$ be the family of all intervals on $\mathbb{R}$ (including the unbounded intervals) and define the length of any interval $I \subset \mathbb{R}$ with the endpoints $a, b$ by $\ell(I) = |b - a|$, where $a$ and $b$ may take values from $[-\infty, +\infty]$. Clearly, for any unbounded interval $I$ we have $\ell(I) = +\infty$. We claim that $\ell$ is a measure on $S$ in the above sense.

Indeed, let $I = \bigsqcup_{k=1}^{N} I_k$ where $I, I_k \in S$ and $N$ is either finite or infinite. We need to prove that
$$\ell(I) = \sum_{k=1}^{N} \ell(I_k). \tag{1.29}$$
If $I$ is bounded then all the intervals $I_k$ are bounded, and (1.29) follows from Theorem 1.1. If one of the intervals $I_k$ is unbounded then $I$ is also unbounded, and (1.29) is true because both sides are $+\infty$. Finally, consider the case when $I$ is unbounded while all $I_k$ are bounded. Choose any bounded closed interval $I_0 \subset I$. Since $I_0 \subset \bigcup_{k=1}^{\infty} I_k$, by the $\sigma$-subadditivity of the length on the ring of bounded intervals (Lemma 1.2), we obtain

$$\ell(I_0) \le \sum_{k=1}^{N} \ell(I_k).$$
By the choice of $I_0$, the length $\ell(I_0)$ can be made arbitrarily large, whence it follows that
$$\sum_{k=1}^{N} \ell(I_k) = +\infty = \ell(I).$$

Of course, allowing the value $+\infty$ for measures, we obtain trivial examples of measures as follows: just set $\mu(A) = +\infty$ for any $A \subset M$. The following definition ensures the existence of plenty of sets with finite measure.

Definition. A measure $\mu$ on $S$ is called finite if $M \in S$ and $\mu(M) < \infty$. A measure $\mu$ is called $\sigma$-finite if there is a sequence $\{B_k\}_{k=1}^{\infty}$ of sets from $S$ such that $\mu(B_k) < \infty$ and $M = \bigcup_{k=1}^{\infty} B_k$.

Of course, any finite measure is $\sigma$-finite. The measure in the setting of Theorem 1.7 is finite. The length $\ell$ defined on the family of all intervals in $\mathbb{R}$ is $\sigma$-finite but not finite.

Let $R$ be a ring in a set $M$. Our next goal is to extend a $\sigma$-finite measure $\mu$ from $R$ to a $\sigma$-algebra. For any $B \in R$, define the family $R_B$ of subsets of $B$ by
$$R_B = R \cap 2^B = \{A \subset B : A \in R\}.$$
Observe that $R_B$ is an algebra in $B$ (indeed, $R_B$ is a ring as the intersection of two rings, and also $B \in R_B$). If $\mu(B) < \infty$ then $\mu$ is a finite measure on $R_B$ and, by Theorem 1.7, $\mu$ extends uniquely to a finite measure $\mu_B$ on the $\sigma$-algebra $\mathcal{M}_B$ of measurable subsets of $B$. Our purpose now is to construct a $\sigma$-algebra $\mathcal{M}$ of measurable sets in $M$ and extend the measure $\mu$ to a measure on $\mathcal{M}$.

Let $\{B_k\}_{k=1}^{\infty}$ be a sequence of sets from $R$ such that $M = \bigcup_{k=1}^{\infty} B_k$ and $\mu(B_k) < \infty$ (such a sequence exists by the $\sigma$-finiteness of $\mu$). Replacing the sets $B_k$ by the differences
$$B_1,\quad B_2 \setminus B_1,\quad B_3 \setminus B_1 \setminus B_2,\quad \ldots,$$
we can assume that all $B_k$ are disjoint, that is, $M = \bigsqcup_{k=1}^{\infty} B_k$. In what follows, we fix such a sequence $\{B_k\}$.

Definition. A set $A \subset M$ is called measurable if $A \cap B_k \in \mathcal{M}_{B_k}$ for any $k$. For any measurable set $A$, set
$$\mu_{\mathcal{M}}(A) = \sum_{k} \mu_{B_k}(A \cap B_k). \tag{1.30}$$

Theorem 1.8. Let $\mu$ be a $\sigma$-finite measure on a ring $R$ and $\mathcal{M}$ be the family of all measurable sets defined above. Then the following is true.

(a) $\mathcal{M}$ is a $\sigma$-algebra containing $R$.

(b) The functional $\mu_{\mathcal{M}}$ defined by (1.30) is a measure on $\mathcal{M}$ that extends the measure $\mu$ on $R$.

(c) If $\widetilde{\mu}$ is a measure defined on a $\sigma$-algebra $\Sigma$ such that
$$R \subset \Sigma \subset \mathcal{M},$$
then $\widetilde{\mu} = \mu_{\mathcal{M}}$ on $\Sigma$.

Proof. (a) If $A \in R$ then $A \cap B_k \in R$ because $B_k \in R$. Therefore, $A \cap B_k \in R_{B_k}$, whence $A \cap B_k \in \mathcal{M}_{B_k}$ and $A \in \mathcal{M}$. Hence, $R \subset \mathcal{M}$.

Let us show that if $A = \bigcup_{n=1}^{N} A_n$ where $A_n \in \mathcal{M}$ then also $A \in \mathcal{M}$ (where $N$ can be $\infty$). Indeed, we have
$$A \cap B_k = \bigcup_{n} (A_n \cap B_k) \in \mathcal{M}_{B_k}$$
because $A_n \cap B_k \in \mathcal{M}_{B_k}$ and $\mathcal{M}_{B_k}$ is a $\sigma$-algebra. Therefore, $A \in \mathcal{M}$. In the same way, if $A', A'' \in \mathcal{M}$ then the difference $A = A' \setminus A''$ belongs to $\mathcal{M}$ because
$$A \cap B_k = (A' \cap B_k) \setminus (A'' \cap B_k) \in \mathcal{M}_{B_k}.$$
Finally, $M \in \mathcal{M}$ because
$$M \cap B_k = B_k \in \mathcal{M}_{B_k}.$$
Hence, $\mathcal{M}$ satisfies the definition of a $\sigma$-algebra.

(b) If $A \in R$ then $A \cap B_k \in R_{B_k}$, whence $\mu_{B_k}(A \cap B_k) = \mu(A \cap B_k)$. Since
$$A = \bigsqcup_{k} (A \cap B_k)$$
and $\mu$ is a measure on $R$, we obtain
$$\mu(A) = \sum_{k} \mu(A \cap B_k) = \sum_{k} \mu_{B_k}(A \cap B_k).$$
Comparing with (1.30), we obtain $\mu_{\mathcal{M}}(A) = \mu(A)$. Hence, $\mu_{\mathcal{M}}$ coincides with $\mu$ on $R$.

Let us show that $\mu_{\mathcal{M}}$ is a measure. Let $A = \bigsqcup_{n=1}^{N} A_n$ where $A_n \in \mathcal{M}$ and $N$ is either finite or infinite. We need to prove that
$$\mu_{\mathcal{M}}(A) = \sum_{n} \mu_{\mathcal{M}}(A_n).$$
Indeed, we have
$$\sum_{n} \mu_{\mathcal{M}}(A_n) = \sum_{n} \sum_{k} \mu_{B_k}(A_n \cap B_k) = \sum_{k} \sum_{n} \mu_{B_k}(A_n \cap B_k) = \sum_{k} \mu_{B_k}\Big(\bigsqcup_{n} (A_n \cap B_k)\Big) = \sum_{k} \mu_{B_k}(A \cap B_k) = \mu_{\mathcal{M}}(A),$$
where we have used the fact that
$$A \cap B_k = \bigsqcup_{n} (A_n \cap B_k)$$
and the $\sigma$-additivity of the measure $\mu_{B_k}$.

(c) Let $\widetilde{\mu}$ be another measure defined on a $\sigma$-algebra $\Sigma$ such that $R \subset \Sigma \subset \mathcal{M}$ and $\widetilde{\mu} = \mu$ on $R$. Let us prove that $\widetilde{\mu}(A) = \mu_{\mathcal{M}}(A)$ for any $A \in \Sigma$. Observing that $\widetilde{\mu}$ and $\mu$ coincide on $R_{B_k}$, we obtain by Theorem 1.7 that $\widetilde{\mu}$ and $\mu_{B_k}$ coincide on $\Sigma_{B_k} := \Sigma \cap 2^{B_k}$. Then, for any $A \in \Sigma$, we have $A = \bigsqcup_{k} (A \cap B_k)$ and $A \cap B_k \in \Sigma_{B_k}$, whence
$$\widetilde{\mu}(A) = \sum_{k} \widetilde{\mu}(A \cap B_k) = \sum_{k} \mu_{B_k}(A \cap B_k) = \mu_{\mathcal{M}}(A),$$
which finishes the proof.

Remark. The definition of a measurable set uses the decomposition $M = \bigsqcup_{k} B_k$, where the $B_k$ are sets from $R$ with finite measure. It seems to depend on the choice of $\{B_k\}$, but in fact it does not. Here is an equivalent definition: a set $A \subset M$ is called measurable if $A \cap B \in \mathcal{M}_B$ for any $B \in R$. For the proof see Exercise 19.

1.7 Null sets

Let $R$ be a ring on a set $M$ and $\mu$ be a finite measure on $R$ (that is, $\mu$ is a $\sigma$-additive functional on $R$ and $\mu(M) < \infty$). Let $\mu^*$ be the outer measure as defined by (1.7).

Definition. A set $A \subset M$ is called a null set (or a set of measure 0) if $\mu^*(A) = 0$. The family of all null sets is denoted by $\mathcal{N}$.

Using the definition (1.7) of the outer measure, one can restate the condition $\mu^*(A) = 0$ as follows: for any $\varepsilon > 0$ there exists a sequence $\{A_k\}_{k=1}^{\infty} \subset R$ such that
$$A \subset \bigcup_{k} A_k \quad \text{and} \quad \sum_{k} \mu(A_k) < \varepsilon.$$
If the ring $R$ is the minimal ring containing a semi-ring $S$ (that is, $R = R(S)$), then the sequence $\{A_k\}$ in the above definition can be taken from $S$. Indeed, if $\{A_k\} \subset R$ then, by Theorem 1.3, each $A_k$ is a disjoint finite union of some sets $A_{kn}$ from $S$, where $n$ varies in a finite set. It follows that
$$\mu(A_k) = \sum_{n} \mu(A_{kn})$$
and
$$\sum_{k} \mu(A_k) = \sum_{k} \sum_{n} \mu(A_{kn}).$$
Since the double sequence $\{A_{kn}\}$ covers $A$, the sequence $\{A_k\} \subset R$ can be replaced by the double sequence $\{A_{kn}\} \subset S$.

Example. Let $S$ be the semi-ring of intervals in $\mathbb{R}$ and $\mu = \ell$ be the length. It is easy to see that a single point set $\{x\}$ is a null set. Indeed, for any $\varepsilon > 0$ there is an interval $I$ covering $x$ and such that $\ell(I) < \varepsilon$. Moreover, we claim that any countable set $A \subset \mathbb{R}$ is a null set. Indeed, if $A = \{x_k\}_{k=1}^{\infty}$ then cover each $x_k$ by an interval $I_k$ of length $< \frac{\varepsilon}{2^k}$, so that the sequence $\{I_k\}$ of intervals covers $A$ and $\sum_{k} \ell(I_k) < \varepsilon$. Hence, $A$ is a null set. For example, the set $\mathbb{Q}$ of all rationals is a null set. The Cantor set from Exercise 13 is an example of an uncountable set of measure 0.

In the same way, any countable subset of $\mathbb{R}^2$ is a null set (with respect to the area).

Lemma 1.9. (a) Any subset of a null set is a null set.

(b) The family $\mathcal{N}$ of all null sets is a $\sigma$-ring.

(c) Any null set is measurable, that is, $\mathcal{N} \subset \mathcal{M}$.

Proof. (a) It follows from the monotonicity of $\mu^*$ that $B \subset A$ implies $\mu^*(B) \le \mu^*(A)$. Hence, if $\mu^*(A) = 0$ then also $\mu^*(B) = 0$.

(b) If $A, B \in \mathcal{N}$ then $A \setminus B \in \mathcal{N}$ by part (a), because $A \setminus B \subset A$. Let $A = \bigcup_{n=1}^{N} A_n$ where $A_n \in \mathcal{N}$ and $N$ is finite or infinite. Then, by the $\sigma$-subadditivity of $\mu^*$ (Lemma 1.5), we have
$$\mu^*(A) \le \sum_{n} \mu^*(A_n) = 0,$$
whence $A \in \mathcal{N}$.

(c) We need to show that if $A \in \mathcal{N}$ then, for any $\varepsilon > 0$, there is $B \in R$ such that $\mu^*(A \triangle B) < \varepsilon$. Indeed, just take $B = \emptyset \in R$. Then $\mu^*(A \triangle B) = \mu^*(A) = 0 < \varepsilon$, which was to be proved.

Let now $\mu$ be a $\sigma$-finite measure on a ring $R$. Then we have $M = \bigsqcup_{k=1}^{\infty} B_k$ where $B_k \in R$ and $\mu(B_k) < \infty$.

Definition. In the case of a $\sigma$-finite measure, a set $A \subset M$ is called a null set if $A \cap B_k$ is a null set in each $B_k$.

Lemma 1.9 easily extends to $\sigma$-finite measures: one has just to apply the corresponding part of Lemma 1.9 to each $B_k$.

Let $\Sigma = \Sigma(R)$ be the minimal $\sigma$-algebra containing the ring $R$, so that $R \subset \Sigma \subset \mathcal{M}$. The relation between the two $\sigma$-algebras $\Sigma$ and $\mathcal{M}$ is given by the following theorem.

Theorem 1.10. Let $\mu$ be a $\sigma$-finite measure on a ring $R$. Then $A \in \mathcal{M}$ if and only if there is $B \in \Sigma$ such that $A \triangle B \in \mathcal{N}$, that is, $\mu^*(A \triangle B) = 0$.

This can be equivalently stated as follows:

— $A \in \mathcal{M}$ if and only if there are $B \in \Sigma$ and $N \in \mathcal{N}$ such that $A = B \triangle N$.

— $A \in \mathcal{M}$ if and only if there is $N \in \mathcal{N}$ such that $A \triangle N \in \Sigma$.

Indeed, Theorem 1.10 says that for any $A \in \mathcal{M}$ there are $B \in \Sigma$ and $N \in \mathcal{N}$ such that $A \triangle B = N$. By the properties of the symmetric difference, the latter is equivalent to each of the following identities: $A = B \triangle N$ and $B = A \triangle N$ (see Exercise 15), which settles the claim.

Proof of Theorem 1.10. We prove the statement in the case when the measure $\mu$ is finite. The case of a $\sigma$-finite measure then follows straightforwardly. By the definition of a measurable set, for any $n \in \mathbb{N}$ there is $B_n \in R$ such that
$$\mu^*(A \triangle B_n) < 2^{-n}.$$

The set $B \in \Sigma$, which is to be found, will be constructed as a sort of limit of $B_n$ as $n \to \infty$. For that, set
$$C_n = B_n \cup B_{n+1} \cup \ldots = \bigcup_{k=n}^{\infty} B_k,$$
so that $\{C_n\}_{n=1}^{\infty}$ is a decreasing sequence, and define $B$ by
$$B = \bigcap_{n=1}^{\infty} C_n.$$
Clearly, $B \in \Sigma$, since $B$ is obtained from $B_n \in R$ by countable unions and intersections. Let us show that
$$\mu^*(A \triangle B) = 0.$$
We have
$$A \triangle C_n = A \mathbin{\triangle} \bigcup_{k=n}^{\infty} B_k \subset \bigcup_{k=n}^{\infty} (A \triangle B_k)$$
(see Exercise 15), whence by the $\sigma$-subadditivity of $\mu^*$ (Lemma 1.5)
$$\mu^*(A \triangle C_n) \le \sum_{k=n}^{\infty} \mu^*(A \triangle B_k) < \sum_{k=n}^{\infty} 2^{-k} = 2^{1-n}.$$
Since the sequence $\{C_n\}_{n=1}^{\infty}$ is decreasing, the intersection of all the sets $C_n$ does not change if we omit finitely many terms. Hence, we have, for any $N \in \mathbb{N}$,
$$B = \bigcap_{n=N}^{\infty} C_n.$$
Hence, using again Exercise 15, we have
$$A \triangle B = A \mathbin{\triangle} \bigcap_{n=N}^{\infty} C_n \subset \bigcup_{n=N}^{\infty} (A \triangle C_n),$$
whence
$$\mu^*(A \triangle B) \le \sum_{n=N}^{\infty} \mu^*(A \triangle C_n) \le \sum_{n=N}^{\infty} 2^{1-n} = 2^{2-N}.$$
Since $N$ is arbitrary here, letting $N \to \infty$ we obtain $\mu^*(A \triangle B) = 0$.

Conversely, let us show that if $A \triangle B$ is a null set for some $B \in \Sigma$ then $A$ is measurable. Indeed, the set $N = A \triangle B$ is a null set and, by Lemma 1.9, is measurable. Since $A = B \triangle N$ and both $B$ and $N$ are measurable, we conclude that $A$ is also measurable.
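The proof above leans on two set-theoretic facts about the symmetric difference (Exercise 15): $A \triangle B = N$ is equivalent to $A = B \triangle N$, and the "triangle inclusion" $A \triangle B \subset (A \triangle C) \cup (C \triangle B)$. The following exhaustive check over a small ground set is a finite sanity check of these facts, not a proof; the setup is our own.

```python
from itertools import combinations

# Sketch: verify, for all subsets of a 4-element ground set,
#   (i)  A ^ B == N   iff   A == B ^ N   (where ^ is symmetric difference),
#   (ii) A ^ B  is contained in  (A ^ C) | (C ^ B).

ground = range(4)
subsets = [frozenset(s) for k in range(5) for s in combinations(ground, k)]

for A in subsets:
    for B in subsets:
        N = A ^ B                                # symmetric difference
        assert A == B ^ N                        # fact (i)
        for C in subsets:
            assert A ^ B <= (A ^ C) | (C ^ B)    # fact (ii)
```

Fact (ii) applied with $C = C_n$ is exactly what drives the estimate $\mu^*(A \triangle B) \le \sum_n \mu^*(A \triangle C_n)$ via subadditivity.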

1.8 Lebesgue measure in $\mathbb{R}^n$

Now we apply the above theory of extension of measures to $\mathbb{R}^n$. For that, we need the notion of the product measure.

1.8.1 Product measure

Let $M_1$ and $M_2$ be two non-empty sets, and $S_1$, $S_2$ be families of subsets of $M_1$ and $M_2$, respectively. Consider the set
$$M = M_1 \times M_2 := \{(x, y) : x \in M_1,\ y \in M_2\}$$
and the family $S$ of subsets of $M$ defined by
$$S = S_1 \times S_2 := \{A \times B : A \in S_1,\ B \in S_2\}.$$
For example, if $M_1 = M_2 = \mathbb{R}$ then $M = \mathbb{R}^2$. If $S_1$ and $S_2$ are the families of all intervals in $\mathbb{R}$, then $S$ consists of all rectangles in $\mathbb{R}^2$.

Claim. If $S_1$ and $S_2$ are semi-rings then $S$ is also a semi-ring (see Exercise 6).

In the sequel, let $S_1$ and $S_2$ be semi-rings. Let $\mu_1$ be a finitely additive measure on the semi-ring $S_1$ and $\mu_2$ be a finitely additive measure on the semi-ring $S_2$. Define the product measure $\mu = \mu_1 \times \mu_2$ on $S$ as follows: if $A \in S_1$ and $B \in S_2$ then set
$$\mu(A \times B) = \mu_1(A)\, \mu_2(B).$$
Claim. $\mu$ is also a finitely additive measure on $S$ (see Exercise 20).

Remark. It is true that if $\mu_1$ and $\mu_2$ are $\sigma$-additive then $\mu$ is $\sigma$-additive as well, but the proof in full generality is rather hard and will be given later in the course.

By induction, one defines the product of $n$ semi-rings and the product of $n$ measures for any $n \in \mathbb{N}$.

1.8.2 Construction of measure in $\mathbb{R}^n$

Let $S_1$ be the semi-ring of all bounded intervals in $\mathbb{R}$ and define $S_n$ (a family of subsets of $\mathbb{R}^n = \mathbb{R} \times \ldots \times \mathbb{R}$) as the product of $n$ copies of $S_1$:
$$S_n = S_1 \times S_1 \times \ldots \times S_1.$$
That is, any set $A \in S_n$ has the form
$$A = I_1 \times I_2 \times \ldots \times I_n, \tag{1.31}$$
where the $I_k$ are bounded intervals in $\mathbb{R}$. In other words, $S_n$ consists of bounded boxes. Clearly, $S_n$ is a semi-ring as the product of semi-rings. Define the product measure $\lambda_n$ on $S_n$ by $\lambda_n = \ell \times \ldots \times \ell$, where $\ell$ is the length on $S_1$. That is, for the set $A$ from (1.31),
$$\lambda_n(A) = \ell(I_1) \cdots \ell(I_n).$$
Then $\lambda_n$ is a finitely additive measure on the semi-ring $S_n$ in $\mathbb{R}^n$.
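On boxes, $\lambda_n$ is simply the product of the edge lengths. The sketch below illustrates this on a concrete box; the representation of a box as a list of interval endpoints and the name `lambda_n` are our own, not notation from the notes.

```python
# Sketch: lambda_n on the semi-ring S_n of bounded boxes is the product
# of the edge lengths.  A box is given as a list of n pairs (a_k, b_k).

def lambda_n(box):
    """Product measure of a bounded box: product of the lengths |b_k - a_k|."""
    vol = 1.0
    for a, b in box:
        vol *= abs(b - a)
    return vol

# A box in R^3: [0,2] x [1,4] x [0,0.5] has measure 2 * 3 * 0.5 = 3.
assert lambda_n([(0, 2), (1, 4), (0, 0.5)]) == 3.0

# Finite additivity on a split: cutting the first edge at 1 gives two
# disjoint boxes whose measures sum to that of the original box.
left = lambda_n([(0, 1), (1, 4), (0, 0.5)])
right = lambda_n([(1, 2), (1, 4), (0, 0.5)])
assert left + right == lambda_n([(0, 2), (1, 4), (0, 0.5)])
```

The split in the last check is a special case of the finite additivity of $\lambda_n$ on $S_n$ asserted above.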

Lemma 1.11. $\lambda_n$ is a $\sigma$-additive measure on $S_n$.

Proof. We use the same approach as in the proof of the $\sigma$-additivity of $\ell$ (Theorem 1.1), which was based on the finite additivity of $\ell$ and on the compactness of a closed bounded interval. Since $\lambda_n$ is finitely additive, in order to prove that it is also $\sigma$-additive, it suffices to prove that $\lambda_n$ is regular in the following sense: for any $A \in S_n$ and any $\varepsilon > 0$, there exist a closed box $K \in S_n$ and an open box $U \in S_n$ such that
$$K \subset A \subset U \quad \text{and} \quad \lambda_n(U) \le \lambda_n(K) + \varepsilon$$
(see Exercise 17). Indeed, if $A$ is as in (1.31) then, for any $\delta > 0$ and for any index $j$, there is a closed interval $K_j$ and an open interval $U_j$ such that
$$K_j \subset I_j \subset U_j \quad \text{and} \quad \ell(U_j) \le \ell(K_j) + \delta.$$
Set $K = K_1 \times \ldots \times K_n$ and $U = U_1 \times \ldots \times U_n$, so that $K$ is closed, $U$ is open, and
$$K \subset A \subset U.$$
It also follows that
$$\lambda_n(U) = \ell(U_1) \cdots \ell(U_n) \le (\ell(K_1) + \delta) \cdots (\ell(K_n) + \delta) < \ell(K_1) \cdots \ell(K_n) + \varepsilon = \lambda_n(K) + \varepsilon,$$
provided $\delta = \delta(\varepsilon)$ is chosen small enough. Hence, $\lambda_n$ is regular, which finishes the proof.

Note that the measure $\lambda_n$ is also $\sigma$-finite. Indeed, let us cover $\mathbb{R}$ by a sequence $\{I_k\}_{k=1}^{\infty}$ of bounded intervals. Then all possible products $I_{k_1} \times I_{k_2} \times \ldots \times I_{k_n}$ form a covering of $\mathbb{R}^n$ by a sequence of bounded boxes. Hence, $\lambda_n$ is a $\sigma$-finite measure on $S_n$.

By Theorem 1.3, $\lambda_n$ can be uniquely extended as a $\sigma$-additive measure to the minimal ring $R_n = R(S_n)$, which consists of finite unions of bounded boxes. Denote this extension also by $\lambda_n$. Then $\lambda_n$ is a $\sigma$-finite measure on $R_n$. By Theorem 1.8, $\lambda_n$ extends uniquely to the $\sigma$-algebra $\mathcal{M}_n$ of all measurable sets in $\mathbb{R}^n$. Denote this extension also by $\lambda_n$. Hence, $\lambda_n$ is a measure on the $\sigma$-algebra $\mathcal{M}_n$ that contains $R_n$.

Definition. The measure $\lambda_n$ on $\mathcal{M}_n$ is called the Lebesgue measure in $\mathbb{R}^n$. The measurable sets in $\mathbb{R}^n$ are also called Lebesgue measurable.

In particular, the measure $\lambda_2$ in $\mathbb{R}^2$ is called area and the measure $\lambda_3$ in $\mathbb{R}^3$ is called volume. Also, $\lambda_n$ for any $n \ge 1$ is frequently referred to as an $n$-dimensional volume.

Definition. The minimal $\sigma$-algebra $\Sigma(R_n)$ containing $R_n$ is called the Borel $\sigma$-algebra and is denoted by $\mathcal{B}_n$ (or by $\mathcal{B}(\mathbb{R}^n)$). The sets from $\mathcal{B}_n$ are called Borel sets (or Borel measurable sets).

Hence, we have the inclusions
$$S_n \subset R_n \subset \mathcal{B}_n \subset \mathcal{M}_n.$$
Example. Let us show that any open subset $U$ of $\mathbb{R}^n$ is a Borel set. Indeed, for any $x \in U$ there is an open box $B_x$ such that $x \in B_x \subset U$. Clearly, $B_x \in R_n$ and $U = \bigcup_{x \in U} B_x$. However, this does not immediately imply that $U$ is Borel, since the union is uncountable. To fix this, observe that $B_x$ can be chosen so that all the coordinates of $B_x$ (that is, all the endpoints of the intervals forming $B_x$) are rational. The family of all boxes in $\mathbb{R}^n$ with rational coordinates is countable. For every such box, we can check whether it occurs in the family $\{B_x\}_{x \in U}$ or not. Taking only those boxes that do occur in this family, we obtain an at most countable family of boxes whose union is also $U$. It follows that $U$ is a Borel set as a countable union of Borel sets.

Since closed sets are complements of open sets, it follows that closed sets are also Borel sets. Then we obtain that countable intersections of open sets and countable unions of closed sets are also Borel, etc. Any set that can be obtained from open and closed sets by applying countably many unions, intersections, and subtractions is again a Borel set.

Example. Any non-empty open set $U$ has a positive Lebesgue measure, because $U$ contains a non-empty box $B$ and $\lambda_n(B) > 0$. As an example, let us show that the hyperplane $A = \{x_n = 0\}$ of $\mathbb{R}^n$ has measure $0$. It suffices to prove that the intersection $A \cap B$ is a null set for any bounded box $B$ in $\mathbb{R}^n$. Indeed, set

$$B_0 = A \cap B = \{x \in B : x_n = 0\}$$
and note that $B_0$ is a bounded box in $\mathbb{R}^{n-1}$. Choose $\varepsilon > 0$ and set
$$B_\varepsilon = B_0 \times (-\varepsilon, \varepsilon) = \{x \in B : |x_n| < \varepsilon\},$$
so that $B_\varepsilon$ is a box in $\mathbb{R}^n$ and $B_0 \subset B_\varepsilon$. Clearly,
$$\lambda_n(B_\varepsilon) = \lambda_{n-1}(B_0)\, \ell((-\varepsilon, \varepsilon)) = 2\varepsilon\, \lambda_{n-1}(B_0).$$
Since $\varepsilon$ can be made arbitrarily small, we conclude that $B_0$ is a null set.

Remark. Recall that $\mathcal{B}_n \subset \mathcal{M}_n$ and, by Theorem 1.10, any set from $\mathcal{M}_n$ is the symmetric difference of a set from $\mathcal{B}_n$ and a null set. An interesting question is whether the family $\mathcal{M}_n$ of Lebesgue measurable sets is actually larger than the family $\mathcal{B}_n$ of Borel sets. The answer is yes. Although it is difficult to construct an explicit example of a set in $\mathcal{M}_n \setminus \mathcal{B}_n$, the fact that $\mathcal{M}_n \setminus \mathcal{B}_n$ is non-empty can be shown by comparing the cardinal numbers $|\mathcal{M}_n|$ and $|\mathcal{B}_n|$. Indeed, it is possible to show that
$$|\mathcal{B}_n| = |\mathbb{R}| < 2^{|\mathbb{R}|} = |\mathcal{M}_n|, \tag{1.32}$$
which of course implies that $\mathcal{M}_n \setminus \mathcal{B}_n$ is non-empty.

Let us explain (not prove) why (1.32) is true. As we have seen, any open set is the union of a countable sequence of boxes with rational coordinates. Since the cardinality of the set of such sequences is $|\mathbb{R}|$, it follows that the family of all open sets has cardinality $|\mathbb{R}|$. Then the cardinality of the family of countable intersections of open sets amounts to that of countable sequences of reals, which is again $|\mathbb{R}|$. Continuing this way and using transfinite induction, one can show that the cardinality of the family of Borel sets is $|\mathbb{R}|$.

To show that $|\mathcal{M}_n| = 2^{|\mathbb{R}|}$, we use the fact that any subset of a null set is also a null set and, hence, is measurable. Assume first that $n \ge 2$ and let $A$ be the hyperplane from the previous example. Since $|A| = |\mathbb{R}|$ and any subset of $A$ belongs to $\mathcal{M}_n$, we obtain $|\mathcal{M}_n| \ge 2^{|A|} = 2^{|\mathbb{R}|}$, which finishes the proof. If $n = 1$ then the example with a hyperplane does not work, but one can choose $A$ to be the Cantor set (see Exercise 13), which has cardinality $|\mathbb{R}|$ and measure $0$. Then the same argument works.

1.9 Probability spaces

Probability theory can be considered as a branch of measure theory where one uses specific notation and terminology, and considers specific questions. We start with the probabilistic notation and terminology.

Definition. A probability space is a triple $(\Omega, \mathcal{F}, \mathbb{P})$ where

• $\Omega$ is a non-empty set, which is called the sample space.

• $\mathcal{F}$ is a $\sigma$-algebra of subsets of $\Omega$, whose elements are called events.

• $\mathbb{P}$ is a probability measure on $\mathcal{F}$, that is, $\mathbb{P}$ is a measure on $\mathcal{F}$ and $\mathbb{P}(\Omega) = 1$ (in particular, $\mathbb{P}$ is a finite measure).

For any event $A \in \mathcal{F}$, $\mathbb{P}(A)$ is called the probability of $A$.

Since $\mathcal{F}$ is an algebra, that is, $\Omega \in \mathcal{F}$, the operation of taking the complement is defined in $\mathcal{F}$: if $A \in \mathcal{F}$ then the complement $A^c := \Omega \setminus A$ is also in $\mathcal{F}$. The event $A^c$ is opposite to $A$ and
$$\mathbb{P}(A^c) = \mathbb{P}(\Omega \setminus A) = \mathbb{P}(\Omega) - \mathbb{P}(A) = 1 - \mathbb{P}(A).$$
Example. 1. Let $\Omega = \{1, 2, \ldots, N\}$ be a finite set, and let $\mathcal{F}$ be the set of all subsets of $\Omega$. Given $N$ non-negative numbers $p_i$ such that $\sum_{i=1}^{N} p_i = 1$, define $\mathbb{P}$ by
$$\mathbb{P}(A) = \sum_{i \in A} p_i. \tag{1.33}$$
The condition $\sum_{i=1}^{N} p_i = 1$ ensures that $\mathbb{P}(\Omega) = 1$.
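Definition (1.33) can be made concrete in a few lines. The sketch below builds a discrete probability measure from assumed weights of our own choosing (the dictionary `p` is an illustration, not from the notes) and checks additivity and the complement rule exactly, using rational arithmetic.

```python
from fractions import Fraction

# Sketch of a discrete probability space per (1.33): Omega = {1,...,N},
# P(A) = sum of p_i over i in A.  The weights p_i below are our own example.

p = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8), 4: Fraction(1, 8)}

def P(A):
    """Probability of an event A, a subset of Omega, per (1.33)."""
    return sum(p[i] for i in A)

Omega = set(p)
assert P(Omega) == 1                     # P(Omega) = 1
assert P(set()) == 0                     # P(empty set) = 0
A, B = {1, 2}, {3, 4}                    # disjoint events
assert P(A | B) == P(A) + P(B)           # additivity on disjoint events
assert P(Omega - A) == 1 - P(A)          # complement rule
```

The same recipe works for a countable $\Omega$ with $\sum_i p_i = 1$, as noted below.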

For example, in the simplest case $N = 2$, $\mathcal{F}$ consists of the sets $\emptyset$, $\{1\}$, $\{2\}$, $\{1, 2\}$. Given two non-negative numbers $p$ and $q$ such that $p + q = 1$, set $\mathbb{P}(\{1\}) = p$ and $\mathbb{P}(\{2\}) = q$, while $\mathbb{P}(\emptyset) = 0$ and $\mathbb{P}(\{1, 2\}) = 1$.

This example can be generalized to the case when $\Omega$ is a countable set, say, $\Omega = \mathbb{N}$. Given a sequence $\{p_i\}_{i=1}^{\infty}$ of non-negative numbers $p_i$ such that $\sum_{i=1}^{\infty} p_i = 1$, define for any set $A \subset \Omega$ its probability by (1.33). A measure $\mathbb{P}$ constructed by means of (1.33) is called a discrete probability measure, and the corresponding space $(\Omega, \mathcal{F}, \mathbb{P})$ is called a discrete probability space.

2. Let $\Omega = [0, 1]$, let $\mathcal{F}$ be the set of all Lebesgue measurable subsets of $[0, 1]$, and let $\mathbb{P}$ be the Lebesgue measure $\lambda_1$ restricted to $[0, 1]$. Clearly, $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space.

1.10 Independence

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space.

Definition. Two events $A$ and $B$ are called independent (unabhängig) if
$$\mathbb{P}(A \cap B) = \mathbb{P}(A)\, \mathbb{P}(B).$$
More generally, let $\{A_i\}$ be a family of events parametrized by an index $i$. Then the family $\{A_i\}$ is called independent (or one says that the events $A_i$ are independent) if, for any finite set of distinct indices $i_1, i_2, \ldots, i_k$,
$$\mathbb{P}(A_{i_1} \cap A_{i_2} \cap \ldots \cap A_{i_k}) = \mathbb{P}(A_{i_1})\, \mathbb{P}(A_{i_2}) \cdots \mathbb{P}(A_{i_k}). \tag{1.34}$$
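In a finite probability space, condition (1.34) can be tested exhaustively over all subfamilies. The sketch below is our own illustration: events are subsets of the four-point space of two fair coin tosses, with uniform $\mathbb{P}$.

```python
from itertools import combinations
from fractions import Fraction

# Sketch: test definition (1.34) for a finite family of events in a
# finite probability space with uniform P.  The two-coin model is ours.

Omega = {(a, b) for a in (0, 1) for b in (0, 1)}
P = lambda E: Fraction(len(E), len(Omega))

def independent(events):
    """Check P(A_{i1} & ... & A_{ik}) = prod P(A_{ij}) for every k >= 2."""
    for k in range(2, len(events) + 1):
        for sub in combinations(events, k):
            inter = set.intersection(*map(set, sub))
            prod = 1
            for E in sub:
                prod *= P(E)
            if P(inter) != prod:
                return False
    return True

A = {w for w in Omega if w[0] == 1}   # first toss is heads
B = {w for w in Omega if w[1] == 1}   # second toss is heads
assert independent([A, B])
# The couple A, A is not independent, since P(A) is neither 0 nor 1:
assert not independent([A, A])
```

Note that the check ranges over all subfamilies of size $k \ge 2$, exactly as (1.34) requires; pairwise independence alone would be a strictly weaker test.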

For example, three events $A, B, C$ are independent if
$$\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B), \quad \mathbb{P}(A \cap C) = \mathbb{P}(A)\,\mathbb{P}(C), \quad \mathbb{P}(B \cap C) = \mathbb{P}(B)\,\mathbb{P}(C),$$
$$\mathbb{P}(A \cap B \cap C) = \mathbb{P}(A)\,\mathbb{P}(B)\,\mathbb{P}(C).$$
Example. In any probability space $(\Omega, \mathcal{F}, \mathbb{P})$, the couple $A, \Omega$ is independent for any event $A \in \mathcal{F}$, which follows from
$$\mathbb{P}(A \cap \Omega) = \mathbb{P}(A) = \mathbb{P}(A)\,\mathbb{P}(\Omega).$$
Similarly, the couple $A, \emptyset$ is independent. If the couple $A, A$ is independent then $\mathbb{P}(A) = 0$ or $1$, which follows from
$$\mathbb{P}(A) = \mathbb{P}(A \cap A) = \mathbb{P}(A)^2.$$

Example. Let $\Omega$ be the unit square $[0,1]^2$, let $\mathcal{F}$ consist of all measurable sets in $\Omega$, and let $\mathbb{P} = \lambda_2$. Let $I$ and $J$ be two intervals in $[0,1]$, and consider the events $A = I \times [0,1]$ and $B = [0,1] \times J$. We claim that the events $A$ and $B$ are independent. Indeed, $A \cap B$ is the rectangle $I \times J$, and
$$\mathbb{P}(A \cap B) = \lambda_2(I \times J) = \ell(I)\,\ell(J).$$
Since
$$\mathbb{P}(A) = \ell(I) \quad \text{and} \quad \mathbb{P}(B) = \ell(J),$$
we conclude
$$\mathbb{P}(A \cap B) = \mathbb{P}(A)\,\mathbb{P}(B).$$
We may choose $I$ and $J$ with lengths $p$ and $q$, respectively, for any prescribed couple $p, q \in (0, 1)$. Hence, this example shows how to construct two independent events with given probabilities $p, q$.

In the same way, one can construct $n$ independent events (with prescribed probabilities) in the probability space $\Omega = [0,1]^n$, where $\mathcal{F}$ is the family of all measurable sets in $\Omega$ and $\mathbb{P}$ is the $n$-dimensional Lebesgue measure $\lambda_n$. Indeed, let $I_1, \ldots, I_n$ be intervals in $[0,1]$ of lengths $p_1, \ldots, p_n$. Consider the events

$$A_k = [0,1] \times \ldots \times I_k \times \ldots \times [0,1],$$
where all factors in the direct product are $[0,1]$ except for the $k$-th factor $I_k$. Then $A_k$ is a box in $\mathbb{R}^n$ and
$$\mathbb{P}(A_k) = \lambda_n(A_k) = \ell(I_k) = p_k.$$
For any sequence of distinct indices $i_1, i_2, \ldots, i_k$, the intersection
$$A_{i_1} \cap A_{i_2} \cap \ldots \cap A_{i_k}$$
is a direct product of some factors $[0,1]$ with the intervals $I_{i_1}, \ldots, I_{i_k}$, so that
$$\mathbb{P}(A_{i_1} \cap \ldots \cap A_{i_k}) = p_{i_1} \cdots p_{i_k} = \mathbb{P}(A_{i_1}) \cdots \mathbb{P}(A_{i_k}).$$
Hence, the sequence $\{A_k\}_{k=1}^{n}$ is independent.

It is natural to expect that independence is preserved by certain operations on events. For example, let $A, B, C, D$ be independent events and let us ask whether the following couples of events are independent:

1. $A \cap B$ and $C \cap D$;
2. $A \cup B$ and $C \cup D$;
3. $E = (A \cap B) \cup (C \setminus A)$ and $D$.

It is easy to show that $A \cap B$ and $C \cap D$ are independent:
$$\mathbb{P}((A \cap B) \cap (C \cap D)) = \mathbb{P}(A \cap B \cap C \cap D) = \mathbb{P}(A)\,\mathbb{P}(B)\,\mathbb{P}(C)\,\mathbb{P}(D) = \mathbb{P}(A \cap B)\,\mathbb{P}(C \cap D).$$
It is less obvious how to prove that $A \cup B$ and $C \cup D$ are independent. This will follow from the following more general statement.

Lemma 1.12. Let $\mathcal{A} = \{A_i\}$ be an independent family of events. Suppose that a family $\mathcal{A}'$ of events is obtained from $\mathcal{A}$ by one of the following procedures:

1. Adding to $\mathcal{A}$ one of the events $\emptyset$ or $\Omega$.
2. Replacing two events $A, B \in \mathcal{A}$ by the one event $A \cap B$.
3. Replacing an event $A \in \mathcal{A}$ by its complement $A^c$.
4. Replacing two events $A, B \in \mathcal{A}$ by the one event $A \cup B$.
5. Replacing two events $A, B \in \mathcal{A}$ by the one event $A \setminus B$.

Then the family $\mathcal{A}'$ is independent.

Applying operation 4 twice, we obtain that if $A, B, C, D$ are independent then $A \cup B$ and $C \cup D$ are independent. However, this lemma still does not answer why $E$ and $D$ are independent.

Proof. Each of the above procedures removes from $\mathcal{A}$ some of the events and adds a new event, say $N$. Denote by $\mathcal{A}''$ the family that remains after the removal, so that $\mathcal{A}'$ is obtained from $\mathcal{A}''$ by adding $N$. Clearly, removing events does not change the independence, so that $\mathcal{A}''$ is independent. Hence, in order to prove that $\mathcal{A}'$ is independent, it suffices to show that, for any events $A_1, A_2, \ldots, A_k$ from $\mathcal{A}''$ with distinct indices,
$$\mathbb{P}(N \cap A_1 \cap \ldots \cap A_k) = \mathbb{P}(N)\,\mathbb{P}(A_1) \cdots \mathbb{P}(A_k). \tag{1.35}$$
Case 1. $N = \emptyset$ or $\Omega$. Both sides of (1.35) vanish if $N = \emptyset$. If $N = \Omega$ then it can be removed from both sides of (1.35), so (1.35) follows from the independence of $A_1, A_2, \ldots, A_k$.

Case 2. $N = A \cap B$. We have
$$\mathbb{P}((A \cap B) \cap A_1 \cap A_2 \cap \ldots \cap A_k) = \mathbb{P}(A)\,\mathbb{P}(B)\,\mathbb{P}(A_1)\,\mathbb{P}(A_2) \cdots \mathbb{P}(A_k) = \mathbb{P}(A \cap B)\,\mathbb{P}(A_1)\,\mathbb{P}(A_2) \cdots \mathbb{P}(A_k),$$
which proves (1.35).

Case 3. $N = A^c$. Using the identity
$$A^c \cap B = B \setminus A = B \setminus (A \cap B)$$
and its consequence
$$\mathbb{P}(A^c \cap B) = \mathbb{P}(B) - \mathbb{P}(A \cap B),$$
we obtain, for $B = A_1 \cap A_2 \cap \ldots \cap A_k$, that
$$\mathbb{P}(A^c \cap A_1 \cap \ldots \cap A_k) = \mathbb{P}(A_1 \cap \ldots \cap A_k) - \mathbb{P}(A \cap A_1 \cap \ldots \cap A_k) = \mathbb{P}(A_1) \cdots \mathbb{P}(A_k) - \mathbb{P}(A)\,\mathbb{P}(A_1) \cdots \mathbb{P}(A_k) = (1 - \mathbb{P}(A))\,\mathbb{P}(A_1) \cdots \mathbb{P}(A_k) = \mathbb{P}(A^c)\,\mathbb{P}(A_1) \cdots \mathbb{P}(A_k).$$
Case 4. $N = A \cup B$. By the identity
$$A \cup B = (A^c \cap B^c)^c,$$
this case reduces to cases 2 and 3.

Case 5. $N = A \setminus B$. By the identity
$$A \setminus B = A \cap B^c,$$
this case reduces to cases 2 and 3.
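The five procedures of the lemma can be spot-checked in a small finite model. The sketch below uses the eight-point space of three fair coin tosses (our own example, not from the notes) and verifies pairwise independence after each operation.

```python
from fractions import Fraction

# Sketch: spot-check the five procedures of Lemma 1.12 in the
# eight-point space of three fair coin tosses.

Omega = frozenset((a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
P = lambda E: Fraction(len(E), len(Omega))

def indep(E, F):
    """Pairwise independence: P(E & F) = P(E) P(F)."""
    return P(E & F) == P(E) * P(F)

A = frozenset(w for w in Omega if w[0] == 1)   # first toss is heads
B = frozenset(w for w in Omega if w[1] == 1)   # second toss is heads
C = frozenset(w for w in Omega if w[2] == 1)   # third toss is heads

assert indep(A, C) and indep(B, C)
assert indep(A & B, C)                         # procedure 2: intersection
assert indep(Omega - A, C)                     # procedure 3: complement
assert indep(A | B, C)                         # procedure 4: union
assert indep(A - B, C)                         # procedure 5: difference
assert indep(Omega, C) and indep(frozenset(), C)   # procedure 1
```

Of course, such a finite check is only a sanity test of the statement, not a substitute for the proof above.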

Example. Using Lemma 1.12, we can justify the probabilistic argument, introduced in Section 1.3, in order to prove the inequality
$$(1 - p^n)^m + (1 - q^m)^n \ge 1, \tag{1.36}$$
where $p, q \in [0,1]$, $p + q = 1$, and $n, m \in \mathbb{N}$. Indeed, for that argument we need $nm$ independent events, each with the given probability $p$. As we have seen in an example above, there exists an arbitrarily long finite sequence of independent events, with arbitrarily prescribed probabilities. So, choose $nm$ independent events, each with probability $p$, and denote them by $A_{ij}$ where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$, so that they can be arranged in an $n \times m$ matrix $\{A_{ij}\}$. For any index $j$ (which is a column index), consider the event
$$C_j = \bigcap_{i=1}^{n} A_{ij},$$
that is, the intersection of all events $A_{ij}$ in the column $j$. Since the $A_{ij}$ are independent, we obtain
$$\mathbb{P}(C_j) = p^n \quad \text{and} \quad \mathbb{P}(C_j^c) = 1 - p^n.$$
By Lemma 1.12, the events $\{C_j\}$ are independent and, hence, the events $\{C_j^c\}$ are also independent. Setting
$$C = \bigcap_{j=1}^{m} C_j^c,$$
we obtain
$$\mathbb{P}(C) = (1 - p^n)^m.$$
Similarly, considering the events
$$R_i = \bigcap_{j=1}^{m} A_{ij}^c$$
and
$$R = \bigcap_{i=1}^{n} R_i^c,$$
we obtain in the same way that
$$\mathbb{P}(R) = (1 - q^m)^n.$$
Finally, we claim that
$$C \cup R = \Omega, \tag{1.37}$$
which would imply that
$$\mathbb{P}(C) + \mathbb{P}(R) \ge 1,$$
that is, (1.36). To prove (1.37), observe that, by the definitions of $R$ and $C$,
$$C = \bigcap_{j=1}^{m} C_j^c = \bigcap_{j=1}^{m} \Big(\bigcap_{i=1}^{n} A_{ij}\Big)^c = \bigcap_{j=1}^{m} \bigcup_{i=1}^{n} A_{ij}^c$$
and
$$R^c = \bigcup_{i=1}^{n} R_i = \bigcup_{i=1}^{n} \bigcap_{j=1}^{m} A_{ij}^c.$$
Denoting by $\omega$ an arbitrary element of $\Omega$, we see that if $\omega \in R^c$ then there is an index $i_0$ such that $\omega \in A_{i_0 j}^c$ for all $j$. This implies that $\omega \in \bigcup_{i=1}^{n} A_{ij}^c$ for any $j$, whence $\omega \in C$. Hence, $R^c \subset C$, which is equivalent to (1.37).

Let us give an alternative proof of (1.37), which is a rigorous version of the argument with the coin tossing. The result of $nm$ trials with coin tossing was a sequence of letters H, T (heads and tails). Now we make a sequence of digits 1, 0 instead, as follows: for any $\omega \in \Omega$, set
$$M_{ij}(\omega) = \begin{cases} 1, & \omega \in A_{ij}, \\ 0, & \omega \notin A_{ij}. \end{cases}$$
Hence, a point $\omega \in \Omega$ represents the sequence of $nm$ trials, and the results of the trials are registered in a random $n \times m$ matrix $M = \{M_{ij}\}$ (the word "random" simply means that $M$ is a function of $\omega$). For example, if
$$M(\omega) = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}$$
then $\omega \in A_{11}$, $\omega \in A_{13}$, $\omega \in A_{23}$, but $\omega \notin A_{12}$, $\omega \notin A_{21}$, $\omega \notin A_{22}$.

The fact that $\omega \in C_j$ means that all the entries in the column $j$ of the matrix $M(\omega)$ are 1; $\omega \in C_j^c$ means that there is an entry 0 in the column $j$; $\omega \in C$ means that there is a 0 in every column. In the same way, $\omega \in R$ means that there is a 1 in every row. The desired identity $C \cup R = \Omega$ means that either there is a 0 in every column or there is a 1 in every row. Indeed, if the first event does not occur, that is, there is a column without 0, then this column contains only 1's, for example, as here:
$$M(\omega) = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
However, this means that there is a 1 in every row, which proves that $C \cup R = \Omega$.

Although Lemma 1.12 was useful in the above argument, it is still not enough for other applications. For example, it does not imply that $E = (A \cap B) \cup (C \setminus A)$ and $D$ are independent (if $A, B, C, D$ are independent), since $A$ is involved twice in the formula defining $E$. There is a general theorem which allows one to handle all such cases. Before we state it, let us generalize the notion of independence as follows.

Definition. Let $\{\mathcal{A}_i\}$ be a sequence of families of events. We say that $\{\mathcal{A}_i\}$ is independent if any sequence $\{A_i\}$ such that $A_i \in \mathcal{A}_i$ is independent.

Example. Let $\{A_i\}$ be an independent sequence of events and consider the families
$$\mathcal{A}_i = \{\emptyset, A_i, A_i^c, \Omega\}.$$
Then the sequence $\{\mathcal{A}_i\}$ is independent, which follows from Lemma 1.12.

For any family $\mathcal{A}$ of subsets of $\Omega$, denote by $R(\mathcal{A})$ the minimal algebra containing $\mathcal{A}$, and by $\Sigma(\mathcal{A})$ the minimal $\sigma$-algebra containing $\mathcal{A}$ (we used to denote by $R(\mathcal{A})$ and $\Sigma(\mathcal{A})$ respectively the minimal ring and $\sigma$-ring containing $\mathcal{A}$, but in the present context we switch to algebras).

Theorem 1.13. Suppose that $\{A_{ij}\}$ is an independent sequence of events parametrized by two indices $i$ and $j$. Denote by $\mathcal{A}_i$ the family of all events $A_{ij}$ with fixed $i$ and arbitrary $j$. Then the sequence of algebras $\{R(\mathcal{A}_i)\}$ is independent. Moreover, the sequence of $\sigma$-algebras $\{\Sigma(\mathcal{A}_i)\}$ is also independent.

For example, if the sequence $\{A_{ij}\}$ is represented by a matrix
$$\begin{pmatrix} A_{11} & A_{12} & \ldots \\ A_{21} & A_{22} & \ldots \\ \ldots & \ldots & \ldots \end{pmatrix}$$
then $\mathcal{A}_i$ is the family of all events in the row $i$, and the claim is that the algebras (resp., $\sigma$-algebras) generated by different rows are independent. Clearly, the same applies to the columns.

Let us apply this theorem to the above example of $E = (A \cap B) \cup (C \setminus A)$ and $D$. Indeed, in the matrix
$$\begin{pmatrix} A & B & C \\ D & \Omega & \Omega \end{pmatrix}$$
all events are independent. Therefore, $R(A, B, C)$ and $R(D)$ are independent, and $E = (A \cap B) \cup (C \setminus A)$ and $D$ are independent because $E \in R(A, B, C)$.

The proof of Theorem 1.13 is largely based on a powerful theorem of Dynkin, which explains how a given family $\mathcal{A}$ of subsets can be completed into an algebra or a $\sigma$-algebra. Before we state it, let us introduce the following notation. Let $\Omega$ be any non-empty set (not necessarily a probability space) and $\mathcal{A}$ be a family of subsets of $\Omega$. Let $*$ be an operation on subsets of $\Omega$ (such as union, intersection, etc). Then denote by $\mathcal{A}^*$ the minimal extension of $\mathcal{A}$ which is closed under the operation $*$. More precisely, consider all families of subsets of $\Omega$ which contain $\mathcal{A}$ and which are closed under $*$ (the latter means that applying $*$ to sets from such a family, we obtain again a set from that family). For example, the family $2^{\Omega}$ of all subsets of $\Omega$ satisfies these conditions. Then, taking the intersection of all such families, we obtain $\mathcal{A}^*$. If $*$ is an operation over a finite number of elements, then $\mathcal{A}^*$ can be obtained from $\mathcal{A}$ by applying all possible finite sequences of operations $*$.

The operations to which we apply this notion are the following:

1. The intersection "$\cap$", that is, $A, B \mapsto A \cap B$.

2. The monotone difference "$-$", defined as follows: if $A \supset B$ then $A - B = A \setminus B$.

3. The monotone limit $\lim$, defined on monotone sequences $\{A_n\}_{n=1}^{\infty}$ as follows: if the sequence is increasing, that is $A_n \subset A_{n+1}$, then
$$\lim A_n = \bigcup_{n=1}^{\infty} A_n,$$
and if $\{A_n\}$ is decreasing, that is $A_n \supset A_{n+1}$, then
$$\lim A_n = \bigcap_{n=1}^{\infty} A_n.$$
Using the above notation, $\mathcal{A}^-$ is the minimal extension of $\mathcal{A}$ using the monotone difference, and $\mathcal{A}^{\lim}$ is the minimal extension of $\mathcal{A}$ using the monotone limit.

Theorem 1.14 (Theorem of Dynkin). Let $\Omega$ be an arbitrary non-empty set and $\mathcal{A}$ be a family of subsets of $\Omega$.

(a) If $\mathcal{A}$ contains $\Omega$ and is closed under $\cap$, then
$$R(\mathcal{A}) = \mathcal{A}^-.$$
(b) If $\mathcal{A}$ is an algebra of subsets of $\Omega$, then
$$\Sigma(\mathcal{A}) = \mathcal{A}^{\lim}.$$
As a consequence, we see that if $\mathcal{A}$ is any family of subsets containing $\Omega$ then it can be extended to the minimal algebra $R(\mathcal{A})$ as follows: firstly, extend it using intersections, so that the resulting family $\mathcal{A}^{\cap}$ is closed under intersections; secondly, extend $\mathcal{A}^{\cap}$ to the algebra $R(\mathcal{A}^{\cap}) = R(\mathcal{A})$ using the monotone difference (which requires part (a) of Theorem 1.14). Hence, we obtain the identity
$$R(\mathcal{A}) = (\mathcal{A}^{\cap})^-. \tag{1.38}$$
Similarly, applying further part (b), we obtain
$$\Sigma(\mathcal{A}) = \big((\mathcal{A}^{\cap})^-\big)^{\lim}. \tag{1.39}$$
The most non-trivial part is (1.38). Indeed, it says that any set that can be obtained from sets of $\mathcal{A}$ by a finite number of operations $\cap, \cup, \setminus$ can also be obtained by first applying a finite number of $\cap$ and then applying a finite number of "$-$". This is not quite obvious even for the simplest case
$$\mathcal{A} = \{\Omega, A, B\}.$$
Indeed, (1.38) implies that the union $A \cup B$ can be obtained from $\Omega, A, B$ by applying first $\cap$ and then "$-$". However, Theorem 1.14 does not say how exactly one can do that. The answer in this particular case is
$$A \cup B = \Omega - (\Omega - A - (B - (A \cap B))).$$
Proof of Theorem 1.14. Let us prove the first part of the theorem. Assuming that $\mathcal{A}$ contains $\Omega$ and is closed under $\cap$, let us show that $\mathcal{A}^-$ is an algebra, which will settle the claim. Indeed, as an algebra, $\mathcal{A}^-$ must contain $R(\mathcal{A})$, since $R(\mathcal{A})$ is the smallest algebra extension of $\mathcal{A}$. On the other hand, $R(\mathcal{A})$ is closed under "$-$" and contains $\mathcal{A}$; then it contains also $\mathcal{A}^-$, whence $R(\mathcal{A}) = \mathcal{A}^-$.

To show that $\mathcal{A}^-$ is an algebra, we need to verify that $\mathcal{A}^-$ contains $\emptyset, \Omega$ and is closed under $\cap, \cup, \setminus$. Obviously, $\Omega \in \mathcal{A}^-$ and $\emptyset = \Omega - \Omega \in \mathcal{A}^-$. Also, if $A \in \mathcal{A}^-$ then also $A^c \in \mathcal{A}^-$ because $A^c = \Omega - A$. It remains to show that $\mathcal{A}^-$ is closed under intersections, that is, if $A, B \in \mathcal{A}^-$ then $A \cap B \in \mathcal{A}^-$ (this will imply that also $A \cup B = (A^c \cap B^c)^c \in \mathcal{A}^-$ and $A \setminus B = A - (A \cap B) \in \mathcal{A}^-$).

Given a set $A \in \mathcal{A}^-$, call a set $B \subset \Omega$ suitable for $A$ if $A \cap B \in \mathcal{A}^-$. Denote the family of suitable sets by $S$, that is,

S = B Ω : A B − . ⊂ ∩ ∈ A

We need to prove that S −. Assume© first that A ª .ThenS contains because is closed under intersections.⊃ A Let us verify that S is∈ closedA under monotoneA difference. A Indeed, if B1 B2 are suitable sets then ⊃

A ∩ (B₁ − B₂) = (A ∩ B₁) − (A ∩ B₂) ∈ A^−,
whence B₁ − B₂ ∈ S. Hence, the family S of all suitable sets contains A and is closed under "−", which implies that S contains A^− (because A^− is the minimal family with these properties). Hence, we have proved that

A ∩ B ∈ A^−  (1.40)
whenever A ∈ A and B ∈ A^−. Switching A and B, we see that (1.40) holds also if A ∈ A^− and B ∈ A.
Now we prove (1.40) for all A, B ∈ A^−. Consider again the family S of suitable sets for A. As we have just shown, S ⊃ A. Since S is closed under "−", we conclude that S ⊃ A^−, which finishes the proof.
The second statement about σ-algebras can be obtained similarly. Using the method of suitable sets, one proves that A^lim is an algebra, and the rest follows from the following observation.

Lemma 1.15 If an algebra A is closed under monotone limits then it is a σ-algebra (conversely, any σ-algebra is an algebra and is closed under lim).

Indeed, it suffices to prove that if Aₙ ∈ A for all n ∈ N then ⋃_{n=1}^∞ Aₙ ∈ A. Setting Bₙ = ⋃_{i=1}^n A_i, we have the identity

⋃_{n=1}^∞ Aₙ = ⋃_{n=1}^∞ Bₙ = lim Bₙ.
Since Bₙ ∈ A, it follows that also lim Bₙ ∈ A, which finishes the proof.
Theorem 1.13 will be deduced from the following more general theorem.

Theorem 1.16 Let (Ω, F, P) be a probability space. Let {A_i} be a sequence of families of events such that each family A_i contains Ω and is closed under ∩. If the sequence {A_i} is independent then the sequence of the algebras {R(A_i)} is also independent. Moreover, the sequence of σ-algebras {Σ(A_i)} is independent, too.
Proof of Theorem 1.16. In order to check the independence of families, one needs to test finite sequences of events chosen from those families. Hence, it suffices to restrict to the case when the number of families A_i is finite. So, assume that i runs over 1, 2, ..., n. It suffices to show that the sequence {R(A₁), A₂, ..., Aₙ} is independent: once this is known, one can by induction replace A₂ by R(A₂), etc. To show the independence of this

sequence, we need to take arbitrary events A₁ ∈ R(A₁), A₂ ∈ A₂, ..., Aₙ ∈ Aₙ and prove that
P(A₁ ∩ A₂ ∩ ... ∩ Aₙ) = P(A₁) P(A₂) ... P(Aₙ).  (1.41)
Indeed, by the definition of independent events, one needs to check this property also for subsequences {A_{i_k}}, but this amounts to the full sequence {A_i} by setting the missing events to be Ω.
Denote for simplicity A = A₁ and B = A₂ ∩ A₃ ∩ ... ∩ Aₙ. Since {A₂, A₃, ..., Aₙ} are independent, (1.41) amounts to

P(A ∩ B) = P(A) P(B).  (1.42)

In other words, we are left to prove that A and B are independent, for any A ∈ R(A₁) and B being an intersection of events from A₂, A₃, ..., Aₙ. Fix such B and call an event A suitable for B if (1.42) holds. We need to show that all events from R(A₁) are suitable. Observe that all events from A₁ are suitable. Let us prove that the family of suitable sets is closed under the monotone difference "−". Indeed, if A and A′ are suitable and A ⊃ A′ then
P((A − A′) ∩ B) = P(A ∩ B) − P(A′ ∩ B) = P(A) P(B) − P(A′) P(B) = P(A − A′) P(B).

Hence, we conclude that the family of all suitable sets contains the extension of A₁ by "−", that is A₁^−. By Theorem 1.14, A₁^− = R(A₁). Hence, all events from R(A₁) are suitable, which was to be proved.
The independence of {Σ(A_i)} is treated in the same way, by verifying that the equality (1.42) is preserved by monotone limits.
Proof of Theorem 1.13. Adding to each family A_i the event Ω does not change the independence, so we may assume Ω ∈ A_i. Also, if we extend A_i to A_i^∩ then the families {A_i^∩} are also independent. Indeed, each event B_i ∈ A_i^∩ is an intersection of a finite number of events from A_i, that is, has the form

B_i = A_{i j₁} ∩ A_{i j₂} ∩ ... ∩ A_{i j_k}.

Hence, the sequence {B_i} can be obtained by replacing in the double sequence {A_{ij}} some elements by their intersections (and throwing away the rest), and the independence of {B_i} follows by Lemma 1.12 from the independence of {A_{ij}} across all i and j.
Therefore, the sequence {A_i^∩} satisfies the hypotheses of Theorem 1.16, and the sequence {R(A_i^∩)} (and {Σ(A_i^∩)}) is independent. Since R(A_i^∩) = R(A_i) and Σ(A_i^∩) = Σ(A_i), we obtain the claim.
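The set identity for A ∪ B given after (1.38) can be checked mechanically; the following sketch (with an arbitrary choice of Ω, A, B) also verifies that every "−" used is a genuine monotone difference, i.e. that each subtracted set is contained in the set it is subtracted from:

```python
# Sketch: checking the set identity
#   A ∪ B = Ω − ((Ω − A) − (B − (A ∩ B)))
# on a concrete finite Ω (the sets below are arbitrary choices).

Omega = frozenset(range(10))
A = frozenset({0, 1, 2, 3})
B = frozenset({2, 3, 4, 5})

step1 = B - (A & B)            # B − (A ∩ B) = B \ A
step2 = (Omega - A) - step1    # (Ω − A) − (B \ A)
result = Omega - step2

assert A & B <= B              # each difference is monotone
assert step1 <= Omega - A
assert step2 <= Omega
assert result == A | B
print(sorted(result))          # the union A ∪ B: [0, 1, 2, 3, 4, 5]
```

The three subset assertions confirm that only monotone differences and one intersection were used, exactly as Theorem 1.14 promises.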

2 Integration

In this Chapter, we define the general notion of the Lebesgue integral. Given a set M, a σ-algebra M on M and a measure μ on M, we should be able to define the integral
∫_M f dμ
of any function f on M of an appropriate class. Recall that if M is a bounded closed interval [a, b] ⊂ R then the integral ∫_a^b f(x) dx is defined for Riemann integrable functions, in particular, for continuous functions on [a, b], and is obtained as the limit of the sums

∑_{i=1}^n f(ξ_i)(x_i − x_{i−1}),
where {x_i}_{i=0}^n is a partition of the interval [a, b] and {ξ_i}_{i=1}^n is a sequence of tags, that is, ξ_i ∈ [x_{i−1}, x_i]. One can try also in the general case to arrange a partition of the set M into smaller sets E₁, E₂, ... and define the integral sums

∑_i f(ξ_i) μ(E_i),
where ξ_i ∈ E_i. There are two problems here: in what sense to understand the limit of the integral sums, and how to describe the class of functions for which the limit exists. Clearly, the partition {E_i} should be chosen so that the values of f in E_i do not change much and can be approximated by f(ξ_i). In the case of the Riemann integral, this is achieved by choosing the E_i to be small intervals and by using the continuity of f. In the general case, there is another approach, due to Lebesgue, whose main idea is to choose the E_i depending on f, as follows:

E_i = {x ∈ M : c_{i−1} < f(x) ≤ c_i},
where {c_i} is an increasing sequence of reals partitioning the range of f.
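The two ways of forming integral sums can be contrasted numerically; a minimal sketch (the function f(x) = x², the partition sizes, and the grid used to approximate the measures of the sets E_i are all arbitrary choices, not from the text):

```python
# Riemann: partition the domain [0, 1]; Lebesgue: partition the range of f
# and take E_i = {x : c_{i-1} < f(x) <= c_i}, approximated on a fine grid.

def f(x):
    return x * x

N = 10_000                       # grid used to approximate measures of the E_i
grid = [(k + 0.5) / N for k in range(N)]

# Riemann-style sum: domain partition with left tags
n = 100
riemann = sum(f(i / n) * (1 / n) for i in range(n))

# Lebesgue-style sum: range partition 0 = c_0 < c_1 < ... < c_n = 1
levels = [i / n for i in range(n + 1)]
lebesgue = 0.0
for i in range(1, n + 1):
    measure = sum(1 for x in grid if levels[i - 1] < f(x) <= levels[i]) / N
    lebesgue += levels[i - 1] * measure   # tag value c_{i-1} on E_i

print(riemann, lebesgue)   # both close to the exact value 1/3
```

For this smooth f both sums converge to the same value; the point of the Lebesgue approach is that the range partition still makes sense when the sets E_i are not intervals.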

2.1 Measurable functions
Let M be an arbitrary non-empty set and M be a σ-algebra of subsets of M.
Definition. We say that a set A ⊂ M is measurable if A ∈ M. We say that a function f : M → R is measurable if, for any c ∈ R, the set {x ∈ M : f(x) ≤ c} is measurable, that is, belongs to M.
Of course, the measurability of sets and functions depends on the choice of the σ-algebra M. For example, in Rⁿ we distinguish the Lebesgue measurable sets and functions, when M = Mₙ (they are frequently called simply "measurable"), and the Borel measurable sets and functions, when M = Bₙ (they are normally called "Borel sets" and "Borel functions", avoiding the word "measurable").
The measurability of a function f can also be restated as follows. Since

{f(x) ≤ c} = f⁻¹(−∞, c],
we can say that a function f is measurable if, for any c ∈ R, the set f⁻¹(−∞, c] is measurable. Let us refer to the intervals of the form (−∞, c] as special intervals. Then we can say that a mapping f : M → R is measurable if the preimage of any special interval is a measurable set.

Example. Let A be an arbitrary subset of M. Define the indicator function 1_A on M by
1_A(x) = 1 if x ∈ A, and 1_A(x) = 0 if x ∉ A.
We claim that the set A is measurable if and only if the function 1_A is measurable. Indeed, the set {1_A ≤ c} can be described as follows:
{1_A ≤ c} = ∅ if c < 0; A^c if 0 ≤ c < 1; M if c ≥ 1.
The sets ∅ and M are always measurable, and A^c is measurable if and only if A is measurable, whence the claim follows.

Example. Let M = Rⁿ and M = Bₙ. Let f(x) be a continuous function from Rⁿ to R. Then the set f⁻¹(−∞, c] is a closed subset of Rⁿ, as the preimage of the closed set (−∞, c] in R. Since closed sets are Borel, we conclude that any continuous function f is a Borel function.

Definition. A mapping f : M → Rⁿ is called measurable if, for all c₁, c₂, ..., cₙ ∈ R, the set
{x ∈ M : f₁(x) ≤ c₁, f₂(x) ≤ c₂, ..., fₙ(x) ≤ cₙ}
is measurable. Here f_k is the k-th component of f.
In other words, consider an infinite box in Rⁿ of the form

B = (−∞, c₁] × (−∞, c₂] × ... × (−∞, cₙ],
and call it a special box. Then a mapping f : M → Rⁿ is measurable if, for any special box B, the preimage f⁻¹(B) is a measurable subset of M.

Lemma 2.1 If a mapping f : M → Rⁿ is measurable then, for any Borel set A ⊂ Rⁿ, the preimage f⁻¹(A) is a measurable set.

Proof. Let A be the family of all sets A ⊂ Rⁿ such that f⁻¹(A) is measurable. By hypothesis, A contains all special boxes. Let us prove that A is a σ-algebra (this follows also from Exercise 4). If A, B ∈ A then f⁻¹(A) and f⁻¹(B) are measurable, whence

f⁻¹(A \ B) = f⁻¹(A) \ f⁻¹(B) ∈ M,
whence A \ B ∈ A. Also, if {A_k}_{k=1}^N is a finite or countable sequence of sets from A then f⁻¹(A_k) is measurable for all k, whence

f⁻¹(⋃_{k=1}^N A_k) = ⋃_{k=1}^N f⁻¹(A_k) ∈ M,
which implies that ⋃_{k=1}^N A_k ∈ A. Finally, A contains R because R is the countable union of the intervals (−∞, n] where n ∈ N, and A contains ∅ = R \ R.
Hence, A is a σ-algebra containing all special boxes. It remains to show that A contains all the boxes in Rⁿ, which will imply that A contains all Borel sets in Rⁿ. In fact, it suffices to show that any box in Rⁿ can be obtained from special boxes by a countable sequence of set-theoretic operations.
Assume first n = 1 and consider the different types of intervals. If A = (−∞, a] then A ∈ A by hypothesis. Let A = (a, b] where a < b; then A = (−∞, b] \ (−∞, a] ∈ A. Let A = (a, b). Consider a strictly increasing sequence {b_k} such that b_k → b as k → ∞. Then (a, b_k] ∈ A by the previous case, and the identity

A = (a, b) = ⋃_{k=1}^∞ (a, b_k]
implies that A ∈ A.
Let A = [a, b). Consider a strictly increasing sequence {a_k}_{k=1}^∞ such that a_k → a as k → ∞. Then (a_k, b) ∈ A by the previous argument, and the set
A = [a, b) = ⋂_{k=1}^∞ (a_k, b)
is also in A. Finally, let A = [a, b]. Observing that A = R \ (−∞, a) \ (b, +∞), where all the terms belong to A, we conclude that A ∈ A.
Consider now the general case n > 1. We are given that A contains all boxes of the form
B = I₁ × I₂ × ... × Iₙ
where the I_k are special intervals, and we need to prove that A contains all boxes of this form with arbitrary intervals I_k. If I₁ is an arbitrary interval and I₂, ..., Iₙ are special intervals, then one shows that B ∈ A using the same argument as in the case n = 1, since I₁ can be obtained from special intervals by a countable sequence of set-theoretic operations, and the same sequence of operations can be applied to the product I₁ × I₂ × ... × Iₙ. Now let I₁ and I₂ be arbitrary intervals and I₃, ..., Iₙ be special. We know that if I₂ is special then B ∈ A. Obtaining an arbitrary interval I₂ from special intervals by a countable sequence of operations, we obtain that B ∈ A also for arbitrary I₁ and I₂. Continuing the same way, we obtain that B ∈ A if I₁, I₂, I₃ are arbitrary intervals while I₄, ..., Iₙ are special, etc. Finally, we allow all the intervals I₁, ..., Iₙ to be arbitrary.
Example. If f : M → R is a measurable function then the set
{x ∈ M : f(x) is irrational}
is measurable, because this set coincides with f⁻¹(Q^c), and Q^c is Borel since Q is Borel as a countable set.
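On a finite set the preimage definition of measurability can be tested mechanically; a small sketch (the space, the σ-algebra, and the functions below are hypothetical choices, not from the text):

```python
# Sketch: measurability as a preimage condition, on a tiny finite M.

M = frozenset({1, 2, 3, 4})
# σ-algebra generated by the partition {1,2} | {3,4}
sigma = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), M}

def is_measurable(f):
    # f is measurable iff every sublevel set {x : f(x) <= c} lies in sigma;
    # only the values of f matter, so it suffices to test c among them
    return all(frozenset(x for x in M if f(x) <= c) in sigma
               for c in set(f(x) for x in M))

g = {1: 0.0, 2: 0.0, 3: 7.0, 4: 7.0}.get    # constant on partition blocks
h = {1: 0.0, 2: 5.0, 3: 7.0, 4: 7.0}.get    # separates 1 from 2

print(is_measurable(g), is_measurable(h))   # True False
```

The function h fails because its sublevel set {h ≤ 0} = {1} is not in the σ-algebra, illustrating that measurability depends on the choice of M.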

Theorem 2.2 Let f1, ..., fn be measurable functions from M to R and let Φ be a Borel function from Rn to R. Then the function

F = Φ (f1, ..., fn) is measurable.

In other words, the composition of a Borel function with measurable functions is measurable. Note that the composition of two measurable functions need not be measurable.

Example. It follows from Theorem 2.2 that if f₁ and f₂ are two measurable functions on M then their sum f₁ + f₂ is also measurable. Indeed, consider the function Φ(x₁, x₂) = x₁ + x₂ in R², which is continuous and, hence, Borel. Then f₁ + f₂ = Φ(f₁, f₂), and this function is measurable by Theorem 2.2. A direct proof from the definition would be more difficult: it is not immediately clear how to reduce the set {f₁ + f₂ ≤ c} to measurable sets of the form {f₁ ≤ a} and {f₂ ≤ b}.
In the same way, the functions f₁f₂ and f₁/f₂ (provided f₂ ≠ 0) are measurable. Also, the functions max(f₁, f₂) and min(f₁, f₂) are measurable, etc.
Proof of Theorem 2.2. Consider the mapping f : M → Rⁿ whose components are the f_k. This mapping is measurable because, for any c ∈ Rⁿ, the set

{x ∈ M : f₁(x) ≤ c₁, ..., fₙ(x) ≤ cₙ} = {f₁ ≤ c₁} ∩ {f₂ ≤ c₂} ∩ ... ∩ {fₙ ≤ cₙ}
is measurable as the intersection of measurable sets. Let us show that F⁻¹(I) is a measurable set for any special interval I, which will prove that F is measurable. Indeed, since F(x) = Φ(f(x)), we obtain that

F⁻¹(I) = {x ∈ M : F(x) ∈ I} = {x ∈ M : Φ(f(x)) ∈ I} = {x ∈ M : f(x) ∈ Φ⁻¹(I)} = f⁻¹(Φ⁻¹(I)).
Since Φ⁻¹(I) is a Borel set, we obtain by Lemma 2.1 that f⁻¹(Φ⁻¹(I)) is measurable, which proves that F⁻¹(I) is measurable.

Example. If A1,A2,...,An is a finite sequence of measurable sets then the function

f = c₁ 1_{A₁} + c₂ 1_{A₂} + ... + cₙ 1_{Aₙ}

is measurable (where the c_i are constants). Indeed, each of the functions 1_{A_i} is measurable, whence the claim follows upon application of Theorem 2.2 with the function

Φ (x)=c1x1 + ... + cnxn.
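The example above can be illustrated concretely; a minimal sketch (the sets A₁, A₂ and the constants are arbitrary choices):

```python
# Sketch: f = c1*1_{A1} + c2*1_{A2} obtained by composing the Borel function
# Φ(x1, x2) = c1*x1 + c2*x2 with the measurable indicator functions.

A1 = {1, 2, 3}
A2 = {3, 4}
c1, c2 = 2.0, 5.0

def indicator(A):
    return lambda x: 1.0 if x in A else 0.0

def phi(x1, x2):
    return c1 * x1 + c2 * x2          # linear, hence continuous and Borel

f = lambda x: phi(indicator(A1)(x), indicator(A2)(x))

values = [f(x) for x in range(6)]     # evaluate at x = 0, 1, ..., 5
print(values)                         # [0.0, 2.0, 2.0, 7.0, 5.0, 0.0]
```

On the overlap A₁ ∩ A₂ = {3} the two contributions add, giving c₁ + c₂ = 7, exactly as the formula for f predicts.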

2.2 Sequences of measurable functions
As before, let M be a non-empty set and M be a σ-algebra on M.
Definition. We say that a sequence {fₙ}_{n=1}^∞ of functions on M converges to a function f pointwise, and write fₙ → f, if fₙ(x) → f(x) as n → ∞ for any x ∈ M.

Theorem 2.3 Let fn n∞=1 be a sequence of measurable functions that converges pointwise to a function f.Then{ }f is measurable, too.

Proof. Fix some real c. Using the definition of a limit and the hypothesis that fₙ(x) → f(x) as n → ∞, we obtain that the inequality f(x) ≤ c is equivalent to the following condition: for any k ∈ N there is m ∈ N such that, for all n ≥ m,
fₙ(x) < c + 1/k.

∞ ∞ ∞ 1 f (x) c = fn (x)

Proof. Indeed, this is a particular case of Theorem 2.3 with M = Mₙ (for Lebesgue measurable functions) and with M = Bₙ (for Borel functions).
Example. Let us give an alternative proof of the fact that any continuous function f on R is Borel. Indeed, consider first a function of the form

g(x) = ∑_{k=1}^∞ c_k 1_{I_k}(x),
where {I_k} is a disjoint sequence of intervals. Then

g(x) = lim_{N→∞} ∑_{k=1}^N c_k 1_{I_k}(x)
and, hence, g(x) is Borel as the limit of Borel functions. Now fix some n ∈ N and consider the following sequence of intervals:
I_k = [k/n, (k+1)/n), where k ∈ Z,

so that R = ⨆_{k∈Z} I_k. Also, consider the function
gₙ(x) = ∑_{k∈Z} f(k/n) 1_{I_k}(x).
By the continuity of f, for any x ∈ R, we have gₙ(x) → f(x) as n → ∞. Hence, we conclude that f is Borel as the pointwise limit of Borel functions.
So far we have only assumed that M is a σ-algebra of subsets of M. Now assume that there is also a measure μ on M.
Definition. We say that a measure μ is complete if any subset of a set of measure 0 belongs to M. That is, if A ∈ M and μ(A) = 0 then every subset A′ of A is also in M (and, hence, μ(A′) = 0).
As we know, if μ is a σ-finite measure initially defined on some ring R, then by the Carathéodory extension theorems (Theorems 1.7 and 1.8) μ can be extended to a measure on a σ-algebra M of measurable sets, which contains all null sets. It follows that if μ(A) = 0 then A is a null set and, by Theorem 1.10, any subset of A is also a null set and, hence, is measurable. Therefore, any measure μ that is constructed by the Carathéodory extension theorems is automatically complete. In particular, the Lebesgue measure λₙ with the domain Mₙ is complete.
In general, a measure does not have to be complete. It is possible to show that the Lebesgue measure λₙ restricted to Bₙ is no longer complete (this means that if A is a Borel set of measure 0 in Rⁿ then not every subset of A is necessarily Borel). On the other hand, every measure can be completed by adding to its domain the null sets (see Exercise 36).
Assume in the sequel that μ is a complete measure defined on a σ-algebra M of subsets of M. Then we use the term null set as synonymous with a set of measure 0.
Definition. We say that two functions f, g : M → R are equal almost everywhere, and write f = g a.e., if the set {x ∈ M : f(x) ≠ g(x)} has measure 0. In other words, there is a set N of measure 0 such that f = g on M \ N.
More generally, the term "almost everywhere" is used to indicate that some property holds for all points x ∈ M \ N where μ(N) = 0.
Claim 1. The relation f = g a.e. is an equivalence relation.
Proof.
We need to prove three properties that characterize the equivalence relations:

1. f = f a.e.. Indeed, the set f (x) = f (x) is empty and, hence, is a null set. { 6 } 2. f = g a.e.is equivalent to g = f a.e.,whichisobvious.

3. f = g a.e.and g = h a.e. imply f = h a.e. Indeed, we have

{f ≠ h} ⊂ {f ≠ g} ∪ {g ≠ h},
whence we conclude that the set {f(x) ≠ h(x)} is a null set as a subset of the union of two null sets.

Claim 2 If f is measurable and g = f a.e. then g is also measurable.

Proof. Fix some real c and prove that the set {g ≤ c} is measurable. Observe that
N := {f ≤ c} △ {g ≤ c} ⊂ {f ≠ g}.
Indeed, x ∈ N if x belongs to exactly one of the sets {f ≤ c} and {g ≤ c}. If, for example, x belongs to the first one and does not belong to the second one, then f(x) ≤ c and g(x) > c, whence f(x) ≠ g(x). Since {f ≠ g} is a null set, the set N is also a null set. Then we have
{g ≤ c} = {f ≤ c} △ N,
which implies that {g ≤ c} is measurable.
Definition. We say that a sequence of functions fₙ on M converges to a function f almost everywhere, and write fₙ → f a.e., if the set {x ∈ M : fₙ(x) ↛ f(x)} is a null set. In other words, there is a set N of measure 0 such that fₙ → f pointwise on M \ N.

Theorem 2.4 If {fₙ} is a sequence of measurable functions and fₙ → f a.e., then f is also a measurable function.

Proof. Consider the set

N = {x ∈ M : fₙ(x) ↛ f(x)},
which has measure 0. Redefine fₙ(x) for x ∈ N by setting fₙ(x) = 0. Since we have changed fₙ on a null set, the new function fₙ is also measurable. Then the new sequence {fₙ(x)} converges for all x ∈ M, because fₙ(x) → f(x) for all x ∈ M \ N by hypothesis, and fₙ(x) → 0 for all x ∈ N by construction. By Theorem 2.3, the limit function is measurable, and since f is equal to the limit function almost everywhere, f is also measurable.
Definition. We say that a sequence of functions {fₙ} on a set M converges to a function f uniformly on M, and write fₙ ⇒ f on M, if

sup_{x∈M} |fₙ(x) − f(x)| → 0 as n → ∞.
We have obviously the following relations between the convergences:

fₙ ⇒ f  ⟹  fₙ → f pointwise  ⟹  fₙ → f a.e.  (2.1)

In general, the converse of both implications is false. For example, let us show that pointwise convergence does not imply uniform convergence if the set M is infinite. Indeed, let {x_k}_{k=1}^∞ be a sequence of distinct points in M and set f_k = 1_{x_k}. Then, for any point x ∈ M, f_k(x) = 0 for large enough k, which implies that f_k → 0 pointwise. On the other hand, sup |f_k| = 1, so that f_k ⇏ 0.
For the second example, let {f_k} be any sequence of functions that converges pointwise to f. Define f̃ as an arbitrary modification of f on a non-empty set of measure 0. Then still f_k → f̃ a.e. while f_k does not converge to f̃ pointwise.
Surprisingly enough, convergence a.e. still implies uniform convergence, but on a slightly smaller set, as is stated in the following theorem.
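The first counterexample above can be checked numerically; a sketch (only a finite window of the infinite set M is sampled, which is our simplification):

```python
# Sketch: f_k = 1_{{x_k}} with x_k = k converges to 0 pointwise
# but not uniformly, since sup |f_k| = 1 for every k.

def f(k):
    return lambda x: 1.0 if x == k else 0.0

M = range(1000)                      # finite window of M for the demo

# pointwise: at any fixed x, f_k(x) = 0 once k > x
assert all(f(k)(5) == 0.0 for k in range(6, 100))

# not uniform: the sup norm stays equal to 1 for every k
sups = [max(f(k)(x) for x in M) for k in range(10)]
print(sups)                          # [1.0, 1.0, ..., 1.0]
```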

Theorem 2.5 (Theorem of Egorov) Let μ be a complete finite measure on a σ-algebra M on a set M. Let {fₙ} be a sequence of measurable functions and assume that fₙ → f a.e. on M. Then, for any ε > 0, there is a set M_ε ⊂ M such that:

1. μ(M \ M_ε) < ε;
2. fₙ ⇒ f on M_ε.

In other words, by removing a set M \ M_ε of measure smaller than ε, one can achieve that on the remaining set M_ε the convergence is uniform.
Proof. The condition fₙ ⇒ f on M_ε (where M_ε is yet to be defined) means that for any m ∈ N there is n = n(m) such that, for all k ≥ n(m),
sup_{M_ε} |f_k − f| < 1/m.

Hence, for any x ∈ M_ε, for any m ≥ 1 and any k ≥ n(m),
|f_k(x) − f(x)| < 1/m,
which implies that

M_ε ⊂ ⋂_{m=1}^∞ ⋂_{k≥n(m)} {x ∈ M : |f_k(x) − f(x)| < 1/m}.
Now we can define M_ε to be the right hand side of this relation, but first we need to define n(m). For any couple of positive integers n, m, consider the set
A_{m,n} = {x ∈ M : |f_k(x) − f(x)| < 1/m for all k ≥ n}.
This can also be written in the form
A_{m,n} = ⋂_{k≥n} {x ∈ M : |f_k(x) − f(x)| < 1/m}.
By Theorem 2.4, the function f is measurable, and by Theorem 2.2 the function |f_k − f| is measurable, which implies that A_{m,n} is measurable as a countable intersection of measurable sets. Observe that, for any fixed m, the set A_{m,n} increases with n. Set

A_m = ⋃_{n=1}^∞ A_{m,n} = lim_{n→∞} A_{m,n},
that is, A_m is the monotone limit of the A_{m,n}. Then, by Exercise 9, we have

μ(A_m) = lim_{n→∞} μ(A_{m,n}),
whence
lim_{n→∞} μ(A_m \ A_{m,n}) = 0  (2.2)
(to pass to (2.2), we use that μ(A_m) < ∞, which is true by the hypothesis of the finiteness of the measure μ).
Claim. For any m, we have
μ(M \ A_m) = 0.  (2.3)
Indeed, by definition,

A_m = ⋃_{n=1}^∞ A_{m,n} = ⋃_{n=1}^∞ ⋂_{k≥n} {x ∈ M : |f_k(x) − f(x)| < 1/m},
which means that x ∈ A_m if and only if there is n such that, for all k ≥ n,
|f_k(x) − f(x)| < 1/m.
In particular, if f_k(x) → f(x) for some point x, then this condition is satisfied, so that this point x is in A_m. By hypothesis, f_k(x) → f(x) for almost all x, which implies (2.3).
It follows from (2.2) and (2.3) that

lim_{n→∞} μ(M \ A_{m,n}) = 0,
which implies that there is n = n(m) such that
μ(M \ A_{m,n(m)}) < ε/2^m.  (2.4)
Set
M_ε = ⋂_{m=1}^∞ A_{m,n(m)} = ⋂_{m=1}^∞ ⋂_{k≥n(m)} {x ∈ M : |f_k(x) − f(x)| < 1/m}  (2.5)
and show that the set M_ε satisfies the required conditions.
1. Proof of μ(M \ M_ε) < ε. Observe that by (2.5)
μ(M \ M_ε) = μ(⋃_{m=1}^∞ (M \ A_{m,n(m)})) ≤ ∑_{m=1}^∞ μ(M \ A_{m,n(m)}) < ∑_{m=1}^∞ ε/2^m = ε.
Here we have used the subadditivity of measure and (2.4).
2. Proof of fₙ ⇒ f on M_ε. Indeed, for any x ∈ M_ε, we have by (2.5) that, for any m ≥ 1 and any k ≥ n(m),
|f_k(x) − f(x)| < 1/m.
Hence, taking sup in x ∈ M_ε, we obtain that, for all k ≥ n(m),
sup_{M_ε} |f_k − f| ≤ 1/m,
which implies that sup_{M_ε} |f_k − f| → 0 and, hence, f_k ⇒ f on M_ε.
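Egorov's theorem can be illustrated numerically with the standard example fₙ(x) = xⁿ on [0, 1) (our choice, not from the text): the convergence to 0 is pointwise but not uniform, yet removing the interval (1 − ε, 1) of measure ε makes it uniform:

```python
# Sketch: sup of f_n(x) = x**n over [0, 1) stays near 1, but over
# M_eps = [0, 1 - eps] it is (1 - eps)**n, which tends to 0.

def sup_on(fn, a, b, grid=10_000):
    # approximate sup of fn over [a, b] on a uniform grid
    return max(fn(a + (b - a) * i / grid) for i in range(grid + 1))

eps = 0.1
for n in (10, 100, 1000):
    fn = lambda x, n=n: x ** n
    full = sup_on(fn, 0.0, 0.999)        # sup over almost all of [0, 1)
    cut = sup_on(fn, 0.0, 1.0 - eps)     # sup over M_eps
    print(n, round(full, 3), round(cut, 6))

# the sup over [0, 1 - eps] tends to 0, while the full sup stays near 1
```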

2.3 The Lebesgue integral for finite measures
Let M be an arbitrary set, M a σ-algebra on M, and μ a complete measure on M. We are going to define the notion of the integral ∫_M f dμ for an appropriate class of functions f. We will first do this for a finite measure, that is, assuming that μ(M) < ∞, and then extend it to σ-finite measures. Hence, assume here that μ is finite.

2.3.1 Simple functions

Definition. Afunctionf : M R is called simple if it is measurable and the set of its values is at most countable. →

Let ak be the sequence of distinct values of a f. Consider the sets { }

Ak = x M : f (x)=ak (2.6) { ∈ } which are measurable, and observe that

M = Ak. (2.7) k F Clearly, we have the identity

f(x) = ∑_k a_k 1_{A_k}(x)  (2.8)
for all x ∈ M. Note that any sequence {A_k} of disjoint measurable sets satisfying (2.7) and any sequence of distinct reals {a_k} determine by (2.8) a function f(x) that satisfies also (2.6), which means that all simple functions have the form (2.8).
Definition. If f ≥ 0 is a simple function then define the Lebesgue integral ∫_M f dμ by
∫_M f dμ := ∑_k a_k μ(A_k).  (2.9)
The value on the right hand side of (2.9) is always defined as the sum of a non-negative series, and can be either a non-negative real number or infinity. Note also that, in order to be able to define μ(A_k), the sets A_k must be measurable, which is equivalent to the measurability of f.
For example, if f ≡ C for some constant C ≥ 0 then we can write f = C·1_M and by (2.9)
∫_M f dμ = C μ(M).
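Definition (2.9) is easy to evaluate on a finite measure space; a minimal sketch (the space and the point-mass measure are arbitrary choices):

```python
# Sketch of (2.9): the integral of a non-negative simple function
# f = sum_k a_k 1_{A_k}, computed as sum_k a_k * mu(A_k).

M = {"a", "b", "c", "d"}
mu = {"a": 0.5, "b": 0.25, "c": 0.15, "d": 0.1}   # a finite measure via point masses

f = {"a": 2.0, "b": 2.0, "c": 7.0, "d": 0.0}      # simple: finitely many values

def integral_simple(f, mu):
    total = 0.0
    for a in set(f.values()):                     # distinct values a_k
        A_k = [x for x in f if f[x] == a]         # A_k = {f = a_k}
        total += a * sum(mu[x] for x in A_k)
    return total

print(integral_simple(f, mu))   # 2*0.75 + 7*0.15 + 0*0.1 = 2.55
```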

The expression ∫_M f dμ has the full title "the integral of f over M against the measure μ". The notation ∫_M f dμ should be understood as a whole, since we do not define what dμ means. This notation is traditionally used and has certain advantages. A modern shorter notation for the integral is μ(f), which reflects the idea that the measure μ induces a functional on functions, which is exactly the integral. We have defined so far this functional for simple functions, and then will extend it to a more general class. However, first we prove some properties of the integral of simple functions.

Lemma 2.6 (a) Let M = ⨆_k B_k, where {B_k} is a finite or countable sequence of measurable sets. Define a function f by
f = ∑_k b_k 1_{B_k},
where {b_k} is a sequence of non-negative reals, not necessarily distinct. Then

∫_M f dμ = ∑_k b_k μ(B_k).
(b) If f is a non-negative simple function then, for any real c ≥ 0, cf is also a non-negative simple function, and
∫_M cf dμ = c ∫_M f dμ
(if c = 0 and ∫_M f dμ = +∞ then we use the convention 0·∞ = 0).
(c) If f, g are non-negative simple functions then f + g is also simple and
∫_M (f + g) dμ = ∫_M f dμ + ∫_M g dμ.
(d) If f, g are simple functions and 0 ≤ f ≤ g then
∫_M f dμ ≤ ∫_M g dμ.

Proof. (a) Let aj be the sequence of all distinct values in bk ,thatis, aj is the sequence of all distinct{ } values of f.Set { } { }

Aj = x M : f (x)=aj . { ∈ } Then Aj = Bk k:bk=aj { } F and μ (Aj)= μ (Bk) ,

k:bk=aj { X } whence

fdμ = ajμ (Aj)= aj μ (Bk)= bkμ (Bk)= bkμ (Bk) . M j j k:bk=aj j k:bk=aj k Z X X { X } X { X } X

(b) Let f = k ak1Ak where k Ak = M.Then P F cf = cak1Ak k X whence by (a)

cfdμ = cakμ (Ak)=c akμ (Ak) . M k k Z X X 48 (c) Let f = k ak1Ak where k Ak = M and g = j bj1Bj where j Bj = M.Then

P F M = (Ak Bj)P F k,j ∩ F andonthesetAk Bj we have f = ak and g = bj so that f + g = ak + bj. Hence, f + g is a simple function,∩ and by part (a) we obtain

(f + g) dμ = (ak + bj) μ (Ak Bj) . ∩ M k,j Z X Also, applying the same to functions f and g,wehave

fdμ = akμ (Ak Bj) ∩ M k,j Z X and

gdμ = bjμ (Ak Bj) , ∩ M k,j Z X whence the claim follows. (d) Clearly, g f is a non-negative simple functions so that by (c) − gdμ = (g f) dμ + fdμ fdμ. − ≥ ZM ZM ZM ZM

2.3.2 Positive measurable functions

Definition. Let f 0 be any measurable function on M. The Lebesgue integral of f is defined by ≥

fdμ = lim fndμ n ZM →∞ ZM where fn is any sequence of non-negative simple functions such that fn ⇒ f on M as n {. } →∞ To justify this definition, we prove the following statement.

Lemma 2.7 For any non-negative measurable functions f, there is a sequence of non- negative simple functions fn such that fn ⇒ f on M. Moreover, for any such sequence the limit { }

lim fndμ n →∞ ZM exists and does not depend on the choice of the sequence fn as long as fn f on M. { } ⇒ Proof. Fix the index n and, for any non-negative k, consider the set

k k +1 Ak,n = x M : f (x) < . ∈ n ≤ n ½ ¾

49 Clearly, M = k∞=0 Ak,n.Define function fn by k F f = 1 , n n Ak,n k X k that is, fn = n on Ak,n.Thenfn is a non-negative simple function and, on a set Ak,n,we have k +1 k 1 0 f fn < = ≤ − n − n n so that 1 sup f fn . M | − | ≤ n It follows that fn ⇒ f on M. Let now fn be any sequence of non-negative simple functions such that fn f.Let { } ⇒ us show that limn fndμ exists. The condition fn f on M implies that →∞ M ⇒

R sup fn fm 0 as n, m . M | − | → →∞

Assume that n, m are so big that C := sup fn fm is finite. Writing M | − | fm fn + C ≤ and noting that all the functions fm,fn,C are simple, we obtain by Lemma 2.6

fmdμ fndμ + Cdμ ≤ ZM ZM ZM = fndμ +sup fn fm μ (M) . M | − | ZM If fmdμ =+ for some m, then implies that that fndμ =+ for all large enough M ∞ M ∞ n, whence it follows that limn M fndμ =+ .If M fmdμ < for all large enough m,R then it follows that →∞ ∞ R ∞ R R

fmdμ fndμ sup fn fm μ (M) − ≤ M | − | ¯ZM ZM ¯ ¯ ¯ which implies that the¯ numerical sequence ¯ ¯ ¯

fndμ ½ZM ¾ is Cauchy and, hence, has a limit. Let now fn and gn be two sequences of non-negative simple functions such that { } { } fn ⇒ f and gn ⇒ f. Let us show that

lim fndμ =lim gndμ. (2.10) n n →∞ ZM →∞ ZM Indeed, consider a mixed sequence f1,g1,f2,g2, ... . Obviously, this sequence converges uniformly to f. Hence, by the previous{ part of the} proof, the sequence of integrals

f1dμ, g1dμ, f2dμ, g2dμ, ... ZM ZM ZM ZM 50 converges, which implies (2.10).

Hence, if f is a non-negative measurable function then the integral M fdμ is well- defined and takes value in [0, + ]. ∞ R Theorem 2.8 (a) (Linearity of the integral).Iff is a non-negative measurable function and c 0 is a real then ≥ cfdμ = c fdμ. ZM ZM If f and g aretwonon-negativemeasurablefunctions,then

(f + g) dμ = fdμ+ gdμ. ZM ZM ZM (b) (Monotonicity of the integral) If f g are non-negative measurable function then ≤ fdμ gdμ. ≤ ZM ZM

Proof. (a) By Lemma 2.7, there are sequences fn and gn of non-negative simple { } { } functions such that fn ⇒ f and gn ⇒ g on M.Thencfn ⇒ cf and by Lemma 2.6,

cfdμ = lim cfndμ =limc fndμ = c fdμ. n n ZM →∞ ZM →∞ ZM ZM Also, we have fn + gn ⇒ f + g and, by Lemma 2.6,

(f + g) dμ = lim (fn + gn) dμ = lim fndμ + gndμ = fdμ+ gdμ. n n ZM →∞ ZM →∞ µZM ZM ¶ ZM ZM (b) If f g then g f is a non-negative measurable functions, and g =(g f)+f whence by (≤a) − − gdμ = (g f) dμ + fdμ fdμ. − ≥ ZM ZM ZM ZM

Example. Let M =[a, b] where a

R a = x0

The lower Darboux sum is defined by

n

S (f,p)= mi (xi xi 1) , ∗ − i=1 − X where mi =inff. [xi 1,xi] −

51 By the property of the Riemann integral, we have

b f (x) dx = lim S (f,p) (2.11) m(p) 0 ∗ Za → where m (p)=maxi xi xi 1 is the mesh of the partition. | − − | Consider now a simple function Fp defined by

n

Fp = mi1[xi 1,xi). − i=1 X By Lemma 2.6, n

Fpdμ = miμ ([xi 1,xi)) = S (f,p) . − ∗ [a,b] i=1 Z X On the other hand, by the uniform continuity of function f,wehaveFp ⇒ f as m (p) 0, which implies by the definition of the Lebesgue integral that →

fdμ = lim Fpdμ =limS (f,p) . m(p) 0 m(p) 0 ∗ Z[a,b] → Z[a,b] → Comparing with (2.11) we obtain the identity

b fdμ= f (x) dx. Z[a,b] Za

Example. Consider on [0, 1] the Dirichlet function

1,x Q, f (x)= ∈ 0,x/Q. ½ ∈ This function is not Riemann integrable because any upper Darboux sum is 1 and the lower Darboux sum is 0. But the function f is non-negative and simple since it can be represented in the form f =1A where A = Q [0, 1] is a measurable set. Therefore, ∩ the Lebesgue integral [0,1] fdμis defined. Moreover, since A is a countable set, we have μ (A)=0and, hence, fdμ=0. R[0,1] R 2.3.3 Integrable functions To define the integral of a signed function f on M, let us introduce the notation

f (x) , if f (x) 0 0, if f (x) 0 f+ (x)= ≥ and f = ≥ . 0, if f (x) < 0 − f (x) , if f (x) < 0 ½ ½ −

The function f+ is called the positive part of f and f is called the negative part of f. − Note that f+ and f are non-negative functions, −

f = f+ f and f = f+ + f . − − | | −

52 It follows that f + f f f f+ = | | and f = | | − . 2 − 2

Also, if f is measurable then both f+ and f are measurable. − Definition. A measurable function f is called (Lebesgue) integrable if

f+dμ < and f dμ < . ∞ − ∞ ZM ZM For any integrable function, define its Lebesgue integral by

fdμ := f+dμ f dμ. − − ZM ZM ZM Note that the integral fdμ takesvaluesin( , + ). M −∞ ∞ In particular, if f 0 then f+ = f, f =0and f is integrable if and only if ≥R − fdμ < . ∞ ZM Lemma 2.9 (a) If f is a measurable function then the following conditions are equivalent:

1. f is integrable,

2. f+ and f are integrable, − 3. f is integrable. | | (b) If f is integrable then fdμ f dμ. ≤ | | ¯ZM ¯ ZM ¯ ¯ ¯ ¯ Proof. (a) The equivalence 1.¯ 2. holds¯ by definition. Since f = f+ +f , it follows ⇔ | | − that M f dμ < if and only if M f+dμ < and M f dμ < ,thatis,2. 3. (b) We| | have ∞ ∞ − ∞ ⇔ R R R

fdμ = f+dμ f dμ f+dμ + f dμ = f dμ. − − ≤ − | | ¯ZM ¯ ¯ZM ZM ¯ ZM ZM ZM ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ ¯ Example. Letusshowthatiff isacontinuousfunctiononaninterval[a, b] and μ is theLebesguemeasureon[a, b] then f is Lebesgue integrable. Indeed, f+ and f_ are non- negative and continuous so that they are Lebesgue integrable by the previous Example. Hence, f is also Lebesgue integrable. Moreover, we have

b b b fdμ= f+ dμ f dμ = f+ dμ f dμ = fdμ − − − − Z[a,b] Z[a,b] Z[a,b] Za Za Za so that the Riemann and Lebesgue integrals of f coincide.

53 Theorem 2.10 (a) (Linearity of integral) If f is an integrable then, for any real c, cf is also integrable and cfdμ = c fdμ. ZM ZM If f,g areintegrablethenf + g is also integrable and

(f + g) dμ = fdμ+ gdμ. (2.12) ZM ZM ZM (b) (Monotonicity of integral) If f,g are integrable and f g then fdμ gdμ. ≤ M ≤ M R R Proof. (a) If c =0then there is nothing to prove. Let c>0.Then(cf)+ = cf+ and (cf) = cf whence by Lemma 2.8 − −

cfdμ = cf+dμ cf dμ = c f+dμ c f dμ = c fdμ. − − − − ZM ZM ZM ZM ZM ZM

If c<0 then (cf)+ = c f and (cf) = c f+ whence | | − − | |

cfdμ = c f dμ c f+dμ = c fdμ = c fdμ. | | − − | | − | | ZM ZM ZM ZM ZM

Note that (f + g)+ is not necessarily equal to f++g+ sothattheprevioussimpleargument does not work here. Using the triangle inequality

f + g f + g , | | ≤ | | | | we obtain f + g dμ f dμ + g dμ < , | | ≤ | | | | ∞ ZM ZM ZM which implies that the function f + g is integrable. To prove (2.12), observe that

f+ + g+ f g = f + g =(f + g)+ (f + g) − − − − − − whence

f+ + g+ +(f + g) =(f + g)+ + f + g . − − − Since these all are non-negative measurable (and even integrable) functions, we obtain by Theorem 2.8 that

f+dμ + g+dμ + (f + g) dμ = (f + g)+ dμ + f dμ + g dμ. − − − ZM ZM ZM ZM ZM ZM It follows that

(f + g) dμ = (f + g)+ dμ (f + g) dμ − − ZM ZM ZM = f+dμ + g+dμ f dμ g dμ − − − − ZM ZM ZM ZM = fdμ+ gdμ. ZM ZM 54 (b) Indeed, using the identity g =(g f)+f and that g f 0,weobtainbypart (a) − − ≥ gdμ= (g f) dμ + fdμ fdμ. − ≥ ZM ZM ZM ZM

Example. Let us show that, for any integrable function $f$,
$$(\inf f)\,\mu(M) \le \int_M f\,d\mu \le (\sup f)\,\mu(M).$$
Indeed, consider the function $g(x)\equiv \sup f$, so that $f\le g$ on $M$. Since $g$ is a constant function, we have
$$\int_M f\,d\mu \le \int_M g\,d\mu = (\sup f)\,\mu(M).$$
In the same way one proves the lower bound.

The following statement shows the connection of integration to the relation $f=g$ a.e.

Theorem 2.11 (a) If $f=0$ a.e. then $f$ is integrable and $\int_M f\,d\mu = 0$.
(b) If $f$ is integrable, $f\ge 0$ a.e. and $\int_M f\,d\mu = 0$, then $f=0$ a.e.

Proof. (a) Since the constant function $0$ is measurable, the function $f$ is also measurable. It suffices to prove that $f_+$ and $f_-$ are integrable and $\int_M f_+\,d\mu = \int_M f_-\,d\mu = 0$. Note that $f_+ = 0$ a.e. and $f_- = 0$ a.e. Hence, renaming $f_+$ or $f_-$ to $f$, we can assume from the beginning that $f\ge 0$ and $f=0$ a.e., and need to prove that $\int_M f\,d\mu = 0$. We have by definition
$$\int_M f\,d\mu = \lim_{n\to\infty}\int_M f_n\,d\mu,$$
where $f_n$ is the simple function defined by
$$f_n(x) = \sum_{k=0}^{\infty}\frac{k}{n}\,1_{A_{k,n}}, \qquad\text{where}\qquad A_{k,n} = \Big\{x\in M : \frac{k}{n}\le f(x) < \frac{k+1}{n}\Big\}.$$
The set $A_{k,n}$ has measure $0$ if $k>0$, whence it follows that
$$\int_M f_n\,d\mu = \sum_{k=0}^{\infty}\frac{k}{n}\,\mu(A_{k,n}) = 0.$$
Hence, also $\int_M f\,d\mu = 0$.
(b) Since $f_- = 0$ a.e., we have by part (a) that $\int_M f_-\,d\mu = 0$, which implies that also $\int_M f_+\,d\mu = 0$. It suffices to prove that $f_+ = 0$ a.e. Renaming $f_+$ to $f$, we can assume from the beginning that $f\ge 0$ on $M$, and need to prove that $\int_M f\,d\mu = 0$ implies $f=0$ a.e. Assume on the contrary that $f=0$ a.e. is not true, that is, the set $\{f>0\}$ has positive measure. For any $k\in\mathbb{N}$, set $A_k = \{x\in M : f(x) > \frac{1}{k}\}$ and observe that
$$\{f>0\} = \bigcup_{k=1}^{\infty} A_k.$$
It follows that one of the sets $A_k$ must have a positive measure. Fix this $k$ and consider the simple function
$$g(x) = \begin{cases} \frac{1}{k}, & x\in A_k,\\ 0, & \text{otherwise},\end{cases}$$
that is, $g = \frac{1}{k}1_{A_k}$. It follows that $g$ is measurable and $0\le g\le f$. Hence,
$$\int_M f\,d\mu \ge \int_M g\,d\mu = \frac{1}{k}\,\mu(A_k) > 0,$$
which contradicts the hypothesis.

Corollary. If $g$ is an integrable function and $f$ is a function such that $f=g$ a.e., then $f$ is integrable and $\int_M f\,d\mu = \int_M g\,d\mu$.

Proof. Consider the function $f-g$, which vanishes a.e. By the previous theorem, $f-g$ is integrable and $\int_M (f-g)\,d\mu = 0$. Then the function $f = (f-g)+g$ is also integrable and
$$\int_M f\,d\mu = \int_M (f-g)\,d\mu + \int_M g\,d\mu = \int_M g\,d\mu,$$
which was to be proved.

2.4 Integration over subsets

If $A\subset M$ is a non-empty measurable subset of $M$ and $f$ is a measurable function on $A$ then, restricting measure $\mu$ to $A$, we obtain the notion of the Lebesgue integral of $f$ over the set $A$, which is denoted by
$$\int_A f\,d\mu.$$
If $f$ is a measurable function on $M$ then the integral of $f$ over $A$ is defined by
$$\int_A f\,d\mu = \int_A f|_A\,d\mu.$$

Claim. If $f$ is either a non-negative measurable function on $M$ or an integrable function on $M$ then
$$\int_A f\,d\mu = \int_M f 1_A\,d\mu. \qquad (2.13)$$

Proof. Note that $(f1_A)|_A = f|_A$, so that we can rename $f1_A$ to $f$ and, hence, assume in the sequel that $f=0$ on $M\setminus A$. Then (2.13) amounts to
$$\int_A f\,d\mu = \int_M f\,d\mu. \qquad (2.14)$$
Assume first that $f$ is a simple non-negative function and represent it in the form
$$f = \sum_k b_k 1_{B_k}, \qquad (2.15)$$
where the reals $\{b_k\}$ are distinct and the sets $\{B_k\}$ are disjoint. If for some $k$ we have $b_k = 0$ then this value of the index $k$ can be removed from the sequence $\{b_k\}$ without violating (2.15). Therefore, we can assume that $b_k\ne 0$. Then $f\ne 0$ on $B_k$, and it follows that $B_k\subset A$. Hence, considering the identity (2.15) on both sets $A$ and $M$, we obtain
$$\int_A f\,d\mu = \sum_k b_k\,\mu(B_k) = \int_M f\,d\mu.$$
If $f$ is an arbitrary non-negative measurable function then, approximating it by a sequence of simple functions, we obtain the same result. Finally, if $f$ is an integrable function then, applying the previous claim to $f_+$ and $f_-$, we obtain again (2.14).

The identity (2.13) is frequently used as the definition of $\int_A f\,d\mu$. It has the advantage that it allows one to define this integral also for $A=\emptyset$. Indeed, in this case $f1_A = 0$ and the integral is $0$. Hence, we take by definition that also $\int_A f\,d\mu = 0$ when $A=\emptyset$, so that (2.13) remains true for empty $A$ as well.

Theorem 2.12 Let $\mu$ be a finite complete measure on a $\sigma$-algebra $\mathcal{M}$ on a set $M$. Fix a non-negative measurable function $f$ on $M$ and, for any non-empty set $A\in\mathcal{M}$, define the functional $\nu(A)$ by
$$\nu(A) = \int_A f\,d\mu. \qquad (2.16)$$
Then $\nu$ is a measure on $\mathcal{M}$.

Example. It follows that any non-negative continuous function $f$ on the interval $(0,1)$ defines a new measure on this interval via (2.16). For example, if $f = \frac{1}{x}$ then, for any interval $[a,b]\subset (0,1)$, we have
$$\nu([a,b]) = \int_{[a,b]} f\,d\mu = \int_a^b\frac{dx}{x} = \ln\frac{b}{a}.$$
This measure is not finite since $\nu((0,1)) = \infty$.

Proof. Note that $\nu(A)\in [0,+\infty]$. We need to prove that $\nu$ is $\sigma$-additive, that is, if $\{A_n\}$ is a finite or countable sequence of disjoint measurable sets and $A = \bigsqcup_n A_n$ then
$$\int_A f\,d\mu = \sum_n\int_{A_n} f\,d\mu.$$
Assume first that the function $f$ is simple, say $f = \sum_k b_k 1_{B_k}$ for a sequence $\{B_k\}$ of disjoint sets. Then
$$1_A f = \sum_k b_k 1_{A\cap B_k},$$
whence
$$\int_A f\,d\mu = \sum_k b_k\,\mu(A\cap B_k)$$
and in the same way
$$\int_{A_n} f\,d\mu = \sum_k b_k\,\mu(A_n\cap B_k).$$
It follows that
$$\sum_n\int_{A_n} f\,d\mu = \sum_n\sum_k b_k\,\mu(A_n\cap B_k) = \sum_k b_k\,\mu(A\cap B_k) = \int_A f\,d\mu,$$
where we have used the $\sigma$-additivity of $\mu$.
For an arbitrary non-negative measurable $f$, find a simple non-negative function $g$ so that $0\le f-g\le\varepsilon$, for a given $\varepsilon>0$. Then we have
$$\int_A g\,d\mu \le \int_A f\,d\mu = \int_A g\,d\mu + \int_A (f-g)\,d\mu \le \int_A g\,d\mu + \varepsilon\mu(A).$$
In the same way,
$$\int_{A_n} g\,d\mu \le \int_{A_n} f\,d\mu \le \int_{A_n} g\,d\mu + \varepsilon\mu(A_n).$$
Adding up these inequalities and using the fact that, by the previous argument,
$$\sum_n\int_{A_n} g\,d\mu = \int_A g\,d\mu,$$
we obtain
$$\int_A g\,d\mu \le \sum_n\int_{A_n} f\,d\mu \le \sum_n\int_{A_n} g\,d\mu + \varepsilon\sum_n\mu(A_n) = \int_A g\,d\mu + \varepsilon\mu(A).$$
Hence, both $\int_A f\,d\mu$ and $\sum_n\int_{A_n} f\,d\mu$ belong to the interval $\big[\int_A g\,d\mu,\ \int_A g\,d\mu + \varepsilon\mu(A)\big]$, which implies that
$$\Big|\sum_n\int_{A_n} f\,d\mu - \int_A f\,d\mu\Big| \le \varepsilon\mu(A).$$
Letting $\varepsilon\to 0$, we obtain the required identity.

Corollary. Let $f$ be a non-negative measurable function or a (signed) integrable function on $M$.
(a) ($\sigma$-additivity of the integral) If $\{A_n\}$ is a sequence of disjoint measurable sets and $A = \bigsqcup_n A_n$ then
$$\int_A f\,d\mu = \sum_n\int_{A_n} f\,d\mu. \qquad (2.17)$$
(b) (Continuity of the integral) If $\{A_n\}$ is a monotone (increasing or decreasing) sequence of measurable sets and $A = \lim A_n$ then

$$\int_A f\,d\mu = \lim_{n\to\infty}\int_{A_n} f\,d\mu. \qquad (2.18)$$

Proof. If $f$ is non-negative measurable then (a) is equivalent to Theorem 2.12, and (b) follows from (a) because the continuity of the measure $\nu$ is equivalent to its $\sigma$-additivity (see Exercise 9). Consider now the case when $f$ is integrable, that is, both $\int_M f_+\,d\mu$ and $\int_M f_-\,d\mu$ are finite. Then, using the $\sigma$-additivity of the integral for $f_+$ and $f_-$, we obtain
$$\int_A f\,d\mu = \int_A f_+\,d\mu - \int_A f_-\,d\mu = \sum_n\int_{A_n} f_+\,d\mu - \sum_n\int_{A_n} f_-\,d\mu = \sum_n\Big(\int_{A_n} f_+\,d\mu - \int_{A_n} f_-\,d\mu\Big) = \sum_n\int_{A_n} f\,d\mu,$$
which proves (2.17). Finally, since (2.18) holds for the non-negative functions $f_+$ and $f_-$, it follows that (2.18) holds also for $f = f_+ - f_-$.

2.5 The Lebesgue integral for σ-finite measure

Let us now extend the notion of the Lebesgue integral from finite measures to $\sigma$-finite measures. Let $\mu$ be a $\sigma$-finite measure on a $\sigma$-algebra $\mathcal{M}$ on a set $M$. By definition, there is a sequence $\{B_k\}_{k=1}^{\infty}$ of disjoint measurable sets in $M$ such that $\mu(B_k)<\infty$ and $\bigsqcup_k B_k = M$. As before, denote by $\mu_{B_k}$ the restriction of measure $\mu$ to the measurable subsets of $B_k$, so that $\mu_{B_k}$ is a finite measure on $B_k$.

Definition. For any non-negative measurable function $f$ on $M$, set
$$\int_M f\,d\mu := \sum_k\int_{B_k} f\,d\mu_{B_k}. \qquad (2.19)$$
The function $f$ is called integrable if $\int_M f\,d\mu < \infty$.
If $f$ is a signed measurable function then $f$ is called integrable if both $f_+$ and $f_-$ are integrable. If $f$ is integrable then set
$$\int_M f\,d\mu := \int_M f_+\,d\mu - \int_M f_-\,d\mu.$$

Claim. The definition of the integral in (2.19) is independent of the choice of the sequence $\{B_k\}$.

Proof. Indeed, let $\{C_j\}$ be another sequence of disjoint measurable subsets of $M$ such that $M = \bigsqcup_j C_j$. Using Theorem 2.12 for the finite measures $\mu_{B_k}$ and $\mu_{C_j}$ (that is, the $\sigma$-additivity of the integrals against these measures), as well as the fact that on the intersection $B_k\cap C_j$ the measures $\mu_{B_k}$ and $\mu_{C_j}$ coincide, we obtain
$$\sum_j\int_{C_j} f\,d\mu_{C_j} = \sum_j\sum_k\int_{C_j\cap B_k} f\,d\mu_{C_j} = \sum_k\sum_j\int_{C_j\cap B_k} f\,d\mu_{B_k} = \sum_k\int_{B_k} f\,d\mu_{B_k}.$$

If $A$ is a non-empty measurable subset of $M$ then the restriction of measure $\mu$ to $A$ is also $\sigma$-finite, since $A = \bigsqcup_k (A\cap B_k)$ and $\mu(A\cap B_k)<\infty$. Therefore, for any non-negative measurable function $f$ on $A$, its integral over the set $A$ is defined by
$$\int_A f\,d\mu = \sum_k\int_{A\cap B_k} f\,d\mu_{A\cap B_k}.$$
If $f$ is a non-negative measurable function on $M$ then it follows that
$$\int_A f\,d\mu = \int_M f 1_A\,d\mu$$
(which follows from the same identity for finite measures).

Most of the properties of integrals considered above for finite measures remain true for $\sigma$-finite measures: this includes the linearity, monotonicity, and additivity properties. More precisely, Theorems 2.8, 2.10, 2.11, 2.12 and Lemma 2.9 are true also for $\sigma$-finite measures, and the proofs are straightforward. For example, let us prove Theorem 2.12 for a $\sigma$-finite measure: if $f$ is a non-negative measurable function on $M$ then the functional
$$\nu(A) = \int_A f\,d\mu$$
defines a measure on the $\sigma$-algebra $\mathcal{M}$. In fact, we need only to prove that $\nu$ is $\sigma$-additive, that is, if $\{A_n\}$ is a finite or countable sequence of disjoint measurable subsets of $M$ and $A = \bigsqcup_n A_n$ then
$$\int_A f\,d\mu = \sum_n\int_{A_n} f\,d\mu.$$
Indeed, using the above sequence $\{B_k\}$, we obtain
$$\sum_n\int_{A_n} f\,d\mu = \sum_n\sum_k\int_{A_n\cap B_k} f\,d\mu_{A_n\cap B_k} = \sum_k\sum_n\int_{A_n\cap B_k} f\,d\mu_{B_k} = \sum_k\int_{A\cap B_k} f\,d\mu_{B_k} = \int_A f\,d\mu,$$
where we have used that the integral of $f$ against the measure $\mu_{B_k}$ is $\sigma$-additive (which is true by Theorem 2.12 for finite measures).

Example. Any non-negative continuous function $f(x)$ on $\mathbb{R}$ gives rise to a $\sigma$-finite measure $\nu$ on $\mathbb{R}$ defined by
$$\nu(A) = \int_A f\,d\mu,$$
where $\mu = \lambda_1$ is the Lebesgue measure and $A$ is any Lebesgue measurable subset of $\mathbb{R}$. Indeed, the fact that $\nu$ is a measure follows from the version of Theorem 2.12 proved above, and the $\sigma$-finiteness is obvious because $\nu(I)<\infty$ for any bounded interval $I$. If $F$ is a primitive of $f$, that is, $F' = f$, then, for any interval $I$ with endpoints $a<b$,

$$\nu(I) = \int_{[a,b]} f\,d\mu = \int_a^b f(x)\,dx = F(b) - F(a).$$
Alternatively, the measure $\nu$ can also be constructed as follows: first define the functional $\nu$ on intervals by
$$\nu(I) = F(b) - F(a),$$
prove that it is $\sigma$-additive, and then extend $\nu$ to measurable sets by the Carathéodory extension theorem. For example, taking
$$f(x) = \frac{1}{\pi}\,\frac{1}{1+x^2},$$
we obtain a finite measure on $\mathbb{R}$; moreover, in this case
$$\nu(\mathbb{R}) = \frac{1}{\pi}\int_{-\infty}^{+\infty}\frac{dx}{1+x^2} = \frac{1}{\pi}\big[\arctan x\big]_{-\infty}^{+\infty} = 1.$$
Hence, $\nu$ is a probability measure on $\mathbb{R}$.
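A quick numerical illustration of this example (ours, not from the notes): since $\frac{1}{\pi}\arctan x$ is a primitive of the density, the $\nu$-measure of any interval is available in closed form.

```python
# nu(A) = (1/pi) * integral over A of dx/(1+x^2) is a probability measure;
# on an interval (a, b) it equals (arctan b - arctan a)/pi.
import math

def nu(a, b):
    return (math.atan(b) - math.atan(a)) / math.pi

print(nu(-1e8, 1e8))   # nearly nu(R) = 1
print(nu(-1.0, 1.0))   # (pi/4 - (-pi/4))/pi = 1/2
```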

2.6 Convergence theorems

Let $\mu$ be a complete measure on a $\sigma$-algebra $\mathcal{M}$ on a set $M$. Considering a sequence $\{f_k\}$ of integrable (or non-negative measurable) functions on $M$, we will be concerned with the following question: assuming that $\{f_k\}$ converges to a function $f$ in some sense, say, pointwise or almost everywhere, when can one claim that
$$\int_M f_k\,d\mu \to \int_M f\,d\mu\ ?$$
The following example shows that in general this is not the case.

Example. Consider on the interval $(0,1)$ the sequence of functions $f_k = k\,1_{A_k}$ where $A_k = \big[\frac{1}{k},\frac{2}{k}\big]$. Clearly, $f_k(x)\to 0$ as $k\to\infty$ for any point $x\in (0,1)$. On the other hand, for $\mu = \lambda_1$, we have
$$\int_{(0,1)} f_k\,d\mu = k\,\mu(A_k) = 1 \not\to 0.$$
For positive results, we start with a simple observation.
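A small Python sketch of this counterexample (the helper `f` is ours): at any fixed point the values eventually vanish, while the integral stays equal to $1$.

```python
# f_k = k * 1_{[1/k, 2/k]} on (0, 1): pointwise limit 0, but integral 1.
def f(k, x):
    return k if 1.0 / k <= x <= 2.0 / k else 0

# pointwise: at x = 0.3 the "bump" [1/k, 2/k] slides past x and f_k(x) -> 0
x = 0.3
print([f(k, x) for k in (1, 2, 5, 10, 100)])   # [0, 0, 5, 0, 0]

# integrals: k * mu([1/k, 2/k]) = k * (1/k) = 1 for every k
integrals = [k * (2.0 / k - 1.0 / k) for k in (1, 2, 5, 10, 100)]
print(integrals)
```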

Lemma 2.13 Let $\mu$ be a finite complete measure. If $\{f_k\}$ is a sequence of integrable (or non-negative measurable) functions and $f_k\rightrightarrows f$ on $M$ then
$$\int_M f_k\,d\mu \to \int_M f\,d\mu.$$

Proof. Indeed, we have
$$\int_M f_k\,d\mu = \int_M f\,d\mu + \int_M (f_k - f)\,d\mu \le \int_M f\,d\mu + \int_M |f_k - f|\,d\mu \le \int_M f\,d\mu + \sup|f_k - f|\,\mu(M).$$
In the same way,
$$\int_M f_k\,d\mu \ge \int_M f\,d\mu - \sup|f_k - f|\,\mu(M).$$
Since $\sup|f_k - f|\,\mu(M)\to 0$ as $k\to\infty$, we conclude that
$$\int_M f_k\,d\mu \to \int_M f\,d\mu.$$

Next, we prove the major results about the integrals of convergent sequences.

Lemma 2.14 (Fatou's lemma) Let $\mu$ be a $\sigma$-finite complete measure. Let $\{f_k\}$ be a sequence of non-negative measurable functions on $M$ such that
$$f_k\to f \quad\text{a.e.}$$
Assume that, for some constant $C$ and all $k\ge 1$,
$$\int_M f_k\,d\mu \le C.$$
Then also
$$\int_M f\,d\mu \le C.$$

Proof. First we assume that the measure $\mu$ is finite. By Theorem 2.5 (Egorov's theorem), for any $\varepsilon>0$ there is a set $M_\varepsilon\subset M$ such that $\mu(M\setminus M_\varepsilon)\le\varepsilon$ and $f_k\rightrightarrows f$ on $M_\varepsilon$. Set
$$A_n = M_1\cup M_{1/2}\cup M_{1/3}\cup\dots\cup M_{1/n}.$$
Then $\mu(M\setminus A_n)\le\frac{1}{n}$ and $f_n\rightrightarrows f$ on $A_n$ (because $f_n\rightrightarrows f$ on any $M_{1/k}$). By construction, the sequence $\{A_n\}$ is increasing. Set
$$A = \lim_{n\to\infty} A_n = \bigcup_{n=1}^{\infty} A_n.$$
By the continuity of the integral (Corollary to Theorem 2.12), we have
$$\int_A f\,d\mu = \lim_{n\to\infty}\int_{A_n} f\,d\mu.$$
On the other hand, since $f_k\rightrightarrows f$ on $A_n$, we have by Lemma 2.13
$$\int_{A_n} f\,d\mu = \lim_{k\to\infty}\int_{A_n} f_k\,d\mu.$$
By hypothesis,
$$\int_{A_n} f_k\,d\mu \le \int_M f_k\,d\mu \le C.$$
Passing to the limits as $k\to\infty$ and as $n\to\infty$, we obtain
$$\int_A f\,d\mu \le C.$$
Let us show that
$$\int_{A^c} f\,d\mu = 0, \qquad (2.20)$$
which will finish the proof. Indeed,
$$A^c = M\setminus\bigcup_{n=1}^{\infty} A_n = \bigcap_{n=1}^{\infty} A_n^c.$$
Then, for any index $n$, we have
$$\mu(A^c)\le\mu(A_n^c)\le 1/n,$$
whence $\mu(A^c) = 0$. By Theorem 2.11, this implies (2.20) because $f=0$ a.e. on $A^c$.
Let now $\mu$ be a $\sigma$-finite measure. Then there is a sequence of measurable sets $\{B_k\}$ such that $\mu(B_k)<\infty$ and $\bigcup_{k=1}^{\infty} B_k = M$. Setting
$$A_n = B_1\cup B_2\cup\dots\cup B_n,$$
we obtain a similar sequence $\{A_n\}$ but with the additional property that this sequence is increasing. Note that, for any non-negative measurable function $f$ on $M$,
$$\int_M f\,d\mu = \lim_{n\to\infty}\int_{A_n} f\,d\mu.$$
The hypothesis
$$\int_M f_k\,d\mu \le C$$
implies that
$$\int_{A_n} f_k\,d\mu \le C,$$
for all $k$ and $n$. Since the measure $\mu$ on $A_n$ is finite, applying the first part of the proof, we conclude that
$$\int_{A_n} f\,d\mu \le C.$$
Letting $n\to\infty$, we finish the proof.

Theorem 2.15 (Monotone convergence theorem) Let $\mu$ be a $\sigma$-finite complete measure. Let $\{f_k\}$ be an increasing sequence of non-negative measurable functions and assume that the limit function $f(x) = \lim_{k\to\infty} f_k(x)$ is finite a.e. Then

$$\int_M f\,d\mu = \lim_{k\to\infty}\int_M f_k\,d\mu.$$

Remark. The integral of $f$ was defined for finite functions $f$. However, if $f$ is finite a.e., that is, away from some set $N$ of measure $0$, then $\int_M f\,d\mu$ still makes sense as follows. Let us change the function $f$ on the set $N$ by assigning to it finite values on $N$. Then the integral $\int_M f\,d\mu$ makes sense and is independent of the values of $f$ on $N$ (Theorem 2.11).

Proof. Since $f\ge f_k$, we obtain
$$\int_M f\,d\mu \ge \int_M f_k\,d\mu,$$
whence
$$\int_M f\,d\mu \ge \lim_{k\to\infty}\int_M f_k\,d\mu.$$
To prove the opposite inequality, set
$$C = \lim_{k\to\infty}\int_M f_k\,d\mu.$$
Since $\{f_k\}$ is an increasing sequence, we have for any $k$
$$\int_M f_k\,d\mu \le C,$$
whence by Lemma 2.14 (Fatou's lemma)
$$\int_M f\,d\mu \le C,$$
which finishes the proof.
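A numerical sketch of the monotone convergence theorem (ours, under the stated assumption): for $f(x)=1/\sqrt{x}$ on $(0,1)$ with Lebesgue measure and the increasing truncations $f_k=\min(f,k)$, one computes in closed form $\int_{(0,1)} f_k\,d\mu = \frac{1}{k} + \big(2-\frac{2}{k}\big) = 2-\frac{1}{k}$, which increases to $\int_{(0,1)} f\,d\mu = 2$.

```python
# f(x) = 1/sqrt(x) on (0,1); f_k = min(f, k) equals k on (0, 1/k^2]
# and 1/sqrt(x) on [1/k^2, 1], so its integral is 1/k + (2 - 2/k) = 2 - 1/k.
def integral_fk(k):
    return k * (1.0 / k ** 2) + (2.0 - 2.0 / k)

for k in (1, 10, 100, 1000):
    print(k, integral_fk(k))   # increases toward the integral of f, which is 2
```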

Theorem 2.16 (The second monotone convergence theorem) Let $\mu$ be a $\sigma$-finite complete measure. Let $\{f_k\}$ be an increasing sequence of non-negative measurable functions. If there is a constant $C$ such that, for all $k$,
$$\int_M f_k\,d\mu \le C \qquad (2.21)$$
then the sequence $\{f_n(x)\}$ converges a.e. on $M$, and for the limit function $f(x) = \lim_{k\to\infty} f_k(x)$ we have
$$\int_M f\,d\mu = \lim_{k\to\infty}\int_M f_k\,d\mu. \qquad (2.22)$$

Proof. Since the sequence $\{f_n(x)\}$ is increasing, it has the limit $f(x)$ for any $x\in M$. We need to prove that the limit $f(x)$ is finite a.e., that is, $\mu(\{f=\infty\}) = 0$ (then (2.22) follows from Theorem 2.15). It follows from (2.21) that, for any $t>0$,
$$\mu\{f_k>t\} \le \frac{1}{t}\int_M f_k\,d\mu \le \frac{C}{t}. \qquad (2.23)$$
Indeed, since
$$f_k \ge t\,1_{\{f_k>t\}},$$
it follows that
$$\int_M f_k\,d\mu \ge \int_M t\,1_{\{f_k>t\}}\,d\mu = t\,\mu\{f_k>t\},$$
which proves (2.23). Next, since $f(x) = \lim_{k\to\infty} f_k(x)$, it follows that
$$\{f(x)>t\} \subset \{f_k(x)>t\ \text{for some } k\} = \bigcup_k\{f_k(x)>t\} = \lim_{k\to\infty}\{f_k(x)>t\},$$
where we use the fact that $f_k$ increases monotonically, whence the sets $\{f_k>t\}$ also increase monotonically with $k$ (the reverse inclusion also holds since $f_k\le f$). Hence,
$$\mu(f>t) = \lim_{k\to\infty}\mu\{f_k>t\},$$
whence by (2.23)
$$\mu(f>t) \le \frac{C}{t}.$$
Letting $t\to\infty$, we obtain that $\mu(f=\infty) = 0$, which finishes the proof.

Theorem 2.17 (Dominated convergence theorem) Let $\mu$ be a complete $\sigma$-finite measure. Let $\{f_k\}$ be a sequence of measurable functions such that $f_k\to f$ a.e. Assume that there is a non-negative integrable function $g$ such that $|f_k|\le g$ a.e. for all $k$. Then $f_k$ and $f$ are integrable and

$$\int_M f\,d\mu = \lim_{k\to\infty}\int_M f_k\,d\mu. \qquad (2.24)$$

Proof. By removing from $M$ a set of measure $0$, we can assume that $f_k\to f$ pointwise and $|f_k|\le g$ on $M$. The latter implies that
$$\int_M |f_k|\,d\mu < \infty,$$
that is, $f_k$ is integrable. Since also $|f|\le g$, the function $f$ is integrable in the same way. We will prove that, in fact,
$$\int_M |f_k - f|\,d\mu \to 0 \quad\text{as } k\to\infty,$$
which will imply that
$$\Big|\int_M f_k\,d\mu - \int_M f\,d\mu\Big| = \Big|\int_M (f_k - f)\,d\mu\Big| \le \int_M |f_k - f|\,d\mu \to 0,$$
which proves (2.24). Observe that $0\le |f_k - f|\le 2g$ and $|f_k - f|\to 0$ pointwise. Hence, renaming $|f_k - f|$ to $f_k$ and $2g$ to $g$, we reduce the problem to the following: given that
$$0\le f_k\le g \quad\text{and}\quad f_k\to 0 \text{ pointwise},$$
prove that
$$\int_M f_k\,d\mu \to 0 \quad\text{as } k\to\infty.$$
Consider the function
$$\widetilde{f}_n = \sup\{f_n, f_{n+1}, \dots\} = \sup_{k\ge n} f_k = \lim_{m\to\infty}\,\sup_{n\le k\le m} f_k,$$
which is measurable as the limit of measurable functions. Clearly, we have $0\le\widetilde{f}_n\le g$, and the sequence $\{\widetilde{f}_n\}$ is decreasing. Let us show that $\widetilde{f}_n\to 0$ pointwise. Indeed, for a fixed point $x$, the condition $f_k(x)\to 0$ means that for any $\varepsilon>0$ there is $n$ such that
$$k\ge n \implies f_k(x)<\varepsilon.$$
Taking sup over all $k\ge n$, we obtain $\widetilde{f}_n(x)\le\varepsilon$. Since the sequence $\{\widetilde{f}_n\}$ is decreasing, this implies that $\widetilde{f}_n(x)\to 0$, as claimed.
Therefore, the sequence $\{g-\widetilde{f}_n\}$ is increasing and $g-\widetilde{f}_n\to g$ pointwise. By Theorem 2.15, we conclude that
$$\int_M\big(g-\widetilde{f}_n\big)\,d\mu \to \int_M g\,d\mu,$$
whence it follows that
$$\int_M \widetilde{f}_n\,d\mu \to 0.$$
Since $0\le f_n\le\widetilde{f}_n$, it follows that also $\int_M f_n\,d\mu\to 0$, which finishes the proof.

Example. Both the monotone convergence theorem and the dominated convergence theorem say that, under certain conditions, we have

$$\lim_{k\to\infty}\int_M f_k\,d\mu = \int_M\Big(\lim_{k\to\infty} f_k\Big)\,d\mu,$$
that is, the integral and the pointwise limit are interchangeable. Let us show one application of this to differentiating integrals that depend on a parameter. Let $f(t,x)$ be a function from $I\times M$ to $\mathbb{R}$, where $I$ is an open interval in $\mathbb{R}$ and $M$ is, as above, a set with a $\sigma$-finite measure $\mu$; here $t\in I$ and $x\in M$. Consider the function
$$F(t) = \int_M f(t,x)\,d\mu(x),$$
where $d\mu(x)$ means that the variable of integration is $x$, while $t$ is regarded as a constant. However, after the integration, we regard the result as a function of $t$. We claim that, under certain conditions, the operations of differentiating $F$ in $t$ and integrating are interchangeable.

Claim. Assume that $f(t,x)$ is continuously differentiable in $t\in I$ for any $x\in M$, and that the derivative $f' = \frac{\partial f}{\partial t}$ satisfies the estimate
$$|f'(t,x)| \le g(x) \quad\text{for all } t\in I,\ x\in M,$$
where $g$ is an integrable function on $M$. Then
$$F'(t) = \int_M f'(t,x)\,d\mu. \qquad (2.25)$$

Proof. Indeed, for any fixed $t\in I$, we have
$$F'(t) = \lim_{h\to 0}\frac{F(t+h)-F(t)}{h} = \lim_{h\to 0}\int_M\frac{f(t+h,x)-f(t,x)}{h}\,d\mu = \lim_{h\to 0}\int_M f'(c,x)\,d\mu,$$
where $c = c(h,x)$ is some number in the interval $(t,t+h)$, which exists by the Lagrange mean value theorem. Since $c(h,x)\to t$ as $h\to 0$ for any $x\in M$, it follows that

$$f'(c,x)\to f'(t,x) \quad\text{as } h\to 0.$$
Since $|f'(c,x)|\le g(x)$ and $g$ is integrable, we obtain by the dominated convergence theorem that
$$\lim_{h\to 0}\int_M f'(c,x)\,d\mu = \int_M f'(t,x)\,d\mu,$$
whence (2.25) follows.

Let us apply this Claim to the particular function $f(t,x) = e^{-tx}$, where $t\in (0,+\infty)$ and $x\in [0,+\infty)$. Then
$$F(t) = \int_0^{\infty} e^{-tx}\,dx = \frac{1}{t}.$$
We claim that, for any $t>0$,
$$F'(t) = \int_0^{\infty}\frac{\partial}{\partial t}\big(e^{-tx}\big)\,dx = -\int_0^{\infty} e^{-tx}x\,dx.$$
Indeed, to apply the Claim, we need to check that $\frac{\partial}{\partial t}(e^{-tx})$ is uniformly bounded by an integrable function $g(x)$. We have
$$\Big|\frac{\partial}{\partial t}\big(e^{-tx}\big)\Big| = xe^{-tx}.$$
For all $t>0$, this function is bounded only by $g(x)=x$, which is not integrable. However, it suffices to consider $t$ within some interval $(\varepsilon,+\infty)$ for $\varepsilon>0$, which gives
$$\Big|\frac{\partial}{\partial t}\big(e^{-tx}\big)\Big| \le xe^{-\varepsilon x} =: g(x),$$
and this function $g$ is integrable. Hence, we obtain
$$-\int_0^{\infty} e^{-tx}x\,dx = \Big(\frac{1}{t}\Big)' = -\frac{1}{t^2}, \quad\text{that is,}\quad \int_0^{\infty} e^{-tx}x\,dx = \frac{1}{t^2}.$$
Differentiating further in $t$, we obtain also the identity
$$\int_0^{\infty} e^{-tx}x^{n-1}\,dx = \frac{(n-1)!}{t^n}, \qquad (2.26)$$
for any $n\in\mathbb{N}$. Of course, this can also be obtained using integration by parts.
Recall that the gamma function $\Gamma(\alpha)$ for any $\alpha>0$ is defined by
$$\Gamma(\alpha) = \int_0^{\infty} e^{-x}x^{\alpha-1}\,dx.$$
Setting $t=1$ in (2.26), we obtain that, for any $\alpha\in\mathbb{N}$,
$$\Gamma(\alpha) = (\alpha-1)!.$$
For non-integer $\alpha$, $\Gamma(\alpha)$ can be considered as a generalization of the factorial.
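As a numeric illustration (ours, not from the notes; `integral` is a crude midpoint quadrature on a truncated half-line), one can check $\Gamma(n)=(n-1)!$ and the identity (2.26):

```python
# Check Gamma(n) = (n-1)! and integral_0^inf e^{-tx} x^{n-1} dx = (n-1)!/t^n
# numerically; the tail beyond the truncation point is negligible here.
import math

def integral(t, n, upper=60.0, steps=400000):
    h = upper / steps
    return sum(math.exp(-t * (i + 0.5) * h) * ((i + 0.5) * h) ** (n - 1)
               for i in range(steps)) * h

print(math.gamma(5), math.factorial(4))                 # both equal 24
print(integral(1.0, 5))                                 # ~ 4! = 24
print(integral(2.0, 3, upper=30.0))                     # ~ 2!/2^3 = 0.25
```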

2.7 Lebesgue function spaces Lp

Recall that if $V$ is a linear space over $\mathbb{R}$ then a function $N : V\to [0,+\infty)$ is called a semi-norm if it satisfies the following two properties:

1. (the triangle inequality) $N(x+y)\le N(x)+N(y)$ for all $x,y\in V$;

2. (the scaling property) $N(\alpha x) = |\alpha|\,N(x)$ for all $x\in V$ and $\alpha\in\mathbb{R}$.

It follows from the second property that $N(0)=0$ (where "0" denotes both the zero of $V$ and the zero of $\mathbb{R}$). If in addition $N(x)>0$ for all $x\ne 0$ then $N$ is called a norm. If a norm is chosen in the space $V$ then it is normally denoted by $\|x\|$ rather than $N(x)$.
For example, if $V = \mathbb{R}^n$ then, for any $p\ge 1$, define the $p$-norm by
$$\|x\|_p = \Big(\sum_{i=1}^{n} |x_i|^p\Big)^{1/p}.$$
It is known that the $p$-norm is indeed a norm. This definition extends to $p=\infty$ as follows:
$$\|x\|_{\infty} = \max_{1\le i\le n}|x_i|,$$
and the $\infty$-norm is also a norm.
Consider in $\mathbb{R}^n$ the function $N(x) = |x_1|$, which is obviously a semi-norm. However, if $n\ge 2$ then $N(x)$ is not a norm because $N(x)=0$ for $x = (0,1,\dots)$.

The purpose of this section is to introduce similar norms in the spaces of Lebesgue integrable functions. Consider first the simplest example when $M$ is a finite set of $n$ elements, say, $M = \{1,\dots,n\}$, and measure $\mu$ is defined on $2^M$ as the counting measure, that is, $\mu(A)$ is just the cardinality of $A\subset M$. Any function $f$ on $M$ is characterized by $n$ real numbers $f(1),\dots,f(n)$, which identifies the space $F$ of all functions on $M$ with $\mathbb{R}^n$. The $p$-norm on $F$ is then given by
$$\|f\|_p = \Big(\sum_{i=1}^{n}|f(i)|^p\Big)^{1/p} = \Big(\int_M |f|^p\,d\mu\Big)^{1/p}, \quad p<\infty,$$
and
$$\|f\|_{\infty} = \sup_M |f|.$$
This motivates similar constructions for general sets $M$ with measures.
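A minimal Python sketch of this identification (the helper `p_norm` is ours): on a finite set with the counting measure, the integral $\int_M |f|^p\,d\mu$ is simply a sum, so the function's $p$-norm coincides with the vector $p$-norm.

```python
# p-norm of a function on M = {1,...,n} with the counting measure.
def p_norm(values, p):
    # the integral of |f|^p against the counting measure is just a sum
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

f = [3.0, -4.0]
print(p_norm(f, 2))              # Euclidean norm: 5.0
print(p_norm(f, 1))              # 7.0
print(max(abs(v) for v in f))    # the sup-norm: 4.0
```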

2.7.1 The p-norm

Let $\mu$ be a complete $\sigma$-finite measure on a $\sigma$-algebra $\mathcal{M}$ on a set $M$. Fix some $p\in [1,+\infty)$. For any measurable function $f : M\to\mathbb{R}$, define its $p$-norm (with respect to measure $\mu$) by
$$\|f\|_p := \Big(\int_M |f|^p\,d\mu\Big)^{1/p}.$$
Note that $|f|^p$ is a non-negative measurable function, so that the integral always exists, finite or infinite. For example,
$$\|f\|_1 = \int_M |f|\,d\mu \quad\text{and}\quad \|f\|_2 = \Big(\int_M f^2\,d\mu\Big)^{1/2}.$$
In this generality, the $p$-norm is not necessarily a norm, because a norm must always be finite. Later on, we are going to restrict the domain of the $p$-norm to those functions for which $\|f\|_p<\infty$, but before that we consider the general properties of the $p$-norm.
The scaling property of the $p$-norm is obvious: if $\alpha\in\mathbb{R}$ then
$$\|\alpha f\|_p = \Big(\int_M |\alpha|^p |f|^p\,d\mu\Big)^{1/p} = |\alpha|\Big(\int_M |f|^p\,d\mu\Big)^{1/p} = |\alpha|\,\|f\|_p.$$
The rest of this subsection is devoted to the proof of the triangle inequality.

Lemma 2.18 (The Hölder inequality) Let $p,q>1$ be such that
$$\frac{1}{p}+\frac{1}{q} = 1. \qquad (2.27)$$
Then, for all measurable functions $f,g$ on $M$,
$$\int_M |fg|\,d\mu \le \|f\|_p\,\|g\|_q \qquad (2.28)$$
(the undefined product $0\cdot\infty$ is understood as $0$).

Remark. Numbers $p,q$ satisfying (2.27) are called Hölder conjugate, and the couple $(p,q)$ is called a Hölder couple.

Proof. Renaming $|f|$ to $f$ and $|g|$ to $g$, we can assume that $f,g$ are non-negative. If $\|f\|_p = 0$ then by Theorem 2.11 we have $f=0$ a.e., and (2.28) is trivially satisfied. Hence, assume in the sequel that $\|f\|_p$ and $\|g\|_q$ are positive. If $\|f\|_p = \infty$ then (2.28) is again trivial. Hence, we can assume that both $\|f\|_p$ and $\|g\|_q$ are in $(0,+\infty)$.
Next, observe that inequality (2.28) is scaling invariant: if $f$ is replaced by $\alpha f$ with $\alpha\in\mathbb{R}$, then the validity of (2.28) does not change (indeed, when multiplying $f$ by $\alpha$, both sides of (2.28) are multiplied by $|\alpha|$). Hence, taking $\alpha = \|f\|_p^{-1}$ and renaming $\alpha f$ to $f$, we can assume that $\|f\|_p = 1$. In the same way, assume that $\|g\|_q = 1$.
Next, we use the Young inequality:
$$ab \le \frac{a^p}{p} + \frac{b^q}{q},$$
which is true for all non-negative reals $a,b$ and all Hölder couples $p,q$. Applying it to $f,g$ and integrating, we obtain
$$\int_M fg\,d\mu \le \int_M\frac{f^p}{p}\,d\mu + \int_M\frac{g^q}{q}\,d\mu = \frac{1}{p}+\frac{1}{q} = 1 = \|f\|_p\,\|g\|_q.$$
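A hedged numerical check of the Hölder inequality on the counting measure (the helper `p_norm` and the sampling scheme are ours): for random vectors $f,g$ and a Hölder couple $(p,q)$, $\sum_i |f_i g_i|$ never exceeds $\|f\|_p\,\|g\|_q$.

```python
# Verify sum |f_i g_i| <= ||f||_p * ||g||_q with 1/p + 1/q = 1 on samples.
import random

def p_norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

random.seed(0)
p = 3.0
q = p / (p - 1)                  # the Hoelder conjugate of p
for _ in range(100):
    f = [random.uniform(-1, 1) for _ in range(10)]
    g = [random.uniform(-1, 1) for _ in range(10)]
    lhs = sum(abs(a * b) for a, b in zip(f, g))
    assert lhs <= p_norm(f, p) * p_norm(g, q) + 1e-12
print("Hoelder inequality holds on all samples")
```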

Theorem 2.19 (The Minkowski inequality) For all measurable functions $f,g$ and for all $p\ge 1$,
$$\|f+g\|_p \le \|f\|_p + \|g\|_p. \qquad (2.29)$$

Proof. If $p=1$ then (2.29) is trivially satisfied because
$$\|f+g\|_1 = \int_M |f+g|\,d\mu \le \int_M |f|\,d\mu + \int_M |g|\,d\mu = \|f\|_1 + \|g\|_1.$$
Assume in the sequel that $p>1$. Since
$$\|f+g\|_p \le \big\|\,|f|+|g|\,\big\|_p,$$
it suffices to prove (2.29) for $|f|$ and $|g|$ instead of $f$ and $g$, respectively. Hence, renaming $|f|$ to $f$ and $|g|$ to $g$, we can assume that $f$ and $g$ are non-negative.
If $\|f\|_p$ or $\|g\|_p$ is infinite then (2.29) is trivially satisfied. Hence, we can assume in the sequel that both $\|f\|_p$ and $\|g\|_p$ are finite. Using the inequality
$$(a+b)^p \le 2^p(a^p + b^p),$$
we obtain that
$$\int_M (f+g)^p\,d\mu \le 2^p\Big(\int_M f^p\,d\mu + \int_M g^p\,d\mu\Big) < \infty.$$
Hence, $\|f+g\|_p < \infty$. Also, we can assume that $\|f+g\|_p > 0$ because otherwise (2.29) is trivially satisfied.
Finally, we prove (2.29) as follows. Let $q>1$ be the Hölder conjugate of $p$, that is, $q = \frac{p}{p-1}$. We have
$$\|f+g\|_p^p = \int_M (f+g)^p\,d\mu = \int_M f(f+g)^{p-1}\,d\mu + \int_M g(f+g)^{p-1}\,d\mu. \qquad (2.30)$$
Using the Hölder inequality, we obtain
$$\int_M f(f+g)^{p-1}\,d\mu \le \|f\|_p\,\big\|(f+g)^{p-1}\big\|_q = \|f\|_p\Big(\int_M (f+g)^{(p-1)q}\,d\mu\Big)^{1/q} = \|f\|_p\,\|f+g\|_p^{p/q}.$$
In the same way, we have
$$\int_M g(f+g)^{p-1}\,d\mu \le \|g\|_p\,\|f+g\|_p^{p/q}.$$
Adding up the two inequalities and using (2.30), we obtain
$$\|f+g\|_p^p \le \big(\|f\|_p + \|g\|_p\big)\,\|f+g\|_p^{p/q}.$$
Since $0<\|f+g\|_p<\infty$, dividing by $\|f+g\|_p^{p/q}$ and noticing that $p-\frac{p}{q} = 1$, we obtain (2.29).
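A quick numerical sanity check of the Minkowski inequality on the counting measure (ours; `p_norm` is an assumed helper, vectors are arbitrary):

```python
# Check ||f+g||_p <= ||f||_p + ||g||_p for vectors in R^n (counting measure).
def p_norm(v, p):
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

f = [1.0, -2.0, 3.0]
g = [4.0, 0.5, -1.0]
p = 2.5
lhs = p_norm([a + b for a, b in zip(f, g)], p)
rhs = p_norm(f, p) + p_norm(g, p)
print(lhs, rhs)
assert lhs <= rhs
```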

2.7.2 Spaces Lp

It follows from Theorem 2.19 that the $p$-norm is a semi-norm. In general, the $p$-norm is not a norm. Indeed, by Theorem 2.11
$$\int_M |f|^p\,d\mu = 0 \iff f=0 \text{ a.e.}$$
Hence, there may be non-zero functions with zero $p$-norm. However, this can be corrected if we identify all functions that differ on a set of measure $0$, so that any function $f$ such that $f=0$ a.e. will be identified with $0$. Recall that the relation $f=g$ a.e. is an equivalence relation. Hence, as any equivalence relation, it induces equivalence classes. For any measurable function $f$, denote (temporarily) by $[f]$ the class of all functions on $M$ that are equal to $f$ a.e., that is,
$$[f] = \{g : f=g \text{ a.e.}\}.$$
Define the linear operations over classes as follows:
$$[f]+[g] = [f+g], \qquad \alpha[f] = [\alpha f],$$
where $f,g$ are measurable functions and $\alpha\in\mathbb{R}$. Also, define the $p$-norm on the classes by
$$\|[f]\|_p = \|f\|_p.$$
Of course, this definition does not depend on the choice of a particular representative $f$ of the class $[f]$. In other words, if $f' = f$ a.e. then $\|f'\|_p = \|f\|_p$ and, hence, $\|[f]\|_p$ is well-defined. The same remark applies to the linear operations on the classes.

Definition. (The Lebesgue function spaces) For any real $p\ge 1$, denote by $L^p = L^p(M,\mu)$ the set of all equivalence classes $[f]$ of measurable functions such that $\|f\|_p<\infty$.

Claim. The set $L^p$ is a linear space with respect to the linear operations on the classes, and the $p$-norm is a norm in $L^p$.

Proof. If $[f],[g]\in L^p$ then $\|f\|_p$ and $\|g\|_p$ are finite, whence also $\|[f+g]\|_p = \|f+g\|_p<\infty$, so that $[f]+[g]\in L^p$. It is trivial that also $\alpha[f]\in L^p$ for any $\alpha\in\mathbb{R}$. The fact that the $p$-norm is a semi-norm was already proved. We are left to observe that $\|[f]\|_p = 0$ implies $f=0$ a.e. and, hence, $[f]=[0]$.

Convention. It is customary to call the elements of $L^p$ functions (although they are not) and to write $f\in L^p$ instead of $[f]\in L^p$, in order to simplify the notation and the terminology.

Recall that a normed space $(V,\|\cdot\|)$ is called complete (or Banach) if any Cauchy sequence in $V$ converges. That is, if $\{x_n\}$ is a sequence of vectors from $V$ and $\|x_n - x_m\|\to 0$ as $n,m\to\infty$, then there is $x\in V$ such that $x_n\to x$ as $n\to\infty$ (that is, $\|x_n - x\|\to 0$). The second major theorem of this course (after Carathéodory's extension theorem) is the following.

Theorem 2.20 For any $p\ge 1$, the space $L^p$ is complete.

Let us first prove the following criterion of convergence of series.

Lemma 2.21 Let $\{f_n\}$ be a sequence of measurable functions on $M$. If
$$\sum_{n=1}^{\infty}\|f_n\|_1 < \infty$$
then the series $\sum_{n=1}^{\infty} f_n(x)$ converges a.e. on $M$ and

$$\int_M\sum_{n=1}^{\infty} f_n(x)\,d\mu = \sum_{n=1}^{\infty}\int_M f_n(x)\,d\mu. \qquad (2.31)$$

Proof. We will prove that the series
$$\sum_{n=1}^{\infty}|f_n(x)|$$
converges a.e., which then implies that the series $\sum_{n=1}^{\infty} f_n(x)$ converges absolutely a.e. Consider the partial sums
$$F_k(x) = \sum_{n=1}^{k}|f_n(x)|.$$
Clearly, $\{F_k\}$ is an increasing sequence of non-negative measurable functions and, for all $k$,
$$\int_M F_k\,d\mu \le \sum_{n=1}^{\infty}\int_M |f_n|\,d\mu = \sum_{n=1}^{\infty}\|f_n\|_1 =: C.$$
By Theorem 2.16, we conclude that the sequence $\{F_k(x)\}$ converges a.e. Let $F(x) = \lim_{k\to\infty} F_k(x)$. By Theorem 2.15, we have
$$\int_M F\,d\mu = \lim_{k\to\infty}\int_M F_k\,d\mu \le C.$$
Hence, the function $F$ is integrable. Using that, we prove (2.31). Indeed, consider the partial sum
$$G_k(x) = \sum_{n=1}^{k} f_n(x)$$
and observe that
$$|G_k(x)| \le \sum_{n=1}^{k}|f_n(x)| = F_k(x) \le F(x).$$
Since
$$G(x) := \sum_{n=1}^{\infty} f_n(x) = \lim_{k\to\infty} G_k(x),$$
we conclude by the dominated convergence theorem that
$$\int_M G\,d\mu = \lim_{k\to\infty}\int_M G_k\,d\mu,$$
which is exactly (2.31).

Proof of Theorem 2.20. Consider first the case $p=1$. Let $\{f_n\}$ be a Cauchy sequence in $L^1$, that is,
$$\|f_n - f_m\|_1 \to 0 \quad\text{as } n,m\to\infty \qquad (2.32)$$
(more precisely, $\{[f_n]\}$ is a Cauchy sequence, but this indeed amounts to (2.32)). We need to prove that there is a function $f\in L^1$ such that $\|f_n - f\|_1\to 0$ as $n\to\infty$.
It is a general property of Cauchy sequences that a Cauchy sequence converges if and only if it has a convergent subsequence. Hence, it suffices to find a convergent subsequence $\{f_{n_k}\}$. It follows from (2.32) that there is a subsequence $\{f_{n_k}\}_{k=1}^{\infty}$ such that
$$\|f_{n_{k+1}} - f_{n_k}\|_1 \le 2^{-k}.$$
Indeed, define $n_k$ to be a number such that

$$\|f_n - f_m\|_1 \le 2^{-k} \quad\text{for all } n,m\ge n_k.$$
Let us show that the subsequence $\{f_{n_k}\}$ converges. For simplicity of notation, rename $f_{n_k}$ into $f_k$, so that we can assume
$$\|f_{k+1} - f_k\|_1 < 2^{-k}.$$
In particular,
$$\sum_{k=1}^{\infty}\|f_{k+1} - f_k\|_1 < \infty,$$
which implies by Lemma 2.21 that the series
$$\sum_{k=1}^{\infty}(f_{k+1} - f_k)$$
converges a.e. Since the partial sums of this series are $f_n - f_1$, it follows that the sequence $\{f_n\}$ converges a.e.
Set $f(x) = \lim_{n\to\infty} f_n(x)$ and let us prove that $\|f_n - f\|_1\to 0$ as $n\to\infty$. Use again the condition (2.32): for any $\varepsilon>0$ there exists $N$ such that for all $n,m\ge N$
$$\|f_n - f_m\|_1 < \varepsilon.$$
Since $|f_n - f_m|\to |f_n - f|$ a.e. as $m\to\infty$, we conclude by Fatou's lemma that, for all $n\ge N$,
$$\|f_n - f\|_1 \le \varepsilon,$$
which finishes the proof.
Consider now the case $p>1$. Given a Cauchy sequence $\{f_n\}$ in $L^p$, we need to prove that it converges. If $f\in L^p$ then $\int_M |f|^p\,d\mu<\infty$ and, hence, $\int_M f_+^p\,d\mu$ and $\int_M f_-^p\,d\mu$ are also finite. Hence, both $f_+$ and $f_-$ belong to $L^p$. Let us show that the sequences $\{(f_n)_+\}$ and $\{(f_n)_-\}$ are also Cauchy in $L^p$. Indeed, set
$$\varphi(t) = t_+ = \begin{cases} t, & t>0,\\ 0, & t\le 0,\end{cases}$$
so that $f_+ = \varphi(f)$.

[Figure: the graph of the function $\varphi(t) = t_+$.]

It is obvious that ϕ (a) ϕ (b) a b . | − | ≤ | − | Therefore, (fn) (fm) = ϕ (fn) ϕ (fm) fn fm , + − + | − | ≤ | − | whence ¯ ¯ ¯ ¯ (fn) (fm) fn fm 0 + − + p ≤ k − kp → ° ° ° ° 74 as n, m . This proves that the sequence (fn) isCauchy,andthesameargument →∞ + applies to (fn) . − © ª p It suffices to prove that each sequence (fn)+ and (fn) converges in L . Indeed, © ª p − if we know already that (fn)+ g and (fn) h in L then → © − → ª © ª

fn =(fn)+ (fn) g h. − − → −

Therefore, renaming each of the sequence (fn)+ and (fn) back to fn ,wecan − { } assume in the sequel that fn 0. p ≥ p © 1 ª © ª p The fact that fn L implies that fn L . Let us show that the sequence fn is Cauchy in L1. For that∈ we use the following∈ elementary inequalities. { } Claim. For al l a, b 0 ≥ p p p p 1 p 1 a b a b p a b a − + b − . (2.33) | − | ≤ | − | ≤ | − | ¡ ¢ If a = b then (2.33) is trivial. Assume now that a>b(the case a

(a b)p + bp ap, − ≤ which will follow from the following inequality

xp + yp (x + y)p , (2.34) ≤ that holds for all x, y 0. Indeed, without loss of generality, we can assume that x y and x>0. Then, using≥ the Bernoulli inequality, we obtain ≥ y p y y p (x + y)p = xp 1+ xp 1+p xp 1+ = xp + yp. x ≥ x ≥ x ³ ´ ³ ´ ³ ³ ´ ´ p p Coming back to a Cauchy sequence fn in L such that fn 0,letusshowthat fn is Cauchy in L1.Wehaveby(2.33) { } ≥ { }

p p p 1 p 1 f f p fn fm f − + f − . | n − m| ≤ | − | n m Recall the Hölder inequality ¡ ¢

1/p 1/q fg dμ f p dμ g q dμ , | | ≤ | | | | ZM µZM ¶ µZM ¶ 1 1 where q is the Hölder conjugate to p,thatis, p + q =1. Using this inequality, we obtain

1/p 1/q p p p p 1 p 1 q f f dμ p fn fm dμ f − + f − dμ . (2.35) | n − m| ≤ | − | n m ZM µZM ¶ µZM ¶ ¡ ¢ 75 To estimate the last integral, use the inequality (x + y)q 2q (xq + yq) , ≤ which yields p 1 p 1 q q (p 1)q (p 1)q q p p f − + f − 2 f − + f − =2 (f + f ) , (2.36) n m ≤ n m n m wherewehaveusedtheidentity¡ ¢ (p 1)¡ q = p. Note that¢ the numerical sequence fn − k kp is bounded. Indeed, by the Minkowski inequality, n o

fn fm fn fm 0 k kp − k kp ≤ k − kp → ¯ ¯ ¯ ¯ ∞ as n, m . Hence, the sequence fn p is a numerical Cauchy sequence and, →∞ ¯ k ¯k n=1 hence, is bounded. Integrating (2.36) wen obtaino that

p 1 p 1 q q p p f − + f − dμ 2 f dμ + f dμ const, n m ≤ n m ≤ ZM µZM ZM ¶ where the constant¡ const is independent¢ of n, m. Hence, by (2.35) 1/p p p p f f dμ const fn fm dμ =const fn fm 0 as n, m , | n − m| ≤ | − | k − kp → →∞ ZM µZM ¶ p 1 whence it follows that the sequence fn is Cauchy in L .Bythefirst part of the proof, we conclude that this sequence converges{ } to a function g L1.ByExercise52,wehave 1/p p ∈ g 0.Setf = g and show that fn f in L . Indeed, by (2.33) we have ≥ → p p p p fn f dμ f f dμ = f g dμ 0 | − | ≤ | n − | | n − | → ZM ZM ZM which was to be proved. p Finally, there is also the space L∞ (that is, L with p = ), which is defined in Exercise 56 and which is also complete. ∞
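The elementary inequalities (2.33) and (2.34) that drive this proof are easy to sanity-check numerically; the following sketch (not part of the original notes) samples random non-negative arguments:

```python
# Numerical sanity check of the elementary inequalities used in the proof:
#   |a - b|^p <= |a^p - b^p| <= p |a - b| (a^(p-1) + b^(p-1))   (2.33)
#   x^p + y^p <= (x + y)^p                                      (2.34)
# for p >= 1 and positive arguments (small float slack added).
import random

random.seed(0)
for p in (1.0, 1.5, 2.0, 3.0):
    for _ in range(1000):
        a, b = random.uniform(0.01, 10), random.uniform(0.01, 10)
        assert a**p + b**p <= (a + b)**p + 1e-9                  # (2.34)
        mid = abs(a**p - b**p)
        assert abs(a - b)**p <= mid + 1e-9                       # left half of (2.33)
        assert mid <= p * abs(a - b) * (a**(p - 1) + b**(p - 1)) + 1e-9
print("(2.33) and (2.34) hold on all samples")
```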

2.8 Product measures and Fubini’s theorem

2.8.1 Product measure

Let M1, M2 be two non-empty sets and S1, S2 be families of subsets of M1 and M2, respectively. Consider on the set M = M1 × M2 the family of subsets

S = S1 × S2 := {A × B : A ∈ S1, B ∈ S2}.

Recall that, by Exercise 6, if S1 and S2 are semi-rings then S1 × S2 is also a semi-ring. Let μ1 and μ2 be two functionals on S1 and S2, respectively. As before, define their product μ = μ1 × μ2 as a functional on S by

μ(A × B) = μ1(A) μ2(B) for all A ∈ S1, B ∈ S2.

By Exercise 20, if μ1 and μ2 are finitely additive measures on the semi-rings S1 and S2 then μ is finitely additive on S. We are going to prove (independently of Exercise 20) that if μ1 and μ2 are σ-additive then so is μ. A particular case of this result, when M1 = M2 = R and μ1 = μ2 is the Lebesgue measure, was proved in Lemma 1.11 using specific properties of R. Here we consider the general case.

Theorem 2.22 Assume that μ1 and μ2 are σ-finite measures on semi-rings S1 and S2 on the sets M1 and M2, respectively. Then μ = μ1 × μ2 is a σ-finite measure on S = S1 × S2.

By Carathéodory’s extension theorem, μ can then be uniquely extended to the σ-algebra of measurable sets on M. The extended measure is also denoted by μ1 × μ2 and is called the product measure of μ1 and μ2. Observe that the product measure is σ-finite (and complete). Hence, one can define by induction the product μ1 × ... × μn of a finite sequence of σ-finite measures μ1, ..., μn. Note that this product satisfies the associative law (Exercise 57).

Proof. Let C, Cn be sets from S such that C = ⊔n Cn, where n varies over a finite or countable index set. We need to prove that

μ(C) = Σn μ(Cn). (2.37)

Let C = A × B and Cn = An × Bn, where A, An ∈ S1 and B, Bn ∈ S2. Clearly, A = ⋃n An and B = ⋃n Bn. Consider the function fn on M1 defined by

fn(x) = μ2(Bn) if x ∈ An, and fn(x) = 0 if x ∉ An,

that is, fn = μ2(Bn) 1An.

Claim. For any x ∈ A,

Σn fn(x) = μ2(B). (2.38)

We have

Σn fn(x) = Σ_{n : x ∈ An} fn(x) = Σ_{n : x ∈ An} μ2(Bn).

If x ∈ An and x ∈ Am with n ≠ m, then the sets Bn, Bm must be disjoint. Indeed, if y ∈ Bn ∩ Bm then (x, y) ∈ Cn ∩ Cm, which contradicts the hypothesis that Cn, Cm are disjoint. Hence, all sets Bn with the condition x ∈ An are disjoint. Furthermore, their union is B, because the set {x} × B is covered by the union of the sets Cn. Hence,

Σ_{n : x ∈ An} μ2(Bn) = μ2( ⊔_{n : x ∈ An} Bn ) = μ2(B),

which finishes the proof of the Claim. Integrating the identity (2.38) over A, we obtain

∫_A ( Σn fn ) dμ1 = μ2(B) μ1(A) = μ(C).

On the other hand, since fn ≥ 0, the partial sums of the series Σn fn form an increasing sequence. Applying the monotone convergence theorem to the partial sums, we obtain that the integral is interchangeable with the infinite sum, whence

∫_A ( Σn fn ) dμ1 = Σn ∫_A fn dμ1 = Σn μ2(Bn) μ1(An) = Σn μ(Cn).

Comparing the above two lines, we obtain (2.37).
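As a toy illustration of (2.37), one can check additivity of μ = μ1 × μ2 on a concrete non-grid partition of a square, taking μ1 = μ2 = length; the partition below is a hypothetical finite example, whereas the theorem covers countable partitions:

```python
# mu(A x B) = length(A) * length(B); check (2.37) on a non-grid partition of
# the unit square C into three rectangles (an elementary, hypothetical example).
def mu(rect):
    (a0, a1), (b0, b1) = rect
    return (a1 - a0) * (b1 - b0)

C = ((0.0, 1.0), (0.0, 1.0))
parts = [((0.0, 0.5), (0.0, 1.0)),    # left half
         ((0.5, 1.0), (0.0, 0.5)),    # lower right quarter
         ((0.5, 1.0), (0.5, 1.0))]    # upper right quarter
assert abs(mu(C) - sum(mu(p) for p in parts)) < 1e-12
```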

Finally, let us show that the measure μ is σ-finite. Indeed, since μ1 and μ2 are σ-finite, there are sequences {An} ⊂ S1 and {Bn} ⊂ S2 such that μ1(An) < ∞, μ2(Bn) < ∞ and ⋃n An = M1, ⋃n Bn = M2. Set Cnm = An × Bm. Then μ(Cnm) = μ1(An) μ2(Bm) < ∞ and ⋃_{n,m} Cnm = M. Since the sequence {Cnm} is countable, we conclude that the measure μ is σ-finite.

2.8.2 Cavalieri principle

Before we state the main result, let us discuss integration of functions taking the value ∞. Let μ be a complete σ-finite measure on a set M and let f be a measurable function on M with values in [0, ∞]. If the set

S := {x ∈ M : f(x) = ∞}

has measure 0 then there is no difficulty in defining ∫_M f dμ, since the function f can be modified on the set S to have finite values. If μ(S) > 0 then we define the integral of f by

∫_M f dμ = ∞. (2.39)

For example, let f be a simple function, that is, f = Σk ak 1Ak where ak ∈ [0, ∞] are the distinct values of f. Assume that one of the ak is equal to ∞. Then S = Ak with this k, and if μ(Ak) > 0 then

Σk ak μ(Ak) = ∞,

which matches the definition (2.39). Many properties of integration remain true if the function under the integral is allowed to take the value ∞. We need the following extension of the monotone convergence theorem.

Extended monotone convergence theorem If {fk} is an increasing sequence of measurable functions on M taking values in [0, ∞] and fk → f a.e. as k → ∞, then

fdμ=lim fk dμ. (2.40) k ZM →∞ ZM Proof. Indeed, if f is finite a.e. then this was proved in Theorem 2.15. Assume that f = on a set of positive measure. If one of fk is also equal to on a set of positive ∞ ∞ measure then fk dμ = for large enough k and the both sides of (2.40) are equal to M ∞ . Consider the remaining case when all fk are finite a.e..Set ∞ R

C = lim fk dμ. k →∞ ZM We need to prove that C = . Assume from the contrary that C< . Then, for all k, ∞ ∞

fk dμ C, ≤ ZM and by the second monotone convergence theorem (Theorem 2.16) we conclude that the limit f (x) = limk fk (x) is finite a.e., which contradicts the assumption that μ f = > 0. →∞ { ∞} 78 Now, let μ and μ be σ-finite measures defined on σ-algebras 1 and 2 on sets 1 2 M M M1,M2.Letμ = μ1 μ2 be the product measure on M = M1 M2,whichisdefined on a σ-algebra . Consider× a set A and, for any x M consider× the x-section of A, that is, the setM ∈ M ∈ Ax = y M2 :(x, y) A . { ∈ ∈ } Define the following function on M1:

μ (A ) , if A , ϕ (x)= 2 x x 2 (2.41) A 0 otherwise∈ M ½

Theorem 2.23 (The Cavalieri principle) If A then, for almost all x M1,theset ∈ M ∈ Ax is a measurable subset of M2 (that is, Ax 2), the function ϕ (x) is measurable ∈ M A on M1,and

μ (A)= ϕA dμ1. (2.42) ZM1

Note that the function ϕA(x) takes values in [0, ∞], so that the above discussion of the integration of such functions applies.

Example. The formula (2.42) can be used to evaluate areas in R^2 (and in R^n). For example, consider a measurable function ϕ : (a, b) → [0, +∞) and let A be the subgraph of ϕ, that is,

2 A = (x, y) R : a

ϕA (x)=λ1 (Ax)=λ1 [0,ϕ(x)] = ϕ (x) , provided x (a, b) and ϕ (x)=0otherwise. Hence, ∈ A

λ2 (A)= ϕdλ1. Z(a,b) If ϕ is Riemann integrable then we obtain

b λ2 (A)= ϕ (x) dx, Za which was also proved in Exercise 23. Proof of Theorem 2.23. Denote by the family of all sets A for which all the claims of Theorem 2.23 are satisfied. WeA need to prove that = ∈ .M A M Consider first the case when the measures μ1,μ2 are finite. The finiteness of measure μ2 implies, in particular, that the function ϕ (x) is finite. Let show that S := 1 2 A A ⊃ M ×M (note that S is a semi-ring). Indeed, any set A S has the form A = A1 A2 where ∈ × Ai i.Thenwehave ∈ M A2,x A1, Ax = ∈ 2, ,x/A1, ∈ M ½ ∅ ∈

79 μ2 (A2) ,x A1, ϕA (x)= ∈ 0,x/A1, ½ ∈ so that ϕA is a measurable function on M1,and

ϕA dμ1 = μ2 (A2) μ1 (A1)=μ (A) . ZM1 All the required properties are satisfied and, hence, A . ∈ A Let us show that is closed under monotone difference (that is, − = ). Indeed, if A, B and A BAthen A A ∈ A ⊃ (A B) = Ax Bx, − x − so that (A B) 2 for almost all x M1, − x ∈ M ∈

ϕA B (x)=μ2 (Ax) μ2 (Bx)=ϕA (x) ϕB (x) − − − so that the function ϕA B is measurable, and −

ϕA B dμ1 = ϕA dμ1 ϕB dμ1 = μ (A) μ (B)=μ (A B) . − − − − ZM1 ZM1 ZM1 Inthesameway, is closed under taking finite disjoint unions. A Let us show that is closed under monotone limits. Assume first that An is A { } ⊂ A an increasing sequence and let A = limn An = n An.Then →∞

Ax = (An)Sx n S so that Ax 2 for almost all x M1, ∈ M ∈

ϕA (x)=μ2 (Ax)= limμ2 ((An) ) = lim ϕA (x) . n x n n →∞ →∞

In particular, ϕA is measurable as the pointwise limit of measurable functions. Since the sequence {ϕAn} is monotone increasing, we obtain by the monotone convergence theorem

∫_{M1} ϕA dμ1 = lim_{n→∞} ∫_{M1} ϕAn dμ1 = lim_{n→∞} μ(An) = μ(A). (2.43)

Hence, A belongs to the family considered. Note that this argument does not require the finiteness of the measures μ1 and μ2: if these measures are only σ-finite and, hence, the functions ϕAn are not necessarily finite, the monotone convergence theorem can be replaced by the extended monotone convergence theorem. This remark will be used below.

Let now An be decreasing and let A =limn An = n An.Thenalltheabove argument remains{ } ⊂ true,A the only difference comes in justi→∞fication of (2.43). We cannot use the monotone convergence theorem, but one can pass to the limitT under the of the integral by the dominated convergence theorem because all functions ϕAn are bounded by the function ϕA1 , and the latter is integrable because it is bounded by μ2 (M2) and the measure μ1 (M1) is finite. Let Σ be the minimal σ-algebra containing S. From the above properties of ,it follows that Σ. Indeed, let R be the minimal algebra containing S,thatis,A R consists of finiteA ⊃ disjoint union of elements of S.Since contains S and is closed A A 80 under taking finite disjoint unions, we conclude that contains R. By Theorem 1.14, Σ = Rlim,thatis,Σ is the extension of R by monotoneA limits. Since is closed under monotone limits, we conclude that A Σ. A To proceed further, we need a refi⊃nement of Theorem 1.10. The latter says that if A then there is B Σ such that μ (A M B)=0. In fact, set B can be chosen to contain∈ MA, as is claimed in∈ the following stronger statement. Claim (Exercise 54) For any set A ,thereisasetB Σ such that B A and μ (B A)=0. ∈ M ∈ ⊃ Using− this Claim, let us show that contains sets of measure 0. Indeed, for any set A such that μ (A)=0,thereisB Σ Asuch that A B and μ (B)=0(the latter being equivalent to μ (B A)=0). It follows∈ that ⊂ −

ϕB dμ1 = μ (B)=0, ZM1 which implies that μ (Bx)=ϕ (x)=0for almost all x M1.SinceAx Bx, it follows 2 B ∈ ⊂ that μ (Ax)=0for almost all x M1. In particular, the set Ax is measurable for almost 2 ∈ all x M1.Furthermore,wehaveϕ (x)=0a.e. and, hence, ∈ A

ϕA dμ1 =0=μ (A) , ZM1 which implies that A . Finally, let us show∈ A that = . By the above Claim, for any set A ,thereis B Σ such that B A and Aμ (B MA)=0.ThenB by the previous argument,∈ M and the∈ set C = B A ⊃is also in as− a set of measure 0.Since∈ A A = B C and is closed under monotone− difference, weA conclude that A . − A ∈ A Consider now the general case of σ-finite measures μ1, μ2. Consider an increasing sequences Mik ∞ of measurable sets in Mi (i =1, 2) such that μ (Mik) < and { }k=1 i ∞ k Mik = Mi. Then measure μi is finite in Mik so that the first part of the proof applies to sets S Ak = A (M1k M2k) , ∩ × so that Ak .Notethat Ak is an increasing sequence and A = k=1 Ak.Sincethe family is∈ closedA under increasing{ } monotone limits, we obtain that A ,whichfinishes the proof.A S∈ A Example. LetusevaluatetheareaofthediscinR2

2 2 2 2 D = (x, y) R : x + y 0.Wehave © ª 2 2 Dx = y R : y < √r x ∈ | | − and n o 2 2 ϕ (x)=λ1 (Dx)=2√r x , D −

81 provided x

n λn (Bn (r)) = cnr with some constant cn (see Exercise 59).
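The disc computation above is easy to reproduce numerically: integrating the section lengths ϕD(x) = 2√(r² − x²) with a simple midpoint rule (an illustrative sketch, not part of the notes) recovers λ2(D) = πr²:

```python
# Cavalieri principle for the disc: lambda_2(D) = ∫_{-r}^{r} 2*sqrt(r^2 - x^2) dx = pi*r^2.
import math

def disc_area(r, n=100000):
    # midpoint rule for the integral of the section lengths phi_D(x)
    h = 2 * r / n
    return sum(2 * math.sqrt(max(r * r - (-r + (i + 0.5) * h) ** 2, 0.0)) * h
               for i in range(n))

r = 1.5
assert abs(disc_area(r) - math.pi * r * r) < 1e-4
```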

n 1 n Example. Let B be a measurable subset of R − of a finite measure, p be a point in R with pn > 0,andC be a cone over B withthepoleatp defined by C = (1 t) p + ty : y B, 0

Cz = x C : xn = z = (1 t) p + ty : y B, (1 t) pn = z { ∈ } { − ∈ − } =(1 t) p + tB, − z where t =1 ; besides, if z/(0,pn) then Cz = . Hence, for any 0

pn n 1 1 z − n 1 λn 1 (B) pn λn (C)=λn 1 (B) 1 dz = λn 1 (B) pn s − ds = − . − − pn − n Z0 µ ¶ Z0
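The cone formula can likewise be checked by integrating the section measures λ_{n−1}(Cz) = (1 − z/pn)^{n−1} λ_{n−1}(B) numerically; a sketch assuming a midpoint rule (the numbers below are illustrative choices):

```python
# Cone volume via Cavalieri slices: sections at height z are scaled copies
# of the base with factor (1 - z/pn), so lambda_n(C) = base * pn / n.
def cone_volume(base_measure, pn, n, steps=20000):
    # midpoint rule for ∫_0^{pn} base_measure * (1 - z/pn)^(n-1) dz
    h = pn / steps
    return sum(base_measure * (1 - (i + 0.5) * h / pn) ** (n - 1) * h
               for i in range(steps))

L, pn = 2.0, 3.0                      # hypothetical base length and apex height
assert abs(cone_volume(L, pn, 2) - L * pn / 2) < 1e-6   # n = 2: a triangle
assert abs(cone_volume(1.0, pn, 3) - pn / 3) < 1e-6     # n = 3: cone over a unit square
```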

2.8.3 Fubini’s theorem

Theorem 2.24 (Fubini’s theorem) Let μ1 and μ2 be complete σ-finite measures on the sets M1, M2, respectively. Let μ = μ μ be the product measure on M = M1 M2 1 × 2 × and let f (x, y) be a measurable function on M with values in [0, ],wherex M1 and ∞ ∈ y M2.Then ∈ fdμ= f (x, y) dμ2 (y) dμ1 (x) . (2.44) ZM ZM1 µZM2 ¶ More precisely, the function y f (x, y) is measurable on M2 for almost all x M1,the function 7→ ∈ x f (x, y) dμ 7→ 2 ZM2 is measurable on M1, and its integral over M1 is equal to M fdμ. Furthermore, if f is a (signed) integrable function on M, then the function y f (x, y) R 7→ is integrable on M2 for almost all x M1, the function ∈ x f (x, y) dμ 7→ 2 ZM2 is integrable on M1, and the identity (2.44) holds.

The first part of Theorem 2.24, dealing with non-negative functions, is also called Tonelli’s theorem. The expression on the right hand side of (2.44) is referred to as a repeated (or iterated) integral. Switching x and y in (2.44) yields the following identity, which is true under the same conditions:

fdμ= f (x, y) dμ1 (x) dμ2 (y) . ZM ZM2 µZM1 ¶
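Numerically, the equality of the two repeated integrals is easy to observe on a grid; a sketch with an arbitrary smooth integrand (chosen here for illustration only):

```python
# Fubini on the unit square: both repeated midpoint-rule integrals of
# f(x, y) = x*y + sin(x + y) agree (and approximate the double integral).
import math

n = 400
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]
f = lambda x, y: x * y + math.sin(x + y)

xy = sum(sum(f(x, y) for y in pts) * h for x in pts) * h   # dy first, then dx
yx = sum(sum(f(x, y) for x in pts) * h for y in pts) * h   # dx first, then dy
assert abs(xy - yx) < 1e-9
```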

By induction, this theorem extends to a finite product μ = μ1 μ2 ... μn of measures. × × × Proof. Consider the product

M = M R =M1 M2 R × × × and measure μ on M,whichisdef fined by μ = μ λ = μ μ λ, e f × 1 × 2 × where λ = λ is the Lebesgue measure on (note that the product of measure satisfies 1 e R the associative law — see Exercise 57). Consider the set A M, which is the subgraph of function f,thatis, ⊂ A = (x, y, z) M :0 z f (x, yf) ∈ ≤ ≤ n o (here x M1, y M2, z R). By Theorem 2.23, we have ∈ ∈ ∈ f

μ (A)= λ A(x,y) dμ. ZM ¡ ¢

83 Since A(x,y) =[0,f(x, y)] and λ A(x,y) = f (x, y), it follows that ¡ ¢ μ (A)= fdμ. (2.45) ZM On the other hand, we have M = M1 (M2 R) × × and fμ = μ (μ λ) . 1 × 2 × Theorem 2.23 now says that e

μ (A)= (μ λ)(Ax) dμ (x) 2 × 1 ZM1 where Ax = (y, z):0 f (x, y) z . { ≤ ≤ } Applying Theorem 2.23 to the measure μ λ,weobtain 2 ×

(μ2 λ)(Ax)= λ (Ax)y dμ2 = f (x, y) dμ2 (y) . × M2 M2 Z ³ ´ Z Hence,

μ (A)= f (x, y) dμ2 (y) dμ1 (x) . ZM1 µZM2 ¶ Comparing with (2.45), we obtain (2.44). The claims about the measurability of the intermediate functions follow from Theorem 2.23. Now consider the case when f is integrable. By definition, function f is integrable if f+ and f are integrable. By the first part, we have −

f+dμ = f+ (x, y) dμ2 (y) dμ1. ZM ZM1 µZM2 ¶ Since the left hand side here is finite, it follows that the function

F (x)= f+ (x, y) dμ2 (y) ZM2 is integrable and, hence, is finite a.e.. Consequently, the function f+ (x, y) is integrable in y for almost all x.Inthesameway,weobtain

f dμ = f (x, y) dμ2 (y) dμ1, − − ZM ZM1 µZM2 ¶ so that the function

G (x)= f (x, y) dμ2 (y) − ZM2 is integrable and, hence, is finite a.e.. Therefore, the function f (x, y) is integrable in y − for almost all x. We conclude that f = f+ f is integrable in y for almost all x,the function − − f (x, y) dμ (y)=F (x) G (x) 2 − ZM2 84 is integrable on M1,and

f (x, y) dμ (y) dμ (x)= Fdμ Gdμ = fdμ. 2 1 1 − 1 ZM1 µZM2 ¶ ZM1 ZM1 ZM

Example. Let Q be the unit square in R2,thatis,

2 Q = (x, y) R :0

3 Integration in Euclidean spaces and in probability spaces

3.1 Change of variables in Lebesgue integral

Recall the following rule of change of variable in a Riemann integral:

b β f (x) dx = f (ϕ (y)) ϕ0 (y) dy Za Zα if ϕ is a continuously differentiable function on [α, β] and a = ϕ (α), b = ϕ (β). Let us rewrite this formula for . Assume that a

f (x) dλ1 (x)= f (ϕ (y)) ϕ0 (y) dλ1 (y) , (3.1) | | Z[a,b] Z[α,β] wherewehaveusedthatϕ0 0. If ϕ is a decreasing diffeomorphism≥ from [α, β] onto [a, b] then ϕ (α)=b, ϕ (β)=a, and we have instead

f (x) dλ1 (x)= f (ϕ (y)) ϕ0 (y) dλ1 (y) − Z[a,b] Z[α,β]

= f (ϕ (y)) ϕ0 (y) dλ1 (y) | | Z[α,β] wherewehaveusedthatϕ0 0. Hence, the formula (3.1)≤ holds in the both cases when ϕ is an increasing or decreasing diffeomorphism. The next theorem generalizes (3.1) for higher dimensions.
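Before passing to higher dimensions, formula (3.1) can be sanity-checked numerically, e.g. with the decreasing diffeomorphism ϕ(y) = 1/y (an illustrative choice, not from the notes):

```python
# 1D change of variables (3.1): ∫_[a,b] f dx = ∫_[α,β] f(φ(y)) |φ'(y)| dy,
# checked with f(x) = x^2 and the decreasing diffeomorphism φ(y) = 1/y.
def midpoint(g, a, b, n=50000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x: x * x
phi = lambda y: 1.0 / y               # maps [1, 2] onto [1/2, 1], decreasing
dphi = lambda y: -1.0 / (y * y)

lhs = midpoint(f, 0.5, 1.0)
rhs = midpoint(lambda y: f(phi(y)) * abs(dphi(y)), 1.0, 2.0)
assert abs(lhs - rhs) < 1e-6
```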

n Theorem 3.1 (Transformationsatz) Let μ = λn be the Lebesgue measure in R .LetU, V be open subsets of Rn and Φ : U V be a diffeomorphism. Then, for any non-negative → measurable function f : V R, the function f Φ : U R is measurable and → ◦ →

fdμ= (f Φ) det Φ0 dμ. (3.2) ◦ | | ZV ZU The same holds for any integrable function f : V R. → 1 Recall that Φ : U V is a diffeomorphism if the inverse mapping Φ− : V U → → exists and both Φ and Φ0 are continuously differentiable. Recall also that Φ0 is the total derivative of Φ, which in this setting coincides with the Jacobi matrix, that is, ∂Φ n Φ = J = i . 0 Φ ∂x µ j ¶i,j=1 Let us rewrite (3.2) in the form matching (3.1):

f (x) dμ (x)= f (Φ (y)) det Φ0 (y) dμ (y) . (3.3) | | ZV ZU 86 The formula (3.3) can be explained and memorized as follows. In order to evaluate

V f (x) dμ (x),onefinds a suitable x = Φ (y), which maps one-to-one y U to x V , and substitutes x via Φ (y), using the rule ∈ R ∈

dμ (x)= det Φ0 (y) dμ (y) . | | dx Using the notation dy instead of Φ0 (y),thisrulecanalsobewrittenintheform

dμ (x) dx = det , (3.4) dμ (y) dy ¯ ¯ ¯ ¯ ¯ ¯ which is easy to remember. Although presently¯ the identity¯ (3.4) has no rigorous meaning, it can be turned into a theorem using a proper definition of the quotient dμ (x) /dμ (y). Example. Let Φ : Rn Rn be a linear mapping that is, Φ (y)=Ay where A is a → non-singular n n matrix. Then Φ0 (y)=A and (3.2) becomes × f (x) dμ = det A f (Ay) dμ (y) . n | | n ZR ZR n In particular, applying this to f =1S where S is a measurable subset of R ,weobtain

1 μ (S)= det A μ A− S | | 1 or, renaming A− S to S, ¡ ¢ μ (AS)= det A μ (S) . (3.5) | | In particular, if det A =1then μ (AS)=μ (S) that is, the Lebesgue measure is preserved by such linear mappings.| | Since all orthogonal matrices have determinants 1, it follows ± that the Lebesgue measure is preserved by orthogonal transformations of Rn,inparticular, by rotations. Example. Let S be the unit cube, that is,

S = x1e1 + ... + xnen :0

AS = x1a1 + ... + xnan :0

This set is called a parallelepiped spanned by the vectors a1, ..., an (its particular case for n =2is a parallelogram). Denote it by Π (a1, ..., an). Note that the columns of the matrix A are exactly the column-vectors ai. Therefore, we obtain that from (3.5)

μ (Π (a1, ..., an)) = det A = det (a1,a2, ..., an) . | | | |

Example. Consider a tetrahedron spanned by a1, ..., an,thatis,

T (a1, ..., an)= x1a1 + ... + xnan : x1 + x2 + ... + xn < 1,xi > 0,i=1,...,n . { }

87 It is possible to prove that 1 μ (T (a1,...,an)) = det (a1,a2, ..., an) n! | | (see Exercise 58).
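Both formulas are easy to verify in dimension n = 2, where the measures of the parallelogram and the triangle can be compared against the elementary shoelace formula (an independent cross-check, not part of the notes):

```python
# 2D check: the parallelogram P(a1, a2) has measure |det(a1, a2)| and the
# triangle T(a1, a2) has measure |det(a1, a2)| / 2!, both compared against
# the shoelace formula for the corresponding polygons.
def shoelace(pts):
    s = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2

a1, a2 = (3.0, 1.0), (1.0, 2.0)              # hypothetical spanning vectors
det = abs(a1[0] * a2[1] - a1[1] * a2[0])     # = 5
para = [(0.0, 0.0), a1, (a1[0] + a2[0], a1[1] + a2[1]), a2]
tri  = [(0.0, 0.0), a1, a2]
assert abs(shoelace(para) - det) < 1e-12
assert abs(shoelace(tri) - det / 2) < 1e-12
```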

Example. (The polar coordinates) Let (r, θ) be the polar coordinates in R2. The change from the Cartesian coordinates (x, y) to the polar coordinates (r, θ) can be regarded as a mapping 2 Φ : U R → where U = (r, θ):r>0, π<θ<π and { − } Φ (r, θ)=(r cos θ, r sin θ) .

The of U is the open set

2 V = R (x, 0) : x 0 , \{ ≤ } and Φ is a diffeomorphism between U and V .Wehave

∂rΦ1 ∂θΦ1 cos θ r sin θ Φ0 = = − ∂rΦ2 ∂θΦ2 sin θrcos θ µ ¶ µ ¶ whence det Φ0 = r. Hence, we obtain the formula for computing the integrals in the polar coordinates:

f (x, y) dλ2 (x, y)= f (r cos θ, r sin θ) rλ2 (r, θ) . (3.6) ZV ZU If function f is given on R2 then

fdλ2 = fdλ2 2 ZR ZV because the difference R2 V has measure 0. Also, using Fubini’s theorem in U,wecan express the right hand side\ of (3.6) via repeated integrals and, hence, obtain

π ∞ f (x, y) dλ2 (x, y)= f (r cos θ, r sin θ) dθ rdr. (3.7) R2 0 π Z Z µZ− ¶ 2 2 2 For example, if f =1D where D = (x, y):x + y

Example. Let us evaluate the integral

∞ x2 I = e− dx. Z−∞ 88 Using Fubini’s theorem, we obtain

2 ∞ x2 ∞ y2 I = e− dx e− dy µZ−∞ ¶µZ−∞ ¶ ∞ ∞ y2 x2 = e− dy e− dx Z−∞ µZ−∞ ¶ ∞ ∞ (x2+y2) = e− dy dx Z−∞ µZ−∞ ¶ (x2+y2) = e− dλ2 (x, y) . 2 ZR Then by (3.7)

π (x2+y2) ∞ r2 ∞ r2 ∞ r2 2 e− dλ2 = e− dθ rdr =2π e− rdr = π e− dr = π. R2 0 π 0 0 Z Z µZ− ¶ Z Z Hence, I2 = π and I = √π,thatis,

∞ x2 e− dx = √π. Z−∞ The proof of Theorem 3.1 will be preceded by a number of auxiliary claims. We say that a mapping Φ is good if (3.2) holds for all non-negative measurable func- tions. Theorem 3.1 will be proved if we show that all diffeomorphisms are good. 1 Claim 0. Φ isgoodifandonlyifforanymeasurablesetB V ,thesetA = Φ− (B) is also measurable and ⊂

μ (B)= det Φ0 dμ. (3.8) | | ZA

Proof. Applying (3.2) for indicator functions, that is, for functions f =1B where B is a measurable subset of V , we obtain that the function 1A =1B Φ is measurable and ◦

μ (B)= 1B (Φ (y)) det Φ0 (y) dμ (y)= 1A (y) det Φ0 dμ = det Φ0 dμ. (3.9) | | | | | | ZU ZU ZA Conversely, if (3.8) is true then (3.2) is true for indicators functions f =1B.Bythe linearity of the integral and the monotone convergence theorem for series, we obtain (3.2) for simple functions: ∞ f = bk1Bk . k=1 X Finally, any non-negative measurable function f is an increasing pointwise limit of simple functions, whence (3.2) follows for f. Theideaoftheproofof(3.8). We will first prove it for affine mappings, that is, the mappings of the form Ψ (x)=Cx + D where C is a constant n n matrix and D Rn.Ageneraldiffeomorphism Φ can be × ∈ approximated in a neighborhood of any point x0 U by the tangent mapping ∈ Ψx (x)=Φ (x0)+Φ0 (x0)(x x0) , 0 − 89 which is affine.ThisimpliesthatifA is a set in a small neighborhood of x0 then

μ (Φ (A)) μ (Ψx (A)) = det Ψ0 dμ det Φ0 dμ. ≈ 0 x0 ≈ | | ZA ZA ¯ ¯ Now split an arbitrary measurable set A ¯U into small¯ pieces Ak ∞ and apply the ⊂ { }k=1 previous approximate identity to each Ak:

μ (Φ (A)) = μ (Φ (Ak)) det Φ0 dμ = det Φ0 dμ. ≈ | | | | k k Ak A X X Z Z The error of approximation can be made arbitrarily small using finer partitions of A. Claim 1. If Φ : U V and Ψ : V W are good mappings then Ψ Φ is also good. → → ◦ Proof. Let f : W R be a non-negative measurable function. We need to prove that → fdμ= (f Ψ Φ) det (Ψ Φ)0 dμ. ◦ ◦ ◦ ZW ZU ¯ ¯ Since Ψ is good, we have ¯ ¯

fdμ= (f Ψ) det Ψ0 dμ. ◦ | | ZW ZV

Set g = f Ψ det Φ0 so that g is a function on V .SinceΦ is good, we have ◦ | |

gdμ= (g Φ) det Φ0 dμ. ◦ | | ZV ZU Combining these two lines and using the chain rule in the form

(Ψ Φ)0 =(Ψ0 Φ) Φ0 ◦ ◦ and it its consequence det (Ψ Φ)0 =det(Ψ0 Φ)detΦ0, ◦ ◦ we obtain

fdμ = (f Ψ Φ) det Ψ0 Φ det Φ0 dμ ◦ ◦ | ◦ || | ZW ZU = (f Ψ Φ) det (Ψ Φ)0 dμ, ◦ ◦ ◦ ZU ¯ ¯ which was to be proved ¯ ¯ Claim 2. All translations Rn are good. Proof. A translation of Rn is a mapping of the form Φ (x)=x + v where v is a n constant vector from R . Obviously, Φ0 =idand det Φ0 1. Hence, the fact that Φ is good amounts by (3.8) to ≡ μ (B)=μ (B ν) , − for any measurable subset B Rn. This is obviously true when B is a box. Then following the procedure of construction⊂ of the Lebesgue measure, we obtain that this is true for any measurable set B.

90 Note that the identity (3.2) for translations amounts to

f (x) dμ (x)= f (y + v) dμ (y) , (3.10) n n ZR ZR for any non-negative measurable function f on Rn. Claim 3. All homotheties of Rn are good. Proof. A homothety of Rn is a mapping Φ (x)=cx where c is a non-zero real. Since n Φ0 = c id and, hence, det Φ0 = c ,thefactthatΦ is good is equivalent to the identity

n 1 μ (B)= c μ c− B , (3.11) | | for any measurable set B Rn. This is true for¡ boxes¢ and then extends to all measurable sets by the measure extension⊂ procedure. The identity (3.2) for homotheties amounts to

f (x) dμ (x)= c n f (cy) dμ (y) , (3.12) n | | n ZR ZR for any non-negative measurable function f on Rn. Claim 4. All linear mappings are good. Proof. Let Φ be a linear mapping from Rn to Rn. Extending a function f, initially defined on V ,tothewholeRn by setting f =0in Rn V , we can assume that V = U = Rn. Let us use the fact from linear algebra that any non-singular\ linear mapping can be reduced by column-reduction of the matrices to the composition of finitely many elementary linear mappings, where an elementary linear mapping is one of the following:

1. Φ(x1, ..., xn) = (c x1, x2, ..., xn) for some c ≠ 0;

2. Φ (x1,...,xn)=(x1 + cx2,x2,...,xn);

3. Φ (x1,...,xi,...,xj,...,xn)=(x1,...,xj,...,xi,...,xn) (switching the variables xi and xj).

Hence, in view of Claim 1, it suffices to prove Claim 4 when Φ is an elementary mapping. If Φ is of the first type then, using Fubini’s theorem and setting λ = λ1, we can write

fdμ = ... f (x1,...,xn) dλ (x1) dλ (x2) ...dλ(xn) n ZR ZR ZR µZR ¶

= ... f (ct, ..., xn) c dλ (t) dλ (x2) ...dλ(xn) | | ZR ZR µZR ¶ = f (cx1,x2,...,xn) c dμ n | | ZR = f (Φ (x)) det Φ0 dμ, n | | ZR 1 wherewehaveusedthechangeofintegralunderahomothetyinR and det Φ0 = c.

91 If Φ is of the second type then

fdμ = ... f (x1, ..., xn) dλ (x1) dλ (x2) ...dλ(xn) n ZR ZR ZR µZR ¶

= ... f (x1 + cx2, ..., xn) dλ (x1) dλ (x2) ...dλ(xn) ZR ZR µZR ¶ = f (x1 + cx2,x2,...,xn) dμ n ZR = f (Φ (x)) det Φ0 dμ, n | | ZR 1 where we have used the translation invariance of the integral in R and det Φ0 =1. Let Φ be of the third type. For simplicity of notation, let

Φ (x1,x2,...,xn)=(x2,x1,...,xn) .

Then, using twice Fubini’s theorem with different orders of the repeated integrals, we obtain

fdμ = ... f (x1,x2, ..., xn) dλ (x2) dλ (x1) ...dλ(xn) n ZR ZR ZR µZR ¶

= ... f (y2,y1,y3.., yn) dλ (y1) dλ (y2) ...dλ(yn) ZR ZR µZR ¶ = f (y2,y1,...,yn) dμ (y) n ZR = f (Φ (y)) det Φ0 dμ. n | | ZR

In the second line we have changed notation: y1 = x2, y2 = x1, yk = xk for k > 2 (let us emphasize that this is not a change of variable but just a change of notation in each of the repeated integrals), and in the last line we have used det Φ′ = −1. It follows from Claims 1, 2 and 4 that all affine mappings are good.

In what follows we will use the ∞-norm in R^n:

‖x‖ = ‖x‖∞ := max_{1 ≤ i ≤ n} |xi|.

Let Φ be a diffeomorphism between open sets U and V in R^n, and let K be a compact subset of U.

Φ (y) Φ (x) Φ0 (x)(y x) η y x . k − − − k ≤ k − k

Proof. Consider the function

f (t)=Φ ((1 t) x + ty) , 0 t 1, − ≤ ≤

92 so that f (0) = Φ (x) and f (1) = Φ (y). Since the interval [x, y] is contained in U,the function f is defined for all t [0, 1]. By the fundamental theorem of , we have ∈ 1 1 Φ (y) Φ (x)= f 0 (t) dt = Φ0 ((1 t) x + ty)(y x) dt. − − − Z0 Z0

Since the function Φ0 (x) is uniformly continuous on K, δ can be chosen so that

x, z K, z x <δ Φ (z) Φ (x) <η. ∈ k − k ⇒ k − k Applying this with z =(1 t) x + ty and noticing that z x y x <δ,weobtain − k − k ≤ k − k

Φ0 ((1 t) x + ty) Φ0 (x) <η. k − − k Denoting F (t)=Φ0 ((1 t) x + ty) Φ0 (x) , − − we obtain

1 Φ (y) Φ (x)= (Φ0 (x)+F (t)) (y x) dt − 0 − Z 1 = Φ0 (x)(y x)+ F (t)(y x) dt − − Z0 and 1 Φ (y) Φ (x) Φ0 (x)(y x) = F (t)(y x) dt η y x . k − − − k k − k ≤ k − k Z0

For any x U,denotebyΨx (y) the tangent mapping of Φ at x,thatis, ∈

Ψx (y)=Φ (x)+Φ0 (x)(y x) . (3.13) − n Clearly, Ψx is a non-singular affine mapping in R .Inparticular,Ψx is good. Note also that Ψx (x)=Φ (x) and Ψx0 Φ0 (x) . The Claim 5 can be stated≡ as follows: for any η>0 there is δ>0 such that if [x, y] K and y x <δthen ⊂ k − k

Φ (y) Ψx (y) η y x . (3.14) k − k ≤ k − k

Consider balls Qr(x) with respect to the ∞-norm, that is,

Qr(x) = {y ∈ R^n : ‖y − x‖ < r}.

Clearly, Qr(x) is the cube with center x and edge length 2r. If Q = Qr(x) and c > 0 then set cQ = Qcr(x).

Claim 6. For any ε (0, 1),thereisδ>0 such that, for any cube Q = Qr (x) K with radius 0

[Figure: the set Φ(Q) is “squeezed” between Ψ((1−ε)Q) and Ψ((1+ε)Q)]

Proof. Fix some η>0 (which will be chosen below as a function of ε)andletδ>0 bethesameasinClaim5.LetQ beacubeasinthestatement.Foranyy Q,wehave y x

[Figure: the mappings Φ and Ψ on the cube Q = Qr(x)]

1 Now we will apply Ψ− to the difference Φ (y) Ψ (y). Note that by (3.13), for any − a Rn, ∈ 1 1 Ψ− (a)=Φ0 (x)− (a Φ (x)) + x − whence it follows that, for all a, b Rn, ∈ 1 1 1 Ψ− (a) Ψ− (b)=Φ0 (x)− (a b) . (3.17) − − 94 By the compactness of K,wehave

1 C := sup (Φ0)− < . K k k ∞ It follows from (3.17) that

1 1 Ψ− (a) Ψ− (b) C a b − ≤ k − k ° ° (itisimportantthattheconstant° C does not depend° on a particular choice of x K but is determined by the set K). ∈ In particular, setting here a = Φ (y), b = Ψ (y) and using (3.16), we obtain

1 Ψ− Φ (y) y C Φ (y) Ψ (y)

[Figure: comparing the points y ∈ Q and Ψ⁻¹Φ(y)]

It follows that

1 1 Ψ− Φ (y) x Ψ− Φ (y) y + y x <εr+ r =(1+ε) r k − k ≤ k − k k − k whence 1 Ψ− Φ (y) Q(1+ε)r (x)=(1+ε) Q ∈ and Φ (Q) Ψ ((1 + ε) Q) , ⊂ which is the right inclusion in (3.15). To prove the left inclusion in (3.15), observe that, for any y ∂Q := Q Q,wehave y x = r and, hence, ∈ \ k − k 1 1 Ψ− Φ (y) x y x Ψ− Φ (y) y >r εr =(1 ε) r, k − k ≥ k − k − k − k − − that is, 1 Ψ− Φ (y) / (1 ε) Q =: Q0. ∈ − 95

[Figure: comparing the points y ∈ ∂Q and Ψ⁻¹Φ(y)]

It follows that Φ (y) / Ψ (Q0) for any y ∂Q,thatis, ∈ ∈

Ψ (Q0) Φ (∂Q)= . (3.19) ∩ ∅ Consider two sets

A = Ψ (Q0) Φ (Q) , ∩ B = Ψ (Q0) Φ (Q) , \ which are obviously disjoint and their union is Ψ (Q0).

[Figure: the sets A = Ψ(Q′) ∩ Φ(Q) and B = Ψ(Q′) ∖ Φ(Q); B is drawn as non-empty although it is in fact empty]

Observe that A is open as the intersection of two open sets¹. Using (3.19) we can write

B = Ψ(Q′) ∖ Φ(Q) = Ψ(Q′) ∖ Φ(Q) ∖ Φ(∂Q) = Ψ(Q′) ∖ Φ(Q̄).

Then B is open as the difference of an open set and a closed set. Hence, the open set Ψ(Q′) is the disjoint union of the two open sets A, B. Since Ψ(Q′) is connected as a continuous image of the connected set Q′, we conclude that one of the sets A, B is empty and the other is all of Ψ(Q′).

¹ Recall that diffeomorphisms map open sets to open sets and closed sets to closed sets.

96 Note that A is non-empty because A contains the point Φ (x)=Ψ (x). Therefore, we conclude that A = Ψ (Q0), which implies

Φ (Q) Ψ (Q0) . ⊃

Claim 7. If a set S ⊂ U has measure 0 then the set Φ(S) also has measure 0.

Proof. Exhausting U by a sequence of compact sets, we can assume that S is contained in some compact set K such that K ⊂ U. Choose some ρ > 0 and denote by Kρ the closed ρ-neighborhood of K, that is, Kρ = {x ∈ R^n : ‖x − K‖ ≤ ρ}. Clearly, ρ can be taken so small that Kρ ⊂ U.

The hypothesis μ(S) = 0 implies that, for any ε > 0, there is a sequence {Bk} of boxes such that S ⊂ ⋃k Bk and

Σk μ(Bk) < ε. (3.20)

For any r > 0, each box B in R^n can be covered by a sequence of cubes of radii ≤ r such that the sum of their measures is bounded by 2μ(B) (see the footnote below). Find such cubes for each box Bk from (3.20), and denote by {Qj} the sequence of all these cubes across all Bk. Then the sequence {Qj} covers S, and

Σj μ(Qj) < 2ε.

Clearly, we can remove from the sequence {Qj} those cubes that do not intersect S, so that we can assume that each Qj intersects S and, hence, K. Choosing r smaller than ρ/2, we obtain that all Qj are contained in Kρ.

Figure 1: Covering a set S of measure 0 by cubes of radii < ρ/2

By Claim 6, there is δ>0 such that, for any cube Q Kρ of radius <δand center x, ⊂

Φ(Q) ⊂ Ψx(2Q).

² Indeed, consider the cubes whose vertices have coordinates that are multiples of a, where a is sufficiently small. Then choose those cubes that intersect B. Clearly, the chosen cubes cover B, and their total measure can be made arbitrarily close to μ(B) provided a is small enough.

97 Hence, assuming from the very beginning that r<δand denoting by xj the center of Qj,weobtain Φ (Qj) Ψx (2Qj)=:Aj. ⊂ j By Claim 4, we have

μ(A_j) = |det Ψ′_{x_j}| μ(2Q_j) = |det Φ′(x_j)| 2^n μ(Q_j).

By the compactness of K_ρ,

C := sup_{K_ρ} |det Φ′| < ∞.

It follows that

μ(A_j) ≤ C 2^n μ(Q_j)

and

∑_j μ(A_j) ≤ C 2^n ∑_j μ(Q_j) ≤ C 2^{n+1} ε.

Since {A_j} covers Φ(S), we obtain by the sub-additivity of the outer measure that μ*(Φ(S)) ≤ C 2^{n+1} ε. Since ε > 0 is arbitrary, we obtain μ*(Φ(S)) = 0, which was to be proved.

Claim 8. For any diffeomorphism Φ : U → V and for any measurable set A ⊂ U, the set Φ(A) is also measurable.

Proof. Any measurable set A ⊂ R^n can be represented in the form B − C where B is a Borel set and C is a set of measure 0 (see Exercise 54). Then Φ(A) = Φ(B) − Φ(C). By Lemma 2.1, the set Φ(B) is Borel, and by Claim 7, the set Φ(C) has measure 0. Hence, Φ(A) is measurable.

Proof of Theorem 3.1. By Claim 0, it suffices to prove that, for any measurable set B ⊂ V, the set A = Φ^{-1}(B) is measurable and

μ(B) = ∫_A |det Φ′| dμ.

Applying Claim 8 to Φ and Φ^{-1}, we obtain that A is measurable if and only if B is measurable. Hence, it suffices to prove that, for any measurable set A ⊂ U,

μ(Φ(A)) = ∫_A |det Φ′| dμ.  (3.21)

By the monotone convergence theorem, the identity (3.21) is stable under taking monotone limits for increasing sequences of sets A. Therefore, exhausting U by compact sets, we can reduce the problem to the case when the set A is contained in such a compact set, say K. Since |det Φ′| is bounded on K and, hence, is integrable, it follows from the dominated convergence theorem that (3.21) is stable under taking monotone decreasing limits as well. Clearly, (3.21) also survives taking a monotone difference of sets. Now we use the facts that any measurable set is a monotone difference of a Borel set and a set of measure 0 (Exercise 54), and that the Borel sets are obtained from the open sets by monotone differences and monotone limits (Theorem 1.14). Hence, it suffices to prove (3.21) in two cases: when the set A has measure 0 and when A is an open set. For sets of measure 0, (3.21) follows immediately from Claim 7 since both sides of (3.21) are 0.

So, assume in the sequel that A is an open set. We claim that, for any open set, and in particular for A, and for any ρ > 0, there is a disjoint sequence of cubes {Q_k} of radii ≤ ρ such that Q_k ⊂ A and

μ(A \ ⋃_k Q_k) = 0.

Indeed, using dyadic grids with steps 2^{-k} ρ, k = 0, 1, ..., one can split an open set A into a disjoint union of such cubes modulo a set of measure zero.

Furthermore, if ρ is small enough then, by Claim 6, we have

Ψ_{x_k}((1 − ε) Q_k) ⊂ Φ(Q_k) ⊂ Ψ_{x_k}((1 + ε) Q_k),

where x_k is the center of Q_k and ε > 0 is any given number. By the uniform continuity of ln |det Φ′(x)| on K, if ρ is small enough then, for x, y ∈ K,

‖x − y‖ < ρ  ⟹  1 − ε < |det Φ′(y)| / |det Φ′(x)| < 1 + ε.

Since

μ(Ψ_{x_k}((1 + ε) Q_k)) = |det Ψ′_{x_k}| (1 + ε)^n μ(Q_k)
  = |det Φ′(x_k)| (1 + ε)^n μ(Q_k)
  ≤ (1 + ε)^{n+1} ∫_{Q_k} |det Φ′(x)| dμ,

it follows that

μ(Φ(A)) = ∑_k μ(Φ(Q_k)) ≤ (1 + ε)^{n+1} ∑_k ∫_{Q_k} |det Φ′(x)| dμ = (1 + ε)^{n+1} ∫_A |det Φ′| dμ.

Similarly, one proves that

μ(Φ(A)) ≥ (1 − ε)^{n+1} ∫_A |det Φ′| dμ.

Since ε can be made arbitrarily small, we obtain (3.21).

3.2 Random variables and their distributions

Recall that a probability space (Ω, F, P) is a triple of a set Ω, a σ-algebra F (also frequently called a σ-field), and a probability measure P on F, that is, a measure with total mass 1 (that is, P(Ω) = 1). Recall also that a function X : Ω → R is called measurable with respect to F if for any x ∈ R the set

{X ≤ x} = {ω ∈ Ω : X(ω) ≤ x}

is measurable, that is, belongs to F. In other words, {X ≤ x} is an event.

Definition. Any measurable function X on Ω is called a random variable (Zufallsgröße).

For any random variable X and any x ∈ R, the probability P(X ≤ x) is defined; it is referred to as the probability that the random variable X is bounded by x. For example, if A is any event, then the indicator function 1_A on Ω is a random variable.

Fix a random variable X on Ω. Recall that by Lemma 2.1, for any Borel set A ⊂ R, the set

{X ∈ A} = X^{-1}(A)

is measurable. Hence, the set {X ∈ A} is an event and we can consider the probability P(X ∈ A) that X is in A. Set, for any Borel set A ⊂ R,

P_X(A) = P(X ∈ A).

Then we obtain a real-valued functional P_X(A) on the σ-algebra B(R) of all Borel sets in R.

Lemma 3.2 For any random variable X, P_X is a probability measure on B(R). Conversely, given any probability measure μ on B(R), there exists a probability space and a random variable X on it such that P_X = μ.

Proof. We can write P_X(A) = P(X^{-1}(A)). Since X^{-1} preserves all set-theoretic operations, by this formula the probability measure P on F induces a probability measure on B. For example, check additivity:

P_X(A ⊔ B) = P(X^{-1}(A ⊔ B)) = P(X^{-1}(A) ⊔ X^{-1}(B)) = P(X^{-1}(A)) + P(X^{-1}(B)).

The σ-additivity is proved in the same way. Note also that

P_X(R) = P(X^{-1}(R)) = P(Ω) = 1.

For the converse statement, consider the probability space

(Ω, F, P) = (R, B, μ)

and the random variable X(x) = x on it. Then P_X(A) = P(X ∈ A) = P(x : X(x) ∈ A) = μ(A).

Hence, each random variable X induces a probability measure P_X on real Borel sets. The measure P_X is called the distribution of X. In probabilistic terminology, P_X(A) is the probability that the value of the random variable X occurs to be in A. Any probability measure on B is also called a distribution. As we have seen, any distribution is the distribution of some random variable.

Consider examples of distributions on R. There is a large class of distributions possessing a density, which can be described as follows. Let f be a non-negative Lebesgue integrable function on R such that

∫_R f dλ = 1,  (3.22)

where λ = λ_1 is the one-dimensional Lebesgue measure. Define the measure μ on B(R) by

μ(A) = ∫_A f dλ.

By Theorem 2.12, μ is a measure. Since μ(R) = 1, μ is a probability measure. The function f is called the density of μ or the density function of μ.

Here are some frequently used examples of distributions which are defined by their densities.

1. The uniform distribution U(I) on a bounded interval I ⊂ R is given by the density function f = 1_I / ℓ(I), where ℓ(I) is the length of I.

2. The normal distribution N(a, b):

f(x) = (1/√(2πb)) exp(−(x − a)²/(2b)),

where a ∈ R and b > 0. For example, N(0, 1) is given by

f(x) = (1/√(2π)) exp(−x²/2).

To verify the total mass identity (3.22), use the change y = (x − a)/√(2b) in the following integral:

∫_{−∞}^{∞} (1/√(2πb)) exp(−(x − a)²/(2b)) dx = (1/√π) ∫_{−∞}^{∞} exp(−y²) dy = 1,  (3.23)

where the last identity follows from

∫_{−∞}^{+∞} e^{−y²} dy = √π

(see an Example in Section 3.1). Here is the plot of f for N(0, 1):

[Plot of the density function of N(0, 1).]

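As a quick numerical sanity check of the total mass identity (3.22) for the N(0, 1) density, one can integrate it by a simple quadrature rule. This is an illustrative stdlib-Python sketch, not part of the original notes; the helper names are mine.

```python
import math

def phi(x):
    # Density of the standard normal distribution N(0, 1).
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def midpoint_integral(f, a, b, n=100_000):
    # Midpoint rule; accurate enough for this smooth, rapidly decaying integrand.
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# The tails beyond |x| = 10 carry negligible mass, so [-10, 10] suffices.
total = midpoint_integral(phi, -10.0, 10.0)
```

The computed total mass agrees with 1 up to quadrature and truncation error, in line with (3.23).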
3. The Cauchy distribution:

f(x) = a / (π(x² + a²)),

where a > 0. Here is the plot of f in the case a = 1:

[Plot of the Cauchy density with a = 1.]

4. The exponential distribution:

f(x) = a e^{−ax} for x > 0, and f(x) = 0 for x ≤ 0,

where a > 0. The case a = 1 is plotted here:

[Plot of the exponential density with a = 1.]

5. The gamma distribution:

f(x) = c_{a,b} x^{a−1} exp(−x/b) for x > 0, and f(x) = 0 for x ≤ 0,

where a > 1, b > 0 and c_{a,b} is chosen from the normalization condition (3.22), which gives

c_{a,b} = 1 / (Γ(a) b^a).

For the case a = 2 and b = 1, we have c_{a,b} = 1 and f(x) = x e^{−x}, which is plotted below.

[Plot of the gamma density with a = 2, b = 1.]

Another class of distributions consists of discrete distributions (or atomic distributions). Choose a finite or countable sequence {x_k} ⊂ R of distinct reals and a sequence {p_k} of non-negative numbers such that

∑_k p_k = 1,  (3.24)

and define, for any set A ⊂ R (in particular, for any Borel set A),

μ(A) = ∑_{x_k ∈ A} p_k.

By Exercise 2, μ is a measure, and by (3.24) μ is a probability measure. Clearly, the measure μ is concentrated at the points x_k, which are called the atoms of this measure. If X is a random variable with distribution μ then X takes the value x_k with probability p_k. Here are two examples of such distributions, assuming x_k = k.

1. The binomial distribution B(n, p), where n ∈ N and 0 < p < 1:

p_k = C(n, k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n,

where C(n, k) denotes the binomial coefficient. The identity (3.24) holds by the binomial formula. In particular, if n = 1 then one gets Bernoulli's distribution

p_0 = 1 − p,  p_1 = p.

2. The Poisson distribution Po(λ):

p_k = (λ^k / k!) e^{−λ},  k = 0, 1, 2, ...,

where λ > 0. The identity (3.24) follows from the expansion of e^λ into a power series.
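The normalization (3.24) of the Poisson weights can be checked numerically by truncating the series; the truncated mass is within machine precision of 1. An illustrative stdlib-Python sketch (not part of the original notes):

```python
import math

def poisson_pmf(k, lam):
    # p_k = (lam^k / k!) e^{-lam}, the Po(lam) weights.
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0
# Truncating at k = 60 captures essentially all of the mass,
# mirroring the identity (3.24) via the power series of e^lam.
mass = sum(poisson_pmf(k, lam) for k in range(61))
```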

3.3 Functionals of random variables

Definition. If X is a random variable on a probability space (Ω, F, P) then its expectation is defined by

E(X) = ∫_Ω X(ω) dP(ω),

provided the integral on the right hand side exists (that is, either X is non-negative or X is integrable, that is, E(|X|) < ∞).

In other words, the notation E(X) is another (shorter) notation for the integral ∫_Ω X dP. By simple properties of the Lebesgue integral and of a probability measure, we have the following properties of E:

1. E(X + Y )=E (X)+E (Y ) provided all terms make sense;

2. E(αX) = α E(X) where α ∈ R;

3. E(1) = 1;

4. inf X ≤ E(X) ≤ sup X;

5. |E(X)| ≤ E(|X|).

Definition. If X is an integrable random variable then its variance is defined by

var X = E((X − c)²),  (3.25)

where c = E(X).

The variance measures the quadratic mean deviation of X from its mean value c = E(X). Another useful expression for the variance is the following:

var X = E(X²) − (E(X))².  (3.26)

Indeed, from (3.25), we obtain

var X = E(X² − 2cX + c²) = E(X²) − 2c² + c² = E(X²) − c².

Since by (3.25) var X ≥ 0, it follows from (3.26) that

(E(X))² ≤ E(X²).  (3.27)

Alternatively, (3.27) follows from the more general Cauchy-Schwarz inequality: for any two random variables X and Y,

(E(|XY|))² ≤ E(X²) E(Y²).  (3.28)

The definition of expectation is convenient for proving the properties of E but not good for computations. For the latter, there is the following theorem.

Theorem 3.3 Let X be a random variable and f be a Borel function on (−∞, +∞). Then f(X) is also a random variable and

E(f(X)) = ∫_R f(x) dP_X(x).  (3.29)

If the measure P_X has the density g(x) then

E(f(X)) = ∫_R f(x) g(x) dλ(x).  (3.30)

Proof. The function f(X) is measurable by Theorem 2.2. It suffices to prove (3.29) for non-negative f. Consider first an indicator function

f(x) = 1_A(x)

for some Borel set A ⊂ R. For this function, the integral in (3.29) is equal to

∫_A dP_X = P_X(A).

On the other hand, f(X) = 1_A(X) = 1_{X ∈ A}, and

E(f(X)) = ∫_Ω 1_{X ∈ A} dP = P(X ∈ A) = P_X(A).

Hence, (3.29) holds for indicator functions. In the same way one proves (3.30) for indicator functions. By the linearity of the integral, (3.29) (and (3.30)) extends to functions which are finite linear combinations of indicator functions. Then, by the monotone convergence theorem, the identity (3.29) (and (3.30)) extends to infinite linear combinations of indicator functions, that is, to simple functions, and then to arbitrary Borel functions f.

In particular, it follows from (3.29) and (3.30) that

E(X) = ∫_R x dP_X(x) = ∫_R x g(x) dλ(x).  (3.31)

Also, denoting c = E(X), we obtain

var X = E((X − c)²) = ∫_R (x − c)² dP_X = ∫_R (x − c)² g(x) dλ(x).

Example. If X ∼ N(a, b) then one finds

E(X) = ∫_{−∞}^{+∞} (x/√(2πb)) exp(−(x − a)²/(2b)) dx = ∫_{−∞}^{+∞} ((y + a)/√(2πb)) exp(−y²/(2b)) dy = a,

where we have used the fact that

∫_{−∞}^{+∞} (y/√(2πb)) exp(−y²/(2b)) dy = 0,

because the function under the integral is odd, and

∫_{−∞}^{+∞} (a/√(2πb)) exp(−y²/(2b)) dy = a,

which is true by (3.23). Similarly, we have

var X = ∫_{−∞}^{+∞} ((x − a)²/√(2πb)) exp(−(x − a)²/(2b)) dx = ∫_{−∞}^{+∞} (y²/√(2πb)) exp(−y²/(2b)) dy = b,  (3.32)

where the last identity holds by Exercise 50. Hence, for the normal distribution N(a, b), the expectation is a and the variance is b.

Example. If P_X is a discrete distribution with atoms {x_k} and values {p_k} then (3.31) becomes

E(X) = ∑_{k=0}^∞ x_k p_k.

For example, if X ∼ Po(λ) then one finds

E(X) = ∑_{k=1}^∞ k (λ^k/k!) e^{−λ} = λ ∑_{k=1}^∞ (λ^{k−1}/(k−1)!) e^{−λ} = λ.

Similarly, we have

E(X²) = ∑_{k=1}^∞ k² (λ^k/k!) e^{−λ} = ∑_{k=1}^∞ (k − 1 + 1) (λ^k/(k−1)!) e^{−λ}
  = λ² ∑_{k=2}^∞ (λ^{k−2}/(k−2)!) e^{−λ} + λ ∑_{k=1}^∞ (λ^{k−1}/(k−1)!) e^{−λ} = λ² + λ,

whence

var X = E(X²) − (E(X))² = λ.

Hence, for the Poisson distribution with parameter λ, both the expectation and the variance are equal to λ.
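The two Poisson identities E(X) = λ and var X = λ can be verified numerically directly from the weights p_k, without any series manipulation. An illustrative stdlib-Python sketch (not part of the original notes):

```python
import math

lam = 2.5
# Poisson weights p_k = (lam^k / k!) e^{-lam}; range(80) leaves a negligible tail.
pmf = [lam ** k * math.exp(-lam) / math.factorial(k) for k in range(80)]

mean = sum(k * p for k, p in enumerate(pmf))         # E(X) by (3.31)
second = sum(k * k * p for k, p in enumerate(pmf))   # E(X^2)
variance = second - mean ** 2                        # var X by (3.26)
```

Both `mean` and `variance` agree with λ = 2.5 up to the (tiny) truncation error.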

3.4 Random vectors and joint distributions

Definition. A mapping X : Ω → R^n is called a random vector (or a vector-valued random variable) if it is measurable with respect to the σ-algebra F.

Let X_1, ..., X_n be the components of X. Recall that the measurability of X means that the following set

{ω ∈ Ω : X_1(ω) ≤ x_1, ..., X_n(ω) ≤ x_n}  (3.33)

is measurable for all reals x_1, ..., x_n.

Lemma 3.4 A mapping X : Ω → R^n is a random vector if and only if all its components X_1, ..., X_n are random variables.

Proof. If all components are measurable then all sets {X_1 ≤ x_1}, ..., {X_n ≤ x_n} are measurable and, hence, the set (3.33) is also measurable as their intersection. Conversely, let the set (3.33) be measurable for all x_k. Since

{X_1 ≤ x} = ⋃_{m=1}^∞ {X_1 ≤ x, X_2 ≤ m, X_3 ≤ m, ..., X_n ≤ m},

we see that {X_1 ≤ x} is measurable and, hence, X_1 is a random variable. In the same way one handles X_k with k > 1.

Corollary. If X : Ω → R^n is a random vector and f : R^n → R^m is a Borel mapping then f ∘ X : Ω → R^m is a random vector.

Proof. Let f_1, ..., f_m be the components of f. By an argument similar to Lemma 3.4, each f_k is a Borel function on R^n. Since the components X_1, ..., X_n are measurable functions, by Theorem 2.2 the function f_k(X_1, ..., X_n) is measurable, that is, a random variable. Hence, again by Lemma 3.4, the mapping f(X_1, ..., X_n) is a random vector.

As follows from Lemma 2.1, for any random vector X : Ω → R^n and for any Borel set A ⊂ R^n, the set {X ∈ A} = X^{-1}(A) is measurable. Similarly to the one-dimensional case, introduce the distribution measure P_X on B(R^n) by

P_X(A) = P(X ∈ A).

Lemma 3.5 If X is a random vector then P_X is a probability measure on B(R^n). Conversely, if μ is any probability measure on B(R^n) then there exists a random vector X such that P_X = μ.

The proof is the same as in dimension 1 (see Lemma 3.2) and is omitted. If X_1, ..., X_n is a sequence of random variables then consider the random vector X = (X_1, ..., X_n) and its distribution P_X.

Definition. The measure P_X is referred to as the joint distribution of the random variables X_1, X_2, ..., X_n. It is also denoted by P_{X_1 X_2 ... X_n}.

As in the one-dimensional case, any non-negative measurable function g(x) on R^n such that

∫_{R^n} g(x) dx = 1

is associated with a probability measure μ on B(R^n), which is defined by

μ(A) = ∫_A g(x) dλ_n(x).

The function g is called the density of μ. If the distribution P_X of a random vector X has the density g then we also say that X has the density g (or the density function g).

Theorem 3.6 If X = (X_1, ..., X_n) is a random vector and f : R^n → R is a Borel function then

E f(X) = ∫_{R^n} f(x) dP_X(x).

If X has the density g(x) then

E f(X) = ∫_{R^n} f(x) g(x) dλ_n(x).  (3.34)

The proof is exactly as that of Theorem 3.3 and is omitted. One can also write (3.34) in a more explicit form:

E f(X_1, X_2, ..., X_n) = ∫_{R^n} f(x_1, ..., x_n) g(x_1, ..., x_n) dλ_n(x_1, ..., x_n).  (3.35)

As an example of an application of Theorem 3.6, let us show how to recover the density of a component X_k given the density of X.

Corollary. If a random vector X has the density function g then its component X1 has the density function

g_1(x) = ∫_{R^{n−1}} g(x, x_2, ..., x_n) dλ_{n−1}(x_2, ..., x_n).  (3.36)

Similar formulas hold for the other components.

Proof. Let us apply (3.35) with the function f(x) = 1_{x_1 ∈ A} where A is a Borel set in R; that is, f(x) = 1 if x_1 ∈ A and f(x) = 0 otherwise, so that f = 1_{A × R × ... × R}. Then

E f(X_1, ..., X_n) = P(X_1 ∈ A) = P_{X_1}(A),

and (3.35) together with Fubini's theorem yields

P_{X_1}(A) = ∫_{R^n} 1_{A × R × ... × R}(x_1, ..., x_n) g(x_1, ..., x_n) dλ_n(x_1, ..., x_n)
  = ∫_{A × R^{n−1}} g(x_1, ..., x_n) dλ_n(x_1, ..., x_n)

  = ∫_A ( ∫_{R^{n−1}} g(x_1, ..., x_n) dλ_{n−1}(x_2, ..., x_n) ) dλ_1(x_1).

This implies that the function in the brackets (which is a function of x_1) is the density of X_1, which proves (3.36).

Example. Let X, Y be two random variables with the joint density function

g(x, y) = (1/(2π)) exp(−(x² + y²)/2)  (3.37)

(the two-dimensional normal distribution). Then X has the density function

g_1(x) = ∫_{−∞}^{+∞} g(x, y) dy = (1/√(2π)) exp(−x²/2),

that is, X ∼ N(0, 1). In the same way Y ∼ N(0, 1).

Theorem 3.7 Let X : Ω → R^n be a random vector with the density function g. Let Φ : R^n → R^n be a diffeomorphism. Then the random vector Y = Φ(X) has the following density function:

h = (g ∘ Φ^{-1}) |det (Φ^{-1})′|.

Proof. We have, for any open set A ⊂ R^n,

P(Φ(X) ∈ A) = P(X ∈ Φ^{-1}(A)) = ∫_{Φ^{-1}(A)} g(x) dλ_n(x).

Substituting x = Φ^{-1}(y), we obtain by Theorem 3.1

∫_{Φ^{-1}(A)} g(x) dλ_n(x) = ∫_A g(Φ^{-1}(y)) |det (Φ^{-1})′(y)| dλ_n(y) = ∫_A h dλ_n,

so that

P(Y ∈ A) = ∫_A h dλ_n.  (3.38)

Since on both sides of this identity we have finite measures on B(R^n) that coincide on open sets, it follows from the uniqueness part of the Carathéodory extension theorem that the measures coincide on all Borel sets. Hence, (3.38) holds for all Borel sets A, which was to be proved.

Example. For any α ∈ R \ {0}, the density function of Y = αX is

h(x) = g(x/α) |α|^{−n}.

In particular, if X ∼ N(a, b) then αX ∼ N(αa, α²b).

As another example of an application of Theorem 3.7, let us prove the following statement.

Corollary. Assume that X, Y are two random variables with the joint density function g(x, y). Then the random variable U = X + Y has the density function

h_1(u) = (1/2) ∫_R g((u + v)/2, (u − v)/2) dλ(v) = ∫_R g(t, u − t) dt,  (3.39)

and V = X − Y has the density function

h_2(v) = (1/2) ∫_R g((u + v)/2, (u − v)/2) dλ(u) = ∫_R g(t, t − v) dt.  (3.40)

Proof. Consider the mapping Φ : R² → R² given by

Φ(x, y) = (x + y, x − y),

and the random vector

(U, V) = (X + Y, X − Y) = Φ(X, Y).

Clearly, Φ is a diffeomorphism, and

Φ^{-1}(u, v) = ((u + v)/2, (u − v)/2),

so that the matrix (Φ^{-1})′ has rows (1/2, 1/2) and (1/2, −1/2), and |det (Φ^{-1})′| = 1/2.

Therefore, the density function h of (U, V) is given by

h(u, v) = (1/2) g((u + v)/2, (u − v)/2).

By the Corollary to Theorem 3.6, the density function h_1 of U is given by

h_1(u) = ∫_R h(u, v) dλ(v) = (1/2) ∫_R g((u + v)/2, (u − v)/2) dλ(v).

The second identity in (3.39) is obtained using the substitution t = (u + v)/2. In the same way one proves (3.40).

Example. Let X, Y again be random variables with the joint density (3.37). Then by (3.39) we obtain the density of X + Y:

h_1(u) = (1/2) ∫_{−∞}^{∞} (1/(2π)) exp(−((u + v)² + (u − v)²)/8) dv
  = (1/(4π)) ∫_{−∞}^{∞} exp(−u²/4 − v²/4) dv
  = (1/√(4π)) e^{−u²/4}.

Hence, X + Y ∼ N(0, 2) and in the same way X − Y ∼ N(0, 2).
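The convolution formula (3.39) for this example can also be checked numerically: integrating g(t, u − t) over t reproduces the N(0, 2) density at each point u. An illustrative stdlib-Python sketch (not part of the original notes; names are mine):

```python
import math

def g(x, y):
    # Joint density (3.37) of the two-dimensional standard normal.
    return math.exp(-(x * x + y * y) / 2.0) / (2.0 * math.pi)

def density_of_sum(u, n=20_000, tlim=12.0):
    # h1(u) = integral of g(t, u - t) dt, as in (3.39), by the midpoint rule.
    h = 2.0 * tlim / n
    total = 0.0
    for k in range(n):
        t = -tlim + (k + 0.5) * h
        total += g(t, u - t)
    return h * total

def n02(u):
    # Density of N(0, 2), the claimed distribution of X + Y.
    return math.exp(-u * u / 4.0) / math.sqrt(4.0 * math.pi)

max_err = max(abs(density_of_sum(u) - n02(u)) for u in (0.0, 0.5, 1.0, 2.0))
```

The maximal discrepancy over the sampled points is at the level of quadrature error only.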

3.5 Independent random variables

Let (Ω, F, P) be a probability space as before.

Definition. Two random vectors X : Ω → R^n and Y : Ω → R^m are called independent if, for all Borel sets A ∈ B_n and B ∈ B_m, the events {X ∈ A} and {Y ∈ B} are independent, that is,

P(X ∈ A and Y ∈ B) = P(X ∈ A) P(Y ∈ B).

Similarly, a sequence {X_i} of random vectors X_i : Ω → R^{n_i} is called independent if, for any sequence {A_i} of Borel sets A_i ∈ B_{n_i}, the events {X_i ∈ A_i} are independent. Here the index i runs over any index set (which may be finite, countable, or even uncountable).

If X_1, ..., X_k is a finite sequence of random vectors such that X_i : Ω → R^{n_i}, then we can form a vector X = (X_1, ..., X_k) whose components are those of all the X_i; that is, X is an n-dimensional random vector where n = n_1 + ... + n_k. The distribution measure P_X of X is called the joint distribution of the sequence X_1, ..., X_k and is also denoted by P_{X_1 ... X_k}. A particular case of the notion of joint distribution, for the case when all X_i are random variables, was considered above.

Theorem 3.8 Let X_i be an n_i-dimensional random vector, i = 1, ..., k. The sequence X_1, X_2, ..., X_k is independent if and only if their joint distribution P_{X_1...X_k} coincides with the product measure of P_{X_1}, P_{X_2}, ..., P_{X_k}, that is,

P_{X_1...X_k} = P_{X_1} × ... × P_{X_k}.  (3.41)

If X_1, X_2, ..., X_k are independent and, in addition, each X_i has the density function f_i(x), then the sequence X_1, ..., X_k has the joint density function

f(x)=f1(x1)f2(x2)...fk(xk),

where x_i ∈ R^{n_i} and x = (x_1, ..., x_k) ∈ R^n.

Proof. If (3.41) holds then, for any sequence {A_i}_{i=1}^k of Borel sets A_i ⊂ R^{n_i}, consider their product

A = A_1 × A_2 × ... × A_k ⊂ R^n  (3.42)

and observe that

P(X_1 ∈ A_1, ..., X_k ∈ A_k) = P(X ∈ A) = P_X(A)
  = P_{X_1} × ... × P_{X_k}(A_1 × ... × A_k)
  = P_{X_1}(A_1) ... P_{X_k}(A_k)
  = P(X_1 ∈ A_1) ... P(X_k ∈ A_k).

Hence, X1,...,Xk are independent. Conversely, if X1, ..., Xk are independent then, for any set A of the product form (3.42), we obtain

P_X(A) = P(X ∈ A)
  = P(X_1 ∈ A_1, X_2 ∈ A_2, ..., X_k ∈ A_k)
  = P(X_1 ∈ A_1) P(X_2 ∈ A_2) ... P(X_k ∈ A_k)
  = P_{X_1} × ... × P_{X_k}(A).

Hence, the measure P_X and the product measure P_{X_1} × ... × P_{X_k} coincide on the sets of the form (3.42). Since B_n is the minimal σ-algebra containing the sets (3.42), the uniqueness part of the Carathéodory extension theorem implies that these two measures coincide on B_n, which was to be proved.

To prove the second claim, observe that, for any set A of the form (3.42), we have by Fubini's theorem

P_X(A) = P_{X_1}(A_1) ... P_{X_k}(A_k)
  = ( ∫_{A_1} f_1(x_1) dλ_{n_1}(x_1) ) ... ( ∫_{A_k} f_k(x_k) dλ_{n_k}(x_k) )
  = ∫_{A_1 × ... × A_k} f_1(x_1) ... f_k(x_k) dλ_n(x)
  = ∫_A f(x) dλ_n(x),

where x = (x_1, ..., x_k) ∈ R^n. Hence, the two measures P_X and

μ(A) = ∫_A f(x) dλ_n(x)

coincide on the product sets, which implies by the uniqueness part of the Carathéodory extension theorem that they coincide on all Borel sets A ⊂ R^n.

Corollary. If X and Y are independent integrable random variables then

E(XY) = E(X) E(Y)  (3.43)

and

var(X + Y) = var X + var Y.  (3.44)

Proof. Using Theorems 3.6, 3.8, and Fubini’s theorem, we have

E(XY) = ∫_{R²} xy dP_{XY} = ∫_{R²} xy d(P_X × P_Y) = ( ∫_R x dP_X ) ( ∫_R y dP_Y ) = E(X) E(Y).

To prove the second claim, observe that var(X + c) = var X for any constant c. Hence, we can assume that E(X) = E(Y) = 0. In this case, we have var X = E(X²). Using (3.43) we obtain

var(X + Y) = E((X + Y)²) = E(X²) + 2E(XY) + E(Y²) = E(X²) + E(Y²) = var X + var Y.

The identities (3.43) and (3.44) extend to the case of an arbitrary finite sequence of independent integrable random variables X_1, ..., X_n as follows:

E(X_1 ... X_n) = E(X_1) ... E(X_n),

var(X_1 + ... + X_n) = var X_1 + ... + var X_n,

and the proofs are the same.

Example. Let X, Y be independent random variables and

X ∼ N(a, b),  Y ∼ N(a′, b′).

Let us show that

X + Y ∼ N(a + a′, b + b′).

Let us first simply evaluate the expectation and the variance of X + Y:

E(X + Y) = E(X) + E(Y) = a + a′,

and, using the independence of X, Y and (3.44),

var(X + Y) = var(X) + var(Y) = b + b′.

However, this does not yet give the distribution of X + Y. To simplify the further argument, rename X − a to X and Y − a′ to Y, so that we can assume in the sequel that a = a′ = 0. Also, for simplicity of notation, set c = b′. By Theorem 3.8, the joint density function of X, Y is as follows:

g(x, y) = (1/√(2πb)) exp(−x²/(2b)) · (1/√(2πc)) exp(−y²/(2c)) = (1/(2π√(bc))) exp(−x²/(2b) − y²/(2c)).

By the Corollary to Theorem 3.7, we obtain that the random variable U = X + Y has the density function

h(u) = ∫_R g(t, u − t) dλ(t)
  = (1/(2π√(bc))) ∫_{−∞}^{∞} exp(−t²/(2b) − (u − t)²/(2c)) dt
  = (1/(2π√(bc))) ∫_{−∞}^{∞} exp(−(1/b + 1/c) t²/2 + ut/c − u²/(2c)) dt.

Next compute

∫_{−∞}^{+∞} exp(−αt² + 2βt) dt,

where α > 0 and β ∈ R. We have

−αt² + 2βt = −α(t² − (2β/α)t + β²/α² − β²/α²) = −α(t − β/α)² + β²/α.

Hence, substituting √α (t − β/α) = s, we obtain

∫_{−∞}^{+∞} exp(−αt² + 2βt) dt = (e^{β²/α}/√α) ∫_{−∞}^{∞} exp(−s²) ds = √(π/α) e^{β²/α}.

Setting here α = (1/2)(1/b + 1/c) and β = u/(2c), we obtain

h(u) = (1/(2π√(bc))) e^{−u²/(2c)} √(2bcπ/(b + c)) e^{u²b/(2c(b+c))} = (1/√(2π(b + c))) exp(−u²/(2(b + c))),

that is, U ∼ N(0, b + c).

Example. Consider a sequence {X_i}_{i=1}^n of independent Bernoulli variables taking 1 with probability p and 0 with probability 1 − p, where 0 < p < 1. Let us show that the sum S = X_1 + ... + X_n satisfies

P(S = k) = C(n, k) p^k (1 − p)^{n−k}.  (3.45)

Indeed, the sum S is equal to k if and only if exactly k of the values X_1, ..., X_n are equal to 1 and n − k are equal to 0. The probability that given k variables from {X_i}, say X_{i_1}, ..., X_{i_k}, are equal to 1 and the rest are equal to 0 is equal to p^k (1 − p)^{n−k}. Since the sequence (i_1, ..., i_k) can be chosen in C(n, k) ways, we obtain (3.45). Hence, S has the binomial distribution B(n, p). This can be used to obtain in an easy way the expectation and the variance of the binomial distribution — see Exercise 67d.

In the next statement, we collect some more useful properties of independent random variables.
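Returning to the binomial example: formula (3.45) can be checked empirically by simulating many runs of n Bernoulli trials and comparing the observed frequencies of S with the binomial weights. An illustrative stdlib-Python sketch (not part of the original notes; the parameter choices are mine):

```python
import math
import random

random.seed(42)
n, p, trials = 10, 0.3, 100_000

def binom_pmf(k):
    # Formula (3.45): P(S = k) = C(n, k) p^k (1 - p)^(n - k).
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

counts = [0] * (n + 1)
for _ in range(trials):
    s = sum(1 if random.random() < p else 0 for _ in range(n))  # S = X_1 + ... + X_n
    counts[s] += 1

# Largest deviation between empirical frequencies and (3.45).
max_gap = max(abs(counts[k] / trials - binom_pmf(k)) for k in range(n + 1))
```

The deviations are of the Monte Carlo order 1/√trials, as expected.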

Theorem 3.9 (a) If X_i : Ω → R^{n_i} is a sequence of independent random vectors and f_i : R^{n_i} → R^{m_i} is a sequence of Borel functions then the random vectors {f_i(X_i)} are independent.

(b) If {X_1, X_2, ..., X_n, Y_1, Y_2, ..., Y_m} is a sequence of independent random variables then the random vectors

X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_m)

are independent.

(c) Under the conditions of (b), for all Borel functions f : R^n → R and g : R^m → R, the random variables f(X_1, ..., X_n) and g(Y_1, ..., Y_m) are independent.

Proof. (a) Let {A_i} be a sequence of Borel sets, A_i ∈ B_{m_i}. We need to show that the events {f_i(X_i) ∈ A_i} are independent. Since

{f_i(X_i) ∈ A_i} = {X_i ∈ f_i^{-1}(A_i)} = {X_i ∈ B_i},

where B_i = f_i^{-1}(A_i) ∈ B_{n_i}, these sets are independent by the definition of the independence of {X_i}.

(b) We have by Theorem 3.8

P_{XY} = P_{X_1} × ... × P_{X_n} × P_{Y_1} × ... × P_{Y_m} = P_X × P_Y.

Hence, X and Y are independent.

(c) The claim is an obvious combination of (a) and (b).
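The product and additivity rules (3.43) and (3.44) for independent random variables can be observed empirically: for independently generated samples, the empirical versions of both identities hold up to Monte Carlo error. An illustrative stdlib-Python sketch (not part of the original notes; the particular distributions chosen are mine):

```python
import random

random.seed(0)
N = 200_000
xs = [random.gauss(1.0, 2.0) for _ in range(N)]   # X with mean 1 and variance 4
ys = [random.expovariate(1.0) for _ in range(N)]  # Y exponential, drawn independently

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((t - m) ** 2 for t in v) / len(v)

# Empirical versions of (3.43) and (3.44); agreement is up to sampling error.
prod_gap = abs(mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys))
var_gap = abs(var([x + y for x, y in zip(xs, ys)]) - (var(xs) + var(ys)))
```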

3.6 Sequences of random variables

Let {X_i}_{i=1}^∞ be a sequence of independent random variables and consider their partial sums

S_n = X_1 + X_2 + ... + X_n.

We will be interested in the behavior of S_n as n → ∞. The necessity of such considerations arises in many mathematical models of random phenomena.

Example. Consider a sequence of n independent trials of coin tossing, and set X_i = 1 if at the i-th tossing the coin lands showing heads, and X_i = 0 if the coin lands showing tails. Assuming that heads appear with probability p, we see that each X_i has the same distribution: P(X_i = 1) = p and P(X_i = 0) = 1 − p. Based on experience, we can assume that the sequence {X_i} is independent. Then S_n is exactly the number of heads in a series of n trials. If n is big enough then one can expect that S_n ≈ np. The claims that justify this conjecture are called laws of large numbers. Two statements of this kind will be presented below.

As we have seen in Section 1.10, for any positive integer n, there exist n independent events with prescribed probabilities. Taking their indicators, we obtain n independent Bernoulli random variables. However, in order to make the above consideration of infinite sequences of independent random variables meaningful, one has to make sure that such sequences do exist. The next theorem ensures that, providing enough supply of sequences of independent random variables.

Theorem 3.10 Let {μ_i}_{i∈I} be a family of probability measures on B(R) where I is an arbitrary index set. Then there exists a probability space (Ω, F, P) and a family {X_i}_{i∈I} of independent random variables on Ω such that P_{X_i} = μ_i.

The space Ω is constructed as the (infinite) product R^I and the probability measure P on Ω is constructed as the product ⊗_{i∈I} μ_i of the measures μ_i. The argument is similar to Theorem 2.22 but technically more involved. The proof is omitted. Theorem 3.10 is a particular case of a more general Kolmogorov extension theorem that allows constructing families of random variables with prescribed joint densities.

3.7 The weak

Definition. We say that a sequence {Y_n} of random variables converges in probability to a random variable Y, and write Y_n →^P Y, if, for any ε > 0,

lim_{n→∞} P(|Y_n − Y| > ε) = 0.

This is a particular case of the convergence in measure — see Exercise 41.

Theorem 3.11 (The weak law of large numbers) Let {X_i}_{i=1}^∞ be an independent sequence of random variables having a common finite expectation E(X_i) = a and a common finite variance var X_i = b. Then

S_n/n →^P a.  (3.46)

Hence, for any ε > 0,

P(|S_n/n − a| ≤ ε) → 1 as n → ∞,

which means that, for large enough n, the average S_n/n concentrates around a with probability close to 1.
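The conclusion (3.46) is easy to watch in a simulation: for Bernoulli trials, the fraction of runs in which S_n/n deviates from p by more than ε is tiny, and in particular stays below the variance bound b/(ε²n) derived in the proof below (cf. (3.48)). An illustrative stdlib-Python sketch (not part of the original notes; parameter choices are mine):

```python
import random

random.seed(7)
p, eps, n, runs = 0.5, 0.05, 2_000, 2_000
b = p * (1 - p)  # variance of a single Bernoulli trial

bad = 0
for _ in range(runs):
    s = sum(1 if random.random() < p else 0 for _ in range(n))
    if abs(s / n - p) >= eps:
        bad += 1

observed = bad / runs            # empirical P(|S_n/n - p| >= eps)
chebyshev = b / (eps ** 2 * n)   # the bound b / (eps^2 n) of (3.48)
```

With these parameters the empirical deviation probability is essentially zero, well below the (rather crude) Chebyshev bound 0.05.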

Example. In the case of coin tossing, a = E(X_i) = p and var X_i is finite, so that S_n/n →^P p. One sees that in some sense S_n/n ≈ p, and the precise meaning of that is the following:

P(|S_n/n − p| ≤ ε) → 1 as n → ∞.

Example. Assume that all X_i ∼ N(0, 1). Using the properties of sums of independent random variables with the normal distribution, we obtain

S_n ∼ N(0, n)

and

S_n/n ∼ N(0, 1/n).

Hence, for large n, the variance of S_n/n becomes very small and S_n/n becomes more concentrated around its mean 0. On the plot below one can see the density functions of S_n/n, that is,

g_n(x) = √(n/(2π)) exp(−n x²/2),

which become steeper for larger n.

[Plot of the densities g_n for several values of n.]

One can conclude (without even using Theorem 3.11) that

P(|S_n/n| ≤ ε) → 1 as n → ∞,

that is, S_n/n →^P 0.

Proof of Theorem 3.11. We use the following simple inequality, called Chebyshev's inequality: if Y ≥ 0 is a random variable then, for all t > 0,

P(Y ≥ t) ≤ (1/t²) E(Y²).  (3.47)

Indeed,

E(Y²) = ∫_Ω Y² dP ≥ ∫_{Y ≥ t} Y² dP ≥ ∫_{Y ≥ t} t² dP = t² P(Y ≥ t),

whence (3.47) follows. Let us evaluate the expectation and the variance of S_n:

E(S_n) = ∑_{i=1}^n E(X_i) = an,

var S_n = ∑_{i=1}^n var X_i = bn,

where in the second line we have used the independence of {X_i} and the Corollary to Theorem 3.8. Hence, applying (3.47) with Y = |S_n − an|, we obtain

P(|S_n − an| ≥ εn) ≤ (1/(εn)²) E((S_n − an)²) = var S_n/(εn)² = bn/(εn)² = b/(ε²n).

Therefore,

P(|S_n/n − a| ≥ ε) ≤ b/(ε²n),  (3.48)

whence (3.46) follows.

3.8 The strong law of large numbers

In this section we use the convergence of random variables almost surely (a.s.). Namely, if

{X_n}_{n=1}^∞ is a sequence of random variables then we say that X_n converges to X a.s., and write X_n →^{a.s.} X, if X_n → X almost everywhere, that is,

P(X_n → X) = 1.

In the expanded form, this means that

P({ω : lim_{n→∞} X_n(ω) = X(ω)}) = 1.

To understand the difference between convergence a.s. and convergence in probability, consider an example.

Example. Let us show an example where X_n →^P X but X_n does not converge to X a.s. Let Ω = [0, 1], F = B([0, 1]), and P = λ_1. Let A_n be an interval in [0, 1] and set X_n = 1_{A_n}. Observe that if ℓ(A_n) → 0 then X_n →^P 0 because, for all ε ∈ (0, 1),

P(X_n > ε) = P(X_n = 1) = ℓ(A_n).

However, for certain choices of A_n, we do not have X_n →^{a.s.} 0. Indeed, one can construct {A_n} so that ℓ(A_n) → 0 but nevertheless any point ω ∈ [0, 1] is covered by these intervals infinitely many times. For example, one can take as {A_n} the following sequence:

[0, 1],
[0, 1/2], [1/2, 1],
[0, 1/3], [1/3, 2/3], [2/3, 1],
[0, 1/4], [1/4, 2/4], [2/4, 3/4], [3/4, 1],
...
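This interval sequence can be generated explicitly, and a short stdlib-Python check (illustrative; names are mine) confirms the two features it is used for: the interval lengths tend to 0, while any fixed point keeps being covered at every level.

```python
from fractions import Fraction

def intervals(levels):
    # The sequence [0,1], [0,1/2], [1/2,1], [0,1/3], [1/3,2/3], [2/3,1], ...
    out = []
    for m in range(1, levels + 1):
        for j in range(m):
            out.append((Fraction(j, m), Fraction(j + 1, m)))
    return out

seq = intervals(50)
last_length = float(seq[-1][1] - seq[-1][0])  # lengths shrink: P(X_n = 1) -> 0
omega = Fraction(1, 3)
hits = sum(1 for a, b in seq if a <= omega <= b)  # how often X_n(omega) = 1
```

At level m the intervals have length 1/m, yet ω lies in at least one interval of every level, so X_n(ω) = 1 infinitely often.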

For any ω ∈ [0, 1], the sequence {X_n(ω)} contains the value 1 infinitely many times, which means that this sequence does not converge to 0. It follows that X_n does not converge to 0 a.s.; moreover, we have P(X_n → 0) = 0.

The following theorem states the relations between the two types of convergence.

Theorem 3.12 Let {X_n} be a sequence of random variables.

(a) If X_n →^{a.s.} X then X_n →^P X (hence, convergence a.s. is stronger than convergence in probability).

(b) If, for any ε > 0,

∑_{n=1}^∞ P(|X_n − X| > ε) < ∞,  (3.49)

then X_n →^{a.s.} X.

(c) If X_n →^P X then there exists a subsequence X_{n_k} →^{a.s.} X.

The proof of Theorem 3.12 is contained in Exercise 42. In fact, we need only part (b) of this theorem. The condition (3.49) is called the Borel-Cantelli condition. It is obviously stronger than X_n →^P X because the convergence of a series implies that the terms of the series go to 0. Part (b) says that the Borel-Cantelli condition is also stronger than X_n →^{a.s.} X, which is not that obvious.

Let {X_i} be a sequence of random variables. We say that this sequence is identically distributed if all X_i have the same distribution measure. If the X_i are identically distributed then their expectations are the same and their variances are the same. As before, set S_n = X_1 + ... + X_n.

Theorem 3.13 (The strong law of large numbers) Let {X_i}_{i=1}^∞ be a sequence of independent identically distributed random variables with a finite expectation E(X_i) = a and a finite variance var X_i = b. Then

S_n/n →^{a.s.} a.  (3.50)

The word "strong" in the title of the theorem refers to the convergence a.s. in (3.50), as opposed to the weaker convergence in probability in Theorem 3.11 (cf. (3.46)).

The statement of Theorem 3.13 remains true if one drops the assumption of the finiteness of var X_n. Moreover, the finiteness of the mean E(X_n) is not only a sufficient but also a necessary condition for the existence of the limit lim S_n/n a.s. (Kolmogorov's theorem). Another possibility to relax the hypotheses is to drop the assumption that the X_n are identically distributed but still require that the X_n have a common finite mean and a common finite variance. The proofs of these stronger results are much longer and will not be presented here.

Proof of Theorem 3.13. By Theorem 3.12(b), it suffices to verify the Borel-Cantelli condition, that is, for any ε > 0,

$$\sum_{n=1}^{\infty} P\left(\left|\frac{S_n}{n} - a\right| > \varepsilon\right) < \infty. \qquad (3.51)$$

In the proof of Theorem 3.11, we have obtained the estimate

$$P\left(\left|\frac{S_n}{n} - a\right| \geq \varepsilon\right) \leq \frac{b}{\varepsilon^2 n}, \qquad (3.52)$$

which, however, is not enough to prove (3.51), because
$$\sum_{n=1}^{\infty} \frac{1}{n} = \infty.$$
Nevertheless, taking in (3.52) $n$ to be a perfect square $k^2$, where $k \in \mathbb{N}$, we obtain

$$\sum_{k=1}^{\infty} P\left(\left|\frac{S_{k^2}}{k^2} - a\right| > \varepsilon\right) \leq \sum_{k=1}^{\infty} \frac{b}{\varepsilon^2 k^2} < \infty,$$
whence by Theorem 3.12(b)
$$\frac{S_{k^2}}{k^2} \xrightarrow{a.s.} a. \qquad (3.53)$$

Now we will extend this convergence to the whole sequence $S_n$, that is, fill the gaps between the perfect squares. This will be done under the assumption that all $X_i$ are non-negative (the general case of signed $X_i$ will be treated afterwards). In this case the sequence $S_n$ is increasing. For any positive integer $n$, find $k$ so that

$$k^2 \leq n < (k+1)^2. \qquad (3.54)$$

Using the monotonicity of $S_n$ and (3.54), we obtain

$$\frac{S_{k^2}}{(k+1)^2} \leq \frac{S_n}{n} \leq \frac{S_{(k+1)^2}}{k^2}.$$
Since $k^2 \sim (k+1)^2$ as $k \to \infty$, it follows from (3.53) that
$$\frac{S_{k^2}}{(k+1)^2} \xrightarrow{a.s.} a \quad \text{and} \quad \frac{S_{(k+1)^2}}{k^2} \xrightarrow{a.s.} a,$$
whence we conclude that
$$\frac{S_n}{n} \xrightarrow{a.s.} a.$$

Note that in this argument we have not used the hypothesis that the $X_i$ are identically distributed. As in the proof of Theorem 3.11, we have only used that the $X_i$ are independent random variables and that they have a finite common expectation $a$ and a finite common variance.

Now we get rid of the restriction $X_i \geq 0$. For a signed $X_i$, consider the positive part $X_i^+$ and the negative part $X_i^-$. Then the random variables $X_i^+$ are independent, because we can write $X_i^+ = f(X_i)$ where $f(x) = x^+$, and apply Theorem 3.9. Next, $X_i^+$ has finite expectation and variance, by $X^+ \leq |X|$. Furthermore, by Theorem 3.3 we have
$$E\left(X_i^+\right) = E\left(f(X_i)\right) = \int_{\mathbb{R}} f(x)\, dP_{X_i}.$$
Since all the measures $P_{X_i}$ are the same, it follows that all the expectations $E(X_i^+)$ are the same; set
$$a^+ := E\left(X_i^+\right).$$
In the same way, the variances $\mathrm{var}\, X_i^+$ are the same. Setting
$$S_n^+ = X_1^+ + X_2^+ + \ldots + X_n^+,$$
we obtain by the first part of the proof that
$$\frac{S_n^+}{n} \xrightarrow{a.s.} a^+ \quad \text{as } n \to \infty.$$
In the same way, considering $X_i^-$ and setting $a^- = E(X_i^-)$ and
$$S_n^- = X_1^- + \ldots + X_n^-,$$
we obtain
$$\frac{S_n^-}{n} \xrightarrow{a.s.} a^- \quad \text{as } n \to \infty.$$
Subtracting the two relations and using that

$$S_n = S_n^+ - S_n^-$$
and
$$a^+ - a^- = E\left(X_i^+\right) - E\left(X_i^-\right) = E(X_i) = a,$$
we obtain
$$\frac{S_n}{n} \xrightarrow{a.s.} a \quad \text{as } n \to \infty,$$
which finishes the proof.
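Theorem 3.13 can be illustrated numerically. The following sketch uses an assumed setup (i.i.d. Bernoulli($p$) variables with a fixed seed; not part of the notes): along a single sample path, the running average $S_n/n$ settles near $a = E(X_i) = p$.

```python
import random

# Numerical illustration of the strong law of large numbers (assumed
# setup): X_i are i.i.d. Bernoulli(p), so a = E(X_i) = p and
# var X_i = p(1-p). Along one sample path, S_n/n approaches a.
random.seed(0)
p = 0.3
checkpoints = {100, 1_000, 10_000, 100_000}
s = 0
for n in range(1, 100_001):
    s += 1 if random.random() < p else 0
    if n in checkpoints:
        print(f"n = {n:6d}:  S_n/n = {s / n:.4f}")
final_avg = s / 100_000  # should be close to a = 0.3
```

Of course, a finite simulation cannot distinguish convergence a.s. from convergence in probability; it only shows the running average stabilizing near $a$, which is what both laws of large numbers predict.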

3.9 Extra material: the proof of the Weierstrass theorem using the weak law of large numbers

Here we show an example of how the weak law of large numbers allows one to prove the following purely analytic theorem.

Theorem 3.14 (The Weierstrass theorem) Let $f(x)$ be a continuous function on a bounded interval $[a,b]$. Then, for any $\varepsilon > 0$, there exists a polynomial $P(x)$ such that
$$\sup_{x \in [a,b]} |f(x) - P(x)| < \varepsilon.$$

Proof. It suffices to consider the case of the interval $[0,1]$. Consider a sequence $\{X_i\}_{i=1}^{\infty}$ of independent Bernoulli variables taking value $1$ with probability $p$ and $0$ with probability $1-p$, and set as before $S_n = X_1 + \ldots + X_n$. Then, for $k = 0, 1, \ldots, n$,

$$P(S_n = k) = \binom{n}{k} p^k (1-p)^{n-k},$$
which implies
$$E f\left(\frac{S_n}{n}\right) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) P(S_n = k) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) \binom{n}{k} p^k (1-p)^{n-k}.$$
The right hand side here can be considered as a polynomial in $p$. Denote
$$B_n(p) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) \binom{n}{k} p^k (1-p)^{n-k}. \qquad (3.55)$$
The polynomial $B_n(p)$ is called the Bernstein polynomial of $f$. It turns out to be a good approximation for $f$. The idea is that $S_n/n$ converges in some sense to $p$. Therefore, we may expect that $E f\left(\frac{S_n}{n}\right)$ converges to $f(p)$. In fact, we will show that

$$\lim_{n \to \infty} \sup_{p \in [0,1]} |f(p) - B_n(p)| = 0, \qquad (3.56)$$
which will prove the claim of the theorem.

First observe that $f$ is uniformly continuous, so that for any $\delta > 0$ there exists $\varepsilon = \varepsilon(\delta) > 0$ such that if $|x - y| \leq \varepsilon$ then $|f(x) - f(y)| \leq \delta$. Using the binomial theorem, we obtain
$$1 = (p + (1-p))^n = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k},$$
whence
$$f(p) = \sum_{k=0}^{n} f(p) \binom{n}{k} p^k (1-p)^{n-k}.$$
Comparing with (3.55), we obtain
$$|f(p) - B_n(p)| \leq \sum_{k=0}^{n} \left|f(p) - f\left(\frac{k}{n}\right)\right| \binom{n}{k} p^k (1-p)^{n-k}$$
$$= \left(\sum_{\left|\frac{k}{n} - p\right| \leq \varepsilon} + \sum_{\left|\frac{k}{n} - p\right| > \varepsilon}\right) \left|f(p) - f\left(\frac{k}{n}\right)\right| \binom{n}{k} p^k (1-p)^{n-k}.$$
By the choice of $\varepsilon$, in the first sum we have
$$\left|f(p) - f\left(\frac{k}{n}\right)\right| \leq \delta,$$
so that the first sum is bounded by $\delta$.

In the second sum, we use the fact that $f$ is bounded by some constant $C$, so that
$$\left|f(p) - f\left(\frac{k}{n}\right)\right| \leq 2C,$$
and the second sum is bounded by
$$2C \sum_{\left|\frac{k}{n} - p\right| > \varepsilon} \binom{n}{k} p^k (1-p)^{n-k} = 2C \sum_{\left|\frac{k}{n} - p\right| > \varepsilon} P(S_n = k) = 2C\, P\left(\left|\frac{S_n}{n} - p\right| > \varepsilon\right).$$
Using the estimate (3.48) and $E(X_i) = p$, we obtain

$$P\left(\left|\frac{S_n}{n} - p\right| > \varepsilon\right) \leq \frac{b}{\varepsilon^2 n},$$
where $b = \mathrm{var}\, X_i = p(1-p) < 1$.

Hence, for all $p \in [0,1]$,
$$|f(p) - B_n(p)| \leq \delta + \frac{2C}{\varepsilon^2 n},$$
whence the claim follows.

To illustrate this theorem, the next plot contains a sequence of Bernstein approximations with $n = 3, 4, 10, 30$ to the function $f(x) = |x - 1/2|$.

[Plot: the Bernstein polynomials $B_n$, $n = 3, 4, 10, 30$, approaching $f(x) = |x - 1/2|$ on the interval $[0,1]$.]

Remark. The upper bound for the sum

$$\sum_{\left|\frac{k}{n} - p\right| > \varepsilon} \binom{n}{k} p^k (1-p)^{n-k}$$
can also be proved analytically, which gives another proof of the weak law of large numbers in the case when all $X_i$ are Bernoulli variables. Such a proof was found by Jacob Bernoulli in 1713; it was historically the first proof of the law of large numbers.
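The Bernstein polynomial (3.55) is easy to evaluate directly. The following sketch (the function name `bernstein` and the evaluation grid are my own, hypothetical choices) computes $B_n(p)$ for $f(x) = |x - 1/2|$ and estimates the uniform error $\sup_p |f(p) - B_n(p)|$ on a grid, which decreases as $n$ grows, in line with (3.56):

```python
from math import comb

def bernstein(f, n, p):
    # B_n(p) from (3.55): sum over k of f(k/n) * C(n,k) * p^k * (1-p)^(n-k)
    return sum(f(k / n) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)
grid = [i / 200 for i in range(201)]  # evaluation grid on [0,1]

# uniform error on the grid, for increasing degree n
for n in (3, 10, 30, 100):
    err = max(abs(f(p) - bernstein(f, n, p)) for p in grid)
    print(f"n = {n:3d}:  sup error on grid = {err:.4f}")
```

Note that $B_n$ reproduces $f$ exactly at the endpoints, since at $p = 0$ only the $k = 0$ term survives and at $p = 1$ only $k = n$; the worst error sits near the kink at $x = 1/2$, where it decays only like $n^{-1/2}$.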
