
Some necessary math (part I)

Note: This is a partial collection of several formulae that we will need throughout the first part of the course. For convenience, we will review them quickly now. The ones for which I do not give proofs here are either very well known, or will be part of your first assignment; if needed, you can find the missing proofs in the solutions of the first assignment.

1 Euler Γ and B functions

(a) The Γ function is defined for any a > 0 as:

\Gamma(a) = \int_0^{\infty} dx\, x^{a-1} e^{-x} \qquad (1)

Properties:
1. Γ(1) = 1; Γ(1/2) = √π.
2. Γ(a + 1) = a Γ(a).
3. Γ(n + 1) = n! for any integer n ≥ 1. We will use this a lot!

(b) We will use this second function, the B function, just once, to compute the volume of an N-dimensional sphere, a bit later on. This type of integral appears quite often all over physics, so it's good to know it in general. For any a > 0, b > 0, we define:

B(a, b) = \int_0^{1} dx\, x^{a-1} (1-x)^{b-1} \qquad (2)

Then,

B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}
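If you want to check this identity numerically, a few lines of Python do it (the notes use Maple later on, but any tool works; scipy.special.gamma and scipy.integrate.quad are standard library routines, and the values a = 2.5, b = 3.7 below are arbitrary test values):

    from scipy.integrate import quad
    from scipy.special import gamma

    def beta_from_integral(a, b):
        # B(a, b) computed directly from its definition, Eq. (2)
        value, _ = quad(lambda x: x**(a - 1) * (1 - x)**(b - 1), 0.0, 1.0)
        return value

    a, b = 2.5, 3.7  # arbitrary positive test values
    print(beta_from_integral(a, b))               # integral definition
    print(gamma(a) * gamma(b) / gamma(a + b))     # Gamma-function identity
    # the two printed numbers agree to quadrature accuracy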

2 Gaussian integrals

(a) one-dimensional: (1) For any a such that Re(a) > 0, we define:

I(a) = \int_{-\infty}^{\infty} dx\, e^{-a x^2} = \sqrt{\frac{\pi}{a}} \qquad (3)

This is an integral that we will use many, many times, so let us prove that it is true. The trick is to compute [I(a)]^2 using polar coordinates:

[I(a)]^2 = \int_{-\infty}^{\infty} dx\, e^{-a x^2} \int_{-\infty}^{\infty} dy\, e^{-a y^2}

Now, change to polar coordinates: x = r\cos\phi, y = r\sin\phi, so that dx\,dy = r\,dr\,d\phi and x^2 + y^2 = r^2; of course, the integral is over the whole 2D plane, \phi \in [0, 2\pi), r \in [0, \infty). Then:

[I(a)]^2 = \int_0^{\infty} dr\, r \int_0^{2\pi} d\phi\, e^{-a r^2} = 2\pi \int_0^{\infty} \frac{du}{2}\, e^{-a u} = \frac{\pi}{a}

where we changed the variable to u = r^2 (so du = 2r\,dr). The last step is a trivial integral.
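As a quick numerical sanity check of Eq. (3), in the same spirit (the value a = 0.3 is an arbitrary positive example):

    import numpy as np
    from scipy.integrate import quad

    a = 0.3  # any a with Re(a) > 0; here a real positive example
    integral, _ = quad(lambda x: np.exp(-a * x**2), -np.inf, np.inf)
    print(integral, np.sqrt(np.pi / a))  # the two numbers coincide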

(2) For any b and a with Re(a) > 0, we define:

J(a, b) = \int_{-\infty}^{\infty} dx\, e^{-a x^2 + b x} = \sqrt{\frac{\pi}{a}}\, e^{\frac{b^2}{4a}} \qquad (4)

The proof of this is straightforward, by changing the variable of integration to x' = x - \frac{b}{2a} (i.e., by completing the square in the exponent).

(3) For any even integer n, we define:

I_n(a) = \int_{-\infty}^{\infty} dx\, x^n e^{-a x^2} = \frac{\Gamma\left(\frac{n+1}{2}\right)}{a^{\frac{n+1}{2}}} \qquad (5)

If n is odd, I_n(a) = 0 because the integrand is an odd function.

Proof: change variables a x^2 = y, so that x\,dx = dy/(2a) and x^n dx = dy\, y^{\frac{n-1}{2}}/(2 a^{\frac{n+1}{2}}). Then

I_n(a) = 2\int_0^{\infty} dx\, x^n e^{-a x^2} = \frac{1}{a^{\frac{n+1}{2}}} \int_0^{\infty} dy\, y^{\frac{n-1}{2}} e^{-y} = \frac{\Gamma\left(\frac{n+1}{2}\right)}{a^{\frac{n+1}{2}}}

by definition of the Γ function.

(b) m-dimensional Gaussian integral: let â be an m × m matrix which is symmetric (a_{ij} = a_{ji}), non-singular (det a ≠ 0) and positive definite (all its eigenvalues, let's call them λ_1, ..., λ_m, are larger than zero: λ_i > 0, i = 1, ..., m). Then:

I(\hat{a}) = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_m\, e^{-\sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} x_i x_j} = \sqrt{\frac{\pi^m}{\det a}} \qquad (6)

The demonstration is fairly straightforward: perform a change of variables to new variables for which the exponent is diagonalized, \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} x_i x_j \rightarrow \sum_{i=1}^{m} \lambda_i y_i^2. The integral now factorizes into m independent, simple Gaussian integrals. Since \det a = \lambda_1 \lambda_2 \cdots \lambda_m, the final result follows immediately. Finally, if \hat{b} = (b_1, ..., b_m) is a collection of m numbers, then:

J(\hat{a}, \hat{b}) = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \cdots \int_{-\infty}^{\infty} dx_m\, e^{-\sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} x_i x_j + \sum_{i=1}^{m} b_i x_i} = \sqrt{\frac{\pi^m}{\det a}}\, e^{\frac{1}{4}\sum_{i=1}^{m}\sum_{j=1}^{m} (a^{-1})_{ij} b_i b_j} \qquad (7)

where a^{-1} is the inverse of the matrix â. So these last two are just direct generalizations of the one-dimensional Gaussian integrals. I don't think we will need them; generally we will only encounter the one-dimensional Gaussian integrals. Those you absolutely have to be able to deal with: they will appear over and over.
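For completeness, Eq. (6) is easy to check numerically in a small case; the sketch below uses m = 2 with an arbitrarily chosen symmetric, positive definite matrix, and cuts the integration off at ±20, which is effectively infinity for this integrand:

    import numpy as np
    from scipy.integrate import dblquad

    # an arbitrary symmetric, positive definite 2x2 matrix (eigenvalues > 0)
    A = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

    def integrand(y, x):
        v = np.array([x, y])
        return np.exp(-v @ A @ v)   # exp(-sum_ij a_ij x_i x_j)

    # +/- 20 is effectively infinity here, since the integrand decays very fast
    value, _ = dblquad(integrand, -20, 20, lambda x: -20, lambda x: 20)
    print(value, np.sqrt(np.pi**2 / np.linalg.det(A)))  # the two should agree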

3 Saddle-point approximation and Stirling formula

The saddle-point (or Gaussian) approximation is a very convenient way to obtain approximations for integrals of the general form:

I = \int_0^{\infty} dx\, e^{F(x)} \qquad (8)

when F(x) is a function which has a single maximum at a value x_M \gg 1 (see sketch below). In this case, the integrand \exp[F(x)] is highly peaked at x_M, and quickly becomes very small for values of x far from x_M. The idea, then, is to approximate the function F(x) accurately in the vicinity of x_M, since that is where most of the contribution to the integral comes from.


We use a Taylor expansion, and stop at the first non-trivial term:

F(x) \approx F(x_M) + (x - x_M) F'(x_M) + \frac{(x - x_M)^2}{2} F''(x_M) + ...

But F'(x_M) = 0 and F''(x_M) < 0 (conditions for a maximum). Let a = -F''(x_M)/2 > 0. Then:

I \approx \int_0^{\infty} dx\, e^{F(x_M) - a(x - x_M)^2} = e^{F(x_M)} \int_{-x_M}^{\infty} dy\, e^{-a y^2}

where y = x - x_M. Now, if x_M \gg 1 then e^{-a x_M^2} \ll 1, and we may as well extend the lower limit all the way to -\infty, since we are only adding an extremely small contribution. In this case, the integral becomes a simple Gaussian (hence the name Gaussian approximation) and we find:

I = \int_0^{\infty} dx\, e^{F(x)} \approx \sqrt{\frac{2\pi}{|F''(x_M)|}}\, e^{F(x_M)}

[Fig. 1 sketch. Upper plot: some function F(x) with a maximum at x_M; the dashed line shows the Taylor expansion to second order (a parabola), and the long vertical lines show where this Taylor expansion is a good approximation. Lower plot: the exponential of F(x) (i.e., the integrand of interest to us) is large only in the region where the Taylor expansion is valid.]

The most famous example of such an approximation is the Stirling formula, which we will use very many times. This is an approximation for the Gamma function, for large values of a \gg 1:

\Gamma(a + 1) = \int_0^{\infty} dx\, x^a e^{-x} = \int_0^{\infty} dx\, e^{a \ln x - x}

Define F(x) = a \ln x - x. It has an extremum where F'(x) = \frac{a}{x} - 1 = 0 \rightarrow x_M = a \gg 1, by assumption. This is a maximum, because F''(x) = -\frac{a}{x^2} < 0 \rightarrow |F''(x_M)| = a^{-1}, and F(x_M) = a \ln a - a. Then, applying the Gaussian approximation, we find that:

\Gamma(a + 1) \approx \sqrt{\frac{2\pi}{|F''(x_M)|}}\, e^{F(x_M)} = \sqrt{2\pi a}\, e^{a \ln a - a} = \sqrt{2\pi a} \left(\frac{a}{e}\right)^a \qquad (9)

We will use this most often for integers a = n \gg 1, in which case we have:

\Gamma(n + 1) = n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \qquad (10)

From this, it follows that:

\ln(n!) \approx n \ln n - n + \frac{1}{2}\ln(2\pi n) + ... \qquad (11)

Remember that n is a very large number; as a result, the first term here is the largest one, the second one is smaller, the third is even smaller, and the neglected corrections are smaller still. Depending on how large n is, we can even stop at the first term. Stirling's formula keeps just the first two terms:

\ln(n!) \approx n \ln\left(\frac{n}{e}\right) \qquad (12)
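To get a feeling for how good the Gaussian approximation is, you can compare it with the exact integral for a moderately large a; the sketch below uses F(x) = a ln x − x with a = 20 (an arbitrary choice) and an upper cutoff of 200, which is effectively infinity here:

    import numpy as np
    from scipy.integrate import quad

    a = 20.0  # an arbitrary value, so that x_M = a >> 1

    F = lambda x: a * np.log(x) - x                        # F(x), maximum at x_M = a
    exact, _ = quad(lambda x: np.exp(F(x)), 1e-10, 200.0)  # 200 >> x_M, effectively infinity

    # Gaussian (saddle-point) approximation: sqrt(2*pi/|F''(x_M)|) * exp(F(x_M))
    x_M = a
    Fpp = a / x_M**2                                       # |F''(x_M)| = a / x_M^2 = 1/a
    approx = np.sqrt(2 * np.pi / Fpp) * np.exp(F(x_M))

    print(exact, approx, approx / exact)
    # the ratio is already within about half a percent of 1 for a = 20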

[Plot for Figure 1: the two ratios described in the caption below, shown as "ratio" (from about 0.985 to 1) on the vertical axis versus x (from 60 to 140) on the horizontal axis.]

Figure 1: Checking the accuracy of Stirling's formula. The red (lower) curve shows the ratio [x ln(x/e)]/ln Γ(x + 1), while the green (upper) curve shows the improved approximation [x ln(x/e) + (1/2) ln(2πx)]/ln Γ(x + 1). Even for relatively small numbers x ∼ 100, the formulae work quite well; even Stirling's formula recovers over 99% of the actual value. For the sorts of numbers of interest to us, n ∼ 10^23, this approximation is REALLY good.

Note: this plot was made with Maple. If you want to test the Stirling formula some more, all you need to know is that the Maple name of the Gamma function is GAMMA(x).
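If you prefer Python, the same check can be done with scipy; gammaln returns ln Γ(x) directly, which avoids overflow for large arguments. This sketch just prints the two ratios of the figure at x = 100:

    import numpy as np
    from scipy.special import gammaln

    x = 100.0
    ln_gamma = gammaln(x + 1)                        # ln Gamma(x+1) = ln(x!)
    stirling = x * np.log(x / np.e)                  # x ln(x/e)
    improved = stirling + 0.5 * np.log(2 * np.pi * x)

    print(stirling / ln_gamma)   # the red (lower) curve of the figure, about 0.99
    print(improved / ln_gamma)   # the green (upper) curve, much closer to 1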

4 Volume of an n-dimensional hypersphere

This result is demonstrated in the textbook (the calculation begins at the bottom of page 129 and continues on page 130); there it is done by a different method. While I obviously expect you to know the answers for n = 1, 2, 3, I will give you the general formula if it is needed in an exam, so don't worry about learning it by heart. Although it is sort of a cool result.

Question: what is the "volume" enclosed in an n-dimensional hypersphere of radius R, defined by the equation \sum_{i=1}^{n} x_i^2 = R^2? By definition:

V_n(R) = \int dx_1 \cdots \int dx_n\, \Theta\left(R^2 - \sum_{i=1}^{n} x_i^2\right)

Θ(x) is called the Heaviside (or step) function and appears quite often in more advanced courses. It is defined by:

\Theta(x) = \begin{cases} 0, & x < 0 \\ 1, & x > 0 \end{cases}

All it does is to tell us that we should "sum" the infinitesimal volumes dV = dx_1 dx_2 \cdots dx_n as long as R^2 \ge \sum_{i=1}^{n} x_i^2, i.e. as long as we are inside the hypersphere.

First, define y_i = x_i/R. It follows immediately that:

V_n(R) = R^n \int_{-1}^{1} dy_1 \cdots \int_{-1}^{1} dy_n\, \Theta\left(1 - \sum_{i=1}^{n} y_i^2\right) = R^n V_n(1)

i.e. the volume of a hypersphere of radius R is R^n times the volume of the unit hypersphere. We know this must be true for dimensional reasons: the volume must have units of m^n, where m is a meter. The volume can only depend on n (dimensionless) and on R (units of m), so it must be R^n times some number that can only depend on n. In 1D the hypersphere is a line, and its "volume" is the length 2R = R × 2 → V_1(1) = 2; in 2D, the hypersphere is a circle and its "volume" is the area πR^2 = R^2 × π → V_2(1) = π; in 3D, the volume is 4πR^3/3 = R^3 × 4π/3 → V_3(1) = 4π/3. So indeed, "volumes" scale as R^n times some number, which we are now trying to find. Note that I put limits on the integrals over y; in principle the limits are infinite, but we know that if any of the y_i has a modulus larger than 1, that point is outside the sphere and there is no contribution. Let's continue, by separating one of the variables:

V_n(1) = \int_{-1}^{1} dy_1 \cdots \int_{-1}^{1} dy_n\, \Theta\left(1 - \sum_{i=1}^{n} y_i^2\right) = \int_{-1}^{1} dy_1 \left[\int_{-1}^{1} dy_2 \cdots \int_{-1}^{1} dy_n\, \Theta\left((1 - y_1^2) - \sum_{i=2}^{n} y_i^2\right)\right]

But the quantity in the square brackets is just the volume of an (n-1)-dimensional hypersphere of radius \sqrt{1 - y_1^2}, so:

V_n(1) = \int_{-1}^{1} dy_1\, V_{n-1}\left(\sqrt{1 - y_1^2}\right) = \int_{-1}^{1} dy_1\, (1 - y_1^2)^{\frac{n-1}{2}} V_{n-1}(1) = 2 V_{n-1}(1) \int_0^{1} dy_1\, (1 - y_1^2)^{\frac{n-1}{2}}

I used V_{n-1}(R) = R^{n-1} V_{n-1}(1) and the fact that the integrand is an even function. Changing variables to y_1^2 = z, we finally find:

V_n(1) = V_{n-1}(1) \int_0^{1} dz\, z^{-\frac{1}{2}} (1 - z)^{\frac{n-1}{2}} = V_{n-1}(1)\, B\left(\frac{1}{2}, \frac{n+1}{2}\right) = V_{n-1}(1)\, \frac{\Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n+2}{2}\right)}

where I used definitions and equalities discussed on the first page of this write-up. We are now practically done, since:

V_n(1) = \frac{V_n(1)}{V_{n-1}(1)} \cdot \frac{V_{n-1}(1)}{V_{n-2}(1)} \cdots \frac{V_2(1)}{V_1(1)} \cdot V_1(1)

and we know all these ratios, and V_1(1) = 2. Substituting the ratios, using Γ(1/2) = √π, and after some simplifications, we find:

V_n(R) = R^n V_n(1) = R^n\, \frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n+2}{2}\right)}

Let's check this quickly:
(i) n = 1 → Γ((n+2)/2) = Γ(3/2) = (1/2)Γ(1/2) = √π/2, so V_1(R) = R √π/(√π/2) = 2R and things are fine.
(ii) n = 2 → Γ((2+2)/2) = Γ(2) = 1! = 1, so V_2(R) = πR^2, the correct answer again.
(iii) n = 3 → Γ((n+2)/2) = Γ(5/2) = (3/2)Γ(3/2) = (3/4)Γ(1/2) = 3√π/4, giving V_3(R) = R^3 π^{3/2}/(3√π/4) = 4πR^3/3, the expected answer.
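Beyond these small-n checks, a simple Monte Carlo estimate confirms the general formula; the sketch below uses n = 4 and 10^6 sample points, both arbitrary choices:

    import numpy as np
    from scipy.special import gamma

    rng = np.random.default_rng(0)
    n, samples = 4, 1_000_000

    # throw points uniformly in the cube [-1, 1]^n and count those inside the unit ball
    pts = rng.uniform(-1.0, 1.0, size=(samples, n))
    inside = np.sum(np.sum(pts**2, axis=1) <= 1.0)
    V_monte_carlo = 2.0**n * inside / samples        # cube volume times the hit fraction

    V_formula = np.pi**(n / 2) / gamma(n / 2 + 1)    # pi^(n/2) / Gamma((n+2)/2)
    print(V_monte_carlo, V_formula)
    # the two agree within the Monte Carlo statistical error (a fraction of a percent)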

5 Elementary results of probability theory

Let us assume that we look at a random event, which can have n possible outcomes, each of which happens with probability p_i, i = 1, ..., n. Obviously, it must be that:

0 \le p_i \le 1; \qquad \sum_{i=1}^{n} p_i = 1

Examples: the throw of a coin can have two outcomes, heads or tails, so n = 2. If the coin is fair, either of these will happen with a probability of 1/2. Similarly, a die has 6 possible outcomes, etc. By probability, we mean that if we repeat the experiment (in identical conditions) a very large number of times N, and count the number of times N_i that the outcome is the desired event i, then

p_i = \lim_{N \to \infty} \frac{N_i}{N}.

Some examples: suppose we have 2 dice, and we want to know the probability that the first one we throw gives a 5 and the second one gives a 3. Let me call this p_{5,3}. Since the throws are independent events, p_{5,3} = p_5 p_3, where p_i is the probability to throw "i". For a fair die p_1 = ... = p_6 = 1/6, so it follows that p_{5,3} = 1/36. Another way to arrive at the same result is to realize that there are 36 possible outcomes of the two throws (the first die shows any number between 1 and 6, the second one also any number between 1 and 6 → 36 possible outcomes) and only one of them is the desired one → the probability of success is 1/36. How about if we ask what is the probability to throw a 5 and a 3, irrespective of the order? We have p_{5 and 3} = p_{5,3} + p_{3,5} = 2/36. Or, from the 36 possible pairs, two pairs, (5,3) and (3,5), are the desired ones, so the probability of success is 2/36.

Note: unless a probability is 0 or 1, in which case the desired event is either impossible or certain, knowing a value 0 < p_i < 1 does not tell us what the outcome of any particular try will be; it only tells us how often outcome i occurs when the experiment is repeated many times.
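The dice probabilities above are easy to confirm with a short simulation (the number of trials below is arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    trials = 1_000_000
    die1 = rng.integers(1, 7, size=trials)   # fair dice: integers 1..6
    die2 = rng.integers(1, 7, size=trials)

    p_5_then_3 = np.mean((die1 == 5) & (die2 == 3))            # ordered: should be ~1/36
    p_5_and_3 = np.mean(((die1 == 5) & (die2 == 3)) |
                        ((die1 == 3) & (die2 == 5)))           # either order: ~2/36
    print(p_5_then_3, 1 / 36)
    print(p_5_and_3, 2 / 36)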

For such a distribution, a very useful object is the characteristic (generating) function

F(x) = \sum_{i=1}^{n} p_i x^i

Its derivatives give the moments of the distribution:

\langle i \rangle = \sum_{i=1}^{n} i\, p_i = F'(1); \qquad F''(1) = \sum_{i=1}^{n} i(i-1) p_i = \langle i^2 \rangle - \langle i \rangle \;\rightarrow\; \langle i^2 \rangle - \langle i \rangle^2 = F''(1) + F'(1) - [F'(1)]^2

while the normalization condition is

1 = \sum_{i=1}^{n} p_i = F(1)

Some famous examples:

5.1 Bernoulli (or binomial) distribution

The binomial distribution is defined by the probabilities:

P_N(n) = \frac{N!}{n!(N-n)!}\, p^n (1-p)^{N-n}

where n = 0, 1, ..., N. It corresponds to the following type of random process. Assume we have an event which has two possible outcomes: the desired outcome has probability p, and the undesired outcome has probability q = 1 - p (from normalization). Then, the binomial distribution P_N(n) gives the probability that if we repeat the event N times, we obtain the desired outcome exactly n times.

Example: assume we flip a coin N times, and ask what is the probability that we get heads n times (we don't care about the order, only that n out of the N times we get heads). The answer is:

\frac{N!}{n!(N-n)!} \left(\frac{1}{2}\right)^N

since in this case p = q = 1/2.

Let's justify this formula. Start with P_N(N), i.e. the probability that in each case we get the desired outcome. Since the events are independent of each other, this is just the product of the probabilities that each event produces the desired outcome, i.e. P_N(N) = p^N. Done.

How about P_N(N-1)? Let's count the possibilities: the "wrong" outcome can occur in the very first event, and then the remaining N-1 events must have the "right" outcome; the probability for this sequence is (1-p)p^{N-1}. But it is also possible to have a sequence r, w, r, r, ..., r, where r = "right outcome" and w = "wrong outcome". The probability for this sequence is p(1-p)p^{N-2} = (1-p)p^{N-1}, just as before. But the wrong outcome could happen in the third, or fourth, ..., or last try. In each case, that particular sequence of events has probability (1-p)p^{N-1}, and since there are N distinct sequences that give the desired overall number of right vs. wrong outcomes, we find that P_N(N-1) = N p^{N-1}(1-p). Correct!

How about P_N(N-2)? In this case, we must have 2 "wrong" and N-2 "right" outcomes. Each such sequence appears with probability p^{N-2}(1-p)^2. How many distinct sequences are there? Well, this is just the number of distinct ways in which we can pick 2 out of N (which 2 are to be "wrong"). For the first pick we can choose any of the N, while for the second we can choose any of the remaining N-1. However, the total number of distinct sequences is not N(N-1), but N(N-1)/2. The reason is that the picks are "indistinguishable": it makes no difference if the first pick is i and the second j, or the first is j and the second i; both give the same sequence where events i and j are wrong, and all the other ones are right. It follows that P_N(N-2) = \frac{N(N-1)}{2}\, p^{N-2}(1-p)^2.

Now let's generalize: if we want to have n right outcomes, the probability of any particular sequence with n right and N-n wrong outcomes is p^n(1-p)^{N-n}. How many such sequences are there? Well, it's the number of distinct ways to pick n out of N (which n are right, this time). If the picks were distinguishable (i.e., if it made a difference in which order we make the picks), then we would have N(N-1)\cdots(N-n+1) = N!/(N-n)! possible sequences, since at the first pick we could choose any of the N, at the second any of the remaining N-1, and so on. Because we are interested in the case where the picks are indistinguishable, we have to divide this by the number of identical copies we get (sequences which differ only by the order in which we made the picks, but not by the order of right vs. wrong inside the sequence). There are n! such copies, since this is the number of ways in which we can order n objects. Suppose that the n "right" picks have positions 1 \le i_1 < i_2 < ... < i_n \le N in the sequence of N events. Any of these n could have been associated with the first pick, any of the remaining ones could correspond to the second pick, etc., so indeed n! picks give the same sequence of r vs. w.
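Before wrapping up the argument, here is a brute-force check of this counting for a small case: enumerate all 2^N sequences of right/wrong outcomes, keep those with exactly n "right" ones, and compare their number and total probability with the formula (N = 6, n = 2, p = 0.3 are arbitrary choices):

    from itertools import product
    from math import factorial

    N, n, p = 6, 2, 0.3   # arbitrary small example

    total_prob, count = 0.0, 0
    for seq in product(['r', 'w'], repeat=N):      # all 2^N possible sequences
        if seq.count('r') == n:                    # exactly n "right" outcomes
            count += 1
            total_prob += p**n * (1 - p)**(N - n)  # probability of this particular sequence

    print(count, factorial(N) // (factorial(n) * factorial(N - n)))   # both give 15 = 6!/(2!4!)
    print(total_prob)   # equals P_N(n) = 15 * 0.3^2 * 0.7^4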

To conclude, we have a total of

\frac{N!}{n!(N-n)!}

ways to choose n out of N (i.e., distinct sequences), each of which comes with probability p^n(1-p)^{N-n}. Multiplying the two, we get P_N(n). Hopefully this is now clear.

Is this distribution normalized? Yes:

\sum_{n=0}^{N} P_N(n) = \sum_{n=0}^{N} \frac{N!}{n!(N-n)!}\, p^n (1-p)^{N-n} = [p + (1-p)]^N = 1

What is the characteristic function?

F_N(x) = \sum_{n=0}^{N} x^n P_N(n) = [px + 1 - p]^N

Then, the average number of "right" outcomes is:

\langle n \rangle = F_N'(1) = pN[px + 1 - p]^{N-1}\big|_{x=1} = pN

This makes very good sense. How about the standard deviation?

\langle (n - \langle n \rangle)^2 \rangle = \langle n^2 \rangle - \langle n \rangle^2 = F''(1) + F'(1) - [F'(1)]^2 = Np(1-p) \;\rightarrow\; \frac{\sqrt{\langle n^2 \rangle - \langle n \rangle^2}}{\langle n \rangle} = \sqrt{\frac{1-p}{Np}}

Let's see an example. It is time to discuss some physics, so let's assume we discuss spins 1/2. As you know, each spin 1/2 can be found in one of two possible quantum states: either it has the spin projection "up" (i.e., s_z = +1/2), or "down" (s_z = -1/2). We can manipulate the probability p of finding the spin "up" by turning on a magnetic field. If there is no field, we expect p = 1/2, since then there is nothing to make a difference between the two choices. Assume we have a chain of N such spins, and we want to know the probability that n of them point up. From what we just discussed, this is

P_N(n) = \frac{N!}{n!(N-n)!} \left(\frac{1}{2}\right)^N = P_N(N-n) \qquad (13)

Because there is no distinction between "up" and "down" in the absence of a magnetic field, we expect the second equality to hold for reasons of symmetry: we could simply change the direction of the axis of quantization and the outcome should be the same. If we add a magnetic field, we can no longer do that, so we expect P_N(n) \neq P_N(N-n).

Let us plot this probability distribution, for different N, first for p = 1/2. In order to be able to compare different plots, I will rescale the x-axis to show x = n/N, so that all variables go between 0 and 1 as n = 0, ..., N. I will also rescale the y-axis, to show not P_N(n), but N P_N(n). The reason for this is the following: we know that the normalization condition is \sum_{n=0}^{N} P_N(n) = 1 \rightarrow \sum_{n=0}^{N} \frac{1}{N}\left(N P_N(n)\right) = 1. But 1/N = \delta x is the spacing between two allowed points, so we can interpret the sum as an approximation of the "area" below the curve generated by these points. To keep this area "constant" and make comparisons easier, we then have to plot N P_N(n = xN).
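The shrinking of this rescaled width with N is easy to verify numerically, for the same values of N used in the plots below (scipy.stats.binom provides the binomial moments):

    import numpy as np
    from scipy.stats import binom

    p = 0.5
    for N in (10, 100, 500):
        mean, var = binom.stats(N, p, moments='mv')
        width_over_N = np.sqrt(var) / N             # rescaled width sqrt(p(1-p)/N)
        print(N, mean / N, width_over_N, np.sqrt(p * (1 - p) / N))
    # the peak stays at n/N = p while the rescaled width shrinks like 1/sqrt(N)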

[Figure: for p = 1/2, the rescaled N P_N(n) is plotted as a function of n/N, for N = 10, 100, 500. The probability is largest for n = ⟨n⟩ = pN → x = 1/2, indeed, and this prediction becomes more accurate (the distribution is more peaked near 1/2) the larger N is. The "width" of the distribution is characterized by the standard deviation; here this width is also rescaled (divided by N, because of the rescaled x-axis).]

For p = 1/2, the entire dependence on n comes from the combinatorics factor N!/(n!(N-n)!) (see Eq. (13)). [Note: you can easily generate these numbers in Maple; the command binomial(N, n) will return the value of N!/(n!(N-n)!).] This shows us that this distribution is peaked at n = pN = N/2 simply because, if all other things are equal, there are many more ways of getting distinct sequences with n_↑ = N/2 up and n_↓ = N/2 down spins than for any other values of n_↑ and n_↓. This becomes more and more true (i.e., the probability of finding n_↑ very different from N/2 becomes smaller and smaller) the larger N is. For instance, from the figure, we see that for N = 10 there is still a decent chance to find less than 20% or more than 80% of spins "up", whereas for N = 500 it is already extremely unlikely that we will find less than 40% or more than 60% spins up. This is expected, because on this scale, the width of the distribution is \sqrt{\langle (n - \langle n \rangle)^2 \rangle}/N = \sqrt{p(1-p)/N} \to 0 if N \to \infty. We will generally be interested in chains with N ∼ 10^23 spins! In that case, this distribution is so peaked (its width is so tiny) that we are virtually certain to always find exactly N/2 spins up. This is why statistics works for large systems. For small samples, though, fluctuations from the "average" expected result can be quite considerable.

Let's see one more plot. If p ≠ 1/2, then the factor p^n(1-p)^{N-n} will shift the maximum away from the value n/N = 1/2 favored by combinatorics, to the new ⟨n⟩/N = p. Again, the distribution is more highly peaked around this value (i.e., the chance of finding n = ⟨n⟩ = pN is larger and larger) the larger N is. What this tells us is that if we turn on a magnetic field so that the chance of having a spin-up is, say, p = 20%, then for a very long chain with N ≫ 1 spins we are virtually assured to find exactly pN spins up. The fluctuations from this value are of order 1/√N, so they become negligible for really big samples, but could be quite important for small samples.

[Figure: same as before, but now for p = 0.2, again for N = 10, 100, 500.]

The binomial distribution has two limiting cases which are very famous in their own right, so let's investigate them. The first one is the Poisson distribution, which is obtained in the limit p → 0, N → ∞ such that pN = a = const. In this limit, the characteristic function becomes:

F(x) = [1 - p(1-x)]^N = e^{N \ln[1 - p(1-x)]} \to e^{-Np(1-x)} = e^{-a(1-x)}

where I used the Taylor expansion \ln(1 - x) \approx -x if x \ll 1.

5.2 The Poisson distribution

As just discussed, it is defined by the characteristic function:

F(x) = e^{-a(1-x)} = e^{-a} \sum_{n=0}^{\infty} \frac{a^n}{n!} x^n = \sum_{n=0}^{\infty} p_n x^n

so we have

p_n = e^{-a} \frac{a^n}{n!}

for any integer n ≥ 0. Using F(x) it is straightforward to find:

\langle n \rangle = a; \qquad \langle n^2 \rangle - \langle n \rangle^2 = a
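The limit itself can be checked directly: take a large N, set p = a/N, and compare the binomial probabilities with the Poisson ones (a = 3 and N = 10^4 below are arbitrary):

    from scipy.stats import binom, poisson

    a, N = 3.0, 10_000
    p = a / N                      # p -> 0, N -> infinity, with pN = a fixed
    for n in range(8):
        print(n, binom.pmf(n, N, p), poisson.pmf(n, a))
    # for N = 10^4 the two columns already agree to roughly 0.1% or better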

The second limit of interest, for the Bernoulli distribution, is for N ≫ 1 and n ∼ ⟨n⟩ = pN ≫ 1. The reason we are interested in this is that we will generally be working with large numbers N. In this case, we can use the Stirling approximation for the various factorials, since they all correspond to very large numbers. We find:

\ln P_N(n) = \ln N! - \ln n! - \ln(N-n)! + n \ln p + (N-n)\ln(1-p)

After replacing \ln N! \approx N \ln N - N, etc., and some rearrangements, we find:

\ln P_N(n) \approx N \ln \frac{N}{N-n} + n \ln \frac{N-n}{n} + n \ln \frac{p}{1-p} + N \ln(1-p)

We would like to find the maximum of this distribution. Since n ≫ 1, let's treat it as if it were a continuous variable (we've already done this in the discussion of thermodynamics, where we wrote dN for the differential of the number N of particles. Strictly speaking that is a discrete number and the differential makes no sense; but if N ∼ 10^23, then a dN ∼ 1 is really infinitesimal). If n is treated as continuous, to find the maximum we simply take the derivative:

\frac{d \ln P_N(n)}{dn} \approx \frac{N}{N-n} + \ln\frac{N-n}{n} + n\left(-\frac{1}{N-n} - \frac{1}{n}\right) + \ln\frac{p}{1-p} = \ln\frac{N-n}{n} + \ln\frac{p}{1-p}

\ln\left[\frac{N - n_M}{n_M}\, \frac{p}{1-p}\right] = 0 \;\rightarrow\; \frac{N - n_M}{n_M}\, \frac{p}{1-p} = 1 \;\rightarrow\; n_M = Np

which is correct: we know that this is the expected result for a binomial distribution, but we have now verified that our approximations have not ruined it. One can easily check that this is a maximum, by computing:

\frac{d^2 \ln P_N(n)}{dn^2}\bigg|_{n = n_M} = -\frac{1}{Np(1-p)} = -\frac{1}{\langle (n - \langle n \rangle)^2 \rangle}

using again a result obtained for the binomial distribution. What we really want is an expression for the probability distribution in the vicinity of this maximum at n_M, because from all that we've discussed so far, we know that only there is the probability going to be large. Using a Taylor expansion, we have:

\ln P_N(n) \approx \ln P_N(n_M) - \frac{1}{2}\, \frac{1}{Np(1-p)}\, (n - n_M)^2 + ...

or, in other words,

P(n) = P(n_M)\, e^{-\frac{(n - \langle n \rangle)^2}{2\langle (n - \langle n \rangle)^2 \rangle}}

where P(n_M) is some number (a normalization constant). A probability of this type, which applies for continuous variables, is called a Gaussian distribution.
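To see how good this Gaussian form is even for a modest N, compare the exact binomial probabilities with a Gaussian of the same mean and variance (N = 100 and p = 0.3 below are arbitrary):

    import numpy as np
    from scipy.stats import binom, norm

    N, p = 100, 0.3
    mean, var = N * p, N * p * (1 - p)

    for n in range(25, 36, 5):                     # a few values of n around the maximum n_M = 30
        exact = binom.pmf(n, N, p)
        gauss = norm.pdf(n, loc=mean, scale=np.sqrt(var))
        print(n, exact, gauss)
    # near the peak the two agree to within a few percent already for N = 100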

5.3 Gaussian distribution

Let x be a continuous variable, and assume we have a random event whose result can be x (e.g., throw a dart at a target and measure the x position of its location). In this case it makes no sense to ask for the probability to obtain exactly a given x; what is meaningful is to discuss the probability to obtain a value between x and x + dx. By definition, this is dp = P(x)dx, where now P(x) is called the density of probability. For a Gaussian distribution, the density of probability has the general form:

P(x) = C e^{-\frac{a}{2}(x - b)^2}

where a, b, C are some constants. Let's find their meanings. First, we find C from the normalization condition:

\int_{-\infty}^{\infty} dx\, P(x) = 1 = C\sqrt{\frac{2\pi}{a}} \;\rightarrow\; C = \sqrt{\frac{a}{2\pi}}

(our friend the Gaussian integral). Let's find the average value of x. Generalizing the definition we had for discrete variables,

\langle x \rangle = \int_{-\infty}^{\infty} x\, dp = \int_{-\infty}^{\infty} dx\, x P(x)

For our Gaussian distribution:

\langle x \rangle = \sqrt{\frac{a}{2\pi}} \int_{-\infty}^{\infty} dx\, x\, e^{-\frac{a}{2}(x-b)^2} = \sqrt{\frac{a}{2\pi}} \int_{-\infty}^{\infty} dy\, (y + b)\, e^{-\frac{a}{2} y^2} = b

(this integral is the sum of b I_0(a/2) = b\sqrt{2\pi/a} and I_1(a/2) = 0; see the discussion around Eq. (5)). So b is simply the average value of the result. This is expected, since P(x) has its maximum at x = b, and P(b + δx) = P(b - δx), i.e. finding a value larger than b by δx is as probable as finding a value smaller than b by δx (assuming we're looking in a neighborhood of width dx which is the same in both cases). So clearly, b must be the average. Finally,

\langle (x - \langle x \rangle)^2 \rangle = \int_{-\infty}^{\infty} dx\, (x - b)^2 P(x)

since \langle x \rangle = b. After changing variables to y = x - b, this integral is proportional to I_2(a/2), and we find:

\langle (x - \langle x \rangle)^2 \rangle = \langle (\Delta x)^2 \rangle = \frac{1}{a}

In other words, we can write:

P(x) = \frac{1}{\sqrt{2\pi \langle (\Delta x)^2 \rangle}}\, e^{-\frac{(x - \langle x \rangle)^2}{2\langle (\Delta x)^2 \rangle}} \qquad (14)
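As a last check, one can verify numerically that Eq. (14) is normalized and has the advertised mean and variance (the values ⟨x⟩ = 1.5 and ⟨(Δx)²⟩ = 0.4 below are arbitrary):

    import numpy as np
    from scipy.integrate import quad

    mu, var = 1.5, 0.4   # arbitrary <x> and <(Delta x)^2>
    P = lambda x: np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    norm_, _ = quad(P, -np.inf, np.inf)
    mean, _ = quad(lambda x: x * P(x), -np.inf, np.inf)
    second, _ = quad(lambda x: (x - mu)**2 * P(x), -np.inf, np.inf)
    print(norm_, mean, second)   # prints 1, 1.5 and 0.4 (up to quadrature accuracy)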


We can now see the exact meaning of \langle (\Delta x)^2 \rangle, from the fact that for x_{\pm} = \langle x \rangle \pm \sqrt{2\langle (\Delta x)^2 \rangle \ln 2} we have

\frac{P(x_{\pm})}{P(\langle x \rangle)} = e^{-\ln 2} = \frac{1}{2}

i.e., at a distance \sqrt{2\langle (\Delta x)^2 \rangle \ln 2} away from the maximum, the probability distribution has decreased to half of its maximum value. In other words, \sqrt{\langle (\Delta x)^2 \rangle} characterizes the width of this distribution (see Figure).

[Figure: the Gaussian distribution P(x). Its maximum height is C = \sqrt{a/2\pi}, it is centered at \langle x \rangle = b, and its width is proportional to 1/\sqrt{a} = \sqrt{\langle (\Delta x)^2 \rangle}; the points x_{\pm} = b \pm \sqrt{2\ln 2/a} mark where P(x) has dropped to C/2.]

This completes our short introduction to probabilities and some of the famous examples that we will encounter in this course.
