
Probability and basic definitions

• Statistics is a mathematical discipline that allows us to understand phenomena shaped by many events that we cannot keep track of. Since we lack the information needed to predict the individual events, we consider them random. However, the collective effect of many such events can usually be understood and predicted. Precisely this problem is the subject of statistical mechanics, which applies mathematical statistics to describe physical systems with many degrees of freedom (whose dynamics appears random to us due to its complexity).

• Events or outcomes are mathematically represented by values that a random variable x can take in a measurement. Most generally, if the random variable can have any value from a set X, then an event is defined by a subset A ⊆ X. If one measures x and finds x ∈ A, then the event A occurred. The most elementary event {x} is specified by a single value that x can take.

• Ensemble and probability: One assumes that every measurement is done in exactly the same conditions and independently of all other measurements. A way to facilitate such a complete equality of all measurements is to prepare an ensemble of N identical systems and perform one measurement of x on each. If the outcome x is obtained N(x) times, then the objective probability p(x) of this outcome is defined as:
\[
p(x) \equiv p(\{x\}) = \lim_{N\to\infty} \frac{N(x)}{N}
\]
For non-elementary events:
\[
p(A) = \sum_{x\in A} p(\{x\}) = \lim_{N\to\infty} \frac{N(x\in A)}{N}
\]
where N(x ∈ A) is the number of events A in N measurements (the number of times it was found that x ∈ A). One must often estimate an objective probability using fundamental principles, as in statistical mechanics; such estimates can be considered subjective probabilities.
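As a minimal numerical illustration of the frequency definition above (a sketch, not part of the original notes): simulate an ensemble of N throws of a fair six-sided die and estimate the probability of an elementary and a non-elementary event from the fraction of outcomes falling in each. The die and the events are hypothetical examples chosen here.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100_000                      # ensemble size (number of identical measurements)
x = rng.integers(1, 7, size=N)   # each measurement: one throw of a fair six-sided die

# Elementary event {4}: estimate p({4}) = N(4)/N, expected 1/6
p_4 = np.mean(x == 4)

# Non-elementary event A = {2, 4, 6}: p(A) = sum of elementary probabilities, expected 1/2
A = [2, 4, 6]
p_A = np.mean(np.isin(x, A))

print(f"p({{4}}) ~ {p_4:.4f}  (exact 1/6 = {1/6:.4f})")
print(f"p(A)    ~ {p_A:.4f}  (exact 1/2 = 0.5)")
```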

• In general, probability has the following properties (which follow trivially from the definition):
\[
0 \le p(x) \le 1 \quad (\forall x\in X), \qquad \sum_{x\in X} p(x) = p(X) = 1
\]

• An event represented by the union A ∪ B of two sets A and B occurs if either one of the events A or B occurs. It is possible for both events A and B to occur at the same time if their intersection A ∩ B is not an empty set. However, we consider A and B as outcomes of a single measurement of a scalar random variable. The events A and B are not independent, since the occurrence of one can exclude the other (when their intersection is empty). Under these assumptions, the following can be easily deduced from the definition of probability:
\[
p(A\cup B) = p(A) + p(B) - p(A\cap B)
\]

• Two events A and B are independent if they correspond to the measurements of two uncorrelated random variables a ∈ X_a and b ∈ X_b. The event A occurs if a ∈ A and the event B occurs if b ∈ B. If we label by AB (or A ⊗ B) the event in which both A and B happen (mathematically, (a, b) ∈ A ⊗ B), then it can be easily shown from the definition of probability that:
\[
p(AB) = p(A)\,p(B)
\]

• Conditional probability: If an event B occurs (and one observes it), then the conditional probability p(A|B) that an event A occurs as well at the same time is:
\[
p(A|B) = \frac{p(AB)}{p(B)}
\]
If A and B are independent, then the probability of A has nothing to do with B and p(A|B) = p(A).

• If the random variable x takes values from a continuous set X (such as the real numbers), then one can sensibly define probability only for a range of values, e.g.:

\[
p(x_1, x_2) = \lim_{N\to\infty} \frac{N(x_1 \le x < x_2)}{N}
\]

where N(x_1 ≤ x < x_2) is the number of outcomes x_1 ≤ x < x_2 among N experiments. Cumulative probability function: P(x) = p(−∞, x) is the probability that a measurement outcome will be smaller than x. The most direct substitute for the probability function itself is the probability distribution function (PDF), or probability density:

\[
f(x) = \frac{dP(x)}{dx}
\]
The probability that a measurement outcome will be in the interval (x, x + dx) is f(x)dx. The PDF is normalized:
\[
\int_{-\infty}^{\infty} f(x)\,dx = 1
\]
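A short numerical sketch (not from the original notes) of the relation f(x) = dP(x)/dx: estimate the cumulative probability P(x) from sampled data, differentiate it numerically, and compare with a normalized histogram. The exponential sample used here is just an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.exponential(scale=1.0, size=200_000)  # arbitrary continuous distribution

# Empirical cumulative probability P(x) on a grid
grid = np.linspace(0.0, 5.0, 51)
P = np.array([np.mean(samples < x) for x in grid])

# PDF estimate as the numerical derivative f(x) = dP/dx
f_from_cdf = np.gradient(P, grid)

# PDF estimate from a normalized histogram on the same bins
hist, edges = np.histogram(samples, bins=grid, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Both estimates should be close to the true density exp(-x) on this interval
print("max |dP/dx - exp(-x)|:     ", np.max(np.abs(f_from_cdf[1:-1] - np.exp(-grid[1:-1]))))
print("max |histogram - exp(-x)|: ", np.max(np.abs(hist - np.exp(-centers))))
```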

• The mean, or expectation value, of the random variable x ∈ X is labeled ⟨x⟩ or x̄, and defined by:

\[
\langle x\rangle = \sum_{x\in X} x\,p(x) \qquad \text{or} \qquad \langle x\rangle = \int dx\, x f(x)
\]
for discrete and continuous distributions respectively. It is the single value that best represents the entire distribution of random outcomes. If the distribution scatters little about the most probable outcome x_most, then ⟨x⟩ ≈ x_most (but they are generally not equal). If the distribution scatters symmetrically about x_0 (which need not be equal to x_most), then ⟨x⟩ = x_0. The average value is especially useful for learning about the statistics of sums of random variables. One similarly defines the expectation value of any function of x:

\[
\langle g(x)\rangle = \sum_{x\in X} g(x)\,p(x) \qquad \text{or} \qquad \langle g(x)\rangle = \int dx\, g(x) f(x)
\]

• The variance of a random distribution is defined by:
\[
\mathrm{Var}(x) = \langle (x-\langle x\rangle)^2\rangle = \sum_{x\in X} (x-\langle x\rangle)^2\, p(x)
\]
It is usually more convenient to calculate it as:

\[
\langle (x-\langle x\rangle)^2\rangle = \langle x^2 - 2x\langle x\rangle + \langle x\rangle^2\rangle = \langle x^2\rangle - 2\langle x\rangle\langle x\rangle + \langle x\rangle^2 = \langle x^2\rangle - \langle x\rangle^2
\]

The standard deviation σ = √Var(x) = √⟨(x − ⟨x⟩)²⟩ is the measure of how much the random outcomes scatter about the average value of the distribution (the average of the deviation from the mean; the deviation is squared inside the average in order to avoid the cancellation of scatters in opposite directions, but the square root eventually undoes this squaring and makes σ directly comparable to the outcomes of x). A sharp distribution that scatters little has a small σ.
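The identity Var(x) = ⟨x²⟩ − ⟨x⟩² is easy to check numerically. Below is a small sketch (not from the original notes) that computes the mean, the variance by its definition, and the variance by the shortcut formula for an arbitrary discrete distribution.

```python
import numpy as np

# An arbitrary discrete distribution: outcomes x with probabilities p (must sum to 1)
x = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([0.1, 0.2, 0.3, 0.4])

mean = np.sum(x * p)                        # <x>
var_def = np.sum((x - mean) ** 2 * p)       # <(x - <x>)^2>
var_short = np.sum(x ** 2 * p) - mean ** 2  # <x^2> - <x>^2
sigma = np.sqrt(var_def)                    # standard deviation

print(mean, var_def, var_short, sigma)      # the two variance values coincide
```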

Uniform distribution

• A random variable x whose possible outcomes are all equally likely belongs to the uniform probability distribution U(X). If the set X of all possible outcomes is finite and has N elements, then:
\[
p(x) = \frac{1}{N} \quad (\forall x\in X)
\]
Otherwise, if X = (a, b) is a continuous range of real numbers between a and b, then:
\[
f(x) = \frac{1}{b-a}
\]

\[
\langle x\rangle = \int_a^b dx\, x f(x) = \frac{1}{b-a}\int_a^b dx\, x = \frac{1}{b-a}\,\frac{b^2-a^2}{2} = \frac{b+a}{2}
\]
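A quick numerical check (a sketch, not part of the notes) of ⟨x⟩ = (a + b)/2 for the continuous uniform distribution, with arbitrarily chosen endpoints a and b:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 1.0, 4.0                           # arbitrary interval endpoints
x = rng.uniform(a, b, size=1_000_000)     # samples from U(a, b)

print(np.mean(x), (a + b) / 2)            # sample mean vs (a + b)/2
```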

• The random variable
\[
n = \sum_{i=1}^{N} x_i, \qquad
x_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1-p \end{cases} \quad (\forall i)
\]
belongs to the binomial distribution: n ∼ B(N, p). This distribution describes the statistics of the number n of random events x_i = 1, which independently occur with probability p. Example applications are tossing a coin, throwing a die (how many times n one gets a particular number on the die, with probability p = 1/6, in N attempts), a random walk (how far a random walker gets after N steps if the probability of a forward step is p), etc.

• Let W(n) be the probability that n events will occur in N attempts. In order to determine W(n) experimentally, one must carry out M times a sequence of N measurements of the random variables x_i, and count how many times M(n) there were n positive outcomes x_i = 1 in a sequence; then W(n) ≈ M(n)/M for large M. We will determine this probability analytically. Consider a single sequence. The probability that the sequence

will be 011 ··· 01, for example, is equal to the probability that x_1 = 0 (1 − p) and x_2 = 1 (p) and x_3 = 1 (p) and ... x_{N−1} = 0 (1 − p) and x_N = 1 (p). Since all individual events are independent, the probability of the whole sequence is the product of the individual event probabilities:

W (011 ··· 01) = (1 − p) · p · p ··· (1 − p) · p

The probability of a particular long sequence can be very small because the number of possible sequences

is very large. If there were n outcomes x_i = 1 in the sequence, and hence N − n outcomes x_i = 0, then:

\[
W(\text{sequence}) = p^n (1-p)^{N-n}
\]

There are many different sequences that have the same number n of positive outcomes. They are non-overlapping (mutually exclusive) events, so their probabilities add up:

\[
W(n) = \binom{N}{n} p^n (1-p)^{N-n}
\]
where the binomial coefficient

\[
\binom{N}{n} \equiv \frac{N!}{n!\,(N-n)!} = \frac{(N-n+1)(N-n+2)\cdots(N-1)N}{n!}
\]

is the number of sequences that have n positive outcomes, i.e. the number of ways one can select n elements from a set of N elements (there are N ways to select the 1st element, then N − 1 ways to select the 2nd element, down to N − n + 1 ways to select the last, nth element from the remaining ones; the numerator on the right is the number of possible selections, but among them each group of selected elements is assembled in n! different orders and we do not care about the order).
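As a sketch (not from the original notes), the formula W(n) = C(N, n) pⁿ(1−p)^{N−n} can be checked against the experimental procedure described above: repeat M sequences of N trials and compare the observed frequencies M(n)/M with the analytic probabilities.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)
N, p, M = 10, 0.3, 200_000             # arbitrary choice of parameters

# M sequences of N trials; count positive outcomes n in each sequence
n_counts = rng.binomial(1, p, size=(M, N)).sum(axis=1)

for n in range(N + 1):
    W_empirical = np.mean(n_counts == n)                   # M(n) / M
    W_analytic = comb(N, n) * p**n * (1 - p)**(N - n)      # binomial formula
    print(f"n={n:2d}  empirical={W_empirical:.4f}  analytic={W_analytic:.4f}")
```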

• Normalization:
\[
\sum_{n=0}^{N} W(n) = \sum_{n=0}^{N} \binom{N}{n} p^n (1-p)^{N-n} = \bigl[p + (1-p)\bigr]^N = 1
\]
To see this, just consider brute-force expanding the square brackets above (without disassembling the brackets for 1 − p). There are N copies of [p + (1 − p)] that are multiplied together and generate the sum of all possible products of N factors that can be either p or 1 − p...

• The expectation value of the binomial distribution B(N, p) is pN, which can be easily understood given that it represents the number of positive outcomes of probability p out of N attempts (recall the very definition of objective probability). Formally:

\[
\begin{aligned}
\langle n\rangle = \sum_{n=0}^{N} n\,W(n) &= \sum_{n=1}^{N} n\,\frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n} = \sum_{n=1}^{N} \frac{N!}{(n-1)!\,(N-n)!}\, p^n (1-p)^{N-n} \\
&= pN \sum_{n=1}^{N} \frac{(N-1)!}{(n-1)!\,[(N-1)-(n-1)]!}\, p^{n-1} (1-p)^{(N-1)-(n-1)} \\
&= pN \sum_{n'=0}^{N-1} \binom{N-1}{n'} p^{n'} (1-p)^{N-1-n'} \\
&= pN
\end{aligned}
\]

The last line follows from the normalization of the binomial distribution B(N − 1, p).

• The variance of B(N, p) is p(1 − p)N. We find this from:

\[
\begin{aligned}
\langle n^2 - n\rangle = \langle n(n-1)\rangle &= \sum_{n=2}^{N} n(n-1)\,\frac{N!}{n!\,(N-n)!}\, p^n (1-p)^{N-n} = \sum_{n=2}^{N} \frac{N!}{(n-2)!\,(N-n)!}\, p^n (1-p)^{N-n} \\
&= p^2 N(N-1) \sum_{n=2}^{N} \frac{(N-2)!}{(n-2)!\,[(N-2)-(n-2)]!}\, p^{n-2} (1-p)^{(N-2)-(n-2)} \\
&= p^2 N(N-1) \sum_{n'=0}^{N-2} \binom{N-2}{n'} p^{n'} (1-p)^{N-2-n'} \\
&= p^2 N(N-1)
\end{aligned}
\]

and

\[
\mathrm{Var}(n) = \langle n^2\rangle - \langle n\rangle^2 = \langle n^2 - n\rangle + \langle n\rangle - \langle n\rangle^2 = p^2 N(N-1) + pN - p^2 N^2 = p(1-p)N
\]
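A minimal numerical check (a sketch, not from the notes) of ⟨n⟩ = pN and Var(n) = p(1 − p)N:

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, M = 50, 0.2, 500_000            # arbitrary parameters

n = rng.binomial(N, p, size=M)        # M draws of n ~ B(N, p)

print(np.mean(n), p * N)              # sample mean vs pN
print(np.var(n), p * (1 - p) * N)     # sample variance vs p(1-p)N
```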

Poisson distribution

• Consider a physical system that experiences random uncorrelated events of a certain kind with an average rate of λ events per unit time (for example, a collection of radioactive atoms whose lifetime is τ = λ⁻¹ will experience decay events with rate λ). Since the events are uncorrelated, their number n(t) in a time interval t does not depend on the instant (t = 0) at which one begins to count them. The statistics of the random variable n(t) is described by the Poisson distribution, n(t) ∼ Pois(λ, t).

• The probability W_{λ,t}(n) to observe n events in a time interval t can be obtained from the binomial distribution. Divide the time interval t into N → ∞ infinitesimal intervals ∆t = t/N. If p is the probability for a single event to occur in a time interval ∆t, then the average number of events that occur in ∆t is:
\[
\langle n(\Delta t)\rangle = 0\times(1-p) + 1\times p = p \;\to\; \lambda\,\Delta t = \frac{\lambda t}{N}
\]
We computed this first as a statistical expectation value of the random variable n(∆t) that counts how many events occurred in the interval ∆t. In doing so, we neglected the possibility that more than one event could have happened in such a small interval ∆t → 0, and treated n(∆t) as a binary variable that takes only the values 0 or 1. Then, we used the definition of the rate λ to express ⟨n(∆t)⟩ in terms of ∆t. Now, since n(∆t) is binary, we can regard the total number of events in the period t

\[
n(t) = \sum_{i=1}^{N} n_i(\Delta t)
\]

as a random variable from the binomial distribution B(N, p) in the limit N → ∞, p = λt/N. Each individual measurement in the equivalent B(N, p) distribution refers to whether an event n_i(∆t) = 1 occurred (with probability p) in an infinitesimal interval (i − 1)∆t ≤ t < i∆t. Therefore,
\[
\begin{aligned}
W_{\lambda,t}(n) = \lim_{N\to\infty} \binom{N}{n} p^n (1-p)^{N-n}
&= \lim_{N\to\infty} \frac{N!}{n!\,(N-n)!} \left(\frac{\lambda t}{N}\right)^n \left(1-\frac{\lambda t}{N}\right)^{N-n} \\
&= \lim_{N\to\infty} \frac{1}{n!}\, N(N-1)(N-2)\cdots(N-n+1) \left(\frac{\lambda t}{N\bigl(1-\frac{\lambda t}{N}\bigr)}\right)^n \left(1-\frac{\lambda t}{N}\right)^N \\
&= \lim_{N\to\infty} \frac{1}{n!}\, N^n \left(\frac{\lambda t}{N\bigl(1-\frac{\lambda t}{N}\bigr)}\right)^n \left(1-\frac{\lambda t}{N}\right)^N \\
&= \lim_{N\to\infty} \frac{1}{n!} \left(\frac{\lambda t}{1-\frac{\lambda t}{N}}\right)^n \left(1-\frac{\lambda t}{N}\right)^N
 = \frac{(\lambda t)^n}{n!} \lim_{N\to\infty} \left(1-\frac{\lambda t}{N}\right)^N \\
&= \frac{(\lambda t)^n}{n!}\, e^{-\lambda t}
\end{aligned}
\]
In the 2nd line we just canceled out the common factors in N! and (N − n)! and reorganized the terms involving λ, without making any approximations. In the 3rd line we neglected all appearances of n next to N, since n is finite and N → ∞. Next, we pulled the factor of N^n into the (···)^n term, and then approximated 1 − λt/N ≈ 1 in the denominator. In the last step, we used the definition of the exponential function exp(x) ≡ lim_{N→∞}(1 + x/N)^N.

• The expectation value and variance of Pois(λ, t) can be computed brute-force using the obtained probability W_{λ,t}(n), or elegantly from the binomial distribution results:
\[
\langle n(t)\rangle = \lim_{N\to\infty} Np = \lim_{N\to\infty} N\,\frac{\lambda t}{N} = \lambda t
\]
\[
\mathrm{Var}\,n(t) = \lim_{N\to\infty} Np(1-p) = \lim_{N\to\infty} N\,\frac{\lambda t}{N}\left(1 - \frac{\lambda t}{N}\right) = \lambda t
\]
We see that the average has the expected value (from the definition of the rate λ as the average number of events per unit time), and the variance is equal to the average.
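A numerical sketch (not in the original notes) of the limit derived above: for large N the binomial probabilities B(N, p = λt/N) approach the Poisson probabilities (λt)ⁿ e^{−λt}/n!.

```python
from math import comb, exp, factorial

lam_t = 3.0                 # the product λt, arbitrary choice
N = 10_000                  # number of infinitesimal sub-intervals (large)
p = lam_t / N

for n in range(8):
    W_binomial = comb(N, n) * p**n * (1 - p)**(N - n)
    W_poisson = lam_t**n * exp(-lam_t) / factorial(n)
    print(f"n={n}  binomial={W_binomial:.6f}  Poisson={W_poisson:.6f}")
```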

Exponential distribution

• The time interval τ > 0 between two successive events that occur with an average rate λ is a random variable belonging to the exponential distribution Exp(λ). This distribution is continuous.

• If f(τ) is the exponential PDF, then f(τ)dτ is the probability that an event will occur in the time interval τ ≤ t < τ + dτ after the previous event. We can obtain this probability as the product of Poisson probabilities for two independent outcomes: 1) no event occurs in 0 ≤ t < τ, and 2) one event occurs in τ ≤ t < τ + dτ:
\[
f(\tau)\,d\tau = W_{\lambda,\tau}(0)\times W_{\lambda,d\tau}(1) = e^{-\lambda\tau}\,\frac{(\lambda\tau)^0}{0!} \times e^{-\lambda\,d\tau}\,\frac{(\lambda\,d\tau)^1}{1!}
\]
In the limit dτ → 0, we find:
\[
f(\tau) = \lambda e^{-\lambda\tau}
\]
Verify by normalization (using the change of variables x = λτ):

\[
\int_0^{\infty} d\tau\, f(\tau) = \lambda \int_0^{\infty} d\tau\, e^{-\lambda\tau} = \int_0^{\infty} dx\, e^{-x} = 1
\]

• The average time interval between two successive Poisson events is as expected (from the definition of the rate λ):
\[
\langle\tau\rangle = \int_0^{\infty} d\tau\, \tau f(\tau) = \lambda \int_0^{\infty} d\tau\, \tau e^{-\lambda\tau} = \frac{1}{\lambda}\int_0^{\infty} dx\, x e^{-x} = \frac{1}{\lambda}
\]
The variance is:
\[
\mathrm{Var}(\tau) = \langle\tau^2\rangle - \langle\tau\rangle^2 = \lambda \int_0^{\infty} d\tau\, \tau^2 e^{-\lambda\tau} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}\int_0^{\infty} dx\, x^2 e^{-x} - \frac{1}{\lambda^2} = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}
\]
so the standard deviation is equal to the mean. The last integral is solved using integration by parts, and is known as the Gamma function:
\[
\Gamma(n+1) \equiv \int_0^{\infty} dx\, x^n e^{-x} = \Bigl[-x^n e^{-x}\Bigr]_0^{\infty} + n\int_0^{\infty} dx\, x^{n-1} e^{-x} = \cdots = n!\int_0^{\infty} dx\, e^{-x} = n!
\]
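A simulation sketch (not part of the notes): generate Poisson events as a long sequence of independent infinitesimal-interval trials, measure the waiting times between successive events, and compare their mean and variance with 1/λ and 1/λ², respectively.

```python
import numpy as np

rng = np.random.default_rng(5)
lam = 2.0                      # event rate λ (arbitrary)
dt = 1e-3                      # small time step Δt
steps = 5_000_000              # total simulated time = steps * dt

# Bernoulli process: at most one event per step, with probability p = λΔt
events = rng.random(steps) < lam * dt
times = np.flatnonzero(events) * dt        # event times
tau = np.diff(times)                       # waiting times between successive events

print(np.mean(tau), 1 / lam)               # mean waiting time vs 1/λ
print(np.var(tau), 1 / lam**2)             # variance vs 1/λ^2
```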

Gaussian distribution

• The Gaussian distribution is of fundamental importance in mathematics and statistical mechanics because of the central limit theorem (proved later): the average of many independent random variables always converges to a Gaussian distribution. Many quantities of interest in statistical mechanics will be averages over many degrees of freedom in macroscopic systems (e.g. the average velocity of a particle in a gas), so their statistics will be described by Gaussian distributions.

• The Gaussian or normal distribution N(µ, σ) is a continuous probability distribution defined on x ∈ (−∞, ∞) by the PDF:
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\]
The mean and standard deviation are µ and σ respectively.

• The coefficient 1/(σ√2π) is required by normalization:

\[
\int_{-\infty}^{\infty} dx\, f(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) = \frac{1}{\sigma\sqrt{2\pi}} \times \sqrt{2\sigma^2} \int_{-\infty}^{\infty} d\xi\, e^{-\xi^2} = \frac{1}{\sigma\sqrt{2\pi}} \times \sqrt{2\sigma^2} \times \sqrt{\pi} = 1
\]
Here, we changed variables to ξ = (x − µ)/√(2σ²) and used the well known integral:

\[
I = \int_{-\infty}^{\infty} d\xi\, e^{-\xi^2} = \sqrt{\pi}
\]

which can be derived from:

\[
I^2 = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, e^{-(x^2+y^2)} = \int_0^{2\pi} d\theta \int_0^{\infty} r\,dr\, e^{-r^2} = 2\pi \times \frac{1}{2}\int_0^{\infty} d(r^2)\, e^{-r^2} = \pi
\]

by switching from Cartesian (x, y) to polar (r, θ) coordinates.

• The mean is:

\[
\begin{aligned}
\langle x\rangle = \int_{-\infty}^{\infty} dx\, x f(x) &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, (x-\mu+\mu) \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, (x-\mu) \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) + \mu \int_{-\infty}^{\infty} dx\, f(x) = \mu
\end{aligned}
\]

Using the trick x = x − µ + µ we separated the initial integral into two. The first one vanishes because it is an integral of an odd function (x − µ) × exp(···) over a symmetric interval. The second integral is simply µ multiplied by the normalization of the PDF.

• The variance is:

\[
\begin{aligned}
\mathrm{Var}(x) = \langle(x-\langle x\rangle)^2\rangle = \int_{-\infty}^{\infty} dx\, (x-\mu)^2 f(x)
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, (x-\mu)^2 \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \\
&= \frac{1}{\sigma\sqrt{2\pi}} \times (2\sigma^2)^{3/2} \int_{-\infty}^{\infty} d\xi\, \xi^2 e^{-\xi^2} \\
&= \frac{1}{\sigma\sqrt{2\pi}} \times (2\sigma^2)^{3/2} \left( \left[-\xi\,\frac{e^{-\xi^2}}{2}\right]_{-\infty}^{\infty} + \frac{1}{2}\int_{-\infty}^{\infty} d\xi\, e^{-\xi^2} \right) \\
&= \frac{1}{\sigma\sqrt{2\pi}} \times (2\sigma^2)^{3/2} \times \frac{\sqrt{\pi}}{2} = \sigma^2
\end{aligned}
\]
so that the standard deviation is σ.

Central limit theorem

• The nth moment of the probability distribution of a random variable x is defined by:

\[
\mu_n = \langle x^n\rangle
\]
The zeroth moment is always 1, the first moment is the mean of the distribution, etc.

• All moments can be obtained by taking derivatives of the generating function G(t) at t = 0:

\[
G(t) = \langle e^{tx}\rangle \quad\Rightarrow\quad \mu_n = \frac{d^n G}{dt^n}\bigg|_{t\to 0}
\]
This follows from the Taylor expansion of the exponential function:

\[
G(t) = \langle e^{tx}\rangle = \left\langle \sum_{m=0}^{\infty} \frac{(xt)^m}{m!} \right\rangle = \sum_{m=0}^{\infty} \frac{\langle x^m\rangle}{m!}\, t^m
\]

When we take the nth derivative term-by-term, using

\[
\frac{d}{dt} t^m = m t^{m-1} \;(m\ge 1), \qquad \frac{d^2}{dt^2} t^m = \frac{d}{dt}\, m t^{m-1} = m(m-1) t^{m-2} \;(m\ge 2), \quad \text{etc.}
\]

we get a sum of various non-negative powers of t:

\[
\frac{d^n G}{dt^n} = \sum_{m=0}^{\infty} \frac{\langle x^m\rangle}{m!} \times \frac{d^n t^m}{dt^n} = \sum_{m=n}^{\infty} \frac{\langle x^m\rangle}{m!} \times m(m-1)(m-2)\cdots(m-n+1)\, t^{m-n}
\]

Then, evaluating this nth derivative at t = 0 kills all terms with non-zero powers of t, leaving behind just the first term (t⁰ = 1):

\[
\frac{d^n G}{dt^n}\bigg|_{t\to 0} = \left[\frac{\langle x^n\rangle}{n!} \times n(n-1)(n-2)\cdots(n-n+1)\, t^0 + (\cdots)t^1 + (\cdots)t^2 + \cdots\right]_{t\to 0} = \frac{\langle x^n\rangle}{n!}\, n! = \mu_n
\]
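The relation µ_n = dⁿG/dtⁿ|_{t→0} can be illustrated symbolically. The sketch below (not from the original notes) builds G(t) for a simple two-valued distribution and compares its derivatives at t = 0 with the directly computed moments ⟨xⁿ⟩.

```python
import sympy as sp

t = sp.symbols('t')

# A simple discrete distribution: x = 0 with probability 1 - p, x = 1 with probability p
p = sp.Rational(1, 3)
outcomes = {0: 1 - p, 1: p}

# Generating function G(t) = <exp(t x)>
G = sum(prob * sp.exp(t * x) for x, prob in outcomes.items())

for n in range(4):
    moment_from_G = sp.limit(sp.diff(G, t, n), t, 0)                  # d^n G / dt^n at t -> 0
    moment_direct = sum(prob * x**n for x, prob in outcomes.items())  # <x^n>
    print(n, moment_from_G, moment_direct)
```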

• Cumulants κ_n of the probability distribution of a random variable x are defined by the generating function K(t):
\[
K(t) = \log\langle e^{tx}\rangle \quad\Rightarrow\quad \kappa_n = \frac{d^n K}{dt^n}\bigg|_{t\to 0}
\]
The first few cumulants are:
\[
\kappa_0 = K(0) = \log\langle e^{0\cdot x}\rangle = \log(1) = 0
\]
\[
\kappa_1 = \frac{dK}{dt}\bigg|_{t\to 0} = \frac{d}{dt}\log G(t)\bigg|_{t\to 0} = \frac{1}{G}\frac{dG}{dt}\bigg|_{t\to 0} = \frac{\mu_1}{\langle e^{0\cdot x}\rangle} = \langle x\rangle
\]
\[
\kappa_2 = \frac{d^2 K}{dt^2}\bigg|_{t\to 0} = \frac{d}{dt}\left[\frac{1}{G}\frac{dG}{dt}\right]_{t\to 0} = \frac{1}{G^2}\left[G\frac{d^2 G}{dt^2} - \left(\frac{dG}{dt}\right)^2\right]_{t\to 0} = \mu_2 - \mu_1^2 = \mathrm{Var}(x)
\]

\[
\kappa_3 = \frac{d^3 K}{dt^3}\bigg|_{t\to 0} = \frac{d}{dt}\left\{\frac{1}{G^2}\left[G\frac{d^2 G}{dt^2} - \left(\frac{dG}{dt}\right)^2\right]\right\}_{t\to 0} = \cdots = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3
\]
We see that the zeroth cumulant vanishes, the first cumulant is the mean, and the second cumulant is the variance.
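Continuing the symbolic sketch from above (again not part of the notes), the cumulants follow from derivatives of K(t) = log G(t); for the same hypothetical two-valued distribution one can verify κ₁ = µ₁, κ₂ = µ₂ − µ₁², and κ₃ = µ₃ − 3µ₂µ₁ + 2µ₁³.

```python
import sympy as sp

t = sp.symbols('t')

# Same two-valued distribution as before: x = 0 or 1, with p(1) = 1/3
p = sp.Rational(1, 3)
outcomes = {0: 1 - p, 1: p}

G = sum(prob * sp.exp(t * x) for x, prob in outcomes.items())  # moment generating function
K = sp.log(G)                                                  # cumulant generating function

mu = [sum(prob * x**n for x, prob in outcomes.items()) for n in range(4)]   # moments
kappa = [sp.simplify(sp.limit(sp.diff(K, t, n), t, 0)) for n in range(4)]   # cumulants

print(kappa[1], mu[1])                                          # kappa_1 = mean
print(kappa[2], sp.simplify(mu[2] - mu[1]**2))                  # kappa_2 = variance
print(kappa[3], sp.simplify(mu[3] - 3*mu[2]*mu[1] + 2*mu[1]**3))
```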

• Homogeneity of moments and cumulants: If µ_n^{(x)} and κ_n^{(x)} are the moments and cumulants of the random variable x respectively, then for any constant c:

\[
\mu_n^{(cx)} = c^n \mu_n^{(x)}, \qquad \kappa_n^{(cx)} = c^n \kappa_n^{(x)}
\]
– Proof: For moments:
\[
\mu_n^{(cx)} = \langle (cx)^n\rangle = c^n \langle x^n\rangle = c^n \mu_n^{(x)}
\]

– For cumulants: The cumulant κ_n^{(x)} is generated by the nth derivative of the generating function K(t) at t → 0. When expanded in terms of the generating function G(t) for moments, κ_n^{(x)} becomes a sum of various combinations of derivatives of G which always have the same total power n of dt in the denominators (see the example for κ_2 above). Therefore, after substituting t → 0 (which makes G(t) → 1), each such term becomes a product of several moments, e.g. µ_{n_1}^{(x)} µ_{n_2}^{(x)} ··· µ_{n_k}^{(x)}, but always with n_1 + n_2 + ··· + n_k = n. Since µ_n^{(cx)} = c^n µ_n^{(x)}, then:
\[
\begin{aligned}
\kappa_n^{(cx)} &= \sum C_{n_1\cdots n_k}\, \mu_{n_1}^{(cx)} \mu_{n_2}^{(cx)} \cdots \mu_{n_k}^{(cx)} \\
&= \sum c^{n_1+\cdots+n_k}\, C_{n_1\cdots n_k}\, \mu_{n_1}^{(x)} \mu_{n_2}^{(x)} \cdots \mu_{n_k}^{(x)} \\
&= c^n \sum C_{n_1\cdots n_k}\, \mu_{n_1}^{(x)} \mu_{n_2}^{(x)} \cdots \mu_{n_k}^{(x)} \\
&= c^n \kappa_n^{(x)}
\end{aligned}
\]

• If one knows all moments of a probability distribution, or alternatively all cumulants, one can reconstruct the PDF.

– Proof: If all moments ⟨x^m⟩, m ∈ {0, 1, 2, ...} are known, then one can construct an auxiliary function g(t) ≡ G(it) using the Taylor expansion of the generating function G(t):

\[
g(t) \equiv G(it) \equiv \langle e^{itx}\rangle = \sum_{m=0}^{\infty} \frac{\langle x^m\rangle}{m!}\, i^m t^m
\]
Here, i = √−1 is the imaginary unit. The function g(t) is useful because of its relation to the Fourier representation of the Dirac delta function:

\[
\int_{-\infty}^{\infty} dt\, e^{itx} = 2\pi\delta(x)
\]

where δ(x) is defined by:

\[
\delta(x) = \begin{cases} 0, & x\neq 0 \\ \infty, & x = 0 \end{cases}, \qquad \int_{-\infty}^{\infty} dx\, \delta(x) = 1
\]

The above integral representation of δ(x) can be proven by calculating a well-defined Gaussian integral D(x; α) = ∫dt exp(itx − αt²) with a finite positive α, and then showing that D(x; α) converges to 2πδ(x) in the α → 0 limit. The Dirac delta function has the following essential property for any function f(x):
\[
\int_{-\infty}^{\infty} dx\, \delta(x-x_0)\, f(x) = f(x_0)
\]

which follows directly from its definition. Applied to the PDF f(x), it allows us to calculate f(x_0) at any desired point x_0:
\[
f(x_0) = \langle \delta(x-x_0)\rangle = \left\langle \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, e^{it(x-x_0)} \right\rangle = \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, \langle e^{itx}\rangle\, e^{-itx_0} = \frac{1}{2\pi}\int_{-\infty}^{\infty} dt\, g(t)\, e^{-itx_0}
\]

if we can obtain g(t), i.e. if we know all moments. This completes the proof. Since G(t) = e^{K(t)} and K(t) is fully determined by the knowledge of all cumulants, we can similarly reconstruct the PDF from the cumulants.

• The Gaussian distribution N(µ, σ) has cumulants κ₁ = µ, κ₂ = σ², and all other cumulants equal to zero.

– Proof: First, calculate the generating function for moments:

\[
\begin{aligned}
G(t) = \langle e^{tx}\rangle &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, \exp\left(-\frac{(x-\mu)^2}{2\sigma^2} + tx\right) \\
&= \frac{1}{\sigma\sqrt{2\pi}} \times \sqrt{2\sigma^2} \int_{-\infty}^{\infty} d\xi\, e^{-\xi^2 + t(\xi\sqrt{2\sigma^2}+\mu)} = \frac{e^{\mu t}}{\sqrt{\pi}} \int_{-\infty}^{\infty} d\xi\, e^{-\xi^2 + t\xi\sqrt{2\sigma^2}} \\
&= \frac{e^{\mu t}}{\sqrt{\pi}} \int_{-\infty}^{\infty} d\xi\, e^{-\left(\xi - \frac{1}{2}t\sqrt{2\sigma^2}\right)^2 + \frac{1}{4}t^2\,2\sigma^2} \\
&= e^{\mu t} \times e^{\frac{1}{2}\sigma^2 t^2}
\end{aligned}
\]
We changed variables from x to ξ = (x − µ)/√(2σ²) in the second line. Then we completed the square in the exponent in the third line: the exponents of e^{···} are equal in the third line and the last integral of the second line, but the third line becomes a constant exp(σ²t²/2) times a pure Gaussian integral that evaluates to √π as we saw before. Now that we know G(t), we immediately know the generating function for cumulants:
\[
K(t) = \log G(t) = \mu t + \frac{1}{2}\sigma^2 t^2
\]
The derivatives of K(t) generate the cumulants and we clearly reproduce the stated result: all cumulants are zero except κ₁ = µ and κ₂ = σ².
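A symbolic sketch (not from the notes) confirming that K(t) = µt + σ²t²/2 for the Gaussian, so that all cumulants beyond the second vanish:

```python
import sympy as sp

t, x, mu = sp.symbols('t x mu', real=True)
sigma = sp.symbols('sigma', positive=True)

# Gaussian PDF and its moment generating function G(t) = <exp(t x)>
f = sp.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi))
G = sp.integrate(f * sp.exp(t * x), (x, -sp.oo, sp.oo))

# Cumulant generating function; expect mu*t + sigma**2*t**2/2
K = sp.simplify(sp.log(G))
print(K)

# Cumulants kappa_n = d^n K / dt^n at t = 0; all should vanish for n >= 3
for n in range(1, 6):
    print(n, sp.simplify(sp.diff(K, t, n).subs(t, 0)))
```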

• Central limit theorem: Let x_i ∈ (−∞, ∞) be N independent random variables from the same probability distribution with mean µ and variance σ². Let X be the random variable defined by:
\[
X = \frac{1}{\sigma\sqrt{N}} \sum_{i=1}^{N} (x_i - \mu)
\]
In the limit N → ∞, the statistics of X converges to the Gaussian distribution N(0, 1) with mean 0 and variance 1, regardless of the distribution of x_i.

– Proof: Construct the generating function for the cumulants of X:
\[
\begin{aligned}
K_X(t) = \log\langle e^{tX}\rangle &= \log\left\langle \exp\left(\frac{t}{\sigma\sqrt{N}} \sum_{i=1}^{N} (x_i-\mu)\right)\right\rangle = \log\left\langle \prod_{i=1}^{N} \exp\left(\frac{t(x_i-\mu)}{\sigma\sqrt{N}}\right)\right\rangle \\
&= \log\prod_{i=1}^{N}\left\langle \exp\left(\frac{t(x_i-\mu)}{\sigma\sqrt{N}}\right)\right\rangle = \sum_{i=1}^{N} \log\left[\left\langle \exp\left(\frac{t x_i}{\sigma\sqrt{N}}\right)\right\rangle \exp\left(-\frac{t\mu}{\sigma\sqrt{N}}\right)\right] \\
&= \sum_{i=1}^{N}\left[ K_i\!\left(\frac{t}{\sigma\sqrt{N}}\right) - \frac{t\mu}{\sigma\sqrt{N}}\right] = N\left[ K_x\!\left(\frac{t}{\sigma\sqrt{N}}\right) - \frac{t\mu}{\sigma\sqrt{N}}\right]
\end{aligned}
\]
We are able to pull the product symbol outside of the averaging brackets ⟨···⟩ at the beginning of the second line only because the random variables x_i are independent: the average of a product of independent random values equals the product of their averages. Going through to the end, we find that the cumulant generating function K_X for X equals a shifted sum of the analogous generating functions K_x for the individual random variables x_i, evaluated at the rescaled parameter t/(σ√N). Since all x_i have the same distribution, the last expression is simply proportional to N. Now we can calculate the cumulants of X (substituting τ = t/(σ√N)):
\[
\kappa_n^{(X)} = \frac{d^n K_X(t)}{dt^n}\bigg|_{t\to 0} = N\left[\frac{d^n K_x(t/\sigma\sqrt{N})}{dt^n}\bigg|_{t\to 0} - \frac{\mu\,\delta_{n,1}}{\sigma\sqrt{N}}\right] = N\left[\frac{1}{\sigma^n N^{n/2}} \frac{d^n K_x(\tau)}{d\tau^n}\bigg|_{\tau\to 0} - \frac{\mu\,\delta_{n,1}}{\sigma\sqrt{N}}\right] = N\left[\frac{1}{\sigma^n N^{n/2}}\,\kappa_n^{(x)} - \frac{\mu\,\delta_{n,1}}{\sigma\sqrt{N}}\right]
\]
The zeroth cumulant always vanishes. The first cumulant is the mean:
\[
\kappa_1^{(X)} = N\left[\frac{1}{\sigma\sqrt{N}}\,\kappa_1^{(x)} - \frac{\mu}{\sigma\sqrt{N}}\right] = 0
\]
The second cumulant is the variance:
\[
\kappa_2^{(X)} = N\left[\frac{1}{\sigma^2 N}\,\kappa_2^{(x)}\right] = 1
\]
The higher order cumulants κ_n^{(X)} become negligible when N → ∞:

\[
\kappa_n^{(X)} \propto N^{1-\frac{n}{2}} \xrightarrow{N\to\infty} 0 \qquad (n > 2)
\]
We showed earlier that the probability distribution of X is completely determined by the cumulants, and the distribution in which only the first and second cumulants are non-zero is the Gaussian. This proves the central limit theorem.
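A simulation sketch of the theorem (not part of the original notes): draw x_i from a strongly non-Gaussian distribution (here an exponential, an arbitrary choice), form X = (1/(σ√N)) Σ(x_i − µ), and check that its sample mean, variance, and third cumulant approach 0, 1, and 0 as N grows.

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma = 1.0, 1.0        # mean and standard deviation of the Exp(1) distribution
M = 20_000                  # number of independent realizations of X

for N in (1, 10, 100, 1000):
    x = rng.exponential(scale=1.0, size=(M, N))          # N variables per realization
    X = (x - mu).sum(axis=1) / (sigma * np.sqrt(N))      # centered, rescaled sum

    kappa3 = np.mean((X - X.mean()) ** 3)                # third cumulant, -> 0 as N grows
    print(f"N={N:5d}  mean={X.mean():+.4f}  var={X.var():.4f}  kappa3={kappa3:+.4f}")
```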
