Cambridge University Press 052184472X - Elements of Distribution Theory Thomas A. Severini Excerpt More information
1
Properties of Probability Distributions
1.1 Introduction Distribution theory is concerned with probability distributions of random variables, with the emphasis on the types of random variables frequently used in the theory and application of statistical methods. For instance, in a statistical estimation problem we may need to determine the probability distribution of a proposed estimator or to calculate probabilities in order to construct a confidence interval. Clearly, there is a close relationship between distribution theory and probability theory; in some sense, distribution theory consists of those aspects of probability theory that are often used in the development of statistical theory and methodology. In particular, the problem of deriving properties of probability distributions of statistics, such as the sample mean or sample standard deviation, based on assumptions on the distributions of the underlying random variables, receives much emphasis in distribution theory. In this chapter, we consider the basic properties of probability distributions. Although these concepts most likely are familiar to anyone who has studied elementary probability theory, they play such a central role in the subsequent chapters that they are presented here for completeness.
1.2 Basic Framework The starting point for probability theory and, hence, distribution theory is the concept of an experiment. The term experiment may actually refer to a physical experiment in the usual sense, but more generally we will refer to something as an experiment when it has the following properties: there is a well-defined set of possible outcomes of the experiment, each time the experiment is performed exactly one of the possible outcomes occurs, and the outcome that occurs is governed by some chance mechanism. Let denote the sample space of the experiment, the set of possible outcomes of the experiment; a subset A of is called an event. Associated with each event A is a probability P(A). Hence, P is a function defined on subsets of and taking values in the interval [0, 1]. The function P is required to have certain properties:
(P1) P( ) = 1 (P2) If A and B are disjoint subsets of , then P(A ∪ B) = P(A) + P(B).
1
© Cambridge University Press www.cambridge.org Cambridge University Press 052184472X - Elements of Distribution Theory Thomas A. Severini Excerpt More information
2 Properties of Probability Distributions
(P3) If A1, A2,...,are disjoint subsets of , then ∞ ∞ P An = P(An). n=1 n=1 Note that (P3) implies (P2); however, (P3), which is concerned with an infinite sequence of events, is of a different nature than (P2) and it is useful to consider them separately. There are a number of straightforward consequences of (P1)–(P3). For instance, P(∅) = 0, c c if A denotes the complement of A, then P(A ) = 1 − P(A), and, for A1, A2 not necessarily disjoint,
P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2).
Example 1.1 (Sampling from a finite population). Suppose that is a finite set and that, for each ω ∈ , P({ω}) = c for some constant c. Clearly, c = 1/| | where | | denotes the cardinality of . Let A denote a subset of . Then |A| A = . P( ) | | Thus, the problem of determining P(A) is essentially the problem of counting the number of elements in A and .
Example 1.2 (Bernoulli trials). Let
n ={x ∈ R : x = (x1,...,xn), x j = 0or1, j = 1,...,n}
so that an element of is a vector of ones and zeros. For ω = (x1,...,xn) ∈ , take n − P(ω) = θ x j (1 − θ)1 x j j=1 where 0 <θ<1 is a given constant.
Example 1.3 (Uniform distribution). Suppose that = (0, 1) and suppose that the prob- ability of any interval in is the length of the interval. More generally, we may take the probability of a subset A of to be P(A) = dx. A Ideally, P is defined on the set of all subsets of . Unfortunately, it is not generally possible to do so and still have properties (P1)–(P3) be satisfied. Instead P is defined only on a set F of subsets of ;ifA ⊂ is not in F, then P(A) is not defined. The sets in F are said to be measurable. The triple ( ,F, P) is called a probability space; for example, we might refer to a random variable X defined on some probability space. Clearly for such an approach to probability theory to be useful for applications, the set F must contain all subsets of of practical interest. For instance, when is a countable set, F may be taken to be the set of all subsets of . When may be taken to be a
© Cambridge University Press www.cambridge.org Cambridge University Press 052184472X - Elements of Distribution Theory Thomas A. Severini Excerpt More information
1.2 Basic Framework 3
Euclidean space Rd , F may be taken to be the set of all subsets of Rd formed by starting with a countable set of rectangles in Rd and then performing a countable number of set operations such as intersections and unions. The same approach works when is a subset of a Euclidean space. The study of theses issues forms the branch of mathematics known as measure theory. In this book, we avoid such issues and implicitly assume that any event of interest is measurable. Note that condition (P3), which deals with an infinite number of events, is of a different nature than conditions (P1) and (P2). This condition is often referred to as countable additiv- ity of a probability function. However, it is best understood as a type of continuity condition on P. It is easier to see the connection between (P3) and continuity if it is expressed in terms of one of two equivalent conditions. Consider the following:
(P4) If A1, A2,...,are subsets of satisfying A1 ⊂ A2 ⊂···, then ∞ P An = lim P(An) n→∞ n=1
(P5) If A1, A2,...,are subsets of satisfying A1 ⊃ A2 ⊃···, then ∞ P An = lim P(An). n→∞ n=1
Suppose that, as in (P4), A1, A2,...is a sequence of increasing subsets of . Then we may take the limit of this sequence to be the union of the An; that is, ∞ lim An = An. n→∞ n=1 Condition (P4) may then be written as P lim An = lim P(An). n→∞ n→∞ A similar interpretation applies to (P5). Thus, (P4) and (P5) may be viewed as continuity conditions on P. The equivalence of (P3), (P4), and (P5) is established in the following theorem.
Theorem 1.1. Consider an experiment with sample space . Let P denote a function defined on subsets of such that conditions (P1) and (P2) are satisfied. Then conditions (P3), (P4), and (P5) are equivalent in the sense that if any one of these conditions holds, the other two hold as well.
Proof. First note that if A1, A2,... is an increasing sequence of subsets of , then c, c,... = , ,..., A1 A2 is a decreasing sequence of subsets and, since, for each k 1 2 k c k = c , An An n=1 n=1