
Outcomes, Countability, Measures and Probability
OPRE 7310 Lecture Notes by Metin Çakanyıldırım
Compiled at 02:01 on Thursday 20th August, 2020

1 Why Probability?

We have limited information about experiments, so we cannot know their outcomes with certainty. More information can be collected, if doing so is profitable, to reduce uncertainty. But some amount of uncertainty always remains, as information collection is costly and might even be impossible or inaccurate. So we are often bound to work with probability models.
Example: Instead of forecasting the number of probability books sold at the UTD bookstore tomorrow, let us ask everybody (students, faculty, staff, residents of Plano and Richardson) if they plan to buy a book tomorrow. Surveying potential customers in this manner is always possible. But surveys are costly and inaccurate.

Before setting up probability models, we observe experiments and their outcomes in real life to get a sense of what is likely to happen. Drawing inferences from observations is the field of Statistics. These inferences about the likelihood of outcomes become the ingredients of probability models that are designed to mimic the real-life experiments. Probability models can later be used to make decisions to manage the real-life contexts.
Example: Some real-life experiments that are worthy of probability models are subatomic particle collisions, genetic breeding, weather forecasting, financial securities, queues.

2 An Event – A Collection of the Outcomes of an Experiment

The outcome of an experiment may be uncertain before the experiment happens. That is, the outcome may not be determined with (sufficient) certainty ex ante. Here the word experiment has a broad meaning that covers more than laboratory experiments or on-site experiments. It covers any action or activity whose outcomes are of interest. This broader meaning is illustrated with the next example.
Example: As an experiment, we can consider an update of Generally Accepted Accounting Principles (GAAP) issued by the Financial Accounting Standards Board (FASB.org). Suppose that the board is investigating an update of reporting requirements for startups (more formally, Development Stage Entities). The board can decide to keep (K) the status quo, increase (I) the reporting requirements or decrease (D) them. Although accounting professionals can assess the likelihood of each of the outcomes K, I and D, they cannot be certain whether the board's discussion will lead to K, I or D, so the outcomes of the update experiment are uncertain.

Sufficiency of certainty depends on the intended use of the associated probability model. A room thermostat may be assumed to show the room temperature with sufficient certainty for the purpose of controlling the temperature with an air conditioner. The same thermostat may have insufficient certainty for controlling the speed of a heat-releasing chemical reaction. When the uncertainty is deemed to be significant, it can be reduced, say by employing a more accurate thermostat. Or a probabilistic model can be designed to incorporate the uncertainty, say by controlling the average speed of the reaction.

Example: Outcomes of a dice rolling experiment are 1, 2, 3, 4, 5 and 6. For a fair dice, each outcome is (sufficiently) uncertain.

Each outcome of an experiment can be denoted generically by ω or indexed as ωi for specificity. Often these outcomes are minimal outcomes that cannot be, or are preferred not to be, separated into several other outcomes. Such minimal outcomes can be called elementary outcomes. Two elementary outcomes cannot occur at once, so elementary outcomes are mutually exclusive. Then elementary outcomes can be collected to obtain a set of outcomes Ω, which is generally called the sample space.

Example: For the experiment of updating the accounting principles, ω1 = K, ω2 = I, ω3 = D and Ω = {K} ∪ {I} ∪ {D} = {K, I, D}. 

Example: For the experiment of rolling a dice, ωi = i for i = 1, . . . , 6 and Ω = {1, 2, 3, 4, 5, 6}. 

An event is a collection of the outcomes of an experiment. So an event is a subset of the sample space, i.e., a non-empty event A has ω ∈ A ⊆ Ω for some ω. An event can also be empty, denoted by A = ∅. Although we are not interested in impossible events in practice, the consideration of ∅ is useful for the theoretical construction of probability models.
Example: For the experiment of updating the accounting principles, the event of not increasing the reporting requirements can be denoted by {K, D}.
Example: For the experiment of rolling a dice, the event of an even outcome is {2, 4, 6} and the event of no outcome is ∅.
Example: Consider the collision of two hydrogen atoms on a plane. One of the atoms is stationary at the origin and is hit by another moving from left to right. After the collision, the atom moving from left to right can move into the 1st, 2nd, 3rd or 4th quadrant. The sample space for the movement of this atom is {1, 2, 3, 4} and the event that it bounces back (moves from right to left after the collision) is {2, 3}.

Since an event corresponds to a set in the sample space, we can apply set operations on events. In particular, for two events A and B, we can speak of their intersection, union and set difference. If the intersection of two events is empty, they are called disjoint: if A ∩ B = ∅, then A and B are disjoint.
Example: In an experiment of rolling a dice twice, we can consider the sum and the multiplication of the numbers in the first and second rolls. Let A be the event that the multiplication is odd; B be the event that the sum is odd; C be the event that both the multiplication and the sum are odd; D be the event that both the multiplication and the sum are even. The outcomes in A have both the first and the second number odd, while the outcomes in B have one odd number and one even number. Hence no outcome can be in both events A and B, which turn out to be disjoint events: A ∩ B = ∅ = C. To be in D, an outcome must have both numbers even. In each outcome, either both numbers are odd (so the multiplication is odd and the sum is even =⇒ A); or one is odd while the other is even (so the multiplication is even and the sum is odd =⇒ B); or both numbers are even (so the multiplication and the sum are even =⇒ D). Hence, A ∪ B ∪ D = Ω. To convince yourself further, you can enumerate each outcome and see whether it falls in A, B or D by completing Table 1.
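The claims about the two-roll events can also be verified by brute-force enumeration. The following sketch (variable names are ours, not from the notes) builds the sample space and the events directly:

```python
# Brute-force check of the two-dice events: A = multiplication odd,
# B = sum odd, C = both odd, D = both even.
from itertools import product

omega = set(product(range(1, 7), repeat=2))          # sample space: 36 pairs (i, j)
A = {(i, j) for i, j in omega if (i * j) % 2 == 1}   # multiplication odd
B = {(i, j) for i, j in omega if (i + j) % 2 == 1}   # sum odd
C = {(i, j) for i, j in omega if (i * j) % 2 == 1 and (i + j) % 2 == 1}
D = {(i, j) for i, j in omega if (i * j) % 2 == 0 and (i + j) % 2 == 0}

assert A & B == set()        # A and B are disjoint
assert C == set()            # an odd multiplication forces an even sum, so C is empty
assert A | B | D == omega    # the three events cover the sample space
```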

Some experiments are truly physical but their outcome is treated as random in probability contexts. Good examples are rolling a die and flipping a coin. If we put a coin head up on our thumb and throw it up with no spin, it comes down head up. Then there is no flip and no randomness. Some bakery cooks flatten the bread dough and throw it in the air without any flips1. Similarly, good players can consistently

1Indian cook throws dough for six metres https://www.youtube.com/watch?v=VhoiBIdz dU

throw a frisbee or a football without a flip to facilitate a catch. We readily accept the absence of randomness or a flip in the throws of flattened dough, frisbee and football, but we build numerous examples on randomness in the throws of a coin. Perhaps there is less randomness in coin flips than we are used to.

Table 1: Sample space Ω for two dice rolls is composed of pairs (i, j) for i, j = 1 . . . 6. Shown in each cell below are (multiplication = ij, sum = i + j) and the associated event.

                                     Second Roll
              1           2           3           4            5            6
First   1   (1,2), A    (2,3), B    (3,4), A    (4,5), B     (5,6), A     (6,7), B
Roll    2   (2,3), B    (4,4), D    (6,5), B    (8,6), D     (10,7), B    (12,8), D
        3   (3,4),      (6,5),      (9,6),      (12,7),      (15,8),      (18,9),
        4   (4,5),      (8,6),      (12,7),     (16,8),      (20,9),      (24,10),
        5
        6

Example: Suppose a coin is flipped with a vertical speed of v| and stays in the air for tair = 2v|/g seconds, where g is the gravitational acceleration constant. The same coin has the angular speed v◦ given in terms of revolutions per second. Then the coin makes v◦tair revolutions in the air. If the coin has head up and is thrown up to make 1 revolution in the air, it comes down as head up. If it makes (1/4, 3/4) revolutions in the air, it comes down as tail. In general, it comes down tail after (k + 1/4, k + 3/4) revolutions for k = 0, 1, 2, . . . . Said differently, v◦tair ∈ (k + 1/4, k + 3/4) for k = 0, 1, 2, . . . guarantees an outcome of tail. To start with head up and end with tail up (H → T), we need to relate the vertical and angular speeds to each other:

H → T if v◦ (v|/g) ∈ ( (1/2)(k + 1/4), (1/2)(k + 3/4) ) for k = 0, 1, 2, . . . .

Figure 1 shows the regions of speed that maintain head (H → H) or switch head to tail (H → T). These regions are separated by hyperbolas of the form v◦v| = constant. From our experience, we can say that a coin stays in the air for about half a second, so v|/g ≈ 0.25 and the left-hand side of Figure 1 is more relevant in practical applications than its right-hand side. If we fix v|/g = 1/4,

H → T if v◦ ∈ (2k + 1/2, 2k + 3/2) for k = 0, 1, 2, . . . .

By inspection, we can say a coin makes about k = 20 revolutions in each throw, which requires v◦ ∈ (40.5, 41.5) revolutions per second for H → T. A slightly lower angular speed in (39.5, 40.5) or a higher speed in (41.5, 42.5) yields H → H. To control the outcome of the coin toss, one has to train one's thumb to flip coins with consistent angular speeds, as an error of about 2% in the speed yields the opposite outcome. If the speeds v◦ and v| can be adjusted by a well-calibrated thumb (or a machine under some ideal conditions as in Diaconis et al. 2007), the outcome of the coin flip experiment is not random but controllable (manipulatable).
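The arithmetic above can be sketched as code. This is only an illustration with v|/g fixed at 1/4 (airborne time of half a second); the helper name `lands_tail` is ours:

```python
# With v|/g = 1/4, the coin is airborne for t_air = 2 * (1/4) = 0.5 seconds.
def lands_tail(v_rev):          # v_rev: angular speed in revolutions per second
    revolutions = v_rev * 0.5   # revolutions made while airborne
    frac = revolutions % 1.0    # fractional part of the revolutions
    return 0.25 < frac < 0.75   # tail iff the fraction lies in (1/4, 3/4)

assert lands_tail(41.0)         # 41 rev/s -> 20.5 revolutions -> tail
assert not lands_tail(40.0)     # 40 rev/s -> 20 revolutions -> head stays head
assert not lands_tail(42.0)     # 21 revolutions -> head
```

A 1 rev/s change around 41 rev/s (roughly a 2% error) flips the outcome, matching the sensitivity claim above.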

3 Counting Countable Outcomes

When the outcomes are countable, their number can be finite or infinite.

3.1 Finite Outcomes

3.1.1 Multiplication Principle

We can manually count the outcomes of an experiment if the outcomes are finite. In the experiment of updating the accounting principles, the experiment of rolling a dice and the experiment of rolling a dice twice,

Figure 1: H → T if g/8 ≤ v◦v| ≤ 3g/8 or 5g/8 ≤ v◦v| ≤ 7g/8.

the numbers of outcomes are respectively 3, 6 and 36. These numbers are found by manually counting the outcomes. Sometimes, instead of manually counting, we use the multiplication principle of counting illustrated in the next example.
Example: An online dress retailer carries 3 styles of lady dresses: Night dress, Corporate dress and Sporty dress. Each style has 20 cuts, 8 sizes and 5 different colors. A stock keeping unit (sku) for the online retailer is defined by the dress style, cut, size and color, as these four characteristics fully describe the dress item and are used to buy dresses from the suppliers. The number of skus for this retailer is 3 × 20 × 8 × 5.

When the outcome of an experiment is defined by K characteristics that are independent of each other, we can use the multiplication principle. We start by enumerating the number of ways characteristic k can materialize and denote it by nk. Then the number of outcomes is n1 n2 . . . nK. For the example of the online retailer above, nstyle = 3, ncut = 20, nsize = 8, ncolor = 5 for the set of characteristics {style, cut, size, color}. Another way to denote these is to set 1 := style, 2 := cut, 3 := size, 4 := color, so K = 4 and n1 = 3, n2 = 20, n3 = 8, n4 = 5. If the characteristics are not all independent of each other, we can still use the multiplication principle with some adjustments.
Example: After a market research study, the online dress retailer decides to customize its offerings. It offers 22 cuts of Night dresses, 18 cuts of Corporate dresses and 34 cuts of Sporty dresses. Night dresses need to fit more closely, so they have 10 sizes, while Corporate and Sporty dresses have respectively 8 and 6 sizes. The number of skus becomes (22 × 10 + 18 × 8 + 34 × 6) × 5. In this case, the color is independent of the other characteristics. Within each style, the cut and the size characteristics are independent.
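Both sku counts above can be transcribed directly into code; nothing here is assumed beyond the numbers in the two examples:

```python
# Multiplication principle for the online retailer's skus.
n_colors = 5

# Independent characteristics: 3 styles x 20 cuts x 8 sizes x 5 colors.
skus_independent = 3 * 20 * 8 * n_colors
assert skus_independent == 2400

# Cut and size depend on style; color stays independent of the rest.
skus_customized = (22 * 10 + 18 * 8 + 34 * 6) * n_colors
assert skus_customized == (220 + 144 + 204) * 5   # = 568 * 5 = 2840
```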

3.1.2 Permutations

There are other ways of counting the outcomes of experiments. Counting permutations is one of them.
Example: The online retailer intends to show each dress in 5 different colors side by side on its web site so that customers can easily compare the colors and buy the one(s) they like. The primary five colors are White (W), Black (B), Blue (L), Red (R) and Yellow (Y). The intention is that the customer picks the dress in each color and brings it into a cell in a 5 × 1 table on the screen and compares the colors. To make this process

efficient, the online retailer asks the web designer to restrict customers so that they can pick each color exactly once. Some example outcomes are [W, B, L, R, Y], [B, L, R, Y, W], [L, R, Y, W, B], [R, Y, W, B, L], [Y, W, B, L, R]. The number of ways 5 colors can be put in the 5 × 1 table without repeating the colors is the number of permutations of the colors. There are 5 color choices for the first cell, 4 color choices for the second, 3 choices for the third, 2 choices for the fourth and only 1 choice for the last. Using the principle of multiplication, 5 colors can have 5 × 4 × 3 × 2 × 1 permutations.
In general, n distinct objects can have n! := n × (n − 1) × · · · × 2 × 1 permutations of length n. If the permutation length is k ≤ n, then the number of such permutations is P^n_k := n × (n − 1) × · · · × (n − k + 1), which is a multiplication of exactly k terms. Said differently, P^n_k is the number of permutations of k objects out of n distinct objects. P^n_k is referred to as k-permutations-of-n. As in the online retailer's color example, sometimes objects are virtual and can be repeated (picked up) as many times as necessary. This is often referred to as sampling with repetition.
Example: Despite the online retailer's specification, the web designer cannot restrict the customers to pick a color only once. For example, W can be picked for both the first and the second cells in the 5 × 1 table. Then the colors are sampled by the customers with repetition and we cannot speak of permutations. Rather, we can ask the number of ways 5 colors can be put in the table with repetition. There are 5 color choices for the first cell, 5 color choices for the second, 5 choices for the third, 5 choices for the fourth and 5 choices for the last. Once more using the principle of multiplication, 5 colors can be placed in 5 cells with repetition in 5 × 5 × 5 × 5 × 5 ways.

In general, n distinct objects can be put in k boxes with repetition in n^k ways. Repetitions increase the number of ways objects can be organized: n^k > P^n_k.
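The two counts can be compared with Python's standard library, which provides P^n_k as math.perm (Python 3.8+):

```python
# Permutations without repetition (P^n_k) vs. strings with repetition (n^k).
import math

n, k = 5, 5
assert math.perm(n, k) == 5 * 4 * 3 * 2 * 1 == 120   # permutations of all 5 colors
assert n ** k == 3125                                # 5 colors in 5 cells with repetition
assert n ** k > math.perm(n, k)                      # repetition allows more arrangements
```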

3.1.3 Combinations

A key question in counting the outcomes is whether the sequence of objects in an outcome makes it different from another outcome including exactly the same objects. Consider [W, B, L, R, Y] vs. [B, L, R, Y, W] of the colors; we considered these two as different above and wrote them as vectors by using square brackets. If the comparison of colors depends on the sequence of colors, the sequence matters and customers perceive [W, B, L, R, Y] as different from [B, L, R, Y, W]. If the sequence does not matter, both [W, B, L, R, Y] and [B, L, R, Y, W] have the same colors and can be mapped to the set {W, B, L, R, Y}. Such a mapping is many-to-one because many sequences boil down to the same set of objects.
Example: Suppose that the web designer has created a 3 × 1 table and can restrict customers to put at most one color in each cell. If the sequence matters and sampling is without repetition, the number of ways of placing 5 colors in 3 cells is P^5_3 = 5 × 4 × 3 = 60. Now consider colors B, R, Y and the sequences that can be generated only from these three colors: [B, R, Y], [B, Y, R], [R, B, Y], [R, Y, B], [Y, B, R], [Y, R, B]. It is easy to see that 3 colors can make 6 = 3! sequences, so the mapping from the set of sequences to the set of items is (3!)-to-(1). In other words, when we start treating different sequences with the same items as the same sets, the number of outcomes based on sequences should drop by a factor of 6 to obtain the number of outcomes based on sets. If the sequence does not matter and sampling is without repetition, the number of ways of placing 5 colors in 3 cells is P^5_3/3! = 5 × 4 × 3/6 = 10.

From the above, the number of sequences of length k that can be made without repetition from n items is P^n_k. When we consider different sequences with the same items as the same set, the number of sets becomes one k!th of the number of sequences. Hence, the number of subsets including exactly k items out of n distinct items is C^n_k := P^n_k/k!. C^n_k is referred to as k-choose-from-n. Picking subsets from sets is called making combinations.
Example: In what is called a combination lock (often with 4 digits), there are several concentric dials, each with digits {0, 1, 2, . . . , 9}. The lock unlocks when all dials show previously chosen digits in the correct order.

These previously chosen digits and their order act like a password for the lock. For the lock, the sequence of digits matters, e.g., 1234 is different from 4321. So this sort of lock should be called a permutation lock as opposed to a combination lock.

Example: The OM area has 20 Ph.D. and 200 Master of Science students. The faculty considers inviting 2 Ph.D. and 4 master students to a curriculum meeting. How many ways are there to choose 2 Ph.D. and 4 master students? There are C^20_2 ways to choose the Ph.D. students and C^200_4 ways to choose the master students, so the number of ways is C^20_2 C^200_4.
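The curriculum-meeting count can be computed with math.comb (Python 3.8+), which implements C^n_k:

```python
# C^20_2 * C^200_4 ways to invite 2 of 20 Ph.D. and 4 of 200 master students.
import math

ways_phd = math.comb(20, 2)     # choose 2 of 20 Ph.D. students
ways_ms = math.comb(200, 4)     # choose 4 of 200 master students
assert ways_phd == 190
assert ways_ms == 64684950
total = ways_phd * ways_ms      # multiplication principle across the two groups
```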

The number of combinations of k objects taken out of n objects is C^n_k. When we are picking k objects, we are making up two subsets - objects picked and objects unpicked. What happens if we are to make up r subsets out of n objects such that subset i has ki objects and ∑_{i=1}^r ki = n? The number of ways r subsets can be made up is C^n_{k1,k2,...,kr} = n!/(k1! k2! . . . kr!).
Example: 11 Ph.D. students are to be assigned to 4 professors: Ganesh, Shun-Chen, Anyan and Metin, so that Ganesh and Anyan have 4 students each and Shun-Chen has 2 students while Metin has 1 student. How many assignments are possible? We are splitting the students into 4 subsets with nG = nA = 4, nS = 2 and nM = 1, while nG + nA + nS + nM = 11. The number of ways is 11!/(4!4!2!1!).
Example: How many distinct permutations can be obtained from the letters of Mississippi? Mississippi has 11 letters: 4 Is, 4 Ss, 2 Ps and 1 M, so the number of permutations is 11!/(4!4!2!1!).
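The multinomial coefficient 11!/(4!4!2!1!) is easy to evaluate, and the same idea can be spot-checked by brute force on a small word (brute-forcing all 11 letters of Mississippi would be too slow):

```python
# Multinomial coefficient for splitting 11 students into groups of 4, 4, 2, 1,
# which also counts the distinct rearrangements of the letters of Mississippi.
import math
from itertools import permutations

multinomial = math.factorial(11) // (
    math.factorial(4) * math.factorial(4) * math.factorial(2) * math.factorial(1)
)
assert multinomial == 34650

# Sanity check of the same idea on a small word: "miss" has 4!/2! = 12
# distinct rearrangements.
assert len(set(permutations("miss"))) == 12
```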

Given N distinct and ordered objects, a (permutation) cycle is a collection of objects that replace each other in the order. We use (·) to list the objects in a cycle. With the 2 numbers 1 and 2, the permutation 12 has no cycles whereas the permutation 21 has the cycle (21). Writing this cycle as 1 ← 2 ← 1, we observe that the permutation becomes 21 with 2 taking position 1 and 1 taking position 2. Writing this cycle as 2 ← 1 ← 2, we observe that the permutation becomes 21 with 1 taking position 2 and 2 taking position 1. Clearly, cycles (21) and (12) are the same.
Example: With the 3 numbers 1, 2 and 3, there are 3! permutations. Permutation 123 has no cycles. Permutation 132 has the cycle (23). Permutation 213 has the cycle (12). Permutation 321 has the cycle (13): writing the cycle as 1 ← 3 ← 1, we observe that the permutation becomes 321 with 3 taking position 1 and 1 taking position 3. Permutation 231 has the cycle (123): writing the cycle as 1 ← 2 ← 3, 2 ← 3 ← 1 or 3 ← 1 ← 2, we observe that the permutation becomes 231 with 2 taking position 1, 3 taking position 2 and 1 taking position 3. Note that 1 ← 2 ← 3, 2 ← 3 ← 1 and 3 ← 1 ← 2 seem to be different but actually lead to the same permutation, so they are the same cycle; rotation of the objects in a cycle does not lead to another cycle. Permutation 312 has the cycle (132): writing the cycle as 1 ← 3 ← 2, we observe that the permutation becomes 312 with 3 taking position 1, 2 taking position 3 and 1 taking position 2.
Example: Starting with N objects, how many k-cycles (cycles of length k) can be constructed for 1 ≤ k ≤ N? First we pick the k objects to use in the cycle in C^N_k ways. Any permutation of these k objects is a distinct cycle provided that it cannot be obtained by rotating another cycle. The number of distinct k-cycles that can be made from k given objects is (k − 1)!. So C^N_k (k − 1)! is the number of k-cycles with N objects.
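The count C^N_k (k − 1)! can be verified by brute force for small N. The sketch below treats a k-cycle as a k-subset plus a cyclic order, and identifies rotations of a cycle with each other (as in the notes, (21) and (12) are the same cycle); the helper name is ours:

```python
# Brute-force count of distinct k-cycles among N ordered objects.
import math
from itertools import combinations, permutations

def count_k_cycles(N, k):
    cycles = set()
    for subset in combinations(range(N), k):
        for perm in permutations(subset):
            # canonical form: rotate the cycle so its smallest element is first,
            # so all rotations of one cycle collapse to a single representative
            i = perm.index(min(perm))
            cycles.add(perm[i:] + perm[:i])
    return len(cycles)

for N in range(1, 6):
    for k in range(1, N + 1):
        assert count_k_cycles(N, k) == math.comb(N, k) * math.factorial(k - 1)
```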

While discussing combinations above, we referred to the number of subsets, whose elements cannot repeat. So the combination discussion above pertains only to sampling without repetition. If repetition is allowed in a collection, the collection is called a multiset. Each set is a multiset, so the multiset notion is a generalization of the set notion. In a multiset, the total number of elements, including repetitions, is the cardinality of the multiset, and the number of times an element appears is the multiplicity of that element.
Example: Suppose that we are to pick from the colors B, R, Y to create multisets of cardinality 3. ⟨B, R, Y⟩ is the unique such multiset whose elements do not repeat, so {B, R, Y} is also a set. As repetition is allowed in a multiset, some

elements can be used twice or thrice while the others are not used at all. If Y is not used, we can still construct the multisets ⟨B, R, R⟩, ⟨B, B, R⟩, ⟨B, B, B⟩ and ⟨R, R, R⟩. If Y must be used, we can construct some other multisets: ⟨B, Y, Y⟩, ⟨R, Y, Y⟩, ⟨Y, R, R⟩, ⟨Y, B, B⟩ and ⟨Y, Y, Y⟩. The multiplicity of B in ⟨B, R, R⟩ is xB = 1, while the multiplicity of B in ⟨B, B, R⟩ is xB = 2.

Using n distinct elements, how many multisets with cardinality k can we construct? Each multiset is uniquely identified by its multiplicities {x1, x2, . . . , xn}. Since we set the cardinality equal to k, we need to insist on x1 + x2 + ··· + xn = k. Also, each multiplicity must be a natural number (non-negative integer), i.e., xi ∈ N for i = 1 . . . n. The number of multisets with cardinality k is the number of solutions to

X := {x1 + x2 + ··· + xn = k, xi ∈ N for i = 1 . . . n}.

To find the number of solutions to X , we first consider a seemingly different problem. Suppose that we have n + k − 1 objects denoted by “+”.

 +      +      +    ......       +             +
1st    2nd    3rd   ......   n + k − 2nd   n + k − 1st

We encircle n − 1 of these + objects to obtain exactly n segments made up of some or no + objects. If the (j − 1)st and jth encircled +s are next to each other, then the jth segment has no + and xj = 0. In general, xj is the number of +s in the jth segment.

 + ... +        ⊕         + ... +        ⊕       ......        ⊕         + ... +
1st ... x1th  1st circle  1st ... x2nd  2nd circle  ......  (n − 1)st circle  1st ... xnth,

where the indexing of the + objects restarts from 1 after each encircled +. By using n − 1 circles, we have obtained n segments; each segment j has xj elements, and the sum of the xj's must be n + k − 1 minus n − 1, as we start with n + k − 1 objects and encircle n − 1 of them. Hence, x1 + x2 + ··· + xn = k. Each solution to X has a corresponding way of encircling +s, and vice versa. So the number of solutions to X is the number of ways we can encircle n − 1 objects out of n + k − 1 objects, which is C^{n+k−1}_{n−1} = C^{n+k−1}_k.
Example: How many multisets with cardinality k = 3 can be assembled from the n = 3 colors B, R, Y? Plugging in the numbers, the answer turns out to be C^5_3 = 10. For this small problem, we can list all of these multisets: ⟨B, R, Y⟩, ⟨B, R, R⟩, ⟨B, B, R⟩, ⟨B, B, B⟩, ⟨R, R, R⟩, ⟨B, Y, Y⟩, ⟨R, Y, Y⟩, ⟨Y, R, R⟩, ⟨Y, B, B⟩ and ⟨Y, Y, Y⟩. If we add the colors bLue and White, we have n = 5 colors to create more multisets with cardinality k = 3. The number of such multisets is C^7_3 = 35.
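The stars-and-bars count C^{n+k−1}_k can be checked against direct enumeration, since itertools.combinations_with_replacement generates exactly the multisets of cardinality k from n elements:

```python
# Multiset counts by enumeration vs. the stars-and-bars formula C^{n+k-1}_k.
import math
from itertools import combinations_with_replacement

def count_multisets(n, k):
    return len(list(combinations_with_replacement(range(n), k)))

assert count_multisets(3, 3) == math.comb(3 + 3 - 1, 3) == 10   # colors B, R, Y
assert count_multisets(5, 3) == math.comb(5 + 3 - 1, 3) == 35   # five colors
```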

The next table summarizes our discussion on finite outcomes.

Table 2: Number of ways of constructing cardinality-k permutations, strings, sets or multisets from n distinct objects.

                             Sequence matters?
                       Yes                     No
Repeat?   No     P^n_k permutations      C^n_k sets
          Yes    n^k strings             C^{n+k−1}_k multisets

3.2 Infinite Outcomes

Infinite outcomes can be generated from an experiment that can potentially be repeated infinitely many times. Therefore, the experiment itself should be repeatable.

Example: Throwing a coin, rolling a dice and calling a call center are experiments that can be repeated. Each time they are performed, they generate outcomes: {Head, Tail} for throwing a coin, {1, 2, 3, 4, 5, 6} for rolling a dice, {Busy, Available} for calling a call center. If we perform these experiments independently m times, the sample spaces become {H, T}^m, {1, 2, 3, 4, 5, 6}^m, {B, A}^m. Here the superscript m denotes the Cartesian product applied m times, e.g., {H, T}^2 := {H, T} × {H, T}.

(Nearly) infinite outcomes need to be considered when we repeat an experiment until something (un)desirable happens. If we are waiting for heads in a coin tossing experiment and recording the outcomes, we can see arbitrarily long sequences TT . . . TTH. We can throw two dice simultaneously and wait until their sum turns out to be 7; then again arbitrarily long sequences of sums can be observed: 6, 8, 12, 9, 9, 4, 3, . . . , 8, 10, 8, 11, 2, 7. Or a hacker can attempt to find a password of length 4 made out of the digits {0, 1, . . . , 9} and keep attempting different 4-permutations of these 10 digits: 2012, 7634, 1803, etc. The sample space for the k-long permutations (passwords) made with n objects has n^k elements. If the hacker is attempting random permutations to find the password, he may have to do this infinitely many times. If he is enumerating all the permutations and testing each one by one, he needs to do this only 10,000 times.
A set is countable if each of its elements can be associated with a single natural number. The sample space for the experiment of waiting for heads has infinitely many elements but it is countable. This sample space has H, TH, TTH, TTTH, and so on. Associating the number of T's in an outcome with exactly the same natural number (including 0), we can see that the sample space is countable. The sample space for rolling two dice until obtaining the sum of 7 is also countable. The appendix has more on the countability of sets and shows that rational numbers are countable while real numbers are not.
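The hacker example can be sketched as code. The target password below is a made-up stand-in (we reuse "7634" from the attempts listed above); the point is only that exhaustive enumeration is guaranteed to succeed within 10^4 tries:

```python
# Exhaustive search over all length-4 digit strings finds any password
# in at most 10**4 attempts.
target = "7634"                      # hypothetical password, for illustration only
attempts = 0
for guess in range(10 ** 4):
    attempts += 1
    if f"{guess:04d}" == target:     # zero-padded 4-digit string, "0000" first
        break
assert attempts <= 10 ** 4           # at most 10,000 tries are ever needed
```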

Example: In the experiment of throwing a coin m times, let xi = 1 if the ith throw turns out to be head; otherwise, xi = 0. After the mth experiment, we can compute the frequency of heads X(m) := ∑_{i=1}^m xi/m. We have 0 ≤ X(m) ≤ 1. It is easy to see that the sample space for the frequency of heads is ΩX(m) := {0 = 0/m, 1/m, 2/m, . . . , m/m = 1}. ΩX(m) is a countable set, as any subset of the rational numbers is countable. Can we then say that the sample space for this experiment is the interval [0, 1]? Asked differently, as we increase m, does ΩX(m) contain every element of [0, 1]? Note that [0, 1] is an interval over the real numbers, which include both rational and irrational numbers, so this interval is not countable. If the answer were yes, we could take an irrational number, say √2/2 ∈ [0, 1], and this number would have to be in ΩX(m). But the elements of ΩX(m) are only rational numbers. Hence, the answer is no: the sample space ΩX(m) does not become [0, 1] for any m.

The last example shows that repeating an experiment many times does not make its sample space uncountable. On the other hand, a single experiment without any repetition can yield an uncountable sample space. Because of this, the case of an uncountable sample space deserves a separate discussion.

4 Uncountable Outcomes

Outcomes that take values over a continuum are uncountable. Formally speaking, such outcomes are in an interval of the real numbers ℝ. This brings up a philosophical question: what quantity in nature, if any, takes truly continuous values? That is, is there a quantity which must be measured in continuous amounts? Many attempts to find such a quantity turn out to be futile once we consider enough details.
Example: The amount of oxygen molecules in a room can be said to be a certain number of liters. This number can be reported by an environmental engineer as if it is continuous, hence taking values in ℝ. But a chemist may attempt to count the number of oxygen molecules and report only a natural number from N. The amount of energy obtained from splitting a radioactive isotope can also be reported to be in ℝ by a nuclear reactor operator while it can also be argued to be in N by a quantum physicist. You can continue this exercise and see if you can find an amount that requires continuous measurements. You can consider

the number of shipments made by Amazon, the ratio of shipments made to Texas; the number of patients arriving at a hospital, the ratio of underage patients arriving at that hospital, etc.

It appears that nature has quantities that can be measured by rational numbers rather than real numbers, i.e., nature does not require continuous measures. The next question is whether we create continuous measures in the basic sciences or social sciences. One of the social sciences that deals with the measurement and reporting of activities is accounting. Accounting does not seem to create amounts that require continuous measures.
Example: The monetary values reported by accounting systems are numbers with at most two decimal digits, so these numbers are rational. Accounting systems also compute Key Performance Indicators (KPIs) by taking a ratio of two rational numbers. For example, the Return on Investment (ROI) of an investment is the annual return made by the investment divided by the amount of the investment. Since both the numerator and the denominator are rational in these KPI computations, the ratio is also rational.

We can also consider prices from the standpoint of Finance, demand from the standpoint of Marketing, and personnel characteristics from the standpoint of Organizational Behavior, and conclude that we can use only natural or rational numbers in our analyses. However, you should also realize that many models in these disciplines are based on variables that take continuous values from an interval of real numbers. It appears that when we switch from observing what is happening to the analysis of what will happen, we tend to use continuous values. The reason behind this can be speculated to be the ease of analysis. We can take the point of view that continuous values are invented by the analysts for the purpose of ease of analysis.
Example: Time is one of the oldest inventions of humankind and is often considered to take continuous values. This is the reason why time is an uncountable noun in English when it refers to an amount, as in the sentence: "I spend too much time to understand the difference between countable and uncountable outcomes". It can be mathematically more convenient to build models that take time as a continuous value.

When an outcome (a variable) takes countable values, we call it a discrete variable; otherwise, it is a continuous variable. Note that discrete outcomes can be finite or infinite; the distinction between discrete and continuous is based on countability. Discrete variables can approximate continuous variables fairly well. For example, every real number can be approximated by a rational number at any desired accuracy. This is known as the fact that the rational numbers are everywhere dense in the real numbers.

Example: Is ΩX(m) := {0/m, 1/m, 2/m, . . . , m/m} dense in the rational numbers in the interval [0, 1]? For a desired accuracy level e and an arbitrary rational q ∈ [0, 1], we can fix m such that min_{ω∈ΩX(m)} |ω − q| ≤ e. As a matter of fact, m = 1/e suffices for every rational q. Therefore, ΩX(m) is dense in the rational numbers in the interval [0, 1].
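A numerical sketch of this density claim, checking rationals q = a/b on a small grid of denominators (we round 1/e up to an integer m, which only tightens the approximation):

```python
# For m = ceil(1/e), every rational q in [0, 1] lies within e of some
# grid point i/m in Omega_X(m). Exact arithmetic via fractions.Fraction.
import math
from fractions import Fraction

e = Fraction(1, 100)
m = math.ceil(1 / e)                                   # m = 100 suffices
for b in range(1, 30):
    for a in range(b + 1):
        q = Fraction(a, b)                             # an arbitrary rational in [0, 1]
        dist = min(abs(Fraction(i, m) - q) for i in range(m + 1))
        assert dist <= e                               # grid spacing 1/m <= e
```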

Since discrete and continuous variables approximate each other well, we can justify using continuous variables instead of discrete ones when the underlying quantity is in fact discrete. Continuous variables can also be appealing because they are easier to communicate and fit well with the practice of defining variables over a range.
Example: Demand forecasters in practice often talk about ranges for the demand values. They say the demand is going to be between a and b, or over the range [a, b] of real numbers, although the demand is actually a natural number in this range. They also say that the Texan demand is a certain percentage of the national demand and that this percentage takes values in [a, b] for 0 ≤ a ≤ b ≤ 1, although the percentage is actually a rational number in this range.

When dealing with uncountable outcomes (continuous variables), we often come across sample spaces of the form Ω = {ω : a ≤ ω ≤ b} = [a, b]. When there are m continuous variables, we may have Ω = [a1, b1] × · · · × [am, bm]. The same variable can be continuous over a range and be discrete afterwards. Such a mixture can indicate an assumption, a need to focus on some particular observations, or the methodology used in data collection. Example: An employee can quit a job within the first year of starting or afterwards. If the quitting happens in the first year, it is reported in terms of fractions of a year; otherwise, it is reported as multiples of a year. The historical tenure data then belongs to [0, 1] ∪ {2, 3, 4, . . . }. Note that the data for the first year is more accurate than for the other years. Such increased accuracy within the first year may be required by the human resources department to accurately understand what triggers premature quitting. Hence, the employee tenure is both continuous (uncountable) over [0, 1] and discrete (countable) over {2, 3, . . . }. 

5 Probability Measure

Up to now, we have defined experiments, events and sample spaces. Most of probability theory is about computing the probability of an event; the probability of event A is denoted by P(A). Viewing P as a mapping from subsets of the sample space Ω to the nonnegative real numbers ℝ+, we can say that P measures the size of a set A ⊆ Ω. Although we focus on probability measures in this section, the concept of measure is more general. A general measure µ defined on subsets of Ω and taking values in [0, ∞) satisfies µ(∅) = 0 and the countable additivity condition that µ(∪_{i=1}^∞ Ai) = ∑_{i=1}^∞ µ(Ai) for each disjoint sequence of sets A1, A2, . . . . A probability measure is in addition required to satisfy P(Ω) = 1. This section addresses the issue of measuring a set first from a countable sample space and then from an uncountable sample space.

5.1 Countable Sample Spaces

Countable sample spaces can be finite. Then we can list P({ω}) for each ω ∈ Ω. Example: For the experiment of tossing a fair coin, P({H}) = 1/2 and P({T}) = 1/2. For the experiment of rolling a fair die, P({i}) = 1/6 for i = 1, . . . , 6. 

When no ambiguity arises, as in the above example, we can drop the curly brackets and write P(ω) instead of P({ω}), e.g., P(H) = 1/2. The probability of event A, or the probability measure of set A, is

P(A) = ∑_{ω∈A} P(ω) for finite A ⊆ Ω.

When the sample space is finite, so is A, and the sum above has finitely many terms. Then the probability of an event can be found by summing up the probabilities of the outcomes making up the event. Example: What is the probability that the sum of the rolls of two dice is 7? The numbers rolled can be considered as ordered pairs. To sum up to 7, these pairs must be (1,6), (2,5), (3,4), (4,3), (5,2), (6,1). There are six elementary outcomes summing up to 7 out of 36 elementary outcomes. Hence, P(Sum of the numbers is 7) = P({(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}) = 6/36.  Example: A box contains 4 balls: 2 black and 2 white. Two balls are removed from the box without replacement. Let the sample space be ordered pairs indicating the ball colors, so Ω = {bb, wb, bw, ww}. We can check that P(bb) = P(ww) = (2/4)(1/3) = 1/6 and P(wb) = P(bw) = (2/4)(2/3) = 2/6. Let us define events A, B, C as A = {a white ball is chosen}, B = {a black ball is chosen} and C = {the two chosen balls are of different colors}. Then P(A) = 1 − P(bb) = 1 − 1/6 = 5/6, P(B) = 1 − P(ww) = 1 − 1/6 = 5/6 and P(C) = P(wb) + P(bw) = 4/6. 
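Both examples can be verified by enumerating the equally likely outcomes. The script below is a sketch using exact fractions (the variable names are ours):

```python
from fractions import Fraction
from itertools import product, permutations

# Two fair dice: 36 equally likely ordered pairs.
omega = list(product(range(1, 7), repeat=2))
prob_sum7 = Fraction(sum(1 for w in omega if sum(w) == 7), len(omega))
assert prob_sum7 == Fraction(6, 36)

# Box with 2 black and 2 white balls: ordered draws of two without replacement.
# permutations treats the four balls as distinct, giving 4 * 3 = 12 equally likely draws.
draws = list(permutations(['b', 'b', 'w', 'w'], 2))
prob_C = Fraction(sum(1 for d in draws if d[0] != d[1]), len(draws))
assert prob_C == Fraction(4, 6)      # P(two chosen balls are of different colors)
prob_A = Fraction(sum(1 for d in draws if 'w' in d), len(draws))
assert prob_A == Fraction(5, 6)      # P(a white ball is chosen)
```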

For an event A ⊆ Ω, P(A) has both frequentist and behavioral interpretations. P(A) can be thought of as the relative frequency of event A known from the history of observing the experiment in the past. P(A) can also be interpreted as the fair price of a bet that pays $1 if event A happens and $0 otherwise. For example, if A = {Head first, Tail afterwards} in tossing a coin twice, the fair price is $0.25. Countable sample spaces can be infinite, but even then we always have

Ω = ∪_{i=1}^∞ {ωi} and so A = ∪_{i=1}^∞ ({ωi} ∩ A). The countable additivity of the probability measure immediately implies

P(A) = ∑_{i=1}^∞ P({ωi} ∩ A) = ∑_{ωi∈A} P(ωi) for countable A ⊆ Ω.

When the outcomes are finite, we can express the probability of each outcome explicitly. That is, we can write P(ω) for every ω ∈ Ω. When the outcomes are infinite but countable, we can still write expressions for each P(ω). Example: A fair coin is tossed until H appears. The sample space has outcomes such as H, TH, TTH, TTTH, . . . . In general, an outcome has k = 1, 2, 3, . . . tosses, the first k − 1 of them T and the last one H. Let An be the event of stopping in at most n tosses.

A1 = {H} and P(A1) = P(H) = 1/2.

A2 = {H, TH} and P(A2) = P(H) + P(TH) = 1/2 + 1/4 = 3/4.

A3 = {H, TH, TTH} and P(A3) = P(H) + P(TH) + P(TTH) = 1/2 + 1/4 + 1/8 = 7/8.

Let Bn be the event of requiring n + 1 or more tosses until H appears. Clearly, B0 = Ω and B1 = {TH, TTH, . . . , T · · · TH, . . . },

P(B1) = P({TH, TTH, . . . , T · · · TH, . . . }) = ∑_{k=1}^∞ P(First k tosses are T and the (k + 1)st is H) = ∑_{k=1}^∞ (1/2)^k (1/2) = (1/4) ∑_{k=0}^∞ (1/2)^k = (1/4)(1/(1 − 1/2)) = 1/2.

Also, P(B2) = P({TTH, TTTH, . . . , TT · · · TH, . . . }) = ∑_{k=2}^∞ P(First k tosses are T and the (k + 1)st is H) = ∑_{k=2}^∞ (1/2)^k (1/2) = (1/8) ∑_{k=0}^∞ (1/2)^k = (1/8)(1/(1 − 1/2)) = 1/4.

And in general

P(Bn) = ∑_{k=n}^∞ P(First k tosses are T and the (k + 1)st is H) = ∑_{k=n}^∞ (1/2)^k (1/2) = (1/2)^{n+1} ∑_{k=0}^∞ (1/2)^k = (1/2)^{n+1} (1/(1 − 1/2)) = (1/2)^n.

The probability of requiring at least n + 1 tosses until H appears is thus P(Bn) = (1/2)^n. Once more, we have written each P(ω) in the sums above.
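The geometric sums above can be checked with exact arithmetic. The sketch below truncates the infinite sum at a large index purely to keep the computation finite (the function name and truncation point are our own choices):

```python
from fractions import Fraction

def p_Bn(n, terms=200):
    """Truncated version of P(B_n) = sum_{k >= n} (1/2)^k * (1/2)."""
    return sum(Fraction(1, 2) ** k * Fraction(1, 2) for k in range(n, terms))

for n in range(1, 6):
    # Truncation error is (1/2)^terms, far below the tolerance used here.
    assert abs(p_Bn(n) - Fraction(1, 2) ** n) < Fraction(1, 2) ** 150
```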

It is also worth noting that A1 ⊂ A2 ⊂ · · · ⊂ An and B1 ⊃ B2 ⊃ · · · ⊃ Bn. {An} is an increasing sequence of sets and its limit is lim_{n→∞} An = Ω, so we can set A∞ = Ω. {Bn} is a decreasing sequence of sets and its limit is lim_{n→∞} Bn = ∅, so we can set B∞ = ∅. You can check that

A∞ = ∪_{n=1}^∞ An and B∞ = ∩_{n=1}^∞ Bn.

Furthermore, An ∪ Bn = Ω and An, Bn are disjoint for each n = 1, 2, . . . . 

5.2 Uncountable Sample Spaces

For uncountable Ω, the tactic of writing P(ω) for each ω ∈ Ω runs into a difficulty. For example, we cannot list the outcomes that make up the uncountable sample space [0, 1]: we do not know where to start and where to go next in such a list. If the outcomes in [0, 1] are equally likely and we attach a positive probability to each outcome, the sum of probabilities will exceed 1. If we attach positive probability to only countably many outcomes, then the sample space essentially becomes countable. Unless otherwise stated, [a, b] always denotes an interval of real numbers. If we cannot attach a probability to each outcome of an uncountable sample space, what can we do? We want to attach probabilities to some subsets of Ω such that we can compute probabilities for all the other sets of interest. These subsets do not have to be elementary outcomes; they can be sets including more than one outcome. In other words, we would like to measure every set in Ω. Disappointingly, not every set is measurable with every measure. Example: The Lebesgue measure cannot measure a specially constructed set. When its domain is restricted to [0, 1], the Lebesgue measure defined as µl(A) := b − a for the interval A = [a, b] is a probability measure. To construct the special set E, we group two real numbers x, y into the same class if their difference x − y is rational. For example, the real number √2 is grouped in the same class with 1 + √2, 2.1 + √2, −0.9 + √2, etc. The real number √3 is grouped in the same class with 1 + √3, 2.1 + √3, −0.9 + √3, etc. Also, all of the rational numbers belong to the same class. Membership in a class can be considered as a relation, which turns out to be reflexive, symmetric and transitive. Hence, this relation yields equivalence classes defined over the real numbers. Each equivalence class can be called Er where r is a real number and Er = r + {Rational Numbers} for r ∈ ℝ. Distinct equivalence classes are disjoint: Er1 ∩ Er2 = ∅ whenever Er1 ≠ Er2. 
The new set E is assembled by picking a single element from Er ∩ [0, 1] for each r ∈ ℝ. The resulting set E of real numbers is uncountable and a subset of [0, 1]. Attempts to measure E with µl(E) result in contradictions. The details of this contradiction are outside our scope but can be found in the appendix and on pp. 27-28 of Cohn (2013). Attaching significant importance to this construction, Cohn provides it as 1.4.9. Gelbaum and Olmsted (2003) discuss this issue in §8.11, titled A nonmeasurable set. 

Since not every set in Ω is measurable, we want to consider a special collection of sets and assess the probability of only the sets in this collection. From the last example we realize that this collection of measurable sets cannot be closed under the uncountable union operation. For example, E is an uncountable union of measurable singletons and is not measurable. Giving up on uncountable unions of sets, we can entertain the next-best property: being closed under the countable union operation.

5.2.1 Sigma-field

When not every set is measurable, the alternative is to measure a collection of sets. We let F denote a collection of events (subsets of Ω) and wonder what properties F needs to satisfy to be useful in the probability context. Intuitively, we want to be able to consider unions and intersections of events, and assess their probabilities or assign probabilities to them. To formalize this, we want to attach a probability to A ∪ B,

if we have done so for A and B. So we would like to include A ∪ B in F if A, B ∈ F. Then we can assess the probability of the event that either A or B happens. Is including the unions in the collection F sufficient? It is not for the purpose of assessing the probability of A ∩ B, the event that both A and B happen. An indirect way to include A ∩ B in F is to require that both the union A ∪ B and the complement A^c are in F if A, B ∈ F. This is because A ∩ B = (A^c ∪ B^c)^c ∈ F if A, B ∈ F. Requiring A ∪ B ∈ F and A^c ∈ F makes F closed under set difference operations: the difference A \ B = A ∩ B^c ∈ F and the symmetric difference A△B = (A \ B) ∪ (B \ A) ∈ F. If we stop here and require A ∪ B ∈ F, A^c ∈ F and Ω ∈ F, then the collection F is called a field. By using A ∪ B ∈ F several times in an induction argument, we can also obtain that finite unions are in F: ∪_{i=1}^n Ai ∈ F if Ai ∈ F for i = 1, . . . , n. Unfortunately, this does not suffice for our purpose of being able to consider the probability associated with infinitely many events (say, the probability of getting H on an odd toss). Hence, we require the stronger condition that countable unions are in F when constructing probability models: ∪_{i=1}^∞ Ai ∈ F if Ai ∈ F for i = 1, 2, . . . . If a field is closed under countable unions, it is called a sigma-field, denoted σ-field. In summary, we obtain the following three conditions that define a σ-field. F is a σ-field if it
i) includes the sample space: Ω ∈ F;
ii) is closed under complements: A^c ∈ F if A ∈ F;
iii) is closed under countable unions: ∪_{i=1}^∞ Ai ∈ F if Ai ∈ F for i = 1, 2, . . . .
Note that ii) and iii) imply that a σ-field is closed under countable intersections. If Ω is finite, then any field over Ω is also a σ-field. Example: For Ω = {1, 2, 3, 4}, one of the σ-fields is F = {∅, Ω, {1, 2}, {3, 4}}. Another σ-field is F = {∅, Ω, {1, 3}, {2, 4}}. For Ω = ℕ, F = {∅, Ω, odd natural numbers, even natural numbers} is a σ-field. 
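For a finite Ω, the three conditions can be checked mechanically. The helper below is our own sketch; note that for a finite collection, closure under pairwise unions already gives closure under countable unions:

```python
def is_sigma_field(omega, F):
    """Check the three defining conditions on a finite collection F of subsets of omega."""
    F = {frozenset(a) for a in F}
    omega = frozenset(omega)
    if omega not in F:                               # i) contains the sample space
        return False
    if any(omega - a not in F for a in F):           # ii) closed under complements
        return False
    if any(a | b not in F for a in F for b in F):    # iii) closed under (pairwise) unions
        return False
    return True

omega = {1, 2, 3, 4}
assert is_sigma_field(omega, [set(), omega, {1, 2}, {3, 4}])
assert not is_sigma_field(omega, [set(), omega, {1, 2}, {2}, {3, 4}])
```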

Sometimes a given collection of subsets of Ω is not a σ-field, but it can be turned into one by adding more subsets to it. Addition of subsets to the collection may continue until all of the subsets of Ω are included; the collection of all subsets of Ω is the largest σ-field over Ω. Example: For Ω = {1, 2, 3, 4}, {∅, Ω, {1, 2}, {2}, {3, 4}} is not a σ-field because it does not include {2}^c or the union {2} ∪ {3, 4}. We can add these to the collection to obtain {∅, Ω, {1, 2}, {2}, {3, 4}, {1, 3, 4}, {2, 3, 4}}, which is not a σ-field because it does not include the complement {2, 3, 4}^c. We can add this to the collection to obtain {∅, Ω, {1, 2}, {2}, {3, 4}, {1, 3, 4}, {2, 3, 4}, {1}}, which is a σ-field. We say that the σ-field {∅, Ω, {1, 2}, {2}, {3, 4}, {1, 3, 4}, {2, 3, 4}, {1}} is generated by {∅, Ω, {1, 2}, {2}, {3, 4}}. 
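The same adding-subsets procedure can be automated for a finite Ω: repeatedly close the collection under complements and pairwise unions until nothing new appears. The function name below is ours:

```python
def generate_sigma_field(omega, C):
    """Smallest sigma-field over a finite omega containing the collection C."""
    omega = frozenset(omega)
    F = {frozenset(a) for a in C} | {frozenset(), omega}
    while True:
        # Close under complements and pairwise unions; repeat until stable.
        new = {omega - a for a in F} | {a | b for a in F for b in F}
        if new <= F:
            return F
        F |= new

omega = {1, 2, 3, 4}
F = generate_sigma_field(omega, [{1, 2}, {2}, {3, 4}])
expected = [set(), omega, {1, 2}, {2}, {3, 4}, {1, 3, 4}, {2, 3, 4}, {1}]
assert F == {frozenset(a) for a in expected}
```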

The σ-field generated by a collection C is the smallest σ-field that includes C. We write σ(C) to refer to the σ-field generated by C. By definition, σ(C) = ∩{F : F is a σ-field including C}. The examples above are based on finite sample spaces, but we can also define σ-fields over uncountable sample spaces. One of the most used σ-fields is the Borel field R over ℝ. The Borel field R is generated by the open intervals in ℝ:

R = σ({(a, b) : −∞ < a ≤ b < ∞}).

So the Borel field contains all the open intervals, their countable unions as well as their complements. By using [a, b] = ∩_{n=1}^∞ (a − 1/n, b + 1/n), we can see that the closed intervals of ℝ also generate the Borel field. Example: Each rational number q can be written as an intersection of countably many open intervals: {q} = ∩_{n=1}^∞ (q − 1/n, q + 1/n). So each singleton containing a rational is in the Borel field, as well as their countable union, which is the set of rational numbers. Since the set of rational numbers is in the Borel field, so is its complement – the set of irrational numbers. 

Pairing the sample space Ω with a σ-field F defined over it, we obtain (Ω, F), which is called a measurable space. Measurable spaces are used to define measurable functions. A function ξ : Ω → ℝ is called an F-measurable function if {ω ∈ Ω : a ≤ ξ(ω) ≤ b} ∈ F for each a, b ∈ ℝ. Example: Over Ω = {1, 2, 3, 4}, consider the σ-field F = {∅, Ω, {1}, {2}, {1, 2}, {3, 4}, {1, 3, 4}, {2, 3, 4}}. Let us check if ξ1(ω) = ω for ω ∈ Ω is F-measurable. Since {ω ∈ Ω : 3 ≤ ξ1(ω) ≤ 3} = {3} ∉ F, ξ1 is not measurable. Let us check if ξ2(ω) = ω for ω ∈ {1, 2, 3} and ξ2(4) = 3 is F-measurable. This time {ω ∈ Ω : 3 ≤ ξ2(ω) ≤ 3} = {3, 4} ∈ F. Moreover, {ω ∈ Ω : 1 ≤ ξ2(ω) ≤ 1} = {1} ∈ F, {ω ∈ Ω : 2 ≤ ξ2(ω) ≤ 2} = {2} ∈ F and {ω ∈ Ω : 4 ≤ ξ2(ω) ≤ 4} = ∅ ∈ F. Furthermore, {ω ∈ Ω : 1 ≤ ξ2(ω) ≤ 2} = {1, 2} ∈ F, {ω ∈ Ω : 2 ≤ ξ2(ω) ≤ 3} = {2, 3, 4} ∈ F and {ω ∈ Ω : 1 ≤ ξ2(ω) ≤ 3} = {1, 2, 3, 4} ∈ F. So ξ2 is F-measurable. 
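Measurability checks like the ones in this example can also be automated on a finite Ω. The sketch below tests only the finitely many relevant thresholds (the helper name is ours):

```python
def is_measurable(omega, F, xi):
    """Check {w in omega : a <= xi(w) <= b} is in F for all thresholds a <= b.
    On a finite omega it suffices to test the finitely many values xi takes."""
    F = {frozenset(s) for s in F}
    values = sorted({xi(w) for w in omega})
    for a in values:
        for b in values:
            if a <= b and frozenset(w for w in omega if a <= xi(w) <= b) not in F:
                return False
    return True

omega = {1, 2, 3, 4}
F = [set(), omega, {1}, {2}, {1, 2}, {3, 4}, {1, 3, 4}, {2, 3, 4}]
assert not is_measurable(omega, F, lambda w: w)                # xi1 from the example
assert is_measurable(omega, F, lambda w: w if w != 4 else 3)   # xi2 from the example
```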

A set in a σ-field does not have to be finite or countable. If you pick an uncountable set A ∈ F and want to find its probability through P(A) = P(∪_{ω∈A}{ω}), you run into a problem. This is because the probability measure is only countably additive and is not uncountably additive:

P(∪_{ω∈A}{ω}) ≠ ∑_{ω∈A} P(ω) for uncountable A ⊆ Ω.

This non-equality can be articulated by saying that a line segment A has a positive length even though the sum of the lengths of the points ω it contains is zero.

5.2.2 Probability Space: Sample Space, Sigma-field, Probability Measure

To obtain a probability space from the measurable space (Ω, F), we need to define a probability measure P such that it
i) measures the sets in F: P : F → [0, 1], i.e., for every A ∈ F there exists a real number P(A) ∈ [0, 1];
ii) is countably additive: P(∪_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai) for disjoint A1, A2, . . . ;
iii) assigns probability one to the sample space: P(Ω) = 1.
These three properties can also be called the axioms of probability. It is easy to justify them when P(·) is interpreted as a frequency. Note that ii) applies only to a countable collection of sets Ai; it does not necessarily apply when the collection is uncountable. A probability space is a triplet (Ω, F, P) made from the sample space Ω, the σ-field F and the probability measure P. Example: Consider the Borel field R[1, 11] defined over the interval Ω = [1, 11] and the function f : R[1, 11] → [0, 1] defined as f([a, b]) = (b − a)/10. For each A ∈ R[1, 11], we partition it into open and closed intervals as follows: A = (∪_{i=1}^∞ [a_i^1, b_i^1]) ∪ (∪_{i=1}^∞ (a_i^2, b_i^2]) ∪ (∪_{i=1}^∞ [a_i^3, b_i^3)) ∪ (∪_{i=1}^∞ (a_i^4, b_i^4)). Such a countable partition is possible because the Borel field includes only countable unions. Then f(A) = f((∪_{i=1}^∞ [a_i^1, b_i^1]) ∪ (∪_{i=1}^∞ (a_i^2, b_i^2]) ∪ (∪_{i=1}^∞ [a_i^3, b_i^3)) ∪ (∪_{i=1}^∞ (a_i^4, b_i^4))). Now we can check that f satisfies conditions i), ii) and iii) to be a probability measure and makes ([1, 11], R[1, 11], f) a probability space. 
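As a quick sanity check of conditions i)-iii) for this example, the sketch below evaluates f on finite disjoint unions of intervals (infinite unions are left out for simplicity; the function name is ours, and whether the endpoints are open or closed does not change the length):

```python
from fractions import Fraction

def measure(intervals):
    """f(A) = (total length)/10 for A a finite disjoint union of subintervals of [1, 11]."""
    return sum((Fraction(b) - Fraction(a) for a, b in intervals), Fraction(0)) / 10

assert measure([(1, 11)]) == 1          # iii) f(Omega) = 1
assert measure([]) == 0                 # f(empty set) = 0
# ii) additivity on disjoint pieces (finite version of countable additivity):
assert measure([(1, 4), (5, 8)]) == measure([(1, 4)]) + measure([(5, 8)])
```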

In ℝ with the Borel field R, the length of an interval is called its Lebesgue measure. This can be extended to higher dimensions: the Lebesgue measure becomes the area of a set in ℝ² and the volume of a set in ℝ³. If we take a rock with a volume of 1 liter and break it into smaller pieces, then the total volume of the pieces in any collection is the sum of the volumes of the individual pieces in that collection. This sounds quite trivial, and so should its analog ii) in probability theory. Why is countable additivity assumed as opposed to finite additivity? Let us consider the experiment of tossing a fair coin until the head shows up. We want to compute the probability that the number of tosses, say k, is an odd number. Since k is odd, we can write it as k = 2n − 1 for n = 1, 2, . . . . If k = 1, then the outcome is H with n = 1 and probability 1/2. If k = 3, the outcome is TTH with n = 2 and probability (1/2)^3. For a generic n, the outcome, say ωn, has 2(n − 1) T's followed by 1 H, with probability (1/2)^{2n−1}. Now we need

to compute P(∪_{n=1}^∞ {ωn}), which becomes ∑_{n=1}^∞ P(ωn) by countable additivity. If we had only finite additivity rather than countable additivity, we could not write P(∪_{n=1}^∞ {ωn}) = ∑_{n=1}^∞ P(ωn). Justifying this equality via countable additivity, what remains is to evaluate ∑_{n=1}^∞ P(ωn) = ∑_{n=1}^∞ (1/2)^{2n−1} = (1/2) ∑_{n=0}^∞ (1/4)^n = (1/2)(4/3) = 2/3. As an exercise, you can also compute the probability of an even number of tosses. As this example illustrates, finite additivity can be insufficient when dealing with infinitely many outcomes. Why is countable additivity assumed as opposed to uncountable additivity? Uncountable additivity would state P(∪_{ω∈A}{ω}) = ∑_{ω∈A} P(ω) for an uncountable set A ⊆ Ω. To test this equality, consider the probability space (Ω = [0, 1], R[0, 1], µl), where R[0, 1] is the Borel field on [0, 1] and µl is the Lebesgue probability measure with µl([a, b]) = b − a for 0 ≤ a ≤ b ≤ 1. So P(ω) = P([ω, ω]) = 0 for 0 ≤ ω ≤ 1. For A = Ω, the left-hand side of the assumed uncountable additivity equation yields P(∪_{ω∈A}{ω}) = 1, while the right-hand side yields ∑_{ω∈A} P(ω) = ∑_{ω∈A} 0 = 0. Assuming uncountable additivity thus leads to the contradiction 1 = 0; hence, uncountable additivity cannot hold. Countable additivity and the associated axiomatic development of probability theory as presented here are due to Grundbegriffe der Wahrscheinlichkeitsrechnung (Kolmogorov 1933). Before Kolmogorov's work, others were experimenting with different sets of axioms. Among those, notably Borel introduced countable additivity to mathematics and carried it over to probability. Borel's work and Hilbert's open problem (the 6th among twenty-three, posed at the Congress of Mathematicians in Paris, 1900) motivated mathematicians in the first half of the 20th century to develop an axiomatic framework. 
Some researchers made more assumptions than necessary; referring to this, Kolmogorov wrote in his preface: "In the pertinent mathematical circles it has been common for some time to construct probability theory in accordance with this general point of view. But a complete presentation of the whole system, free from superfluous complications, has been missing." Simultaneously with Kolmogorov, Fréchet was writing a book to provide a similar axiomatic framework. Kolmogorov wrote faster and a shorter book, which was later noted as a significant achievement in Fréchet's book: "It is not enough to have all the ideas in mind, to recall them now and then; one must make sure that their totality is sufficient, bring them together explicitly, and take responsibility for saying that nothing further is needed in order to construct the theory. This is what Mr. Kolmogorov did. This is his achievement." One of the reasons why Fréchet did not finish his book before the publication of Kolmogorov's could be that he was debating the necessity of countable additivity with de Finetti. De Finetti was against countable additivity and was asking Fréchet to produce evidence for it and explain its meaning. Kolmogorov, on the other hand, was a pragmatist and did not get into the meaning: "Since the new axiom is essential only for infinite fields of probability, it is hardly possible to explain its empirical meaning . . . ". More details about the interactions among Borel, Hilbert, Kolmogorov, Fréchet and de Finetti can be found in Shafer and Vovk (2006). One of the examples used by de Finetti to attack countable additivity is the infinite lottery. Suppose that an integer is chosen at random among all integers (a countably infinite set). 
One would like to argue that the probability of choosing each integer is the same for every integer, which is impossible under countable additivity: i) if we assign probability 0 to every integer, the sum of all probabilities is zero, whereas the probability of choosing some integer (the probability of the countable union of the events of choosing each single integer) is 1; ii) if we assign some ε > 0 to the probability of choosing any one of the integers, the sum of all probabilities becomes infinite. Note that these arguments are similar to those used above to rule out uncountable additivity. De Finetti wanted to rule out countable additivity and develop probability with only finite additivity. We adopt the traditional Kolmogorov framework and maintain countable additivity throughout. For more details, see Section 3.1 of Hájek (2008).

6 Solved Exercises

1. N passengers are boarding an N-seat airplane. Each passenger has a ticket that shows the seat number. Passengers board the airplane in an order, so we can name them as the first, second and so on according to this order. Unfortunately, the first passenger (passenger 1) is forgetful and lost both his ticket and his seat number information, so he just takes a random seat. Each subsequent passenger, in an attempt to avoid confrontation with already seated passengers, sits at his own empty seat or takes a random empty seat when his own is taken. What is the probability that the last passenger (passenger N) sits at his own seat? a) Consider two events

A1 = [Seat N is available when passenger N boards] and A2 = [Seat 1 is taken before seat N].

Decide whether A1 = A2 or one event is a subset of the other. ANSWER We claim that A1 = A2 and establish this by showing A1 ⊆ A2 and A1 ⊇ A2. A1 ⊆ A2: Let ω1 ∈ A1 be a permutation such that seat N is available when passenger N boards. Then the other seats [1 : N − 1] are taken when passenger N boards, so seat 1 is taken before seat N and ω1 ∈ A2. A2 ⊆ A1: Let ω2 ∈ A2 be a permutation such that seat 1 is taken before seat N. Let passenger s be the passenger sitting at seat 1 in permutation ω2, with the possibility of s = 1. We have s < N in ω2. What does passenger s + 1 see in ω2? Seat 1 is taken by passenger s, some of the passengers in [1 : s] occupy each other's seats, the remaining passengers in [1 : s] sit at their own seats, and seat N is available. Let us denote the occupiers as o1, o2, . . . , or for r ∈ {0} ∪ [2 : s], so for r ≠ 0, o1 is passenger 1 and or is passenger s. That is, oi is the original passenger index of the ith occupier. If passenger 1 takes seat 1, we have r = 0. In general, 2 ≤ r ≤ s and o1 sits at o2's seat, o2 sits at o3's seat, and so on, and or is passenger s sitting at o1's seat, which is seat 1. We have o1 = 1 and or = s, and o2, . . . , o_{r−1} ≤ s because all occupiers arrive before passenger s. Then passenger s + 1 finds seats [s + 1 : N] unoccupied. Similarly, passenger N finds seat N available in ω2. Hence, ω2 ∈ A1. b) Consider the event B = [Passenger N does not sit at seat 1 or N]. Is B = ∅? Explain. ANSWER B = ∅. The seating location of passenger N is determined exactly when either seat 1 (the first seat) or seat N (the last seat) is selected. This is because passenger N either sits at seat 1 or seat N. Any other seat is necessarily taken by the time passenger N gets his turn to choose a seat. To see this concretely, consider seat n ∈ [2 : N − 1]. If seat n were empty when passenger N boards, it must have been empty when passenger n was boarding, in which case passenger n would have taken it. This is a contradiction, so seat n is actually taken by the time passenger N boards. 
For example, with N = 3 the possible seating permutations are 123, 213, 312, 321. Neither permutation 132 nor 231 is possible because passenger 2 must choose his own seat 2, which is available as passenger 1 has respectively picked seat 1 and seat 3. In all of the permutations 123, 213, 312, 321, passenger 3 sits either at seat 1 or seat 3.
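A Monte Carlo sketch of the boarding process is consistent with the structure established in parts a) and b): the estimate suggests that passenger N gets his own seat with probability about 1/2. The simulation below is our own illustration:

```python
import random

def last_gets_own_seat(N, rng):
    """Simulate the boarding process; return True if passenger N ends up at seat N."""
    free = list(range(1, N + 1))
    free.remove(rng.choice(free))            # passenger 1 takes a random seat
    for p in range(2, N):                    # passengers 2, ..., N-1 board in order
        if p in free:
            free.remove(p)                   # own seat is available
        else:
            free.remove(rng.choice(free))    # otherwise take a random free seat
    return free == [N]                       # exactly one seat is left for passenger N

rng = random.Random(0)
trials = 20000
hits = sum(last_gets_own_seat(10, rng) for _ in range(trials))
print(hits / trials)                         # close to 0.5
```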

2. Given objects 1, 2, . . . , N, what is the number of k-cycles containing object 1? ANSWER First pick object 1 and place it at the beginning of the cycle. Since a cycle does not change with rotation, 1 can be placed at the beginning without loss of generality to obtain (1 . . . ). To make up the rest of the numbers in the cycle, we pick k − 1 numbers out of the remaining N − 1 numbers in C^{N−1}_{k−1} ways. Ordering these k − 1 numbers among each other leads to a particular cycle; this can be done in (k − 1)! ways. The number of possible k-cycles containing 1 is

C^{N−1}_{k−1} (k − 1)! = (N − 1)!/(N − k)!.
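The count can be cross-checked by brute force for small N: fixing object 1 at the front, each k-cycle corresponds to exactly one ordered choice of k − 1 of the remaining objects (the function name is ours):

```python
from itertools import permutations
from math import factorial

def count_k_cycles_with_1(N, k):
    """Count distinct k-cycles on {1, ..., N} containing object 1 by brute force:
    fix 1 at the front of the cycle and order k - 1 of the remaining objects."""
    return sum(1 for tail in permutations(range(2, N + 1), k - 1))

for N in range(2, 7):
    for k in range(1, N + 1):
        assert count_k_cycles_with_1(N, k) == factorial(N - 1) // factorial(N - k)
```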

7 Exercises

1. Consider the set Q of rectangles whose sides have lengths (p, q) from the set ℕ of natural numbers. We would like to show that Q is countable by finding a one-to-one function f : Q → ℕ. Suggest such a function and establish that it is one-to-one. Hint: Recall the Fundamental Theorem of Arithmetic.

2. N passengers are boarding an N-seat airplane. Each passenger has a ticket that shows the seat number. Passengers board the airplane in an order, so we can name them as the first, second and so on according to this order. Unfortunately, the first passenger (passenger 1) is forgetful and lost both his ticket and his seat number information, so he just takes a random seat. Each subsequent passenger, in an attempt to avoid confrontation with already seated passengers, sits at his own empty seat or takes a random empty seat when his own is taken. What is the probability that the last passenger (passenger N) sits at his own seat? Hint: See solved exercises.

3. Given objects 1, 2, . . . , N = 3, list all permutations in which objects 1 and 2 are in a k-cycle for k = 2 and k = 3. For a general value of N, what is the number of permutations in which objects 1 and 2 are in a k-cycle for 2 ≤ k ≤ N? Does this number depend on k?

4. Two cards are selected from a deck of 52 cards. a) How many ways are there to select the 2 cards? b) How many ways are there to choose them so that one of the cards is an ace and the other is a king, queen or jack?

5. How many signals – each consisting of 9 flags hung in a line – can be made from a set of 4 white flags, 3 red flags and 2 blue flags if all flags of the same color are identical?

6. Consider a set of balls, 5 of which are red and 3 of which are yellow. Assume that all of the red balls and all of the yellow balls are indistinguishable. How many ways are there to line up the balls so that no two yellow balls are next to each other?

7. UTD School of Management used to give aliases made out of the initials (first and last name) and a digit ranging from 0 to 9 to its current students. For example, Jane Smith has the alias JS0. If there is another Jane Smith, that person might get JS7. This scheme had worked fine until the number of students reached several thousand, at which time it became impossible to generate a distinct alias for each student. a) Can a distinct alias be given to each student after the student population reaches 7,000? Explain. b) Recently the school expanded the aliasing scheme by appending 4 digits to the initials rather than only 1. What is the maximum number of distinct aliases that can be created in this new scheme if all students have the same initials?

8. Four musicians make up a chamber orchestra to play cello, violin, flute and piano. a) If each musician can play all four instruments, how many orchestral arrangements are possible? b) If each musician can play all four instruments except for one who can play only 2 instruments, how many orchestral arrangements are possible?

9. The UT Dallas WalMart Supply Chain case competition team of 8 people is to return from Bentonville, Arkansas to Dallas with two cars. If each of the two cars can take at most 5 people, in how many ways can the team members be distributed to these two cars?

10. Your pocket has 2 quarter (25-cent), 2 dime (10-cent), 2 nickel (5-cent) and 3 penny (1-cent) coins. a) In how many different ways can you pick two coins from your pocket? Let the sample space be the set of two coins picked. List all the members of the sample space. b) In how many different ways can you pick three coins from your pocket? List all the members of the sample space.

11. The UT Dallas Revenue Management course has 16 students and a group project assignment. Projects must be done by groups of exactly 4 students. a) How many ways are there to split the 16 students into 4 groups, each of which contains 4 students? b) Each group divides the project work among its four members as analysis, writing, presentation and coordination, handled by the Analyst (A), Writer (W), Presenter (P) and group Captain (C). Suppose that the students already know which role they want to play and their group: 4 students are ready to be A, 4 ready to be W, 4 ready to be P; each group has exactly 1 A, 1 W, 1 P and 1 C. For a state-wide competition, the course instructor wants to assemble a single group of 4 by taking 1 member from each group and having exactly one A, W, P and C in the assembled group. How many ways are there to assemble such a group of 4 students from the 4 existing groups? For example, Group 1's Analyst, Group 2's Writer and Group 3's Presenter can be coordinated by Group 4's Captain. c) Right before the state competition, the travel funding is cut and each university is asked to reduce its representation to a 3-student group as opposed to a 4-student group. This forces the instructor to assemble a group without a student playing one of the roles A, W, P or C. For the competition, the instructor wants to assemble a group of 3 by taking at most 1 member from each group and having at most one A, W, P and C. How many ways are there to assemble such a group of 3 students from the 4 existing groups?

12. How many ways are there to distribute a deck of 52 cards to 13 players so that each player has exactly 4 cards and each of these 4 cards come from a different suit (spades, hearts, diamonds, clubs)?

13. Given natural numbers {1, 2, . . . , n}, let π be a permutation of them: π(i) = j means that number i is in position j. Let Π be the set of all permutations. a) Suppose that n = 4 and consider the permutation 2 1 3 4; what are the associated π(1), π(2), π(3), π(4)? b) Consider n = 4. How many permutations are there with the property π(1) ≠ 1? How many permutations are there with the property π(1) = 1 and π(2) ≠ 2? c) Define the set Πk of permutations as follows

Πk = {π : π(i) = i for 1 ≤ i ≤ k and π(k + 1) ≠ k + 1} for 0 ≤ k ≤ n − 1

and Πn = {π : π(i) = i for 1 ≤ i ≤ n}. Πk is the set of permutations whose first k elements are the numbers 1, 2, . . . , k but whose (k + 1)st element is not the number k + 1. For n = 3, provide Π0, Π1, Π2 and Π3. d) For a general n, check to see if {Πk}_{k=0}^n partitions the set Π: i) Πk ∩ Πm = ∅ for 0 ≤ k < m ≤ n and ii) ∪_{k=0}^n Πk = Π. e) Use the parts above to prove

n! = ∑_{i=0}^{n−1} i · (i!) + 1.

14. For each positive integer n, establish the formula

C^{2n}_n = ∑_{i=0}^n (C^n_i)^2

by considering the number of ways of choosing n balls from an urn that contains 2n distinct balls.

15. In an example of the main body, we have found that head (or, for that matter, tail) on a coin switches according to the condition

[Switch: H → T or T → H] if (v◦ v|)/g ∈ ((1/2)(k + 1/4), (1/2)(k + 3/4)) for k = 0, 1, 2, . . . .

This can be specialized for the approximate gravitational constant g = 10 metres/second² to obtain

Switch if v◦ v| ∈ ((20k + 5)/4, (20k + 15)/4) for k = 0, 1, 2, . . . .

a) Since both the vertical speed and the angular speed are initiated with the same thumb push, they are related to each other. Studying a particular person, this relationship is found to be v◦ = 2v|, where the multiplier 2 has the dimension of metre⁻¹. Further specialize the switch condition for this person. b) If the head does not switch to tail or the tail does not switch to head, we say that it is maintained. Provide the maintain condition for the person in a).

c) If the person throws the coin 3 times in an experiment with the vertical speeds v| of 21, 22, 23 metres/second, is the head or tail maintained or switched in these throws?

16. In an example of the main body, we have found that head (or, for that matter, tail) on a coin switches according to the condition

Switch if v◦ v| / g ∈ ( (1/2)(k + 1/4), (1/2)(k + 3/4) ) for k = 0, 1, 2, . . . .

This can be specialized for a coin that stays t_air = 2v|/g = 1 second in the air to obtain

Switch if v◦ ∈ ( k + 1/4, k + 3/4 ) for k = 0, 1, 2, . . . .

Let 1I^S(v◦) ∈ {0, 1} be the indicator for the switch with the angular speed of v◦. Similarly define the indicator 1I^M(v◦) for maintaining head or tail.
a) Does lim_{v◦→∞} 1I^S(v◦) or lim_{v◦→∞} 1I^M(v◦) exist? If they do, what are they?
b) Does

lim_{v◦→∞} ∫₀^{v◦} 1I^S(x) dx / ( ∫₀^{v◦} 1I^S(x) dx + ∫₀^{v◦} 1I^M(x) dx )   or   lim_{v◦→∞} ∫₀^{v◦} 1I^M(x) dx / ( ∫₀^{v◦} 1I^S(x) dx + ∫₀^{v◦} 1I^M(x) dx )

exist? If they do, what are they? Hint: You may find a uniform bound (independent of v◦) for | ∫₀^{v◦} ( 1I^M(x) − 1I^S(x) ) dx |.

17. In how many ways can one place seven indistinguishable balls in four distinct boxes with no box left empty?

18. How many non-negative integer solutions are there for x1 + x2 + · · · + xn = b for integer b ≥ 0? Express the number of non-negative integer solutions to x1 + x2 + · · · + xn ≤ b in terms of n and b. This will give you an idea about the cardinality of feasible sets in integer programs.
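The counts in exercise 18 follow the stars-and-bars formula C(b + n − 1, n − 1). The following Python sketch (an added illustration; the helper `count_solutions` is ours, not part of the exercise) compares the formula against direct enumeration for small n and b.

```python
from itertools import product
from math import comb

# Brute-force count of non-negative integer solutions of x1 + ... + xn = b.
def count_solutions(n, b):
    return sum(1 for x in product(range(b + 1), repeat=n) if sum(x) == b)

# Stars and bars: the count should equal C(b + n - 1, n - 1).
for n in range(1, 5):
    for b in range(0, 6):
        assert count_solutions(n, b) == comb(b + n - 1, n - 1)
print("stars-and-bars formula verified for n <= 4, b <= 5")
```

For the inequality x1 + · · · + xn ≤ b, a standard device is to add a slack variable x_{n+1} ≥ 0 so that the inequality becomes an equality in n + 1 variables.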

19. How many positive integer solutions are there for x1 + x2 + x3 = 4?

20. a) How many different paths can a rook (which moves only horizontally and vertically) take from the southwest corner of a chessboard to the northeast corner without ever moving to the west or south? We are interested in paths, not in the specific moves of the rook. Thus, we can assume that the rook makes 14 moves: 7 to the east and 7 to the north. b) How many of the paths consist of four or more consecutive eastward moves?

21. A board (table) has M + 1 columns and N + 1 rows. A piece is located at cell (1, 1) and will move to cell (M + 1, N + 1) either by moving up 1 cell or by moving right 1 cell at a time. a) How many moves are necessary to go from (1, 1) to (M + 1, N + 1)? b) How many distinct paths exist from (1, 1) to (M + 1, N + 1)? c) How many distinct non-decreasing integer-valued functions can be defined over the domain of integers {a, a + 1, . . . , a + M} and range of integers {b, b + 1, . . . , b + N} such that the functions go through (a, b) and (a + M, b + N)? d) How many distinct non-decreasing integer-valued functions can be defined over the domain of integers {a, a + 1, . . . , a + M} and range of integers {b, b + 1, . . . , b + N} such that the functions go through the point (a, b) and between the points (a + M, b) and (a + M, b + N)?
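The path counts in exercises 20 and 21 can be cross-checked by recursion. In this Python sketch (added for illustration; the helper name `paths` is ours), a path that still needs m right moves and n up moves is counted recursively and compared with the binomial coefficient C(m + n, m).

```python
from functools import lru_cache
from math import comb

# Count monotone lattice paths that need m right moves and n up moves.
@lru_cache(maxsize=None)
def paths(m, n):
    if m == 0 or n == 0:
        return 1                      # only one way: go straight
    return paths(m - 1, n) + paths(m, n - 1)

# The recursive count should match the closed form C(m + n, m).
for m in range(8):
    for n in range(8):
        assert paths(m, n) == comb(m + n, m)
print("path count equals C(m + n, m) for m, n <= 7")
```

For the rook of exercise 20 a), m = n = 7, so the recursion and the closed form agree at C(14, 7).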

Appendix: Failing to Measure E – The meticulously constructed uncountable set of reals

Since rational numbers are countable, we can count those in (−1, 1) to arrive at their enumeration {rn}. Then we define En = rn + E. The argument requires establishing three facts: i) Em ∩ En = ∅ for m ≠ n. ii) ∪n En ⊆ (−1, 2). iii) (0, 1) ⊆ ∪n En.

To obtain i) with a proof by contradiction, suppose Em ∩ En ≠ ∅. Presence of an element in Em ∩ En requires that element to be represented as both e + rm and e′ + rn for e, e′ ∈ E and rationals rm, rn. That is, e + rm = e′ + rn, or e − e′ = rn − rm, a rational number, which implies that e, e′ are in the same equivalence class. Since E contains exactly one element from each equivalence class, we must have e = e′ and rn = rm, and in turn n = m. In sum, Em ∩ En ≠ ∅ implies n = m, which establishes i). To see ii), it is sufficient to recall that E ⊆ (0, 1) and rn ∈ (−1, 1), which together yield E + rn ⊆ (−1, 2). For iii), we start with a fixed x ∈ (0, 1) and show that x ∈ En for some n. For this x, there is e ∈ E such that x, e are in the same equivalence class. Hence, x − e is a rational number in (−1, 1). Let that rational number be rn; then x = rn + e, so x ∈ En.

Suppose that E is Lebesgue measurable by µl. Since En = rn + E is a translation of E, En is also Lebesgue measurable and µl(En) = µl(E). By the countable additivity of the Lebesgue measure over the countable collection {En} indexed by the rationals,

µl(∪n En) = ∑_n µl(En) = µl(E) ∑_n 1.

If we set µl(E) = 0, we get µl(∪n En) = 0, which contradicts iii) and µl((0, 1)) = 1. If we set µl(E) > 0, we get µl(∪n En) = ∞, which contradicts ii) and µl((−1, 2)) = 3. These contradictions imply that our supposition that E is measurable is incorrect. Indeed, E is not measurable.

Appendix: When does the L’Hospital Rule apply?

The typical statement of L’Hospital’s rule is

lim_{x→∞} f(x)/g(x) = lim_{x→∞} f′(x)/g′(x).

But this statement presumes that lim_{x→∞} f(x) = lim_{x→∞} g(x) = ∞ and that the functions f and g are differentiable, in particular for sufficiently large x. For the equality to make sense, we also need to associate a meaning with the right-hand side. That is, the limit on the right-hand side must exist. Example: Let two differentiable functions be f(x) = x³/6 and g(x) = x²/2. If we write

lim_{x→∞} f(x)/g(x) = lim_{x→∞} f′(x)/g′(x) = lim_{x→∞} (x²/2)/x,

we still have the indeterminate form ∞/∞ on the right-hand side and the equality is not meaningful. Iterating one more time with the L’Hospital rule, we can also write

lim_{x→∞} f(x)/g(x) = lim_{x→∞} f′′(x)/g′′(x) = lim_{x→∞} x/1 = ∞.

This time, the equalities hold and have a meaning.

Note that we allow the limit to be ∞ or −∞. The symbols ∞ and −∞ are not real numbers. But we can include these symbols among the real numbers to obtain what is called the extended real number system; see p.11 of Rudin (1976).

Example: Let 1I_{⌊x⌋ even} be an indicator function defined for the positive real number x. The notation ⌊x⌋ indicates the floor of x, which is the largest integer that does not exceed x. When the floor of x is an even number, we have 1I_{⌊x⌋ even} = 1. We consider the functions f(u) = ∫₀^u 1I_{⌊x⌋ even} dx and g(u) = u. What is lim_{u→∞} f(u)/g(u)? Since f is not too complex, we can obtain it in closed form as follows. When ⌊u⌋ is an even number, f(u) increases at the rate of 1 with u. When ⌊u⌋ is an odd number, f(u) is constant in u. These facts, along with f(0) = 0, describe the function f. With this description, we can explicitly write

f(u) = 1I_{⌊u⌋ even} ( u − ⌊u⌋/2 ) + ( 1 − 1I_{⌊u⌋ even} ) ( ⌊u⌋ + 1 )/2

and obtain the following table.

u      0   1/2   1   3/2   2   5/2   3   7/2   4   9/2   5   11/2   6   13/2   7   15/2   8   17/2
f(u)   0   1/2   1    1    1   3/2   2    2    2   5/2   3    3     3   7/2    4    4     4   9/2

Checking f(u)/g(u), we realize that lim_{u→∞} f(u) = lim_{u→∞} g(u) = ∞, so f(u)/g(u) is indeterminate. We resort to the L’Hospital rule to write

lim_{u→∞} f(u)/g(u) = lim_{u→∞} f′(u)/g′(u) = lim_{u→∞} 1I_{⌊u⌋ even} / 1,

but the right-hand side lim_{u→∞} 1I_{⌊u⌋ even} alternates between 0 and 1 and hence does not exist. Can we conclude that lim_{u→∞} f(u)/g(u) does not exist? To answer this, we can first obtain f(u)/g(u):

f(u)/g(u) = 1I_{⌊u⌋ even} ( 1 − ⌊u⌋/(2u) ) + ( 1 − 1I_{⌊u⌋ even} ) ( ⌊u⌋/(2u) + 1/(2u) ).

Taking the limit of f(u)/g(u) directly and noting ⌊u⌋/u → 1 as u → ∞, we obtain the limit

lim_{u→∞} f(u)/g(u) = 1I_{⌊u⌋ even} ( 1 − 1/2 ) + ( 1 − 1I_{⌊u⌋ even} ) ( 1/2 ) = 1/2.

This example shows us that the limit of f(u)/g(u) may exist even when the limit of f′(u)/g′(u) is inconclusive.
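The limit in this example can also be checked numerically. The Python sketch below (added for illustration) implements the closed form of f derived above and evaluates f(u)/g(u) = f(u)/u for growing u; the ratio settles near 1/2 even though the derivative ratio keeps alternating.

```python
from math import floor

# Closed form of f(u) = integral_0^u 1{floor(x) even} dx from the text.
def f(u):
    k = floor(u)
    if k % 2 == 0:          # floor(u) even: f grows at rate 1 on [k, u]
        return u - k / 2
    return (k + 1) / 2      # floor(u) odd: f stays constant

for u in [10.5, 100.5, 1000.5, 10000.5]:
    print(u, f(u) / u)      # ratios approach 1/2
```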

One then needs a proper statement of the L’Hospital rule, for which we refer to p.109 of Rudin (1976).

• L’Hospital Rule: Suppose f, g are real and differentiable in (a, b), and g′(x) ≠ 0 for x ∈ (a, b), where −∞ ≤ a < b ≤ ∞. Suppose

lim_{x→b} f′(x)/g′(x) = A.

If lim_{x→b} f(x) = lim_{x→b} g(x) = ∞, then

lim_{x→b} f(x)/g(x) = A.

In the statement of the L’Hospital Rule, we can let x → ∞ and also let x → −∞. More importantly, the rule assumes the existence of the limit of f′(x)/g′(x). We can apply the rule only when f′(x)/g′(x) has a limit. In the last example, f′(x)/g′(x) does not have a limit, so the L’Hospital Rule does not apply.

Appendix: Countability of Rationals and Uncountability of Reals²

Here are two questions to consider. – Are there the same number of integers as natural numbers? – Are there the same number of rational numbers as natural numbers? The idea is that there can be infinite sets that do not have the same size. To make sense of that statement, we have to know what it means to say that two sets have the same size. This really goes back to our ideas of what it means to count the elements of a set.

When I look out of my window to a field of sheep, I count the sheep by matching each sheep to a number: 1, 2, 3, 4, . . . . And if I count the books in the bookcase, I do the same thing: I match each book to a number: 1, 2, 3, 4, . . . . I would say there are 10 sheep (or books) if I can match each sheep (book) to a number from 1 to 10 in such a way that each number gets used and no two sheep get the same number. We say that the set of sheep in this field has the same size as the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, because we can match sheep to numbers. When would we say there are more than 10 sheep? If we match the sheep to the numbers 1 to 10 and still have some sheep left over, there must be more than 10 sheep. If we match the sheep to the numbers 1 to 10 and still have numbers left over, there must be fewer than 10 sheep.

We use the same idea with infinite sets. An infinite set is one that cannot be matched to any finite set. We use the natural numbers as our counting set and try to match the elements of a given set to the naturals. If we can do that, we say that the set is countable; if we cannot, we say that the set is uncountable. There is a slight ambiguity about whether a finite set is countable. For our purposes, finite sets are countable.

Here is another way to think about countability. A countable set can be listed: we write the element matched to 1 first, the element matched to 2 second, and so on. And if we are given a list, we can get a matching (the first element gets matched to 1, the second to 2, and so on).
Sometimes that can be a useful way of thinking. Let us try to think of some examples of countable sets. The natural numbers are countable, because I can match each natural n to itself. Let us try to think of some more interesting examples!

Are the integers countable? Can we write them in a list? We might write them as . . . , −3, −2, −1, 0, 1, 2, 3, . . . , but that does not count because the list does not have a first element. So we need another strategy. We need to be sure that we list everything exactly once. Here is one possibility: 0, 1, −1, 2, −2, 3, −3, . . . . We “start in the middle and work outwards”. So yes, the integers are countable.

Are the rationals countable? Can we list the rationals? Yes. The rational numbers are countable. Let us see how to prove this. One way of thinking about our proof that the integers are countable is that we wrote them in a line (. . . , −3, −2, −1, 0, 1, 2, 3, . . . ) and then drew a path that took us through them all. You can see this by drawing it for yourself. If we could do something similar for the rationals, that would be great. But it is not quite as obvious how to write them in a line. In fact, thinking about it for a bit, it seems more natural to write them in a grid:

q \ p    1     2     3     4     5    . . .
1       1/1   2/1   3/1   4/1   5/1   . . .
2       1/2   2/2   3/2   4/2   5/2   . . .
3       1/3   2/3   3/3   4/3   5/3   . . .
4       1/4   2/4   3/4   4/4   5/4   . . .
5       1/5   2/5   3/5   4/5   5/5   . . .

where the rational p/q is written in the pth column and qth row.

² Based on posts on http://theoremoftheweek.wordpress.com

Now, how can we plot a path through these? We can imagine working through each diagonal in turn. So we might get something like 1/1, 2/1, 1/2, 1/3, 2/2, 3/1, 4/1, 3/2, 2/3, 1/4, 1/5, . . . . This is not quite allowed, because we have counted some rationals twice (e.g., 2/2 = 1/1). But that is easily fixed: we say that we will follow this path, simply missing out anything we have seen before. This gives us a listing of the positive rationals.

The real numbers are uncountable. This means there is no way of listing the real numbers. So our aim is to prove that it is impossible to write the real numbers in a list. How could we possibly do that? We are going to suppose that it is possible to list the real numbers. Then we will somehow derive a contradiction from that, which will mean our original supposition must have been wrong. So, we are supposing for the moment that we can list the real numbers. In that case, we can certainly list the real numbers in [0, 1]. Let us imagine that we have done this and have written them all out in order, using their decimal expansions. Slightly annoyingly, some numbers have two expansions, since 0.9999999999 · · · = 1, but let us say we always write the finite version rather than the one ending in infinitely many 9s. So they look like

1st real: 0.a_{1,1} a_{1,2} a_{1,3} a_{1,4} a_{1,5} . . .
2nd real: 0.a_{2,1} a_{2,2} a_{2,3} a_{2,4} a_{2,5} . . .
3rd real: 0.a_{3,1} a_{3,2} a_{3,3} a_{3,4} a_{3,5} . . .
4th real: 0.a_{4,1} a_{4,2} a_{4,3} a_{4,4} a_{4,5} . . .
5th real: 0.a_{5,1} a_{5,2} a_{5,3} a_{5,4} a_{5,5} . . .

where a_{i,j} is the jth decimal digit of the ith real number. To derive a contradiction, we are going to build another real number between 0 and 1, one that is not on our list. Since our list was supposed to contain all such real numbers, that will be a contradiction, and we will be done. So let us think about how to build another real number between 0 and 1 in such a way that we can be sure it is not on our list. Let us say this new number will be 0.b_1 b_2 b_3 b_4 b_5 . . . , where we are about to define the digits b_i.
We want to make sure that our new number is not the same as the first number on our list. So let us ensure that they have different digits in the first decimal place. Say if a_{1,1} = 3 then b_1 = 7, and otherwise b_1 = 3. We could really define b_1 to be any digit apart from a_{1,1}, but we want to make sure that we do not get a number that ends in infinitely many 9s, because of the irritating fact that 0.9999999999 · · · = 1, so we never choose b_1 to be 9. Now we want to make sure that our new number is not the same as the second number on our list. We can do this by making sure that the second digit of our new number is not the same as the second digit of the second number. So let us put b_2 = 7 if a_{2,2} = 3 and b_2 = 3 otherwise. And so on. At each stage, we make sure that our new number is not the same as the nth number on the list, by making sure that b_n is not the same as a_{n,n}. And that defines our new real number, one that is definitely not on our list because we built it that way.

If we apply the above argument to prove that rational numbers are uncountable, where would the argument break? Hint: Rational numbers have eventually repeating decimals, while real numbers can have nonrepeating decimals. Being rational, the new number 0.b_1 b_2 b_3 b_4 b_5 . . . must eventually repeat its digits, so the digits b_1, b_2, . . . cannot be chosen as freely in the case of rationals as in the case of reals.
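The diagonal walk through the grid of rationals can be turned into a short program. This Python sketch (added for illustration; the function name `list_positive_rationals` is ours, and it traverses each diagonal in a fixed direction rather than snaking back and forth as in the figure) lists the first few positive rationals without duplicates.

```python
from fractions import Fraction

def list_positive_rationals(count):
    """Enumerate positive rationals by walking the diagonals p + q = s
    of the p/q grid, skipping values seen before (e.g. 2/2 = 1/1)."""
    seen, out = set(), []
    s = 2                           # p + q along the current diagonal
    while len(out) < count:
        for p in range(1, s):       # p = 1, ..., s - 1 with q = s - p
            r = Fraction(p, s - p)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        s += 1
    return out

print(list_positive_rationals(8))
```

Because every p/q sits on the diagonal with index p + q, each positive rational is reached after finitely many steps, which is exactly the listing property that makes the positive rationals countable.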

References

D.L. Cohn. 2013. Measure Theory. 2nd edition published by Birkhäuser.
P. Diaconis, S. Holmes and R. Montgomery. 2007. Dynamical Bias in the Coin Toss. SIAM Review, Vol.49, No.2: 211-235.
B.R. Gelbaum and J.M.H. Olmsted. 2003. Counterexamples in Analysis. Published by Dover in 2003 and based on the 2nd edition published by Holden Day in 1965.

A. Hajek. 2008. Probability – A Philosophical Overview, in B. Gold and R. Simons (eds.), Proof and Other Dilemmas: Mathematics and Philosophy, published by Mathematical Association of America: 323-339.
A.N. Kolmogorov. 1933. Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer, Berlin.³
W. Rudin. 1976. Principles of Mathematical Analysis. 3rd edition published by McGraw Hill.
G. Shafer and V. Vovk. 2006. The Sources of Kolmogorov’s Grundbegriffe. Statistical Science, Vol.21, No.1: 70-98.

³ A Russian translation by G. M. Bavli appeared in 1936. An English translation by N. Morrison under the title “Foundations of the Theory of Probability” was published by Chelsea Publishing Company in 1950.
