
CIS 321 Intro to Sagnik Basumallik

1. The Main idea:

• The experiment is repeated a number of times.
• Uncontrollable factors cause “random” variation in the outcomes.
• We view the outcomes as a sequence of independent random variables with the same unknown distribution.
• The results are averaged, and we reason about that average.

Independent and identically distributed (i.i.d.) sequence: the experimental conditions of subsequent experiments are identical, and the outcome of any one experiment does not influence the outcomes of the others.

For example, a single roll of a fair, six-sided die produces one of the numbers 1, 2, 3, 4, 5, or 6, each with equal probability. Therefore, the expected value of a single die roll is:

E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5

According to the law of large numbers, if a large number of six-sided dice are rolled, the average of their values (sometimes called the sample mean) is likely to be close to 3.5, with the precision increasing as more dice are rolled.

Check R simulations: #Simulation 1
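The R script for Simulation 1 is not reproduced in these notes. As a rough stand-in, the die-rolling experiment can be sketched in Python. The function name `sample_mean_of_rolls`, the seed, and the sample sizes are illustrative choices, not part of the course material.

```python
import random

def sample_mean_of_rolls(n_rolls, rng):
    """Average of n_rolls independent rolls of a fair six-sided die."""
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

rng = random.Random(321)  # fixed seed (arbitrary) so the run is reproducible

# The sample mean should drift toward the expected value 3.5 as n grows.
for n in (10, 100, 10_000, 100_000):
    print(n, sample_mean_of_rolls(n, rng))
```

With only a few rolls the average can land well away from 3.5. With many rolls it typically settles close to it, which is what the law of large numbers predicts.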

Q1. The casino La bella Fortuna is for sale and you think you might want to buy it, but you want to know how much money you are going to make. All the present owner can tell you is that the game Red or Black is played about 1000 times a night, 365 days a year. Each time it is played you have probability 19/37 of winning the player’s bet of $1 and probability 18/37 of having to pay the player $1. Using the law of large numbers, determine the income of the casino.
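One way to approach Q1: the casino's gain on a single game is a random variable worth +1 with probability 19/37 and −1 with probability 18/37, so its expected value is 1/37 of a dollar. By the law of large numbers, the average gain over many games should be close to that value. The arithmetic can be sketched as follows (variable names are illustrative, and exact fractions are used to avoid rounding):

```python
from fractions import Fraction

p_win = Fraction(19, 37)   # casino wins the player's $1 bet
p_lose = Fraction(18, 37)  # casino pays the player $1

# Expected gain of the casino on one game, in dollars: 19/37 - 18/37 = 1/37
expected_gain_per_game = p_win * 1 + p_lose * (-1)

games_per_night = 1000
nights_per_year = 365
games_per_year = games_per_night * nights_per_year

expected_income_per_night = expected_gain_per_game * games_per_night
expected_income_per_year = expected_gain_per_game * games_per_year

print(f"per game:  ${float(expected_gain_per_game):.4f}")
print(f"per night: ${float(expected_income_per_night):.2f}")
print(f"per year:  ${float(expected_income_per_year):.2f}")
```

This gives roughly $27 a night and about $9,865 a year from this game. These are long-run averages; the income actually realized on any given night will vary around them.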

The Law of Large Numbers is why casinos win in the long term. Even when the odds favor the house only slightly, over many bets the results reflect those odds. There is variation, and sometimes the player wins, but on average over many attempts the house wins. Each game has a house edge built into it: the player’s average loss as a fraction of the initial bet. Some sample edges:

• Blackjack – 0.75%
• Baccarat – 1.2%
• Craps – 1.4%
• Roulette – 5%
• Slot machines – 5-10%
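The contrast between night-to-night variation and the long-run average can be illustrated with the Red or Black game from Q1, where the house wins each $1 bet with probability 19/37. The sketch below simulates one year of 1000 games per night. The function name `night_income` and the seed are hypothetical, and the printed numbers depend on the seed.

```python
import random

P_CASINO_WINS = 19 / 37  # from Q1: casino wins $1 with probability 19/37

def night_income(n_games, rng):
    """Casino's net income in dollars over n_games independent $1 bets."""
    wins = sum(1 for _ in range(n_games) if rng.random() < P_CASINO_WINS)
    return wins - (n_games - wins)  # +1 per win, -1 per loss

rng = random.Random(2024)  # arbitrary fixed seed for reproducibility
nights = [night_income(1000, rng) for _ in range(365)]

losing_nights = sum(1 for income in nights if income < 0)
average_per_game = sum(nights) / (365 * 1000)

print("nights with a net loss:", losing_nights)
print("average income per game:", round(average_per_game, 4))
print("theoretical value 1/37: ", round(1 / 37, 4))
```

Individual nights can still end in a loss for the house, but the average over the full 365,000 games should sit close to 1/37.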

Chebyshev’s inequality:

Main idea: for any probability distribution, most of the probability mass lies within a few standard deviations of the expectation. It is a theorem that characterizes the dispersion of data away from its mean (average).

Intuition:

In a normal distribution, we know that about 68% of the data lies within one standard deviation of the mean, about 95% within two standard deviations, and approximately 99.7% within three standard deviations.

Chebyshev’s inequality provides a bound on the probability that the random variable Y falls outside the interval (E[Y] − a, E[Y] + a): for any a > 0, P(|Y − E[Y]| ≥ a) ≤ Var(Y)/a².

Law of Large Numbers:

Basic Idea: the sample average is usually close to the mean when the sample size is big.

Let a = kσ, where σ is the standard deviation of Y, so that the bound is stated in terms of k standard deviations:

P(|Y − E[Y]| ≥ a) = P(|Y − E[Y]| ≥ kσ) ≤ σ²/(k²σ²) = 1/k²
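As a sanity check on the 1/k² bound, the empirical tail frequency can be compared with it for a distribution that is far from normal. The sketch below assumes an exponential distribution with rate 1 (for which E[Y] = 1 and σ = 1). The distribution, the function name `empirical_tail`, and the seed are illustrative choices, not part of the notes.

```python
import random

def empirical_tail(samples, mean, sigma, k):
    """Fraction of samples at least k standard deviations from the mean."""
    hits = sum(1 for y in samples if abs(y - mean) >= k * sigma)
    return hits / len(samples)

rng = random.Random(321)  # arbitrary fixed seed for reproducibility

# Exponential distribution with rate 1: E[Y] = 1 and sigma = 1.
samples = [rng.expovariate(1.0) for _ in range(100_000)]

for k in (1.5, 2, 3, 4):
    tail = empirical_tail(samples, mean=1.0, sigma=1.0, k=k)
    print(f"k={k}: empirical tail {tail:.4f}, Chebyshev bound {1 / k**2:.4f}")
```

The observed tail frequencies should fall well below 1/k². Chebyshev’s inequality is usually loose for any particular distribution, which is the price of holding for every distribution with finite variance.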

Proof: