
Statistical methods for Data Science, Lecture 5
Interval estimates; comparing systems

Richard Johansson

November 18, 2018


statistical inference: overview

- estimate the value of some parameter (last lecture):
  - what is the error rate of my drug test?
- determine an interval that is very likely to contain the true value of the parameter (today):
  - an interval estimate for the error rate
- test some hypothesis about the parameter (today):
  - is the error rate significantly different from 0.03?
  - are users significantly more satisfied with web page A than with web page B?


"recipes"

- in this lecture, we'll look at a few "recipes" that you'll use in the assignment:
  - an interval estimate for a proportion (a "heads probability")
  - comparing a proportion to a specified value
  - comparing two proportions
- additionally, we'll see the standard method to compute an interval estimate for the mean of a normal
- I will also post some pointers to additional tests
- remember to check that the preconditions are satisfied: what kind of experiment is it? what assumptions do we make about the data?


overview

- interval estimates
- significance testing for the accuracy
- comparing two classifiers
- p-value fishing


interval estimates

- if we get some estimate by ML, can we say something about how reliable that estimate is?
- informally, an interval estimate for the parameter p is an interval I = [p_low, p_high] such that the true value of the parameter is "likely" to be contained in I
  - for instance: with 95% probability, the error rate of the spam filter is in the interval [0.05, 0.08]


frequentists and Bayesians again...

- [frequentist] a 95% confidence interval I is computed using a procedure that will return intervals that contain p at least 95% of the time
- [Bayesian] a 95% credible interval I for the parameter p is an interval such that p lies in I with a probability of at least 95%
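To make the frequentist reading concrete, here is a minimal simulation sketch of my own (not from the slides): it repeatedly draws samples from a Bernoulli experiment with a known p, computes a 95% interval for each sample, and checks how often the interval actually covers p. The parameter values (p = 0.35, samples of size 100, 10,000 repetitions) are arbitrary choices for illustration, and the interval used is the Beta credible interval introduced as recipe 1 later in the lecture.

    import numpy as np
    from scipy import stats

    # sketch: empirical coverage of a 95% interval procedure when the
    # true parameter is known (p = 0.35, n = 100 observations per sample)
    rng = np.random.default_rng(0)
    p_true, n, n_repeats = 0.35, 100, 10000

    covered = 0
    for _ in range(n_repeats):
        k = rng.binomial(n, p_true)          # one simulated experiment
        # interval from a Beta posterior with a uniform Beta(1, 1) prior
        low, high = stats.beta(k + 1, n - k + 1).interval(0.95)
        covered += (low <= p_true <= high)

    print('observed coverage:', covered / n_repeats)   # should be close to 0.95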
interval estimates: overview

- we will now see two recipes for computing confidence/credible intervals in specific situations:
  - for probability estimates, such as the accuracy of a classifier (to be used in the next assignment)
  - for the mean, when the data is assumed to be normal
- ... and then, a general method


the distribution of our estimator

- our ML or MAP estimator applied to randomly selected samples is a random variable with a distribution
- this distribution depends on the sample size
  - larger sample → more concentrated distribution


estimator distribution and sample size (p = 0.35)

[figure slide: plots of the estimator distribution for increasing sample sizes]


confidence and credible intervals for the proportion parameter

- several recipes, see https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
- the traditional textbook method for confidence intervals is based on approximating a binomial with a normal
- instead, we'll consider a method to compute a Bayesian credible interval that does not use any approximations
  - it works fine even if the numbers are small


credible intervals in Bayesian statistics

1. choose a prior distribution
2. compute a posterior distribution from the prior and the data
3. select an interval that covers e.g. 95% of the posterior distribution


recipe 1: credible interval for the estimation of a probability

- assume we carry out n independent trials, with k successes and n − k failures
- choose a Beta prior for the probability; that is, select shape parameters a and b (for a uniform prior, set a = b = 1)
- then the posterior is also a Beta, with parameters k + a and (n − k) + b
- select a 95% interval in the posterior


in SciPy

- assume n_success successes out of n
- recall that we use ppf to get the percentiles! or, even simpler, use interval

    a = 1
    b = 1   # uniform prior: a = b = 1

    n_fail = n - n_success
    posterior_distr = stats.beta(n_success + a, n_fail + b)

    p_low, p_high = posterior_distr.interval(0.95)


example: political polling

- we ask 87 randomly selected Gothenburgers whether they support the proposed aerial tramway line over the river
- 81 of them say yes
- a 95% credible interval for the popularity of the tramway is 0.857 – 0.967

    n_for = 81
    n = 87
    n_against = n - n_for

    p_mle = n_for / n
    posterior_distr = stats.beta(n_for + 1, n_against + 1)
    print('ML / MAP estimate:', p_mle)
    print('95% credible interval:', posterior_distr.interval(0.95))


don't forget your common sense

- I ask 14 Applied Data Science students whether they support free transportation between Johanneberg and Lindholmen; 12 of them say yes
- will I get a good estimate?
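To see what recipe 1 itself says about this small survey, here is a quick sketch of my own applying the same Beta recipe to the 14-student example: the point estimate is about 0.86, but the 95% credible interval comes out much wider than in the polling example, which is exactly the warning the slide is making.

    from scipy import stats

    # recipe 1 applied to the 14-student survey
    n_success, n = 12, 14
    a = b = 1                                   # uniform prior
    posterior_distr = stats.beta(n_success + a, n - n_success + b)

    print('ML estimate:', n_success / n)
    print('95% credible interval:', posterior_distr.interval(0.95))
    # with only 14 respondents, the interval is too wide to pin the
    # proportion down with any useful precision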
recipe 2: mean of a normal

- we have a sample that we assume follows some normal distribution; we don't know the mean µ or the standard deviation σ; the data points are independent
- can we make an interval estimate for the parameter µ?
- frequentist confidence intervals, but also Bayesian credible intervals, are based on the t distribution
  - this is a bell-shaped distribution with longer tails than the normal
  - the t distribution has a parameter called degrees of freedom (df) that controls the tails


recipe 2: mean of a normal (continued)

- x_mle is the sample mean; the size of the dataset is n; the sample standard deviation is s
- we consider a t distribution:

    posterior_distr = stats.t(loc=x_mle, scale=s/np.sqrt(n), df=n-1)

- to get an interval estimate, select a 95% interval in this distribution


example

- to demonstrate, we generate some data:

    x = pd.Series(np.random.normal(loc=3, scale=0.5, size=500))

- a 95% confidence/credible interval for the mean:

    mu_mle = x.mean()
    s = x.std()
    n = len(x)

    posterior_distr = stats.t(df=n-1, loc=mu_mle, scale=s/np.sqrt(n))
    print('estimate:', mu_mle)
    print('95% credible interval:', posterior_distr.interval(0.95))


alternative: estimation using bayes_mvs

- SciPy has a built-in function for the estimation of the mean, variance, and standard deviation:
  https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.bayes_mvs.html
- 95% credible intervals for the mean and the std:

    res_mean, _, res_std = stats.bayes_mvs(x, 0.95)
    mu_est, (mu_low, mu_high) = res_mean
    sigma_est, (sigma_low, sigma_high) = res_std
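As a quick sanity check of my own (not part of the slides), the two approaches above should give essentially the same interval for the mean. The sketch below reuses x, mu_mle, s and n from the example, and assumes scipy.stats has been imported as stats as before.

    # compare the manual t-based interval with the one from bayes_mvs
    t_interval = stats.t(df=n-1, loc=mu_mle, scale=s/np.sqrt(n)).interval(0.95)
    mu_est, (mu_low, mu_high) = stats.bayes_mvs(x, 0.95)[0]

    print('t recipe:  ', t_interval)
    print('bayes_mvs: ', (mu_low, mu_high))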
recipe 3 (if we have time): brute force

- what if we have no clue about how our measurements are distributed?
  - e.g. word error rate for speech recognition
  - e.g. BLEU for machine translation


the brute-force solution to interval estimates

- the variation in our estimate depends on the distribution of possible datasets
- in theory, we could find a confidence interval by considering the distribution of all possible datasets, but this can't be done in practice
- the trick in bootstrapping – invented by Bradley Efron – is to assume that we can simulate the distribution of possible datasets by picking randomly (with replacement) from the original dataset


bootstrapping a confidence interval, pseudocode

- we have a dataset D consisting of k items
- we compute a confidence interval by generating N random datasets and finding the interval where most estimates end up

    repeat N times:
        D* = pick k items randomly (with replacement) from D
        m = the estimate computed on D*
        store m in a list M
    return the 2.5% and 97.5% percentiles of M

- see Wikipedia for different varieties:
  https://en.wikipedia.org/wiki/Bootstrapping_(statistics)#Deriving_confidence_intervals_from_the_bootstrap_distribution
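As a concrete illustration of the pseudocode, here is a minimal Python sketch of my own (assuming NumPy; the function name, the toy dataset and the choice of the mean as the estimator are placeholders, not from the slides):

    import numpy as np

    def bootstrap_interval(data, estimator, n_resamples=10000, level=0.95, seed=0):
        """Percentile bootstrap interval for an arbitrary estimator."""
        rng = np.random.default_rng(seed)
        data = np.asarray(data)
        k = len(data)
        # generate N resampled datasets and apply the estimator to each
        estimates = [estimator(rng.choice(data, size=k, replace=True))
                     for _ in range(n_resamples)]
        lower = 100 * (1 - level) / 2
        return np.percentile(estimates, [lower, 100 - lower])

    # toy usage: 95% interval for the mean of a small, skewed sample
    sample = [0.2, 0.5, 0.1, 3.0, 0.4, 0.8, 0.3, 1.2]
    print(bootstrap_interval(sample, np.mean))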
overview

- interval estimates
- significance testing for the accuracy
- comparing two classifiers
- p-value fishing


statistical significance testing for the accuracy

- in the assignment, you will consider two questions:
  - how sure are we that the true accuracy is different from 0.80?
  - how sure are we that classifier A is better than classifier B?
- we'll see recipes that can be used in these two scenarios
- these recipes work when we can assume that the "tests" (e.g. documents) are independent
- for tests in general, see e.g. https://en.wikipedia.org/wiki/Statistical_hypothesis_testing


comparing the accuracy to some given value

- my boss has told me to build a classifier with an accuracy of at least 0.70
- my NB classifier made 40 correct predictions out of 50
  - so the MLE of the accuracy is 0.80
- based on this experiment, how certain can I be that the accuracy is really different from 0.70?
- if the true accuracy is 0.70, how unusual is our outcome?


null hypothesis significance tests (NHST)

- we assume a null hypothesis and then see how unusual (extreme) our outcome is
  - the null hypothesis is typically "boring": e.g. the true accuracy is equal to 0.70
- the "unusualness" is measured by the p-value:
  - if the null hypothesis is true, how likely are we to see an outcome at least as unusual as the one we got?
- the traditional threshold for p-values to be considered "significant" is 0.05


the exact binomial test

- the exact binomial test is used when comparing an estimated probability/proportion (e.g. the accuracy) to some fixed value
  - 40 correct guesses out of 50: is the true accuracy really different from 0.70?
- if the null hypothesis is true, then this experiment corresponds to a binomially distributed r.v. with parameters 50 and 0.70
- we compute the p-value as the probability of getting an outcome at least as unusual as 40 (a SciPy sketch follows after the historical note below)


historical side note: sex ratio at birth

- the first known case where a p-value was computed involved an investigation of sex ratios at birth in London, published in 1710
- null hypothesis: P(boy) = P(girl) = 0.5
- result: p close to 0 (significantly more boys)
- "From whence it follows, that it is Art, not Chance, that governs."
  (Arbuthnot, An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes, 1710; https://www.york.ac.uk/depts/maths/histstat/arbuthnot.pdf)
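Returning to the exact binomial test above, here is a minimal sketch of how one might compute the p-value for the 40-out-of-50 example. This is my own illustration: it assumes SciPy ≥ 1.7, where stats.binomtest is available (older versions instead provide stats.binom_test), and the manual computation below follows the "at least as unusual" definition from the slide, which matches the library result up to floating-point details.

    from scipy import stats

    # H0: the true accuracy is 0.70; we observed 40 correct out of 50
    result = stats.binomtest(40, n=50, p=0.70, alternative='two-sided')
    print('p-value:', result.pvalue)

    # the same idea "by hand": sum the probabilities of all outcomes that are
    # at least as unusual (i.e. at most as probable under H0) as the observed one
    null_distr = stats.binom(50, 0.70)
    p_obs = null_distr.pmf(40)
    p_value = sum(null_distr.pmf(k) for k in range(51) if null_distr.pmf(k) <= p_obs)
    print('p-value (manual):', p_value)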