Statistical Methods for Data Science, Lecture 5 Interval Estimates; Comparing Systems


Richard Johansson
November 18, 2018

statistical inference: overview

- estimate the value of some parameter (last lecture):
  - what is the error rate of my drug test?
- determine some interval that is very likely to contain the true value of the parameter (today):
  - an interval estimate for the error rate
- test some hypothesis about the parameter (today):
  - is the error rate significantly different from 0.03?
  - are users significantly more satisfied with web page A than with web page B?

"recipes"

- in this lecture, we'll look at a few "recipes" that you'll use in the assignment:
  - an interval estimate for a proportion ("heads probability")
  - comparing a proportion to a specified value
  - comparing two proportions
- additionally, we'll see the standard method to compute an interval estimate for the mean of a normal
- I will also post some pointers to additional tests
- remember to check that the preconditions are satisfied: what kind of experiment? what assumptions about the data?

overview

- interval estimates
- significance testing for the accuracy
- comparing two classifiers
- p-value fishing

interval estimates

- if we get some estimate by ML, can we say something about how reliable that estimate is?
- informally, an interval estimate for the parameter p is an interval I = [p_low, p_high] such that the true value of the parameter is "likely" to be contained in I
- for instance: with 95% probability, the error rate of the spam filter is in the interval [0.05, 0.08]

frequentists and Bayesians again
- [frequentist] a 95% confidence interval I is computed using a procedure that returns intervals that contain p at least 95% of the time
- [Bayesian] a 95% credible interval I for the parameter p is an interval such that p lies in I with a probability of at least 95%

interval estimates: overview

- we will now see two recipes for computing confidence/credible intervals in specific situations:
  - for probability estimates, such as the accuracy of a classifier (to be used in the next assignment)
  - for the mean, when the data is assumed to be normal
- ... and then, a general method

the distribution of our estimator

- our ML or MAP estimator applied to randomly selected samples is a random variable with a distribution
- this distribution depends on the sample size
- a larger sample gives a more concentrated distribution

[figure: estimator distributions for p = 0.35 at sample sizes n = 10, 25, 50, 100; the distribution narrows as n grows]

confidence and credible intervals for the proportion parameter

- several recipes, see https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
- the traditional textbook method for confidence intervals is based on approximating a binomial with a normal
- instead, we'll consider a method to compute a Bayesian credible interval that does not use any approximations
- it works fine even if the numbers are small

credible intervals in Bayesian statistics

1. choose a prior distribution
2. compute a posterior distribution from the prior and the data
3. select an interval that covers e.g.
95% of the posterior distribution

recipe 1: credible interval for the estimation of a probability

- assume we carry out n independent trials, with k successes and n − k failures
- choose a Beta prior for the probability; that is, select shape parameters a and b (for a uniform prior, set a = b = 1)
- then the posterior is also a Beta, with parameters k + a and (n − k) + b
- select a 95% interval

in SciPy

- assume n_success successes out of n
- recall that we use ppf to get the percentiles!
- or, even simpler, use interval:

    from scipy import stats

    a = 1
    b = a
    n_fail = n - n_success
    posterior_distr = stats.beta(n_success + a, n_fail + b)
    p_low, p_high = posterior_distr.interval(0.95)

example: political polling

- we ask 87 randomly selected Gothenburgers whether they support the proposed aerial tramway line over the river
- 81 of them say yes
- a 95% credible interval for the popularity of the tramway is [0.857, 0.967]

    n_for = 81
    n = 87
    n_against = n - n_for
    p_mle = n_for / n
    posterior_distr = stats.beta(n_for + 1, n_against + 1)
    print('ML / MAP estimate:', p_mle)
    print('95% credible interval:', posterior_distr.interval(0.95))

don't forget your common sense

- I ask 14 Applied Data Science students whether they support free transportation between Johanneberg and Lindholmen; 12 of them say yes
- will I get a good estimate?
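We can check by plugging these numbers into recipe 1. A minimal sketch with SciPy, assuming a uniform Beta(1, 1) prior; it shows that with only 14 answers the 95% credible interval turns out very wide:

```python
from scipy import stats

# the small poll above: 12 of 14 students say yes
n = 14
n_success = 12
a = b = 1                         # uniform Beta(1, 1) prior
n_fail = n - n_success

# posterior is Beta(k + a, (n - k) + b) = Beta(13, 3)
posterior_distr = stats.beta(n_success + a, n_fail + b)
p_low, p_high = posterior_distr.interval(0.95)

print('ML estimate:', n_success / n)
print('95% credible interval:', (p_low, p_high))
```

The interval comes out roughly from 0.6 to 0.95, much wider than the tramway example above, so the point estimate alone would be misleading here.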
recipe 2: mean of a normal

- we have some sample that we assume follows some normal distribution; we don't know the mean µ or the standard deviation σ; the data points are independent
- can we make an interval estimate for the parameter µ?
- frequentist confidence intervals, but also Bayesian credible intervals, are based on the t distribution
- this is a bell-shaped distribution with longer tails than the normal
- the t distribution has a parameter called degrees of freedom (df) that controls the tails

recipe 2: mean of a normal (continued)

- x_mle is the sample mean; the size of the dataset is n; the sample standard deviation is s
- we consider a t distribution:

    posterior_distr = stats.t(loc=x_mle, scale=s/np.sqrt(n), df=n-1)

- to get an interval estimate, select a 95% interval in this distribution

example

- to demonstrate, we generate some data:

    import numpy as np
    import pandas as pd
    from scipy import stats

    x = pd.Series(np.random.normal(loc=3, scale=0.5, size=500))

- a 95% confidence/credible interval for the mean:

    mu_mle = x.mean()
    s = x.std()
    n = len(x)
    posterior_distr = stats.t(df=n-1, loc=mu_mle, scale=s/np.sqrt(n))
    print('estimate:', mu_mle)
    print('95% credible interval:', posterior_distr.interval(0.95))

alternative: estimation using bayes_mvs

- SciPy has a built-in function for the estimation of mean, variance, and standard deviation: https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.bayes_mvs.html
- 95% credible intervals for the mean and the std:

    res_mean, _, res_std = stats.bayes_mvs(x, 0.95)
    mu_est, (mu_low, mu_high) = res_mean
    sigma_est, (sigma_low, sigma_high) = res_std

recipe 3 (if we have time): brute force

- what if we have no clue about how our measurements are distributed?
  - word error rate for speech recognition
  - BLEU for machine translation

the brute-force solution to interval estimates

- the variation in our estimate depends on the distribution of possible datasets
- in theory, we could find a confidence interval by considering the distribution of all possible datasets, but this can't be done in practice
- the trick in bootstrapping (invented by Bradley Efron) is to assume that we can simulate the distribution of possible datasets by picking randomly from the original dataset

bootstrapping a confidence interval, pseudocode

- we have a dataset D consisting of k items
- we compute a confidence interval by generating N random datasets and finding the interval where most estimates end up

    repeat N times:
        D* = pick k items randomly (with replacement) from D
        m = estimate computed on D*
        store m in a list M
    return 2.5% and 97.5% percentiles of M

- see Wikipedia for different
varieties

overview

- interval estimates
- significance testing for the accuracy
- comparing two classifiers
- p-value fishing

statistical significance testing for the accuracy

- in the assignment, you will consider two questions:
  - how sure are we that the true accuracy is different from 0.80?
  - how sure are we that classifier A is better than classifier B?
- we'll see recipes that can be used in these two scenarios
- these recipes work when we can assume that the "tests" (e.g.
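Returning to recipe 3: the bootstrap pseudocode can be written as a short runnable function. This is a minimal sketch, assuming NumPy; the function name bootstrap_interval is my own, and the example data reuses the 81-of-87 counts from the polling slide as 0/1 outcomes:

```python
import numpy as np

def bootstrap_interval(data, estimator, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap: resample the dataset with replacement,
    re-run the estimator on each resample, and take percentiles."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    k = len(data)
    estimates = [estimator(rng.choice(data, size=k, replace=True))
                 for _ in range(n_resamples)]
    alpha = (1 - level) / 2
    return np.quantile(estimates, [alpha, 1 - alpha])

# example: 81 "yes" answers out of 87, encoded as a 0/1 array
outcomes = np.array([1] * 81 + [0] * 6)
low, high = bootstrap_interval(outcomes, np.mean)
print('95% bootstrap interval:', (low, high))
```

For a proportion this gives an interval close to the Beta-based one from recipe 1; the point of the bootstrap is that the same code works for any estimator, such as a word error rate or BLEU.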
