Lecture 02: Statistical Inference for Binomial Parameters


Lecture 02: Statistical Inference for Binomial Parameters
Dipankar Bandyopadhyay, Ph.D.
BMTRY 711: Analysis of Categorical Data, Spring 2011
Division of Biostatistics and Epidemiology, Medical University of South Carolina
Lecture 02: Statistical Inference for Binomial Parameters – p. 1/50

Inference for a probability

• Phase II cancer clinical trials are usually designed to see if a new, single treatment produces favorable results (proportion of successes) when compared to a known 'industry standard'.
• If the new treatment produces good results, then further testing will be done in a Phase III study, in which patients will be randomized to the new treatment or the 'industry standard'.
• In particular, n independent patients on the study are given just one treatment, and the outcome for each patient is usually

    Y_i = 1 if the new treatment shrinks the tumor (success)
        = 0 if the new treatment does not shrink the tumor (failure),    i = 1, ..., n

• For example, suppose n = 30 subjects are given Polen Springs water, and the tumor shrinks in 5 subjects.
• The goal of the study is to estimate the probability of success, get a confidence interval for it, or perform a test about it.

• Suppose we are interested in testing

    H0: p = .5,

where .5 is the probability of success on the 'industry standard'. As discussed in the previous lecture, there are three ML approaches we can consider:
• Wald test (non-null standard error)
• Score test (null standard error)
• Likelihood ratio test

Wald Test

For the hypotheses

    H0: p = p0
    HA: p ≠ p0

the Wald statistic can be written as

    z_W = (p̂ − p0) / SE(p̂) = (p̂ − p0) / sqrt(p̂(1 − p̂)/n)
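As a quick numerical illustration (not part of the original slides; the function name `wald_z` is my own), the Wald statistic above can be computed with only the Python standard library:

```python
import math

def wald_z(y, n, p0):
    """Wald statistic: non-null (estimated) standard error in the denominator."""
    p_hat = y / n
    return (p_hat - p0) / math.sqrt(p_hat * (1 - p_hat) / n)

# Polen Springs example from the slides: tumor shrinks in 5 of n = 30 subjects,
# tested against the 'industry standard' H0: p = .5
print(round(wald_z(5, 30, 0.5), 3))  # about -4.899, well below -1.96
```

Since |z_W| far exceeds 1.96, the Wald test would reject H0: p = .5 for these data.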
Score Test

Agresti equations 1.8 and 1.9 yield

    u(p0) = y/p0 − (n − y)/(1 − p0)

    ι(p0) = n / (p0(1 − p0))

    z_S = u(p0) / [ι(p0)]^{1/2} = (some algebra) = (p̂ − p0) / sqrt(p0(1 − p0)/n)

Application of Wald and Score Tests

• Suppose we are interested in testing H0: p = .5.
• Suppose Y = 2 and n = 10, so p̂ = .2.
• Then

    Z_W = (.2 − .5) / sqrt(.2(1 − .2)/10) = −2.37171

and

    Z_S = (.2 − .5) / sqrt(.5(1 − .5)/10) = −1.89737

• Here, |Z_W| > |Z_S|, and at the α = 0.05 level the statistical conclusions would differ.

Notes about Z_W and Z_S

• Under the null, Z_W and Z_S are both approximately N(0, 1). However, Z_S's sampling distribution is closer to the standard normal than Z_W's, so it is generally preferred.
• When testing H0: p = .5,

    |Z_W| ≥ |Z_S|,

i.e.,

    |(p̂ − .5) / sqrt(p̂(1 − p̂)/n)| ≥ |(p̂ − .5) / sqrt(.5(1 − .5)/n)|

• Why? Note that

    p̂(1 − p̂) ≤ .5(1 − .5),

i.e., p(1 − p) takes on its maximum value at p = .5:

    p        .10  .20  .30  .40  .50  .60  .70  .80  .90
    p(1−p)   .09  .16  .21  .24  .25  .24  .21  .16  .09

• Since the denominator of Z_W is never larger than the denominator of Z_S, |Z_W| ≥ |Z_S|.

• Under the null, p = .5, so

    p̂(1 − p̂) ≈ .5(1 − .5),

and thus

    |Z_S| ≈ |Z_W|

• However, under the alternative, HA: p ≠ .5, Z_S and Z_W could be very different, and since |Z_W| ≥ |Z_S|, the test based on Z_W is more powerful (when testing against a null of 0.5).

• For the general test H0: p = p0, for a specified value p0, the two test statistics are

    Z_S = (p̂ − p0) / sqrt(p0(1 − p0)/n)

and

    Z_W = (p̂ − p0) / sqrt(p̂(1 − p̂)/n)

• For this general test, there is no strict rule that |Z_W| ≥ |Z_S|.
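The Y = 2, n = 10 example can be checked numerically. A minimal sketch (function names `wald_z` and `score_z` are mine, not from the slides), using only the standard library:

```python
import math

def wald_z(y, n, p0):
    # non-null (estimated) standard error in the denominator
    p_hat = y / n
    return (p_hat - p0) / math.sqrt(p_hat * (1 - p_hat) / n)

def score_z(y, n, p0):
    # null standard error in the denominator
    p_hat = y / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Slide example: Y = 2 successes in n = 10 trials, H0: p = .5
zw, zs = wald_z(2, 10, 0.5), score_z(2, 10, 0.5)
print(round(zw, 5), round(zs, 5))      # -2.37171 -1.89737
# |Z_W| exceeds 1.96 but |Z_S| does not, so the two tests disagree at alpha = .05:
print(abs(zw) > 1.96, abs(zs) > 1.96)  # True False
```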
Likelihood-Ratio Test

• It can be shown that

    2 log[ L(p̂ | HA) / L(p0 | H0) ] = 2[ log L(p̂ | HA) − log L(p0 | H0) ] ~ χ²(1 df),

where L(p̂ | HA) is the likelihood after replacing p by its estimate, p̂, under the alternative (HA), and L(p0 | H0) is the likelihood after replacing p by its specified value, p0, under the null (H0).

Likelihood Ratio for Binomial Data

• For the binomial, recall that the log-likelihood equals

    log L(p) = log C(n, y) + y log p + (n − y) log(1 − p)

• Suppose we are interested in testing H0: p = .5 versus HA: p ≠ .5.
• The likelihood ratio statistic generally is only for a two-sided alternative (recall it is χ²-based).
• Under the alternative,

    log L(p̂ | HA) = log C(n, y) + y log p̂ + (n − y) log(1 − p̂)

• Under the null,

    log L(.5 | H0) = log C(n, y) + y log .5 + (n − y) log(1 − .5)

Then the likelihood ratio statistic is

    2[ log L(p̂ | HA) − log L(.5 | H0) ]
        = 2[ y log(p̂ / .5) + (n − y) log((1 − p̂) / (1 − .5)) ]
        = 2[ y log(y / (.5 n)) + (n − y) log((n − y) / ((1 − .5) n)) ],

which is approximately χ²(1 df).

Example

• Recall from the previous example, Y = 2 and n = 10, so p̂ = .2.
• Then the likelihood ratio statistic is

    2[ 2 log(.2/.5) + 8 log(.8/.5) ] = 3.85490    (p = 0.049601)

• Recall, both Z_W and Z_S are N(0, 1), and the square of a N(0, 1) is chi-square with 1 df.
• Thus the likelihood ratio statistic is on the same scale as Z_W² and Z_S², since both Z_W² and Z_S² are chi-square with 1 df.
• For this example,

    Z_S² = [ (.2 − .5) / sqrt(.5(1 − .5)/10) ]² = 3.6

and

    Z_W² = [ (.2 − .5) / sqrt(.2(1 − .2)/10) ]² = 5.625

• The likelihood ratio statistic lies between Z_S² and Z_W².
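The likelihood ratio computation above can be sketched in Python; `lr_stat` is my own name, and the χ²(1 df) upper-tail probability is obtained via the standard-library identity P(χ²₁ > x) = erfc(sqrt(x/2)):

```python
import math

def lr_stat(y, n, p0):
    """2 * [log L(p_hat) - log L(p0)] for the binomial; assumes 0 < y < n."""
    p_hat = y / n
    return 2 * (y * math.log(p_hat / p0)
                + (n - y) * math.log((1 - p_hat) / (1 - p0)))

# Slide example: Y = 2, n = 10, H0: p = .5
g2 = lr_stat(2, 10, 0.5)
p_value = math.erfc(math.sqrt(g2 / 2))  # chi-square(1 df) upper-tail probability
print(round(g2, 4), round(p_value, 4))  # 3.8549 0.0496
```

Note the statistic (3.8549) indeed falls between Z_S² = 3.6 and Z_W² = 5.625 from the slides.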
Likelihood Ratio Statistic

For the general test H0: p = p0, the likelihood ratio statistic is

    2[ y log(p̂ / p0) + (n − y) log((1 − p̂) / (1 − p0)) ] ~ χ²(1 df)

asymptotically under the null.

Large Sample Confidence Intervals

• In large samples, since

    p̂ ~ N(p, p(1 − p)/n),

we can obtain a 95% confidence interval for p with

    p̂ ± 1.96 sqrt(p̂(1 − p̂)/n)

• However, since 0 ≤ p ≤ 1, we would want the endpoints of the confidence interval to be in [0, 1], but the endpoints of this confidence interval are not restricted to be in [0, 1].
• When p is close to 0 or 1 (so that p̂ will usually be close to 0 or 1), and/or in small samples, we could get endpoints outside of [0, 1]. The solution would be to truncate the interval endpoint at 0 or 1.

Example

• Suppose n = 10 and Y = 1. Then

    p̂ = 1/10 = .1

and the 95% confidence interval is

    p̂ ± 1.96 sqrt(p̂(1 − p̂)/n),
    .1 ± 1.96 sqrt(.1(1 − .1)/10),
    [−.0859, .2859]

• After truncating, you get [0, .2859].

Exact Test Statistics and Confidence Intervals

Unfortunately, many of the Phase II trials have small samples, and the above asymptotic test statistics and confidence intervals have very poor properties in small samples. (A 95% confidence interval may have only 80% coverage; see Figure 1.3 in Agresti.) In this situation, 'exact' test statistics and confidence intervals can be obtained.

One-sided Exact Test Statistic

• The historical norm for the clinical trial you are doing is 50%, so you want to test whether the response rate of the new treatment is greater than 50%.
• In general, you want to test H0: p = p0 = 0.5 versus HA: p > p0 = 0.5.
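A minimal sketch of the truncated Wald interval described above (the function name `wald_ci` is mine, not from the slides):

```python
import math

def wald_ci(y, n, z=1.96):
    """Large-sample 95% Wald interval, truncated to [0, 1] as in the slides."""
    p_hat = y / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

# Slide example: Y = 1 success in n = 10 trials
lo, hi = wald_ci(1, 10)
print(round(lo, 4), round(hi, 4))  # 0.0 0.2859  (lower endpoint truncated at 0)
```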
• The test statistic is Y, the number of successes out of n trials. Suppose you observe y_obs successes. Under the null hypothesis,

    np̂ = Y ~ Bin(n, p0),

i.e.,

    P(Y = y | H0: p = p0) = C(n, y) p0^y (1 − p0)^(n − y)

• When would you tend to reject H0: p = p0 in favor of HA: p > p0?

Answer

Under H0: p = p0, you would expect p̂ ≈ p0 (Y ≈ np0). Under HA: p > p0, you would expect p̂ > p0 (Y > np0); i.e., you would expect Y to be 'large' under the alternative.

Exact one-sided p-value

• If you observe y_obs successes, the exact one-sided p-value is the probability of getting the observed y_obs plus any larger (more extreme) Y:

    p-value = P(Y ≥ y_obs | H0: p = p0)
            = Σ_{j = y_obs to n} C(n, j) p0^j (1 − p0)^(n − j)
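The exact one-sided p-value is just a binomial tail sum, which can be sketched directly (the function name and the 8-out-of-10 illustration are mine, not from the slides):

```python
import math

def exact_one_sided_p(y_obs, n, p0):
    """P(Y >= y_obs | p = p0) under Y ~ Bin(n, p0)."""
    return sum(math.comb(n, j) * p0**j * (1 - p0)**(n - j)
               for j in range(y_obs, n + 1))

# e.g. 8 successes in 10 trials against H0: p = .5 versus HA: p > .5
print(round(exact_one_sided_p(8, 10, 0.5), 5))  # 0.05469  (= 56/1024)
```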