Chapter 1 Introduction

Yibi Huang
Department of Statistics
University of Chicago

Outline

• Variable types
• Review of binomial and multinomial distributions
• Likelihood and maximum likelihood method
• Inference for a binomial proportion (Wald, score, and likelihood ratio tests and confidence intervals)
• Small-sample inference

Variable Types

Regression methods are used to analyze data when the response variable is numerical,
• e.g., temperature, blood pressure, heights, speeds, income
• Stat 22200, Stat 22400

Methods in categorical data analysis are used when the response variable takes categorical (or qualitative) values,
• e.g., gender (male, female), political philosophy (liberal, moderate, conservative), region (metropolitan, urban, suburban, rural)
• Stat 22600

In either case, the explanatory variables can be numerical or categorical.

Two Types of Categorical Variables

Nominal: unordered categories, e.g.,
• transport to work (car, bus, bicycle, walk, other)
• favorite music (rock, hip hop, pop, classical, jazz, country, folk)

Ordinal: ordered categories, e.g.,
• patient condition (excellent, good, fair, poor)
• government spending (too high, about right, too low)

We pay special attention to binary variables (success or failure), for which the nominal-ordinal distinction is unimportant.

Review of Binomial and Multinomial Distributions

Binomial Distributions (Review)

If n Bernoulli trials are performed:
• there are only two possible outcomes for each trial (success, failure),
• π = P(success) and 1 − π = P(failure) for each trial,
• the trials are independent,
• Y = number of successes out of n trials,

then Y has a binomial distribution, denoted Y ∼ binomial(n, π). The probability function of Y is

P(Y = y) = \binom{n}{y} \pi^{y} (1 - \pi)^{n-y}, \qquad y = 0, 1, \ldots, n,

where \binom{n}{y} = \frac{n!}{y!(n-y)!} is the binomial coefficient and m! = "m factorial" = m × (m − 1) × (m − 2) × ··· × 1. Note that 0! = 1.

Example
Vote (Dem, Rep). Suppose π = Pr(Dem) = 0.4. Sample n = 3 voters and let y be the number of Dem votes among them. Then

P(y) = \frac{n!}{y!(n-y)!} \pi^{y} (1 - \pi)^{n-y} = \frac{3!}{y!(3-y)!} (0.4)^{y} (0.6)^{3-y}

P(0) = \frac{3!}{0!\,3!} (0.4)^{0} (0.6)^{3} = (0.6)^{3} = 0.216

P(1) = \frac{3!}{1!\,2!} (0.4)^{1} (0.6)^{2} = 3(0.4)(0.6)^{2} = 0.432

  y      P(y)
  0      0.216
  1      0.432
  2      0.288
  3      0.064
  total  1

R Codes

> dbinom(x=0, size=3, p=0.4)
[1] 0.216
> dbinom(0, 3, 0.4)
[1] 0.216
> dbinom(1, 3, 0.4)
[1] 0.432
> dbinom(0:3, 3, 0.4)
[1] 0.216 0.432 0.288 0.064
> plot(0:3, dbinom(0:3, 3, .4), type = "h", xlab = "y", ylab = "P(y)")

[Figure: probability function P(y) of the binomial(3, 0.4) distribution]

Facts About the Binomial Distribution

If Y is a binomial(n, π) random variable, then
• E(Y) = nπ,
• σ(Y) = √Var(Y) = √(nπ(1 − π)),
• binomial(n, π) can be approximated by Normal(nπ, nπ(1 − π)) when n is large (nπ > 5 and n(1 − π) > 5).

[Figure: probability functions of binomial(n = 8, π = 0.2) and binomial(n = 25, π = 0.2), illustrating the normal approximation for larger n]

Multinomial Distribution — Generalization of Binomial

If n trials are performed:
• in each trial there are c > 2 possible outcomes (categories),
• π_i = P(category i) for each trial, with π_1 + ··· + π_c = 1,
• the trials are independent,
• Y_i = number of trials that fall in category i out of n trials,

then the joint distribution of (Y_1, Y_2, ..., Y_c) is a multinomial distribution, with probability function

P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_c = y_c) = \frac{n!}{y_1!\, y_2! \cdots y_c!}\, \pi_1^{y_1} \pi_2^{y_2} \cdots \pi_c^{y_c},

where 0 ≤ y_i ≤ n for all i and y_1 + ··· + y_c = n.

Example
Suppose the proportions of individuals with genotypes AA, Aa, and aa in a large population are (π_AA, π_Aa, π_aa) = (0.25, 0.5, 0.25). Randomly sample n = 5 individuals from the population. The chance of getting 2 AA's, 2 Aa's, and 1 aa is

P(Y_{AA} = 2, Y_{Aa} = 2, Y_{aa} = 1) = \frac{5!}{2!\,2!\,1!} (0.25)^{2}(0.5)^{2}(0.25)^{1} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(2 \cdot 1)(2 \cdot 1)(1)} (0.25)^{2}(0.5)^{2}(0.25)^{1} \approx 0.117,

and the chance of getting no AA's, 3 Aa's, and 2 aa's is

P(Y_{AA} = 0, Y_{Aa} = 3, Y_{aa} = 2) = \frac{5!}{0!\,3!\,2!} (0.25)^{0}(0.5)^{3}(0.25)^{2} = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(1)(3 \cdot 2 \cdot 1)(2 \cdot 1)} (0.25)^{0}(0.5)^{3}(0.25)^{2} \approx 0.078.
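The slides show R code only for the binomial probabilities; the multinomial example can be checked the same way with R's dmultinom function (a small sketch added here, not part of the original slides; the values match the hand calculations above):

> dmultinom(c(2, 2, 1), prob = c(0.25, 0.5, 0.25))
[1] 0.1171875
> dmultinom(c(0, 3, 2), prob = c(0.25, 0.5, 0.25))
[1] 0.078125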
Facts About the Multinomial Distribution

If (Y_1, Y_2, ..., Y_c) has a multinomial distribution with n trials and category probabilities (π_1, π_2, ..., π_c), then
• E(Y_i) = nπ_i for i = 1, 2, ..., c,
• σ(Y_i) = √Var(Y_i) = √(nπ_i(1 − π_i)),
• Cov(Y_i, Y_j) = −nπ_iπ_j.

Likelihood and Maximum Likelihood Estimation

A Probability Question
A push pin is tossed n = 5 times. Let Y be the number of times the push pin lands on its head. What is P(Y = 3)?

Answer. As the tosses are independent, Y is binomial(n = 5, π):

P(Y = y; \pi) = \frac{n!}{y!(n-y)!} \pi^{y} (1 - \pi)^{n-y},

where π = P(push pin lands on its head in a toss). If π is known to be 0.4, then

P(Y = 3; \pi) = \frac{5!}{3!\,2!} (0.4)^{3}(0.6)^{2} = 0.2304.

A Statistics Question
Suppose a push pin is observed to land on its head Y = 8 times in n = 20 tosses. Can we infer the value of π = P(push pin lands on its head in a toss)?

The chance of observing Y = 8 in n = 20 tosses is

P(Y = 8; \pi) = \binom{20}{8} (0.3)^{8}(0.7)^{12} \approx 0.1143 \quad \text{if } \pi = 0.3,

P(Y = 8; \pi) = \binom{20}{8} (0.6)^{8}(0.4)^{12} \approx 0.0354 \quad \text{if } \pi = 0.6.

It appears that π = 0.3 is more plausible than π = 0.6, since the former gives a higher probability to the observed outcome Y = 8.

Likelihood

The probability

P(Y = y; \pi) = \binom{n}{y} \pi^{y} (1 - \pi)^{n-y} = \ell(\pi \mid y),

viewed as a function of π, is called the likelihood function (or just likelihood) of π, denoted ℓ(π|y). It is a measure of the "plausibility" of a value being the true value of π.

[Figure: curves of the likelihood ℓ(π|y) at different values of y (y = 0, 2, 8, 14) for n = 20]

In general, suppose the observed data (Y_1, Y_2, ..., Y_n) have a joint probability distribution with some parameter(s) θ,

P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n) = f(y_1, y_2, \ldots, y_n \mid \theta).

The likelihood function for the parameter θ is

\ell(\theta) = \ell(\theta \mid y_1, y_2, \ldots, y_n) = f(y_1, y_2, \ldots, y_n \mid \theta).

• Note that the likelihood function regards the probability as a function of the parameter θ rather than as a function of the data y_1, y_2, ..., y_n.
• If ℓ(θ_1 | y_1, ..., y_n) > ℓ(θ_2 | y_1, ..., y_n), then θ_1 appears more plausible to be the true value of θ than θ_2 does, given the observed data y_1, ..., y_n.

Maximum Likelihood Estimate (MLE)

The maximum likelihood estimate (MLE) of a parameter θ is the value at which the likelihood function is maximized.

Example. If a push pin lands on its head Y = 8 times in n = 20 tosses, the likelihood function

\ell(\pi \mid y = 8) = \binom{20}{8} \pi^{8} (1 - \pi)^{12}

reaches its maximum at π = 0.4, so given the data Y = 8 the MLE for π is π̂ = 0.4.

[Figure: the likelihood ℓ(π|y = 8) for n = 20, maximized at π = 0.4]

Maximizing the Log-likelihood

Rather than maximizing the likelihood, it is usually computationally easier to maximize its logarithm, called the log-likelihood, log ℓ(π|y). This is equivalent because the logarithm is strictly increasing,

x_1 > x_2 \iff \log(x_1) > \log(x_2),

so

\ell(\pi_1 \mid y) > \ell(\pi_2 \mid y) \iff \log \ell(\pi_1 \mid y) > \log \ell(\pi_2 \mid y).
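The slides do not include code for this example; one way to reproduce the likelihood curve and find its maximizer numerically in R (a sketch added here, not part of the original slides) is:

# likelihood of pi for y = 8 heads in n = 20 tosses
lik <- function(p) dbinom(8, size = 20, prob = p)
# draw the likelihood curve over 0 <= pi <= 1
curve(lik(x), from = 0, to = 1, xlab = "pi", ylab = "likelihood")
# numerical maximizer, approximately 0.4 = 8/20
optimize(lik, interval = c(0, 1), maximum = TRUE)$maximum

The numerical answer agrees with the closed-form MLE π̂ = y/n derived in the next example.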
Example (MLE for Binomial)

If the observed data Y ∼ binomial(n, π) but π is unknown, the likelihood of π is

\ell(\pi \mid y) = P(Y = y \mid \pi) = \binom{n}{y} \pi^{y} (1 - \pi)^{n-y}

and the log-likelihood is

\log \ell(\pi \mid y) = \log \binom{n}{y} + y \log(\pi) + (n - y) \log(1 - \pi).

From calculus, we know a function f(x) reaches its maximum at x = x_0 if \frac{d}{dx} f(x) = 0 at x = x_0 and \frac{d^2}{dx^2} f(x) < 0 at x = x_0. As

\frac{d}{d\pi} \log \ell(\pi \mid y) = \frac{y}{\pi} - \frac{n - y}{1 - \pi} = \frac{y - n\pi}{\pi(1 - \pi)}

equals 0 when π = y/n, and

\frac{d^2}{d\pi^2} \log \ell(\pi \mid y) = -\frac{y}{\pi^2} - \frac{n - y}{(1 - \pi)^2} < 0

always holds, log ℓ(π|y) reaches its maximum when π = y/n. So the MLE of π is π̂ = y/n.

More Facts about MLEs

• If Y_1, Y_2, ..., Y_n are i.i.d. N(µ, σ²), the MLE of µ is the sample mean \sum_{i=1}^{n} Y_i / n.
• In ordinary linear regression,

Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i,

when the noise ε_i are i.i.d. normal, the usual least squares estimates for β_0, β_1, ..., β_p are MLEs (a numerical check appears in the sketch at the end of this section).

Large Sample Optimality of MLEs

MLEs are not always the best estimators, but they have a number of good properties.
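As noted above, the fact that least squares coincides with maximum likelihood under i.i.d. normal errors can be checked numerically. The following R sketch (added here, not part of the original slides; the simulated data, sample size, and coefficient values are illustrative) compares the least squares fit from lm with a direct numerical maximization of the normal log-likelihood:

set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50)              # simulated data with i.i.d. normal noise
fit <- lm(y ~ x)                         # least squares estimates of beta0, beta1

# minus log-likelihood of (beta0, beta1, log sigma) under i.i.d. normal errors
negloglik <- function(b)
  -sum(dnorm(y, mean = b[1] + b[2] * x, sd = exp(b[3]), log = TRUE))
mle <- optim(c(0, 0, 0), negloglik)      # numerical MLE

coef(fit)                                # least squares estimates
mle$par[1:2]                             # MLE of (beta0, beta1): agrees up to numerical error

The agreement illustrates why, with normal errors, maximizing the likelihood and minimizing the sum of squared residuals lead to the same coefficient estimates.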