Advanced Regression Methods Symposium on Updates on Clinical Research Methodology March 18, 2013

Lloyd Mancl, PhD, Oral Health Sciences, University of Washington
Charles Spiekerman, PhD, Oral Health Sciences, University of Washington

Outline
• Introduce common regression methods for different types of outcomes
  – Logistic regression
  – Multiple linear regression
  – Cox proportional hazards regression
  – Poisson or log-linear regression
• Uses of regression:
  – Adjust for confounding
  – Assess for effect modification or interaction
  – Account for non-independent outcomes

Uses for Multiple regression analysis
• Used to adjust for confounding
  – In observational studies, groups of interest can differ on other variables that may be related to the outcome.
  – In an RCT, randomization may not result in balanced groups.
• Used to assess simultaneously the associations for several explanatory variables, as well as interactions between variables
  – In observational studies, you may be interested in how several explanatory variables are related to an outcome.
  – In an RCT, we are often interested in testing if treatment is modified by another variable (i.e., interaction or moderation).
  – In designed experiments, we typically test for interactions between the different study factors.
• Used to develop a prediction equation
  – Not commonly used for this purpose; not covered in this workshop.

Multiple regression analysis
Models the association between one outcome variable, Y, and multiple variables of interest, X1, X2, …, Xk.
Multiple Linear regression model
  Y = α + β1X1 + β2X2 + … + βkXk + random error

Generalized Linear Model
  G(Y) ~ α + β1X1 + β2X2 + … + βkXk
(basically, a more complicated version of the outcome, Y, is related to a linear combination of the variables of interest)

Common Multiple Regression Methods

  Outcome variable           Regression method                      Regression results
  Quantitative/continuous    Linear regression                      Slopes & difference between means
  Binary (2 categories)      Logistic regression                    Odds ratio
  Count or count rate        Poisson or log-linear regression       Relative risk or rate ratio
  Time to an event           Cox proportional hazards regression    Hazard ratio
  Ordinal (>2 categories)    Ordinal logistic regression            Odds ratio
  Nominal (>2 categories)    Multinomial logistic regression        Odds ratio

Regression method depends on the outcome
• Continuous or quantitative outcome – linear regression
  – Amount of attachment loss (mm)
  – Change in dmfs
• Binary outcome – logistic regression
  – Any new decay
  – Incident TMD
• Time to event outcome – Cox proportional hazards regression
  – Time to tooth loss
  – Time to pulp cap failure
• Count outcome – Poisson or log-linear regression
  – Number of new caries
  – Rate of new caries

Example: heart disease and periodontitis
• NHANES II – observational study that examined a large number of participants at a baseline visit and followed them for over 10 years to ascertain morbid events.
• We are interested in assessing the association between periodontal disease evaluated at baseline and the occurrence of heart disease (CHD) within 10 years of study entry.

Logistic regression
In this analysis the outcome variable, CHD incidence, is a binary variable, so a regression method we could employ is logistic regression.

CHD risk by exposure group

  Group            Healthy gums    Periodontal disease    Odds Ratio    95% Conf. Int.
  CHD incidence    4.9%            13.5%                  3.0           (2.5, 3.7)

Logistic regression uses the Odds Ratio as an estimate of association between your independent variable and the outcome.

Confounding
• Confounding occurs when there is a third variable that is strongly related to both the dependent and the independent variable.
• This can bias an estimate of association.
[Diagram of the relationships among Y, X1 and X2]
• With respect to Periodontitis and CHD, an obvious potential confounder is Age.
• We can adjust for the potential confounding effects by entering Age into the logistic regression model as an additional independent variable.

CHD incidence by periodontal disease

Logistic Regression with Periodontal disease as the only independent variable

  Independent Variable       Odds Ratio    95% Conf. Int.
  Healthy Gums               1             -
  Periodontal Disease        3.0           (2.5, 3.7)

Logistic Regression model with Age added

  Independent Variable       Odds Ratio    95% Conf. Int.
  Healthy Gums               1             -
  Periodontal Disease        1.6           (1.3, 2.0)
  Age (10 year increment)    2.1           (1.9, 2.2)

• By controlling for Age the estimated association of Periodontal disease with CHD is less strong.
• The association is still statistically significant (confidence interval does not contain 1).
• The Age output indicates 2.1 times higher odds of CHD associated with 10 years greater Age.

Effect Modification / Interaction
• An interaction is when the association or relationship between an explanatory variable and the outcome variable depends on the value of another explanatory variable.
• Also called effect modification or moderation.
• In extreme cases, an interaction may completely reverse the relationship between the explanatory variable and outcome.
• More commonly, the effect is stronger (or weaker) depending on the value of another explanatory variable.
• Stratification can be used to identify an interaction.
• Can use regression to test for an interaction by adding an interaction term/variable in the regression model, which is the product of two explanatory variables.

Chewing Gum Study
• Subjects randomly assigned to 3 different chewing gums
• Outcome was continuous, change in DMFS
• Linear regression used to compare the 3 groups, adjusting for baseline DMFS

  Group    n     DMFS Change, Mean (SD)    Baseline DMFS, Mean (SD)
  A        25    -0.72 (5.37)              4.68 (1.02)
  B        35    -0.83 (3.57)              3.77 (0.55)
  C        40     2.63 (3.80)              3.67 (0.57)

Linear regression results

  Coefficient       Estimate    Standard Error    P-value
  Intercept          1.30       0.90              .15
  Group B (vs A)    -0.50       1.01              .62
  Group C (vs A)     2.91       0.98              .004
  Baseline DMFS     -0.43       0.10              <.001

Group main effect, p-value <.001
• Model estimates a constant group difference
• DMFS change 2.91 greater for Group C than Group A
• DMFS change 0.50 lower for Group B than Group A (not statistically significant, p = .62)

[Figure: scatterplot of change in DMFS (y-axis) versus baseline DMFS (x-axis) for the treatment groups (A, B, C)]

• Group x baseline DMFS interaction added to the linear regression model to test if group differences are affected by baseline DMFS

  Coefficient                Estimate    Standard Error    P-value
  Intercept                   2.96       0.97              .003
  Group B (vs A)             -1.53       1.33              .25
  Group C (vs A)             -0.79       1.26              .53
  Baseline DMFS              -0.79       0.14              <.001
  Group B x Baseline DMFS     0.19       0.23              .42
  Group C x Baseline DMFS     0.91       0.21              <.001

• Group x Baseline DMFS interaction, p-value <.001
• Difference between Group C and A increases with baseline DMFS

[Figure: the same scatterplot of change in DMFS versus baseline DMFS for the treatment groups (A, B, C), repeated to show the difference between Group C and A increasing with baseline DMFS]

Effect Modification / Interaction
• Test for interactions after all main effects are included in the regression model.
• Typically, only assess for two-way interactions.
• Usually only test interactions when at least one of the variables has a significant main effect.
  – (Exceptions for designed experiments involving a small number of factors, where all possible interactions may be assessed.)
• A lower significance level (e.g., p<0.01) may be used when testing a large number of interactions, to control type I error due to multiple comparisons.

Survival Analysis: Time to event data
• In some studies the outcome of interest is the time until an event
  – Time to implant failure
  – Time to tooth loss
  – Time to death
• Analyses of this type of data are commonly called “Survival Analysis”

Censored events
• In most time to event studies a non-trivial portion of the events will not be observed because they don’t occur during the period of observation.
• These unobserved events are considered “censored”.
• For the censored events we don’t know the actual time until the event, but we do know that the time until event is at least as great as the time until the patient was last seen.
• Survival analysis uses the information from both the complete observations and the censored observations, rather than discarding the censored ones.

Censored events
[Figure: patient timelines from enrollment date to event date, plotted against time from start of enrollment to end of study]
• The 2nd, 4th and 5th patients have censored times to event.
• For the 1st, 3rd and 6th patients, we know the exact times to event.

Periodontal disease and tooth loss
• One hundred periodontal patients under maintenance care*.
• Interest in assessing factors associated with tooth loss.
• Outcome is time to loss of tooth.
• We will look at oral hygiene, patient age, and smoking.
*M. McGuire & M. Nunn, J Periodontology, 1996; 67:666-674.
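For a time-to-event outcome such as time to tooth loss, each patient's record can be reduced to a follow-up time plus an indicator of whether the event was actually observed. The following is a minimal sketch of that bookkeeping in Python; the function name `observed_time`, the dates, and the choice of 365.25 days per year are illustrative assumptions, not part of the original slides or the McGuire & Nunn data.

```python
from datetime import date
from typing import Optional, Tuple


def observed_time(enrolled: date, event: Optional[date], last_seen: date) -> Tuple[float, bool]:
    """Return (follow-up time in years, event_observed) for one patient.

    If the event happened on or before the last visit, the exact time is known.
    Otherwise the time is censored: we only know it exceeds the follow-up time.
    """
    if event is not None and event <= last_seen:
        return (event - enrolled).days / 365.25, True
    return (last_seen - enrolled).days / 365.25, False


# Hypothetical patients (made-up dates, for illustration only).
# Patient 1: event observed during follow-up.
t1, seen1 = observed_time(date(2000, 1, 1), date(2002, 1, 1), date(2005, 1, 1))
# Patient 2: no event by the last visit, so the time is censored.
t2, seen2 = observed_time(date(2001, 1, 1), None, date(2005, 1, 1))

print(round(t1, 2), seen1)
print(round(t2, 2), seen2)
```

The pair (time, event indicator) is the usual input format for survival methods; a censored patient contributes the information "event-free for at least this long".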
Kaplan-Meier survival plots
[Figure: three Kaplan-Meier plots of survival probability (y-axis, 0.92 to 1.00) against time in years (x-axis, 0 to 15). Panels: Hygiene (good or fair vs. poor), Age (less than 50 vs. 50 or older), Smoking (not smoker vs. smoker)]

Kaplan-Meier plots present estimates of the survival function, S(t).
S(t) = Probability of surviving to time t

Cox proportional hazards regression
• If some simplifying assumptions hold, then one can compare survival probabilities using a regression framework.
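As a rough illustration of how an estimate of S(t) like those in Kaplan-Meier plots is obtained from censored data, here is a minimal sketch of the product-limit calculation in plain Python. The function name `kaplan_meier` and the six follow-up times are hypothetical and chosen only for illustration; a real analysis would normally use an established statistics package.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time where at least one event occurred.

    times  : follow-up time for each subject
    events : True if the event was observed, False if the time is censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)       # subjects leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Events and censored subjects at time t both leave the risk set.
        at_risk -= leaving[t]
    return curve


# Hypothetical data: six subjects, three observed events, three censored times.
times = [2, 3, 3, 5, 8, 10]
events = [True, False, True, True, False, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored subjects never lower the curve themselves, but they do count in the risk set up to the time they were last seen, which is how the method uses their partial information.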