Njit-Etd2007-041

Total Page:16

File Type:pdf, Size:1020Kb

Copyright Warning & Restrictions

The copyright law of the United States (Title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material.

Under certain conditions specified in the law, libraries and archives are authorized to furnish a photocopy or other reproduction. One of these specified conditions is that the photocopy or reproduction is not to be “used for any purpose other than private study, scholarship, or research.” If a user makes a request for, or later uses, a photocopy or reproduction for purposes in excess of “fair use”, that user may be liable for copyright infringement.

This institution reserves the right to refuse to accept a copying order if, in its judgment, fulfillment of the order would involve violation of copyright law.

Please Note: The author retains the copyright while the New Jersey Institute of Technology reserves the right to distribute this thesis or dissertation.

Printing note: If you do not wish to print this page, then select “Pages from: first page # to: last page #” on the print dialog screen.

The Van Houten library has removed some of the personal information and all signatures from the approval page and biographical sketches of theses and dissertations in order to protect the identity of NJIT graduates and faculty.
ABSTRACT

PROBLEMS RELATED TO EFFICACY MEASUREMENT AND ANALYSES

by
Sibabrata Banerjee

In clinical research it is very common to compare two treatments on the basis of an efficacy variable. More specifically, if X and Y denote the responses of patients on the two treatments A and B respectively, the quantity P(Y > X), which can be called the probability index for the Effect Size, is of interest in clinical statistics. The objective of this study is to derive an efficacy measure that compares two treatments more informatively and objectively than the earlier approaches.

Kernel density estimation is a useful non-parametric method that has not been well utilized as an applied statistical tool, mainly due to its computational complexity. The current study shows that this method is robust even under the correlation structures that arise during the computation of all possible differences. The kernel method can be applied to the estimation of the ROC (Receiver Operating Characteristic) curve as well as to the implementation of non-parametrized regression of the ROC. The area under the ROC curve (AUC), which is exactly equal to the quantity P(Y > X), is also explored in this dissertation. The methodology used for this study can be generalized to other areas of application.

PROBLEMS RELATED TO EFFICACY MEASUREMENT AND ANALYSES

by
Sibabrata Banerjee

A Dissertation Submitted to the Faculty of New Jersey Institute of Technology in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Mathematical Sciences

Department of Mathematical Sciences

May 2007

Copyright © 2007 by Sibabrata Banerjee
ALL RIGHTS RESERVED

APPROVAL PAGE

PROBLEMS RELATED TO EFFICACY MEASUREMENT AND ANALYSES

Sibabrata Banerjee

Dr. Sunil K. Dhar, Dissertation Advisor, Associate Professor, Mathematical Sciences, NJIT
Dr. Manish Bhattacharjee, Committee Member, Professor, Mathematical Sciences, NJIT
Dr. Farid Kianifard, Committee Member, Senior Associate Director, Biometrics, Novartis Pharma, NJ
Dr. Sandip Sinharay, Committee Member, Research Scientist, R&D, Educational Testing Service, NJ
Dr. Thomas Spencer I, Committee Member, Professor, School of Management, Walden University, NJ
Dr.
Kaushik Ghosh, Committee Member, Assistant Professor, Mathematical Sciences, NJIT

BIOGRAPHICAL SKETCH

Author: Sibabrata Banerjee
Degree: Doctor of Philosophy
Date: May 2007

Undergraduate and Graduate Education:
• Doctor of Philosophy in Mathematical Sciences, New Jersey Institute of Technology, Newark, NJ, 2007
• Master of Science in Statistics, University of Maryland, Baltimore, MD, 2003
• Master of Statistics, Indian Statistical Institute, Kolkata, India, 1997
• Bachelor of Statistics, Indian Statistical Institute, Kolkata, India, 1995

Major: Mathematical Sciences

Presentations and Publications:
Banerjee S. and Dhar, Sunil, "Problems Related to Efficacy Measurement and Analysis," Joint Statistical Meeting (JSM), Seattle, Washington, 2006.
Banerjee S. and Boo, Wonsuk, "A Comparison Study of Models for the Human Sex Ratio," Joint Statistical Meeting (JSM), Minneapolis, Minnesota, 2005.
Banerjee S. and Roy, Anindya, "Finite Sample Efficiency of Local Linear Estimation in EXPAR Models," Frontiers in Applied and Computational Mathematics (ACM) Conference, Newark, NJ, 2004.

To my beloved parents, to my lovely wife, my dear sister and my dear parents-in-law

ACKNOWLEDGMENT

I would like to express my sincere gratitude to Dr. Sunil Dhar, who introduced me to this problem. He also gave me initial ideas and suggestions. Nothing would have been possible without his constant encouragement and support. It would have been impossible to have done anything without Dr. Manish Bhattacharjee, who always had his door open when I needed help. I would also like to thank him for posing problems and, rather than giving the solution, offering strong hints that would lead to the solution. Dr. Kianifard deserves special thanks for his new approach toward solving the problem, shown in one of his papers. He has also given me a number of pointers and ideas for research, and he followed the work very well. Whatever I have managed to learn about the field of research, I owe to him. Dr. Thomas Spencer donated several useful books on mathematics and statistics from his personal collection; my sincere thanks to him. I would also like to thank Dr. Sandip Sinharay for his technical guidance from time to time, and I am very fortunate to have him on my committee. Dr. Kaushik Ghosh helped me with both technical and non-technical matters. My wife, being a statistician herself, helped me with her input from time to time; almost everything that I have managed to learn about technical presentation is due to her. I am also thankful to the Department of Mathematical Sciences and the University for supporting me and for the computing and other facilities available to me.

TABLE OF CONTENTS

Chapter
1 INTRODUCTION
1.1 Objective
1.2 Background Information
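2 NON-PARAMETRIC ESS
2.1 Defining the Problem
2.2 Density Estimation Method
2.3 Survival Analysis
2.4 Receiver Operating Characteristic Curve
3 PARAMETRIC ESS
3.1 Exponential UMVUE for ES
3.2 Monte Carlo Simulation of the Estimator
4 KERNEL DENSITY ESTIMATION
4.1 Approach
4.2 Estimation for Dependent Data
4.3 Simulation
5 RECEIVER OPERATING CHARACTERISTIC CURVE
5.1 … ROC
5.2 ROC from Density
5.3 Estimating the Derivative of ROC and the AUC using the Density
5.4 ROC Measure of Efficacy
5.5 Trial Results
6 CONCLUSIONS AND FUTURE STUDIES
APPENDIX SOURCE CODES
REFERENCES

LIST OF TABLES

2.1 Distribution of Blood Pressure Change from Baseline for Antihypertensive Drugs
2.2 Summary Statistics
2.3 Cvn l for Different t of Mean of Sample
3.1 Results for Monte Carlo Simulation
4.1 Choice of Kernels
4.2 Efficiency of Kernels Compared to Epanechnikov Kernel
4.3 Comparison of Density Estimates for Different Kernels with the Same Bandwidths of …
4.4 Comparison of Estimated Mean Integrated Squared Error of Density Computed for Independent and Dependent Data
4.5 Estimated Probability Given Along with the Mean Squared Error
4.6 Estimated Probability Given Along with 95% Bootstrap Confidence Interval and Empirical Probability
5.1 Effect Size Measures for Different Methods of Estimation
5.2 Confidence Intervals of ROC Curve
5.3 Estimated Optimal Threshold Values

LIST OF FIGURES

2.1 ROC curve
4.1 Univariate kernel density estimate of the effect of two blinded blood pressure lowering drugs, given along with the kernel at the evaluation points and bandwidth …
4.2 Univariate kernel density estimate of the effect of two blinded blood pressure lowering drugs, given along with the kernel at the evaluation points and bandwidth …
4.3 Univariate kernel density estimate of the effect of two blinded blood pressure lowering drugs, given along with the Epanechnikov kernel at the evaluation points and bandwidth = … and …
4.4 Univariate kernel density estimate of the effect of two blinded blood pressure lowering drugs, given along with the Epanechnikov kernel at the evaluation points and bandwidth = … and …
4.5 Univariate kernel density estimate of the effect of two blinded blood pressure lowering drugs, given along with the Epanechnikov kernel at the evaluation points and bandwidth = … and …
4.6 Univariate kernel density estimate of the effect of two blinded blood pressure lowering drugs for several kernels with the same bandwidth
4.7 Mean integrated squared error (MISE) comparison of the density estimates computed for an independent sample and a dependent sample
4.8 Density estimates compared against the true density
4.9 Density estimates for a very large number (n) of independent observations
5.1 Finding the optimal point of discrimination on the ROC curve
5.2 Area under the ROC curve
5.3 The optimal threshold value
5.4 Empirical ROC curve and 95% confidence bands

CHAPTER 1
INTRODUCTION

1.1 Objective

The objective of this dissertation is to study measures of efficacy.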
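Starting with the specific problem of comparing two anti-hypertensive drugs in a double-blind clinical trial, a bootstrap kernel density estimate of the difference between the drugs is proposed. Bootstrap confidence intervals for the proposed density are also computed. The study demonstrates that the method is fairly robust even under the dependent structures that arise during the computation of all possible differences, on which the inference is based. A review of the existing methods of efficacy measurement is included, highlighting the salient points and limitations of each method. These methods are also used to study survival analysis, more specifically the estimates of the survival function and the hazard function. A density-based algorithm to estimate the derivative of the receiver operating characteristic curve is introduced in this study, and an application of the method to discriminant analysis is explored.

The dissertation also uses algorithms such as kernel density estimation, nearest neighbor estimation and ROC regression, and shows that they can be effectively adapted as applied statistical tools in the area of pharmaceutical statistics. In the context of kernel density estimation, bandwidth selection methods play a crucial role, and a literature review of these methods is included in reasonable detail. All these bandwidth selection methods fall under a general class of problems that can be treated as stochastic optimization. Stochastic optimization results and problems are the basis of all statistical theory. Examples of this optimization include maximum likelihood estimation, likelihood ratio tests, the Neyman-Pearson Lemma, and optimization of the bias and the variance of an estimator.

1.2 Background Information

Comparison of two treatments with respect to a primary efficacy variable is a problem commonly encountered in clinical studies. Several parametric and reparametrized methods are used to find solutions to this problem. Parametric approaches are often based on normality assumptions; reparametrized approaches are primarily rank-based tests like the Wilcoxon-Mann-Whitney (WMW) test. Effect size (ES) represents the magnitude of the difference between the two treatments under consideration. In many of the recent approaches, ES is presented as a mathematical expression that is not easily understood by clinicians. An ideal ES measure ought to be easy to compute and appeal to both clinicians and statisticians. An example of such a measure is given by the probability index P(Y > X), where X and Y denote the performance scores of two competing drugs (one of which may be a placebo), and it is assumed that the larger the performance value, the more effective the drug.

One of the first studies that involved the estimation of the quantity P(Y > X) is Wolfe and Hogg (1971), who considered the estimation under the normality assumption as well as without that assumption. Confidence bounds for P(Y > X) and related quantities are also explored with several examples. Simonoff et al. (19…) explored the quantity P(Y > X) and a reparametrized version, P(Y > X) − P(X > Y), in categorical data and its application. In that paper they introduced a hybrid estimator with parametric as well as reparametrized properties. The estimator is essentially the Wilcoxon-Mann-Whitney (WMW) statistic, except that the proportions are calculated under the normality assumption. They also introduced in the paper an estimator of P(Y > X) − P(X > Y) based on a non-parametric density estimation procedure. They have also shown, through both empirical studies and theoretical justification by variance estimation and asymptotic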
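results, that the hybrid estimator (pseudo-MLE) tends to give accurate results under certain settings. The estimator based on a non-parametric density estimation method has been applied to trial data, where the study gave evidence that the estimator is potentially equivalent to the WMW statistic, and that correction methods and bootstrap treatment make the estimator even more attractive by reducing its bias. Methods based on bootstrap confidence intervals for the quantity have been discussed by Chen and Kianifard (…). They correctly pointed out that rejection of the Wilcoxon-Mann-Whitney test would imply that the two distributions underlying the observations are not the same. Therefore, in an efficacy measurement analysis, it can reasonably be inferred that one treatment is better than the other. However, this does not provide a quantifiable measure of efficacy. It was also mentioned in the paper, and illustrated with an example, that it is not reasonable to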
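The following minimal sketch illustrates the estimators of P(Y > X) discussed in this section. It is not the dissertation's implementation. It assumes two independent samples, `x` for treatment A and `y` for treatment B, and computes three things:

- the Wilcoxon-Mann-Whitney (WMW) estimate, i.e. the fraction of (x, y) pairs with y > x, which also equals the empirical AUC;
- a normal-theory plug-in estimate in the spirit of the normality-based approaches;
- a percentile bootstrap interval around either estimate.

All function names and the simulated data are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def wmw_estimate(x, y):
    """Mann-Whitney estimate of P(Y > X).

    Counts the fraction of all (x_i, y_j) pairs with y_j > x_i, scoring ties
    as 1/2; this equals the area under the empirical ROC curve.
    """
    score = 0.0
    for xi in x:
        for yj in y:
            if yj > xi:
                score += 1.0
            elif yj == xi:
                score += 0.5
    return score / (len(x) * len(y))


def normal_estimate(x, y):
    """Plug-in estimate assuming independent normal X and Y:
    P(Y > X) = Phi((mean_Y - mean_X) / sqrt(var_X + var_Y))."""
    diff = statistics.mean(y) - statistics.mean(x)
    scale = (statistics.variance(x) + statistics.variance(y)) ** 0.5
    return NormalDist().cdf(diff / scale)


def bootstrap_ci(x, y, estimator, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval, resampling each arm with replacement."""
    rng = random.Random(seed)
    reps = sorted(
        estimator(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
        for _ in range(n_boot)
    )
    lower = reps[int((1 - level) / 2 * n_boot)]
    upper = reps[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical simulated responses; not data from the dissertation.
    gen = random.Random(42)
    x = [gen.gauss(0.0, 1.0) for _ in range(40)]  # treatment A
    y = [gen.gauss(0.5, 1.0) for _ in range(40)]  # treatment B
    print("WMW estimate:   ", round(wmw_estimate(x, y), 3))
    print("Normal estimate:", round(normal_estimate(x, y), 3))
    print("95% bootstrap CI (WMW):", bootstrap_ci(x, y, wmw_estimate, n_boot=1000))
```

With these simulated settings the true value is Phi(0.5 / sqrt(2)), about 0.64, so both estimates should land near it. The WMW estimate needs no distributional assumption. The plug-in estimate is only as good as the normality assumption behind it.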
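The objective in Section 1.1 calls for a bootstrap kernel density estimate built from all pairwise differences. The sketch below shows that idea under several labelled assumptions; it is not the author's algorithm.

- **Kernel:** a Gaussian kernel is used throughout.
- **Bandwidth:** Silverman's rule of thumb is chosen only for brevity. The dissertation reviews several bandwidth selectors.
- **Estimate of P(Y > X):** the KDE mass above zero. For a Gaussian kernel this is (1/N) Σ Φ(d_k / h), where the d_k are the N = n·m differences and h is the bandwidth.
- **Bootstrap:** it resamples the two treatment arms rather than the differences, because differences that share an observation are dependent.

All names are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pairwise_differences(x, y):
    """All n*m differences y_j - x_i. Differences sharing an x_i (or a y_j)
    are correlated, so these are NOT independent observations."""
    return [yj - xi for xi in x for yj in y]


def rule_of_thumb_bandwidth(d):
    """Silverman's rule of thumb, naively treating the N differences as i.i.d.
    (an assumption; other selectors could be substituted)."""
    sd = statistics.stdev(d)
    q1, _, q3 = statistics.quantiles(d, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    return 0.9 * spread * len(d) ** (-0.2)


def gaussian_kde(d, h):
    """Return f_hat(t) = (1 / (N h)) * sum_k phi((t - d_k) / h)."""
    def f_hat(t):
        return sum(STD_NORMAL.pdf((t - dk) / h) for dk in d) / (len(d) * h)
    return f_hat


def kde_prob_positive(x, y, h=None):
    """Estimate P(Y > X) as the KDE mass above 0: (1/N) * sum_k Phi(d_k / h)."""
    d = pairwise_differences(x, y)
    if h is None:
        h = rule_of_thumb_bandwidth(d)
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    return sum(STD_NORMAL.cdf(dk / h) for dk in d) / len(d)


def bootstrap_kde_ci(x, y, n_boot=500, level=0.95, seed=1):
    """Percentile bootstrap that resamples the two arms, not the differences,
    so each replicate keeps the same dependence structure."""
    rng = random.Random(seed)
    reps = sorted(
        kde_prob_positive(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
        for _ in range(n_boot)
    )
    return reps[int((1 - level) / 2 * n_boot)], reps[int((1 + level) / 2 * n_boot) - 1]


if __name__ == "__main__":
    # Hypothetical simulated responses; not data from the dissertation.
    gen = random.Random(7)
    x = [gen.gauss(0.0, 1.0) for _ in range(30)]  # arm A
    y = [gen.gauss(0.5, 1.0) for _ in range(30)]  # arm B
    d = pairwise_differences(x, y)
    f_hat = gaussian_kde(d, rule_of_thumb_bandwidth(d))
    print("density of D at 0:", round(f_hat(0.0), 3))
    print("P(Y > X) estimate:", round(kde_prob_positive(x, y), 3))
    print("95% bootstrap CI:", bootstrap_kde_ci(x, y, n_boot=200))
```

Resampling the arms means each bootstrap replicate rebuilds its n·m differences with the same sharing pattern as the original data. Resampling the differences directly would treat them as independent. The rule-of-thumb bandwidth still plugs in N = n·m as if the differences were independent, which is a simplification worth revisiting with a dependence-aware selector.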