The Heteroskedasticity Tests Implementation for Linear Regression Model Using MATLAB

https://doi.org/10.31449/inf.v42i4.1862
Informatica 42 (2018) 545–553

Lyudmyla Malyarets, Katerina Kovaleva, Irina Lebedeva, Ievgeniia Misiura and Oleksandr Dorokhov
Simon Kuznets Kharkiv National University of Economics, Nauky Avenue, 9-A, Kharkiv, Ukraine, 61166
E-mail: [email protected]
http://www.hneu.edu.ua/

Keywords: regression model, homoskedasticity, testing for heteroskedasticity, software environment MATLAB

Received: September 23, 2017

The article discusses the problem of heteroskedasticity, which can arise when estimating econometric models of large dimension, and ways to overcome it. Heteroskedasticity distorts the value of the true standard deviation of the prediction errors, which can either widen or narrow the confidence interval. We present the principles of implementing the most common tests used to detect heteroskedasticity when constructing linear regression models, and compare their sensitivity. One of the contributions of this paper is that real empirical data are used to test for heteroskedasticity. The aim of the article is to propose a MATLAB implementation of a number of tests used for checking heteroskedasticity in multifactor regression models. To this end we modified several openly available algorithms that implement known heteroskedasticity tests. Experimental studies to validate the proposed programs were carried out for various linear regression models. The models used for comparison are models of the Department of Higher Mathematics and Mathematical Methods in Economy of Simon Kuznets Kharkiv National University of Economics and econometric models recently published in leading journals.

Povzetek (summary): The authors address the problems of high-dimensional econometric models for which computation is problematic. They develop a method in MATLAB for multifactor regression models.
1 Introduction

In econometrics, a linear regression model is often used to describe different processes and phenomena. In matrix notation, the linear regression model can be written as:

$$Y = XB + \varepsilon \qquad (1)$$

where $Y$ and $\varepsilon$ are $n \times 1$ matrices, $X$ is $n \times (m+1)$, and $B$ is $(m+1) \times 1$; $n$ is the number of measurements (sample size); $m$ is the number of independent variables in the regression model.

For the $i$th row of $X$ (the $i$th observation) the linear regression model can be written as follows:

$$y_i = b_0 + b_1 x_{i1} + b_2 x_{i2} + \ldots + b_m x_{im} + \varepsilon_i \qquad (2)$$

where $y_i$ are the values of the dependent variable, $y_i \in Y$; $i$ is the experiment identification number, $i = 1, \ldots, n$; $x_{ij}$ are the values of the independent variable $x_j \in X$ ($j = 1, \ldots, m$) in the $i$th experiment; $b_0$ is the constant term of the equation; $b_j$ are the regression coefficients, $b_0, b_j \in B$; $\varepsilon_i$ are the residuals (model errors).

An error term is introduced in a regression model because the model does not fully represent the actual relationship between its variables. As a result, there are differences between the observed responses (values of the variable being predicted) in the given dataset and those predicted by a linear function of the explanatory variables. The error term is the amount by which the equation may differ from the measurements. In other words, it is the 'white noise'.

As a rule, a linear regression model is built by the method of ordinary least squares (OLS). This method for estimating the unknown parameters is based on minimizing the sum of the squares of the model errors. The estimators of the model parameters determined by OLS are known as best linear unbiased estimators (BLUE). The variances of the model parameters are determined by:

$$S_{b_j}^2 = \frac{\sum_{i=1}^{n} \varepsilon_i^2}{n - m - 1}\, z_{jj} = \sigma_e^2 z_{jj} \qquad (3)$$
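Equations (1) to (3) can be illustrated with a small numerical sketch. The block below is written in Python with the standard library only, purely for illustration; it is not the authors' MATLAB code. The function names (`transpose`, `matmul`, `inverse`, `ols`) and the five data points are hypothetical. The sketch assumes that $X'X$ is invertible and takes $z_{jj}$ to be the $j$th diagonal element of $(X'X)^{-1}$.

```python
# Illustrative sketch only: OLS estimates and their standard errors
# following equations (1)-(3). All names and data are hypothetical.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment the matrix with the identity: [A | I]
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for collinear regressors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(x_rows, y):
    """Fit y on x_rows (one list of m regressors per observation).

    Returns (b, se, e): coefficients b_0..b_m, their standard errors
    from equation (3), and the residuals e_i.
    """
    n, m = len(x_rows), len(x_rows[0])
    # Design matrix X is n x (m+1); the first column of ones carries b_0.
    X = [[1.0] + list(map(float, row)) for row in x_rows]
    Y = [[float(v)] for v in y]
    Xt = transpose(X)
    Z = inverse(matmul(Xt, X))            # Z = (X'X)^{-1}
    B = matmul(Z, matmul(Xt, Y))          # B = (X'X)^{-1} X'Y
    b = [row[0] for row in B]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    e = [yi - fi for yi, fi in zip(y, fitted)]
    sigma2 = sum(ei ** 2 for ei in e) / (n - m - 1)   # residual variance
    se = [(sigma2 * Z[j][j]) ** 0.5 for j in range(m + 1)]
    return b, se, e

# Made-up data: one regressor, five observations.
x = [[1], [2], [3], [4], [5]]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b, se, e = ols(x, y)
print("coefficients:", [round(v, 4) for v in b])
print("standard errors:", [round(v, 4) for v in se])
```

The standard errors returned here are only meaningful when the error variance is constant, which is exactly the assumption that the heteroskedasticity tests discussed in this article are meant to check.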
In equation (3), $z_{jj}$ is the diagonal element of the matrix $Z = (X'X)^{-1}$ that corresponds to the parameter $b_j$, and $\sigma_e$ is the standard error.

Applying OLS requires a number of conditions to hold [1 – 3]. Only if these conditions are met will the estimates calculated from such a model be unbiased, efficient and consistent. These conditions are formulated in the Gauss–Markov theorem. According to this theorem there are four principal assumptions that permit the use of linear regression models for research and prediction. One of them is the homoskedasticity (constant variance) of the errors with respect to any independent variable.

Homoskedasticity assumes that the errors have a constant variance, $\mathrm{var}(\varepsilon) = \mathrm{const}$, and are independent of the explanatory variables: $\mathrm{cov}(x_j, \varepsilon) = 0$ for all $j = 1, \ldots, m$. The error is a random variable distributed according to the normal law, $\varepsilon \sim N(0, \sigma^2)$, where the mathematical expectation of the error term is zero and the variance is constant. Failure to comply with this requirement leads to bias in the estimates obtained using such a regression model. Thus, the authors of [4] indicate that estimation uncertainty may increase dramatically in the presence of conditional heteroskedasticity.

The requirement of homoskedasticity also applies when an econometric model is constructed by the maximum likelihood method [5 – 7].

When the scatter of the errors varies with the value of one or more of the independent variables, the error terms are heteroskedastic. That is, the distribution of the errors remains normal with a mathematical expectation equal to zero, but the variance of the errors is a function of the values of the independent variables: $\varepsilon \sim N(0, f(X))$, where $f(X)$ describes how the variance of the errors changes with the values of the independent variables.

A similar problem arises when building semiparametric [8 – 10] and nonparametric [11, 12] models.

Heteroskedasticity makes it difficult to gauge the true standard deviation of the forecast errors. The OLS estimates are no longer BLUE. Thus, if the variance of the errors increases over time, confidence intervals for out-of-sample predictions will tend to be unrealistically narrow. In particular, heteroskedasticity does not allow us to use equation (3) to compute $S_{b_j}$, since that equation assumes a uniform dispersion of the errors. Under heteroskedasticity, the sample variance of the OLS estimator is

$$\mathrm{Var}(\hat{b}_j) = \sigma_e^2 (X'X)^{-1} X' \Omega X (X'X)^{-1} \qquad (4)$$

where $\Omega$ is the covariance matrix of the model errors. Under homoskedasticity, $\Omega = I$. Equation (4) is correct if there is no autocorrelation.

For these reasons, all the conclusions obtained on the basis of the corresponding $t$-statistics and $F$-statistics, as well as interval estimates, will be unreliable.

A unified approach to detecting heteroskedasticity is lacking. To address this, a large number of tests and criteria have been developed: the Spearman rank correlation test, the Park test, the Glejser test, the Goldfeld–Quandt test, the Breusch–Pagan test, Levene's test, the White test, and so on.

Applying all of the above tests by hand is very laborious, and for a large set of initial data it is practically impossible.

Many software packages can identify heteroskedasticity. These include professional packages (SAS, BMDP), universal packages (STADIA, OLIMP, STATGRAPHICS, STATISTICA, SPSS) and specialized packages (DATASCOPE, BIOSTAT, MESOSAUR).

When working with economic data, a researcher can face two main problems. Firstly, all the listed software is quite expensive, and the price of the product may be an insurmountable barrier for a young researcher. Secondly, the developer never provides the source code, considering it unnecessary for an ordinary user. Therefore, the built-in algorithms for detecting and eliminating heteroskedasticity cannot be modified.

Another drawback of the above software products is their outdated conceptual approach to econometric methods, which are constantly being improved. For example, the software products SPP and MICROSTAT calculate the coefficient of multiple correlation as the square root of the coefficient of determination, and STATGRAPHICS calculates it as the square root of the adjusted coefficient of determination [13], whereas in theory the coefficient of multiple correlation is estimated using elements of the correlation matrix [2].

Another important aspect that should be taken into account is the existence of different algorithms for identifying heteroskedasticity and the specific problem of division by zero [14].

The ideal option would be to create one's own software product that takes the research tasks into account. However, to write such a program, the economist would need to be an expert in algorithmic programming, which is rarely the case.

In this article, we carry out a comparative analysis of the tests most often used to detect heteroskedasticity [1, 2, 14] and give their source code. Having the program code makes it possible to modify the program in accordance with the objectives of the study.

2 Analysis of the literature and formulation of the problem

Before starting the construction of the regression model, it is necessary to verify whether the conditions of the Gauss–Markov theorem are fulfilled.

One of the main methods of preliminary research on heteroskedasticity is a visual analysis of the plot of residuals. On these plots, the scattering of points can vary depending on the value of the independent variables [14, 15].

Quantitative tests are also used to detect heteroskedasticity [15 – 17], such as the White test, the Goldfeld–Quandt test, the Breusch–Pagan test, the Park test, the

It should be noted that MATLAB does not contain a ready-made implementation for verifying homoskedasticity. We chose it as the programming environment. Other programming environments can also be used for this purpose, for example the free software environment R.

The authors chose MATLAB for the following reasons.
