Correlation Analysis to Identify the Effective Data in Machine Learning: Prediction of Depressive Disorder and Emotion States

Total Page:16

File Type:pdf, Size:1020Kb

Correlation Analysis to Identify the Effective Data in Machine Learning: Prediction of Depressive Disorder and Emotion States International Journal of Environmental Research and Public Health Article Correlation Analysis to Identify the Effective Data in Machine Learning: Prediction of Depressive Disorder and Emotion States Sunil Kumar and Ilyoung Chong * Department of Information and Communications Engineering, Hankuk University of Foreign Studies, Seoul 02450, Korea; [email protected] * Correspondence: [email protected]; Tel.: +82-10-3305-5904 Received: 31 October 2018; Accepted: 14 December 2018; Published: 19 December 2018 Abstract: Correlation analysis is an extensively used technique that identifies interesting relationships in data. These relationships help us realize the relevance of attributes with respect to the target class to be predicted. This study has exploited correlation analysis and machine learning-based approaches to identify relevant attributes in the dataset which have a significant impact on classifying a patient’s mental health status. For mental health situations, correlation analysis has been performed in Weka, which involves a dataset of depressive disorder symptoms and situations based on weather conditions, as well as emotion classification based on physiological sensor readings. Pearson’s product moment correlation and other different classification algorithms have been utilized for this analysis. The results show interesting correlations in weather attributes for bipolar patients, as well as in features extracted from physiological data for emotional states. Keywords: correlation analysis; health care; machine learning; data analytics 1. Introduction In order to analyze the causality in a dataset, various techniques can be used. One such technique in the domain of data analysis is frequent pattern mining. In this case, two types of analysis are widely used: identification of association rules and correlation analysis. In association rule mining, frequent pattern analysis is performed using the support count and confidence measures. An association rule will be considered strong if its support count and confidence are above some defined threshold. However, the identified rules are sometimes irrelevant and can often be misleading. This limitation can be handled by correlation analysis, which is used to determine the strength of a relationship between two item sets [1]. In this case, the strength can be identified based on three criteria: direction, form, and dispersion strength, as shown in Figure1. The limitation of this analysis is that in the case of two independent variables, it cannot determine the causality between the two. That is, we will not have any idea which variable is affecting and which one is getting affected. Another situation that can cause distort the correlation and association results is the confounding of another attribute [2]. If this attribute is not considered in the dataset, then its effect may not be highlighted in the experimentation and thus leading to spurious analyzed correlations. In this regard, determining predictor attributes plays a very critical role, as we try to include as many relevant attributes as possible, to avoid such circumstances. Correlation analysis is a widely used statistical measure through which different studies have efficiently identified interesting collinear relations among different attributes of datasets. This study has employed correlation analysis to identify such attributes which strongly affect depressive disorder severity and emotional states. Int. J. Environ. Res. Public Health 2018, 15, 2907; doi:10.3390/ijerph15122907 www.mdpi.com/journal/ijerph Int. J. Environ. Res. Public Health 2018, 15, 2907 2 of 24 Int. J. Environ. Res. Public Health 2018, 15, x 2 of 25 Figure 1. Correlation based on direction, form, and dispersion strength. There are are several several fac factorstors that that have have been been found found leading leading to a higher to a highernumber number of depressive of depressive disorder disordercases in casesSouth in Korea South Korea[3–5], [including3–5], including work work pressure pressure and and overtime, overtime, loneliness, loneliness, social social relations, relations, socioeconomicsocioeconomic status, etc. However, the effect of weather conditions, which may have a considerable impact on on depressive depressive disorder, disorder, has has often often been been ove overlookedrlooked in this in this region. region. With With respect respect to emotion to emotion data dataanalysis, analysis, the themost most commonly commonly-used-used methodologies methodologies in inpractic practicalal environments environments and and research approaches involveinvolve emotion emotion detection detection through through facial facial expressions, expressions, physiological physiological sensor sensor signals, signals, speech recognitionspeech recognition and analysis, and analysis, and text and analysis text analysis [6]. The [6] former. The former two techniques two techniques are more are commonlymore commonly used, becauseused, because facial expressionsfacial expressions and physiological and physiological responses responses are more are closely more related closely to related one’s emotional to one’s reactions,emotional whereas reactions limited, whereas information limited information can be extracted can be regarding extracted emotional regarding reactions emotional in the reactions latter two in techniquesthe latter two [7,8 techniques]. Besides, in[7, the8]. Besides, latter two in approaches, the latter two the approaches, responses of the emotional responses reactions of emotional can be manipulated,reactions can be hence manip resultingulated, inhe morence resulting noisy data in more and obstructingnoisy data and analysis obstructing [9,10]. Thisanalysis can [9,10] also be. This the casecan also with be facial the expressions, case with facial where expressions we have a lot, where of noisy we data, have as a compared lot of noisy to physiological data, as compared responses. to Inphysiological this work, we responses have considered. In this work, the first we twohave techniques considered for the identifying first two techniques a patient’s emotionalfor identifying state. a patient’sThis emotional work considers state. aspects of healthcare service provisioning using data analytics. Therefore, differentThis challenges work considers are addressed aspects andof healthcare overcome service in this study.provisioning One of theusing essential data analytics. problems Therefore, addressed bydifferent this work challenges is the processare addressed of identifying and overcome information in this that study. has a strongOne of impact the essential on depression problems for specifiedaddressed patients. by this Consideringwork is the differentprocess of studies identifying that have information identified that different has a types strong of depressiveimpact on disorderdepression situations, for specified each patients. patient Considering shows different different behavioral studies that trends have based identified on his different or her types specific of depressive disorder situations, type and itseach symptoms. patient sho Asws there different are differentbehavioral data trends sources based through on his whichor her wespecific can analyzedepressive behavioral disorder trends type (facialand its expressions, symptoms. physiologicalAs there are different sensor signals, data sources speech analysis,through textwhich analysis, we can self-reporting analyze behavioral questionnaires, trends (facial etc.), thereexpressions, is a strong physiological need for mechanisms sensor signals, that identifyspeech suchanalysis, parameters text analysis, with strongself-reporting relationships questionnaires, specific to etc.), certain there depressive is a strong disorder need for patients. mechanisms Another that challengeidentify such is identifying parameters prominent with strong and relationships persistent relationships specific to overcertain longer depressive periods disorder of time. patients. As there canAnother be different challenge levels is identifying of uncertainty prominent in the patternsand pers ofistent acquired relationships data (such over as longer high and periods low levels of time. of moodAs there in bipolarcan be different disorder), levels longer of periods uncertainty of data in arethe requiredpatterns toof analyzeacquired the data trends. (such A as good high prediction and low modellevels of requires mood in enough bipolar data disorder), readings longer to acquire periods sufficiently of data are steady required trends to analyze to have the reliable trends. prediction A good results.prediction Further, model a systemrequires is enough required data for on-demandreadings to acquire data acquisition sufficiently from steady different trends data to sources have reliable based onprediction the specific results. user Further, context. a Based system on is the required above discussion,for on-demand this data study acquisition aims to achieve from different the following data objectives:sources based on the specific user context. Based on the above discussion, this study aims to achieve the following objectives: • Analyzing the impact of weather on two types of depressive disorder; Bipolar and • MelancholiaAnalyzing the disorder. impact of weather on two types of depressive disorder; Bipolar and Melancholia • Analyzingdisorder. the impact of physiological data on emotions, as well as identifying patient’s health • status,Analyzing using the machine impact learningof physiological with strong data correlated on emotions,
Recommended publications
  • 2016 Gao, P. and Zhang, L., Determining Spurious Correlation
    The Professional Geographer ISSN: 0033-0124 (Print) 1467-9272 (Online) Journal homepage: http://www.tandfonline.com/loi/rtpg20 Determining Spurious Correlation between Two Variables with Common Elements: Event Area- Weighted Suspended Sediment Yield and Event Mean Runoff Depth Peng Gao & Lianjun Zhang To cite this article: Peng Gao & Lianjun Zhang (2016) Determining Spurious Correlation between Two Variables with Common Elements: Event Area-Weighted Suspended Sediment Yield and Event Mean Runoff Depth, The Professional Geographer, 68:2, 261-270, DOI: 10.1080/00330124.2015.1065548 To link to this article: http://dx.doi.org/10.1080/00330124.2015.1065548 Published online: 20 Aug 2015. Submit your article to this journal Article views: 25 View related articles View Crossmark data Full Terms & Conditions of access and use can be found at http://www.tandfonline.com/action/journalInformation?journalCode=rtpg20 Download by: [Syracuse University Library] Date: 17 March 2016, At: 05:36 Determining Spurious Correlation between Two Variables with Common Elements: Event Area-Weighted Suspended Sediment Yield and Event Mean Runoff Depth Peng Gao Syracuse University Lianjun Zhang State University of New York, Syracuse Spurious correlation is a classic statistical pitfall pervasive to many disciplines including geography. Although methods of calculating the spurious correlation between two variables possessing a common element in the form of sum, ratio, or product have been developed for a long time, controversial assertions on whether the spurious correlation should be treated or ignored are still prevalent. In this study, we examined this well-known but intriguing issue using the data representing two nonindependent variables, event area-weighted suspended sediment yield (SSYe) and event mean runoff depth (h).
    [Show full text]
  • A COMPARISON of CAUSALITY TESTS APPLIED to the BILATERAL RELATIONSHIP BETWEEN CONSUMPTION and GDP in the USA and MEXICO GUISAN, M.Carmen*
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Research Papers in Economics International Journal of Applied Econometrics and Quantitative Studies. Vol.1-1(2004) A COMPARISON OF CAUSALITY TESTS APPLIED TO THE BILATERAL RELATIONSHIP BETWEEN CONSUMPTION AND GDP IN THE USA AND MEXICO GUISAN, M.Carmen* Abstract Bilateral causality between Private Consumption and GDP in the USA and Mexico during the period 1960-2002 is analysed comparing Granger´s test, a modified version of Granger´s test, cointegration, TSLS and Hausman´s causality test. The main conclusion is that the modified version of Granger´s test performs better than the unmodified version, and also better than cointegration, to accept lagged causality in the Consumption equation. Besides that Hausman´s test is very often more useful to distinguish between bilateral and unilateral contemporaneous causality than TSLS. We conclude that there is a high degree of causal dependence of Private Consumption on GDP and a lower dependence in the case of the reverse relation. JEL classification: C5, C52, E2, O51, O57 Key words: Granger´s test, causality test, Mexico, the USA, Consumption and Gross Domestic Product, Consumption Models 1.- Introduction In this study we analyse bilateral causality between Private and Total Consumption and GDP in Mexico and the USA with the following methods: Granger´s causality test, modified version of Granger´s causality test proposed by Guisan(2003), cointegration, Two Stage Least Squares, and Hausman´s test. * Maria-Carmen Guisan is Professor of Econometrics and Director of Master in International Sectoral Economics at the University of Santiago de Compostela (Spain), http://www.usc.es/economet/guisan.htm, e-mail [email protected] 115 Guisan, M.C.
    [Show full text]
  • Revisiting the Effect of Food Aid on Conflict
    WPS8171 Policy Research Working Paper 8171 Public Disclosure Authorized Revisiting the Effect of Food Aid on Conflict Public Disclosure Authorized A Methodological Caution Paul Christian Christopher B. Barrett Public Disclosure Authorized Public Disclosure Authorized Development Research Group Impact Evaluation Team August 2017 Policy Research Working Paper 8171 Abstract A popular identification strategy in non-experimental this strategy is susceptible to bias arising from spurious panel data uses instrumental variables constructed by trends. Re-randomization and Monte Carlo simulations interacting exogenous but potentially spurious time series show that the strategy identifies a spurious relationship or spatial variables with endogenous exposure variables to even when the true effect could be non-causal or causal generate identifying variation through assumptions like in the opposite direction, invalidating the claim that aid those of differences-in-differences estimators. Revisiting a causes conflict and providing a caution for similar strategies. celebrated study linking food aid and conflict shows that This paper is a product of the Impact Evaluation Team, Development Research Group. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://econ.worldbank.org. The authors may be contacted at [email protected]. The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished.
    [Show full text]
  • 2010-06 Spurious Long-Horizon Regression in Econometrics* Antonio E
    Banco de M¶exico Documentos de Investigaci¶on Banco de M¶exico Working Papers N± 2010-06 Spurious Long-Horizon Regression in Econometrics Antonio E. Noriega Daniel Ventosa-Santaul`aria Banco de M¶exico Universidad de Guanajuato Universidad de Guanajuato June 2010 La serie de Documentos de Investigaci¶ondel Banco de M¶exicodivulga resultados preliminares de trabajos de investigaci¶onecon¶omicarealizados en el Banco de M¶exicocon la ¯nalidad de propiciar el intercambio y debate de ideas. El contenido de los Documentos de Investigaci¶on,as¶³como las conclusiones que de ellos se derivan, son responsabilidad exclusiva de los autores y no reflejan necesariamente las del Banco de M¶exico. The Working Papers series of Banco de M¶exicodisseminates preliminary results of economic research conducted at Banco de M¶exicoin order to promote the exchange and debate of ideas. The views and conclusions presented in the Working Papers are exclusively the responsibility of the authors and do not necessarily reflect those of Banco de M¶exico. Documento de Investigaci¶on Working Paper 2010-06 2010-06 Spurious Long-Horizon Regression in Econometrics* Antonio E. Noriegay Daniel Ventosa-Santaul`ariaz Banco de M¶exico Universidad de Guanajuato Universidad de Guanajuato Abstract This paper extends recent research on the behaviour of the t-statistic in a long-horizon re- gression (LHR). We assume that the explanatory and dependent variables are generated according to the following models: a linear trend stationary process, a broken trend station- ary process, a unit root process, and a process with a double unit root. We show that, both asymptotically and in ¯nite samples, the presence of spurious LHR depends on the assumed model for the variables.
    [Show full text]
  • Spurious Regression and Econometric Trends1
    Spurious regression and econometric trends1 Antonio E. Noriega Dirección General de Investigación Económica, Banco de México Daniel Ventosa-Santaulària2 Escuela de Economía, Universidad de Guanajuato JEL Classification: C12 ,C13, C22 Keywords: Spurious regression, trends, unit roots, trend stationarity, structural breaks Abril de 2006 Documento de Investigación No. 2006-05 Dirección de Investigación Económica Banco de México 1 The authors thank Carlos Capistrán and Daniel Chiquiar for their comments. The opin- ions in this paper correspond to the authors and do not necessarily reflect the point of view of Banco de México. Correspondence: [email protected] 2 Escuela de Economia, Universidad de Guanajuato. UCEA, Campus Marfil, Fracc. I, El Establo, Guanajuato, Gto. 36250, México. Phone and Fax +52-(473)7352925, e-mail: [email protected] Abstract This paper analyses the asymptotic and finite sample implications of different types of nonstationary behavior among the dependent and explanatory variables in a linear spurious regression model. We study cases when the nonstationarity in the dependent and explanatory variables is deterministic as well as stochas- tic. In particular, we derive the order in probability of the t statistic in a lin- ear regression equation under a variety of empirically relevant− data generation processes, and show that the spurious regression phenomenon is present in all cases considered, when at least one of the variables behaves in a nonstationary way. Simulation experiments confirm our asymptotic results. Keywords: Spurious regression, trends, unit roots, trend stationarity, struc- tural breaks JEL Classification: C12 ,C13, C22 Resumen Este documento analiza las implicaciones asintóticas y de muestras finitas de diferentes tipos de no-estacionariedad entre la variable dependiente y la ex- plicativa en un modelo lineal de regresión espuria.
    [Show full text]
  • Spurious Regressions
    spurious regressions Clive W. J. Granger From The New Palgrave Dictionary of Economics, Second Edition, 2008 Edited by Steven N. Durlauf and Lawrence E. Blume Abstract Macmillan. Simulations have shown that if two independent time series, each being highly autocorrelated, are put into a standard regression framework, then the usual measures Palgrave of goodness of fit, such as t and R-squared statistics, will be badly biased and the series will appear to be ‘related’. This possibility of a ‘spurious relationship’ between Licensee: variables in economics, particularly in macroeconomics and finance, restrains the form of model that can be used. An error-correction model will provide a solution in some cases. permission. without Keywords distribute autocorrelation; Durbin–Watson statistic; econometrics; ordinary least squares (OLS); or regression; serial correlation; spurious regression; Weiner process copy not JEL classifications may You C1 Article For the first three-quarters of the 20th century the main workhorse of applied econometrics was the basic regression www.dictionaryofeconomics.com. Yt ¼ a þ bXt þ et: ð1Þ Here the variables are indicated as being measured over time, but could be over a Economics. cross-section; and the equation was estimated by ordinary least squares (OLS). In of practice more than one explanatory variable x would be likely to be used, but the form (1) is sufficient for this discussion. Various statistics can be used to describe the Dictionary quality of the regression, including R2, t-statistics for β, and Durbin–Watson statistic d which relates to any autocorrelation in the residuals. A good fitting model should Palgrave have jtj near 2, R2 quite near one, and d near 2.
    [Show full text]
  • Cointegration Versus Spurious Regression in Heterogeneous Panels
    Cointegration versus Spurious Regression in Heterogeneous Panels Lorenzo Trapani Cass Business School and Bergamo University Giovanni Urga∗ Cass Business School This Version: 27 January 2004 ∗Corresponding author: Faculty of Finance, Cass Business School, 106 Bunhill Row, London EC1Y 8TZ Tel.+/44/(0)20/70408698; Fax.+/44/(0)20/70408885; e-mail: [email protected]. http://www.staff.city.ac.uk/~giourga. We wish to thank Stepana Lazarova for useful discussions and comments. The usual disclaimer applies. Lorenzo Trapani wishes to thank Marie Curie Training Site for financial support (grant N. HPMT- CT-2001-00330). 1 Cointegration versus Spurious Regression in Heterogeneous Panels Abstract We consider the issue of cross sectional aggregation in nonstationary, heterogeneous panels where each unit cointegrates. We first derive the as- ymptotic properties of the aggregate estimate, and a necessary and sufficient condition for cointegration to hold in the aggregate relationship. We also develop an estimation and testing framework to verify whether the condi- tion is met. Secondly, we analyze the case when cointegration doesn’t carry through the aggregation process, investigating whether a mild violation can still lead to an aggregate estimator that summarizes the micro relationships reasonably well. We derive the asymptotic measure of the degree of non coin- tegration of the aggregated estimate and we provide estimation and testing procedures. A Monte Carlo exercise evaluates the small sample properties of the estimator. J.E.L. Classification Numbers: C12, C13, C23 Keywords: Aggregation, Cointegration, Heterogeneous Panel, Monte Carlo Simulation. 2 1INTRODUCTION The effects of cross-sectional aggregation in panel data model have been ex- plored by several contributions in the econometric literature.
    [Show full text]
  • Is the Phillips Curve Dead? a Vector Error Correction Model
    ISSN: 2278-3369 International Journal of Advances in Management and Economics Available online at www.managementjournal.info RESEARCH ARTICLE The Dynamic Phillips Curve Revisited: An Error Correction Model Zhao Y, Raehsler R, Sohng S, Woodburne P* Clarion University of Pennsylvania, Clarion, Pennsylvania. *Corresponding Author: Email: [email protected] Abstract In this paper, we apply different unit root tests on five macroeconomic variables. Nearly all our data exhibit unit root phenomenon, confirming results well-known in the literature. Co-integration tests indicate that there exists one set of co-integration relation in the unemployment- inflation equation. The VAR error correction model explains the expected short-term behavior of changes in unemployment rate on the inflation rate on inflation two years later. Keywords: Correction model, Phillips Curve, Vector autoregression. Introduction In 1958, economist A.W. Phillips [1] used British Data and Unit Root Tests data from 1861 to 1957 to examine the Granger and Newbold [5] found a significant but relationship between nominal wage growth and spurious relationship in their regression model unemployment rate. His results were the now using levels. Such models reveal a high R famous finding of a stable inverse relationship squared, significant t statistics and a low Durbin- between the two variables. Later economists used Watson statistic. The spurious regression inflation in place of nominal wage growth to re- indicates the t statistic is unusually large. examine the relationship and again found a stable Granger and Newbold suggest using t=11.2 inverse relationship. The relationship appeared instead of t=1.96. Correct regression on levels to break down during the stagflation of the 1970’s, requires examination of the stationary property of when both inflation and unemployment rate variables involved.
    [Show full text]
  • An Error Correction∗
    Time Series Analysis and Spurious Regression: An Error Correction∗ Peter K. Enns Takaaki Masaki Nathan J. Kelly [email protected] [email protected] [email protected] Cornell University Cornell University University of Tennessee September 25, 2014 Abstract: For scholars interested in longitudinal analysis, the single-equation (i.e., one way) error correction model (ECM) has become an important analytic tool. The popularity of this method has stemmed, in part, from evidence that the ECM can be applied to both stationary and nonstationary data (e.g., De Boef and Keele, 2008). We highlight an under- appreciated aspect of the ECM|when data are integrated or near-integrated, the ECM only avoids the spurious regression problem when (near-)cointegration is present. In fact, spurious regression results emerge close to 20 percent of the time when integrated or near integrated data that are not cointegrated are analyzed with an ECM. Given the near absence of cointegration tests in recent political science publications, we believe this is a fundamental point to emphasize. We also show, however, that when evidence of cointegration is present, the ECM performs well across a variety of data scenarios that political scientists are likely to encounter. ∗All files necessary to replicate our simulations and produce the figures and tables in this manuscript are available at: http://thedata.harvard.edu/dvn/dv/ECM. As political scientists, we are often interested in change. Why has inequality increased? Why do some authoritarian regimes persist and others transition to democracy? Why do attitudes toward the death penalty, immigration, or welfare vary over time? For researchers with an empirical approach to these types of political dynamics, the single equation (i.e., one way) error correction model (ECM) has become a popular analytic tool.
    [Show full text]
  • The Moderating Effect of Human Resource Management on the Relationship Between Ethical Leader and Employee Trust
    MASTER THESIS HUMAN RESOURCE STUDIES The moderating effect of Human Resource Management on the relationship between ethical leader and employee trust Student name : A.K.H.O. (Karim) Ressang Student ANR : 48 65 19 Name of supervisor : dr. K. Kalshoven Name 2nd reader : dr. R.S.M. de Reuver Project period : January – August 2014 Project theme : Interplay between HRM and Leadership 1 Table of contents Page Abstract 3 1. Introduction 4 2. Theoretical framework 6 2.1 Ethical leadership and employee trust 6 2.2 Employee cognitive and affective trust 8 2.3 HRM as moderator in ethical leadership and trust relationships 9 2.4 The mediating role of cognitive based trust 10 2.5 Moderating HRM and mediating cognitive trust 11 3. Methods 11 3.1 Research set-up 11 3.2 Procedure 12 3.3 Sample characteristics 13 3.4 Measures 14 3.5 Statistical analysis 17 4. Results 19 4.1 Correlations 19 4.2 Regression analysis 22 4.2.1 Further analysis 28 5. Discussion 28 5.1 Theoretical implications 29 5.2 Limitations and strengths 31 5.3 Practical implications 32 5.4 Conclusion 33 6. Reference 34 Appendix A: Further analysis 41 A1. Examination of partial correlation 41 A2. Testing the significance of difference between correlation coefficients 42 A3. Reviewing disconfirmation H2a, H2b and H4 42 Appendix B: Factor analysis 44 2 Abstract This research examines the link between ethical leadership, human resource management (HRM) and employee cognitive and affective trust in the leader. Drawing on the definition and description of how ethical leadership effectuates (Brown, Treviño & Harrison, 2005), leader trustworthiness (Mayer, Davis and Schoorman, 1995) and the idea of situational conditions inhibiting the influence of ethical leaders (Kerr & Jermier, 1978), in combination with research findings on cognitive trust and affective trust, this research expected that employee trust in the leader would be affected by developments in the domain of HRM.
    [Show full text]
  • 1 Spurious Regressions with Time-Series Data
    SPURIOUS REGRESSIONS WITH TIME-SERIES DATA: FURTHER ASYMPTOTIC RESULTS David E. A. Giles Department of Economics University of Victoria, B.C. Canada ABSTRACT A “spurious regression” is one in which the time-series variables are non-stationary and independent. It is well-known that in this context the OLS parameter estimates and the R2 converge to functionals of Brownian motions; the “t-ratios” diverge in distribution; and the Durbin-Watson statistic converges in probability to zero. We derive corresponding results for some common tests for the Normality and homoskedasticity of the errors in a spurious regression. Key Words Spurious regression; normality; homoskedasticity; asymptotic theory; unit roots Proposed Running Head SPURIOUS REGRESSIONS Mathematics Subject Classification Primary 62F05, 62J05, 91B84; Secondary 62M10, 62P20 Author contact David Giles, Department of Economics, University of Victoria, P.O. Box 1700 STN CSC, Victoria, B.C., Canada, V8W 2Y2 e-mail: [email protected]; FAX: (250) 721 6214; Phone: (250) 721 8540 1 1. INTRODUCTION Testing and allowing for non-stationary time-series data has been one of the major themes in econometrics over the past quarter-century or so. In their influential and relatively early contribution, Granger and Newbold (1974) drew our attention to some of the likely consequences of estimating a “spurious regression” model. They argued that the “levels” of many economic time- series are integrated or nearly so, and that if such data are used in a regression model a high R2- value is likely to be found even when the series are independent of each other. They also illustrated that the regression residuals are likely to be autocorrelated, as evidenced by a very low value for the Durbin-Watson (DW) statistic.
    [Show full text]
  • In This Chapter, We Cover Some More Advanced Topics in Time Series Econometrics. In
    d 7/14/99 8:36 PM Page 571 Chapter Eighteen Advanced Time Series Topics n this chapter, we cover some more advanced topics in time series econometrics. In Chapters 10, 11, and 12, we emphasized in several places that using time series data Iin regression analysis requires some care due to the trending, persistent nature of many economic time series. In addition to studying topics such as infinite distributed lag models and forecasting, we also discuss some recent advances in analyzing time series processes with unit roots. In Section 18.1, we describe infinite distributed lag models, which allow a change in an explanatory variable to affect all future values of the dependent variable. Conceptually, these models are straightforward extensions of the finite distributed lag models in Chapter 10; but estimating these models poses some interesting challenges. In Section 18.2, we show how to formally test for unit roots in a time series process. Recall from Chapter 11 that we excluded unit root processes to apply the usual asymp- totic theory. Because the presence of a unit root implies that a shock today has a long- lasting impact, determining whether a process has a unit root is of interest in its own right. We cover the notion of spurious regression between two time series processes, each of which has a unit root, in Section 18.3. The main result is that even if two unit root series are independent, it is quite likely that the regression of one on the other will yield a statistically significant t statistic.
    [Show full text]