A Non-Linear Granger-Causality Framework to Investigate Climate–Vegetation Dynamics
Total Page:16
File Type:pdf, Size:1020Kb
Geosci. Model Dev., 10, 1945–1960, 2017 www.geosci-model-dev.net/10/1945/2017/ doi:10.5194/gmd-10-1945-2017 © Author(s) 2017. CC Attribution 3.0 License. A non-linear Granger-causality framework to investigate climate–vegetation dynamics Christina Papagiannopoulou1, Diego G. Miralles2,3, Stijn Decubber1, Matthias Demuzere2, Niko E. C. Verhoest2, Wouter A. Dorigo4, and Willem Waegeman1 1Depart. of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, Ghent, Belgium 2Laboratory of Hydrology and Water Management, Ghent University, Ghent, Belgium 3Depart. of Earth Sciences, VU University Amsterdam, Amsterdam, the Netherlands 4Depart. of Geodesy and Geo-Information, Vienna University of Technology, Vienna, Austria Correspondence to: Christina Papagiannopoulou ([email protected]) Received: 11 October 2016 – Discussion started: 16 November 2016 Revised: 20 March 2017 – Accepted: 29 March 2017 – Published: 17 May 2017 Abstract. Satellite Earth observation has led to the creation in the global climate system. Plant life alters the characteris- of global climate data records of many important environ- tics of the atmosphere through the transfer of water vapour, mental and climatic variables. These come in the form of exchange of carbon dioxide, partition of surface net radia- multivariate time series with different spatial and temporal tion (e.g. albedo), and impacts on wind speed and direction resolutions. Data of this kind provide new means to fur- (Nemani et al., 2003; McPherson et al., 2007; Bonan, 2008; ther unravel the influence of climate on vegetation dynamics. Seddon et al., 2016; Papagiannopoulou et al., 2017). Because However, as advocated in this article, commonly used statis- of the strong two-way relationship between terrestrial vegeta- tical methods are often too simplistic to represent complex tion and climate variability, predictions of future climate can climate–vegetation relationships due to linearity assump- be improved through a better understanding of the vegetation tions. Therefore, as an extension of linear Granger-causality response to past climate variability. analysis, we present a novel non-linear framework consist- The current wealth of Earth observation data can be used ing of several components, such as data collection from vari- for this purpose. Nowadays, independent sensors on differ- ous databases, time series decomposition techniques, feature ent platforms collect optical, thermal, microwave, altimetry, construction methods, and predictive modelling by means and gravimetry information, and are used to monitor vege- of random forests. Experimental results on global data sets tation, soils, oceans, and atmosphere (e.g. Su et al., 2011; indicate that, with this framework, it is possible to detect Lettenmaier et al., 2015; McCabe et al., 2017). The longest non-linear patterns that are much less visible with traditional composite records of environmental and climatic variables Granger-causality methods. In addition, we discuss extensive already span up to 35 years, enabling the study of multi- experimental results that highlight the importance of consid- decadal climate–biosphere interactions. Simple correlation ering non-linear aspects of climate–vegetation dynamics. statistics and multilinear regressions using some of these data sets have led to important steps forward in understand- ing the links between vegetation and climate (e.g. Nemani et al., 2003; Barichivich et al., 2014; Wu et al., 2015). How- 1 Introduction ever, these methods in general are insufficient when it comes to assessing causality, particularly in systems like the land– Vegetation dynamics and the distribution of ecosystems are atmosphere continuum in which complex feedback mecha- largely driven by the availability of light, temperature, and nisms are involved. A commonly used alternative consists water; thus, they are mostly sensitive to climate conditions of Granger-causality modelling (Granger, 1969). Analyses of (Nemani et al., 2003; Seddon et al., 2016; Papagiannopoulou this kind have been applied in climate attribution studies to et al., 2017). Meanwhile, vegetation also plays a crucial role Published by Copernicus Publications on behalf of the European Geosciences Union. 1946 C. Papagiannopoulou et al.: A non-linear Granger-causality framework investigate the influence of one climatic variable on another, predictive causality, for which one attempts to forecast yt (at e.g. the Granger-causal effect of CO2 on global temperature the specific timestamp t) given the values of x and y in pre- (Triacca, 2005; Kodra et al., 2011; Attanasio, 2012), of veg- vious timestamps. Granger(1969) postulated that x causes y etation and snow coverage on temperature (Kaufmann et al., if the autoregressive forecast of y improves when informa- 2003), of sea surface temperatures on the North Atlantic Os- tion of x is taken into account. In order to make this defini- cillation (Mosedale et al., 2006), or of the El Niño–Southern tion more precise, it is important to introduce a performance Oscillation on the Indian monsoon (Mokhov et al., 2011). measure to evaluate the forecast. Below, we will work with Nonetheless, Granger causality should not be interpreted as the coefficient of determination R2, which is here defined as “real causality”; one assumes that a time series A Granger follows: causes a time series B if the past of A is helpful in predict- PN 2 RSS D C .yi −y Oi/ ing the future of B (see Sect. 2 for a more formal definition). R2.y;yO / D 1 − D 1 − i P 1 ; (1) TSS PN 2 However, the underlying statistical model that is commonly iDP C1.yi −y/ N considered in such a context is a linear vector autoregressive where y represents the observed time series, yN is the mean of model, which is (again), by definition, linear; see, e.g. Shahin this time series, yO is the predicted time series obtained from et al.(2014); Chapman et al.(2015). a given forecasting model, and P is the length of the lag-time In this article, we show new experimental evidence that moving window. Therefore, the R2 can be interpreted as the advocates the need non-linear methods to study climate– fraction of explained variance by the forecasting model, and vegetation dynamics due to the non-linear nature of these it increases when the performance of the model increases, interactions (Foley et al., 1998; Zeng et al., 2002; Verbesselt reaching the theoretical optimum of 1 for an error-free fore- et al., 2016). To this end, we have assembled a large, compre- cast and being negative when the predictions are less repre- hensive database, comprising various global data sets of tem- sentative of the observations than the mean of the observa- perature, radiation, and precipitation, originating from mul- tions. Using R2, one can now define Granger causality in a tiple online resources. We use the Normalized Difference more formal way. Vegetation Index (NDVI) to characterize vegetation, which Definition 1. We say that time series x Granger causes is commonly used as a proxy of plant productivity (Myneni 2 y if R .y;yO / increases when xt−1;xt−2;:::;xt−P are in- et al., 1997; Nemani et al., 2003). We followed an inclu- cluded in the prediction of y , in contrast to considering sive data collection approach, aiming to consider all available t yt−1;yt−2;:::;yt−P only, where P is the lag-time moving data sets with a worldwide coverage, and at least a 30-year window. time span and monthly temporal resolution (Sect.3). Our In climate sciences, linear vector autoregressive (VAR) novel non-linear Granger-causality framework is used for models are often employed to make forecasts (Stock and finding climatic drivers of vegetation and consists of several Watson, 2001; Triacca, 2005; Kodra et al., 2011; Attanasio, steps (Sect.2). In a first step, we apply time series decompo- 2012). A linear VAR model of order P boils down to the sition techniques to the vegetation and the various climatic following representation: time series to isolate seasonal cycles, trends, and anomalies. Subsequently, we explore various techniques for construct- P yt β01 X β11p β12p yt−p 1 ing more complex features from the decomposed climatic D C C ; (2) xt β02 β21p β22p xt−p 2 time series. In a final step, we run a Granger-causality anal- pD1 ysis on the NDVI anomalies, while replacing traditional lin- with βij being parameters that need to be estimated and 1 ear vector autoregressive models with random forests. This and 2 referring to two white noise error terms. This model framework allows for modelling non-linear relationships and can be used to derive the predictions required to determine prevents overfitting. The results of the global application of Granger causality. In that sense, time series x Granger causes our framework are discussed in Sect.4. time series y if at least one of the parameters β12p for any p significantly differs from 0. Specifically, and since we are focusing on the vegetation time series as the only target, the 2 A Granger-causality framework for geosciences following two models are compared: P 2.1 Linear Granger causality revisited X yt DOyt C 1 D β01 C β11pyt−p C β12pxt−p C 1 (3) pD We start with a formal introduction to Granger causality for 1 P the case of two times series, denoted as x DTx1;x2;:::;xN U X yt DOyt C 1 D β01 C β11pyt−p C 1: (4) and y DTy1;y2;:::;yN U, with N being the length of the time pD1 series. In this work, y alludes to the NDVI anomaly time se- ries at a given pixel, whereas x can represent the time series We will refer to Eq. (3) as the “full model” and to Eq. (4) as of any climatic variable at that pixel (e.g. temperature, pre- the “baseline model”, since the former incorporates all avail- cipitation, radiation). Granger causality can be interpreted as able information and the latter only information of y.