Model-Free Forecasting Outperforms the Correct Mechanistic Model for Simulated and Experimental Data
Total Page:16
File Type:pdf, Size:1020Kb
Model-free forecasting outperforms the correct mechanistic model for simulated and experimental data Charles T. Perrettia,1, Stephan B. Munchb, and George Sugiharaa aScripps Institution of Oceanography, University of California at San Diego, La Jolla, CA 92093; and bFisheries Ecology Division, Southwest Fisheries Science Center, Santa Cruz, CA 95060 Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved January 10, 2013 (received for review September 14, 2012) Accurate predictions of species abundance remain one of the most ARMA time series models have been used extensively in ecology; vexing challenges in ecology. This observation is perhaps unsurpris- recent examples include forecasting quasi-extinction risk (10), ing, because population dynamics are often strongly forced and investigating the effects of climate change on fisheries (11), and highly nonlinear. Recently, however, numerous statistical techni- detecting critical thresholds in ecosystem dynamics (12). SSR ques have been proposed for fitting highly parameterized mecha- methods have been less pervasive in ecology (13, 14), although nistic models to complex time series, potentially providing the they are ubiquitous in many other scientific disciplines for de- machinery necessary for generating useful predictions. Alterna- scribing stochastic nonlinear systems (15). tively, there is a wide variety of comparatively simple model-free Here, we pose a rather conservative challenge and ask whether forecasting methods that could be used to predict abundance. Here a correctly specified mechanistic model, fit with commonly used we pose a rather conservative challenge and ask whether a correctly statistical techniques, can provide better forecasts than simple specified mechanistic model, fit with commonly used statistical tech- niques, can provide better forecasts than simple model-free meth- model-free methods for ecological systems with noisy nonlinear ods for ecological systems with noisy nonlinear dynamics. Using dynamics. To answer this question, we used computer simulations fi four different control models and seven experimental time series and experimental data to t a suite of mechanistic models using of flour beetles, we found that Markov chain Monte Carlo proce- a Bayesian MCMC algorithm under ecologically realistic levels of dures for fitting mechanistic models often converged on best-fit noise. Surprisingly, we found that even using only a single time parameterizations far different from the known parameters. As a re- series variable for each system, an SSR forecast approach con- sult, the correctly specified models provided inaccurate forecasts sistently outperformed both the linear time series models and the and incorrect inferences. In contrast, a model-free method based correctly specified mechanistic models. on state-space reconstruction gave the most accurate short-term forecasts, even while using only a single time series from the mul- Simulation Methods tivariate system. Considering the recent push for ecosystem-based Using four different mechanistic control models, we generated management and the increasing call for ecological predictions, our simulated time series with both process noise and observation results suggest that a flexible model-free approach may be the most error (see Table 1 for models and parameters). Each model was fit promising way forward. to a 50-y time series (referred to as the training set) and forecast accuracy was evaluated using an additional 50-y time series that nderstanding fluctuations in species abundance is a long- contained no observation error (referred to as the test set). Be- Ustanding goal of ecology and is particularly important for the cause each time series contains a unique sequence of process conservation of severely depleted populations (1, 2). Dramatic noise, we simulated 100 replicates for each model to ensure robust changes in species abundance are common (3) and large declines results. To facilitate comparison of forecast accuracy between can have disastrous impacts on ecosystem users (4). Accurate models, forecast error is described using the standardized rms forecasts could allow for improved conservation efforts and in- fi fi error statistic (SRMSE), which is de ned as the rms error divided creased shery yields, however, ecological surprises are common by the SD of the test set. Forecast errors greater than unity in- (5), and accurate predictions remain a major challenge (6). dicate predictions less accurate than simply predicting the mean of As ecological time series continue to lengthen, and computer- intensive statistics become more convenient, there has been an the test set. All species (or age classes) in each model were forecasted. increasing trend toward fitting complex mechanistic models that incorporate both process and observation error (i.e., state-space Importantly, however, to make the results more conservative, models) (7). A Bayesian fitting procedure for such a model typi- the model-free methods used only the single time series of the cally involves computationally intensive Markov chain Monte focal species (or age class) to make forecasts, whereas the Carlo (MCMC) sampling of the joint posterior probability distri- mechanistic models were fit to the entire multivariate set of bution to find the best fitting parameters (8). Previous work con- time series. In real systems we often lack time series of im- cluded that misspecified mechanistic models can produce poor portant driving variables or species; thus, for this additional forecasts (9), but it has been suggested that state-space models reason, this comparison represents a very optimistic scenario BIOLOGY may be more robust to misspecification due to their inclusion of for the mechanistic modeling approach. POPULATION a process-error term (8); however, their forecast accuracy remains largely untested. Model-free time series analysis provides an alternative method Author contributions: C.T.P., S.B.M., and G.S. designed research; C.T.P. performed re- for generating forecasts that, unlike most state-space fitting pro- search; C.T.P. analyzed data; and C.T.P., S.B.M., and G.S. wrote the paper. cedures, does not require intensive computations. Although there The authors declare no conflict of interest. exists a vast literature on time series forecasting methods, we focus This article is a PNAS Direct Submission. on three commonly used models: the linear autoregressive (AR) 1To whom correspondence should be addressed. E-mail: [email protected]. and autoregressive moving average (ARMA) models, and the This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. nonlinear state-space reconstruction (SSR) method. AR and 1073/pnas.1216076110/-/DCSupplemental. www.pnas.org/cgi/doi/10.1073/pnas.1216076110 PNAS | March 26, 2013 | vol. 110 | no. 13 | 5253–5257 Downloaded by guest on September 26, 2021 Table 1. Model structure and model parameter values used in N], is an embedding of the attracting manifold, where N is the the simulations length of the time series. Model name Process model structure* After transforming the time series into an embedding, we gen- h i erated predictions using the S-Map model (13, 28, 29), which is x Logistic x + = x r 1 − t expðe Þ t 1 t K t a locally weighted autoregressive model of the embedded time h i series with weights given by xt Two-species xt+1 = xt r1 1 − − cxt yt expðet Þ K1 h i yt wðdÞ = exp −θd=d ; yt+1 = yt r2 1 − + cxt yt expðet Þ K2 Age-structured x = (x + x )f 0,t 2,t 3,t where θ ≥ 0, d is the Euclidean distance between the predictee and x + = x exp(r − rx + « ) 1,t 1 0,t 0,t t the neighbor vector in embedded space, and d is the average dis- x + = s x 2,t 1 1 1,t tance between the predictee and all other vectors. The degree of x + = s (x + x ) 3,t 1 2 2,t 3,t local weighting (θ) and the embedding dimension (E) were fit ~ = ð − + e Þ Spatial xi;t xi;Pt exp r rxi;t t using cross-validation on the 50-y training set. We searched to = 4 ~ xi;t+1 j=1dij xj;t a maximum of E = 9 lags, because the highest dimensionality of the models was n = 4, and it has been shown that any nonlinear *Process noise « is normally distributed with μ = 0 and σ = 0.005. t stochastic model can be represented by a univariate model of order 2n + 1, where n is the dimension of the true model (30). Control Models. We chose four control models that encompass a Predictions were made on the 50-y test set. wide range of ecological scenarios—the logistic model, a two- Linear Time Series Models. species coupled logistic model (16), an age-structured model (17), Last, we evaluated the forecast accuracy and a four-patch spatial model (18). Deterministic model struc- of two linear time series models: AR and ARMA. Linear time p tures are given in Table 1 (parameter values given in Table S1). series models have be used extensively in ecology. The ARMA( , All control models included log-normal process and observation q) model is given as (31) error, and parameter values that resulted in chaotic or near-cha- fi Xp Xq otic deterministic dynamics. A coef cient of variation (CV) of 0.2 ðx − μÞ = β ðx − μÞ + α e ; was chosen for the observation error, which is consistent with that t i t−i j t−j i = j = 0 reported in fish abundance surveys (19). 1 fi To avoid nonidenti ability of error sources, the process noise p q fi where is the dimension of the autoregressive term, is the di- variance was xed to the correct value (giving the mechanistic ap- mension of the moving average term, μ is the mean of the time proach yet another advantage), and only the observation error series, α and β are the coefficients of the autoregressive and mov- variance was fit. Each control model was fit to its training set e ’ ing average terms, respectively, and is a normally distributed using Bayesian adaptive MCMC with Geyer s algorithm (20) in random variable with mean zero.