Comparison of Prediction Accuracy of Multiple Linear Regression, ARIMA and ARIMAX Model for Pest Incidence of Cotton with Weather Factors
Total Page:16
File Type:pdf, Size:1020Kb
Madras Agric. J., 105 (7-9): 313-316, September 2018 Comparison of Prediction Accuracy of Multiple Linear Regression, ARIMA and ARIMAX Model for Pest Incidence of Cotton with Weather Factors V.S. Aswathi* and M.R. Duraisamy Department of Physical Science and Information Technology, Tamil Nadu Agricultural University, Coimbatore - 641 003. Identifying suitable statistical model for predicting pest incidence have important role in pest management programmes. For this study weekly data of aphid, thrips, jassid and whitefly incidence of cotton at the TNAU region, Coimbatore and the weather factors influencing these pests incidence were used for model development. Rainfall, maximum temperature, minimum temperature, morning humidity, eveningMaterial humidity and were Methods used as the independent variables and MLR, ARIMA, ARIMAX models built for each pests. Comparison of these three models was done and checked the model accuracy using root Standardmean square weekly error data value.for pest It wasincidence found (pests that perfor allthree leaves) were collected for pests ARIMAX model posses lowest RMSEcotton valuevariety compared DCH-32 tofrom ARIMA Cotton and MLR.Department, So ARIMAX TNAU, Coimbatore. The data model was selected as best fit model. corresponding to crop periods from September to January for four major cotton pests such as Cotton pests, Multiple linear regression, ARIMA, ARIMAX, Weather factors Key words: aphid, thrips, jassid and whitefly were selected for the study. The corresponding pest incidence data for aphids and thrips were recorded for the period of 2008-2012 while for Cotton (Gossypium spp.) is one among theJassids most and WhiteWeeklyfly were weather recorded data for for the Coimbatore period of 2008 (TNAU-2017 region) and 2008 -2015 respectively. important cash crops playing a key role in IndianWeekly weatherwere data collected for Coimbatore from Agro (TNAU Climatic region) Research were collected Centre, from Agro Climatic economy and is designated as “king of fibre crops”. TNAU, Coimbatore. The weather factors selected for India is the largest producer of cotton in the Researchworld Centre,the studyTNAU, were Coimbatore maximum. Thetemperature(°C), weather factors minimum selected for the study were accounting for about twenty six per cent of globalmaximum temperature(°C),temperature (°C),minimum morning temperature humidity(%), (°C), morning evening humidity(%), evening cotton production. Main factors affecting cotton humidity (%) ,rainfall(mm). Here cotton pest data did humidity (%) ,rainfall(mm).Here cotton pest data did not follow normality, so square root yield are climatic conditions, pests and diseases. not follow normality, so square root transformation Cotton is ravaged by many insect pests startingtransformation from ( ( ) )was was performed. performed. sowing to harvesting stage. The pests which are Multiple√ + 0 linear.5 regression (MLR) more problematic are sucking pests and bollworms.Multiple linear regression (MLR) Understanding the factors affecting population The multiple regression procedure was utilized abundance of the pest during the crop as well as off The multipleover the regression training procedure data set towas estimate utilized theover significant the training data set to estimate seasons would guide in formulating strategies forthe their significant regressionregression coefficients coefficients of the of linearthe linear equation equation management. Pest incidence is highly influenced by Y =β0 + β1 X1 + β2 X2 + ............ + βi Xi + ɛ weather factors. Hence an attempt has been made Y to =β 0 + β1 X1 + β2 X2 + ............ + βi Xi + ɛ Where: β0 = Intercept, Where: β = Intercept, develop pest forewarning models using weather factors 0 th βi = regression coefficient of i independent parameters, ( i = 1,2,…, n), to reduce yield loss in advance. The development of th th βi = regression coefficient of i independent models can depict the interaction of the pests and Xi = i weather parameter (independent variable) and ɛ = error term. parameters, ( i = 1,2,…, n), environment over time. In the present study, identified Y= X β + ɛ. th the suitable model for prediction of pest incidence and Xi = i weather parameter (independent variable) these findings may give reliable methods to identify Using leastand ɛsquare = error criteria, term. regression coefficients were estimated. environmental condition that are conducive for a ′ −1 ′ Y= X β + ɛ. β̂ = (X X) X YŶ particular pest development in cotton. Autoregressive integrated moving average(ARIMA) model Using least square criteria, regression coefficients Material and Methods An autoregressivewere estimated. model β̂ of=(X’ order X)-1 pX’ isYŶ conventionally classified as AR (p) and a Standard weekly data for pest incidence (pestsmoving per averageAutoregressive model with q termsintegrated is known moving as MA average(ARIMA) (q). A combined model that contains p three leaves) were collected for cotton variety DCH- model 32 from Cotton Department, TNAU, Coimbatore.autoregressive The terms and q moving average terms is called ARMA (p,q). If the object series is An autoregressive model of order p is data corresponding to crop periods from Septemberdifferenced d times to achieve stationary, the model is classified as ARIMA (p, d, q). Thus, to January for four major cotton pests such as aphid, conventionally classified as AR (p) and a moving an ARIMA model is a combination of an autoregressive (AR) process and a moving average thrips, jassid and whitefly were selected for the study. average model with q terms is known as MA (q). A The corresponding pest incidence data for aphids(MA) process combinedapplied to a model non-stationary that contains data series.p autoregressive Since weekly terms data on pest has been used and q moving average terms is called ARMA (p,q). and thrips were recorded for the period of 2008-2012in this study, ARIMA models mentioned as Seasonal ARIMA models (SARIMA). while for Jassids and Whitefly were recorded for the If the object series is differenced d times to achieve period of 2008-2017 and 2008-2015 respectively. stationary, the model is classified as ARIMA (p, d, q). Thus, an ARIMA model is a combination of an *Corresponding author’s email: [email protected] autoregressive (AR) process and a moving average 314 (MA) process applied to a non-stationary data series. (1,1,1) model, we can write Since weekly data on pest has been used in this Y =β + β X + β X + ............... + β X + n study, ARIMA models mentioned as Seasonal ARIMA 0 1 1 2 2 i i t models (SARIMA). (1- ɸ1 B)(1-B) nt =(1+θ1 B)ɛt Seasonal ARIMA models are an adaptation of Where ɛt, is a white noise series. ARIMAX autoregressive integrated moving average (ARIMA) model have two error terms here the error from the models to specifically fit seasonal time series. Their regression model which we denote by nt and the construction includes the seasonal part in model error from the ARIMA model which we denote by fitting. A combination of non-seasonal and seasonal ɛt. Only the ARIMA model errors are assumed to be process yields the multiplicative Seasonal ARIMA, white noise. (p,d,q) × (P,D,Q) , where m is the periods per season. m Root mean square error The general forms as: The root mean square error (RMSE) is a Φ (Bs ) ɸ (B) (1-B)d (1-Bs )DY =Ɵ (B) ɛ P p t Q t frequently used measure of the difference between d values predicted by a model and the actually Where: ɸp (B), θq (B) and (1-B) are non-seasonal The root mean square error (RMSE) is a frequently used measure of the difference autoregressive operator of order p, non-seasonal betweenobserved values predicted values. by a model LessThe and root RMSEthe mean actually square needed observed error (RMSE) values.for the Lessis abest frequentlyRMSE fit needed used measure of the difference moving average operator of order q and non-seasonal for the model.best fit model. RMSE RMSEbetween can can be be calculatedvalues calcul predicted byated using by by the a model usingequation and the the equationactually observed values. Less RMSE needed differencing operator of order d, respectively. for the best fit model. RMSE can be calculated by using the equation 2 s s D s =1 Where Φ (B ), (1-B ) and Ɵ (B ) are seasonal ∑ ( ̂ − ) P Q 2 = √ autoregressive operator, moving average operator Where yi =actual value and =estimated value. ∑=1( ̂ − ) = √ Where yi =actual value and =estimated value. and differencing operator their orders are P,D and Q Data Whereanalysis was y idone =actual̂ by using value Excel,SPSS, and R software.=estimated value. respectively, and are represented by: Data analysis was donê by using Excel,SPSS, R software. Akaike’s InformationData analysis Criterion (AIC) was done by using Excel,SPSS, s s 2s Ps R software. Akaike’s Information Criterion (AIC) ΦP(B ) = 1- Φ1 B - Φ2 B - ............- ΦP B The AIC computation is based on the mathematical formula AIC = −2log L + 2m, where m = p + q + P + Q is the numberThe AIC of computationparameters in is the based model on andthe Lmathematical is the likelihood formula AIC = −2log L + 2m, Ɵ (Bs) = 1- Ɵ Bs- Ɵ B2s - ............- Ɵ BQs Akaike’s information criterion (AIC) Q 1 2 Q function. Since -2 log L + 2wherem is approximately m = p + q + equalP + Q to is {n the (1+log number 2π) of + parametersn log σ2} where in the σ 2model is and L is the likelihood 2 2 Where: s indicates the length of seasonality. the model MSE.The The AIC best computation modelfunction. is the Since one with- 2is log based the L lowest+ 2m on is AIC approximately the value. mathematical equal to {n (1+log 2π) + n log σ } where σ is formula AIC =the −2log model MSE. L + The 2m, best modelwhere is the m one = with p +the q lowest + P AIC value. Results and Discussion ARIMAX or regression with ARIMA errors + Q is the number of parameters in the model and Multiple linear regression ofResults pests withand Dweatheriscussion factors ARIMAX method is an extension method of L is the likelihood function.