<<

Madras Agric. J., 105 (7-9): 313-316, September 2018

Comparison of Prediction Accuracy of Multiple Linear Regression, ARIMA and ARIMAX Model for Incidence of Cotton with Weather Factors

V.S. Aswathi* and M.R. Duraisamy Department of Physical Science and Information Technology, Tamil Nadu Agricultural University, Coimbatore - 641 003.

Identifying suitable statistical model for predicting pest incidence have important role in pest management programmes. For this study weekly data of , thrips, jassid and incidence of cotton at the TNAU region, Coimbatore and the weather factors influencing these pests incidence were used for model development. Rainfall, maximum temperature, minimum temperature, morning humidity, eveningMaterial humidity and were Methods used as the independent variables and MLR, ARIMA, ARIMAX models built for each pests. Comparison of these three models was done and checked the model accuracy using root Standardmean square weekly error data value.for pest It wasincidence found (pests that perfor allthree leaves) were collected for pests ARIMAX model posses lowest RMSEcotton valuevariety compared DCH-32 tofrom ARIMA Cotton and MLR.Department, So ARIMAX TNAU, Coimbatore. The data model was selected as best fit model. corresponding to crop periods from September to January for four major cotton pests such as Cotton pests, Multiple linear regression, ARIMA, ARIMAX, Weather factors Key words: aphid, thrips, jassid and whitefly were selected for the study. The corresponding pest incidence data for and thrips were recorded for the period of 2008-2012 while for Cotton (Gossypium spp.) is one among theJassids most and WhiteWeeklyfly were weather recorded data for for the Coimbatore period of 2008 (TNAU-2017 region) and 2008 -2015 respectively. important cash crops playing a key role in IndianWeekly weatherwere data collected for Coimbatore from Agro (TNAU Climatic region) Research were collected Centre, from Agro Climatic economy and is designated as “king of fibre crops”. TNAU, Coimbatore. The weather factors selected for is the largest producer of cotton in the Researchworld Centre,the studyTNAU, were Coimbatore maximum. Thetemperature(°C), weather factors minimum selected for the study were accounting for about twenty six per cent of globalmaximum temperature(°C),temperature (°C),minimum morning temperature humidity(%), (°C), morning evening humidity(%), evening cotton production. Main factors affecting cotton humidity (%) ,rainfall(mm). Here cotton pest data did humidity (%) ,rainfall(mm).Here cotton pest data did not follow normality, so square root yield are climatic conditions, pests and diseases. not follow normality, so square root transformation Cotton is ravaged by many pests startingtransformation from ( ( ) )was was performed. performed. sowing to harvesting stage. The pests which are Multiple√𝑋𝑋 + 0 linear.5 regression (MLR) more problematic are sucking pests and bollworms.Multiple linear regression (MLR) Understanding the factors affecting population The multiple regression procedure was utilized abundance of the pest during the crop as well as off The multipleover the regression training procedure data set towas estimate utilized theover significant the training data set to estimate seasons would guide in formulating strategies forthe their significant regressionregression coefficients coefficients of the of linearthe linear equation equation management. Pest incidence is highly influenced by Y =β0 + β1 X1 + β2 X2 + ...... + βi Xi + ɛ weather factors. Hence an attempt has been made Y to =β 0 + β1 X1 + β2 X2 + ...... + βi Xi + ɛ Where: β0 = Intercept, Where: β = Intercept, develop pest forewarning models using weather factors 0 th βi = regression coefficient of i independent parameters, ( i = 1,2,…, n), to reduce yield loss in advance. The development of th th βi = regression coefficient of i independent models can depict the interaction of the pests and Xi = i weather parameter (independent variable) and ɛ = error term. parameters, ( i = 1,2,…, n), environment over time. In the present study, identified Y= X β + ɛ. th the suitable model for prediction of pest incidence and Xi = i weather parameter (independent variable) these findings may give reliable methods to identify Using leastand ɛsquare = error criteria, term. regression coefficients were estimated. environmental condition that are conducive for a ′ −1 ′ Y= X β + ɛ. β̂ = (X X) X YŶ particular pest development in cotton. Autoregressive integrated moving average(ARIMA) model Using least square criteria, regression coefficients Material and Methods An autoregressivewere estimated. model β̂ of=(X’ X)-1 pX’ isYŶ conventionally classified as AR (p) and a Standard weekly data for pest incidence (pestsmoving per averageAutoregressive model with q termsintegrated is known moving as MA average(ARIMA) (q). A combined model that contains p three leaves) were collected for cotton variety DCH- model 32 from Cotton Department, TNAU, Coimbatore.autoregressive The terms and q moving average terms is called ARMA (p,q). If the object series is An autoregressive model of order p is data corresponding to crop periods from Septemberdifferenced d times to achieve stationary, the model is classified as ARIMA (p, d, q). Thus, to January for four major cotton pests such as aphid, conventionally classified as AR (p) and a moving an ARIMA model is a combination of an autoregressive (AR) process and a moving average thrips, jassid and whitefly were selected for the study. average model with q terms is known as MA (q). A The corresponding pest incidence data for aphids(MA) process combinedapplied to a model non-stationary that contains data series.p autoregressive Since weekly terms data on pest has been used and q moving average terms is called ARMA (p,q). and thrips were recorded for the period of 2008-2012in this study, ARIMA models mentioned as Seasonal ARIMA models (SARIMA). while for Jassids and Whitefly were recorded for the If the object series is differenced d times to achieve period of 2008-2017 and 2008-2015 respectively. stationary, the model is classified as ARIMA (p, d, q). Thus, an ARIMA model is a combination of an *Corresponding author’s email: [email protected] autoregressive (AR) process and a moving average 314

(MA) process applied to a non-stationary data series. (1,1,1) model, we can write Since weekly data on pest has been used in this Y =β + β X + β X + ...... + β X + n study, ARIMA models mentioned as Seasonal ARIMA 0 1 1 2 2 i i t models (SARIMA). (1- ɸ1 B)(1-B) nt =(1+θ1 B)ɛt

Seasonal ARIMA models are an adaptation of Where ɛt, is a white noise series. ARIMAX autoregressive integrated moving average (ARIMA) model have two error terms here the error from the models to specifically fit seasonal time series. Their regression model which we denote by nt and the construction includes the seasonal part in model error from the ARIMA model which we denote by fitting. A combination of non-seasonal and seasonal ɛt. Only the ARIMA model errors are assumed to be process yields the multiplicative Seasonal ARIMA, white noise. (p,d,q) × (P,D,Q) , where m is the periods per season. m Root mean square error The general forms as: The root mean square error (RMSE) is a Φ (Bs ) ɸ (B) (1-B)d (1-Bs )DY =Ɵ (B) ɛ P p t Q t frequently used measure of the difference between d values predicted by a model and the actually Where: ɸp (B), θq (B) and (1-B) are non-seasonal The root mean square error (RMSE) is a frequently used measure of the difference autoregressive operator of order p, non-seasonal betweenobserved values predicted values. by a model LessThe and root RMSEthe mean actually square needed observed error (RMSE) values.for the Lessis abest frequentlyRMSE fit needed used measure of the difference moving average operator of order q and non-seasonal for the model.best fit model. RMSE RMSEbetween can can be be calculatedvalues calcul predicted byated using by by the a model usingequation and the the equationactually observed values. Less RMSE needed differencing operator of order d, respectively. for the best fit model. RMSE can be calculated by using the equation

s s D s 𝑛𝑛 2 Where Φ (B ), (1-B ) and Ɵ (B ) are seasonal ∑𝑡𝑡=1( 𝑦𝑦̂𝑖𝑖 − 𝑦𝑦𝑖𝑖) P Q 𝑛𝑛 2 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅 = √ 𝑖𝑖 𝑖𝑖 autoregressive operator, moving average operator Where yi =actual value and =estimated value. 𝑛𝑛 ∑𝑡𝑡=1( 𝑦𝑦̂ − 𝑦𝑦 ) 𝑅𝑅𝑅𝑅𝑅𝑅𝑅𝑅 = √ Where yi =actual value and =estimated value. 𝑛𝑛 and differencing operator their orders are P,D and Q Data Whereanalysis was y idone =actual𝑦𝑦̂𝑖𝑖 by using value Excel,SPSS, and R software.=estimated value. respectively, and are represented by: Data analysis was done𝑦𝑦̂𝑖𝑖 by using Excel,SPSS, R software. Akaike’s InformationData analysis Criterion (AIC) was done by using Excel,SPSS, s s 2s Ps R software. Akaike’s Information Criterion (AIC) ΦP(B ) = 1- Φ1 B - Φ2 B - ...... - ΦP B The AIC computation is based on the mathematical formula AIC = −2log L + 2m, where m = p + q + P + Q is the numberThe AIC of computationparameters in is the based model on andthe mathematicalL is the likelihood formula AIC = −2log L + 2m, Ɵ (Bs) = 1- Ɵ Bs- Ɵ B2s - ...... - Ɵ BQs Akaike’s information criterion (AIC) Q 1 2 Q function. Since -2 log L + 2wherem is approximately m = p + q + equalP + Q to is {n the (1+log number 2π) of + parametersn log σ2} where in the σ 2model is and L is the likelihood 2 2 Where: s indicates the length of seasonality. the model MSE.The The AIC best computation modelfunction. is the Since one with- 2is log based the L lowest+ 2m on is AIC approximately the value. mathematical equal to {n (1+log 2π) + n log σ } where σ is formula AIC =the −2log model MSE. L + The 2m, best modelwhere is the m one = with p +the q lowest + P AIC value. Results and Discussion ARIMAX or regression with ARIMA errors + Q is the number of parameters in the model and Multiple linear regression ofResults pests withand Dweatheriscussion factors ARIMAX method is an extension method of L is the likelihood function. Since -2 log L + 2m is Multiple linear regressionMultiple equation linear ofregression aphid, thrips, of pests jassid, with white weatherfly were factors given in the ARIMA model. In forecasting, this method involves approximately equal to {n (1+log 2π) + n log σ2} where Table 1. MLR for aphid was founMultipled to be linear significant regression with equation value of of aphid, 0.232 thrips, and RMSE jassid, whitefly were given in the σ2 is the model MSE. The best model is the one with independent variables also.The independent Table 1. MLR for aphid was found2 to be significant with value of 0.232 and RMSE 0.87699. MLR for thrips was found to be significant with R value of 0.098 and RMSE the lowest AIC value. 2 variables used in this study were weather factors that 0.87699. MLR for thrips was found2 to be significant with value of 0.098 and RMSE 0.32686. MLR for jassid was found to be significant with 𝑅𝑅 value of 0.141 and RMSER 2 affect the pest incidence. Multiple regression models 0.32686. MLR for jassid was found2 to be significant with value of 0.141 and RMSE 0.71047.Results MLR for white andfly Discussion was not significant with value𝑅𝑅 of 0.062 and RMSE 0.5696.𝑅𝑅 2 of the formY =β0 + β1 X1 + ....+ βi Xi+ɛ. Where Y is a 0.71047. MLR for whitefly was2 not significant with value of 0.062 and RMSE 0.5696. Akram et al(2013) also used same method to study impact 𝑅𝑅 of weather on whitefly and thrips𝑅𝑅 dependent variable of the X predictor variables and ɛ Multiple linear regression of pests with weather 2 i incidence of cotton. RaghavendraAkram etet al.al(2013) (2014) also also used used same MLR method for predicting to study pestimpact incidence of weather on whitefly and thrips usually assumed to be an uncorrelated error term factors 𝑅𝑅 of cotton. incidence of cotton. Raghavendra et al. (2014) also used MLR for predicting pest incidence (i.e., it is white noise). We considered tests such as Multiple linearof cotton. regression equation of aphid, the Durbin-Watson test for assessing whether ɛ was ARIMAthrips, and ARIMAX jassid, models whitefly for pests were given in the Table 1. MLR ARIMA and ARIMAX models for pests 2 significantly correlated. We will replace ɛ by nt in the for aphid was found to be significant with R value of Weekly data of pest incidence were used to fit ARIMA and Autoregressive Integrated equation. The error series nt is assumed to follow an 0.232 and RMSE 0.87699.Weekly data ofMLR pest incidence for thrips were wasused to found fit ARIMA and Autoregressive Integrated Moving Average with regression (ARIMAX)2 model. So ARIMA and ARIMAX model ARIMA model. For example, if nt follows an ARIMA to be significant with R value of 0.098 and RMSE mentioned as seasonal ARIMA(SARIMA)Moving Average and with SARIMAX regression respectively. (ARIMAX)Best model. fitted So mod ARIMAels and ARIMAX model mentioned as seasonal ARIMA(SARIMA) and SARIMAX respectively.Best fitted models Table 1. Multiple linear regression of pests with weatherare presented factors in the Table 2,and RMSE and AIC values were calculated. are presented in the Table 2,and RMSE and AIC values were calculated. Pests Equation For aphid SARIMA and SARIMAXR2 RMSE with 0.63 and For aphid SARIMA and SARIMAX with 0.63 and 0.61 RMSE respectively.AIC(0,0, 3)values(1,0,1 )20are 210.4 and 204.9(1,0, 0)respectively. (1,0,0)20 For Thrips Aphid Y= 7.027-0.006(rain)-0.286 (maximum temperature) 0.232** 0.87699 and SARIMAX0.61 RMSE respectively.AIC with 0.2872(0,0, 3and)values(1 ,00.2730,1 )20are RMSE210.4 respectively.and 204.9(1,0, 0 )respectively. (1,0,0)20 For Thrips

+0.178(minimum temperature)+ 0.046(morning20 humidity)- 20 and SARIMAX with 0.2872 and 0.2730 RMSE respectively. 0.066(evening humidity) SARIMA (1,0,0) (1,0,0) SARIMA (1,0,0)20 (1,0,0)20 Thrips Y= 2.564+0.002(rain)+0.043 (maximum temperature) 0.098* 0.32686 -0.034(minimum temperature)-0.029(morning humidity)+0.008(evening humidity)

Jassid Y= 5.101-0.002(rain)+0.047 (maximum temperature)- 0.141** 0.71047 0.049(minimum temperature)-0.033(morning humidity)-0.006 (evening humidity) Whitefly Y= 4.766-0.000032(rain)-0.084 (maximum temperature) + 0.062 0.56966 0.008(minimum temperature)- 0.001(morning humidity)- 0.015(evening humidity) Note: ** significant at 1%, *significant at 5%. 315

0.32686. MLR for jassid was found to be significant ARIMAX showed lowest RMSE and AIC values. with R2 value of 0.141 and RMSE 0.71047. MLR for From the best fitted model, the result concluded whitefly was not significant with2 R value of 0.062 and that ARIMAX showed better result compared to RMSE 0.5696. Akram et al (2013) also used same ARIMA. Arya et al. (2015) predicted pest population method to study impact of weather on whitefly and using ARIMAX model. Anggraeni et al.(2017) also thrips incidence of cotton. Raghavendra et al. (2014) used ARIMAX method to predict the number of also used MLR for predicting pest incidence of cotton. dengue fever patients in by using trend Table 2. Fitted ARIMA and ARIMAX models of data and the results showed that ARIMAX was pests suitable for irregular patterned data. Kongcharoen and Kruangpradit (2013) also used ARIMAX for Pests Models RMSE AIC forecasting export to major trade partners.

SARIMA (0,0,3) (1,0,1)20 0.635 210.4 Aphid Comparison between MLR, ARIMA and ARIMAX for SARIMAX (1,0,0) (1,0,0) 0.616 204.9 20 different pests

SARIMA (1,0,0)20 0.2872 40.63 Thrips Accuracy of models was checked by using SARIMAX (1,0,0) 0.2730 40.53 20 RMSE value and it was given in the Table 3. For

SARIMA (0,1,0) (2,0,1)20 0.4188 207 aphid lowest RMSE value obtained for ARIMAX Jassid model (0.616) followed by ARIMA model(0.635) SARIMAX (0,1,0) (2,0,1)20 0.3980 197.7 and then MLR model (0.876). For thrips lowest SARIMA (1,0,0) (2,0,0)20 0.3308 113.8 Whitefly RMSE value for ARIMAX model (0.273) followed by SARIMAX (1,0,0) (2,0,1) 0.2932 100.3 20 ARIMA (0.287) and then MLR model (0.326). For ARIMA and ARIMAX models for pests jassid lowest RMSE value for ARIMAX model(0.398) followed by ARIMA model (0.418) and then MLR Weekly data of pest incidence were used to model(0.710).For whitefly lowest RMSE value for fit ARIMA and Autoregressive Integrated Moving ARIMAX model(0.293) followed by ARIMA(0.330) and Average with regression (ARIMAX) model. So then MLR model(0.569).Based on the lowest RMSE ARIMA and ARIMAX model mentioned as seasonal value, ARIMAX model recorded highest accuracy ARIMA(SARIMA) and SARIMAX respectively. Best of prediction compared to ARIMA and MLR models. fitted models are presented in the Table 2,and RMSE and AIC values were calculated. Conclusion ARIMAX model recorded the lowest RMSE values For aphid SARIMA (0,0,3) (1,0,1)20 and for all the pests compared to MLR and ARIMA. So SARIMAX(1,0,0) (1,0,0)20 with 0.63 and 0.61 RMSE respectively. AIC values are 210.4 and ARIMAX selected as best model for predicting pest incidence of aphid, thrips, jassid and whitefly of cotton 204.9 respectively. For Thrips SARIMA (1,0,0)20 using weather factors in the study area. and SARIMAX (1,0,0)20 with 0.2872 and 0.2730 RMSE respectively. AIC values are 40.63 and 40.53 Acknowledgement respectively. In case of Jassid SARIMA (0,1,0) (2,

0,1)20 and SARIMAX (0,1,0) (2,0,1)20 with 0.4188 and Our sincere gratitude to Dr. Patil Santosh 0.3980 RMSE values respectively. AIC values are 207 Ganapati, who has helped us throughout the research and 197.7 respectively. For Whitefly SARIMA (1,0,0) programme.

(2,0,0)20 and SARIMAX (1,0,0) (2,0,1)20 with 0.3308 and 0.2932 RMSE respectively. AIC values are 113.8 References and 100.3 respectively. Akram, M, Hafeez, F, Farooq, M, Arshad, M, Hussain, M, Ahmed, S. and Khan, H.A. 2013. A case to study Table 3. RMSE value of MLR,ARIMA, ARIMAX for population dynamics of Bemisia tabaci and Thrips different pests tabaci on Bt and non-Bt cotton genotypes. Pakistan Journal of Agricultural Sciences, , 617-623. Pests Models RMSE 50(4) Anggraeni,W,Pusparinda,N, Riksakomara, E, Samopa, MLR 0.876 F.and Pujiadi. 2017. The Performance of ARIMAX Aphid ARIMA 0.635 Method in Forecasting Number of Tuberculosis ARIMAX 0.616 Patients in Malang Regency, Indonesia. International J. Applied Engineering Research, 12(17),189-196. MLR 0.326 Arya, P, Paul, R. K, Kumar, A, Singh, K. N.and Sivaramne, Thrips ARIMA 0.287 N. 2015. Predicting pest population using weather ARIMAX 0.273 variables : An ARIMAX time series framework. International Journal of Agricultural and Statistical MLR 0.710 Sciences, 11(2), 381-386. Jassid ARIMA 0.418 Boopathi,T, Singh,S.B, Manju,T, Ramakrishna,Y, Akoijam,R.S, ARIMAX 0.398 Samik Chowdhury and Ngachan,S.V.2015. Development of temporal modeling for forecasting MLR 0.569 and prediction of the incidence of , Tessaratoma Whitefly ARIMA 0.330 papillosa (: ), using time series (ARIMA) analysis. Journal of Insect ARIMAX 0.293 Science,15(1),55-59. 316

Campos, A. C, Campos, M. S, Bustillo, C. W. G, Villafranca, Kadam, D. B, Kadam, D. R. and Umate, S. M. 2015. Effects M. H.and Johanssonb, T. 2012. Non-parametric of weather parameters on incidence sucking pests on statistical methods and data transformations in Bt cotton. International Journal of Plant Protection, agricultural pest population studies. Chilean Journal 8(1), 211-213. of Agricultural Research, 72(3),440-443. Kongcharoen, C, and Kruangpradit, T. 2013. Autoregressive Dhar, T, Ghosh, A, Senapati, S. K.and Bhattacharya, S. Integrated Moving Average with Explanatory 2014. Identification of prediction model on population Variable (ARIMAX) Model for Export. Paper build up of Singhiella pallida on Piper betle L. for presented at the 33rd International Symposium on timely intervention. International Journal of Agriculture, Forecasting,South Korea, p.1-8, www.researchgate. Environment and Biotechnology, 7(4), 883. net/publication/255731345. Hameed, A, Shahzad, M.S, bidmehmood, Ahmad, S.and Panwar, T. S, Singh, S. B. and Garg, V. K. 2015. Influence Noor-UlIslam.2014. Forecasting and modeling of meteorological parameters on population dynamics of sucking insect complex of cotton under agro- of thrips and aphid in Bt and non Bt cotton at ecosystem of Multan- Punjab, Pakistan. Pakistan Malwa region of Madhya Pradesh. Journal of Journal of Agricultural Sciences, 51(4), 997-1003. Agrometeorology, 17(1), 136-138. Janu, A. and Dahiya, K. 2017. Influence of weather Raghavendra, K. V, Naik, D. S. B, Mieee, S.Venkatrama parameters on population of whitefly, Bemisia tabaci phanikumar.and Mieee. 2014. Weather Based in American cotton (Gossypium hirsutum). Journal Prediction of Pests in Cotton. Paper presented at of and Zoology Studies, 5(4), 649-654. the sixth International Conference on Computational Intelligence and Communication Networks, p.570- 574,www.researchgate.net/publication/283649380.

Received : February 20, 2018; Revised : March 10, 2018; Accepted : March 25, 2018