CORE Metadata, citation and similar papers at core.ac.uk

Provided by Ghent University Academic Bibliography

Ecological Engineering 118 (2018) 104–110

Contents lists available at ScienceDirect

Ecological Engineering

journal homepage: www.elsevier.com/locate/ecoleng

Input variable selection with greedy stepwise search algorithm for analysing T the probability of fish occurrence: A case study for Alburnoides mossulensis in the Gamasiab River, ⁎ Rahmat Zarkamia, , Maryam Moradia, Roghayeh Sadeghi Pasvishehb, Ali Banic, Keivan Abbasid

a Department of Environmental Science, Faculty of Natural Resources, University of Guilan, 1144 Sowmeh Sara, Guilan, Iran b Department of Plant Production, Faculty of BioScience Engineering, Ghent University, Coupure links, 653, 9000 Ghent, Belgium c Department of Marine Science & Technology-Aquatic Ecology Department, University of Guilan, Rasht, Iran d Ecology Department, National Inland Water Aquaculture Institute, Bandar Anzali, Iran

ARTICLE INFO ABSTRACT

Keywords: Input variable selection using data-driven combined with statistical methods have received more attention to Alburnus mossulensis analyse the probability of freshwater organisms’ occurrence. Eight different sampling sites (from the source to Gamasiab River the mouth of Gamasiab River basin, Iran) were considered to study the occurrence of Alburnoides mossulensis Greedy stepwise during one year study period (2008–2009). A set of river characteristics together with abundance of target fish Logistic regression (based on presence/absence data) were recorded at each sampling site. Logistic regression was optimized with Probability of occurrence an input variable selection, greedy stepwise search algorithm, to select the most important explanatory variables for analysing the occurrence of fish. According to the optimization method, almost one-third of total recorded – variables in the sampling sites including electric conductivity (EC), bicarbonate (HCO3 ), river width, river 2− 3− depth, water temperature, pH, sulphate (SO4 ) and orthophosphate (PO4 -P) might influence the probability of occurrence of fish in the river while based on the outcomes of binary logistic regression model, electrical conductivity and bicarbonate were the most important ones (p < 0.05 for both variables).

1. Introduction conservation (Ambelu et al., 2010; Sadeghi et al., 2012a,b; Zarkami et al., 2012a; Sadeghi et al., 2013; Haghi Vayghan et al., 2015). The conservation and restoration of aquatic ecosystems are a main Classification tree (Zarkami, 2011; Sadeghi et al., 2012a; Zarkami objective for sustainable river basin management. The effect of human et al., 2014), artificial neural networks (Lock and Goethals, 2014), activities on river catchments results in pressures on the biota and support vector machine (Sadeghiet al., 2012b; Zarkami et al., 2012b) biological processes (Karr and Chu, 1999). Therefore, understanding and logistic regression (Lock and Goethals, 2014) are such predictive the relationship between the ecological characteristics of rivers/streams modelling techniques which have shown their high capability in pre- and the occurrence of organisms is a critical issue in conservation, re- dicting the habitat suitability of organisms. Among the abovementioned storation, and management programmes. According to WFD; European models, logistic regression is a very useful model to show the occur- Council (2000), aquatic flora, microbenthic and fish fauna are im- rence of organisms based on presence/absence data (Imran et al., 2008; portant organisms to assess the ecological water quality. To achieve Lock and Goethals, 2014). this, river managers and engineers initially need to design an appro- Selection of the most important explanatory variables is regarded as priate monitoring campaign based on a set of abiotic (e.g. water quality very important issue for evaluating the presence/absence and abun- and physical-habitat variables) and biotic variables (e.g. presence/ab- dance of freshwater organisms (Faraway and Chatfield, 1998; Zarkami sence and abundance) for freshwater organisms. The second step for et al., 2012a). The reason is that when various variables are introduced analysing the habitat preferences of the target organisms is the choice to the machine learning (e.g. logistic regression in the present study), of appropriate modelling techniques which should be used based on the the redundant variables will unquestionably lower the predictive power scope of study and also the existing dataset. The third and final step is and reliability of models (Hall and Holmes, 2003). Moreover, mon- the use of proper modelling techniques which are considered as im- itoring various variables for assessing the habitat preferences of or- portant tools to support decision-making in aquatic management and ganisms is either costly or unmanageable (Dom et al., 1989; Ambelu

⁎ Corresponding author at: Department Environmental Science, Faculty of Natural Resources, University of Guilan, 1144 Sowmeh Sara, Guilan, Iran. E-mail address: [email protected] (R. Zarkami).

https://doi.org/10.1016/j.ecoleng.2018.04.011 Received 5 May 2017; Received in revised form 5 April 2018; Accepted 8 April 2018 0925-8574/ © 2018 Elsevier B.V. All rights reserved. R. Zarkami et al. Ecological Engineering 118 (2018) 104–110 et al., 2010; Zarkami et al., 2012a). To solve this dilemma, an appro- priate selection method is unavoidably needed. Based on this, greedy stepwise search algorithm (GS) was integrated with logistic regression (LR) to automatically select the most important environmental vari- ables to meet the habitat needs of Alburnoides mossulensis in the Gamasiab River (an important freshwater river in Iran). The freshwater fishes of Iran consist of 155 species belonging to 24 families (Coad, 1998). The most diverse family belongs to the Cypri- nidae with 74 species (Coad, 1998). A. mossulensis (Heckel, 1843) is a cyprinid freshwater fish found in the Euphrates and Tigris rivers in Turkey and their adjacent basins in Iran (Kuru, 1978; Coad, 2010; Uçkun and Gokçe, 2015). The distribution of A. mossulensis in Asia extends from the Tigris-Euphrates basin to the upper parts of the delta of the Kor, Mand, and Kul rivers in Iran (Bogutskaya, 1997) and mainly in Gamasiab River. A. mossulensis is one of the most important fish species in the west of Iran (Yousefian et al., 2013). This fish feeds mostly on insects and algae (Mohamed et al., 2012). A. mossulensis is considered as a species of conservation interest (Yousefian et al., 2013). Although, it is not commercially considered as an important fish spe- cies, it is a source of food for many people. Despite high tolerance of A. mossulensis in nature, their populations are threatened because of too much pollution resulting from various agricultural activities in the surrounding areas. These activities degraded the habitat of fish and their natural reproduction. Most of water quality monitoring programmes in Iran are based on the assessment of physical-chemical variables while the ecological as- sessment of aquatic ecosystems programmes particularly by fish is quite limited. Therefore, an extensive program needs to be implemented in near future to analyse the habitat preferences of different fish species in different aquatic ecosystems. Among freshwater organisms, analysing Fig. 1. The location of sampling sites in Gamasiab river basin, Iran (the sam- fi the fish habitat requirements is a very important issue for the river pling sites are depicted with lled triangle). management (Armour and Taylor, 1991; Bockelmann et al., 2004; Mouton et al., 2007; Zarkami et al., 2010; Zarkami et al., 2012a). The 2.2. Fish data aim of the present paper is to select the most important explanatory variables for the habitat preferences of A. mossulensis. This study also The fish monitoring for A. mossulensis in Gamasiab River was based aimed to assess the probability of occurrence of the A. mossulensis on a standardized fish sampling method (e.g. Coad, 1995). The fish (based on presence/absence data) using LR model in the Gamasiab sampling campaigns were carried out during the same period which River. Based on this, it is hypothesized that different sampling sites/ was considered for the water quality and physical-habitat variables. seasons might influence the abundance of the given fish in the Gama- Electrofishing, a generator with an adjustable output voltage of siab River. Another hypothesis is that water quality conditions might 180–350 V, was carried out for fish sampling. In addition to this, the have more impact than physical-habitat variables on the habitat re- complementary methods such as beach sine (with the mesh sizes of quirements of fish in this river ecosystem. 6–8 mm) and cast net (with the mesh sizes of 8–14 mm) were im- plemented where electrofishing was impossible. The number of elec- trofishing devices and the number of hand-held anodes used depended 2. Materials and methods on the river width. The fish was randomly caught depending on the size of fishing. Then, the caught fish were studied either freshly or fixed (in 2.1. Study area and sampling sites 10% formalin) to determine their abundance. At each site, fish were individually counted (in total 829 real observations in eight sampling Gamasiab River basin is jointly located in the provinces of sites). The presence/absence of fish was obtained from the abundance Kermanshah and Hamadan between the latitudes of 34° 14′ 3″ to 34° data and also presence/absence of fish was used as output variable for 51′ 54″ and longitudes of 47° 5′ 3″ to 48° 26′ 56″. Gamasiab basin is one the development of LR. The sites 1, 2 and 8 had less contribution to the of the main branches of Karkheh River. The length of Gamasiab River is fish sampling than other sites. The frequency of fish occurrence was 900 km so that it is the longest freshwater river in Iran and also it is almost 50% in terms of the sampling sites/the number of instances. considered as the largest river in the Hamadan province. There are various activities in the course of this river basin such as agricultural lands, hydropower plants and fish farms. These activities have changed 2.3. Environmental parameters data river morphology and deteriorated the habitat of different fish species. Eight different sampling sites were seasonally considered in the Various water quality and physical-habitat variables were measured Gamasiab River during one year study period (from autumn 2008 till at each sampling point (Table 1). Since collinearity between variables summer 2009). The data related to the abiotic and biotic variables were might make a possible noise in data-driven model and reduce the re- simultaneously carried out in the study area. The sites were chosen liability of model (Walczak and Cerpa, 1999), some variables such as −3 from the mouth to the source of the Gamasiab River (Fig. 1). The sites TDS, DOM (dissolved organic matter), BOD5 and TPO4 , despite their were considered regarding the anthropogenic impacts, distribution of possible significance for analysing the habitat preferences of fish A. mossulensis over the entire river basin and finally access to fish (Abbasi, 2007), were not considered for fish monitoring due to possible sampling. collinearity between pairs of variables. For instance, since electric conductivity is proportional to the sum of cations and anions, and

105 R. Zarkami et al. Ecological Engineering 118 (2018) 104–110

Table 1 predictive performance of LR before and after variable selection. The Summary of general statistics of the recorded variables in the Gamasiab river following popular predictive performances including the percentage of basin (SD: standard deviation). Correctly Classified Instances (CCI%) (Buckland and Elston, 1993); ’ Variables Min Max Mean SD Fielding and Bell, 1997), Cohen s Kappa (Cohen, 1960) were considered for analysing the habitat preferences of A. mossulensis in the present Turbidity (FTU) 0.00 24.00 8.87 6.34 study (CCI > 70% and k > 0.40 were considered as threshold values pH 7.50 8.70 8.04 0.26 and reliable predictive performances). Electric conductivity (mS/cm) 0.26 0.74 0.45 0.14 Bicarbonate (mg/l) 183.00 463.60 254.45 89.23 Correctly classified instances( CCI %)=+ ( a d )/( a +++× b c d ) 100 Total hardness (mg/l) 130.00 330.00 213.30 60.66 – NO3 -N (mg/l) 0.38 2.40 1.21 0.50 + [(ad+− ) ((( acab + )( + ) + ( bdcdN + )( + ))/ )] NH4 -N (mg/l) 0.09 0.58 0.27 0.12 Cohen’() s Kappa k = – [NacabbdcdN−+ ((( )( ++++ ) ( )( )/ ) NO2 -N (mg/l) 0.00 0.08 0.02 0.02 Total nitrogen (mg/l) 2.00 7.10 4.08 1.33 3− where a, b, c and d are identified as True Positive, False Positive, PO4 -P (mg/l) 0.004 0.14 0.05 0.02 2− SO4 (mg/l) 0.44 41.30 17.69 13.56 False Negative and True Negative, respectively. DO (mg/l) 2.60 12.84 9.33 2.01 For optimizing the LR, a forward selection method (Witten et al., COD (mg/l) 1.47 28.34 11.29 8.13 2011; Sadeghi et al., 2014) was used for the greedy stepwise optimizer TSS (g/l) 0.00 0.17 0.04 0.03 comprising of all recorded abiotic variables. Water temperature (°C) 4.00 21.00 12.58 4.36 Depth (cm) 8.00 140.00 29.16 22.94 LR for p independent variables was used by the following formula: Width (m) 3.00 30.00 8.92 6.36 exp(BBX0 + i ) Flow velocity (cm/s) 7.00 150.00 59.90 43.37 p = Distance from source (m) 50.00 63100.00 19377.42 20383.52 1exp(++BBX0 i ) Height from sea level (m) 1.00 1839.00 1644.87 120.74 where exp is the exponent function. B0 is a constant and Bi are coeffi- cients of the predictor variables (the step function was used in the roughly equivalent to total dissolved solids (TDS) in water, only electric statistical package SPSS 16). conductivity was measured instead of TDS. Besides, dissolved organic The probability of occurrence of A. mossulensis (based on presence/ matter was not also considered for fish monitoring due to correlation absence) was modelled as a linear function of the selected variables in fi with TDS as well as EC. COD and orthophosphate were considered in- the river. Models were tted using the maximum likelihood method from Hosmer and Lemeshow goodness-of-fit test (Hosmer and stead of BOD5 and total phosphate for fish monitoring, respectively. The environmental variables were collectively determined in the Lemeshow, 2000). field and laboratory based on standardized and quality controlled methods (APHA/AWWA/WEF, 1998). On-site monitoring consisted of 3. Results and discussion water and air temperature, dissolved oxygen, pH and electrical con- ductivity. Other abiotic variables were analysed according to standard 3.1. Environmental parameters methods (APHA/AWWA/WEF, 1998). Table 1 shows a summary of general statistics (minimum, max- imum, mean, and standard deviation) of the entire variables recorded in 2.4. Data analysis methods the Gamasiab River basin. According to the normality test, the data of some environmental variables were not normally distributed First, a normality test (using Kolmogorov-Smirnov test) was per- (p < 0.05). Mild and extreme outliers were recognized in some water formed for the environmental variables (SPSS 16). The Pearson corre- – quality variables measured in the Gamasiab River such as NO2 -N, lation coefficient (r) (SPSS 16) was used to find the relationship be- − NH +-N, TSS, and PO 3 -P (Fig. 2). As indicated here, the outliers are tween environmental variables. LR (as main model) was optimized with 4 4 skewed to the upper part of the box plots. This indicates that the given GS (WEKA software, version 3.6.6, Witten et al., 2011) to select the river might be contaminated with the high levels of pollution, resulting most important predictors for the habitat preferences of A. mossulensis. from different agricultural activities. Correlation analysis of the se- Paired Student’s t-tests (a two-tailed test with a 95% confidence in- lected variables for habitat preferences of fish in the study area is terval) were utilized to compare the predictive performance of LR be- presented in Table 2. As seen here, some variables were strongly and fore and after variable selection method. A multivariate technique so- positively correlated. Sulphate was positively correlated with electric called Principal Component Analysis (PCA) (with log transformed va- conductivity (r = 0.52, p < 0.01). This implicitly demonstrates that lues) based on the first and second components was employed to de- increasing evaporation in warm seasons in the study river leads to in- termine the most important variables influencing the habitat pre- crease electric conductivity and also sulphate concentrations. On the ferences of A. mossulensis in the river. This technique was applied from contrary, the width of river and water temperature was negatively the program package PAST (Paleontological Statistics, version 3) correlated (r = -0.49, p < 0.01). This indicates that when there are (Hammer, 2011). abundant waters in the wider part of river particularly in cold seasons, Among the selected variables by GS, the predictors that had a p- normally, there should be less water temperature (and also less eva- value lower than 0.05 were merely used to show the probability of poration) in the wider part than the narrower part of the river for this occurrence of A. mossulensis in the river. A Kruskal–Wallis ANOVA test warm water fish. was conducted to examine the relationship between environmental After optimization of LR with GS, eight variables were finally se- variables and samples with fish or without fish. The one-way ANOVA lected by GS optimizer. This makes obvious that almost one-third of (post hoc test) was used to compare the means of selected variables total recorded variables were recognized to be important for meeting (selected by GS optimizer) in different sampling site/seasons. A chi- the habitat preferences of fish in the study area. The results of variable square test was employed to compare the difference between the selection method showed that a combination of water quality and abundance of A. mossulensis and the sampling sites/seasons. physical-habitat variables might affect the habitat suitability of A. mossulensis in the Gamasiab River of which the most important vari- 2.5. Modelling ables meeting the habitat preferences of fish consisted of width, depth, water temperature, pH, electric conductivity (EC), bicarbonate – 2− 3− A stratified threefold cross-validation was used to compare the (HCO3 ), sulphate (SO4 ), orthophosphate (PO4 -P).

106 R. Zarkami et al. Ecological Engineering 118 (2018) 104–110

Fig. 2. The box-and-whisker plots of some water quality variables in the Gamasiab river (the mild and extreme outliers are indicated with blank and filled circles, – + 3− respectively). TSS (g/l): total suspended solids; NO2 (mg/l): nitrite; NH4 (mg/l): ammonium; PO4 (mg/l): phosphate.

The results of Paired Student’s t-tests showed that there was a sig- Surprisingly, winter season should normally have less contribution to nificant difference between the two predictive criteria of LR before and agricultural activities than other seasons (e.g. application of chemical after variable selection (p < 0.05). However, before variable selection, fertilizer for arable lands) in the watershed. In other words, most the model was reliable to successfully predict the habitat suitability of agricultural activities in the watershed coincides with spring and the target fish (CCI = 76.67 ± 1.15% and k = 0.45 ± 0.01), both summer periods. This can be justified by the fact that the elevated level indicator performances improved significantly after optimizing the LR of phosphate concentration in winter period can be attributed due to with GS (CCI = 88.3 ± 0.58% and k = 0.77 ± 0.02), resulting in a very low consumption of this chemical fertilizer by aquatic organisms better and ultimate decision for analysing the fish occurrence. Many (e.g. phytoplankton) (Abbasi, 2007), low evaporation and also low air studies (e.g. Sadeghi et al., 2013; Haghi Vayghan et al., 2015) have and water temperature in winter period. shown that the optimization of data-driven techniques with GS might result in an improvement of the predictive performances of models. 3.2. Relationship between fish abundance and sampling sites/seasons Based on Kruskal–Wallis ANOVA test, electric conductivity (chi- square = 7.12, p < 0.01) and bicarbonate (chi-square = 5.41, Chi-square test of independence showed a significant difference p < 0.05) differed significantly between samples with and without between the abundance of A. mossulensis and different sampling sites/ fish. However, no significant differences were observed between other seasons (x2 = 441.22, p = 0.001). This indicates that the distribution physical–chemical samples and presence/absence of fish (all and abundance of this species differed in various sampling sites and p > 0.05). seasons. Fish were not caught in the sites 1 and 2 in any seasons. Since The results of one-way ANOA (post hoc test), on the other hand, these sites were located in the mouth of river, they might not provide indicated a significant difference between the water quality variables – suitable habitats to the A. mossulensis population because the amount of (e.g. EC and HCO3 ) and sampling sites (p < 0.01 for both variables). pollutants significantly increases in this part of river. This indicates that Based on the post hoc test, the sites 1, 2 and 3 showed a significant A. mossulensis populations do not prefer to inhabit the mouth of river to difference (p < 0.01) with other sites concerning the given variables, escape too much pollution status of river. In contrast, they were caught while no significant difference was found between physical-habitat in the sites 3, 4, 5 and 6 in all seasons. In the remaining sites (7 and 8), variables and different sampling sites. Low and high concentrations of they were observed only in three seasons (except summer). Though, the EC were found in winter and summer seasons, respectively. Low con- site 8 (the source) is naturally rich in oxygen and also less polluted centration of EC in winter season might be attributed as a result of compared to other parts of the sampling sites, the aforementioned site abundant water and also low evaporation in the rainy seasons. also might not be a good place for the A. mossulensis population due to According to post hoc test, winter period significantly differed with 3− very high flow velocity. In total, autumn and winter (due to abundant other seasons concerning PO4 concentration (p < 0.05) so the 3− water in these seasons) had relatively more contribution to fish sam- maximum value of PO4 was found in winter period (0.14 mg/l). pling than spring and summer seasons.

Table 2 Correlation matrix of variables (selected by greedy stepwise) in the Gamasiab river (the p-value of each variable is included between brackets).

– −3 −2 W.T pH EC HCO3 PO4 -P SO4 Depth Width

W.T 1 pH 0.11 (0.57) 1 EC 0.42* (0.02) −0.05 (0.80) 1 – * ** HCO3 0.36 (0.05) −0.17 (0.36) 0.91 (0.00) 1 3− * PO4 -P −0.37 (0.04) −0.24 (0.20) −0.13 (0.49) −0.12 (0.52) 1 2− ** ** SO4 −0.05 (0.79) 0.34 (0.07) 0.52 (0.00) 0.48 (0.01) 0.14 (0.48) 1 Depth −0.20 (0.28) −0.30 (0.11) 0.20 (0.28) 0.13 (0.49) 0.15 (0.41) −0.01 (0.96) 1 Width −0.49** (0.01) 0.12 (0.52) −0.03 (0.86) −0.01 (0.97) 0.31 (0.1) 0.30 (0.10) 0.06 (0.74) 1

* Correlation is significant at the 0.05 level. ** Correlation is significant at the 0.01 level).

107 R. Zarkami et al. Ecological Engineering 118 (2018) 104–110

Fig. 3. PCA biplot related to fish sampling in the river basins using log transformed values of the selected variables by greedy stepwise (WID: width of river; DEP: 3− 2− – depth of river; PO4: PO4 (phosphate); SO4: SO4 (sulphate); HCO3 (bicarbonate); EC: electric conductivity; black circles: the number of instances).

Fig. 3 shows a principal component analysis (PCA biplot based on correlation matrix) using all eight selected variables by GS. Based on the plot, the first and second components explained 57.7% of the total variations. From the PCA biplot (according to loading plots of first component but not illustrated here), it can be observed that the most important environmental variables in the Gamasiab River were related – 2− 3− to EC, HCO3 ,SO4 , WT, width, PO4 , depth and pH, respectively.

3.3. Probability of fish occurrence

The outcomes of the LR model showed that the probability of oc- currence of A. mossulensis is associated with the concentration of several water quality and pollution variables in the river. Among the selected variables by GS, however, a significant difference was observed be- tween the probability of fish occurrence and bicarbonate (Wald = 4.10, p < 0.05) as well as electric conductivity (Wald = 4.71, p < 0.05) (Fig. 4) in the Gamasiab River. The curves of the logistic models in- dicate that the probability of absence of A. mossulensis populations is associated with the increased concentration of electric conductivity and – HCO3 (p < 0.05). As stated already, high concentration of electric conductivity (being a function of dissolved solids) was mostly found in the dry seasons. In contrast, concentration of electric conductivity de- creased in the wet seasons in the study area due to dilution occurring this period. Besides, many sampling sites in Gamasiab River are located in arable lands so that various agricultural activities might play a key role for elevating the concentration of eclectic conductivity (Kasangaki et al., 2008). Despite a wide range of tolerance of A. mossulensis re- garding acute environmental conditions (Yousefian, et al., 2013) and also its high reproductively rate, this fish is facing with a lot of pro- blems due to changes in water quality conditions and habitat dete- rioration in Gamasiab River and its branch (Abbasi, 2007; Yousefian, Fig. 4. The impact of the most environmental variables on the probability of et al., 2013). According to LR, the probability of fish occurrence might occurrence of A. mossulensis in the Gamasiab river (fish absence: 27 instances fi decrease with increasing the bicarbonate concentration in the study and sh absence: 25 instances). area. It seems that increasing this base anion in study area, would lower the tolerance of fish population so they cannot spawn in such an al- the physiology, growth and survival of A. mossulensis population in the kalinity environment conditions (Abbasi, 2007). The outcomes of LR Gamasiab River (Abbasi, 2007). The obtained results of LR also showed based on sulphate, orthophosphate, pH, and river width also showed that an increase in the concentration of nutrients such as orthopho- that increasing the values of the given variables might decrease the sphate might restrict the presence of the fish species in the river. The probability of fish presence in the study river (the probability curves of highest (0.14 mg/l) and lowest phosphate concentrations (0.004 mg/l) the aforementioned variables are not illustrated in Fig. 3 due to their were found in the mouth and source of the Gamasiab River, respectively non-significant values). In fact, due to sensitivity of the target fish which corresponds with the earlier reports (e.g. Abbasi, 2007). Ac- species to sulphate concentration, high value of this toxic anion would cording to EPA (1996), phosphorus concentrations (in the form of lower their tolerance in the sampling sites. This would, in turn, affect

108 R. Zarkami et al. Ecological Engineering 118 (2018) 104–110 soluble) should not exceed 0.10 mg/l in non-polluting rivers/streams. present research, the impact of water quality variables on the prob- The outcomes of present study demonstrated that the highest prob- ability of A. mossulensis’s occurrence was to some extent important than ability of A. mossulensis would occur in the river when phosphate the physical-habitat ones due to various agricultural activities in the concentration is less than 0.10 mg/l so that the outcomes of phosphate course of the river basin (e.g. agricultural lands and fish farms etc.) concentration in the present study are in line with the previous studies which discharge too much nutrients in the river. Since the outcomes of (e.g. Abbasi, 2007). The occurrence of the target fish could to some optimization model in the present research dealt with different physi- extent be predicted based on pH concentration. The minimum and cal–chemical and morphometric characteristics of the river basins, such maximum values of pH recorded in the study area were 7.5 and 8.7, integrated modelling approach allows joining ecologists, engineers and respectively (mean pH = 8.04). Like increment of bicarbonate anion, river managers in order to achieve a successful river management the values of higher pH indicate that the probability of presence of A. program in the future. mossulensis might be reduced when rivers gradually become basic. It is well-known that most fish species are able to tolerate the pH values Acknowledgements between 5 and 9. Earlier studies (e.g. Abbasi, 2007; Abbasi et al., 2014) have shown that the presence/absence of the target fish population is We would like to thank the national inland water aquaculture in- directly correlated with the amount of pH in the Gamasiab River so that stitute (Bandar Anzali, Iran) to provide the opportunity for field sam- pH ranging from 7 to 8 seems to be favourable for inhabiting A. mos- pling. sulensis population in the river ecosystem (Abbasi, 2007). The results of logistic model also clearly showed that the higher values of pH might References lower the tolerance of fish in the river. The width of river was another explanatory variable influencing the occurrence of fish in the river. If Abbasi, K., 2007. Identification and distribution of fishes in Gamasiab River, Guilan the width of river decreases, then the probability of fish presence will province. Iran. J. Fish. 2 86–73 (in Farsi, English abstract). Abbasi, F., Ghorbani, R., Yelghi, S., Hajimoradloo, A., Fazel, A., 2014. Study of some increase. Unexpectedly, this is in contradiction with the previous stu- growth parameters of bleak (Alburnoides eichwaldii) in Tilabad stream of the dies demonstrating that increasing the width of river would affect Golestan province. J. Nat. Res. Iran 1, 59–70 (in Farsi, English abstract). species richness and trophic composition (Oberdorff et al., 1993; Ambelu, A., Lock, K., Goethals, P., 2010. Comparison of modelling techniques to predict ff fi macroinvertebrate community composition in rivers of Ethiopia. Ecol. Eng. 5 (2), Belliard et al., 1997; Oberdor et al., 2002). This can be justi ed by the 147–152. fact that increasing river width (particularly in the mouth of river) Armour, C.L., Taylor, J.G., 1991. Evaluation of the instream flow incremental metho- might escalate the pollution loads in this part of river, resulting in an dology by U.S. fish and wildlife service field users. Fisheries 16 (5), 36–43. unsuitable habitat for fish (Abbasi, 2007). APHA/AWWA/WEF, 1998. Standard Methods for the Examination of Water and Wastewater, 19th ed. Washington, DC, USA. In reverse, water temperature and the depth of river showed the Belliard, J., Boet, P., Tales, E., 1997. Regional and longitudinal patterns of fish commu- opposite trends (in these cases, also no significant difference was ob- nity structure in the Seine River basin. France. Environ. Biol. Fishes. 50, 133–147. served between the probability of fish occurrence and these physical- Bockelmann, B.N., Fenrich, E.K., Lin, B., Falconer, R.A., 2004. Development of an eco- hydraulics model for stream and river restoration. Ecol. Eng. 22 (4–5), 227–235. habitat variables). Water temperature has been long recognized as one Bogutskaya, N.G., 1997. Contribution to the knowledge of leuciscine fishes of Asia Minor. of the main factors determining the spatial distribution of various fish Part 2. An annotated check-list of leuciscine fishes (, Cyprinidae) of species in river ecosystems (e.g. Brazner et al., 2005). The mean water Turkey with descriptions of a new species and two new subspecies. Mitt. Hamb. Zool. Mus. Inst. 94, 161–186. temperature recorded in the sampling sites (12.58 °C) indicates that the Brazner, J.C., Tanner, D.K., Detenbeck, N.E., Batterman, S.L., Stark, S.L., Jagger, L.A., A. mossulensis population cannot tolerate cold waters since they are Snarski, V.M., 2005. Regional, watershed, and site-specific environmental influences cyprinid fish preferring warm water. In other words, increasing water on fish assemblage structure and functioning western Lake Superior tributaries. Can. J. Fish. Aquat. Sci. 62, 1254–1270. temperature in the sampling sites coincides with increasing the prob- Buckland, S.T., Elston, D.A., 1993. Empirical-models for the spatial-distribution of wild- ability of presence of fish. On the other hand, the water depth was life. J. Appl. Ecol. 30, 478–495. another explanatory factor to account for the habitat preferences of A. Coad, B.W., 1995. Freshwater fishes of Iran. Acta Sci. Nat. Acad. Sci. Brno. 29, 1–64. Coad, B.W., 1998. Systematic biodiversity in the freshwater fishes of Iran. Ital. J. Zool. 65, mossulensis in the river and generally as one of the major factors in the 101–108. choice of habitat preferences of fish in lotic waters (Vadas and Orth, Coad, B.W., 2010. Freshwater Fishes of Iraq. Pensoft Publishers, Sofia, Bulgaria, pp. 274p. 2001). The optimal depth for A. mossulensis in the study area ranged Cohen, J., 1960. A coefficient of agreement for nominal scale. Educ. Psychol. Meas. 20, – from 20 to 50 cm (with an average depth of 29.16 cm) but based on LR 37 46. Dom, B., Niblack, W., Sheinvald, J., 1989. Feature selection with stochastic complexity. model fish still may be found when the river depth increases. In: Proceedings of IEEE on Computer Vision and Pattern Recognition, Rosemont. p 241–248. 4. Conclusions EPA, 1996. Quality Criteria for Waters, Washington D.C. 256p. European Council, 2000. Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for Community action in the The present paper integrated the data-driven method (logistic re- field of water policy. Official Journal of the European Communities L327, 22 gression) with a search algorithm (greedy stepwise) to examine the December 2000, Brussels. Faraway, J., Chatfield, C., 1998. Time series forecasting with neural network: a com- habitat needs of A. mossulensis in the Gamasiab River basins. The in- parative study using airline data. J. Appl. Stat. 47, 231–250. tegrated modelling approach allowed the ecological impact of various Fielding, A.H., Bell, J.F., 1997. A review of methods for the assessment of prediction water quality and physical-habitat variables in the river, as demon- errors in conservation presence/absence models. Environ. Conserv. 24 (01), 38–49. Haghi Vayghan, A., Zarkami, R., Sadeghi, R., Fazli, H., 2015. Modelling habitat pre- strated for A. mossulensis. This type of model-based optimization may be ferences of Caspian kutum, frisii kutum (Kamensky, 1901) (, a supportive tool for river assessment and management in the sur- ) in the . Hydrobiologia. http://dx.doi.org/10.1007/ rounding river basins and elsewhere in the globe. It is concluded that s10750-015-2446-3. Hall, M.A., Holmes, G., 2003. Benchmarking attribute selection techniques for discrete the optimization technique using greedy stepwise might be a potential class data- mining. IEEE Trans. Knowl. Data Eng. 15, 1437–1447. alternative method to logistic regression to choose the most important Hammer, Q., 2011. In: PAlaeontological STatistics (PAST) Natural History Museum. variables for assessing the fish habitat preferences in the river. The University of Oslo, pp. 221. reliability of the applied models and ecological relevancy of the vari- Hosmer, D.W., Lemeshow, S., 2000. Applied Logistic Regression, Second edition. Wiley, New York. ables for analysing the habitat requirements of fish are undoubtedly Imran, K., Mevlut, T.A., Turhan, K., 2008. Comparing performances of logistic regression, improved if more and better datasets (preferably no missing values) will classification and regression tree, and neural networks for predicting coronary artery – be gathered in the future monitoring in the Gamasiab River. The out- disease. Exp. Syst. Appl. 34, 366 374. Kasangaki, A., Chapman, L.J., Balirwa, J., 2008. Land use and the ecology of benthic comes of greedy stepwise showed that water quality and physical-ha- macroinvertebrate assemblages of high-altitude rainforest streams in Uganda. bitat variables might jointly influence the habitat preferences of A. Freshwater Biol. 53, 68–697. mossulensis in the study river. However, according to the outcomes of Karr, J.R., Chu, E.W., 1999. Restoring Life in Running Waters: Better Biological

109 R. Zarkami et al. Ecological Engineering 118 (2018) 104–110

Monitoring. Island Press, Washington, DC. wetland, Iran. Ecol. Model. 251, 44–53. Kuru, M., 1978. The freshwater fish of South-Eastern Turkey-2 (Euphrates-Tigris system). Sadeghi, R., Zarkami, R., Van Damme, P., 2014. Modelling habitat preference of an alien Hacettepe Bull. Nat. Sci. Eng. 7 (8), 105–114. aquatic fern, Azolla filiculoides (Lam.), in Anzali wetland (Iran) using data-driven Lock, K., Goethals, P., 2014. Predicting the occurrence of stoneflies (Plecoptera) on the methods. Ecol. Model. 284, 1–9. basis of water characteristics, river morphology and land use. J. Hydroinform. 16 (4), Uçkun, A.A., Gokçe, D., 2015. Assessing age, growth, and reproduction of Alburnus 812–821. mossulensis and Acanthobrama marmid (Cyprinidae) Populations in Karakaya Dam Mohamed, A.-R.M., Hussain, N.A., Al-Noor, S.S., Mutlak, F.M., Al-Sudani, I.M., Mojer, Lake (Turkey). Turk. J. Zool. 39, 1–14. A.M., 2012. Ecological and biological aspects of fish assemblage in the Chybayish Vadas Jr., R.L., Orth, D.J., 2001. Formulation of Habitat Suitability Models for Stream marsh, Southern Iraq. Ecohydrol. Hydrobiol. 12 (1), 65–74. Fish Guilds: Do the Standard Methods Work. Trans. Am. Fish. Soc. 130, 217–235. Mouton, A.M., Schneider, M., Depestele, J., Goethals, P.L.M., De Pauw, N., 2007. Fish Walczak, S., Cerpa, N., 1999. Heuristic principles for the design of artificial neural net- habitat modelling as a tool for river management. Ecol. Eng. 29 (3), 305–315. works. Inform. Softw. Technol. 41 (2), 107–117. Oberdorff, T., Guilbert, E., Luchetta, J.C., 1993. Patterns of fish species richness in the Witten, I.H., Frank, E., Mark, A., 2011. Data Mining: Practical machine learning tools and Seine River basin, France. Hydrobiol. 259, 157–167. techniques, 3rd edn. Morgan Kaufmann, San Francisco. Oberdorff, T., Pont, D., Hugueny, B., Porcher, J.P., 2002. Development and validation of a Yousefian, M., Keshavarz, K., Yagoubi Kafshkari, Y., 2013. Principal components analysis fish-based index for the assessment of ‘river health’ in France. Freshwater Biol. 47, of Alburnus mossulensis morphology. Int. J. Plan Environ. Sci. 3, 160–165. 1720–1734. Zarkami, R., Goethals, P., De Pauw, N., 2010. Use of classification tree methods to study Sadeghi, R., Zarkami, R., Sabetraftar, K., Van Damme, P., 2012a. Application of classi- the habitat requirements of tench (Tinca tinca) (L., 1758). CJES 8 (1), 55–63. fication trees to model the distribution pattern of a new exotic species Azolla fili- Zarkami, R., Sadeghi, R., Goethals, P., 2012a. Use of fish distribution modelling for river culoides (Lam.) at Selkeh Wildlife Refuge, Anzali wetland, Iran. Ecol. Model. 243, management. Ecol. Model. 230, 44–49. 8–17. Zarkami, R., Sadeghi, R., Goethals, P., 2012. Application of genetic algorithm (GA) to Sadeghi, R., Zarkami, R., Sabetraftar, K., Van Damme, P., 2012b. Use of support vector select input variables in support vector machine (SVM) for analyzing the occurrence machines (SVMs) to predict distribution of an invasive water fern Azolla filiculoides of roach, Rutilus rutilus, in streams. CJES 2, 237–246. (Lam.) in Anzali wetland, southern Caspian Sea, Iran. Ecol. Model. 244, 117–126. Zarkami, R., Sadeghi, R., Goethals, P., 2014. Modelling occurrence of roach “Rutilus Sadeghi, R., Zarkami, R., Sabetraftar, K., Van Damme, P., 2013. Application of genetic rutilus” in streams. Aquatic Ecol. 48 (2), 161–177. algorithm and greedy stepwise to select input variables in classification tree models Zarkami, R., 2011. Application of classification trees–J48 to model the presence of roach for the prediction of habitat requirements of Azolla filiculoides (Lam.) in Anzali (Rutilus rutilus) in rivers. CJES 9, 189–198.

110