Predicting Demersal Fish Species Distributions in the Mediterranean Sea Using Artificial Neural Networks

MARINE ECOLOGY PROGRESS SERIES
Mar Ecol Prog Ser
Vol. 255: 249–258, 2003
Published June 24

Christos D. Maravelias1,*, John Haralabous2, Costas Papaconstantinou3

1University of Thessaly, Dept. of Agriculture, Animal Production and Aquatic Environment, Fitoko Street, 38446 Magnesia, Greece
2Institute of Marine Biology of Crete, PO Box 2214, 71003 Crete, Greece
3National Centre for Marine Research, Agios Kosmas, 16604 Athens, Greece

ABSTRACT: Predicting the occurrence of economically important demersal fish in a multispecies marine environment can be of considerable value to fisheries management and protection of biodiversity. Here, 2 predictive modelling principles were utilised, artificial neural network (ANN) and discriminant function analysis (DFA), to develop presence/absence models for 3 species (anglerfish Lophius budegassa; hake Merluccius merluccius; red mullet Mullus barbatus) in the Mediterranean Sea. ANN-based models of demersal fish distribution outperformed conventional models and attained better recognition and prediction performance. Results indicated the ability of ANNs to predict presence more accurately than DFA when tested against independent field data. More precisely, sensitivity values obtained using DFA were 62.1% for anglerfish, 5.8% for hake and 59.8% for red mullet, whereas those obtained using ANN were 75, 71 and 72.9%, respectively. The accuracy on test data was 79.6% for anglerfish, 49.5% for hake and 83.3% for red mullet using DFA, and 83.7, 83.3 and 85.6%, respectively, using a back-propagation ANN. After learning from a set of selected patterns, the neural network (NN) models displayed a relatively high demersal fish classification accuracy, which was consistent with present understanding of the aggregating effects of the examined variables on these species’ distribution.
Predicting presence or absence was found to be easier for red mullet and anglerfish than for hake. The present results also suggested that the main processes modulating the occurrence of anglerfish, hake and red mullet in the NE Mediterranean Sea can be approximated by linear functions only to a limited extent. Due to their ability to mimic non-linear systems, ANNs proved far more effective in modelling the distribution of these species in the marine ecosystem. The main results and the ANN potential to predict suitable habitat profiles and structural characteristics of species assemblages are discussed.

KEY WORDS: ANN · Anglerfish · Hake · Red mullet

Resale or republication not permitted without written consent of the publisher

*Email: [email protected]
© Inter-Research 2003 · www.int-res.com

INTRODUCTION

Increasing focus on global and regional patterns of biodiversity necessitates reliable models of species presence/absence. In a multispecies fishery, such as in the Mediterranean Sea, binary classification methods of economically important demersal fish, based on commonly measured quantitative biotic and abiotic factors, are of major significance due to the role certain fish species play in the conduct of research and the practice of commercial fisheries. Likewise, reliable habitat models of target species may help conservation planning by reducing by-catch of non-target species; thus, protection of biodiversity may be an added benefit.

In fisheries, a wide range of multivariate techniques have been used to this end, including several methods of ordination and canonical analysis, univariate and multivariate linear, curvilinear and logistic regressions (Mastrorillo et al. 1997). Most of the models used to predict species distribution assume that relationships are smooth, continuous and either linear or simple polynomials (Shepherd et al. 1984, James & McCulloch 1990, Mann 1993). However, in nature, any changes in distributional boundaries or location centres of a fish stock are unlikely to be either monotonic or linear. Traditional methods of statistical analysis (namely linear regression models, multiple or not) may therefore be inadequate for detecting and successfully quantifying such changes (Maravelias & Reid 1997).

This work utilises the ability of artificial neural networks (ANNs) to recognise and learn the complex non-monotonic and non-linear relationships between biotic and abiotic aspects of the marine environment that can be used to correctly predict the presence or absence of demersal fish species. Recently, ANNs have been used in various disciplines of aquatic ecology, e.g. fish school species classification (Haralabous & Georgakarakos 1996), prediction of phytoplankton production (Scardi 1996) and freshwater fish biomass (e.g. Baran et al. 1996, Lek et al. 1996, Mastrorillo et al. 1997, Brosse et al. 1999a,b).

ANN models were initially intended to mimic the neural activity in human or animal brains (Garson 1991, Goh 1995, Stern 1996). ANNs are a form of artificial intelligence that is composed of a network of connected nodes (Rumelhart et al. 1986). Density estimation (also referred to as ‘unsupervised learning’), classification and regression (both often referred to as ‘supervised learning’) are 3 broad types of statistical problems that can be successfully modelled by ANNs. Based on a source of training data, the aim of supervised learning is to produce a model of the process from which the data were generated to allow the best predictions to be made for new data.

The most commonly used ANNs with supervised learning are the multilayered perceptrons, also known as ‘back-propagation’ ANNs after their training algorithm (Rumelhart et al. 1986). Such ANNs have the ability to learn patterns of relationships in data from being shown a given set of inputs (including combinations of descriptive and quantitative data), generalise or abstract results from imperfect data, and be insensitive to minor variations in input (such as noise in the data, missing data or a few incorrect values). ANNs model the physical environment (habitat) system on the basis of a set of hidden input/output examples, as is available in existing fisheries data, without any prior knowledge or assumptions about the underlying distribution function. General references to ANNs can be found in Rumelhart et al. (1986), Garson (1991), Ripley (1994), Goh (1995) and Stern (1996).

The goal of the present study was to determine the predictive capacity of ANN models for estimating presence/absence of 3 commercially important fishes, the European hake Merluccius merluccius, the red mullet Mullus barbatus and the anglerfish Lophius budegassa, from 5 predictor variables (biomass/abundance ratio, depth of the water column, geographical position, i.e. latitude and longitude, and sampling month) in the NE Mediterranean Sea. The objective was to learn more about the factors that might modulate the spatio-temporal aggregation patterns of these heavily exploited demersal species. Finally, the application of non-linear ANNs to empirical modelling was compared with a conventional linear approach, i.e. discriminant function analysis (DFA).

MATERIALS AND METHODS

The fish data for all 3 species examined were collected in the north Aegean Sea (NE Mediterranean), more precisely in the Thermaikos Gulf and Thracian Sea regions (Fig. 1). Sampling was performed on a seasonal basis, i.e. every 3 mo, from April 1996 to January 1998 using experimental trawl surveys. Trawls had a cod-end mesh size of 14 mm from knot to knot. Sampling took place only during daylight hours. The duration of each haul was 1 h with a vessel speed of 2.5 knots. Since in every survey the boat was at sea for the same length of time, from 06:00 to 18:00 h, the number of individuals caught per trawling hour and their corresponding weight were considered as units of relative abundance and biomass, respectively. A total of 675 stations were sampled, covering a depth range from 30 to 500 m, and the sampling design adopted was simple random. The ratio of the natural logarithms of biomass against abundance (B/A ratio), both incremented by one (i.e. data + 1), was used as a biological index for each of the 3 studied species. Low values of the B/A ratio in a specific location (i.e. station) suggest relatively higher abundance and lower biomass values of the examined species in that sampling station. On the contrary, high B/A ratio values indicate relatively higher biomass and lower abundance values. All specimens caught were measured, recording total length to the nearest mm, and weighed to the nearest g.

Fig. 1. Studied area and sampling stations

In the present study, an advanced (i.e. ANN) and a conventional (i.e. DFA) discrimination technique were implemented. Both methods were employed on a random subset of available data (training set) and then applied to the remaining data (validation or testing set). The training set consisted of a random 4/5 of the available data set (540 cases), with the remaining random 1/5 of the data (135 cases) forming the testing set. This holdout partitioning technique (Kohavi 1995) was repeated 5 times to statistically compare the methods’ […]

[…] difference between observed and predicted values, and thus, obtaining the maximum number of correctly classified cases. A form of gradient descent algorithm, i.e. the error back-propagation (EBP) algorithm, was used for that purpose (Rumelhart et al. 1986). The EBP requires the specification of the search step size (the learning rate); here, a unit learning rate was used in order to allow for a valid comparison between models. The ‘momentum’ term, an additional optional parameter for accelerating the model’s convergence, was set to 0.1. A random submission of input cases at each iteration (learning epoch) reduced the risk of memorisation of the presentation order of the training cases. Here, the early stopping strategy was followed to avoid overfitting and thus, to obtain a generalised model (Scardi 1996, 2001). This procedure consisted of terminating the training phase when the prediction error in the testing set (i.e.
