IX Congreso Galego de Estatística e Investigación de Operacións, Ourense, 12–14 November 2009

USING ENVIRONMENTAL DATA AND NON-PARAMETRIC ADDITIVE REGRESSION MODELS TO STUDY THE MERLUCCIUS HUBBSI COMMERCIAL FISHERIES EFFORTS FOR THE GALICIAN FLEET IN THE SOUTH WEST ATLANTIC

Rubén Fernández Casal 1, Jesús M. Torres Palenzuela 2, Tomás Cotos Yañez 3, Marta Darriba Estévez 2, Ana Pérez González 3

1 Departamento de Matemáticas. Fac. de Informática. Universidad da Coruña
2 Departamento de Física Aplicada. Fac. de Ciencias del Mar. Universidad de Vigo
3 Departamento de Estadística e Inv. Op. Fac. de CC. Empresariales y Turismo. Campus de Ourense. Universidad de Vigo

ABSTRACT

This work combines environmental and fishing parameters, obtained within the project “Aplicación de teledetección, inteligencia artificial y SIG al estudio de la variabilidad en la distribución de especies comerciales para la flota gallega en el Atlántico Sudoccidental” (Application of remote sensing, artificial intelligence and GIS to the study of the variability in the distribution of commercial species for the Galician fleet in the South West Atlantic), funded by the Conselleria de Innovación, Industria e Comercio of the Xunta de Galicia, with historical catch data and non-parametric additive statistical techniques. This allows us to identify the geographical regions with the most favourable conditions for fishing different target species, based on the environmental conditions at each moment and on their integration in a Geographic Information System (MapInfo). This work presents the results obtained by studying the catch effort for the commercial species Merluccius hubbsi in the South West Atlantic between 1993 and 2006. Non-parametric additive regression models avoid the restrictions of parametric models and also make it easier to study the effects of the environmental variables. To apply these models, environmental data were obtained from the MERCATOR Ocean model and the Southern Oscillation Index (SOI) within the aforementioned project, together with the data compiled in the fishing logbooks of the Fishing Vessel Owners' Cooperative of the Port of Vigo (ARVI).

Keywords: Merluccius hubbsi, generalized additive models

1. INTRODUCTION

The Argentine hake Merluccius hubbsi is one of the most important commercial species in the South West Atlantic and a target species for Spanish demersal trawlers operating on the high seas of the Patagonian Shelf, i.e. on the shelf edge and slope at 45–47° S and 41–42° S, outside the Argentine EEZ. The fishing grounds in the study area have been divided into sub-areas (see Fig. 1). The present paper aims to identify the geographical regions with the most favourable conditions for fishing the target species, based on environmental variables from the MERCATOR Ocean model and the Southern Oscillation Index (SOI).

Figure 1: Study area showing the subzones used in this work.

Merluccius hubbsi is subject to traditional trawling and longline fisheries in , , Uruguay and the Falkland Islands (Csirke, 1987). Argentinean hake inhabit Atlantic waters off South America: the Patagonian and Argentine Shelves between 28° and 54° S (Cohen et al., 1990).

Studies of habitat requirements of exploited marine fish have been driven both by the need to support management actions (e.g. to identify candidate Marine Protected Areas) and the increasing availability and accessibility of suitable tools. These include readily available remotely sensed data on a variety of surface oceanographic parameters, Geographic Information Systems (GIS) and powerful statistical modelling tools such as generalized additive models (GAM) (Hastie and Tibshirani, 1990).

Generalized additive mixed models (GAMM) allow spatial autocorrelation to be taken into account explicitly, and their use has spread particularly through the development of the R programming language (see R Development Core Team, 2009; Pierce et al., 2001, 2002; Valavanis et al., 2004, 2008; Zuur et al., 2007). Among non-parametric techniques, generalized additive models are especially important because of the need to work with flexible multivariate models that can be adapted to a wide variety of situations. Their main advantages are interpretability and flexibility. When the link function is the identity, the GAM is a non-parametric additive model.

The potential value of GIS in marine fisheries management has been widely recognized, and its applications have expanded rapidly. Applications of GIS in fisheries have included data management, environmental monitoring, ecosystem studies, stock assessment, forecasting and fishery management. Spatial statistical analysis and GIS technology provide the tools to model species-habitat relationships and their variability, and to identify essential habitat areas.

2. DATA SOURCES

Fisheries data were collected on board commercial vessels operating in the South West Atlantic (ATSW) area between 1989 and 2006. These vessels belong to the Fishing Vessel Owners' Cooperative of the Port of Vigo (ARVI). Data were recorded on board on a form that included, for each haul, the date, the fishing hours, the location (latitude and longitude) and the total catch (in kilograms) of each species. All this information was later integrated into a database and a GIS, and some additional variables were derived from the original ones:
- Temporal variables: year, month, week of the year and Julian day (defined as the number of days elapsed since 1 January of the corresponding year).
- Catch per unit effort (CPUE) of single hauls, used as an abundance index for hake: CPUE = catch (kg) / fishing hours, expressed in kg/h.
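The derived variables above can be computed per haul as in the minimal sketch below. It assumes a hypothetical `Haul` record whose field names are illustrative, not the actual ARVI database schema. CPUE is expressed in kg per fishing hour, consistent with the 1 kg/h threshold used in Section 4. Julian day follows the definition in the text (days elapsed since 1 January), so 1 January maps to 0. Taking "week of the year" as the ISO week is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Haul:
    """Hypothetical logbook record for one haul (field names are illustrative)."""
    day: date
    fishing_hours: float
    latitude: float   # decimal degrees, negative = South
    longitude: float  # decimal degrees, negative = West
    catch_kg: float   # catch of the target species in this haul


def julian_day(d: date) -> int:
    """Days elapsed since 1 January of the same year (1 January -> 0)."""
    return (d - date(d.year, 1, 1)).days


def cpue(h: Haul) -> float:
    """Catch per unit effort in kg per fishing hour."""
    if h.fishing_hours <= 0:
        raise ValueError("fishing hours must be positive")
    return h.catch_kg / h.fishing_hours


def temporal_variables(d: date) -> dict:
    """Year, month, week of the year (assumed ISO week) and Julian day."""
    return {
        "year": d.year,
        "month": d.month,
        "week": d.isocalendar()[1],
        "julian_day": julian_day(d),
    }


if __name__ == "__main__":
    h = Haul(date(2004, 3, 15), fishing_hours=4.0,
             latitude=-46.2, longitude=-60.5, catch_kg=2600.0)
    print(cpue(h))  # 650.0 (kg/h)
    print(temporal_variables(h.day))
```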

The following variables were used to predict the CPUE:
- Latitude and longitude: they indicate the location of the fishing hauls and are related to the spatial variability of the catches at a given time.
- MERCATOR variables: daily temperature and salinity data for the ATSW area from 1993 to 2006 were provided by MERCATOR. In summary, depth and the following four variables were obtained from the MERCATOR data set: Sea Surface Temperature (SST), Sea Bottom Temperature (SBT), Sea Surface Salinity (SSS) and Sea Bottom Salinity (SBS).
- Southern Oscillation Index (SOI): included because the catches of the different species in the ATSW area show marked interannual variability.

3. METHODOLOGY

3.1 GIS and statistical analysis

Fishery and environmental data from the different sources were processed and integrated into an MS Access database and a GIS (MapInfo). Links were set up between the different data sets in the GIS and the database to allow data overlay for display and analysis. Several software tools were used to perform the statistical analysis and generate graphics, including SPSS 16, MS Excel and R. In addition, an exploratory analysis of the relationships between the CPUE and the environmental variables was carried out using scatterplots and GIS maps.
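The overlay of fishery and environmental data can be sketched as a lookup. For each haul, it takes the value of the environmental grid cell nearest to the haul position on the same day, and the SOI for the haul's month. This is a minimal sketch with several assumptions: daily grids on regular latitude/longitude axes are stored in a dictionary keyed by date, and a monthly SOI table is keyed by (year, month). These data structures and all function names are hypothetical. The original work performed this step in MapInfo and MS Access, not in code.

```python
from datetime import date

import numpy as np


def nearest_index(axis: np.ndarray, value: float) -> int:
    """Index of the grid coordinate closest to `value`."""
    return int(np.argmin(np.abs(axis - value)))


def environmental_value(grids: dict, lats: np.ndarray, lons: np.ndarray,
                        day: date, lat: float, lon: float) -> float:
    """Value of the same-day grid cell nearest to (lat, lon).

    `grids[day]` is a hypothetical 2-D array indexed as [lat_index, lon_index].
    """
    field = grids[day]
    return float(field[nearest_index(lats, lat), nearest_index(lons, lon)])


def soi_for(soi_table: dict, day: date) -> float:
    """Monthly SOI for the month in which the haul took place."""
    return soi_table[(day.year, day.month)]


if __name__ == "__main__":
    lats = np.array([-47.0, -46.5, -46.0, -45.5, -45.0])
    lons = np.array([-61.0, -60.5, -60.0])
    d = date(2004, 3, 15)
    # Toy SST field (not MERCATOR data): 10 + row index + 0.1 * column index.
    sst = {d: 10.0 + np.arange(5)[:, None] + 0.1 * np.arange(3)[None, :]}
    soi = {(2004, 3): 0.5}
    print(environmental_value(sst, lats, lons, d, -46.1, -60.4))
    print(soi_for(soi, d))
```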

3.2 Non-parametric additive regression models

Additive models describe the relationship between a response variable and a sum of unknown smooth functions of the covariables. Let us assume that $(X, Y)$, with $X = (X_1, \ldots, X_r)$, follows the additive model

$$Y = \alpha + f_1(X_1) + f_2(X_2) + \cdots + f_r(X_r) + \varepsilon,$$

where the additive functions $f_i(\cdot)$ are unknown and smooth and the error $\varepsilon$ has mean 0. Given $X = x_0 = (x_{01}, \ldots, x_{0r})$, the theoretical predictor is

$$E\{Y \mid X = x_0\} = \alpha + f_1(x_{01}) + f_2(x_{02}) + \cdots + f_r(x_{0r}).$$

Although only one-dimensional additive functions are written here, the components could be of any dimension, including, for example, interactions between covariables.
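A standard way to estimate the additive functions is the backfitting algorithm described by Hastie and Tibshirani (1990). Each $f_i$ is repeatedly re-estimated by smoothing the partial residuals $Y - \alpha - \sum_{j \ne i} f_j(X_j)$ against $X_i$, then centred so that the components are identifiable. The sketch below assumes a Gaussian-kernel (Nadaraya–Watson) smoother with a fixed, hand-picked bandwidth for each covariable. It illustrates the model above on toy data and is not necessarily the fitting procedure used in this work, whose smoothing parameters were chosen by generalized cross-validation.

```python
import numpy as np


def kernel_smooth(x: np.ndarray, y: np.ndarray, bandwidth: float) -> np.ndarray:
    """Nadaraya-Watson estimate of E[y | x] at the observed x values."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)  # Gaussian kernel weights
    return (w @ y) / w.sum(axis=1)


def backfit(X: np.ndarray, y: np.ndarray, bandwidths, n_iter: int = 50,
            tol: float = 1e-8):
    """Fit y = alpha + sum_i f_i(X[:, i]) + error by backfitting.

    Returns alpha and an (n, r) array whose column i holds f_i at the data.
    """
    n, r = X.shape
    alpha = y.mean()
    f = np.zeros((n, r))
    for _ in range(n_iter):
        f_old = f.copy()
        for i in range(r):
            # Partial residuals: remove the intercept and all other components.
            partial = y - alpha - f.sum(axis=1) + f[:, i]
            f[:, i] = kernel_smooth(X[:, i], partial, bandwidths[i])
            f[:, i] -= f[:, i].mean()  # centre for identifiability
        if np.max(np.abs(f - f_old)) < tol:
            break
    return alpha, f


def predict_at_data(alpha: float, f: np.ndarray) -> np.ndarray:
    """Predictor alpha + f_1(x_1) + ... + f_r(x_r) evaluated at the data."""
    return alpha + f.sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 300
    X = rng.uniform(0, 1, size=(n, 2))
    y = (2.0 + np.sin(2 * np.pi * X[:, 0]) + (X[:, 1] - 0.5) ** 2
         + rng.normal(0, 0.1, n))
    alpha, f = backfit(X, y, bandwidths=[0.05, 0.1])
    fitted = predict_at_data(alpha, f)
    r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    print(round(float(alpha), 3), round(float(r2), 3))
```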

This model extends the linear model by removing the parametric structure of the effects, requiring only that the covariables' effects be additive. Furthermore, it preserves the interpretability of linear models, because the conditional mean of the response variable is a sum of effects of the covariables. This makes it possible to estimate clearly the contribution of each covariable to the response. We applied this model to the CPUE to obtain the predictions, defining the response as Y = CPUE and the covariable vector as X = (Latitude, Longitude, Julian day, Daily temperature, Salinity, Depth, SST, SBT, SSS, SBS). There are no standard criteria for selecting the variables included in the model. The additive model was built by forward variable selection: at each step we added the covariable giving the largest adjusted coefficient of determination, until the optimum model was reached. One-dimensional components were considered first, followed by two-dimensional components such as interactions. Smoothing parameters were selected by generalized cross-validation (GCV). A transformation of the response variable was applied to obtain a symmetric distribution of the CPUE.
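The forward selection step can be sketched as follows. To keep the example self-contained, each candidate smooth term is approximated by a cubic polynomial basis fitted by least squares. This is an assumption standing in for the penalized smoothers and GCV used in the work. The adjusted coefficient of determination counts the fitted columns as the model degrees of freedom. At each step, the candidate that most increases adjusted $R^2$ is added, and the search stops when no candidate improves it. Only one-dimensional terms are shown, and the data are a toy example.

```python
import numpy as np


def basis(x: np.ndarray, degree: int = 3) -> np.ndarray:
    """Polynomial basis z, z^2, ..., z^degree (a simple stand-in for a smooth term)."""
    z = (x - x.mean()) / x.std()  # standardise for numerical stability
    return np.column_stack([z ** k for k in range(1, degree + 1)])


def adjusted_r2(y: np.ndarray, design: np.ndarray) -> float:
    """Adjusted R^2 of a least-squares fit; `design` includes the intercept column."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    n, p = design.shape  # p counts the intercept
    return 1 - (rss / (n - p)) / (tss / (n - 1))


def forward_select(columns: dict, y: np.ndarray) -> list:
    """Greedy forward selection maximising adjusted R^2 at each step."""
    n = len(y)
    design = np.ones((n, 1))  # start from the intercept-only model
    best_score = adjusted_r2(y, design)
    selected, remaining = [], list(columns)
    while remaining:
        scores = {name: adjusted_r2(y, np.hstack([design, basis(columns[name])]))
                  for name in remaining}
        name = max(scores, key=scores.get)
        if scores[name] <= best_score:
            break  # no candidate improves the model
        best_score = scores[name]
        design = np.hstack([design, basis(columns[name])])
        selected.append(name)
        remaining.remove(name)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    cols = {name: rng.uniform(-1, 1, n) for name in ["julian_day", "sst", "soi"]}
    y = np.sin(3 * cols["julian_day"]) + 0.5 * cols["soi"] + rng.normal(0, 0.2, n)
    print(forward_select(cols, y))
```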

4. RESULTS AND DISCUSSION

A general descriptive study of the dataset was carried out first. It revealed marked differences in the nominal and explanatory variables among the grid zones (Fig. 1). For instance, Table 1 shows the total Argentine hake catches per grid zone.

For the regression analysis we decided to study each zone separately, taking into account the percentage of catches (Table 1) and the fact that grid zone 46_S has the highest number of records. We illustrate the performance of the model with the Merluccius hubbsi CPUE.

Table 1: Argentine hake (Merluccius hubbsi) total catches per zone.

Zone    Total catch (kg)   % of total
42_S                  69          0.0
46_S            11916287         94.9
49_S               72807          0.6
MALV                  81          0.0
MN                106166          0.8
MS                  4656          0.0
MW                457454          3.6
Total           12557520        100.0
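The percentage column of Table 1 follows directly from the zone totals. The short check below recomputes the overall total and each zone's share from the catch sums in the table; no other data are assumed.

```python
# Total Argentine hake catches (kg) per zone, as listed in Table 1.
catches = {
    "42_S": 69, "46_S": 11916287, "49_S": 72807, "MALV": 81,
    "MN": 106166, "MS": 4656, "MW": 457454,
}

total = sum(catches.values())
shares = {zone: 100 * kg / total for zone, kg in catches.items()}

if __name__ == "__main__":
    print(total)  # 12557520, matching the Total row
    for zone, pct in shares.items():
        print(f"{zone}: {pct:.1f}%")
```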

The whole fisheries dataset from 1993 to 2006 includes a total of 103711 records, with catches of 93 different species from 34 vessels. After removing outliers and records with CPUE lower than 1 kg/h, the resulting dataset includes 13180 records for the Argentine hake.

The square root of CPUE (RCPUE) was used to approximate a symmetric distribution. Figure 2 shows the histogram of the square root of CPUE.
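The cleaning and transformation steps described in the two paragraphs above can be sketched as follows. The 1 kg/h CPUE threshold comes from the text. The outlier rule is not specified in the source, so for illustration only the sketch assumes a common rule that discards values above the upper quartile plus three interquartile ranges. Sample skewness shows that the square root brings a right-skewed toy distribution closer to symmetry. The gamma-distributed values are synthetic, not the ARVI data.

```python
import numpy as np


def clean_cpue(cpue: np.ndarray, min_cpue: float = 1.0, k: float = 3.0) -> np.ndarray:
    """Drop records with CPUE below `min_cpue` (kg/h) and, as an assumed
    outlier rule, values above Q3 + k * IQR."""
    kept = cpue[cpue >= min_cpue]
    q1, q3 = np.percentile(kept, [25, 75])
    return kept[kept <= q3 + k * (q3 - q1)]


def skewness(x: np.ndarray) -> float:
    """Sample skewness (third standardized moment, no bias correction)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy right-skewed CPUE values in kg/h (synthetic).
    cpue = rng.gamma(shape=1.5, scale=300.0, size=5000)
    cleaned = clean_cpue(cpue)
    rcpue = np.sqrt(cleaned)  # RCPUE, the modelled response
    print(round(skewness(cleaned), 2), round(skewness(rcpue), 2))
```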

Figure 2: Histogram of the square root of CPUE (x-axis: RCPUE; y-axis: frequency).

To remove the seasonal component of temperature (Figure 3), we used as new covariables the residuals of a nonparametric (kernel smoother) estimate of temperature as a function of Julian day.
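The deseasonalization step can be sketched as follows. A Nadaraya–Watson kernel smoother of temperature on Julian day estimates the mean seasonal cycle, and the residuals become the new covariable. Several choices here are assumptions for illustration, because the text does not report the kernel or bandwidth: a Gaussian kernel, a fixed bandwidth of 15 days, and a circular day-of-year distance that treats late December as close to early January. The temperature series is synthetic.

```python
import numpy as np


def circular_gap(a: np.ndarray, b: np.ndarray, period: float = 365.0) -> np.ndarray:
    """Pairwise distance between days of the year, wrapping around year end."""
    d = np.abs(a[:, None] - b[None, :]) % period
    return np.minimum(d, period - d)


def seasonal_mean(day: np.ndarray, temp: np.ndarray, bandwidth: float = 15.0) -> np.ndarray:
    """Nadaraya-Watson estimate of mean temperature at each observation's day."""
    w = np.exp(-0.5 * (circular_gap(day, day) / bandwidth) ** 2)
    return (w @ temp) / w.sum(axis=1)


def temperature_anomaly(day: np.ndarray, temp: np.ndarray, bandwidth: float = 15.0) -> np.ndarray:
    """Residual temperature after removing the estimated seasonal cycle."""
    return temp - seasonal_mean(day, temp, bandwidth)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    day = rng.integers(0, 365, size=1000).astype(float)
    # Toy SST: annual cycle peaking near day 45 (austral summer) plus noise.
    season = 3 * np.cos(2 * np.pi * (day - 45) / 365)
    sst = 10 + season + rng.normal(0, 0.5, 1000)
    anomaly = temperature_anomaly(day, sst)
    print(round(float(anomaly.std()), 2))
```

Because the residuals no longer carry the shared annual cycle, the additive model can test whether temperature anomalies add information beyond the Julian day term.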

The additive model considered was (in order of inclusion):

$$\sqrt{\mathrm{CPUE}} = \alpha + f_1(\text{Julian day}) + f_2(\text{Year}) + f_3(\text{Latitude}, \text{Longitude}) + f_4(\text{SOI}) + \varepsilon.$$

The final fitted model explains 52.5% of the variability of the square root of CPUE. Figure 4 shows the estimated additive effects. The approximate p-values of all smooth terms were < 2e-16.
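Because the model is fitted on the square-root scale, predicting CPUE itself requires a back-transformation. Squaring the fitted value alone underestimates the mean, since for $Z = \sqrt{\mathrm{CPUE}}$ we have $E[Z^2] = (E[Z])^2 + \mathrm{Var}(Z)$. A simple correction therefore adds the residual variance. This sketch rests on stated assumptions: the effect functions in the usage example are hypothetical placeholders, not the estimated curves of Figure 4, and the residual variance is assumed constant across hauls.

```python
from typing import Callable, Sequence

import numpy as np


def predict_rcpue(alpha: float, effects: Sequence[Callable[..., np.ndarray]],
                  covariates: Sequence[tuple]) -> np.ndarray:
    """Additive predictor alpha + f_1(...) + ... + f_k(...) on the sqrt(CPUE) scale.

    effects[i] is applied to the arrays in covariates[i], so a bivariate term
    such as f_3(latitude, longitude) receives two arrays.
    """
    total = np.asarray(alpha, dtype=float)
    for f, args in zip(effects, covariates):
        total = total + f(*args)
    return total


def back_transform(rcpue_hat: np.ndarray, residual_var: float) -> np.ndarray:
    """Approximate E[CPUE] from the sqrt-scale prediction: E[Z^2] = E[Z]^2 + Var(Z)."""
    return rcpue_hat ** 2 + residual_var


if __name__ == "__main__":
    # Hypothetical effects, for illustration only.
    effects = [
        lambda day: 2.0 * np.sin(2 * np.pi * day / 365),         # f1(Julian day)
        lambda year: 0.1 * (year - 2000),                         # f2(Year)
        lambda lat, lon: 0.5 * (lat + 46.0) - 0.3 * (lon + 60.5),  # f3(Lat, Lon)
        lambda soi: 0.05 * soi,                                   # f4(SOI)
    ]
    covariates = [(np.array([100.0]),), (np.array([2004.0]),),
                  (np.array([-46.0]), np.array([-60.5])), (np.array([0.0]),)]
    z = predict_rcpue(20.0, effects, covariates)
    print(back_transform(z, residual_var=25.0))
```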

In summary, the GAM with identity link function reveals strong and consistent additive effects of Julian day, year, position (latitude and longitude) and SOI. Surprisingly, none of the temperature, salinity or depth variables was selected by the proposed selection procedure.

Figure 3: SST (on the left) and SBT (on the right) by Julian day (circles coloured by year) and their nonparametric estimate (solid line).

Figure 4: Estimated effects of the explanatory variables on the square root of Merluccius hubbsi CPUE. Top: Julian day (left) and year (right); bottom: (latitude, longitude) (left) and SOI (right).

REFERENCES

Cohen, D., Inada, T., Iwamoto, T. and Scialabba, N. (1990) Gadiform fishes of the world (Order Gadiformes). An annotated and illustrated catalogue of cods, hakes, grenadiers and other gadiform fishes known to date. FAO Fisheries Synopsis No. 125. Vol. 10. FAO, Rome, Italy, 442 pp.

Csirke, J. (1987) Los recursos pesqueros patagónicos y las pesquerías de altura en el Atlántico Sud-occidental. FAO Doc. Téc. Pesca 280. FAO, Rome.

Hastie, T.J. and Tibshirani, R.J. (1990) Generalized Additive Models. Chapman & Hall.

Pierce, G. J., Wang, J., Zheng, X., Bellido, J. M., Boyle, P. R., Robin, J. P. and Denis, V. (2001) The cephalopod fishery GIS for the northeast Atlantic: development and application. International Journal of Geographical Information Science, 15, 763-784.

Pierce, G. J., Wang, J. and Valavanis, V. (2002) Application of GIS to cephalopod fisheries: workshop report. Bulletin of Marine Science, 71, 35-46.

R Development Core Team (2009). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org.

Valavanis, V. D., Georgakarakos, S., Kapantagakis, A., Palialexis, A. and Katara, I. (2004) A GIS environmental modelling approach to essential fish habitat designation. Ecological Modelling, 178, 417-427.

Valavanis, V. D., Pierce, G. J., Zuur, A. F., Palialexis, A., Saveliev, A., Katara, I. and Wang, J. (2008) Modelling of essential fish habitat based on remote sensing, spatial analysis and GIS. Hydrobiologia, 612, 5-20.

Zuur, A.F., Ieno, E.N. and Smith, G.M. (2007) Analysing ecological data. Springer.