ICES WGSAEM Report 2007

ICES Marine Habitat Committee CM 2007/MHC:02 REF. ACME

Report of the Working Group on Statistical Aspects of Environmental Monitoring (WGSAEM)

12–16 March 2007 Paris, France

International Council for the Exploration of the Sea Conseil International pour l’Exploration de la Mer H. C. Andersens Boulevard 44–46 DK-1553 Copenhagen V Denmark Telephone (+45) 33 38 67 00 Telefax (+45) 33 93 42 15 www.ices.dk [email protected]

Recommended format for purposes of citation: ICES. 2007. Report of the Working Group on Statistical Aspects of Environmental Monitoring (WGSAEM), 12–16 March 2007, Paris, France. CM 2007/MHC:02. 68 pp.

For permission to reproduce material from this publication, please apply to the General Secretary.

The document is a report of an Expert Group under the auspices of the International Council for the Exploration of the Sea and does not necessarily represent the views of the Council. © 2007 International Council for the Exploration of the Sea


Contents

Executive summary

1 Opening of the meeting

2 Adoption of the agenda

3 Terms of Reference

4 Develop and review tools for assessing and improving quality assurance of the data generating process
4.1 Data quality of Swedish water quality data and the impact on trend detection
4.2 Tools for Data Quality Assurance
4.3 Analysis on the decay of persistent organic pollutants in stored cod-liver data and consequences for an environmental sample bank
4.4 Comments to ICES STGQAB: reviewing proposed revision of Part B of the COMBINE manual
4.4.1 Comments on the section “measurement uncertainty”
4.4.2 Comments on the section “data validation”

5 Provide further advice on methods for temporal, spatial, and integrated assessments of contaminants in sediments and biological effects and inputs
5.1 Modelling imposex data in Nucella lapillus
5.2 Modelling less-thans in the OSPAR MON assessments
5.3 A simulation-based approach to facilitate the design of eutrophication surveys
5.4 Assessment of discharges and activity concentrations of radioactive substances
5.5 Use of Rtrend for INPUT data, experiences of Sweden

6 Continue work on statistical aspects in the development of environmental indicators and classifications
6.1 Using indicators to assess impacts
6.2 Statistical principles for ecological status classification of WFD monitoring

7 Any other business

8 Recommendations

9 Close of meeting

Annex 1: List of participants

Annex 2: Agenda

Annex 3: WGSAEM terms of reference for the next meeting

Annex 4: Recommendations

Annex 5: Modelling VDSI in Nucella lapillus

Annex 6: Dealing with less-thans in the OSPAR MON assessments of contaminants in biota

Annex 7: OSPAR periodic evaluations of progress towards the objectives of the radioactive substances strategy: a summary centered on statistical methods, some samples analysis


Executive summary

1 ) WGSAEM agreed on the need for data flow quality assurance (QA). This was addressed in several presentations:

• An assessment of Swedish nutrient monitoring data, where apparent trends might have been caused by changes in laboratory practices (see Chapter 4.1 and Chapter 8 Recommendations);

• Presentation of several tools for testing the plausibility of measured metals and PAHs in suspended matter of freshwater rivers, taking into account the correlations between compounds (see Chapter 4.2);

• An investigation of systematic changes in observed levels of persistent organic pollutants (POP) in samples stored at −20°C (see Chapter 4.3);

• A review of the proposed HELCOM COMBINE Manual, requested by the ICES/OSPAR/HELCOM Steering Group on Quality Assurance of Biological Measurements (STGQAB) (see Chapter 4.4).

2 ) In several presentations it became clear that the original data, rather than aggregated data, are needed for a proper statistical analysis. Original data are necessary to reveal the structure of the data and to allow the statistician to check the assumptions on which the analysis of aggregated data is based. The following examples clarified this:

• In an assessment of temporal trends in imposex levels in Nucella lapillus, using the Vas Deferens Sequence Index (VDSI), the variance-mean relationship used in the model of aggregated data could only be validated by using the original individual data (see Chapter 5.1, Chapter 8 Recommendations and Annex 5);

• A presentation used high-frequency data from a smart-buoy to explore different monitoring designs for measuring chlorophyll-a and nitrates. In particular, the data were used to show the performance of ‘normal’ in-situ monitoring, which consists of taking a few measurements in a year (see Chapter 5.3).

3 ) In many assessments, methods have to be adjusted to deal with less-than measurements. Two presentations addressed this issue:

• A method for dealing with less-than measurements in the OSPAR MON assessments of contaminants in biota (see Chapter 5.2, Chapter 8 Recommendations and Annex 6);

• Assessment of discharges and activity concentrations of radioactive substances (see Chapter 5.4, Chapter 8 Recommendations and Annex 7).

4 ) WGSAEM discussed the group’s future. Because of the low number of attendees, there is some fear that the group could easily be dissolved. It is therefore important that ICES asks the Delegates to search for new WGSAEM members. WGSAEM is extremely useful as a platform to discuss and agree on statistical assessment techniques for the ICES and OSPAR community (see Chapter 7, Any other business).

1 Opening of the meeting

Richard Duin opened the meeting of the Working Group on Statistical Aspects of Environmental Monitoring (WGSAEM) at 14:00 on Monday 12 March 2007. He welcomed the participants to EDF (Electricité de France) R&D, Paris, France, and thanked Philippe Nonclercq from EDF for all his support in organising the meeting, for arranging a guided tour of the famous Musée d’Orsay and for offering the group a dinner. He welcomed two guests from IRSN, France, Benidicte Briand and Jerôme Guillevic, and the new member, Koen Parmentier, from Belgium.

2 Adoption of the agenda

Several background documents were submitted for discussion. These were allocated and discussed under the appropriate agenda item. The agenda was agreed and is attached as Annex 2. The list of participants is given in Annex 1.

3 Terms of Reference

The general terms of reference (ICES C. Res. 1986/2:25) originally formulated for WGSAEM were to:

1 ) develop statistical protocols for the determination of temporal and spatial trends in the concentration and distribution of contaminants in marine biota, sediments and sea water;

2 ) analyse data for the elucidation of temporal and spatial trends in contaminants in marine biota, sediments and sea water;

3 ) provide statistical advice with respect to other monitoring issues, as required;

4 ) liaise with the Statistics Committee as appropriate (now the Marine Habitat Committee).

Specific tasks for the 2007 WGSAEM meeting (2006/2/MHC02) are:

a ) develop and review tools for assessing and improving quality assurance of the data generating process;

b ) provide further advice on methods for temporal, spatial, and integrated assessments of contaminants in sediments and biological effects and inputs;

c ) continue work on statistical aspects in the development of environmental indicators and classifications;

d ) respond to the invitation from the ICES/OSPAR/HELCOM Steering Group on Quality Assurance of Biological Measurements (STGQAB), received by e-mail from C. Hagebro on 2 March 2007, to review the proposed changes to the HELCOM COMBINE Manual, especially Annexes 12, 13, 14, 15 and 16;

e ) respond to the request from WKIMON, received by e-mail from I. Davies on 6 March 2007, to develop Background Assessment Concentrations (BAC) for various biological effects measurements, such as EROD, i.e. the activity or response that would be expected in areas where anthropogenic influence is low. Access to data from P. Roose, MUMM, is essential.


4 Develop and review tools for assessing and improving quality assurance of the data generating process

Justification

This topic will give the opportunity to assist the ICES Data Centre in developing appropriate QC tools for improving the information generating process.

Summary

This section presents work on data quality assurance.

The first chapter addresses the need to have a high and even quality of data collected over a long period to ensure valid inferences in temporal trend assessments. Several principles for a statistical retrospective assessment of data quality are mentioned.

The second chapter discusses several tools that could be used for testing the plausibility of data. The tools, which take into account the correlations between compounds, are illustrated using measured metals and PAHs in suspended matter from two monitoring stations in the rivers Meuse and Rhine.

The third chapter gives a statistical analysis of data for 12 cod liver samples stored for a given period at −20°C to investigate if there was evidence of systematic changes in observed levels of persistent organic pollutants (POP). The question has arisen in connection with plans to establish a Norwegian environmental sample bank. The pattern of variation between samples does not indicate a general reduction of concentrations over time, although further investigation into analytical quality and additional analysis of samples is suggested.

The fourth chapter deals with the request from ICES/OSPAR/HELCOM Steering Group on Quality Assurance of Biological Measurements (STGQAB) to review the proposed changes to the HELCOM COMBINE Manual. Some comments referring to the measurement uncertainty and data validation are given.

Conclusions

WGSAEM agreed on the general need for data quality assurance (QA) for the different kinds of measurements ICES deals with (biological, chemical, physical, biological effects, etc.) and that ICES should recommend tools that could be used to assure data quality.

WGSAEM agreed that data quality peer reviews by qualified external reviewers can help improve data quality (in addition to the available internal reviews).

4.1 Data quality of Swedish water quality data and the impact on trend detection

A report on “Uncertainty in Swedish water quality data and its implications for trend detection” was presented. This report showed that data quality problems can be very severe even if conventional quality assurance is practised. Maintaining a high and even quality of data collected over decades is a very demanding task, and many sources of error cannot be seen until the entire history of measured data is assessed for whole networks of sampling sites.

The report emphasised three principles for a statistical retrospective assessment of data quality:

1 ) Meteorological/hydrological adjustment can reduce the noise in the collected time-series of data and thereby clarify other sources of variation, such as human interventions in the drainage area, or conscious or unconscious changes in sampling and laboratory practices.

2 ) Simultaneous analysis of several time-series of adjusted data can help in revealing patterns caused by changes of laboratories or laboratory practices.

3 ) Forming ratios or differences between water quality parameters that are related to each other can facilitate the detection of data quality problems.
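The third principle can be illustrated with a minimal sketch. It is not taken from the Swedish report: the function name, the choice of the median log-ratio as summary, and the example values for total phosphorus and phosphate are all hypothetical. The idea is only that a step in the ratio of two related parameters at the date of a laboratory change is easier to see than a step in either parameter alone.

```python
import math
from statistics import median


def log_ratio_shift(param_a, param_b, change_index):
    """Compare the median log-ratio of two related parameters before and
    after a suspected change point (for example a change of laboratory).

    param_a, param_b: equally long lists of positive values measured on
    the same samples. change_index: index of the first observation after
    the suspected change. Returns (before, after, shift) on the log scale.
    """
    if len(param_a) != len(param_b):
        raise ValueError("series must have equal length")
    if not 0 < change_index < len(param_a):
        raise ValueError("change_index must leave two non-empty parts")
    log_ratios = [math.log(a / b) for a, b in zip(param_a, param_b)]
    before = median(log_ratios[:change_index])
    after = median(log_ratios[change_index:])
    return before, after, after - before


# Hypothetical values: both parameters vary with the weather, but their
# ratio is stable until the fifth sample, where it steps from 2.0 to 2.5.
total_p = [50.0, 60.0, 40.0, 55.0, 50.0, 60.0, 40.0, 55.0]
phosphate = [25.0, 30.0, 20.0, 27.5, 20.0, 24.0, 16.0, 22.0]
before, after, shift = log_ratio_shift(total_p, phosphate, 4)
print(f"median log-ratio before {before:.3f}, after {after:.3f}, shift {shift:.3f}")
```

A shift of this kind does not say which parameter is affected, only that the relationship between the two changed at that point and deserves a closer look.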

Also, it was emphasised that continued sampling can shed more light on data that have already been put into a database, and that this calls for a system that allows and encourages retrospective data reviews. NASA (National Aeronautics and Space Administration) has implemented such a system for its databases. In particular, it can be noted that NASA, in its Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies, accepts and encourages technical and data quality peer review by qualified external reviewers or committees of external reviewers (www.sti.nasa.gov/STI-public-homepage.html).

WGSAEM agreed that data quality peer reviews by qualified external reviewers can help improve data quality (in addition to the available internal reviews). This fits into the outline of data quality assurance that was addressed by WGSAEM in 2006 (WGSAEM, 2006). Even though such reviews cannot eliminate data quality problems, they can substantially shorten the time before errors are detected, if they are detected at all. In some cases, simple scatterplots and time-series plots can reveal data quality problems. In other cases, multivariate approaches might be needed.

Reference

ICES. 2007. Report of the Working Group on the Statistical Aspects of Environmental Monitoring (WGSAEM), 12–16 March 2007, Paris, France. CM 2007/MHC:02. 48 pp.

4.2 Tools for Data Quality Assurance

Introduction

Steffen Uhlig and Richard Duin presented preliminary results of a project for the development of tools for testing the plausibility of data, taking into account correlations between compounds and/or sampling sites. In this approach, the residuals of the trend assessment are explained by two components: the compound-comprehensive contribution, which is common to the compounds, and the remaining error (for more details see WGSAEM, 2006; a report is in preparation, 2007). Statistical assessments of a large number of sites and compounds indicate that there are high correlations of annual median concentrations, especially within PAHs and within heavy metals. The effect of using the compound-comprehensive procedure is illustrated with data from monitoring stations in the river Meuse (Keizersveer) and the river Rhine (Maassluis).

The effect of using the compound-comprehensive procedure

Removal of the compound-comprehensive contribution provides corrected median concentrations, which are considerably smoother than the original median values.

This is apparent, for example, for the sampling site Keizersveer (river Meuse, the Netherlands). Original and corrected median concentrations of benzo(b)fluoranthene with prediction intervals are presented in Figure 4.1.

[Figure 4.1 plot: benzo(b)fluoranthene (CAS 205-99-2); vertical axis concentration in mg/kg (0.7 to 1.5), horizontal axis year (1990 to 2005).]

Figure 4.1. Original and corrected concentrations of benzo(b)fluoranthene in suspended matter (mg/kg dry weight) with prediction intervals for Keizersveer (river Meuse, the Netherlands). Blue = original values and respective prediction interval, brown = corrected values and respective prediction interval.

The uncorrected median concentration (blue box) in 2003 is below the lower bound of the corresponding prediction interval. This may indicate that something was wrong with the data in that year. However, after removing the compound-comprehensive component the median is right in the centre of the prediction interval. It can be concluded from this that the cause of the outlier, if any, is not related to that specific compound, but is compound-comprehensive. Further investigations show that there is indeed a compound-comprehensive effect in this year. This effect can also be detected at other sites of the river Meuse.

In Figure 4.2 chromium median concentrations at Maassluis (river Rhine, the Netherlands) are presented. Here the uncorrected values are all within the tolerance, but the corrected median for 2002 slightly exceeds the limits. This may be interpreted as a hint that the concentration of this single compound is indeed very large compared with the other median concentrations since 1993. This may be due to errors that do not simultaneously affect the other compounds.

[Figure 4.2 plot: chromium (CAS 7440-47-3); vertical axis concentration in mg/kg (60 to 130), horizontal axis year (1987 to 2005).]

Figure 4.2. Original and corrected concentrations of chromium in suspended matter (mg/kg dry weight) with prediction intervals for Maassluis (river Rhine, the Netherlands). Blue = original values and respective prediction interval, brown = corrected values and respective prediction interval.

In both figures the prediction bands were calculated empirically. The outer band is based on the uncorrected median values, which can be considered stochastically independent (apart from the underlying temporal trend). However, the inner band is based on the corrected median values, and these depend on the underlying estimated trend. WGSAEM discussed the interpretation of this inner prediction band. It can be considered as a conditional prediction band, depending on the underlying trend. This means that with another underlying trend the prediction band may follow another trend line, but the differences between the bounds and the corrected median concentrations remain the same.

The inner conditional prediction band may therefore be used to check whether single values exhibit specific deviations, but it may not be used to derive an unconditional confidence band for the underlying trend and is therefore not directly suited for assessing trends.
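The idea of splitting trend residuals into a common component and a compound-specific remainder can be sketched as follows. This is an illustration only and not the procedure of the project: it assumes that the compound-comprehensive contribution in a year can be approximated by the plain average of the residuals of all compounds in that year, and the compound names and residual values are hypothetical.

```python
from statistics import mean


def remove_common_component(residuals):
    """Split trend residuals into a common and a compound-specific part.

    residuals: dict mapping compound name -> list of residuals from that
    compound's trend, one value per year, same years for every compound.
    Returns (common, corrected), where common[t] is the across-compound
    mean residual in year t (an assumed estimator) and
    corrected[c][t] = residuals[c][t] - common[t].
    """
    lengths = {len(series) for series in residuals.values()}
    if len(lengths) != 1:
        raise ValueError("all compounds need the same number of years")
    n_years = lengths.pop()
    common = [mean(series[t] for series in residuals.values())
              for t in range(n_years)]
    corrected = {name: [r - c for r, c in zip(series, common)]
                 for name, series in residuals.items()}
    return common, corrected


# Hypothetical residuals for three compounds over four years; the third
# year shows a dip that is shared by all compounds.
residuals = {
    "A": [0.1, -0.1, -0.5, 0.0],
    "B": [0.0, 0.1, -0.6, -0.1],
    "C": [-0.1, 0.0, -0.4, 0.1],
}
common, corrected = remove_common_component(residuals)
print("common component per year:", [round(c, 3) for c in common])
print("corrected residuals for A:", [round(r, 3) for r in corrected["A"]])
```

In this toy case the shared dip is absorbed by the common component, so the corrected residual of compound A in the third year is close to zero, which mirrors the behaviour described for 2003 at Keizersveer. A deviation that affects only one compound would instead remain visible after correction.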

In the discussion, WGSAEM noted further that the proposed technique is very promising and may also be used to test contrasts between one compound and a (weighted) average of the other compounds. This is illustrated in Figure 4.3, in which, in addition to the blue and brown prediction bands, the estimated trend (black) and the conditional confidence band for the compound-comprehensive trend (red) are presented. In 2004 the estimated trend exceeds the latter confidence band, and this is a specific early warning signal.

[Figure 4.3 plot: nickel (CAS 7440-02-0); vertical axis concentration in mg/kg (40 to 90), horizontal axis year (1987 to 2005).]

Figure 4.3. Original and corrected concentrations with prediction intervals for nickel in suspended matter (mg/kg dry weight) at Eijsden ponton (river Meuse, the Netherlands). Blue = original values and respective prediction interval, brown = corrected values and respective prediction interval, black = trend line, red = conditional confidence bounds for the compound-comprehensive trend.

Reference

Uhlig, S. 2006. In Report of the ICES Working Group on Statistical Aspects of Environmental Monitoring. Annex 6. ICES CM 2006/MHC:02: 31–36.

4.3 Analysis on the decay of persistent organic pollutants in stored cod-liver data and consequences for an environmental sample bank

Birger Bjerkeng presented a statistical analysis of data for 12 cod liver samples stored at −20°C since being collected from 3 stations in 1993, 1994, 1997 and 2003. These were reanalysed in 2006 for comparison with the original analysis values. The purpose was to see if there was evidence of systematic changes in observed levels of persistent organic pollutants (POP) in samples that have been stored over a number of years. The question has arisen in connection with plans to establish a Norwegian environmental sample bank.

For most of the samples there are 3 sets of analysis values:

1 ) Original analyses of fat content and contaminant concentrations done in the year of sampling.

2 ) Fat extraction and analysis of the stored wet samples, with new measurement of fat content.

3 ) New analysis of the stored fat extracts.

Ratios were calculated between pairs of analyses for each sample and contaminant. All ratios were on lipid basis. The statistical analyses were done on natural logarithms of the ratios, expected to have linearly symmetric distribution functions with expectation value 0 under the null hypothesis of no systematic change between new and original analyses and between analysis on stored fat and stored wet samples.
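The null hypothesis stated above (log-ratios symmetric about 0) can be tested directly with an exact sign-flip test, sketched below for a single contaminant. This is a simpler, swapped-in technique and not the analysis used in the study, which relied on Hotelling's test and variance analysis; the function name and the twelve ratio values are hypothetical.

```python
import math
from itertools import product


def sign_flip_test(log_ratios):
    """Exact two-sided sign-flip test of H0: the log-ratios are
    symmetrically distributed about 0.

    Under H0 each log-ratio is equally likely to carry either sign, so
    the observed absolute sum is compared with the absolute sums obtained
    from all 2**n sign assignments. Returns the exact p-value.
    """
    n = len(log_ratios)
    observed = abs(sum(log_ratios))
    as_extreme = 0
    for signs in product((1, -1), repeat=n):
        statistic = abs(sum(s * x for s, x in zip(signs, log_ratios)))
        if statistic >= observed - 1e-12:  # tolerance for rounding
            as_extreme += 1
    return as_extreme / 2 ** n


# Hypothetical ratios (reanalysed / original, lipid basis) for one
# contaminant in 12 samples.
ratios = [0.91, 0.85, 1.02, 0.88, 0.95, 0.79,
          0.93, 1.05, 0.87, 0.90, 0.96, 0.84]
log_ratios = [math.log(r) for r in ratios]
p_value = sign_flip_test(log_ratios)
geometric_mean_ratio = math.exp(sum(log_ratios) / len(log_ratios))
print(f"geometric mean ratio {geometric_mean_ratio:.3f}, p-value {p_value:.4f}")
```

With 12 samples there are 4096 sign assignments, so full enumeration is cheap. The test uses only the symmetry assumption and treats the samples as independent; it does not account for correlation between contaminants within a sample.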

The samples were analysed for PCBs (CB28, CB52, CB101, CB105, CB118, CB138, CB153, CB156, CB180, CB209), pesticides (DDTPP, DDEPP, TDEPP, HCHA, HCHG) and other chlorinated compounds (HCB, OCS, QCB). The statistical analysis was restricted to the contaminants that had real values in all analyses; they were CB101, CB105, CB118, CB138, CB153, CB180, DDEPP and HCB. The other contaminants had values marked as uncertain, below limit or missing due to masking or interference in chromatograms.

Hotelling’s multivariate T²-test and univariate main-factor variance analysis with sample as random factor and contaminant as fixed factor were used to look for differences between samples and contaminants and to test for deviations from the null hypothesis.

The analysis indicated clearly significant overall differences between samples in the ratios between reanalysed values of stored wet samples and original values, and also significant differences between contaminants as averages over samples. The random variance components in the variance analyses of this ratio indicate a total coefficient of variation for individual analysis values of 13 %. Based on the ratio between the two new analyses, the coefficient of variation is estimated to be about 8 %. In both cases more than half of the variation is due to common variation for all contaminants in a sample. The estimated coefficient of variation is in line with the analysis uncertainty estimated from results for certified reference material, which has been analysed in connection with each longer series of samples throughout the whole period.

The pattern of variation between samples does not indicate a general reduction of concentrations over time. The oldest samples from 1993 and the newest samples from 2003 have about equal repeatability, with no signs of average reduction from original analyses, while there is an estimated reduction of 20 to 30 % for samples from 1997, with the samples from 1994 intermediate between these. Such a pattern might be a result of analytical quality changing over time in combination with either a real reduction over time or stability in concentration, depending on which results were seen as biased by analytical changes. However, from results for the certified reference material it appears that the analytical quality of the lab has not changed over the period from 1993 until today, and this indicates that the apparent non-monotonic relation to number of years in storage may be a spurious result arising from the analysis of a small number of samples.

The results indicate an average reduction of 10 % on storage, but the estimate is very uncertain, with a 90 % confidence range going from 17 % to 2.5 % reduction. This should be more thoroughly investigated.

Analysing an additional set of samples might give a better basis for concluding on these points. Comparing whole time-series for one or more stations based on original and reanalysed values of the same set of samples might also help to determine if trends indicate changes in degree of detection by analysis or real change in stored samples over time.

The data show no significant overall difference in stability of concentrations for stored fat and stored wet samples, but there may be minor differences between contaminants in this respect. There is a weak indication that reanalysis of wet samples gives somewhat lower fat content determinations than the original analysis, but only by about 3 %.

Reference

Brevik, E., Green, N., Bjerkeng, B. 2007. Analyse av nedfrosne torskeleverprøver (Analysis of frozen cod liver samples). Norwegian Institute for Water Research (NIVA), Project report no. 5325-2006. In prep. (General text in Norwegian, statistical analysis in English)


4.4 Comments to ICES STGQAB: reviewing proposed revision of Part B of the COMBINE manual

WGSAEM appreciates the efforts of the ICES/OSPAR/HELCOM Steering Group on Quality Assurance of Biological Measurements (STGQAB) to revise the HELCOM COMBINE Manual on quality assurance (QA). The ICES Advisory Secretary, Claus Hagebro, sent the manual on 2 March 2007. Only Appendix 15 of this manual has been reviewed by WGSAEM. In the following, we make a few general remarks and address some specific recommendations.

4.4.1 Comments on the section “measurement uncertainty”

Some sections of the manual are outdated. Most of the references are dated around 1998, while newer versions of the publications are available and would better describe how to estimate the uncertainty of analytical measurements. Furthermore, the manual has to deal with biological measurements. It should be carefully checked whether the approved descriptions and procedures are applicable to these kinds of measurements. When measuring a chemical concentration, identifying an LOD or LOQ is straightforward, but we are not aware that the same kinds of measures can be used for biological measurements.

It should be emphasized that biological measurements have their own specific problems. Part of the uncertainty in the measurements is caused by the identification of species, which is difficult to standardize. The identification error should be examined by means of properly defined ring tests. WGSAEM noted that appropriate statistical methods are required to analyse the outcome of ring tests for taxonomic identification properly. One option for such an analysis is described in Schilling et al. (2006).

It is unclear why for the calculation of measurement uncertainty the approach of the Nordic Committee on Food Analysis (1997) is suggested. Most institutes are already using a specified approach and it would not be wise to suggest something else without a well-defined reason (see e.g. the updated version of NEN 7779, from the Netherlands Standardization Institute for measurement uncertainty).

Some of the references in the manual are missing and others are not fully described or are unclear. Some examples are given:

• Nordtest is not explained and not referenced, ISO 5725 (1994);

• In chapter B4.2.6 ISO 5725 is used as a reference, while actually 6 parts of this document are available, so ISO 5725-2 (1997) might be used instead;

• A reference is given for Thompson and Wood, while it might be more appropriate to refer to ISO/TS 21748;

• The acronym BCR (Chapter B4.2.5) is not explained.

4.4.2 Comments on the section “data validation”

As stated in the WGSAEM 2006 report, data flow QA not only considers QA measures for single processes within the data flow process, but also has to integrate all the individual QA components. Tools for such a data flow process assessment might be e.g. intercomparison studies, audits of the data flow process and simulation studies.

Recommendations

WGSAEM recommends that ICES note the increased importance of data flow quality assurance, as already stated in the WGSAEM 2006 report, and that this topic is relevant for other ICES working groups dealing with monitoring data as well.

WGSAEM encourages the data quality assurance process for biological measurement methods, but emphasizes that, due to the complexity of these methods, a very general approach could be difficult to implement and could be less successful than focussing on specific biological measurement methods.

WGSAEM recommends the use of tools to assure data quality (see WGSAEM 2006).

References

NEN, Netherlands Standardization Institute. 2006. Updated version of NEN 7779, ICS 03.120.30; 17.020.

Schilling, P., Powilleit, M., and Uhlig, S. 2006. Macrozoobenthos interlaboratory comparison on taxonomical identification and counting of marine invertebrates in artificial sediment samples including testing various statistical methods of data evaluation. Accred. Qual. Assur. (11): 422–429.

Uhlig, S. and Duin, R.N.M. 2006. In Report of the ICES Working Group on Statistical Aspects of Environmental Monitoring. ICES CM 2006/MHC:02: 3–5.

5 Provide further advice on methods for temporal, spatial, and integrated assessments of contaminants in sediments and biological effects and inputs

Justification

This provides the opportunity to improve the existing assessment procedures recommended by ICES and develop new assessment procedures for marine environmental assessments. This will help the ICES Marine Data Centre to develop appropriate data products, and may feed back into the development of monitoring strategies.

Summary

An improved method was developed for modelling time-series of VDSI in Nucella lapillus. The method was applied to all such time-series in the ICES database.

A method was developed for modelling time-series containing less-than measurements in the OSPAR MON assessments of contaminants in biota.

A simulation-based approach to facilitate the design of eutrophication surveys was presented, using high-frequency buoy data. These data were used to investigate the properties of various sampling strategies for TOxN and chlorophyll-a. In particular, the proportion of occasions on which a sampling programme would consider chlorophyll-a and TOxN to be below a threshold using a Green Test was examined.

The methods used by the OSPAR Radioactive Substances Strategy (RSS) to assess pollutant levels (discharges and activity concentrations) over time were discussed. RSS is facing some difficulties in performing these assessments: some data sets are heterogeneous, sometimes with a large number of data below LOD. A first analysis of how to deal with these LODs is presented.

Conclusions

The method for modelling VDSI in Nucella lapillus should be adopted in the OSPAR MON assessments. Similar methods should be developed for other measures of imposex and other species (e.g. VDSI in Neptunea antiqua and PCI in Buccinum undatum). The changes in methodology provide more accurate significance levels and better model diagnostics. The number of females that contribute to each VDSI is required for the analysis and should be submitted to the ICES database. The submission of individual imposex data (e.g. the VDS class of each female) might have long-term benefits and should be encouraged.


The method for dealing with less-than measurements in time-series of contaminants in biota should be adopted in the OSPAR MON assessments. However, the performance of candidate test statistics for assessing whether concentrations are increasing should first be investigated.

The simulation-based approach using high-frequency smart-buoy TOxN and chlorophyll-a data suggests that more than one year of data from a single station is needed to obtain sufficient power to identify whether regions are below a threshold. Further work is needed to identify how sampling performs when data from additional stations and years are included. There is considerable variation between years. The practical implication is that, even if the site passes the test in most years, there may be some years in which it fails, with potentially harmful effects on ecosystems and the environment. Adjusting for correlation when calculating confidence intervals is important. The fitted variograms suggest that Liverpool Bay data are independent only at a separation of greater than 20 days (chlorophyll-a) and 100 days (TOxN).
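The separation at which observations can be treated as independent is usually read from the point where the variogram levels off. The following is a minimal sketch of an empirical semivariogram for a regularly spaced series, not the fitted variogram models used for the Liverpool Bay data. The helper names, the use of the sample variance as the sill, the 95 % cut-off, and the synthetic autoregressive series standing in for daily buoy values are all assumptions.

```python
import random
from statistics import pvariance


def empirical_semivariogram(values, max_lag):
    """Empirical semivariogram of a regularly spaced series.

    gamma(h) is half the mean squared difference between observations
    that are h time steps apart. Returns a dict lag -> gamma(lag).
    """
    gamma = {}
    for lag in range(1, max_lag + 1):
        squared = [(values[i + lag] - values[i]) ** 2
                   for i in range(len(values) - lag)]
        if squared:
            gamma[lag] = sum(squared) / (2 * len(squared))
    return gamma


def first_lag_near_sill(gamma, sill, fraction=0.95):
    """Smallest lag at which gamma reaches the given fraction of the
    sill, or None if it never does within the lags supplied."""
    for lag in sorted(gamma):
        if gamma[lag] >= fraction * sill:
            return lag
    return None


# Synthetic stand-in for two years of daily values: a first-order
# autoregressive series with strong day-to-day correlation.
random.seed(1)
series = [0.0]
for _ in range(729):
    series.append(0.9 * series[-1] + random.gauss(0.0, 1.0))

gamma = empirical_semivariogram(series, 60)
approx_range = first_lag_near_sill(gamma, pvariance(series))
print("approximate separation (days) for near-independence:", approx_range)
```

Sampling more often than the resulting separation adds less information than the number of samples suggests, which is why confidence intervals need to be adjusted for correlation.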

An important aspect in the analysis of radioactive pollutant levels is the presence of LODs. The LOD depends on the time of detection and on the uncertainty of the measured values, which explains the peculiar behaviour of the LODs in certain data sets. Three data sets with Cs-137 concentration in seaweed were further examined. In all of them, all LODs could either be discarded without harm or be integrated into the measured values. RSS assessments (parametric or non-parametric tests) can then be performed on the cleaned values. The working group made a number of simple recommendations for performing the cleaning, stressing that the analyst must have access to as much information as possible. One suggestion is to record the sample and white-noise counts as a pair for each measurement instead of only the estimated concentration.
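The value of recording the pair of counts can be seen in a short sketch. It assumes that the white-noise counts play the role of background counts, that sample and background are counted for equal times, and that counts follow a Poisson distribution; it then applies the widely used Currie-type expressions for the critical level and detection limit. Whether the laboratories contributing to the RSS data use these expressions is not stated in the report, so the function and its parameter `k` are illustrative only.

```python
import math


def net_count_summary(sample_counts, background_counts, k=1.645):
    """Summarise one measurement from its pair of counts.

    Assumes Poisson counts and equal counting times. k is the one-sided
    normal quantile (1.645 corresponds to 5 % error rates).
    """
    net = sample_counts - background_counts
    # Standard deviation of the difference of two Poisson counts.
    sd_net = math.sqrt(sample_counts + background_counts)
    # Critical level: net counts above this are declared detected.
    critical_level = k * math.sqrt(2 * background_counts)
    # Detection limit: true net counts reliably detected a priori.
    detection_limit = k ** 2 + 2 * critical_level
    return {
        "net": net,
        "sd_net": sd_net,
        "critical_level": critical_level,
        "detection_limit": detection_limit,
        "detected": net > critical_level,
    }


# Hypothetical pairs of (sample, background) counts.
clear_signal = net_count_summary(130, 100)
weak_signal = net_count_summary(110, 100)
for label, result in (("130 vs 100", clear_signal), ("110 vs 100", weak_signal)):
    print(label, {key: round(value, 2) if isinstance(value, float) else value
                  for key, value in result.items()})
```

With the pair of counts available, an analyst can recompute net values and their uncertainty for every measurement, including those reported as below LOD, instead of having to substitute or discard them.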

5.1 Modelling imposex data in Nucella lapillus

Introduction

An improved method was developed by Rob Fryer and Matt Gubbins for modelling time-series of the Vas Deferens Sequence Index (VDSI) in Nucella lapillus. A preliminary method for assessing VDSI was presented to WGSAEM and WGBEC in 2006 (Fryer and Gubbins, 2006a). However, it was recognised that the assumed mean-variance relationship was inappropriate. Individual VDS data were obtained intersessionally and modelled using a proportional odds model, and the results were used to estimate a more appropriate mean-variance relationship. This was then used to assess the time-series of VDSI in Nucella lapillus in the ICES database.

The Individual Data Approach

Annex 5 describes improved methodology for assessing temporal trends in VDSI in Nucella lapillus. Fryer and Gubbins (2006a) describe how VDSI time-series can be assessed by scaling the indices to lie between 0 and 1 and then modelling them using generalised linear (or additive) models assuming quasi-binomial errors. They applied the method to all VDSI time-series in the ICES database with at least five years of data. The method appeared reasonable for any one time-series, but problems arose when several time-series were modelled simultaneously (Fryer and Gubbins, 2006b), as the mean-variance relationship implicit in the quasi-binomial distribution was shown to be inappropriate. There are good biological reasons for this: VDS classes are ordinal, with no sense of distance between them, and it is ‘harder’ to move between some classes than others. In particular, it is ‘hard’ to progress to VDSI >4.0 in Nucella lapillus.

Four sets of individual VDS data in Nucella lapillus were modelled using a proportional odds model (McCullagh and Nelder, 1989), a generalised linear model for ordinal data, and used to estimate a mean-variance relationship appropriate for VDSI data. Figure 5.1 shows the mean-variance relationship under the assumption of quasi-binomial errors (black) and based on the individual VDS data (blue). The two curves clearly differ, the one based on individual data showing markedly lower variability when the mean VDSI is around 4. Thus, when the mean VDSI is around 4, most individual VDS classifications would be 4 and the VDSI from replicate samples would all be close to 4. Conversely, when the mean VDSI is around 2, individual VDS classifications would be more widely spread and the VDSI from replicate samples would have relatively large variability.

[Figure 5.1 plot: variance (vertical axis, 0.0 to 1.0) against mean (horizontal axis, 0 to 6).]

Figure 5.1. The mean-variance relationship under the assumption of quasi-binomial errors (black) and based on the individual VDS data (blue). The two relationships are scaled to have a common maximum of unity.
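The contrast between the two mean-variance relationships can be illustrated with a minimal Python sketch. It is not the fitted model from Annex 5: the cut-points below are hypothetical values chosen only so that classes above 4 are 'hard' to reach, and the function names are illustrative. The sketch computes the mean and variance of a single VDS class (0 to 6) under a proportional odds model and compares them with the quasi-binomial variance function scaled to a maximum of one.

```python
import math

N_CLASSES = 7  # VDS classes 0..6, matching the 0 to 6 mean axis of Figure 5.1


def quasi_binomial_variance(mean):
    """Variance implied by quasi-binomial errors on VDSI/6, scaled to a maximum of 1."""
    p = mean / (N_CLASSES - 1)
    return 4.0 * p * (1.0 - p)


def proportional_odds_moments(eta, cutpoints):
    """Mean and variance of one VDS class under a proportional odds model.

    The model is P(class <= k) = logistic(cutpoints[k] - eta) for k = 0..5.
    """
    cum = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    probs = [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


# Hypothetical cut-points (an assumption, not estimates from the report):
# the large gap before the last two makes classes 5 and 6 hard to reach.
CUTPOINTS = [-3.0, -2.0, -1.0, 0.0, 6.0, 8.0]

for eta in (-1.5, 0.0, 1.5, 3.0):
    m, v = proportional_odds_moments(eta, CUTPOINTS)
    print(f"eta={eta:5.1f}  mean={m:5.2f}  variance={v:5.2f}  "
          f"quasi-binomial (scaled)={quasi_binomial_variance(m):5.2f}")
```

With these illustrative cut-points the variance of an individual classification is much smaller when the mean is near 4 than when it is near 2, which is the qualitative behaviour described above; the actual curve in Figure 5.1 was estimated from the four sets of individual data.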

Generalised linear models, assuming the mean-variance relationship based on individual data, were used to assess all time-series of VDSI in Nucella lapillus in the ICES database with at least four years of data (Annex 5). There were 114 time-series. Most of the estimated trends were downwards. There were 17 significant downward trends (at the 5% level) and only 1 significant upward trend. There were 1, 14, 38, 58, and 3 time-series in assessment classes A through E respectively. Some data require quality checks.

The new mean-variance relationship provides more accurate significance levels and better model diagnostics.

The assessments should weight each VDSI by the number of females in the sample, but this is not available in the ICES database. Instead, half the total number of individuals in the sample was used.

There is evidence of over-dispersion in some time-series, so a between-year variance component will need to be introduced. Estimating these variance components will help inform the design of imposex monitoring programmes.

Mean-variance relationships now need to be estimated for other measures of imposex in other species (e.g. VDSI in Neptunea antiqua and PCI in Buccinum undatum).

In principle, modelling individual imposex data using e.g. the proportional odds model should be more informative than modelling summary measures such as VDSI. However, it is unclear whether, in practice, such a change would yield substantial benefits for large-scale assessments. Nevertheless, access to individual data would allow for this possibility in the future, and would help to estimate appropriate mean-variance relationships for use in VDSI assessments. Therefore, submission of individual data to the ICES database should be encouraged.

Recommendations

OSPAR MON should adopt the new methodology for assessing VDSI in Nucella lapillus.

The MON Intersessional Group (MIG) should estimate an appropriate mean-variance relationship, based on individual imposex measurements, for each measure of imposex in each species (e.g. VDSI in Neptunea antiqua and PCI in Buccinum undatum).

ICES should ensure that the number of females that contribute to each VDSI is submitted to the ICES database.

WGBEC should encourage the submission of individual imposex data.

References

Fryer, R. J., and Gubbins, M. J. 2006a. An assessment of temporal trends in VDSI. Annex 9 to ICES WGBEC Report, 2006.

Fryer, R. J., and Gubbins, M. J. 2006b. A regional assessment of VDSI in dogwhelks from and . Annex 10 to ICES WGBEC Report, 2006.

McCullagh, P., and Nelder, J. A. 1989. Generalized Linear Models (second edition). Chapman and Hall, London.

5.2 Modelling less-thans in the OSPAR MON assessments

Annex 6 describes a simple method, developed by Rob Fryer, for dealing with less-than measurements in the OSPAR MON assessments of contaminants in biota. The method is motivated by the following:

• most less-than measurements in the ICES database are ‘low’, in the sense that they are below the Background Assessment Concentration (BAC) if available;
• the MON assessments leading up to the 2010 Quality Status Report will focus on trends and levels in recent years (the ten monitoring years before the assessment) with historic trends being less important;
• downward trends are not very interesting when concentrations are close to background.

Thus, when a time-series has many less-than values in recent years, the questions of interest are:

• are recent levels below the BAC?
• are levels increasing in recent years?

These can be assessed by restricting the data to the ten years leading up to the QSR (the monitoring years 1998–2007 for the 2008/9 MON assessments) and replacing the reported concentrations by zeros or ones depending on whether the concentrations are below or above the Background Assessment Concentration. Suppose that in year $t_i$ there are $n_i$ observations of which $y_i$ are above the BAC. Then, we:

• test whether levels are below the BAC by assuming the $y_i$ arise from a common binomial distribution with probability $p$, computing an upper one-sided 95% confidence limit on $p$ using the cumulative distribution function of the binomial distribution, and comparing this value to a reference value of 0.5;
• test for an upward trend by a one-tailed permutation test using the statistic $\sum_i t_i y_i$.

Annex 6 also describes rules for deciding whether to assess a time-series on the concentration scale (i.e. the usual assessment) or on the 0-1 scale. An important consideration is that assessments on the concentration scale tend to be more powerful. Thus:

• time-series with few less-thans are assessed on the concentration scale;
• time-series with historic less-thans and recent real measurements are truncated and the recent portion of the time-series is assessed on the concentration scale;
• time-series with many recent less-thans are assessed on the 0-1 scale.

Annex 6 illustrates the method with an assessment of α-HCH levels in shellfish. The WGSAEM share-point contains further assessments of γ-HCH and Benzo[a]pyrene in shellfish.
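The two tests on the 0-1 scale can be sketched in a few lines of Python. This is an illustration only, not the Annex 6 implementation: the function names are hypothetical, the upper limit is found by bisection on the binomial cumulative distribution function, and the permutation scheme (shuffling the individual 0/1 observations across years while keeping each year's sample size fixed) is an assumption about how the permutation test might be carried out.

```python
import random
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def upper_limit(y_total, n_total, level=0.95):
    """Upper one-sided confidence limit on p: solves P(X <= y_total | p) = 1 - level."""
    if y_total >= n_total:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; the cdf decreases as p increases
        mid = (lo + hi) / 2
        if binom_cdf(y_total, n_total, mid) > 1 - level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def trend_p_value(t, n, y, n_perm=2000, seed=1):
    """One-tailed permutation p-value for an upward trend using sum_i t_i * y_i."""
    labels = []
    for n_i, y_i in zip(n, y):
        labels += [1] * y_i + [0] * (n_i - y_i)
    years = [t_i for t_i, n_i in zip(t, n) for _ in range(n_i)]
    observed = sum(t_i * y_i for t_i, y_i in zip(t, y))
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        stat = sum(year * label for year, label in zip(years, labels))
        exceed += stat >= observed
    return (exceed + 1) / (n_perm + 1)


# Hypothetical data: ten monitoring years, four observations per year.
t = list(range(1998, 2008))
n = [4] * 10
y = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
p_upper = upper_limit(sum(y), sum(n))
print("upper limit on p:", round(p_upper, 3), "below BAC:", p_upper < 0.5)
print("upward trend p-value:", trend_p_value(t, n, y))
```

Pooling the yearly counts into a single binomial total follows from the stated assumption that the $y_i$ share a common probability $p$.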

The method works well in practice. There is some loss of information about historic trends, but spurious trends due to changing limits of detection are also avoided. The analysis of truncated time-series on the concentration scale and of time-series on the 0-1 scale both give sensible results when compared to the plots of the original data. In discussion, it was noted that:

• The Mann-Kendall statistic might be more powerful than the statistic $\sum_i t_i y_i$ when testing for trends. The Mann-Kendall statistic can be applied on the concentration scale and can accommodate less-than values by treating data pairs as ties when the larger of the pair is a less-than (e.g., Helsel, 2005). The significance of the Mann-Kendall statistic can again be evaluated using a permutation test. It was agreed that the performance of the two candidate statistics should be compared before implementation by MON;
• When the analysis is on the 0-1 scale, it is still important to display the original data;
• Testing for increasing trends in the presence of less-thans is likely to be important for ‘new’ contaminants, such as brominated flame retardants.

Recommendation

OSPAR MON should adopt the method for dealing with less-thans in their assessments of temporal trends in contaminants in biota.
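The tie rule for less-thans in the Mann-Kendall statistic, noted in the discussion above, can be sketched as follows. This is a hedged illustration rather than the procedure of Helsel (2005) in full: the function names are hypothetical, each observation is a (value, is_less_than) pair, and treating two equal reported values as a tie even when one is a less-than is a simplifying assumption made here.

```python
import random


def pair_sign(earlier, later):
    """Sign of (later - earlier); 0 when the comparison is a tie or indeterminate."""
    (v1, c1), (v2, c2) = earlier, later
    if v1 == v2:
        return 0  # assumption: equal reported values are treated as ties
    larger_is_less_than = c2 if v2 > v1 else c1
    if larger_is_less_than:
        return 0  # the larger of the pair is a less-than, so the order is unknown
    return 1 if v2 > v1 else -1


def mann_kendall_s(series):
    """Mann-Kendall S for a time-ordered list of (value, is_less_than) pairs."""
    n = len(series)
    return sum(pair_sign(series[i], series[j])
               for i in range(n) for j in range(i + 1, n))


def upward_trend_p_value(series, n_perm=2000, seed=1):
    """One-tailed permutation p-value: shuffle the time order of the observations."""
    observed = mann_kendall_s(series)
    rng = random.Random(seed)
    shuffled = list(series)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        exceed += mann_kendall_s(shuffled) >= observed
    return (exceed + 1) / (n_perm + 1)


# Hypothetical series: (concentration, is_less_than) in time order.
data = [(0.5, True), (0.5, True), (0.4, False), (0.7, False),
        (0.6, False), (0.9, False), (1.1, False)]
print("S =", mann_kendall_s(data), " p =", upward_trend_p_value(data))
```

Because a less-than only contributes when the other member of the pair is a larger detected value, a change in detection limits over time does not by itself generate a trend in S.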

Reference

Helsel, D. R. 2005. More than obvious: better methods for interpreting nondetect data. Env. Sci. Techn., 39: 419A–423A.

5.3 A simulation-based approach to facilitate the design of eutrophication surveys

Jon Barry and Rob Fryer presented joint work instigated by Cefas, UK on a simulation-based approach to facilitate the design of eutrophication surveys (Heffernan et al, in prep). This was an extension of work reported at the 2006 WGSAEM.

Overall summary

• Our approach was to model Smart Buoy data from Liverpool Bay, UK. The results were then used to simulate realistic data which could be used to test alternative monitoring strategies.
• Initial results suggest that more than one year of data from a single station is needed to obtain sufficient power to identify whether regions are below a threshold. Further work is needed to identify how sampling performs when data from additional stations and years are included.
• There is considerable variation between years. The practical implication is that, even if a site passes the test in most years, there may be some years in which it fails, with potentially harmful effects on ecosystems and the environment.
• Adjusting for correlation when calculating confidence intervals is important. The fitted variograms suggest that Liverpool Bay data are independent only at separations greater than 20 days (chlorophyll-a) and 100 days (TOxN).

Modelling the TOxN and Chlorophyll-a buoy data

TOxN (total oxidised nitrogen, NO3 + NO2) and chlorophyll-a data from a Smart Buoy (see www.cefas.co.uk/monitoring) off Liverpool Bay, UK were modelled (similar results were obtained for Cefas’ other Smart Buoy off the Thames estuary but are not reported here). Data are available for the years 2003 to 2006 (see Figures 5.2 and 5.3). Potentially, data are available every 20 minutes for chlorophyll-a and every 2 hours for TOxN. However, because of technical and quality-control problems with the Smart Buoy there were large amounts of missing data.

Note that we modelled both log chlorophyll and log TOxN throughout. One strong reason for doing this is that it means that realised values from our simulations are always positive.

The approach we adopted was to model the data in such a way that replicate data could be simulated with very similar characteristics to the original data. We then used these data to investigate the properties of various sampling strategies. In particular, we looked at the proportion of occasions on which a sampling programme would consider the chlorophyll-a and TOxN to be below a threshold using a Green Test (Fryer, 2004). Using this method, a test is passed if the upper confidence limit is below the Background Assessment Concentration. The Background Concentration (BC) and Background Assessment Concentration (BAC) are 10 and 15 µg/l respectively for chlorophyll-a, and 13 and 20 µmol/l respectively for TOxN.
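The pass/fail rule described above can be sketched as follows. This is a simplified illustration and not the Green Test as specified by Fryer (2004): it assumes the limit is computed on the log scale (so it bounds the geometric mean), uses a normal quantile in place of a Student t quantile, and allows for correlation only through an approximate AR(1) effective sample size. The function name and the `rho` parameter are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def green_test(values, bac, level=0.95, rho=0.0):
    """Return (passed, upper_limit) for concentrations compared with the BAC.

    rho is an assumed lag-one autocorrelation between successive samples;
    n * (1 - rho) / (1 + rho) is a large-sample approximation to the
    effective number of independent observations.
    """
    logs = [math.log(v) for v in values]
    n_eff = len(logs) * (1 - rho) / (1 + rho)
    standard_error = stdev(logs) / math.sqrt(n_eff)
    upper_log = mean(logs) + NormalDist().inv_cdf(level) * standard_error
    upper = math.exp(upper_log)
    return upper < bac, upper


# Hypothetical winter TOxN samples (µmol/l) against the TOxN BAC of 20 µmol/l.
samples = [12.1, 14.8, 16.3, 13.9, 17.5, 15.2, 18.8, 14.4, 16.9, 15.7]
print(green_test(samples, bac=20.0))           # treating samples as independent
print(green_test(samples, bac=20.0, rho=0.6))  # allowing for correlation
```

Ignoring positive correlation makes the upper limit too small, which is why a site can appear to pass more easily than it should.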

In order to simulate the model we have identified various important statistical components of the data and then removed the effect of these from the data. We have continued until we have been left with random residuals. We have then simulated from these residuals and then added the statistical components back in until simulated data were obtained. The statistical components we modelled were:

• Yearly trend (mean and variance);
• Adjustment to salinity of 32 psu;
• Dependence on other observations as a function of time;
• Residuals – partly non-parametric and using Generalised Pareto Distribution in the tails above an estimated threshold.

The trends for both the mean and variance were modelled using a kernel smoother. Dependence in time was modelled by fitting variograms to the data.
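As an illustration of the dependence step, the sketch below computes an empirical (semi)variogram for an irregularly sampled time series. It is a minimal sketch only: the report fitted variogram models to the detrended buoy data, whereas this code stops at the binned empirical estimate, and the function name and binning scheme are assumptions.

```python
def empirical_variogram(times, values, bin_width, max_lag):
    """Binned empirical semivariogram of a (possibly irregular) time series.

    Returns {bin_index: (mean_lag, gamma, n_pairs)}, where gamma is half the
    mean squared difference between pairs whose time separation falls in the bin.
    """
    sums, lags, counts = {}, {}, {}
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            h = abs(times[j] - times[i])
            if h == 0 or h > max_lag:
                continue
            k = int(h // bin_width)
            sums[k] = sums.get(k, 0.0) + 0.5 * (values[j] - values[i]) ** 2
            lags[k] = lags.get(k, 0.0) + h
            counts[k] = counts.get(k, 0) + 1
    return {k: (lags[k] / counts[k], sums[k] / counts[k], counts[k])
            for k in sorted(sums)}


# Hypothetical residuals observed on irregular days (gaps mimic missing data).
days = [0, 1, 2, 5, 6, 9, 12, 13, 20, 21, 22, 30]
resid = [0.3, 0.4, 0.2, -0.1, 0.0, -0.3, -0.2, -0.4, 0.1, 0.2, 0.0, 0.5]
for k, (lag, gamma, pairs) in empirical_variogram(days, resid, 5, 20).items():
    print(f"bin {k}: mean lag {lag:5.1f} days, gamma {gamma:.3f}, pairs {pairs}")
```

The separation at which the variogram levels off indicates how far apart observations must be before they can be treated as independent, which is how the 20-day and 100-day figures quoted in the summary should be read.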

Figure 5.4 shows a simulated series together with potential winter sampling occasions from a bucket survey with 5, 10, 15 and 20 observations. In Figure 5.5 we show the proportion of (simulated) years in which the TOxN data passes the Green Test as a function of sample size and also as a function of the distance of the true mean from the BAC. The left hand side of the plot is the Background Concentration (i.e. for TOxN, the BC is 7 units less than the BAC). Thus, for TOxN, even when the mean level is at the BC, there is only just over an 80% compliance level (i.e. correctly saying that the station has passed the background level test).

For chlorophyll, the power is higher, in that between 20 and 30 samples are needed to get a compliance of 80%. However, the overall conclusion from these plots is that more years or more stations would be needed in order to get acceptable compliance rates.
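The idea of estimating compliance rates by simulation can be sketched with a much simpler stand-in for the model described above. The sketch below is an assumption-laden illustration: it replaces the fitted model (kernel-smoothed trend, salinity adjustment, variogram and Generalised Pareto tails) with a stationary AR(1) series on the log scale, measures the distance of the true mean from the BAC on the log scale, and applies an uncorrected normal-theory upper confidence limit. All names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def simulate_ar1(n_days, mu, sigma, phi, rng):
    """Daily log-concentrations from a stationary AR(1) process around mu."""
    innovation_sd = sigma * math.sqrt(1 - phi**2)
    x = rng.gauss(0.0, sigma)
    series = []
    for _ in range(n_days):
        series.append(mu + x)
        x = phi * x + rng.gauss(0.0, innovation_sd)
    return series


def pass_rate(n_samples, distance, sigma=0.5, phi=0.95,
              n_days=120, n_years=500, seed=1):
    """Proportion of simulated winters whose upper 95% limit is below the BAC.

    The log BAC is taken as 0, so `distance` (<= 0) is the true mean level
    relative to the BAC on the log scale.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.95)
    step = n_days / n_samples
    passes = 0
    for _ in range(n_years):
        series = simulate_ar1(n_days, distance, sigma, phi, rng)
        sample = [series[int(i * step)] for i in range(n_samples)]
        upper = mean(sample) + z * stdev(sample) / math.sqrt(n_samples)
        passes += upper < 0.0
    return passes / n_years


for n in (5, 10, 20):
    print(n, "samples:", [pass_rate(n, d) for d in (-0.5, -0.25, 0.0)])
```

Because the confidence limit here ignores the correlation between samples, the pass rate when the true mean sits exactly at the BAC is typically well above the nominal 5%, which illustrates why the report applies a correlation correction.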

We have not shown compliance probabilities when the true mean is above the BAC. Because the Green Test fails a site if the upper confidence limit is above the BAC, the proportion of correctly assessed failures quickly approaches 100% even for levels only just greater than the BAC.

For future work, we want to use our model to assess other sampling strategies. These will include the use of more years and more stations in order to increase the power to detect compliance correctly as well as the use of other sampling statistics such as the sample 90th percentile.

References

Fryer, R. J. 2004. Effective Environmental Standards. OSPAR Convention for the Protection of the Marine Environment of the North East Atlantic: Workshop on BRC-EAC 9–13/02/2004, Appendix 6.1.

Heffernan, J., Barry, J., Devlin, M., and Fryer, R. J. (in prep). A simulation based approach to study design for nutrient management and eutrophication impact. To be submitted to Environmetrics.

[Figure 5.2 plot: log chlorophyll (vertical axis, 0 to 3) against time (2003 to 2006), Liverpool Bay.]

Figure 5.2. Natural log chlorophyll-a (µg/l) data against time from the smart-buoy at Liverpool Bay.

[Figure 5.3 plot: TOxN (vertical axis, 0 to 50) against time (2003 to 2006), Liverpool Bay.]

Figure 5.3. TOxN (µmol/l) data against time from the smart-buoy at Liverpool Bay.

[Figure 5.4 plot: four panels (Liverpool Bay TOxN with 5, 10, 15 and 20 samples), each showing simulation 1 (axis ticks 0, 20, 40) against days from 1st July (0 to 300), with the sampling occasions marked.]

Figure 5.4. Realisation of simulated log(TOxN) data and potential sampling occasions in the winter period. The time axis starts at 1 July.

[Figure 5.5 plot: Liverpool Bay TOxN, correlation corrected. Proportion of years passing the Green test (0.0 to 1.0) against distance of mean level from the assessment concentration (−7 to 0), with curves for 5, 10, 15, 20, 30, 40, 50, 75 and 100 samples.]

Figure 5.5. Power of the Green test for TOxN for different numbers of samples (see text for details).

[Figure 5.6 plot: Liverpool Bay chlorophyll, correlation corrected. Proportion of years passing the Green test (0.0 to 1.0) against distance of mean level from the assessment concentration (−5 to 0), with curves for 5, 10, 15, 20, 30, 40, 50, 75 and 100 samples.]

Figure 5.6. Power of Green test for Chlorophyll-a for the different numbers of samples (see text for details).

5.4 Assessment of discharges and activity concentrations of radioactive substances

Philippe Nonclerq presented the methods used by the OSPAR Radioactive Substances Strategy (RSS) to assess how pollutant levels (discharges and activity concentrations) are behaving over time. RSS faces some difficulties in performing these assessments: some data sets are heterogeneous, sometimes with a large number of data below the LOD.


Summary

A working group was set up to address how to assess radioactive pollutant levels in data sets with a large number of data below the LOD. Three data sets with Cs-137 concentration were examined. In all of them, the LODs could be either discarded without harm or integrated into the measured values. RSS assessments (parametric or non-parametric tests) can then be performed on the cleaned values. The working group made a number of simple recommendations for the cleaning, stressing that the analyst must have access to as much information as possible.

Introduction

Philippe presented the methods used by the OSPAR Radioactive Substances Strategy (RSS) to assess how pollutant levels (discharges and activity concentrations) are behaving over time. RSS faces some difficulties in performing these assessments: some data sets are heterogeneous, sometimes with a large number of data below the LOD. The RSS methodology and the questions raised by the quality of the data are detailed in Appendix 7.

Philippe, together with Bénédicte Briand and Jérôme Guillevic, requested WGSAEM’s opinion on these questions. This request does not come officially from OSPAR, but RSS recently set up an intersessional working group, one of whose tasks is to investigate the statistical methods recommended by ICES.

A working group composed of Bénédicte, Birger, Koen, Rob, and Philippe examined some problematical data sets.

Measurement methods and LOD

The group inquired about the activity measurement methods, because some LODs showed quite peculiar behaviour: some of them are below other measured (detected) values obtained at the same site, by the same lab, and in the same period (see Figure 5.7). LODs are variable: some measured values are below all the LODs, some are above all the LODs, and some are within the LOD range.

[Figure 5.7 plot: concentration (vertical axis, 0 to 0.3) against date (June 1994 to May 2005); legend categories: M (measured) and LOD.]

Figure 5.7. Cs-137 (mBq/l) in seaweed from the location Carteret, analysed by the laboratory LRC. The green squares are the measured (detected) values and the red squares are the values below LOD.

Activities are measured by sampling a volume (or a weight) of seawater or dried biota (seaweed, mollusc, etc.). A detection device is directed towards the sample, detects the emission of photons, electrons or alpha (He) particles, and counts them over a detection time, which can be quite long (hours). This count is then corrected with a white-noise count (which counts solar emissions, for example) and by possible calibration parameters. The LODs are therefore highly dependent on the observation time. Some labs are only interested in showing that the values are below a certain level, and do not spend the time necessary to detect enough counts if the actual levels are low. In this case, the information brought by some of these LOD data is so poor that they can be discarded, considering that the measurement device is inadequate. Moreover, uncertainties on the measured values are rather high: 50% if they are close to the LOD, typically 20% for most of the values currently measured, and 10% if the measured value is equal to several times the LOD.

Both these characteristics, the dependence of the LOD on detection time and the uncertainty on the measured values, thus explain the peculiar behaviour of the LODs.

Birger pointed out an interesting characteristic of this type of measurement: times to disintegration of radionuclides follow exponential laws, so the count of emitted particles, equivalent to a count of disintegrations, follows a Poisson law. Knowing the counts during the observation time and the white-noise counts, it should be possible to determine an analytical confidence interval, even for the data below the LOD. Testing on these confidence intervals could be very efficient, because the information brought by the data below the LOD could be taken into account in this way. In addition, being able to establish analytical confidence intervals is rare in environmental monitoring, so every effort should be made to take advantage of this characteristic. It would require that sample and white-noise counts be recorded as a pair for each measurement instead of only the estimated concentration.
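To show how a recorded pair of counts could yield a confidence interval, here is a minimal Python sketch. It is one common approximation and not necessarily the analytical interval the group had in mind: it treats the sample and white-noise counts as independent Poisson counts and uses a normal approximation for the difference of the two count rates, which is poor when counts are very small. The function name and arguments are hypothetical, and the conversion from net count rate to activity concentration (calibration, sample mass or volume) is left out because the report does not specify it.

```python
import math
from statistics import NormalDist


def net_rate_interval(sample_counts, sample_time,
                      background_counts, background_time, level=0.95):
    """Net count rate (sample minus white noise) with an approximate interval.

    For a Poisson count N observed over time T, the rate N / T has
    estimated variance N / T**2; the variances of the two rates add.
    """
    rate = sample_counts / sample_time - background_counts / background_time
    standard_error = math.sqrt(sample_counts / sample_time**2
                               + background_counts / background_time**2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rate, rate - z * standard_error, rate + z * standard_error


# Hypothetical counts: the same true rates observed for 1 hour and for 10 hours.
print(net_rate_interval(130, 3600, 100, 3600))
print(net_rate_interval(1300, 36000, 1000, 36000))
```

The second call illustrates the point made about detection time: with ten times the counting time the interval is much narrower, so a value that is indistinguishable from the white noise after one hour may be clearly detected after ten.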

Dealing with LOD in tests

The group examined three different data sets and makes several recommendations that allow data below the LOD to be dealt with simply. A thorough examination of these three sets is given in Appendix 7.

Examine the influence of location and lab

In the first example, data are more homogeneous since the measurements were scattered over a small area (the presqu’île du Cotentin in France). Two labs were involved, with different behaviour:

• the first lab, OPRI, kept a minimum LOD of about 0.1 mBq/l while the real values sank below this level over time. After 2000, nearly all data are below the LOD (see Figure 5.8).

[Figure 5.8 plot: concentration (vertical axis, 0 to 1.4) against date (June 1994 to October 2006); legend categories: LOD and M (measured).]

Figure 5.8. Cs-137 (mBq/l) in seaweed from a small area (the presqu’île du Cotentin) analysed by the laboratory OPRI. The green squares are the measured (detected) values and the red squares are the values below LOD.

• the second lab, LRC, lowered its LOD (i.e. extended the detection time) in order to be able to detect real values, see Figure 5.9. The data below LOD are therefore rare.

[Figure 5.9 plot: concentration (vertical axis, 0 to 0.8) against date (June 1994 to October 2006); legend categories: M (measured) and LOD.]

Figure 5.9. Cs-137 (mBq/l) in seaweed from a small area (the presqu’île du Cotentin) analysed by the laboratory LRC. The green squares are the measured (detected) values and the red squares are the values below LOD.

Nine different sites were involved in these measurements. It was verified that site was not a significant factor in the number of LODs.

In a second example (see Figure 5.10), the data set is highly heterogeneous: the French data, for example, consist only of measurements below the LOD. All LODs are so high that they bring no information as far as the OSPAR tests are concerned; they should therefore be discarded.

[Figure 5.10 plot: Cs-137 in seawater (logarithmic vertical axis, 0.0001 to 1) against date (June 1994 to October 2006); legend: France LOD, Spain LOD, Spain M, Ireland M, Ireland LOD.]

Figure 5.10. Cs-137 (mBq/l) in seawater zone 1; see the legend for an explanation of the symbols.

In a third example, Birger analysed data from two sites, with about the same levels and variation over time, where one site (Site 1) had only real values, while the other (Site 2) had many

[Figure 5.11 plot: concentration (vertical axis, 0.00 to 0.45) against year (1 Jan; 1994 to 2008); legend includes measured concentrations.]

Figure 5.11. Cs-137 activity concentration (Bq/kg wet weight) in seaweed at Site 1 and 2, with

A comparison of data from the same year and month shows that when real values were observed at both sites, Site 1 had the largest value in 70 % of the cases. In contrast to this, when Site 2 had observations

Table 5.1. Statistics on pairwise comparison between observations at site 1 and 2 in the same year and month.

NUMBER OF OBSERVATION PAIRS

Type of obs. at Site 2: with largest value at Site 2 with largest value at Site 1

The data were investigated with a number of different ANOVA models, with Site, Period (ref. 1995–2001 and recent 2002–2005) and month as factors (and with Year as nested factor within Period). Models with only year, site and/or month as factors, without dividing into predefined periods, were also used.

Generally, the heterogeneity of within-year or within-period variance is a problem, especially since differences in standard deviations are related to differences in means between years or periods. For log-transformed concentrations the heterogeneity is reduced, but still considerable.

Results from two supplemental ways of analysing data

First, a main-factor variance analysis of log(activity concentration) was done with year as random factor and month as fixed factor, using only data from Site 1. Lack of replicates does not permit separating between interaction and residual term. The variation between months is not at all significant in this analysis, while the variation between years is of course strongly significant. Post hoc test shows that years 1995 and 1996 are significantly higher than all succeeding years, while there are no overall differences between years within the two periods 1995–1996 and 1997–2004 (Table 5.2). This may indicate that prescribing comparison of a reference period 1995–2001 versus later data is not optimal; it appears that the change came between 1996 and 1997.

Table 5.2. Result of variance analysis on log(activity concentration) for Site 1 in a main-factor model with year as random factor and month as fixed factor. Entries under the year columns are p-values for significance under the Newman-Keuls range test; the geometric average for each year is given in the first column after the year.

Year   Geometric average   1995   1996   1997   1998   1999   2000   2001   2002   2003   2004
1995   0.264
1996   0.233               0.495
1997   0.115               0.000  0.001
1998   0.079               0.000  0.000  0.436
1999   0.090               0.000  0.000  0.399  0.960
2000   0.081               0.000  0.000  0.366  0.998  0.906
2001   0.085               0.000  0.000  0.342  0.991  0.706  0.953
2002   0.062               0.000  0.000  0.035  0.364  0.396  0.522  0.558
2003   0.079               0.000  0.000  0.448  0.934  0.973  0.996  0.995  0.204
2004   0.093               0.000  0.000  0.228  0.972  0.932  0.940  0.888  0.406  0.979
2005   0.081               0.000  0.000  0.361  0.989  0.854  0.889  0.876  0.559  0.996  0.925

Secondly, an analysis was done with data from both sites, but only from times with real observations at both sites, with year as random factor nested within periods 1995–1996 and 2000–2004. The results show that there are significant differences between sites and between years within period, but the dominant effect is the difference between the two defined periods (Table 5.3). The difference between periods on the log scale is 1.12, corresponding to a factor of 3.0 for the concentrations. Site 1 is about 20% higher than Site 2 during both periods.

Table 5.3. ANOVA table from Variance analysis on log(activity concentration) for Site 1 and 2, using data only from times with real observations at both sites.

EFFECT                   DF   SS       MS       F         P
Site          Fixed      1    0.629    0.629    14.322    0.0003
Period        Fixed      1    33.110   33.110   753.952   0.0000
Site*Period   Fixed      1    0.032    0.032    0.736     0.3933
Year(Period)  Random     6    0.924    0.154    3.508     0.0039
Error                    82   3.601    0.044
Total                         38.887
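The figures in Table 5.3 can be checked with a few lines of arithmetic. The sketch below only reuses numbers printed in the table and the text; the observation that each tabulated F value is close to the effect mean square divided by the residual mean square (3.601 / 82) is an inference from those numbers, not a statement taken from the analysis itself, and small discrepancies reflect the rounding of the printed mean squares.

```python
import math

# Mean squares as printed in Table 5.3.
mean_squares = {
    "Site": 0.629,
    "Period": 33.110,
    "Site*Period": 0.032,
    "Year(Period)": 0.154,
}
residual_ms = 3.601 / 82  # error sum of squares divided by its degrees of freedom

# F ratios formed against the residual mean square.
f_ratios = {effect: ms / residual_ms for effect, ms in mean_squares.items()}
for effect, f in f_ratios.items():
    print(f"{effect:13s} F = {f:8.3f}")

# A difference of 1.12 on the natural-log scale as a multiplicative factor.
period_factor = math.exp(1.12)
print("period factor:", round(period_factor, 2))

# A 20% difference between sites corresponds to this log-scale difference.
site_log_difference = math.log(1.2)
print("site difference on log scale:", round(site_log_difference, 3))
```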

The results seem very clear, but the effect of possible autocorrelation in time and of heterogeneity in variance should be checked.

The group makes the following recommendations:

• When a data set is problematic (with numerous


avoid replacing LOD values by LOD/2 or 2/3 LOD. The test results would be biased (see footnote 1). If LODs are significantly lower than real observations, and the occurrence of

Conclusion

In all the examined data sets, all LODs could be either discarded without harm, or integrated in the measured values. Parametric or non-parametric tests can then be performed on the cleaned values. The working group made a number of simple recommendations to perform the cleaning, stressing the fact that the analyst must use as much information as possible.

Philippe’s other questions could not be answered in the time available at the meeting, but some methods were cited. For example, weighted Student or Wilcoxon tests could be performed to take into account intra-year variation.

Recommendation

OSPAR RSC should note the way in which LOD values in radioactivity concentration could be treated. Data below the LOD in radioactivity concentration measurements can carry poor information (when LODs are far above the real value) or rich information (when measurements below LODs are mixed with detected values at the same concentration levels). In the first case, all

References

OSPAR RSC. 2006. First Periodic Evaluation of Progress towards the Objective of the OSPAR Radioactive Substances Strategy – RSC 06/2/1.

OSPAR RSC. 2007. Second Periodic Evaluation of Progress towards the Objective of the OSPAR Radioactive Substances Strategy – RSC 072/1.

5.5 Use of Rtrend for INPUT data, experiences of Sweden

Anders Grimvall presented the experiences gained by using the Rtrend programme for analysing the annual and monthly input data from Sweden and comparing the results with Multitrend. The full report is available at OSPAR INPUT, INPUT 07/3/12.

1 At least with the type of data analysed here, although some LODs seem to be equal to 1.5 or 2 times the measured value in the same area and period. Some labs indeed evaluate the LOD by adding 50% to the background noise, while some measured values are close to this background noise. The method should thus be adapted to the measurement type.


Furthermore, it was announced that Sweden shall organise a workshop for adjustment and trend analysis of waterborne and atmospheric inputs this year (OSPAR INPUT 07/3/11). Unfortunately no Terms of Reference were available at the meeting.

After some discussion WGSAEM concluded that the suggestions made in 2006 for the assessment of inputs are still valid. The suggestions were:

1) Because of the complexity of load adjustment procedures, it is fundamental for the comparability of trend assessment results on inputs to perform the analysis with one and the same software so that different assessments are reproducible.
2) As RTrend is recommended for riverine and direct inputs and is specifically designed according to the needs of OSPAR INPUT, RTrend should be used for all RID assessments. This guarantees that the handling and preparation of data as well as the calculation of trends and adjusted loads are performed in a reproducible way.
3) WGSAEM recommends that OSPAR INPUT adopt RTrend for the assessment of input data. This is especially needed when the riverine input assessment results are to be compared with other assessments (CAMP, OSPAR-MON, OSPAR-ETG).
4) This does not mean that other software programmes might not be useful for purposes outside the INPUT assessment. Rtrend is specifically designed for the RID trend assessment. For other purposes, e.g. national ones, other programs might be more appropriate.
5) WGSAEM does not feel able to perform a comprehensive comparison with other software programs, because this would be very time-consuming. RID assessments should generally be performed with RTrend.
6) However, if there are new methods that allow trend calculation of loads to be performed more successfully, at least for some river systems, these should also be implemented in Rtrend and in the respective JAMP guidance document.
7) WGSAEM notes that the current version of Rtrend is designed for institutes that have no central database. However, if a central database is available, the transfer of data into the Rtrend database would not be necessary if there were some sort of connection or interface from Rtrend to the central database. Such a direct link would simplify the work with Rtrend considerably. WGSAEM therefore recommends considering the implementation of such an open interface.
8) Furthermore, WGSAEM notes that in 2009 an assessment of both RID and CAMP data is to take place. Considering that the currently used methods and results for RID and CAMP data are incompatible, WGSAEM recommends to OSPAR INPUT that, if a workshop on RTrend and trend assessment is held, the use of normalisation techniques in order to achieve unified handling of CAMP and RID data should be given highest priority. According to the decision of INPUT 2006, the focus of an OSPAR INPUT workshop concerning trend assessment would however be on the RTrend software package.

References
Grimvall, A., and Uhlig, S. 2006. In Report of the ICES Working Group on Statistical Aspects of Environmental Monitoring. ICES CM2006/MHC:02: 5:10.
OSPAR INPUT. 2007. Rtrend RID analysis report, Sweden. INPUT: 07/03/12.
OSPAR INPUT. 2007. Proposal for an OSPAR workshop on approaches for adjustment and trend analysis of waterborne and atmospheric inputs. INPUT: 07/03/11.


6 Continue work on statistical aspects in the development of environmental indicators and classifications

Justification

This is a continuation of previous work to develop a proper statistical basis for environmental indicators.

Summary

Sample estimates of many biodiversity indicators based on species richness can be badly biased, with the bias depending on the underlying density and spatial pattern of species, the number of samples and the size of the grab. This is important if these indicators are used for impact analysis. In the presentation two methods were discussed. The first deals with estimation of the population value of species richness from sample data and the second predicts the influence on species richness and abundance of an impact, such as dredging.

Conclusions

The methods under development for estimating the population value of species richness from sample data and for predicting the influence on species richness and abundance of an impact are very promising. However, to make more progress, input from ecologists and data from benthic surveys need to be combined to give information on the spatial pattern of benthic organisms and their sensitivity to disturbance.

6.1 Using indicators to assess impacts

Summary
• Sample estimates of many biodiversity indicators based on species richness can be badly biased.
• A method to estimate the population value of species richness from sample data is being developed. Further ecological information is needed to get good estimates from this estimator.
• A method has been developed to predict the influence on species richness and abundance of an impact such as dredging. Again, further ecological information is needed to improve the results from this method.

Jon Barry discussed several issues in measuring biodiversity indicators for benthic surveys, in particular the bias that arises when indicators that are a function of species richness (number of species) are estimated from grab samples. Some of the factors that determine this bias are:

• The underlying density of the species;
• The sampling effort (number of grabs);
• The size of the grab;
• The spatial pattern of the benthic organisms.

The point is that it is dangerous to compare the sample value of species richness (and associated indicators that are a function of species richness - such as Shannon-Wiener) between two places or times because the bias of sample values compared to population values may not be the same because of the above factors. Of course, if one is not interested in the actual value of the indicator and is interested only in a change in the underlying population (whatever that change may be) then it might be acceptable in some situations to use biodiversity indicators to map such a change.


Jon showed some probability theory assuming random pattern of individuals to show that a survey at a site or time with 100 species and 10 individuals per species would be expected to detect 55% of the species, whereas the same survey would detect only 34% of the species if the number of individuals per species was reduced to 5. So, the population value of richness has not changed but the expected number detected in the sample will be very different. Indeed, it would be easy to construct examples (for example, if the second situation had 110 species with 5 individuals per species) where the population value in one situation was higher than in another, but the expected sample value was lower.
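The kind of calculation behind these figures can be sketched as follows. This is a minimal illustration, not the calculation presented at the meeting: it assumes that individuals are placed independently and uniformly at random, so that a species with n individuals is missed with probability (1 − p)^n, where p is the fraction of the site covered by the grabs. The value p = 0.08 is a hypothetical choice that gives detection rates of the same order as those quoted; the report does not state the sampled fraction.

```python
def detection_probability(individuals_per_species: int, sampled_fraction: float) -> float:
    """Probability that at least one of n randomly placed individuals lies in the sampled area."""
    return 1.0 - (1.0 - sampled_fraction) ** individuals_per_species


def expected_species_detected(n_species: int, individuals_per_species: int,
                              sampled_fraction: float) -> float:
    """Expected number of species seen when every species has the same abundance."""
    return n_species * detection_probability(individuals_per_species, sampled_fraction)


# Hypothetical sampled fraction (an assumption, not a value from the report).
SAMPLED_FRACTION = 0.08

# Scenarios: (number of species, individuals per species).
for n_species, n_individuals in [(100, 10), (100, 5), (110, 5)]:
    expected = expected_species_detected(n_species, n_individuals, SAMPLED_FRACTION)
    print(f"{n_species} species, {n_individuals} individuals each: "
          f"expected number detected = {expected:.1f}")
```

Under these assumptions the third scenario has the higher population richness (110 species) but a lower expected sample richness than the first, which is the reversal described in the text.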

It would be interesting to investigate such situations with more realistic populations of rare and abundant species. This might even reveal more discrepancies between the defined species richness indicator and the real values.

To get round this problem in future, Jon presented a method based on the expected number of species detected by a grab as a function of the underlying richness, densities and spatial pattern of the true population (Barry et al., in prep). By comparing this expected number of species with the number observed, we can obtain an estimate of the underlying richness. Whilst other methods exist to do this, this method has the potential to be applicable to benthic surveys in that it can incorporate the spatial pattern of individuals. The method may also be applicable to other measures of biodiversity. However, to apply this method in practice, more input is needed from ecologists about the spatial patterns and densities of benthic organisms.
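The idea of matching the expected number of detected species to the observed number can be illustrated with a deliberately simplified sketch. It is not the estimator of Barry et al. (in prep), whose details are not given here. It assumes a random spatial pattern, the same known number of individuals for every species, and a known sampled fraction; all three inputs, and the function name, are hypothetical.

```python
def estimate_richness(observed_species: float, individuals_per_species: int,
                      sampled_fraction: float) -> float:
    """Richness S for which S * P(detect a species) equals the observed count.

    Assumes random placement of individuals and equal abundance for all species.
    """
    p_detect = 1.0 - (1.0 - sampled_fraction) ** individuals_per_species
    if p_detect <= 0.0:
        raise ValueError("no species can be detected with these settings")
    return observed_species / p_detect


# Hypothetical survey: 34 species observed, 5 individuals per species assumed,
# 8% of the site covered by grabs.
print(round(estimate_richness(34, 5, 0.08), 1))
```

The estimate is only as good as the assumed densities and spatial pattern, which is why the text stresses the need for ecological input.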

Finally, Jon presented a model that could be used to assess the impact on species richness and abundance of an impact such as dredging. This model assumed that individual species would have some probability of being killed by the impact and that individuals within species would also have a probability of being killed. This method can also incorporate spatial pattern of the species. However, to make more progress, input from ecologists and data from benthic surveys need to be combined to give information on the spatial pattern of benthic organisms and their sensitivity to disturbance.
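One possible reading of this two-level structure can be sketched as follows; it is an illustration under stated assumptions, not the model presented at the meeting. It assumes that each species is removed outright with probability `p_species_killed`, that each individual of a remaining species is killed independently with probability `p_individual_killed`, and that all species have the same abundance. Spatial pattern is ignored, and all names and parameter values are hypothetical.

```python
def expected_after_impact(n_species: int, individuals_per_species: int,
                          p_species_killed: float, p_individual_killed: float):
    """Expected richness and abundance after an impact, under the assumptions above."""
    p_species_not_removed = 1.0 - p_species_killed
    # A species that is not removed outright persists if at least one individual survives.
    p_some_individual_survives = 1.0 - p_individual_killed ** individuals_per_species
    expected_richness = n_species * p_species_not_removed * p_some_individual_survives
    expected_abundance = (n_species * individuals_per_species
                          * p_species_not_removed * (1.0 - p_individual_killed))
    return expected_richness, expected_abundance


# Hypothetical example: 100 species with 10 individuals each.
richness, abundance = expected_after_impact(100, 10, 0.2, 0.5)
print(f"expected richness = {richness:.1f}, expected abundance = {abundance:.0f}")
```

In this sketch abundance responds much more strongly than richness to individual-level mortality, because a species persists as long as a single individual survives.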

References
Barry, J., and Maxwell, D. (in prep). Biases of diversity indicators: when to trust them and when not to.
Barry, J., Maxwell, D., and Boyd, S. (in prep). A new estimator of species richness from grab samples.
Barry, J., and Boyd, S. (in prep). A model for the effects of gravel extraction on the macrobenthos.

6.2 Statistical principles for ecological status classification of WFD monitoring

Birger Bjerkeng presented the progress of the work of Jacob Carstensen, describing methods for classifying ecological status under the WFD. The paper has been published (Carstensen, 2007).

References
Carstensen, J. 2007. Statistical principles for ecological status classification of Water Framework Directive monitoring data. Marine Pollution Bulletin, 55: 3–15.

7 Any other business

The request from WKIMON (Ian Davies, 11 March 2007) to construct BAC for biological effects data could not be handled because the data were not available in time. It will be dealt with by MIG before MON 2007.


The brochure for the Annual Science Conference 2007 was circulated. Probably no member of this group will attend the conference; the chair will write an executive summary for the chair of the Marine Habitat Committee.

WGSAEM discussed its future. Because of the low number of attendees, there was some fear that the group could easily be dissolved. It is therefore important that ICES asks the Delegates to search for new WGSAEM members. Furthermore, we want this group to be more visible to the ICES and OSPAR community. How to achieve this was not resolved.

Certain OSPAR groups (MON, INPUT) get their statistical advice directly from some members of our group, so there is no longer any need to make an official request to ICES. The big advantage of this alliance is that these groups receive tailor-made statistical solutions, and WGSAEM encourages this very much. The disadvantage is that these groups no longer make official requests to WGSAEM. In the past, unfortunately, many requests could not be resolved because of a lack of information, because they were too vague, or because of time constraints.

WGSAEM is seen as an extremely useful platform to discuss and to agree on the statistical assessment techniques for the ICES and OSPAR community.

WGSAEM would like to keep in touch with developments in assessment methods, sampling design and data quality assurance in organisations such as WFD/EU/EMMA. A defined action could be to sort out where the action is and to filter the available information for our group. Support from ICES is welcomed.

Rob Fryer and Birger Bjerkeng will attend the EMMA (European Marine Monitoring Assessment) workshop, Developing a monitoring and reporting framework, in Copenhagen, on 17–18 April.

The meeting place for next year is not yet certain, owing to an office reorganisation affecting the present chair. We agreed two options: if Richard Duin can chair the next meeting, we will meet in Copenhagen; otherwise we will meet in Burnham and the chairmanship will be taken over by Jon Barry.

8 Recommendations

WGSAEM recommends that:
1 ) ICES note the increased importance of data flow quality assurance, as already stated in the WGSAEM 2006 report, and that this topic is relevant to other ICES working groups dealing with monitoring data as well.
2 ) STGQAB focus their data quality assurance process for biological measurement methods on certain types of biological measurement methods and develop these one at a time. WGSAEM emphasizes that, due to the complexity of biological measurement methods, a very general approach could be difficult to implement.
3 ) To assure the quality of the data stored in the ICES database, tools have to be developed to identify the different sources of error.
4 ) OSPAR MON should adopt the new methodology for assessing VDSI in Nucella lapillus.
5 ) The MON Intersessional Group (MIG) should estimate an appropriate mean-variance relationship, based on individual imposex measurements, for each measure of imposex in each species (e.g. VDSI in Neptunea antiqua and PCI in Buccinum undatum).
6 ) ICES should ensure that the number of females that contribute to each VDSI is submitted to the ICES database.


7 ) WGBEC should encourage the submission of individual imposex data. OSPAR MON should adopt the method for dealing with less-thans in their assessments of temporal trends in contaminants in biota.
8 ) OSPAR RSC note the way in which LOD values in radioactivity concentration could be treated. Radioactivity concentration values below LOD can carry poor information (if the LODs are largely above the real value) or rich information (if measurements below LODs are mixed with detected values at the same concentration levels). In the first case, all less-than-LOD values should be discarded; in the second case, they may be considered as detected values. In addition, an analytical way to take less-than-LOD values into account exactly is suggested. Furthermore, it is important to look closely at the data and to have information about the way the labs work.
9 ) Due to the complexity of load adjustment procedures, it is fundamental for the comparability of trend assessment results on inputs to perform the analysis with one and the same software so that different assessments are reproducible.
10 ) As RTrend is recommended for riverine and direct inputs and is specifically designed according to the needs of OSPAR INPUT, RTrend should be used for all RID assessments. This guarantees that the handling and preparation of data, as well as the calculation of trends and adjusted loads, are performed in a reproducible way.
11 ) OSPAR INPUT should give high priority to developing normalisation techniques to achieve unified handling of CAMP and RID data. A natural place to develop such techniques would be at the workshop on Rtrend and trend assessment proposed for 2007 in Sweden.
12 ) ICES should request the Delegates to search for new WGSAEM members.

Further, WGSAEM should meet at ICES HQ or at Burnham next year to:
a ) develop and review tools for assessing and improving quality assurance of the data generating process;
b ) provide further advice on methods for temporal, spatial, and integrated assessments of contaminants in sediments and biological effects and inputs;
c ) continue work on statistical aspects in the development of environmental indicators and classifications.

9 Close of meeting

Richard Duin thanked everyone for their enthusiasm and productivity, which made this meeting a success. Philippe Nonclercq and EDF are thanked for the valuable guided tour of the Musée d’Orsay, the splendid dinner and Philippe's knowledge of good places to eat. Richard closed the meeting at 11:58 in a café on the corner of the square in front of the Georges Pompidou museum that served very good coffee.


Annex 1: List of participants

Barry, Jon
  Address: CEFAS, 145 Willow Lane, Lancaster LA1 5PU, United Kingdom
  Phone/fax: P: +44 1524 844113
  Email: [email protected]

Bjerkeng, Birger
  Address: Norwegian Institute for Water Research (NIVA), Brekkeveien 19, P.O. Box 173, Kjelsaas, N-0411 Oslo, Norway
  Phone/fax: P: +47 97170010; P: +47 22185100; F: +47 2218 5200
  Email: [email protected]

Briand, Benedicte
  Address: IRSN, Laboratories d’Études Radioécologiques Continentals et Marines, Bâtiment 153, BP3, 13115 Saint Paul lès Durance Cedex, France
  Phone/fax: P: +33 492199259; F: +33 492129142
  Email: [email protected]

Duin, Richard (Chair)
  Address: National Institute for Coastal and Marine Management/RIKZ, P.O. Box 20907, 2500 EX The Hague, The Netherlands
  Phone/fax: P: +31 70 3114214; F: +31 70 3114600
  Email: [email protected]

Fryer, Rob
  Address: FRS Marine Laboratory, PO Box 101, Victoria Road, Aberdeen AB11 9DB, United Kingdom
  Phone/fax: P: +44 1224 295502; F: +44 1224 295511
  Email: [email protected]

Grimvall, Anders
  Address: Department of Mathematics, Division of Statistics, Linköping University, SE-58183 Linköping, Sweden
  Phone/fax: P: +46 13 281482; P: +46 13 281000; F: +46 13 100746
  Email: [email protected]

Guillevic, Jerôme
  Address: Institut de Radioprotection et de Sûreté Nucléaire, 31 rue de l’écluse, BP 35, 78116 Le Vésinet Cedex
  Phone/fax: P: +33 1 30 15 52 26; F: +33 1 30 15 37 50
  Email: [email protected]

Nonclercq, Philippe
  Address: EDF R&D, Dép. Management des Risques Industriels, 6, quai Watier, 78400 Chatou, France
  Phone/fax: P: +33 147 653389; F: +33 147 655173
  Email: philippe.nonclercq@edf.fr

Parmentier, Koen
  Address: Institute for Agriculture and Fisheries Research-Fisheries, Ankerstraat 1, B-8400 Oostende, Belgium
  Phone/fax: P: +32 59 569857; F: +32 59 330629
  Email: koen.parmentier@ilvo.vlaanderen.be

Uhlig, Steffen
  Address: Quo Data, Kaitzer Str. 135, D-01187 Dresden, Germany
  Phone/fax: P: +49 35140356631; F: +49 35140356639
  Email: [email protected]


Annex 2: Agenda

1 ) Open meeting (Monday 13:00);
2 ) Welcome, apologies for absence, member list;
3 ) Agree on the agenda and organise work;
4 ) ToR a: develop and review tools for assessing and improving quality assurance of the data generating process;
    - The ICES/OSPAR/HELCOM Steering Group on Quality Assurance of Biological Measurements (STGQAB) invited WGSAEM, by e-mail from C. Hagebro on 2 March 2007, to review the proposed changes to the HELCOM COMBINE Manual, especially Annexes 12, 13, 14, 15 and 16.
5 ) ToR b: provide further advice on methods for temporal, spatial, and integrated assessments of contaminants in biota, contaminants in sediments and biological effects and inputs;
    - Request from WKIMON (e-mail from I. Davies, 6 March 2007) to develop Background Assessment Concentrations (BAC) for various biological effects measurements, such as EROD, i.e. the activity or response that would be expected in areas where anthropogenic influence is low. Access to data from P. Roose, MUMM, is essential.
6 ) ToR c: continue work on statistical aspects in the development of environmental indicators and classifications;
7 ) Any other business;
8 ) Date and venue of the next meeting;
9 ) Close meeting (Friday 13:00).


Annex 3: WGSAEM terms of reference for the next meeting

The Working Group on Statistical Aspects of Environmental Monitoring [WGSAEM] (Chair: Richard Duin, the Netherlands or Jon Barry, United Kingdom) will meet in [Copenhagen or Burnham] from 10–14 March 2008 to:
a ) develop and review tools for assessing and improving quality assurance of the data generating process;
b ) provide further advice on methods for temporal, spatial, and integrated assessments of contaminants in sediments and biological effects and inputs;
c ) continue work on statistical aspects in the development of environmental indicators and classification.

WGSAEM will report by [DATE] to the attention of the Marine Habitat Committee.

Supporting Information

PRIORITY: This Group prepares statistical advice for monitoring and assessment activities that is of great value to the quality of such activities.

SCIENTIFIC JUSTIFICATION AND RELATION TO ACTION PLAN:
Action Plan No:
Term of Reference a) This is a continuation of work started last year. It deals with data quality assurance, which gives insight into the quality of the data processing flow (from sampling to reporting).
Term of Reference b) This provides the opportunity to improve the existing assessment procedures recommended by ICES and to develop new assessment procedures for marine environmental assessments. This will help the ICES Marine Data Centre to develop appropriate data products, and may feed back into the development of monitoring strategies.
Term of Reference c) This is a continuation of previous work to develop a proper statistical basis for environmental indicators.

RESOURCE REQUIREMENTS: Access to relevant parts of the ICES Environment Database.

PARTICIPANTS: Representatives of all Member Countries. The Group is normally attended by 7–10 members. Increased participation of environmental statisticians is critical to the viability of this Working Group.

SECRETARIAT FACILITIES: Meeting room for 10–15 persons and associated services.

FINANCIAL: No financial implications.

LINKAGES TO ADVISORY COMMITTEES: ACME.

LINKAGES TO OTHER COMMITTEES OR GROUPS: WGBEC, WGMS, MCWG, REGNS, WKIMON.

LINKAGES TO OTHER ORGANIZATIONS: OSPAR, HELCOM.


Annex 4: Recommendations

RECOMMENDATION, followed by ACTION

1. ICES note the increased importance of data flow quality assurance, as already stated in the WGSAEM 2006 report, and that this topic is of value/relevance to other ICES working groups.
   Action: All contracting parties who submit data to the ICES Data Centre

2. STGQAB focus their data quality assurance process for biological measurements on specific biological measures and develop these one at a time. WGSAEM emphasizes that, due to the complexity of biological measurements, this would be more successful than trying to implement a general approach.
   Action: ICES STGQAB

3. To assure the quality of the data stored in the ICES database, tools have to be developed to identify the different sources of error.
   Action: ICES

4. OSPAR MON should adopt the new methodology for assessing VDSI in Nucella lapillus.
   Action: OSPAR MON

5. The MON Intersessional Group (MIG) should estimate an appropriate mean-variance relationship, based on individual imposex measurements, for each measure of imposex in each species (e.g. VDSI in Neptunea antiqua and PCI in Buccinum undatum).
   Action: OSPAR MIG

6. ICES should ensure that the number of females that contribute to each VDSI is submitted to the ICES database.
   Action: ICES Data Centre

7. WGBEC should encourage the submission of individual imposex data.
   Action: ICES WGBEC

8. OSPAR MON should adopt the method for dealing with less-thans in their assessments of temporal trends in contaminants in biota.
   Action: OSPAR MON

9. OSPAR RSC note the way in which LOD values in radioactivity concentration could be treated. Radioactivity concentration values below LOD can carry poor information (if the LODs are largely above the real value) or rich information (if measurements below LODs are mixed with detected values at the same concentration levels). In the first case, all less-than-LOD values should be discarded; in the second case, they may be considered as detected values. In addition, an analytical way to take less-than-LOD values into account exactly is suggested. Furthermore, it is important to look closely at the data and to have information about the way the labs work.
   Action: OSPAR RSC: Intersessional Correspondence Group (ICG-Stats)

10. Due to the complexity of load adjustment procedures, it is fundamental for the comparability of trend assessment results on inputs to perform the analysis with one and the same software so that different assessments are reproducible. As RTrend is recommended for riverine and direct inputs and is specifically designed according to the needs of OSPAR INPUT, RTrend should be used for all RID assessments. This guarantees that the handling and preparation of data, as well as the calculation of trends and adjusted loads, are performed in a reproducible way. This is especially needed when the riverine input assessment results are to be compared with other assessments, like CAMP, OSPAR-MON, OSPAR-ETG.
   Action: OSPAR INPUT

11. ICES should request the Delegates to search for new WGSAEM members.
   Action: ICES


Annex 5: Modelling VDSI in Nucella lapillus

Rob Fryer and Matt Gubbins, Fisheries Research Services, UK

1. Introduction

An assessment of temporal trends in imposex levels, based on data held in the ICES database, was presented to WGSAEM and WGBEC in 2006 (Fryer and Gubbins, 2006a). The assessment mostly used the Vas Deferens Sequence Index (VDSI) as the measure of imposex¹. The VDSI is based on the females in a sample of 40 individuals; each female is given a VDS classification between 0 and 6 and the VDSI is the mean of these values. In the assessment, the VDSI were scaled to lie between 0 and 1 (by dividing by 6) and then modelled as under-dispersed binomial observations using quasi-likelihood methods. The method appeared reasonable for any one time-series, but problems arose when several time-series were modelled simultaneously (Fryer and Gubbins, 2006b), as the mean-variance relationship implicit in the quasi-binomial distribution then appeared inappropriate. There are good biological reasons for this: VDS classes are ordinal, with no sense of distance between them, and it is ‘harder’ to move between some classes than others. In particular, it is ‘hard’ to progress to VDSI > 4.0 in Nucella lapillus.

Here we use some individual VDS data in Nucella lapillus to develop an improved assessment methodology for VDSI (in Nucella lapillus). We model the individual data using a proportional odds model (McCullagh and Nelder, 1989), use the results to estimate a more appropriate mean-variance relationship and then reassess the data for VDSI in Nucella lapillus in the ICES database. This process is necessary because many historic data are only stored as VDSI and individual VDS data are unlikely to be recovered.

2. Individual data

We have used 5 sets of individual VDS data in Nucella lapillus:

• Quasimeme data (rounds 13, 14, 16 and 17). Each round provides data on about 300 individuals from a single population. Two rounds used populations with high VDSI; the other two used populations with low VDSI.
• Data from 20 stations in Sullom Voe and Yell Sound, Shetland, in 2001.
• Data from 20 stations in Sullom Voe and Yell Sound, Shetland, in 2004.
• Data from 47 stations around England and Wales in 2004.
• Data from 30 stations around the UK and France in 1992.

The last data set is important, as it is the only one with several individuals in VDS class 6. It also allows us to compare historic and recent classifications.

We modelled the individual data using the proportional odds model. Let

$$\gamma_j(x) = \Pr(\mathrm{VDS} \le j \mid x), \quad j = 0, \ldots, 5,$$

be the cumulative probability that an individual VDS classification is in class j or below, where x is a vector of possible covariates. Then assume that

¹ VDSI was used for Nucella lapillus and Neptunea antiqua whilst the Penis Classification Index (PCI) was used for Buccinum undatum. Here we only consider VDSI in Nucella lapillus.

$$\mathrm{logit}(\gamma_j(x)) = \theta_j - \beta' x.$$

Here, the θj are cut-points that measure the transition from one VDS class to the next, and the β are parameters that measure the covariate effects. (The negative sign ensures that, when the β are positive, large values of the covariates lead to a greater probability of having a high VDS classification.) The parameters θj, j = 0…5, and β are estimated by maximum likelihood assuming the VDS data have a multinomial distribution.
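To make the model concrete, the sketch below turns a set of cut-points and a value of the linear predictor β′x into the probabilities of the seven VDS classes, by differencing the cumulative probabilities γj(x). It is only an illustration: the cut-points and linear-predictor values used here are hypothetical, and the function name is not taken from any fitting software.

```python
import math


def class_probabilities(cut_points, linear_predictor):
    """Class probabilities under a proportional odds model.

    cut_points: increasing values theta_0 ... theta_{K-1}
    linear_predictor: the value of beta'x for one individual
    Returns K + 1 probabilities, for classes 0 ... K.
    """
    # gamma_j = Pr(VDS <= j | x) = expit(theta_j - beta'x); the last class has gamma = 1.
    cumulative = [1.0 / (1.0 + math.exp(-(theta - linear_predictor)))
                  for theta in cut_points]
    cumulative.append(1.0)
    probabilities = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probabilities.append(cumulative[j] - cumulative[j - 1])
    return probabilities


# Hypothetical cut-points for VDS classes 0 to 6 (not estimates from this report).
example_cut_points = [-2.0, -1.0, 0.5, 2.0, 6.0, 8.0]
for eta in (0.0, 3.0):
    probs = class_probabilities(example_cut_points, eta)
    print(eta, [round(p, 3) for p in probs])
```

Raising the linear predictor shifts probability towards the higher classes, which is the effect of the negative sign noted above.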

We first modelled each data set in turn. For comparability, we combined imposex stages 5 and 6 (since only the 1992 data set has real information about stage 6), so θ5 is no longer estimable. Quasimeme round or monitoring station was treated as a categorical covariate.

The estimates of θj, j = 0…4, are given in Table Annex 5.1.

Table Annex 5.1. Estimates of cut points from the proportional odds model.

      QUASIMEME   SHETLAND   SHETLAND   ENGLAND &    UK & FRANCE   COMBINED¹
                  2001       2004       WALES 2004   1992
θ0    −3.6        −1.6       −1.5       −1.3         −3.3          −6.7
θ1    −2.6        −0.6       −1.1       −0.8         −0.9          −6.0
θ2     0.2         1.2        1.5        0.7          0.4          −4.4
θ3     1.9         3.5        3.7        2.1          1.9          −2.9
θ4     7.1        11.4       11.8        8.2         10.1           4.8
θ5                                                                  7.7

¹ All data sets except for the Quasimeme data.

Interpreting the estimates is quite tricky, because they depend on the reference value of the categorical covariate used in the fitting process. However, the important thing is that, for each data set, the estimates of θ0, θ1, θ2 and θ3 are quite close, indicating that it is ‘easy’ to move between VDS classes 0, 1, 2 and 3, whilst the estimates of θ4 are much larger than the estimates of θ3, indicating that it can be ‘hard’ to move out of VDS class 4.

The estimates are easier to compare if we use them to compute the mean-variance relationship of a VDSI measurement. We do this numerically by choosing a range of values for the covariate, computing the corresponding probabilities that a VDS measurement is in each class, and hence computing the mean and variance of a VDSI measurement (using the properties of the multinomial distribution). Figure Annex 5.1 below shows the resulting mean-variance relationships, as well as the mean-variance relationship assuming quasi-binomial errors.
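The numerical procedure described above can be sketched as follows, using the cut-points from the Combined column of Table Annex 5.1. This is an illustrative reconstruction rather than the authors' code: the grid of linear-predictor values (−15 to 15) and the function names are assumptions. The variance computed here is that of a single VDS classification; under the multinomial assumption the variance of a VDSI based on n females would be this value divided by n.

```python
import math

# Cut-points from the "Combined" column of Table Annex 5.1 (VDS classes 0 to 6).
COMBINED_CUT_POINTS = [-6.7, -6.0, -4.4, -2.9, 4.8, 7.7]


def vds_mean_variance(cut_points, eta):
    """Mean and variance of one VDS classification for linear predictor eta."""
    cumulative = [1.0 / (1.0 + math.exp(-(theta - eta))) for theta in cut_points]
    cumulative.append(1.0)
    probs = [cumulative[0]] + [cumulative[j] - cumulative[j - 1]
                               for j in range(1, len(cumulative))]
    mean = sum(j * p for j, p in enumerate(probs))
    second_moment = sum(j * j * p for j, p in enumerate(probs))
    return mean, second_moment - mean * mean


def mean_variance_curve(cut_points, eta_min=-15.0, eta_max=15.0, steps=301):
    """(mean, scaled variance) pairs over a grid of eta; variance scaled to a maximum of 1."""
    step = (eta_max - eta_min) / (steps - 1)
    curve = [vds_mean_variance(cut_points, eta_min + i * step) for i in range(steps)]
    max_variance = max(variance for _, variance in curve)
    return [(mean, variance / max_variance) for mean, variance in curve]


# Print every 30th point of the curve.
for mean, scaled_variance in mean_variance_curve(COMBINED_CUT_POINTS)[::30]:
    print(f"mean = {mean:.2f}, scaled variance = {scaled_variance:.2f}")
```

With these cut-points the variance falls sharply when the mean is close to 4, because almost all individuals are then in VDS class 4.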


[Plot: variance (0.0 to 1.0) against mean (0 to 5)]

Figure Annex 5.1. The mean-variance relationship under the assumption of quasi-binomial errors (black) and based on the individual VDS data: Quasimeme (green), Shetland 2001 (light blue), Shetland 2004 (also light blue), England & Wales 2004 (red) and UK & France 1992 (dark blue). The relationships are scaled to have a common maximum of unity.

The curves based on the individual data clearly differ from that for the quasi-binomial errors, showing markedly lower variability when the mean VDSI is around 4. The curves from Shetland, England & Wales and UK & France show pretty good agreement, although all differ statistically. The England & Wales curve shows slightly different behaviour when the mean VDSI is above about 4, perhaps because this data set only had one station with individuals in VDS class 5. There is some suggestion that the first mode in the mean-variance relationship has shifted to the left since 1992. The mean-variance relationship from the Quasimeme data is the least compatible of the relationships based on individual data. The Quasimeme data set was also the only one for which the proportional odds model showed significant lack of fit. This might be because less care was taken in choosing adults since so many individuals were required for the exercise.

We combined the individual data from Shetland, England & Wales, and UK & France and estimated a common set of cut-points for VDS classes 0 through 6 (Table Annex 5.1). The resulting mean-variance relationship is shown below. We omitted the Quasimeme data set because it gave a rather different mean-variance relationship from the other individual data sets and because the lack of fit was disturbing.


[Plot: variance (0.0 to 1.0) against mean (0 to 6)]

Figure Annex 5.2. The mean-variance relationship under the assumption of quasi-binomial errors (black) and based on the combined individual VDS data (blue). The two relationships are scaled to have a common maximum of unity.

3. An assessment of time-series of VDSI in Nucella lapillus

Time-series of VDSI in Nucella lapillus in the ICES database were assessed by fitting a generalised linear model assuming the mean-variance relationship based on individual data (Figure Annex 5.2). Only time-series with 4 or more years of data were assessed. Each VDSI should be weighted by the number of females in the sample, but this is not submitted to the ICES database. Each VDSI was therefore weighted by half the total number of individuals in the sample (which is submitted).

There were 114 time-series from France, Norway and the UK. The data (circles) and fitted models (solid lines) with pointwise (two-sided) 90% confidence bands (grey shaded areas) are shown in Figure Annex 5.3 a-d. The thin dashed horizontal lines are the boundaries between the assessment classes developed for OSPAR (OSPAR, 2004). A time-series lies in a particular assessment class (or better) with 95% confidence if the upper confidence limit in the final year lies below the corresponding assessment boundary. Some time-series exhibited over-dispersion, and the confidence bands and significance levels have been adjusted to account for this.
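The classification rule can be sketched as follows. The upper limit of a two-sided 90% confidence band is a one-sided 95% upper limit, which is why the class is assigned with 95% confidence. The sketch assumes a normal approximation on the VDSI scale, and the class boundaries in it are illustrative placeholders rather than values taken from this report; the actual boundaries are those in OSPAR (2004). The function names are hypothetical.

```python
from statistics import NormalDist

# Placeholder boundaries between classes A|B, B|C, C|D and D|E (assumed values).
HYPOTHETICAL_BOUNDARIES = [0.3, 2.0, 4.0, 5.0]
CLASS_LABELS = ["A", "B", "C", "D", "E"]


def upper_confidence_limit(estimate, standard_error, one_sided_level=0.95):
    """One-sided upper limit under a normal approximation (an assumption of this sketch)."""
    z = NormalDist().inv_cdf(one_sided_level)
    return estimate + z * standard_error


def assessment_class(upper_limit, boundaries=HYPOTHETICAL_BOUNDARIES, labels=CLASS_LABELS):
    """Best class whose upper boundary lies above the upper confidence limit."""
    for boundary, label in zip(boundaries, labels):
        if upper_limit < boundary:
            return label
    return labels[-1]


# Hypothetical fitted VDSI in the final year, with its standard error.
limit = upper_confidence_limit(1.2, 0.3)
print(round(limit, 2), assessment_class(limit))
```

Any adjustment for over-dispersion would enter through the standard error supplied to `upper_confidence_limit`.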

Most of the estimated trends are downwards. There are 17 significant downward trends (at the 5% level) and only 1 significant upwards trend, at Scarf Stane in Shetland, UK, which may be due to a large pelagic fishing vessel that often moors close to the monitoring station. There are 1, 14, 38, 58, and 3 time-series in assessment classes A through E respectively.

There are some data that require quality checks. For example, some of the Norwegian data in 2005 are suspiciously low – maybe the VDSI is based on the total number of individuals rather than the number of females? The UK data from Cultra and Carrickfergus are far more variable than any other time-series. And the low VDSI from , UK, is a known error that needs to be corrected in the ICES database.


4. Discussion

The new mean-variance relationship provides more accurate significance levels and better model diagnostics. The number of females that contribute to each VDSI is needed to improve things further.

There is evidence of over-dispersion in some time-series, so a between-year variance component will need to be introduced. Estimating these variance components will help inform the design of imposex monitoring programmes.

Mean-variance relationships now need to be estimated for other measures of imposex in other species (e.g. VDSI in Neptunea antiqua and PCI in Buccinum undatum).

In principle, modelling individual imposex data using e.g. the proportional odds model should be more informative than modelling summary measures such as VDSI. However, it is unclear whether, in practice, such a change would yield substantial benefits for large-scale assessments. Nevertheless, access to individual data would allow for this possibility in the future, and would help to estimate an appropriate mean-variance relationship for use in VDSI assessments. Therefore, submission of individual data to the ICES database should be encouraged.

Acknowledgements

Many thanks to John Thain for providing the England & Wales data.

5. References

Fryer, R. J., and Gubbins, M. J. 2006a. An assessment of temporal trends in VDSI. Annex 9 to ICES WGBEC Report, 2006.
Fryer, R. J., and Gubbins, M. J. 2006b. A regional assessment of VDSI in dogwhelks from Sullom Voe and Yell Sound. Annex 10 to ICES WGBEC Report, 2006.
McCullagh, P., and Nelder, J. A. 1989. Generalized Linear Models (second edition). Chapman & Hall, London.
OSPAR. 2004. Proposal for Assessment Criteria for TBT – Specific Biological Effects. ASMO 04/3/3. OSPAR Environmental Assessment and Monitoring Committee, Stockholm, 29 March–2 April 2004.

[Figure panels: VDSI (vertical axis, 0 to 4) against year (axis ticks at 1990 and 2000). Panel titles, truncated in the source. France: Perharidy, Beg an Fri, Kerfissien, Pordic, Baie d Ecalg, Quiberon, Plouézoc'h, Pointe de la, Cap Lévy, Wimereux sud, N.D. de la M, Cran aux Boe, Cap Gris Nez, Le Bois aux, Granville, Saint Samson, Le Portel 2, Grève du Man, Luc sur Mer, Wimereux nor, La Bernerie, Boulogne, Cap de la Ha, Tévenn, Pointe de Na, Ambleteuse -, Station Océa, Audresselles, Anse Saint-M, Pointe de Tr, Cap de la Ch, Pte du Touli, Mousterlin, Pointe de Qu, Concarneau, St Jouin Bru, Kerpape, Le Becquet 2, Lesconil, Pointe de Co.]

Figure Annex 5.3a. Assessment plots of VDSI in Nucella lapillus for locations in different countries. The data (circles) and fitted models (solid lines) with pointwise (two-sided) 90% confidence bands (grey shaded areas) are shown. The thin dashed horizontal lines are the boundaries between the assessment classes developed for OSPAR (See text for further details).

[Figure panels: VDSI (vertical axis, 0 to 4) against year (axis ticks at 1990 and 2000). Panel titles, truncated in the source. France: Le Tronquay, Locqueltas, Port-Louis, Le Croquet, Port des Fla, Pointe de Co, Pointe du Gâ, Pointe St Ma, Etretat, Cap de la Hè, Pointe de Ke, Porz ar Basc, Vaucottes -, Port du Blos, Lomergat 2, Bénodet, Pointe du Ta, Gâvres, Le Fret, Roscanvel, Pointe Jumen, Camaret, Saint-Andrie, Grainval, Villerville, Beg Meil, Rostiviec, Plage du Per, Sainte Barbe, Phare du Por, Larmor Plage, Bruneval, Plage de la, Le Guilvinec, Penmarc'h 2, Mengant, Le Caro, Veulettes/me, larmor, Pointe de La.]

Figure Annex 5.3b. Assessment plots of VDSI in Nucella lapillus for locations in different countries. The data (circles) and fitted models (solid lines) with pointwise (two-sided) 90% confidence bands (grey shaded areas) are shown. The thin dashed horizontal lines are the boundaries between the assessment classes developed for OSPAR (See text for further details).

[Figure panels: VDSI (vertical axis, 0 to 4) against year (axis ticks at 1990 and 2000). Panel titles, truncated in the source. France: Digue Vieux. Norway: Brashavn, Lastad, Gåøy, Færder, Risøy, Espevær vest, Smørstakk, svolvær områ, Fugløyskjær, Melandholmen, Heggjelen. UK: Norther Geo, Easterwick, The Brough, Burgo Taing, Samphrey/The, Billia Skerr, East of Olla, Moss Bank/Gr, Voxter Ness, Little Roe, Tivaka Taing, Grunn Taing, The Kames, Scarf Stane, Skaw Taing, Northward, Mavis Grind, Noust of Bur.]

Figure Annex 5.3c. Assessment plots of VDSI in Nucella lapillus for locations in different countries. The data (circles) and fitted models (solid lines) with pointwise (two-sided) 90% confidence bands (grey shaded areas) are shown. The thin dashed horizontal lines are the boundaries between the assessment classes developed for OSPAR (See text for further details).

[Figure panels: VDSI (vertical axis, 0 to 4) against year (axis ticks at 1990 and 2000). Panel titles, truncated in the source. UK: Cultra, Carrickfergu.]

Figure Annex 5.3d. Assessment plots of VDSI in Nucella lapillus for locations in different countries. The data (circles) and fitted models (solid lines) with pointwise (two-sided) 90% confidence bands (grey shaded areas) are shown. The thin dashed horizontal lines are the boundaries between the assessment classes developed for OSPAR (See text for further details).


Annex 6: Dealing with less-thans in the OSPAR MON assessments of contaminants in biota

Rob Fryer, Fisheries Research Services, UK

1. Introduction

Less-than measurements have long been an issue in the OSPAR MON assessments of contaminants in biota. The current approach is to summarise the contaminant measurements each year by the median log-concentration and then to fit a smoother to these annual indices¹. This deals effectively with less-thans provided that

• less than half the observations each year are less-thans,
• the less-thans are the lowest concentrations,

since the median log-concentrations are then based on real measurements. Unfortunately, there are many time-series where the approach doesn’t work, particularly shellfish time-series, which often have only one observation each year. Inferences based on the smoother analysis will be incorrect if many observations are less-thans because the distributional assumptions behind the analysis are not met. Spurious trends might also be reported if limits of detection change over time. Traditional remedies, such as replacing the less-thans by half or two-thirds of the reported values, fail to address the problem.
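The point about the median can be illustrated with a minimal sketch. The concentrations below are hypothetical, and the function name `annual_index` and its `substitution` argument are illustrative assumptions, not part of the MON software. When the two conditions hold, the annual index does not depend on the value substituted for the less-thans; with a single less-than observation in a year it does.

```python
import math
from statistics import median

def annual_index(values, is_less_than, substitution=1.0):
    """Median log-concentration for one year.

    Less-thans are replaced by `substitution` times their reported value
    (1.0 = treat as a real measurement, 0.5 = half the reported value).
    """
    logs = [math.log(v * substitution) if lt else math.log(v)
            for v, lt in zip(values, is_less_than)]
    return median(logs)

# Hypothetical year: 2 of 5 observations are less-thans and are the lowest values.
conc = [0.2, 0.2, 0.5, 0.8, 1.1]
flags = [True, True, False, False, False]
index_as_reported = annual_index(conc, flags, 1.0)
index_half_value = annual_index(conc, flags, 0.5)   # same median: log(0.5)

# Hypothetical shellfish year: a single observation that is a less-than.
single_as_reported = annual_index([0.2], [True], 1.0)
single_half_value = annual_index([0.2], [True], 0.5)  # index now depends on the substitution
```

In the first case the median is a real measurement, so the choice of substitution is irrelevant; in the second the annual index is entirely determined by it.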

Table Annex 6.1 shows the extent of the problem. For some contaminants, over half the observations are less-thans.

Table Annex 6.1. The number of observations (n) used in the OSPAR MON 2006/7 assessments with the percentage of less-than values (%lt) and the percentage of less-thans reported above the Background Assessment Concentration (>BAC).

          SHELLFISH             FISH LIVER            FISH MUSCLE
          n      %lt   >BAC     n      %lt   >BAC¹    n      %lt   >BAC²
Cd        4197   0.4   0.0      9349   2.5            44     15.9
Cu        4121   0.2   0.0      6155   0.0            2300   3.1
Hg        4198   1.6   0.2      559    0.4            11325  0.9   0.6
Pb        4197   2.2   0.0      7549   31.3           53     37.7
Zn        4150   0.2   0.0      6094   0.0            2297   0.5
Tbtin     382    1.6   0.8
Flu       1186   1.2   0.3
BAP       1120   24.4  0.5
Bbghip    1143   12.1  0.2
CB118     2923   6.7   0.1      6507   1.0            2836   3.3
CB153     2763   3.1   0.0      6218   0.1   0.0      2891   0.7
α-HCH     1908   55.6  0.1      4370   35.2           2909   42.0
γ-Hchg    2392   25.0  0.8      4359   25.9           2922   25.1
Ddepp     1883   4.2   0.1      4335   0.0            2658   0.4
HCB       1632   57.1  1.0      5752   9.6            2966   6.0

¹ There is only a BAC for CB153 in fish liver.
² There is only a Background Reference Concentration for mercury in fish muscle.

¹ Provided there are 7+ years of data; shorter time-series are summarised by a linear trend or a mean.

Most of the less-thans are below the BAC. This is good news because they still provide information that can be used to test whether concentrations are ‘close to background’; i.e. below the BAC. A simple way of doing so is to regard each observation as a Bernoulli random variable that takes the value 0 if the concentration is below the BAC and 1 otherwise. In principle, the 0-1 observations can be modelled in a similar manner to the median log-concentrations, by fitting a smoother but with binomial rather than Normal errors. The smoother can be used to test for trends, by analysis of deviance, and whether concentrations are close to background, by comparing the fitted value in the final monitoring year to the 50% line. The 50% line has an analogous role to the BAC, since if the underlying median concentration is below the BAC, the probability that a concentration is above the BAC is less than 50%. In practice, the approach fails miserably because too often either all the (Bernoulli) observations are zeros (i.e. all the concentrations are below the BAC) or there is a step change from ones to zeros, both of which cause problems for standard generalised additive (or linear) modelling software. However, the 0-1 approach can still be used if we recognise that:

• the MON assessments leading up to the 2010 Quality Status Report (QSR) will focus on trends and levels in recent years (the ten monitoring years before the assessment) with historic trends being less important;
• downward trends are not very interesting when concentrations are close to background.

Thus, when a time-series has many less-than values in recent years, the questions of interest are:

• are recent levels below the BAC?
• are levels increasing in recent years?

We can test for these as follows. Restrict the data to the ten years leading up to the QSR (I have used 1996–2005 in the examples that follow). Suppose that in year $t_i$ there are $n_i$ observations of which $y_i$ are above the BAC. Then

• test whether levels are below the BAC by assuming the $y_i$ arise from a common binomial distribution with probability $p$, computing an upper one-sided 95% confidence limit on $p$ using the cumulative distribution function of the binomial distribution, and comparing this value to 0.5;
• test for an upward trend by a one-tailed permutation test using the statistic $\sum_i t_i y_i$.

The time-series shown in Figure Annex 6.1 is typical. The left plot shows α-HCH concentrations in blue mussel from a Norwegian site. The plotting symbols are blue for real measurements and pink for less-thans. The plotting symbol is bigger if there is evidence of good Quality Assurance (QA). The fitted line is the smoother from the OSPAR MON 2006/7 assessment in which all concentrations were treated as real measurements. The right plot shows the data converted to the 0-1 scale. All the observations are below the BAC and the probability that an observation is above the BAC is significantly below 0.5.

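[Figure panels: blue mussel, Nor 11a brashav (title truncated in the source). Left panel: concentration against year; right panel: 0-1 scale against year (axis ticks at 1998 and 2002).]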

Figure Annex 6.1. α-HCH (μg/kg dry_weight) concentrations in blue mussel from a Norwegian site (see text for details).
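The two tests on the 0-1 scale can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the assessment: the function names, the bisection search for the confidence limit, the Monte Carlo form of the permutation test (shuffling the 0-1 values among the observations) and the example data are all assumptions introduced here.

```python
import math
import random

def binom_cdf(y, n, p):
    """P(Y <= y) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(y + 1))

def upper_limit(y, n, level=0.95):
    """Upper one-sided confidence limit on p: the p at which P(Y <= y) = 1 - level."""
    if y >= n:
        return 1.0
    alpha = 1.0 - level
    lo, hi = 0.0, 1.0
    for _ in range(60):              # bisection; the CDF decreases as p increases
        mid = (lo + hi) / 2
        if binom_cdf(y, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def upward_trend_pvalue(years, above, n_perm=2000, seed=1):
    """One-tailed Monte Carlo permutation test using the statistic sum(t * y).

    `years` and `above` hold one entry per observation (above = 1 if > BAC).
    """
    rng = random.Random(seed)
    observed = sum(t * y for t, y in zip(years, above))
    shuffled = list(above)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sum(t * y for t, y in zip(years, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical series: one observation per year, 1996-2005, all below the BAC.
years = list(range(1996, 2006))
above = [0] * 10
limit = upper_limit(sum(above), len(above))
below_bac = limit < 0.5              # levels judged to be below the BAC
p_trend = upward_trend_pvalue(years, above)

# Hypothetical series with exceedances only in the most recent five years.
p_rising = upward_trend_pvalue(years, [0] * 5 + [1] * 5)
```

With ten observations all below the BAC, the upper limit on $p$ is about 0.26, comfortably below 0.5, and the permutation test gives no evidence of an upward trend.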


To include this method in an assessment, we need some rules for deciding whether to assess a time-series on the concentration or the 0-1 scale. An important consideration is that assessments on the concentration scale will tend to be more powerful. Thus, if a time-series consists of historic less-thans and recent real measurements, we should assess the recent portion of the time-series on the concentration scale, rather than the whole time-series on the 0-1 scale. Figures Annex 6.2a–g show the results of an assessment of all time-series of α-HCH concentrations in shellfish with 5+ years of data, where I adopted the following rules:

• analyse on the concentration scale provided that there is no more than one year with less-thans and this year is not the first year of the time-series²;
• else, truncate the time-series (keeping the most recent years) so that there is no more than one year with less-thans and this year is not the first year of the time-series – analyse on the concentration scale if there are 5+ years of data;
• else, truncate the time-series further so that there are no years with less-thans – analyse on the concentration scale if there are 3+ years of data;
• else, analyse the data from 1996–2005 on the 0–1 scale.
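The rules above can be written as a small decision function. This is a sketch under assumptions: the function name `choose_scale`, the representation of a time-series as one boolean per monitoring year (true if that year contains any less-thans), and the return format are all hypothetical. It also assumes, as in the text, that the series already has 5+ years of data.

```python
def choose_scale(has_less_than):
    """Decide the assessment scale for one time-series.

    has_less_than: list of booleans in time order, one per monitoring year.
    Returns (scale, start), where `start` is the index of the first year kept
    for a concentration-scale analysis, or None for the 0-1 scale.
    """
    n = len(has_less_than)

    def acceptable(flags):
        # No more than one year with less-thans, and not the first year.
        return bool(flags) and sum(flags) <= 1 and not flags[0]

    # Rule 1: the whole series qualifies.
    if acceptable(has_less_than):
        return ("concentration", 0)

    # Rule 2: longest recent stretch that qualifies, if it has 5+ years.
    for start in range(1, n):
        if acceptable(has_less_than[start:]):
            if n - start >= 5:
                return ("concentration", start)
            break  # shorter stretches cannot reach 5 years either

    # Rule 3: most recent run with no less-thans at all, if it has 3+ years.
    start = n
    while start > 0 and not has_less_than[start - 1]:
        start -= 1
    if n - start >= 3:
        return ("concentration", start)

    # Rule 4: fall back to the 0-1 scale (data from 1996-2005).
    return ("0-1", None)

# Hypothetical series: historic less-thans followed by five years of real measurements.
decision = choose_scale([True, True, False, False, False, False, False])
```

For the hypothetical series shown, the first two years are dropped and the remaining five are analysed on the concentration scale.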

The balance is about right. There is some loss of information about historic trends, but spurious trends due to changing limits of detection are also lost. The analysis of truncated time-series on the concentration scale and of time-series on the 0–1 scale both give sensible results when compared to the plots of the original data.

The method for dealing with less-thans has been kept simple deliberately. There are more powerful, but more complicated, methods for modelling less-thans (e.g. likelihood methods) that might be appropriate for some time-series. However, one of the aims of the MON assessment is to use methods that are applicable across a wide range of time-series. Besides, the proposed method does the job.

It would be possible to test the 0-1 data for both upwards and downwards trends, but this would reduce the power for detecting upwards trends because the test would become two-tailed. Actually, I suspect that testing for upwards trends in the 0-1 data is not worth the effort. I have yet to find a time-series that would be analysed on the 0-1 scale and would show a significant upward trend. This is largely because a significant upward trend would require concentrations above the BAC in the most recent years. But these would tend to be real measurements (because most less-thans are below the BAC), so the data would be assessed on the concentration rather than the 0–1 scale.

The analysis on the 0–1 scale implicitly assumes that there is no between-year variation in the probability that an observation is below the BAC. We have to accept this for shellfish because there is usually no information in the data to estimate the between-year variation. The situation might be different for fish, where there are more observations each year, and I should trial the method on mercury in fish muscle. More generally, we need BACs for the other contaminants in fish. These can be constructed for man-made substances using the MON monitoring data. However, for natural substances, Background Concentrations are needed before BACs can be constructed.

In the analysis on the concentration scale, I have treated the few less-thans that remain as if they were real measurements. They could be replaced by half or two-thirds of their reported values, but inspection of the time-series plots suggests that this would make little difference to the assessment. Treating less-thans as real measurements can be thought of as precautionary, because the median log-concentrations will then be positively biased and it will be harder to demonstrate that concentrations are close to background. I gave less-thans above the BAC the value 1 in the 0-1 analyses; again, this can be thought of as precautionary.

² I intend to replace this by insisting that there is no more than one year when half or more of the observations are less-thans or when any of the less-thans are at or above the median concentration.

The analyses on the concentration scale potentially down-weight observations with poor or missing QA. I do not think such an approach should be considered for the analyses on the 0-1 scale. If we dislike working with less-than data with poor QA³, we should delete them and forget about rescuing poor data with statistical wizardry.

³ The 0-1 analyses only use recent data (the last ten years), so the QA information should not be missing.

[Figure panels: for each site, concentration (left) and 0-1 scale (right) against year. Panel titles, truncated in the source. Region I: blue mussel Ice dvergastein; blue mussel Ice eyri hvalfj; blue mussel Ice grimsey; blue mussel Ice hvalstöd hv; blue mussel Ice hvassahraun; blue mussel Ice hvitanes hv; blue mussel Ice mjófjördur (three sites); blue mussel Ice straumur st; blue mussel Ice ulfsá skutu; blue mussel Nor 10a skallne; blue mussel Nor 11a brashav; blue mussel Nor 98a svovlvæ. Region II: blue mussel Den arhus bugt; blue mussel Den arhus havn.]

Figure Annex 6.2a. Assessment of α-HCH (μg/kg dry_weight) in shellfish from locations in different countries. The left plot shows the concentrations. The plotting symbols are blue for real measurements and pink for less-thans. The plotting symbol is bigger if there is evidence of good Quality Assurance (QA). The fitted line is the smoother from the OSPAR MON 2006/7 assessment in which all concentrations were treated as real measurements. The right plot shows the data converted to the 0-1 scale (see text for further details).

[Figure panels: for each site, concentration (left) and 0-1 scale (right) against year. Panel titles, truncated in the source. Region II: blue mussel Den great belt; blue mussel Den little belt; blue mussel Den nivå bugt 3; blue mussel Den randers fjo; blue mussel Den roskilde fj (two sites); blue mussel Den the sound -; blue mussel Den the sound s; blue mussel Den wadden sea (two sites); pacific oyster Fra aber benoit; blue mussel Fra antifer dig; blue mussel Fra authie - be; blue mussel Fra baie de la; blue mussel Fra baie des ve; blue mussel Fra boulogne et.]

Figure Annex 6.2b. Assessment of α-HCH (μg/kg dry_weight) in shellfish from locations in different countries. The left plot shows the concentrations. The plotting symbols are blue for real measurements and pink for less-thans. The plotting symbol is bigger if there is evidence of good Quality Assurance (QA). The fitted line is the smoother from the OSPAR MON 2006/7 assessment in which all concentrations were treated as real measurements. The right plot shows the data converted to the 0-1 scale (see text for further details).

[Figure panels: for each site, concentration (left) and 0-1 scale (right) against year. Panel titles, truncated in the source. Region II: pacific oyster Fra brest - aul; pacific oyster Fra brest - bai (two sites); pacific oyster Fra brest - elo; blue mussel Fra calais - du; blue mussel Fra calvados ou; blue mussel Fra calvados po; blue mussel Fra cancale - l; blue mussel Fra dieppe - va; blue mussel Fra douarnenez; blue mussel Fra fecamp - va; blue mussel Fra grande rade; blue mussel Fra lannion - s; blue mussel Fra ouest coten (two sites); pacific oyster Fra paimpol - b.]

Figure Annex 6.2c. Assessment of α-HCH (μg/kg dry_weight) in shellfish from locations in different countries. The left plot shows the concentrations. The plotting symbols are blue for real measurements and pink for less-thans. The plotting symbol is bigger if there is evidence of good Quality Assurance (QA). The fitted line is the smoother from the OSPAR MON 2006/7 assessment in which all concentrations were treated as real measurements. The right plot shows the data converted to the 0-1 scale (see text for further details).

[Figure panels: for each site, concentration (left) and 0-1 scale (right) against year. Panel titles, truncated in the source. Region II: mediterranean mussel Fra rance - la; mediterranean mussel Fra saint brieu; blue mussel Fra saint vaast; blue mussel Fra seine cap d; blue mussel Fra seine ville; blue mussel Fra somme - poi; blue mussel Ger elbe outer; blue mussel Nor 15a ullerø; blue mussel Nor 22a espevær; blue mussel Nor 30a gressho; blue mussel Nor 31a solberg; blue mussel Nor 35a mølen; blue mussel Nor 36a færder; blue mussel Nor 51a byrkjen; blue mussel Nor 52a eitrhei; blue mussel Nor 56a kvalnes.]

Figure Annex 6.2d. Assessment of α-HCH (μg/kg dry_weight) in shellfish from locations in different countries. The left plot shows the concentrations. The plotting symbols are blue for real measurements and pink for less-thans. The plotting symbol is bigger if there is evidence of good Quality Assurance (QA). The fitted line is the smoother from the OSPAR MON 2006/7 assessment in which all concentrations were treated as real measurements. The right plot shows the data converted to the 0-1 scale (see text for further details).

[Figure panels: for each site, concentration (left) and 0-1 scale (right) against year. Panel titles, truncated in the source. Region II: blue mussel Nor 57a krossan; blue mussel Nor 63a ranaskj; blue mussel Nor 65a vikingn; blue mussel Nor 69a lille t; blue mussel Nor 71a bjørkøy; blue mussel Nor 76a risøy. Region IV: pacific oyster Fra adour; pacific oyster Fra arcachon - (three sites); blue mussel Fra audierne -; pacific oyster Fra baie de l`a; pacific oyster Fra bourgneuf -; blue mussel Fra capbreton o; pacific oyster Fra chatelaillo; pacific oyster Fra ciboure - l.]

Figure Annex 6.2e. Assessment of α-HCH (μg/kg dry_weight) in shellfish from locations in different countries. The left plot shows the concentrations. The plotting symbols are blue for real measurements and pink for less-thans. The plotting symbol is bigger if there is evidence of good Quality Assurance (QA). The fitted line is the smoother from the OSPAR MON 2006/7 assessment in which all concentrations were treated as real measurements. The right plot shows the data converted to the 0-1 scale (see text for further details).

[Figure panels: for each site, concentration (left) and 0-1 scale (right) against year. Panel titles, truncated in the source. Region IV: blue mussel Fra concarneau; pacific oyster Fra gironde - b; pacific oyster Fra gironde - l; pacific oyster Fra gironde - p; pacific oyster Fra hendaye - c; blue mussel Fra loire - poi; blue mussel Fra loirient -; pacific oyster Fra marennes - (four sites); blue mussel Fra marennes -; pacific oyster Fra morbihan - (two sites); pacific oyster Fra noirmoutier; pacific oyster Fra pertuis bre.]

Figure Annex 6.2f. Assessment of α-HCH (μg/kg dry_weight) in shellfish from locations in different countries. The left plot shows the concentrations. The plotting symbols are blue for real measurements and pink for less-thans. The plotting symbol is bigger if there is evidence of good Quality Assurance (QA). The fitted line is the smoother from the OSPAR MON 2006/7 assessment in which all concentrations were treated as real measurements. The right plot shows the data converted to the 0-1 scale (see text for further details).

[Figure panels: for each site, concentration (left) and 0-1 scale (right) against year. Panel titles, truncated in the source. Region IV: pacific oyster Fra riec sur be; pacific oyster Fra vendee- tal; blue mussel Fra vilaine - e; blue mussel Fra vilaine - l; blue mussel Fra vilaine - p.]

Figure Annex 6.2g. Assessment of α-HCH (μg/kg dry_weight) in shellfish from locations in different countries. The left plot shows the concentrations. The plotting symbols are blue for real measurements and pink for less-thans. The plotting symbol is bigger if there is evidence of good Quality Assurance (QA). The fitted line is the smoother from the OSPAR MON 2006/7 assessment in which all concentrations were treated as real measurements. The right plot shows the data converted to the 0-1 scale (see text for further details).


Annex 7: OSPAR periodic evaluations of progress towards the objectives of the radioactive substances strategy: a summary centered on statistical methods, some samples analysis

Philippe Nonclercq, Electricité de France, and

Bénédicte Briand, Institut de Radioprotection et de Sûreté Nucléaire, France

Summary

The OSPAR Radioactive Substances Strategy (RSS) is setting up a statistical method to establish whether progress is being made towards its objective, which is mainly to reduce anthropogenic inputs of radioactive substances to the North-East Atlantic. In past years, WGSAEM helped to establish such a method for the OSPAR Hazardous Substances Strategy, whose objective is very similar to that of the RSS. Therefore, although the two methods are appreciably different, the difficulties are similar. I therefore ask for WGSAEM’s opinion on the method proposed by the RSS and, in particular, on how to deal with data below limits of detection (LOD) and with heterogeneous data.

1. Introduction

The main objective of the RSS is to reduce anthropogenic inputs of radioactive substances to the North-East Atlantic. Its programme is described in OSPAR RSS, 2001, and OSPAR, 2003.

The first periodic evaluation of progress (OSPAR RSC, 2006) focused on discharges of radioactive substances; the second (OSPAR RSC, 2007) focuses on concentrations of radioactive substances in seawater and biota.

Both are to be issued in 2007; draft reports are available.

The Radioactive Substances Committee (RSC) agreed to establish a baseline against which the progress in implementing the strategy can be evaluated in the period from 2003 to 2020. In 2003, the baseline period was set to 1995–2001. Periodically, later discharges and concentration levels should be compared with the baseline levels. In addition, trend identification techniques should be applied to the data from 1998 onwards.

2. Data

The data consist of:

• discharges: for each year, annual discharges are available for alpha- and beta-emitting radioactive substances and Tritium, for each country;
• concentrations: annual average concentrations for Tritium, Tc-99, Cs-137 and Pu-239/240 in 15 OSPAR regions are considered. A problem is that LODs are heterogeneous, even within regions, and many measurements are below the LOD.

3. Tests

Statistical tests are used to determine whether the radioactivity levels have decreased/increased after 2001, in comparison with the baseline period 1995-2001.

Test 1: Simple comparison with the baseline ‘bracket’

See OSPAR RSC, 2006, chapter 3, paragraph ‘simple comparison’.

The baseline period is characterised by its average and standard deviation (m, s).


Levels are assumed to be normally distributed, and each yearly level after 2001 is compared with the bracket [m−1.96s; m+1.96s]. This is intended to give a simple indication of a change, although not one that can be called statistically significant. The RSC is well aware that the bracket is not exactly the 95% prediction interval for a yearly level (see OSPAR RSC, 2006, Appendix 6), but it is still used.
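For comparison, the textbook 95% prediction interval for a single future yearly level, under the same normality assumption and with m and s estimated from a baseline of n years, is sketched below. The symbols n and $t_{n-1,\,0.975}$ (the 97.5% point of Student's t distribution with n−1 degrees of freedom) are introduced here and are not part of the source; the exact construction used by the RSC is given in OSPAR RSC (2006), Appendix 6, which is not reproduced in this annex.

```latex
\left[\; m - t_{n-1,\,0.975}\, s\, \sqrt{1 + \tfrac{1}{n}} \;;\;\; m + t_{n-1,\,0.975}\, s\, \sqrt{1 + \tfrac{1}{n}} \;\right]
```

With the seven baseline years 1995–2001 (n = 7), $t_{6,\,0.975} \approx 2.45$, so the interval is noticeably wider than the bracket based on 1.96.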

Test 2: Welch-Aspin test

See OSPAR RSC, 2006, chapter 3, paragraph ‘other comparison methods’.

Once more, levels are assumed to be normally distributed. The Welch-Aspin test is the heteroscedastic version of the Student t-test. The baseline sample is compared to a sample made of the yearly levels of years following 2001 (test sample).
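A minimal sketch of the Welch-Aspin calculation is given below. The function name and the example numbers are hypothetical. The sketch stops at the test statistic and the approximate (Welch-Satterthwaite) degrees of freedom; converting these to a probability requires the Student t distribution function, which is not included here.

```python
import math
from statistics import mean, variance

def welch_statistic(baseline, test):
    """Welch (unequal-variance) t statistic and approximate degrees of freedom."""
    n1, n2 = len(baseline), len(test)
    v1 = variance(baseline) / n1   # squared standard error of the baseline mean
    v2 = variance(test) / n2       # squared standard error of the test-sample mean
    t = (mean(baseline) - mean(test)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical yearly levels (arbitrary units), for illustration only.
t_stat, df = welch_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Because the degrees of freedom depend on both sample variances, they are generally not an integer; this is what distinguishes the test from the ordinary two-sample t-test.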

Test 3: Wilcoxon-Mann-Whitney test

See OSPAR RSC, 2006, chapter 3, paragraph ‘other comparison methods’.

Levels do not need to be normally distributed: Wilcoxon or Mann-Whitney rank tests (they are equivalent) are used to compare the baseline sample and the test sample.
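With samples this small, the rank test can be computed exactly by enumerating every way of splitting the pooled years into a baseline and a test sample. The sketch below does this for a two-sided test; the function names and example data are hypothetical, and the report does not state whether an exact or an approximate version was used. With 7 baseline years and 3 test years there are only 120 possible splits, so exact two-sided probabilities are multiples of 1/120; the values 0.267 and 0.667 in Table Annex 7.1 are consistent with 32/120 and 80/120.

```python
from itertools import combinations

def u_statistic(baseline, test):
    """Number of (baseline, test) pairs in which the test value is larger (ties count 0.5)."""
    return sum(1.0 if t > b else 0.5 if t == b else 0.0
               for b in baseline for t in test)

def exact_mann_whitney(baseline, test):
    """Exact two-sided Mann-Whitney probability by full enumeration."""
    pooled = list(baseline) + list(test)
    n, m = len(baseline), len(test)
    centre = n * m / 2                      # expected U under the null hypothesis
    observed = abs(u_statistic(baseline, test) - centre)
    total = extreme = 0
    for idx in combinations(range(n + m), m):
        chosen = set(idx)
        t_vals = [pooled[i] for i in idx]
        b_vals = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        if abs(u_statistic(b_vals, t_vals) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical data: seven baseline years and three later years, all lower.
p_exact = exact_mann_whitney([10, 11, 12, 13, 14, 15, 16], [1, 2, 3])
```

Even when all three later years fall below every baseline year, the smallest attainable two-sided probability is 2/120, which is worth bearing in mind when interpreting the Mann-Whitney column.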

4. Format of results

Results are presented as shown in Table Annex 7.1.

Table Annex 7.1. Format of results.

            BASELINE   BASELINE   BASELINE   RANGE OF              STUDENT’S T    MANN-WHITNEY
            AVERAGE    UPPER      LOWER      DISCHARGES            PROBABILITY¹   PROBABILITY²
            (TBQ)      BRACKET    BRACKET    IN 2002–2004
Overall
Total-α     3.81E-02   1.39E-01   0³         4.70E-03 – 5.00E-04   0.117          0.267
Total-β     2.55E+00   9.29E+00   0³         4.70E-01 – 4.40E-01   0.157          0.667

Although RSC knows that the bracket used is not correct (+/−1.96 instead of +/−2.4), the “simple comparison method” is always presented. Moreover, in the second evaluation (OSPAR RSC, 2007), which deals with concentrations, it is the only method used; see, for instance, the results below:

¹ As explained in Chapter 3 (OSPAR RSC, 2006), the interpretation of the Student’s t test depends on the degrees of freedom. The figure given is the probability that the two “populations” compared (observations for the baseline period 1995–2001 and subsequent observations) have the same mean. It is calculated by comparing the calculated “t” with the a priori possibilities for the “t” distribution for the same degree of freedom. If that probability is less than 0.05, it can be concluded that there is a 95% probability that they are significantly different.
² As explained in Chapter 3 (OSPAR RSC, 2006), this is the probability that the two “populations” represented by the observations for 1995–2001 (the baseline period) and 2002–2004 are the same. If the probability is below 0.05, it can be concluded that there is a 95% probability that they are significantly different.
³ The baseline lower bracket, as calculated, would be negative. Since a negative discharge is impossible, the bracket is truncated at zero.


Table Annex 7.2. Results for Tc-99.

BASELINE LOWER   BASELINE UPPER   STUDENT’S T-TEST   MANN-WHITNEY
BRACKET          BRACKET          PROBABILITY        PROBABILITY
3.2              11.0             –                  –

Question: what does WGSAEM think of the decision set out in TOR 33 and 34 of chapter 3, OSPAR RSC, 2006:

“33. In conclusion, the simple comparison method described above is not very sensitive, and it includes a major risk of type I error. This method should, therefore, only be used as a first simple indicator for the comparison of individual annual releases with the baseline. Because of its serious limitations, other, more precise, methods are more appropriately used, such as those described below.

34. Nevertheless, since the “bracket” was included in the baseline agreed by the 2003 Ministerial Meeting of the OSPAR Commission, it has been retained as a method of comparison. Because the baseline “bracket” was not calculated in the way described above for the PI, any comparison with it cannot be described as giving “statistically significant” results. In the sections where simple comparisons can be made, therefore, the results have been described as indicating (or not) a relevant change (reduction or increase). “

Question: can a conclusion be drawn when the parametric and non-parametric tests agree with each other but disagree with the “simple comparison”?

5. Trend detections

In the longer term, RSC recommends using techniques similar to those used for the CAMP, CEMP and RID assessments as a secondary method to look at the existence and scale of trends. This, however, requires methods that keep the global nature of the baseline element, and probably a longer run of observations than is available for this first evaluation. Trend assessment has therefore not been attempted.

Future assessments should be applied to the yearly levels since 1998. The recommended methods are those developed over several meetings of the OSPAR Working Group on Monitoring (MON), which are well known to WGSAEM:

• For each time-series with 7 or more years, trends are summarised by a loess smoother (see footnote 4), a non-parametric curve fitted to the annual data. This summary is supported by a formal statistical test of the significance of the fitted smoother and by tests of the linear and non-linear components of the trend. Few statistical assumptions are required for the fitted smoother to be valid. Mainly, the annual contaminant indices should be independent with a constant level of variability. The validity of the statistical tests also requires the residuals from the fitted model to be normally distributed. The theory and methodology are described in detail in Fryer and Nicholson (1999).

• A simpler analysis was adopted for time-series with fewer than 7 years. For time-series of 3 or 4 years, the average of the annual data was computed. For time-series of 5 or 6 years, a linear regression was fitted to the annual data and the significance of the linear trend assessed.
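The rule for short time-series can be sketched as below. This is an illustration under stated assumptions rather than the MON code: the function name and the returned dictionary layout are hypothetical, the 5% significance level is assumed (the text does not state the level used), and series with 7 or more years are only flagged for the smoother, which is not implemented here. The critical values are the standard two-sided 5% points of Student's t for 3 and 4 degrees of freedom.

```python
from statistics import mean

# Two-sided 5% critical values of Student's t, indexed by degrees of freedom
T_CRIT_5PCT = {3: 3.182, 4: 2.776}


def summarise_short_series(years, values):
    """Apply the length-dependent summary to one annual time-series."""
    n = len(values)
    if n != len(years):
        raise ValueError("years and values must have the same length")
    if n < 3:
        return {"method": "none"}
    if n <= 4:
        # 3 or 4 years: report the average of the annual data only
        return {"method": "mean", "mean": mean(values)}
    if n <= 6:
        # 5 or 6 years: ordinary least-squares line and t-test on the slope
        x_bar, y_bar = mean(years), mean(values)
        sxx = sum((x - x_bar) ** 2 for x in years)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        rss = sum((y - (intercept + slope * x)) ** 2
                  for x, y in zip(years, values))
        df = n - 2
        std_err = (rss / df / sxx) ** 0.5
        if std_err > 0:
            t_value = slope / std_err
        else:
            # perfect fit: any non-zero slope is trivially significant
            t_value = float("inf") if slope != 0 else 0.0
        return {
            "method": "linear",
            "slope": slope,
            "t": t_value,
            "significant": abs(t_value) > T_CRIT_5PCT[df],
        }
    # 7 or more years: a smoother would be fitted (not shown here)
    return {"method": "smoother"}


if __name__ == "__main__":
    # Invented annual levels for a 5-year series
    print(summarise_short_series([2001, 2002, 2003, 2004, 2005],
                                 [5.0, 4.1, 2.9, 2.1, 1.0]))
```

Tabulated critical values keep the sketch free of external dependencies. A statistics library would normally supply the exact p-value instead.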

4 “loess” is a portmanteau word (like the name “OSPAR”), derived from “locally weighted regression estimate”. Hence it is sometimes written “lowess”.


6. Problems

LOD

Discharge data are quite well controlled: baseline averages and standard deviations could be calculated, and tests were conducted satisfactorily.

Concentration data are very problematic: some concentrations are very low, often below LOD, and LODs can be highly heterogeneous within the same OSPAR region. For example, natural tritium can be found in seawater at a concentration of roughly 0.1 Bq/l. In France, the LOD for tritium is about 10 Bq/l, whereas it is below 0.1 Bq/l in Spain. Some French and Spanish measurements fall in the same OSPAR region.

Attempts to calculate yearly levels were nevertheless made in this case. The text below is quoted from the second assessment:

Some baseline values were calculated using all or some/most results below analytical detection limits. Where this occurs, such values are identified in the tables (see below for example) through use of italics (all results below detection limits) or bold italics (some/most results below detection limits). The real baseline values may well be less than the values given, but there is no way of knowing on the information available. Values calculated using all or some/most results below detection limits are reported without any component for variability. This convention for the reporting of baseline values has been maintained where appropriate for all data reported from 2002 to 2005.

Some examples of data presentation are given in the tables below:

          Cs-137               Tc-99                 Pu-239,240
Year      n    Mean   SD       n    Mean    SD       n    Mean   SD
< 2002    50   2.4    0.6      36   < 0.1   -        15   -      0.0032

Value for Cs-137 derived from values above detection limits only.
Value for Tc-99 derived from values below detection limits only.
Value for Pu-239,240 derived from values above and below detection limits.

BIOTA

Cs-137 (Bq/kg w_w)(1) | Tc-99 (Bq/kg w_w)(1) | Pu-239,240 (Bq/kg w_w)(1)
Year       Type n Mean SD | Type n Mean SD | Type n Mean SD
1995       F 3 < 0.000039 –
1996       F 3 0.000048 0.000014
1997       F 8 < 0.000044 –
1998       F 8 < 0.000073 –
1999       F 1 < 0.000026 –
2000
2001
Baseline   S 133 <0.15 – S 93 7.1 2.0 F 23 < 0.000046 – 8 0.06 0.01
2002       S S 6 5.7 1.1 9 <0.11 – 8 0.07 0.02
2003       S S 3 5.5 0.6 9 <0.17 – 8 0.08 0.03
2004       S S 5 5.9 1.5 9 <0.24 – 8 0.08 0.02
2005       S S 3 4.1 1.6 9 <0.19 –

(1) w_w: wet weight


Where large differences in magnitude occur between data reported above and below detection limits for a radionuclide in the same environmental compartment for the same region, two sets of annual averages have been reported for the radionuclide for the applicable years: one average based only on data above detection limits, and one average based on all available data (i.e. data above and below detection limits). Differences in magnitude between values above and below detection limits may result from differences in sampling and analytical methodologies within and between contracting parties.

Comparison tests were only performed on the first type of values: all data above LOD.
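The two sets of annual averages can be illustrated with the sketch below. It is only an assumption about how such averages might be formed: the function name and record layout are hypothetical, and a result below the detection limit is represented by the reported LOD itself. The "all data" mean is then an upper bound, in line with the quoted remark that the real values may well be less than the values given.

```python
from statistics import mean


def annual_averages(records):
    """Compute both annual averages for one year and one radionuclide.

    records: list of (value, below_lod) pairs; for a result below the
    detection limit, value is the reported LOD (an assumption made here).
    """
    measured = [value for value, below_lod in records if not below_lod]
    all_values = [value for value, _ in records]
    n_below = len(records) - len(measured)
    return {
        "n": len(records),
        "n_below_lod": n_below,
        # first type of value: data above the LOD only
        "mean_measured_only": mean(measured) if measured else None,
        # second type of value: all data, LODs substituted for censored results
        "mean_all_data": mean(all_values) if all_values else None,
        # with any censored result, the all-data mean should be read as "<"
        "all_data_is_upper_bound": n_below > 0,
    }


if __name__ == "__main__":
    # Invented results for one year: two measured values, two below LOD
    year_records = [(0.06, False), (0.08, False), (0.20, True), (0.30, True)]
    print(annual_averages(year_records))
```

In this invented example the LODs are several times larger than the measured values, so the all-data mean is driven almost entirely by the detection limits. This is the situation in which the report presents both averages.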

Question: can WGSAEM suggest another method that could deal with heterogeneous LODs? Could some tests be conducted even with data below the LOD?

Question: what does WGSAEM think of presenting the data as two sets of annual averages, the first based only on data above the LOD and the second on all available data?

Within-year variability

Within-year variability is not exploited in the comparison tests. Some possible seasonal effects are thus cancelled, but large natural variability may be ignored.

Question: can WGSAEM suggest efficient methods to take within-year variability into account?

Heterogeneous data

Within the same OSPAR region, one sampling station may produce several measurements each year, while another may produce only a few measurements over the whole assessment period. This difference often reflects differences between the contracting parties’ measurement methods.

Question: Does WGSAEM know a way to deal with such a disparity between data?

Cs-137

Cs-137 concentration is influenced by historical discharges: atmospheric nuclear weapons tests and the Chernobyl accident. These produced an overall quantity of Cs-137 that decreases with time (its half-life is 30 years). Even though the overall quantity of historical Cs-137 decays exponentially, this is not the case for its concentration in a given OSPAR region, which depends on marine fluxes, land transfers, rain, etc. Physical models are not yet able to estimate a historical background concentration for Cs-137.
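As a rough indication of the size of the decay effect alone, the fraction of a historical Cs-137 inventory remaining after a given time can be computed from the 30-year half-life quoted above. The function name is hypothetical, and the result applies to the overall quantity only, not to the concentration in a given region.

```python
HALF_LIFE_CS137_YEARS = 30.0  # rounded half-life quoted in the text


def remaining_fraction(elapsed_years, half_life_years=HALF_LIFE_CS137_YEARS):
    """Fraction of the initial quantity left after elapsed_years of decay."""
    return 0.5 ** (elapsed_years / half_life_years)


if __name__ == "__main__":
    # Decay over the 1995-2005 period covered by the sample datasets
    print(round(remaining_fraction(10), 3))
```

Over the ten years from 1995 to 2005, radioactive decay by itself removes about 21% of the historical inventory (a remaining fraction of about 0.79).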

Question: does WGSAEM have any idea on how to take this moving background concentration into account in the comparison assessments, or to assess whether the Cs-137 concentrations are close to the background concentration?

7. Analysis of sample datasets

Two sample datasets were analysed: the first one is Cs-137 activity in seaweed, region 2; the second one is Cs-137 activity in seawater, region 1.

Sample 1: Cs-137 activity in seaweed, region 2

The dataset is the complete time-series of French measurements of Cs-137 concentration in seaweed from 1995 to 2005, OSPAR zone 2. All French data were examined by Briand (2007) and are plotted in Figures Annex 7.1–7.13.


[Plot: Concentration (0 to 1.4) against Date (June 1994 to October 2006); categories: LOD, M.]

Figure Annex 7.1. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed, region 2. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.

Conclusions:

• At least one obvious outlier in the LOD category;

• LODs are significantly above measured values after 2000.

Two laboratories made measurements at 9 different locations. The results are shown in Tables Annex 7.3 and 7.4 and in Figures Annex 7.2 and 7.3.

Table Annex 7.3. The frequency of measured values (M) and values below LOD (LOD) analysed by the laboratory OPRI at different locations in France.

Laboratory OPRI: frequency by location

Category   Barneville   Granville   Moulinets   Sciotot   Siouville   Total
LOD        35           27          29          7         24          122
M          9            12          15          4         9           49
Total      44           39          44          11        33          171

[Plot: Concentration (0 to 1.4) against Date (June 1994 to October 2006); categories: LOD, M.]

Figure Annex 7.2. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed, region 2, OPRI laboratory. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.


Table Annex 7.4. The frequency of measured values (M) and values below LOD (LOD) analysed by the laboratory LRC at different locations in France.

Laboratory LRC: frequency by location

Category   Carteret   Dielette   Fermanville   Herquemoulin   Total
LOD        14         1          –             5              20
M          30         42         44            33             149
Total      44         43         44            38             169

[Plot: Concentration (0 to 0.8) against Date (June 1994 to October 2006); categories: M, LOD.]

Figure Annex 7.3. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed, region 2, LRC laboratory. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.

Conclusions:

• The OPRI laboratory is less efficient than the LRC laboratory: 122 values below LOD against 49 measured values (M), compared with 20 values below LOD against 149 measured values.

• OPRI laboratory: LODs are roughly constant. Real values decreased below the LOD after 1999.

• LRC laboratory: before 2004, LODs are at the level of the measured values and can be treated as measured values, given the high uncertainty (about 50%) on measured values close to the LOD. After 2004, LODs are markedly higher than the measured values (roughly twice as high) and could instead be treated as missing data, since a non-negligible number of measured values remain available.
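The difference between the two laboratories can be expressed as the proportion of results reported below LOD, using the totals from Tables Annex 7.3 and 7.4. The dictionary layout and function name below are illustrative only.

```python
# Totals taken from Tables Annex 7.3 (OPRI) and 7.4 (LRC)
COUNTS = {
    "OPRI": {"LOD": 122, "M": 49},
    "LRC": {"LOD": 20, "M": 149},
}


def lod_proportion(counts):
    """Share of results reported as below the detection limit."""
    total = counts["LOD"] + counts["M"]
    return counts["LOD"] / total


if __name__ == "__main__":
    for laboratory, counts in COUNTS.items():
        print(laboratory, round(100 * lod_proportion(counts), 1), "% below LOD")
```

About 71% of the OPRI results are below LOD, against about 12% for LRC.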


Plots per location

Data were plotted by location in order to highlight any location or geographical effect.

Barneville: This location shows hardly any measured values.

[Plot: Concentration (0 to 0.5) against Date (June 1994 to October 2006); categories: M, LOD.]

Figure Annex 7.4. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed at Barneville. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.

Carteret: LOD data seem to follow the behaviour of the measured data, with the exception of the last data point of 2005.

[Plot: Concentration (0 to 0.3) against Date (June 1994 to May 2005); categories: M, LOD.]

Figure Annex 7.5. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed at Carteret. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.

Dielette: Only one LOD at this location, and this LOD is consistent with the measured values.

[Plot: Concentration (0 to 0.6) against Date (June 1994 to October 2006); categories: M, LOD.]

Figure Annex 7.6. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed at Dielette. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.


Fermanville: No LOD at this location.

[Plot: Concentration (0 to 0.5) against Date (June 1994 to October 2006); category: M.]

Figure Annex 7.7. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed at Fermanville. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.

Granville: LODs are consistent with measured values during the baseline period. A decrease in concentrations over time can be suspected, since LOD levels have hardly changed while the proportion of measured values has decreased dramatically over the years.

[Plot: Concentration (0 to 0.4) against Date (June 1994 to October 2006); categories: M, LOD.]

Figure Annex 7.8. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed at Granville. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.

Herquemoulin: LODs occur only after the end of 2003. Their values are consistent with earlier measured values.

[Plot: Concentration (0 to 0.8) against Date (June 1994 to October 2006); categories: M, LOD.]

Figure Annex 7.9. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed at Herquemoulin. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.


Moulinets: The high LOD value can obviously be treated as an absent value.

[Plot: Concentration (0 to 1.4) against Date (June 1994 to October 2006); categories: LOD, M.]

Figure Annex 7.10. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed at Moulinets. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.

Sciotot: Measured values have disappeared over time, as the real values must have sunk below the LOD levels (0.15/0.3 mBq/l) (same pattern as Granville).

[Plot: Concentration (0 to 0.4) against Date (June 1994 to October 2006); categories: M, LOD.]

Figure Annex 7.11. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed at Sciotot. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.

Siouville: Measured values have disappeared over time, while the real values must have sunk below the LOD levels (0.15/0.3 mBq/l) (same pattern as Granville and Sciotot).

[Plot: Concentration (0 to 0.4) against Date (June 1994 to October 2006); categories: M, LOD.]

Figure Annex 7.12. Cs-137 activity concentration (Bq/kg wet_weight) in seaweed at Siouville. The plotting symbols are green (M) for measured values and red (LOD) for values below LOD.


Map of locations

Figure Annex 7.13. Map of measurement locations. Red boxes indicate locations with almost only LODs, green boxes locations with almost only measured values, and green stripes a mixture of LODs and measured values (in most cases, LODs predominate after 1998/1999). Laboratories are indicated by italic type for LRC and normal type for OPRI.

Note:

The laboratory factor seems to be predominant over the topographic factor for the LOD ratio.

The only reason why Fermanville and Dielette had almost exclusively measured values, even in recent years, is that the LOD was lowest at these locations (below 0.05 mBq/l).

Sample 2: Cs-137 in seawater, region 1

The dataset is the complete time-series of French, Spanish and Irish measurements of Cs-137 concentration in seawater from 1995 to 2005, OSPAR zone 1.

[Plot: Cs-137 in seawater (logarithmic axis, 0.0001 to 1) against date (June 1994 to October 2006); series: France LOD, Spain LOD, Spain M, Ireland M, Ireland LOD.]

Figure Annex 7.14. Cs-137 activity concentration (mBq/l) in seawater, region 1. The plotting symbols refer to the different countries. The symbol M is for the measurement and LOD is for the value below LOD.

Conclusion:

French data are all below LOD. The LOD is obviously too high over the period. Those data cannot be used at all.


All measured data are at least 10 times smaller than the LODs (with one exception: Irish LOD). Only measured data should be taken into account. All data below LOD can be discarded due to the poor information they carry.

References

Briand, B. 2007. Characterization of radioactive measurements of French marine environment in the framework of OSPAR. Note DEI/SESURE 2007-13. IRSN.

OSPAR. 2003. Progress Report on the More Detailed Implementation of the OSPAR Strategy with regard to Radioactive Substances. Summary Record OSPAR 03/17/1-E, Annex 30.

OSPAR RSS. 2001. Programme for the More Detailed Implementation of the OSPAR Strategy with regard to Radioactive Substances. Reference number: 2001-3.

OSPAR RSC. 2006. First Periodic Evaluation of Progress towards the Objective of the OSPAR Radioactive Substances Strategy. RSC 06/2/1.

OSPAR RSC. 2007. Second Periodic Evaluation of Progress towards the Objective of the OSPAR Radioactive Substances Strategy. RSC 07/2/1.