Using the PCI-LF - a draft user guide

Paul Tett

Napier University, Edinburgh

(v1.2) 2006

This document explains how to use the Matlab script PCI1ED2 and provides some explanatory material to help users of the PCI-LF tool. It is best read before using the script and tool, but you may go directly to the practical instructions (section 5) and run the script with the test data provided, reading the explanatory material afterwards. A licensed copy of Matlab version 7 or later is needed in order to run the script.

Table of contents
1. Phytoplankton and sources of phytoplankton data
2. What is phytoplankton community structure and what is a PCI?
3. The PCI-LF explained
4. Lifeform theory
5. Using the Matlab script PCI1ED2 to calculate the PCI
   Overview
   Running the Matlab script
   Data file preparation
   Preparing the run control file
   Problems
6. Interpreting the PCI-LF
7. Acknowledgements
8. References

Figures and Tables will be found on the page(s) following their legend(s).

1. Phytoplankton and sources of phytoplankton data

Phytoplankton is the community of pelagic photosynthetic micro-organisms. Typically, this community includes many species: these, and individual organisms, are referred to as phytoplankters. The species, as exemplified in Figure 1, belong to several high-level taxa, such as the diatoms (Bacillariophyta), dinoflagellates (Dinophyta) and prymnesiophytes (Haptophyta). Views on the status and relationship of these high-level taxa are rapidly changing as a result of sequencing of the nucleic acids that dictate the genotypes of the member species. Table 1 gives a current summary. Species are sometimes also categorized according to lifeform, exemplified here by pelagic diatoms - those members of the Bacillariophyta living in the water column, in contrast to benthic diatoms, found in or on the seabed.

Phytoplankton abundance and species composition are highly variable in time and space. This variability can be monitored in two main ways. The major taxa differ in their photosynthetic pigments, and it has been argued that chemical measurements of the concentrations of these pigments in samples of seawater, or measurements of ocean colour (by remote, or in-water, sensing) or pigment-specific fluorescence emission (in flow cytometers), can be used for efficient monitoring. However, with the exception of flow cytometry for the smallest members of the phytoplankton, such methods have not yet replaced microscopic analysis as the main source of phytoplankton data.

In the Utermöhl microscopic method, a water sample is preserved with Lugol's iodine. Phytoplankters present in a few millilitres (or decilitres) of the sample are allowed to sink onto the transparent base of a sedimentation chamber and are there identified and counted using an inverted microscope. Alternatively, a sample is collected by a fine-mesh net. Such a net is used, in the form of a moving band, called a silk, in the Continuous Plankton Recorder (CPR). As part of the CPR Survey, recorders are towed behind ships of opportunity, unit length of the silk corresponding to a certain distance towed. As each instrument is towed, its silk is wound into a tank of formaldehyde preservative. Subsequently, back in the SAHFOS laboratory, the silk is unwound beneath a microscope, allowing the retained phytoplankters to be identified and counted.

Figure 1. Example phytoplankters, drawn from living material in samples from Scottish west coast waters (modified from Tett, 1992). 1-3 are diatoms (Bacillariophyta); 4-6 are dinoflagellates (Dinophyta); 7-9 are 'flagellates', 7 being a euglenoid (Euglenophyceae), 8 a cryptomonad (Cryptophyta), and 9 a set of 'small flagellates' including at least one member of the Haptophyta.

Using the PCI-LF page 2 2/7/06

Table 1. Taxonomy of photosynthetic organisms

Originally based on Margulis (1993), who put most groups at the level of a phylum (with name ending in -[phyt]a), and the algal taxonomy of Tomas (ed.) (1997), who put most groups at the level of a class (with name ending in -[phyc]eae). However, the high-level taxonomy has been revised according to the 'Tree of Life' web project (Patterson & Sogin, 2001) and the NCBI taxonomy browser (Wheeler et al., 2000; NCBI, 2002). In the 'Group' column the names of the highest-order taxa, which may be considered Kingdoms, are given in CAPITALS; a colon (:) shows hierarchy, a slash (/) separates alternative names.

Group | typical marine phytoplanktonic forms | example marine genera

Prokaryote Domain (EU)BACTERIA
Cyanophyta / Cyanophyceae | cyanobacteria: 'blue-green algae', mostly contain phycobiliprotein; filamentous or otherwise colonial Oscillatoriales and heterocystous Nostocales include N-fixers; there are picoplanktonic members in Chroococcales and Chloroxybacteria (without phycobiliprotein) | Trichodesmium, Synechococcus, Prochlorococcus

Domain EUCARYA/EUKARYOTA, grade PROTOCTISTA
EUGLENOZOA: Euglenida / -phyceae | elastic-bodied uni- or bi- euglenoid flagellates | Eutreptiella
ALVEOLATA: Ciliophora | ciliates: functionally photoautotrophic Mesodinium with symbiotic cryptomonads | Mesodinium = Myrionecta
ALVEOLATA: Dinophyta / -phyceae | dinoflagellates: 2 dissimilar flagella; many heterotrophic; probably originally photoautotrophic by secondary symbiosis with alga. (a) Dinophysidae, (b) Prorocentrales, (c) Gymnodiniales, (d) Gonyaulacales, (h) Pyrocystales | Dinophysis, Prorocentrum, Gyrodinium, Gonyaulax, Ceratium
Cryptophyta / -phyceae | cryptomonads: flagellates with phycobiliprotein, arisen by symbiosis between heterotrophic and red alga | Cryptomonas
STRAMENOPILES: Bacillariophyta / -phyceae | diatoms: cells with box-like silicified wall, often forming loose chains, photoautotrophic by secondary symbiosis with alga. (a) centric diatoms - radially symmetrical; (b) Fragilariophyceae - araphid, pennate diatoms; (c) Bacillariophyceae - raphid, pennate diatoms, bilaterally symmetrical | Leptocylindrus, Skeletonema, Thalassiosira, Pseudo-nitzschia
STRAMENOPILES: Chrysophyta / -phyceae | small uni- or bi-flagellates including silicoflagellates with silicified scales or cases. (a) Ochromonodales, (b) Synurales | Ochromonas, Dinobryon
- | silicoflagellates with many chloroplasts and semi-internal silica skeleton | Dictyocha
STRAMENOPILES: Eustigmatophyta / -phyceae | very small coccoid algae and small flagellates |
- | small or picoplanktonic coccoid algae | Aureococcus
STRAMENOPILES: Raphidophyta / -phyceae | small bi-flagellates with many chloroplasts | Chattonella, Heterosigma
Haptophyta / -phyceae or Prymnesiophyta / -phyceae | small bi-flagellates with haptonema: (a) Coccosphaerales and (b) Isochrysidales are flagellates, often with coccoliths and/or spiny organic scales; (c) Pavlovales: flagellates; (d) Prymnesiales: flagellates, some colonial | Pavlova, Chrysochromulina, Emiliania, Phaeocystis
VIRIDIPLANTAE: Chlorophyta: Chlorophyceae | green algae: (a) Volvocales: small flagellates with 1, 2, 4, 8 flagella, some forming flagellated colonies; (b) Chlorococcales: small coccoid cells | Brachiomonas, Dunaliella, Nannochloris
VIRIDIPLANTAE: Chlorophyta | small flagellates with 1, 2, 4, 8.. stiff flagella and organic scales; some colonial | Pyramimonas, Tetraselmis, Halosphaera

2. What is phytoplankton community structure and what is a PCI?

Some biological communities, such as woods and coral reefs, have recognizable physical structure as well as a diversity of primary producers, and both can be seen as contributing to community structure. Thus in the case of temperate deciduous woodland, the member species are seen as belonging to tree, shrub (or understorey) and herb (or ground) layers of vegetation. In the case of the phytoplankton, what we mean by 'community structure' includes the diversity of phytoplankters but not the physical structure element - which is absent, phytoplankters by definition being passively dispersed through the watery medium. Instead, the seasonal changes in dominant species can be considered as fulfilling the same ecological function as layers in woodland.

As in the case of the herb, shrub and tree lifeforms that characterize these layers, phytoplankters can be categorized into lifeform types. The exact basis for this is part of the challenge of devising PCIs, but it can be exemplified by the customary distinction between diatoms and other phytoplankters. This distinction, based on the former's unique need for silica, leads to a simple index, which is the ratio of diatom cells to cells of other phytoplankters, averaged over the year or during the growth season. The 'lifeform-based' PCI-LF is a more sophisticated form of this index, taking account not only of seasonal variability but also of a 'deeper' understanding of the ecophysiology of different phytoplankters.
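The simple diatom index mentioned above can be sketched in a few lines of Matlab. This is only an illustration under stated assumptions: the monthly counts are invented, the index is taken as the mean of the monthly ratios (the text does not say whether ratios or counts are averaged first), and the growth season is assumed to run from April to September.

```matlab
% Invented monthly cell counts (cells per litre), January to December.
diatoms = [2 4 40 80 30 10  8  6 12 20 6 2] * 1e3;
others  = [1 1  4 10 15 20 16 12  8  5 2 1] * 1e3;

% Ratio of diatom cells to cells of other phytoplankters, month by month.
ratioMonthly = diatoms ./ others;

% Assumption: the index is the mean of the monthly ratios.
indexYear = mean(ratioMonthly);

% Assumption: growth season taken as months 4 to 9 (April to September).
growthSeason = 4:9;
indexSeason = mean(ratioMonthly(growthSeason));

fprintf('annual index: %.2f, growth-season index: %.2f\n', indexYear, indexSeason);
```

Averaging the counts first and then taking a single ratio would give a different number, so whichever convention is chosen should be applied consistently.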

Alternative descriptions of community structure have been derived from the distribution of abundances amongst species and the way in which the distribution changes during the year. These descriptions include information-based diversity indexes such as that of Margalef (1958), and the fitted parameters of log-normal distributions of species-abundance (Tett, 1973). Such descriptions, however, throw away all information about species themselves except their relative abundance. Another approach starts from a data set consisting of regular records of species' abundance and, retaining species' identities, searches for patterns amongst these abundances. In this approach, such patterns are determined empirically, by the data themselves, rather than imposed by theory. The patterns might consist of groups of species that behave seasonally in the same way, or that show similar patterns of interannual change. The PCI-MDS, also developed during the project but not further described here, uses standard multivariate methods to extract patterns and changes in pattern.

Thus, what is meant by phytoplankton community structure might be either the very complex patterns of seasonal changing abundances apparent in regularly sampled phytoplankton, or the simpler essences of these patterns captured by empirical statistical reduction or theory-based aggregation of species into lifeforms.

The phytoplankton community index that is proposed here is not a measure of structure but of change in structure from a reference state. From the perspective of the Water Framework Directive (WFD), such a state in principle exists in water bodies undisturbed by human activities. More practically, 'undisturbed' is interpreted to mean 'before the industrial revolution'. In relation to the original aim of the phytoplankton community index project, 'undisturbed' meant 'not suffering from anthropogenic nutrient enrichment'. However, it became clear during this project that eutrophication as a driver of change could not be separated from other pressures on UK marine ecosystems, including those due to natural and human-driven climate change, fisheries impacts on ecosystems, and other forms of pollution. Thus the reference state has been deemed to be that of the oldest reliable data, and what we have sought during the calibration phase of the project are links between the change in phytoplankton and the change in ecological pressures from the reference state. A final point is that reference states are (according to the WFD) characteristic of particular water types: for example, we would expect greater numbers of diatoms in well-stirred waters than in stratified waters during summer.

3. The PCI-LF explained.

This 'handbook' is intended for users of the PCI-LF tool. In contrast to the PCI-MDS, which uses well-known if complex statistical methods, the calculation basis for the PCI-LF was developed during the project. It will be explained here using data from the CPR survey.

Observations of segments of the CPR 'silk' were used to make estimates of the abundance of phytoplankters per unit of tow. An average value was calculated from the estimates for all tows in each month in the 'Central North Sea' region for years between 1958 and 2003. This was done automatically by data extraction software. Figure 2 shows time series for 1959-1964 of monthly means for pairs of example species: the centric diatoms Skeletonema costatum and Thalassiosira spp; the (raphid) pennate diatoms Nitzschia seriata and N. delicatissima (now groups of species of the genus Pseudo-nitzschia); and the autotrophic, ceratioid dinoflagellates Ceratium fusus and C. furca. The left-hand column plots the calculated mean abundances on linear axes; in the right-hand column are plots of abundances transformed to logarithms to base 10 after adding 500 cells/tow unit (to deal with zero values).

The taxa were chosen as examples of lifeform groups. In the final row of Fig. 2 are shown the aggregate abundances of each pair: pelagic diatoms being the sum of S. costatum and Thalassiosira spp; weed diatoms the sum of the Nitzschia spp., and dinoflagellates the sum of the two Ceratium spp. shown above. The log transformation was performed after adding 500 cells/tow unit to the sums. The groupings used here are sparsely populated, in the interests of simplifying explanations. For example, the life-form grouping called pelagic diatoms is likely to include several dozen other taxa of lightly-silicified diatoms as well as S. costatum and Thalassiosira spp.; the label 'pelagic' is used in contrast to heavily-silicified 'tychopelagic' diatoms.
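The aggregation and transformation just described can be sketched as follows. The abundances are invented for illustration and the variable names are hypothetical; only the order of operations (sum the taxa in each lifeform, add 500 cells/tow unit, then take logarithms to base 10) follows the text.

```matlab
% Invented monthly mean abundances (cells per tow unit); one row per month.
% Columns: S. costatum, Thalassiosira spp, N. seriata, N. delicatissima,
%          C. fusus, C. furca.
counts = [    0     0    0   0     0     0;
          30000 19500  500   0  1000   500;
           4000  5500 9000 500 60000 39500];

offset = 500;   % cells/tow unit added before taking logs, so zeros are defined

% Aggregate each species pair into its lifeform:
% pelagic diatoms, weed diatoms, dinoflagellates.
lifeforms = [sum(counts(:, 1:2), 2), ...
             sum(counts(:, 3:4), 2), ...
             sum(counts(:, 5:6), 2)];

% The offset is added to the sums, not to each species separately.
logLifeforms = log10(lifeforms + offset);

disp(logLifeforms);
```

With this offset, a month with no cells recorded plots at log10(500), about 2.7, which is why the logarithmic axes in Figure 2 start near 2.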

Lifeform theory is one part of the basis of the PCI-LF. A second part is provided by an approach from system state theory. In such theory, a complex system is described in terms of a set of state variables, a particular state of the system being specified by a set of values of these variables. Each set of values defines a position in a multidimensional system state space. In the PCI-LF the set of state variables is the set of lifeform abundances, and the system state space has axes corresponding to these abundances. This is shown for the example data set in Figure 3(a), where the co-ordinates of each point are the abundances of pelagic diatoms, weed diatoms and dinoflagellates for a given month. This is a 3-dimensional space; in principle there will be as many dimensions as there are lifeforms for which data are available; but for simplicity all subsequent examples are plotted in 2 dimensions, by taking pairs of lifeforms.

Figure 2. Time series of data from CPR sampling of the Central North Sea, to illustrate lifeform variability. Left-hand column: linear scale of abundance; right-hand column: logarithmically transformed abundances (after adding 500 cells/tow unit).

[Figure 2: four rows of paired panels for 1959-1964, all from CPR - Central North Sea: example pelagic diatoms (Skel cost, Thal spp); example weed diatoms (Nit ser, Nit del); example dinoflagellates (Cer fus, Cer fur); aggregated to lifeforms (pel dia, weed dia, dino). Left-hand axes: cells/tow unit (x 10^5); right-hand axes: log cells/tow unit; horizontal axes: year.]

Figure 3. The idea of ecosystem state space, as instanced in (a) for the 3 phytoplanktonic lifeforms of Figure 2. The co-ordinates for each (red, open circle) point are the abundances of pelagic diatoms, weed diatoms, and autotrophic dinoflagellates in one of 60 months between January 1959 and December 1963. The dashed (black) lines link median values for each of the 12 months of the year. Part (b) is taken from Tett et al. (in press).

[Figure 3: (a) 3-D plot with axes log pelagic diatom cells/tow unit, log weed dia cells/tow unit and log dino cells/tow unit. (b) Schematic with axes y1 - state variable 1 and y2 - state variable 2; a stippled region shows the 'normal' domain of a healthy ecosystem as it varies seasonally, etc.; further labels: 'slight disturbance' and 'undesirable disturbance'.]

Fig. 3(b) reproduces a diagram from Tett et al. (in press) which suggests that a reference domain may be defined for a particular ecosystem component in an appropriate state space. Deviations from this domain may be small, ultimately returning to the set of states that comprise the reference condition, or large and leading to a new domain. Such large change is often seen as a shift to an alternative stable state (in fact, to an alternative set of states), and is deemed to be undesirable. In order to monitor against such changes, what is needed is to:
• plot points whose co-ordinates are given by the monthly values of abundances of each lifeform,
• find a way to draw an envelope around these points to define a reference domain,
• and then plot new data into the same state space for comparison with the reference domain.

The method used to do this is illustrated in Figure 4. Data from the years 1959-1963 have been chosen to characterize a reference condition, and are plotted in part (a). The co-ordinates of each point are the abundances of the true lifeform groups pelagic diatoms and autotrophic dinoflagellates. Each abundance has been summed over all relevant taxa taken by the CPR, rather than just the species pairs used in Fig. 2. A state-space plot does not have a time axis because when the state variables of a system plot at a particular set of co-ordinates, the system is considered to be in a unique state, irrespective of when, or how many times, it has been in that state. Nevertheless, the points are coloured to show in which season of the year they were observed, and successive points are linked to each other, to demonstrate that phytoplankton state tends to describe a seasonal ellipse around an empty centre.

In part (b) the colours and links have been omitted, and envelopes have been drawn, using a convex hull procedure (Sunday, 2004; Weisstein, 2006), to include all these points on the assumption that these five years, 1959-1963, provide a reference condition. The convex hull procedure for 2 dimensions can be likened to stretching an elastic band around the cloud of points, thus identifying those points that define the outer envelope. Finding the inner envelope involved inverting the cloud of points about their centre (with co-ordinates defined by diatom and dinoflagellate medians), fitting a convex hull, and re-inverting. The existence of a central hole is justified on theoretical grounds relating to competition for limiting nutrients, but not further discussed here. Including it makes the PCI more sensitive to change.

Figure 4. Showing the plotting of reference envelopes and the calculation of the PCI. In parts (a) to (d) this is shown for pelagic diatoms and autotrophic dinoflagellates; in parts (e) and (f), the process is shown for tychopelagic diatoms and weed diatoms.

[Figure 4: six panels, drawn by PCI1ED2demo on 01-Jul-2006; reference years 1959-1963; min set at: 500; all data used; points in (a) and (e) keyed by months 1-3, 4-6, 7-9 and 10-12.
(a) CPR Central North Sea; (b) reference data and envelope; (c) reference envelope; (d) change in 1995-1999 (new: 60; out: 21; PCI: 0.65; chi-square 108.0; binom prob 0.0000). Axes for (a) to (d): (log10) pel dia/tow unit against (log10) auto dino/tow unit.
(e) CNS-reference; (f) change in 1995-1999 (new: 60; out: 8; PCI: 0.87; chi-square 8.3; binom prob 0.0098). Axes for (e) and (f): (log10) tycho dia/tow unit against (log10) weed dia/tow unit.]
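The envelope construction can be sketched in Matlab as below. This is not the code in findenv.m. It assumes that 'inverting about the centre' means inversion in a unit circle centred on the medians (each point is moved to distance 1/d from the centre along the same direction), and it uses an invented, deterministic ring of 60 points in place of real reference data.

```matlab
% Invented reference cloud: 60 points on a ring around (4, 4) in log10 space,
% with radii cycling through 1.00, 1.25 and 1.50.
k = (1:60)';
theta = (k - 1) * 2 * pi / 60;
r = 1 + 0.25 * mod(k, 3);
x = 4 + r .* cos(theta);
y = 4 + r .* sin(theta);

% Outer envelope: the convex hull of the cloud (the 'elastic band').
kOut = convhull(x, y);

% Centre of the cloud, taken as the medians of the two state variables.
cx = median(x);
cy = median(y);

% Assumed inversion: a point at distance d from the centre moves to 1/d,
% so the points nearest the centre become the outermost ones.
dx = x - cx;
dy = y - cy;
d2 = dx.^2 + dy.^2;        % must be non-zero for every point
xi = dx ./ d2;
yi = dy ./ d2;

% Hull of the inverted cloud; using the same indices on the original
% points 're-inverts' the hull to give the inner envelope.
kIn = convhull(xi, yi);

outerX = x(kOut);  outerY = y(kOut);
innerX = x(kIn);   innerY = y(kIn);

% plot(x, y, 'o', outerX, outerY, '-', innerX, innerY, '-');
```

A point lying exactly at the centre would give a division by zero, so real data would need a guard for that case. Joining the re-inverted vertices with straight lines is also a simplification made here.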

In part (c) the reference domain - that included between the inner and outer envelopes - is shown alone for clarity. In part (d) some new points from a later half-decade are plotted over the reference envelopes. The value of the PCI-LF is obtained in this case from counting the number of new points that do not lie within the reference domain. More precisely, the value is:

PCI = 1 - (new points outside the reference envelope)/(total new points)

A value of one indicates no change, and a value of zero indicates a complete change. 'Outside' includes inside the inner envelope.
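As a hedged illustration of the counting rule, the sketch below uses two invented square envelopes and eight invented new points; `inpolygon` is a standard Matlab function, but nothing here is taken from PCI1ED2 itself.

```matlab
% Invented envelopes (closed polygons) in a 2-D log10 state space.
outerX = [2 6 6 2 2];          outerY = [2 2 6 6 2];
innerX = [3.5 4.5 4.5 3.5 3.5]; innerY = [3.5 3.5 4.5 4.5 3.5];

% Invented new points from a comparison period.
newX = [2.5 3.0 4.0 5.0 6.5 5.5 4.2 1.0];
newY = [2.5 5.0 4.0 3.0 4.0 5.5 4.1 1.0];

% A point is in the reference domain if it is inside the outer envelope
% and not inside the inner envelope (the central hole).
inOuter = inpolygon(newX, newY, outerX, outerY);
inInner = inpolygon(newX, newY, innerX, innerY);
inDomain = inOuter & ~inInner;

nNew = numel(newX);
nOut = sum(~inDomain);
PCI = 1 - nOut / nNew;

fprintf('new: %d; out: %d; PCI: %.2f\n', nNew, nOut, PCI);

% The same arithmetic applied to the counts annotated in Figure 4(d) and (f).
pciFig4d = 1 - 21 / 60;   % 0.65
pciFig4f = 1 - 8 / 60;    % 0.87 to two decimal places
```

Four of the eight invented points fall outside the domain (two in the hole, two beyond the outer envelope), giving a PCI of 0.5.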

Two methods are used to estimate the significance of this PCI value. The properties of the binomial series can be used to give the exact probability of finding the observed number of points, or more points, outside the reference envelope. This becomes difficult to calculate for more than about 200 points, and so a chi-square approximation has also been used. In both cases, an expectation is required. It is assumed that up to 5% of new points can be allowed to fall outside the reference envelope without concluding that ecosystem condition has changed. The binomial and chi-square tests therefore calculate the probability that an observed fraction of new points could have fallen outside the envelope by chance, given random sampling from a population of points of which 5% are outside the envelope.
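Both tests can be sketched without any toolbox. The binomial tail below follows the description directly. The chi-square form is an inference, not taken from the script: a single term for the 'outside' count with 1 degree of freedom is assumed because it reproduces the values annotated in Figure 4 (108.0 for 21 of 60 points outside, 8.3 for 8 of 60). The function handles are hypothetical names.

```matlab
pExpected = 0.05;   % fraction of new points allowed outside by chance

% Exact binomial probability of nOut or more points outside, out of nNew.
% gammaln keeps the binomial coefficients in log form to avoid overflow.
binomTail = @(nOut, nNew, p) sum(exp( ...
    gammaln(nNew + 1) - gammaln((nOut:nNew) + 1) ...
    - gammaln(nNew - (nOut:nNew) + 1) ...
    + (nOut:nNew) * log(p) + (nNew - (nOut:nNew)) * log(1 - p)));

% Assumed chi-square statistic: (observed - expected)^2 / expected for the
% 'outside' count only.
chi2Out = @(nOut, nNew, p) (nOut - p * nNew)^2 / (p * nNew);

% Upper-tail probability of a chi-square value with 1 degree of freedom.
chi2Prob = @(c) erfc(sqrt(c / 2));

% Counts from Figure 4(d) and 4(f).
fprintf('21 of 60: binom %.4f, chi-square %.1f (p = %.2g)\n', ...
    binomTail(21, 60, pExpected), chi2Out(21, 60, pExpected), ...
    chi2Prob(chi2Out(21, 60, pExpected)));
fprintf(' 8 of 60: binom %.4f, chi-square %.1f (p = %.2g)\n', ...
    binomTail(8, 60, pExpected), chi2Out(8, 60, pExpected), ...
    chi2Prob(chi2Out(8, 60, pExpected)));
```

If the script in fact sums both the 'inside' and 'outside' terms, its chi-square values would be slightly larger than those given by this sketch.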

Parts (e) and (f) complete this analysis with a second pair of 2-D plots, for weed diatoms and tychopelagic diatoms. The pair (d) and (f) are projections onto a flat plane from a 4-D state space, and in principle the PCI-LF should be calculated by counting points 'outside' a 4-D reference domain. This version of the PCI handbook, however, uses 2D plots and calculation schemes in the interests of clarity. The user may decide how to combine resulting sets of PCI values - perhaps as a simple average, or perhaps by giving more weight to the main components.

4. Lifeform theory

The idea of marine phytoplanktonic lifeforms dates back at least to Margalef (1978), and may also have been influenced by older views from limnology concerning the correspondence between certain types of phytoplankter and the trophic status of lakes. It is related to ideas about biomes in terrestrial communities, a biome being characterised by a uniform life form of vegetation, such as grass or coniferous trees (Smith, 1992), and to the origin of the lifeform concept in Raunkiaer's classification of plant life in 1903.

Margalef considered that phytoplanktonic life forms lay along a continuum from r-selected species, with high growth rates but needing high nutrient concentrations, to K-selected species, which could accumulate biomass slowly but persistently at low nutrient concentrations. Such growth was opposed by losses due to vertical mixing or patch spreading, and so diatoms dominated conditions of high nutrients and high losses, and dinoflagellates dominated summer conditions (low nutrients, low losses). Later developments in the theory are reviewed by Tett et al. (2003), who distinguished the following groups of factors that might identify and distinguish lifeforms in relation to ecosystem sustainability:
• functionality in relation to biogeochemical cycling of bio-limiting elements C, N, P, Si, S, O and perhaps Fe and Co;
• functionality in relation to the marine food web;
• relationship to the physical environment as considered by Margalef and others;
• high-level taxonomy.

The argument in respect of sustainability is that the theory of lifeforms used here goes beyond the Margalevian link to the physico-chemical environment by assuming that an adequate complement of primary producer lifeforms is required for a properly functioning marine ecosystem. This theory does not require any particular species to be present for such functioning, nor does it identify single species as characteristic of particular water types.

Within the taxonomic and morphological diversity of the more than 3000 known phytoplanktonic species (Sournia et al., 1991) it is possible to discern clearly the distinctive and important biogeochemical roles of N-fixing cyanobacteria, Si-cycling diatoms (as well as silicoflagellates), DIC-cycling coccolithophores and S-cycling haptophytes, as well as the more general carbon and energy-cycling role of different lifeforms under different physical conditions. As is becoming increasingly apparent, the various combinations of size, biochemistry, and seasonal patterns of growth and abundance are important in generating a variety of links in the marine pelagic food web. Table 2 sets out a possible list of marine phytoplanktonic lifeforms for consideration, raising a theoretical question and two practical matters.

The theoretical question concerns the comparative importance of high-level taxonomy versus species-level adaptation to niche in determining lifeforms. For example, being r-selected or K-selected might be an evolutionary choice that each new species makes independently of what related species do. Conversely, some of the evolutionary events that determine lifeform might have taken place a very long time ago, and thus would now be built into the characteristics of high-level taxa. Margulis (1993) saw some of these events as involving the acquisition of semes, gene sequences coding biochemical pathways for functions such as N-fixation or the production of particular photosynthetic accessory pigments. Because of repeated symbioses amongst primitive organisms (Delwiche et al., 2004), such semes might however be spread widely amongst the major taxa. Riegman (1998) offers a compromise view, arguing that we should expect the biochemistry of major taxa to determine their basic characteristics, whilst not forgetting that individual species can evolve into particular ecological niches (and hence, in my interpretation, into new lifeforms) by changing some of their nutrient uptake or growth parameters. Table 2 takes a view that is primarily taxonomic, with some distinctions within the major taxa on the basis of biogeochemistry (e.g. DIC-cycling coccolithophorids separated from the other haptomonads) or on relationship to the pelagic environment (e.g. the proposed distinction between 3 types of diatom). Quantitative distinctions, for example based on nutrient-limited growth half-saturation constants (Tilman et al., 1982), have not been used, nor has the suggestion (Smayda & Reynolds, 2001) that marine coastal dinoflagellates should be assigned to 9 functional groups, each characteristic of particular physical conditions.

Nevertheless, the list in Table 2 is quite long. The first practical matter is that organisms belonging to many of the lifeforms listed here are not routinely recorded or distinguished during phytoplankton monitoring. The PCI-LF methodology described here was designed to be used with existing data, and requires information about a minimum of 2 lifeforms to be routinely available. Data on additional lifeforms can be added as they become available, and research continues into the utility of including this extra information. At present, it is the choice of the analyst or data interpreter how species are assigned to lifeform categories, so long as a scientific case can be made for the assignments, and provided that the same categorization is used for reference and new conditions. It is hoped that more guidance will be provided in a later edition of this handbook.

Table 2. Lifeforms of pelagic photoautotrophic micro-organisms - a possible list.

Includes some mixotrophs, but not heterotrophs.

Lifeform | High-level taxa: low-level example | biogeochemistry | sinking; floating or swimming | pigments* | other features

diatoms | Bacillariophyta | Si-users | | |
pelagic diatoms | Thalassiosira | | + + | C | many chain-forming
weed diatoms | Pseudo-nitzschia | | + + | C | grow on surfaces?
tychopelagic diatoms | Gyrosigma | | ++ ++ | C |

dinoflagellates | Dinophyta | | | |
naked gymnodinioid | Karenia | | ++ | C |
small armoured | Prorocentrum, Gonyaulax | | ++ | C | cellulose armour plates
large armoured | Ceratium | | ++ | C | cellulose armour plates

ciliates | Ciliophora: Myrionecta | | | P | functionally autotrophic

flagellates
euglenoids | Euglenophyceae: Eutreptiella | | + | |
cryptomonads | Cryptophyta: Cryptomonas | | + | P |
- | Raphidophyta: Heterosigma | | + | C |
haptomonads | Haptophyta: Prymnesiales: Chrysochromulina | esp. S-cycling | + | C |
green monads | Chlorophyta | | + | |

silicoflagellates | Chrysophyta: Dinobryon; Dictyochophyceae: Dictyocha | Si-users: + | + | C |
coccolithophores | Haptophyta: Coccosphaerales, Isochrysidales: Emiliania | CaCO3 plates: ++; S-cycling | + | C |
colonial Phaeocystis | Haptophyta: Prymnesiales: Phaeocystis | | | | gelatinous colonies; also monad stage

cyanobacteria | Cyanobacteria | | | |
N-fixing CB | | N-fixing | ++ | P | filamentous
non-N-fixing CB | | | ++ | P | filamentous
prokaryotic pp | Cyanobacteria: Synechococcus | | | (P) |
eukaryotic pp | Eustigmatophyta, Chlorophyta, Pelagophyta: Aureococcus | | | (C) |

* pigments: chlorophyll a (green) always present, often some other chlorophylls; C = plus carotenoids (yellow-brown-orange); P = plus phycobilin (red-blue); ( ) = in some cases

The second practical matter concerns the 'diatom' and 'dinoflagellate' lifeforms. It may be thought that the use of these categories alone would be merely an extreme case of the simplifications suggested above, and indeed some use of such aggregated data was made during the development of the PCI. However, care must be taken to exclude heterotrophic dinoflagellates from the dinoflagellate category, and an attempt should be made to distinguish, at least, the 'pelagic' and 'tychopelagic' categories of diatom, since the second are typically benthic organisms that have been resuspended by strong tidal or wind-wave stirring.

5. Using the Matlab script PCI1ED2 to calculate the PCI

Overview

This handbook refers to five ASCII text files. These are:
• PCI1ED2.m - the main Matlab script used to calculate the PCI; this calls two additional script files containing functions:
• extract.m
• findenv.m
• cf1.m - a 'run control file' which is read by PCI1ED2 in order to obtain values of certain parameters used during the run; you will need to modify this file if you use data sets other than:
• DDR.txt - an example data file with monthly-averaged CPR data for pelagic diatoms and autotrophic dinoflagellates from the Central North Sea.

The files can be obtained from: http://www.lifesciences.napier.ac.uk/research/Envbiofiles/PCI.htm

A licensed copy of a recent version of Matlab, and some knowledge of how to use Matlab, are needed to run the script. Script and data files are best installed in the same directory within the Matlab 'toolbox'.

To become familiar with the PCI calculation, start by running the script with the example files provided. Subsequently, in order to run data of your own, you will need to aggregate species into lifeforms and to make a file of the structure that PCI1ED2 expects to read. The run control file may need modification to work with your new data file. Instructions for doing this are provided in subsections below.

Running the Matlab script

Before you can run PCI1ED2 for the first time, you must edit line 24, which currently reads:

cd('/Applications/MATLAB71/toolbox/Paul/PCI');

and change the address to give the path to the folder or directory where you keep PCI1ED2 and related files. Thereafter, the working directory becomes this directory when PCI1ED2 is run for the first time in a Matlab session. You must also use the 'set path' command to add this folder or directory to Matlab's search path. Having done this, you can type the command:

run PCI1ED2

at the prompt in the Matlab Command Window. This will start the script. Unless you have changed the contents of the run control file cf1.m, the script will begin to process the CPR data contained in the file DDR.txt. Output should appear automatically in the Command Window, and a Figure window will open to show reference data and envelope.
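Before the first run it may help to check that all five files are in place. The sketch below is one way of doing this from the Command Window; the folder name is a placeholder to be replaced with your own, and the check itself is a suggestion, not part of PCI1ED2. It does not remove the need to edit line 24 of the script, which performs its own cd.

```matlab
% Placeholder path: replace with the folder where you keep the PCI files.
pciDir = '/path/to/your/PCI';

needed = {'PCI1ED2.m', 'extract.m', 'findenv.m', 'cf1.m', 'DDR.txt'};

% Collect the names of any files that cannot be found in pciDir.
missing = {};
for i = 1:numel(needed)
    if exist(fullfile(pciDir, needed{i}), 'file') ~= 2
        missing{end + 1} = needed{i};
    end
end

if isempty(missing)
    addpath(pciDir);   % same effect as adding the folder with 'set path'
    cd(pciDir);
    % run PCI1ED2      % uncomment to start the script
else
    for i = 1:numel(missing)
        fprintf('Not found in %s: %s\n', pciDir, missing{i});
    end
end
```

Note that `addpath` changes the search path for the current session only; use the 'set path' dialogue to make the change permanent.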

The script will then stop, awaiting input from you in the Command Window. At the prompt:

Now to plot data for comparison period and calculate PCI :
Enter start year for comparison,

type, for example, 1995 and press the 'return' key. At the prompt:

Enter (inclusive) end year for comparison,

enter, for example, 1999. The Figure window will now be completed - you may have to click on it to see it. The script will go on to save the Figure (Figure 5) as a postscript file, printing its name to the Command Window.

The script will continue to run, with new inputs sought from the Command Window. Suggested values for your first trial are shown after each prompt:

Optionally calculate a time-series of PCI --
Enter (either) 0 to end (or) integer year interval for time-series, 5
Enter (integer) commencement year for time-series, 1958

Figure 5. Example output from PCI1ED2, using CPR data from the Central North Sea.

[Figure 5: two state-space panels drawn by PCI1ED2 on 01-Jul-2006 from DDR.txt; axes are (log10) pel diatoms/tow unit and (log10) auto dinoflag/tow unit; points are coded by months 1-3, 4-6, 7-9 and 10-12. Left panel: CPR Central North Sea, reference years 1959-1963, min set at: 500, all data used. Right panel: change in 1995-1999, annotated New: 60; Out: 21; PCI: 0.65; chi-square 108.0 (df=1); binom prob 0.0000.]

A table of PCI values will appear in the Command Window. The final set of prompts, and suggested inputs, is:

Optionally graph time-series of abundances --
Enter (either) 0 to end (or) start year for time-series, 1960
Enter last full year for time-series, 1970

Another Figure Window will open, to display the time series. The script will then save this second Figure as a postscript file, printing its name to the Command Window. It will then end.

Data file preparation

Data must be prepared as a plain text file, with data in tab-separated columns that contain only numbers. The file DDR.txt provides an example. Its first row is:

1958.083333 1667 0

The first number is the date, as year and decimal fraction. In this case it represents year 1958, month 1 (January), and might better have been entered as 1958.043 for the middle of January.
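The date convention can be illustrated with a short calculation. This is a sketch under the assumption that the example value was formed as year plus month/12, which reproduces 1958.083333; the variable names are illustrative and are not taken from PCI1ED2. A mid-month date based on twelve equal months gives 1958.042, close to the 1958.043 suggested above.

```matlab
% Year and month of each monthly record (illustrative values)
yr = [1958; 1958; 1958];
mo = [1; 2; 3];

% End-of-month convention: gives 1958.083333 for January 1958
tEnd = yr + mo / 12;

% Mid-month convention: centres each value within its month
tMid = yr + (mo - 0.5) / 12;

% Print the two versions side by side, one row per month
fprintf('%.6f  %.6f\n', [tEnd tMid]');
```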

The second number is the value for the first lifeform in this month. In this case the lifeform is pelagic diatoms and the units are cells per unit tow. The units can be any consistent units, including cells per litre and micrograms of carbon biomass per litre. The third column is the value for the second lifeform, in this case autotrophic dinoflagellates. There may be more columns of lifeform abundance; no limit has been set for the number of these columns. However, PCI1ED2 will deal only with one pair of columns at a time.

In most cases, the main data preparation task is to aggregate values for the abundance of individual species (or other taxa) into the chosen lifeforms. Table 3* is the listing of CPR phytoplankton (and heterotrophic dinoflagellate) taxa, categorized by lifeform, that was used to prepare the data in DDR.txt.
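The aggregation step might be sketched as follows. This is an illustration only: the four short names and their lifeform assignments follow Table 3, but the counts are invented, the variable names are hypothetical, and the file is written to a temporary location rather than to a real data file. dlmwrite and dlmread are standard Matlab functions; newer Matlab releases offer other file functions as well.

```matlab
% Short names follow Table 3; the counts are made-up illustrative numbers
taxa     = {'cer fur', 'cer fus', 'rhiz styl', 'skel cost'};
isDiatom = [false false true  true ];    % pelagic diatoms
isDino   = [true  true  false false];    % autotrophic dinoflagellates

% One row per month, one column per taxon (same order as taxa)
counts = [10 5 300 1200;
           0 2 150  800];
t = [1958.083333; 1958.166667];          % decimal-year dates

% Sum the taxa belonging to each lifeform, row by row
diat = sum(counts(:, isDiatom), 2);
dino = sum(counts(:, isDino),   2);

% Assemble date and lifeform columns and write a tab-separated text file
M = [t diat dino];
outFile = [tempname '.txt'];             % temporary file name for this demo
dlmwrite(outFile, M, 'delimiter', '\t', 'precision', 10);

% Read the file back to confirm that it holds only numbers in columns
back = dlmread(outFile, '\t');
delete(outFile);
```

The 'precision' setting of 10 significant digits is used so that dates such as 1958.083333 are not rounded when written.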

Table 3: Lifeforms used with CPR taxa

CPR short name Full name                                Main group  Lifeform             Order              (1) (2) (3)
ast glac       Asterionella glacialis                   DIATOM      pelagic diatom       Fragilariales      S   L   C
bacillaria     Bacillaria spp.                          DIATOM      pelagic diatom       Bacillariales      S   M   C
               Bacteriastrum spp.                       DIATOM      pelagic diatom       Chaetocerotales    M   L   C
cer fur        Ceratium furca                           DINOFLAG    auto dino            Peridiniales       L   L   S
cer fus        Ceratium fusus                           DINOFLAG    auto dino            Peridiniales       L   L   S
cer hor        Ceratium horridum                        DINOFLAG    auto dino            Peridiniales       L   L   S
cer lin        Ceratium lineatum                        DINOFLAG    auto dino            Peridiniales       L   L   S
cer long       Ceratium longipes                        DINOFLAG    auto dino            Peridiniales       L   L   S
cer mac        Ceratium macroceros                      DINOFLAG    auto dino            Peridiniales       L   L   S
cer tri        Ceratium tripos                          DINOFLAG    auto dino            Peridiniales       L   L   S
corethron      Corethron spp.                           DIATOM      pelagic diatom       Corethrales        M   M   S
cosc con       Coscinodiscus concinnus                  DIATOM      tychopelagic diatom  Coscinodiscales    L   M   S
cosc spp       Coscinodiscus …                          DIATOM      tychopelagic diatom  Coscinodiscales    L   M   S
cylindro clo   Cylindrotheca closterium                 DIATOM      weed diatom          Bacillariales      S   M   S
dact med       Dactylosolen mediterr...                 DIATOM      pelagic diatom       Rhizosoleniales    M   M   S
dinophysis     Dinophysis spp.                          DINOFLAG    auto dino            Dinophysiales      M   M   S
dity bri       Ditylum brightwellii                     DIATOM      pelagic diatom       Lithodesmiales     M   M   S
               Eucampia zodiacus                        DIATOM      pelagic diatom       Hemiaulales        M   L   C
frag spp       spp.                                     DIATOM      pelagic diatom       Fragilariales      S   M   C
gonyaulax      Gonyaulax spp.                           DINOFLAG    auto dino            Gonyaulacales      M   M   S
gyrosigma      Gyrosigma spp.                           DIATOM      tychopelagic diatom  Naviculales        M   M   S
hyal           Chaetoceros sg                           DIATOM      pelagic diatom       Chaetocerotales    S   M   C
leptocyl       Leptocylindrus spp.                      DIATOM      pelagic diatom       Leptocylindrales   S   M   C
navic sp       spp.                                     DIATOM      tychopelagic diatom  Naviculales        M   M   S
nit deli       (Pseudo-)nitzschia delicatissima group   DIATOM      weed diatom          Bacillariales      S   M   C
nit ser        (Pseudo-)nitzschia seriata grp           DIATOM      weed diatom          Bacillariales      S   M   C
odon aur       Odontella …                              DIATOM      pelagic diatom       Triceratiales      M   M   S
odon regia     Odontella …                              DIATOM      pelagic diatom       Triceratiales      M   M   S
odon sin       Odontella sinensis                       DIATOM      pelagic diatom       Triceratiales      M   M   S
paralia        Paralia spp.                             DIATOM      tychopelagic diatom  Paraliales         M   M   S
peridin        (Proto)peridinium spp.                   DINOFLAG    hetero dino          Peridiniales       M   M   S
phaeceros      Chaetoceros sg Phaeoceros                DIATOM      pelagic diatom       Chaetocerotales    M   L   C
prorocent      Prorocentrum spp.                        DINOFLAG    auto dino            Prorocentrales     M   M   S
rhiz al al     Rhizosolenia alata alata                 DIATOM      pelagic diatom       Rhizosoleniales    S   L   S
rhiz heb semi  Rhizosolenia hebetata semispina          DIATOM      pelagic diatom       Rhizosoleniales    S   L   S
rhiz ind       Rhizosolenia ind....                     DIATOM      pelagic diatom       Rhizosoleniales    S   L   S
rhiz inerm     Rhizosolenia inermis                     DIATOM      pelagic diatom       Rhizosoleniales    S   L   S
rhiz shrub     Rhizosolenia shrubsolei                  DIATOM      pelagic diatom       Rhizosoleniales    S   L   S
rhiz stol      Rhizosolenia stolterfothii               DIATOM      pelagic diatom       Rhizosoleniales    S   L   S
rhiz styl      Rhizosolenia styliformis                 DIATOM      pelagic diatom       Rhizosoleniales    S   L   S
skel cost      Skeletonema costatum                     DIATOM      pelagic diatom                          S   M   C
thal nitz      Thalassionema nitzschioides              DIATOM      pelagic diatom       Thalassionem. (4)  S   L   C
thallas        Thalassiosira spp.                       DIATOM      pelagic diatom       Thalassiosirales   M   M   C
thx long       Thalassiothrix long...                   DIATOM      pelagic diatom       Thalassionem. (4)  S   L   C

Columns (1) to (3) give size: (1) width of cells; (2) length - cells, or chains if chainforming, or colonies if colonial; (3) typically single or in chains or colonies. (4) Thalassionem. = Thalassionematales

Preparing the run control file

The run control file mf1.m is a plain text file that is read into the Matlab script when the script is run. It contains comment lines, starting with %, which have no effect on calculations, and assignment lines in which a variable is given a name and a value. Values can be changed, but names must not be changed and semicolons should be left in place. It is hoped that the comments in this file make it largely self-explanatory; a few control variables, however, need further explanation:

dfn='DDR.txt';

The variable called dfn is given the (string) value 'DDR.txt', which is the name of the data file to be read in when PCI1ED2 runs. The first part of the name, given here as DDR, may be changed, but the single quotes must be left in place, as must the suffix .txt.

% column number and description of first state variable
csv1=2; dsv1='pel diatoms/tow unit';

This tells PCI1ED2 to look for the first state variable in column 2 of the data file - any column can be specified - and also that the legend pel diatoms/tow unit is to be plotted on the appropriate axis in Figures. This legend may be changed as needed, preserving the single quotes.

% data are transformed, log10(z+x)
z=500.0;

The value z should be appropriate for the data being processed. As a guide, its value should be roughly half the minimum observed abundance in the (aggregated) data. For example, if the units are micrograms C/L and the minimum biomass is 0.1 µg/L, write z=0.05;

% if mf is > 0.5, data are averaged over month
% if mf is < 0, data are chosen at random with probability of 1/-mf
mf=0;

The value of the variable mf controls how the abundance data are used. If mf=0, all data are used.
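The effect of z on the transformed data can be sketched as follows. The biomass values are invented for illustration, and taking the smallest non-zero value as 'the minimum observed abundance' is an assumption made here, not something stated in the run control file.

```matlab
% Illustrative biomass values in micrograms C/L, including a zero
x = [0 0.1 0.4 2.5];

% Rule of thumb from the text: about half the smallest observed value.
% Using the smallest NON-ZERO value is an assumption made for this sketch.
z = 0.5 * min(x(x > 0));

% Transformation named in the run control file comment: log10(z+x)
y = log10(z + x);

% A zero abundance maps to log10(z), the lowest value that can occur
yFloor = log10(z);
```

Without the offset z, a zero abundance would transform to minus infinity and could not be plotted; with it, zeros are placed at log10(z).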

Problems

The script has been run with several types of data, including sets from microscopically analysed coastal samples (expressed in cells/L) and results generated by a numerical model (expressed in µg C/L), and a number of numerical problems and software bugs have been identified and remedied during this testing. During its later stages it was developed using Matlab 7.1 on Macintosh computers (using OS X version 10.3.9 and the X11 (1.0) Window System), and it has also been run under Windows XP. The script should thus be moderately robust, but is not guaranteed to be error-free.

If you encounter problems, you should first seek help from a local Matlab expert, if you have one. Some Matlab errors will occur if the data files are in an incorrect format, but others may be deeper, due to the nature of the data themselves. Nevertheless, they may be fixable locally. In addition, check the PCI web site for updates to the script: http://www.lifesciences.napier.ac.uk/research/Envbiofiles/PCI.htm.

A long-term mechanism for dealing with PCI calculation problems is to be considered. In the meantime, you can also seek help by emailing the author with a copy of the data file and the run control file that you used, some comments on what happened, and a copy of the script if it has been revised locally. This will be an ad hoc arrangement until other mechanisms are identified in discussion with Defra and the PCI team.

6. Interpreting the PCI-LF

The PCI-LF is a measure of impact in the DPSIR scheme. It was originally intended for use specifically in the context of eutrophication - i.e. to show the extent of change caused by a nutrient loading pressure - but it has become clear that it is a more general indicator of change from a reference condition. Thus, as one of the set of phytoplankton 'tools' under development by the UK Marine Team, it can aid in the assessment of ecological quality according to Annex V of the Water Framework Directive (WFD).

By WFD definition, any value of the PCI that is significantly below 1.0 must indicate worse than 'high' status. Tett et al. (in press) suggested that an undesirably disturbed ecosystem corresponded to WFD 'poor' and 'bad' categories. The WFD Annex V diagnosis of 'poor' is to be made where biological communities deviate substantially from those normally associated with the surface water body type under undisturbed conditions. In Undesirable Disturbance theory, as illustrated in Figure 3(b), such disturbance is one that leads to a substantial shift in the nature of the community, a shift that may not be immediately reversed if the pressure is relaxed. Ecology has yet to develop a theory of ecosystem state that can predict the tipping point quantitatively, and hence critical values of the PCI need to be obtained empirically. Application of the PCI-LF to CPR data from the North Sea tentatively suggests that values above 0.40 do not indicate an undesirable disturbance. It is hoped, in later revisions of this handbook, to provide firmer guidance on the critical values of the PCI-LF that separate WFD quality categories or indicate undesirable disturbance. Meanwhile, the PCI-LF can provide a useful monitoring tool if time series of its values are matched with time series of nutrient loadings. Correlation of trends may be interpreted to show a move in the direction of an undesirable disturbance, even if the value of the PCI remains above the tentative threshold. Conversely, absence of correlation, given adequate time-series, can be taken to show that the ecosystem is resistant to nutrient-induced change.
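Matching the two time series might be sketched as below. All numbers are invented for illustration, the variable names are hypothetical, and the use of a simple Pearson correlation (the standard Matlab function corrcoef) is an assumption of this sketch rather than a method prescribed by the handbook.

```matlab
% Hypothetical annual values, invented for illustration only
nutrientLoad = [10 12 15 18 22 25];              % loading, arbitrary units
pciSeries    = [0.95 0.90 0.88 0.80 0.72 0.70];  % PCI-LF for the same years

% Pearson correlation between the two series
R = corrcoef(nutrientLoad, pciSeries);
r = R(1, 2);

fprintf('Correlation between loading and PCI-LF: %.2f\n', r);
```

A strongly negative value of r would correspond to the PCI-LF falling as loading rises. With series as short as this one, such a result should be treated with caution, in keeping with the requirement above for adequate time-series.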

The PCI-LF has been designed as a flexible tool for application to existing data sets. It may be used with data resulting from incomplete analysis of phytoplankton samples, so long as the basis of the analysis is consistent between reference and new conditions. The minimum requirement is information about two lifeforms; although these should ideally be characterized on the theoretical basis set out in section 4, they could simply be 'diatoms' and 'dinoflagellates' if only that distinction has been made. However, even thus calculated, the PCI-LF is more than a diatom:dinoflagellate ratio. It takes account of seasonal variations in the contributions of each group to the phytoplankton, diatoms being expected to be more abundant during Spring and dinoflagellates in Summer. Rather than mapping a scalar ratio to a scalar quality threshold, it quantifies vector deviation from the 'normal' seasonal pattern.

As illustrated in this handbook, the PCI-LF is able to take account of additional components of phytoplankton community structure. Where such additional data are available, it can in principle be used to define ecosystem state and domain in 3 or more dimensions, and the envelopes, shown in this handbook as pairs of polygons in 2 dimensions, become (hyper)surfaces enclosing all reference states. The PCI-LF would continue to be estimated by counting the number of new observations that fall 'outside' the reference hypersurfaces.

However, the hypergeometric calculation schemes for such estimation have not been implemented in the existing software. It is thus suggested that the script PCI1ED2 should be run for each pair of lifeforms for which data are available. Resulting values of the PCI-LF may be combined using weighting factors. If an odd number of lifeforms is used, one pair should include an already used lifeform, and the weighting factor should be a half of the value that would be used if the pair was independent of other pairs.

It is hoped to develop guidance on these weighting factors at a later stage. It may be that particular combinations of lifeforms will be found that are more sensitive to the nutrient enrichment pressure than to other pressures such as climate change, and weightings can then reflect the purpose for which the PCI-LF is to be employed.
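Until such guidance exists, the half-weight rule for an odd number of lifeforms can only be illustrated under assumptions. In the sketch below the PCI-LF values are invented, the names are hypothetical, and the weighted mean is just one possible way of combining the values; it is not taken from PCI1ED2.

```matlab
% Hypothetical PCI-LF values for three lifeforms A, B and C, run as pairs.
% Pair (C,A) re-uses lifeform A, so it is given half weight.
pciPair = [0.80 0.50];    % [PCI(A,B)  PCI(C,A)], invented numbers
w       = [1.0  0.5];     % weighting factors: full weight, half weight

% Weighted mean: an ASSUMED combination rule, shown for illustration only
pciCombined = sum(w .* pciPair) / sum(w);

fprintf('Combined PCI-LF: %.2f\n', pciCombined);
```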

The final matter to be considered in interpreting PCI-LF values is that of the frequency of sampling. Figure 6 shows the results of several treatments of phytoplankton data from the FRS station at Stonehaven, which has been sampled at weekly intervals since the start of 1997. The data were provided as weekly values in cells/L, pre-aggregated into 'diatoms' and 'dinoflagellates'. Data from 1997-1999 were used to define reference conditions. Three separate analyses were carried out by varying the value of the switch mf in mf1.m. In one analysis, all the data were used; in the second case, data were averaged over (calendar) months; and in the third case, one in four values was randomly selected, approximating a monthly sampling programme.
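The three treatments might be sketched as follows. This is one reading of the comments in the run control file, not the code used inside PCI1ED2: the weekly series is synthetic, the variable names are hypothetical, and the handling of values of mf between 0 and 0.5 (treated here like mf=0) is an assumption.

```matlab
% Synthetic weekly series for one year (illustrative only)
t = 1997 + ((0:51)' + 0.5) / 52;         % decimal-year date of each sample
x = 1000 + 500 * sin(2*pi*(t - 1997));   % abundance, cells/L

mfValues = [0 1 -4];                     % all data; month means; 1 in 4
result = cell(size(mfValues));

for k = 1:numel(mfValues)
    mf = mfValues(k);
    if mf > 0.5
        % Average over calendar months, taken as twelve equal parts of a year
        yr = floor(t);
        mo = min(floor((t - yr) * 12) + 1, 12);
        [keys, firstIdx, grp] = unique(yr * 12 + mo);
        result{k} = accumarray(grp(:), x) ./ accumarray(grp(:), 1);
    elseif mf < 0
        % Keep each row at random with probability 1/(-mf)
        keep = rand(size(x)) < 1 / (-mf);
        result{k} = x(keep);
    else
        % Use every row
        result{k} = x;
    end
end
```

With mf=-4 the number of rows kept varies from run to run, which is why repeated runs of the third treatment can give different envelopes and different PCI values.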

The greater variability evident in the weekly data resulted in more inclusive reference envelopes, and hence a higher value of the PCI for the new data (Fig. 6(b)). The 'flat bottom' (and vertical left side) apparent in parts (a) and (b) suggest insufficient dynamic range in the data - i.e. that it would be desirable to examine a larger volume of water during the Winter, to get a better estimate of abundance during periods of low phytoplankton biomass. Month-averaging of the data improved this, as shown in parts (c) and (d). Because the resulting reference domain was smaller, the analysis was more sensitive to deviations in the new data, giving a lower and more significant value of the PCI.

The proposed UK sampling scheme for WFD phytoplankton monitoring (at monthly intervals) corresponds to case 3, and should prove sufficiently intense to detect change, on the basis of the result shown in Fig. 6 (e, f). However, this is one of many possible realizations, differing in both the reference condition envelopes and the new data. Repeated runs with new random selections gave a PCI range from 0.95 to 0.68, with just over 2 in 3 proving significant. This needs further study.
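The PCI values annotated on Figure 6 can be checked from the counts printed on the same panels. The sketch below assumes that the PCI is one minus the proportion of new points lying outside the reference envelope, which is consistent with the annotated values; the variable names are illustrative.

```matlab
% Counts read from the annotations on Figure 6 panels (b), (d) and (f)
nNew = [283 72 59];     % Nnew: new observations plotted
nOut = [ 21 10 19];     % Nnewout: of those, points outside the envelope

% Proportion of new points that are not outside the reference envelope
pci = 1 - nOut ./ nNew;

% Print to two decimals for comparison with the values on the panels
fprintf('%.2f\n', pci);
```

The same calculation applied to the counts on Figure 5 (New: 60; Out: 21) gives 0.65.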

Figure 6. State space plots for FRS Stonehaven data, supplied pre-aggregated into 'diatom' and 'dinoflagellate' cells/L. The left hand column shows data for 1997-99, assumed to be reference conditions, and the right-hand column plots data for 2000-05 onto the reference envelopes, allowing PCI values to be calculated. (a) and (b) plot all (weekly) values; (c) and (d) plot month averages; (e) and (f) plot randomly selected data rows simulating (roughly) one sample per month. (These plots were made by an earlier version of the PCI1ED2 script and differ in some details from those of Fig. 5.)

[Figure 6: six state-space panels drawn by PCI1ED2 on 25-Mar-2006; axes are (log10) diatoms/L and (log10) dinoflag/L; points are coded by months 1-3, 4-6, 7-9 and 10-12; min set at: 500. Panels (a), (c), (e): FRS Stonehaven, reference years 1997-1999. Panels (b), (d), (f): change in 2000-2005. (a, b) all data used: Nnew: 283; Nnewout: 21; PCI: 0.93; binom prob 0.0478. (c, d) data month-averaged: Nnew: 72; Nnewout: 10; PCI: 0.86; binom prob 0.00305. (e, f) 1/4 data at random: Nnew: 59; Nnewout: 19; PCI: 0.68; binom prob 3.82e-11.]

7. Acknowledgements

The work described here was funded by a Defra contract to CEFAS for 'Development of a UK Phytoplankton Trophic Index', CSA 6754/ME2204, and a subcontract from CEFAS to Napier University. PT is grateful to other members of the project team who contributed advice and data, especially to Martin Edwards from SAHFOS for the CPR data and Eileen Bresnan for the FRS Stonehaven data used in the examples; also to Alex Ford and Sabine Schäfer in Napier University for work during the development of the index; and to Dave Mills of CEFAS for leading the project to a successful outcome.

8. References

Delwiche, C.F., Andersen, R.A., Bhattacharya, D., Mishler, B.D., & McCourt, R.N. (2004) Algal evolution and the early radiation of green plants. In Assembling the Tree of Life (eds J. Cracraft & M.J. Donoghue), pp. 121-137. Oxford University Press, New York.

Margalef, R. (1958) Information theory in ecology. General Systems, 3.

Margalef, R. (1978) Life forms of phytoplankton as survival alternatives in an unstable environment. Oceanologica Acta, 1, 493-509.

Margulis, L. (1993) Symbiosis in cell evolution: microbial communities in the Archean and Proterozoic eons, 2nd edn. W H Freeman & Co, New York.

Patterson, D.J. & Sogin, M.L. (2001) Tree of Life: Eukaryotes (Eukaryota), http://tolweb.org/tree/eukaryotes/accessory/treeoverview.html (seen August 2002).

Riegman, R. (1998) Species composition of harmful algal blooms in relation to macronutrient dynamics. In Physiological Ecology of Harmful Algal Blooms (eds D.M. Anderson, A.D. Cembella & G.M. Hallegraeff), pp. 475-488. Springer-Verlag, Berlin, Heidelberg.

Smayda, T.J. & Reynolds, C.S. (2001) Community assembly in marine phytoplankton: application of recent models to harmful dinoflagellate blooms. Journal of Plankton Research, 23, 447-461.

Smith, R.L. (1992) Elements of Ecology, 3rd ed. Harper Collins, New York.

Sournia, A., Chrétiennot-Dinet, M.-J., & Ricard, M. (1991) Marine phytoplankton: how many species in the world ocean? Journal of Plankton Research, 13, 1093-1099.

Sunday, D. (2004) The Convex Hull of a 2D Point Set or Polygon, http://softsurfer.com/Archive/algorithm_0109/algorithm_0109.htm.

Tett, P. (1973) The use of log-normal statistics to describe phytoplankton populations from the Firth of Lorne area. Journal of Experimental Marine Biology and Ecology, 11, 121-136.

Tett, P., Hydes, D., & Sanders, R. (2003) Influence of nutrient biogeochemistry on the ecology of North-West European shelf seas. In Biogeochemistry of Marine Systems (eds G. Shimmield & K. Black), pp. 293-363. Sheffield Academic Press Ltd, Sheffield.

Tett, P., Gowen, R., Mills, D., Fernandes, T., Gilpin, L., Huxham, M., Kennington, K., Read, P., Service, M., Wilkinson, M., & Malcolm, S. (in press) Defining and detecting Undesirable Disturbance in the context of Eutrophication. Marine Pollution Bulletin.

Tilman, D., Kilham, S.S., & Kilham, P. (1982) Phytoplankton community ecology - the role of limiting nutrients. Annual Review of Ecology and Systematics, 13, 349-372.

Tomas, C.R., ed. (1997) Identifying marine phytoplankton, pp xv+858. Academic Press, San Diego & London.

Weisstein, E.W. (2006) "Convex Hull." From MathWorld--A Wolfram Web Resource, http://mathworld.wolfram.com/ConvexHull.html.

Wheeler, D.L., Chappey, C., Lash, A.E., Leipe, D.D., Madden, T.L., Schuler, G.D., Tatusova, T.A., & Rapp, B.A. (2000) Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 28(1), 10-14.
