
Journal of Marine Research, 59, 859–894, 2001

A data assimilative marine ecosystem model of the central equatorial Pacific: Numerical twin experiments

by Marjorie A. M. Friedrichs¹

ABSTRACT

A five-component, data assimilative marine ecosystem model is developed for the high-nutrient, low-chlorophyll region of the central equatorial Pacific (0N, 140W). Identical twin experiments, in which model-generated synthetic ‘data’ are assimilated into the model, are employed to determine the feasibility of improving simulation skill by assimilating in situ cruise data (plankton, nutrients, and primary production) and remotely-sensed ocean color data. Simple data assimilative schemes such as data insertion or nudging may be insufficient for lower trophic level marine ecosystem models, since they require long time-series of daily to weekly plankton and nutrient data as well as adequate knowledge of the governing ecosystem parameters. In contrast, the variational adjoint technique, which minimizes model-data misfits by optimizing tunable ecosystem parameters, holds much promise for assimilating biological data into marine ecosystem models. Using sampling strategies typical of those employed during the U.S. Joint Global Ocean Flux Study (JGOFS) equatorial Pacific process study and the remotely-sensed ocean color data available from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), parameters that characterize processes such as growth, grazing, mortality, and recycling can be estimated. Simulation skill is improved even if synthetic data associated with 40% random noise are assimilated; however, the presence of biases of 10–20% proves to be more detrimental to the assimilation results. Although increasing the length of the assimilated time series improves simulation skill if random errors are present in the data, simulation skill may deteriorate as more biased data are assimilated. As biological data sets, including in situ, satellite and acoustic sources, continue to grow, data assimilative biological-physical models will play an increasingly crucial role in large interdisciplinary oceanographic observational programs.

¹ Center for Coastal Physical Oceanography, Crittenton Hall, Old Dominion University, Norfolk, Virginia, 23529, U.S.A. email: [email protected]

1. Introduction

Methods for systematically constraining dynamical models with available data extend back to engineering control theory and geophysics; however, the terminology “data assimilation” itself was developed in meteorology in the 1960s and was originally used to describe the technique(s) of using observations to improve the forecasting skill of operational models (Malanotte-Rizzoli and Tziperman, 1996). In the following decade, physical oceanographic data assimilative models began to be developed; however, the lack of a unifying objective for oceanographic data assimilation, such as the driving force of numerical weather prediction for meteorology, caused oceanographers to define data assimilation much more broadly (Malanotte-Rizzoli and Tziperman, 1996). In physical oceanography, data assimilation refers to many different techniques of combining dynamical models with observations, and is not only used for ocean forecast (Peloquin, 1992; De Maria and Jones, 1993; Carnes et al., 1996; Aikman et al., 1996), but is also often used to quantitatively and systematically test and improve poorly known subgrid-scale parameterizations and boundary conditions that are abundant in many numerical models (Seiler, 1993; Lardner and Das, 1994; Gunson and Malanotte-Rizzoli, 1996a; Ullman and Wilson, 1998).

Just as the scientific community witnessed the evolution of a new generation of data assimilative physical oceanographic models in the 1970s and 1980s, in the past several years the field of data assimilative biogeochemical ocean modeling has begun to emerge; however, just as the many innate differences between meteorology and physical oceanography have caused data assimilation goals and techniques to evolve very differently in these two fields, differences between physical and biological oceanography are also resulting in substantial differences between these types of data assimilation (Hofmann and Friedrichs, 2001a). Because biological systems have no analog to the fluid dynamicists' Navier-Stokes equations, ecosystem models are by necessity empirical, nonlinear, and abound with poorly known formulations.
Such models typically include large numbers of parameters that are difficult, or even impossible, to measure with current oceanographic instrumentation. As a result, data assimilation methods are often applied to marine ecosystem models in order to test different parameterizations of biogeochemical processes (Matear, 1995) and to estimate optimal parameter values (Evans, 1999; Vallino, 2000).

Space and time scales in physical and biological oceanographic systems can also be different. The rapid doubling rates of the most abundant species (O(1 d^-1)) may cause the synoptic time scales of these systems to resemble their meteorological counterparts (O(days)) as opposed to the physical oceanographic scales of several months. Furthermore, just as there are far fewer physical oceanographic data than meteorological data, there may be equally fewer biological data than physical data.

For these reasons, the application of data assimilation techniques to biogeochemical ocean models, and specifically to marine ecosystem models, presents many exciting new challenges (Hofmann and Friedrichs, 2001a). In the past decade large interdisciplinary programs, e.g. the Joint Global Ocean Flux Study (JGOFS), have included model prediction and forecast as specific research objectives (Sarmiento et al., 1987; Abbott, 1992). However, new studies are revealing that much more work needs to be performed before this becomes a realistic and achievable goal (Sarmiento and Armstrong, 1997). It is rapidly becoming evident that until high resolution biological and chemical data are available over large regions of the ocean, and until a much clearer understanding of the intricacies of marine ecosystems is attained, data assimilation in ecosystem models will be more useful for model improvement and parameter estimation (Matear, 1995; Fasham and Evans, 1995; Prunet et al., 1996a,b; McGillicuddy et al., 1998; Spitz et al., 1998; Evans, 1999; Hurtt and Armstrong, 1999; Vallino, 2000) rather than model prediction and forecast (Hofmann and Friedrichs, 2001b).

Many methods for combining model dynamics with data exist. One particularly straightforward method entails replacing the model solution with data whenever such information is available. This technique, referred to as data insertion, integrates the model forward in time until additional observations become available, at which point the model is reinitialized, and the process repeated. In an initial application of data insertion in physical oceanography, Holland and Hirschman (1972) attempted to estimate velocity fields by inserting temperature and salinity data into relatively complex ocean models. Model-data inconsistencies caused the resulting simulations to compare poorly with observations. This led to the development of techniques in which the model solution is nudged toward observations, instead of being replaced by observations, whenever data become available. Although this method represents a significant improvement over the earlier insertion technique, it still lacks a means by which information on data uncertainty can be incorporated, and does not provide an estimate of the uncertainties of the resulting solution (Sarmiento and Bryan, 1982; Holland and Malanotte-Rizzoli, 1989; Malanotte-Rizzoli and Young, 1995). Data insertion and nudging have also been applied to simple marine ecosystem models (Ishizaka, 1990; Armstrong et al., 1995). In these studies, chlorophyll estimates made from satellite ocean color measurements were the only biological data available for assimilation. Although estimates of phytoplankton biomass were improved as a result of the assimilation, the accuracy of other model components was reduced. Nudging has also been used to keep an ecosystem model from drifting (Moisan and Hofmann, 1996) or to represent unresolved biological processes (Najjar et al., 1992).
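The contrast between data insertion and nudging can be illustrated with a minimal Python sketch on a toy one-variable decay model. The model itself, the function name `run_with_assimilation`, the relaxation coefficient `relax`, and the observation values are all assumptions made purely for illustration; none of them comes from the studies cited above.

```python
def run_with_assimilation(x0, decay, n_steps, obs, mode="insert", relax=0.5):
    """Integrate a toy model x[t+1] = (1 - decay) * x[t] and assimilate data.

    obs maps a time-step index to an observed value (hypothetical data).
    mode="insert": the model state is replaced by the observation.
    mode="nudge":  the state is moved a fraction `relax` toward the observation.
    """
    x = x0
    series = [x]
    for step in range(1, n_steps + 1):
        x = (1.0 - decay) * x  # forward model step
        if step in obs:
            if mode == "insert":
                x = obs[step]  # reinitialize with the data
            elif mode == "nudge":
                x = x + relax * (obs[step] - x)  # relax toward the data
            else:
                raise ValueError("mode must be 'insert' or 'nudge'")
        series.append(x)
    return series


observations = {2: 1.0}  # a single hypothetical observation at step 2
inserted = run_with_assimilation(2.0, 0.5, 3, observations, mode="insert")
nudged = run_with_assimilation(2.0, 0.5, 3, observations, mode="nudge")
```

In both cases the state relaxes back to the unconstrained model dynamics once observations stop, which is why such schemes depend on data being available at the time scales of the dominant model processes.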
More sophisticated assimilation schemes such as optimal interpolation and the Kalman filter have been successfully applied to weather forecast models by meteorologists, yet hold little hope for marine ecosystem models because of the inherent nonlinearities of biological systems. Instead, schemes which have recently been applied to nonlinear physical oceanographic systems, such as the variational adjoint method (Tziperman and Thacker, 1989), the Extended Kalman Filter (Evensen, 1992), and simulated annealing (Barth and Wunsch, 1990; Kruger, 1993), may be more applicable to biological problems. These methods can be used to determine an optimal solution that maximizes agreement between the model solution and the observations. Both the variational adjoint method and simulated annealing have been successfully used to assimilate biological data into marine ecosystem models (Matear, 1995; Hurtt and Armstrong, 1996, 1999; McGillicuddy et al., 1998; Friedrichs, 2001), and have their respective advantages and drawbacks. The variational adjoint method can be used to efficiently estimate best-fit values for unknown model parameters and rate coefficients; however, because of the nonlinear nature of marine ecosystem models, it is possible that an adjoint method may lead to multiple suboptimal parameter sets, thereby necessitating techniques for identifying a single optimal set of parameters. The stochastic, ‘random-walk’ nature of simulated annealing may allow the investigation of more parameter values, resulting in a greater ability to identify the global optimum parameter set, but this technique is less efficient than the adjoint method. Both simulated annealing and the Extended Kalman Filter may be computationally too intensive to be of wide-scale use in large-scale biological oceanographic assimilation analyses.
Before the application of these types of data assimilation techniques to marine ecosystem models becomes routine, a number of methodological issues need to be addressed. For example, it is not yet clear under what conditions different assimilation schemes will be most appropriate, what sampling strategies will be optimal, or what levels of imprecision or inaccuracies in the measurements will be acceptable. One method for addressing such methodological issues is through the use of numerical twin experiments, in which model-generated data are assimilated into a model. Although the utility of numerical twin experiments is well accepted within the fields of meteorology and physical oceanography (Ghil and Malanotte-Rizzoli, 1991; Sheinbaum, 1995; Gunson and Malanotte-Rizzoli, 1996a,b), this approach has only recently been applied to ecosystem modeling analyses.

Lawson et al. (1995; 1996) first used these types of experiments to demonstrate the feasibility of applying the adjoint method to recover optimal values for various ecosystem parameters. They determined that the assimilation of data at monthly intervals was adequate for the recovery of most rate parameters. Although bi-weekly data collection significantly enhanced the results, weekly assimilation provided no further improvement. Harmon and Challenor (1997) used numerical experiments to assess the success of a new Monte Carlo Markov Chain data assimilation algorithm specifically developed for use with highly nonlinear ecosystem models. Twin experiments have also been used to determine if model parameters can be estimated independently, and thus whether or not a given model will need to be simplified (Spitz et al., 1998). Even more recently, Gunson et al. (1999) seeded numerical floats in an eddy-permitting basin-scale model of the North Atlantic Ocean, and followed their trajectories with a one-dimensional biological model. Using twin experiments to assimilate simulated ocean color data, the authors were able to determine which parameters could be recovered in different regions of the model domain.

In this paper, the ecosystem model described by Friedrichs and Hofmann (2001; hereafter referred to as FH01) is run in a data assimilative mode. In the following section, methodological details of the ecosystem model, the adjoint method, and the numerical twin experiments are presented. Twin experiments are then conducted in order to (i) compare the results of simple and sophisticated assimilation schemes, (ii) determine the effects of assimilating biological data that contain random errors and biases, and (iii) assess both the suitability of assimilating in situ cruise data versus remotely-sensed ocean color data as well as the effects of varying sampling strategy and frequency of data collection (Section 3). The paper concludes with a discussion and summary section (Section 4) as well as a section describing some future challenges for data assimilative marine ecosystem modeling (Section 5).

Figure 1. Schematic of the five-component ecosystem model: phytoplankton (P), zooplankton (Z), ammonium (A), nitrate (N) and detritus (D). Adapted from Friedrichs and Hofmann (2001).

2. Methods

a. Ecosystem model

The ecosystem model of FH01 contains five model components (phytoplankton (P), zooplankton (Z), ammonium (A), nitrate (N), and detritus (D); Fig. 1) and has been developed for use in the central equatorial Pacific (i.e. 0N, 140W). Because of the rapid growth rates of the picoplankton and microzooplankton in this region (O(1 d^-1)), the biology directly responds to physical forcing on relatively short (daily) time scales. As a result, the model is not forced by ocean general circulation model output as is typical for one-dimensional marine ecosystem models (Fasham et al., 1993; Leonard et al., 1999; McClain et al., 1999; Gunson et al., 1999); rather, it is forced using daily observations of solar radiation, temperature (T), and velocity (u, v) obtained from the extensive Tropical Atmosphere Ocean (TAO) mooring array (McPhaden et al., 1998). Daily vertical profiles of vertical velocity (w(z)) are also computed from TAO data (u, v, and T) according to the method of FH01. Based on recent observations from the equatorial Pacific, iron limitation of the phytoplankton is assumed a priori (Frost, 1996; Behrenfeld et al., 1996). Daily vertical profiles of iron concentration are derived from the empirical formulation of Gordon et al. (1997) combined with a deep Fe:T relationship (Friedrichs, 1999; FH01).

A brief outline of the model equations is included in the Appendix and a more thorough discussion of the model formulations is given in FH01. Based on a review of the literature describing the biology of the equatorial Pacific, best estimates of the model parameters are made (Table 1). Model runs begin on September 1, 1991 (Fig. 2).

b. Variational adjoint implementation

In this study, both the relatively straightforward technique of data insertion and the more sophisticated adjoint method are applied to the FH01 ecosystem model. We provide a brief overview of how the variational adjoint method is specifically applied to the FH01 ecosystem model. Detailed descriptions of the variational adjoint method appear in Wunsch (1996) and Bennett (1992).

Table 1. Values, units, and definitions of the parameters used in the biological model.

Parameter   Value   Units                     Definition
w_Z         0.5     m d^-1                    Z sinking rate
w_D         12      m d^-1                    D sinking rate
w_P         0.45    d^-1                      P loss rate
w_Z         0.75    d^-1                      Z loss rate
g           0.75    (none)                    Z assimilation efficiency
b           0.3     (none)                    A regeneration fraction
r_D         1       d^-1                      D regeneration rate
g           29      d^-1                      Z maximum grazing rate
L           1       (mmol N m^-3)^-1          Ivlev grazing parameter
k_Fe        0.034   mmol Fe m^-3              Iron half-saturation value
a           29      d^-1 (E m^-2 h^-1)^-1     Initial slope of P-I curve
P_M         14      d^-1                      Max. rate of photosynthesis
k_A         0.1     mmol N m^-3               A half-saturation value
d_c         45      m                         Depth of max. P, A conc.
Fe_min      0.026   mmol Fe m^-3              Min. concentration of iron
m_Fe        0.001   mmol Fe m^-4              Slope of iron profile

i. Cost function. The FH01 ecosystem model is run forward in time using the parameter values listed in Table 1, in order to obtain a value of the cost function, which is defined to be a measure of the misfit between the predicted variables (a) and observed variables (â). Assuming that the errors in the data set to be assimilated are uncorrelated, the general cost function, J, can be mathematically expressed as a weighted sum of squares:

$$ J = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{N} w_i \left( a_{ij} - \hat{a}_{ij} \right)^2 . \tag{1} $$
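A minimal Python sketch of how a cost function of the form of Eq. (1) might be evaluated is given here, with weights of the form max_i(⟨a_i⟩)/⟨a_i⟩ for components that have data and zero otherwise. The function names, the dictionary-based data layout, and the choice to sum only over time steps at which an observation exists are assumptions made for illustration; this is not the FH01 code.

```python
def time_mean(series):
    """Time average of a model component over the whole simulation."""
    return sum(series) / len(series)


def cost_weights(predicted, available):
    """Weights w_i = max_i(<a_i>) / <a_i> if data exist for component i, else 0."""
    means = {name: time_mean(series) for name, series in predicted.items()}
    largest = max(means.values())
    return {
        name: (largest / mean if name in available else 0.0)
        for name, mean in means.items()
    }


def cost_function(predicted, observed, weights):
    """J = 1/2 * sum_i sum_j w_i * (a_ij - ahat_ij)**2.

    predicted: component name -> list of model values, one per time step
    observed:  component name -> {time-step index: observed value}
    (assumption: only time steps with an observation contribute to the sum)
    """
    total = 0.0
    for name, obs_by_step in observed.items():
        for step, obs_value in obs_by_step.items():
            total += weights[name] * (predicted[name][step] - obs_value) ** 2
    return 0.5 * total


# Hypothetical two-component example
predicted = {"P": [1.0, 2.0, 3.0], "N": [4.0, 4.0, 4.0]}
observed = {"P": {0: 1.0, 2: 2.0}, "N": {1: 2.0}}
weights = cost_weights(predicted, available={"P", "N"})
J = cost_function(predicted, observed, weights)
```

With this weighting, a component whose mean concentration is small is not swamped in the sum by a component whose mean is large.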

The sums are carried out over the number of time-steps (N days) as well as the number of variables (M = 5) for which observations are available: P, Z, A, N, and the rate of primary production (PP). The weights, w_i, account for differences in the relative magnitudes of the ecosystem components. If data are not available for a component, then w_i = 0. If data are available, then w_i = max_i(⟨a_i⟩)/⟨a_i⟩, where ⟨a_i⟩ is the time average of a_i over the entire two-year model simulation, and max_i(⟨a_i⟩) is the maximum value of these five time-averages (⟨PP⟩, ⟨P⟩, ⟨Z⟩, ⟨A⟩, ⟨N⟩) (Lawson et al., 1996). Numerical twin experiments are used a posteriori to test the sensitivity of the assimilation results to this particular choice of weights. On average, doubling or halving any single value of w_i changed the errors in the resulting parameter estimates by less than 3%. Although it is possible to include separate terms in the definition of the cost function in order to allow a priori constraints or penalties on the parameter values (McGillicuddy et al., 1998; Gunson et al., 1999), because of the high degree of uncertainty associated with the values of many of the ecosystem parameters, and in order to maintain the highest possible degree of freedom in the model control variables, no upper or lower limits are imposed on any of the parameter values.

An adjoint, or backward model, is used to compute the gradients of the cost function with respect to the input parameter set, frequently referred to as the set of ‘model control variables.’ The technique developed by Lawson et al. (1995) is used to construct the adjoint code directly from the model code by means of Lagrange multipliers. Values of these gradients are passed to a variable-storage quasi-Newton optimization procedure (Gilbert and Lemarechal, 1989) which computes the optimal direction toward the minimum of the cost function, and the optimal step size in that direction. New values of the control variables that yield a smaller misfit between the model solution and observations are then used to rerun the model, and recompute the cost function. The process of running the adjoint model and the optimization routine continues iteratively until the control variables converge to values which maximize agreement between the model-derived quantities and the observations; i.e., until the minimum of the cost function is identified. Uncertainties in the recovered values of the control variables are computed from a finite difference approximation of the Hessian matrix of the cost function at its minimum (Tziperman and Thacker, 1989; Matear, 1995; Gunson and Malanotte-Rizzoli, 1996b).

Figure 2. Fifty simulated time series (thin lines) of (a) r_PP(t), (b) r_P(t), (c) r_Z(t), (d) r_A(t), and (e) r_N(t) averaged over fourteen days and generated using imperfect random values for the model control variables which fall within the a priori uncertainty ranges of Table 3. The true simulation (heavy lines; Ĉ(t)) is generated using values of 1.0 for all six scaled control variables. All simulations represent vertical averages over the euphotic zone.

ii. Model control variables. Model control variables can consist of a variety of unknowns, such as component initial conditions, empirical model coefficients, and parameters related to model forcing. Because ecosystem model components (e.g., plankton concentrations) typically become independent of their initial values quite quickly, it is often neither necessary nor feasible to use the adjoint method to estimate component initial conditions (Friedrichs, 1999). Conversely, the numerous poorly known model parameters which characterize many ecosystem models may constitute appropriate choices for the control variables; however, the inherent nonlinearities associated with most ecosystem models frequently cause many model parameters to be highly correlated (Matear, 1995).
If correlated parameters are chosen as control variables in an adjoint analysis, the system may become underdetermined, resulting in large uncertainties in the estimated parameter values. An alternative approach consists of choosing the control variables to be the subset of model parameters which are uncorrelated, or at most weakly correlated (Friedrichs, 1999, 2001).

Although formal methods exist for determining the degree of correlation between pairs of model parameters (Matear, 1995), in this study the results of a sensitivity analysis are used to estimate the degree of correlation between various model parameters. The sensitivity of a certain model component or model diagnostic (C) to a given model parameter (k) can be quantified by calculating the normalized sensitivity (S_C,k), defined as the fractional change in C due to a fractional (25%) increase in the value of k (FH01):

$$ S_{C,k} = \frac{(C_R - C_S)/C_S}{(k_R - k_S)/k_S} . \tag{2} $$

In this notation, C_R is the steady-state value of P, Z, A, N, or PP obtained using the reference parameter value (k_R; Table 1), whereas C_S is the analogous value obtained using k_S = 1.25 k_R. (Note that S_C,k is a good measure of sensitivity only as long as C does not have maxima or minima over the interval k_R to k_S. Experiments using other fractional increases (10%, 20%) indicate that this is the case for nearly all the parameter values listed in Table 2.)
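The normalized sensitivity of Eq. (2) can be sketched in a few lines of Python. The callable `steady_state`, which stands in for a full model run returning the steady-state value of a component for a given parameter value, is a hypothetical placeholder, as are the two simple test relationships used in the example.

```python
def normalized_sensitivity(steady_state, k_ref, fraction=0.25):
    """Normalized sensitivity S_{C,k} following Eq. (2).

    steady_state: hypothetical callable returning the steady-state value of a
                  model component C for a given parameter value k
    k_ref:        reference parameter value k_R
    fraction:     fractional increase applied to k (0.25 gives k_S = 1.25 k_R)
    """
    k_pert = (1.0 + fraction) * k_ref   # k_S
    c_ref = steady_state(k_ref)         # C_R
    c_pert = steady_state(k_pert)       # C_S
    return ((c_ref - c_pert) / c_pert) / ((k_ref - k_pert) / k_pert)


# Illustrative responses only: a component proportional to k has S = 1,
# while a component proportional to k**2 responds more strongly.
s_linear = normalized_sensitivity(lambda k: 3.0 * k, k_ref=2.0)
s_square = normalized_sensitivity(lambda k: k ** 2, k_ref=2.0)
```

Two parameters whose sensitivities have similar magnitudes across all components change the model solution in nearly the same way, which is the basis for treating them as correlated in the discussion that follows.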

Similarities in the absolute magnitudes of S_P, S_Z, S_A, S_N, and S_PP (Table 2) for w_Z (the Z loss rate) and g (the Z assimilation efficiency) suggest that these parameters are highly correlated. Similarly, the S_C for L are about twice those for g (the maximum grazing rate), indicating that these two parameters are also correlated. These two pairs of zooplankton-related parameters are distinct from the iron-related parameters which also appear to be highly correlated: k_Fe, Fe_min, and m_Fe. Two other correlated parameter triplets include P_M, a, and d_c, as well as k_A, r_D, and w_D. Parameter w_P is relatively independent of the other model parameters.

Table 2. Results of model sensitivity analysis. Sensitivities (S_C) to parameters are computed according to Eq. (2). Dashes indicate model insensitivity (|S_C| < 0.15) to parameter changes.

Parameter                        S_P     S_Z     S_A     S_N     S_PP
g (max. grazing rate)            -0.4    -0.3    --      -0.3    -0.4
L                                -0.8    -0.5    0.2     -0.5    -0.7
w_Z (Z loss rate)                0.5     -0.6    --      0.3     0.4
g (Z assimilation efficiency)    -0.4    0.7     --      -0.3    -0.4
k_Fe                             --      -0.9    -0.7    --      -0.5
Fe_min                           --      0.7     0.6     --      0.3
m_Fe                             --      -0.3    -0.2    --      --
P_M                              --      1.0     --      --      0.5
a                                --      0.8     --      --      0.4
d_c                              --      -0.6    --      --      -0.3
w_P                              --      -1.0    --      --      --
k_A                              --      --      1.0     --      --
r_D                              --      --      0.2     --      --
w_D                              --      --      -0.2    --      --
w_Z (Z sinking rate)             --      --      --      --      --
b                                --      --      --      --      --

Based on this sensitivity analysis, the set of model control variables is chosen to include w_P, as well as one parameter from each of the five correlated parameter pairs/triplets discussed above: w_P, P_M, k_Fe, w_Z, g and r_D. The specific parameter choice (from the two or three included in the pair/triplet) is generally based on the sensitivity of the cost function to each parameter. Since the value of k_Fe typically affects the value of the cost function much more than either Fe_min or m_Fe, the set of control variables includes k_Fe.

An a posteriori error analysis indicates that the assimilation results are not sensitive to the specific choice of these six parameters. For example, similar results are obtained if the model control variables P_M and r_D are replaced with a and k_A, respectively. Similarly, the mean error associated with the set of six recovered control variables is independent of whether the Ivlev grazing parameter (L) or the maximum grazing rate (g) is chosen as a model control variable; however, if both L and g are included as control variables, the mean error in the (seven) model control variables increases by more than an order of magnitude. As discussed above, it is the correlation of L and g that causes this increased error. Only independent model parameters can be successfully recovered in a least-squares type of analysis.

Values of the control variables vary over many orders of magnitude (Table 3). To avoid precision problems with the data assimilative model, these values are scaled to give the control variables more uniform magnitudes. The control variables are non-dimensionalized as: P_M = c_1 P'_M, w_Z = c_2 w'_Z, k_Fe = c_3 k'_Fe, w_P = c_4 w'_P, g = c_5 g', r_D = c_6 r'_D, where c_i are constant scaling factors (Table 3).
Scaled control variables with values of 1.0 represent the best estimates available from experimental studies; however, these best estimates are associated with relatively high levels of uncertainty. To obtain growth and grazing rates in agreement with those of Landry et al. (1995) and Verity et al. (1996) (Table 6 of FH01), values of P'_M and k'_Fe could be anywhere within the range 1.0 ± 0.3, and that of g' could be within the range 1.0 ± 0.5 (Table 3). Similarly, non-negative nitrate concentrations and f-ratios in agreement with those of McCarthy et al. (1996) are obtained if r'_D is in the range 1.0 ± 0.5. Estimates of w'_P and w'_Z are nearly nonexistent, yet the model is quite sensitive to choices for these parameters; e.g., the model fails if the value of w'_Z exceeds 1.3. As discussed in more detail below, these uncertainty ranges (Table 3) are used to randomly select a priori values for the model control variables.

Table 3. Scaled control variables and scaling factors.

Scaled control   A priori range of values        Scaling   Value of
variable         for scaled control variables    factor    scaling factor
P'_M             1.0 ± 0.3                       c_1       14 d^-1
w'_Z             1.0 ± 0.3                       c_2       0.75 d^-1
k'_Fe            1.0 ± 0.3                       c_3       0.034 mmol Fe m^-3
w'_P             1.0 ± 0.3                       c_4       0.45 d^-1
g'               1.0 ± 0.5                       c_5       29 d^-1
r'_D             1.0 ± 0.5                       c_6       1.0 d^-1

There is no guarantee that the adjoint method will always identify the global minimum of the cost function; it is possible that different relative minima may be identified, depending on the initial a priori values assigned to the model control variables. Therefore, in order to test the sensitivity of the assimilation results to the specific a priori values initially assigned to the model control variables, a uniform random number generator is used to select fifty distinct a priori estimates for each of the six model control variables such that they are constrained to fall within the uncertainty ranges of Table 3. If the same set of optimal parameter values is returned in each of the fifty assimilation runs, it is highly likely that the global minimum of the cost function has been located.

c. Numerical twin experiments

In a numerical twin experiment, a model is initially run in order to provide a “true” simulated time series (Ĉ(t)), which is subsampled in order to create a model-generated synthetic data set. The model is then run a second time using an imperfect set of model control variables, to generate a “reference” (no assimilation) time series (r_c(t)). This same imperfect parameter set is used in the third and final model run, but this time the synthetic data are assimilated into the model in order to generate a new time series (C(t)). The success of the twin experiment is judged by the degree of agreement between these results and the true simulation.
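The three runs of a twin experiment can be sketched with a toy example in Python. Everything in the sketch is an assumption made for illustration: the model is a one-variable logistic growth equation with a single scaled control variable `c` (true value 1.0), and a derivative-free golden-section search over the a priori range stands in for the adjoint model and quasi-Newton optimization used in this study.

```python
import math


def run_model(c, n_steps=30, x0=0.1, rate=0.3):
    """Toy logistic model; c is a scaled control variable (true value 1.0)."""
    x = x0
    series = [x]
    for _ in range(n_steps):
        x = x + c * rate * x * (1.0 - x)
        series.append(x)
    return series


def cost(c, data):
    """Half the sum of squared model-data misfits at the sampled time steps."""
    model = run_model(c)
    return 0.5 * sum((model[j] - value) ** 2 for j, value in data.items())


def golden_section_minimize(func, lo, hi, tol=1e-8):
    """Derivative-free 1-D minimization (a stand-in for the adjoint method)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c1 = b - inv_phi * (b - a)
        c2 = a + inv_phi * (b - a)
        if func(c1) < func(c2):
            b = c2   # minimum lies in [a, c2]
        else:
            a = c1   # minimum lies in [c1, b]
    return 0.5 * (a + b)


# Run 1: "true" simulation, subsampled every other step as synthetic data
true_series = run_model(1.0)
data = {j: true_series[j] for j in range(0, 31, 2)}

# Run 2: reference (no assimilation) simulation with an imperfect value
c_imperfect = 1.3
reference = run_model(c_imperfect)

# Run 3: assimilation, here recovering c by minimizing the cost function
c_recovered = golden_section_minimize(lambda c: cost(c, data), 0.5, 1.5)
assimilated = run_model(c_recovered)
```

Because the synthetic data are noise-free and generated by the same model, the search should return a value very close to 1.0; success is then judged by comparing `assimilated` and `reference` against `true_series`.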
In this analysis, numerical twin experiments are performed (i) to compare results obtained using data insertion with those obtained via the adjoint method, (ii) to determine the impact of assimilating data containing known levels of random noise and biases, and (iii) to examine whether the sampling strategies of the U.S. JGOFS EqPac experiment (Murray et al., 1994, 1995) and the remotely-sensed ocean color data available from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS) (McClain et al., 1998) are adequate for parameter estimation studies.

Table 4. Descriptions of data sets used in numerical twin experiments.

Data set   Cruise data                                   Satellite data
DSI        PP, P, Z, A, N every other day for 20 days    none
           starting on YD276*
DSII       none                                          P every other day for six months
                                                         starting on YD1
DSIII      PP, P, Z, A, N on YD87 and YD89               P every other day for six months
                                                         starting on YD1

*The notation YD276 refers to the 276th day of 1992.

To accomplish these goals, three synthetic data sets, subsampled using different sampling strategies, are assimilated. Data Set I (DSI; Table 4) is obtained by subsampling the true simulation using a sampling strategy characteristic of the JGOFS EqPac Time Series II cruise: PP, P, Z, A, and N are sampled every other day for 20 days beginning on YD276. Data Set II (DSII) is obtained by subsampling the true P simulation every other day for 6 months beginning on YD1, and is typical of the data available from SeaWiFS. As a result of cloud cover and satellite path, ocean color data within a 10° longitude and 2° latitude region centered on 0N, 140W, are typically available 50% of the time. Data Set III combines Data Set II with PP, P, Z, A, and N cruise data from two days in 1992.

Because biological and chemical measurements may contain large uncertainties, it is crucial to quantify the effect of assimilating model-generated data with known levels of noise. As a first approximation, instrument imprecision is accounted for by adding various levels of uniform random noise to the synthetic data sets. For example, a time series with 20% random noise is created by randomly generating a time series of numbers within the range [-0.2, 0.2], multiplying this by the original (true) time series, and adding the resulting time series to the original time series. A data set associated with 20% random noise thus represents data to which a maximum of 20% noise has been added. The deleterious effects of assimilating biased data are also investigated. A data set with 20% bias is created by multiplying the true simulated time series by a constant factor of 1.2.
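The construction of noisy and biased synthetic data described above can be sketched as follows. The function names and the example series are hypothetical; the noise is drawn from Python's standard uniform random number generator, seeded here only so that the example is reproducible.

```python
import random


def add_random_noise(series, level, rng):
    """Add uniform random noise of at most `level` (e.g. 0.2 for 20%).

    Each value is perturbed by value * e, with e drawn uniformly from
    [-level, level], as in the procedure described in the text.
    """
    return [value + value * rng.uniform(-level, level) for value in series]


def add_bias(series, level):
    """Apply a constant multiplicative bias (level=0.2 multiplies by 1.2)."""
    return [value * (1.0 + level) for value in series]


true_series = [1.0, 2.0, 4.0]   # hypothetical "true" synthetic data
rng = random.Random(0)          # fixed seed, for reproducibility only
noisy = add_random_noise(true_series, 0.2, rng)
biased = add_bias(true_series, 0.2)
```

Note that random noise of this kind averages toward zero over a long series, whereas a bias shifts every sample in the same direction.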
To quantify the success of a particular numerical experiment, two measures of misfit are calculated. First, the variable R_c is introduced to quantify the original disagreement that exists prior to the assimilation process; i.e., the misfit between the true simulation of component C computed using values of 1.0 for each of the scaled model control variables (Ĉ(t)), and the reference simulation generated using a randomly selected (from the a priori ranges of values listed in Table 3) imperfect set of control variables (r_c(t)):

$$ R_c(t) = \frac{\left| \hat{C}(t) - r_c(t) \right|}{\hat{C}(t)} . \tag{3} $$

Time-averaged (over the 730 day model run) values of this a priori misfit are defined: ⟨R_c⟩ = ⟨|Ĉ(t) - r_c(t)|⟩/⟨Ĉ⟩. Because ⟨Ĉ⟩ is always nonzero, this particular definition ensures that ⟨R_c⟩ remains finite. The mean a priori misfits for the fifty reference simulations are: ⟨R_PP⟩ = 0.15, ⟨R_P⟩ = 0.15, ⟨R_Z⟩ = 0.27, ⟨R_A⟩ = 0.25, and ⟨R_N⟩ = 0.20.

A second measure of misfit, C_c(t), is defined as the misfit that remains after the assimilation experiment is complete, i.e. the misfit between the true simulation (Ĉ(t)) and the simulated time series obtained from an assimilation experiment (C(t)):

$$ C_c(t) = \frac{\left| \hat{C}(t) - C(t) \right|}{\hat{C}(t)} . \tag{4} $$

The time-averaged value of the post-assimilation misfit is defined: ⟨C_c⟩ = ⟨|Ĉ - C|⟩/⟨Ĉ⟩. Note that although C_c(t) may approach infinity if Ĉ(t) approaches zero, ⟨C_c⟩ remains finite since ⟨Ĉ⟩ is always nonzero.
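Both misfit measures share the same form, so a single pair of helper functions can sketch Eqs. (3) and (4) and their time averages. The function names and the two short example series are assumptions made for illustration.

```python
def misfit_series(true_series, other_series):
    """Pointwise misfit |C_true(t) - C_other(t)| / C_true(t), as in Eqs. (3)-(4).

    This diverges wherever the true series approaches zero.
    """
    return [abs(t - o) / t for t, o in zip(true_series, other_series)]


def mean_misfit(true_series, other_series):
    """Time-averaged misfit <|C_true - C_other|> / <C_true>.

    Dividing the mean absolute difference by the mean of the true series
    keeps the measure finite as long as that mean is nonzero.
    """
    n = len(true_series)
    mean_abs_diff = sum(abs(t - o) for t, o in zip(true_series, other_series)) / n
    mean_true = sum(true_series) / n
    return mean_abs_diff / mean_true


true_series = [1.0, 3.0]   # hypothetical true simulation
reference = [2.0, 3.0]     # hypothetical reference (no assimilation) run
pointwise = misfit_series(true_series, reference)
averaged = mean_misfit(true_series, reference)
```

Passing a reference run as the second argument gives the a priori misfit, while passing an assimilation run gives the post-assimilation misfit.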

d. Reference simulations (r_c(t))

Fifty reference simulations for phytoplankton ($r_P(t)$), zooplankton ($r_Z(t)$), ammonium ($r_A(t)$), nitrate ($r_N(t)$) and primary production ($r_{PP}(t)$) (Fig. 2) are generated using fifty distinct sets of a priori parameter estimates (randomly chosen such that they are constrained to fall within the uncertainty ranges of Table 3). These simulations illustrate how large differences in plankton and nutrient concentrations can result from relatively small changes in parameter values. Even if relatively conservative estimates of parameter uncertainties are made (Table 3), model components often cannot be constrained to within a factor of 5 or more. For example, nitrate concentration at the end of 1993 could range anywhere from 5 mmol N m$^{-3}$ to 40 mmol N m$^{-3}$ (Fig. 2), depending on the various reasonable choices made for the model parameters. Although generally the fifty reference simulations are within ±30–50% of the true simulation, during the time periods when zooplankton and ammonium reach particularly low concentrations, zooplankton and ammonium can differ from those of the true simulation by more than 100%.

The characteristics of the nitrate simulations are very different from those of PP, P, Z, and A: whereas the envelopes for the plankton and ammonium simulations appear to remain tight and nearly constant in time, the envelope of the 50 nitrate simulations broadens over time. This is because plankton and ammonium concentrations are dominated by biological processes which have very short $O(1\ \mathrm{d}^{-1})$ time scales, and are characterized by a quick (2–5 day) loss of memory of the initial conditions. Because the Figure 2 time series are averaged over a time period (two weeks) which exceeds the time period of this memory loss (several days), it appears that these model components instantaneously become independent of their initial conditions, even though in actuality this occurs over a time span of several days. Thus, the width of the envelope of the 50 PP, P, Z, and A simulations is not governed by initial conditions, but rather is a function of the ranges of parameter values used for the fifty simulations. On the contrary, nitrate concentration is primarily determined by physical advective processes which have much longer time scales. As a result, nitrate remains dependent on initial conditions throughout the entire two-year simulation, causing the envelope of 50 simulations to gradually expand through time.
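The contrast between fast and slow time scales can be illustrated with a deliberately simple sketch. It assumes linear relaxation toward an equilibrium, $dx/dt = -k(x - x_{eq})$, which is not the model used here, and the 365-day slow time scale is a hypothetical stand-in for an advective scale. Under that assumption, a difference in initial conditions decays as $e^{-kt}$.

```python
import math


def remaining_memory(rate_per_day, days):
    """Fraction of an initial-condition difference left after `days`,
    assuming linear relaxation dx/dt = -rate * (x - x_eq)."""
    return math.exp(-rate_per_day * days)


# Fast component: O(1 d^-1) biological rate, as quoted in the text.
fast_after_5_days = remaining_memory(1.0, 5.0)

# Slow component: hypothetical 365-day time scale (an assumption for illustration).
slow_after_5_days = remaining_memory(1.0 / 365.0, 5.0)
slow_after_2_years = remaining_memory(1.0 / 365.0, 730.0)

print(fast_after_5_days, slow_after_5_days, slow_after_2_years)
```

In this toy setting the fast component retains well under 1% of its initial-condition difference after five days, while the slow component still retains a noticeable fraction after two years, which is the qualitative behavior described for nitrate.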

3. Results

a. Comparison of assimilation schemes

i. Data insertion. Data insertion experiments are performed in which Data Sets I and II (Table 4) are inserted into the ecosystem model. The insertion of Data Set I (synthetic cruise data) results in time series of $C_c(t)$ which are identical to those of $R_c(t)$ until the day on which the first data are inserted, YD276 (Fig. 3). On this date, and on the other nine days when data are inserted, $C_c(t)$ drops to zero as expected; however, on alternate days when data are unavailable (YD277, YD279, ...) values of $C_c$ are large. Because the synthetic data are derived using parameter values that may be quite different from those used to run the model, the insertion of data may cause instabilities in the model. As a result, it is possible for values of $C_c$ computed after the insertion of DSI (e.g. $C_P$ and $C_A$ on YD277, YD279) to exceed the corresponding values of $R_c$ computed without data assimilation.
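The behavior described above can be mimicked with a toy sketch. It assumes a single fast-relaxing variable obeying $dx/dt = p - mx$ with $m = 1\ \mathrm{d}^{-1}$, a "true" parameter $p = 1.0$ and an incorrect model parameter $p = 1.5$. All of these choices, and the function name `run`, are hypothetical and are not the five-component model of this paper. True values are inserted every other day for 20 days.

```python
def run(p, m=1.0, x0=1.0, days=40, dt=0.1, insert=None):
    """Euler-integrate dx/dt = p - m*x and return the value at each whole day.

    `insert` maps a day number to a value that overwrites the state on that day
    (direct data insertion).
    """
    insert = insert or {}
    steps_per_day = round(1.0 / dt)
    x = x0
    daily = []
    for day in range(days + 1):
        if day in insert:
            x = insert[day]          # reinitialize the model with the observation
        daily.append(x)
        for _ in range(steps_per_day):
            x += dt * (p - m * x)
    return daily


truth = run(p=1.0, x0=1.0)                      # synthetic "data"
no_assim = run(p=1.5, x0=1.5)                   # reference run, wrong parameter
insert_days = range(10, 30, 2)                  # every other day, ten insertions
assim = run(p=1.5, x0=1.5, insert={d: truth[d] for d in insert_days})

misfit = [abs(t - a) / t for t, a in zip(truth, assim)]
ref_misfit = [abs(t - r) / t for t, r in zip(truth, no_assim)]
print(misfit[10], round(misfit[11], 3), round(misfit[35], 4), ref_misfit[35])
```

In this sketch the misfit is zero on insertion days, rebounds to a large value one day later, and is nearly indistinguishable from the no-assimilation misfit a week after the last insertion, because the wrong parameter, rather than the initial condition, controls the solution.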

After all available data have been inserted, values of $C_{PP}$, $C_P$, $C_Z$ and $C_A$ overshoot the corresponding a priori time series ($R_c$), indicating that as soon as data are no longer available, a deterioration in simulation skill quickly results. Only 5–10 days after the insertion is complete, the assimilation time series converge to the corresponding reference (no assimilation) time series. This rapid convergence is due to the insensitivity of the model to the component initial conditions, which in turn results from the short ($O$(1 day)) time scales of the dominant biological processes (e.g. growth, grazing). Thus, the success of assimilation schemes such as data insertion and nudging will depend on time series observations of nutrient concentrations and plankton biomass consistently being available at these time scales ($O$(days)).

Although the DSI insertion results in only a temporary improvement in simulation skill for PP, P, Z, and A, there is a relatively long-term improvement in the simulation skill of nitrate (Fig. 3e). Whereas plankton biomass and ammonium concentration are governed by biological processes with time scales of $O$(1 day), nitrate concentration is largely determined by physical (advective) processes with much longer time scales. The nitrate simulation, therefore, remains dependent on the initial condition for a much longer period of time. The continual reinitialization of the model via data insertion thus provides a significant improvement to the overall simulation skill of nitrate, whereas any improvement for phytoplankton, zooplankton or ammonium is short-lived.

In a second experiment, Data Set II (synthetic ocean color data) is inserted into the FH01 model. As expected, $C_P = 0$ at the specific times when observations are inserted into the model (Fig. 3b); however, on alternate days when data are not available, values of $C_P$ are just as high as the a priori misfits ($R_P$) computed from the non-data assimilative simulation. For the year of 1992, the DSII insertion results in an average reduction in model-data misfit for phytoplankton from 0.06 to 0.02, and little or no improvement in the simulation skill of the remaining model components. In fact, over this time period the simulation skill of nitrate deteriorates from an a priori value of $R_N = 0.25$ to $C_N = 0.29$. Furthermore, as soon as data are no longer inserted, there is a rapid convergence of the assimilation and no-assimilation time series: for example, only seven days after the last ocean color data point is inserted, the post-assimilation misfit ($C_c$) and a priori misfit ($R_c$) are indistinguishable.

Figure 3. Comparison of model-data misfits generated without data assimilation ($R_c(t)$; thick line), with the insertion of DSI (solid line), and with the insertion of DSII (dot-dash line) for (a) primary production, (b) phytoplankton, (c) zooplankton, (d) ammonium, and (e) nitrate. Specific times of DSI insertion are denoted by x; DSII insertion occurs every other day throughout 1992. Note the change in y-axis scale in (b).

ii. Variational adjoint method. When DSI (synthetic cruise data) is assimilated using the adjoint method, the exact values of the model control variables are perfectly recovered.

Figure 4. (a) Value of the cost function as a function of iteration number for the adjoint assimilation of DSI, DSII and DSIII, and values of the scaled model control variables as a function of the number of iterations for the assimilation of DSI (b), DSII (c), and DSIII (d).

The cost function is reduced from $O(10^{3})$ to $O(10^{-8})$ in roughly 25 iterations (Fig. 4a), and the six model control variables are recovered nearly simultaneously (Fig. 4b). Forty-nine additional twin experiments are conducted in which the scaled model control variables are randomly assigned different initial values within their respective a priori ranges (Table 3). In each case the same values of the control variables are recovered, indicating that the global minimum of the cost function has been identified. If the a priori ranges of the model control variables are tripled, perfect parameter recoveries are still possible; however, the number of iterations required is roughly 50% larger. Similar results are also obtained when fewer data, e.g. 4 days of PP, P, Z, A, and N cruise data, are assimilated. When even fewer data are assimilated, multiple values for the model control variables are recovered, and it becomes increasingly difficult to identify the absolute minimum of the cost function.

The assimilation of DSII (synthetic ocean color data) results in the perfect recovery of five of the six control variables in all fifty numerical experiments; however, the value of the recycling parameter ($r_D$) always remains unchanged (Fig. 4c). Because primary production and plankton concentrations are independent of the value assigned to $r_D$, the assimilation of a data set that does not include nutrient concentrations cannot recover any information on $r_D$. As a result, these assimilation experiments result in simulated time series of PP, P, and Z that exactly reproduce the data, whereas the simulated time series of ammonium and nitrate, which depend heavily on the value assigned to $r_D$, produce misfits that show little improvement over their pre-assimilation values (Fig. 5).

Figure 5. Time series of (a) $C_{PP}(t)$, (b) $C_P(t)$, (c) $C_Z(t)$, (d) $C_A(t)$, and (e) $C_N(t)$ averaged over 14 days, generated using parameter sets recovered via the adjoint assimilation of DSII. Results are shown for 50 different a priori estimates of the control variables. In (a), (b) and (c), misfits are always equal to zero. Shaded regions indicate plus and minus one standard deviation of the mean of the fifty reference (no assimilation) time series, $R_c(t)$.

The assimilation of DSIII, which supplements DSII with two days of PP, P, Z, A, and N cruise data, results in the perfect recovery of all six control variables (Fig. 4d), and hence $C_{PP}(t) = C_P(t) = C_Z(t) = C_A(t) = C_N(t) = 0$. Further experimentation reveals that perfect parameter recoveries are also possible even if only 3 months, rather than six months, of phytoplankton data are supplemented with two days of cruise data.
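The logic of an identical twin experiment with adjoint-based parameter recovery can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the FH01 model or the optimization routine used in this study: it assumes a scalar model $dx/dt = p\,g(t) - m x$ stepped with forward Euler, a single control variable $p$, a least-squares cost function, and a secant iteration on the gradient in place of whatever minimizer the adjoint code actually calls. All function names and numerical values are hypothetical. The gradient $dJ/dp$ is obtained from one backward (discrete adjoint) sweep.

```python
import math


def forward(p, forcing, x0=0.5, dt=0.1, m=0.5):
    """Euler-integrate dx/dt = p*g(t) - m*x; returns states x_0 ... x_N."""
    xs = [x0]
    for g in forcing:
        xs.append(xs[-1] + dt * (p * g - m * xs[-1]))
    return xs


def cost_and_gradient(p, forcing, obs, x0=0.5, dt=0.1, m=0.5):
    """Cost J = 0.5 * sum of squared model-data misfits, plus dJ/dp
    computed with a single backward sweep of the discrete adjoint."""
    xs = forward(p, forcing, x0, dt, m)
    n_steps = len(forcing)
    resid = [0.0] * (n_steps + 1)
    for n, d in obs.items():
        resid[n] = xs[n] - d
    cost = 0.5 * sum(r * r for r in resid)
    lam, grad = 0.0, 0.0
    for n in range(n_steps, 0, -1):
        lam = resid[n] + (1.0 - dt * m) * lam   # adjoint variable at step n
        grad += lam * dt * forcing[n - 1]       # dependence of step n-1 -> n on p
    return cost, grad


def recover(p_first, forcing, obs, tol=1e-12, max_iter=50):
    """Drive dJ/dp to zero with a secant iteration from an a priori guess."""
    p_prev = p_first
    _, g_prev = cost_and_gradient(p_prev, forcing, obs)
    p = 1.1 * p_first
    for _ in range(max_iter):
        _, g = cost_and_gradient(p, forcing, obs)
        if abs(g) < tol or g == g_prev:
            break
        p, p_prev, g_prev = p - g * (p - p_prev) / (g - g_prev), p, g
    return p


# Twin experiment: synthetic data come from the model itself with p_true = 1.0.
forcing = [1.0 + 0.5 * math.sin(2.0 * math.pi * 0.1 * n / 10.0) for n in range(200)]
truth = forward(1.0, forcing)
obs = {n: truth[n] for n in range(20, 201, 20)}   # one observation every other day
recovered = [recover(p0, forcing, obs) for p0 in (0.5, 0.8, 1.3, 2.0)]
print(recovered)
```

Because the synthetic data are exactly consistent with the toy model, every a priori guess leads back to the same parameter value, which is the toy analogue of the perfect recoveries reported above. The sketch has a single control variable and a cost function that is quadratic in it, so it cannot exhibit the multiple minima or unrecoverable parameters discussed in the text.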

Table 5. Summary of variation in scaled control variables recovered from assimilation of DSI with 20% random noise. Values of 1.00 indicate perfect parameter recoveries.

Realization   P'_M          w'_Z          k'_Fe         w'_P          g'            r'_D
1             1.01 ± 0.04   0.98 ± 0.01   0.98 ± 0.03   1.11 ± 0.04   0.95 ± 0.02   0.99 ± 0.03
2             0.99 ± 0.04   1.05 ± 0.02   1.00 ± 0.03   0.98 ± 0.05   1.06 ± 0.02   1.04 ± 0.03
3             0.91 ± 0.04   0.99 ± 0.01   1.06 ± 0.03   0.81 ± 0.04   0.96 ± 0.02   0.97 ± 0.02
4             0.99 ± 0.04   0.98 ± 0.01   0.96 ± 0.03   1.00 ± 0.04   0.89 ± 0.02   0.95 ± 0.02
5             0.95 ± 0.04   1.01 ± 0.02   0.99 ± 0.03   1.05 ± 0.04   0.98 ± 0.02   0.93 ± 0.02
6             0.93 ± 0.04   1.01 ± 0.01   1.03 ± 0.03   0.83 ± 0.04   1.00 ± 0.02   1.00 ± 0.03
7             0.90 ± 0.03   1.06 ± 0.02   1.02 ± 0.03   0.85 ± 0.04   1.06 ± 0.03   1.00 ± 0.03
8             1.14 ± 0.06   0.97 ± 0.02   1.01 ± 0.03   1.21 ± 0.05   1.00 ± 0.02   0.96 ± 0.03
9             0.96 ± 0.04   0.95 ± 0.01   1.05 ± 0.03   0.89 ± 0.04   0.98 ± 0.02   0.98 ± 0.03
10            0.81 ± 0.03   1.01 ± 0.02   0.94 ± 0.03   0.86 ± 0.04   1.02 ± 0.03   0.88 ± 0.02
Mean percent error over ten realizations*
              7%            3%            3%            12%           4%            4%

*Mean percent error over ten realizations is computed as $100 \sum_{j=1}^{10} |y_j - 1.0| / 10$, where the $y_j$ are the ten realizations of each scaled control variable.

b. Variational adjoint assimilation of imperfect data

In the numerical twin experiments described thus far, the synthetic data have been perfectly consistent with the model dynamics; however, in reality inconsistencies between data and model arise from observational errors, including misfits due to subsampling spatially variable fields, as well as missing model dynamics. To examine the ramifications of such inconsistencies, numerical experiments are conducted with data containing various levels of random noise and bias.

i. Data with random noise. Twenty percent random noise is added to Data Set I and Data Set III, and the resulting noisy data are assimilated via the adjoint method in separate assimilation experiments. Because this procedure involves random errors, multiple experiments will not produce identical results. Therefore, ten realizations (each consisting of fifty experiments) are performed for both DSI and DSIII, in which different random errors, always with a maximum magnitude of 20%, are added to the synthetic data. For each realization, the same fifty sets of initial values are used for the control variables; however, all fifty experiments always result in the recovery of the same parameter set, indicating that the global minimum of the cost function is successfully identified in each realization.

Recovered values of the scaled control variables differ from those used to generate the true model simulation. The mean difference between the recovered and true values is about 5% for the assimilation of in situ cruise data (Table 5), and 9% for the assimilation of ocean color data (Table 6), suggesting that the long (six months) time series of phytoplankton chlorophyll does not make up for having only two days (as opposed to ten days) of cruise data available for the remaining model components.
The greatest departures between the true and recovered values occur for $P'_M$, $w'_P$, and $k'_{Fe}$. Although the assimilation of DSI yields errors in the recovery of $k'_{Fe}$ of only 3% (Table 5), the assimilation of DSIII results in average errors in $k'_{Fe}$ of 19% (Table 6). The error in the recovery of $P'_M$ is also twice as great for the DSIII assimilation (14% versus 7%), whereas the error in the recovery of $w'_P$ is smaller for the DSIII assimilation (7% versus 12%). An examination of the model sensitivity results (Table 2) reveals that variations in these three parameters have similar effects on the model components; e.g., all significantly affect zooplankton biomass, but have very little impact on phytoplankton abundance. This is an indication that these parameters are probably correlated, resulting in relatively poor parameter recoveries with correspondingly high levels of uncertainty.

Table 6. Summary of variation in scaled control variables recovered from assimilation of DSIII with 20% random noise. Values of 1.00 indicate perfect parameter recoveries.

Realization   P'_M          w'_Z          k'_Fe         w'_P          g'            r'_D
1             1.25 ± 0.09   1.03 ± 0.03   1.18 ± 0.02   1.06 ± 0.11   1.03 ± 0.06   1.00 ± 0.06
2             1.11 ± 0.11   0.91 ± 0.03   1.19 ± 0.19   0.91 ± 0.02   0.95 ± 0.01   0.90 ± 0.004
3             0.88 ± 0.06   0.99 ± 0.04   0.81 ± 0.08   1.05 ± 0.09   1.01 ± 0.05   1.03 ± 0.05
4             1.09 ± 0.15   0.94 ± 0.05   1.09 ± 0.19   0.95 ± 0.12   0.94 ± 0.05   0.97 ± 0.04
5             0.92 ± 0.12   0.96 ± 0.03   0.94 ± 0.13   0.98 ± 0.05   0.98 ± 0.03   1.02 ± 0.06
6             1.09 ± 0.10   1.04 ± 0.02   1.07 ± 0.10   1.03 ± 0.06   1.03 ± 0.01   0.97 ± 0.03
7             1.05 ± 0.13   1.00 ± 0.07   1.18 ± 0.23   0.90 ± 0.13   0.99 ± 0.08   0.97 ± 0.03
8             0.75 ± 0.04   0.99 ± 0.03   0.54 ± 0.02   1.20 ± 0.06   1.02 ± 0.04   1.19 ± 0.04
9             0.78 ± 0.10   1.06 ± 0.08   0.70 ± 0.08   1.09 ± 0.13   1.13 ± 0.09   1.12 ± 0.08
10            1.16 ± 0.01   0.96 ± 0.01   1.20 ± 0.10   0.99 ± 0.02   0.96 ± 0.02   0.98 ± 0.01
Mean percent error over ten realizations*
              14%           4%            19%           7%            4%            6%

*Mean percent error over ten realizations is computed as $100 \sum_{j=1}^{10} |y_j - 1.0| / 10$, where the $y_j$ are the ten realizations of each scaled control variable.
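As a check on how the summary rows of Tables 5 and 6 follow from the footnote formula, the short sketch below applies it to two columns of recovered values copied from those tables. The function name is hypothetical; the formula is the one given in the table footnotes.

```python
def mean_percent_error(values):
    """100 * sum(|y_j - 1.0|) / n for scaled control variables (true value 1.0)."""
    return 100.0 * sum(abs(v - 1.0) for v in values) / len(values)


# P'_M column of Table 5 (DSI, 20% random noise).
p_m_dsi = [1.01, 0.99, 0.91, 0.99, 0.95, 0.93, 0.90, 1.14, 0.96, 0.81]

# k'_Fe column of Table 6 (DSIII, 20% random noise).
k_fe_dsiii = [1.18, 1.19, 0.81, 1.09, 0.94, 1.07, 1.18, 0.54, 0.70, 1.20]

print(round(mean_percent_error(p_m_dsi)))      # 7, matching Table 5
print(round(mean_percent_error(k_fe_dsiii)))   # 19, matching Table 6
```

The same function applied to the six parameter values in a single row gives the per-experiment mean parameter error, since that quantity has the same form with the sum taken over parameters rather than realizations.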

The inexact recovery of parameter values (Tables 5 and 6) leads to time series of $C_c(t)$ that are no longer identically equal to zero. When the noisy version of Data Set I is assimilated (Fig. 6), the average misfits for the ten realizations are computed to be: $\langle C_{PP} \rangle = 0.04$, $\langle C_P \rangle = 0.02$, $\langle C_Z \rangle = 0.07$, $\langle C_A \rangle = 0.13$, and $\langle C_N \rangle = 0.03$. These values are 2–8 times smaller than the corresponding $\langle R_c \rangle$ values obtained for the reference (no assimilation) simulations. Thus, even when noisy observations are assimilated using the adjoint method, simulation skill is still substantially improved.

Although values of $C_{PP}(t)$, $C_P(t)$, and $C_N(t)$ are low (typically less than 0.05; Fig. 6), values of $C_A(t)$ are greater, at times exceeding 1.0. These high values occur because ammonium can undergo large (order of magnitude) changes in concentration on very short time scales ($O$(days); FH01, their Fig. 8). As a result, small changes in event timing can cause large changes in the values of $C_A(t)$. In contrast, the characteristic time scales of chlorophyll and nitrate are longer (FH01), and thus small changes in event timing do not cause large values of $C_P(t)$ or $C_N(t)$.

Experiments with other levels of random noise (0–40%) are also performed for both DSI and DSIII, as well as a combination of Data Sets I and III (Fig. 7a–e). Values of $\langle C_c \rangle$ increase nearly linearly for all model components as the magnitude of random noise added to the synthetic data is increased. Because the ocean color data set (DSIII) contains more plankton data and fewer nutrient data than the cruise data set (DSI), the assimilation of noisy ocean color data results in values of $\langle C_{PP} \rangle$ and $\langle C_P \rangle$ which are 40% lower than those obtained via the assimilation of noisy in situ cruise data, whereas $\langle C_A \rangle$ and $\langle C_N \rangle$ are nearly twice as high when the ocean color data are assimilated in place of the cruise data. Not surprisingly, the highest simulation skill and lowest misfits are obtained if Data Sets I and III are both assimilated simultaneously. In this case $\langle C_{PP} \rangle$ and $\langle C_P \rangle$ are nearly identical to those obtained for the DSIII assimilation case, whereas $\langle C_Z \rangle$, $\langle C_A \rangle$ and $\langle C_N \rangle$ are even lower than those of the DSI assimilation case (Fig. 7a–e).

Figure 6. Time series of (a) $C_{PP}(t)$, (b) $C_P(t)$, (c) $C_Z(t)$, (d) $C_A(t)$, and (e) $C_N(t)$ averaged over 14 days, for simulations generated using parameter sets recovered via the adjoint assimilation of Data Set I to which 20% random noise has been added. Ten random realizations are shown, as described in text. Shading represents plus and minus one standard deviation of the mean of the fifty reference (no assimilation) time series, $R_c(t)$.

When high levels of random noise (>30%) are added to the synthetic data prior to assimilation, multiple minima of the cost function are found; i.e., the same parameter set is not recovered for all 50 initial estimates of the control variables. (Results shown in Figure 7 represent those associated with the most commonly occurring parameter set.) Even though multiple minima are occurring when 40% random noise is added to the assimilated data, a doubling of random noise from 20% to 40% is generally still associated with a doubling in

the magnitudes of each $\langle C_c \rangle$ component. The one exception is the nonlinearity in $\langle C_A \rangle$ for the Data Set III assimilation case, which is most likely due to these multiple minima.

Figure 7. Model-data misfits $\langle C_c \rangle$ obtained via the adjoint assimilation of various imperfect data sets. (a) $\langle C_{PP} \rangle$, (b) $\langle C_P \rangle$, (c) $\langle C_Z \rangle$, (d) $\langle C_A \rangle$, and (e) $\langle C_N \rangle$ are shown as a function of the percent random noise added to the synthetic data: DSI (dotted line), DSIII (dash-dot line), and both DSI and DSIII (solid line). Model-data misfits (f) $\langle C_{PP} \rangle$, (g) $\langle C_P \rangle$, (h) $\langle C_Z \rangle$, (i) $\langle C_A \rangle$, and (j) $\langle C_N \rangle$ are shown as a function of the percent bias added individually to the five components of DSI. Model-data misfits obtained without data assimilation, $\langle R_c \rangle$ (dotted lines), are included for reference.

ii. Biased data. In addition to random errors, observations are typically also associated with a certain degree of bias. Depending on the type of observation (e.g., rate, plankton biomass, or nutrient concentration) and the observational platform from which the data are collected (e.g., satellite or ship), varying levels of bias may be present. In order to

investigate the potential problems associated with assimilating data containing biased errors, numerical experiments are conducted in which +20% and −20% biases are added to the PP component of DSI, while the P, Z, A, and N data are assumed to be error-free. Experiments are also carried out in which these biases are added individually to the other components of DSI (P, Z, A, N), as well as to all five data components simultaneously. In a final experiment, biases are added to the synthetic ocean color observations of DSIII. In each of these cases, fifty numerical experiments are carried out with different initial random estimates for the model control variables. When highly biased data (20%) are assimilated, multiple minima of the cost function can exist; results associated with the most commonly occurring minimum are presented.

Table 7. Summary of variation in scaled control variables recovered from assimilation of data sets with varying levels of bias. Values of 1.00 indicate perfect parameter recoveries.

Bias    Data set   Component        P'_M          w'_Z          k'_Fe         w'_P          g'            r'_D          Mean param. error*
−20%    I          PP               0.70 ± 0.02   1.01 ± 0.01   1.05 ± 0.04   0.55 ± 0.03   1.00 ± 0.02   0.93 ± 0.03   15%
+20%    I          PP               1.27 ± 0.04   0.99 ± 0.01   0.87 ± 0.02   1.45 ± 0.06   1.00 ± 0.02   0.97 ± 0.02   15%
−20%    I          P                1.43 ± 0.07   0.90 ± 0.02   1.00 ± 0.03   1.40 ± 0.09   1.33 ± 0.06   1.08 ± 0.04   22%
+20%    I          P                0.76 ± 0.03   1.07 ± 0.01   0.97 ± 0.03   0.76 ± 0.03   0.79 ± 0.02   0.88 ± 0.02   15%
−20%    I          Z                0.95 ± 0.04   1.08 ± 0.02   0.93 ± 0.03   1.14 ± 0.05   1.12 ± 0.03   0.93 ± 0.03   9%
+20%    I          Z                1.03 ± 0.04   0.94 ± 0.01   1.05 ± 0.03   0.85 ± 0.04   0.91 ± 0.02   1.06 ± 0.03   7%
−20%    I          A                1.18 ± 0.05   0.98 ± 0.02   1.21 ± 0.03   0.97 ± 0.05   1.00 ± 0.03   0.86 ± 0.03   10%
+20%    I          A                0.85 ± 0.03   1.01 ± 0.02   0.83 ± 0.02   0.99 ± 0.04   0.98 ± 0.02   1.15 ± 0.03   9%
−20%    I          N                0.87 ± 0.02   0.98 ± 0.02   0.86 ± 0.02   1.01 ± 0.03   0.97 ± 0.02   0.80 ± 0.02   9%
+20%    I          N                1.10 ± 0.12   1.06 ± 0.03   1.10 ± 0.07   0.98 ± 0.08   1.05 ± 0.03   1.22 ± 0.16   9%
−20%    I          PP, P, Z, A, N   1.02 ± 0.05   0.98 ± 0.02   1.10 ± 0.03   1.00 ± 0.06   1.49 ± 0.07   0.68 ± 0.02   16%
+20%    I          PP, P, Z, A, N   0.96 ± 0.03   1.04 ± 0.01   0.89 ± 0.02   0.99 ± 0.03   0.73 ± 0.01   1.34 ± 0.04   14%
−20%    III        P                1.03 ± 0.12   0.92 ± 0.05   0.82 ± 0.09   1.24 ± 0.12   1.40 ± 0.09   0.97 ± 0.07   16%
+20%    III        P                0.90 ± 0.09   1.06 ± 0.04   1.01 ± 0.11   0.88 ± 0.02   0.77 ± 0.02   1.18 ± 0.05   12%

*Mean parameter error is computed as $(100/6) \sum_{i=1}^{6} |x_i - 1.0|$, where the $x_i$ are the values of the six scaled control variables.

When biases are added to the assimilated data sets, often the model control variables are no longer perfectly recovered. Because of nonlinearities associated with the model equations, a bias of a certain percentage typically does not result in errors in the recovered parameter values of the same magnitude. However, in general the assimilation of positively and negatively biased data tends to alter the values of the recovered parameters by roughly an equal magnitude, but in the opposite direction (Table 7). For example, if the PP component of DSI is biased by −20%, the recovered value of $P'_M$ is 0.70, i.e. a decrease of 30% from the true value of 1.00. An analogous PP bias of +20% results in an increase in

$P'_M$ of nearly the same magnitude, such that the final value of $P'_M$ is 1.27 (Table 7). Although the assimilation of biased PP data results in the imperfect recovery of $P'_M$ and $w'_P$ by 30% and 45% respectively, other parameters, such as those governing zooplankton biomass ($g'$ and $w'_Z$), are successfully recovered. If the biases are added to the P component of DSI instead of to the PP data, not only are $P'_M$ and $w'_P$ poorly recovered, but the value of $g'$ is associated with a 33% error as well. When biases are added to ammonium or nitrate data, $w'_Z$, $w'_P$, and $g'$ are successfully recovered; however, values of $P'_M$, $k'_{Fe}$, and $r'_D$ are in error by 10–20%. If biases are added simultaneously to all five components of DSI, $g'$ and $r'_D$ are associated with errors as large as 30–50%. Overall, the poorest parameter recoveries occur if biases are added to phytoplankton data, suggesting that the simulation results are most highly dependent on this model component. Thus it is crucial for observational programs to attain high quality chlorophyll measurements, since phytoplankton time series are of central importance to attaining accurate model simulations.

The imperfect parameter recoveries resulting from the assimilation of biased data can lead to large model-data misfits. In general, the magnitudes of these misfits are linearly dependent on the magnitude of bias added to the synthetic data (Fig. 7f–j). For example, if a 10% bias is added to the primary production component of DSI and the resulting time series is assimilated along with perfect time series of P, Z, A, and N, the resulting value of $\langle C_{PP} \rangle$ is 0.10. If a 20% bias is assumed and the same assimilation experiment is performed, $\langle C_{PP} \rangle$ doubles in magnitude to 0.20 (Fig. 7f). Generally the greatest values of $\langle C_c \rangle$ occur when a bias exists in model component $c$; however, this is not true for ammonium. The assimilation of biased N data results in a poor recovery of $r_D$, to which the concentration of ammonium is particularly sensitive (Table 2). The assimilation of biased PP data also strongly affects ammonium concentration, since regenerated production (a primary term in the ammonium conservation equations; see the Appendix) is a strong function of PP. Thus biases in PP and N cause just as large a deterioration in the simulation skill of A as do biases in the ammonium data itself.

c. Sampling strategies

Successful parameter recoveries are dependent not only on observational errors, but also on the amount of available data. The degree of spatial and temporal averaging of the data may also affect the success of assimilative model runs. Numerical twin experiments are performed in order to examine the potential magnitude of these effects.

i. Data availability. As discussed earlier, if 20% noise is added to DSI (a 20-day time series) prior to assimilation, imperfect model control variables are recovered, and values of $\langle C_c \rangle$ range between 0.02 and 0.07 (Fig. 7b,d). If, however, only a 6-day (10-day) time series of noisy cruise data is available for assimilation, model-data misfits are significantly higher (Fig. 8a), ranging from $\langle C_P \rangle = 0.06$ (0.03) to $\langle C_A \rangle = 0.13$ (0.09). If longer time series are available for assimilation (e.g. 40 days), the deterioration in simulation skill due to the addition of random noise is substantially reduced, with $\langle C_c \rangle < 0.05$ for all model components. Thus, when random noise is present in the data, the increase in time series length from 6 days to 40 days can provide a large (70%) improvement in simulation skill. A large percentage of this improvement occurs if 10 days rather than 6 days of data are assimilated. This is because of the strong 6–8 day variability of the equatorially trapped internal gravity waves (FH01). Further improvement results if the strong tropical instability wave signal (20–30 day) is resolved by assimilating at least 40 days of data.

If biased errors are individually added to each component of DSI, an increase in the number of days of data that are assimilated produces little change in simulation skill

(Fig. 8b). However, simulation skill can depend on the particular days on which data are assimilated. For example, the relatively large value of $\langle C_A \rangle$ associated with the 10-day time series as compared to the 6-day time series (Fig. 8b) is reduced if data from YD286 to YD296 are assimilated in place of data from YD276 to YD286. This is primarily because of the large peaks in P, Z, and A concentrations occurring between YD282–285 (FH01); in contrast, concentrations during YD286–296 are more stable, resulting in lower assimilation errors.

Figure 8. Model-data misfits, $\langle C_c \rangle$, as a function of the length of the assimilated synthetic cruise data time series. In (a) $\langle C_c \rangle$ is computed after 20% random noise is added simultaneously to PP, P, Z, A, and N, and in (b) each $\langle C_c \rangle$ is computed after 20% bias is added only to component $c$ of the synthetic cruise data time series.

A similar set of twin experiments is also conducted for ocean color data (Fig. 9). If no random noise is added to the synthetic data, a 90-day time series supplemented by two days of in situ data provides sufficient information such that the control variables are recovered exactly; however, if 20% random noise is present in the 90-day assimilated time series, imperfect values of the control variables are recovered, resulting in model-data misfits

ranging between $\langle C_P \rangle = 0.02$ and $\langle C_A \rangle = 0.17$ (Fig. 9a). As the assimilated time series is lengthened, values of $\langle C_c \rangle$ decrease. Because there is significant energy in the temperature forcing fields at a period of 10–12 months, a dramatic decrease in model-data misfit results if a year of ocean color data are assimilated rather than six months. There is no additional improvement in simulation skill if more than two years of data are assimilated.

Figure 9. Model-data misfits, $\langle C_c \rangle$, as a function of the length of the assimilated synthetic satellite ocean color time series. In (a) 20% random noise is added, and in (b) 20% bias is added to the synthetic data.

Table 8. Average misfits ($\langle C_c \rangle \times 100$) as a function of cruise length and sampling strategy of DSI. Results represent means and standard errors of ten realizations assuming a random noise level of 20%.

Length of in situ
cruise data set   Sampling strategy                               <C_PP>      <C_P>       <C_Z>       <C_A>       <C_N>
20 days           PP, P, Z, A, N every other day                  4.1 ± 0.4   3.2 ± 0.2   5.7 ± 0.4   6.7 ± 0.2   3.5 ± 0.3
20 days           PP, P, Z, A, N every day                        2.8 ± 0.2   2.4 ± 0.2   3.4 ± 0.3   5.1 ± 0.3   2.2 ± 0.2
10 days           PP, P, Z, A, N every day                        3.5 ± 0.3   3.2 ± 0.3   4.6 ± 0.4   7.0 ± 0.3   3.3 ± 0.3
20 days           PP, P, Z every other day; A, N every day        3.7 ± 0.3   3.0 ± 0.2   5.1 ± 0.4   5.7 ± 0.2   2.4 ± 0.2

In contrast, if biased ocean color data are assimilated, there is no improvement in simulation skill as longer and longer time series become available (Fig. 9b). If the assimilated time series is lengthened from three months to three years, model-data misfits for primary production and zooplankton more than double and misfits for ammonium increase by 30%. The simulation skill of phytoplankton is nearly independent of how many days of data are assimilated, and is always poorer than the a priori (nonassimilative) case

($\langle R_P \rangle = 0.15$). Although model-data misfits for nitrate are slightly reduced as more biased ocean color data are assimilated, in general assimilating more biased data does not necessarily improve simulation skill, and may even increase model-data misfits.

Numerical twin experiments are also used to examine the effects of altering the frequency at which data are assimilated. For instance, if cruise data are assimilated every day as opposed to every other day, simulation skill is improved by 25%–40% (Table 8). If the total number of observations is held constant, but data are available every day for 10 days as opposed to every other day for 20 days, there is no significant change in simulation skill, with the possible exception of zooplankton (4.6 ± 0.4 versus 5.7 ± 0.4). During the JGOFS EqPac experiment, nutrient data were generally collected at least once a day, whereas primary production and plankton data were obtained once every other day. This higher frequency of nutrient data collection improves the simulation skill of A (15%) and N (31%) as would be expected; however, this increased nutrient sampling frequency does not significantly affect the simulation skill of PP, P, or Z (Table 8).

A combination of satellite position and cloud cover causes SeaWiFS ocean color data in the central equatorial Pacific to be available about 50% of the time; i.e., approximately every other day. Twin experiments demonstrate that assimilating ocean color data every day, as opposed to every other day, does not reduce model-data misfit; however, if data are assimilated every 4 days instead of every 2 days, a large deterioration in simulation skill occurs for most model components (Fig. 10).

Figure 10. Model-data misfits, $\langle C_c \rangle$, as a function of the frequency for which the assimilated satellite ocean color data are available. In each case 20% random noise has been added to a two-year long synthetic data time series.

ii. Assimilating SeaWiFS composites. Ocean color satellite data are typically available as a time series of composites, where each composite is computed as the average of all data falling within a specified spatial and temporal range. SeaWiFS data, for example, are routinely available as 8-day and monthly composites. In order to investigate the effect of assimilating composited data, as opposed to instantaneous (daily) data, four different synthetic ocean color time series are assimilated (each in addition to two days of cruise data): (i) instantaneous data once every 2 days (i.e. DSIII), (ii) instantaneous data once every 8 days, (iii) 8-day composites, and (iv) 4-day composites once every 8 days. Note that time series (ii)–(iv) all contain the same number of observations, whereas four times as many observations are available in DSIII (i).

Assuming neither random noise nor biases are present, the assimilation of instantaneous data every two days or every eight days results in perfect parameter recoveries and $\langle C_c \rangle = 0$ (Fig. 11); however, if composites are assimilated in place of the instantaneous data, $\langle C_c \rangle$ may reach values as high as 0.15 even in the absence of any random noise or bias (Fig. 11d). Because random errors are partially averaged out in the formation of the composited time series, the addition of random noise to the synthetic ocean color data causes a larger deterioration in simulation skill for the instantaneous data time series than for the composited data time series. For example, when random noise is added to the instantaneous data time series and the resulting data set is assimilated, model-data misfits increase substantially: values of $\langle C_A \rangle$ increase sharply from 0.00 to 0.16, and are no longer significantly different from the analogous values of $\langle C_A \rangle$ obtained using the composited data (Fig. 11d). In the presence of even higher noise levels (40%), the composited and

instantaneous assimilation results also become indistinguishable for $\langle C_{PP} \rangle$, $\langle C_Z \rangle$ and $\langle C_N \rangle$. Thus, even though the assimilation of error-free instantaneous data is more successful than the assimilation of error-free composited data, the assimilation of noisy composited data results in nearly as high simulation skill as the assimilation of noisy instantaneous data.

Figure 11. Model-data misfits (a) $\langle C_{PP} \rangle$, (b) $\langle C_P \rangle$, (c) $\langle C_Z \rangle$, (d) $\langle C_A \rangle$, and (e) $\langle C_N \rangle$ obtained via the assimilation of 180-day time series of composited and instantaneous (daily) ocean color data, shown as a function of the percent random noise added to the synthetic data. Model-data misfits obtained without data assimilation, $\langle R_c \rangle$ (dotted lines), are shown for reference.
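The statement that random errors are partially averaged out when composites are formed can be illustrated with a small numerical sketch. It assumes a constant true value and independent noise drawn uniformly within ±20% of that value; both are simplifications chosen for illustration, and the helper name `noisy` is hypothetical. Because the true signal is constant here, the sketch shows only the noise reduction and not the loss of resolved variability that penalizes composites in the error-free case.

```python
import random
import statistics


def noisy(value, max_fraction, rng):
    """Perturb `value` by uniform random noise of at most +/- max_fraction."""
    return value * (1.0 + rng.uniform(-max_fraction, max_fraction))


rng = random.Random(0)   # fixed seed so the sketch is repeatable
truth = 1.0              # constant "true" chlorophyll value (an assumption)

# 2000 instantaneous observations, each with up to 20% random noise.
instantaneous = [noisy(truth, 0.20, rng) for _ in range(2000)]

# 2000 composites, each the mean of 8 noisy daily values.
composites = [
    statistics.mean([noisy(truth, 0.20, rng) for _ in range(8)])
    for _ in range(2000)
]

ratio = statistics.pstdev(composites) / statistics.pstdev(instantaneous)
print(ratio)
```

For independent errors the ratio is expected to be close to $1/\sqrt{8} \approx 0.35$, so an 8-day composite built from daily values carries roughly one third of the random error of a single instantaneous observation under these assumptions.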

4. Discussion and summary

Although useful for many meteorological applications, simple data assimilation schemes such as data insertion may be insufficient for marine ecosystem models. Meteorological models are typically highly dependent on initial conditions, and thus simple assimilation schemes which involve reinitializing the model whenever data become available can be quite successful in weather forecasting. Partially because of the short time scales of the dominant biological processes, lower trophic level marine ecosystem models quickly (within days) become independent of component initial conditions. As a result, in the numerical twin experiments conducted in this analysis, the assimilation and no-assimilation time series of plankton concentrations are indistinguishable only seven days after the last ocean color data point is inserted.

These results are consistent with those of Ishizaka (1990), who found that the insertion of satellite ocean color data into a physical-biological model of the southeastern U.S. continental shelf caused the simulated phytoplankton field to rapidly (within four days after the last assimilated observation) converge to the analogous fields obtained without assimilation. When biological processes were removed from the model, the assimilation and nonassimilation cases still converged within several days, suggesting that this rapid convergence was at least partially due to the short time scales of the physical processes associated with the Gulf Stream system. Similarly, in the present study equatorially-trapped internal gravity waves with periods of 6–8 days (FH01; Wunsch and Gill, 1976) may be effecting the rapid convergence of the assimilation/nonassimilation time series.
The success of data insertion thus depends on biological (plankton) and chemical (nutrient) observations being available at very frequent intervals (O(days)). The advent of remote-sensing ocean color sensors, such as SeaWiFS, now makes it possible to obtain surface chlorophyll data on these time scales. However, numerical experiment results (Fig. 3) indicate that successful simulations will require high frequency data on other model components as well. Currently there are no analogous high frequency measurements of zooplankton available on a global scale, and synoptic time series observations of nutrient concentrations such as ammonium and nitrate are rare.

The most serious disadvantage of applying schemes such as data insertion to marine ecosystem models is that these methods assume adequate knowledge not only of the model parameterizations but also of the model parameters. Ecosystem models are typically characterized by multiple empirical formulations for which parameter values are poorly known and often unmeasurable. Simulations are often highly dependent on mortality rates, but these rates are difficult or impossible to measure and are therefore poorly constrained. If inappropriate values are assigned to these parameters, simulation skill may be poor, even if data are inserted every other day (Fig. 3). Furthermore, it may be difficult to determine whether a failed simulation results from an inappropriate choice of parameter values or from an intrinsic error in model formulation. For example, nudging the Fasham et al. (1990) model toward ocean color observations causes ammonium concentrations to reach unrealistically high values (Armstrong et al., 1995). Although altering the structure of the model to include multiple plankton size classes is one way to eliminate this pathological behavior (Armstrong et al., 1995), it is possible that alternate choices of parameters in the original Fasham et al.
(1990) model may also reduce ammonium concentrations to realistic levels.

Although the variational adjoint method assumes that the model parameterizations are correct, specific values for each of the model parameters do not need to be precisely known, as they do for data insertion. Results of identical twin experiments indicate that the adjoint method, which minimizes model-data misfits by adjusting model parameters, holds much promise for the assimilation of biological data into marine ecosystem models. For example, exact parameter recoveries are possible when assimilating synthetic data subsampled using the resolution of the JGOFS EqPac Time Series cruises (Fig. 4b). Exact parameter recoveries are also possible when synthetic ocean color data are assimilated along with a small amount (2 days) of nutrient, plankton and production data (Fig. 4d). Similarly promising results are obtained when actual cruise and ocean color data at this resolution are assimilated into the FH01 ecosystem model using the variational adjoint method (Friedrichs, 2001). However, because the recovered parameter values may be offsetting either other parameter values that have remained fixed or inaccurate parameterizations, this parameter optimization procedure may be thought of as a 'calibration' of the model rather than an accurate determination of actual biological rate constants. The real utility of these simulations is not necessarily in identifying precise values of specific ecosystem parameters, but rather in knowing that the model has been calibrated to be reasonably consistent with the available data. As a result, we have greater faith in the other results and processes described by the model.

In the identical twin experiments described in this paper, data are generated by the model and thus the model and data are, by definition, entirely consistent.
In reality, however, observational errors in precision and accuracy, as well as deficiencies in the model, may cause simulated time series distributions to be inconsistent with observations. When this inconsistency is artificially reproduced by adding random noise to the synthetic data prior to assimilation, the adjoint method still improves simulation skill (Figs. 6, 7a–e). The presence of biases in the data, however, is more detrimental to the assimilation results (Fig. 7f–j). The addition of 20% bias to the phytoplankton component of the in situ cruise data causes errors of up to 43% in the values of the recovered control variables (Table 7), and high model-data misfits which exceed the a priori values computed without data assimilation (Fig. 7g). In contrast, the addition of 20% random noise to the same data set produces errors in the recovered control variables of only 3–12% (Table 5); furthermore, even when phytoplankton data associated with 40% random noise are assimilated, model-data misfits are still significantly reduced (Fig. 7b).

Biased errors are also particularly troublesome since increasing the duration over which biased data are available generally does not improve simulation skill (Figs. 8b, 9b). In fact, when assimilating biased ocean color data, increasing time series length from 6 months to 3 years causes a significant increase in model-data misfit (Fig. 9b). In contrast, when noisy data are assimilated, increasing the duration of the data time series improves simulation skill. Specifically, the largest improvement occurs if the cruise data time series is increased from 6 days to 10 days (Fig. 8a), or the available satellite ocean color data time series is increased from 6 months to one year (Fig. 9a).
This is because the time series of chlorophyll concentration has significant energy at periods of 6–8 days and of 6–12 months. In order to resolve high frequency processes such as equatorially trapped internal gravity waves (FH01) and seasonal to annual variability, at least 10 days of cruise data and at least one year of ocean color data must be assimilated, respectively.

During the JGOFS EqPac cruises, plankton data were collected every other day for roughly 20 days; however, numerical twin experiments demonstrate that assimilating 10 days of data collected every day instead of 20 days of data collected every other day does not affect simulation skill (Table 8). Therefore, if it is possible to collect data on a daily basis as opposed to every other day, the length (and expense) of a cruise could be reduced or even halved. In contrast, there appears to be no additional reduction in model-data misfit if ocean color data are assimilated every day as opposed to every other day. There is, however, a significant deterioration in simulation skill if data are available only every 4 days (Fig. 10). Thus, although typical cloud cover in the equatorial Pacific may not affect assimilation results, unusually cloudy weather associated with El Niño conditions may cause remotely-sensed ocean color data to be unavailable for days at a time, and hence simulation skill may be substantially reduced.

SeaWiFS data are typically available as 8-day and monthly composites; however, because the dominant time scales of marine ecosystems are typically on the order of days, the formation of these ocean color composites may produce time series that are inconsistent with biological observations. For example, a spectral analysis of the synthetic ocean color data reveals a spike in energy corresponding to a period of approximately 6 days, associated with equatorially trapped internal gravity waves (FH01).
In the formation of an 8-day data composite, this energy spike is smeared out, causing an inconsistency between the model and composited data. As a result, the assimilation of these temporally averaged data leads to relatively poor parameter recoveries. In contrast, 4-day composites at least partially resolve the internal gravity waves, and model-data misfits obtained after assimilating 4-day composite time series are typically less than half of those obtained for the 8-day composite assimilation (Fig. 11). If random noise is present in the data, however, the formation of the data composite will average out much of this noise, resulting in similar results for the assimilation of both the composited and instantaneous data (Fig. 11). Because real data are typically associated with significant random noise, it is anticipated that when actual observations are used (Friedrichs, 2001), the assimilation of composited data will be as successful as the assimilation of instantaneous data.
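
The smearing of the 6-day signal by compositing can be checked with a short calculation. The sketch below assumes an idealized sinusoid with a 6-day period and a simple boxcar average over the compositing window; the function names are illustrative and not part of the model. Averaging a sinusoid of period T over a window of length L reduces its amplitude by the factor |sin(πL/T)/(πL/T)|, while uncorrelated noise in n averaged values is reduced by 1/√n.

```python
import math


def boxcar_attenuation(window_days, period_days):
    """Fraction of a sinusoid's amplitude that survives a boxcar average."""
    x = math.pi * window_days / period_days
    return abs(math.sin(x) / x)


def noise_reduction(n_samples):
    """Factor by which averaging n uncorrelated samples reduces the noise std."""
    return 1.0 / math.sqrt(n_samples)


PERIOD_DAYS = 6.0  # approximate internal gravity wave period cited in the text

att_4day = boxcar_attenuation(4.0, PERIOD_DAYS)
att_8day = boxcar_attenuation(8.0, PERIOD_DAYS)
print(f"4-day composite keeps {att_4day:.2f} of the 6-day amplitude")
print(f"8-day composite keeps {att_8day:.2f} of the 6-day amplitude")
print(f"noise std factor for 8 daily samples: {noise_reduction(8):.2f}")
```

Under these assumptions the 4-day window retains about twice as much of the 6-day signal as the 8-day window, while both windows suppress uncorrelated noise.
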

5. Future challenges

This analysis demonstrates that the variational adjoint method provides a successful means for combining biological and chemical oceanographic observations with marine ecosystem models in order to improve simulation skill. Through the assimilation of synthetic cruise data based on the JGOFS EqPac experiment as well as synthetic SeaWiFS ocean color data, it is possible to recover parameters governing processes such as grazing, growth, mortality and recycling. However, because high correlations exist between many model parameters, it is typically not possible to recover the entire parameter set with any degree of certainty (Matear, 1995; Prunet et al., 1996a,b; Harmon and Challenor, 1997). Therefore, in this analysis the control variables consist of a subset of the model parameters. This results in the successful recovery of the control variables with low levels of uncertainty. A more robust and objective method would involve nondimensionalizing the model to obtain a parameter set containing a smaller number of uncorrelated parameters, which could then be successfully recovered in its entirety. The feasibility of this approach is currently under investigation.

In the simple formulation of the cost function used in this analysis, there are no limitations on the values that the control variables may attain; i.e., there are no a priori penalties placed on these parameters. Although the initial parameter estimates are always within the uncertainty ranges of Table 3, there is no guarantee that the recovered optimal values will fall within these ranges. In certain experiments when very noisy observations are assimilated, seriously flawed parameter sets with errors exceeding 40% may be recovered. Despite these large errors, in certain cases the simulated time series generated from these flawed parameter sets may still agree reasonably well with the true time series.
In these cases the model is tracking the data by misrepresenting key biological processes; because the amount of data assimilated in these experiments is small relative to the number of degrees of freedom, the model is poorly constrained and it may be possible for the model to reproduce the data without representing the correct dynamical processes. Thus when actual observations are assimilated (Friedrichs, 2001), it is important to be able to define reasonable error bounds on each of the model parameters, and either discard parameter estimates that fall outside these ranges or add penalties to the cost function to eliminate unrealistic parameter estimates.

As biological data sets, including in situ, satellite, and acoustic sources, continue to grow, data assimilative biological-physical models will play an increasingly crucial role in large multidisciplinary oceanographic observational programs. The research described here provides a framework for future global and basin-scale studies of biological data assimilation and predictive biogeochemical modeling; however, there are still many questions which need to be addressed before biological-physical models are routinely data assimilative. For instance, an optimal method for assimilating both physical and biological data simultaneously is still under investigation.
In this paper, the advantages and disadvantages of data insertion and the variational adjoint method were discussed independently; however, it is possible that these methods could be used as complementary tools, where the adjoint could first be used to optimize the parameter values, and then nudging could be used to bring the simulation into even closer agreement with the data, if desired. Although the adjoint method is shown to be a very powerful technique for improving simulation skill through the combined use of data and models, it is not yet clear whether it is feasible to apply this method to a fully three-dimensional coupled physical-biological model. Work toward this goal is currently underway.
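
One way to add the a priori penalties discussed in this section is to augment a weighted least-squares misfit with a term that grows when a control variable leaves its assumed range. The sketch below is a generic illustration and not the cost function used in this analysis; the function names, the quadratic form of the penalty, the weights, and the bounds are all hypothetical.

```python
def misfit(model, data, weights):
    """Weighted sum of squared model-data differences."""
    return sum(w * (m - d) ** 2 for m, d, w in zip(model, data, weights))


def range_penalty(params, lower, upper, strength):
    """Quadratic penalty that is zero while each parameter stays in its range."""
    total = 0.0
    for p, lo, hi in zip(params, lower, upper):
        width = hi - lo
        if p < lo:
            total += strength * ((lo - p) / width) ** 2
        elif p > hi:
            total += strength * ((p - hi) / width) ** 2
    return total


def penalized_cost(model, data, weights, params, lower, upper, strength=10.0):
    """Misfit plus the range penalty; 'strength' is an assumed tuning constant."""
    return misfit(model, data, weights) + range_penalty(params, lower, upper, strength)


# Hypothetical numbers: two observations and one control variable bounded by [0, 1].
model_values = [1.0, 2.0]
observations = [1.1, 1.9]
obs_weights = [1.0, 1.0]

cost_inside = penalized_cost(model_values, observations, obs_weights, [0.5], [0.0], [1.0])
cost_outside = penalized_cost(model_values, observations, obs_weights, [1.5], [0.0], [1.0])
print(cost_inside, cost_outside)
```

Because the penalty vanishes inside the range, it leaves well-behaved recoveries untouched and only steers the optimization away from parameter sets that fit the data for the wrong reasons.
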

Acknowledgments. I would like to thank Eileen Hofmann for the many helpful discussions and suggestions she provided throughout this work. The insightful comments of Paola Malanotte-Rizzoli and an anonymous reviewer are also greatly appreciated. The TAO Project Office, Michael J. McPhaden, Director, is thanked for the use of the extensive data available from their mooring array. Support for this work was provided by the National Aeronautics and Space Administration under grants NAGW-3550 and NCC-5-258. Computer resources and facilities were provided by the Center for Coastal Physical Oceanography at Old Dominion University. This is U.S. JGOFS Contribution #699.

APPENDIX

Ecosystem model equations

Each of the five model components (phytoplankton ($P$), zooplankton ($Z$), ammonium ($A$), nitrate ($N$), and detritus ($D$)) satisfies a general conservation equation of the form:

\[
\frac{\partial \overline{C}(t)}{\partial t} = F_C(t) + B_C(t) - \frac{C(t)\big|_{z=-E}}{E(t)} \, \frac{d(-E(t))}{dt} \tag{A1}
\]

where the overbar notation indicates the vertical average of a quantity from the depth of the bottom of the euphotic zone (computed daily as the 0.1% light level; $-E(t)$) to the surface.
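
The vertical average denoted by the overbar in Eq. (A1) can be evaluated numerically from a discrete profile. The following sketch applies the trapezoidal rule between the base of the euphotic zone and the surface; the function name, the 10 m grid, and the linear example profile are assumptions made only for illustration.

```python
def vertical_average(z, c, euphotic_depth):
    """Average c(z) over -euphotic_depth <= z <= 0 using the trapezoidal rule.

    z must be increasing, start at -euphotic_depth, and end at 0 (metres).
    """
    if abs(z[0] + euphotic_depth) > 1e-9 or abs(z[-1]) > 1e-9:
        raise ValueError("grid must span the interval [-E, 0]")
    integral = 0.0
    for k in range(len(z) - 1):
        integral += 0.5 * (c[k] + c[k + 1]) * (z[k + 1] - z[k])
    return integral / euphotic_depth


# Hypothetical profile: zero at the base of a 120 m euphotic zone, one at the surface.
E_DEPTH = 120.0
z_grid = [-E_DEPTH + 10.0 * k for k in range(13)]
profile = [1.0 + zk / E_DEPTH for zk in z_grid]

mean_c = vertical_average(z_grid, profile, E_DEPTH)
print(mean_c)
```

Because the euphotic depth changes from day to day, the integration limits change as well, which is the origin of the final term in Eq. (A1).
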

The quantities $F_C$ and $B_C$ denote the physical and biological processes, respectively, which affect the concentration of component $C$, and the final term of Eq. (A1) arises via the Leibniz rule, since the lower integral limit ($-E(t)$) is a function of time (see FH01, Eq. (2)). The vertical profiles of $P$, $Z$, $A$, $N$ and $D$ are specified, based on observations from the JGOFS EqPac experiment (see FH01, Fig. 6).

The physical processes (advection and sinking) affecting the concentrations of each of the model components can be written mathematically as:

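\[
F_P = -\frac{1}{E(t)} \int_{-E(t)}^{0} w \, \frac{\partial P}{\partial z} \, dz \tag{A2}
\]

\[
F_Z = -\frac{1}{E(t)} \int_{-E(t)}^{0} (w + w_Z) \, \frac{\partial Z}{\partial z} \, dz \tag{A3}
\]

\[
F_N = -\frac{1}{E(t)} \int_{-E(t)}^{0} \left( u \, \frac{\partial N}{\partial x} + v \, \frac{\partial N}{\partial y} + w \, \frac{\partial N}{\partial z} \right) dz \tag{A4}
\]

\[
F_A = -\frac{1}{E(t)} \int_{-E(t)}^{0} w \, \frac{\partial A}{\partial z} \, dz \tag{A5}
\]

\[
F_D = -\frac{1}{E(t)} \int_{-E(t)}^{0} (w + w_D) \, \frac{\partial D}{\partial z} \, dz \tag{A6}
\]

where $w_Z$ and $w_D$ represent the sinking rates of zooplankton and detritus, respectively. The various biological pathways for nitrogen cycling in the model ecosystem (Fig. 1) are described by the following equations: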

\[
B_P = \mathrm{PP}(\mathrm{Fe}, I, \bar{P}) - \bar{G}(\bar{P}, \bar{Z}) - w_P \bar{P} \tag{A7}
\]

\[
B_Z = \gamma \, \bar{G}(\bar{P}, \bar{Z}) - w_Z \bar{Z} \tag{A8}
\]

\[
B_N = -\mathrm{NP}(\mathrm{Fe}, I, \bar{P}, \bar{A}) \tag{A9}
\]

\[
B_A = -\mathrm{RP}(\mathrm{Fe}, I, \bar{P}, \bar{A}) + \beta w_Z \bar{Z} + r_D \bar{D} \tag{A10}
\]

\[
B_D = w_P \bar{P} + (1 - \beta) w_Z \bar{Z} + (1 - \gamma) \, \bar{G}(\bar{P}, \bar{Z}) - r_D \bar{D} \tag{A11}
\]

where the values and definitions of specific parameters are given in Table 1. A brief overview of these various processes will now be given, but the reader is referred to FH01 for a more thorough discussion of the model formulations.

The biological processes affecting phytoplankton biomass include natural mortality ($w_P \bar{P}$), grazing by zooplankton ($\bar{G}$):

\[
\bar{G}(\bar{P}, \bar{Z}) = g \bar{P} \bar{Z} \Lambda \left( 1 - e^{-\Lambda \bar{P}} \right) \tag{A12}
\]

and primary production (PP). Because recent results of in situ iron additions to macronutrient-rich equatorial waters during the IronEx cruises have unequivocally established that the availability of iron limits the division rates and abundance of phytoplankton (Frost, 1996; Behrenfeld et al., 1996), iron limitation is assumed a priori. Thus, instead of re-testing the iron hypothesis, the novel approach of assuming iron limitation and examining the implications for marine ecosystem structure at 0°N, 140°W is taken. Until more information becomes available on iron chemistry and the specific roles various forms of iron play in equatorial ecosystems, an explicit iron model is not justified. Therefore, in this study iron concentration is not included as a separate model component, but rather it is included in the growth term where iron limitation is specified. Primary production is modeled as:

\[
\mathrm{PP} = \frac{1}{E} \int_{-E}^{0} P(z) \, \frac{\mathrm{Fe}(z)}{\mathrm{Fe}(z) + k_{Fe}} \int_{0}^{1} P_{mx} \left( 1 - e^{-I\alpha / P_{mx}} \right) d\tau \, dz \tag{A13}
\]

where $P_{mx}(\tau) = P_M \sin(2\pi(\tau - 0.25))$ for $0.25 < \tau < 0.75$, and $P_{mx}(\tau) = 0$ otherwise, and the photosynthetically active portion of the visible light spectrum (PAR) is computed as:

\[
I(\tau, z) =
\begin{cases}
0 & \text{for } \tau < 0.25 \text{ or } \tau > 0.75 \\
I_0 (1 - C_s) \sin(2\pi(\tau - 0.25)) \, R(z) & \text{for } 0.25 < \tau < 0.75.
\end{cases}
\]

In these formulations, $\tau$ represents nondimensionalized time ($\tau = t / 24$ h, where $t = [0, 24$ h$]$), $I_0$ is the clear sky value of PAR, $C_s$ is fractional cloud cover, and $R(z)$ is the depth dependent attenuation as described in Appendix B of FH01. Iron concentration ($\mathrm{Fe}(z)$) is determined from a bilinear approximation to the empirical formulation of Gordon et al. (1997): $\mathrm{Fe}(z) = \mathrm{MAX}[\mathrm{Fe}_{min}, \mathrm{Fe}|_{z=-100} - m_{Fe}(z + 100)]$, where iron concentrations at 100 m depth ($\mathrm{Fe}|_{z=-100}$) are computed from a Fe:T regression.

A fraction ($\gamma$) of the grazed phytoplankton is assumed to represent zooplankton growth from assimilated ingestion, and is partially offset by a generalized loss term ($w_Z \bar{Z}$). A fraction ($\beta$) of this loss term is attributed to zooplankton excretion, and the remainder represents losses such as natural mortality and predation.

Primary production is supported by both nitrate uptake (new production; NP) and ammonium uptake (regenerated production; RP). Recently a strong suppressing effect of ammonium on nitrate uptake rates has been observed in the central equatorial Pacific (McCarthy et al., 1996). Therefore, if ammonium concentrations are high enough to fuel all primary production, then $\mathrm{RP} = \mathrm{PP}$. Otherwise,

\[
\mathrm{RP} = \mu_{mx} \, \frac{A}{k_A + A} \, \bar{P} \tag{A14}
\]

where $\mu_{mx}$ is the iron-saturated growth rate. New production (NP) is defined as that portion of total primary production that is supported by nitrate uptake, instead of by ammonium utilization, i.e. $\mathrm{NP} = \mathrm{PP} - \mathrm{RP}$.

Other biological processes affecting the ammonium balance include zooplankton excretion and recycling via the detrital pool ($r_D \bar{D}$), where $r_D$ represents a generalized rate at which detritus is recycled into ammonium. Biological processes affecting the detrital pool include contributions from phytoplankton and zooplankton mortality, unassimilated grazing, and the loss of detrital nitrogen to ammonium nitrogen via recycling.
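
As a consistency check on Eqs. (A7)–(A12) and (A14), the biological source terms can be coded directly and summed: because nitrogen is only transferred between components, the five terms should add to zero. The sketch below is an illustration under stated assumptions rather than the model code. Primary production is passed in as a given number instead of being computed from Eq. (A13), the cap of RP at PP is implemented with `min` as one reading of the text, and all parameter names and values are hypothetical.

```python
import math


def grazing(p, z, graze_max, ivlev):
    """Eq. (A12): grazing with an Ivlev-type saturation."""
    return graze_max * p * z * ivlev * (1.0 - math.exp(-ivlev * p))


def source_terms(pp, p, z, a, d, par):
    """Biological terms of Eqs. (A7)-(A11) for vertically averaged concentrations."""
    graze = grazing(p, z, par["graze_max"], par["ivlev"])
    # Eq. (A14), capped at PP (an assumed reading of "if ammonium is high enough").
    regen = min(pp, par["mu_mx"] * a / (par["k_a"] + a) * p)
    new_prod = pp - regen
    loss_z = par["loss_z"] * z
    return {
        "P": pp - graze - par["mort_p"] * p,
        "Z": par["assim_frac"] * graze - loss_z,
        "N": -new_prod,
        "A": -regen + par["excr_frac"] * loss_z + par["remin_d"] * d,
        "D": (par["mort_p"] * p
              + (1.0 - par["excr_frac"]) * loss_z
              + (1.0 - par["assim_frac"]) * graze
              - par["remin_d"] * d),
    }


# Hypothetical parameter values and state, for illustration only.
params = {
    "graze_max": 3.0, "ivlev": 1.0, "mu_mx": 2.0, "k_a": 0.05,
    "mort_p": 0.1, "loss_z": 0.5, "assim_frac": 0.3,
    "excr_frac": 0.6, "remin_d": 0.2,
}
terms = source_terms(pp=0.5, p=0.2, z=0.1, a=0.1, d=0.2, par=params)
nitrogen_budget = sum(terms.values())
print(terms)
print("sum of biological terms:", nitrogen_budget)
```

A nonzero budget in such a check would point to a sign or bookkeeping error in one of the pathways.
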

REFERENCES

Abbott, M. R. 1992. Workshop on modeling and satellite data assimilation. U.S. JGOFS Planning Report Number 14, WHOI, Woods Hole, MA.
Aikman, F., G. L. Mellor, T. Ezer, D. Sheinin, P. Chen, L. Breaker and D. B. Rao. 1996. Towards an operational nowcast/forecast system for the U.S. East Coast, in Modern Approaches to Data Assimilation in Ocean Modeling, P. Malanotte-Rizzoli, ed., Elsevier Oceanography Series, 61, Amsterdam, The Netherlands, 347–376.
Armstrong, R. A., J. L. Sarmiento and R. D. Slater. 1995. Monitoring ocean productivity by assimilating satellite chlorophyll into ecosystem models, in Ecological Time Series, T. Powell and J. Steele, eds., Chapman and Hall, NY, 371–390.
Barth, N. and C. Wunsch. 1990. Oceanographic experiment design by simulated annealing. J. Phys. Oceanogr., 20, 1249–1263.
Behrenfeld, M. J., A. J. Bale, Z. S. Kolber, J. Aiken and P. G. Falkowski. 1996. Confirmation of iron limitation of phytoplankton photosynthesis in the equatorial Pacific Ocean. Nature, 383, 508–511.

Bennett, A. F. 1992. Inverse Methods in Physical Oceanography, Cambridge Univ. Press, NY, 346 pp.
Carnes, M. R., D. N. Fox, R. C. Rhodes and O. M. Smedstad. 1996. Data assimilation in a North Pacific Ocean monitoring and prediction system, in Modern Approaches to Data Assimilation in Ocean Modeling, P. Malanotte-Rizzoli, ed., Elsevier Oceanography Series, 61, Amsterdam, The Netherlands, 319–346.
DeMaria, M. and R. W. Jones. 1993. Optimization of a hurricane track forecast model with the adjoint model equations. Mon. Weather Rev., 121, 1730–1745.
Evans, G. T. 1999. The role of local models and data sets in the Joint Global Ocean Flux Study. Deep-Sea Res. I, 46, 1369–1389.
Evensen, G. 1992. Using the extended Kalman filter with a multilayer quasi-geostrophic ocean model. J. Geophys. Res., 97, 17,905–17,924.
Fasham, M., H. W. Ducklow and S. M. McKelvie. 1990. A nitrogen-based model of plankton dynamics in the oceanic mixed layer. J. Mar. Res., 48, 591–639.
Fasham, M. and G. T. Evans. 1995. The use of optimization techniques to model marine ecosystem dynamics at the JGOFS station at 47°N 20°W. Phil. Trans. Royal Soc. London B, 348, 203–209.
Fasham, M. J. R., J. L. Sarmiento, R. D. Slater, H. W. Ducklow and R. Williams. 1993. Ecosystem behavior at Bermuda Station "S" and Ocean Weather Station "India": a general circulation model and observational analysis. Global Biogeochem. Cycles, 7, 379–415.
Friedrichs, M. A. M. 1999. Physical control of biological processes in the central equatorial Pacific: A data assimilative modeling study. Ph.D. Thesis, Old Dominion University, Norfolk, VA, 195 pp.
Friedrichs, M. A. M. 2001. Assimilation of JGOFS EqPac and SeaWiFS data into a marine ecosystem model of the central equatorial Pacific Ocean. Deep-Sea Res. II, 49, 289–319.
Friedrichs, M. A. M. and E. E. Hofmann. 2001. Physical control of biological processes in the central equatorial Pacific. Deep-Sea Res. I, 48, 1023–1069.
Frost, B. W. 1996. Phytoplankton bloom on iron rations. Nature, 383, 475–476.
Ghil, M. and P. Malanotte-Rizzoli. 1991. Data assimilation in meteorology and oceanography. Adv. Geophys., 33, 151–266.
Gilbert, J. C. and C. Lemarechal. 1989. Some numerical experiments with variable-storage quasi-Newton algorithms. Math. Program., 45, 405–435.
Gordon, R. M., K. H. Coale and K. S. Johnson. 1997. Iron distributions in the equatorial Pacific: Implications for new production. Limnol. Oceanogr., 42, 419–431.
Gunson, J. R. and P. Malanotte-Rizzoli. 1996a. Assimilation studies of open-ocean flows, 1, Estimation of initial and boundary conditions. J. Geophys. Res., 101, 28457–28472.
Gunson, J. R. and P. Malanotte-Rizzoli. 1996b. Assimilation studies of open-ocean flows, 2, Error measures with strongly nonlinear dynamics. J. Geophys. Res., 101, 28473–28488.
Gunson, J. R., A. Oschlies and V. Garcon. 1999. Sensitivity of ecosystem parameters to simulated satellite ocean color data using a coupled physical-biological model of the North Atlantic. J. Mar. Res., 57, 613–639.
Harmon, R. and P. Challenor. 1997. A Markov Chain Monte Carlo method for estimation and assimilation into models. Ecol. Model., 101, 41–59.
Hofmann, E. E. and M. A. M. Friedrichs. 2001a. Predictive modeling for marine ecosystem models, in The Sea: Volume 12: Biological-Physical Interactions in the Ocean, A. R. Robinson, J. R. McCarthy and B. J. Rothschild, eds., John Wiley & Sons, 640 pp.
Hofmann, E. E. and M. A. M. Friedrichs. 2001b. Models: biogeochemical data assimilation, in Encyclopedia of Ocean Sciences, Vol. 1, M. J. R. Fasham, J. H. Steele, S. Thorpe and K. Turekian, eds., Academic Press, London, 302–308.
Holland, W. R. and A. D. Hirschman. 1972. A numerical calculation of the circulation in the North Atlantic Ocean. J. Phys. Oceanogr., 2, 336–354.

Holland, W. R. and P. Malanotte-Rizzoli. 1989. Assimilation of altimeter data into an ocean circulation model: space versus time resolution studies. J. Phys. Oceanogr., 19, 1507–1534.
Hurtt, G. C. and R. A. Armstrong. 1996. A pelagic ecosystem model calibrated with BATS data. Deep-Sea Res. II, 43, 625–651.
Hurtt, G. C. and R. A. Armstrong. 1999. A pelagic ecosystem model calibrated with BATS and OWSI data. Deep-Sea Res. I, 46, 27–61.
Ishizaka, J. 1990. Coupling of Coastal Zone Color Scanner data to a physical-biological model of the southeastern U.S. continental shelf ecosystem 3. Nutrient and phytoplankton fluxes and CZCS data assimilation. J. Geophys. Res., 95, 20201–20212.
Kruger, J. 1993. Simulated annealing: a tool for data assimilation into an almost steady model state. J. Phys. Oceanogr., 23, 679–688.
Landry, M. R., J. Constantinou and J. Kirshtein. 1995. Microzooplankton grazing in the central equatorial Pacific during February and August, 1992. Deep-Sea Res. II, 42, 657–671.
Lardner, R. W. and S. K. Das. 1994. Optimal estimation of eddy viscosity for a quasi-three-dimensional numerical tidal and storm surge model. Int. J. Numer. Methods Fluids, 18, 295–312.
Lawson, L. M., Y. H. Spitz, E. E. Hofmann and R. B. Long. 1995. A data assimilation technique applied to a predator-prey model. Bull. Math. Biol., 57, 593–617.
Lawson, L. M., E. E. Hofmann and Y. H. Spitz. 1996. Time series sampling and data assimilation in a simple marine ecosystem model. Deep-Sea Res. II, 43, 625–651.
Leonard, C. L., C. R. McClain, R. Murtugudde, E. E. Hofmann and L. W. Harding. 1999. An iron-based ecosystem model of the central equatorial Pacific. J. Geophys. Res., 104, 1325–1341.
Malanotte-Rizzoli, P. and E. Tziperman. 1996. The oceanographic data assimilation problem: overview, motivation and purposes, in Modern Approaches to Data Assimilation in Ocean Modeling, P. Malanotte-Rizzoli, ed., Elsevier Oceanography Series, 61, Amsterdam, The Netherlands, 3–17.
Malanotte-Rizzoli, P. and R. E. Young. 1995. Assimilation of global versus local data sets into a regional model of the Gulf Stream system 1. Data effectiveness. J. Geophys. Res., 100, 24773–24796.
Matear, R. J. 1995. Parameter optimization and analysis of ecosystem models using simulated annealing: A case study at Station P. J. Mar. Res., 53, 571–607.
McCarthy, J. J., C. Garside, J. L. Nevins and R. T. Barber. 1996. New production along 140°W in the equatorial Pacific during and following the 1992 El Niño event. Deep-Sea Res. II, 43, 1065–1093.
McClain, C. R., J. L. Cleave, G. C. Feldman, W. W. Gregg, S. B. Hooker and N. Kuring. 1998. Science quality SeaWiFS data for global biosphere research. Sea Tech., 39, 10–15.
McClain, C. R., R. Murtugudde and S. Signorini. 1999. A simulation of biological processes in the equatorial Pacific warm pool at 165°E. J. Geophys. Res., 104, 18305–18322.
McGillicuddy, D. J., D. R. Lynch, A. M. Moore, W. C. Gentleman, C. S. Davis and C. J. Meise. 1998. An adjoint data assimilation approach to diagnosis of physical and biological controls on Pseudocalanus spp. in the Gulf of Maine-Georges Bank Region. Fisheries Oceanogr., 7, 205–218.
McPhaden, M. J., A. J. Busalacchi, R. Cheney, J.-R. Donguy, K. S. Gage, D. Halpern, M. Ji, P. Julian, G. Meyers, G. T. Mitchum, P. P. Niiler, J. Picaut, R. W. Reynolds, N. Smith and K. Takeuchi. 1998. The Tropical Ocean Global Atmosphere (TOGA) observing system: A decade of progress. J. Geophys. Res., 103, 14169–14240.
Moisan, J. R. and E. E. Hofmann. 1996. Modeling nutrient and plankton processes in the California coastal transition zone. 2. A three-dimensional physical-bio-optical model. J. Geophys. Res., 101, 22,677–22,691.
Murray, J. W., R. T. Barber, M. R. Roman, M. P. Bacon and R. A. Feely. 1994. Physical and biological controls on carbon cycling in the equatorial Pacific. Science, 266, 58–65.

Murray, J. W., E. Johnson and C. Garside. 1995. A U.S. JGOFS process study in the equatorial Pacific (EqPac): Introduction. Deep-Sea Res. II, 42, 275–293.
Najjar, R. G., J. L. Sarmiento and J. R. Toggweiler. 1992. Downward transport and fate of organic matter in the ocean: Simulations with a general circulation model. Global Biogeochem. Cycles, 6, 45–76.
Peloquin, R. A. 1992. The navy ocean modeling and prediction program. Oceanography, 1, 4–8.
Prunet, P., J. F. Minster, V. Echevin and I. Dadou. 1996a. Assimilation of surface data in a one-dimensional physical-biogeochemical model of the surface ocean 2. adjusting a simple trophic model to chlorophyll, temperature, nitrate, and pCO2 data. Global Biogeochem. Cycles, 10, 139–158.
Prunet, P., J. F. Minster, D. Ruiz-Pino and I. Dadou. 1996b. Assimilation of surface data in a one-dimensional physical-biogeochemical model of the surface ocean 1. method and preliminary results. Global Biogeochem. Cycles, 10, 111–138.
Sarmiento, J. L. and R. A. Armstrong. 1997. U.S. JGOFS Synthesis and Modeling Project Implementation Plan: The role of Oceanic Processes in the Global Carbon Cycle, U.S. JGOFS Planning and Coordination Office, WHOI, Woods Hole, MA.
Sarmiento, J. L. and K. Bryan. 1982. An ocean transport model for the North Atlantic. J. Geophys. Res., 87, 394–408.
Sarmiento, J., B. Frost and J. Wroblewski. 1987. Modeling in GOFS, U.S. GOFS Planning Report Number 4, WHOI, Woods Hole, MA.
Seiler, U. 1993. Estimation of open boundary conditions with the adjoint method. J. Geophys. Res., 98, 22855–22870.
Semovski, S. V. and B. Wozniak. 1995. Model of the annual phytoplankton cycle in the marine ecosystem: assimilation of monthly satellite chlorophyll data for the N. Atlantic and Baltic. Oceanologia, 37, 3–31.
Sheinbaum, J. 1995. Variational assimilation of simulated acoustic tomography data and point observations: A comparative study. J. Geophys. Res., 100, 20745–20761.
Spitz, Y. H., J. R. Moisan, M. R. Abbott and J. G. Richman. 1998. Data assimilation and a pelagic ecosystem model: parameterization using time series observations. J. Mar. Syst., 16, 51–68.
Tziperman, E. and W. C. Thacker. 1989. An optimal-control/adjoint-equations approach to studying the oceanic general circulation. J. Phys. Oceanogr., 19, 1471–1485.
Ullman, D. S. and R. E. Wilson. 1998. Model parameter estimation from data assimilation modeling: temporal and spatial variability of the bottom drag coefficient. J. Geophys. Res., 103, 5531–5549.
Vallino, J. J. 2000. Improving marine ecosystem models: Use of data assimilation and mesocosm experiments. J. Mar. Res., 58, 117–164.
Verity, P. G., D. K. Stoecker, M. E. Sieracki and J. R. Nelson. 1996. Microzooplankton grazing of primary production in the equatorial Pacific. Deep-Sea Res. II, 43, 1227–1256.
Wunsch, C. 1996. The Ocean Circulation Inverse Problem. Cambridge Univ. Press, NY, 442 pp.
Wunsch, C. and A. E. Gill. 1976. Observations of equatorially trapped waves in Pacific sea level variations. Deep-Sea Res., 23, 371–390.

Received: 17 June, 2001; revised: 25 September, 2001.