MAY 2016 K A R S P E C K 1713

An Ensemble Approach for the Estimation of Observational Illustrated for a Nominal 1° Global Ocean Model

ALICIA R. KARSPECK National Center for Atmospheric Research,* Boulder, Colorado

(Manuscript received 15 October 2014, in final form 17 December 2015)

ABSTRACT

Least squares algorithms for data assimilation require estimates of both background error covariances and covariances. The specification of these is an essential part of designing an assim- ilation system; the relative sizes of these determine the extent to which the state variables are drawn toward the observational . Observational error covariances are typically computed as the sum of /instrumental errors and ‘‘representativeness error.’’ In a coarse-resolution ocean gen- eral circulation model the errors of representation are the dominant contribution to observational error covariance over large portions of the globe, and the size of these errors will vary by the type of and the geographic region. They may also vary from model to model. A straightforward approach for estimating model-dependent, spatially varying observational error that are suitable for least squares ocean data assimilating systems is presented here. The author proposes an ensemble-based estimator of the true observational error and outlines the assumptions necessary for the estimator to be unbiased. The author also presents the variance (or ) associated with the estimator under certain conditions. The analytic expressions for the expected value and variance of the estimator are validated with a simple au- toregressive model and illustrated for the nominal 18 resolution POP2 global ocean general circulation model.

1. Introduction real-world modeling applications, wherein error covari- Given of a geophysical system and a ances can never be precisely represented and optimality will numerical model for simulating that system, data- never be achieved, reasonable specification of the error assimilation methods are designed to produce (in most covariances still plays a critical role in designing a useful cases approximate) estimates of the distribution of the assimilation system (Dee 1995). model state variables conditional upon observations of In addition to errors associated with the measurement the system. This distribution is called the ‘‘analysis’’ in process (including instrumental errors), observational least squares parlance or the ‘‘posterior’’ in Bayesian errors also include ‘‘representativeness errors.’’ Repre- frameworks. For least squares data-assimilation methods sentativeness error is commonly understood to account (e.g., variational methods, optimal interpolation, Kal- for physical processes and scales that are detectable man filtering) the error covariance associated with the through observation but are not resolvable by the nu- ‘‘background’’ or ‘‘prior’’ distribution of the model state merical model (e.g., Cohn 1997; Fukumori et al. 1999; variables and the error variance associated with the ob- Oke and Sakov 2008). This category of errors can also servations are both required by the algorithms. Indeed, be extended to include errors arising from the pro- their accurate specification is a necessary condition cess of mapping from the discrete state space of the for the optimality of least squares methods. Even in modeltotheobservedfield(sometimescalledfor- ward interpolation errors; e.g., Daley 1993). The statistical characteristics of representativeness errors * The National Center for Atmospheric Research is sponsored will be model dependent, variable dependent, and by the National Foundation. geographically inhomogeneous. The intent of this work is to present a simple frame- Corresponding author address: Alicia R. Karspeck, NCAR, P.O. work for estimating observational error variances that Box 3000, Boulder, CO 80302. include both measurement error and representativeness E-mail: [email protected] error. The approach introduced here was motived by

DOI: 10.1175/MWR-D-14-00336.1

Ó 2016 American Meteorological Society Unauthenticated | Downloaded 09/29/21 12:09 PM UTC 1714 MONTHLY WEATHER REVIEW VOLUME 144 design requirements for the assimilation of in situ ocean (2008) and Forget and Wunsch (2007) make the as- hydrographic data into a coarse-resolution global ocean sumption that representativeness error primarily stems model (i.e., horizontal resolution of order 100 km). from the limited resolution of model grids. From this Coarse-resolution global ocean models do not resolve perspective, the observational error variance can be di- sharp density fronts or oceanic mesoscale eddies. Thus, agnosed by analyzing the differences between observa- in frontal zones or regions where mesoscale variability is tions and their area (Oke and Sakov 2008)orareaand active, representativeness error can be the dominant time (Forget and Wunsch 2007) averages. The downside contributor to the total observational error.1 Our a pri- of these methods is that they are tailored to the assimi- ori understanding of this motivated the need for a lation models for which they will be used only to the ex- method that admits spatial inhomogeneity. tent that averaging domains can be chosen for consistency In this work we are only concerned with the estima- with a model grid; no consideration of whether the model tion of observational error variance, not the full co- actually resolves the true large-scale dynamics is made. variance. While it is possible to extend this framework to Another approach to estimating observational error the estimation of the time and space covariances we do characteristics is to use observation-minus-forecast re- not cover this here for two reasons: (i) over most of the siduals from the successive cycling of a data-assimilation ocean in situ data are not sufficient to estimate spatial or update and a short-term forecast. These residuals temporal covariances without assuming parameterized contain a combination of model-resolvable forecast er- forms (Menemenlis and Chechelnitsky 2000) and (ii) it ror and unresolvable observation error. Disentangling is common practice in global ocean assimilation to treat the two requires that the time and/or geographic length observational errors as if they are spatially and temor- scales of the error sources (and, hence, their covariance ally uncorrelated (e.g., Wunsch and Heimbach 2007; characteristics) can be assumed to be sufficiently dif- Karspeck et al. 2013; Derber and Rosati 1989; Behringer ferent as to be robustly and mutually identifiable and et al. 1998; Giese and Ray 2011; Balmaseda et al. 2013; that there is sufficient data to make a useful joint esti- Zhang et al. 2007). This practice, while difficult to justify mation. These challenges are well appreciated and have when observational information is dense, is not un- been highlighted in both the oceanographic and atmo- reasonable for hydrographic observations, wherein ob- spheric data-assimilation literature (e.g., Rutherford servations are only occasionally in close physical 1972; Hollingsworth and Lönnberg 1986; Daley 1993; proximity. At any rate, whether or not it is reasonable, it Dee 1995; Dee and DaSilva 1999). For example, in the is also a choice that is routinely made for algorithmic context of ocean data assimilation, Richman et al. (2005) convenience. take the approach of defining observational errors as the The literature on representativeness error and the component of the observation-minus-forecast residuals variety of ways that it can be estimated is vast, and an in a forced global ocean assimilation that is globally exhaustive review is not presented here. It is worth orthogonal to the model’s leading spatial eigenvectors— mentioning, however, that a great deal of the seminal essentially imposing identifiability. work on estimating observational error emerged in Instead of using observation-minus-forecast residuals, the context of atmospheric data assimilation (e.g., in this work we will adopt the approach of using Rutherford 1976; Lönnberg and Hollingsworth 1986; observation-minus-simulation residuals from an ocean Daley 1993; Dee and DaSilva 1999; Desroziers and model forced with an observationally based estimate of Ivanov 2001; Desroziers et al. 2005). Newer techniques the atmosphere. Menemenlis and Chechelnitsky (2000) involve the hierarchical Bayesian estimation of obser- and Fu et al. (1993) both take this approach when esti- vational error (e.g., Li et al. 2009) and the estimation and mating observation and forecast error covariances for treatment of correlated and time-varying observational altimeter data. Model simulations are simply forecasts errors (e.g., Waller et al. 2014a,b; Miyoshi et al. 2013; that have been extended long enough that boundary Janjic´ and Cohn 2006). forcing and the internal variability dominate over initial In the context of global-scale ocean data assimilation, condition information. This strategy allows for the esti- the observation-based methods of Oke and Sakov mation of observational error variance before any assimilation has been performed. (A discussion of the relative advantages of using observation-minus- 1 For example, the results of Oke and Sakov (2008), Richman simulation and observation-minus-forecast residuals in et al. (2005), and Forget and Wunsch (2007) support the ranges of the context of the estimator that will be described in the 0.58–3.08C for the of observed temperature er- ror in the Gulf Stream for nominal 18 grids, while measurement body of this paper is included in the final section.) Note error is estimated at 0.0058C (for Argo profiling floats) to 0.18C [for that the fundamental problem of needing to assume expendable bathythermographs (XBTs); Johnson et al. (2009)]. identifiability of the simulation and observational error

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC MAY 2016 K A R S P E C K 1715 covariance structures is still present when using observational error variance. In particular, it is shown observation-minus-simulation residuals. Both Menemenlis that the variance of the estimator depends on the vari- and Chechelnitsky (2000) and Fu et al. (1993) make some ance and autocorrelation of the model simulations, the by choosing highly parameterized forms for number of ensembles used for estimation, the number of their simulation error covariances, but Menemenlis and observations available for computing the residuals, and Chechelnitsky (2000) point out that for the ocean there is the size of the true, underlying observational error var- rarely sufficient data to infer the forecast or simulation iance. In section 4, we use Monte Carlo to error covariance. validate the analytical equations for the properties of A notable aspect of the forced-ocean residual ap- the estimator. In section 5, we apply the technique to a proaches cited above is that they are based on a single state-of-the-art, non-eddy-resolving global ocean gen- realization of the model state that is meant to represent eral circulation model, showing maps of the estimator. A the expected value of the forecast or simulation distri- discussion of the technique in the context of other ap- bution. However, the atmospheric state used to force an proaches in the literature is presented in section 6. ocean model is, in actuality, uncertain. This uncertainty will naturally lead to uncertainty in the ocean simulation 2. The modeling framework (e.g., Leeuwenburgh 2007). Typically, the net impact of forcing uncertainty on the ocean state is treated as a. Ensemble of forced-ocean simulations ‘‘process’’ error (also referred to as ‘‘system’’ or Consider a deterministic numerical modeling frame- ‘‘model’’ error) and must be inferred as part of the joint work for the physics and dynamics of the ocean where forecast and observational error estimation problem the discrete multivariate model state vector zt is con- (Menemenlis and Chechelnitsky 2000; Fu et al. 1993). nected through time t using However, with the rise in popularity of ensemble data- assimilation systems, ensembles of possible atmospheric zt1Dt 5 Mzt 1 ft , (1) states that can be used for forcing ocean models are now available (e.g., Compo et al. 2011; Raeder et al. 2012; where f can be interpreted as the net forcing resulting Karspeck et al. 2013; Yang and Giese 2013). from time-varying surface fluxes of heat, fresh water, The problem of needing to jointly infer both the and momentum. Let us assume that we have access to a observational error characteristics and the error char- of possible realizations of atmospheric surface vari- acteristics of the simulation from observation-minus- ables from which k realizations of surface forcing can be simulation residuals can be mitigated by treating an derived and used to force a set of k ocean simulations. If ensemble of ocean simulations as samples from the M is linear and the net ocean forcing is an independent simulation distribution. With this additional information realization of a normally distributed process,2 then each about the simulation distribution, it is not necessary to of the k ocean state ensembles can be viewed as an assume or infer a preferred structure of the simulation identically distributed Z 5 z with error in order to separate the observational error and m and covariance L; that is, simulation errors; the contribution of simulation error to ... ; m L the observation-minus-simulation residuals can simply [Z1, Z2, Z3, , Zk] N( , ), (2) be computed directly from the ensemble. That being 5 ; ; ... said, care must be taken to construct a meaningful sim- where each Zj, j 1 2 3, , k exists in the discrete ulation ensemble and in section 2 we discuss process grid space and time intervals of the numerical model. error in this context and outline some general guidelines Note that in the absence of a time superscript Z is meant for how to construct a suitable ensemble. to signify a vector defined over the discrete space and Section 3 presents the proposed estimator for obser- time of the numerical model. The covariance L will re- vational error variance that is based on observation- flect the time and space autocorrelation and non- minus-simulation residuals from an ensemble of stationarity that emerge naturally through the model forced-ocean simulations. In the appendix the theoreti- dynamics and the atmospheric forcing. In section 3a we cal mean and variance of the estimator are derived un- will also add the assumption that M is an ‘‘unbiased’’ der an explicit set of assumptions and these are model, but in practice the requirement is just that if referenced in this section. We present the variance of the estimator (not to be confused with the estimator of the observational error variance) to develop an un- 2 Neither of these things will be strictly true for most state-of-the- derstanding of the factors that limit the usefulness of art global ocean models, but it is a useful approximation at coarse directly equating the estimation value with the true resolutions.

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC 1716 MONTHLY WEATHER REVIEW VOLUME 144 there is a stationary relative to observations it can the model) defines the boundaries of what is resolvable be estimated and removed. by the model. In this context, variations in f are the sole source of Naturally, observations are noisy records of the real process error in the system. In its broadest definition, ocean and here we make the standard assumption that process error may also include the effects of model pa- they are normally distributed conditional on knowledge rameter uncertainty and numerical imprecision (e.g., Fu of the real ocean state; that is, et al. 1993; Blanchet et al. 1997). However, it is impor- j ~ 5 H~ 1 ~ L tant to note that there is no unique definition of what p(y z, z) N( [z z], m), (3) constitutes process error; it must be determined sub- jectively through consideration of how the model is in- where the variance/covariance matrix Lm represents ~ tended to be used. For example, if the intention of the uncertainty due to the measurement process and H is a practitioner is to use a version of the model with variable (hypothetical) mapping from the continuous multivari- parameters, then the additional process error that ate space of to the discrete time–space location of emerges as a result of parameter uncertainty should be the measurement. Like z, y is generally a multivariate included in the generation of the ensemble. On the other vector of observations defined at any time and space ~ hand, if static parameters are to be used (as is most location. In practice, H must be approximated by the common in practice) then errors emerging as a result of discrete operator H that maps from the vector model parameter misspecification will fall into the category of space to the time–space location of the observations and ~ either bias or unresolvable processes. Because we are dH is the error due to this approximation (i.e., ~ ~ focused here on the use of a global ocean model with dH [ H 2 H) and (3) becomes static parameters, static topography, compiled code with j ~ 5 H 1 dH~ 1 ~ L fixed precision, etc., it is appropriate to assume that p(y z, z) N([ ][z z], m) ~ ~ forcing variability defines the error subspace of the 5 N(Hz 1 dHz 1 H~z, L ). (4) model. To the extent that it is desirable in other appli- m ~ ~ cations to consider additional forms of process error The terms dHz and H~z are the components of the rep- [parameter uncertainty likely being the next most resentativeness error due to the use of approximate important for non-eddy-resolving ocean general circu- forward operators and unresolved processes. While lation models; P. Gent (2015, personal communication)] forward operator error might be appreciable in some the ensemble of ocean states can be constructed to applications (e.g., assimilation of satellite radiances in contain these uncertainties through manipulation of the the atmosphere) for the problem of assimilation of numerical model M. As a general rule, if the practitioner in situ hydrography, forward operator errors are rela- ~ ~ generates an ensemble that underestimates (over- tively small (i.e., dHz H~z). Moving forward, we as- estimates) the process error, then the estimate of the sume that the forward operator error is zero. observational error variance will tend to be too high The goal here is to use observations to constrain the (too low). model-resolvable vector random variable Z, not Z 1 Z~. b. The treatment of unresolved processes as This is an important distinction; in sequential methods observational representativeness error that involve successive cycles of assimilation and fore- casting (such as Kalman filtering) requiring the numer- Before progressing, we consider why errors of repre- ical model to evolve states that include unresolved sentation can be lumped together with measurement processes can degrade the performance of the system errors under the general term observation error. The (e.g., Lorenc 1986; Daley 1993; Fukumori et al. 1999). real ocean, in contrast to the modeled ocean state, exists We must also, necessarily, admit that only the portion of in the continuous time and space that we associate with the observations that we can map using H can be used reality. It includes scales and physical processes that are as a constraint. unresolvable by the model and/or impossible to accu- Thus, the target posterior probability distribution is rately map from model state variables. Imagine that the p(z j y) and Bayes’s rule can be used to express this target real ocean state can be expressed as the sum of a random posterior distribution as variable Z, which is the model-resolvable ocean state ~ and a random variable Z that represents all remaining p(z j y) } p(z)p(y j z). (5) unresolvable processes. Note that Z~ is not synonymous with the process error discussed in the previous sub- The first factor on the right-hand side is the prior section; on the contrary, the process error (and its probability distribution for Z and the second factor is the manifestation as simulation error through integration of data likelihood. Using (3), we can express the likelihood

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC MAY 2016 K A R S P E C K 1717 function p(y j z) as a marginalization over the unresolved data assimilation are referred to Hodyss and Nichols (2015). processes: The development of an estimator for Lo under certain ð conditions is the central goal of this paper. For clarity and p(y j z) 5 p(y j z, ~z)p(~z) d~z. (6) consistency with the literature [see Cohn (1997)], it should ~z also be noted that in the special case of (5)–(7) applied to a sequential filtering update algorithm (like the Kalman fil- Importantly, the modeling framework described by (1) ter), there is an implied restriction of y and z to time t and an implicitly assumes that the evolution of the model state implied conditioning on observations at previous times. In Z 5 z is independent of the realization of Z~ 5 ~z. In other this special case, it is necessary to assume that the mea- words, there are assumed to be no rectified effects of the surement and representativeness error are uncorrelated in realization the unresolved processes on the realization time in order to support the time-serial processing the of the model state—and conversely, the realizations of Kalman filter employs. the unresolved process are not model state dependent. We also make the standard assumption that the un- resolved processes are normally distributed with mean 3. An estimator for the observational error ~ zero (unbiased) (i.e., Z~ ; N[0, L]). To be clear, the as- variance sumption that there is no covariance between Z and Z~ Moving forward, we focus on the special (but com- does not necessarily mean that these two fields do not mon) case where the observational error variance for a impact one another. For example, the effects of the given data type is assumed to be time stationary over a statistical properties of the unresolved processes on the fixed, but limited, geographic region. The choice of these resolved state may be accounted for via model param- regional boundaries is, naturally, application specific eterization, in which case they are implicitly included in and in many cases may be a heuristic choice based on M. This is consistent with the above assumptions unless expert insight into the system behavior. Henceforth, we the parameterizations are state dependent. If the model will refer to this more specific scalar parameter as s2, parameterizations are constructed with state de- o and the estimator or this parameter will defined for each pendency in the mean of Z~ then clearly the assumption geographic region independently.4 of zero covariance will be violated. Or, if the covariance Within each region, a set of n observations available of the unresolved processes is a function of the model through time and indexed by i, will be used for estima- state, then the marginal likelihood is no longer normally tion. Each scalar observation is a realization of the distributed in z—violating the standard least squares random variable Y , distributed as in (7). Each ensemble assumption of a Gaussian likelihood function.3 We do i member j of the model simulations can be mapped to the not explore these possibilities in this work, but we time and geographic location of any observation using mention it here for completeness and because it X 5 H Z for j 5 1; 2; 3, ..., k, such that Y and X are represents a substantive limitation in conceptually sep- i,j i j i i,j now collocated. For each region, we can then form F, arating resolved and unresolved processes. which will serve as an estimator for s2: Noting the above-mentioned assumptions and their o caveats, performing the integral in (6), the observational 1 n 1 n k 1 1 F5 å 2 h i 2 2 å 2 likelihood remains normally distributed, (Yi Xi ) Si . (8) n i51 n i51 k j 5 H L L [ H~L~ H~ T 1 L p(y z) N( z, o) o m , (7) The average of the k model ensembles at time and h i 2 2 space location i is denoted Xi and Si is the (k 1 and the variance associated with the unresolved process 2 5 2 ~ ~ ~ T normalized) ensemble variance; that is, Si 1/(k 1) HLH is the representativeness error. Thus, for the k 2 h i 2 åj51(Xi,j Xi ) . The estimator presented in (8) is the practical purpose of doing Bayesian assimilation using central result of this paper. The first term contains the (5), assuming Gaussian distributions, and measurement root-mean-square of deviations of the observational and measurement errors that are uncorrelated with each value from the ensemble mean value mapped to the other and with the resolvable model process, the mea- surement and representativeness errors can be summed. Readers interested in a more nuanced discussion of repre- 4 Note that it is certainly possible to construct a more complex sentation error and how it can be treated in the context of hierarchical spatial model of the observational error variance. And there may be some benefit to doing so for very data-sparse appli- cations. But since one of the goals of this method is ease of im- 3 Note that the estimator presented in the next section will still be plementation, this simpler framework of regionally independent unbiased in the case of a non-Gaussian likelihood. estimation is presented here.

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC 1718 MONTHLY WEATHER REVIEW VOLUME 144 location of the observation. In essence, this term (iv) observation errors are uncorrelated in space and contains a contribution from the observational error time, variance and the model simulation error variance [see then (from the derivation in the appendix) we can write (A4) in the appendix]. This can be nonintuitive for the variance of F as readers who are accustomed to interpreting an en- " # semble mean as the true state. In an ensemble frame- 2 2(k 1 1) (k 1 1)2 n work, the correct interpretation is that the true state is Var(F) 5 s4 1 s2s2 1 s4 , n o k o k(k 2 1) n statistically indistinguishable from the model simula- eff tion ensemble (see item i below). As such, it contains (10) one realization of the model simulation error. The s2 [ n s2 s4 [ n s4 s2 second term then contains the variance of the where 1/nåi51 i , 1/nåi51 i , and i is the ensemble, which is basically a sample estimate of simulation error variance at each time–space location i; s2 5 H LHT the model simulation variance. The difference of these that is, i i i . An important note here is that there terms is an estimate of the observational error vari- is no need to assume that the simulation variance is ance. There is a natural balance in these terms; as the stationary or that deviations from the mean are time spread of the ensemble (term two) becomes smaller, uncorrelated. The term neff serves as an effective de- the ensemble mean will be more reflective of the true grees of freedom that is associated with the autocor- state, and the simulation error component of term one relation and possibly nonstationary variance of the will also decrease. The remainder of this section is simulation: concerned with the conditions under which F can be n n s2 n 2 2 2 2 considered an unbiased estimate of o and the theo- n 5 b 5 å å s s 0 r 0 , i i i i . (11) eff 1 b 0 , retical variance of the estimator. The properties of 1 ns4 i51 i 5i11 the estimator presented below are all derived in the Here r 0 is the autocorrelation in the model simula- appendix. i,i tion between locations i and i0. Although the assumption Properties of the estimator of uncorrelated observational errors is standard in data assimilation, there is evidence that both measurement The estimator F is unbiased only if its expected value errors and representativeness errors may be correlated is equal to s2. Here we list the series of conditions for o in geoscience applications (e.g., Stewart et al. 2008; which this holds: Bormann et al. 2010; Waller et al. 2014a). Readers in- (i) The model-resolvable true state (denoted Z*) along terested in deriving the properties of the estimator as- with the simulation ensembles are independently and suming correlated observational errors can do so from identically distributed. This is known as a ‘‘reliable’’ (A7) in the appendix. We do not explicitly consider that ensemble and allows (2) to be extended to include case here, but suffice to say that the estimator will still be ... ; m L F Z*;thatis,[Z*, Z1, Z2, Z3, , Zk] N( , ). unbiased, but the variance of will increase. We now (ii) There is no correlation between observation errors consider three special cases that lead to significant sim- and simulation errors (i.e., deviations from the mean). plification of (11). (iii) The model solution is unbiased relative to the obser- In the special case of no autocorrelation in the model vations (or the bias can be estimated and removed). simulations, neff is trivially equal to n. In the case that the model ensemble variance is sta- Regarding condition i, keep in mind that Z is 2 2 * tionary (i.e., s 5 s 0 ), observations are available at defined by the error subspace of the simulation i i regular time intervals Dt, and the model simulation has a ensemble. Thus, the ensemble is reliable by con- regional autocorrelation function rt defined at lags tDt, struction. Condition i will be violated by the use of n21 then b 5 2/nå (n 2 t)r2. Further assuming that the an ensemble that does not represent the range of t51 t range of significant autocorrelation is much smaller than model-resolvable variability that could emerge in nDt, the factor (n 2 t)/n will be nearly unity for nonzero the actual anticipated use of the model. r , and we can approximate n as If the above conditions are met, then, as shown in t eff the appendix, n n ’ . (12) eff n21 1 r2 E(F) 5 s2 , (9) 1 2 å t o t51 where E is the expectation operator. If we make the In practice, the assumption that data are available at additional assumptions that regular intervals may be too restrictive. Imagine instead

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC MAY 2016 K A R S P E C K 1719

2 that something is known about the average simulation properties of the autoregressive model (i.e., s and rt)and s2 autocorrelation between all unique locations of the n o. The Monte Carlo and theoretical moments are observations, which we will just call r2. As before, we indicated in the panel and are in excellent agreement. make the simplifying assumption that the simulation Following the parameters outlined in Table 1, Figs. 1b–f variance is stationary. In this case, we have show the theoretical (solid lines) and Monte Carlo b 5 (n 2 1)r2. And if n 1, then we can approximate generated values (dots) of the expected value and vari- neff as ance of F as these system and observational parameters vary. The agreement is nearly exact, providing a check on n n ’ . (13) (9), (10),and(11).Equations(10) and (12) and Figs. 1b–f eff 1 r2 1 n indicate that the variance of F increases with increas- s2 s2 r The value of having an analytic form for the variance ing o, ,and t and decreases with larger n and k.For of F is that it illuminates the strategies that could be the parameter set here explored here, the variance of F used to increase the precision of F. A discussion along is only a very weak function of the number of en- these lines is in the final section of this paper. sembles beyond a threshold value of about 10. Simi- larly, it is not until the lag-one autocorrelation reaches 0.6thatthereisastrongfunctionaldependenceonthe 4. A Monte Carlo verification of the properties of autocorrelation. On the other hand, for this parameter the estimator set the variance of F is a strong function of the ob- In this section we use Monte Carlo to servation and simulation variance and the number of demonstrate that (9) and (10) accurately reflect the first observations. two moments of the estimator F. For the purpose of this demonstration, consider (1) in a univariate context 5. Illustration using the POP2 ocean model and with f a time-uncorrelated, zero-mean random variable s2 5 5 : in situ hydrographic data with prescribed variance f 3andM 0 5. This model is then simply a first-order autoregressive pro- Here we apply the observational error variance esti- s2 5 s2 2 2 r 5 rt 5 t cess, with f /(1 M )and t 1 M .Pseudo mator to in situ temperature data that can be assimilated observations can be constructed by integrating (1) into the Parallel Ocean Program version 2 (POP2; Smith through time, at each time t 5 iDt(i 5 1, 2, 3, ..., n) et al. 2010) global ocean general circulation model. adding random draws of the observational error, which POP2 is a level-coordinate model, with 60 vertical levels, is normally distributed with zero mean and variance configured with a nominal horizontal resolution of 18, s2 5 5 1 8 o 5. We generate a total of n 100 pseudo obser- increasing to /4 meridionally near the equator. POP2 is vations. To apply the estimator F to the problem of the ocean component of the Community Earth System trying to infer the observational error variance, an Model (CESM; Gent et al. 2011). Further details on the ensemble of k integrations of (1) must be run. These are model can be found in Danabasoglu et al. (2012). Note analogous to the ensemble of forced-ocean states that that the goal of this illustration is not to make the most would be available in the real problem. This set of precise estimation of the observational error but to observations and the corresponding collection of en- simply illustrate the . sembles would be the only information available to An ensemble of 30 POP2 ocean simulations is pro- s2 infer o. duced by forcing with 30 unique samples of the atmo- In any real application there is only one set of obser- spheric state from an ensemble reanalysis of the vations and collection of ensembles. Thus, although the atmosphere (Raeder et al. 2012) produced with the estimator F is a random variable, only one sample of F nominal 28 resolution Community Atmosphere Model, can be computed. However, in this demonstration, the version 4 (CAM4; Neale et al. 2013). This ensemble goal is to show the distribution of F, so that the Monte atmospheric reanalysis assimilates temperature and Carlo estimates of the mean and variance can be com- winds from radiosondes and aircraft- and satellite- pared to the theoretical values. Many samples of F are derived drift winds. The ocean ensemble was initial- generated by repeatedly creating a sequence of n pseudo ized on 1 January 2004 from a climatological ensemble observations, regenerating the k ensembles from new of POP2 ocean states. The ocean simulations were in- integrations of (1) and computing a sample of F. For this tegrated through 31 December 2006, but 2004 was demonstration, 200 000 samples are used. treated as a spinup period and only years 2005 and 2006 Figure 1a shows the Monte Carlo generated discrete were used for estimating the observation error variance. probability density function of F, computed for the afore- In situ temperature observations from the 2009 World mentioned values of k, n, variance, and autocorrelation Ocean Database (WOD09; Johnson et al. 2009) were

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC 1720 MONTHLY WEATHER REVIEW VOLUME 144

FIG. 1. Monte Carlo estimates of the mean and variance of the estimator for comparison to the theoretical values. F s2 (a) The probability density function for the control parameters specified in Table 1. (b) Varying o, solid lines represent the theoretical moments, and the dots represent the Monte Carlo computed values. (c)–(f) As in (b), but for variations in the ensemble variance of the simulation, the lag-one autocorrrelation of the system, the number of observations, and the number of ensemble members. used to compute observation-minus-simulation re- bathythermographic (XBT) instruments, moored therm- siduals for use in (8). During 2005 and 2006, most of istors, and surface drifting buoys. the in situ temperature data in the WOD09 comes To compute residuals and ensemble sample statistics from autonomous drifting profiling floats (Argo), as called for by (8) the temperature field from each

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC MAY 2016 K A R S P E C K 1721

TABLE 1. List of Monte Carlo experiments using a first-order autoregressive model for demonstrating the validity of (9), (10), and (11).

s2 s2 r Demonstration name o 1 nk Control 5 4 0.5 100 30 s2 Vary o 0.1–10 4 0.1–0.9 100 30 Vary s2 5 0.1–10 0.5 100 30 r Vary 1 5 4 0.1–0.9 100 30 Vary n 5 4 0.5 10–500 30 Vary k 5 4 0.5 100 2–50 ocean ensemble was interpolated to the time and space location of the observations. (This interpolation is sim- ply the application of the forward observation operator H.) Because there are systematic between the observations and the ensemble forced-ocean simula- tion that should not be included in the estimation of observational error, the residuals from the year 2005 FIG. 2. Number of in situ temperature observations in each 58 box are used to estimate the systematic bias within 28 grid used to compute the estimator. White boxes indicate regions where boxes at standard levels. A two-dimensional Gaussian there were no observations available. filter was then applied to this field to smooth the solu- tion. This bias was then removed from the simulation ensembles in the year 2006, prior to applying (8) for 58 observational error statistics for in situ ocean hydro- boxes.5 Note that given a longer simulation, it would be graphic data. These two independent estimates are in reasonable to make a climatologically varying estimate are excellent agreement. As mentioned in the in- of the bias. The number of observations used within troduction, theirs is a purely observationally based es- each grid box is illustrated in Fig. 2 at 100- and 1000-m- timate and is comparable to ours because they use a depth levels. horizontal space-averaging baseline for their estimates 8 For this illustration, we assume time stationarity of the of 1 , which is approximately the same resolution as the observational error variance, although for some appli- POP2 model. The similarity in the estimates suggests cations there may be benefit to computing a seasonally that, roughly speaking, a lack of resolution it the primary dependent estimator. With enough data, this would source of observational error in this model. Note that amount to simply applying the estimator to seasonally they also apply a spatial smoothing to their estimates. differentiated subsets of the data. The variance of our estimator will be a function of the The F estimate of the standard deviation of the ob- simulation variance and autocorrelation at the time– servational error at 100- and 1000-m depths is shown in space locations of the observations, the number of ob- Fig. 3. Zonal versus depth cross sections at the equator servations, the number of ensembles, and the underlying and at 358N are shown in Fig. 4. At the equator the true observational error. Of course, excepting the standard deviation of the observational error is highest number of ensembles and the number of observations, along the main thermocline, where unresolvable circu- we do not know these parameters exactly. However 8 lation or wind errors will naturally be amplified by the within each 5 box approximate values can be assigned stratification. At 358N, the Gulf Stream and Kuroshio using sample estimates from the 2005 period. This is a western boundary currents stand out as regions of high very rough estimate based on the understanding that the observational error variance. This is due in part to the years 2005 and 2006 will share simulation and observa- high eddy activity in these currents—activity that is tional network characteristics and that the sample sta- unresolvable in coarse-resolution ocean models. Our tistics will be correct within an order of magnitude. estimates can be visually compared to the solutions Based on these estimated parameter values, Fig. 5 shows of Forget and Wunsch (2007), who also estimate spatial maps of the approximate variance of the esti- mator at 100- and 1000-m depths. Note that the color scale is logarithmic; by these approximations there is 5 Five-degree boxes were used because we found that there were enormous spatial and depth inhomogeneity in the vari- not enough observations in 28 boxes over 1-yr time frames to make ance of the estimator. What dominates the uncertainty meaningful estimates of the observational error variance. in the estimator? In Fig. 6, the horizontal average of the

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC 1722 MONTHLY WEATHER REVIEW VOLUME 144

FIG. 3. (left) Estimate of the standard deviation of observational error computed as the square root of the expected value of the estimator F. (right) Estimates as described in Forget and Wunsch (2007). At 100-m depth, the line indicates the 18C contour. At 1000-m depth, the line indicates the 1/ 38C contour. percent contribution of each of the three terms in (10) to more ensembles can be used, increasing n and k. And, of the total variance of F is shown as a function of depth. This course, a different simulation distribution, one with re- suggests that, in this system, the vast majority of the un- duced variance and covariance, but that is still reliable, certainty in the estimation of observational error variance could be constructed. is a function of the underlying true observational error and If one only considers the expected value of the estimator, is, thus, reducible only through increasing the number of it can appear that there is no benefit to choosing an en- observations used for estimation. semble that is both reliable and minimizes the variance of the simulation ensemble. On the other hand, if estimator precision is a goal, there is some benefit to decreasing the 6. Discussion simulation ensemble spread and autocorrelation. The ex- This paper presents a simple ensemble-based tech- tent to which that benefit is significant will depend on the nique for estimating observational error variance for relative contributions of the three terms in (10). ocean data-assimilation systems. Given an ensemble of This reduction of uncertainty in the simulation en- atmospheres that reflect the uncertainty in the forcing to semble can result from the assimilation of data. If a the ocean, observation-minus-simulation residuals from data-assimilation update is performed optimally, with the corresponding forced-ocean ensembles can be used to observational and background error covariances per- form an unbiased estimator for the observational error fectly known and perfectly represented and the model variance. We also derive an uncertainty associated with unbiased, then the forecasts used in sequences of the estimator in the case where the standard assumption observation-minus-forecast residuals will remain reli- of uncorrelated observational error is made. able, and the estimator shown here will remain un- We show that the uncertainty in the estimator F is a biased, but with increased precision. On the other hand, it function of the true value of the observational error is impossible to do optimal assimilation since we have no a variance, the model simulation variance and covariance, priori knowledge of the observational error variance— the number of observations used in the estimate, and the and forecasts following a suboptimal update will always number of simulation ensembles. It can be understood violate the reliability criteria. Desroziers et al. (2005), from (10) that as the total number of observations and who developed a set of diagnostic statistics that could be ensembles available for estimation increases and the applied after successive cycles of assimilation to assess simulation ensemble variance and covariance decreases, whether the observation and forecast errors had been the precision of the estimator will improve. While the correctly specified, use analogous reasoning. They show size of the underlying observational error variance is an that in the case of perfectly specified errors, the ob- irreducible factor in the uncertainty in F, the remaining servation error variance is equal to the expected value of factors can be altered. If possible, more data points and the sample covariance of observation-minus-forecasts

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC MAY 2016 K A R S P E C K 1723

FIG.4.AsinFig. 3, but for longitude–depth transects at the equator and 358N. In all panels the line indicates the 18C contour. residuals and observation-minus-analysis residuals. The manuscript, it is straightforward to extend the estimator to advantage of using simulation residuals instead of fore- multivariate space so that observational covariances could cast residuals from a sequential data-assimilation cycling be computed. However, the lack of time and space prox- is that the assumption of correct error statistics need only imate hydrographic observations makes the actual esti- apply to the model for the estimator to remain unbiased. mation practically impossible. Estimation of observational However, the cost of this less stringent assumption is that covariances is more sensible for densely sampled obser- precision in the estimator will decrease. vations such as those derived from satellite remote . We have focused on an ensemble of forced-ocean sim- The framework presented here relies on relatively ulations for the estimation. But, in fact, the estimator F is strict assumptions regarding the independence of sim- equally valid for an ensemble of ocean states from a freely ulation and observational errors and normally distrib- evolving coupled ocean–atmosphere model. In this case, uted processes. While these assumptions are rarely the covariance and autocorrelation of the atmosphere realized precisely, they tend to be useful approxima- ensemble will be larger and will lead to larger variances tions. For much of the global ocean, the large-scale and autocorrelations in the ocean ensemble as well. It is oceanic response to atmospheric perturbations is well then a matter of expert opinion whether or not the re- approximated with linear dynamics (e.g., Anderson and sulting variance of F is too high for useful estimation. Gill 1975; Anderson and Killworth 1977; Malanotte- We have also focused on the estimation of observational Rizzoli 1996); however, in some regions (e.g., high- error variance and have not attempted to estimate the latitude convective regimes) the extent to which possible time or space covariance. As mentioned in the nonlinear dynamics lead to nonnormal distributions

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC 1724 MONTHLY WEATHER REVIEW VOLUME 144

FIG. 5. The approximate variance of the estimator F. Note that the color scale is logarithmic. For 100-m (1000 m) depth the black line indicates the 18C4 (0.0018C4). FIG. 6. For each depth, the horizontal average of the percent contribution of each of the three terms to the total variance of F. could potentially be an issue. This is also an issue for any Blue line: term 1 from (10), representing the estimated contribu- tion to the variance of F that emerges from the magnitude of the nonlinear forward operator. However, as mentioned in the observational error; green line: contribution from term 2, which appendix, even when errors are non-Gaussian, the esti- emerges from a combination of the simulation and observational mator presented here is still unbiased but the variance of error variance; red line: contribution from term 3, which represents the estimator is no longer given by the formula presented the simulation error variance and the reduced degrees of freedom in this work. The assumption of independence between the associated with auto correlation in the simulation ensemble. resolved and unresolved process poses another interesting traditional measures of reliability such as the ‘‘uniformity challenge. This assumption goes far beyond this work; it is of the rank histogram’’ (Hamill 2001) could be in- fundamental to the formulation of most data-assimilation terpreted with less ambiguity. methods. The development of state-dependent observa- tional error statistics would be a fruitful line of research, as Acknowledgments. Special thanks to Alexey Kaplan, would the development of techniques for the joint esti- Jeff Anderson, Robert Miller, and Doug Nychka for mation of resolvable and unresolvable processes. helpful discussions on this topic. Thanks also to the three Finally, it is worth commenting on one of the basic as- anonymous reviewers for their constructive reading of the sumptions in this work: that of a ‘‘reliable’’ ensemble with manuscript. This work was funded in part by the NOAA which to generate our estimate of observational error. Climate Program Office under the Climate Variability and How can one verify that a model ensemble has been Predictability Program Grants NA09OAR4310163 and suitably formed? Unfortunately, when real data are used NA13OAR4310138 and by the NSF Collaborative Re- for benchmarking, there is no objective answer to the search EaSM2 Grant OCE-1243015. question of whether or not the ensemble is reliable be- cause measures of reliability must account for observa- APPENDIX tional errors [see Hamill (2001) and Anderson (1996)]. Deriving the First Two Moments of the Estimator Thus, there is no way of knowing with certainty whether any emerging inconsistencies result from the mis- This appendix derives the expected value and variance of specification of background errors or observational errors. the observational error estimator F. All notation is con- It is a frustratingly circular state of affairs. A possible route sistent with the main body of the paper. We begin by re- forward is to attempt to establish reliability based on in- writing the estimator (8) from the main text in vector form, dependent data sources that can be used to aggregate 1 F5 [Y 2 HhZi]T[Y 2 HhZi] over time–space scales that can be assumed to be well n resolved by the model. It might then be possible to as- 1 k 1 1 1 k sume that the error of representation (and measurement) 2 å [Z 2 hZi]THTH[Z 2 hZi], (A1) 2 j j is much much less than the background error—such that n k 1 k j51

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC MAY 2016 K A R S P E C K 1725 where observed values Yi are in a column vector Y H 5 Hm 1 e e ; S 5 ... Zj j, j N(0, ), j 1, 2, 3, , k and the linear matrix operator H 5 [H1, H2, ... H 5 Hm 1 e e ; N S H3, , Hn] can be used to map each state-space Z* *, * (0, ) vector Zj into an observation space column vector. Y 5 HZ 1 e , e ; N(0, S ). (A2) (As in the main text, h.i denotes an ensemble * o o o average.) The zero-mean random variables ej, e ,andeo are all vectors T * We will derive the moments of F using the algebra of of length n and S 5 HLH and So are [n 3 n] covariance random variables (e.g., Springer 1979). The algebra is matrices. The body of the paper is only concerned with the made considerably more straightforward by treating the case in which the off-diagonal elements of So are zero and s2 random variables Z and Y as the sum of a mean value the diagonal elements are a constant o (regionally station- and a deviation from the mean, where the deviations are ary, uncorrelated observational errors). But we derive the zero-mean random variables. From the main text (2), complete matrix form of the moments of the estimator and (7), and the reliability criteria, we can express the model do not apply this simplification until later in the appendix. ensembles, the model-resolvable true state of the ocean, We use (A2) to express (A1) in terms of the vector and the observations as deviations associated with Y and Z:

1 1 k 1 1 1 k F5 [e 1 e 2 hei]T[e 1 e 2 hei] 2 å [e 2 hei]T[e 2 hei] * o * o j j n n k 2 1 k j51 1 1 k 1 1 1 k 5 [eT e 1 eTe 1 2eT e 2 2(e 1 e )Thei 1 heiThei] 2 å [eTe 1 heiThei 2 2eThei]. (A3) * * o o * o * o j j j n n k 2 1 k j51

A1 heiThei 5 2 k eTe 1 Using the identity 1/k åj51 j j (iii) The observation errors are uncorrelated with the 2 k k eTe F 2/k åj51åm5j11 j m,wecanrewrite (terms are labeled simulation- and model-resolvable true state de- T T for later convenience): viations; that is, E(e e ) 5 E(e ej) 5 0 and 2 o * o " # 6zffl}|ffl{A z}|{B zfflffl}|fflffl{C 1 6 2 k 1 1 k F5 6 eT e 1 eTe 1 eT e 2 e 1 e T å e E F 5 E eT e 1 E eTe 2 å E eTe o o 2 o ( o) j ( ) ( ) ( o o) ( j j) n 4 * * * k * 5 n * * k 5 |fflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflffl}j 1 j 1 D 1 3 5 Tr(S ), (A5) n o 7 7 1 k 4 k k 7 2 å eTe 1 å å eTe 7 j j j m . (A4) where Tr denotes the trace of a matrix. This demon- k 5 (k 2 1)(k) 5 5 1 7 |fflfflfflfflffl{zfflfflfflfflffl}j 1 |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl}j 1 m j 1 5 strates that F is an unbiased estimator of the average E F observational error variance. If the observational error variance is constant, then E(F) 5 s2.Notethatal- The expected value of (A4) can be computed for the o though we have assumed normality of the errors, this is following assumptions (also outlined in the main text), not a necessary condition for the estimator to be un- expressed in terms of the expectation operator: biased [see identities in Bao and Ullah (2010)]. To e e F (i) The simulation ensemble is reliable ( * and j are compute the variance of ,weusetheidentity identically distributed); that is, E(eTe ) 5 E(eT e ) F 5 E F2 2 E F 2 E F2 j j * * Var( ) ( ) ( ) .Forming ( )requiresthe for j 5 (1, 2, 3, ..., k). examination of 21 terms (the product of every unique (ii) The deviations associated with the simulation combination of terms A through F). Drawing on the ensembles are independent of one another and in- assumptions outlined above, only nine of those terms dependent of deviations associated with the model- are nonzero, and we can write resolvable true state; that is, E(eTe ) 5 E(eTe ) 5 0 j * j m 1 for j, m 5 (1, 2, 3, ..., k)j 6¼ m. E(F2) 5 E[A2 1 B2 1 C2 1 D2 1 E2 1 F2 n2 1 2AB 1 2AE 1 2BE], (A6)

A1 This identity is easily derivable with basic algebraic expansion where the squares should be understood to be a matrix of the ensemble average operator. product.

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC 1726 MONTHLY WEATHER REVIEW VOLUME 144

We can form the terms in (A6) using a set of de- Unlike the equalities used in (A5), the above equalities only rived identities for the expectation of quadratic hold for normally distributed errors. Equation (A6) is then forms involving two zero-mean, independent, nor- " mally distributed multivariate random variables 1 4(k 1 1) E(F2) 5 3Tr(S S ) 1 Tr(S S ) a and b (see, e.g., Lemma 2.3; Magnus 1978; Bao and Ullah n2 o o k o m T T T T 2010): E([a a][a b]) 5 0, E([a a][a a]) 5 2Tr(CaCa) 1 # Tr(C )Tr(C ), E([aTb][aTb]) 5 Tr(C C ), E([aTa][bTb]) 5 1 2 a a a b 1 2(k 1) SS C C C C Tr( ) (A7) Tr( a)Tr( b), where a and b are the covariance matri- k(k 2 1) ces of a and b; and the exact matrix form of the dispersion of the esti- E 2 5 SS 1 S S (A ) 2Tr( ) Tr( )Tr( ), mator F is E 2 5 S S 1 S S (B ) 2Tr( o o) Tr( o)Tr( o), " 2 E(C ) 5 4Tr(SS ), 2 2 2 o Var(F) 5 E(F ) 2 E(F) 5 Tr(S S ) n2 o o 4 E(D2) 5 [Tr(SS ) 1 Tr(SS)], # k o 1 1 2 1 2(k 1) S S 1 (k 1) SS 2 Tr( o ) Tr( ) . E(E2) 5 Tr(SS) 1 Tr(S)Tr(S), k k(k 2 1) k (A8) 8 E(F2) 5 Tr(SS), (k)(k 2 1) r 0 If we take i,i to be the autocorrelation in the model 0 s2 E(2AB) 5 2Tr(S)Tr(S ), simulation between time–space locations i and i , and i o 2 and s 0 to be the corresponding variances and the ob- E(2AE) 522Tr(S)Tr(S). i servational errors to be uncorrelated and with a constant E 52 S S (2BE) 2Tr( )Tr( o), variance, then we can write

" # 1 n 1 2 n n 2 4 2(k 1) 2 2 (k 1) 2 2 2 F 5 s 1 s å s 1 å å s s 0 r 0 Var( ) n o o i ( i i i,i ) n2 k 5 k(k 2 1) 5 05 " i 1 i#1 i 1 2 2(k 1 1) (k 1 1)2 n 5 s4 1 s2 s2 1 s4 o o 2 n , (A9) n k k(k 1) eff

, where neff is an effective degrees of freedom associ- J. Climate, 9, 1518–1530, doi:10.1175/1520-0442(1996)009 1518: . ated with the autocorrelation and possibly non- AMFPAE 2.0.CO;2. Balmaseda, M. A., K. Mogensen, and A. T. Weaver, 2013: Evalu- stationary variance of the model ensemble and is ation of the ECMWF ocean reanalysis system ORAS4. Quart. given by J. Roy. Meteor. Soc., 139, 1132–1161, doi:10.1002/qj.2063. Bao, Y., and A. Ullah, 2010: Expectation of quadratic forms n n n 2 2 2 2 in normal and nonnormal variables with applications. n 5 b 5 å å s s 0 r 0 , i i i i . (A10) eff 1 b 0 , J. Stat. Plann. Inference, 140, 1193–1205, doi:10.1016/ 1 ns4 i51 i 5i11 j.jspi.2009.11.002. The exact result for the dispersion of our estimator F is Behringer, D. W., M. Ji, and A. Leetmaa, 1998: An improved coupled model for ENSO prediction and implications for given by (A9) and (A10). ocean initialization. Part I: The ocean data assimilation system. Mon. Wea. Rev., 126, 1013–1021, doi:10.1175/ , . REFERENCES 1520-0493(1998)126 1013:AICMFE 2.0.CO;2. Blanchet, I., C. Frankignoul, and M. Cane, 1997: A comparison of Anderson, D., and A. Gill, 1975: Spin-up of a stratified ocean, with adaptive Kalman filters for a tropical Pacific Ocean model. Mon. applications to upwelling. Deep-Sea Res. Oceanogr. Abstr., 22, Wea. Rev., 125, 40–58, doi:10.1175/1520-0493(1997)125,0040: 583–596, doi:10.1016/0011-7471(75)90046-7. ACOAKF.2.0.CO;2. ——, and P. Killworth, 1977: Spin-up of a stratified ocean, with Bormann, N., A. Collard, and P. Bauer, 2010: Estimates of spatial topography. Deep-Sea Res., 24, 709–732, doi:10.1016/ and interchannel observation-error characteristics for current 0146-6291(77)90495-7. sounder radiances for numerical weather prediction. II: Ap- Anderson, J. L., 1996: A method for producing and evaluating plication to AIRS and IASI data. Quart. J. Roy. Meteor. Soc., probabilistic forecasts from ensemble model integrations. 136, 1051–1063, doi:10.1002/qj.615.

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC MAY 2016 K A R S P E C K 1727

Cohn, S., 1997: An introduction to estimation theory. J. Meteor. Karspeck, A., S. Yeager, G. Danabasoglu, T. Hoar, N. Collins, Soc. Japan, 75 (1B), 257–288. K. Raeder, J. Anderson, and J. Tribbia, 2013: An ensemble Compo, G., and Coauthors, 2011: The Twentieth Century Re- adjustment Kalman filter for the CCSM4 ocean component. analysis project. Quart. J. Roy. Meteor. Soc., 137, 1–28, J. Climate, 26, 7392–7413, doi:10.1175/JCLI-D-12-00402.1. doi:10.1002/qj.776. Leeuwenburgh, O., 2007: Validation of an EnKF system for Daley, R., 1993: Estimating observation error statistics for atmo- OGCM initialization assimilating temperature, salinity, and spheric data assimilation. Ann. Geophys., 11, 634–647. surface height . Mon. Wea. Rev., 135, 125–139, Danabasoglu,G.,S.Bates,B.Briegleb,S.R.Jayne,M.Jochum, doi:10.1175/MWR3272.1. W. Large, S. Peacock, and S. Yeager, 2012: The CCSM4 Li, H., E. Kalnay, and T. Miyoshi, 2009: Simultaneous estimation of ocean component. J. Climate, 25, 1361–1389, doi:10.1175/ covariance inflation and observation errors within an ensem- JCLI-D-11-00091.1. ble Kalman filter. Quart. J. Roy. Meteor. Soc., 135, 523–533, Dee, D. P., 1995: On-line estimation of error covariance pa- doi:10.1002/qj.371. rameters for atmospheric data assimilation. Mon. Wea. Lönnberg, R., and A. Hollingsworth, 1986: The statistical structure Rev., 123, 1128–1145, doi:10.1175/1520-0493(1995)123,1128: of the short-range forecast errors as determined from radio- OLEOEC.2.0.CO;2. sonde data. Part II: The covariance of height and wind errors. ——, and A. M. DaSilva, 1999: Maximum-likelihood estimation of Tellus, 38A, 137–161, doi:10.1111/j.1600-0870.1986.tb00461.x. forecast and observation error covariance parameters. Part I: Lorenc, A. C., 1986: Analysis methods for numerical weather Methodology. Mon. Wea. Rev., 127, 1822–1834, doi:10.1175/ prediction. Quart. J. Roy. Meteor. Soc., 112, 1177–1194, 1520-0493(1999)127,1822:MLEOFA.2.0.CO;2. doi:10.1002/qj.49711247414. Derber, J., and A. Rosati, 1989: A global oceanic data assimilation Magnus, J. R., 1978: The moments of products of quadratic forms in system. J. Phys. Oceanogr., 19, 1333–1347, doi:10.1175/ normal variables. Stat. Neerl., 32, 201–210, doi:10.1111/ 1520-0485(1989)019,1333:AGODAS.2.0.CO;2. j.1467-9574.1978.tb01399.x. Desroziers, G., and S. Ivanov, 2001: Diagnosis and adaptive tuning Malanotte-Rizzoli, P., Ed., 1996: Modern Approaches to Data As- of observation-error parameters in a variational assimilation. similation in Ocean Modeling. Oceanography Series, Vol. 61, Quart. J. Roy. Meteor. Soc., 127, 1433–1452, doi:10.1002/ Elsevier, 455 pp. qj.49712757417. Menemenlis, D., and M. Chechelnitsky, 2000: Error estimates for ——, L. Berre, B. Chapnik, and P. Poli, 2005: Diagnosis of obser- an ocean general circulation model from altimeter and vation, background and analysis-error statistics in observation acoustic tomography data. Mon. Wea. Rev., 128, 763–778, space. Quart. J. Roy. Meteor. Soc., 131, 3385–3396, doi:10.1256/ doi:10.1175/1520-0493(2000)128,0763:EEFAOG.2.0.CO;2. qj.05.108. Miyoshi, T., E. Kalnay, and H. Li, 2013: Estimating and in- Forget, G., and C. Wunsch, 2007: Estimated global hydrographic cluding observation-error correlations in data assimila- variability. J. Phys. Oceanogr., 37, 1997–2008, doi:10.1175/ tion. Inverse Probl. Sci. Eng., 21, 387–398, doi:10.1080/ JPO3072.1. 17415977.2012.712527. Fu, L.-L., I. Fukumori, and R. Miller, 1993: Fitting dynamic models Neale, R., J. Richter, S. Park, R. Lauritzen, S. Vavrus, P. Rasch, to the Geosat sea level observations in the tropical Pacific and M. Zhang, 2013: The mean climate of the Community Ocean. Part II: A linear, wind-driven model. J. Phys. Ocean- Atmosphere Model (CAM4) in forced SST and fully coupled ogr., 23, 2162–2181, doi:10.1175/1520-0485(1993)023,2162: experiments. J. Climate, 26, 5150–5168, doi:10.1175/ FDMTTG.2.0.CO;2. JCLI-D-12-00236.1. Fukumori, I., R. Raghunath, L.-L. Fu, and Y. Chao, 1999: Assim- Oke, P., and P. Sakov, 2008: Representation error of oceanic ob- ilation of TOPEX/Poseidon altimeter data into a global ocean servations for data assimilation. J. Atmos. Oceanic Technol., circulation model: How good are the results? J. Geophys. Res., 25, 1004–1017, doi:10.1175/2007JTECHO558.1. 104, 25 647–25 665, doi:10.1029/1999JC900193. Raeder, K., J. Anderson, N. Collins, T. Hoar, J. Kay, P. Lauritzen, Giese, B. S., and S. Ray, 2011: El Niño variability in simple ocean and R. Pincus, 2012: DART/CAM: An ensemble data assim- data assimilation (SODA), 1871–2008. J. Geophys. Res., 116, ilation system for CESM atmospheric models. J. Climate, 25, C02024, doi:10.1029/2010jc006695. 6304–6317, doi:10.1175/JCLI-D-11-00395.1. Gent, P. R., and Coauthors, 2011: The Community Climate System Richman, J. G., R. N. Miller, and Y. H. Spitz, 2005: Error estimates Model version 4. J. Climate, 24, 4973–4991, doi:10.1175/ for assimilation of satellite sea surface temperature data in 2011JCLI4083.1. ocean climate models. Geophys. Res. Lett., 32, L18608, Hamill, T. M., 2001: Interpretation of rank histograms for verifying doi:10.1029/2005GL023591. ensemble forecasts. Mon. Wea. Rev., 129,550–560,doi:10.1175/ Rutherford, I. D., 1972: Data assimilation by statistical in- 1520-0493(2001)129,0550:IORHFV.2.0.CO;2. terpolation of forecast error fields. J. Atmos. Sci., 29, 809–815, Hodyss, D., and N. Nichols, 2015: The error of representation: Basic doi:10.1175/1520-0469(1972)029,0809:DABSIO.2.0.CO;2. understanding. Tellus, 67A, 24822, doi:10.3402/tellusa.v67.24822. ——, 1976: An operational three-dimensional multivariate statis- Hollingsworth, A., and R. Lönnberg, 1986: The statistical structure tical objective analysis scheme. Proc. JOC Study Group Conf. of the short-range forecast errors as determined from radio- on Four Dimensional Data Assimilation, Paris, France, WMO, sonde data. Part I: The wind field. Tellus, 38A, 111–136, 98–121. doi:10.1111/j.1600-0870.1986.tb00460.x. Smith, R., and Coauthors, 2010: The Parallel Ocean Program (POP) Janjic´, T., and S. E. Cohn, 2006: Treatment of observation error due reference manual—Ocean component of the Community Cli- to unresolved scales in atmospheric data assimilation. Mon. mate System Model (CCSM) and Community Earth System Wea. Rev., 134, 2900–2915, doi:10.1175/MWR3229.1. Model (CESM). Los Alamos National Laboratory Tech. Rep Johnson, D., T. Boyer, H. Garcia, R. Locarnini, O. Baranova, and LAUR-10-01853, 141 pp. [Available online at http://www.cesm. M. Zweng, 2009: World Ocean Database 2009 Documentation. ucar.edu/models/ccsm4.0/pop/doc/sci/POPRefManual.pdf.] NOAA Printing Office, 175 pp. Springer, M., 1979: The Algebra of Random Variables. Wiley, 470 pp.

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC 1728 MONTHLY WEATHER REVIEW VOLUME 144

Stewart, L., S. Dance, and N. Nichols, 2008: Correlated observation Wunsch, C., and P. Heimbach, 2007: Practical global oceanic errors in data assimilation. Int. J. Numer. Methods Fluids, 56, state estimation. Physica D, 230, 197–208, doi:10.1016/ 1521–1527, doi:10.1002/fld.1636. j.physd.2006.09.040. Waller, J., S. Dance, A. Lawless, and N. Nichols, 2014a: Estimating Yang, C., and B. S. Giese, 2013: El Niño Southern Oscillation in an correlated observation error statistics using an ensemble ensemble ocean reanalysis and coupled climate models. transform Kalman filter. Tellus, 66A, 23294, doi:10.3402/ J. Geophys. Res. Oceans, 118, 4052–4071, doi:10.1002/ tellusa.v66.23294. jgrc.20284. ——, ——, ——, ——, and J. Eyre, 2014b: Representativity error Zhang, S., M. J. Harrison, A. Rosati, and A. Wittenberg, 2007: for temperature and humidity using the Met Office high- System design and evaluation of coupled ensemble data as- resolution model. Quart. J. Roy. Meteor. Soc., 140, 1189–1197, similation for global oceanic climate studies. Mon. Wea. Rev., doi:10.1002/qj.2207. 135, 3541–3564, doi:10.1175/MWR3466.1.

Unauthenticated | Downloaded 09/29/21 12:09 PM UTC