
Physica D 230 (2007) 65–71 www.elsevier.com/locate/physd

Statistical predictability in the atmosphere and other dynamical systems

Richard Kleeman∗

Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, NY 10012, USA

Available online 24 July 2006. Communicated by C.K.R.T. Jones.

Abstract

Ensemble predictions are an integral part of routine weather and climate prediction because of the sensitivity of such projections to the specification of the initial state. In many discussions it is tacitly assumed that ensembles are equivalent to probability distribution functions (p.d.f.s) of the random variables of interest. In general, for vector valued random variables this is not the case (not even approximately), since practical ensembles do not adequately sample the high dimensional state spaces of dynamical systems of practical relevance. In this contribution we place these ideas on a rigorous footing using concepts derived from Bayesian analysis and information theory. In particular, we show that ensembles must imply a coarse graining of state space and that this coarse graining implies loss of information relative to the converged p.d.f. To cope with the needed coarse graining in the context of practical applications, we introduce a hierarchy of entropic functionals. These measure the information content of multivariate marginal distributions of increasing order. For fully converged distributions (i.e. p.d.f.s) these functionals form a strictly ordered hierarchy. As one proceeds up the hierarchy with ensembles instead, however, increasingly coarser partitions are required by the functionals, which implies that the strict ordering of the p.d.f. based functionals breaks down. This breakdown is symptomatic of the necessarily limited sampling by practical ensembles of high dimensional state spaces and is unavoidable for most practical applications. In the second part of the paper the theoretical machinery developed above is applied to the practical problem of mid-latitude weather prediction. We show that the functionals derived in the first part all decline essentially linearly with time, and there appears in fact to be a fairly well defined cut off time (roughly 45 days for the model analyzed) beyond which initial condition information is unimportant to statistical prediction.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Predictability; Information theory; Statistical prediction; Dynamical systems

∗ Tel.: +1 212 998 3233; fax: +1 212 995 4121. E-mail address: [email protected].
0167-2789/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.physd.2006.06.005

1. A Bayesian perspective on predictability

The Bayesian perspective of mathematical statistics (see, for example, [2]) posits prior and posterior probability distributions for random variables of interest: before the acquisition of particular data concerning a random variable, one is assumed to have a prior distribution derived from all previous observations. Subsequently the new data acquired modifies this distribution to a posterior distribution. The extent to which this new distribution "differs" from the original prior is a measure of the usefulness of the newly acquired data. The functional usually deployed (see, for example, [4]) to measure such a difference or utility is the relative entropy D(p_post ‖ p_prior), which for discrete distributions can be written¹

    D(p ‖ q) ≡ Σ_{x∈H} p(x) ln(p(x)/q(x))    (1.1)

where H is a countable index set and we are using p(x) as the posterior and q(x) as the prior discrete distributions.

¹ Note that to make this definition well defined we assume two further things: firstly, that the summands are taken as non-zero only when p(x) ≠ 0, and secondly, that q(x) = 0 only when p(x) = 0. The second condition is equivalent to saying that if the probability of a particular event was zero in the (infinite) past it will always be zero in the future.

If we consider a particular partitioning of R^n, our state space, then this discrete form can be written easily as a Riemann sum by writing the probability of a particular partition as the product of the local probability density and the partition volume. In the usual infinite limit this then becomes the continuous density distribution form:

    D(f ‖ g) ≡ ∫_{R^n} f(z) ln(f(z)/g(z)) dz

where f and g are the limiting probability density functions corresponding with p and q respectively. This transition between discrete and continuous forms will be important in our subsequent discussion below. Note that this transition does not occur for the absolute entropy H(p) = −Σ_{x∈H} p(x) ln p(x), because the volume element in the Riemann sum does not cancel and so one is left with an infinite "renormalization" constant when one takes the limit.

The ideal statistical prediction problem consists in determining an initial condition distribution using observations and then using a good dynamical model to project this distribution forward in time. In terms of the Bayesian perspective, the intuitively obvious prior distribution for this process is the "climatological" or "equilibrium" distribution associated with the particular dynamical system. This is clearly the best prior in the absence of information concerning the initial conditions. It also has the advantage that, since the posterior prediction distribution usually converges asymptotically to the prior, the utility of statistical predictions approaches zero asymptotically, which coincides with intuition. From a practical perspective, the relative entropy measures the utility of the statistical prediction under the assumption that the model from which it is derived is perfect. In the realistic case of imperfect models, the degree to which perfect model utility corresponds with real utility is determined by the realism of the dynamical model (more discussion on this may be found in [7,9]).

Often our dynamical system will be subject to external forcing. In the case that this is periodic (as it is for the climate system) we choose our prior to have the same phase with respect to the periodic forcing as the posterior at the time of interest. Such a convention is the one commonly adopted in the climate community.

2. Coarse graining, ensembles and marginal entropy

An excellent introductory reference for information theory is the book of Cover and Thomas [4]. In Appendix A we review the properties of relative entropy relevant to this contribution. More background can be found in that book as well as in [11], which is a somewhat more mathematical paper by the present author and others. The material presented in this section relies heavily on information theoretic and advanced statistical concepts. Less technically minded readers will find a summary at section end.

For any practical dynamical system of significant dimensionality, the integration of the corresponding Fokker–Planck equation for the pdf becomes computationally problematical. In such a situation typically a Monte Carlo approach is taken. More precisely, initial conditions are sampled from an assumed distribution and then each sample member is integrated forward in time. A sample only of the pdf is therefore available at all times. What is the relationship between this "ensemble" and the pdf? We approach this question in a (hopefully) intuitively transparent fashion. A first step towards an understanding of this issue is the concept of a state space partitioning.

2.1. Partitions

Suppose we have a state space of dimension n and define a partition Γ of R^n as a complete and non-overlapping coverage of R^n by a collection of M subsets {Γ_1, Γ_2, ..., Γ_M}. More precisely, for any x ∈ R^n there exists a j such that x ∈ Γ_j, and in addition Γ_i ∩ Γ_k = ∅ for all i ≠ k.

Now associated with each partition member is a probability which is given by

    p_i = ∫_{Γ_i} f(x) dx

where the underlying pdf is f. Clearly, in the limit that the volume of each partition member approaches zero, p_i approaches f(x_i*)Vol(Γ_i) where x_i* is some element of Γ_i. Also, using the p_i and the analogous q_i, we can define the discrete relative entropy D_Γ(p ‖ q) with respect to Γ using Eq. (1.1), and the Riemann sum formalism shows that this approaches the continuous D(f ‖ g) in the above limit.

A partition Λ is said to be a refinement of Γ if every Γ_i contains at least one Λ_j. We write Γ ⪯ Λ and it follows easily that

Theorem 2.1. If Γ ⪯ Λ then D_Γ ≤ D_Λ where the discrete relative entropies have the same underlying continuous pdfs.

Proof. The result follows easily from the definition of refinement and Theorem 16.1.2 of [4]. □

The straightforward interpretation of this result is that the coarsening of a particular partitioning results in a drop in the relative entropy, since we are discarding the information on finer scales. Note that as this refinement process approaches the limit discussed above, the relative entropy approaches the continuous value monotonically from below. Partitions and ensembles have an obvious statistical connection:

2.2. Ensembles

An ensemble E is a set of K points in R^n, and one can naturally define a bin count n(Γ_i, E) to be an integer valued function on Γ which specifies the number of ensemble members which are members of a particular Γ_i. It is obvious that the bin count f_i ≡ n(Γ_i, E) serves as a basis for estimating p_i; however it is equally clear that this estimate, which we denote by p̂_i, is just that and has an uncertainty associated with it. In fact we can conceptually write down the probability P(p̂) that this estimate is actually equal to p ≡ (p_1, p_2, ..., p_M). In [9] we deduced using elementary Bayesian arguments that this should be given by

    P(p̂) = Φ_{f⁺}(p̂),    f⁺ ≡ (f_1 + 1, f_2 + 1, ..., f_M + 1)    (2.1)

where Φ_{f⁺} is the multivariate Dirichlet distribution (see [1]). The most likely p̂_i or first moment of this distribution, which we denote by p̂_i, serves as the "best" sample estimator for p_i and is given by

    p̂_i = (f_i + 1)/(K + M)    (2.2)

where M is the number of partition elements in Γ. Notice that this differs from the naive choice of f_i/K and also is always non-zero. Now the uncertainty involved in this sample estimate implies an expected information loss: if we use the above estimate when in fact p_i is different then, by the information theoretic interpretation of relative entropy, the loss of information is D_Γ(p ‖ p̂). Since we have a probability distribution that any particular s is actually the correct p, we are able therefore to calculate the expected information loss associated with our particular estimator of p:

    EL(p̂) = ∫ P(s) D(s ‖ p̂) ds.

Note that one can evaluate this explicitly using Eq. (2.1), and one may also show that it is minimized by using the best estimator from Eq. (2.2). In addition, in general this information loss increases as the partition of state space is refined, since the sample size in each partition element Γ_i decreases and so the sampling error involved in estimating p_i increases. For a partition that is "too fine" this loss can approach the estimated relative entropy D_Γ(p̂ ‖ q̂), meaning that our estimate of the information content of the ensemble is completely unreliable. In order to avoid this one must choose a partition which has many bin counts n(Γ_i, E) ≫ 1 both for the prediction and prior (climatological) ensembles. One may define an effective relative entropy with respect to a particular partition Γ and ensembles E_p and E_q as

    D_Γ^eff(E_p, E_q) = max{D_Γ(p̂ ‖ q̂) − EL(p̂) − EL(q̂), 0}.    (2.3)

In the section below we use the usual relative entropy without the expected information loss removed; however this equation is worth bearing in mind.

2.3. Marginal entropies

The practical method for the construction of ensembles involves the repeated integration of a dynamical model using many different initial conditions. Such a situation implies that, except for very low order dynamical systems, we are restricted to sample sizes of at most around 10^5, since that many integrations over time periods of practical interest is typically extremely computationally expensive. Some thought shows that this implies some rather severe restrictions on the estimation of multivariate pdfs and associated entropic functionals. Thus, for example, if one is interested in a state space of dimension n and in retaining 10 divisions of data per dimension, then such a partition will have 10^n members, and so to avoid the sampling loss of information discussed in the previous subsection we must restrict ourselves to n < 5. One could, of course, partially avoid this issue by reducing the number of divisions per dimension. This amounts to coarsening our partition, however, and as we saw in Theorem 2.1 above this inevitably implies a reduction in available information. It is clear that the finite size of the available ensemble implies a fundamental restriction on the amount of information available about multivariate pdfs. This conceptual situation is shown schematically in Fig. 1.

[Fig. 1. The effect on information of different partition refinements for a particular ensemble.]

This situation naturally motivates the study of marginal distributions, since here this so-called curse of dimensionality can sometimes be avoided. As an example, consider a bi-variate marginal distribution: if m divisions per dimension are retained then the partition in the relevant two dimensional subspace has m² elements. Thus to avoid sampling loss with practical samples here we require perhaps m < 150. Often m = 100 is sufficient with many distributions (Gaussian for example) to obtain very close to complete convergence of discrete entropic functionals to their continuous limits. Motivated by the above we introduce the concept of marginal entropy:

Definition. Suppose we have n random variables X_i with corresponding multivariate distribution p(X_1, X_2, ..., X_n) and all possible marginal distributions p(X_{j1}, X_{j2}, ..., X_{jm}) of order m < n with j_k ≤ n (and distinct); then the marginal relative entropy of order m is defined as

    D^m(p ‖ q) ≡ (1/C(n, m)) Σ_{j1<j2<···<jm} D(p(X_{j1}, X_{j2}, ..., X_{jm}) ‖ q(X_{j1}, X_{j2}, ..., X_{jm}))    (2.4)

where C(n, m) is the usual binomial coefficient. The marginal entropy is thus the average relative entropy of all possible marginal distributions of order m. Marginal entropies have been used in statistical physics, most particularly in connection with liquids, where correlations between molecules are of significance (see, for example, [6] and [12]).

Marginal relative entropies can be shown to satisfy an inequality hierarchy:

Theorem 2.2. Marginal relative entropies with respect to n random variables X_i and the same partition Γ satisfy the following chain of inequalities

    D^1(p ‖ q) ≤ D^2(p ‖ q) ≤ ··· ≤ D^n(p ‖ q) = D(p ‖ q)    (2.5)

where for notational ease we are dropping the partition subscript Γ.

Proof. Use the notation D(Y_1, Y_2, ..., Y_k) to denote the relative entropy of p(Y_1, Y_2, ..., Y_k) and q(Y_1, Y_2, ..., Y_k) (note that the order of random variables here is immaterial). The chain rule of relative entropy shows that

    D(Y_1, Y_2, ..., Y_k) ≤ D(Y_1, Y_2, ..., Y_k, Y_{k+1})

which also shows that

    D(Y_1, Y_2, ..., Y_k) ≥ (1/k){D(Y_2, ..., Y_k) + D(Y_1, Y_3, ..., Y_k) + ··· + D(Y_1, Y_2, ..., Y_{k−1})}.

If this inequality is applied term by term to the sum C(n, k) D^k(p ‖ q), and repetitions of the smaller order relative entropies are collected, we obtain

    C(n, k) D^k(p ‖ q) ≥ ((n − k + 1)/k) C(n, k − 1) D^{k−1}(p ‖ q)

or, using the properties of C(n, k),

    D^k(p ‖ q) ≥ D^{k−1}(p ‖ q). □

As a final observation, if instead of considering partitions to estimate ensemble information content one were to fit particular distributions to the ensemble (for example Gaussians and their generalizations), then one is left with the rather difficult issue of calculating sampling loss, since clearly such a fitted distribution is uncertain. That this is an important problem can be seen when one calculates the entropy estimate from such a fitted form. If the ensemble is small enough to imply a "coarse" partition in n dimensional space, then experience shows that the relative entropy calculated from the fitted distribution is often much larger than the value obtained from any viable partitioning as discussed above. This would seem to imply that in such a case the sampling loss from the fitted distribution may be quite high. The large uncertainty associated with the fitted distribution for this scenario will also mean that the numerical fitting problem is ill-conditioned, since there will be many almost equally likely distributions that "fit" the ensemble.
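The hierarchy of Theorem 2.2 is straightforward to check numerically for small discrete distributions. The following sketch is illustrative code written for this discussion (none of it comes from the paper's own computations): it evaluates Eq. (1.1) and the order-m average of Eq. (2.4) for a trivariate example on a fixed partition and verifies D^1 ≤ D^2 ≤ D^3 = D.

```python
import itertools
import math

import numpy as np

def relative_entropy(p, q):
    """Discrete relative entropy D(p || q) of Eq. (1.1), in nats.
    Summands are taken as zero wherever p(x) = 0."""
    p, q = np.asarray(p).ravel(), np.asarray(q).ravel()
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def marginal_relative_entropy(p, q, m):
    """Order-m marginal relative entropy D^m(p || q) of Eq. (2.4):
    the average of D over all C(n, m) order-m marginal distributions."""
    n = p.ndim
    total = 0.0
    for kept in itertools.combinations(range(n), m):
        dropped = tuple(a for a in range(n) if a not in kept)
        total += relative_entropy(p.sum(axis=dropped), q.sum(axis=dropped))
    return total / math.comb(n, m)

# Two arbitrary strictly positive trivariate distributions on a 4x4x4 partition.
rng = np.random.default_rng(0)
p = rng.random((4, 4, 4)); p /= p.sum()
q = rng.random((4, 4, 4)); q /= q.sum()

d1, d2, d3 = (marginal_relative_entropy(p, q, m) for m in (1, 2, 3))
# Theorem 2.2: D^1 <= D^2 <= D^3, and the order-n functional is D itself.
assert d1 <= d2 <= d3 and abs(d3 - relative_entropy(p, q)) < 1e-12
```

Note that the ordering holds here only because the partition is held fixed across orders; as the text emphasizes, it breaks down once lower orders are granted finer partitions.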

This hierarchy has the natural interpretation that more information is apparent as the higher order multivariate behaviour of the distribution is taken into account.

It is very important to realise that this inequality chain only holds for a fixed partition Γ of a particular n dimensional state space. On the other hand, some reflection shows that finer and finer partitions are possible without much sampling loss as the order of the marginal entropy is reduced. We discuss this issue further in the next section when we consider a practical example from atmospheric science.

To summarize: practical calculations in the field of statistical predictability of realistic dynamical systems imply that we must consider ensembles of possible predictions rather than pdfs. The ensembles represent sample estimates of desired distributions. One view on this sample estimation is obtained by partitioning the state space and counting the number of ensemble members passing through each partition element. It is clear that as the order of the multivariate distribution increases, any estimate must rely on coarser partitioning per dimension added. This problem is often called the "curse of dimensionality". Of course, if one is only interested in say uni-variate or bi-variate distributions then this is not usually a problem. Motivated by this fundamental practical difficulty, we have introduced a natural hierarchy of so called "marginal entropies" (see Eq. (2.4)) which measure the information content of marginal distributions of increasing order. For a given partitioning of state space they form a strict inequality hierarchy (Eq. (2.5)), reflecting the increase in information content as higher order (marginal) distributions are considered. Some reflection however also shows that the lower order marginal distributions can be more "precisely" viewed using finer partitions of state space, and Theorem 2.1 then shows that the finer partitions have greater information content. This means that the strict hierarchy will be broken by choosing finer partitions for lower order marginal entropies. One might choose to do this in order to maximize available information.

3. Mid-latitude weather predictability

3.1. Basic model configuration

We now apply the machinery of the previous sections to the problem of weather predictability. We restrict our analysis to the mid-latitudes, since current atmospheric models are thought to simulate the major characteristics of the circulation here reasonably well. The tropical regions are heavily influenced by moist convection, which is commonly thought to be only fairly crudely simulated in current generation weather models.

We used the openly available University of Hamburg PUMA code (documented in [10]), in the version in which radiation and convection are replaced by a temperature relaxation term. The model was configured to have a horizontal resolution of spectral T42 (around 2.8 × 2.8 degrees) and 5 vertical levels. Qualitatively, the simulation of synoptic variability in the storm tracks during the Northern winter (the season used in all experiments below) was in good agreement with observations.

Initial conditions were assumed to be drawn from a Gaussian distribution which had mean fields taken at random from an extended integration of the model; variances an order of magnitude less than climatology; and a homogeneous horizontal spatial decorrelation scale of 1000 km (vertical correlations were assumed zero). Such a distribution implies initial conditions that have greater uncertainty than is normal in a typical operational forecasting situation.

Attention was focused on the North American and Atlantic storm track region and a domain of 90°W–0° in longitude and 20°N–65°N in latitude. A reduced state space consisting of stream-function EOFs was used. The first ten such patterns explain around 95% of the variance in our restricted domain, which was dominated by large scale synoptic variations associated with baroclinic instability. Given this, we chose these 10 patterns to analyze predictability, and so they were taken as the basis elements of our reduced state space.
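The reduction described above is a standard EOF (principal component) truncation. The sketch below is purely illustrative — the synthetic array merely stands in for stream-function anomaly fields on the paper's domain, and none of the numbers are taken from the PUMA output — but it shows the operation: an SVD of the anomaly matrix yields the patterns, the leading ten of which define the reduced state vector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 500 "daily" fields on 300 grid points, built with a
# low-dimensional structure so that a few patterns dominate the variance.
fields = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 300))

anomalies = fields - fields.mean(axis=0)            # remove the time mean
_, s, vt = np.linalg.svd(anomalies, full_matrices=False)

k = 10                                              # retain ten EOFs
eofs = vt[:k]                                       # spatial patterns
pcs = anomalies @ eofs.T                            # reduced state vector
explained = (s[:k] ** 2).sum() / (s ** 2).sum()     # fraction of variance
```

The rows of `eofs` are orthonormal, so `pcs` gives the coordinates of each field in the reduced state space on which the partitions of the next subsection are built.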

3.2. State-space partitioning, ensembles and marginal entropies

A partitioning strategy in state-space is somewhat arbitrary but should provide a clear interpretation of the resulting information measures. Guided by this, we chose our partitions here to have equal prediction ensemble members in each dimension of state-space. This approach is widely used in practical contexts, where such partitions are referred to as (for example) quartiles, deciles and so on. In a future publication (see [8]) we will explore the sensitivity of our results to this partitioning choice.

The efficiency of our chosen model means that ensembles of size considerably larger than those commonly used operationally were possible. Here we used a 9600 member sample and integrated each member for 90 days, under the (correct) assumption that at such a time the prediction and equilibrium ensembles should be statistically indistinguishable. In terms of the discussion in Section 1, we identify the prior distribution with an equilibrium or climatological ensemble and the posterior distribution with the prediction ensemble. Since these ensembles become statistically indistinguishable after a sufficient time (see below), the relative entropy between them declines with time until a residual due to sampling remains.

From the analysis of the previous section it is clear that we need to choose our bin count per partition to be "fairly large", i.e. of order 5–10, in order to avoid significant sampling loss. In addition, finer partitions are possible for lower order marginal entropies, since marginalization of distributions implies consolidation of partitions in the direction of the eliminated/integrated dimensions. Guided by these considerations, we restricted our attention to marginal entropies of order less than 6, since beyond this the required partitioning per dimension is very coarse. For the remaining marginal entropies (orders 1–5) we chose the number of partitions per dimension to be 1024, 32, 10, 6 and 4 respectively. Notice that the partitioning for orders 4 and 5 is quite coarse. Based on experience with idealized multivariate distributions (Gaussian and Gamma), we would expect only orders 1–3 to have nearly converged relative entropies. In other words, we might expect to see the coarse graining effects of Theorem 2.1 for orders 4 and 5. On the other hand, one might expect to see the effects of the hierarchy of Theorem 2.2 for orders 1–3.

3.3. Results

The time evolution of the first five marginal entropies is shown in Fig. 2. Broadly speaking, there is a consistent decline in relative entropy for the first 45 days which is approximately linear in nature. Following this time the relative entropy has a very small value consistent with a residual sampling error.⁴ These results suggest the rather surprising conclusion that there may actually be a cut-off time in weather prediction beyond which initial condition information is completely irrelevant.

[Fig. 2. Marginal entropy evolution.]

The ordering of the marginal entropies for short prediction leads is more or less consistent with our expectation discussed above: the order 2 marginal exceeds the order 1 marginal, presumably because of the hierarchy shown in Theorem 2.2 above, since both are close to converged to their continuous value with the partitioning chosen. This ordering is reversed for orders 3–5, presumably because of the coarse-graining effects of Theorem 2.1. The second effect described appears most important for short prediction times, probably because it is then that coarse graining issues are most prominent, since at such times the prediction ensemble has considerably less spread than the equilibrium. Notice that the uni-variate entropy always lags, consistent with our explanations.

The results shown here have important implications for the problem of atmospheric climate prediction. They suggest that predictions which extend beyond a month and a half no longer gain any benefit from the inclusion of initial condition data and must rely for their skill entirely on boundary conditions, which are derived mainly from the ocean conditions.

The almost linear decline in predictability noted is rather striking and may be a fundamental property of mid-latitude geophysical turbulence. These matters, and a more detailed analysis of the meteorological results including methodological robustness, are examined in a manuscript shortly to be submitted to an atmospheric journal [8].

⁴ Remember Eq. (2.3) from the previous section.

Acknowledgements

The author wishes to thank Greg Eyink from Johns Hopkins University for a stimulating discussion at the UCLA/IPAM 2005 data assimilation meeting concerning the material presented here. He would also like to thank the organizers of the UCLA/IPAM meeting (where this material was first presented) for organizing a very useful and enjoyable meeting. The work of Prof. Klaus Fraedrich and co-workers in making the PUMA atmospheric model available is also gratefully acknowledged. This work was supported by the CMG and ATM programs of NSF with grant numbers 0417728 and 0430889 respectively.

Appendix A. Properties of relative entropy

An excellent introduction to the information theoretic concepts used in this contribution can be found in the book of Cover and Thomas [4] and the mathematical presentation [11]. Here we present some relevant results from these references. Proofs for the first two theorems can be found there. The third theorem's proof is presented here in detail, as its precise form is of some physical relevance and it is only briefly sketched in [4]. The final theorem is often stated but not proved in the literature, so we provide a proof.

The continuous form of the relative entropy satisfies a number of important properties which we now discuss in detail:

Theorem A.1. Suppose we have two probability densities f and g; then D(f ‖ g) ≥ 0, with equality if and only if f = g almost everywhere (i.e. at points where f ≠ 0).

Simply put, this confirms that if the prior and posterior p.d.f.s differ significantly then (positive) utility has been obtained by the observation/prediction process, no matter what form this difference takes.

Theorem A.2. Suppose we define a general non-linear transformation of our state space F: R^n → R^n which is non-degenerate, i.e. det(J(F)) ≠ 0 where J is the Jacobian; then the relative entropy of the transformed probability densities is left invariant.

This result is rather important practically, since non-linear transformations of state space are common, particularly in meteorology (consider the standard transformation of the vertical coordinate from geometric height to sigma), and invariance of predictability measures under such a transformation would seem desirable. Note that absolute entropy does not satisfy non-linear invariance, although linear invariance is satisfied (see [11] for more detail).

Consider now the time evolution of probability densities. We are able to prove a generalized second law of thermodynamics:

Theorem A.3. Suppose we have two probability densities F = F(x, t, x′, t′) and G = G(x, t, x′, t′) with x, x′ ∈ R^n and t, t′ ∈ R, and let us assume that the following causality condition holds for the associated conditional densities:

    F(x, t | x′, t′) = G(x, t | x′, t′)  where t ≥ t′    (A.1)

then the associated marginal distributions f(x, t), f(x′, t′) and g(x, t), g(x′, t′) satisfy

    D_t(f ‖ g) ≤ D_{t′}(f ‖ g)    (A.2)

where

    D_t(f ‖ g) ≡ ∫_{R^n} f(x, t) ln(f(x, t)/g(x, t)) dx

and by definition

    f(x, t) ≡ ∫_{R^{n+1}} F(x, t, x′, t′) dx′ dt′
    f(x′, t′) ≡ ∫_{R^{n+1}} F(x, t, x′, t′) dx dt

and similarly for G and g.

Proof. Consider the joint distributions F(x, t, x′, t′) and G(x, t, x′, t′); then the chain rule for relative entropy (see [4] page 23) shows that

    D_{tt′}(F ‖ G) = D_{t′}(f ‖ g) + D_{t|t′}(F ‖ G)    (A.3)

where D_{t|t′} is the so-called conditional relative entropy:

    D_{t|t′}(F ‖ G) ≡ ∫_{R^n} dx′ f(x′, t′) ∫_{R^n} dx F(x, t | x′, t′) ln(F(x, t | x′, t′)/G(x, t | x′, t′))

which is the expected relative entropy of the conditional distributions. It is obvious, in view of this form and the first theorem of this section, that all terms in Eq. (A.3) are non-negative in general. In addition we can use the chain rule to also establish that

    D_{tt′}(F ‖ G) = D_t(f ‖ g) + D_{t′|t}(F ‖ G).

Combining this with Eq. (A.3) and using the causality condition, which ensures that in fact D_{t|t′}(F ‖ G) = 0, we obtain

    D_{t′}(f ‖ g) = D_t(f ‖ g) + D_{t′|t}(F ‖ G)

which establishes the desired result, since the last term is non-negative. □

We refer to Eq. (A.1) as a causality condition for the following reason. Suppose we have a dynamical system and we specify the state variables to have value x′ at time t′. It seems reasonable to suppose (in the absence of outside influences) that the probability that the state vector will be x at a later time t should be uniquely specified. This is certainly the case for the Fokker–Planck equation and the included Liouville equation which governs deterministic dynamical systems. If this conditional is unique, its value will be the same whether we are considering the equilibrium (g) or transient (f) behaviour of the system; hence the causality requirement stated. It is interesting to note that Theorem A.3 is a standard result in the study of asymptotic solutions of the Fokker–Planck equation (see [5] page 61).

It is interesting in the context of Theorem A.3 to consider a subspace of state space. Clearly now the conditionals in (A.1) need not be unique, since their value will depend also on the variables in the complement of the subspace, which we have not specified. One may expect then that the relative entropy of subspaces will not necessarily satisfy the temporal monotonicity condition of Eq. (A.2). We can interpret this behaviour as information flow to the subspace from its complement. Under certain interesting conditions both absolute and relative entropy are actually conserved (we consider the latter case):

Theorem A.4. Suppose we have a dynamical system given by

    ∂u_i/∂t = A_i(u),  i = 1, ..., N    (A.4)

where A_i is a differentiable vector function which we assume satisfies the (Liouville) condition

    Σ_{i=1}^{N} ∂A_i/∂u_i = 0;

then, if f and g are two solutions of the corresponding probability density evolution equation

    f_t + Σ_{i=1}^{N} ∂(A_i f)/∂u_i = 0,

then for all times t and t′

    D(f(t) ‖ g(t)) = D(f(t′) ‖ g(t′)).

Proof. We have

    (f ln(f/g))_t = f_t ln(f/g) + f_t − g_t (f/g)
                  = f_t (ln(f/g) + 1) − g_t (f/g)
                  = −∇·(A f)(ln(f/g) + 1) + ∇·(A g)(f/g)
                  = −∇·(A f ln(f/g))

where we are using the Liouville condition for the last step. This shows that the function f ln(f/g) satisfies a flux conservation equation, and hence that its integral over all space must be conserved. □

It is worth observing that a closed Hamiltonian dynamical system will satisfy the Liouville condition. Many inviscid fluid systems also satisfy this particular condition. If however a stochastic term is added to the right hand side of the dynamical system in Eq. (A.4) to represent neglected or fine scale motions, then the probability density evolution equation becomes the more general Fokker–Planck equation with non-zero diffusion term. The relative entropy in this case can then be shown to strictly decline with time (see [5] pages 61–63). Such a situation is reminiscent of Boltzmann's paradox of statistical mechanics (see [3]), where entropy only increases (and irreversibility appears) when a macroscopic or coarse-grained view of the molecular dynamics is taken. Physically, what is happening in both cases is that as time increases information is lost from the coarse grained scales to the fine scales, but over all scales information is conserved.

References

[1] M. Abramowitz, I.A. Stegun (Eds.), Handbook of Mathematical Functions, ninth printing ed., Dover, New York, 1972.
[2] J. Bernardo, A. Smith, Bayesian Theory, John Wiley and Sons, 1994.
[3] L. Boltzmann, Lectures on Gas Theory, Dover, March 1995.
[4] T.M. Cover, J.A. Thomas, Elements of Information Theory, Wiley, New York, NY, 1991.
[5] C.W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, in: Springer Series in Synergetics, vol. 13, Springer, 2004.
[6] H.S. Green, C.A. Hurst, Order-disorder Phenomena, in: Monographs in Statistical Physics and Thermodynamics, vol. 5, Interscience, London, New York, 1964.
[7] R. Kleeman, Measuring dynamical prediction utility using relative entropy, J. Atmospheric Sci. 59 (2002) 2057–2072.
[8] R. Kleeman, Limits to statistical weather predictability, J. Atmospheric Sci. (submitted for publication).
[9] R. Kleeman, A.J. Majda, Predictability in a model of geostrophic turbulence, J. Atmospheric Sci. 62 (2005) 2864–2879.
[10] L.M. Leslie, K. Fraedrich, A new general circulation model: Formulation and preliminary results in a single and multiprocessor environment, Clim. Dynam. 13 (1997) 35–43.
[11] A.J. Majda, R. Kleeman, D. Cai, A mathematical framework for quantifying predictability through relative entropy, Methods Appl. Anal. 9 (2002) 425–444.
[12] L. Onsager, Information Cascade. Unpublished notes (12:163, Mathematics), Onsager Archive, NTNU Library, Trondheim, Norway. http://www.ub.ntnu.no/formidl/hist/tekhist/tek5/eindex.htm.