Feedback Mechanisms and Constraints on Climate Sensitivity from a Perturbed Physics Ensemble of General Circulation Models

Benjamin Mark Sanderson, Trinity College

A thesis submitted to the Mathematical and Physical Sciences Division for the degree of Doctor of Philosophy in the University of Oxford, Trinity Term, 2007

Atmospheric, Oceanic and Planetary Physics, University of Oxford

Abstract

One of the major uncertainties plaguing predictions of future climate is so-called “structural uncertainty”, which describes the difference between models and the physical systems to which they relate. In General Circulation Models (GCMs) of the climate, the major structural uncertainty lies in finding the most appropriate parameterisations for processes occurring at scales smaller than that of the model grid. Until recently, the computing power required to explicitly simulate thousands of models using various possible parameter configurations has been unattainable. The availability of distributed computing architectures has allowed such an experiment to take place.

Two analyses are presented of this multi-thousand member “perturbed physics” GCM ensemble, with model simulations provided by the distributed computing project climateprediction.net. In the first, a linear analysis is used to identify the dominant physical processes responsible for variation in climate sensitivity across the ensemble. A principal component analysis of model radiative response reveals two dominant independent feedback processes, each largely controlled by a single parameter change. These parameters are found to account for a large fraction of the variation in equilibrium climate sensitivity within the ensemble. Regression techniques enable a prediction of the real-world strength of these dominant feedback mechanisms using reanalysis data.

In the second analysis, an emulator is developed using a feed-forward neural network, trained with the data from climateprediction.net. The emulator is used to simulate a much larger ensemble which explores model parameter space more fully. This emulated ensemble is used to search for models closest to observations over a wide range of equilibrium response to greenhouse gas forcing - thus identifying regions of interest in the parameter space of the model which may be explored in future experiments. The relative discrepancies of these models from observations provide a constraint on climate sensitivity by identifying the sensitivity at which the model discrepancy from observations is minimised. As more observations are added to the error metric, it is found that the discrepancy between ensemble models and observations rapidly exceeds the discrepancy between the models themselves. This result highlights a possible oversight in previous ensemble-based predictions of climate sensitivity, which tend to ignore this systematic component of model error.

Acknowledgements

The research described in this dissertation would not have been possible without the help, support and patience provided by many individuals. I extend my gratitude to my supervisor, Myles Allen, for providing unique insight into the problems contained herein, and without whom the collaborations and opportunities I have enjoyed over the last three years would have been impossible. I would also like to thank my co-supervisor, Dáithí Stone, for his ideas, support and sound advice.

Much of this work was made possible with the help of external collaborators. Many, many thanks to Reto Knutti for his endless support, inspiration and hospitality - both in the office and on the mountains! I would also like to thank the members of the Climate and Global Dynamics group at NCAR for their cooperation during my time there; their friendliness and support made my time in Colorado always enjoyable.

My great thanks go also to Claudio Piani (and family) for his guidance and true Italian hospitality. Thanks also to the staff at ICTP who accommodated me so well.

To all past and present members of the Climate Dynamics group in Oxford, thank you. The regular discussions which I’ve had with so many of you have been invalu- able in forming and developing my ideas. A special thanks to William Ingram for patiently answering all my questions and showing so much interest in my work.

I would like also to thank those who provided the data to make this analysis possible, the rest of the climateprediction.net team: Tolu Aina, Carl Christensen, Dave Frame, Nick Faull, Dave Stainforth, Sylvia Knight, Milo Thurston and Hiro Yamazaki and many more. Many thanks also to CEH for their support.

Thanks also to my examiners, David Marshall and Mat Collins, for new ideas and a really interesting discussion.

Mum, Dad, Charlotte: Thank you for always being there in hard times.

And finally, I reserve my greatest thanks for Carly. Your love and guidance pulled me through even when I could not see the end - I am forever grateful.

Contents

1 Introduction
  1.1 Climate Change
  1.2 Uncertainty in predictions of future climate
  1.3 Climate Models
    1.3.1 Simple models of climate
    1.3.2 Observational Constraints on Sensitivity
    1.3.3 General Circulation Models
  1.4 Ensemble Predictions
    1.4.1 The Coupled Model Intercomparison Project
    1.4.2 Perturbed Physics Ensembles
    1.4.3 Thesis outline

2 Ensemble Analysis: Background
  2.1 Preliminary Results
  2.2 Sensitivity Analyses
    2.2.1 Piani et al.
    2.2.2 Knutti et al.
    2.2.3 Rougier and Sexton
  2.3 Motivation for further work
    2.3.1 Physical Interpretation
    2.3.2 Systematic Model Error

3 Technical Background
  3.1 Perturbed Parameterisations
    3.1.1 Large-Scale Precipitation
    3.1.2 Saturated Humidity and Large-Scale Clouds
    3.1.3 Convection Scheme
    3.1.4 Sea Ice Albedo Temperature dependence

4 Linear feedback analysis
  4.1 Introduction
    4.1.1 Feedbacks and Climate Sensitivity
  4.2 Methodology
    4.2.1 Regional feedback analysis
    4.2.2 EOF Analysis
    4.2.3 Regression Techniques
    4.2.4 Projection of Observations onto Feedbacks
  4.3 Results
    4.3.1 Ensemble Mean Response
    4.3.2 EOF Analysis
    4.3.3 Observational Projections
    4.3.4 Sub-ensemble analysis
  4.4 Sensitivity Distributions
    4.4.1 Frequency Distribution of Sensitivity
    4.4.2 Prediction of likely climate sensitivity
  4.5 Conclusions

5 Model Optimisation with Neural Networks
  5.1 Introduction
  5.2 Methodology
    5.2.1 Data Preparation
    5.2.2 Neural Network Architecture
  5.3 Results
    5.3.1 Verification
    5.3.2 Monte-Carlo Simulation
  5.4 Parameter Dependence
  5.5 Emulator Verification
  5.6 Conclusions

6 Systematic Constraints on Climate Sensitivity
  6.1 Probability Distributions
  6.2 Methodologies
  6.3 Results
    6.3.1 Absolute Likelihood Distribution
    6.3.2 Directly Scaled Distributions
    6.3.3 Relative Scaling
  6.4 Verification
  6.5 Discussion

7 Summary and Future Work
  7.1 Summary of Results
    7.1.1 Chapter 4
    7.1.2 Chapter 5
    7.1.3 Chapter 6
  7.2 Caveats and possible extensions for this thesis
    7.2.1 Caveats to the feedback analysis technique
    7.2.2 Caveats to the model emulation technique

A Empirical Orthogonal Functions

B Neural Network Architecture

Chapter 1

Introduction

“The old men of the valley declare that the climate is changing, and they are very positive that there are now no such winters as they remembered as boys...” —The Valley of Kashmir, Walter Lawrence, 1895

1.1 Climate Change

The weather is changing. This is nothing new. On every timescale we experience change. Some of these changes we claim to understand; most creatures on Earth have at least an intuitive sense of how temperatures will respond to the daily and seasonal cycles. At the heart of these cycles lies the mostly predictable variation in the flux of the Sun’s radiation that reaches the Earth’s surface.

Yet these basic patterns of heating provoke a fantastically complex response on the surface of our planet. The difference in heating between tropical and polar regions causes an elaborate sequence of dynamical processes which transport heat polewards. Such transport is complicated by the fact that we live on a rotating sphere - the geometry of which causes large scale weather systems to form in the mid-latitudes, which themselves produce enormous variability in the weather we experience.

The presence of water serves to further complicate things. This molecule, which exists in three phases in our atmosphere, allows latent heat transport from the surface to the upper atmosphere and produces a vast array of different cloud types which significantly alter the Earth’s radiative balance. At the poles, the extent of the ice caps also affects the fine balance of the Sun’s energy which remains within our environment and, throughout the globe, ocean currents transport vast amounts of energy from one region to another.

The integration of these various components produces the highly non-linear system that lies in the few kilometres above and below the surface of our planet. Over the last few centuries, humankind has developed the skills necessary to predict how the weather will evolve over short periods of time. By taking an approximation of the current state of the system, we can iterate forward the equations of motion and predict how it is likely to evolve. But our atmosphere, like any chaotic system (Lorenz, 1975), is very sensitive to errors in the measurement of that initial state.

Our limited knowledge of the current state of the system causes a “horizon of predictability”. For some components, like convective clouds, this can be as little as an hour. Others, like large-scale patterns of atmosphere-ocean interaction such as the El Niño Southern Oscillation, can be predicted up to a year in advance. Beyond this horizon, we can know little about the precise future evolution of the atmosphere, and our predictions must instead be based upon statistics. A statistical description of the mean and variability of weather variables, together with other elements (the oceans, land surface and life on earth), forms what is collectively known as the climate system.

The statistics describing the climate system may change over time; but to discover these changes, the natural internal variability of the climate system must be estimated (which is made more complex by non-linear responses to small perturbations in the system). Changes may also be driven by natural or anthropogenic changes in external forcing. This ‘external variability’ may be due to a change in the Earth’s orbital parameters, or to an introduction of greenhouse gases (GHGs) into the atmosphere (both of which change the net radiation balance at the surface).

1.2 Uncertainty in predictions of future climate

The term “climate change” is here applied to a statistically significant change in the mean state or variability of the climate system persisting for a decade or longer. Future climate change may be somewhat predictable, but forecasts are made uncertain by three fundamental unknowns:

• Initial Conditions - As we have already discussed, each element of the climate system has its own characteristic timescale. Beyond those timescales, the error due to the measurement of the initial state is said to “saturate”. The range of possible states in which the saturated system may lie is the fundamental limit to our possible knowledge of the system.

The timescales relevant to the climate system are generally those of decades or more, so only the initial states of the slow-responding components, like ocean temperatures, are of consequence (Collins 2002, Pielke 1998). For example, the ocean circulation may respond to a change in GHG forcing, but the amplitude of that response - and its subsequent effect on climate - may depend on the initial state of the circulation. Such uncertainties are issues of “Predictability of the First Kind” (Lorenz, 1975).

• Forcing - The evolution of the climate system also depends upon the changing boundary conditions. One such boundary condition is the radiative forcing on the system, which is at least partly governed by future human behaviour and is therefore inseparably intertwined with unresolved politics and economics. Predictions of the future evolution of climate must therefore assume forcing scenarios such as the ‘Special Report on Emissions Scenarios’ (SRES) used by the Intergovernmental Panel on Climate Change (IPCC). These consider possible future changes in human activity and the resulting emissions of GHGs (plus other anthropogenic atmospheric components which exert a radiative forcing on the climate). For each of these scenarios, the total future forcing on the climate system may be calculated as a function of time.

However, the relation of atmospheric concentrations to radiative forcing is subject to considerable uncertainty. The direct radiative impact of LLGHGs

(Long Lived Green-House Gases, primarily carbon dioxide and methane) is relatively well understood; a simple estimate of the direct effect of LLGHGs on the troposphere, stratosphere, clouds and solar absorption was given by Ramaswamy et al. (2001), and was consistent with a radiative forcing of 3.7 Wm−2 for a doubling of atmospheric carbon dioxide. An evaluation of 20 different GCM radiation schemes by Collins et al. (2006b) found all to agree with this estimate to within ten percent.

Other atmospheric components are subject to much more uncertainty; the radiative effects of aerosols remain poorly understood (Anderson et al., 2003). Aerosols may have both a direct radiative effect and an indirect effect - mainly due to their potential influence on cloud albedo. It is thought, however, that anthropogenic aerosols have a net cooling effect on the climate system (Yu et al., 2006; Haywood and Boucher, 2000). Black carbon aerosols are good absorbers of solar radiation and thus exert a warming (positive) forcing on the climate system, but it is currently unclear whether the sensitivity of the system to forcing from aerosols is the same as that for LLGHGs (Roberts and Jones, 2004).

Various other anthropogenically produced gases have a direct radiative effect - the effect of decreased ozone in the stratosphere is likely to be a cooling, while ozone concentrations in the troposphere have increased, causing a net warming (Kiehl et al., 1999). The radiative effects of stratospheric water vapour and of contrails due to aviation are both poorly understood at present (Forster et al., 2006).

It is not just anthropogenic gases which change the radiative balance of the climate system - our influence on the albedo of the Earth’s surface also has an effect. Human deforestation and agricultural activity tends to increase the albedo of the planet (Betts, 2001), while the deposition of soot and other particles onto ice or snow covered surfaces (Hansen and Nazarenko, 2004) tends to decrease the albedo. The magnitude of these effects is subject to considerable uncertainty.

Aside from the anthropogenic influence on global forcing, the natural forcing of the climate system also evolves with time. The parameters of the Earth’s

orbit and the variation in solar output both cause a variation in the intensity of the solar radiation incident on the Earth’s atmosphere. Over the past 20 years, this variation in solar irradiance has been measured accurately, but estimates on longer timescales are indirect and subject to greater uncertainty (Harrison and Shine, 1999). Reconstructions of the second half of the 20th Century estimate an increase in surface solar irradiance of up to 0.7 Wm−2 (Hoyt and Schatten, 1993).

Large volcanoes also exert a large forcing on the climate system through the emission of large quantities of sulphates into the stratosphere. The recent eruptions of El Chichon and Mt. Pinatubo caused a peak global mean forcing (estimated from satellite data) of −3 Wm−2 (Hansen et al., 1998). Earlier estimates of forcing from volcanoes are less certain, being based on ground-based observations (Stothers, 1996), ice-core measurements and other proxies (Lamb, 1970). Estimating the impact of volcanoes on future climate forcing involves developing a stochastic model to simulate the pulse-like forcing of the future climate through occasional large eruptions (Naveau et al., 2002).

• Response - The third major uncertainty in the climate system lies in the nature of the response. The initial forcing on the system may provoke one of many feedbacks; these are changes in the boundary conditions which are themselves caused by the initial response. The most basic of these feedbacks is the black-body response of the surface, which increases the flux of outgoing longwave radiation from the surface on warming.

The ‘greenhouse effect’ arises because the Earth does not radiate from the surface alone: the atmosphere is opaque to some parts of the black-body spectrum emitted from the surface. Higher up in the atmosphere, greenhouse gases, water vapour and clouds absorb outgoing radiation and re-emit it at a lower temperature. The strength of this effect is governed by the difference in temperature between the surface and the emission height. Hence, in the case of CO2, because the atmosphere is already opaque in the absorbing region of the spectrum, an increase in concentration leads to an increase in the mean height of emission from the CO2 molecules, and thus a strengthening of the greenhouse effect as the difference between surface and emission temperatures increases (a short numerical sketch of this argument is given after this list of uncertainties). Along the same lines, a possible feedback in the climate system is a ‘lapse rate feedback’, where the rate of decrease of temperature with altitude changes systematically with temperature. It is expected that in the tropics, where the lapse rate is dominated by the adiabatic ascent of moist air, the lapse rate decreases on surface warming (Wetherald and Manabe, 1986).

There are many other complex feedbacks in the climate system; for example, the amplitude of a given warming is approximately doubled by water vapour feedback (Cess 1989; Hall and Manabe 1999). In its most basic form, this is due to the ability of air to hold more water vapour without saturating as it warms (as implied by the Clausius-Clapeyron relationship). Water vapour is itself a potent greenhouse gas, and thus further warms the surface. In practice, most of the free troposphere is highly under-saturated due to constant overturning, and models suggest that the relative humidity remains constant on warming, a property which has been verified observationally (Soden et al., 2002). Basic calculations suggest that this effect is partially cancelled by lapse rate feedback (Cess, 1975); the decreased greenhouse effect caused by a weaker lapse rate is offset by the increased radiative effect of more water vapour at high altitudes. Held and Soden (2000) emphasised that the radiative effect of water vapour is highly sensitive to the vertical water vapour profile which one expects in a changing climate.

One of the least understood components of the climate response is that of cloud feedbacks (Cess, 1996). Clouds affect the radiative balance of the planet in several different ways; their interaction with solar radiation is complex, and their albedo is highly dependent on the bulk properties of the cloud and on the microscopic scattering of the hydrometeors which they contain. Predicting how these properties may change (and how those changes should be measured) in a warmer environment remains a matter of debate (Soden et al., 2004). In the tropics, for instance, different diagnostic studies have suggested that shortwave cloud feedbacks may produce a positive (Chou and Neelin, 1999) or, alternatively, a negative (Lindzen et al., 2001) feedback.

Clouds also affect the longwave component of the energy budget, and changes in cloud depth, type and composition all influence their longwave forcing (Quante, 2004). These properties may also change on warming, and most models show an upward displacement of upper-tropospheric cloud cover on warming - decreasing the radiating temperature of the clouds and resulting in a positive feedback (Senior, 1998).

Models are not consistent when estimating the relative effects of the longwave and shortwave feedbacks (Cess, 1996), and there is some debate as to whether such feedbacks should be measured “offline”, by repeating the radiation code in a warmer model with and without the changed clouds, or “online”, by explicitly measuring the change in the cloud radiative forcing as a function of temperature (Soden et al., 2004).

There are also countless other feedbacks associated with the land surface and carbon cycle. For example, Dufresne et al. (2002) predicted that the uptake of carbon dioxide (CO2) by the land surface will decrease on warming, leaving a larger fraction in the atmosphere. Also, Cox et al. (2000) found that a warmer climate may result in the drying of the Amazon basin, which could massively reduce carbon uptake in the rain-forest. At higher latitudes, Bonan et al. (1992) found that the redistribution of boreal forest due to logging or a changing climate could initiate significant climate-vegetation feedbacks.
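To make the emission-height argument from the ‘Response’ item above concrete, the following is a minimal numerical sketch. The solar constant, albedo, lapse rate and the pair of emission heights are illustrative round numbers assumed for this example, not values taken from the thesis or from HadAM3:

```python
# Minimal sketch of the emission-height argument. All numbers below are
# illustrative assumptions (round values), not thesis or model output.

SIGMA = 5.67e-8            # Stefan-Boltzmann constant (W m^-2 K^-4)
S0, ALBEDO = 1361.0, 0.3   # solar constant (W m^-2) and planetary albedo

# Effective emission temperature from planetary energy balance (~255 K).
T_emit = ((S0 / 4.0) * (1.0 - ALBEDO) / SIGMA) ** 0.25

LAPSE_RATE = 6.5e-3        # K per metre, a typical tropospheric value

# More CO2 raises the mean emission height; with an unchanged lapse rate
# and emission temperature, the surface temperature must rise.
for z_emit in (5000.0, 5500.0):   # hypothetical heights before/after
    T_surface = T_emit + LAPSE_RATE * z_emit
    print("emission height %5.0f m -> surface temperature %.1f K"
          % (z_emit, T_surface))
```

Holding the emission temperature fixed while raising the emission height by 500 m warms the surface by just over 3 K in this toy setting - the qualitative mechanism described above.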

1.3 Climate Models

The ability to simulate the future evolution of the climate system requires a model of the response of the system to a change in forcing. These models may range in complexity from a single differential equation to a full atmosphere-ocean general circulation model (AOGCM), resolving many aspects of the climate system including atmospheric chemistry, clouds or the behaviour of the land surface. Each type of model has its own disadvantages - an increase in complexity may be associated with an increase in computational cost, and perhaps even with less interpretable (if more realistic) results.

1.3.1 Simple models of climate

The general response of the global mean climate to a radiative forcing may be described with the simplest of models; Hansen et al. (1985) used an energy balance model (EBM) - a zero-dimensional approximation to the global mean climate - to describe how the system behaves. In this paper, a succession of increasingly realistic EBMs is considered, beginning with a black-body approximation; in this regime, the system responds to a change in forcing by increasing its temperature until energy balance is restored:

c dT(t)/dt = F(t) − σ[T(t)⁴ − T(0)⁴],    (1.1)

where c is the heat capacity per unit area, T(t) is the time-dependent temperature of the system, F(t) is the anomalous forcing and σ is the Stefan-Boltzmann constant. They then considered an idealised instantaneous doubling of carbon dioxide:

F(t) = 0    (t < 0),
F(t) = F2×CO2    (t > 0).

In this case, T(t) would equilibrate to a temperature T(0) + ∆T0(2 × CO2) with an e-folding time τb, where:

∆T0(2 × CO2) ≈ 1.2 K,    τb = c/(4σT(0)³) ≈ 3.5 years,

for T(0) = 255 K and c corresponding to a 100 m deep oceanic mixed layer covering 70% of the global surface. However, this model is inappropriate for the Earth’s climate because it does not consider feedbacks in the climate system. These are discussed in Section 1.2, and include effects such as the water vapour, surface albedo and cloud feedbacks. They can be modelled with a feedback factor, f, such that:

∆Teq(2 × CO2) = f ∆T0(2 × CO2),    τ = f τb.

The equilibrium temperature change, ∆Teq(2 × CO2), is known as the Climate Sensitivity of the system (henceforth often abbreviated to S). The response time, τ, is increased because the feedbacks only act once the temperature has already increased; thus more time is required for the full equilibrium warming to take place. If feedbacks are positive, resulting in an increase in net warming, then f > 1.

The major inaccuracy in this picture of the climate system now lies in the heat capacity term, c. This describes the mean heat capacity of the Earth’s surface, which is dominated by the oceanic mixed layer - the depth of ocean which is mixed by surface winds and is considered to be in direct thermal contact with the atmosphere. Hence, in order to consider the effect of heat transfer to the deep ocean, Hansen et al. (1985) introduced a “depth of penetration”, d, of temperature change into the diffusive layer. Given a diffusion coefficient, k, scale analysis shows that this depth of penetration must itself be related to the e-folding time, such that d = (kτ)^(1/2) and

τ = ((d + d0)/d0) f τb,    (1.2)

where d0 is the depth of the mixed layer. Such models, although simple, can give insight into the behaviour of the climate system.
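Because Eqs. (1.1)-(1.2) are so compact, they can be verified directly by numerical integration. The sketch below is a minimal illustration under stated assumptions: a linearised black-body damping, a 100 m mixed layer, an assumed feedback factor f = 2, and no diffusive deep ocean (the d-dependence of Eq. (1.2) is omitted):

```python
# Numerical sketch of the EBM in Eq. (1.1), linearised about T(0) and
# extended with a feedback factor f. The mixed-layer depth and f = 2 are
# illustrative assumptions; the diffusive deep ocean of Eq. (1.2) is omitted.

SIGMA = 5.67e-8                # W m^-2 K^-4
T0 = 255.0                     # initial emission temperature (K)
C = 1000.0 * 4186.0 * 100.0    # heat capacity of a 100 m water column (J m^-2 K^-1)
F_2XCO2 = 3.7                  # forcing for doubled CO2 (W m^-2)
F_FACTOR = 2.0                 # assumed feedback factor, f > 1

# Linearised black-body restoring rate, weakened by feedbacks:
lam = 4.0 * SIGMA * T0 ** 3 / F_FACTOR   # W m^-2 K^-1

YEAR = 365.25 * 86400.0
dt = 10 * 86400.0              # 10-day time step (s), stable for this lam
T_anom, t = 0.0, 0.0
while t < 50 * YEAR:           # integrate 50 years after the step forcing
    T_anom += (dt / C) * (F_2XCO2 - lam * T_anom)
    t += dt

tau_b = C / (4.0 * SIGMA * T0 ** 3) / YEAR
print("black-body e-folding time tau_b ~ %.1f years" % tau_b)   # ~3.5
print("equilibrium warming f*DeltaT0   ~ %.1f K" % (F_2XCO2 / lam))
print("warming realised after 50 years: %.2f K" % T_anom)
```

With these numbers the black-body e-folding time comes out near the 3.5 years quoted above, and the response time is lengthened by the factor f, as in τ = fτb.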

1.3.1.1 Transient Response

Of course, in the real world, an instantaneous increase in global forcing is a poor analogue for the gradual historic increase in greenhouse gas forcing. The final point made by Hansen et al. (1985) was to consider the implications of a transient forcing scenario, where the radiative forcing was described as a linear function of time. Such a scenario introduces the concept of “unrealised warming” - where the transient temperature response has not yet reached the equilibrium response. They found that as climate sensitivity increased, the unrealised warming became more significant (see Figure 1.1), indicating that in a transient scenario, the global mean temperature will continue to rise after greenhouse gas emissions are stabilised.

Raper et al. (2002) discussed the relationship between climate sensitivity, ocean heat uptake and transient response by investigating the members of the CMIP2 ensemble of GCMs (see Section 1.4.1). They found that the fraction of the equilibrium warming realised at any particular time was smaller in models with a high sensitivity - meaning that the spread in the models’ transient responses was smaller than that in their equilibrium sensitivities. They also found an apparent relationship between model climate sensitivity and the efficiency of ocean heat uptake in the models - an effect predicted by the simple model used in Hansen et al. (1985).

1.3.2 Observational Constraints on Sensitivity

Much effort has been made to constrain aspects of the real-world response to greenhouse gas forcing using simple models of the type discussed above. For example, Frame et al. (2005) used the simple model of Hansen et al. (1985) to illustrate the range of equilibrium response resulting from observationally consistent estimates of f and c. These are derived by examining the historic temperature response to known climate forcings, such as the increase in greenhouse gases, or short-term forcings such as large volcanoes which emit sufficient sulphates to be significant in global time-series (Frame et al., 2005; Gregory et al., 2002).

One potential problem with this approach is that different forcings on the climate system have different spatial patterns, which will affect the effective heat capacity. For instance, sulphate forcing tends to occur over heavily industrialised landmasses, where the local surface heat capacity may be less than the global mean. Thus, each type of forcing must have its own effective heat capacity - adding yet more unknowns to the problem (Stone et al., 2007). Some studies also suggest that the climate system may have different sensitivities for different types of forcing; Roberts and Jones (2004) found that the sensitivity of the UKMO model to a forcing from black carbon differed from its sensitivity to an identical forcing from well-mixed GHGs.

Another approach to constraining the sensitivity of a simple model is to examine the paleo-climate response.

Figure 1.1: A plot from Hansen et al. (1985), showing the unrealised warming at the time of CO2 doubling for EBMs with different feedback magnitudes, f. The horizontal axis shows the equilibrium temperature change, while the vertical axis shows the temperature change at the time of doubling. The plot also shows the effect of changing the model diffusion coefficient, k.

In order to find the effective sensitivity, it is necessary to estimate both the surface temperature and the radiative forcing of the climate system at that time. Hoffert and Covey (1992) used ice core records to estimate temperatures at the LGM (Last Glacial Maximum - 21,500 years ago), while the radiative forcing was estimated from knowledge of the Earth’s orbital parameters along with estimated snow, ice, vegetation, sulphate and greenhouse gas forcing taken from both model and proxy-based studies. A comparison of the change in radiative flux from the present day with the change in surface temperature yields the feedback parameter of the system.

One possible criticism of such approaches is that the climate sensitivity of the system may change over time. This point was made by Senior and Mitchell (2000), who ran a simple model for an 800 year simulation. They found that the effective sensitivity of the model (the equilibrium response inferred from the transient response, assuming that the feedbacks remain constant) increased by about 40% over the lifetime of the run, due to interhemispheric temperature differences which altered the degree of cloud feedback. One solution to such problems is to use more recent history to derive the response parameters of the system; Forster and Gregory (2005) used temperature data from the last century to estimate the global feedback parameter. The authors admitted that the major uncertainty in such studies is the estimation of the change in radiative forcing, along with the estimated change in ocean heat uptake, for which there is no observational estimate.

1.3.2.1 Bayesian formulation

An important question when dealing with different predictions of S using different observations as constraints is how the resulting probability distributions for S should be combined. If the sources of data are completely independent, then the probability distribution for a continuous variable x (such as climate sensitivity) may be updated in the light of both pieces of information (Papoulis, 1984):

f(x|O,H) = f(O|x) f(x|H) / f(O|H),    (1.3)

where f(x|H) is the prior estimate for the PDF of x given previously acquired information H, and O is a new observation. f(O|x) is the likelihood of x given O. In practice this means that a new distribution is produced by multiplying the existing prior PDF by the likelihood function from the new data, and normalising appropriately.
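As a concrete illustration of Eq. (1.3), the sketch below updates a prior for S with a single new constraint on a discretised grid. The Gaussian forms and all parameter values are invented for illustration and are not taken from Annan and Hargreaves (2006) or Hegerl et al. (2006):

```python
import numpy as np

# Sketch of the Bayesian update in Eq. (1.3): posterior = prior x likelihood,
# renormalised to integrate to one. All shapes and parameters are invented.

S = np.linspace(0.5, 10.0, 1000)      # grid of climate sensitivities (K)
dS = S[1] - S[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

prior = gaussian(S, 3.0, 2.0)         # f(x|H): prior from earlier information H
prior /= prior.sum() * dS

likelihood = gaussian(S, 2.5, 1.0)    # f(O|x): likelihood from new observation O

posterior = prior * likelihood        # numerator of Eq. (1.3)
posterior /= posterior.sum() * dS     # normalisation plays the role of f(O|H)

print("prior mode:     %.2f K" % S[prior.argmax()])
print("posterior mode: %.2f K" % S[posterior.argmax()])
```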

Figure 1.2: A plot from Annan and Hargreaves (2006) showing how different observational constraints are combined under a Bayesian framework to produce a tighter constraint on climate sensitivity. The blue curves represent observational constraints on S obtained from three different time periods. The thick red curve represents the combined PDF for S when all observations are considered in the Bayesian framework.

This approach was used by Annan and Hargreaves (2006), who took various established constraints on S using data from different time periods. On the assumption that the data were independent, they were combined to produce an overall constraint which made high values of sensitivity appear significantly less likely (Figure 1.2). A similar approach was taken by Hegerl et al. (2006), who took temperature reconstructions from various proxies over the last millennium, and combined them using Bayes’ theorem. This work again suggested that very large values of climate sensitivity were unlikely, placing the 95th percentile at 6.2 K.

1.3.3 General Circulation Models

To derive more aspects of the response to GHG forcing, more complex dynamical models are required. Atmospheric General Circulation Models (AGCMs) have been used for over forty years to simulate weather and climate (Smagorinsky et al., 1965), often coupled to oceanic GCMs to create an atmosphere-ocean coupled general circulation model (AOGCM).

AGCMs model the atmosphere, and may include land surface models and atmo- spheric chemistry. The surface of the planet is discretised in 3 dimensions into ‘grid-cells’ (or alternatively, spherical harmonics). Each grid-cell has prognostic variables like surface pressure and temperature. The dynamical core of the model integrates the equations of fluid motion forward in time to determine the evolution of these quantities. Diagnostic variables are deduced from the prognostic variables.

The radiative transfer code in the AGCMs typically splits the radiative budget into two sections: ‘longwave’ which represents terrestrial infra-red radiation and ‘short- wave’ representing the solar radiation. The radiation code calculates the emission and absorption from each model layer in the atmosphere, and is often repeated with and without the effect of clouds in order to gauge the ‘cloudy’ and ‘clear-sky’ flux.

The oceanic component of the model can be either a simple thermodynamic ‘slab’ or a full dynamical ocean. A slab ocean is a single layer, with a heat capacity equivalent to that of the oceanic mixed layer. A slab model must usually be calibrated with a heat flux correction, by measuring the heat flux at the ocean surface required to maintain observed SSTs; this calibration requires only a short model simulation.

A full OGCM, however, is discretised vertically and horizontally, fully resolving the heat and fluid transport, and generally contains a model for sea ice. OGCMs generally require a long “spin-up” simulation, during which the sea surface temperature is relaxed to climatological values and the additional (artificial) fluxes required to maintain these values are calculated. These fluxes of heat, wind-stress and moisture are then applied to the ocean surface in order to produce models which can simulate the current climate without drifting too far. The argument against the use of such corrections is that they are unphysical, and do not conserve heat and water across the atmosphere-ocean divide. Some have argued (Marotzke and Stone, 1995) that correcting errors in the control state does not necessarily correct errors in the processes important for the climate change response. Many recent AOGCMs can function without the use of flux corrections, at the expense of greater errors in the simulation of current ocean temperatures (Covey et al., 2004).

Many processes occur at a scale smaller than that of the model grid, and hence must be parameterised. Parameterisations relate grid-scale diagnostic variables to these ‘sub-grid’ scale processes. These include convection, cloud cover and land-surface processes such as albedo and hydrology. Each of these parameterisations is subject to uncertainty, and an inappropriate physical model or parameter value used in the parameterisation may cause a systematic bias in results. This uncertainty in the accuracy of the model physics leads to an uncertainty in the model response.

1.3.3.1 GCM response to greenhouse gas forcing

Various studies have investigated the differences between global and local feedbacks in GCMs which lead to a variation in their response to greenhouse gas forcing. Colman (2003) performed a comparison of the various feedbacks in GCMs available at the time, including water vapour, cloud, albedo and lapse rate feedbacks during doubled-CO2 experiments. The author found a significant spread in all of the feedbacks discussed, with a negative correlation between water vapour and lapse rate feedbacks and also between longwave and shortwave components of cloud feedback. Meehl et al. (2004) investigated the role of different model components in an AOGCM in determining the global feedback amplitude, finding that the atmospheric component (rather than the ocean, land surface or sea ice models) was the most significant for the global mean response to greenhouse gas forcing.

Soden and Held (2006) also discuss climate feedbacks in GCMs, finding that water vapour feedback is the largest positive feedback in all models, consistent with the expectation of constant relative humidity in a warming world. Schneider et al. (1997) investigated water vapour feedback in a GCM with clouds held constant. They found that the water vapour feedback was most significant in the extra-tropical free troposphere (between 450 and 750 mb). Soden et al. (2002) investigated water vapour feedback by examining the cooling after the eruption of Mount Pinatubo in 1991, performing model simulations with and without water vapour feedback. The experiment showed that water vapour feedback was necessary to reproduce the observed cooling, providing evidence of the reliability of the water vapour feedback in the current generation of GCMs.

In their evaluation of different feedbacks in GCMs, Soden and Held (2006) find positive net cloud and surface albedo feedbacks in all models, with the only negative feedback coming from the black-body temperature response. The authors also find large differences in lapse rate feedback among the GCMs. They conclude that inter-model differences in cloud feedback provide the largest uncertainty in current predictions of climate sensitivity.

Many attempts have been made to categorise the cloud feedbacks in different GCMs; Senior (1999) investigated the mechanisms for the differences in cloud-based feedbacks in different versions of the UKMO model. The author found that the inclusion of interactive cloud properties (where the radiative properties and water phase of the clouds are influenced by the temperature and relative humidity distribution) had a marked effect on the model response - and models with interactive clouds were strongly dependent on the initial moisture distribution. However, multi-model studies of cloud response were limited by the lack of a common data format and different assumptions about how clouds were treated by the radiation schemes of different models. In other GCMs, Zhang and McFarlane (1995) showed that the Canadian Climate Centre GCM was highly sensitive to the convective cumulus parameterisation, and Yao and Del Genio (1999) showed the effect of the cloud parameterisation on the GISS model. The latter authors found that changing the resolution of the model reduced the coverage of tropical cirrus, significantly reducing the sensitivity of the model. They also found that introducing a convective anvil parameterisation tended to decrease the model sensitivity and, in common with Senior (1999), that the inclusion of interactive cloud optical thickness and water phase influenced the climate sensitivity.

In recent years, the CFMIP project has endeavoured to enable the comparison of cloud feedbacks in different models by introducing a predefined set of diagnostics and experiments which enable the study of clouds in a changing climate (McAvaney and Le Treut, 2003). Ringer et al. (2006) used these data to compare the cloud forcing response in 10 different AGCMs when the SSTs were increased by 2 K. They found that the variation in total climate feedback was largely explained by the net cloud feedback in the different models. Webb et al. (2006) investigated how local cloud feedback mechanisms contributed towards global climate sensitivity using the CFMIP data and the QUMP ensemble (discussed in Section 1.4.2.1), finding again that cloud feedback accounted for a large percentage of the total response in both ensembles. Webb et al. (2006) introduced a characterisation system for local cloud feedbacks on the basis of changes in the longwave and shortwave budget; these changes were related to changes in cloud cover using a simulator which predicted the cloud changes most likely to produce the observed change in radiance. In CFMIP, it was found that the dominant difference in cloud feedback between the different GCMs was due to variation in a positive cloud feedback in which the area of low cloud coverage was reduced.

1.3.3.2 Uncertainty in GCM simulations

Section 1.2 discussed the major types of uncertainty in predictions of future climate. In the case of GCM simulations for a given future forcing scenario, there are two major sources of uncertainty in the prediction of the global temperature response: boundary conditions and model response. Stott and Kettleborough (2002) explored the relative uncertainty in predictions of global mean temperature due to boundary conditions and response. This was achieved using a process known as ‘optimal fingerprinting’, which involves performing hind-cast simulations using estimations of past GHG forcing and then calculating scaling factors between model output and observations. The scaling factors, and their associated uncertainties, can then be used to scale the climate response to different future scenarios. They found that over the next 30-40 years, the dominant uncertainties were those of model response, rather than forcing scenarios or initial conditions. Hence, representing the true uncertainty in predictions due to the imperfect representation of model physics in GCMs is a priority. The authors did not repeat the study for other quantities, such as global mean precipitation - where the balance of uncertainty between boundary conditions and response could be different.
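A minimal sketch of the scaling-factor step is shown below. The regression through the origin is only the simplest caricature of optimal fingerprinting (real analyses weight by the inverse covariance of internal variability, which is omitted here), and the synthetic “observations” and the future warming value are assumptions made for illustration:

```python
import numpy as np

# Caricature of the scaling-factor idea behind optimal fingerprinting:
# regress observations onto a modelled response pattern, then carry the
# best-fit scaling (with its uncertainty) forward to a future scenario.
# All data below are synthetic and illustrative.

rng = np.random.default_rng(3)
n = 150                                           # e.g. years of annual means
model_response = np.linspace(0.0, 1.0, n)         # hind-cast GHG signal (K)
obs = 0.8 * model_response + rng.normal(0.0, 0.15, n)   # toy "observations"

x, y = model_response, obs
beta = np.dot(x, y) / np.dot(x, x)                # least-squares scaling factor
resid = y - beta * x
beta_se = np.sqrt(np.sum(resid ** 2) / (n - 1) / np.dot(x, x))

future_model_warming = 2.5                        # K, assumed scenario value
print("scaling factor beta = %.2f +/- %.2f" % (beta, beta_se))
print("scaled future warming ~ %.2f K" % (beta * future_model_warming))
```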

In weather forecasting, model parameters can be tuned to best represent the underlying physics by continuous comparison of model forecasts with observations of the predicted period. However, in the climate problem, although models can be tuned to best represent current and past climatology, this is not a verification of the response to future changes in forcing. In addition, the effect of multiple parameter perturbations on the large scale cannot be predicted, and thus cannot be directly related to uncertainty in predictions of future climate.

Ideally, the solution to this problem is to explicitly simulate models over all the parameter ranges of interest, and compare their output to observations in order to constrain the parameter values (Smith, 2002). However, running simulations with full AOGCMs while exploring their full range of parameter uncertainty requires a large amount of computing power, because it cannot be assumed that parameter effects will combine linearly: each combination of perturbed parameters must be explicitly calculated. Hence, for a crudely designed experiment, the computational demand rises exponentially with the number of parameters considered (Allen, 1999) (although with more intelligent ensemble design this may be improved - see Chapter 5).

Sokolov and Stone (1998) attempted to address this difficulty by simulating the response of a full GCM with an ‘intermediate complexity model’. Such models are developed to run faster than a full GCM, often sacrificing geographical detail, sometimes to the extent of averaging over lines of latitude (the CLIMBER model, for example, works on this principle (Petoukhov et al., 2000)). Such models have parameters which allow them to approximate the response of more complex models. Determining the uncertainties in these parameter values requires a comparison of model output to observations by optimal fingerprinting. Forest et al. (2002) used such techniques to constrain the parameters in the Sokolov and Stone (1998) model. However, the resulting parameter constraints were only able to weakly constrain the sensitivity range of the model, and were unable to constrain the ocean heat uptake at all.

The authors of Allen et al. (2000) make the assumption that different AOGCMs (and the real world) will produce the same spatial patterns of response to external forcing, but with different magnitudes. Thus, GCM response can be scaled up to match observational records of forcing and climate and it is assumed that this scaling remains constant for predictions of future climate. This approach does tend to produce consistent results for global temperature, but less so for other quantities such as global precipitation (Allen and Ingram, 2002).

1.4 Ensemble Predictions

1.4.1 The Coupled Model Intercomparison Project

At the time of publication of the Third Assessment Report of the IPCC, the largest available ensemble of AOGCMs was the coupled model inter-comparison project (CMIP-2), a collaboration between various institutions using nineteen different AOGCMs, each of which was subjected to a range of possible future forcing scenarios. Allen et al. (2003) suggested that the major problem with such an approach is that each model has its parameters tuned to produce the best-fitting results when compared to observations of past climate. Thus, the forecasts from such an ensemble are likely to cluster about the most likely model, without spanning the full parameter uncertainty range. They corroborated this hypothesis by taking the range of predictions of transient climate response (to a cumulative 1% per year increase in CO2) which would be consistent with observations to date, and comparing it to the transient predictions from the CMIP-2 ensemble. Although they found that the range of predictions in CMIP-2 did not span the full range of observationally consistent values, it could be argued that this constraint is an emergent property of the physical climate system as it is simulated by the GCMs.

1.4.2 Perturbed Physics Ensembles

In order to properly evaluate the uncertainty in model response, models must be “de-tuned” - i.e. they must be simulated with their model parameters perturbed over the full range of physically plausible values. Over the last five years, the increase in available computing power has made running Perturbed Physics Ensembles (PPEs) of GCMs a realistic proposition (Allen, 1999). The first ensemble of this type was conducted at the UK Met Office by Murphy et al. (2004).

1.4.2.1 The QUMP Ensemble

The project, known as QUMP (Quantifying Uncertainty in Model Prediction), originally created an ensemble of 53 members using single perturbation experiments, where uncertain parameters were perturbed individually to sample the full uncertainty range for each. In Murphy et al. (2004), it was assumed that climate sensitivity could be estimated by a linear interpolation from the single-perturbation simulations. A large number of randomly generated parameter settings were created, and their sensitivities were inferred linearly. Various aspects of model climate were also predicted, and a probability density function was produced by weighting each model by its likelihood. The likelihood used was a “Climate Prediction Index”, or CPI, which was a combined normalised root mean square error from observations over a number of climate variables.
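The published CPI combines a broad set of model diagnostics (see Figure 1.3); the sketch below shows only its general form - an RMSE normalised by an estimate of variability, averaged over fields - using invented field names and synthetic data rather than the Met Office implementation:

```python
import numpy as np

# Sketch of a CPI-like skill score: for each diagnostic field, compute the
# RMSE from observations normalised by a variability estimate, then average
# over fields. Field names, grid size and data below are synthetic.

def cpi_like_score(model, obs, sigma):
    scores = []
    for name in obs:
        err = (model[name] - obs[name]) / sigma[name]
        scores.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
shape = (36, 72)                                   # coarse lat-lon grid
obs = {"surface_temp": rng.normal(288.0, 15.0, shape),
       "precip": rng.gamma(2.0, 1.5, shape)}
sigma = {"surface_temp": 2.0, "precip": 1.0}       # variability estimates
model = {name: field + rng.normal(0.0, sigma[name], shape)
         for name, field in obs.items()}

print("CPI-like score: %.3f" % cpi_like_score(model, obs, sigma))
```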

However, there remains debate over whether a probability distribution inferred through a prior sampling of the parameter space is a fair representation of the true likelihood of climate sensitivity. Frame et al. (2005) used a simple energy balance model to illustrate that the sampling of the prior distribution of models had a large influence on the shape of the inferred sensitivity distribution.

As discussed in Section 1.3.1, Frame et al. (2005) used an EBM (Hansen et al., 1985) to simulate global response, perturbing the values of c and f to produce models with different characteristics. Using this simple model, three ensembles were created, each sampled in a different way. Two observables were used: the change in heat content of the upper ocean, and the warming in the 20th century that could be attributed to greenhouse gases. For the first ensemble, the response of the model was sampled uniformly in the space created by these two observations, but only models consistent with observed values were used in the analysis. The second and third ensembles sampled evenly in S and 1/S respectively, but again only included models consistent with the observations. The sampling of models is shown in Figure 1.5. A probability distribution of sensitivity was created for each ensemble by forming a histogram of all those models consistent with observations (Figure 1.6), which shows that the resulting distribution of climate sensitivity is dependent on the prior sampling strategy.

Figure 1.3: The components of the Climate Prediction Index employed by Murphy et al. (2004); each column represents a diagnostic in the model which can be related to observations. Figure from Murphy et al. (2004).

Figure 1.4: A plot from Murphy et al. (2004) showing the probability density function for climate sensitivity resulting from the QUMP experiment. The blue line represents the distribution inferred from an unweighted ensemble; the red line weights each model by its likelihood.
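The sampling dependence illustrated in Figures 1.5 and 1.6 can be reproduced with a toy version of the experiment. In the sketch below, the linear observable, its noise level and the acceptance band are invented; only the qualitative conclusion - that the accepted distribution of S inherits the shape of the prior sample - follows Frame et al. (2005):

```python
import numpy as np

# Toy reproduction of the sampling-dependence effect: apply the same
# "observational" acceptance test to ensembles sampled uniformly in S
# and uniformly in 1/S. The observable and acceptance band are invented.

rng = np.random.default_rng(1)
n = 200_000
ensembles = {
    "uniform in S":   rng.uniform(0.5, 10.0, n),
    "uniform in 1/S": 1.0 / rng.uniform(1.0 / 10.0, 1.0 / 0.5, n),
}

for name, S in ensembles.items():
    # Toy observable: attributable warming grows with S, plus noise.
    warming = 0.25 * S + rng.normal(0.0, 0.2, n)
    kept = S[(warming > 0.4) & (warming < 1.0)]   # "consistent" models
    print("%-15s median %.2f K, 95th percentile %.2f K"
          % (name, np.median(kept), np.percentile(kept, 95)))
```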

Figure 1.5: The different sampling strategies illustrated in Frame et al. (2005). The crosses represent models sampled uniformly in the observations, the diamonds represent models sampled uniformly in Sensitivity (a) and 1/Sensitivity (b). Models inside the black line represent those models consistent with observations.

This provoked a search for methods of predicting the sensitivity of an ensemble which were independent of the prior distribution of models. We will discuss several such approaches, taking their data from both QUMP and another perturbed physics project, climateprediction.net, which is discussed in Chapter 2.

1.4.2.2 Annan (2006)

An alternative approach to ensemble model verification was taken by Annan (2006), which described the results of a 120-member climate model ensemble using the atmospheric component of the MIROC model. It was found that tuning the model to best match present-day observations resulted in a distribution of model climate sensitivity which included very large sensitivities of up to 8 K. In order to test the feasibility of these models, each was tested on its ability to reproduce the paleo-climate of the last glacial maximum. With this additional constraint on model performance, it was found that the models with very large sensitivities could not reproduce sea surface temperatures consistent with proxy records from that era. Hence, the upper 95th percentile for climate sensitivity was reduced to 6 K in this study.

Figure 1.6: The resulting probability distributions for the different sampling strategies in Frame et al. (2005). The red line shows the distribution sampled evenly in Sensitivity, the blue line shows sampling in 1/Sensitivity and the green line shows uniform sampling in the observations.

1.4.3 Thesis outline

This thesis comprises two alternative approaches for the study of large ensembles, and how they may be used to place constraints on climate response. In order to provide the necessary grounding for this work, Chapter 2 provides a background to the climateprediction.net project - from which we take our data, describing both the project history and the merits of different analyses which have already been conducted using the ensemble. Later in Chapter 2, we discuss some of the issues surrounding the constraint of climate sensitivity, the role of structural uncertainty and the relevance of Bayesian statistics.

In Chapter 3, we begin by discussing some of the more technical aspects of the ensemble design, including the parameter sampling schemes and the data reduction necessary for the distributed computing architecture. This is followed by an in-depth discussion of the parameterisation schemes which are perturbed in the climateprediction.net ensemble, together with their roles in the HadAM3 climate model.

Chapter 4 presents the first of the techniques proposed for ensemble analysis, using a linear analysis to classify the different radiative feedback processes present within the ensemble. In the process, this chapter aims to determine the major physical processes which are responsible for the variation of climate sensitivity within the climateprediction.net ensemble, and attempts to put constraints on the real-world extent of those processes using reanalysis data.

Chapter 5 is a study of the use of neural network emulators trained using data from a large ensemble. Once trained, the emulator can be used to interpolate over gaps in the original parameter sampling scheme used in the ensemble. The method is proposed as a component of an ‘adaptive sampling’ strategy for future ensembles, where further simulations could be conducted in the parameter space where the behaviour is least understood, or where the models are predicted to verify well against observations.

In Chapter 6, we use the results of Chapter 5 to attempt to place constraints on Climate Sensitivity by considering best possible model formulations at different values of climate sensitivity. Finally, Chapter 7 discusses the conclusions which can be made from the work in this thesis, and where it may be extended.

Chapter 2

Ensemble Analysis: Background

‘ “My name is Legion,” he replied, “for we are many.” ’ —The Gospel according to Mark 5:9

The main problem with a perturbed physics ensemble is that it soon grows very, very large. Let us imagine that there are 15 parameters in a GCM whose values are uncertain, and we require at least 3 different possible values for each in order to cover that range of uncertainty. In this situation, 3¹⁵ (≈ 14 million) simulations must be carried out in order to sample the entire parameter space of the model (in the case of a simple experiment design). Clearly this number of simulations is many orders of magnitude more than could possibly be conducted within a single organisation.

Murphy et al. (2004) addressed this issue by assuming that once a small ensemble had been established, a linear interpolation in the parameter space between the ensemble members could adequately describe the model behaviour. Although this approach successfully produced an ensemble of GCMs which spanned a range of response to greenhouse gas forcing - there still remained some doubt over how valid this linear approximation could be (Stainforth et al., 2005).

Allen (1999) proposed an ensemble of climate models which could exploit the processing power of home personal computers during their idle time. Although this approach is still unlikely to sample the entire parameter space, there was the potential to make ensemble sizes a few orders of magnitude larger than would otherwise be possible. The concept was further explored in Hansen et al. (2001): the project, then termed “Casino-21”, would use a distributed computing architecture similar to that of the “SETI@home” project described by Anderson et al. (2002). In order to run on the home computers available at the time, the current generation of GCMs would need to be simplified somewhat, and the amount of data returned for each model would be limited. However, the proposal was met with enthusiasm, as it had the potential to address some of the major issues of climate model uncertainty, with the added bonus of public participation and the potential to produce freely available data.

2.1 Preliminary Results


Figure 2.1: The distribution of sensitivities (K) from the first 2,500 simulations in the climateprediction.net ensemble, as shown in Stainforth et al. (2005). The black line shows the distribution of S in the entire ensemble, the red line shows only results with the low entrainment coefficient parameterisation, and the blue line shows the distribution with the medium and high values of the entrainment coefficient.

Stainforth et al. (2005) described the first results of the project, which was released under the name “climateprediction.net”. At the time of publication of this paper, approximately 2,500 simulations had been performed on the distributed network. A clear feature of the ensemble from the outset was the wide range of sensitivity which could be produced by perturbing only 6 parameters in the HadAM3 climate model (Figure 2.1). Several properties of the ensemble were discussed:

• Firstly, S in the ensemble was demonstrated to be a non-linear function of the input parameters. This finding directly challenged the methodology of Murphy et al. (2004), whose sampling strategy assumed that the effect of the sum of two parameter changes could be approximated by a linear combination of the individual changes (although some multiply perturbed simulations were introduced to partially account for this issue). The non-linearity of the sensitivity response is illustrated in Figure 2.2.

• Secondly, it was shown that a simple discrepancy term derived from differences between model simulations and surface observations did not show any particular relationship to climate sensitivity, ruling out the instantaneous dismissal of the physicality of very high sensitivity models (i.e. sensitivities significantly outside the accepted 2 - 4.5 K range as published in the Fourth Assessment Report of the IPCC (Solomon et al., 2007)). Figure 2.3 shows that even the highest sensitivities of more than 10 K cannot be ruled out simply by comparing model discrepancies from surface observations.

Hence, the challenge remained to produce an estimate of likelihoods of different climate sensitivities from the ensemble, where a basic comparison of models to observations had failed to produce any strong constraints.

2.2 Sensitivity Analyses

The data from the climateprediction.net project has been used to create various predictions of real-world distributions for climate sensitivity, all of which attempted to address the sampling dependency issue raised in Frame et al. (2005).

Figure 2.2: A figure from Stainforth et al. (2005) demonstrating the non-linearity of Sensitivity when considered as a function of model parameter changes. The vertical axis represents the linearly estimated climate sensitivity (in Kelvin) of each model, using single perturbation simulations. The horizontal axis shows the true modelled climate sensitivity. Different colours represent different values for the entrainment coefficient, an important parameter in determining the model response in the ensemble (see Chapter 4).


Figure 2.3: The climate prediction index of 2,500 models as shown in Stainforth et al. (2005), plotted against climate sensitivity. The CPIs in this case were taken as root mean square errors from reanalysis data, taken from various fields in the model. The unperturbed model is shown by a red diamond, single parameter perturbations are shown by yellow diamonds, and triangles show CMIP II models, where HadCM3 is shown in red and others in blue.

The first of these, Piani et al. (2005), attempted to create linear predictors of sensitivity using information from the control atmosphere of models in the ensemble. Once found, the predictors can be used to estimate the true climatic response to greenhouse forcing.

2.2.1 Piani et al.

The first sensitivity study considered here is Piani et al. (2005). This paper attempted to derive predictors of sensitivity by projecting patterns of natural variability onto members of the climateprediction.net ensemble. The logic behind this process is discussed in Allen et al. (2000), where it is argued that patterns of response to GHG forcing may be described in terms of modes of internal variability, and that these patterns should form the orthogonal basis onto which ensemble members are projected. Dominant patterns of natural variability were determined by performing a Principal Component Analysis (PCA) on a 500 year control simulation to find the leading EOFs (Empirical Orthogonal Functions). For a full description of Principal Component Analysis, see Appendix A.

The resulting EOFs were patterns which describe the dominant varying components in various different diagnostic fields from the model. The fields used included temperature, precipitation and top of atmosphere radiative balances, as well as humidity and dynamical data. They then projected the control simulations from the climateprediction.net ensemble onto these EOFs, and took linear combinations of these patterns in order to make the projections completely statistically independent within the ensemble; these new patterns were Rotated EOFs, or REOFs. Thus, the method found the most efficient orthogonal basis to describe ensemble variability in the control simulations. The process is illustrated in Figure 2.4.

Now, the control simulation could be approximately summarised by a short state vector, $x_i$, which had K components, where K is the truncation length (determined later). The components of this vector were then regressed against the sensitivity parameter (the inverse of climate sensitivity, $\lambda_i$) of each model to give the least squares fit for the regression coefficients, $\beta$, where:

Figure 2.4: A diagram illustrating the process used in Piani et al. (2005) to compress the control state of a model in the ensemble into a short state vector by projecting natural patterns of variation onto the ensemble. The schematic shows the EOFs (patterns of natural variability, dimension n(gridpoints)) of a 500 year control simulation; the ensemble of control simulations (number of models in ensemble × n(gridpoints)) is projected onto these to give Ensemble Loadings, which are then rotated to give the Rotated Ensemble Loadings (number of models in ensemble × n(EOFs)).

$$\lambda_i = \sum_{j=0}^{K} x_{ij}\beta_j + \mathrm{noise}. \qquad (2.1)$$

In order to determine the truncation length K, an F-test is performed such that a vector is found perpendicular to $\beta$. This vector is then projected onto both an observational time-series and a control simulation of the model (HadAM3). The ratio of the model variance to the observational variance is then subjected to an F-test, which determines whether both could have originated from the same distribution. If the ratio is greater than that allowed by the F-test, then insufficient variance is described by the first K modes. If the ratio is less than the allowed limits, then the higher modes are likely to describe only noise in the model. If the F-test is passed at the 90% confidence level, then the truncation length K is acceptable. Piani et al. (2005) found that any truncation length between 8 and 12 modes was acceptable.

The sensitivity parameter is the inverse of Sensitivity, and can be thought of as the sum of feedbacks in any given model. Piani et al. (2005) found that the observables tend to scale linearly with the inverse of sensitivity. A plot of predicted against actual sensitivity parameter is shown in Figure 2.5.

Once the ensemble had been used to establish linear regression coefficients to predict the sensitivity parameter from the state vector of a model, an estimate of the sensitivity of other models and reanalyses could be made. Piani et al. (2005) then took a selection of reanalysis data, together with data from other GCMs, and projected them onto the previously established REOF patterns.

Now, each of the reanalyses and GCMs is approximated by a state vector, $x_o$. Together with the already established regression coefficients, a best guess for the feedback parameter of the real world, $\lambda_o$, was given by:

$$\lambda_o = \sum_{j=0}^{K} x_{oj}\beta_j. \qquad (2.2)$$

The uncertainty in this estimate due to natural variability is approximated by projecting a second 500 year control simulation onto each of the REOFs, and from this calculating the variance in the expected prediction of the feedback parameter. This was represented by the Gaussian distribution on the x-axis of Figure 2.5. The total error also included that associated with the ability to predict the feedback parameter of a model, which was shown in the distributions on the y-axis of Figure 2.5.

Figure 2.5: A plot from Piani et al. (2005), in which the actual feedback parameter (1/Sensitivity) is plotted against the simulated feedback parameter as projected from the control state. The distribution at the bottom is simply a Gaussian with the mean and standard deviation taken from the predicted values of the sensitivity parameter from observations. The thin line on the y-axis adds the unexplained variance in the actual values to this initial estimate (i.e. takes into account the error in predicting sensitivity within the ensemble). The thick line uses the scatterplot itself as a transfer function for the predicted distribution.

The final issue to consider is that of systematic model bias. Piani et al. (2005) raised the issue of biases in the climatology that were due to the underlying model structure, rather than changes in parameters. Such discrepancies must be included in the error estimate of any prediction made by comparing model output to reanalysis data. No systematic method of determining this error was discussed, but it was estimated in Piani et al. (2005) by finding the error between the unperturbed HadCM3 model and the reanalysis dataset. This error was combined with the prediction error in order to provide the final estimate.

Because the ensemble is used only to establish relationships between predictors and total feedback, the method of Piani et al. (2005) was not directly dependent on the prior distribution of models. However, there is potential for the technique to be indirectly dependent on the sampling strategy: the rotation producing the REOFs used as predictors in the regression is ultimately dependent on the ensemble itself. The range of model response, and how that might relate to model climatology, is also clearly a function of the original choices made in the ensemble design process; both the parameters and the perturbed values influence the ‘space’ in which the prediction of real-world response is made. Moreover, the predictor is entirely trained within the ‘structure’ of HadAM3, and any correlations between observables and model response could be argued to be valid only within that model environment. These issues are discussed further in Chapter 5.

2.2.2 Knutti et al.

Another sensitivity study arising from the climateprediction.net ensemble was Knutti et al. (2005). This paper again attempted to use the control simulation to establish predictors of climate sensitivity, but this time used a neural network to allow for non-linear relationships between the predictors and sensitivity.

The study exploited the fact that the seasonality of some regions (the difference between summer and winter mean temperatures) often displayed a relationship to the sensitivity of the model. Figure 2.6 shows that northern hemisphere regions do generally show a greater seasonal response in the models of high sensitivity. This is because the Northern Hemisphere landmasses are subject to a large change in radiative forcing between summer and winter and have relatively small heat capacity compared to the ocean-dominated south. It also shows that the observations inferred from reanalysis data lie within the ensemble in the regions shown.

Figure 2.6: A plot of seasonality in four regions taken from the climateprediction.net ensemble against the climate sensitivity of each model. The horizontal lines represent the values of seasonality taken from reanalysis data, while the circles show the seasonality and sensitivity taken from other climate models.

This relationship was exploited by using the seasonalities from several regions in the model as an input vector for a neural network. A neural network is a computational method that effectively fits an arbitrary non-linear function of a set of inputs in order to predict the value of one or many outputs. The development of the type of networks used here is discussed in Specht (1990), and they are specifically related to the issue of predicting climate sensitivity in Knutti et al. (2003).

In Knutti et al. (2005), the ensemble was split into two parts, one of which was used to train the network to predict sensitivity while the other was used to verify that the network had not been ‘over-fitted’. Once the network had been satisfactorily trained (Figure 2.7), it could be used to predict the sensitivity of the real world by applying it to reanalysis data.

It must be noted that not all regions in the model were suitable for this analysis. A neural network can only make useful predictions if given values within the range in which it has been trained. Hence, if the observed values of seasonality from the reanalysis data lie outside the majority of the climateprediction.net ensemble - then that region must be excluded from the analysis (even though its distance from the observations may in truth be one of the strongest constraints).

The uncertainty due to natural variability in the observations was measured from the datasets themselves and used to create a large simulated dataset of seasonality values with the measured standard deviations. This was then used with the trained neural network to produce a distribution of predicted sensitivities for the reanalyses. Additional error was added to account for the inherent inaccuracy of the neural network itself (which was measured from the ability of the network to predict S in the verification set). The resulting probability density function is shown in Figure 2.8.
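The workflow just described can be sketched as follows. This is not the network architecture of Specht (1990) used by the authors; a generic multi-layer perceptron from scikit-learn and entirely synthetic data stand in for the real quantities, purely to illustrate the verification split and the propagation of observational noise:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic stand-ins: seasonality amplitudes in four regions (inputs)
# and climate sensitivity S (target); the real inputs came from the ensemble.
seasonality = rng.uniform(5.0, 25.0, size=(2000, 4))
S = 1.0 + 0.2 * seasonality.sum(axis=1) + rng.normal(0.0, 0.5, 2000)

X_train, X_test, y_train, y_test = train_test_split(
    seasonality, S, test_size=0.5, random_state=0)

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                   random_state=0).fit(X_train, y_train)
resid_sd = np.std(net.predict(X_test) - y_test)  # error from verification half

# Perturb hypothetical 'observed' seasonalities with their natural variability.
obs = np.array([15.0, 12.0, 20.0, 9.0])
obs_sd = np.array([0.8, 0.6, 1.1, 0.5])
samples = obs + obs_sd * rng.normal(size=(10000, 4))
S_pred = net.predict(samples) + resid_sd * rng.normal(size=10000)
print(np.percentile(S_pred, [10, 50, 90]))  # distribution of predicted S
```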

It is noticeable in Figure 2.8 that different distributions are obtained if ocean regions are not considered in the analysis. The authors suggest that one should give more credence to the land-only results, because the ocean temperatures in climateprediction.net are effectively constrained to observations and thus do not represent any real aspect of the perturbed model climatology.

Figure 2.7: A plot from Knutti et al. (2005) of the predicted value of S against the actual S for models in the climateprediction.net ensemble. These models make up the verification set, which was not used in training the neural network, and thus represents a test of its ability to predict S. The circles represent models from the CMIP-2 ensemble, the sensitivities of which are predicted by the same technique.

Figure 2.8: A plot from Knutti et al. (2005) showing the final inferred probability density functions for climate sensitivity derived from the ERA40, NCEP and HadCRUT2v datasets, accounting for the discrepancies between predictions from different reanalyses. The solid lines represent predictions made when the network was only given data from land covered regions. The dotted lines show distributions when ocean regions are also included. The straight lines at the top indicate the 10-90 percentiles of the prediction for the different reanalyses.

2.2.3 Rougier and Sexton

The final sensitivity study which we will consider here is Rougier and Sexton (2007). This paper discusses the relative merits of inferring a distribution of sensitivity based upon various different sampling strategies in the model parameter space. Unlike Knutti et al. (2005) and Piani et al. (2005), this work makes no use of observational data to make its predictions - instead it uses a model emulator to illustrate the effects of using different idealised prior parameter distributions in an ensemble.

The basic technique uses a statistical emulator to relate the model output (in this case, the climate sensitivity) to a set of model inputs (taken as a set of some discrete and some continuous input model parameters). These are used to train a model emulator, which is in theory able to relate the input parameters to the outputs. Once trained, the model emulator was used to evaluate a selection of different ensemble sampling techniques, the first of which was a simple Monte-Carlo distribution in the input parameter space. In this case, the finite distribution of sensitivities can be used to make an estimate of the underlying distribution, together with confidence intervals for that distribution. They then tested the technique by using their model emulator to produce Monte-Carlo distributions of different sizes and attempting in each case to determine the underlying distribution, finding that at least 200 models were required in their emulated ensemble to place an upper confidence bound on the 90th percentile for climate sensitivity.

The paper then goes on to discuss the merits of “importance sampling”, a statistical technique used to make Monte-Carlo ensembles more efficient by weighting the parameter sampling so that more likely regions are more densely sampled. With such a distribution, an ‘Effective Sample Size’ may be defined, which must be a significant fraction of the original ensemble size to be sure that the estimated distribution is not based on an unrealistically small sample. They discuss the use of a triangular distribution to weight each of the continuous input parameters, such that the most likely parameter value lies at the peak of the distribution and the extreme values lie at the end points. The problem with such a distribution, they find, is that given that there are 14 degrees of freedom, the weighting function is effectively raised to the 14th power and only a very small number of models in the ensemble are effectively considered. Their solution is to restrict the weighted distributions to 5 parameters which they deem important to determining the model sensitivity. They conclude that importance sampling is useful only if the approximate parameter weighting is already known, and small perturbations are considered.
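The collapse of the effective sample size can be illustrated in a few lines. The snippet below assumes a hypothetical unit-hypercube parameterisation and uses the Kish definition of effective sample size, which may differ in detail from the definition used by Rougier and Sexton (2007):

```python
import numpy as np

rng = np.random.default_rng(2)
n_models, n_params = 10000, 14

# Uniform Monte-Carlo sample of a hypothetical unit hypercube of parameters.
u = rng.uniform(size=(n_models, n_params))

# Triangular weighting per parameter: peak at 0.5, zero at the extremes.
w_param = 1.0 - np.abs(u - 0.5) / 0.5        # shape (n_models, n_params)
w = w_param.prod(axis=1)                     # joint weight over all parameters

def effective_sample_size(w):
    """Kish effective sample size of a weighted sample."""
    return w.sum() ** 2 / (w ** 2).sum()

print(effective_sample_size(w))                        # all 14 parameters weighted
print(effective_sample_size(w_param[:, :5].prod(1)))   # only 5 'key' parameters
```

With all 14 parameters weighted, only a few percent of the nominal sample remains effective, while restricting the weighting to 5 parameters preserves a much larger effective sample - the behaviour described in the text.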

The final method exploited is the direct use of the statistical emulator to produce the final distribution, eliminating the Monte-Carlo sampling altogether. This effectively integrates over an approximation of the sensitivity distribution in parameter space, allowing the use of any desired weighting of the parameter space in that integral.

The techniques outlined in this paper differ strongly from the two approaches already discussed in that, fundamentally, the only probabilities considered are those of different parameter values, rather than evaluating the diagnostic output of the models concerned. Such an approach is essential when exploring the range of attainable behaviour in perturbed versions of a single model, effectively allowing different prior distributions to be tried with relative ease, and without the burden of actually creating a new ensemble!

2.3 Motivation for further work

Although the analyses discussed in Section 2.2 produce consistent estimates for the likely true range of climate sensitivity, there remain questions over their robustness and over how these results should be interpreted in light of different observations and physical feedback mechanisms. In the body of this thesis, we will consider the following major extensions to the work already conducted:

2.3.1 Physical Interpretation

The analyses discussed in Section 2.2 both used the climateprediction.net ensemble in order to produce predictors of climate sensitivity which are valid within the ensemble, and then applied them to observations in order to estimate the real-world value of S. In both cases, however, little effort was made to identify the physical significance of those predictors or to explain why they should be related to the greenhouse gas response. In addition, none of the previous ensemble analyses have attempted to isolate different types of feedback response to greenhouse gas forcing.

In the QUMP ensemble, Webb et al. (2006) attempted to break down global feedbacks into local components using the longwave and shortwave cloud radiative response to surface warming in equilibrium double CO2 experiments. Relative changes in longwave and shortwave flux were categorised into different types of cloud feedback; for example, an increase (or decrease) in shortwave cloud radiative forcing on surface warming, with little or no change in the longwave forcing, was interpreted as an increase (or decrease) in low level cloud. This information was used in parallel with ISCCP diagnostics to determine the height and type of clouds which were responsible for the change. The method was used to determine which types of cloud feedback were most responsible for the variance in climate sensitivity within the ensemble, breaking down the response into latitude bands to represent tropical and extra-tropical responses independently. The authors found that increases in low-top cloud cover accounted for approximately 60% of the total cloud response, with only 20% accounted for by high-top cloud feedbacks.

In this thesis, we introduce another method of local feedback analysis; in contrast to Webb et al. (2006), Chapter 4 attempts to analyse geographical patterns of radiative feedback which scale over different members of the ensemble. In so doing, we address the following issues:

• What are the major feedback processes in climateprediction.net? - Firstly, we seek to identify independent physical processes which lead to variation in S within the climateprediction.net ensemble by describing the overall global feedback parameter as a sum of regional components. These modes of response must be identified systematically and must be identifiable by examining patterns of radiative response to greenhouse gas forcing.

• What are the dominant model parameter changes, and how do they relate to the major modes of response? - Once we have identified the physical feedback processes present in the ensemble, we seek to identify how they scale with the perturbed model parameters. This relates the macroscopic variation in global

response to the sub-grid scale parameterisations, and provides insight into the physical basis for the identified feedback processes.

• How can we create meaningful predictors of climate sensitivity? - By searching for predictors of the different independent feedback processes within the diagnostic output of the control models, rather than predictors of S itself, we seek to create a more meaningful prediction of the global response for the real world. Because each predictor is related only to one specific feedback process, we may quickly determine the most important observations for determining the likely real-world amplitude of the process. Thus we may physically describe why the use of different observational sources results in different predictions of likely greenhouse gas response.

2.3.2 Systematic Model Error

Rougier (2007) described the difference between models and the climate itself as the sum of two parts: a reducible and an irreducible part. The reducible part may be lessened by a better choice of model parameters, while the irreducible part is a ‘systematic error’ - a result of model imperfections which cannot be removed by further ‘tuning’ of parameters.

The approaches described in Section 2.2 make the implicit assumption that the ‘perfect model’ is a tunable state of the GCM, because they assume that predictors valid within the ensemble are applicable to the real world, ignoring the irreducible error described in Rougier (2007). To ignore this component of the error means that some degree of extrapolation is required when applying predictors to observations, adding an unknown error to the results. Knutti et al. (2006) already noted that, because of the structural biases in HadAM3, the relation between observed and predicted quantities did not always hold in different GCMs. Meanwhile, Piani et al. (2005) made the assumption that the irreducible component of model error could be approximated by expressing the control model variability about the ensemble mean, but the authors accepted that this was a crude approach, intended only to determine the order of magnitude of the effect. The authors gave little justification as to why the correlations found were applicable in a different model environment (outside the perturbed HadAM3 structure), and how this should affect the overall uncertainty in the predictions made.

Hence, the approach of finding predictors of S within the ensemble in order to constrain S has a fundamental flaw, in that we do not know how applicable these predictors are to other models, or to the real world. To deal with this issue fully, the irreducible error must be considered directly within the analysis. Hence, in Chapter 5 we propose a new method of using an ensemble to constrain the value of climate sensitivity. Rather than using the ensemble to find predictors of S, we seek to determine an underlying function E(S), which is the minimised model error at a given value of climate sensitivity, S. Such a function is less dependent on the sampling strategy of the ensemble (although the range of behaviour in the ensemble is still governed by the parameters chosen to be perturbed), and hence is not directly subject to the objections raised in Frame et al. (2005). In addition, this method would not require a crude estimation of systematic model error, because the systematic error itself forms the constraint. The challenge, which is further discussed in Chapter 5, is to find the underlying function E(S) from a discretely sampled ensemble such as climateprediction.net.
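One crude way to picture E(S) is to bin a discrete ensemble by sensitivity and take the minimum discrepancy in each bin. In the sketch below, both the ensemble and the error metric are invented purely for illustration; Chapter 5 discusses how the underlying envelope should actually be estimated:

```python
import numpy as np

def error_envelope(S, err, bins):
    """Minimum model-observation discrepancy in each sensitivity bin:
    a discrete estimate of the envelope E(S) from an ensemble."""
    idx = np.digitize(S, bins)
    return np.array([err[idx == k].min() if np.any(idx == k) else np.nan
                     for k in range(1, len(bins))])

rng = np.random.default_rng(3)
S = rng.uniform(1.0, 12.0, 5000)                        # ensemble sensitivities (K)
err = (S - 3.2) ** 2 / 10 + rng.gamma(2.0, 0.2, 5000)   # toy discrepancy metric
bins = np.linspace(1.0, 12.0, 23)
print(error_envelope(S, err, bins))   # envelope minimised near S ~ 3 K by design
```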

Although the climateprediction.net ensemble has been used to make predictions of likely response using observations, little work has examined how different observational constraints on S combine, and how this might relate to the Bayesian interpretation of multiple constraints (Section 1.3.2.1). In Chapter 5, we seek to examine how the systematic constraint on S established in Section 2.3.2 changes as more observations are added to the metric, and how this compares with combining observational constraints under the Bayesian framework.

In addition, we seek to compare individual observational systematic constraints on S by considering a range of model diagnostic outputs - both annual mean and seasonal differences - in order to determine which observations are most useful in categorising climate response to greenhouse gas forcing.

Chapter 3

Technical Background

‘ On a waiter’s bill pad, - said Slartibartfast, - reality and unreality collide on such a fundamental level that each becomes the other and anything is possible, within certain parameters... ’ —Life, The Universe and Everything, Douglas Adams, 1985

GCMs are unpleasantly complex computer programs, and before embarking on an analysis of the behaviour of the perturbed models within the climateprediction.net ensemble, in this chapter we hope to shed some light on the parameterisations which are perturbed in the experiment. The analyses contained in Chapters 4 and 5 use data from the first climateprediction.net ensemble. This is based on the Met Office HadAM3 atmospheric model, coupled to a thermodynamic slab ocean. The perturbed parameters are spread amongst the cloud scheme, precipitation scheme, convection scheme and sea-ice model. Later developments in the ensemble architecture introduced a sulphur cycle and a coupled, fully dynamic ocean, both of which may themselves be perturbed. However, the analysis of these components is left to future work.

The HadAM3 model was developed by the UK Meteorological Office; it has a resolution of 3.75° × 2.5°, with 19 vertical levels ranging from 1000 to 10 mb. The slab ocean is a representation of the oceanic mixed layer, and has an equivalent heat capacity. A calibration process is necessary to make this slab ocean representative of

the real ocean when simulating present day conditions. Thus, each model simulation is conducted over three stages: a calibration, control and double CO2 simulation:

The calibration is a fifteen year simulation where the sea surface temperatures are fixed to observed values. The carbon dioxide concentrations are set to pre-industrial levels, and the model is allowed to equilibrate. At the end of the simulation, the energy imbalance at the ocean surface due to the fixed temperature boundary condition is measured and recorded.

The control simulation takes the flux measured in the first stage and applies it as an ‘anomalous heat flux’ or ‘heat convergence’ at the ocean surface. The sea surface temperatures are now allowed to adjust from their initial state and the model is allowed to run for fifteen years.

The double CO2 simulation has the same initial conditions and corrective fluxes as the control simulation, but the CO2 concentrations are doubled from the pre-industrial levels at the start of the run. The atmosphere is again left to adjust for fifteen years, and thus the climate sensitivity of the model can be inferred using an exponential fitting algorithm (Stainforth et al., 2005), of the kind sketched below.

In most cases, the flux correction method will successfully prevent a drift of global-mean temperatures during the control stage. In a few cases, a runaway interaction occurs between the thermodynamic ocean and extensive cloud cover in the mid-Pacific region (Stainforth et al., 2005). Such a feedback is dependent on the unphysical inability of the thermodynamic ocean to transport anomalous heat, and so this small minority of drifting runs is removed.

The analyses presented in this thesis use 15 years of seasonal data from both the control and double-CO2 stages for different geographical regions. The limited bandwidth available to the public participants in the project restricts the amount of data which may be retrieved for each model. Thus, time series data is limited to geographical means of regions (defined in Table 3.1). The regions are primarily land-based (although, due to the regions being rectangular, there is some ocean area included near coastlines). There are also four additional zonally averaged regions, representing means over each hemisphere, and over the southern and northern “extratropics”.

Definition of Regions as used in the analysis

Number  Name                       Longitude    Latitude
1       Australia                  110 → 155    -45 → -10
2       Amazon Basin               280 → 325    -20 → 10
3       Southern South America     285 → 320    -55 → -20
4       Central America            245 → 275    10 → 30
5       Western North America      230 → 255    30 → 60
6       Central North America      255 → 275    30 → 50
7       Eastern North America      275 → 300    25 → 50
8       Alaska                     190 → 255    60 → 70
9       Greenland                  255 → 350    50 → 85
10      Mediterranean Basin        350 → 40     30 → 50
11      Northern Europe            350 → 40     50 → 75
12      Western Africa             340 → 20     -10 → 20
13      Eastern Africa             20 → 50      -10 → 20
14      Southern Africa            350 → 50     -35 → -10
15      Sahara                     340 → 65     20 → 30
16      Southeast Asia             95 → 155     -10 → 20
17      East Asia                  100 → 145    20 → 50
18      South Asia                 65 → 100     5 → 30
19      Central Asia               40 → 75      30 → 50
20      Tibet                      75 → 100     30 → 50
21      North Asia                 40 → 180     50 → 70
22      Antarctica                 0 → 360      -90 → -65
23      Northern Hemisphere (NH)   0 → 360      0 → 90
24      Southern Hemisphere (SH)   0 → 360      -90 → 0
25      NH Extratropics            0 → 360      30 → 90
26      SH Extratropics            0 → 360      -90 → -30

Table 3.1: Definition of regions as used in the climateprediction.net experiment.

The total number of regions used, R, is 26.

3.1 Perturbed Parameterisations

Each model in the ensemble is allocated a set of model parameters, each of which can assume one of two or three values, one of which is the default value in HadSM3. The values were selected by expert elicitation to represent the limits of current uncertainty, similar to those used by Murphy et al. (2004). A list of the parameters perturbed for the subset of models considered for this thesis is given in Table 3.2.

Perturbation label   Description                                        Min          Default      Max
VF1                  Ice fall speed (m s^-1)                            0.5          1.0          2.0
ENTCOEF              Entrainment coefficient                            0.6          3.0          9.0
CT                   Accretion constant (s^-1)                          5 × 10^-5    1 × 10^-4    4 × 10^-4
CW_LAND *            Threshold for precipitable water, land (kg m^-3)   1 × 10^-4    2 × 10^-4    2 × 10^-3
CW_SEA *             Threshold for precipitable water, sea (kg m^-3)    2 × 10^-5    5 × 10^-5    5 × 10^-4
RHCRIT ***           Critical relative humidity (%)                     0.65         0.73         0.9
EACF ***             Empirically adjusted cloud fraction                0.5          0.63         0.7
ICE_SIZE             Ice crystal size in radiation (m)                  2.5 × 10^-5  3.0 × 10^-5  4.0 × 10^-5
ALPHAM               Albedo at melting point of ice                     0.5          0.57         0.65
DTICE                Temperature range of ice albedo variation (K)      2.5          5            10
I_ST_ICE_LW **       Stratiform cloud ice crystal in LW radiation       Type 1       Type 7       -
I_ST_ICE_SW **       Stratiform cloud ice crystal in SW radiation       Type 1       Type 7       -
I_CNV_ICE_LW **      Convective cloud ice crystal in LW radiation       Type 1       Type 7       -
I_CNV_ICE_SW **      Convective cloud ice crystal in SW radiation       Type 1       Type 7       -

Table 3.2: Definition of perturbed parameters as used in the subset of climateprediction.net experiments used in this analysis. Parameters marked * and ** are perturbed together. *** indicates parameters defined on vertical model levels; values shown here are means over all model levels.

Initial condition ensembles are models with identical parameter settings, but with a slightly altered initial temperature field (these help determine the significance of uncertainty of the first kind in predictions).

The major parameterisations are associated with the cloud scheme along with large scale and convective precipitation. In the following section, we attempt to place these parameters in context in their respective schemes. This should not be inter- preted as a full description of the schemes, as this is well beyond the remit of this thesis, but we briefly introduce the relevant aspects of the model and the roles of the perturbed parameters:

3.1.1 Large-Scale Precipitation

The HadAM3 model contains prognostic cloud water content variables, and thus the conversion of liquid and frozen cloud water into precipitation must be parametrised. The calculations are conducted over model layers, calculating the liquid (L) and frozen (F) precipitation from the top “wet layer” downwards (Smith et al., 1998). In order to calculate the change in precipitation flux between the top and bottom of each layer, three processes are considered:

Evaporation and Sublimation - The evaporation of falling rainfall and sublima- tion of falling ice are dealt with similarly - hence here we describe the model treatment of evaporation:

$$\left.\frac{\partial q_k}{\partial t}\right|_{\mathrm{evap}} = C_{ev}(T,P)\,\bigl(q_{sat}(T_k, p_k) - q_k\bigr), \qquad (3.1)$$

where $q_k$ is the specific humidity in layer k, $q_{sat}$ is the saturation specific humidity and $C_{ev}$ is the bulk rate coefficient for evaporation (itself a function of temperature and precipitation, to account for the change in diffusivity of water). Once derived, this is related to the change in temperature in the layer:

$$\left.\frac{\partial T_k}{\partial t}\right|_{\mathrm{evap}} = \frac{L_C}{c_p}\left.\frac{\partial q_k}{\partial t}\right|_{\mathrm{evap}}, \qquad (3.2)$$

where $T_k$ is the mean temperature in the layer, $L_C$ is the latent heat of condensation and $c_p$ is the specific heat capacity of air at constant pressure. Similarly, the water budget is balanced by adjusting precipitation:

$$P_{k-1/2} = P_{k+1/2} - (\rho\Delta z)_k \left.\frac{\partial q_k}{\partial t}\right|_{\mathrm{evap}}, \qquad (3.3)$$

where $P_{k-1/2}$ and $P_{k+1/2}$ are the precipitation rates at the top and bottom of the layer respectively. Because T and P feature in Equation 3.1, the above equations must be dealt with implicitly when discretised in time. The full numerical details are discussed in Smith et al. (1998).
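The downward sweep implied by Equations 3.1 and 3.3 can be sketched as follows. The column profiles are hypothetical, the update is explicit in time for clarity (the scheme itself is implicit, as noted above), and the simple cap on evaporation stands in for the scheme's full bookkeeping:

```python
import numpy as np

def evaporate_precip(P_top, q, q_sat, C_ev, rho_dz, dt):
    """Sweep from the top wet layer downwards: falling precipitation
    evaporates into each sub-saturated layer (Eq. 3.1), and the flux at
    the layer base is reduced by the evaporated mass (Eq. 3.3)."""
    P = P_top                                          # flux entering the top layer
    for k in range(len(q)):
        dq_dt = max(C_ev[k] * (q_sat[k] - q[k]), 0.0)  # evaporation only
        dq_dt = min(dq_dt, P / rho_dz[k])              # flux cannot go negative
        q[k] += dq_dt * dt                             # moisten the layer
        P -= rho_dz[k] * dq_dt                         # flux at the layer base
    return q, P

# Hypothetical three-layer column (all values illustrative only).
q, P_bottom = evaporate_precip(
    P_top=1e-4, q=np.array([0.004, 0.006, 0.008]),
    q_sat=np.array([0.005, 0.007, 0.009]),
    C_ev=np.full(3, 2e-8), rho_dz=np.full(3, 1000.0), dt=600.0)
print(q, P_bottom)
```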

Precipitation Formation from liquid cloud water - The model accounts for differences between water and ice clouds. In water clouds, water condenses on CCN (cloud condensation nuclei), and rarely rains out of its own accord. Thus, the rate of precipitation is governed by the formation of large droplets, and is parametrised as follows (Sundqvist, 1978):

$$\frac{\partial q_{cl}}{\partial t} = -C\left(c_T\left[1 - \exp\left(-\left(\frac{\rho q_{cl}/C}{c_w}\right)^{2}\right)\right] + \frac{P\,q_{cl}}{c_a}\right), \qquad (3.4)$$

where $q_{cl}$ is the amount of liquid cloud water. The leading factor C is the fraction of the grid-box containing cloud. The auto-conversion of cloud water into rain is governed by the term containing the rate constant $c_T$; the exponent is the ratio of the in-cloud condensed water ($\rho q_{cl}/C$) to $c_w$, which is the threshold for precipitable water. Additional precipitation occurs from the accretion term $P q_{cl}/c_a$, which represents the accretion of moisture onto existing precipitation.

In climateprediction.net, two of these parameters are varied: the accretion constant ($c_a$, or CT in Table 3.2) and the threshold for precipitable water ($c_w$, or CW_LAND and CW_SEA), where different values are appropriate over land and ocean areas because the air over the oceans contains fewer condensation nuclei.
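The balance of the two sink terms in Eq. 3.4, as reconstructed above, can be written as a one-line function. The parameter values below are indicative only, chosen to exercise the two quantities that CT and CW_LAND/CW_SEA perturb:

```python
import numpy as np

def liquid_water_tendency(q_cl, C, rho, P, c_T, c_w, c_a):
    """Rate of depletion of liquid cloud water (Eq. 3.4): auto-conversion
    of large droplets plus accretion onto falling precipitation P."""
    in_cloud = rho * q_cl / C                    # in-cloud condensed water
    auto = c_T * (1.0 - np.exp(-(in_cloud / c_w) ** 2))
    return -C * (auto + P * q_cl / c_a)

# Doubling the precipitable-water threshold c_w weakens auto-conversion:
for c_w in (2e-4, 4e-4):
    print(c_w, liquid_water_tendency(q_cl=3e-4, C=0.5, rho=1.0,
                                     P=1e-4, c_T=1e-4, c_w=c_w, c_a=1.0))
```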

Precipitation Formation from ice clouds - The treatment of precipitation from frozen clouds is somewhat different; the rate of change of cloud ice water is given as follows:

$$\frac{\partial q_{cf}}{\partial t} = \frac{P_* - \rho q_{cf} v_f}{\rho\,\Delta Z}, \qquad (3.5)$$

where $q_{cf}$ is the frozen cloud water content, $P_*$ is the proportion of falling frozen precipitation which enters the layer and remains within it (rather than falling out), $\rho$ is the air density and $\Delta Z$ is the thickness of the layer. The process is scaled by $v_f$, the ice-fallout speed, which parametrises ice fallout in terms of in-cloud water content using an observationally derived formula (Heymsfield, 1977):

$$v_f = v_{f1}\left(\frac{\rho q_{cf}/C}{c_f}\right)^{0.17}, \qquad (3.6)$$

where the scaling factor is $v_{f1}$, with a default value of 1 m s$^{-1}$, and the term in brackets is the ratio of the in-cloud condensed water $\rho q_{cf}/C$ to the threshold for precipitation of frozen water, $c_f$. In the climateprediction.net project, it is the scaling factor $v_{f1}$ (VF1 in Table 3.2) which is varied.

3.1.2 Saturated Humidity and Large-Scale Clouds

3.1.2.1 Critical Relative Humidity

Details of the large-scale cloud scheme are given in Smith et al. (1997) and Smith (1990). The saturated humidity $q_s$ is assumed in the model to be given by:

$$q_s(T,p) = \frac{\epsilon\, e_s(T)}{p}, \qquad (3.7)$$

where $e_s$ is the saturation vapour pressure of water vapour (itself a function of T), p is the atmospheric pressure, T is the temperature and $\epsilon$ is the ratio of the molecular weights of water and dry air. At the end of every timestep, if the humidity exceeds this level, it is returned back to the saturation value (with the appropriate adjustment of temperature and precipitation).

The cloud amount is determined in the mean by the difference between the specific total water content $q_t$ and the saturation specific humidity, but a statistical distribution about this point is assumed (Mellor, 1977). The grid box mean of this difference is defined as:

$$Q_c = a_L\,\bigl(q_t - q_s(T_L, p)\bigr), \qquad (3.8)$$

where $a_L$ is a scaling factor, subject to latent heat constraints (L could be the latent heat of condensation or sublimation, depending on the temperature):

$$a_L = \frac{1}{1 + \frac{L}{c_p}\alpha_L}, \qquad (3.9)$$

where $\alpha_L$ is given by the Clausius-Clapeyron relationship:

$$\alpha_L = \left.\frac{\partial q_s}{\partial T}\right|_{T=T_L} = \frac{\epsilon L q_s(T_L,P)}{R T_L^{2}}. \qquad (3.10)$$

The model also assumes local deviations s from the mean ($Q_C$), which are attributed to sub grid-scale changes in humidity and temperature:

$$s = a_L\,\bigl(q_t' - \alpha_L T_L'\bigr). \qquad (3.11)$$

It is assumed that the distribution G(s) of s within the grid-cell is a symmetric triangular function of width $2b_s$. G(s) is illustrated in Figure 3.1, and satisfies the following:

$$\int_{-\infty}^{+\infty} G(s)\,ds = 1. \qquad (3.12)$$

Thus, cloud is first formed when $Q_C$ exceeds $-b_s$ and cloud coverage is total when $Q_C$ is $+b_s$. $b_s$ is determined as follows:

$$b_s = (1 - RH_c)\,a_L\,q_s(T_L, p), \qquad (3.13)$$

where $RH_c$ is the threshold relative humidity for cloud formation. Thus, if $RH_c$ is increased, the width of the distribution will decrease and the specific humidity in the grid-cell must be closer to the saturation specific humidity before clouds may form. In climateprediction.net, it is the value of $RH_c$ (RHCRIT in Table 3.2) which is varied.

Figure 3.1: An illustration of the distribution of specific humidities which is assumed to exist within each grid-cell. The cloud fraction C is obtained by integrating from the grid-box mean specific humidity across the distribution.

3.1.2.2 Cloud Fraction

The cloud fraction C in the grid-box is calculated from the grid-box mean specific humidity as follows (Figure 3.1):

$$C = \int_{-Q_c}^{+\infty} G(s)\,ds, \qquad (3.14)$$

hence when $Q_C$ is $-b_s$ then C = 0, when $Q_c = 0$ then C = 0.5 and when $Q_c = b_s$ then C = 1. A commonly used adjustment to the Smith (1990) scheme is the empirically adjusted cloud fraction (Smith et al., 1997), which increases the cloud fractions which it produces, in order to better match observations. The adjustment takes the value of $Q_c$ used to calculate the cloud fraction, and provides an increased value to the cloud scheme (Eq. 3.14), with a linear mapping:

$$Q_c' = \frac{Q_c + k}{1 - k}, \qquad (3.15)$$

where k is empirically determined. The empirically adjusted cloud fraction is calculated for the case when $Q_C = 0$:

$$\mathrm{EACF} = \int_{-k/(1-k)}^{+\infty} G(s)\,ds, \qquad (3.16)$$

thus if k = 0, then EACF is 0.5, but if k > 0 then EACF > 0.5. EACF may be set to different values on different model levels. In climateprediction.net, EACF is one of the perturbed parameters (EACF in Table 3.2).
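Because G(s) is a symmetric triangle, the integrals in Eqs. 3.14 and 3.16 have simple closed forms. The sketch below evaluates them, normalising the half-width $b_s$ to 1 in the EACF case, and reproduces the limiting values quoted in the text:

```python
import numpy as np

def cloud_fraction(Q_c, b_s):
    """Cloud fraction from Eq. 3.14: the integral of the symmetric
    triangular distribution G(s) (half-width b_s) over s > -Q_c,
    evaluated in closed form."""
    x = np.clip(Q_c / b_s, -1.0, 1.0)
    return np.where(x <= 0.0, 0.5 * (1.0 + x) ** 2,
                    1.0 - 0.5 * (1.0 - x) ** 2)

def eacf(k):
    """Empirically adjusted cloud fraction (Eq. 3.16): the cloud fraction
    obtained at Q_c = 0 after the linear mapping of Eq. 3.15 (b_s = 1)."""
    return cloud_fraction(k / (1.0 - k), 1.0)

print(cloud_fraction(np.array([-1.0, 0.0, 1.0]), 1.0))  # 0.0, 0.5, 1.0
print(eacf(0.0), eacf(0.2))                             # 0.5, and > 0.5 for k > 0
```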

3.1.2.3 Clouds in Radiation

The effect of clouds on the radiative budget is described in Ingram (1990). In longwave radiation, the clouds are assumed to emit at the temperatures of the layer boundaries at their base and top, which is a simplification unless the cloud is optically dense. The errors introduced by this simplification are subsumed in the tuning process, where bulk emissivities are defined as a function of the cloud water path:

$$\epsilon = 1 - \exp(-k \times CWP), \qquad (3.17)$$

where k differs for water and ice clouds. The cloud water path CWP is defined for each level using the condensed water content:

$$CWP = \rho q_c \Delta z. \qquad (3.18)$$

In the case of ice clouds - this is a poor representation of the truth, because the surface area of ice particles is an important feature of their radiative effectiveness. This inaccuracy is accounted for within the tuning of k.

The properties of clouds in shortwave radiation are based on the parameterisation of Slingo (1989). This parametrises the key optical properties of the cloud in terms of known diagnostics:

$$\tau = CWP\,(a + b/r_e), \qquad (3.19)$$
$$1 - \tilde\omega = c + d\,r_e, \qquad (3.20)$$
$$g = e + f\,r_e, \qquad (3.21)$$

where $r_e$ is the particle size in the radiation code (representing the peak of the particle distribution), $\tau$ is the total extinction depth of the cloud including absorption and all scattering, $\tilde\omega$ is the single-scatter albedo (the fraction of extinction to be scattered rather than absorbed in a single interaction). Finally, g is a measure of how much of the scattering is concentrated in the forward direction. The constants a..f are band-dependent, and ultimately determined through Mie theory and the measured complex refractive index of water. Their values are set independently for stratiform and convective clouds and for ice and water particles in the longwave and shortwave code.

There is no way to derive $r_e$, hence it remains a tunable parameter of the model. In the case of water clouds, this tuning is theoretically well defined and the theory agrees well with observations (Slingo, 1989). However, this scheme is also used for ice clouds, where its use is not justified physically but where it is tuned through $r_e$ and CWP. Hence some considerable uncertainty lies in the appropriate value of $r_e$ for ice particles, and it is one of the parameters tuned in climateprediction.net (ICE_SIZE in Table 3.2). In the same vein, we sample two possible sets of the coefficients a..f for the ice particles in the radiation code - both the defaults derived from Slingo (1989) and a second set based on an alternative parameterisation (I_CNV_ICE_SW, I_CNV_ICE_LW, I_ST_ICE_SW and I_ST_ICE_LW in Table 3.2).

3.1.3 Convection Scheme

Convection in the model is described in Gregory and Rowntree (1990) and Gregory and Inness (1996), and is applicable to moist convection of all types (shallow, deep and mid-level) together with dry convection. The scheme is based upon parcel theory, where the model calculates an ensemble of convective clouds in each grid-cell and returns bulk characteristics (temperature, mixing ratio and mass flux) as the average of that ensemble.

For each given column of atmosphere, the buoyancy of the air is tested on each pressure level, from the bottom upwards. The stability is tested by considering the temperature of an air parcel transported adiabatically up by one model level; if the temperature of the parcel is greater than its surroundings by a given threshold (0.2K in the default setup), then convection is initiated and the parcel continues to rise. On each level, the parcel entrains environmental air while detraining cloudy air and this process continues until the parcel is no longer buoyant at level k; at this point, the parcel detrains a sufficient proportion of its air-mass such that the remaining proportion remains just buoyant and rises to the next level. This process continues until the mass flux falls below a given threshold or the parcel reaches the zero buoyancy layer of an undilute parcel rising from the point of convective initiation. Thus the ‘ensemble’ of clouds is not explicitly simulated, but is emulated by assuming that proportions of cloud detrain at different heights.

3.1.3.1 Entrainment and Detrainment

The variation in cloud mass flux ($M_P$) with pressure p is described as follows:

$$-\frac{\partial M_{Pi}}{\partial p} = (E_i - N_i - D_i), \qquad (3.22)$$

where i denotes a cloud within the ensemble, $E_i$ is the entrainment rate, $N_i$ is the mixing detrainment rate and $D_i$ is the forced detrainment rate. Mixing detrainment represents the turbulent detrainment of vapour at the edge of the clouds and occurs throughout the cloud mass. Forced detrainment, however, only occurs at the cloud top when the cloud reaches zero buoyancy; therefore $D_i$ is zero for all other levels. Summing over all the clouds in the ‘ensemble’ leads to the expression for the bulk cloud term:

$$-\frac{\partial M_P}{\partial p} = (E - N - D); \qquad (3.23)$$

the entrainment term E is defined as the product of the cloud mass and the weighted entrainment coefficient, $\epsilon$:

$$E = \sum_i E_i = \epsilon M_P, \qquad \epsilon = \frac{\sum_i \epsilon_i M_{Pi}}{\sum_i M_{Pi}}.$$

It is recognised that the degree of entrainment is affected by the radius of the cloud in question; Simpson and Wiggert (1969) proposed an inverse relationship between cloud radius and entrainment rate (for a height coordinate model):

$$\epsilon_z = \frac{0.2}{R}, \qquad (3.24)$$

such that shallow clouds are assumed to have R = 500 m, giving $\epsilon \approx 4 \times 10^{-4}$, while deep clouds are assumed to have R = 2000 m, giving $\epsilon \approx 1 \times 10^{-4}$. Near the surface, both shallow and deep convective clouds coexist, while in the upper troposphere only the deep clouds remain. To accommodate this difference, a pressure dependence is introduced to the term $\epsilon$ such that:

$$\epsilon = k_{ent}\, A_E\, \frac{p}{p_*^{2}}, \qquad (3.25)$$

where $p/p_*$ is the fraction of the surface pressure at the level concerned, and $A_E$ is a constant 1.5 (or 1 on the bottom convective level). The constant $k_{ent}$ is set to 3 in the default setup, giving entrainment values which are consistent with the expected distributions of cloud radii at different pressure levels. In the climateprediction.net ensemble, it is $k_{ent}$ (ENTCOEF in Table 3.2) which is varied.
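A sketch of the resulting entrainment profile is given below, using the default $k_{ent}$ and $A_E$ quoted above; units and the treatment of the lowest convective level are glossed over:

```python
def entrainment_rate(p, p_star, k_ent=3.0, A_E=1.5):
    """Sketch of Eq. 3.25: bulk entrainment coefficient at pressure p for
    surface pressure p_star; larger k_ent (ENTCOEF) entrains more, and the
    rate falls with height so only wide, deep clouds survive aloft."""
    return k_ent * A_E * p / p_star ** 2

p_star = 1.0e5                            # surface pressure, Pa
for p in (9.5e4, 5.0e4, 2.5e4):           # low, mid and upper troposphere
    print(p / p_star, entrainment_rate(p, p_star))
```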

The detrainment in the clouds is similarly defined:

$$N = \sum_i N_i = \mu M_P, \qquad \mu = \frac{\sum_i \mu_i M_{Pi}}{\sum_i M_{Pi}},$$

where the coefficient $\mu$ is itself a function of $\epsilon$. It is assumed for simplicity that detraining air has the same properties as the bulk cloud at a given level. Once the values E, N and D have been established, they are used to calculate the changes in potential temperature $\Theta$, mixing ratio q and cloud water l in the cloud on each model level.

3.1.4 Sea Ice Albedo Temperature dependence

The HadAM3 model contains a sea-ice model which relates net fluxes from the atmosphere and ocean to changes in snow and ice mass. The sea ice model provides the atmosphere with a lower boundary condition (with variable surface temperature). The full extent of this coupling is described in Ingram (1990), but in this experiment we are concerned only with the variation of albedo with temperature.

The albedo of the sea-ice fraction of a grid-box, $\alpha$, is specified as a function of the surface temperature of the ice, T:

$$\alpha = \begin{cases} \alpha_c & \text{if } T \le T_M - \Delta T_{ice}, \\ \alpha_m + (\alpha_c - \alpha_m)\,\dfrac{T_M - T}{\Delta T_{ice}} & \text{if } T_M - \Delta T_{ice} < T \le T_M, \end{cases}$$

with linear variation between $T_M - \Delta T_{ice}$ and $T_M$. $\alpha_c$ is the albedo of frozen sea-ice, $\alpha_m$ is the albedo at the melting point of sea-ice and $T_M$ is the melting point of sea-ice. In the climateprediction.net project, the values of $\alpha_m$ and $\Delta T_{ice}$ are perturbed (ALPHAM and DTICE in Table 3.2).
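The ramp can be written directly as a small function. The cold-ice albedo alpha_c and melting point T_M below are illustrative values only; alpha_m (ALPHAM) and dT_ice (DTICE) are the quantities actually perturbed:

```python
import numpy as np

def sea_ice_albedo(T, alpha_m=0.57, dT_ice=5.0, alpha_c=0.8, T_M=273.15):
    """Sea-ice albedo ramp described above: alpha_c for T <= T_M - dT_ice,
    decreasing linearly to alpha_m at the melting point T_M over the range
    dT_ice. alpha_c and T_M here are illustrative, not model values."""
    ramp = np.clip((T_M - T) / dT_ice, 0.0, 1.0)
    return alpha_m + (alpha_c - alpha_m) * ramp

print(sea_ice_albedo(np.array([260.0, 270.5, 273.15])))  # cold, ramp, melting
```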

Chapter 4

Linear feedback analysis

‘ “There is no case known (neither is it, indeed, possible) in which a thing is found to be the efficient cause of itself; for so it would be prior to itself, which is impossible.” ’ —Saint Thomas Aquinas (1225-1274), “The Five Ways”

4.1 Introduction

4.1.1 Feedbacks and Climate Sensitivity

In 1990, the Intergovernmental Panel on Climate Change (IPCC) proposed an uncertainty range of 1.5-4.5 K for the equilibrium global surface response to a doubling of carbon dioxide (Houghton et al., 1990). In the years since, this uncertainty has remained almost constant despite considerable improvements in model complexity and increases in understanding (Solomon et al., 2007). Many attempts have been made to further constrain predictions using combinations of model and observational data, but uncertainty (especially when attempting to rule out very high values of climate sensitivity) has remained stubbornly high (Forest et al., 2002; Knutti et al., 2005; Gregory et al., 2002).

Much of this uncertainty may be attributed to differences in the amplitude of cloud and water vapour feedbacks between models (Houghton et al., 1990; Cess, 1996). Recently, there have been attempts to directly determine these feedbacks from observational data; Forster and Gregory (2005) used ERBE radiative flux data, together with observations of surface temperature, to categorise the change in radiative balance at the top of the atmosphere in response to surface warming. Such measurements were then combined to produce an estimate for the global climate feedback parameter, $\lambda$:

$$\lambda = -\frac{\partial F}{\partial \Delta T_s}, \qquad (4.1)$$

where F is the net downward irradiance at the top of the atmosphere (TOA) and $\Delta T_s$ is the surface temperature change. $\lambda$ is thus the change in top of atmosphere flux per unit warming. This value is directly related to the equilibrium surface response to a doubling of carbon dioxide:

$$S = Q_c/\lambda, \qquad (4.2)$$

where $Q_c$ is the constant radiative forcing due to a doubling of carbon dioxide (approximately 3.7 W m$^{-2}$), and S is the climate sensitivity.

The study of feedbacks in GCMs has traditionally been approached in one of two ways: “online” or “offline”. Online calculations were pioneered by Cess et al. (1990) and provide a relatively simple approach to feedback analysis, where fixed SST perturbations are applied to GCM simulations, and the resulting changes in the ‘clear sky’ radiative flux and the ‘cloud radiative forcing’ are measured. These values are used to infer a clear sky sensitivity and cloud feedback. Various questions have been raised over the validity of this approach: Senior and Mitchell (1993) contested that the idealised sensitivities inferred from the SST perturbations may not be identical to those from double CO2 experiments. Also, Zhang et al. (1996) suggested that the cloud feedbacks measured by these methods also include some non-cloud feedbacks (such as those associated with a change in humidity distributions).

The alternative approach is to use offline calculations, in which the effect of feedbacks is judged by performing two separate greenhouse gas forcing simulations: one where the feedback component is allowed to vary interactively with the rest of the model, and another where the feedback component is fixed to pre-industrial levels. In theory, this approach may be applied to clouds, water vapour, lapse rate or any other feedback by fixing each component in turn and comparing the result to the full interactive simulation. The method was pioneered by Wetherald and Manabe (1988) and can be complicated and time-consuming to implement compared to the simpler online approach, but has nevertheless been attempted by several groups with different GCMs, including Watterson et al. (1999) and Gong et al. (1994).

The two techniques were compared by Colman (2003). In this paper, a comparison was made of cloud, water vapour, albedo and lapse rate feedbacks for several GCMs with mixed layer oceans performing double carbon dioxide equilibrium simulations. The author found substantial spread in all feedback types in the current generation of GCMs, with a negative correlation between lapse rate and water vapour feedbacks and also between longwave and shortwave components of the cloud feedback. Feedbacks were evaluated with both online and offline calculations, and it was found that the results were comparable.

Soden and Held (2006) used a different offline approach, where the feedbacks are calculated from a possible future emissions scenario. In this approach, feedbacks were considered to be a product of two terms, one dependent on the radiative transfer and the other on the climatic response:

$$\lambda_x = \frac{dR}{dx}\frac{dx}{dT_s}, \qquad (4.3)$$

where x is the climatic feedback component (cloud cover, humidity etc.), R is the net top of atmosphere flux and $T_s$ is the surface temperature. The GCM simulations are used to calculate the second term, while a radiative transfer model calculates the first. The authors found that the use of this methodology produced results consistent with those of Colman (2003) for water vapour, temperature and albedo feedbacks, but there was some additional uncertainty in cloud feedbacks because they were calculated as a residual term after all other feedbacks had been calculated. The authors claimed that this methodology provided an economical calculation of feedbacks using pre-existing simulations of future forcing scenarios.

Various papers have been published exploring the mechanisms behind feedback components in perturbed physics (and multi-model) ensembles. Webb et al. (2006) was discussed in Chapter 2, and focused on establishing different local regimes of cloud feedback based upon the combined change in longwave and shortwave cloud radiative forcing. This (offline) methodology enabled the authors to establish the dominant types of cloud feedback responsible for the variation of the global feedback parameter in both the QUMP and the CFMIP ensembles. The authors found that cloud feedbacks provided the dominant variability in global feedback response in both ensembles (66% in CFMIP and 85% in QUMP). They used a simulator to find the likely cloud changes associated with the observed change in cloud radiative forcing. They found that in both the CFMIP and QUMP ensembles, the dominant variance in cloud feedback was due to low-top cloud changes. In CFMIP, it was positive changes due to increasing low level cloud cover which provided most of the variance, while in QUMP, it was negative changes due to local decreases in low cloud amount which dominated.

In this chapter, in common with Webb et al. (2006), we attempt to find a method of separating and quantifying independent feedback mechanisms, and to determine how they relate to overall climate sensitivity. In addition, we also seek to find which parameter perturbations are responsible for the major feedbacks and which observations are relevant for determining their likely amplitude in the real world.

Section 4.2 is a discussion of the methodology of our approach: in 4.2.1, the method used to determine local feedbacks is discussed. Section 4.2.2 examines the process of isolating independent global feedback processes via principal component analysis. Section 4.2.3 looks at the regression techniques employed to seek out relationships between feedbacks, model parameters and model climatology.

Section 4.3 shows the results of the principal component analysis, with the ensemble mean response followed by the perturbations to that response which we see in the leading EOFs. The physical implications of each mode are then discussed; how they relate to the control climatology and how they each relate to perturbed model parameters.

Section 4.3.3 discusses how observations would project onto each of the modes, and how these projections relate to the predicted range of S. Section 4.3.4 shows how the analysis may be extended by assuming local linearity in a small region of the ensemble where the values of the leading EOFs do not dominate the response.

4.2 Methodology

4.2.1 Regional feedback analysis

A methodology is developed to determine the physical processes responsible for variations in surface temperature response to the doubling of concentrations of CO2 in a perturbed physics ensemble. To categorise the regional response of the climate system, we examine how upward radiative fluxes at the top of the model atmosphere respond to the rising surface temperature. This ‘feedback response’ is defined as the rate of change of a given component of upward radiative flux at the top of the model atmosphere as a function of the local mean surface temperature (Figure 4.1).

This value is calculated for four radiation variables as output from the model: longwave and shortwave clear-sky flux, and longwave and shortwave cloud radiative forcing (CRF). The clear sky radiative fluxes are calculated as if by a repeat run of the model’s radiative transfer code in each timestep, ignoring the effect of clouds. The CRF is calculated by subtracting the clear-sky value from the total flux in both shortwave and longwave cases.

The data used is taken from the third stage of each model simulation, in which the CO2 concentrations are doubled at the start of the run, leading to an increase in surface temperatures throughout the 15 year simulation. An ordinary least-squares linear fit of upward flux as a function of annual mean surface temperature is made for each radiative type (longwave and shortwave values for clear-sky and cloud radiative forcing), and the four feedback amplitudes are defined as the gradients of these lines, as illustrated in Figure 4.1. We do not test for the significance of the fit, effectively taking the best estimate for the radiative response to warming in each model. Although this means there is sampling ‘noise’ in the determination of the gradients, this is justifiable because only correlated variance is identified by the EOF analysis used later; on the assumption that the sampling error can be modelled as white noise, it will not affect the leading modes identified. We verify this by adding varying amplitudes of additional white noise to the gradients at this stage and ensuring that the leading modes of the following analysis are robust. Later, we show in Figure 4.3 that the leading modes considered in the analysis are well defined and separated from later modes which might be attributable to sampling noise.

Figure 4.1: An example illustrating the process of determining the regional feedback amplitude for each of four radiative fluxes at the top of the model atmosphere (LW CRF, SW CRF, and LW and SW clear-sky TOA fluxes, here plotted against surface temperature over Australia), measured against surface temperature. Each point represents one annual mean from a fifteen year simulation, where the CO2 concentrations are doubled at year 0. In each case, the feedback amplitude is taken as the gradient of the ordinary least squares regression line.
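To make the fitting step concrete, the gradient calculation for one region of one model might look like the following minimal sketch in Python (the array names and data structure are hypothetical, and are not part of the climateprediction.net processing chain):

    import numpy as np

    def feedback_amplitudes(tas, fluxes):
        """Estimate the four regional feedback amplitudes for one model.

        tas    : (15,) annual mean surface temperatures (K) from the
                 doubled-CO2 stage of the simulation
        fluxes : dict of (15,) annual mean upward TOA fluxes (W m-2),
                 keyed 'lw_clear', 'sw_clear', 'lw_crf', 'sw_crf'
        Returns the ordinary least-squares gradient (W m-2 K-1) of each
        flux against temperature, as in Figure 4.1.
        """
        return {name: np.polyfit(tas, flux, 1)[0]   # slope of the linear fit
                for name, flux in fluxes.items()}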

The process is repeated for each model in the ensemble and, once evaluated, the data are weighted according to the area occupied by each region. The hemispheric regions overlap the land-based regions already defined. To avoid double counting, we replace the hemispheric regions with linearly calculated ocean-only regions which do not overlap the already defined regions. These regions are calculated by assuming that each hemispheric mean is an area-weighted mean over the (known) land regions and the (unknown) ocean regions, allowing a simple linear estimate of the ocean mean value.
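For concreteness, writing \bar{x}_h for a hemispheric mean, \bar{x}_l for the mean over land region l, and A_h, A_l for the corresponding areas, the implied ocean-only mean follows directly (our notation, introduced here purely for illustration):

\bar{x}_{\text{ocean}} = \frac{A_h \bar{x}_h - \sum_l A_l \bar{x}_l}{A_h - \sum_l A_l}.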

The feedback response of each model is now described by a vector of length 4R, comprising the clear-sky and cloud feedback values for shortwave and longwave in each region. Before finding the Empirical Orthogonal Functions (EOFs) from the dataset, the vectors are assembled into a matrix of dimensions 4R × N, where N is the number of ensemble members. The anomaly matrix is obtained by subtracting the ensemble mean vector from each column.

4.2.2 EOF Analysis

The first of two EOF analyses is now carried out in order to identify the leading patterns of feedback response. The input for this EOF analysis is the radiative flux anomaly matrix described in Section 4.2.1, and the resulting EOFs are henceforth referred to as ‘feedback EOFs’. It is possible to perform this analysis separately on the different variables, but the results are less well defined, because important information about changes in cloud amount and height can only be identified by using changes in the different radiative fluxes simultaneously (Webb et al., 2006).
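In practice, the feedback EOFs, principal components and eigenvalues can all be obtained from a single singular value decomposition of the anomaly matrix; a minimal sketch in Python (array names are hypothetical):

    import numpy as np

    def feedback_eofs(anomaly):
        """EOF analysis of the 4R x N feedback anomaly matrix.

        anomaly : (4R, N) area-weighted anomalies, ensemble mean removed.
        Returns the EOFs E (one spatial pattern per column), the principal
        components P (one row per ensemble member) and the eigenvalues L.
        """
        U, s, Vt = np.linalg.svd(anomaly, full_matrices=False)
        E = U                               # feedback EOF patterns
        P = Vt.T * s                        # P[i, j]: amplitude of EOF j in model i
        L = s**2 / (anomaly.shape[1] - 1)   # eigenvalues of the covariance matrix
        return E, P, L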

The set of EOFs is truncated where the sampling error in the eigenvalues is comparable to the spacing between neighbouring eigenvalues, as described in North et al. (1982). They describe a rule of thumb for identifying significant modes:

\delta L_k \approx L_k \left( \frac{2}{N} \right)^{1/2}, \qquad (4.4)

where L_k is the kth eigenvalue and N is the number of ensemble members; a mode is treated as well defined where \delta L_k is small compared to the spacing between L_k and its neighbouring eigenvalues.

4.2.3 Regression Techniques

Climate sensitivity has an inverse relationship to feedbacks in the climate system, and thus uncertainties in feedbacks are not linearly related to uncertainties in sensitivity (Roe and Baker, 2007). Feedbacks in the climate system tend to be linearly related to observations; this may be derived in simple cases such as the black-body response (where the feedback amplitude in W m−2 K−1 may be derived as a function of initial temperature, and can be approximated as linear over the range of temperatures found in the climate system). In the case of more complicated observation-feedback relationships, this has been verified empirically by Piani et al. (2005). Thus, we choose to predict the inverse of climate sensitivity, which is the global feedback parameter (λ in Eq. 4.2), rather than S itself. The regression model tests for a linear relationship between the identified feedback EOFs and the global feedback parameter, λ.

Throughout this section, if a value is “standardised”, then the ensemble mean of that value is subtracted and the anomaly is normalised by the ensemble standard deviation. We adopt the notation that the EOF analysis produces a set of EOFs, E, a set of principal components, P, and eigenvalues, L. The ‘feedback’ EOFs are denoted by superscript ‘F’ and the ‘climatology’ EOFs by superscript ‘C’.

To determine the importance of each feedback EOF in describing the overall variability of S, we can perform a multilinear regression of the standardised feedback parameter, λ, against the standardised feedback EOFs across the ensemble, so that we can predict λ as:

\lambda_i = \sum_{j=1}^{K} P^{F}_{ij} \alpha_j + \text{noise}, \qquad (4.5)

where \lambda_i is the standardised global feedback parameter of the ith model in the ensemble, K is the truncation number for the feedback EOFs, and P^{F}_{ij} is the standardised principal component of the jth feedback EOF in the ith ensemble member.

Thus, αj is the correlation coefficient, indicating the significance of feedback EOF j in explaining the variance in λ.

Given that the feedback EOFs are orthogonal by construction, the standard regres- sion formula gives a simplified least squares solution for αj:

\alpha_j = \sum_{i=1}^{N} P^{F}_{ij} \lambda_i. \qquad (4.6)

The fraction of the total variance of λ represented by the truncated set of EOFs is now given by:

\sum_{j=1}^{K} \alpha_j^2. \qquad (4.7)

In the same way as we regress each EOF against λ, we can also regress each EOF against each perturbed model parameter (Table 3.2), in order to determine which parameter settings in the perturbed physics ensemble are most significant in explaining the variation in each EOF.

A simple linear regression model is used to relate model parameter values to feedback amplitudes. The value x_{il} represents the standardised value of parameter l in ensemble member i. We can now regress against the principal component of the jth EOF:

P^{F}_{ij} = \sum_{l=1}^{P} x_{il} \beta_{lj} + \text{noise}, \qquad (4.8)

where P = 10 is the total number of parameters. We can solve for \beta_{lj} as before:

\beta_{lj} = \sum_{i=1}^{N} P^{F}_{ij} x_{il}. \qquad (4.9)
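Since the standardised principal components form an orthonormal basis over the ensemble, equations 4.5-4.9 reduce to simple matrix projections; a sketch in Python (hypothetical array names, with the principal components assumed normalised to unit length across the ensemble, as implied by the text):

    import numpy as np

    def regression_coefficients(P_F, lam, x):
        """Projection solutions of equations 4.6, 4.7 and 4.9.

        P_F : (N, K) standardised feedback principal components
        lam : (N,)   standardised global feedback parameter
        x   : (N, P) standardised parameter settings (here P = 10)
        """
        alpha = P_F.T @ lam           # eq. 4.6: weight of each EOF in lambda
        explained = np.sum(alpha**2)  # eq. 4.7: fraction of variance captured
        beta = P_F.T @ x              # eq. 4.9: beta[j, l] relates EOF j to parameter l
        return alpha, explained, beta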

4.2.4 Projection of Observations onto Feedbacks

Once the major modes of feedback response have been identified, we seek to find predictors of this response which may be applied to observational data. This requires information from the control climate of each model simulation. To simplify the regression, we produce a second set of ‘climatology’ EOFs from the control state of the models.

To describe each model’s climate, we select a subset of model output data represent- ing the model’s base climate state. The variables are again taken from the regional output data employed earlier, but this time excluding the oceanic regions. While it is necessary to include ocean regions to describe the feedback responses that sum to give the global feedback parameter, we do not use ocean data in the climatology EOFs. The control stage is already relaxed to observed sea surface temperatures, making it difficult to properly verify these regions against observations.

We have elected to keep ocean quantities in the measures of feedback processes themselves as these are not directly compared to observations. To exclude them would lead to an incomplete description of the components of the global feedback parameter.

The result is a matrix with RV rows and N columns, where V is the number of variables, R is the number of regions and N is the number of models in the ensemble. The variables used in the second stage are listed in Table 4.1.

In order to construct the climatology EOFs, we adapt the methodology of Piani et al. (2005), in which each ensemble member is projected onto independent modes of variability, allowing the climate state to be represented by the principal components of those modes.

A normalised anomaly matrix is then obtained by subtracting the mean of each climate field from each ensemble member, and normalising by the standard deviation across 65 15-year segments of a 500 year control simulation of HadCM3. Thus each field is normalised by an estimate of natural variability. Each element is area weighted as before. The results of the analysis are not expected to depend strongly upon the choice of model used to estimate the normalisation, because only the standard deviation of the model fields is considered and not the absolute values; indeed, scaling the fields by the interannual variability of the ERA-40 reanalysis did not produce noticeably different results. The HadCM3 control was used because variation over a large number of independent 15 year means was the appropriate normalisation for the 15 year control simulations.

    Climate fields chosen for analysis
    Climate Variable                          Dataset(s) used
    Surface Temperature (TAS)                 ERA40 / NCEP
    Seasonal Cycle in TAS (JJA - DJF)         ERA40 / NCEP
    SW upward radiation at TOA                ERA40 / NCEP / ERBE*
    LW upward radiation at TOA                ERA40 / NCEP / ERBE*
    LW clearsky upward radiation at TOA       ERA40 / NCEP / ERBE*
    SW clearsky upward radiation at TOA       ERA40 / NCEP / ERBE*
    Total Precipitation                       ERA40 / NCEP
    Latent heat flux at surface               ERA40 / NCEP
    Omega at 500mb                            ERA40 / NCEP

Table 4.1: Climatological fields measured for comparison to observational datasets. Yearly means over all available data are taken in the land regions defined in Giorgi and Francisco (2000), plus four zonally averaged regions - northern and southern hemispheres and northern and southern extratropics. * ERBE data is used in regions where it is available; missing data is taken from the respective reanalysis dataset.

The RV × N climate state anomaly matrix is subjected to an EOF analysis, creating the set of climatology EOFs with principal components P^{C}_{ik}. The set is truncated to the leading K modes, within which 95% of the variance in the ensemble is included.

We now follow Piani et al. (2005) by performing a linear regression to relate the principal components of the climatology EOFs, P^{C}_{ik}, to the principal components of each feedback EOF, P^{F}_{ij}:

P^{F}_{ij} = \sum_{k=1}^{K} P^{C}_{ik} \gamma_{kj} + \text{noise}, \qquad (4.10)

where \gamma_{kj} is the regression coefficient. By multiplying the coefficients, \gamma_{kj}, by the climatology EOFs, E^{C}_{kr}, we can now produce a single predictor, E^{C}_{jr}, which links the climatology to each feedback EOF, showing which observations scale with feedback EOF j:

E^{C}_{jr} = \sum_{k=1}^{K} \gamma_{kj} E^{C}_{kr}, \qquad (4.11)

where r is one of the R regions. The scalar product of E^{C}_{jr} with the standardised observations (treated identically to ensemble members) gives the likely real-world value of the feedback principal component j:

P^{F}_{(\text{obs})j} = \sum_{r=1}^{R} E^{C}_{jr} o_r, \qquad (4.12)

where o_r are the standardised observations in region r and P^{F}_{(\text{obs})j} is the predicted ‘real-world’ principal component of the jth feedback EOF.
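The full chain from control climatology to a predicted real-world feedback amplitude (equations 4.10 to 4.12) can be expressed as one least-squares fit followed by two matrix products; a sketch with hypothetical array names:

    import numpy as np

    def project_observations(P_C, P_F, E_C, obs):
        """Predict real-world feedback amplitudes (eqs 4.10-4.12).

        P_C : (N, K) climatology principal components
        P_F : (N, J) feedback principal components
        E_C : (K, R) truncated climatology EOFs over the R regions
        obs : (R,)   standardised observations, treated as an
                     additional ensemble member
        """
        gamma = np.linalg.lstsq(P_C, P_F, rcond=None)[0]  # eq. 4.10: (K, J)
        predictors = gamma.T @ E_C    # eq. 4.11: one pattern per feedback EOF
        return predictors @ obs       # eq. 4.12: predicted amplitudes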

Uncertainty in the projection of P^{F}_{(\text{obs})j} is considered from two sources: the projection process itself and natural variability in the source climatology (uncertainty due to the observational bias itself is not considered). We estimate error in the projection process by measuring the mean error in the projection of each feedback EOF within the ensemble:

\sigma_{\text{proj}}(P^{F}_{j})^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( P^{F}_{ij} - \sum_{k=1}^{K} P^{C}_{ik} \gamma_{kj} \right)^2. \qquad (4.13)

The uncertainty in P^{F}_{(\text{obs})j} due to natural variability can be estimated from the 500 year HadCM3 control simulation. We predict the likely value for feedback EOF P^{F}_{j} using equation 4.12 in each of 64 periods (where each period is a 15 year mean). The variance of this projection, \sigma_{\text{nat}}(P^{F}_{j})^2, is an indication of natural variability (as estimated by HadCM3). In practice, \sigma_{\text{nat}} \ll \sigma_{\text{proj}}.

4.3 Results

4.3.1 Ensemble Mean Response

A geographical plot of the ensemble mean feedback response to warming is shown in Figure 4.2. The plot shows the average change in net downward radiative flux per unit surface warming; thus a more positive response corresponds to a more sensitive system (in surface temperature). The ensemble mean may hide a great deal of variety, but serves as a reference point for the rest of the analysis.

Figure 4.2: A plot of the ensemble-mean feedback response to warming on an equal area map, shown for each area used in the analysis [panels: d(LWcld)/dT, d(LWcs)/dT, d(SWcs)/dT and d(SWcld)/dT, in Wm−2K−1]. Blue represents negative feedbacks, where net downward radiation at the top of atmosphere decreases as the surface temperature rises. Positive feedbacks are shown in red, where the net downward flux increases with rising temperatures. Non-overlapping regions (primarily ocean) are linearly calculated over each latitude band.

The fundamental feedback responses which one expects in the climate system are visible in this plot (these are described at length in Section 1.2). The longwave clearsky response to warming is a combination of the surface response described by the Planck function, with a reduction due to water vapour feedback. Hence the clear-sky longwave flux increases almost everywhere on warming, creating a negative feedback. The exception is tropical ocean regions, where we infer that the large surface heat capacity together with large humidity increases create a weak positive clear-sky feedback. The ensemble mean shows little longwave CRF response to the rising surface temperatures.

In the shortwave cloudy component, the sign of response differs between tropical and high latitude regions. Over the Tropics, there is a reduction in sunlight reflected by clouds on warming, while over the extratropics, especially over northern hemisphere landmasses, we see an increase. Such an effect was observed in the GISS model by Tselioudis et al. (1998), as well as in observational studies: Dai et al. (1997) shows an observed increase in cloud cover over the former USSR, while Hahn et al. (1996) shows decreases in China, South America and Africa, as seen in Figure 4.2.

In higher latitudes, the increase in cloud cover is generally attributed to increases in vertical cloud extent and cloud water with increased relative humidity. On the other hand, in warmer latitudes, an increase in precipitation efficiency (Lau and Wu, 2003) and cloud-top entrainment act to decrease the cloud water content and cloud extent on warming.

Although no information on cloud height is output in the regional data, the lack of any compensating increase in outgoing longwave radiation suggests a decrease in low-level tropical cloud. Such an effect was suggested for regions of subsidence by Bajuk and Leovy (1998), but Bony and Dufresne (2005) highlighted the inconsistency among climate models in the sensitivity of marine boundary layer cloud to warming.

The shortwave clear-sky component is positive over landmasses due to the retreat of snow and ice covered areas on warming, increasing the net shortwave radiation absorbed at the surface. This effect is most dominant over the Northern Hemisphere landmasses.

Webb et al. (2006) found that the variation in low cloud response to surface warming was the most critical for determining global feedback strength. Other groups have evaluated the control model response in HadSM3 (although the climateprediction.net mean response is not necessarily identical to the control model). Williams et al. (2003) evaluated local tropical cloud response to feedbacks by categorising the response by both change in sea surface temperature and change in vertical velocity; this process was repeated in HadSM3, HadSM4 and ERA-40 reanalysis data coupled with the ISCCP simulator (Klein and Jakob, 1999). This method allowed cloud feedbacks to be separated by process. The group found (in the HadSM4 model, where the most detailed study was made) that the largest responses in cloud cover upon a doubling of carbon dioxide were in high thick and medium cloud cover, where the response was strongly related to changes in circulation (rather than local SST changes). In contrast, the other dominant change was in low, medium thickness cloud, which was strongly coupled to the change in local SST. In terms of forcing response, this meant that the tropical longwave cloud forcing response to warming was almost entirely determined by circulation (greater ascent resulted in more high level clouds and more positive forcing). Shortwave forcing was due to low and high level clouds and was therefore a function of both circulation and SST (greater ascent and increased SST both resulted in increased shortwave forcing due to increases in tropical cloud cover).

4.3.2 EOF Analysis

The process of taking the matrix of regional radiative responses for all ensemble members and performing an EOF analysis is explained in section 4.2.2. The spacing of eigenvalues indicates whether modes may be considered independent and non-degenerate (see equation 4.4). Figure 4.3 shows that the two leading modes are well separated and non-degenerate. Interpreting degenerate modes is more troublesome, as sampling noise can cause various linear combinations to be extracted (see North et al. (1982)). Thus, for the purposes of this work, we concentrate on the two dominant modes.

Figure 4.3: A plot of the eigenvalues of the feedback EOFs resulting from the principal component analysis of feedback patterns in the climateprediction.net ensemble. Central lines indicate eigenvalues; the sampling error bars are calculated by equation 4.4.

To determine the proportion of the total variance in the global feedback parameter described by these two modes, we use equation 4.7. The first mode accounts for just over sixty percent of the variance in the global feedback parameter, while the second accounts for twenty percent. Throughout the rest of the work, we adopt the sign convention that feedback EOFs scale inversely with λ - thus a positive value of a feedback EOF in a given model results in a positive feedback, giving a higher value of S than the ensemble mean.

To understand the implications and physical significance of the first mode, we examine how its amplitude relates to the original model parameter perturbations. Using the linear model as described in equation 4.8, we can determine which model parameters are dominant in determining the amplitude of this EOF in the ensemble.

The normalised regression coefficients, shown in Figure 4.4(b), show that the amplitude of this mode is to a large degree determined by a single parameter perturbation, that of the ‘entrainment coefficient’, accounting for 85% of the variance. The correlation is strongly negative, implying that a reduction in the value of the coefficient leads to a positive amplitude of EOF 1 and a more sensitive model.

The function of the entrainment coefficient is described fully in Chapter 3. It is a parameter in the model convection scheme (Gregory and Rowntree, 1990). The model simulates a statistical ensemble of plumes inside each convectively unstable grid cell. On each model layer, a proportion of rising air is allowed to mix with surrounding environmental air and vice-versa, representing the process of turbulent entrainment of air into convection and detrainment of air out of convection. The rate at which these processes occur in the model is proportional to the entrainment coefficient. Hence, in models with a reduced value of this coefficient, buoyant columns are less diluted as they rise. This in turn preserves the buoyancy of the convecting air in the column, and allows it to reach higher altitudes.

The implications of perturbing the entrainment coefficient are shown by examining the results of a “single-parameter” run, where the only parameter change made from the base model is to reduce the entrainment coefficient. Single-perturbation experiments show that the “low” perturbation produces a significantly altered tropical relative humidity distribution (Fig. 4.5); the middle troposphere is drier, while the upper troposphere / lower stratosphere (UTLS) region is more moist.

Figure 4.4: (a) Geographical plot of feedback EOF 1 showing patterns of regional radiative feedback anomalies in the ensemble. Sign convention is such that surface regions in red receive more radiation (than the ensemble mean) on warming, causing positive feedback. (b) A graphical representation of the regression parameters from a multilinear regression, where the predictors are normalised model parameters and the predictand is the amplitude of feedback EOF 1. Thus, parameters with a positive coefficient are positively correlated with feedback EOF 1 (sum of squared coefficients: 0.8652).

The mode itself can be represented graphically, as perturbations on the ensemble mean response shown in Figure 4.2. We present EOF 1 in Figure 4.4(a); it shows a perturbation to the longwave clear sky response in most land regions such that less LW radiation is able to escape on warming, with some longwave cloud radiative forcing response in a similar pattern.

We see a compensating increase in the tropical shortwave CRF response - strongest in regions of mean ascent (e.g. Amazon Basin, Southeast Asia). Meanwhile, over Northern Hemisphere landmasses, there is a reduction in shortwave forcing on warming. Section 4.3.4 shows how the degree of compensation is altered by low cloud formation parameters in the perturbed physics ensemble.

The longwave CRF response to warming is only weakly anti-correlated to the shortwave CRF response, suggesting that different processes may be responsible. The general global trend shows a positive feedback, implying more longwave cloud forcing on warming. However, exceptions occur in Western Africa, Northern Europe and North America, where longwave CRF decreases on warming.

The 1xCO2 climate (relative to the climateprediction.net ensemble mean) associated with this feedback is shown by examining the climate state vector, E^{C}_{1r} (equation 4.11). Like the EOFs themselves, this vector may be plotted geographically to show regions where observational fields in the control simulation are positively or negatively correlated with the feedback EOF (see Figure 4.6(a)).

The land-based climatology associated with positive values of feedback EOF 1 shows models tend to be warmer over northern hemisphere landmasses and cooler in the tropics. The dominating feature of the climate is a substantially reduced low cloud cover (as implied by the shortwave component of the CRF). However, this reduction in low cloud has very little effect on the longwave energy budget.

The climatology shows an enhanced clear-sky greenhouse effect, which is attributable to the increased high level moisture shown in Figure 4.5. The regions showing an enhanced LWCS feedback in the feedback pattern correlate well with those regions showing an enhanced greenhouse effect in the control, with a strong effect over the Asian landmass. It is likely that this moisture allows the formation of more high cloud, which also has a large effect on the longwave energy budget. This is shown in Figure 4.6 by an increased cloudy greenhouse effect, especially visible in regions showing enhanced uplift (SE Asia, Australasia).

Figure 4.5: A difference plot of zonal mean relative humidity (in a pre-industrial control simulation) between a single-perturbation model with a low entrainment coefficient and an unperturbed control. Regions shown in blue have a higher relative humidity in the low entrainment simulation.

Figure 4.6: (a) A geographical plot of the climate state vector, taken from the 1xCO2 run (anomaly from ensemble mean), which best scales with the first feedback EOF. Regions shown in red (blue) show positive (negative) correlation with feedback EOF 1. ‘Clearsky inverse GH effect’ is the LWCS flux at the top of the atmosphere divided by the surface LW flux; ‘Cloudy inverse GH effect’ is the longwave cloud radiative forcing divided by the upward surface flux; ‘Reflected SW’ is the upward shortwave flux; ‘Seasonal Cycle (TAS)’ is the (JJA - DJF) surface temperature; ‘Vertical Velocity’ is the vertical pressure velocity on the 500mb pressure level (updraft negative); and ‘Latent Heat Flux’ is at the surface. Units are standard deviations of natural variability from the climateprediction.net ensemble mean. (b) As in (a), but for feedback EOF 2.

The second EOF is shown in Figure 4.7(a). In contrast to the first pattern, the effect of this EOF is almost entirely restricted to changes in cloud response. Also, whereas the first EOF affected mainly the tropics, the second shows a consistent response over all landmasses, with little to no effect over the oceans. A positive value of the pattern suggests a model which produces more cloud on warming. This is shown in Figure 4.7(a) by the increased cloud-based reduction of outgoing longwave radiation, coupled with an increase in reflected sunlight on warming. This is dominantly a land-based effect, which is notably strongest in the northern hemisphere.

The results of a regression onto the model parameters are shown in Figure 4.7(b), which reveals that a single parameter, the ‘ice fall speed’, dominates the amplitude of the second feedback EOF in the ensemble, accounting for over 70% of the variance. The ice fall speed is a microphysical parameter in the model cloud scheme (based on Smith (1990) and updated by Gregory and Morris (1996)). The parameter scales the speed at which ice particles may fall in clouds. A larger ice fall speed leads to larger particle sizes and increased precipitation. The climatological impact of changing the ice particle size and fall speed parameterisation in radiative convective models has been studied by Wu (2001) and Grabowski (2000). Both these papers found that a model with a low ice fall speed would produce a warm, cloudy, moist lower troposphere with less precipitation.

Figure 4.6(b) shows the climatology EOF which scales with feedback EOF 2 (see section 4.2). The figure shows that the feedback amplitude is correlated with an increased greenhouse effect due to both clouds and water vapour, compensated by increased shortwave cloud forcing. This is consistent with a globally cloudier, more moist world with lower precipitation than the ensemble mean, as suggested by Wu (2001) and Grabowski (2000). There is also a large decrease in latent heat flux at the surface, which we infer follows on from the reduction in surface insolation.

In summary, a positive value for the second feedback EOF produces more layer cloud and humidity aloft, which both produce enhanced longwave forcing on warming. The increased layer cloud also causes a compensating shortwave forcing. All of these effects occur globally over land.

Figure 4.7: (a,b) As Figure 4.4, for feedback EOF 2 (sum of squared coefficients: 0.55987).

4.3.3 Observational Projections

Section 4.2.4 describes how the amplitude of each feedback may be estimated using data from observational and reanalysis datasets. The reanalysis datasets used are ERA-40 and NCEP, but we have also created two extra hybrid datasets which combine ERA-40 / NCEP climatology with ERBE estimated top of atmosphere radiative fluxes.

In each case, we predict the amplitudes of feedback EOFs 1 and 2 following the methodology described in section 4.2.4, assuming the observations can be treated as a member of the climateprediction.net ensemble. The results are shown in Figure 4.8, which shows climate sensitivity (S) as a function of feedback EOFs 1 and 2. The projections of reanalysis data onto this space are shown in the same plot. A smoothed function, allowing up to a fourth order polynomial relationship, is shown to illustrate the best-fit relationship between the two feedbacks and climate sensitivity.

The ensemble is divided into two clusters, separated in the feedback EOF 1 dimension. The larger left-hand cluster represents the models with the entrainment coefficient set to a high or standard setting, while the smaller cluster represents the low entrainment simulations. We can see from Figure 4.8 that the observations partly project onto a region between the two clusters that has not been sampled in the ensemble.

The two modes account for 70% of the ensemble variance in λ, with all of the high S simulations (that is, S > 7K) lying in the quadrant where both feedback EOFs are positive. However, not all simulations in this quadrant have S > 7K (see section 4.3.4).

The projections from the ERA-40 and NCEP reanalyses both fall within the larger cluster of the ensemble, consistent with the standard value of the entrainment coefficient. The hybrid datasets using ERBE fluxes (in available regions) are both shifted towards the low entrainment cluster.

Figure 4.8: A plot of the amplitude of rotated feedback EOF 2 as a function of feedback EOF 1. Each point represents a model in the climateprediction.net ensemble, and is coloured according to its value of S. The ellipses represent projections of reanalysis data onto the space defined by the two EOFs; their size represents the uncertainty in the projection process from control climate state to feedback amplitude (which is the significant source of error). The yellow ellipse is a projection of the unperturbed HadAM3 model. The contour lines are gradients of a smoothed function of climate sensitivity in the space. The dashed rectangle represents the region sampled for EOF S1.

Examining the projection coefficients (E^{C}_{jr}, not shown) shows that the discrepancy between the ERBE and reanalysis projections is largely due to differences in shortwave top of atmosphere fluxes. Weare (1997) noted that the reflected shortwave radiation from the NCEP reanalysis was generally some 20-30 Wm−2 greater than in ERBE; this was attributed to overly reflective clouds in the NCEP model. Allan et al. (2001) found that shortwave flux in ERA-40 also tends to exceed ERBE by approximately 20 Wm−2 in the tropics. They found this was again mainly due to overly reflective clouds, coupled with some excessive low cloudiness in free-ocean subsidence regions.

An examination of the climatology EOF associated with feedback 1 (Figure 4.6(a)) shows that the dominant feature in the predictor for feedback EOF 1 is a decreased reflected shortwave flux. Hence, the lower shortwave flux estimated by ERBE produces an increased predicted value for feedback EOF 1. Reflected flux is a complex function of cloud type and distribution, and the available data for climateprediction.net is limited. Values of the entrainment coefficient slightly below that used in HadAM3 therefore cannot be excluded on the basis of this analysis, but the cluster associated with the ‘low’ value used in climateprediction.net is inconsistent with the projections (see Figure 4.10).

4.3.4 Sub-ensemble analysis

As noted before, Figure 4.8 shows that all the simulations of very high sensitivity (S > 7K) lie in the quadrant with positive values of feedback EOFs 1 and 2 - with low ice fall speed and low entrainment. However, not all simulations in this quadrant are sensitive. To investigate the variation of S in this region, we isolate a small portion of the ensemble by limiting the values of EOFs 1 and 2, as illustrated by the dashed rectangle in Figure 4.8. The choice of rectangle is essentially arbitrary, and could be applied anywhere within the ensemble. The rectangle used was chosen to include the highest sensitivities in the ensemble, and the area was kept small enough to ensure that variance in S within the rectangle was dominated by a feedback EOF uncorrelated with those already discussed.

The values of S in the region range from 3 to 12 K. Clearly, there are processes other than those represented by feedback EOFs 1 and 2 which account for this variation. By repeating the EOF analysis of feedback response in this subset (see section 4.2.2), we can extract a leading non-degenerate mode (hereafter EOF S1). In this region, EOF S1 accounts for 82% of the variance in λ.

The geographical pattern of feedbacks associated with feedback EOF S1 is shown in Figure 4.9(a). A positive value of the pattern shows an increase in reflected shortwave radiation on warming, with limited longwave compensation and little clearsky greenhouse effect. Hence, increasing the value of feedback EOF S1 decreases the negative shortwave feedback more than it increases the positive longwave feedback (suggesting a low cloud process where cloud temperatures are similar to the surface). Altering the degree of compensation between these effects produces the observed range of climate sensitivity in the rectangle shown in Figure 4.8.

We can again regress the amplitude of EOF S1 onto the model parameters, as described in section 4.2.3, to determine which parameters cause this modulation of SW feedback (we exclude the entrainment coefficient from the regression because it does not vary in the region considered). The results are shown in Figure 4.9(b). This time, we observe that no single parameter is responsible for the amplitude of the pattern. However, the most significant parameters are involved in determining the overall cloud cover (see Chapter 3):

• Critical Relative Humidity (+ve) - the grid-box-mean relative humidity at which cloud is first formed. An increase will reduce the cloud coverage for a given relative humidity by decreasing the width of the specific humidity distribution about the grid box mean (Smith et al., 1997).

• Accretion Constant (+ve) - An increase in the accretion constant allows more cloud droplets to be ‘swept away’ by falling rain, reducing the overall cloud coverage (Smith et al., 1998).

• Empirically Adjusted Cloud Fraction (-ve) - The empirically adjusted cloud fraction scales the cloud coverage for given atmospheric conditions; thus a decrease will produce a decrease in cloud cover. The value itself refers to the large-scale cloud coverage when the specific humidity in the grid-cell is equal to the saturation value, and defaults to 0.5 (Smith et al., 1997).

• Cloud water threshold for precipitation (-ve) - the minimum cloud water density for the onset of rain (the threshold is smoothed to be a continuous likelihood function). If decreased, clouds will precipitate more easily and cloud coverage will decrease. The model has different values over land and sea because fewer condensation nuclei are available over ocean regions, but these values are perturbed together (Table 3.2 and Smith et al. (1998)).

• Cloud Ice Type - the ensemble uses two sets of parameterisations for cloud ice, which result in altered expressions for the extinction depth, single scatter albedo and scattering direction (described as functions of cloud water content and effective ice particle size). The scheme is based upon Slingo (1989), where the parameterisations are fully defined (Ingram, 1990).

Figure 4.9: As Figure 4.4, for the first subset feedback EOF, S1. This part of the linear analysis examines only the subset of models shown in the dashed rectangle of Figure 4.8. The entrainment coefficient is excluded from the regression because it does not vary over the subset of models selected.

In section 4.3.2, we found that the low entrainment parameter setting associated with feedback EOF 1 produced large positive longwave feedbacks from high cloud and moisture buildup, partly compensated by shortwave forcing from low cloud buildup. Hence, by decreasing the low cloud coverage in these models, we decrease the amplitude of the shortwave compensation. Thus, the longwave effect due to increased high level cloud and water vapour aloft dominates, and the highest values of S are observed.

By repeating the analysis in the S1 region, we have overcome one of the limitations of a linear analysis: that parameters can interact non-linearly. This is shown to be the case in climateprediction.net, where the primary parameter, entcoef, modulates the impact of the secondary parameters listed above. In the main body of the ensemble (where entcoef is set to the high or medium values), the response is well described by the two leading order parameters, entcoef and VF1. However, this is certainly not true for the case where entcoef is set to the low value, where the secondary parameters discussed above become very significant in determining the climate sensitivity of the model.

4.4 Sensitivity Distributions

4.4.1 Frequency Distribution of Sensitivity

The prediction of the likely sensitivity of the real world is a complex question. Data from climateprediction.net has already been used to make probability distributions for S in Piani et al. (2005) and Knutti et al. (2005). Although we do not explicitly set out to define probabilities here, we can examine the distribution of sensitivity in the region of feedback space in which the observations lie.

A natural first course of action would be to repeat the process of analysing the sub-ensemble in the region of the observations, but we are hindered by the fact that the observations lie outside the sampled region of the space (as shown in Figure 4.8). Hence, any distribution of sensitivity we produce is limited to knowledge of the likely values of feedback EOFs 1 and 2.

In the first results paper from the climateprediction.net project, Stainforth et al. (2005) published a histogram of S for a set of the early models returned in the ensemble. In Figure 4.10, we update that figure with all of the simulations used for the current analysis. It is desirable to produce a histogram of the models which lie close to the observations; in order to produce a histogram of ensemble values for S, we therefore consider three scenarios. In the first case, we present a histogram of sensitivity for all models in the ensemble. The second restricts the predicted values of feedback EOFs 1 and 2 to those lying within any of the observational projections shown in Figure 4.10. The third case requires models to be consistent with all of the observational projections shown in Figure 4.10.

The histograms are presented in Figure 4.10(b). They are not probability density functions for the sensitivity of the real world, but rather distributions of perturbed ensemble members which are consistent with the observational projections of the identified feedback EOFs 1 and 2. The shape of the distributions is clearly dependent on the sampling strategy of the ensemble (see Frame et al. 2005). This property is particularly obvious in this case because the observations (especially those using ERBE data) project into a poorly sampled region of the ensemble. However, it is noticeable that the median and 95th percentiles of the distribution occur at a considerably lower value of sensitivity if we consider only the models with observationally consistent values for feedbacks 1 and 2 (Table 4.2). This provides some reassurance that constraining the likely value of these feedbacks within the ensemble provides a constraint on S itself.

Figure 4.10: (a) As for Figure 4.8, but colouring indicates the regions which are consistent with observations. Regions are inclusive, so that each represents a subgroup of its parent. (b) Histograms of the models’ emulated climate sensitivity in each of the three coloured regions defined in (a): all models, models in agreement with any observations, and models in agreement with all observations.

                                               5th P.tile   Median   95th P.tile
    All models                                     2.17      3.42       8.92
    Models consistent with any observation         2.68      3.94       6.24
    Models consistent with all observations        3.59      4.37       5.42

Table 4.2: Percentiles and median values for climate sensitivity in various subsets of the climateprediction.net ensemble (shown in Figure 4.10). All values for sensitivity are given in K.

4.4.2 Prediction of likely climate sensitivity

The distributions illustrated above, however, are histograms and not likelihood distributions for S. As explained in Frame et al. (2005), the prior distribution of models influences the shape of any resulting distribution - and in this case, the shape of the distribution is dependent on the parameter sampling strategy in the climateprediction.net ensemble.

Piani et al. (2005) tackled this issue by using natural variability as their prior, using orthogonal patterns of climatic variability as predictors for climate sensitivity. Their method is discussed in full in Chapter 2. If we follow the methodology of Piani et al. (2005), we can predict λ from each of the observational datasets shown in Figure 4.8. Figure 4.11 shows likelihood functions for λ given climatological data from each of the four datasets. These are created using the methodology of section 4.2.3, but instead of predicting the feedback principal components (P^{F}_{ij}), we predict the overall feedback parameter, λ.

The bounds on S resulting from the projections are listed in Table 4.3. The projections using ERBE fluxes are shifted to noticeably higher sensitivities, thanks to the higher projected value of feedback EOF 1, as shown in Figure 4.8 and discussed earlier. For comparison, Piani et al. (2005) found limits of 2.2 and 6.8 K, but included a crude measure of systematic model error by expressing HadCM3 variability about the climateprediction.net ensemble mean, which effectively added an arbitrary constant to the estimate of model variance related to the error estimate in the prediction. This process was not conducted for the analysis described here, because the estimate is essentially arbitrary, depending on the climateprediction.net mean, which is entirely dependent on the parameter sampling used in the ensemble. It is also recognised that these projections are linear estimates, which may not be able to capture processes such as those described in section 4.3.4, which only apply in some parts of the ensemble. Thus, these constraints are provided to illustrate the origins of differences in projections using different datasets, rather than to obtain the best possible constraint on climate sensitivity.

Figure 4.11: Likelihood distributions estimated for the real world value of the global sensitivity parameter, λ, using the method of section 4.2.3. The plot follows the convention and methodology of Piani et al. (2005); the horizontal axis represents the predicted value of λ for each climateprediction.net model, calculated by regressing against each model’s control climatology. The vertical axis shows the feedback parameter estimated from the simulation; thus the vertical spread of the distribution of models shows the total error in predicting λ from model climatology in the ensemble. The distributions on the lower axis show the projection of observational datasets onto λ, predicted in the same fashion. The width of the lower distribution is due only to interannual natural variability in the observational datasets. The vertical distributions add the full prediction error to each observational prediction to produce a final likelihood distribution for λ from each dataset. Black circles indicate the projected λ for the AMIP ensemble, where the values shown are the effective sensitivity (Murphy, 1995) taken from the equivalent coupled models as listed in the IPCC TAR (Ramaswamy et al., 2001).

                       5th P.tile   Median   95th P.tile
    NCEP                   2.45      3.31       5.10
    ERA40                  2.54      3.50       5.59
    NCEP / ERBE            2.92      4.23       7.75
    ERA40 / ERBE           2.87      4.13       7.32

Table 4.3: Percentiles and median values for climate sensitivity resulting from the projections shown in Figure 4.11. All values for sensitivity are given in K, and are obtained by inverting the global feedback parameter.

4.5 Conclusions

Variation in the response to greenhouse-gas forcing in the climateprediction.net ensemble of climate models is largely dominated by two global feedback mechanisms. The first is associated with the ‘entrainment coefficient’, a parameter in the model’s convection scheme. A reduction in this parameter results in a moistened upper troposphere in the Tropics, and introduces the potential for convective feedbacks with increasing surface temperature.

On warming, increased relative humidity aloft causes an exaggerated clearsky greenhouse effect. The feedback also shows a cloudy component, with a positive longwave cloud forcing in the regions of increased humidity. Both of these effects are to be expected from the enhanced clear sky and cloudy sky greenhouse effects shown in the control simulations of models with a low entrainment setting. Previous papers (Rodwell and Palmer, 2007) have suggested that the models with a low entrainment coefficient are unrealistic. Our projections also suggest that models exhibiting a very strong feedback of this type are unlikely given the observations. However, there is some discrepancy between predictions, which are largely dependent on the value of up-going shortwave flux in the observations; ERBE estimates of shortwave flux are noticeably lower than ERA-40 or NCEP, resulting in predictions of a stronger feedback and higher climate sensitivities when using ERBE data. Given this uncertainty, there remains a possibility that positive feedbacks of this type may be stronger than estimated using the unperturbed HadAM3 model.

The second independent feedback process is associated primarily with the ‘Ice Fall Speed’ parameter, a reduction of which leads to globally increased cloud coverage and humidity. Such models show a decreased longwave response to surface warming, again with some shortwave compensation due to globally increased cloud cover. Again, the observational projections suggest that models in the ensemble showing a very strong feedback of this type are unlikely.

For models where both of the feedback EOFs were strongly positive, there is still some variation in climate sensitivity. A second EOF analysis on this subset of ensemble models reveals that a set of ‘secondary’ cloud formation parameters controls the extent of shortwave compensation associated with both of the primary feedbacks. Large values of climate sensitivity could be achieved by combining low entrainment with parameters set to reduce low cloud formation. The observational projections do not fall within this region of potential very high climate sensitivity.

With this approach, we demonstrate how some of the non-linear aspects of the ensemble may be addressed with a purely linear analysis. The very large number of simulations available allows for a sequence of analyses: the primary EOF analysis establishes different regimes within the ensemble. Then, by repeating the analysis on smaller subsets of similar models, we have determined additional variation which may not scale across the entire ensemble and thus would not be discovered by a single EOF analysis.

The ‘real-world’ likelihood of these processes is a complex issue. In Section 4.3.3, we make the assumption that valid predictors from the ensemble may be applied directly to observations, and find that a very strong positive value is unlikely for either of the main feedback EOFs. In reality, the error in such predictions is likely to be larger than indicated because of systematic model error. This occurs because we do not know the validity of the predictors when applied to other models (or observations). Thus, we cannot categorically exclude the possibility of strong feedbacks of this type in the real world leading to very high climate sensitivities. Such issues could be tackled in further work by using multi-model perturbed ensembles, where predictors would be more independent of underlying model structure.

One issue highlighted by this analysis is the poor sampling strategy of the original climateprediction.net ensemble. It is clear from Figure 4.8, for example, that there is a gap in model behaviour between different values of the entrainment coefficient. It is unfortunate that this gap overlaps the region of the ensemble which was closest to the observations. This issue motivates the next chapter: finding a non-linear interpolation which is able to ‘fill in the gaps’ over the unsampled region of the ensemble, and determine parameter values which would be associated with the most likely models.

Chapter 5

Model Optimisation with Neural Networks

“I see how thine eye would emulate the diamond”
—William Shakespeare, The Merry Wives of Windsor (1602)

5.1 Introduction

In Chapter 2, the concept of irreducible error was introduced. This is the component of the model error which may not be reduced by further tuning of model parameters (in this case, ‘error’ is taken as some arbitrary normalised difference between models and observation). In this chapter, we explore the nature of the irreducible error in the HadAM3 model - how it is dependent on the number and type of observations used and how it is directly relevant to ensemble predictions of S.

The two predictions of climate sensitivity using climateprediction.net, Piani et al. (2005) and Knutti et al. (2006), used the ensemble to find correlations between S and observable quantities in the climate system. Once established, the predictors were applied to observations of the true climate, treating those observations as members of the ensemble. Such approaches make the implicit assumption that the ‘perfect model’ is a possible tuned state of the GCM, ignoring the irreducible error described in Rougier (2007). To ignore this component of the error means that some degree of extrapolation is required when applying predictors to observations, adding an unknown error to the results. Knutti et al. (2006) already noted that, because of the structural biases in HadAM3, the relation between observed and predicted quantities did not always hold in different GCMs.

To address this issue, we seek to identify the irreducible components of model error by minimising the model-observation discrepancy over a perturbed physics ensemble. To do this, we use the climateprediction.net ensemble to fit a surface representing key model output as a function of the model parameter values. This surface can be used to find those models with output closest to observations. By also predicting how climate sensitivity varies with model parameters, we can restrict consideration to models with a specific value of S, thus examining how the systematic error varies with equilibrium response. The relative systematic errors at different values of S provide some constraint on sensitivity.

The surface fitting procedure requires an emulator for the parameter dependence of model climatology. The method used in previous studies (Murphy et al., 2004) was to conduct a set of single-perturbation experiments, each producing a range of observable diagnostics and an estimate of equilibrium response. It was then assumed that observables for any combination of the individually perturbed parameter settings could be estimated by linear interpolation from the single-perturbation simulations (some allowance for non-linear parameter dependence was made by using some multiply-perturbed simulations). Thus they linearly predicted the equilibrium response of a large number of randomly generated parameter settings. A likelihood weighting was predicted for each simulation based on its predicted closeness to observations. The resulting Probability Density Function (PDF) for climate sensitivity was produced by generating a weighted histogram of equilibrium response for the simulated ensemble, each model weighted according to its predicted likelihood, as judged by its Climate Prediction Index (or CPI - the combined normalised root mean square error over a number of different mean climate variables).
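Schematically, the final weighting step amounts to something like the following sketch (assuming, purely for illustration, that the CPI is converted to a relative likelihood with a Gaussian weighting; the exact weighting used by Murphy et al. (2004) differs in detail):

    import numpy as np

    def weighted_sensitivity_pdf(S, cpi, bins=40):
        """Weighted histogram of predicted equilibrium response.

        S   : (M,) linearly predicted climate sensitivities for the
              randomly generated parameter settings
        cpi : (M,) Climate Prediction Index for each setting
        Models with low CPI (closest to observations) receive high weight.
        """
        weights = np.exp(-0.5 * cpi**2)
        pdf, edges = np.histogram(S, bins=bins, weights=weights, density=True)
        return pdf, edges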

The use of a linear model to predict the model response as a function of parameter values was tested by Stainforth et al. (2005). In that paper, using the results taken from the climateprediction.net ensemble of climate models, it was found that a linear prediction of S made by interpolating the results of single parameter simulations was a poor estimator of the true response of a multi-perturbation simulation. The ensemble response landscape was found to be a nonlinear function of model parameters.

Hence, in this chapter, we propose the use of an Artificial Neural Network (ANN) whose weightings may be trained to best relate the perturbed parameters of a climate model to both output diagnostics and equilibrium response. We choose to focus on the equilibrium response, S, rather than λ (as in Chapter 4), because it is S that we ultimately wish to know. Whereas in Chapter 4 we were limited by the natural linear relationship between λ and observations, the neural network approach used in this chapter is not restricted to linear relationships and so allows us to predict S directly. The application of non-linear neural network techniques to analyse climate model output is not new (Hsieh and Tang (1998), Knutti et al. (2006), among others), and the use of a neural network to directly emulate climatological model output from perturbed parameter settings has been suggested (Collins et al., 2006a).
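For orientation, the essential structure of such an emulator is a single hidden layer mapping normalised parameter settings to normalised outputs. The sketch below (plain numpy, with hypothetical shapes and training settings; the network actually used is described in section 5.2.2) illustrates the idea:

    import numpy as np

    def train_emulator(params, targets, hidden=20, lr=0.05, epochs=5000, seed=0):
        """Train a minimal one-hidden-layer feed-forward emulator.

        params  : (N, P) normalised model parameter settings
        targets : (N, D) normalised outputs (e.g. compacted climatology
                  principal components and equilibrium response)
        Returns a function mapping new parameter settings to outputs.
        """
        rng = np.random.default_rng(seed)
        P, D = params.shape[1], targets.shape[1]
        W1 = rng.normal(0.0, 0.1, (P, hidden)); b1 = np.zeros(hidden)
        W2 = rng.normal(0.0, 0.1, (hidden, D)); b2 = np.zeros(D)
        n = len(params)
        for _ in range(epochs):
            h = np.tanh(params @ W1 + b1)       # hidden layer activations
            err = (h @ W2 + b2) - targets       # output residuals
            gW2 = h.T @ err / n                 # gradients of the squared error
            dh = (err @ W2.T) * (1.0 - h**2)    # backpropagated to hidden layer
            gW1 = params.T @ dh / n
            W1 -= lr * gW1; b1 -= lr * dh.mean(0)
            W2 -= lr * gW2; b2 -= lr * err.mean(0)
        return lambda x: np.tanh(x @ W1 + b1) @ W2 + b2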

Rodwell and Palmer (2007) conducted an ensemble using initial value forecasting techniques, where the performance of a model was judged by the rate at which it diverged from an observational state. Their work suggested that some of the discrete parameter settings which lead to high sensitivity in the climateprediction.net ensemble may lead to unrealistic atmospheres which quickly diverge from observations. Hence, by using the ANN to emulate an ensemble based on a Monte-Carlo style parameter sampling scheme, we can explore the error as a continuous function of the model’s parameter space, and not just at the extreme values.

A statistical emulator has recently been developed in parallel to this work by Rougier and Sexton (2007). However, these authors use their emulator to determine the effect of different sampling regimes in parameter space: the distributions shown are made by taking different plausible distributions of models in parameter space and using the emulator to map these into distributions of global climate response. As we will discuss in this chapter, the work presented here uses a different approach, in which the ease of performing a large number of emulator simulations is exploited in order to produce distributions which are effectively independent of the parameter sampling strategy. By searching for the model with least discrepancy from observations at a given value of S, and increasing the ensemble size until the error distribution remains constant, we effectively make the result independent of sampling strategy, albeit by a ‘brute force’ methodology. The method is, of course, still somewhat restricted by the ensemble design: parameters not perturbed in the original ensemble cannot be accounted for, and the result is dependent upon the observational data used.

We divide the rest of this chapter into three sections. In Section 5.2, we discuss the methodology used for the analysis: Section 5.2.1 describes the techniques used to compress the climatological data, while Section 5.2.2 describes the neural network architecture and the training process used to optimise the non-linear fit.

In Section 5.3.1, we present the results of the emulator: its ability to predict a verification set of models, and how well it interpolates between known values. Section 5.3.2 discusses how the emulator may be used to produce a Monte-Carlo ensemble of simulations, allowing us to predict the most realistic model for a given equilibrium response. We compare the constraints imposed by various different observations, using both annual mean and seasonal data.

Finally, in Section 5.4 we discuss the parameter settings suggested by this optimisation process for different values of climate sensitivity. We analyse these parameters in the light of previous research and propose a more efficient method of sampling parameters for future ensembles of this type.

5.2 Methodology

5.2.1 Data Preparation

We seek a smooth fit to the simulated climatology and likely response to greenhouse gas forcing, both as functions of model parameters. The data required to train this emulator is taken from the first climateprediction.net ensemble of climate models: those experiments conducted with perturbed atmospheric parameters only. After filtering, an N = 6,096 member subset of models remains for use in this analysis.

As in Chapter 4, we use the regional mean areas defined in Giorgi and Francisco (2000) and listed in Table 3.1. In each region, we take a subset of atmospheric variables from the model's control simulation to represent the model climatology (Table 5.1). These are compared with climatological means from the NCEP reanalysis (for temperature and precipitation data) and from ERBE (for radiative data). These sources are henceforth referred to as observations (although it is recognised that reanalysis data is somewhat dependent on the model used, and the NCEP/ERBE data could easily be replaced by other reanalysis or observational datasets).

Table 5.1: Climatological fields measured for comparison to observational datasets. Winter (DJF) and summer (JJA) means over all available data are taken in the regions specified, along with standard deviations to represent interannual variability. All fields are sampled for seasonal means over a 15 year period in regions 1-21 (Table 3.1). *ERBE is used for radiative data where available, and is supplemented with NCEP data for latitudes greater than 67.5 degrees north and south.

Climate fields chosen for analysis
Climate Variable                        Dataset(s) used
Surface temperature (TAS)               NCEP
SW upward radiation at TOA              NCEP / ERBE*
LW upward radiation at TOA              NCEP / ERBE*
LW clearsky upward radiation at TOA     NCEP / ERBE*
SW clearsky upward radiation at TOA     NCEP / ERBE*
Total precipitation                     NCEP

We calculate empirical orthogonal functions (EOFs) of the control climatic states of the ensemble to determine dominant modes. In contrast to conventional EOFs, the temporal dimension is replaced by the ensemble itself, which provides a convenient orthogonal basis to compact the ensemble variance in the control climate. This compacted climate vector allows us to simplify the structure and increase the computational efficiency and reliability of the neural network by decreasing the number of required outputs. The resulting EOFs are spatial patterns, while their principal components are the expansion coefficients showing the amplitude of the EOF in each ensemble member.
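For concreteness, a minimal sketch of such an ensemble-dimension EOF analysis via singular value decomposition is given below (Python; all names are hypothetical, and truncation here uses a simple cumulative-variance criterion, whereas the criterion actually used is described alongside Fig. 5.1):

```python
import numpy as np

def ensemble_eofs(X, var_explained=0.95):
    """EOFs over the ensemble dimension of an (n_regions, n_models) matrix."""
    mean_state = X.mean(axis=1, keepdims=True)   # ensemble-mean climate
    anom = X - mean_state                        # anomalies from the mean state
    # SVD: columns of U are the EOF spatial patterns; the rows of
    # s * Vt are the expansion coefficients (one amplitude per model).
    U, s, Vt = np.linalg.svd(anom, full_matrices=False)
    frac = np.cumsum(s**2) / np.sum(s**2)        # cumulative variance fraction
    K = int(np.searchsorted(frac, var_explained)) + 1
    return U[:, :K], s[:K, None] * Vt[:K], mean_state

# stand-in data: 21 regions by 6,096 models
rng = np.random.default_rng(0)
eofs, pcs, mean_state = ensemble_eofs(rng.normal(size=(21, 6096)))
```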

The EOFs are taken over three types (s) of model output: surface temperature, radiative fluxes and precipitation. Thus the input matrices for the temperature and precipitation EOF analyses have R × N elements, where each element is the annual mean anomaly from the control mean for region r in model n, weighted by the area of region r. The input matrix for the EOF taken over the radiative fluxes is of size 4R × N, to include clear-sky and cloudy-sky fluxes in the shortwave and longwave bands.

Figure 5.1: Plots indicating the EOF truncation point for each of the three input fields (surface temperature, precipitation and radiative fluxes; the vertical axes show the proportion of variance unaccounted for as a function of EOF truncation point). The series of EOFs is truncated when the mean RMSE from observations over the ensemble can be estimated to within 5 percent. Truncation points for the temperature, precipitation and radiation fields are 6, 3 and 3 modes respectively.

The resulting set of EOFs is truncated to the first K modes, once 95% of the ensemble variance has been accounted for (see Fig. 5.1). The truncation is conducted for computational efficiency only, as emulating a large number of outputs with the neural network is computationally expensive. Results are not highly sensitive to a further increase in truncation length.

Model error as compared to observations is calculated by comparing the EOF amplitudes in each simulation to the projection of those EOFs onto observational or reanalysis datasets for each observation type s. The projection is calculated by first removing the climateprediction.net mean state from the observational dataset, and then calculating the scalar product with each EOF. Each model's error, E_{is}, is then calculated by taking the root mean square error across all truncated modes:

E_{is} = \sqrt{ \sum_{k=1}^{K} \left( w_{isk} - o_{sk} \right)^{2} }, \qquad (5.1)

where w_{isk} is the amplitude of mode k in model i for observation type s, and o_{sk} is the projection of mode k onto the observational dataset.

To combine the different observational errors, they must first be normalised. This may be achieved by using some estimate of natural variability for the observation in question (Piani et al., 2005). Thus for each observation type s, the error is normalised by the variance of the projection of the leading EOF, e_{s1}, onto a 500 year control simulation of HadCM3. The errors are then dimensionless, and the root mean square combination of these gives a total error for each model. No further weighting over observation type is applied.
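As a sketch of Equation 5.1 and the normalisation just described (Python; names are hypothetical, and `sigma_nat` stands in for the natural-variability estimate discussed above):

```python
import numpy as np

def model_errors(pcs, eofs, obs, mean_state, sigma_nat):
    """Equation 5.1: RMS difference over the truncated modes, then
    made dimensionless using an estimate of natural variability."""
    # project observations onto each EOF after removing the cpdn mean state
    o = eofs.T @ (obs - mean_state)                      # o_sk for k = 1..K
    E = np.sqrt(((pcs - o[:, None]) ** 2).sum(axis=0))   # one E_is per model
    return E / sigma_nat

def total_error(errors_by_type):
    """Unweighted RMS combination over the observation types s."""
    E = np.stack(errors_by_type)                         # (n_types, n_models)
    return np.sqrt((E ** 2).mean(axis=0))
```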

5.2.1.1 Use of seasonal data

The original analysis of climateprediction.net data shown in Stainforth et al. (2005) found virtually no observational constraint on S; models with climate sensitivities of up to 11K were shown to perform comparably in a simple test of root mean square error measured from the surface observations used. Annan (2006) suggested that the relatively high performance of these models may be due to the omission of seasonal information when comparing each model to observations.

To address this issue, we conduct two additional experiments. The first replaces the annual mean values with a JJA-DJF difference in each observational field to construct the input EOFs. A second analysis includes both JJA and DJF seasonal means as separate dimensions of the input vector, allowing the seasonal cycle of each region to influence the resulting EOFs. The EOFs using seasonal information are treated identically to the analysis of the annual mean data shown above.

5.2.2 Neural Network Architecture

We employ an artificial neural network (ANN) to emulate the climate model output. The theory of neural network architecture and training is described in Appendix B.

The network employed is a two-layer, feed-forward ANN, illustrated in Fig. B.1.

The elements of the input vector, p_{il}, consist of the independent perturbed parameter set associated with each model i. The parameters are listed in Table 3.2. Some parameters are always perturbed together in climateprediction.net; in these cases, only one of the parameters is provided to the neural network, because there is no additional information in providing the second. Where parameters are defined on model levels, the average value of the parameter over all model levels is used. The result is a vector of 10 elements which defines the parameter set for any given model in the ensemble.

The output vector is the quantity we wish to predict. In the first instance, this may be a single value: the model’s climate sensitivity, Si. However, later we extend the analysis to predict the set of EOF amplitudes, wik defined earlier, which define the model’s climatology.

For the ANN to best approximate a relationship between these quantities, it is separated into layers. The input to the network is the set of model parameters, p_{il}, which are scaled and offset before being passed to the first 'hidden' layer. The weights and biases are set iteratively during the training process. The hidden layer is a set of non-linear functions, arbitrary in principle (in this case, a function closely approximating a hyperbolic tangent is used (Vogl et al., 1988)). The output of the hidden layer is again weighted and biased to produce the elements of the output vector.
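The forward pass of such a network is compact. The following is a minimal Python sketch (hypothetical weights; the actual transfer function approximates tanh following Vogl et al. (1988), and the inputs are assumed to be already scaled and offset):

```python
import numpy as np

def emulator_forward(p, W1, b1, W2, b2):
    """Forward pass of a two-layer feed-forward network: a tanh hidden
    layer followed by a linear output layer."""
    h = np.tanh(W1 @ p + b1)       # hidden layer (6 neurons in this chapter)
    return W2 @ h + b2             # output: S, or the K EOF amplitudes

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(6, 10)), np.zeros(6)   # 10 inputs -> 6 neurons
W2, b2 = rng.normal(size=(1, 6)), np.zeros(1)    # 6 neurons -> 1 output (S)
S_hat = emulator_forward(rng.uniform(size=10), W1, b1, W2, b2)
```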

To train the network to emulate the output of the model ensemble, we employ the Levenberg-Marquardt learning algorithm (Hagan and Menhaj, 1994). This back-propagation algorithm is a gradient descent optimisation in which the algorithm is provided with a set of examples of 'proper' network behaviour. In this case, the training set is provided by 60% of the available models in the ensemble, totalling about 4,000 examples. Using a larger number of models in the training set did not noticeably improve accuracy.

The ideal number of neurons in the hidden layer should ensure accuracy while avoiding over-fitting. Fig. 5.2(a) shows the mean fitting error of the network in predicting an unseen 'verification' set of models as a function of the number of neurons. This plot suggests little increase in accuracy beyond 6 neurons.

Fig. 5.2(b) shows the effects of over-fitting to the input data. Here we take a sample of random parameter combinations and perturb them slightly, examining the impact on the predicted sensitivity (thereby estimating the steepness of the response surface). For fewer than 8 neurons, this results in only a slight mean perturbation to the predicted sensitivities. However, the tests conducted with 9 and 10 neurons show large discrepancies between the original and perturbed simulations, indicating an over-fitted network with large gradients in response. Thus, a conservative 6 neurons are used in the hidden layer.
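The smoothness test behind Fig. 5.2(b) can be sketched as follows (Python; `emulate_S` is a stand-in for the trained network, and the 90%/10% blending follows the description in the figure caption):

```python
import numpy as np

rng = np.random.default_rng(0)
emulate_S = lambda p: float(np.tanh(p).sum())   # stand-in for the trained ANN

# Blend each random parameter vector with an independent second draw
# (90% / 10% weighting) and record the change in predicted sensitivity.
P1 = rng.uniform(size=(1000, 10))
P2 = rng.uniform(size=(1000, 10))
P_pert = 0.9 * P1 + 0.1 * P2
dS = np.abs([emulate_S(a) - emulate_S(b) for a, b in zip(P1, P_pert)])
print(dS.mean())   # a large mean response indicates an over-fitted network
```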

A similar experiment is conducted for the prediction of w_{ik}: to measure the prediction ability of the network, we predict the K principal components for each member of the verification set. The prediction error is the root mean square difference between the neural network estimate and the actual value in the verification set. We measure the smoothness of the response surface by taking the root mean square response to a small parameter perturbation, as before. Again, 6 neurons is found to be appropriate for predicting w_{ik}.

The cost function used in the iterative training procedure measures network performance as a combination of the mean squared prediction error (85%) and the mean squared weight and bias values (15%). This prevents any single neuron from being weighted too highly, which was found to further help prevent the network from over-fitting.
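As a sketch (hypothetical names; the 85/15 split is the one quoted above), the regularised cost might be written:

```python
import numpy as np

def training_cost(pred, target, params, mse_frac=0.85):
    """Regularised cost: 85% mean squared prediction error plus 15% mean
    squared weight/bias magnitude, discouraging any single neuron from
    acquiring a very large weight."""
    mse = np.mean((pred - target) ** 2)
    msw = np.mean(np.concatenate([w.ravel() for w in params]) ** 2)
    return mse_frac * mse + (1.0 - mse_frac) * msw
```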

Once the network has been trained and verified, we perform a Monte-Carlo parameter perturbation experiment, emulating an ensemble many orders of magnitude larger than the original climateprediction.net dataset. The ensemble densely samples the emulated parameter space, allowing a search for the models with the smallest observational discrepancies in different (0.1K wide) bins of climate sensitivity.

Figure 5.2: (a) The mean error in predicting the climate sensitivity of the verification set, as a function of the number of neurons in the hidden layer. (b) The mean absolute difference in predicted sensitivity between 1000 random emulations and a second set of slightly perturbed emulations (the perturbed emulations take a weighted average of input parameters, with a 90% weighting to the original values and a 10% perturbation in each input parameter taken from a second set of random simulations). (c) The root mean square error in the prediction of the 'All fields' climatology for the verification set, scaled by natural variability. (d) The root mean square response of the 'All fields' climatology to a small parameter perturbation (as in (b)).

The underlying function of minimised model-observation error as a function of sensitivity, E(S), is thus discretised into 0.1K bins of S. The Monte-Carlo ensemble is sufficiently densely populated that the following statements are true:

• E(S) is a smooth, continuous function.

• E(S) does not alter if the sampling density is further increased.

Note that the issues of prior sampling of climate sensitivity raised in Frame et al. (2005) are not relevant here, because we do not attempt to assign probabilities to different values of S. The sampling of S is simply used to outline the shape of the underlying function E(S).

5.3 Results

5.3.1 Verification

We first demonstrate the ability of the neural network to predict an unseen verification set within the ensemble itself. Fig. 5.3(a) illustrates the network's ability to predict S. Fig. 5.3(b) shows that the standard error in prediction increases with increasing sensitivity, an effect also noted in both Piani et al. (2005) and Knutti et al. (2006). This is explained by considering that observables tend to scale with λ, the inverse of S, making the uncertainty greater for large values of S (Roe and Baker, 2007).

The network must also be able to predict model climatology for previously unseen parameter combinations. Fig. 5.4 uses the verification set to demonstrate the network's ability to predict the total RMSE from observations for each of the different observation types. Note that this is not a test of the network's ability to interpolate between discrete parameter values.

5.3.1.1 Parameter Interpolation

The climateprediction.net ensemble uses a parameter sampling strategy that chooses one of a small number of possible values for each parameter. However, once trained, the ANN emulator may be used to interpolate between these values and map out the parameter space more completely. Given that we do not know the true behaviour of models in this unsampled parameter space, the ANN is designed such that there is a smooth transition between the model responses at known, discrete parameter values. A more robust test of the parameter interpolation is only possible by the simulation of more models at non-discrete parameter values, which is attempted in Section 5.5.

Figure 5.3: (a) The predicted sensitivities of the verification set of climateprediction.net models as a function of their actual sensitivities. (b) The prediction error in S as a function of S. Each point represents a member of the verification set in the climateprediction.net ensemble, and the width between the red lines represents the standard error in prediction at a given climate sensitivity.

This process is demonstrated by perturbing each of the P individual parameter settings in turn within the limits of the sampled climateprediction.net range, while keeping the other (P − 1) parameters at the standard HadAM3 value. Thus we can observe the emulator's ability to interpolate climatology and greenhouse gas response between the known discrete parameter settings. Section 5.2.2 described how the network design was chosen to minimise over-fitting to training data without sacrificing accuracy. The response functions shown in Fig. 5.5 give a cross-section of the fitted surface in each of the 10 parameter dimensions, in each case with the other 9 parameters held at the default HadAM3 value. In Chapter 4, we showed that the single perturbations with the most dominant influence on S are those of the entrainment coefficient and the ice fall speed (with other secondary parameters becoming important in the case of the entrainment coefficient being perturbed to its 'low' value).

Figure 5.4: (a)-(d) The vertical axes show the ANN-predicted combined model errors, as compared to annual mean observations, for the verification set of climateprediction.net models for each observation type (temperature, radiative flux, precipitation and all fields). The horizontal axes show the actual RMSE, before EOF truncation. The model errors are normalised by the variance of the projection of the first EOF onto a 500 year control simulation.

Figure 5.5: The dependence of S on individual perturbed parameters (one panel per parameter: albedo at melting point, accretion constant, precipitation threshold, albedo temperature range, entrainment coefficient, cloud ice type, ice size, ice fall speed, empirically adjusted cloud fraction and critical relative humidity). Parameter units are as defined in Table 3.2. Each parameter value is incremented within the bounds of the climateprediction.net limits, while keeping all other parameters at their default values. The predicted S is calculated and plotted as a function of each parameter. The shaded area represents the 1-sigma spread in an ensemble of 10 attempts to fit the data (each using a different training set within the GCM ensemble).
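The one-at-a-time sweep behind Fig. 5.5 can be sketched as follows (Python; the emulator is a stand-in, and parameters are assumed to have been rescaled to [0, 1]):

```python
import numpy as np

emulate_S = lambda p: float(3.0 + np.tanh(p - 0.5).sum())  # stand-in ANN

defaults = np.full(10, 0.5)                 # default HadAM3 settings (scaled)
lower, upper = np.zeros(10), np.ones(10)    # sampled cpdn limits (scaled)

curves = []
for j in range(10):                         # one cross-section per parameter
    grid = np.linspace(lower[j], upper[j], 21)
    S_curve = []
    for v in grid:
        p = defaults.copy()
        p[j] = v                            # perturb parameter j only;
        S_curve.append(emulate_S(p))        # the other 9 stay at default
    curves.append((grid, S_curve))
```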

5.3.2 Monte-Carlo Simulation

We emulate a much larger ensemble using a Monte-Carlo sampling scheme in which the value of each parameter is randomly selected between the maximum and minimum values in the GCM ensemble. Because the trained neural network is computationally inexpensive, we are able to emulate an ensemble many orders of magnitude larger than the original GCM ensemble.


Figure 5.6: An illustration of the requirement for an exponential sampling regime. The x axis is the sampling space, in which random values are selected between -1 and 1. The y axis is the Monte-Carlo parameter space for a single parameter in climateprediction.net; the three values shown represent the lower limit, standard value and upper limit of uncertainty for that parameter (the upper and lower limits need not be symmetrical about the default value). The exponential transfer function means that the randomly selected parameter value is equally likely to be greater than or less than the default value.

We emulate a large (1 million member) Monte-Carlo style ensemble in which each parameter is ascribed a random value within the limits of the discrete parameter settings used in the climateprediction.net experiment. The parameters are generated randomly using an exponential probability distribution which ensures that model parameters are equally likely to be above or below the default value for HadAM3; this is illustrated in Figure 5.6. The sampling space is only interpolated between the values used in climateprediction.net, as the neural network cannot be verified as an extrapolation tool.
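The exact functional form of the transfer function is not given in the text, so the following sketch is an assumption: a piecewise geometric interpolation that maps a uniform draw on [-1, 1] so that values above and below the default are equally likely (it assumes strictly positive parameter limits):

```python
import numpy as np

def sample_parameter(default, lower, upper, rng):
    """One plausible form of the exponential transfer function in
    Fig. 5.6 (the exact form is an assumption)."""
    u = rng.uniform(-1.0, 1.0)                       # symmetric sampling space
    if u >= 0.0:
        return default * (upper / default) ** u      # interpolate upward
    return default * (lower / default) ** (-u)       # interpolate downward

rng = np.random.default_rng(0)
draws = [sample_parameter(3.0, 0.6, 9.0, rng) for _ in range(100_000)]
# roughly half of the draws lie above 3.0 and half below
```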

These models make up the emulated ANN ensemble. For each model we use the trained neural network to predict its climate sensitivity and the amplitudes of the truncated EOF set used to represent the control climatology. Once we obtain an estimate of the truncated EOF amplitudes for each emulated model, we can use Equation 5.1 to calculate a prediction of that simulation's error as compared to the observations.

As described in Section 5.2.2, we then divide the ensemble into 0.1K bins of S and determine the best performing models in each bin (that is, those with the lowest E_{is}). Fig. 5.7 shows the best models in each bin of sensitivity as simulated by the original GCM ensemble, plus the best models emulated in the ANN Monte-Carlo ensemble. By using different observation types, we may compare the ability of the different observations to constrain the value of S within the ANN ensemble.
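A minimal sketch of this binning search (Python, with stand-in data in place of the emulated ensemble):

```python
import numpy as np

def best_model_per_bin(S, E, width=0.1):
    """Map each 0.1K sensitivity bin to the index of its lowest-error model."""
    bins = np.floor(S / width).astype(int)
    best = {}
    for i, b in enumerate(bins):
        if b not in best or E[i] < E[best[b]]:
            best[b] = i
    return best

rng = np.random.default_rng(0)
S = rng.uniform(1.0, 11.0, size=1_000_000)   # emulated sensitivities
E = rng.uniform(0.0, 50.0, size=1_000_000)   # emulated total errors
best = best_model_per_bin(S, E)              # traces out E(S) bin by bin
```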

We measure model performance by a selection of different criteria: first using annual regional mean surface temperatures, and then again using the JJA-DJF seasonal differences. This process is then repeated for total precipitation and for the top of atmosphere radiative flux balance (an expanded vector with elements for shortwave and longwave, clear-sky and cloudy-sky fluxes).

Each EOF must be scaled by an estimate of its natural variability. The control climates in the ensemble are means of a 15 year period; hence we estimate natural variability by projecting each EOF onto 33 separate 15 year periods in a 500 year HadCM3 control simulation, and taking the standard deviation of the projection coefficients. The principal components of each EOF in the perturbed ensemble may then be scaled using this value.
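A sketch of this estimate, under the assumption that the control run is supplied as regional annual means (Python, hypothetical names):

```python
import numpy as np

def natural_variability(eof, control, period=15):
    """Project an EOF onto consecutive 15 year means of a long control
    run and return the spread of the projection coefficients."""
    R, n_years = control.shape
    n_seg = n_years // period                    # 33 segments from 500 years
    seg = control[:, :n_seg * period].reshape(R, n_seg, period)
    coeffs = eof @ seg.mean(axis=2)              # one coefficient per segment
    return coeffs.std()

rng = np.random.default_rng(0)
control = rng.normal(size=(21, 500))             # stand-in HadCM3 control run
sigma = natural_variability(np.ones(21) / np.sqrt(21.0), control)
```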

For each constraint, we also include a selection of models from the AMIP model comparison project (which is best suited for comparison with the atmospheric models with observed ocean temperatures used in the ensemble). Each model is treated identically to the ensemble members: the anomaly is taken from the climateprediction.net mean, onto which the regional EOFs are projected for each observation type. AMIP models are not processed by the neural network and are shown for comparison only.

Double-CO2 equilibrium experiments were not conducted for the AMIP ensemble, so the corresponding CMIP sensitivities are shown for each model. These sensitivities are provided for comparison only, though the results of Meehl et al. (2003) suggest that in several AOGCMs, the atmospheric model is dominant in determining the equilibrium response.

Figure 5.7: (a)-(l) The constraints on climate sensitivity imposed by different observational fields (columns: temperature, flux, precipitation and all fields). Figures on the top row use annual mean data only, the central row uses seasonal differences alone, while figures on the bottom row include information on both summer and winter absolute values. The vertical axes represent the RMSE between models and observations in the respective field, in units of natural variability. Models shown in green are the best performing simulations from the original climateprediction.net ensemble, while models shown in red are the best models from the million member Monte-Carlo neural network emulated ensemble. Circles represent the performance of models from the AMIP ensemble of atmospheric models (with values for S taken from the corresponding CMIP model). The filled blue circle shows the performance of the unperturbed HadAM3 model. The right-hand column shows the combined error taking into account all three observation types simultaneously.

The results show significantly different constraints imposed by the different observations. In general, the neural network emulated ensemble tends to produce a slightly smaller minimum model-observational error than the original climateprediction.net ensemble, because of the large number of emulated models. We consider each independent constraint in turn (at this stage, we cannot discuss probabilities; to do so requires further assumptions, see Chapter 6):

• Regional temperature fields alone show a very weak constraint on S: emulated models with S of up to 10K may have control surface temperatures which are consistent with observations. The perturbed models with S of less than 4K, however, are predicted to differ significantly from observations of annual mean surface temperature.

Using seasonal cycle data alone, the minimum in error occurs in models with a sensitivity between 3 and 4K. This is broadly consistent with the findings of Knutti et al. (2006), who found that the seasonal cycle in temperature constrained S to lie between 1.5 and 6.5K at the 5-95% confidence interval.

The inclusion of both annual and seasonal temperature data in the input vector provides no clearly defined minimum in error as a function of sensitivity; models with S between 4 and 10K perform comparably well. Thus the absolute values of the control surface temperatures of a model are a very poor predictor of its response to greenhouse gas forcing.

• Results using radiative fluxes show a clearly defined minimum in error at certain values of S, irrespective of the use of seasonal or annual data. The neural network is able to predict models which lie closer to observations than both the original climateprediction.net ensemble and the AMIP ensemble. The best performing models are predicted at S values of 3.8 to 4.5K for the annual data, and 3.9 to 4.2K for the seasonal inputs.

At high sensitivities, the original ensemble produces a small number of models which score better than the emulated ANN ensemble. We attribute this to the imposed smoothness in the neural network response, which may eliminate some of the opportunity for outliers. In addition, the original climateprediction.net measurements of S are subject to some degree of sampling noise, especially at higher sensitivities (Stainforth et al., 2005).

• Annual mean precipitation also shows a clearly defined minimum of model error, at values of S between 3.5 and 5K. The minimum error is greater (compared to the natural variability) than for the case using radiative flux data alone. In contrast, many of the ANN emulated models are able to reproduce the observed seasonal cycles in precipitation, with a minimum in error lying between S values of 2.5 and 5K.

An examination of the minimised observational discrepancies scaled by natural variability shows that the radiative flux constraint is the strongest of the three, irrespective of the use of annual mean or seasonal data. In the cases of surface temperature and precipitation, some models are able to match observed seasonal cycles, but annual mean values are not reproduced within the ensembles. These individual constraints using only seasonal data consistently show a minimum in error somewhere in the range 3-5K (although a probabilistic interpretation of this requires further assumptions; see Chapter 6).

Combining all observations together, weighting each of the observations equally, produces the 'All fields' plots. The plot using annual mean data only is comparable with Fig. 2(c) in Stainforth et al. (2005), and replicates the result that climateprediction.net models with S greater than 10K show RMSE comparable to some members of the AMIP ensemble. However, the emulated ANN ensemble shows a clear minimum in model error between 4 and 5K, a feature which is poorly defined in the original climateprediction.net ensemble. Clearly, the CMIP sensitivities shown represent the most likely values for S as evaluated by a number of modelling groups; hence this ensemble should not be expected to cover the full range of possible values for S (as found here).

It is notable that the inclusion of additional observations actually decreases the error of high sensitivity simulations relative to the most likely simulations. Even in the experiment using only seasonal data, where the separate constraints on S are consistent for the three observation types, the combined 'All fields' constraint is weaker than any of the separate constraints.

Although the ensemble contains models which are individually able to match the different observation types, this is achieved at the expense of making other fields less well simulated. Hence there is no parameter combination which allows all observations to be matched simultaneously. This more challenging requirement produces an irreducible error: the minimum error of the best tuned model, when summed over all the observations (Rougier, 2007).

Thus, as the number of observational fields is increased, the error of the models with the most likely value of S increases from negligible to some finite irreducible value ε. However, for 'less likely' values of S, where a single observation already produces an irreducible error, increasing the number of observational fields is unlikely to produce the same large relative increase in error. Hence, as the number of observational fields is increased, the constraint on S is likely to become weaker. This could be significant, as it initially seems to contradict the findings of Bayesian analyses such as Annan and Hargreaves (2006), and naturally opens the question of what happens as the number of observations tends to infinity.

The methodology employed here to provide constraints on S is significantly different from that of Piani et al. (2005) or Knutti et al. (2006). While each of those papers searched for predictors of S using all members of the ensemble, we have instead used information from only the most likely possible model for each value of S. A relation between some observable quantity and S may therefore be stronger or weaker when all simulations are considered than with the method used here, where only the best simulations for each value of S are used. In addition, in Knutti et al. (2006), regions where observations lay outside the entire ensemble were ignored; in the methodology presented here, such regions will influence the model's 'score'.

Finally, while we find that an increase in the number of observations used will actually weaken the overall constraint, we also find that (unsurprisingly) a perfect model may be impossible to achieve using only perturbations of parameters. Hence, any prediction trained within the ensemble and applied to the 'perfect' observations may be to some extent an extrapolation. Piani et al. (2005) approached this issue by taking the unperturbed base model error as a crude estimate of the systematic error in the prediction, but the treatment of such errors in the prediction of S from imperfect ensembles remains an unresolved issue. However, we propose that the method illustrated here provides a systematic means of finding the irreducible component of model-observation discrepancy, giving an upper limit for the systematic error which must be included when applying ensemble-trained predictors of an unknown quantity such as S.

5.4 Parameter Dependence

Using all observations simultaneously (the 'All fields' case), the most likely models for each sensitivity bin are shown in Fig. 5.7. By looking at the input parameters for these models, we can examine the parameter changes necessary (if the ANN interpolation is correct) to achieve the best models at different climate sensitivities. The results (shown in Fig. 5.8) predict the optimal parameter settings required to produce a model of a given sensitivity, while making each model as close to observations as possible.

Also shown in Fig. 5.8 is the spread of each parameter setting in the best 100 simulations (out of a typical 10,000) in each 0.1K bin of S. Parameters showing only a small amount of spread have a unique optimal configuration for minimised error at a given value of S. Those which also show a large variation over the range of S, while remaining well constrained at any given value of S, are deemed the most important parameters for determining model response (as emulated by the ANN).

The optimal values of two parameters, the 'entrainment coefficient' and the 'ice fall speed', show first order sensitivity to S. The findings of Chapter 4 and other investigations (Stainforth et al. (2005) and Knight et al. (2007)) suggest that the entrainment coefficient (or entcoef) is dominant in establishing the different relative humidity profiles which lead to strongly different responses to greenhouse gas forcing.

Entcoef fixes the rate at which environmental air is mixed into an ensemble of simulated rising convective plumes. A high value of entcoef results in a moist middle troposphere, with weak convective activity. A low value of entcoef increases the depth of convection, transporting moisture to higher levels in the tropics (Gregory and Rowntree, 1990).

Figure 5.8: The parameter settings for the best performing models emulated in each sensitivity bin in Figure 5.7(l) (one panel per perturbed parameter: albedo at melting point, accretion constant, precipitation threshold, albedo temperature range, entrainment coefficient, cloud ice type, ice size, ice fall speed, empirically adjusted cloud fraction and critical relative humidity). The black lines show the mean parameter value for the 100 best performing models in each 0.1K bin of climate sensitivity. The gray background shows the 10th-90th percentile range of the 100 best performing models. The dots show the parameter value for the best performing model in the bin (where one exists) in the original climateprediction.net ensemble. Parameter units are as defined in Table 3.2.

A close examination of Fig. 5.8 shows that at any given value of S, the value of entcoef is well constrained, showing no significant spread among the best performing 100 simulations. As S rises, the value of entcoef falls monotonically from its default, reaching its lower limit at high values of S. The majority of the variation, however, occurs at values of S less than 6K, indicating that other parameters are responsible for further increases in S.

Chapter 4 found that the reduction of entcoef caused an increase in clear-sky absorption of longwave radiation, as mid-tropospheric humidity was increased by strengthened convection, especially in the tropics.

The ice fall speed (or VF1) also shows little spread at any given value of S, and is likewise observed to decrease monotonically throughout the range of simulated S. A large value of this parameter allows the fast fallout of cloud ice. Smaller values of similar parameters in radiative-convective equilibrium models lead to increasingly moist, warm, convectively unstable atmospheric profiles (Wu (2001) and Grabowski (2000)).

Chapter 4 also found that a reduced ice fall speed increased longwave clear-sky and cloudy forcing by allowing the air to remain moister. A reduction in VF1 was found to increase low-level layer clouds, and this increased their positive longwave cloud feedback upon warming. The results here are consistent with those findings.

Extremes of S are achieved with additional secondary parameters:

• Low sensitivities (S < 3K) - Fig. 5.7 makes it clear that at very low sensitivities, even the best simulated atmospheres move rapidly away from the observations. An examination of Fig. 5.8 shows that two parameters in particular show large variation in this region: the empirically adjusted cloud fraction and the albedo temperature range.

The models with the lowest S show a very large value for the empirically adjusted cloud fraction (EACF). EACF is a modification to the cloud scheme of the model which adjusts the fractional cloud coverage to observations in relation to total and condensed water; a higher value produces a greater overall cloud fraction (Wood and Field, 2000). By setting this parameter to its maximum, the model cloud fraction is maximised. Meanwhile, the lowest sensitivity models also exhibit a high value for the temperature range of sea-ice albedo variation, which has the effect of increasing the effective albedo in ice covered regions.

Hence, it seems that the lowest sensitivities are achieved by maximally increas- ing albedo, maximising shortwave negative feedbacks upon warming. However, Fig. 5.7 suggests this approach rapidly leads to unrealistic atmospheres in all three observation types.

• High sensitivities (S > 5K) - The simulated models with the highest sensitivities all show entcoef and the ice fall speed set to low values. However, the best performing models with high S show two additional parameter perturbations: the critical relative humidity and, again, EACF.

The critical relative humidity (RHCrit) is the relative humidity at which cloud begins to form (Smith, 1990). It is the dominant parameter in determining the sensitivity of the simulated models with S greater than 5K.

In the low-entcoef simulations, Chapter 4 found that the strong positive longwave feedback produced by the increased humidity is partly offset by a negative feedback caused by the increased albedo of high-level cirrus clouds, which condense in the moist upper troposphere. The amplitude of this negative feedback is modulated by the value of RHCrit, a high value making cloud formation more difficult and thus reducing the negative albedo feedback.

At values of S of 8-9K, RHCrit nears the upper limit defined in the GCM ensemble, and a further reduction in the negative feedback is achieved by a decrease in EACF, which is reduced to its minimum value to achieve the highest values of S in the ensemble.

Hence, it is by suppressing cloud formation that the simulated ensemble achieves very high values of S. Without a negative shortwave response, longwave clear-sky feedbacks enhanced by high-level water vapour are left to dominate the response to warming. However, a comparison with Fig. 5.7 shows that this quickly causes very large discrepancies from observations of top of atmosphere radiative fluxes in the mean control state.

Figure 5.9: The ability of the neural network emulator to predict model error and climate sensitivity in parameter space away from the discrete sampling points of the original ensemble (left: predicted versus actual normalised model error for the 'All fields (ann+seas)' metric; right: predicted versus actual climate sensitivity). Each of the points shown in black is one of the 'best performing models' illustrated in Figure 5.7. Points shown in red show the prediction made by the emulator when it is allowed to use some of the additional runs in its training procedure.

5.5 Emulator Verification

Since the end of the first climateprediction.net experiment, we have had the belated opportunity to test the performance of the emulated models on the distributed climateprediction.net network. A selection of the potentially best performing models identified in Figure 5.7 were released on the distributed network so that the predicted climates for the given parameter values could be compared with the actual simulations. Figure 5.9 shows the results: for the reproduction of model error in the control simulation, the interpolation error is approximately a factor of 2 larger than the emulator's error when predicting previously unseen discrete parameter combinations. The sensitivity predictions become less accurate for sensitivities greater than 5-7K, with an apparent bias towards underestimating the higher sensitivities; this is likely attributable to the interpolation of the model response to one of the parameters most important in the sensitivity range 5-10K, namely the critical relative humidity or the empirically adjusted cloud fraction.

It is likely that in future ensembles, a Monte-Carlo parameter sampling of the original ensemble would reduce such systematic errors by eliminating large 'sampling gaps' in the parameter space. We demonstrate this by allowing some of the verification models to be used in the training procedure. Figure 5.9 also shows the improvement in emulator predictions made when the training set includes the verification models. We note also that the inclusion of these models in the training set slightly alters the prediction of best-fitting models (Figure 5.10), demonstrating that this technique may be used as part of an iterative procedure in which extra simulations are conducted in regions of interest or uncertainty.

Figure 5.10: As for Figure 5.8, but allowing models from the verification set suggested in Figure 5.8 to be included in the training set for the neural network. The process is then iterated to suggest a new set of best models over a range of sensitivity, incorporating the information contained within the verification set. The old 'best parameters' are shown in black, while the new suggested best parameters are in red. Parameter units are as defined in Table 3.2.

5.6 Conclusions

A two-layer feed-forward neural network was trained to emulate and interpolate model output from a multi-thousand member climate model ensemble. Having trained the network with data from the climateprediction.net dataset, we were able to predict both the equilibrium temperature response and the amplitudes of the leading EOFs of climatology for various model outputs in an unseen verification set of models.

The network was successfully used to examine the equilibrium response to individual parameter changes, and to interpolate smoothly between known discrete parameter settings. A much larger, neural network emulated ensemble was designed, employing a Monte-Carlo sampling scheme in place of the original climateprediction.net discrete sampling. The neural network was used to simulate model output from this very large ensemble in order to fully sample the parameter space within the discrete sampling of the original climateprediction.net experiment. The model output was divided into bins of climate sensitivity, such that in each bin the model most consistent with observations could be found.

Various observational fields were employed, giving dramatically different constraints on climate sensitivity. The strongest constraints were found to result from observations of top of atmosphere radiative fluxes. Only a small range of models was consistent with these observations, giving a strong constraint to climate sensitivities between 3 and 5K. The simulated ensemble predicted some models to be closer to observations than all members of the climateprediction.net or AMIP ensembles. Seasonal data in radiative fluxes produced a similar constraint. The use of these diagnostics as tuning targets may thus help to explain the clustering of simulated values of S in ensembles such as CMIP and AMIP.

Using only observations of surface temperature to constrain the models resulted in no upper bound on S. The lower bound constraint suggested that models with S of less than 3K could not produce reasonable annual means in surface temperature. However, observations of the seasonal cycle in temperature did produce a constraint on S, with some models between 2 and 5K in agreement with observations.

Observations of precipitation showed that no models in the climateprediction.net, ANN simulated or AMIP ensembles could reproduce the annual mean data within the bounds of natural variability; hence the constraint is weaker than in the radiative case. However, the best performing models were able to reproduce the seasonality in rainfall where they could not reproduce absolute values, and models with S between 3 and 8K could reproduce seasonal rainfall differences.

Requiring models to match all observations simultaneously proved a more difficult task for all of the ensembles. The ANN simulated ensemble suggested that model parameters could at best be tuned to a compromise configuration with a finite error from the observations. This ‘best model discrepancy’ was found to increase with the inclusion of increasing numbers of separate observations, and was not itself a strong function of S.

Hence, although models can be found which independently reproduce the seasonal differences in each of the three observation types, there is no single model which can reproduce all three simultaneously. The relative differences between the errors of the best models at different sensitivities decrease as more observations are added, and so the constraint on sensitivity weakens. Thus the 'All fields' approach yields no models which are fully consistent with observations, although it shows a minimum in error at S = 4K.

Such an effect is a natural by-product of tuning an imperfect model to match observations: it is easy to tune parameters to match a single observation, but impossible to match all simultaneously. This effect must be considered in predictions of sensitivity such as Knutti et al. (2006) and Piani et al. (2005), where trends determined through analysis of an imperfect ensemble were applied directly to observations. We have found that the perfect model state may be unattainable through parameter perturbations alone; hence an estimate of the irreducible error should be included when using ensemble-trained predictors of S, and how this should be achieved in a probabilistic prediction should be a goal of future research.

We find that constraints on S based on the relative likelihood of the best performing models are strongly dependent on the number of observations used to constrain the result. Although single observational fields can independently constrain S to similar values, there may not be a possible tuned state in which all observations are simultaneously satisfied, introducing an irreducible error at the most likely value of S. Hence, although adding observational fields will make all models appear less plausible, the relative likelihood of models with high values of S will increase, weakening the constraint.

The neural network was also used to show the parameter settings for the best performing models over a wide range of S. We propose this as a convenient tool for the intelligent sampling of parameter space in future ensembles. For example, using the parameters suggested in Fig. 5.8 would provide a small, efficient ensemble containing only the most relevant models necessary for a wide distribution of S. These models were simulated on the distributed network in order to test the ability of the emulator to interpolate between known parameter values. The findings of this verification process were sufficiently positive to make this method of parameter sampling a consideration for the next generation of climateprediction.net models using the HadGEM model. (A similar parameter sampling strategy was used in Webb et al. (2006), where the QUMP ensemble was used to generate a Monte-Carlo ensemble with linearly interpolated values for climate sensitivity and for the CPI; the ensemble was split into bins of climate sensitivity, and the best performing models in each bin were chosen.)

Furthermore, by highlighting regions of interest in the parameter space (e.g. those with steep gradients in the response function for S), efforts could be made to conduct additional simulations in those regions, further improving the fit where the response is ambiguous.

We propose that a possible extension of this work, with the advent of future coupled ensembles providing more comprehensive data for each model, would be to evaluate the model climatology with EOFs of fully gridded data, rather than regional means. The added information in such an analysis would allow a more comprehensive metric for model verification.

Finally, the approach illustrated here is not restricted to an investigation of climate sensitivity. The method could equally well be applied to provide improved sampling and constraints for any climate model output diagnostic of interest, with the potential for a multivariate predictand such as the joint probability of regional change in temperature and precipitation.

Chapter 6

Systematic Constraints on Climate Sensitivity

“We balance probabilities and choose the most likely. It is the scientific use of the imagination.” —Sir Arthur Conan Doyle, 1902, “The Hound of the Baskervilles”

Climate models are incomplete. Irrespective of how they may improve with time, with increasing resolution and improved parameterisations, without simulating every aspect of the true system there is no way that every property of the climate system can be reproduced. This is not to say that climate models have no use, as they can provide great insight into relevant properties of the true system. The problem, which we discuss in this chapter, is how the performance of a given climate model should be judged.

There are many ways of evaluating model performance given a set of observations; methods include simple comparisons of the base climate state (Stainforth et al., 2005), the response to known past climate forcings (Frame et al., 2005), or measures of how quickly models diverge from observed initial conditions (Rodwell and Palmer, 2007). In any of these approaches, one must first decide on the observed metric to compare with model output.

In detection-attribution studies, a common philosophy is that the most appropriate observed metric is the predicted variable itself, i.e. past global mean temperature is the best metric for predicting future global mean temperature (Hegerl et al., 2000). In this case, the methods are sensitive to model error because the GCMs are used to compute the covariance of climate noise and the climate change pattern; these two are combined to produce the optimal fingerprint (see Chapter 1). Hence, the diagnostics used cannot include terms for which we do not have an adequate model, nor those for which we do not have adequate observations.

In pure sensitivity studies like Knutti et al. (2005), the diagnostic fields by which the models are evaluated are also quite limited. Knutti et al. (2005) takes the seasonal temperature differences in various regions of the model as a metric, and ignores any regions where the entire ensemble is inconsistent with the observations. This bypasses the issue of "model discrepancy" raised in Chapter 2, because any fields in which the whole ensemble performs poorly are ignored. In other studies, such as Piani et al. (2005), a more complete set of observations is used, but the spatial scale of the comparison is restricted by an EOF truncation of the model variability. In this case, the issue of model discrepancy is more subtle: because the ensemble-derived predictors of climate sensitivity are linear, it is not imperative that the observations lie within the distribution of predictands in the ensemble; however, if too much extrapolation is required outside the sampled range, then the accuracy of those predictions comes into question.

The rationale behind using only a limited diagnostic set to verify models is that we should only evaluate the parts of our simulation for which we have an adequate model. For example, the simulations presented in this thesis from climateprediction.net do not include a model of the sulphur cycle, and this is likely to introduce a bias into the simulation of both the mean state and the response to radiative forcing. However, if such an argument invalidates the use of a selection of diagnostics (such as measurements of the radiative budget, perhaps, which are to first order affected by the sulphur cycle), then it could equally be argued that an incomplete model excludes the use of any diagnostic which may be indirectly affected by the incompleteness of the model.

In this chapter, we investigate a somewhat less defeatist approach to the problem of using model-observation inconsistencies as a constraint on model response. In contrast to the analysis of Knutti et al. (2005), we do not ignore those observations which the entire ensemble fails to reproduce, because to do so allows some models to be arbitrarily poor in some respects without being deemed any less likely to represent reality. Instead, we accept that the models in our ensemble are imperfect, and they are judged by their degree of imperfection. Finally, unlike the analyses of Piani et al. (2005), Knutti et al. (2005) and Rougier and Sexton (2007), we do not use all the models in the ensemble. Instead, we consider only the best possible model at each value of climate sensitivity. The argument for this is that any model can be made arbitrarily poor or unphysical; the interesting information lies in the limit of how good the model can be.

6.1 Probability Distributions

In Chapter 5, we used the climateprediction.net ensemble and neural network emulation to establish a relationship between climate sensitivity and the irreducible error in the HadAM3 model. Using this information, we have built up some qualitative understanding of how different observations impose different degrees of constraint on climate sensitivity. However, until this point we have avoided any probabilistic estimate of climate sensitivity based upon the relative values of the irreducible model error at different values of S. In this chapter, we attempt to address this issue.

The probabilistic predictions of S using ensemble data made thus far have fallen into two categories:

• Weighted Ensembles - This approach uses an ensemble to produce a weighted histogram of model climate sensitivity, which is interpreted as a probability density function. This approach was used by Murphy et al. (2004), as discussed in Section 1.4.2.1. The PDFs produced by such methods, however, are critically dependent on the sampling strategy of the original ensemble. Various ensemble weighting strategies were employed by Rougier and Sexton (2007), who attempted to use a statistical emulator to eliminate any sampling bias in their results (see Chapter 2).

• Ensemble-trained predictors - This approach uses the ensemble to find predictors of S. Both Piani et al. (2005) and Knutti et al. (2005) fall into this category. These methods, however, represent a lower limit on the uncertainty in the true value of S because there is no consistent way of dealing with systematic error - that is, there is no way to know how valid the predictors will be when applied to the real world.

In this chapter, we propose a method of producing a PDF of climate sensitivity which addresses these two major issues.

6.2 Methodologies

In chapter 5, we produced an estimate for the function E(S) - the irreducible component of model error as a function of climate sensitivity. However, to go from this to a probability density function requires a transfer function between model error and probability, and the choice of this function can significantly alter the shape of the resulting distribution. In the following discussion, we consider three possible approaches:

• Absolute likelihood - In the first case, we consider a direct transfer function where the likelihood is scaled only relative to natural variability:

$$L(S_i)\,\Delta S = e^{-E(S_i)^2/2}\,\Delta S, \qquad (6.1)$$

where $L(S_i)$ is the likelihood of climate sensitivity lying between $S_i$ and $S_i + \Delta S$, and $E(S)$ is the standardised minimum model difference from observations defined in section 5.2.1 (this is a measure of the minimum model distance from observations for a sensitivity of $S_i$, scaled by natural variability). The likelihood $L(S_i)$ is thus an estimate of the likelihood of an error of $E(S_i)$ occurring naturally in a given 15 year period, assuming that variation about the observational mean follows a normal distribution.

This transfer function is chosen to satisfy the boundary conditions that a model with zero error has a likelihood of unity, while a model with infinite error has a likelihood of zero. In this case, the likelihood of a given error is fixed by the natural variability, such that one sigma equates to the standard deviation of a long control run split into 15 year means, to be comparable with the control runs in climateprediction.net. The area of this distribution is not normalised - hence the values do not represent probability but likelihood as scaled by natural variability.

It has been shown (Frame et al., 2005) that such distributions are dependent upon the prior distribution of points which are considered. In this case, we have assumed a prior that is evenly sampled in S, but it should be noted that sampling evenly in λ would produce a differently shaped distribution when plotted as a function of S.

• Direct Scaling - In order to create a probability distribution $P(S_i)$, the area under the distribution must be normalised. The most intuitive method of achieving this is to directly scale the above distributions:

$$P(S_i)\,\Delta S = \alpha e^{-E(S_i)^2/2}\,\Delta S, \qquad (6.2)$$

where α is fixed by requiring unit area under the curve:

$$\sum_i P(S_i)\,\Delta S = 1, \qquad (6.3)$$

The renormalisation process means that the shape of the final distribution is identical to that of the likelihood distribution.

• Relative Scaling - The normalisation can also be achieved by considering only relative errors. In this case, the absolute magnitude of the error is unimportant and only the relative minimum errors at different values of S are considered:

$$P(S_i)\,\Delta S = e^{-E(S_i)^2/(2\beta)}\,\Delta S, \qquad (6.4)$$

where β is again fixed by requiring unit area beneath the curve, as in equation 6.3 (β must be calculated iteratively).

Figure 6.1: A demonstration of the different means of normalisation for PDFs of climate sensitivity derived from a function E(S). a) shows the basic transfer function of the type described in Equation 6.1, where a 1:1 function relating error and likelihood is defined. b) shows one method of normalisation where the baseline is raised as in Equation 6.2, which results in a direct scaling of a). c) shows an alternative method of normalisation, where the rate of decay in the transfer function is altered such that only the relative errors at different sensitivities are considered.

Figure 6.1 illustrates these three methodologies, highlighting the role of the normalisation process and the effect that it has on the final distribution. A numerical sketch of the three transfer functions follows.
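To make the three options concrete, the sketch below applies Equations 6.1, 6.2 and 6.4 to a hypothetical error curve E(S). Everything here is an illustrative assumption rather than the thesis code: the bin width, the quadratic error curve and the use of scipy's brentq root-finder to obtain β.

```python
import numpy as np
from scipy.optimize import brentq

dS = 0.1                                 # width of each sensitivity bin (K)
S = np.arange(0.5, 11.0, dS)             # sampled climate sensitivities
E = 3.0 + 0.5 * (S - 4.0) ** 2           # hypothetical minimum error E(S)

# Absolute likelihood (Eq. 6.1): no normalisation.
L = np.exp(-E ** 2 / 2.0)

# Direct scaling (Eq. 6.2): linear rescaling to unit area; shape unchanged.
alpha = 1.0 / np.sum(L * dS)
P_direct = alpha * L

# Relative scaling (Eq. 6.4): alter the decay rate beta until the area is
# one; beta has no closed form here and is found iteratively.
def excess_area(beta):
    return np.sum(np.exp(-E ** 2 / (2.0 * beta)) * dS) - 1.0

beta = brentq(excess_area, 1e-6, 1e6)    # root of excess_area(beta) = 0
P_relative = np.exp(-E ** 2 / (2.0 * beta))

print(L.max(), P_direct.max(), P_relative.max())
```

Note that P_direct is simply a rescaled copy of L, whereas P_relative flattens as the minimum of E(S) grows - the behaviour discussed in sections 6.3.2 and 6.3.3 below.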

6.3 Results

6.3.1 Absolute Likelihood Distribution

We first examine the likelihood distribution produced using Equation 6.1. The initial error function E(S) is calculated as in section 5.2.1, and is shown for various observational constraints in Figure 5.7. The resulting likelihood distribution is shown in Figure 6.2.

Figure 6.2: Likelihood functions for climate sensitivity implied by the distributions of model error as evaluated for various observations, as shown in Figure 5.7. The likelihood in each 0.1K bin of CS is evaluated as a function of the error of the best performing model in that bin from the large Monte-Carlo simulated ensemble (as described in section 6.2). The transfer function in Equation 6.1 transforms the model error to a likelihood by measuring the model distance from observations, scaled by natural variability assuming a normal distribution about the observations. The likelihood associated with the 'All Fields' plot is too small to be seen on these axes, with a peak likelihood of $e^{-20}$, although it is shown normalised in Figure 6.3. (Curves: temperature, flux and precipitation, annual and seasonal, plus the combined 'All fields' cases; axes: estimated likelihood of best model against climate sensitivity, 0-11K.)

Using this approach, we find that only in cases where the minimum model error is comparable to natural variability is there a significant distribution of possible models. For instance, in the case of seasonal differences in total precipitation, Figure 5.7 shows that a range of models with values of S between 2 and 5K have differences from observations less than or equal to the natural variability. This results in a broad peak of likelihood in Figure 6.2. In contrast, only models with a very narrow range of S have observationally consistent values for seasonal flux in Figure 5.7, resulting in a sharper peak over a narrower range of S in Figure 6.2, which shows that likely models exist only in the range 3 to 4.5K.

However, in some cases the best performing models in the emulated (and original) climateprediction.net ensemble have an error significantly greater than the natural variability. For instance, in the case of annual mean precipitation, the best performing model lies 4 standard deviations of natural variability away from the observations. This results in a likelihood of $e^{-8}$ at the peak of the distribution (since $e^{-4^2/2} = e^{-8}$). The situation for the 'All fields' distribution is even more extreme, with the most likely models lying over 20 sigma from the observations. In these cases, the maximum of the likelihood distribution is infinitesimal.

Thus, we are faced with a choice in how best to deal with this situation: either we accept that the difference from observations is too great for the models to be of any use, or we find a way to rationalise the differences in error between the models we have. In practice, this is achieved by normalising the distributions.

6.3.2 Directly Scaled Distributions

In the first case, we attempt to directly normalise the distributions found in Figure 6.2 using Equation 6.2. This results in a set of Probability Density Functions (PDFs), which are shown in Figure 6.3. The normalisation process produces a set of more self-consistent distributions for S. This type of normalisation ignores the error in the best performing model, effectively adding a constant offset to the model error at each value of S in order to normalise the distribution.

Figure 6.3: Normalised probability distributions for climate sensitivity implied by the distributions of model error as evaluated for various observations, as shown in Figure 5.7. In each case, the area beneath the corresponding distribution in Figure 6.2 is normalised to create the distributions shown. (Curves and axes as in Figure 6.2, with the ordinate now an estimated probability.)

The temperature annual mean observations provide the only anomalous constraint - but this is understandable when Figure 5.7 is considered: the ensemble is able to produce realistic simulations of temperature over a wide range of S. We attribute this to the flux correction process, which helps maintain temperatures near the observations even in unrealistic climates.

The remaining individual observational constraints - both seasonal and annual mean - provide consistent constraints on S in the range 2 to 6K. The tightest of these is provided by radiative flux measurements, which suggest 5-95 percentiles of 3.1 to 4.5K from the seasonal data and 3.3 to 4.9K from the annual mean data. When the observations are combined in the 'All Fields' case, the resulting distribution becomes more constrained: the distribution using seasonal data has 5-95 percentiles at 3.5 to 4.7K, and the annual data suggests 3.5 to 4.8K.

6.3.3 Relative Scaling

Figure 6.4: Normalised probability distributions for climate sensitivity implied by the distributions of model error as evaluated for various observations, as shown in Figure 5.7. Normalisation is achieved by altering the decay rate of the transfer function as shown in Equation 6.4. The resulting distribution depends only on the relative errors of the best performing models at different values of S. (Curves and axes as in Figure 6.2.)

In order to include the effect of irreducible model error in our estimate of a probability distribution for S, we propose that the relative errors of the best performing models at different values of S should be considered. The normalisation is performed as in Equation 6.4, and the resulting distributions are shown in Figure 6.4. The use of relative errors means that the irreducible model error (the minimum error achievable by parameter tuning) is not simply ignored in the normalisation process, as it is when absolute scaling is used.

The effect of this normalisation is clearly noticeable when there is a significant irreducible error. For example, in the case of precipitation, the seasonal constraint is relatively strong, with 5-95 percentiles at 2.2 and 4.7K. However, the annual constraint is much weaker, with 5-95 percentiles at 2.6 and 7.5K. An examination of Figure 5.7 shows that this is attributable to the larger minimum error in the annual mean case. Thus, with this method of normalisation, a measure of uncertainty due to irreducible model error is naturally included in the final distribution - without the need for the crude approximation used in Piani et al. (2005).

One counter-intuitive property of this normalisation procedure is its effect on the constraint when observations are combined. This is best illustrated by the seasonal-cycle constraints. When considered independently, each of the seasonal cycles produces a consistent constraint on S: the temperature constraint suggests 5-95 percentiles of 1.9 and 5.0K, the precipitation suggests 2.2 and 4.7K and the radiative fluxes suggest 3.1 and 4.9K. However, when all observations are combined into a single metric for the 'All fields' case, the constraint on S weakens dramatically, with 5-95 percentiles of 2.5 and 8.1K. The constraint thus becomes weaker as more observations are added to the metric.

In order to understand this effect, one must examine Figure 5.7. It is apparent that for the seasonal cycles in each of the observation types, there are some models which can accurately reproduce the seasonal cycle with errors smaller than the natural variability in that observation. However, the requirement for a model to match all of those seasonal-cycle observations at the same time is much more challenging, and there are no models in the ensemble which satisfy it. The result is the introduction of a significant irreducible error in the 'All fields' case, which broadens the 'All fields' distribution in Figure 6.4.

6.4 Verification

In order to ascertain whether the methodology presented within this chapter is accurate, it is necessary to test the predictions in a different model environment - that is, outside of the HadAM3 model framework. One way to achieve this is to take coupled models available in the WCRP CMIP3 ensemble and to treat them as 'reality' (using the effective sensitivity of each model, which is derived from transient simulations by measuring the global mean temperature response to transient climate forcing). Although the range of climate sensitivities available in the CMIP3 ensemble is somewhat smaller than that of climateprediction.net, this will provide some indication of whether similarities in the base state of models can correctly indicate their true climate sensitivity. This gives one estimate of systematic model error, which might well be conservative.

In order to achieve this, we repeat the methodology used to create Figure 6.4, but instead of using reanalysis data for the observations, we take each member of the CMIP3 ensemble in turn to be "truth". The results are shown in Figure 6.5, using each of the different observational types referred to in Figure 5.7.
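Schematically, the perfect-model test takes the following shape. This is a minimal sketch on synthetic data: the arrays, the simple RMS error and the bin width are all stand-ins for the model climatologies and the full error metric of section 5.2.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 1000 ensemble members, each summarised by 50
# standardised diagnostics, with an associated climate sensitivity.
fields = rng.normal(size=(1000, 50))
sens = rng.uniform(1.0, 11.0, size=1000)
bins = np.arange(1.0, 11.5, 0.5)

def best_error_per_bin(fields, truth, sens, bins):
    """E(S_i): smallest RMS distance from 'truth' in each sensitivity bin."""
    err = np.sqrt(((fields - truth) ** 2).mean(axis=1))
    E = np.full(len(bins) - 1, np.inf)
    for i in range(len(bins) - 1):
        in_bin = (sens >= bins[i]) & (sens < bins[i + 1])
        if in_bin.any():
            E[i] = err[in_bin].min()
    return E

# One synthetic 'CMIP3 model' standing in as truth; in the real test this
# is repeated for every CMIP3 member, a PDF is built from E via the
# relative scaling of Eq. 6.4, and the member's known sensitivity is
# compared against the percentiles of that PDF.
truth = rng.normal(size=50)
E = best_error_per_bin(fields, truth, sens, bins)
```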

We find that the technique works better for some models than for others. For example, and unsurprisingly, we find that if HadCM3 is taken as "truth", then the 'true' value of HadCM3 sensitivity always lies at the peak of the probability distribution. This is reassuring, but was not guaranteed to be the case, given that HadCM3 is a coupled model which may have introduced some bias when compared to the slab models used in climateprediction.net.

There follows a selection of models for which the technique works relatively well - i.e. some observations provide an accurate fit to the model's 'true' climate sensitivity. For instance, the true sensitivity of the CSIRO mk 3.0 model is accurately described by any of the annual mean measurements - in each case the 33-66 percentiles include the true value of the sensitivity. However, the accuracy with seasonal data alone is poorer, with only the precipitation seasonal cycle providing a constraint at the correct value.

The climate sensitivities of both MIROC models are fairly accurately reproduced by the technique: the true value lies within the 10-90 percentiles in all cases. The radiative flux measurements provide an even more accurate fit, with the true value lying within the 33-66 percentiles for both models, irrespective of the use of annual means or seasonal cycles. The CCCMA models also perform fairly well under the evaluation; again, the technique performs better using annual mean measurements rather than the seasonal cycles to constrain the model.

However, there is a selection of models for which the true sensitivity is overestimated by the technique. For each of the MRI CGCM, NCAR CCSM 3.0, GISS and INMCM models, there appears to be a bias in the sensitivities such that the true value is less than that of the models most similar to them in the climateprediction.net ensemble. There are many potential reasons for this - the use of different parameterisation schemes or different model architectures could easily result in different relationships between base state variables and model response to forcing in these models.

Figure 6.5: A plot using the same principles as Figure 6.4, but using different CMIP3 models as 'truth'. In each case, the members of the climateprediction.net ensemble are compared to a member of the CMIP3 ensemble, and in each bin of sensitivity the climateprediction.net model most like the CMIP3 model is used. The relative errors of models at different sensitivities are used to make a PDF using Equation 6.4, and the 33-66, 10-90 and 5-95 percentiles of that PDF are plotted for each CMIP3 model (HadCM3, CSIRO mk 3.0, MIROC 3.2 hires and medres, CCCMA t63 and t47, MRI CGCM 2.3.2a, NCAR CCSM 3.0, GISS model ER and INMCM 3.0) and for each observation type (temperature, flux, precipitation and all fields; annual, seasonal and annual+seasonal). The true sensitivities of each of the CMIP models are shown for comparison.

When comparing the ability of different observation types to determine climate sensitivity for the "HadAM3-like" models, we find that the most reliable indicators are the annual mean top-of-atmosphere radiative fluxes. In 7 of the 10 models shown, a comparison of the fluxes reveals the true value of the climate sensitivity, and the introduction of the seasonal cycle to the constraint does little to improve this. However, in the remaining 3 models, the sensitivity is overestimated.

We also note the reliability of the total precipitation amount in constraining the sensitivity - perhaps unsurprising given the significance of the entrainment and ice-fall speed parameters, both important in the scaling of precipitation. In all 10 models, the true value of the sensitivity lies within the 10-90 percentiles implied by a comparison of annual mean precipitation. This is further improved by the inclusion of the seasonal cycle in the constraint (although the seasonal precipitation cycle alone makes a poor predictor). Unfortunately, there is no reliable source of observations of absolute global precipitation.

The consistently poor performer is the use of surface temperature as a constraint. In almost all cases, a comparison of surface temperatures places a very large uncertainty on the true value, and still overestimates the true result (irrespective of whether the seasonal cycles are used). This is of some consequence, given that some methodologies use only surface temperature as their observed metric to predict future response (Tebaldi et al., 2006). Combining the observations into a single metric also tends to work well for annual mean values, and when seasonal cycles are added to the annual mean information. However, seasonal cycles alone remain a poor predictor of the model climate sensitivity, irrespective of the observation used.

The reliability of this method when applied to other model environments is, however, at least comparable to that of the technique described in Piani et al. (2005). Although that paper did not apply its predictors to members of the CMIP ensemble itself, in Figure 4.11 we followed the methodology of Piani et al. (2005) and applied the predictors derived using the technique to models in the CMIP2 ensemble, shown as points in Figure 4.11. It is notable that the results derived using the technique shown in this chapter produce a reliable estimate for 7 out of 10 models, whereas the linear predictor technique used in Piani et al. (2005) mostly fails to predict any of the differences in model sensitivity in CMIP2. Knutti et al. (2005) did include predictions of sensitivity for other climate models, and found comparable predictive ability to the technique shown here, with S for 15 of 17 models lying within the suggested 5-95 percentile range (Figure 2.7).

6.5 Discussion

In this chapter, we have attempted to derive a Probability Density Function for climate sensitivity. The starting point is a function E(S), representing minimum model-observation distance as a function of S. We find that the resulting distribution is highly dependent on the type of normalisation which is employed to translate a likelihood distribution into a PDF.

The first approach to normalisation, shown in section 6.3.2, is to calculate likelihood based upon an exponential weighting, $e^{-(m-o)^2/(2\omega^2)}$, where m is the best performing model at a given value of S, o is the observations and ω is a measure of natural variability. The resulting distribution can then be directly scaled in order to satisfy a unit integral over the sampled range of S.

The approach of directly scaling a distribution in order to create a PDF is standard in evaluating relative model performance (Murphy et al., 2004), but it is not clear that this normalisation is justified. In order to produce the 'All Fields' distribution, the baseline is raised to nineteen standard deviations from the observations before models are compared. This concept is illustrated in Figure 6.1(b), which demonstrates that the error separating models of different climate sensitivity may be orders of magnitude less than the error of the best performing model. The best performing model may be arbitrarily poor, but the shape of the distribution would not change, and the constraint on sensitivity is artificially strong.

In section 6.3.3, we introduced an alternative method of normalisation, where only the relative errors of different models are considered. In this case, the scaling of probability with error is changed to normalise the distribution over the sampled range of S. This process is illustrated in Figure 6.1(c). As the ensemble's minimum error increases, the ratio between the minimum error (at the most likely value of S) and the error of models with 'less likely' sensitivities will tend towards 1. Hence, if the error of the best performing model is much greater than the difference in errors between models of different sensitivities, then the constraint on S will be weak. This effect is clearly visible when comparing the 'All fields' cases in Figures 6.3 and 6.4.

Hence, we have shown that the two methods of normalisation give very different results - but which is correct? It could be argued that neither method is believable, simply because the systematic error involved is so large. An examination of Figure 5.7 shows that in the 'All fields' case, the best performing model lies over 20 standard deviations of natural variability away from the observations. In practice, this implies that the probability of the observations matching such a model is minuscule, and that any attempt to derive significance from the variation in model error at levels above this is invalid. This standpoint is supported by Figure 6.2, where a likelihood distribution for 'All fields' is not even visible on the scale.

The argument for a direct scaling of these distributions lies in the fact that it preserves the scaling of model error with natural variability - which would seem a sensible unit in which to scale likelihood. However, this is achieved at the cost of losing all information associated with systematic model error. The best performing model could be 2 or 200 standard deviations from the observational mean and the resulting distribution would be unaffected. We propose that such an approach creates an unrealistically strong constraint on S by making the best performing model look artificially likely.
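To put illustrative numbers (our own, not from the thesis) on this contrast: suppose the best model at one value of S lies at E = 20 and the best model at another lies at E = 20.5. Under direct scaling, the ratio of their probabilities is

$$\frac{e^{-20.5^2/2}}{e^{-20^2/2}} = e^{-10.125} \approx 4\times10^{-5},$$

an apparently decisive preference between two models that are both wholly inconsistent with the observations. Under relative scaling the same ratio becomes $e^{-10.125/\beta}$, and since the normalisation forces β to be large when all errors are large, the ratio tends towards one - correctly reporting that the data cannot usefully distinguish the two.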

If one accepts that models are not a true representation of reality, and a method of evaluating their relative performance is desired, then it can be argued that relative scaling is the correct normalisation procedure. Because absolute model errors are compared directly, we are no longer ignoring the systematic model error. Such an approach makes no artificial assumptions about how models are applicable to reality.

An interesting property of this normalisation procedure is that the constraint on S becomes weaker as more observations are added to the metric (section 6.3.3). This situation arises due to the well established problem of tuning - i.e. it is easy to tune parameters to match a single observation, but more difficult to match many observations simultaneously. Figure 5.7 confirms that there are no models within the ensemble which come close to matching all of the sampled observations simultaneously.

This observation naturally poses the question of how many observations should be included in such an analysis. The question verges on the philosophical, but one can propose different answers. One approach would be to include all available model outputs in the diagnostic; however, this would clearly include much information which is completely irrelevant to the desired quantity (in this case, climate sensitivity). At the other extreme, it is clearly wrong to include too few diagnostics - for example, a model which is able to describe the Earth's surface temperatures accurately but has a vastly incorrect radiative budget is clearly not a 'good' model of the climate.

In section 6.4, we investigated the ability of the relative best error technique to constrain the climate sensitivities of various CMIP3 models. The findings were mixed, with some models clearly closer in structure to the HadAM3 model used in climateprediction.net. Of the observations used as constraints, it was found that by selecting models with radiative fluxes and precipitation levels in the control simulation as close as possible to each CMIP3 model, we could place reasonable constraints on the true value of the climate sensitivities of those models using the climateprediction.net ensemble. The best single observation types were those using annual mean values, although adding seasonal information to the annual values did result in a small increase in the precision of the technique. Combining observations together did again result in an increase in uncertainty in the predictions, as was seen in Figure 6.4.

The fact that the uncertainty in this method increases with added observations in the metric implies that one should take care in evaluating whether observations are necessary. We find overall that in 7 out of 10 of the models presented, simply considering climateprediction.net models with a similar radiative budget to the respective CMIP3 model resulted in relatively accurate and precise constraints on climate sensitivity (although sensitivity in the remaining 3 models was overestimated with this approach). As a first step, these results suggest that any metric for the comparison of different model responses to greenhouse gas forcing should include an evaluation of the base climate radiative budget.

Chapter 7

Summary and Future Work

‘ “Well, no need to brood on what tomorrow may bring. For one thing, tomorrow will be certain to bring worse than today, for many days to come. And there is nothing more that I can do to help it. The board is set, and the pieces are moving...” ’ —J.R.R. Tolkien “The Return of the King”, 1955

The aim of this thesis was to better understand and constrain the uncertainties in the future development of the Earth's climate. At the current time, the best tools we have to simulate the future climate are General Circulation Models, which describe the climate system by dividing the Earth's surface into a discrete three dimensional array of grid-cells in which equations of fluid motion may be applied (McGuffie and Henderson-Sellers, 2001). The uncertainties in model predictions partly arise from the error in the parameterisations used to describe physical processes occurring at a scale smaller than the grid resolution of the model. Quantifying how the error in the representation of these processes projects onto error in predictions of future climate is essential if those predictions are to be used for planning and policy making (Palmer, 2000). One formalised approach to this problem is a perturbed physics ensemble, in which one simulates future climate with a range of models where uncertain parameters are perturbed (Allen and Stainforth, 2002).

Such an ensemble has been attempted in the past with simple and intermediate models of climate, where large numbers of simulations may be feasibly conducted by individual groups (Forest et al., 2000). However, one cannot be sure of the ability of such models to capture feedback mechanisms in unresolved aspects of the climate system once the climate moves away from its current regime. In addition, such models are generally restricted to a prediction of global climate variables, and are therefore of little use as predictors of regional climate. Some attempts have also been made to use an ensemble of GCMs with mostly single parameter perturbations and to make linear interpolations between the model responses (Murphy et al., 2004), with the method further developed in Webb et al. (2006). However, it has been argued that such methods are somewhat frustrated by their inability to recreate non-linear interactions of model parameters (Stainforth et al., 2005).

The climateprediction.net ensemble attempted to address these issues by conducting a large number of simulations using idle computing time on PCs belonging to interested members of the general public. Sufficient GCM simulations could be conducted that a limited section of the parameter space could be sampled more completely (Allen et al., 2000). In this thesis, we have taken the results of the first climateprediction.net ensemble and attempted to address three major issues relating to predictions of climate sensitivity (the equilibrium response to a doubling of carbon dioxide). First, we produced a physical explanation for the variation in climate sensitivity, isolating different feedback mechanisms and how they relate to different model parameters and observations. Second, we addressed the problem of systematic model error - that is, the fundamental error in model formulation which cannot be corrected through parameter perturbations. Finally, we addressed the question of which observations of the current atmospheric state are the most useful in determining the likely future development of the Earth's climate.

7.1 Summary of Results

7.1.1 Chapter 4

In section 2.3, we described the major issues which required addressing for the physical interpretation of the mechanisms driving variation in climate sensitivity in the climateprediction.net ensemble. Chapter 4 described a linear analysis process which attempted to address these issues:

• What are the major feedback processes in climateprediction.net and how do they relate to model parameters? - In chapter 4, we illustrated a linear analysis process which showed the dominant feedback processes in the ensemble in the leading modes of an EOF analysis. This analysis identified two modes whose eigenvalues were well separated, such that they could be interpreted physically.

The first of these modes was associated with the entrainment coefficient, a parameter in the model's convection scheme. A reduced value for this coefficient was found to alter the vertical profile of relative humidity in the pre-industrial control experiment, such that the lower troposphere was dried while the upper troposphere and lower stratosphere were moistened. The dry troposphere reduces low level cloud cover, while the increased relative humidity aloft causes an increased clearsky greenhouse effect which is the partial cause of the increase in S in these models. A secondary effect of this perturbation was an increase in longwave cloud forcing upon warming, also in the regions which showed an enhanced clearsky greenhouse effect. Both of these effects are visible in the models' control state - before any greenhouse gas forcing is applied.

The second dominant mode was correlated with the ice fall speed parameterisation. This parameter in the model's cloud scheme scales the rate of fallout of frozen precipitation from clouds. When set to a low value, net precipitation is decreased while cloud cover and humidity increase. This results in a modification to the radiative budget of the model, with increased longwave forcing resulting from global increases in cloud and moisture. This is subject to some shortwave compensation, as the increased cloud cover reflects more radiation into space.

A number of secondary parameters were found to become relevant only when both of the above feedbacks were already present. A set of parameters associated with low level cloud coverage was found to be important: the critical relative humidity for cloud condensation, the empirically adjusted cloud fraction and the accretion constant for the accumulation of cloud droplets by precipitation. Together, these parameters were found to control the degree of shortwave compensation in models which exhibited one of the two primary strong feedbacks. By minimising the extent of shortwave compensation, values of S of up to 11K may be obtained.

• How can we create meaningful predictors of climate sensitivity? - Once the major feedback mechanisms were established, in section 4.3.3 we attempted to determine their likely amplitude. To accomplish this, we found patterns of ensemble variability in the control state which were effective predictors of each feedback. This was achieved by a simple regression of feedback strength onto the eigenvalues of a truncated EOF set of control state variability within the ensemble (a numerical sketch of this construction follows this list). A set of observations - taken from both ERA-40 and NCEP reanalyses for surface diagnostics, along with ERBE for radiative flux measurements - was used to estimate feedback amplitudes once the predictors had been established.

In the first case, we found that the entrainment-type feedback was most effectively predicted by examining shortwave top-of-atmosphere flux data. A reduction in the entrainment coefficient was found to dramatically reduce convective cloud coverage in the control state, which had a large effect on tropical cloud coverage and thus on net outgoing shortwave flux. Observational projections suggest that ensemble models exhibiting a very strong amplitude for this feedback are not consistent with observations, because their up-going shortwave flux is not sufficiently large. There is some ambiguity because ERBE shortwave fluxes are up to 10Wm−2 less than those from either ERA40 or NCEP reanalyses, thus resulting in a larger predicted feedback.

The most effective predictors of the second feedback (associated with the ice-fall speed parameterisation) were found to be a combination of radiative diagnostics: both up-going clear-sky and cloudy-sky longwave fluxes, together with up-going shortwave fluxes and precipitation measurements. Together these diagnostics describe the increase in cloud and moisture, coupled with the decrease in precipitation, which best characterises models exhibiting this feedback. Observational projections using different sources are consistent in excluding the possibility of the strongest feedbacks of this type.
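The following is a minimal, self-contained sketch of the predictor construction described above, with synthetic arrays standing in for the ensemble control-state fields and feedback amplitudes (the thesis used regional-mean model output, a truncated EOF basis and reanalysis/ERBE observations; none of the numbers below are real).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 200))    # control-state diagnostics per member
y = rng.normal(size=1000)           # feedback amplitude per member

xbar = X.mean(axis=0)               # ensemble-mean control state
U, s, Vt = np.linalg.svd(X - xbar, full_matrices=False)
k = 10                              # EOF truncation
A = U[:, :k] * s[:k]                # EOF coefficients of each member

# regress feedback amplitude onto the truncated EOF coefficients
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# project an 'observed' field onto the same EOFs to predict the
# real-world feedback amplitude
obs = rng.normal(size=200)          # stand-in for reanalysis diagnostics
a_obs = (obs - xbar) @ Vt[:k].T
print("predicted amplitude:", a_obs @ coef)
```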

In section 4.4.1, we examined how the observational projections of the above processes served to constrain the possible values of S in the ensemble. In the first approach, a histogram of climate sensitivity was presented both for the full climateprediction.net ensemble and for only those models with observationally consistent values of the feedbacks described above (that is, considering only those models which were consistent with any of the observational datasets used in the analysis). It was found that such a restriction decreased the upper bound (95th percentile) of S from 8.9K to 6.2K.

A more rigorous linear prediction of real-world sensitivity values was made in Section 4.4.2, following the methodology of Piani et al. (2005). The method finds optimal predictors of climate sensitivity within the control simulations of the ensemble models, which may then be applied to observations in order to predict likely values of S. Uncertainty in the prediction is estimated from the ensemble spread itself. We extended the method by applying the predictors to a number of different observational datasets (ERA40, ERBE and NCEP), as well as testing the ability of the method to predict the sensitivity of other models. We found that the projections using ERA40 and NCEP were consistent with those of Piani et al. (2005), with 95th percentiles of S at 6.4 and 5.6K respectively. The use of ERBE data increased the upper bound somewhat, to approximately 9K, which we attribute to the higher estimated amplitude of the entrainment-type feedback using ERBE data, as discussed earlier.

7.1.2 Chapter 5

In section 2.3.2, we introduced the concept of irreducible error. This is the component of model error which cannot be reduced by further tuning of model parameters. It is relevant to our discussion because the methodology used in chapter 4 and in the published work of Piani et al. (2005) and Knutti et al. (2005) does not include this error: when ensemble-trained predictors are applied to observations, a fundamental unknown is how valid those predictors are in a different model environment. In chapter 5, we attempted to address this issue by investigating a systematic method to determine the value of irreducible error from the data held within a perturbed physics ensemble.

To accomplish this, it was necessary to sample the parameter space more completely than it had been in the original climateprediction.net ensemble, where parameters were restricted to one of a number of discrete values. A neural network emulator was designed, capable of accurately reproducing model climatology for unseen combinations of parameters and making smooth interpolations between known values. Once established, this was used to finely sample the parameter space for HadAM3, allowing a systematic search for the models with the least difference from observations. A constrained search was also conducted so that the best performing model for a given value of S could be found, and a continuous function E(S) could be estimated, representing irreducible error as a function of climate sensitivity. If the function was found to have a well defined minimum at a certain value of S, this implied a strong constraint on S.
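The shape of that constrained search can be sketched as follows; the toy 'emulator' below is a stand-in invented for illustration (the thesis used a trained feed-forward neural network), and the parameter ranges and error surface are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

def emulate(p):
    """Toy emulator: 10 normalised parameters -> (sensitivity, error)."""
    S = 1.0 + 10.0 * p[0]                      # sensitivity driven by p[0]
    err = 2.0 + 5.0 * np.sum((p - 0.5) ** 2)   # error minimised mid-range
    return S, err

# Monte-Carlo search for E(S): the minimum emulated error in each
# sensitivity bin over a large sample of parameter combinations.
bins = np.arange(1.0, 11.25, 0.25)
E = np.full(len(bins) - 1, np.inf)
for _ in range(100_000):
    p = rng.uniform(size=10)
    S, err = emulate(p)
    i = np.searchsorted(bins, S) - 1
    if 0 <= i < len(E):
        E[i] = min(E[i], err)
```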

Because the emulator output contained information on various model output diagnostics, in Figure 5.7 we were able to compare the effectiveness of various observations in their ability to constrain climate sensitivity. We first considered annual mean data; the strongest constraints were found from the radiative balance at the top of the atmosphere, with only a very narrow range of models with S between 3.8 and 4.5K being fully consistent with observations. The error in the simulation of total precipitation had a minimum when S was in the range 3.5 to 5K, but this error minimum was significantly greater than the natural variability for annual mean precipitation, implying a significant irreducible error in the simulation of annual mean precipitation. Use of annual mean temperature provided the poorest constraint, with models with values of S of up to 10K showing surface temperatures consistent with observations.

We also considered the ability of the seasonal cycle to constrain the value of S. This provided a more consistent result across the different observational types. Although radiative fluxes still proved the strongest constraint (again suggesting S between 3.8 and 4.8K), the precipitation and temperature results were consistent, if more poorly defined (i.e. a broader range of models was consistent with observations). An interesting development was the extension of the metric over which E(S) was defined to include all observations simultaneously. In this case, we found that where the model could be tuned to match the individual observations (using seasonal differences), it could not match all observations simultaneously, and thus the irreducible error becomes significantly large.

We demonstrated the potential of the emulator system as an efficient tool for the sampling of a second-generation ensemble. By choosing optimal models over a wide range of S using the emulator, one could simulate a small ensemble with a large range of S with unprecedented efficiency. In figure 5.4, we demonstrated the parameter changes required to produce this optimised ensemble in the case of HadAM3. This supported the findings of chapter 4, with the entrainment coefficient and ice fall speed parameters playing a primary role in determining model sensitivity, but with the added effect of the critical relative humidity and other parameters modulating low cloud coverage becoming more important at high values of S. Simulating these 'best models' on the distributed network acted as some verification of the emulator's ability to interpolate.

7.1.3 Chapter 6

Chapter 5 formed the basis of a new methodology for the constraint of S, in which the irreducible error was not merely considered in the analysis - it was central to it. However, the method only produces a function E(S), an estimate of the minimum irreducible error as a function of climate sensitivity. In order to actually place limits on the value of S itself using this method, one must make assumptions about how the probability of a given value of S is related to its corresponding irreducible error. There is no single clear solution to this problem, but a number of approaches were considered in Chapter 6.

Three basic approaches were considered. The first of these was designed to produce a likelihood distribution, which would not integrate to unity, but would produce an estimate of the likelihood of each value of S. For each value of $S_i$, the corresponding minimum error $E(S_i)$ was determined using the methodology of chapter 5. This was related to a likelihood using the equation $L(S_i)\,\Delta S = e^{-E(S_i)^2/2}\,\Delta S$, which is an estimate of the likelihood of the error $E(S_i)$ occurring naturally in a given 15 year period. However, this approach shows that if one uses all available observations, the likelihood of the best performing model in the ensemble occurring naturally is infinitesimal.

One conclusion of such a finding is that the models in the ensemble are so far from the real world that any comparative evaluation of their performance is pointless (for the purpose of constraining climate sensitivity, at least). However, it is clear from Figure 5.7 that there is certainly some useful information to constrain S within the function E(S) - but a normalisation of the resulting likelihood distribution may be required.

We considered two possible means of normalisation. The first was a simple linear scaling of the likelihood distributions found above, defining a normalising constant α such that $P(S_i)\,\Delta S = \alpha e^{-E(S_i)^2/2}\,\Delta S$. The results using this method, shown in Figure 6.3, exhibited a constraint on climate sensitivity that became stronger as additional constraints were added to the metric - as might be expected from the Bayesian arguments given in chapter 2. However, such an approach is potentially flawed if one considers that the equation could equally well be written as:

$$P(S_i)\,\Delta S = e^{\log\alpha - E(S_i)^2/2}\,\Delta S. \qquad (7.1)$$

In this form, it is clear that the normalisation is achieved by ignoring the irreducible error in the best performing model. Thus, the minimum error may be arbitrarily large, and only the absolute difference between model errors at different values of S is relevant. The basic flaw in this way of thinking is the assumption that the likelihood function L(S) is applicable to a model which lies over 20 standard deviations of natural variability away from the mean. It is very clear that such a model does not directly represent reality, or a possible state of the real world, and we require a method of normalisation which does not simply ignore this fact.

If we instead consider the relative errors of different models, then we are not simply ignoring the fact that models are not close to observations, and we put the difference between model errors into context. In section 6.3.3, we redefined the normalisation such that $P(S_i)\,\Delta S = e^{-E(S_i)^2/(2\beta)}\,\Delta S$. In this case, only the relative values of errors at different values of S are relevant to the result. The resulting distributions in Figure 6.4 have the interesting property that the width of the distribution, and therefore the estimated uncertainty in the estimate of S, actually grows as more observations are added to the metric.

The reason for this was discussed in the previous section: a model can be tuned to match individual observations, but not to match all of them simultaneously. With a single observation, the minimum error at the most likely value of S is almost zero, while at other values of S the error is large; the ratio between the errors is therefore very large and the constraint is strong. However, as more observations are added to the metric, the ratio of errors at unlikely and likely values of S begins to decrease, because the model can no longer be tuned to produce zero error at the most likely value of S. Thus we are left with the opposite of the Bayesian viewpoint, and the constraint becomes increasingly weak as more observations are added to the metric.

An attempt to validate this methodology was made by treating members of the CMIP3 ensemble as "truth", and attempting to predict their likely climate sensitivity by finding the most similar models in the climateprediction.net ensemble using various observational metrics. The findings were mixed, with some observations acting as more reliable metrics for global response, and some models appearing more "HadAM3-like" than others. In summary, the radiative fluxes acted as the best individual constraint on climate sensitivity, but were only accurate for 7 of the 10 models attempted. As in the case of using reanalysis data, combining different observations into a single metric resulted in a wider probability distribution, with no significant shift in the mean.

7.2 Caveats and possible extensions for this thesis

The body of work contained within this thesis is intended primarily as a proof of concept for the analysis of future ensembles, which could be designed with these methodologies in mind from an early stage. What follows is a discussion of how the basic concepts introduced in chapters 4, 5 and 6 are limited, and how they could be expanded given more resources and an idealised ensemble:

7.2.1 Caveats to the feedback analysis technique

7.2.1.1 Limited resolution

One of the major problems with the technique as illustrated in chapter 4 was the limited resolution of the input data, which made it impossible to resolve the scales necessary to identify established modes of climatic variability. The reasons for this were identified in the analysis: the climateprediction.net ensemble is limited in the amount of data it is able to return, because of the need for end users to upload information back to central servers. Thus, the EOFs shown in chapter 4 are resolved over crude regional means, and clearly there may be relevant information at smaller scales, in the oceans and in other variables; only practical limitations prevent their use. A natural extension would be to produce feedback EOFs and predictors on the fully resolved model grid, though this would considerably increase the computational demand of the problem.

7.2.1.2 Sampling strategy

In section 4.3.4, we demonstrated a technique which overcame some of the limitations of using a linear technique to analyse a non-linear system, by restricting the analysis to a small portion of the whole ensemble, limited by the values of the primary modes. We were prevented from repeating this analysis in the region of interest (i.e. that corresponding to the observational projections) because of the sampling strategy of the ensemble: one of the parameters was sampled such that a gap was left in the resulting distribution of the leading modes. Unfortunately, the observations were predicted to lie in this poorly sampled region, making further analysis impossible. Hence, if this method of feedback identification is to be used in the future, it would be most suited to Monte-Carlo style sampling of parameters.

7.2.1.3 Systematic model error

We demonstrated in chapter 4 that the identified feedbacks were the most relevant in describing ensemble variability in climate sensitivity. However, in figure 4.11, it is clear that the established predictors of S found in the climateprediction.net ensemble provide little skill in predicting the equilibrium response of other models in the CMIP ensemble. This illustrates a fundamental problem with such estimations, along with other ensemble-trained predictor based estimates of climate sensitivity: we do not know how applicable they are to other models or to the real world.

A major advance for this technique would be to find feedback patterns and predictors which were applicable across multiple model types. To achieve this would require a cross-model ensemble, with different modelling groups submitting perturbed physics runs together with their responses to greenhouse gas forcing. The benefits of such an ensemble might be enormous; the resulting predictors may be less clearly defined and would perhaps produce more uncertain results, but they would be more defensible because they would be model-independent. Uncertainty in estimations of real-world response to forcing using these cross-model predictors would directly reflect the systematic error in the current generation of models. If no predictors were found which were accurate across different model frameworks, then we would infer that ensemble-trained predictors cannot be used to estimate real-world values (assuming that all model frameworks are equally likely).

7.2.1.4 Limited perturbation set

The range of behaviours explored in Chapter 4 was fundamentally limited by the arbitrary set of parameters chosen to be perturbed in climateprediction.net. Such a criticism may be made of any perturbed physics ensemble, but it is important to remember that any finite set of perturbed parameters cannot explore the full behavioural range of the model (unless all model parameters are perturbed). Therefore, it would always be useful to extend this work by perturbing an increased parameter set.

7.2.2 Caveats to the model emulation technique

The methodology presented in chapter 5 is intended as a proof of concept of a tool which is ultimately most useful in the design of future ensembles. Even so, the analysis as presented has several possible extensions:

7.2.2.1 Uncertain interpolation

In Chapter 5, we went to some limited lengths to test the ability of the emulator to interpolate between known values. However, the verification set used for this was small in comparison to the rest of the ensemble, and a larger verification would be a useful next step for this analysis. In fact, one of the most useful products of the analysis in chapter 5 is the potential to optimise the parameter sampling in a second generation ensemble. In figure 5.4, we presented a 'recipe' for an optimised ensemble - that is, the perturbed parameters which would produce the models with the smallest possible errors over a range of equilibrium response. These models would provide an ideal 'base set' of atmospheric perturbations, to be combined with ocean, land surface or sulphur cycle perturbations in a second generation ensemble.

However, one of the most useful potential applications of the technique would be 'adaptive sampling'. This would involve the neural network emulator being used as an active component of the parameter sampling process. Such a technique would begin with a small ensemble of perturbed physics models, with their parameters sampled in a Monte-Carlo fashion. One could then use a portion of the ensemble to train a model emulator, and the accuracy of the emulator could be tested with the remaining portion of the ensemble. Thus one could quickly identify regions of parameter space where model behaviour is being incorrectly emulated, and additional simulations could be performed in those regions. In addition, one could identify regions of parameter space with steep gradients of model behaviour and perform more experiments there. Meanwhile, in regions of the parameter space where parameter changes have little or no effect and the emulator is producing accurate results, no further experiments would be required.
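One possible shape for such a loop is sketched below, with a toy response surface and a crude nearest-neighbour 'emulator' standing in for a GCM and a neural network respectively; every detail here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

def run_gcm(p):
    """Stand-in for an expensive simulation (a toy response surface)."""
    return np.sin(3.0 * p[0]) + p[1] ** 2

def fit_emulator(P, y):
    """Crude nearest-neighbour 'emulator'; the real tool was a neural net."""
    def predict(q):
        return y[np.argmin(((P - q) ** 2).sum(axis=1))]
    return predict

# initial Monte-Carlo design over two normalised parameters
P = rng.uniform(size=(30, 2))
y = np.array([run_gcm(p) for p in P])

for _ in range(5):                          # adaptive refinement rounds
    hold = rng.choice(len(P), size=10, replace=False)
    keep = np.setdiff1d(np.arange(len(P)), hold)
    em = fit_emulator(P[keep], y[keep])
    resid = np.array([abs(em(P[i]) - y[i]) for i in hold])
    worst = P[hold[resid.argmax()]]         # poorest-emulated region
    new = np.clip(worst + rng.normal(0.0, 0.05, size=(5, 2)), 0.0, 1.0)
    P = np.vstack([P, new])                 # run new simulations there
    y = np.concatenate([y, [run_gcm(p) for p in new]])
```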

7.2.2.2 Observational Uncertainty

Throughout this thesis, we have conveniently ignored the uncertainty in the observations themselves. This is justifiable only insofar as the methods required a 'perfect' model with which the ensemble members could be compared. Clearly though, before these methods can be integrated into a framework for establishing probabilities of climate response, they must take account of the uncertainty in the observations.

One practical problem with this is often the lack of availability of uncertainty information for the reanalysis products currently available. One can make some estimate of the uncertainty by considering a range of different reanalyses, but this hardly serves to sample the full range of uncertainty in the observations and in the models used to integrate the reanalyses themselves.

If accurate error information becomes available in the future for the range of observations used here, then it is possible to consider that each model in the perturbed ensemble could be associated with a distribution of possible errors, rather than one discrete error value. However, given that the width of the distributions would be identical for all members of the ensemble, they are unlikely to change the shape of the distributions shown in Figure 5.7.
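One simple way to fold such information in - our suggestion, not a result from the thesis - would be to widen the scaling of the standardised error so that observational uncertainty and natural variability contribute jointly:

$$E(S_i) = \min_{\text{models with } S = S_i} \frac{|m - o|}{\sqrt{\omega^2 + \sigma_{\mathrm{obs}}^2}},$$

where $m$ is the model value, $o$ the observation, $\omega$ the natural variability used throughout chapters 5 and 6, and $\sigma_{\mathrm{obs}}$ the observational error estimate.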

7.2.2.3 Regional predictions

The methodology has been used here to constrain climate sensitivity, but it could equally well be applied to any other unknown quantity of interest. The technique could be repeated in its current form to produce estimates of regional sensitivity to greenhouse gas forcing, or of future changes in precipitation or soil moisture in specific regions. The analysis shown here used data from a model with a thermodynamic slab ocean, but there is no reason why a fully coupled ocean model could not be used in future analyses.

More complicated predictions would also be possible, where joint error functions could be defined (e.g. for ocean heat uptake and climate sensitivity). It would be relatively trivial to extend the analysis to find best performing models in bins of these two quantities. The more complicated issue would, again, be the relationship of these distributions to the actual likelihood of each model - although methods similar to those used in Chapter 6 could be applied, with a transfer function between error value and probability defined and normalised over the two dimensions to produce a joint PDF.
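As a sketch of what that two-dimensional normalisation might look like (our own generalisation of Equation 6.4, not something derived in the thesis), with $H$ denoting ocean heat uptake:

$$P(S_i, H_j)\,\Delta S\,\Delta H = e^{-E(S_i, H_j)^2/(2\beta)}\,\Delta S\,\Delta H, \qquad \sum_i \sum_j P(S_i, H_j)\,\Delta S\,\Delta H = 1,$$

where $E(S_i, H_j)$ is the minimum model error in the joint bin and $\beta$ is again found iteratively.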

7.2.2.4 Relationship to probability studies

In Chapter 6, we explored some simple assumptions which could be made in order to make probabilistic statements from the model error/sensitivity distributions derived in Chapter 5. However, there is clearly more to do on this topic, and on how its conclusions may be reconciled with Bayesian theory. In Chapter 2, we discussed papers such as Annan and Hargreaves (2006), who combined different PDFs derived from observational constraints to produce an overall PDF with a lower uncertainty than any of its constituents. This result would appear to contradict the findings of Chapter 6, which showed that if one considers the relative errors of different models, then increasing the number of observations in the metric decreases one's certainty in the best performing model, because the irreducible error of the model grows (relative to the difference in error between the models) as more observations are added to the constraint.

However, it is not necessarily true that these two findings contradict each other. Firstly, the methodologies are markedly different: the direct use of observations to constrain sensitivity in effect assumes that the Earth may be modelled as a simple energy balance model, with an equal sensitivity to different forcings at different times - which has been shown to be untrue in GCMs (Senior and Mitchell, 2000). Secondly, the findings in this thesis relate explicitly to finding best performing models in a perturbed physics ensemble: the irreducible error increases with the number of observations because of the limitations of the model framework, which prevent parameter tuning from matching multiple observations simultaneously. These issues are not present in the Bayesian analyses performed to date, because those analyses consider only one parameter: the sensitivity itself.

Having raised the possibility of a relationship between the irreducible error in a PPE and the number of observations used, a major issue in extending this methodology is determining the correct number of observations to include in the metric. As discussed in the body of the text, one possible resolution is to consider only those observations which show some correlation with the unknown quantity of interest. It also remains possible that the weakening of the constraint will saturate as more observations are added to the metric.

However, if this is not the case, and the constraint does weaken predictably as the metric grows, then it must at least be considered that the constraint of unknown climate variables is effectively a fractal problem; that is, one measures different results (or uncertainty) depending on how hard one looks. If this is indeed the nature of the problem, then future studies of Perturbed Physics Ensembles cannot ignore this property. Integrating this behaviour into a consistent probabilistic framework is the ultimate extension to this thesis.

Appendix A

Empirical Orthogonal Functions

Principal component analysis (PCA) is a technique for simplifying a dataset, and EOF analysis is the special case used in geophysical analyses where the data is composed of a matrix with a spatial and temporal dimension. The process is a linear transformation onto an orthogonal basis, such that the first coordinate corresponds to the pattern that describes the maximum possible variance in the original time series, the second coordinate describes the second greatest variance and so on.

Typically, the low-order principal components are the most important in describing the behaviour of the data, so the set may be truncated at some suitable point at which the required aspects of the data are retained but the noise is discarded.

The EOFs of an m by n matrix, X (space by time) are determined by firstly removing the mean and any trend from the input data. We then want to find an orthonormal projection matrix, P, such that:

$$\mathbf{Y} = \mathbf{P}^{T}\mathbf{X} \qquad \mathrm{(A.1)}$$

with the constraints that the covariance matrix of Y, cov(Y), is diagonal and that $\mathbf{P}^{-1} = \mathbf{P}^{T}$. Hence, by substitution, we can express cov(Y) in terms of X:


$$\begin{aligned} \mathrm{cov}(\mathbf{Y}) &= \frac{1}{N-1}\,\mathbf{Y}\mathbf{Y}^{*} \\ &= \frac{1}{N-1}\,(\mathbf{P}^{T}\mathbf{X})(\mathbf{P}^{T}\mathbf{X})^{*} \\ &= \frac{1}{N-1}\,\mathbf{P}^{T}(\mathbf{X}\mathbf{X}^{*})\mathbf{P} \\ &= \mathbf{P}^{T}\,\mathrm{cov}(\mathbf{X})\,\mathbf{P} \end{aligned} \qquad \mathrm{(A.2)}$$

which can be written as

$$\mathbf{P}\,\mathrm{cov}(\mathbf{Y}) = \mathrm{cov}(\mathbf{X})\,\mathbf{P}. \qquad \mathrm{(A.3)}$$

Now, if P is written as $d$ column vectors:

$$\mathbf{P} = [\mathbf{P}_1, \mathbf{P}_2, \ldots, \mathbf{P}_d] \qquad \mathrm{(A.4)}$$

and cov(Y) as a diagonal matrix:

$$\mathrm{cov}(\mathbf{Y}) = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_d \end{pmatrix}. \qquad \mathrm{(A.5)}$$

Thus, by substituting into equation A.3, we obtain:

$$[\lambda_1\mathbf{P}_1, \lambda_2\mathbf{P}_2, \ldots, \lambda_d\mathbf{P}_d] = [\mathrm{cov}(\mathbf{X})\mathbf{P}_1, \mathrm{cov}(\mathbf{X})\mathbf{P}_2, \ldots, \mathrm{cov}(\mathbf{X})\mathbf{P}_d] \qquad \mathrm{(A.6)}$$

which is an eigenvalue equation, such that $\lambda_i\mathbf{P}_i = \mathrm{cov}(\mathbf{X})\mathbf{P}_i$. So, by finding the eigenvectors of the covariance matrix of X, we can obtain a projection matrix, P, which satisfies the desired constraints.

Now we have an orthogonal basis. The covariance matrix is diagonal, so each pattern varies only with itself, and is responsible for the portion of the total variance given by its eigenvalue, $\lambda_i$. Hence, if we arrange the eigenvalues in order of decreasing magnitude, the first few principal components describe most of the variance in the original dataset.
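The procedure above maps directly onto a few lines of code. The following sketch (in Python with numpy; the variable names are illustrative, the data are assumed real-valued, and the mean and trend are assumed to have been removed already) computes the eigendecomposition and the principal component time series:

    import numpy as np

    def eofs(X):
        # X: m (space) by n (time) array, mean and trend removed.
        n = X.shape[1]
        cov = (X @ X.T) / (n - 1)           # spatial covariance matrix, cov(X)
        eigvals, P = np.linalg.eigh(cov)    # columns of P are the EOFs
        order = np.argsort(eigvals)[::-1]   # sort by decreasing explained variance
        eigvals, P = eigvals[order], P[:, order]
        Y = P.T @ X                         # principal components, Y = P^T X
        return eigvals, P, Y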

[Figure A.1: An illustration of the outputs from a Principal Component Analysis on a space by time input matrix: a matrix of loadings (spatial points by EOFs), a diagonal matrix of amplitudes, and a matrix of principal components (EOFs by time).]

The loadings of a principal component analysis are the projections of the principal components onto the original dataset. Hence, by multiplying together the loading matrix, the eigenvalues and the principal component matrix, we regain the original dataset (or an approximation of it, if the set of EOFs has been truncated). This process is illustrated in figure A.1.
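Continuing the illustrative sketch above (reusing the hypothetical eofs() function), a truncated reconstruction might read:

    eigvals, P, Y = eofs(X)
    k = 5                           # illustrative truncation level
    X_approx = P[:, :k] @ Y[:k, :]  # approximation retaining the first k EOFs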

Note that this process can also be performed with the cross-correlation matrix of the input dataset, where the time series for each spatial point is normalised before the analysis. Each physical situation must be carefully considered to decide which property is the most applicable; for example, Ambaum et al. (2001) show how the dominant pattern of variability changes from the Arctic Oscillation to the North Atlantic Oscillation depending on which method is used.

Finally, physical meaning can only be extracted from EOFs whose eigenvalues are well separated (and even then it is not guaranteed). North et al. (1982) investigated the error associated with the sampling of EOFs, and found that if some eigenvalues are close together, they may form a 'multiplet' in which the EOFs inferred from the analysis are linear combinations of the true modes of the system. To separate such patterns, a method of rotation must be employed, such as the Varimax algorithm (Kaiser, 1958).

Appendix B

Neural Network Architecture

We employ an Artificial Neural Network (ANN) to emulate the response of the climate model output. We will briefly discuss the theory of neural network architecture and training here, but a full discussion of the topic is given in Hagan et al. (1996).

The network employed is a two layer, feed-forward ANN, illustrated in figure B.1.

The elements of the input vector, $p_{il}$, consist of the independent perturbed parameter set associated with each model, $i$. The parameters are listed in table 3.2. The input to the neural network is a vector of $P$ elements which defines the parameter set for any given model in the ensemble.

The output vector, $a^{2}$, is the quantity we wish to predict. In the first instance, this may be a single value: the model's Climate Sensitivity, $S_i$. However, later we extend the analysis to predict the set of $K$ EOF amplitudes, $w_{ik}$ (defined earlier).

In order for the ANN to best approximate a function linking these quantities, it is separated into layers. The input for the first 'hidden' layer is the set of model parameters, $p_{il}$. Each layer consists of a number of neurons, $N$. The input data for each neuron in the first layer is first weighted by $\omega^{1}_{nl}$ and added to a bias, $b^{1}_{n}$. The output of each neuron, $a^{1}_{n}$, is the hyperbolic tangent of the weighted input:

$$a^{1}_{n} = \tanh\left(b^{1}_{n} + \sum_{l=1}^{P}\omega^{1}_{nl}\,p_{l}\right). \qquad \mathrm{(B.1)}$$


Figure B.1: A diagram illustrating the flow of information through a two layer feed- forward neural network. The input vector has two elements. The first layer is a ‘tansig’ type, with two neurons and the second is a ‘purelin’ type with one neuron.

Deciding on the number of neurons in the hidden layer is a balance between having enough to correctly capture the complexity of the underlying function, but not so many as to over-fit the data. In this work, each input parameter is an independent variable, so we choose the number of neurons in the hidden layer to be equal to the number of parameters, $P$.

Here, we will consider the case of a network with a single output. The second layer is referred to as the linear layer. Its input is the collection of outputs, $a^{1}_{n}$, from the hidden layer. Each input is weighted by $\omega^{2}_{n}$ and the result is biased by $b^{2}$ to produce the final output of the network, $a^{2}$:

$$a^{2} = b^{2} + \sum_{n=1}^{N}\omega^{2}_{n}\,a^{1}_{n}. \qquad \mathrm{(B.2)}$$

To train the network to correctly emulate the output of the model ensemble, we employ the Levenberg-Marquardt learning algorithm (Hagan and Menhaj, 1994). The backpropagation algorithm is a gradient descent optimisation, in which the algorithm is provided with a set of examples of 'proper' network behaviour. In this case, the training set is provided by 60% of the available models in the ensemble, totalling Q (about 4,000) examples:

$$[\mathbf{p}_1, t_1], [\mathbf{p}_2, t_2], \ldots, [\mathbf{p}_Q, t_Q], \qquad \mathrm{(B.3)}$$

where $\mathbf{p}_q$ is the parameter set, and $t_q$ is the corresponding target output. The algorithm should adjust the network weights and biases in order to minimise the sum squared error between output $a$ and target $t$:

$$F(\mathbf{x}) = \sum_{q=1}^{Q}(t_q - a_q)^{2}. \qquad \mathrm{(B.4)}$$
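For concreteness, the forward pass of equations B.1 and B.2 and the error function of equation B.4 can be sketched in a few lines of Python (a minimal illustration with assumed array shapes, not the implementation used in the thesis):

    import numpy as np

    def forward(p, W1, b1, w2, b2):
        # p: (P,) parameters; W1: (N, P); b1: (N,); w2: (N,); b2: scalar.
        a1 = np.tanh(b1 + W1 @ p)   # hidden 'tansig' layer, equation B.1
        return b2 + w2 @ a1         # linear output layer, equation B.2

    def sum_squared_error(params, targets, W1, b1, w2, b2):
        # Equation B.4, summed over the Q training examples.
        return sum((t - forward(p, W1, b1, w2, b2)) ** 2
                   for p, t in zip(params, targets))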

This is achieved by making the stochastic assumption that the sum squared error can be replaced with the error on the latest target in the kth iteration:

$$\hat{F}(\mathbf{x}) = (t(k) - a(k))^{2}. \qquad \mathrm{(B.5)}$$

We solve first for the weights and biases in the linear layer. The iteration may be expressed as follows:

$$\omega^{2}_{n}(k+1) = \omega^{2}_{n}(k) - \alpha\frac{\partial\hat{F}}{\partial\omega^{2}_{n}}, \qquad \mathrm{(B.6)}$$

$$b^{2}(k+1) = b^{2}(k) - \alpha\frac{\partial\hat{F}}{\partial b^{2}}, \qquad \mathrm{(B.7)}$$

where $\alpha$ is the learning rate of the network. Hence, it remains to calculate the partial derivatives. We can use the chain rule to express them as a product of derivatives of the inputs to the linear layer:

$$\frac{\partial\hat{F}}{\partial\omega^{2}_{n}} = \frac{\partial\hat{F}}{\partial i^{2}}\times\frac{\partial i^{2}}{\partial\omega^{2}_{n}}, \qquad \mathrm{(B.8)}$$

$$\frac{\partial\hat{F}}{\partial b^{2}} = \frac{\partial\hat{F}}{\partial i^{2}}\times\frac{\partial i^{2}}{\partial b^{2}}. \qquad \mathrm{(B.9)}$$

The input $i^{2}$ can then be expressed in terms of the outputs of the hidden layer:

$$i^{2} = b^{2} + \sum_{n=1}^{N}\omega^{2}_{n}\,a^{1}_{n}. \qquad \mathrm{(B.10)}$$

Thus the right-hand derivatives in equations B.8 and B.9 may be expressed easily:

$$\frac{\partial i^{2}}{\partial\omega^{2}_{n}} = a^{1}_{n}, \qquad \frac{\partial i^{2}}{\partial b^{2}} = 1. \qquad \mathrm{(B.11)}$$

The left-hand derivatives are known as the sensitivities of the final error to the input of the linear layer, $i^{2}$. The weighted input to the linear layer, $i^{2}$, is the same as the final network output, $a(k)$, so the derivative can be calculated from equation B.5:

$$s^{2} = \frac{\partial\hat{F}}{\partial i^{2}} = -2(t(k) - a(k))\frac{\partial a(k)}{\partial i^{2}} = -2(t(k) - a(k)). \qquad \mathrm{(B.12)}$$

The calculation of weights and biases in the hidden layer is much the same, but the sensitivities must be back-propagated through the network. Given that there are $N$ inputs to the hidden layer, there are $N$ sensitivities, which are calculated with reference to $s^{2}$:

$$s^{1}_{n} = \frac{\partial\hat{F}}{\partial i^{1}_{n}} = \frac{\partial\hat{F}}{\partial i^{2}}\times\frac{\partial i^{2}}{\partial i^{1}_{n}} = s^{2}\,\omega^{2}_{n}\,\frac{\partial\tanh(i^{1}_{n})}{\partial i^{1}_{n}}. \qquad \mathrm{(B.13)}$$
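Equations B.5 to B.13 amount to a single stochastic gradient-descent step, sketched below (again purely illustrative, reusing the names of the previous sketch; note that the full Levenberg-Marquardt algorithm also incorporates an approximate second-order term not shown here):

    def backprop_step(p, t, W1, b1, w2, b2, alpha=0.01):
        a1 = np.tanh(b1 + W1 @ p)          # hidden layer outputs
        a2 = b2 + w2 @ a1                  # network output: a(k) = i^2
        s2 = -2.0 * (t - a2)               # linear-layer sensitivity, equation B.12
        s1 = s2 * w2 * (1.0 - a1 ** 2)     # hidden sensitivities, B.13 (tanh' = 1 - tanh^2)
        w2 = w2 - alpha * s2 * a1          # equation B.6
        b2 = b2 - alpha * s2               # equation B.7
        W1 = W1 - alpha * np.outer(s1, p)  # hidden weights, by the same chain rule
        b1 = b1 - alpha * s1               # hidden biases
        return W1, b1, w2, b2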

Thus, the weights and biases in the network are adjusted by iterating through all ensemble members. To avoid over-fitting the data, we use a technique known as early stopping (Prechelt, 1998), which involves keeping a proportion of the ensemble as a verification set. On each iteration, the network's summed prediction error over the verification set is computed. As the network is trained, this error will decrease, stabilising when the network is fully trained, and will actually begin to increase when the network starts to over-fit the training set. Hence the training process is stopped when the verification error ceases to improve from one iteration to the next.
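An early-stopping loop of this kind might be sketched as follows (train_epoch and verification_error are hypothetical helpers standing in for the updates described above):

    best_err, patience = np.inf, 0
    while patience < 5:  # stop after 5 iterations without improvement
        W1, b1, w2, b2 = train_epoch(W1, b1, w2, b2)
        err = verification_error(W1, b1, w2, b2)
        if err < best_err:
            best_err, patience = err, 0
        else:
            patience += 1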

The error associated with the prediction process may be checked at this point by measuring the prediction error for the verification set.

Bibliography

Allan, R., A. Pamment, M. Ringer, and A. Slingo, 2001: First annual report on wp4200, Technical report, UK Met Office.

Allen, M., 1999: Do-it-yourself climate prediction. Nature, 401.

Allen, M., D. Frame, J. Kettleborough, and D. Stainforth, 2003: Model error in weather and climate forecasting. Proceedings of the 2002 ECMWF Predictability Seminar.

Allen, M. and W. Ingram, 2002: Constraints on future changes in climate and the hydrologic cycle. Nature, 419, 224–232.

Allen, M. and D. Stainforth, 2002: Towards objective probabilistic climate forecasting. Nature, 419(6903), 228.

Allen, M., P. Stott, J. Mitchell, R. Schnur, and T. Delworth, 2000: Quantifying the uncertainty in forecasts of anthropogenic climate change. Nature, 407(6804), 617–620.

Ambaum, M. H., B. J. Hoskins, and D. B. Stephenson, 2001: Arctic Oscillation or North Atlantic Oscillation. Journal of Climate, 14.

Anderson, D., J. Cobb, E. Korpela, M. Lebofsky, and D. Werthimer, 2002: SETI@home: an experiment in public-resource computing. Communications of the ACM, 45(11), 56–61.

Anderson, T., R. Charlson, S. Schwartz, R. Knutti, O. Boucher, H. Rodhe, and J. Heintzenberg, 2003: Climate forcing by aerosols: a hazy picture. Science, 300, 1103–1104.


Annan, J., 2006: Efficiently constraining climate sensitivity with ensembles of paleoclimate simulations. Geophysical Research Abstracts, 8, 05482.

Annan, J. and J. Hargreaves, 2006: Using multiple observationally-based constraints to estimate climate sensitivity. Geophysical Research Letters, 33.

Bajuk, L. and C. Leovy, 1998: Seasonal and Interannual Variations in Stratiform and Convective Clouds over the Tropical Pacific and Indian Oceans from Ship Observations. Journal of Climate, 11(11), 2922–2941.

Betts, R., 2001: Biogeophysical impacts of land use on present-day climate: near-surface temperature change and radiative forcing. Atmospheric Science Letters, 2(1-4), 39–51.

Bonan, G., D. Pollard, and S. Thompson, 1992: Effects of boreal forest vegetation on global climate. Nature, 359(6397), 716–718.

Bony, S. and J. Dufresne, 2005: Marine boundary layer clouds at the heart of tropical cloud feedback uncertainties in climate models. Geophysical Research Letters, 32(20), 20806–20806.

Cess, R., 1975: Global climate change: An investigation of atmospheric feedback mechanisms. Tellus, 27(3), 193–198.

Cess, R., G. Potter, J. Blanchet, G. Boer, A. Del Genio, et al., 1990: Intercomparison and interpretation of climate feedback processes in 19 atmospheric general circulation models. Journal of Geophysical Research, 95(D10), 16601–16615.

Cess, R. D., 1989: Gauging water-vapour feedback. Nature, 342, 736–737.

—, 1996: Cloud feedback in atmospheric general circulation models: An update. J. Geophys. Res., 101, 12791–12794.

Chou, C. and J. D. Neelin, 1999: Cirrus detrainment-temperature feedback. Geophysical Research Letters, 26(9), 1295–1298.

Collins, M., 2002: Climate predictability on interannual to decadal time scales: The initial value problem. Climate Dynamics, 19(8), 671–682.

Collins, M., B. B. B. Booth, G. R. Harris, J. M. Murphy, D. M. H. Sexton, and M. J. Webb, 2006a: Towards quantifying uncertainty in transient climate change. Climate Dynamics, 27, 127–147.

Collins, W., V. Ramaswamy, M. Schwarzkopf, Y. Sun, R. Portmann, Q. Fu, S. Casanova, J. Dufresne, D. Fillmore, P. Forster, et al., 2006b: Radiative forcing by well-mixed greenhouse gases: Estimates from climate models in the IPCC AR4. J. Geophys. Res, 111.

Colman, R., 2003: A comparison of climate feedbacks in general circulation models. Climate Dynamics, 20(7), 865–873.

Covey, C., K. M. Achutarao, P. J. Gleckler, T. J. Phillips, K. E. Taylor, and M. F. Wehner, 2004: Coupled ocean-atmosphere climate simulations compared with simulations using prescribed sea surface temperature: Effect of a ”perfect ocean”. Global and Planetary Change, 41, 1–14.

Cox, P. M., R. A. Betts, C. D. Jones, S. A. Spall, and I. J. Totterdell, 2000: Acceleration of global warming due to carbon-cycle feedbacks in a coupled climate model. Nature, 408, 184–187.

Dai, A., A. Genio, and I. Fung, 1997: Clouds, precipitation and temperature range. Nature, 386(6626), 665–666.

Dufresne, J. L., P. Friedlingstein, M. Berthelot, L. Bopp, P. Ciais, L. Fairhead, H. Le Treut, and P. Monfray, 2002: On the magnitude of positive feedback between future climate change and the carbon cycle. Geophysical Research Letters, 29(10), 43.

Forest, C. E., M. R. Allen, P. H. Stone, and A. P. Sokolov, 2000: Constraining uncertainties in climate models using climate change detection techniques. Geophysical Research Letters, 27, 569–572.

Forest, C. E., P. H. Stone, A. P. Sokolov, M. R. Allen, and M. D. Webster, 2002: Quantifying uncertainties in climate system properties with the use of recent climate observations. Science, 295, 113–117.

Forster, P. and J. Gregory, 2005: The Climate Sensitivity and Its Components Diagnosed from Earth Radiation Budget Data. Journal of Climate, 19(1), 39–52.

Forster, P., K. Shine, and N. Stuber, 2006: It is premature to include non-CO2 effects of aviation in emission trading schemes. Atmospheric Environment, 40(6), 1117–1121.

Frame, D., B. Booth, J. Kettleborough, D. Stainforth, J. Gregory, M. Collins, and M. Allen, 2005: Constraining climate forecasts: The role of prior assumptions. Geophysical Research Letters, 32(9), 9702–9702.

Giorgi, F. and R. Francisco, 2000: Uncertainties in regional climate change prediction: a regional analysis of ensemble simulations with the HadCM2 coupled AOGCM. Climate Dynamics, 16(2), 169–182.

Gong, W., X. Zhou, and W. Wang, 1994: A diagnostic study of feedback mechanisms in greenhouse effects simulated by NCAR CCM1. Acta Meteorol Sinica, 8, 270–282.

Grabowski, W., 2000: Cloud Microphysics and the Tropical Climate: Cloud-Resolving Model Perspective. Journal of Climate, 13(13), 2306–2322.

Gregory, D. and P. Inness, 1996: Unified model documentation paper no. 27: Convection scheme, Technical report, UK Met Office.

Gregory, D. and D. Morris, 1996: The sensitivity of climate simulations to the specification of mixed phase clouds. Climate Dynamics, 12(9), 641–651.

Gregory, D. and P. Rowntree, 1990: A Mass Flux Convection Scheme with Representation of Cloud Ensemble Characteristics and Stability-Dependent Closure. Monthly Weather Review, 118(7), 1483–1506.

Gregory, J., R. Stouffer, S. Raper, P. Stott, and N. Rayner, 2002: An Observationally Based Estimate of the Climate Sensitivity. Journal of Climate, 15(22), 3117–3121.

Hagan, M. and M. Menhaj, 1994: Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 5(6), 989–993.

Hagan, M. T., H. B. Demuth, and M. H. Beale, 1996: Neural Network design, PWS Publishing.

Hahn, C., S. Warren, and J. London, 1996: Edited synoptic cloud reports from ships and land stations over the globe, 1982-1991. ORNL/CDIAC–77; NDP–026B, Oak Ridge National Lab., TN, USA. Carbon Dioxide Information Analysis Center.

Hall, A. and S. Manabe, 1999: The Role of Water Vapor Feedback in Unperturbed Climate Variability and Global Warming. Journal of Climate, 12(8), 2327–2346.

Hansen, J., M. Allen, D. Stainforth, A. Heaps, and P. Stott, 2001: Casino-21: Climate simulation of the 21st century. World resource review, 13(2), 187.

Hansen, J. and L. Nazarenko, 2004: Soot climate forcing via snow and ice albedos. Proceedings of the National Academy of Sciences, 101(2), 423–428.

Hansen, J., G. Russell, A. Lacis, I. Fung, D. Rind, and P. Stone, 1985: Climate Response Times: Dependence on Climate Sensitivity and Ocean Mixing. Science, 229(4716), 857.

Hansen, J., M. Sato, A. Lacis, R. Ruedy, I. Tegen, and E. Matthews, 1998: Climate forcings in the Industrial era.

Harrison, R. and K. Shine, 1999: A review of recent studies of the influence of solar changes on the Earth's climate. Hadley Centre Technical Note, 6.

Haywood, J. and O. Boucher, 2000: Estimates of the direct and indirect radiative forcing due to tropospheric aerosols: A review. Rev. Geophys, 38(4), 513–543.

Hegerl, G., T. Crowley, W. Hyde, and D. Frame, 2006: Climate sensitivity constrained by temperature reconstructions over the past seven centuries. Nature, 440(7087), 1029–1032.

Hegerl, G., P. Stott, M. Allen, J. Mitchell, S. Tett, and U. Cubasch, 2000: Optimal detection and attribution of climate change: sensitivity of results to climate model differences. Climate Dynamics, 16(10), 737–754.

Held, I. M. and B. J. Soden, 2000: Water vapor feedback and global warming. Annual Review of Energy and Environment, 25, 441–475.

Heymsfield, A., 1977: Precipitation development in stratiform ice clouds: an observational and dynamical study. Journal of the Atmospheric Sciences, 46, 2252–2264.

Hoffert, M. and C. Covey, 1992: Deriving global climate sensitivity from palaeoclimate reconstructions. Nature, 360(6404), 573–576.

Houghton, J., G. Jenkins, and J. Ephraums, eds., 1990: Scientific Assessment of Climate Change., Cambridge University Press.

Hoyt, D. and K. Schatten, 1993: A discussion of plausible solar irradiance variations, 1700-1992. Journal of Geophysical Research, 98(A11), 18,895.

Hsieh, W. W. and B. Tang, 1998: Applying neural network models to prediction and data analysis in meteorology and oceanography. Bull. Am. Meteorol., 79, 1855–1870.

Ingram, W., 1990: Unified model documentation paper no. 23: Radiation, Technical report, UK Met Office.

Kaiser, H. F., 1958: The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.

Kiehl, J., T. Schneider, R. Portmann, and S. Solomon, 1999: Climate forcing due to tropospheric and stratospheric ozone. Journal of Geophysical Research, 104(D24), 31239–31254.

Klein, S. and C. Jakob, 1999: Validation and sensitivities of frontal clouds simulated by the ECMWF model. Monthly Weather Review, 127(10), 2514–2531.

Knight, C., S. Knight, N. Massey, T. Aina, C. Christensen, D. Frame, J. Kettleborough, A. Martin, S. Pascoe, B. Sanderson, et al., 2007: Association of parameter, software, and hardware variation with large-scale behavior across 57,000 climate models. Proceedings of the National Academy of Sciences, 104(30), 12259.


Knutti, R., G. A. Meehl, M. R. Allen, and D. A. Stainforth, 2006: Constraining climate sensitivity from the seasonal cycle in surface temperature. Journal of Climate, 19, 4224–4233.

Knutti, R., T. Stocker, F. Joos, and G. Plattner, 2003: Probabilistic climate change projections using neural networks. Climate Dynamics, 21(3), 257–272.

Lamb, H., 1970: Volcanic Dust in the Atmosphere; with a Chronology and Assessment of Its Meteorological Significance. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 266(1178), 425–533.

Lau, K. and H. Wu, 2003: Warm rain processes over tropical oceans and climate implications. Geophysical Research Letters, 30(24), 7–7.

Lindzen, R. S., M. D. Chou, and A. Y. Hou, 2001: Does the Earth have an adaptive infrared iris? Bull. Amer. Meteorol. Soc., 82, 417–432.

Lorenz, E. N., 1975: The physical basis of climate and climate modeling. Climate Predictability, GARP Publication Series, 16, 132–136.

Marotzke, J. and P. H. Stone, 1995: Atmospheric transports, the thermohaline circulation, and flux adjustments in a simple coupled model. Journal of Physical Oceanography, 25, 1350–1364.

McAvaney, B. and H. Le Treut, 2003: The cloud feedback intercomparison project (CFMIP). CLIVAR Exchanges, supplementary contributions, 26.

McGuffie, K. and A. Henderson-Sellers, 2001: Forty years of numerical climate modelling. International Journal of Climatology, 21, 1067–1109.

Meehl, G., W. Washington, J. Arblaster, and A. Hu, 2004: Factors Affecting Climate Sensitivity in Global Coupled Models. Journal of Climate, 17(7), 1584–1596.

Meehl, G. A., W. M. Washington, and J. M. Arblaster, 2003: Factors affecting climate sensitivity in global coupled climate models. Paper presented at the American Meteorological Society 83rd Annual Meeting, Long Beach, CA.

Mellor, G., 1977: The Gaussian cloud model relations. Journal of Atmospheric Sciences, 34, 1483–1484.

Murphy, J., 1995: Transient response of the Hadley Centre coupled ocean-atmosphere model to increasing carbon dioxide. Part 3: Analysis of global-mean response using simple models. Journal of Climate, 8(3).

Murphy, J., D. Sexton, D. Barnett, G. Jones, M. Webb, M. Collins, and D. Stainforth, 2004: Quantification of modelling uncertainties in a large ensemble of climate change simulations. Nature, 430(7001), 768–772.

Naveau, P., C. Ammann, H. Oh, and W. Guo, 2002: A statistical methodology to extract the volcanic signal in climatic time series. Volcanism and the Earth's Atmosphere, 177–186.

North, G., T. Bell, R. Cahalan, and F. Moeng, 1982: Sampling Errors in the Estimation of Empirical Orthogonal Functions. Monthly Weather Review, 110(7), 699–706.

Palmer, T. N., 2000: Predicting uncertainty in forecasts of weather and climate. Rep. Prog. Phys., 63, 71–116.

Papoulis, A., 1984: Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York.

Petoukhov, V., A. Ganopolski, V. Brovkin, M. Claussen, A. Eliseev, C. Kubatzki, and S. Rahmstorf, 2000: CLIMBER-2: a climate system model of intermediate complexity. Part I: model description and performance for present climate. Climate Dynamics, 16(1), 1–17.

Piani, C., D. Frame, D. Stainforth, and M. Allen, 2005: Constraints on climate change from a multi-thousand member ensemble of simulations. Geophysical Research Letters, 32(23), 23825–23825.

Pielke, R. A., 1998: Climate prediction as an initial value problem. Bull. Amer. Meteorol. Soc., 79, 2743–2746.

Prechelt, L., 1998: Early stopping - but when? Lecture Notes in Computer Science, 1524, 55–70.

Quante, M., 2004: The role of clouds in the climate system. J. Phys. IV France, 121, 61–86.

Ramaswamy, V., O. Boucher, J. Haigh, D. Hauglustaine, J. Haywood, G. Myhre, T. Nakajima, G. Shi, and S. Solomon, 2001: Climate Change 2001: The Scientific Basis. Contribution of working group I to the Third Assessment Report of the Intergovernmental Panel on Climate Change, 8510.

Raper, S., J. Gregory, and R. Stouffer, 2002: The Role of Climate Sensitivity and Ocean Heat Uptake on AOGCM Transient Temperature Response. Journal of Climate, 15(1), 124–130.

Ringer, M., B. McAvaney, N. Andronova, L. Buja, M. Esch, W. Ingram, B. Li, J. Quaas, E. Roeckner, C. Senior, et al., 2006: Global mean cloud feedbacks in idealized climate change experiments. Geophysical Research Letters, 33, L07718.

Roberts, D. and A. Jones, 2004: Climate sensitivity to black carbon aerosol from fossil fuel combustion. J. Geophys. Res, 109, 1–12.

Rodwell, M. and T. Palmer, 2007: Using numerical weather prediction to assess climate models. Quarterly Journal of the Royal Meteorological Society, 133(622A), 129–146.

Roe, G. H. and M. B. Baker, 2007: Why Is Climate Sensitivity So Unpredictable? Science, 318(5850), 629–632.

Rougier, J., 2007: Probabilistic inference for future climate using an ensemble of climate model evaluations. Climatic Change, 81, 247–264.

Rougier, J. and D. Sexton, 2007: Inference in ensemble experiments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 365(1857), 2133–2143.

Schneider, E., B. Kirtman, and R. Lindzen, 1997: Tropospheric Water Vapor and Climate Sensitivity. Journal of the Atmospheric Sciences, 56(11), 1649–1658.

Senior, C. and J. Mitchell, 1993: Carbon Dioxide and Climate. The Impact of Cloud Parameterization. Journal of Climate, 6(3), 393–418.

—, 2000: Time-dependence of climate sensitivity. Geophysical Research Letters, 27(17), 2685–2688.


Senior, C. A., 1999: Comparison of Mechanisms of Cloud-Climate Feedbacks in GCMs. Journal of Climate, 12, 1480–1489.

Simpson, J. and V. Wiggert, 1969: Models of precipitating cumulus towers. Monthly Weather Review, 97(7), 471–489.

Slingo, A., 1989: A GCM parameterisation for the shortwave radiative properties of water clouds. J. Atmos. Sci, 46, 1419–1427.

Smagorinsky, J., S. Manabe, and J. Holloway Jr, 1965: Numerical results from a nine-level general circulation model of the atmosphere. Monthly Weather Review, 93(12), 727–768.

Smith, L., 2002: What can we learn from climate forecasts? Proceedings of the National Academy of Sciences, 99, 2487–2492.

Smith, R., D. Gregory, C. Wilson, and A. Bushell, 1997: Calculation of saturated specific humidity and large-scale cloud, Technical report, UK Met Office.

Smith, R., D. Gregory, J. Mitchell, A. Bushell, and D. Wilson, 1998: UM Documentation No. 26: Large Scale Precipitation, Technical report, UK Met Office.

Smith, R. N. B., 1990: A scheme for predicting layer clouds and their water content in a general circulation model. Quarterly Journal of the Royal Meteorological Society, 116(492), 435–460.

Soden, B., A. Broccoli, and R. S. Hemler, 2004: Use of cloud forcing to estimate cloud feedback. Journal of Climate, 17(19), 3661–3665.

Soden, B. and I. Held, 2006: An Assessment of Climate Feedbacks in Coupled Ocean–Atmosphere Models. Journal of Climate, 19(14), 3354–3360.

Soden, B., R. Wetherald, G. Stenchikov, and A. Robock, 2002: Global Cooling After the Eruption of Mount Pinatubo: A Test of Climate Feedback by Water Vapor. Science, 296(5568), 727.

Sokolov, A. P. and P. H. Stone, 1998: A flexible climate model for use in integrated assessments. Clim. Dyn., 14, 291–303.

Solomon, S., D. Qin, M. Manning, M. Marquis, K. Averyt, M. M. Tignor, J. H. LeRoy Miller, and Z. Chen, eds., 2007: Climate Change 2007: The Physical Science Basis, Cambridge University Press.

Specht, D., 1990: Probabilistic neural networks and the polynomial Adaline as complementary techniques for classification. IEEE Transactions on Neural Networks, 1(1), 111–121.

Stainforth, D., T. Aina, C. Christensen, M. Collins, N. Faull, D. Frame, J. Kettleborough, S. Knight, A. Martin, J. Murphy, et al., 2005: Uncertainty in predictions of the climate response to rising levels of greenhouse gases. Nature, 433(7024), 403–406.

Stone, D., M. Allen, and P. Stott, 2007: A Multimodel Update on the Detection and Attribution of Global Surface Warming. Journal of Climate, 20(3), 517–530.

Stothers, R., 1996: Major optical depth perturbations to the stratosphere from volcanic eruptions: Pyrheliometric period, 1881-1960. Journal of Geophysical Research, 101(D2), 3901–3920.

Stott, P. and J. R. Kettleborough, 2002: Origins and estimates of uncertainty in predictions of twenty-first century temperature rise. Nature, 416, 723–725.

Sundqvist, H., 1978: A parametrization scheme for non-convective condensation including prediction of cloud water content. Quarterly Journal of the Royal Meteorological Society, 104, 677–690.

Tebaldi, C., R. Smith, D. Nychka, and L. Mearns, 2006: Quantifying Uncertainty in Projections of Regional Climate Change: A Bayesian Approach to the Analysis of Multimodel Ensembles. Journal of Climate, 18(10), 1524–1540.

Tselioudis, G., A. DelGenio, W. Kovari Jr, and M. Yao, 1998: Temperature Dependence of Low Cloud Optical Thickness in the GISS GCM: Contributing Mechanisms and Climate Implications. Journal of Climate, 11(12), 3268–3281.

Vogl, T., J. Mangis, J. Rigler, W. Zink, and D. Alkon, 1988: Accelerating the convergence of the backpropagation method. Biological Cybernetics, 59, 257–263.

Watterson, I., M. Dix, and R. Colman, 1999: A comparison of present and doubled CO2 climates and feedbacks simulated by three general circulation models. Journal of Geophysical Research-Atmospheres, 104(D2), 1943–1956.

Weare, B., 1997: Comparison of NCEP–NCAR Cloud Radiative Forcing Reanalyses with Observations. Journal of Climate, 10(9), 2200–2209.

Webb, M., C. Senior, D. Sexton, W. Ingram, K. Williams, M. Ringer, B. McAvaney, R. Colman, B. Soden, R. Gudgel, et al., 2006: On the contribution of local feedback mechanisms to the range of climate sensitivity in two GCM ensembles. Climate Dynamics, 27(1), 17–38.

Wetherald, R. and S. Manabe, 1986: An investigation of cloud cover change in response to thermal forcing. Climatic Change, 8(1), 5–23.

—, 1988: Cloud feedback processes in the general circulation model. Journal of the Atmospheric Sciences, 45(8), 1397–1415.

Williams, K., M. Ringer, and C. Senior, 2003: Evaluating the cloud response to climate change and current climate variability. Climate Dynamics, 20(7), 705–721.

Wood, R. and P. R. Field, 2000: Relationships between total water, condensed water and cloud fraction in stratiform clouds examined using aircraft data. J. Atmos. Sci., 57, 1888–1905.

Wu, X., 2001: Effects of Ice Microphysics on Tropical Radiative–Convective–Oceanic Quasi-Equilibrium States. Journal of the Atmospheric Sciences, 59(11), 1885– 1897.

Yao, M. and A. Del Genio, 1999: Effects of Cloud Parameterization on the Simulation of Climate Changes in the GISS GCM. Journal of Climate, 12(3), 761–779.

Yu, H., Y. Kaufman, M. Chin, G. Feingold, L. Remer, T. Anderson, Y. Balkanski, N. Bellouin, O. Boucher, S. Christopher, et al., 2006: A review of measurement-based assessments of the aerosol direct radiative effect and forcing. Atmos. Chem. Phys, 6, 613–666.

Zhang, G. and N. McFarlane, 1995: Sensitivity of climate simulations to the parameterization of cumulus convection in the Canadian Climate Centre general circulation model. Atmos.–Ocean, 33, 407–446.

Zhang, M., J. Kiehl, and J. Hack, 1996: Cloud-radiative Feedback as produced by different parameterizations of cloud emissivity in CCM2. Climate Sensitivity to Radiative Perturbations: Physical Mechanisms and their Validation. NATO ASI Series, 1, 34.