
INTERNATIONAL JOURNAL OF CLIMATOLOGY Int. J. Climatol. 25: 581–610 (2005) Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/joc.1143

TOWARDS ICE-CORE-BASED SYNOPTIC RECONSTRUCTIONS OF WEST ANTARCTIC CLIMATE WITH ARTIFICIAL NEURAL NETWORKS

DAVID B. REUSCH,a,* BRUCE C. HEWITSONb and RICHARD B. ALLEYa
a Department of Geosciences and EMS Environmental Institute, The Pennsylvania State University, University Park, PA 16802, USA
b Department of Environmental and Geographical Sciences, University of Cape Town, Private Bag, Rondebosch 7701, South Africa

Received 9 March 2004; Revised 22 August 2004; Accepted 12 November 2004

ABSTRACT Ice cores have, in recent decades, produced a wealth of palaeoclimatic insights over widely ranging temporal and spatial scales. Nonetheless, interpretation of ice-core-based climate proxies is still problematic due to a variety of issues unrelated to the quality of the ice-core data. Instead, many of these problems are related to our poor understanding of key transfer functions that link the atmosphere to the ice. This study uses two tools from the field of artificial neural networks (ANNs) to investigate the relationship between the atmosphere and surface records of climate in West Antarctica. The first, self-organizing maps (SOMs), provides an unsupervised classification of variables from the mid-troposphere (700 hPa temperature, geopotential height and specific humidity) into groups of similar synoptic patterns. An SOM-based climatology at annual resolution (to match ice-core data) has been developed for the period 1979–93 based on the European Centre for Medium-Range Weather Forecasts (ECMWF) 15-year reanalysis (ERA-15) dataset. This analysis produced a robust mapping of years to annual-average synoptic conditions as generalized atmospheric patterns or states. Feed-forward ANNs, our second ANN-based tool, were then used to upscale from surface data to the SOM-based classifications, thereby relating the surface sampling of the atmosphere to the large-scale circulation of the mid-troposphere. Two recorders of surface climate were used in this step: automatic weather stations (AWSs) and ice cores. Six AWS sites provided 15 years of near-surface temperature and pressure data. Four ice-core sites provided 40 years of annual accumulation and major ion chemistry. Although the ANN training methodology was properly designed and followed standard principles, limited training data and noise in the ice-core data reduced the effectiveness of the upscaling predictions. Despite these shortcomings, which might be expected to preclude successful analyses, we find that the combined techniques do allow ice-core reconstruction of annual-average synoptic conditions with some skill. We thus consider the ANN-based approach to upscaling to be a useful tool, but one that would benefit from additional training data. Copyright © 2005 Royal Meteorological Society.

KEY WORDS: ice cores; synoptic reconstruction; artificial neural networks; self-organizing maps; West Antarctica

1. INTRODUCTION

This work seeks to use ice-core proxy datasets to reconstruct 40 years (1954–93) of West Antarctic annual climate as seen in the mid-tropospheric circulation, using the well-known but poorly understood link between the atmosphere and ice cores. Ice-core proxy data are calibrated to atmospheric circulation data for the period 1979–93 and then used to predict the latter for the period 1954–78. As an additional test of our method, automatic weather station (AWS) data are also used for a limited reconstruction in the 1979–93 period. As described in further detail in the following sections, our approach to this problem consists of four steps.

1. Simplify the atmosphere: self-organizing maps (SOMs) extract patterns of variability by doing a classification into generalized states. These patterns are useful both for studying the recent atmosphere and as components of the subsequent reconstruction.

* Correspondence to: David B. Reusch, Department of Geosciences and EMS Environmental Institute, The Pennsylvania State University, University Park, PA 16802, USA; e-mail: [email protected]


2. Link ice-core proxy and AWS datasets to the atmosphere: a feed-forward artificial neural network (FF ANN) is trained to predict the patterns from (1) using either ice-core or AWS data. This step calibrates the upscaling tool.

3. Reconstruct earlier climate: ice-core data outside the calibration period are used with the trained FF ANN to predict the associated atmospheric patterns for the rest of the ice-core period. Given sufficient confidence in these predictions, they would be used to develop a full time series of reconstructed climate.

4. Evaluate the methodology: confidence in a climate reconstruction is tied to the data and steps involved in creating it. We thus evaluate the reliability of the SOM-based analysis and FF ANN-based upscaling steps and comment on issues associated with the ice-core data that reduce skill in the upscaling step.

1.1. Ice cores and climate

Decades of research have shown ice cores to be extremely valuable records of the Earth's climate from subannual to millennial time scales and beyond (e.g. Wolff and Peel, 1985; Legrand et al., 1988; Mayewski et al., 1988, 1997; Zielinski et al., 1994; White et al., 1999). As with marine sediment cores, tree rings and other climate proxies, interpretation of the ice-core record of palaeoclimate is not always straightforward. In many cases, although the data are of unquestionably high quality and temporal resolution, our poor knowledge of the relevant transfer functions can greatly reduce the value of the record. This is a recognized problem (Waddington, 1996) and, although process studies and field work have helped greatly in some areas (e.g. in improving our understanding of how δ¹⁸Oice records temperature), there are still many gaps in the knowledge we need to understand the proxy records fully. This is particularly true in the Antarctic, where direct observational data for the atmosphere are hard to obtain and, when available, tend to be relatively short (by climatological standards) and spatially limited.

Ice cores record many different aspects of the climate system, sometimes in multiple ways. Each proxy captures one or more climate features in a way that will likely differ in both space and time to varying degrees. For example, the relationship between δ¹⁸Oice and temperature can be different at different places and times (Alley and Cuffey, 2001; Jouzel et al., 2003) as other influences on δ¹⁸Oice vary in their relative effect. Furthermore, many proxies are only captured during precipitation events. This can lead to biases in the proxy when precipitation is seasonally variable. For example, if wet deposition is the dominant capture process for a chemical species and snow only falls during the summer, then the ice-core record for that species will be biased towards a picture of the summer atmosphere. Unfortunately, the subannual character of precipitation is typically not very well known at West Antarctic ice-core sites, and we are often forced to assume that snow falls uniformly throughout the year. Subannual sampling is still possible, and indeed necessary to reconstruct annual cycles of chemical species, but it must be remembered that these data are projected onto an underlying assumption about uniform snowfall. Thus, unless detailed subannual process data are available (e.g. Kreutz et al., 1999), we are limited to studies of ice-core proxies at annual resolution. High-resolution meteorological data (reanalysis and/or observational) provide a means to study relationships between ice-core proxies and the atmosphere over subannual intervals, but we remain limited to annual (or possibly semiannual)-resolution climate reconstructions from the proxies.

1.2. The meteorological record and reanalysis datasets The best meteorological datasets in the Antarctic are typically from two areas: the coastal stations, such as McMurdo, Mawson and Halley Bay, and the two long-term interior plateau stations, South Pole and Vostok (Figure 1). Records from elsewhere in the Antarctic interior are limited, with few exceptions, to AWSs and short-term data collection during traverses and ice-core drilling operations (e.g. Siple Dome). The latter often only represent the summer field season (when the sites are occupied). AWSs provide year-round data, apart from instrument problems, but only measure the near-surface environment in a limited manner. The AWS network, nonetheless, provides an invaluable sampling of the West Antarctic atmosphere. The shortage of direct observational data has, in turn, increased the importance and utility of numerical forecast/data assimilation/analysis products for Antarctic climate research. The two most widely used datasets of this type are from the National Centers for Environmental Prediction–National Center for Atmospheric


Figure 1. Site map showing AWS (solid circles) and ice-core sites (squares) of this study, and other sites (open circles) mentioned in the text. CWA (central West Antarctica) collectively describes four ice-core sites (A, B, C, D) published in Reusch et al. (1999)

Research (NCEP–NCAR) in the USA and the European Centre for Medium-Range Weather Forecasts (ECMWF) in the UK. Both forecast models have problems in the Antarctic (e.g. Bromwich et al., 1995; Genthon and Braun, 1995; Cullather et al., 1997), and the shortage of observations tends to produce analyses that resemble the forecast more closely than in areas with more available observations. Nonetheless, despite their shortcomings (e.g. Marshall, 2002; Bromwich and Fogt, 2004), these products are still much better options than having only the observational data. ECMWF and NCEP–NCAR have each produced so-called reanalysis versions of their model predictions (Kalnay et al., 1996; ECMWF, 2000; Kistler et al., 2001). A reanalysis uses one version of the forecast model for the duration of the study period and thus removes changes to the model as a source of changes in the forecasts. Other factors, such as addition and removal of observational data over time, remain as variables affecting the skill of the reanalyses, but these are external to the model. In short, the reanalysis datasets provide a realistic, 6 h picture of the atmosphere with reasonable horizontal and vertical resolution in an otherwise data-sparse region.

1.3. Classification and upscaling: towards synoptic reconstructions

The focus of our study has thus been the link between annually resolved ice-core proxies, e.g. major ion chemistry and accumulation, and the annually resolved atmosphere, using two tools from the field of ANNs.


First, SOMs (Kohonen, 1990, 1995) provide a classification of meteorological variables from the mid-troposphere (from the ECMWF 15-year reanalysis, ERA-15) into groups of similar synoptic patterns. SOMs have been used for climate downscaling (Crane and Hewitson, 1998), as a mechanism for climate classification (Cavazos, 1999, 2000), and for examining changes in climate over time (Hewitson and Crane, 2002), all in mid-latitude settings. In this study, we have developed an SOM-based climatology of mid-tropospheric variables for annually averaged ERA-15 data. Second, FF ANNs allow us to find a possibly nonlinear relationship between a set of predictors (e.g. upper air variables) and a set of targets (e.g. AWS surface observations). In particular, we have used FF ANNs to upscale from AWS and ice-core predictor data to SOM-based synoptic classification targets. In this way, past atmospheric conditions can be predicted from proxy data with a confidence based on the quantity and nature of the training data. Unfortunately, the short training period (15 years) and limited spatial extent of the selected ice-core data appear to prohibit a simple, deterministic reconstruction from high-confidence, well-defined predictions. Instead, a more probabilistic approach is needed to make up for the shortcomings in these datasets. Pending development of a more robust methodology and improved datasets, we are limited to an evaluation of what can be done with the current data.

In Section 2 we describe the AWS, ECMWF and ice-core datasets. An overview of the ANN tools used is given in Section 3 (also see Hewitson and Crane (2002) and Reusch and Alley (2002)). Section 4 presents the SOM-based annual synoptic climatology and the upscaling results/atmospheric reconstruction from AWS and ice-core data. Issues related to the methodologies and input data are covered in Section 5.

2. DATA

2.1. ECMWF data

The ECMWF 15-year reanalysis data product (ERA-15) provided global-scale meteorological data for the period 1979–93 (fully described in Gibson et al. (1999)). The original ERA-15 production system used spectral T106 resolution with 31 vertical hybrid levels (terrain following and higher resolution in the lower troposphere, lower resolution in the stratosphere). The lower resolution product used here was derived from the production system and provides 2.5° horizontal resolution for the surface and 17 upper air pressure levels. Six-hourly data are available at 0, 6, 12 and 18 UTC. Annual averages of the 6-h data were normalized to the 1979–93 baseline (by respectively subtracting and dividing by the full dataset mean and standard deviation) prior to the SOM analysis (Section 3.1); a schematic sketch of this step is given at the end of this subsection. Figure 2 is an example of the annual-average 700 hPa temperatures used.

Potential problems have been noted with ECMWF (re)analysis data over Antarctica. (Comments in this section refer to ERA-15 and operational analyses prior to development of the ERA-40 dataset, which was unavailable for this project. Some of these issues have been resolved in ERA-40, but other problems remain, many of which are related simply to the lack of observational data in this region prior to the satellite era (Bromwich and Fogt, 2004).) The first relates to the flawed surface elevation dataset used by ECMWF for this region (Genthon and Braun, 1995). Elevation errors exceeding 1000 m exist in some areas, such as Queen Maud Land (e.g. Genthon and Braun, 1995: figure 3). Topography in West Antarctica is generally much better, but errors from outside our study area will still have an influence on the reanalysis data (e.g. an elevation error for Vostok station has broad effects on geopotential heights). The horizontal resolution of the model also introduces unavoidable elevation errors in areas where the relief is high relative to grid spacing (e.g. Bromwich and Fogt, 2004). The ECMWF model also suffers from two issues affecting skill in the near-surface region: relatively low vertical resolution near the surface (even with the hybrid levels) and specification of ice-shelf regions as permanent ice. Both lead to possible errors in the surface energy balance due to unresolved katabatic flows and incorrect physics over the ice-shelf regions (Bromwich et al., 2004). The latter leads to large errors in ERA-15 surface temperatures when compared with available AWS data (Reusch and Alley, 2002, 2004).


Figure 2. Annual average 700 hPa temperature for ERA-15 period (1979–93) as grid-point anomalies from the grid-wide average after normalizing with the full-period, grid-wide average and standard deviation. Temperature generally decreases poleward; thus, darkest shades represent warmest (coldest) temperatures in the north (south). Zero contour is in bold and values close to zero are not shaded

Evaluations of several operational products (e.g. Bromwich et al., 1995, 2000; Cullather et al., 1998) and discussions with experienced polar meteorologists (D. Bromwich, J. Turner, personal communications) suggest that the ECMWF analyses are the best datasets currently available for Antarctica (see also Bromwich et al. (1998)), although the ECMWF 40-year reanalysis (ECMWF, 2001) may set a new standard when it is readily available.
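Purely to fix ideas, the following is a minimal Python sketch of the baseline normalization described above, run on synthetic stand-in data; the array shapes and variable names are ours, not part of ERA-15 or the original analysis.

```python
import numpy as np

# Synthetic stand-in for a regridded 6-h 700 hPa field: (time steps, grid points).
# 15 years x 1460 six-hourly steps per year; 100 hypothetical grid points.
rng = np.random.default_rng(0)
six_hourly = rng.normal(250.0, 5.0, size=(15 * 1460, 100))

# Annual averages of the 6-h data: one map per year.
annual = six_hourly.reshape(15, 1460, 100).mean(axis=1)

# Normalize to the 1979-93 baseline: subtract and divide by the full-period,
# grid-wide mean and standard deviation (cf. the Figure 2 caption).
annual_norm = (annual - six_hourly.mean()) / six_hourly.std()

# Grid-point anomalies from the grid-wide average of the normalized field.
anomalies = annual_norm - annual_norm.mean()
```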

2.2. AWS data

The main source of direct meteorological data in West Antarctica is the network of AWSs maintained by the University of Wisconsin-Madison since 1980 (Lazzara, 2000). All stations provide near-surface air temperature, pressure, and wind speed and direction; some stations also report relative humidity and multiple vertical temperatures (e.g. for vertical temperature differences). The main instrument cluster is nominally within 3 m above the snow surface; this distance changes with snow accumulation and removal. Pressure is calibrated to ±0.2 hPa with a resolution of approximately 0.05 hPa. Temperature accuracy is 0.25–0.5 °C, with lowest accuracy at −70 °C, i.e. accuracy decreases with decreasing temperature (M. Lazzara, personal communication). The data used here are from the 3 h quality-controlled datasets available at the University of Wisconsin-Madison FTP site (ice.ssec.wisc.edu). A 6 h subset of these data (for 0, 6, 12 and 18 UTC) is used to match ECMWF time steps (see below). The AWSs used in this study are shown in Figure 1 and summarized in Table I.

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) 586 D. B. REUSCH, B. C. HEWITSON AND R. B. ALLEY

[Figure 3: seven panels of monthly time series (0–1 scale, 1979–94), one each for Siple, Byrd, Lettau, Marilyn, Elaine and Ferrell, plus the network average]

Figure 3. Relative proportions of observations (grey) and predictions (white) in AWS temperature and pressure records on a monthly basis. See Table I for site installation dates

Table I. AWS locations and other useful data

Station        Latitude    Longitude    Elevation (m)   Date installed    Distance^a (km)
Byrd Station   80.01 °S    119.40 °W    1530            February 1980     11.5
Elaine         83.13 °S    174.17 °E    60              January 1986      71
Ferrell        77.91 °S    170.82 °E    45              December 1980     49.5
Lettau         82.52 °S    174.45 °W    55              January 1986      8.3
Marilyn        79.95 °S    165.13 °E    75              January 1987      6.1
Siple          75.90 °S    84.00 °W     1054            January 1982^b    103.9

a Distance to the nearest ERA-15 grid point.
b Siple AWS was removed in April 1992.

Two of the sites, Byrd and Ferrell, represent the oldest two AWSs still in operation. Siple was installed at around the same time as these AWSs but was removed in 1992 due to logistical problems. Siple's remoteness from McMurdo-based field support and the high accumulation rates in this region were the main reasons that this station was removed.


The remaining sites were installed in 1986 and 1987. All sites are within the south/southeast Pacific sector of West Antarctica. Figure 3 summarizes the availability of each AWS for the study period (1979–93) as the fraction of observations recorded each month. An average for the suite of AWSs is also shown. In general, availability is either quite high or quite low, with few monthly values in between. The absence of data from Siple in 1985–87 was not directly related to failure of the meteorological instruments but to other factors (primarily power related; C. Stearns, personal communication, 2002). Otherwise, most data loss is related to winter-season failures and the subsequent wait until the austral summer field season for repair.

Because all AWS sites had periods of missing observations (due to failures or simply not being active for the full study period), we have developed an ANN-based technique to supply the missing data (Reusch and Alley, 2002). The AWS records used in this study are thus a merger of observations and predictions based on our technique. Numerous sources are available for ANN theory and practice (e.g. Hewitson and Crane, 1994; Gardner and Dorling, 1998; Haykin, 1999; Demuth and Beale, 2000); thus, we provide only a short overview of our approach here. Briefly, upper air data from ERA-15 provide predictors for available AWS temperature and pressure observations. The training methodology explores numerous predictors, ANN parameters and ensembles of FF ANNs to develop the most skilful network (within the search space). Because of variable (i.e. temperature and pressure) and site differences, a separate ANN is used for each variable at each site (for a total of 12 ANNs) to predict the missing observations from upper air data at the corresponding time steps. The average root-mean-square errors from ANN training (i.e. calibration of the prediction tool) for all six AWS sites were 2.9 °C and 1.9 hPa for monthly average temperature and pressure respectively. The average correlation r of monthly average ANN training predictions and AWS observations was 0.96. The AWS-prediction ANNs were implemented with the MATLAB Neural Network Toolbox (Haykin, 1999; Demuth and Beale, 2000). The climatological/statistical properties of this dataset are described in more detail elsewhere (Reusch and Alley, 2004).
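The gap-filling procedure itself is documented in Reusch and Alley (2002) and used the MATLAB Neural Network Toolbox; the sketch below is only a schematic Python analogue (scikit-learn's MLPRegressor standing in for the original networks), with hypothetical arrays for one site and one variable.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Hypothetical inputs: ERA-15 upper-air predictors at 6-h steps and one AWS
# record (temperature, say) with NaN marking missing observations.
n_times, n_predictors = 2000, 8
era = rng.normal(size=(n_times, n_predictors))
aws = era @ rng.normal(size=n_predictors) + 0.3 * rng.normal(size=n_times)
aws[rng.random(n_times) < 0.25] = np.nan          # simulate instrument outages

have_obs = ~np.isnan(aws)

# One small feed-forward network per variable per site (12 in the study).
net = MLPRegressor(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
net.fit(era[have_obs], aws[have_obs])

# Merge observations with predictions at the missing time steps.
filled = aws.copy()
filled[~have_obs] = net.predict(era[~have_obs])
```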

2.3. Ice-core data

Four shallow firn/ice cores (<90 m) from central West Antarctica (Reusch et al., 1999) provided high-resolution glaciochemical and annual accumulation data for comparison with the ERA-15 data (Figure 1). Each core was originally sampled at high resolution (continuously every 3 cm, or 10–12 samples per year) to capture the annual signals in the major soluble ions of atmospheric chemistry: Na⁺, K⁺, Mg²⁺, Ca²⁺, Cl⁻, NO₃⁻ and SO₄²⁻. Annual averages of the chemistry time series were created from the original subannual-resolution time series for this study. Reusch et al. (1999) provide full details on the dating and development of these records.

A number of other ice-core datasets are available, but most fail one or both requirements for being useful in a study such as this: availability to the community and sufficient overlap with the calibration period (1979–93). For example, the Byrd Station dataset of Langway et al. (1994) is not readily available and has only a 10 year overlap with the calibration period. Overlap is also an issue with the Siple Station cores of Mosley-Thompson et al. (1991). This situation will improve with time as datasets become available from projects such as the US ITASE program (US ITASE Steering Committee, 1996). Future work will incorporate these datasets and others with sufficient overlap with whatever calibration period is used.
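Reducing the subannual chemistry series to annual averages is a simple resampling step; the following sketch uses assumed data structures (a dated Na⁺ series with roughly 11 samples per year), and the uniform-deposition caveat of Section 1.1 applies.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical dated subannual series: each sample carries a calendar year
# assigned during dating (Reusch et al. (1999) describe the real procedure).
years = np.repeat(np.arange(1954, 1994), 11)             # ~11 samples per year
na_conc = rng.lognormal(mean=0.0, sigma=0.5, size=years.size)

# Annual averages: one value per calendar year. Note that this projects the
# samples onto the assumption of roughly uniform deposition through the year.
unique_years = np.unique(years)
annual_na = np.array([na_conc[years == yr].mean() for yr in unique_years])
```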

3. METHODS

Our overall methodology breaks down into two main areas: development of a synoptic classification of the atmospheric circulation using SOMs and training/application of FF ANNs to do upscaling from ice-core proxy and AWS datasets to the classified atmospheric patterns.

3.1. Synoptic classification and SOMs 3.1.1. SOMs. SOMs (Kohonen, 1990, 1995) provide a means to do unsupervised classification of large, multivariate data sets into a fixed number of distinct generalized states or modes. Hewitson and Crane (2002) reviewed SOM applications and issues in climatology, so only a brief introduction will be given here.


SOM analysis effectively quantizes a continuous input space to a finite-state output space (the SOM map). Multidimensional data are projected onto a plane as a rectangular grid (or map) of discrete, generalized states extracted from the data through the training process. The size of the grid (number of states) directly influences the amount of generalization: smaller (larger) maps have fewer (more) available states in the grid, so the final states developed during training will tend to do more (less) generalization of the input. Each map state, or node, is associated with a reference vector that represents the projection from multidimensional space to the two-dimensional SOM space for that state. For example, in an analysis of input containing 10 variables, each reference vector will have length 10. At the start of SOM training, reference vectors are initialized either randomly (distributed across the input data space) or based on the first two principal eigenvectors of the training data (Kohonen et al., 1996). The latter often has the advantage of faster subsequent training. At the end of SOM training, each reference vector represents a generalized state extracted from the input space. States in the SOM grid are usually identified by an (x, y) column, row coordinate pair, where x = 0 to xmax − 1 and y = 0 to ymax − 1.

SOM training is based on iterative adjustment of the reference vectors during SOM training phases. Each phase alternates between mapping input records and adjusting reference vectors. Each input record is matched to the closest reference vector (normally determined via Euclidean distance). The value of the matching reference vector is then adjusted towards the value of the input record by an amount determined by the current learning rate. The learning rate is a dimensionless parameter used to promote stability of the reference vectors during training. The difference between an input record and its closest reference vector is scaled by the learning rate, with the result used to adjust the reference vector. A learning rate of zero makes no adjustments and a value of one applies the complete difference. Typical values in this work range from 0.01 to 0.05.

A key feature of SOMs is that the reference vectors of a neighbourhood of nodes adjacent to the best match are also updated, but to a lesser degree. The size of the neighbourhood and the learning rate (amount of adjustment of the reference vector) are both reduced as training progresses. This process produces vectors representing distinct portions of the input space. Nodes will also be most similar to adjacent nodes that each represent a nearby region of the input space. Similarity between map nodes thus decreases with increasing internode distance. That is, adjacent nodes have the greatest similarity and diagonally opposite corners will have the largest dissimilarity. This is a direct result of the SOM training process (Kohonen, 1995).

In turn, SOM analysis typically involves two logical training phases: ordering and refinement. During the ordering phase, the general shape of the map is determined. The adjustment neighbourhood starts at the value of the larger map dimension (e.g. 5 for a 5 × 3 SOM) so that all nodes will be initially affected. The learning rate starts at a relatively high value (e.g. 0.05, or 5% of the difference between an input record and its closest reference vector is applied to the reference vector) so that changes are initially relatively large.
In the refinement phase (or phases), the initial size of the adjustment neighbourhood and learning rate are reduced (e.g. to 2 and 0.02 respectively) to attempt finer adjustments over smaller subareas. This enables further separation of related nodes from less-related neighbours (assuming a stable configuration does not already exist). Whereas the ordering phase may produce slightly different classifications depending on the number of training iterations, a properly applied refinement phase will lead to a convergent solution.

Once training is complete, each reference vector is an abstraction of a portion of the input data space and each input vector maps to one reference vector, i.e. a node of the SOM map, and the data mapping to that node share similar characteristics. Because of the quantizing nature of the SOM generalization, input data are generally not identical to their mapped reference vectors and a residual difference remains. Depending on the application, the residuals may be useful, e.g. to compare records mapping to the same reference vector. The quantizing process may also lead to SOM nodes that have no mapped input vectors because there are no training data in the region of the input space represented by the SOM node. Such SOM nodes are perfectly valid; they just represent states intermediate between those seen in the training process. It is possible that data not present in the training set may map to one of these nodes in the future. This is an example of the robust nature of the SOM classification process.

A SOM classification differs from more traditional linear analysis in a number of ways that give SOMs additional power over nonlinear datasets.

For example, empirical orthogonal function (EOF) analysis, or principal component analysis (PCA), and its variants, have been widely and successfully used for many years in the climate and atmospheric sciences to simplify large, multivariate datasets for the purposes of interpretation and understanding (e.g. Smith et al., 1996; Mayewski et al., 1997; Sinclair et al., 1997; Reusch et al., 1999; von Storch and Zwiers, 1999; Schneider and Steig, 2002). Because an EOF analysis by definition produces a linear combination of orthogonal variables, it may not always be appropriate to apply this technique to data known to have nonlinear characteristics. The resultant EOF components may not represent realistic groups of variables (Barry and Carleton, 2001). EOF analysis also has other pitfalls less related to nonlinearity, such as sample size issues and determining which components to retain (e.g. North et al., 1982; von Storch and Zwiers, 1999; Barry and Carleton, 2001), that require significant experience with the technique and knowledge of the input data. Although the SOM technique is not free of a learning curve, it does not force the data into orthogonal linear combinations. SOM states are not linearly spaced; rather, they represent an even coverage of the probability density function (PDF) of the input data (Hewitson and Crane, 2002). Distances between SOM states vary with the magnitude of the difference between the generalized patterns of each state. For example, a subset of similar states is likely to be well separated from the remaining states in the grid but be relatively close to one another. SOMs also interpolate into regions of the PDF without input data (Hewitson and Crane, 2002). Lastly, the sum of the reference vectors from an SOM analysis will not reconstruct the original input data, as would a summation of EOF components, because each input record has a residual with its closest matching reference vector. The SOM states are generalized patterns from the input data, not the data themselves.
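The SOM analyses themselves used SOM-PAK; purely to make the training procedure described above concrete, here is a minimal numpy sketch (random initialization, Euclidean best match, a Gaussian-tapered neighbourhood, and linearly decaying learning rate and radius). It is an illustration of the idea, not a reimplementation of SOM-PAK.

```python
import numpy as np

def train_som(data, nx=5, ny=3, iters=5000, lr0=0.05, rad0=5.0,
              ref0=None, seed=0):
    """Minimal SOM trainer. data: (n_records, n_features).
    Returns reference vectors of shape (ny, nx, n_features)."""
    rng = np.random.default_rng(seed)
    if ref0 is None:
        # Random initialization spread across the input data space.
        ref = rng.normal(data.mean(0), data.std(0) + 1e-9,
                         size=(ny, nx, data.shape[1]))
    else:
        ref = ref0.copy()
    gy, gx = np.mgrid[0:ny, 0:nx]
    for t in range(iters):
        frac = t / iters
        lr = lr0 * (1.0 - frac)                 # learning rate decays
        rad = max(rad0 * (1.0 - frac), 1.0)     # neighbourhood radius shrinks
        x = data[rng.integers(len(data))]
        # Best-matching node by Euclidean distance.
        d2 = ((ref - x) ** 2).sum(axis=2)
        by, bx = np.unravel_index(d2.argmin(), d2.shape)
        # Update the winner and its neighbourhood, nearer nodes more strongly.
        gdist = np.hypot(gy - by, gx - bx)
        h = np.exp(-gdist**2 / (2 * rad**2)) * (gdist <= rad)
        ref += lr * h[:, :, None] * (x - ref)
    return ref
```

An ordering run followed by a refinement run then looks like `ref = train_som(X)` and `ref = train_som(X, iters=20000, lr0=0.02, rad0=2.0, ref0=ref)`, mirroring the two-phase training described above.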

3.1.2. Synoptic classification. Three ERA-15 variables from the middle troposphere (700 hPa) were used in the SOM-based analyses: temperature T, geopotential height Z and specific humidity q. The 700 hPa level was selected, somewhat subjectively, to get above ERA-15 near-surface problems and the physical surface for (most of) West Antarctica. (A continent-wide analysis would require us to move to at least 600 hPa, because this is the first pressure level fully above the surface for the full continent.) These three variables were selected to capture the synoptic circulation and moisture transports. (Full characterization of moisture transport, as in Bromwich et al. (2000), also requires the u and v wind components.) Because of the converging lines of longitude in polar regions, and so that grid points represented similar spatial areas, the ERA-15 data were first resampled to an equal-area grid. Both 250 km and 125 km versions of the National Snow and Ice Data Center EASE-Grid (Armstrong and Brodzik, 1995) were tested, with the 125 km grid being used for this study. Grid-scale means and standard deviations were then calculated from 6 h values for annual, semiannual and seasonal time scales for each variable from the regridded data (but only annual data have been fully analysed). Because T, Z and q have widely different mean and extreme values, each variable was standardized to avoid scale problems in the SOM. Point-wise anomalies from the 15-year grid-point means of the standardized values were calculated for each grid point to highlight patterns of variability. Finally, the six variables (anomalies of the mean and standard deviation of T, Z and q) were combined for input to the SOM software; a sketch of this construction follows at the end of this subsection. With each input variable representing spatial data, the SOM grid is effectively a map of maps. Furthermore, since each input record contains six variables, each SOM node is actually six separate maps representing the generalized spatial state for each of the variables. For simplicity, we will typically show only one variable at a time.

The SOM analyses were performed using the SOM-PAK software (Kohonen et al., 1996). We have used three map sizes in this study: 3 × 2, 4 × 3 and 5 × 3. Although this was not the primary goal of this work, the use of multiple sizes allows us to look at how data grouping changes with varying numbers of SOM states available for generalization. Smaller maps have fewer states to which the input data can be mapped, so it is expected that some grouping of input years will occur. The 5 × 3 map has 15 states and matches the size of our input data (15 years); thus, it is possible, if not highly probable, for each year to map to a unique SOM state. That is, it is expected that a 5 × 3 map will only group records with significant similarity, whereas smaller maps will have some degree of forced association due to having fewer states. Thus, the 5 × 3 maps (15 nodes) are preferred for our purposes. Although we are interested in the SOM's ability to generalize, we are also interested in its ability to classify and order without supervision. Even without substantial generalization, a SOM analysis provides useful information about the similarities and differences in the input data through the spatial mapping of the data on the SOM grid.


Each SOM analysis produces a classification of input records (calendar years) grouped by the similarity of their meteorological data (with one or more years per group). AWS and ice-core data from each group were then compared for similarity within groups and differences between groups before being used as predictors of the SOM classifications.
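A sketch of the input-vector construction just described, assuming the regridded annual mean and standard deviation fields are already in hand (shapes and names are illustrative, not the real grid):

```python
import numpy as np

rng = np.random.default_rng(3)
n_years, n_points = 15, 400          # hypothetical 125 km EASE-Grid subset

def standardized_anomalies(field):
    # Standardize the variable to avoid scale problems in the SOM, then take
    # point-wise anomalies from the 15-year grid-point means.
    z = (field - field.mean()) / field.std()
    return z - z.mean(axis=0)

# Annual mean and standard deviation fields for T, Z and q (stand-in data).
names = ["T_mean", "Z_mean", "q_mean", "T_std", "Z_std", "q_std"]
fields = {name: rng.normal(size=(n_years, n_points)) for name in names}

# One input record per year: the six standardized anomaly fields concatenated.
som_input = np.hstack([standardized_anomalies(fields[n]) for n in names])
assert som_input.shape == (n_years, 6 * n_points)
```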

3.1.3. Assessment of results. As with many statistical analysis methods for grouping similar data, including cluster analysis, it can be difficult to determine a 'perfect' classification. Assessment of an SOM classification is both a subjective and a quantitative process, especially with smaller datasets (details of individual mappings are often less important as the size of the input dataset grows). As always, it is important to bear in mind the question being asked of the analysis method when trying to determine whether the method has given a reliable answer. Ideally, the refinement phase produces one convergent solution from dissimilar ordering-phase classifications. In the event that this is not possible, other criteria are required. For example, training might be considered 'done' (subjectively) once mappings to the diagonal corners of the map have stabilized, since these nodes are often the most distinct (although continuing beyond this stage is usually recommended). Sammon maps (Sammon, 1969), which provide two-dimensional projections of multidimensional vectors, are a common method for this type of evaluation. Quantization error, the difference between the input data and the reference vectors, can be useful as a quantitative measure of SOM 'error', but it is unlikely to be the only useful metric. For example, there may be little need to continue reducing the error if the mappings have stabilized across all nodes. Similarly, there is no guarantee that continued efforts to reduce the error will lead to stable mappings if some data are right on the boundary between quantized states. However, in theory, even those data records can be separated successfully if enough time is spent on training, though this is often unnecessary. Examination of residuals provides another quantitative assessment approach.

After much testing and evaluation, we settled on a combination of input-data mapping stability (i.e. whether groups are still changing significantly) and quantization error to determine how much training was enough. With the small number of input records, as few as 1000 iterations were sufficient to provide initial ordering of the input data, although 5000 were preferred. (A run of 5000 iterations typically required only a few minutes to complete on an 867 MHz Apple Macintosh G4 Powerbook laptop.) A refinement run of up to 20 000 iterations often produced useful further separation of the modes; longer run lengths had only minor benefits.
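The two stopping criteria we used are easy to compute once a node mapping is defined; a sketch, reusing the reference-vector layout from the earlier SOM sketch:

```python
import numpy as np

def node_mapping(data, ref):
    """Index of the best-matching node for each record.
    data: (n_records, n_features); ref: (ny, nx, n_features)."""
    flat = ref.reshape(-1, ref.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    return d.argmin(axis=1)

def quantization_error(data, ref):
    """Mean distance from each record to its best-matching reference vector."""
    flat = ref.reshape(-1, ref.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Mapping stability: compare node_mapping() between training checkpoints;
# unchanged mappings plus a flat quantization error suggest training is done.
```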

3.2. Climate upscaling

3.2.1. Overview. FF ANNs were also used for our climate upscaling studies, but with the freely available NevProp package (Goodman, 2002). Climate upscaling uses surface data to predict corresponding SOM-classified synoptic states. In this way, synoptic conditions can be reconstructed from a surface sampling of the atmosphere. More surface sites and greater record length in the calibration period both improve the quality of such reconstructions. Two surface records were used as predictors: AWS observations (six sites, 15 years) and ice-core climate proxies (four sites, 40 years). Each dataset allowed us to evaluate the surface-to-mid-troposphere relationship, but with different levels of dating uncertainty and, thus, different temporal resolutions. AWS observations are effectively perfectly dated records and, thus, support comparisons at up to 6 h resolution. Upscaling from AWS data therefore allows us to test the approach under best-case conditions for a given number of AWS sites. Because of the shortness of the AWS record, however, only very limited testing of data not used in training is possible. As noted previously, ice-core records are generally limited to annual or lower resolution because of assumptions about the rate of deposition and dating uncertainties. This restricted temporal resolution is offset by the potentially much greater length of the ice-core records. Calibration is still done for the 15-year overlap with the ERA-15 data, but upscaling can be done further into the past using the ice-core data. In this study, the four ice cores provided an additional 25 years at all sites, taking the upscaling back to 1954.

3.2.2. Data coding. Training of an upscaling ANN requires an encoding of the predictor and target data. Full details of how this was done are presented in Appendix A. Briefly, the predictor data are simply the measurements at all sites as standardized anomalies from the study period mean (15 years for AWS, 40 years for the ice cores) at each site.

To match the ice-core accumulation data (and the SOM analysis data), the AWS and ice-core chemistry data were averaged to annual values prior to calculating the anomalies. The target SOM classifications were encoded as x, y grid coordinates from the 5 × 3 SOM map (after testing three possible encoding schemes). Because SOM grid coordinate values are predicted as real numbers (our ANNs use floating-point outputs), issues arise regarding how best to convert the predictions to usable values. We have chosen only to map predictions to the closest SOM grid coordinates (based on shortest distance). See Appendix A for further details related to the pros and cons of this approach.
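The snapping step is just a nearest-node search over the 15 grid coordinates; a sketch (the function name is hypothetical):

```python
import numpy as np

def snap_to_grid(pred_xy, nx=5, ny=3):
    """Map real-valued (x, y) ANN outputs to the closest SOM grid coordinate."""
    grid = np.array([(x, y) for y in range(ny) for x in range(nx)], dtype=float)
    pred_xy = np.atleast_2d(np.asarray(pred_xy, dtype=float))
    # Shortest Euclidean distance to any of the nx * ny nodes.
    d = np.linalg.norm(pred_xy[:, None, :] - grid[None, :, :], axis=2)
    return grid[d.argmin(axis=1)].astype(int)

print(snap_to_grid([3.4, 0.8]))      # -> [[3 1]]
```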

3.2.3. Generalization. Because of the limited number of input records (15), further limited by grouping, it is more challenging than typical ANN practice to train an ANN that does not overfit the data. A fraction of the input can, of course, be withheld for validation, but the benefit of this is likely to be outweighed by the severe reduction in the number of training cases. Standard practice would have us withhold ∼30% of the input, which would cut the training set down to 10 or 11 records without necessarily bringing much benefit of generalization skill.

Our approach to avoiding this problem is to generate groups of new input records by adding small amounts of noise to each original record. In this way, the input data set is enlarged with prediction vectors that are close to the original vectors and map to the same targets, thus improving network generalization (Haykin, 1999). A scientific rationale for this approach comes from the hypothesis that any local meteorological measurement is a function of synoptic/regional effects, fixed local effects (e.g. orography) and variable local effects (e.g. soil moisture, in the general case). Upscaling takes a local measurement and maps it to a synoptic/regional value. Because there are local effects not captured by the synoptic/regional state, it is reasonable to suppose that values close to the local value will map to the same state. In this case, we simply add small amounts of normally distributed noise to reproduce this effect. The distance from the original to the new vectors was constrained to be within 0.03 standard deviations to keep the new vectors in a small cluster around the original vector. Obviously, other noise distributions could be used, as well as other approaches to adding the noise, but this approach has the advantage of conceptual simplicity. (A sketch of this augmentation appears at the end of this subsection.)

With the additional input records (10 to 20 for each original record), traditional techniques for robust ANN training again become viable. We have used two widely accepted techniques, cross-validation and bootstrapping, to attempt to improve generalization and avoid overfitting of the training data. Cross-validation splits the input data into training and testing (holdout) subsets. The training subset is used to adjust the network weights iteratively and 'learn' the data as in standard training. The testing subset is used to check whether the ANN has overfit the training data. Increasing errors from the testing subset strongly suggest that training has gone too far and should be stopped. For this reason, this approach is also known as early stopping.

NevProp implements early stopping as a two-phase process. In phase one, multiple ANNs are trained using random splits of the input data based on a user-defined percentage holdout level (e.g. set aside 30% for testing). Training of each ANN continues until the testing error starts to rise, at which point the ANN is saved as the best version for that data split. Phase two uses the complete input dataset for training and the mean error of the ANNs trained in phase one as the target to stop training. With this approach, NevProp determines an unbiased estimate of the mean error value at which to stop training by doing early-stopping training on multiple splits (up to 10) of the input data. The final model (ANN) is produced by phase two.

The second technique for improving generalization, bootstrapping (e.g. Efron and Tibshirani, 1993), is logically a level higher than the early-stopping technique.
NevProp’s version creates a user-defined number of ‘booted’ datasets by sampling with replacement from the original input data (thus, some samples may be replicated and others omitted entirely). Each of these datasets is then used as input for training with early stopping. Results from each booted dataset are then used to adjust overall performance statistics to account for cross-validation training being based on a smaller subset of the input space and, thus, possibly producing overly optimistic performance statistics. Bootstrapping as implemented by NevProp also provides 95% confidence intervals on predicted data.
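A sketch of the noise-based augmentation described above, with hypothetical names; the per-component clip is our simplification of the 0.03-standard-deviation distance constraint.

```python
import numpy as np

def augment(X, y, copies=10, scale=0.03, seed=0):
    """Enlarge the training set with jittered copies of each record.
    X: (n_records, n_predictors) standardized anomalies.
    y: (n_records, 2) SOM grid-coordinate targets (unchanged for the copies,
    on the premise that nearby surface values map to the same state)."""
    rng = np.random.default_rng(seed)
    X_new = [X]
    for _ in range(copies):
        noise = rng.normal(0.0, scale, size=X.shape)
        # Keep each jittered vector close to its original (within ~0.03 sd).
        X_new.append(X + np.clip(noise, -scale, scale))
    return np.vstack(X_new), np.tile(y, (copies + 1, 1))
```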

3.2.4. ANN configurations and ensembles. Table II summarizes the configurations used to predict the SOM classifications from ice-core data.


Table II. Summary of ice-core-based upscaling configurations

Group^a   Hidden nodes^b   Iteration limit (×10⁻³)^c   Predictors
1a        4                10                           Accumulation
1b        4                20                           Accumulation
2         5                20                           Na⁺
3a        4                10                           SO₄²⁻
3b        4                20                           SO₄²⁻
4a        3                10                           Accumulation and Na⁺
4b        3                20                           Accumulation and Na⁺
5a        4                10                           Na⁺, SO₄²⁻
5b        4                20                           Na⁺, SO₄²⁻
6         5                20                           Accumulation, Na⁺, NO₃⁻, SO₄²⁻

a All versions used 10 extra training records per data record, added using the noise methodology (Section 3.2). All versions also used the x, y grid coordinate from the SOM directly as the ANN target. Groups with an alphabetic suffix are results from an ensemble of 20 individual ANNs.
b Number of nodes in the hidden layer of the ANN.
c The maximum iteration count used in training.

To address situations where the early-stopping criteria were not satisfied, and to provide another means to avoid overtraining, two different maximum iteration stopping points were used for each configuration. To test ANN skill further, we also created ANN ensembles by training multiple instances using the same configuration of predictors and hidden nodes and repeating the above training steps. This produced an ensemble of 20 ANNs for each configuration, in which each instance was trained from slightly different starting conditions and with different subsets and orderings of the training data.
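The ensembles are conceptually simple: 20 networks per configuration, each started from different initial weights and data orderings. Below is a schematic sketch with scikit-learn standing in for NevProp (whose two-phase early stopping and bootstrapping are not reproduced here); iteration counts are not directly comparable to the limits in Table II.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_ensemble(X, y, hidden=4, n_members=20):
    """Train n_members FF ANNs on (X, y), where y holds the (x, y) SOM grid
    coordinates; each member differs in initial weights and data splits."""
    members = []
    for seed in range(n_members):
        net = MLPRegressor(hidden_layer_sizes=(hidden,), max_iter=10000,
                           early_stopping=True, validation_fraction=0.3,
                           random_state=seed)
        members.append(net.fit(X, y))
    return members

def ensemble_predict(members, X):
    preds = np.stack([m.predict(X) for m in members])
    # The ensemble mean is the prediction; the spread is a rough skill gauge.
    return preds.mean(axis=0), preds.std(axis=0)
```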

4. RESULTS

Before describing our results, it is worthwhile reviewing the main assumptions involved in these analyses:

• ERA-15 provides a reasonably valid representation of the free atmosphere over West Antarctica. • AWS observations are representative of the near-surface environment. • Synthesized AWS data are valid and capture the natural variability in the system. • Ice-core dating is accurate enough for the annual values to be valid.

4.1. Synoptic classifications

4.1.1. Generalization and SOM grid size. The characteristics of an SOM-based climatology are influenced by the grid size of the SOM, since this affects the level of generalization. The strongest associations, i.e. groupings of years, will persist as the SOM size increases. Weaker associations will shift as more SOM nodes become available to differentiate the data. We have used three sizes of SOM specifically to examine this behaviour. Figure 4 and Table III summarize the grouping of years from the annual analyses by our three grid sizes (3 × 2, 4 × 3 and 5 × 3). Groups classified by each SOM are listed by the rows in Figure 4. Each row shows the year groups (grey boxes) identified by each SOM. Shading and black boxes around groups show the history of each group from the largest (5 × 3) to smallest (3 × 2) SOM. In particular, three groups from the 5 × 3 SOM (1980, 1988; 1983, 1993; 1991, 1992) are seen to be quite stable at the three generalization levels. Each of these groups starts as part of a larger group in the smallest SOM, but the additional years move to new groups as the SOM size increases. For example, 1980 and 1988 are grouped with 1979 and 1981 in the smallest SOM. In the 4 × 3 SOM, 1980 and 1988 are grouped separately and the other years have moved to two new groupings. Robust year groups suggest that those years are highly separable from the rest of the data. This is particularly true for the 1980, 1988 and the 1983, 1993 groups, since these are mapped to the corners of the 5 × 3 SOM.


[Figure 4: three rows of year groupings (years 1979–93) for the 3 × 2, 4 × 3 and 5 × 3 SOMs; numbers beneath the bottom row give the 5 × 3 SOM group numbers]

Figure 4. Generalization and grouping by annual SOMs. Each row represents groups determined by a given size SOM. Sizes are noted at the left side. Groups classified by a given SOM indicated by grey boxes in each row. Shades, lines and black boxes indicate the history in the smaller SOMs of the groups in the largest (5 × 3) SOM. Numbers at the bottom indicate the 5 × 3 SOM group number. Forced generalization decreases from top to bottom as the SOM size increases and more states become available. This figure is available in color online at http://www.interscience.wiley.com/ijoc

Table III. Summary of SOM-based classifications of annual data, by SOM size

Group   SOM grid coordinate   Years

3 × 2 SOM
1       (0, 0)                1983–84, 1989, 1993
2       (1, 0)                1985
3       (2, 0)                1979–81, 1988
4       (0, 1)                1982
5       (1, 1)                1986–87, 1990–92

4 × 3 SOM
1       (0, 0)                1980, 1988
2       (1, 0)                1979
3       (2, 0)                1985, 1989
4       (1, 1)                1981, 1987
5       (3, 1)                1983, 1993
6       (0, 2)                1990–92
7       (2, 2)                1982, 1984, 1986

5 × 3 SOM
1       (0, 0)                1980, 1988
2       (2, 0)                1985–87
3       (4, 0)                1983, 1993
4       (3, 1)                1982, 1984
5       (0, 2)                1979, 1981
6       (2, 2)                1991–92
7       (4, 2)                1989–90

Together with Table III, the Sammon map (a two-dimensional projection of the SOM reference vectors; Sammon, 1969) for the 3 × 2 SOM (Figure 5(a)) shows that the annual data map almost entirely into three well-separated groups on nodes (0, 0), (2, 0) and (1, 1). At this level of generalization, only 1982 (node 0, 1) and 1985 (node 1, 0) are distinct and ungrouped with other years. The Sammon distances between node (2, 0) and its neighbours also suggest that this group of years (1979, 1980, 1981 and 1988) is well separated from the remainder of the data.

The next larger SOM grid (4 × 3, Figure 5(b)) shows the three large groups of the 3 × 2 SOM splitting into smaller groups. Year associations also change as the new groups are formed.


Figure 5. Sammon mappings of various size SOM grids: (a) 3 columns × 2 rows; (b) 4 × 3; (c) 5 × 3. Axes represent distance in the two-dimensional projection space and are effectively without units. Each SOM node is labelled with its coordinates in the SOM column, row grid, with columns increasing left-to-right and rows increasing top-to-bottom. Values on graph edges are the distances between the projected SOM nodes, rounded to integers

For example, 1989 splits from its four-member group to join the singleton 1985. Four of the final (5 × 3 SOM) seven groups (1980, 1988; 1983, 1993; 1982, 1984; 1991, 1992) have been identified, although two of these (1982, 1984; 1991, 1992) still have an extra year in the 4 × 3 group (1986 and 1990 respectively). Each corner group remains well separated from its neighbours.

The largest SOM grid (5 × 3, Figure 5(c)) shows the final year groups. Two corners retain their 4 × 3 grid groups (1980, 1988; 1983, 1993) and are somewhat more differentiated in the larger grid. Three years, 1979, 1986 and 1987, have rejoined years with which they were originally associated in the 3 × 2 grid. Lastly, the 5 × 3 grid provides eight states unmapped by the input data. The result is that all mapped states have unmapped states as their neighbours. Thus, the SOM has both grouped similar input years and provided intermediate states between all the mapped groups.

4.1.2. Generalized patterns. Figure 6 presents the 15 generalized patterns for 700 hPa average annual temperature as extracted by the 5 × 3 SOM analysis and expressed as anomalies from the standardized grid-wide annual average (as described in Section 2.1). It is also important to remember that this is only one figure from a set of six (one each for the six variables analysed) and that the complete analysis provides generalized patterns for all six variables (e.g. Figure 7).


Figure 6. The 5 × 3 SOM classification map for 700 hPa temperature anomalies. Each node is a generalized pattern extracted from the original data by the SOM analysis as described in the text. Values are grid-point anomalies from the grid-wide mean. Zero contour is in bold, and relative highs and lows are labelled as H and L respectively. Labels above selected nodes identify the input years that most closely match the pattern shown in the node. Nodes with no label represent intermediate states not seen in the input data. Figure 5(c) shows distances in SOM space between the nodes

The annual values map to seven distinct groups (Table III) in this SOM analysis, as shown by the year labels over the maps in Figure 6. This effectively says that, with this size of SOM grid, the data primarily cluster in pairs, with one group of three. That is, within the 15-year period there is still a fair amount of variability between the years. For later reference, these groups will be numbered in left-to-right, top-to-bottom order (also see Figure 4).

To aid the interpretation of the SOM analysis results, the generalized patterns for the six variables analysed are presented in separate figures, of which three are shown here (Figure 7). Two of these groups are from the corners of the SOM grid (Figure 6) to emphasize differences. Group 1 (Figure 7(a), 1980 and 1988) shows warmer temperatures, higher geopotential height and increased moisture over a broad region centred on Marie Byrd Land. Positive height anomalies exceed 40 m at the centre of the pattern and are at least 10 m over all of West Antarctica and the Antarctic Peninsula. The temperature anomalies of up to 2 °C at 700 hPa are also seen at similar magnitude in AWS surface records from across West Antarctica for both these years (Reusch and Alley, 2004).


Figure 7. Generalized patterns as anomalies from mean values for specific year groups: (a) 1980 and 1988; (b) 1989 and 1990; (c) 1983 and 1993. Contour limits and interval, in standard deviations, are shown beneath each map, with the zero contour in bold. Relative highs and lows are labelled as H and L respectively. Variables as described in the text

The pattern of increased moisture is centred over the western Amundsen Sea and is at least partly explained by the warmer air, as seen in the temperature field and the increased 700 hPa heights. Group 1 also shows increased stability (i.e. reduced variability) in the 700 hPa temperature and height fields (negative anomalies in the standard deviation fields), with more variability (positive anomalies) in the moisture field. The latter is at least partly due to the higher absolute values of specific humidity in the generally very dry Antarctic atmosphere. Although the relationship is not universal, higher variability is often associated with higher absolute specific humidity in this region. The generalized patterns for 1980 and 1988 thus describe a warmer, wetter and generally less variable mid-troposphere.



Diagonally opposite to Group 1 on the SOM grid (Figure 6) is Group 7 (Figure 7(b), 1989 and 1990). A distinguishing feature of this generalized pattern is the large negative anomaly (more than 20 m over the Amundsen and Bellingshausen Seas) and above-average variability in geopotential height. The former is related to an eastward extension and overall deepening of the Amundsen Sea low from a centre over the eastern Ross Sea, especially in 1989. The latter likely represents a shift of storm tracks into the southern Antarctic Peninsula/Bellingshausen Sea region due to the height anomaly. The area of higher moisture over this region is similarly explained. Except for this region, moisture is close to average values, with moderate departures from average variability in the Ross Sea (positive) and Victoria Land (negative). Temperature variability is moderately above average over much of the region.

From the upper right corner of the SOM grid, the main feature of Group 3 (Figure 7(c), 1983 and 1993) is the large positive anomaly in the standard deviations of all three variables. Geopotential height variability exceeds 15 m above average values over much of the Ross Sea. Strong temperature and moisture variability anomalies fit within the maxima of the height variability anomaly, with peak values over Victoria Land and the western Ross Sea. In the mean, temperatures are close to average values (±0.5 °C), geopotential heights are moderately reduced, and moisture is generally below normal, apart from the Victoria Land anomaly. Thus, the generalized patterns for 1983 and 1993 describe mean values close to the study period average, with enhanced variability over the Ross Sea and Victoria Land. The map of average geopotential height suggests that a source for the variability may lie in the central South Pacific, outside the study area.

4.2. Climate upscaling

4.2.1. Data stratification. Before attempting to upscale to the SOM classifications, it is useful to examine the surface data associated with each group. Figure 8 shows AWS temperature and accumulation rate data for each SOM-classified year group. In the ideal case, each group of surface data would be distinct from all other groups. This would indicate an unambiguous (if not necessarily simple) relationship between the surface record and atmospheric conditions. A nearly perfect example is seen in the temperature data for 1980 and 1988 (Figure 8(a)): both years have the same characteristics at all six AWS sites, i.e. unambiguously warm temperatures. That the other patterns are less clean (e.g. Figure 8(b)) does not make them unusable, but it does hinder their use as predictors. Given enough training data, an ANN should be able to determine a relationship between a set of noisy predictors and the SOM classifications.

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) 598 D. B. REUSCH, B. C. HEWITSON AND R. B. ALLEY

Figure 8. Surface data (normalized) for each SOM-classified year group, by site: (a) AWS temperatures; (b) ice-core accumulation rates. Each plot shows the surface or proxy data associated with each group of years identified in the SOM analysis. AWS data are shown in geographic order by longitude, east to west (see Figure 1). Ice-core data are ordered site A to site D. Labels on the x axis denote the years in each SOM group, and each year's data are grouped by shade. For example, the data shown for 1980 (light grey) and 1988 (dark grey) are associated with the generalized 700 hPa patterns in Figure 7. The data are thus 'stratified' into groups based on the SOM classification. Values are normalized to allow for different ranges at the different stations/sites

That is, we should be able to train an ANN to take the possibly noisy (or 'smudged') fingerprints that the atmosphere has imprinted on the surface data and relate them back to our set of classified states, and thus reconstruct past conditions in the atmosphere.
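For illustration, this stratification step can be sketched in a few lines of Python. The sketch below assumes pandas and NumPy; the group labels, site names and data values are placeholders chosen to mimic the pairs-plus-one-triple grouping described above, not the actual records.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Placeholder inputs: annual surface records (rows = years 1979-93,
# columns = AWS sites) and the SOM group assigned to each year.
years = np.arange(1979, 1994)
surface = pd.DataFrame(rng.standard_normal((15, 6)),
                       index=years,
                       columns=[f"aws_{i}" for i in range(6)])
# Illustrative labels only: six pairs plus one group of three, with
# 1980/1988, 1983/1993 and 1989/1990 grouped as in the text.
som_group = pd.Series([2, 1, 4, 5, 3, 6, 7, 5, 4, 1, 7, 7, 2, 6, 3],
                      index=years, name="group")

# Normalize each site to standardized anomalies over the study period,
# so sites with different ranges are comparable.
normalized = (surface - surface.mean()) / surface.std(ddof=1)

# Stratify: collect the normalized surface data for each SOM year group.
for group, rows in normalized.groupby(som_group):
    print(f"SOM group {group}: years {list(rows.index)}")
    print(rows.round(2))
```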

4.2.2. AWS-based upscaling. Although the AWS records currently provide no long-term opportunity to reconstruct the atmosphere from the SOM patterns (the records start only in 1979), we still considered it useful to explore upscaling with the AWS data as predictors. This served as an alternate way to test the ANN-based upscaling methodology. The AWS data are also inherently cleaner (less noisy), because their dating is more accurate than that of the ice-core records.

The issue of circularity needs consideration when using the reconstructed AWS records to predict ERA-15 data, since ERA-15 upper air variables were used in the reconstruction process. Three facts support our belief that circularity can be ignored. First, the upscaling ANN is attempting to predict a pair of integer grid coordinates, not the actual atmospheric data. Second, variables from multiple upper air levels and a limited spatial extent were used in the reconstructions, not just the 700 hPa level. Third, the original AWS data do not appear to have been used during the ERA-15 reanalysis itself and thus do not influence the ERA-15 dataset.

Because our emphasis is on reconstructions prior to 1979, only one scenario, with one year held out of the training set, was run using AWS data. A thorough evaluation of AWS-based upscaling would call for additional testing with different years held back for validation and different sets of predictors. Our one scenario used the temperature records from all six sites as predictors for two ensembles (maximum iteration limits of 10 000 and 20 000) of 20 ANN models each. Three hidden-layer nodes were used for all ANNs. 1988 was held back from the training set to provide a completely independent test of the ANN (beyond the normal splits during training).

Table IV summarizes the results of this testing. For this configuration, additional training iterations (ensemble 2) appear to produce ANNs with reduced skill (fewer correct predictions during training and testing). However, even the more skilled ensemble still mispredicts 1988 25% of the time. Table V provides an alternate evaluation of skill during training based on differences between predicted and expected x, y coordinate values (after mapping to the grid). Errors are broken down by just x or just y being incorrect, or both values being wrong. The ANNs appear to be better able to predict y than x, for both single and double errors. Overall, ∼68% (∼57%) of predictions for x (y) are within two (one) unit(s) of the correct value.

Table IV. Summary of AWS-based upscaling ensembles

Ensemble^a    Skill^b           # Wrong^c         Correct^d     Prediction error^e
              µ      σn−1       µ      σn−1                     µ      Min    Max

1 (10)        0.95   0.02       4.6    2.0        15 (75%)      0.8    0.1    4.1
2 (20)        0.95   0.02       5.1    1.7        11 (55%)      1.1    0.5    5.0

a Value in parentheses is the maximum iteration count used in training, in thousands.
b Skill is the R2 value as reported by NevProp for the training ANNs. This is not the version of R2 from traditional statistics, but a scaling of the mean square error to the range 0–1 (Goodman, 1996). It has also been adjusted based on the bootstrapping methodology, which attempts to assess how the ANN will perform on data outside the training set.
c Number of incorrect predictions during training, out of 14 known targets (years 1979–93 less 1988).
d Number of correct predictions during testing, out of 20 ensemble members. A correct prediction is an x, y pair that matches the SOM x, y values for 1988 (i.e. 0, 0) after being mapped to the nearest SOM x, y coordinates.
e Quantization error related to mapping an ANN x, y prediction to the nearest SOM x, y coordinates. SOM nodes are spaced one unit apart from their vertical and horizontal neighbour(s).

Table V. Summary of mapping errors from AWS-based upscaling ensembles. Mean and standard deviation are for the difference between the prediction and the correct value for the x or y coordinate

Ensemble   Total wrong^a    Just x wrong             Just y wrong             Both x and y wrong^c
                            n^b       µ      σ       n^b       µ      σ      n^b       µ (x, y)     σ (x, y)

1          102 (20)         20 (20)   −1.7   1.3     40 (39)   0.3    1.0    42 (41)   2.8, −0.1    1.3, 1.8
2          91 (18)          21 (23)   −1.8   0.7     29 (32)   0.1    1.0    41 (45)   2.8, −0.1    1.2, 1.9

a Percentage of total predictions shown in parentheses, out of 500 predictions.
b Percentage of prediction errors shown in parentheses, out of the total wrong.
c Means and standard deviations are given for both x and y.

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) 600 D. B. REUSCH, B. C. HEWITSON AND R. B. ALLEY value. That only one independent target is being predicted in these ensembles makes it difficult to come to any strong conclusions about the value of AWS data for upscaling. However, with correct prediction of the unseen year (1988) 75% of the time and mapping errors within one or two units (in y and x respectively) for 60–70% of the training data, the results are definitely promising. That a limited suite of AWS sites can be used effectively to estimate the large-scale circulation patterns of the atmosphere is a reasonable conclusion.

4.2.3. Ice-core-based upscaling. Characteristics of the 10 ANN configurations used to predict SOM classifications from ice-core data are summarized in Table II. After training, each of these ANNs was used to predict the SOM classifications for 1954–78 using the corresponding ice-core data. Six sets of predictors were used, with two 20-member ensemble runs for four of the six predictor sets. Specifically, the predictors tested are three singletons (accumulation; Na+; SO4^2−), two pairs (accumulation and Na+; Na+ and SO4^2−) and one 'kitchen sink' (accumulation, Na+, NO3− and SO4^2−). In all cases, data from all four sites were used for each variable in the predictor set, resulting in 4, 8 and 16 ANN inputs respectively for the three predictor categories.

Table VI summarizes leading statistics for the 10 configurations tested. Based on NevProp's adjusted R2 and the number of incorrect predictions in the training set, the prediction skill of all the networks is high. Unfortunately, the prediction errors suggest a different story, i.e. that the networks may be overtrained despite all attempts to avoid this. The prediction error represents errors related to quantization of real-valued x, y predictions to the integer-valued SOM x, y grid. For example, an ANN prediction of (2.34, 1.79) would be mapped to (2, 2) with a Euclidean distance prediction error of 0.4. The prediction errors for the training data are small, but values for the testing data (1954–78) are substantially larger (typically an order of magnitude). Mean prediction errors for the testing data (last column), and examination of the raw prediction values (before mapping to the grid range), show that many predictions actually lie outside the grid and are being mapped into corner or edge nodes from more than the average within-grid internode half-distance (i.e. 0.71, since nodes are on a one-unit grid). An alternative explanation to the ANNs being overtrained is that the prediction error may indicate that the training data are insufficient for representing the data space covered by the testing data. We will return to this topic in Section 5.
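The quantization step can be made concrete with a short sketch. The function below is our illustration of mapping a real-valued prediction to the nearest node of the 5 × 3 grid, reproducing the (2.34, 1.79) → (2, 2) example above; the clipping of out-of-grid predictions onto edge and corner nodes matches the behaviour described in the text.

```python
import numpy as np

def quantize_to_som(pred, nx=5, ny=3):
    """Map a real-valued (x, y) ANN prediction to the nearest node of an
    nx-by-ny SOM grid (nodes at integer coordinates, one unit apart) and
    return the node plus the Euclidean quantization error."""
    pred = np.asarray(pred, dtype=float)
    # Clip to the grid extent so out-of-grid predictions collapse onto
    # edge and corner nodes, as described in the text.
    node = np.clip(np.rint(pred), [0, 0], [nx - 1, ny - 1]).astype(int)
    error = float(np.linalg.norm(pred - node))
    return tuple(node), error

# The example from the text: (2.34, 1.79) maps to node (2, 2) with an
# error of about 0.4.
print(quantize_to_som([2.34, 1.79]))   # ((2, 2), ~0.40)
# An out-of-grid prediction collapses onto a corner with a large error.
print(quantize_to_som([6.1, 3.4]))     # ((4, 2), > 2)
```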

Table VI. Summary of statistics for ice-core-based upscaling ensembles

Group    Skill^a           # Wrong^b         Prediction error^c
                                             Training             Testing
         µ      σn−1       µ      σn−1      µ(Σ)      µ(µ)       µ(Σ)      µ(µ)

1a       0.98   0.01       1.1    1.4       2.53      0.16       28.1      1.10
1b       0.98   0.01       0.4    0.6       2.36      0.16       33.8      1.36
2^d      0.98   –          3      –         2.70      0.20       28.9      1.20
3a       0.97   0.01       0.9    1.0       3.26      0.20       52.6      2.11
3b       0.97   0.01       0.8    1.2       3.35      0.21       60.7      2.42
4a       0.97   0.01       1.2    1.3       2.97      0.19       46.8      1.88
4b       0.98   0.01       1.2    1.6       2.09      0.15       46.3      1.85
5a       0.95   0.01       1.8    1.8       3.43      0.23       48.9      1.95
5b       0.95   0.02       2.0    1.7       3.16      0.22       55.9      2.23
6^d      1.00   –          0      –         0.00      0.00       17.4      0.70

a As in Table IV.
b Number of incorrect predictions during training, out of 15 known targets (years 1979–93).
c As in Table IV, the values represent the quantization error associated with mapping a real-valued x, y prediction to the integer-valued SOM x, y grid. Two statistics are shown for the training and testing predictions, each a mean over the ensemble (or the actual value for non-ensembles). µ(Σ) is the mean of the total quantization error and µ(µ) is the mean of the mean quantization error, i.e. the total divided by the number of predictions (15 for training, 25 for testing).
d Results are from a single ANN, not an ensemble; therefore, standard deviation does not apply and the mean values are the actual results.

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) ICE-CORE-BASED SYNOPTIC RECONSTRUCTIONS 601

Table VII. Predictions of SOM grid x, y coordinates from ice-core data for 1954–78, the period outside the training set. Predictions from an ensemble group are from a representative ensemble member, i.e. one of high but not necessarily highest skill. The most common prediction for each year, if any, is marked with an asterisk (ties are not marked)

Year    1a     1b     2      3a     3b     4a     4b     5a     5b     6

1954    2,2    2,1    1,2    4,1    4,2    2,2    0,2    4,1    4,0    2,1
1955    3,2    3,2    1,2    0,2    2,2    2,2    2,2    0,2    0,0    3,2
1956    4,2    4,2    0,1*   0,2    0,1*   2,2    2,0    0,1*   0,1*   1,2
1957    3,2*   3,2*   3,2*   3,2*   0,1    2,2    3,1    0,2    0,1    3,2*
1958    1,0    1,0    2,0    2,0    2,2    0,1    0,0    0,2    1,2    1,1
1959    2,2    4,2    1,0    4,1*   4,2    1,1    3,1    4,1*   4,1*   4,1*
1960    0,2    4,0    1,2    1,0    0,2    0,0*   2,0    0,0*   0,0*   2,0
1961    1,1    2,2    0,0    1,1    4,2    2,2    0,0    1,2    2,1    3,1
1962    0,2    0,2    0,0    2,0    2,2    0,0    3,2    1,1    4,2    3,2
1963    0,0*   0,0*   0,0*   4,2    4,2    0,0*   0,0*   4,2    2,1    2,1
1964    4,0    4,1*   3,1    0,0    1,2    4,1*   4,2    4,1*   1,0    4,2
1965    2,1    2,2    0,1    0,0    4,2*   2,2    0,1    4,2*   4,2*   3,0
1966    1,0*   1,0*   4,0    4,1    4,2    1,0*   0,1    0,0    1,0*   2,0
1967    2,2    1,2    0,0    1,2    0,1    4,1    3,2    3,2    0,1    3,1
1968    2,1    2,2    1,2    1,2    4,2    4,1    2,0    4,1    2,1    1,0
1969    2,2    2,2    2,0    3,1    2,0    2,2    4,0    4,1    2,0    3,0
1970    2,2    1,0    4,0    2,0    1,2    2,2    1,2    4,1    0,1    1,1
1971    1,0    1,0    4,2    0,2    0,1*   0,0    0,1*   0,1*   0,0    0,2
1972    4,2    4,2    0,0*   1,1    0,2    0,0*   0,2    0,0*   1,2    2,2
1973    1,0    1,0    2,0    3,1    2,2*   2,2*   0,0    2,2*   0,1    1,2
1974    0,0*   1,0    0,0*   3,1    4,2    1,2    0,0*   4,2    3,0    4,1
1975    1,2    2,2    4,2    4,1    2,1    2,1    3,1    0,2    0,2    4,2
1976    2,2    1,0    3,1    2,2    0,1    4,0    0,0    0,2*   0,2*   0,2*
1977    2,2    4,1    2,0    1,2    4,2    2,2    4,0    4,0    4,2    3,2
1978    2,2    4,0    3,2    0,2    0,1    2,2    1,0    1,2    0,0    1,2

Table VII summarizes the predicted SOM grid coordinates for the testing period (1954–78) based on the various predictors. For those groups that are ensembles (all but groups 2 and 6), the predictions listed are from a representative ensemble member. Determining which ensemble member has the highest skill is somewhat subjective because of the variety of metrics in use; thus, although the ANNs listed in Table VII may not have the highest overall skill, they are still among the best in their ensembles. For years with any agreement among the predictions, the most common prediction is marked (ties are not marked). At first glance, Table VII suggests that the ANN-based approach is not very successful, since there is so little agreement between the predictions of the various ANNs. As explained further below, this is only partly true.
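Identifying the most common prediction for a year is a simple modal count; a sketch is given below, using the 1957 row of Table VII as illustrative input. Only a unique mode is reported, matching the convention that ties are not marked.

```python
from collections import Counter

# Ten (x, y) node predictions for one year, one per predictor group
# (values echo the 1957 row of Table VII).
predictions_1957 = [(3, 2), (3, 2), (3, 2), (3, 2), (0, 1),
                    (2, 2), (3, 1), (0, 2), (0, 1), (3, 2)]

counts = Counter(predictions_1957)
(node, n), = counts.most_common(1)
# Only report a mode when it is unique (ties stay unmarked).
runners_up = [c for _, c in counts.most_common()[1:]]
if not runners_up or n > runners_up[0]:
    print(f"most common prediction: {node} ({n} of {len(predictions_1957)})")
```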

5. DISCUSSION

The ANN-based upscaling results based on ice-core datasets as predictors (Table VII) suggest that the available training data are not sufficient to produce reliable predictions and, thus, preclude development of a deterministic, ice-core-based climate reconstruction. Instead, it appears that a more probabilistic approach is needed for determining the appropriate circulation patterns from the ice-core predictors. This approach awaits better (i.e. longer) atmospheric and ice-core datasets from a broader spatial region. Meanwhile, possible sources of error affecting the ANN’s skill, robustness of the main methods, and potential refinements to the methodology are discussed in the following sections.

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) 602 D. B. REUSCH, B. C. HEWITSON AND R. B. ALLEY

5.1. General issues and limitations

Despite following established practice for ANN training and applying multiple safeguards against overtraining, ANN predictions on non-training data vary considerably from one model to the next. With the ANN training methodology ruled out as the source of this variability, the other main possible explanations are inadequate predictor data, dominance of local forcing and the absence of a climate record in the ice-core data. Although ice-core datasets are known to be noisy to varying degrees (e.g. Benoist et al., 1982; White et al., 1997), a climate signal has been solidly established (e.g. Lorius, 1989; Delmas et al., 1992; Legrand and Mayewski, 1997; Shuman et al., 1998; Reusch et al., 1999); thus, we turn our attention to the data.

One of the keys to applying ANNs successfully for prediction is to ensure that the training data cover the full range of the input space (Haykin, 1999). In other words, give an ANN input data from outside the range seen in training and its predictive skill will likely drop. Figure 9 summarizes the individual training and testing predictors for the accumulation rate data. Taken site by site, only Site A has significant testing data in a range not covered by the training data. All ANNs using accumulation rate as a predictor use the data from all four sites; so, collectively (and subjectively), the input space appears to be reasonably well covered by the training data, at least when the predictors are examined individually in one dimension. A different conclusion may be drawn in the native four-dimensional space in which the data reside. Evaluating the representativeness of training data rapidly becomes very difficult as the size of the input vector increases, although Sammon maps can help.

SOMs may also be useful for assessing the coverage of the upscaling training data.

Figure 9. Distribution of normalized accumulation rate data by training (dark grey) and testing (light grey) predictors. For ideal ANN training, the range and coverage of the training data should match that of the prediction data. Verification is made more difficult by the fact that all four sites are used together as predictors. The x axis is in standard deviations. The y axis is a frequency count

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) ICE-CORE-BASED SYNOPTIC RECONSTRUCTIONS 603

A SOM trained on the complete set of data used as predictor input to the upscaling ANN can be used to characterize the input space. Mapping the upscaling training and testing data separately to this SOM should reveal the coverage of each dataset relative to the SOM classification space. If the training and testing data map to the same SOM nodes, then there is strong evidence that both cover the same portions of the input space. Nodes mapped only by the testing data suggest (but do not prove) that the training data do not capture the complete input space. Unfortunately, it is difficult for such a SOM analysis of the upscaling predictors to be definitive, since the result depends on the amount of generalization: a small SOM will generalize more than a larger one and so give a different picture of how well the training set covers the input space. Nonetheless, SOM analysis of the upscaling predictors can still provide useful clues. For example, preliminary analysis of the accumulation rate dataset suggests that there are, in fact, gaps in the training set that may account for at least some of the testing results based on this predictor.

Even if data space coverage is complete, however, this does not exclude the possibility of the training period being too short, since noise in the data is another factor in how well an ANN can perform (Haykin, 1999). For example, even a very long record of a very noisy (random) dataset may not contain the information needed for reliable prediction. In our case, the training record is brief (15 years) and noise is definitely present in the ice-core data (Reusch et al., 1999). It is also possible that the geographic coverage of the predictor data is not broad enough to capture all of the potential variability of the atmosphere over West Antarctica. The ice-core sites span only a 200 km transect in the central part of West Antarctica; the variability of this region may differ enough from that of the larger region to reduce the skill of upscaling ANNs based on just these ice-core sites. In this respect, the AWSs should be better upscaling predictors; unfortunately, the record is too short to know for sure. Based on the limited results from the AWS-based predictions (Section 4.2.2), the ANN-based methodology is capable of skilful prediction for testing data within the range of the training data. Given that the AWS data are not noise free, limited noise should also not preclude reasonable ANN performance.
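A sketch of this SOM-based coverage check is given below. It assumes the third-party minisom package as a stand-in for the SOM software used in this study; the grid size, training length and placeholder data are illustrative choices, not values from our analyses.

```python
import numpy as np
from minisom import MiniSom  # third-party package; any SOM library would do

rng = np.random.default_rng(1)

# Placeholder predictor vectors: rows are years, columns are the four
# ice-core accumulation records (training = 1979-93, testing = 1954-78).
train = rng.standard_normal((15, 4))
test = rng.standard_normal((25, 4))

# Train a small SOM on the pooled predictor data to characterize the
# input space.
som = MiniSom(4, 3, input_len=4, sigma=1.0, learning_rate=0.5,
              random_seed=1)
data = np.vstack([train, test])
som.train_random(data, 5000)

# Map each subset to its best-matching nodes and compare coverage.
train_nodes = {som.winner(v) for v in train}
test_nodes = {som.winner(v) for v in test}
uncovered = test_nodes - train_nodes
print(f"nodes hit only by testing data: {sorted(uncovered)}")
```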

5.2. Noise and aliasing

Few, if any, climate records are free of noise. Indeed, the existence and behaviour of certain aspects of the climate system may depend on the system being noisy. For example, there is evidence supporting a relationship between noise and the highly regular spacing (∼1500 years) of the Dansgaard–Oeschger oscillations seen in Greenland ice-core proxy records of temperature and dust, as well as in various North Atlantic marine records (Alley et al., 2001). How much of an interpretation problem results from the presence of noise varies widely by dataset and time scale. AWS records at annual resolution should be quite robust and relatively noise free, in part because of the large number of observations that form the annual average. It is possible, however, that going to an annual average smooths out useful signals in the data, not just the noise.

Ice-core records are more subject to noise, in part because of the larger uncertainties in dating. AWS records are time-stamped; ice-core dating often depends on assumptions about poorly observed precipitation processes. These assumptions have enough observational history that we feel confident in using them to develop annual resolution records, but the uncertainties do not disappear. The two main uncertainty components in multiparameter, ice-core-chemistry-based dating relate to issues in peak identification and to assumptions about the timing of chemical species deposition (Reusch et al., 1999). Noise is a factor in peak identification, since no chemical species follows a pure annual cycle in the real world. Peaks may be lost or added through various noise-contributing processes (e.g. redeposition, removal); multiparameter techniques reduce, but do not eliminate, this problem. Timing of deposition is broadly known for Antarctic ice cores (e.g. Legrand, 1987; Legrand and Mayewski, 1997), but, like the climate system itself, it is not invariant from year to year. The dating of the ice-core data used here is based on multiple parameters (annual cycles in major ion chemistry, gross β activity from bomb fallout horizons and excess SO4^2− peaks from volcanic events), but it depends strongly on the assumption that Na+ and SO4^2− peak in the austral winter and summer respectively (Reusch et al., 1999). Variability in the actual timing of these peaks can produce an aliasing of the annual data, where values from one year move into (or out of) another. Shifting of the Na+ peak is less important at the annual scale, since the calendar year is being used and the austral winter season falls in the middle of the averaging period. Interannual variability in the timing of the SO4^2− peak, however, will affect the dating and the accuracy of all the other ice-core records, since it determines the start/end of the calendar year. This ultimately affects the upscaling process by adding noise to the predictors.

Because the SOM-based classifications are generalized patterns, the same atmospheric state, as represented by the SOM, could readily produce multiple unique ice-core (and AWS) records. Noise will expand the number of (nearly) unique surface patterns mapping to the same atmospheric state. In theory, learning this many-to-one relationship is well within the power of ANNs, but more training records are likely to be needed than in the noise-free, unaliased case. Thus, although Table VI suggests that enough training data are available, noise is a definite factor in the predictive skill of the upscaling ANNs, and a larger training set is needed to improve the results.

5.3. Robustness of method

The two distinct methods involved, SOM-based classification and ANN-based upscaling, have different robustness characteristics and respond to the length of the training data (15 years) in different ways.

5.3.1. SOM classifications. Despite the short record, the SOM analyses are robust and reproducible. The same year groups are found in the 5 × 3 analysis regardless of network initialization, ordering of the input data, or the number of training iterations. A number of years (e.g. 1980, 1988) are classified together in all three grid sizes studied; the SOM has clearly found a reliable relationship between such years. Although our interpretation of the generalized patterns has been limited, partly due to our focus on upscaling, these results have value and will help to improve our understanding of West Antarctic climate.

5.3.2. ANN-based upscaling. The ANN-based upscaling has a less robust response to the size of the training set, particularly for the ice-core data. A factor contributing to the different results for the two datasets is simply that the AWSs are likely sampling a substantially larger area of the West Antarctic ice sheet (Figure 1). Thus, along with factors such as noise, the geographic coverage of the predictor dataset needs to be considered in evaluating the upscaling results.

As discussed previously, the individual ANNs have high predictive skill with the training data but much lower performance with the testing data. Figure 10 summarizes the predictions for four ensembles and displays the extent to which predictions fall inside versus outside the SOM grid. The inner, darker black rectangle indicates the domain of the SOM grid. The outer, lighter black rectangle indicates the area within the 'normal' mapping domain of the x, y space, where 'normal' is defined as falling within the SOM grid or no more than one-half the internode distance outside it. SOM nodes form a rectangular grid and are spaced one unit apart; the mapping domain for each node thus extends 0.5 units in the vertical and horizontal directions and ∼0.7 units on the diagonal, and predictions that fall within that distance of a node will be mapped to that node. Predictions outside the outer rectangle (Figure 10) are more than this distance from the corner and edge nodes. Since we are simply mapping to the nearest node, SOM nodes at the corners and on the edges (e.g. 4, 2) may collect predictions from well outside the SOM grid extent. This is a side-effect of using the shortest distance to map predictions to grid coordinates: a corner will be the closest coordinate for all predictions outside the grid in its region. An alternative approach would be simply not to map predictions that fall outside the outer rectangle. This would provide useful information about predictive skill, but it would not solve the problem by itself.

That the same year is mapped to adjacent nodes in different models is not, in and of itself, an indication of poor predictive skill. It is also important to consider the distance between map nodes when assessing differences between model predictions. Two adjacent nodes may have fairly similar spatial maps (generalized patterns), or they could be quite distinct, in which case an off-by-one error in the prediction leads to quite different results. This highlights the importance of Sammon maps (e.g. Figure 5) when evaluating upscaling predictions.

Although not necessarily justifiable, it is also worth considering how the prediction results might be improved by relaxing the criteria for a match. A simple change would be to consider predictions to adjacent nodes also to be correct. This would be quite reasonable for nodes that form a fairly similar cluster separated from the rest of the grid. Relaxing the criteria in this way would also make more sense for larger SOM grids (e.g. 5 × 7). For the sizes used in this study it is hard to justify, since the new range of correct answers would potentially cover over half the grid nodes (9 of 15).

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) ICE-CORE-BASED SYNOPTIC RECONSTRUCTIONS 605 coordinate predictions (20 ge of 5% of the predictions lie x, y 3 SOM grid (shown as the heavy black rectangle). The × de white for clarity. Note: an avera bers of each ensemble shown made a total of 500 3 grid; points outside this area are farther from the grid than half the distance between the grid nodes. × lines. The near-zero range has been ma ons of these predictions with respect to the 5 outside the axis limits of the plots ons from ice-core data for 1954–78. The 20 mem itional half unit distance outside the 5 25 predictions each). Each figure contours the positi × ensemble members Figure 10. Contouring of upscaling predicti light black rectangle represents an add Under ideal circumstances, all predictions would lie within the heavy black

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) 606 D. B. REUSCH, B. C. HEWITSON AND R. B. ALLEY be useful for this size grid, i.e. allowing predictions to adjacent nodes to be ‘sort of’ correct rather than just wrong. It is more likely that the simplest approach is just to train with more data, particularly since any relaxation of the criteria is also going to help the random solution (described below). As an alternative way to assess the ANN predictions outside the ice-core training set, we applied Monte Carlo techniques to the suite of predictions from the eight ensembles to test whether the yearly predictions were doing better than chance. An approach such as this is necessary due to the shortage of independent test data. Synoptic reconstructions do exist for years within the 1954–78 period (e.g. Rastorguev and Alvarez, 1958; Phillpot, 1997, and references cited therein), but comparisons with our work have not been attempted because of differing time scales, pressure levels, data availability, etc. In our test, a random sample is defined as a set of 25 x, y pairs (one pair for each year) drawn without replacement from the 4000 x, y predictions available in the eight ensembles of 20 ANNs each (with each ANN providing 25 predictions). The x, y values are used in real-value form to avoid the biases introduced by mapping them to integer grid coordinates. Each of these random samples has a standard deviation σRS measuring the spread of the predicted values (σRS is, in fact, calculated separately for x and y). Random skill is then defined by calculating the average and standard deviation (µ(σRS) and σ(σRS) respectively) of 1000 such random samples. This process was repeated 20 times to get average values for µ(σRS) and σ(σRS), i.e. an average random skill. For a particular year’s upscaling predictions to have skill relative to chance, the standard deviation of the prediction set σP needs to be less than µ(σRS) − σ(σRS). Table VIII summarizes results from comparisons for the ice-core-based upscaling for each ensemble. Values are the number of years in the ensemble that had σP less than our definition of random chance, µ(σRS) − σ(σRS). ANN skill is most tightly defined by the predictions for both x and y, since that determines the grid location. Because there is also information in how well the ANN predicts the individual coordinates (the ANNs are predicting the two outputs independently), Table VIII includes these data as well. To complete this analysis, we recognized that an ensemble could have a certain number of more skilful years purely by chance. Counts above this threshold suggest that more than just chance is involved and that there is some amount of skill in the ensemble’s predictions. Table VIII allows for a conservative 10% of years to have higher skill by chance and highlights those counts that exceed this threshold. Under these criteria, six of the eight ensembles have skill greater than chance in predicting the x, y values for 1954–78 (and, thus, the atmospheric patterns for these years). All ensembles have skill with one or the other grid coordinate (x or y), with skill being noticeably higher in predicting x (six of eight ensembles) versus y (only two of eight ensembles). These results strongly suggest that the ANNs are more skilful than random chance, although this is not directly apparent from the individual year predictions (Table VII) or the distribution of all predictions (Figure 10).

5.3.3. Refinements. As mentioned previously, noise is present in all climate records. We have tried to use this fact in a positive way by adding small amounts of noise to the original data to create additional training records. Although the idea is sound, the value of this approach has not been shown indisputably: predictive skill tended to be more a function of the number of predictors than of the size of the training set, although the latter was definitely a factor. Additional testing with a wider variety of artificial noise would likely be useful.
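A sketch of this noise-augmentation step is given below; the number of copies and the noise scale are illustrative assumptions rather than the values used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(3)

def augment_with_noise(X, Y, copies=4, scale=0.1):
    """Create additional training records by adding small Gaussian
    perturbations to the predictors; targets are repeated unchanged.
    The number of copies and noise scale are illustrative choices."""
    X_aug = [X] + [X + rng.normal(0.0, scale, X.shape) for _ in range(copies)]
    Y_aug = [Y] * (copies + 1)
    return np.vstack(X_aug), np.vstack(Y_aug)

# 15 years of 4-site predictors grow to 75 training records.
X = rng.standard_normal((15, 4))
Y = rng.integers(0, [5, 3], size=(15, 2)).astype(float)
X_big, Y_big = augment_with_noise(X, Y)
print(X_big.shape, Y_big.shape)   # (75, 4) (75, 2)
```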

Table VIII. Summary of ANN-based upscaling predictions of SOM x, y coordinates versus chance. See text for explanation of entries. Asterisks indicate counts higher than expected by chance at the 10% level (rounded to 3 out of 25)

Ensemble   Just x   Just y   x or y   x and y

1a         0        7*       7*       18*
1b         7*       0        7*       15*
3a         8*       0        8*       8*
3b         3        3        6*       5*
4a         7*       5*       12*      6*
4b         4*       3        7*       10*
5a         8*       1        9*       3
5b         11*      0        11*      3

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) ICE-CORE-BASED SYNOPTIC RECONSTRUCTIONS 607

With respect to noise reduction, additional preprocessing steps may prove useful for improving predictive skill. Input data could be quantized into a small set of values (e.g. high, neutral, low) to reduce the range of cases that the ANN needs to relate to the target data. We have avoided traditional dimension-reduction techniques, such as PCA, in order to focus on the simplest version of the prediction problem, i.e. predicting from raw values. Both of these preprocessing methods are worth further investigation, even with a larger input set for training.

Ice-core-based upscaling is currently done to annual atmospheric patterns because of the annual resolution of the ice cores. However, there are strong annual cycles in the ice-core chemistry records. Although the timing of these cycles may not be known well enough to create unambiguous subannual chemistry records, enough is known to expect that relationships between the chemistry and the atmosphere will be better for certain subannual periods. For example, because of the Na+ peak in winter, it is reasonable to expect at least as good, and probably better, correlation between annual Na+ and the winter-season atmosphere. Thus, it would be useful to do SOM analyses on subannual atmospheric data and repeat the ice-core-based upscaling.

Another approach to improving the upscaling results is to use predictor data that more fully sample the atmosphere over West Antarctica. A suite of ice-core data providing broad geographical coverage would be one example of an improved predictor set; the four cores used so far cover only a limited spatial area and, thus, may not be entirely well suited to predicting the larger domain. Ice cores are, of course, not the only route to palaeoatmospheric reconstructions with upscaling. Extensive records from manned stations are available, although mostly coastal, and it would be useful to take advantage of this resource. These data could be used as independent predictors to compare with the ice-core-based results, or combined with the ice cores (where appropriate) to join the information provided by each source.
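The input quantization idea above can be sketched briefly; the ±0.5 standard-deviation breakpoints are an illustrative assumption, not a tested choice.

```python
import numpy as np

def quantize_inputs(X, low=-0.5, high=0.5):
    """Quantize standardized-anomaly predictors into three classes:
    -1 (low), 0 (neutral), +1 (high). The +/-0.5 standard-deviation
    breakpoints are illustrative only."""
    return np.digitize(X, bins=[low, high]) - 1

X = np.array([[-1.2, 0.1, 0.8],
              [0.4, -0.6, 2.0]])
print(quantize_inputs(X))
# [[-1  0  1]
#  [ 0 -1  1]]
```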

6. CONCLUSIONS

SOM-based classification of annually averaged reanalysis data for West Antarctica produces robust groupings that provide insight into atmospheric climatology. ANN-based upscaling from a limited set of ice-core data is skilful at identifying the main SOM states extracted in the classification analysis. Unfortunately, the short training period (15 years) and limited spatial extent of the selected ice-core data appear to prohibit, in this case, a simple, deterministic reconstruction built on high-confidence, well-defined predictions. Instead, a more probabilistic approach is needed to make up for the shortcomings in these datasets. This is clearly less desirable than simple, deterministic, high-confidence reconstructions, and we will therefore continue to pursue that goal in the future. Nonetheless, given the short period of analysis, this is not a particular shortcoming with respect to advancing the methodological basis for climate and palaeoclimate research. We have chosen to publish a progress report now because of the great potential we see in the analytical path followed here.

Pending enhancements to our already robust methodology and the availability of improved datasets, we are limited to an evaluation of what can be done with the current data. Careful evaluation of the upscaling analysis path suggests that, with longer reanalysis datasets and greater spatial coverage from ice-core data, improved reconstructions of histories of climate states should be possible. To this end, we plan to acquire the ERA-40 dataset (despite its shortcomings, it still provides an improved dataset for this region of the globe) and to obtain access to a larger suite of recent ice-core datasets. Despite the unfavourably short and spatially restricted dataset used to date, which might be expected to preclude successful analyses, we find that the combined techniques do allow ice-core reconstruction of annual-average synoptic conditions with some skill. We believe that this skill in the face of the difficulties justifies much wider testing of the techniques.

ACKNOWLEDGEMENTS This research was supported by the Office of Polar Programs of the National Science Foundation through grants OPP 94-18622, OPP 95-26374, OPP 96-14927 and OPP 00-87380 to R. B. Alley. We are also grateful to the Antarctic Meteorological Research Center, University of Wisconsin, for their archive of Antarctic AWS data and to NCAR’s Visualization and Enabling Technologies Section, Scientific Computing Division, for their tireless support of NCL.

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) 608 D. B. REUSCH, B. C. HEWITSON AND R. B. ALLEY

APPENDIX A: DATA CODING AND INTERPRETATION OF ANN PREDICTIONS

As outlined in Section 3.3.2, predictor and target data must be encoded prior to training of the upscaling ANNs. This is straightforward for the predictor data, which are simply the measurements at all sites expressed as standardized anomalies from the 15 year study period mean at each site. The AWS and ice-core chemistry data were averaged to annual values prior to calculating the anomalies, to match the annual time scale of this study.

Encoding of the targets is potentially more complex and, thus, we tested three coding schemes for the target SOM classifications: two versions of 1-of-N mapping, and x, y grid coordinates. The latter simply uses the x and y grid coordinates of each ERA-15 year in the 5 × 3 SOM map. Each prediction vector is thus mapped to two integer values, and two output values are needed for the prediction ANN. 1-of-N coding translates the x, y grid coordinates into either an integer or a bit vector. When using integers, the SOM nodes are simply numbered sequentially 1 to N (where N = xmax × ymax) from grid 0, 0 to the maximum x, y coordinate in row order. With this scheme, only one ANN output is required. Bit-vector coding attempts to take advantage of the fact that the SOM classification is a mapping from a prediction vector to a group of years identified by the SOM as having similar characteristics. For example, if data are classified into six separate groups by a particular SOM, then the upscaling ANN maps the input prediction vector to one of six groups (one-to-six mapping) and six outputs are required. In general, for N target groups a binary vector of length N is created with the position for the target group set to 1, and N output nodes are needed. For example, if group 3 of seven groups is the target classification for a sample, then the corresponding target vector is 0010000 and seven output nodes are required.

Bit-vector coding thus has the highest complexity in the output layer, since it requires the most output nodes. It also has additional drawbacks, such as being unable to map SOM nodes that have not had input mapped to them. Integer 1-of-N coding has the lowest complexity (only one output node), but small errors (off by only one position) can move the prediction to the opposite side of the SOM. The x, y grid coordinate scheme has the problems of neither 1-of-N integer coding nor bit-vector coding, at the small expense of a slightly more complex output layer. Thus, after evaluating all three schemes, we selected x, y grid coordinates as the target format for ANN training.

Because coordinate values are predicted as real numbers, post-processing is required to map predicted values to the SOM grid for comparison with known targets and for other analysis steps. The simplest approach is to map to the closest SOM grid coordinates (based on shortest distance). This takes advantage of the power of ANNs to make predictions of targets not seen during training (i.e. SOM nodes not mapped during the original SOM training), but it also has drawbacks. For example, predictions that lie well outside the coordinate range of the SOM grid will be collapsed into the corners and edges of the grid: for a 5 × 3 SOM, all upscaling ANN predictions with x greater than 4 and y greater than 2 will map to the lower right node (4, 2). This becomes a problem when the distance between the prediction and the nearest x, y grid coordinate grows larger than one or two units, the maximum internode spacing within the grid itself. (See Section 4.2.3 for the impact of this error.) Future work on the post-processing steps will look at alternative ways to map the ANN predictions back to the SOM grid.
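For concreteness, the three target coding schemes can be sketched as follows for the 5 × 3 grid; the node list used by the bit-vector example is an arbitrary illustration of the subset of nodes actually mapped by input data.

```python
import numpy as np

NX, NY = 5, 3   # 5 x 3 SOM grid

def encode_xy(node):
    """x, y grid-coordinate coding: two real-valued outputs."""
    return np.array(node, dtype=float)

def encode_integer(node):
    """Integer 1-of-N coding: nodes numbered 1..N in row order from
    grid 0, 0."""
    x, y = node
    return y * NX + x + 1

def encode_bit_vector(node, groups):
    """Bit-vector 1-of-N coding over the SOM nodes actually mapped by
    input data; `groups` lists those nodes in a fixed order."""
    vec = np.zeros(len(groups))
    vec[groups.index(node)] = 1.0
    return vec

node = (2, 1)
mapped_nodes = [(0, 0), (2, 1), (4, 2)]   # illustrative subset of nodes
print(encode_xy(node))                        # [2. 1.]
print(encode_integer(node))                   # 8 (row-order numbering)
print(encode_bit_vector(node, mapped_nodes))  # [0. 1. 0.]
```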

REFERENCES

Alley RB, Cuffey KM. 2001. Oxygen- and hydrogen-isotopic ratios of water in precipitation: beyond paleothermometry. In Stable Isotope Geochemistry, Valley JW, Cole D (eds). Reviews in Mineralogy & Geochemistry, vol. 43. Mineralogical Society of America/The Geological Society: 527–553.
Alley RB, Anandakrishnan S, Jung P. 2001. Stochastic resonance in the North Atlantic. Paleoceanography 16: 190–198.
Armstrong RL, Brodzik MJ. 1995. An Earth-gridded SSM/I data set for cryospheric studies and global change monitoring. Advances in Space Research 10: 155–163.
Barry RG, Carleton AM. 2001. Synoptic and Dynamic Climatology. Routledge.
Benoist JP, Jouzel J, Lorius C, Merlivat L, Pourchet M. 1982. Isotope climatic record over the last 2,500 years from Dome C (Antarctica) ice cores. Annals of Glaciology 3: 17–22.
Bromwich DH, Fogt RL. 2004. Strong trends in the skill of the ERA-40 and NCEP/NCAR reanalyses in the high and middle latitudes of the Southern Hemisphere, 1958–2001. Journal of Climate 17: 4603–4619.
Bromwich DH, Robasky FM, Cullather RI, Vanwoert ML. 1995. The atmospheric hydrologic cycle over the Southern Ocean and Antarctica from operational numerical analyses. Monthly Weather Review 123: 3518–3538.

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) ICE-CORE-BASED SYNOPTIC RECONSTRUCTIONS 609

Bromwich DH, Cullather RI, Van Woert ML. 1998. Antarctic precipitation and its contribution to the global sea-level budget. Annals of Glaciology 27: 220–226.
Bromwich DH, Rogers AN, Kållberg P, Cullather RI, White JWC, Kreutz KJ. 2000. ECMWF analyses and reanalyses depiction of ENSO signal in Antarctic precipitation. Journal of Climate 13: 1406–1420.
Bromwich DH, Monaghan AJ, Guo Z. 2004. Modeling the ENSO modulation of Antarctic climate in the late 1990s with Polar MM5. Journal of Climate 17: 109–132.
Cavazos T. 1999. Large-scale circulation anomalies conducive to extreme events and simulation of daily rainfall in northeastern Mexico and southeastern Texas. Journal of Climate 12: 1506–1523.
Cavazos T. 2000. Using self-organizing maps to investigate extreme climate events: an application to wintertime precipitation in the Balkans. Journal of Climate 13: 1718–1732.
Crane RG, Hewitson BC. 1998. Doubled CO2 precipitation changes for the Susquehanna basin: down-scaling from the GENESIS general circulation model. International Journal of Climatology 18: 65–76.
Cullather RI, Bromwich DH, Grumbine RW. 1997. Validation of operational numerical analyses in Antarctic latitudes. Journal of Geophysical Research 102: 13 761–13 784.
Cullather RI, Bromwich DH, Van Woert ML. 1998. Spatial and temporal variability of Antarctic precipitation from atmospheric methods. Journal of Climate 11: 334–367.
Delmas RJ, Kirchner S, Palais JM, Petit J-R. 1992. 1000 years of explosive volcanism recorded at the South Pole. Tellus, Series B: Chemical and Physical Meteorology 44: 335–350.
Demuth H, Beale M. 2000. Neural Network Toolbox. MathWorks, Inc.
ECMWF. 2000. ERA-15. http://wms.ecmwf.int/research/era/Era-15.html [Last accessed 2001].
ECMWF. 2001. ERA-40 project plan. http://wms.ecmwf.int/research/era/Project plan.html [Last accessed 30 July 2001].
Efron B, Tibshirani R. 1993. An Introduction to the Bootstrap. Chapman & Hall.
Gardner MW, Dorling SR. 1998. Artificial neural networks (the multilayer perceptron) — a review of applications in the atmospheric sciences. Atmospheric Environment 32: 2627–2636.
Genthon C, Braun A. 1995. ECMWF analyses and predictions of the surface climate of Greenland and Antarctica. Journal of Climate 8: 2324–2332.
Gibson JK, Kållberg P, Uppala S, Hernandez A, Nomura A, Serrano E. 1999. ERA-15 description. ECMWF Re-Analysis Report Series 1, European Centre for Medium-Range Weather Forecasts, Reading, UK.
Goodman PH. 1996. NevProp software, version 4. http://www.scs.unr.edu/nevprop [Last accessed 2002].
Haykin SS. 1999. Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice Hall.
Hewitson BC, Crane RG (eds). 1994. Neural Nets: Applications in Geography. Kluwer Academic.
Hewitson BC, Crane RG. 2002. Self-organizing maps: applications to synoptic climatology. Climate Research 22: 13–26.
Jouzel J, Vimeux F, Caillon N, Delaygue G, Hoffman G, Masson-Delmotte V, Parrenin F. 2003. Magnitude of isotope/temperature scaling for interpretation of central Antarctic ice cores. Journal of Geophysical Research 108: ACL 6-1–6-10. DOI: 10.1029/2002JD002677.
Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L, Iredell M, Saha S, White G, Woollen J, Zhu Y, Leetmaa A, Reynolds B, Chelliah M, Ebisuzaki W, Higgins W, Janowiak J, Mo KC, Ropelewski C, Wang J, Jenne R, Joseph D. 1996. The NCEP–NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society 77: 437–472.
Kistler R, Kalnay E, Collins W, Saha S, White G, Woollen J, Chelliah M, Ebisuzaki W, Kanamitsu M, Kousky V, van den Dool H, Jenne R, Fiorino M. 2001. The NCEP–NCAR 50-year reanalysis: monthly means CD-ROM and documentation. Bulletin of the American Meteorological Society 82: 247–267.
Kohonen T. 1990. The self-organizing map. Proceedings of the IEEE 78: 1464–1480.
Kohonen T. 1995. Self-Organizing Maps. Springer Series in Information Sciences, vol. 30. Springer-Verlag.
Kohonen T, Hynninen J, Kangas J, Laaksonen J. 1996. SOM_PAK: the self-organizing map program package. Technical Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science, Espoo.
Kreutz KJ, Mayewski PA, Twickler MS, Whitlow SI, White JWC, Shuman CA, Raymond CF, Conway H, McConnell JR. 1999. Seasonal variations of glaciochemical, isotopic, and stratigraphic properties in Siple Dome, Antarctica surface snow. Annals of Glaciology 29: 38–44.
Langway CC Jr, Osada K, Clausen HB, Hammer CU, Shoji H, Mitani A. 1994. New chemical stratigraphy over the last millennium for Byrd Station, Antarctica. Tellus, Series B: Chemical and Physical Meteorology 46: 40–51.
Lazzara MA. 2000. Antarctic automatic weather stations Web site home page. http://uwamrc.ssec.wisc.edu/aws/ [Last accessed 13 May 2000].
Legrand M. 1987. Chemistry of Antarctic snow and ice. Journal de Physique 48: 77–86.
Legrand M, Mayewski PA. 1997. Glaciochemistry of polar ice cores: a review. Reviews of Geophysics 35: 219–243.
Legrand MR, Lorius C, Barkov NI, Petrov VN. 1988. Vostok (Antarctica) ice core: atmospheric chemistry changes over the last climatic cycle (160,000 years). Atmospheric Environment 22: 317–331.
Lorius C. 1989. Polar ice cores and climate. In Climate and Geo-Sciences, Berger A, Schneider S, Duplessy JCl (eds). Kluwer Academic: 77–103.
Marshall GJ. 2002. Trends in Antarctic geopotential height and temperature: a comparison between radiosonde and NCEP–NCAR reanalysis data. Journal of Climate 15: 659–674.
Mayewski PA, Spencer MJ, Lyons WB, Twickler MS, Dibb J. 1988. Ice core records and ozone depletion — potential for a proxy ozone record. Antarctic Journal of the United States 23: 64–68.
Mayewski PA, Meeker LD, Twickler MS, Whitlow S, Yang QZ, Lyons WB, Prentice M. 1997. Major features and forcing of high-latitude atmospheric circulation using a 110,000-year-long glaciochemical series. Journal of Geophysical Research 102: 26 345–26 366.
Mosley-Thompson E, Dai J, Thompson LG, Grootes PM, Arbogast JK, Paskievitch JF. 1991. Glaciological studies at Siple Station (Antarctica): potential ice core paleoclimatic record. Journal of Glaciology 37: 11–22.

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005) 610 D. B. REUSCH, B. C. HEWITSON AND R. B. ALLEY

North GR, Bell TL, Cahalan RF, Moeng FJ. 1982. Sampling errors in the estimation of empirical orthogonal functions. Monthly Weather Review 110: 699–706.
Phillpot HR. 1997. Some observationally identified meteorological features of East Antarctica. Meteorological Study No. 42, Bureau of Meteorology.
Rastorguev VI, Alvarez JA. 1958. Description of the Antarctic circulation observed from April to November 1957 at the IGY Antarctic Weather Central, Little America Station. IGY World Data Center A, National Academy of Sciences, Washington, DC.
Reusch DB, Alley RB. 2002. Automatic weather stations and artificial neural networks: improving the instrumental record in West Antarctica. Monthly Weather Review 130: 3037–3053.
Reusch DB, Alley RB. 2004. A 15-year West Antarctic climatology from six automatic-weather-station temperature and pressure records. Journal of Geophysical Research 109: 1–28. DOI: 10.1029/2003JD004178.
Reusch DB, Mayewski PA, Whitlow SI, Pittalwala II, Twickler MS. 1999. Spatial variability of climate and past atmospheric circulation patterns from central West Antarctic glaciochemistry. Journal of Geophysical Research 104: 5985–6001.
Sammon JW Jr. 1969. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers C-18: 401–409.
Schneider DP, Steig EJ. 2002. Spatial and temporal variability of Antarctic ice sheet microwave brightness temperatures. Geophysical Research Letters 29: 25-1–25-4. DOI: 10.1029/2002GL015490.
Shuman CA, Alley RB, Fahnestock MA, Bindschadler RA, White JWC, Winterle J, McConnell JR. 1998. Temperature history and accumulation timing for the snow pack at GISP2, central Greenland. Journal of Glaciology 44: 21–30.
Sinclair MR, Renwick JA, Kidson JW. 1997. Low-frequency variability of Southern Hemisphere sea level pressure and weather system activity. Monthly Weather Review 125: 2531–2543.
Smith TM, Reynolds RW, Livezey RE, Stokes DC. 1996. Reconstruction of historical sea surface temperatures using empirical orthogonal functions. Journal of Climate 9: 1403–1420.
US ITASE Steering Committee. 1996. Science and implementation plan for US ITASE: 200 years of past Antarctic climate and environmental change. US ITASE Workshop, Baltimore, MD, National Science Foundation.
Von Storch H, Zwiers FW. 1999. Statistical Analysis in Climate Research. Cambridge University Press.
Waddington ED. 1996. Where are we going? The ice core — paleoclimate inverse problem. In Chemical Exchange Between the Atmosphere and Polar Snow, Wolff EW, Bales RC (eds). NATO ASI Series I: Global Environmental Change 43. Springer-Verlag: 630–640.
White JWC, Barlow LK, Fisher D, Grootes P, Jouzel J, Johnsen SJ, Stuiver M, Clausen H. 1997. The climate signal in the stable isotopes of snow from Summit, Greenland — results of comparisons with modern climate observations. Journal of Geophysical Research 102: 26 425–26 439.
White JWC, Steig EJ, Cole J, Cook ER, Johnsen SJ. 1999. Recent, annually resolved climate as recorded in stable isotope ratios in ice cores from Greenland and Antarctica. In 10th Symposium on Global Change Studies. American Meteorological Society: 300–302.
Wolff EW, Peel DA. 1985. The record of global pollution in polar snow and ice. Nature 313: 535–540.
Zielinski GA, Fiacco RJ, Mayewski PA, Meeker LD, Whitlow S, Twickler MS, Germani MS, Endo K, Yasui M. 1994. Climatic impact of the AD 1783 Asama (Japan) eruption was minimal — evidence from the GISP2 ice core. Geophysical Research Letters 21: 2365–2368.

Copyright  2005 Royal Meteorological Society Int. J. Climatol. 25: 581–610 (2005)