Nonlinear Climatology and Paleoclimatology: Capturing Patterns of Variability and Change with Self-Organizing Maps

Physics and Chemistry of the Earth 35 (2010) 329–340
Journal homepage: www.elsevier.com/locate/pce
doi:10.1016/j.pce.2009.09.001

David B. Reusch *
EMS Earth and Environmental Systems Institute, The Pennsylvania State University, University Park, PA 16802, USA
* Tel.: +1 814 865 9319.

Article history: Received 26 February 2009; Received in revised form 30 July 2009; Accepted 4 September 2009; Available online 18 September 2009

Keywords: Antarctica; Atmospheric circulation; Self-Organizing Maps; Nonlinear; Polar climate; Meteorology

Abstract

Self-Organizing Maps (SOMs) provide a powerful, nonlinear technique to optimally summarize complex geophysical datasets using a user-selected number of "icons" or SOM states, allowing rapid identification of preferred patterns, predictability of transitions, rates of transitions, and hysteresis/asymmetry in cycles. For example, SOM-based patterns concisely capture the spatial and temporal variability in atmospheric circulation datasets. SOMs and the SOM-based methodology are reviewed here at a practical level so as to encourage more-widespread usage of this powerful technique in the atmospheric sciences. Usage is introduced with a simple Antarctic ice core-based dataset to show analysis of multiple variables at a single point in space. Subsequent examples utilize a 24-year dataset (1979–2002) of daily Antarctic meteorological variables and demonstrate usage with 2-D, gridded data, for single and multiple variable scenarios. These examples readily show how SOMs capture nonlinearity in time series data, concisely summarize voluminous spatial data, and help us understand spatial and temporal change through, for example, pattern frequency analysis and identification of preferred pattern transitions.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

Many techniques are useful for extracting patterns from a large geophysical dataset, such as the examples of daily-mean anomalies of the atmospheric circulation data considered here or records of monthly Antarctic sea-ice extent. It may prove useful and informative, for example, to note the strength of the loading of the data from a particular month on the first two principal components of the dataset. It may also be instructive to note that the data from a given month are very similar to those of some earlier, well-known month (e.g., "this looks like the ice that trapped Endurance in the Weddell Sea ice pack in January, 1915"). In this hypothetical example, "January, 1915" serves as an icon, to which other data fields can be compared. Such icon-based classification schemes can work in parallel with pattern-extraction tools such as Principal Components Analysis (PCA). However, a particular data field (such as "January, 1915") may contain unique features that reduce its utility as a more general icon. To overcome this difficulty (i.e., over-specificity of the results), the technique of Self-Organizing Map (SOM) analysis (Kohonen, 2001) provides an objective way to optimally extract a user-specified number of icons, or SOM states, from an input dataset. Here "optimal" indicates that the patterns capture dataset variability in a generalized way that accurately summarizes key features without being overly drawn to specific details.

Self-Organizing Maps (Kohonen, 2001) are an analysis tool from the field of artificial neural networks that uses so-called unsupervised training to find salient features (icons) of an input dataset without prior specification (or knowledge) of the "correct" output. The features (or patterns) found by the network are often of direct interest but they are also frequently used to divide the input data into distinct classes (i.e., classification) that further help us to understand the data. In short, SOMs support unsupervised classification of large, multivariate geophysical datasets through creation of a spatially organized set of generalized patterns of variability which, collectively, represents the probability density function (PDF) of the input data (e.g., Hewitson and Crane, 2002; Reusch et al., 2005a,b). That is, a SOM analysis produces a succinct, discrete, nonlinear (Kohonen, 2001) classification/summary of complex, continuum input data.

The extracted patterns of a SOM analysis are returned in a grid or "map", with similar patterns placed near each other and the most-extreme patterns at the corners, i.e., there is a spatial organization and order in the pattern set. Often, the patterns at the ends of one diagonal are similar to the positive and negative phases of the first principal component of the input data, with the second principal component correspondingly at the ends of the other diagonal; however, this is not required. For example, in the ice-core data analysis example below (Fig. 2), high recent values and low early-record values are placed at opposite ends of one diagonal of the map, while important asymmetries in the behavior of specific variables are found on the other diagonal. In neither case, however, are the end-diagonal patterns mirror images, as would be found in a strictly linear analysis.

The SOM-based approach brings a number of methodological benefits as well. Foremost is that SOMs offer an alternative to more traditional linear techniques, such as principal component analysis (PCA), that is more robust (e.g., able to interpolate into areas of the input space not present in the available training input), less complex, and less subjective while also accommodating nonlinear relationships in the data (Kohonen, 2001; Reusch et al., 2005a,b). SOMs also provide a completely independent uniformitarian analysis pathway and thus provide independent results for comparison with more traditional techniques. SOM-based analysis thus complements linear techniques without replacing them. SOMs also provide a powerful visualization approach for studying structure in large, complex datasets. In the case of atmospheric circulation data, for example, the patterns capture the full range of synoptic conditions while also treating the data as a continuum. Because each input data record maps uniquely to one SOM pattern, SOM analysis easily allows characterization of, for example, time-trends in frequency of occurrence, preferred transitions that may point toward predictability, and cyclicity of preferred patterns. These aspects and others will be detailed below.

Readers interested in going beyond the examples presented here have a growing number of examples to choose from of SOM usage in meteorology, climatology and oceanography. Recent work with a regional/synoptic focus covers the polar regions as well as the temperate latitudes of both hemispheres. In the polar regions, examples include characterizing extreme events at Barrow, Alaska in a synoptic context (Cassano et al., 2006a,b), the identification and characterization of a low-level jet on the Ross Ice Shelf, Antarctica (Seefeldt and Cassano, 2008), and the atmospheric circulation of West Antarctica (Reusch et al., 2005a,b). In the North Atlantic, synoptic forcing of precipitation in Greenland (Schuenemann et al., 2009) and the spatial/temporal characteristics of the North

2. Methodology

The successful application of SOMs revolves around two key concepts: the architecture and topology of the SOM grid, and the parameter selection and training process that yields the SOM patterns. All aspects of SOM usage, such as the frequency maps and other analyses described in later sections, build on these ideas; thus, it is important to understand these foundations. Note that the freely available SOM-PAK software (Kohonen, 2001) has been used here as in our previous work (e.g., Reusch et al., 2005a,b).

2.1. Architecture and topology

Architecturally, a SOM is composed of a finite set of nodes with a grid topology such that each node has either four (rectangular) or six (hexagonal) neighbors (fewer for those nodes located on a grid edge or corner). A particular SOM instance is usually referred to by its grid dimensions. For example, a 4 × 3 SOM has 12 nodes. The size of the grid (number of patterns) directly influences the amount of generalization: smaller (larger) node arrays have fewer (more) available patterns to characterize the n-dimensional data space, so the final patterns developed during training will tend to do more (less) generalization of the input. Grid size is thus a first-order experimental parameter. Because the pattern set is relatively small (and finite), complexity is reduced to working with the set of patterns instead of the (usually much larger) original dataset. Individual SOM patterns are typically identified either by an (x, y) coordinate pair or a sequence number within the two-dimensional array (counting left-to-right, top-to-bottom). Coordinate pairs identify patterns consistently across different grid sizes while sequence numbers have notational simplicity (and will be used here).

Each SOM node has an associated reference vector representing the node's generalized pattern. Reference vectors have the same dimensionality as the original data samples once those samples have been converted to vector form (i.e., if the samples being analyzed are not one-dimensional,
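
The grid bookkeeping described in Section 2.1, and the one-to-one mapping of input records onto SOM patterns noted in the Introduction, can be illustrated with a short Python sketch. It is not taken from SOM-PAK or from the analyses in this paper. All function names are hypothetical, and several conventions are assumptions made only for illustration: sequence numbers start at 1, a 4 × 3 grid is read as 4 columns by 3 rows, (x, y) are 0-based column and row indices, only the rectangular topology is covered, and the best-matching node is chosen by Euclidean distance.

```python
import math
from collections import Counter


def seq_to_xy(seq, n_cols):
    """Convert a 1-based sequence number (left-to-right, top-to-bottom)
    into an assumed 0-based (x, y) = (column, row) coordinate pair."""
    return (seq - 1) % n_cols, (seq - 1) // n_cols


def xy_to_seq(x, y, n_cols):
    """Inverse of seq_to_xy: (column, row) back to a 1-based sequence number."""
    return y * n_cols + x + 1


def rect_neighbors(seq, n_cols, n_rows):
    """Sequence numbers of the neighbors of a node on a rectangular grid.

    Interior nodes have four neighbors; edge and corner nodes have fewer.
    """
    x, y = seq_to_xy(seq, n_cols)
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return sorted(
        xy_to_seq(cx, cy, n_cols)
        for cx, cy in candidates
        if 0 <= cx < n_cols and 0 <= cy < n_rows
    )


def best_matching_node(sample, reference_vectors):
    """Sequence number of the reference vector closest to the sample.

    Euclidean distance is an assumption of this sketch. Ties go to the
    lowest sequence number, so each sample maps to exactly one node.
    """
    distances = [math.dist(sample, ref) for ref in reference_vectors]
    return distances.index(min(distances)) + 1


def pattern_frequencies(samples, reference_vectors):
    """Fraction of samples mapped to each node (frequency of occurrence)."""
    counts = Counter(best_matching_node(s, reference_vectors) for s in samples)
    total = len(samples)
    return {
        seq: counts.get(seq, 0) / total
        for seq in range(1, len(reference_vectors) + 1)
    }
```

With these conventions, node 1 of a 4 × 3 grid is a corner with two neighbors, while node 6 is an interior node with four. Applying `pattern_frequencies` to successive time windows of a record is one simple way to look at time-trends in frequency of occurrence.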
