WEATHER AND FORECASTING, VOLUME 19, AUGUST 2004

FORECASTER'S FORUM

Discussion of Verification Concepts in Forecast Verification: A Practitioner's Guide in Atmospheric Science

BOB GLAHN
Meteorological Development Laboratory, National Weather Service, Silver Spring, Maryland

16 December 2003 and 9 February 2004

Corresponding author address: Dr. Harry R. Glahn, Meteorological Development Laboratory, National Weather Service, W/OST2, 1325 East-West Highway, Silver Spring, MD 20910. E-mail: [email protected]

The book Forecast Verification: A Practitioner's Guide in Atmospheric Science, edited by Jolliffe and Stephenson (2003, hereafter JS03), fills a void in verification of meteorological and climate forecasts. While a number of books on aspects of statistics related to meteorology and climatology (e.g., Wilks 1995) discuss verification, this complete book is devoted to the subject. The book comprises a fairly tightly coupled set of chapters written by generally well-known experts, in some cases perhaps more so in Europe than North America, in verification and especially in the subjects of their particular chapters. In a book in which sections or chapters are written by different authors, one asks the following questions: 1) how well do the individual chapters read and present the material logically, accurately, and comprehensively; and 2) how well do the chapters relate to one another and address the full subject of the book? Regarding the first question, JS03 gets high marks for most chapters. On the second question, JS03 is better than many, although the editors have not suppressed individuality enough in some instances for it to read like a fully cohesive book. JS03 is a voluminously referenced and well-indexed survey of what is known about, and a historical account of, verification and the related topic evaluation as it exists in the meteorological literature. The editors have put much emphasis on standardizing mathematical notation throughout, and were quite successful, an achievement in itself. While the methods presented can be applicable to most any forecasting problem, the discussion and examples are tied to weather and climate forecasting as acknowledged by JS03 (preface), which hardly translates into the full scope of "atmospheric science."

I have been interested in and involved with verification even before my entry into the U.S. Weather Bureau in 1958. In the Alaskan Weather Center of the U.S. Air Force, as the first numerical weather prediction (NWP) "progs" were rolling out, we were using a score similar to the S1 score (JS03, p. 129; Teweles and Wobus 1954). Roger Allen (for whom I worked for several years) and Jack Thompson hired me, and my office was just down the hall from Glenn Brier and Thompson. All three had recently published what have turned out to be landmark papers (Brier 1950; Thompson 1952; Thompson and Brier 1955; Brier and Allen 1951); all except Thompson and Brier are referenced in JS03. Over the years, I have watched the verification literature grow to what it is today. I certainly agree with JS03 that "Allan Murphy had a major impact on the theory and practice of forecast verification" (p. 3). Murph was a prolific writer, maintaining over long periods a paper a month. He collaborated with many others of renown and touched on most subjects relating to forecast verification. The one topic with which he had not gotten entirely comfortable was forecasts of spatial fields (Allan Murphy, personal communication), although he and Ed Epstein defined a skill score for model verification (Murphy and Epstein 1989). Perhaps the single most important paper he coauthored was the landmark paper, "A general framework for forecast verification" (Murphy and Winkler 1987), mentioned by JS03 in chapter 1.

A possible runner-up in importance to the Murphy and Winkler paper in the meteorological verification literature was the introduction of the relative operating characteristic (ROC) into meteorology. While Murph embraced this concept, it was first brought into the meteorological literature by Ian Mason (1980, 1982a,b), who reported and built upon the work of John Swets (1973). John and Ian were two of the invitees to a workshop on probabilistic weather forecasts in 1983 at Timberline Lodge on Mount Hood, Oregon, organized by Murphy. ROC has not played as major a role in the past as such scores as versions of the skill score and threat score (sometimes with different names), but it is beginning to come to the forefront with the recognition and use of probability information. Even the terminology base rate, hit rate, and false alarm rate have come into prominence in meteorological forecast verification largely through the influence of ROC. For instance, JS03 in the definition for hit rate states that it is "Also known as probability of detection in older literature." I would counter that most readers and developers associated with atmospheric science are more familiar with "probability of detection" than they are with "hit rate." Maybe that is because they, too, are "older."

I have identified a number of recurring themes or central ideas in JS03 mentioned below:

Finley's tornado forecasts. The now-famous 2 × 2 table of Finley's (1884) yes/no tornado forecasts is introduced on page 1 and is discussed several times. The table even appears almost subliminally on the cover. JS03 states "... there is a surprisingly large number of ways in which the numbers in the four cells ... can be combined to give measures of the quality of the forecasts."

Verification presented from a developer's viewpoint. Much of the discussion seems to have as an objective developing or improving a forecast system rather than judging the, possibly comparative, goodness of a set of forecasts. While both aspects are important, JS03 does not clearly make the distinction, and I would have expected concentration to be heavily on the latter rather than the former.

Strong emphasis on the ROC and its associated terminology hit rate, H, and false alarm rate, F. While other ways to evaluate forecasts [e.g., computation of scores, such as mean absolute error (MAE)] are treated throughout the book, the ROC gets a very strong play. Albeit an important concept, it has a major deficiency: it does not consider calibration, and poorly calibrated forecasts may be judged to be as good as well-calibrated forecasts. This is stated in JS03 in some contexts, but is not emphasized, and when it is mentioned, it is usually dismissed with the suggestion to recalibrate, in keeping with the development theme.

Probability forecasts. In agreement with the recent American Meteorological Society (AMS) statement on probability forecasts (AMS 2002), JS03 recommends the use of probability forecasts and emphasizes their potential value to customers over nonprobabilistic forecasts.

Ensembles. The examples are, besides Finley's forecasts, in connection with climate or ensembles.

... work would get many citations; the number and diversity are so great that it is a dominant thread in JS03.

Attribution to other authors. JS03 provides many citations to previous works, which can be very helpful to those delving into details of verification and evaluation of meteorological and climate forecasts.

In chapter 1, the editors reiterate Brier and Allen's (1951) reasons for verification; use their terms "economic," "administrative," and "scientific," and note that a common theme is that any verification scheme needs to be informative (p. 4). They note that it is highly desirable that the verification system be objective; they examine various scores according to the attributes reliability, resolution, discrimination, and sharpness, as suggested by Murphy and Winkler (1987), and for "goodness" of which Murphy (1993) identified three types: consistency, quality (e.g., accuracy or skill), and value (utility); and they note that in order to quantify the value of a forecast, a baseline is needed, and that persistence, climatology, and randomization are common baselines. I might add that the well-established objective method of Model Output Statistics (MOS) produces an important and more competitive baseline for many forecasts, especially in the National Weather Service in the United States; but this is not mentioned in JS03.

While the idea of a baseline is important and seemingly a simple concept, even climatic forecasts as a baseline need more definition, because different "definitions" can give quite different results. For instance, in verifying temperature forecasts over a season, the mean temperature (climatic mean) over the season would be a poor baseline. One should rather use monthly means or some simple low-frequency curve fit to the data over the same seasonal extent. Even so, the question of using the sample frequencies of categories versus longer-term relative frequencies usually is not a given. For instance, Bob Livezey (in JS03, p. 78) states "... the exclusive use of sample probabilities (observed frequencies) of categories of the forecast/observation set being verified is recommended, rather than the use of historical data. The only exception to this is for the case where statistics are stationary and very well estimated." However, as with many verifications, the purpose comes into play. If one is comparing a set of subjective temperature forecasts with the baseline available to the forecaster when the forecasts are being made, the baseline is the historical record, not the mean of the time series yet to be observed, regardless of the stationarity of the time series. (Extreme nonstationarity would indicate that cli-
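The "surprisingly large number of ways" the four cells of Finley's 2 × 2 table can be combined is easy to illustrate with a short script. The counts below are the commonly cited totals from Finley's 1884 tornado forecasts (28 hits, 72 false alarms, 23 misses, 2680 correct rejections); the variable names and the selection of measures are mine, not JS03's notation.

```python
# Measures computable from a yes/no (2 x 2) contingency table, using the
# commonly cited counts from Finley's 1884 tornado forecasts.
a, b, c, d = 28, 72, 23, 2680   # hits, false alarms, misses, correct rejections
n = a + b + c + d

pc   = (a + d) / n       # fraction correct
H    = a / (a + c)       # hit rate (probability of detection)
F    = b / (b + d)       # false alarm rate
far  = b / (a + b)       # false alarm ratio (distinct from F)
ts   = a / (a + b + c)   # threat score (critical success index)
bias = (a + b) / (a + c) # frequency bias

print(f"PC={pc:.3f} H={H:.3f} F={F:.4f} FAR={far:.2f} TS={ts:.3f} bias={bias:.2f}")

# Gilbert's classic objection to Finley's 96.6% "percent correct": always
# forecasting "no tornado" scores even higher, while detecting nothing.
pc_never = (b + d) / n
```

The contrast between `pc` and `pc_never` is the historical reason so many alternative measures were proposed in the first place.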
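The ROC's indifference to calibration, criticized above, can be demonstrated directly: the curve is built only from (F, H) pairs obtained by thresholding the forecast probabilities, so any order-preserving relabeling of those probabilities, i.e., an arbitrarily bad recalibration, leaves the curve and the area under it unchanged. A minimal sketch with invented data (all names and numbers are mine):

```python
def roc_points(probs, obs, thresholds):
    """(F, H) pairs obtained by thresholding probability forecasts."""
    pts = []
    for t in thresholds:
        a = sum(1 for p, o in zip(probs, obs) if p >= t and o == 1)  # hits
        b = sum(1 for p, o in zip(probs, obs) if p >= t and o == 0)  # false alarms
        c = sum(1 for p, o in zip(probs, obs) if p < t and o == 1)   # misses
        d = sum(1 for p, o in zip(probs, obs) if p < t and o == 0)   # correct rej.
        pts.append((b / (b + d), a / (a + c)))                       # (F, H)
    return sorted(pts)

def roc_area(pts):
    """Trapezoidal area under the ROC curve, closed at (0,0) and (1,1)."""
    pts = [(0.0, 0.0)] + pts + [(1.0, 1.0)]
    return sum((f2 - f1) * (h1 + h2) / 2
               for (f1, h1), (f2, h2) in zip(pts, pts[1:]))

probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.35, 0.3, 0.2, 0.15, 0.1]
obs   = [1,   1,   0,   1,   0,   1,    0,   0,   0,    0]
ths   = [0.05, 0.25, 0.5, 0.75, 0.95]

a1 = roc_area(roc_points(probs, obs, ths))
# Cube every probability: a badly calibrated but order-preserving relabeling.
a2 = roc_area(roc_points([p**3 for p in probs], obs, [t**3 for t in ths]))
assert abs(a1 - a2) < 1e-12  # the ROC area is unchanged
```

Because cubing preserves the ordering of the probabilities, every threshold decision, and hence every (F, H) point, is reproduced exactly; the ROC cannot distinguish the well-calibrated forecasts from the distorted ones.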
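The point that different "definitions" of a climatological baseline give quite different results can be made concrete with the standard mean-squared-error skill score, SS = 1 − MSE_fcst / MSE_baseline: against a single seasonal mean, a forecast with an annual cycle looks far more skillful than against monthly means. A small synthetic sketch (the data and numbers are invented for illustration, not taken from JS03):

```python
import math

# Synthetic 90-day season: a warming trend (annual-cycle proxy) plus variability.
days = range(90)
obs  = [10 + 0.1 * t + 2 * math.sin(t) for t in days]       # "observed" temps
fcst = [o + 1.5 * math.cos(t) for t, o in zip(days, obs)]   # imperfect forecasts

def mse(pred, truth):
    return sum((p - x) ** 2 for p, x in zip(pred, truth)) / len(truth)

# Baseline 1: one seasonal mean applied to all 90 days.
season_mean = sum(obs) / len(obs)
base_season = [season_mean] * len(obs)

# Baseline 2: a separate mean for each 30-day "month".
base_month = []
for m in range(3):
    chunk = obs[30 * m: 30 * (m + 1)]
    base_month += [sum(chunk) / len(chunk)] * len(chunk)

ss_season = 1 - mse(fcst, obs) / mse(base_season, obs)
ss_month  = 1 - mse(fcst, obs) / mse(base_month, obs)
# The seasonal-mean baseline yields the higher (more flattering) skill.
assert ss_season > ss_month
```

The forecasts and their errors are identical in both calculations; only the definition of "climatology" changed, which is exactly why the baseline must be specified before skill numbers are compared.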