6A.5    PROJECT PHOENIX - OPTIMIZING THE MACHINE-PERSON MIX IN HIGH-IMPACT WEATHER FORECASTING

Patrick J. McCarthy*, Dave Ball, Willi Purcell
Prairie and Arctic Storm Prediction Centre
Meteorological Service of Canada

*Corresponding author address: Patrick McCarthy, Prairie and Arctic Storm Prediction Centre, 123 Main Street, Suite 150, Winnipeg, Manitoba, Canada, R3C 4W2; email: [email protected]

1. INTRODUCTION

In the numerical model-dominated weather services of the world, do meteorologists still add value in weather forecasting?

The forecast process within an operational weather centre is based upon the analysis, diagnosis and prognosis of skilled meteorologists. To aid their efforts, the forecasters use a wide range of tools, including weather satellites, radar, workstations, and numerical weather prediction (NWP) models. Tremendous gains in NWP performance over the past two decades have made this tool typically the most utilized in the forecaster's toolbox.

As early as 1969 (Klein, 1969), it was envisioned that computer-generated forecasts would increasingly dominate weather prediction. In 1970, NWP-generated text forecasts were being produced experimentally (Glahn, 1970) with the objective of reducing forecaster workload related to "routine" weather.

The seductive nature of the ever-improving NWP output raised concerns that forecasters were becoming too dependent upon this tool. Leonard W. Snellman noted this development as early as 1977 (Snellman, 1977). He coined the term "meteorological cancer" to describe the increasing tendency of forecasters to abdicate the practice of meteorological science while becoming mere conduits for information generated by computers.

By the late 1990s, managers at the Prairie Storm Prediction Centre (PSPC) were concerned that the performance of their forecasters was now only on par with the prevalent NWP forecast products. An experiment, called Project Phoenix, was designed in 2000 to test whether forecasters still added value to the routine public forecast. The project ran in early 2001 and, when unexpected results were achieved, was rerun later that year, producing similar results.

This paper examines the results of Project Phoenix and offers explanations for the results obtained.

2. PROJECT PHOENIX

Project Phoenix has two principal components: 1) operational forecasting, and 2) performance measurement and feedback.

OPERATIONAL FORECASTING

To test whether the PSPC meteorologists overuse NWP at the expense of their meteorological skills, or whether forecasters could no longer add value, it was decided that a second "weather center" was needed. The project would compare the performance of the Phoenix team to that of the PSPC, plus an automated NWP-based text product.

The Phoenix weather center mirrored the PSPC. The PSPC was responsible for the forecasts, Watches and Warnings of the three Canadian Prairie Provinces. This area of responsibility represented one of the largest in the world, encompassing roughly 25% of the area of the 48 contiguous U.S. states. The PSPC forecast area was divided into 102 forecast regions, and three or four forecasters were normally on duty. The Project Phoenix teams operated their weather center "down the hall" from the PSPC with three forecasters and produced the same primary forecasts as the PSPC. The forecasters selected for the Phoenix team were chosen to best represent a mix of both experience and inexperience. Selecting a team of "all-stars" was counter to the goals of the project and was intentionally avoided.

All versions of Phoenix followed the same general process, on a Monday-to-Friday basis. The forecast team worked from 8:00 a.m. until 5:00 p.m. and was responsible for issuing all of the regularly scheduled forecasts due near 11:00 a.m. and 4:30 p.m., in accordance with the official filing times for each product.

PSPC forecasters usually have a text product from the previous shift in place when they work through their routine. To maintain a similar approach between the two forecast teams, while avoiding cross-contamination between them, the Phoenix forecasters began with the automated NWP-based text product produced earlier that morning, near midnight. This was the "SCRIBE" forecast. SCRIBE (Boulais, 1992; Boulais et al., 1992; Verret et al., 1993; Verret et al., 1995) takes direct model output from the Canadian operational model, plus output from a number of NWP-derived statistical packages, to generate a matrix of weather concepts. Text forecasts can be generated from these concepts, and forecasters can modify the concepts to generate new forecasts. Analogues to this system are the IFPS (Ruth, 2002) and the NDFD (Glahn et al., 2003) in the U.S. National Weather Service.
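The toy sketch below (Python) illustrates the concept-matrix idea just described: NWP output is reduced to a matrix of weather concepts, text is generated from the concepts, and forecasters edit concepts rather than prose. The data layout, element names, and rendering here are invented for illustration; they are not SCRIBE's actual format or interface.

    # Toy illustration of the concept-matrix idea behind SCRIBE, not the
    # actual SCRIBE data format or interface: NWP output is reduced to a
    # matrix of weather "concepts" (element x forecast period), text is
    # generated from the concepts, and forecasters edit concepts rather
    # than prose.

    # Hypothetical concept matrix: element -> one value per forecast period.
    concepts = {
        "sky":    ["cloudy", "mix of sun and cloud"],
        "precip": ["snow", "nil"],
        "wind":   [30, 20],   # assumed km/h for this toy example
    }

    def to_text(matrix: dict, period: int) -> str:
        """Render one forecast period's concepts as a forecast sentence."""
        sky = matrix["sky"][period].capitalize()
        precip = matrix["precip"][period]
        with_precip = "" if precip == "nil" else f" with {precip}"
        return f"{sky}{with_precip}. Wind {matrix['wind'][period]} km/h."

    # A forecaster amends one concept and regenerates the text.
    concepts["precip"][0] = "flurries"
    print(to_text(concepts, 0))   # Cloudy with flurries. Wind 30 km/h.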
Based upon their analysis, diagnosis, and prognosis efforts, the forecasters could alter the existing forecasts or start anew. Other than the raw SCRIBE forecast, the Phoenix forecasters had no access to any model or statistical information, and all official forecast products were also barred. This lack of model and human forecast data was the only significant difference between what was available to the Phoenix team and the PSPC forecasters. The Phoenix forecasters had to meet the same forecast deadlines as their PSPC counterparts. Forecasts for days 1 to 5 could be altered, but all non-transmitted forecasts valid at the deadlines for both forecast teams were deemed final and assessed.

At the end of each day during the forecasting portion of Phoenix, the meteorologists offered a brief commentary on their shift, outlining any challenges that were faced and potential concerns with their forecasts.

The Phoenix, SCRIBE and PSPC forecasts were objectively assessed early the following morning using the Phoenix verification method. Feedback was immediately provided as raw numeric scores for each version, along with a written commentary outlining where each group had achieved superior performance or had failed to match the accuracy of the other two forecast systems. Both the Project Phoenix and PSPC forecasters were aware of the details of the verification scheme used.

VERIFICATION

A detailed description of the Phoenix verification system is beyond the scope of this paper, though one is available (Ball, 2007). The following is a brief summary of the verification system.

A prerequisite to any comparison of forecasts is a verification system that provides an accurate and meaningful measure of error (e.g. Stanski, 1982; Stanski et al., 1989), preferably in a fashion concise enough that the end user, in this case the general public, can readily understand it. Many systems of verification have traditionally been used by Environment Canada, but most concentrate upon only one or two readily quantifiable parameters, such as a daytime high. These methods fail to provide the complete story, and the utility of the information can be masked in statistical parlance.

Project Phoenix adopted a verification scheme that attempted to measure the severity of forecast error and its impact upon the residents of the Prairie Provinces, drawing heavily upon public responses to recent Environment Canada opinion surveys (e.g. Environics Research Group, 1999; Angus Reid Group Inc., 2000). In effect, the system was designed to recognize that missing a five centimetre snowfall in a large centre would result in a far greater negative reaction than would a forecast missing a few late-afternoon sunny breaks in a rural region.

Forecasts of precipitation, wind and sky conditions were assessed on a five-category, nonlinear error scale (Figure 1). Forecasting the correct category resulted in a zero-error score, while an error of four full categories resulted in an error charge of 100 percent. Due to its emphasis on improving forecasts in the shorter term, the Phoenix verification system assessed the forecasts in greater detail than has typically been the case, including the timing of significant events.

Forecast   Precipitation        Precipitation   Wind    Sky
Category   (Rain)               (Snow)          Speed   Condition
1          Nil                  Nil             10      Sunny
2          0.3                  0.3             20      Sunny with cloudy periods
3          0.6                  0.6             30      Mix of sun and cloud
4          Showery >= 0.2 cm    0.2 cm - 2 cm   40      Cloudy with sunny periods
5          Extensive >= 0.2 cm  > 2 cm          50+     Cloudy

Figure 1. Verification categories for precipitation, wind speed and sky condition.

The error scores were combined into one holistic error. Since the same public surveys noted that the various weather elements ranked differently in importance, each weather element was weighted differently. Based on public opinion, precipitation forecasts account for 45 percent of the forecast score, and temperatures were ascribed a 25 percent proportion. Wind velocities contribute 15 percent of the forecast value, with sky condition and obstructions to vision accounting for the remaining 10 percent of the total score.
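Taken together, the category scale of Figure 1 and the element weights above reduce to a short computation. The sketch below is illustrative only: the paper gives just the end points of the nonlinear charge (zero for the correct category, 100 percent for a four-category miss), so the intermediate charges are assumed placeholders, and the weights are simply the published proportions.

    # Illustrative sketch of the Phoenix scoring described above, not the
    # operational code. Only the end points of the nonlinear category
    # charge are given (0% for the correct category, 100% for a
    # four-category miss); the intermediate charges below are assumed
    # placeholders chosen to grow nonlinearly.

    CATEGORY_CHARGE = [0.0, 10.0, 30.0, 60.0, 100.0]  # % by |category error|

    # Published element proportions of the total score.
    ELEMENT_WEIGHTS = {"precipitation": 0.45, "temperature": 0.25,
                       "wind": 0.15, "sky": 0.10}

    def category_error(forecast_cat: int, observed_cat: int) -> float:
        """Percent charge for one element from its Figure 1 category (1-5)."""
        return CATEGORY_CHARGE[abs(forecast_cat - observed_cat)]

    def holistic_error(errors: dict) -> float:
        """Weighted combination of per-element percent errors; dividing by
        the weight sum keeps the result on a 0-100 scale."""
        weight = sum(ELEMENT_WEIGHTS[e] for e in errors)
        return sum(ELEMENT_WEIGHTS[e] * err
                   for e, err in errors.items()) / weight

    # Example: snow one category too light (category 3 forecast, category 4
    # observed), wind and sky correct; temperature is scored by a separate
    # rule (described below) and is supplied here as a given 0% error.
    errors = {"precipitation": category_error(3, 4),
              "temperature": 0.0, "wind": 0.0, "sky": 0.0}
    print(round(holistic_error(errors), 1))   # 4.7 under the assumed charges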
The verification system is heavily weighted toward routine weather events due to the limited duration of Project Phoenix and its primary objectives. Specifically, there was no attempt to differentiate between extreme events that would warrant a weather warning. A Phoenix verification system for extreme weather and warnings has since been created.

Temperature was handled differently. The public surveys indicated that the Canadian public could tolerate temperature errors of four degrees Celsius or less, but that errors in excess of four degrees were deemed unacceptable. Temperature forecasts were

The system also used a weighting factor for each forecast region, based in part upon population, but also considering factors such as infrastructure,