
7. The August 14 Blackout Compared With Previous Major North American Outages

Incidence and Characteristics of Power System Outages

Short, localized outages occur on power systems fairly frequently. System-wide disturbances that affect many customers across a broad geographic area are rare, but they occur more frequently than a normal distribution of probabilities would predict. North American power system outages between 1984 and 1997 are shown in Figure 7.1 by the number of customers affected and the rate of occurrence. While some of these were widespread weather-related events, some were cascading events that, in retrospect, were preventable. Electric power systems are fairly robust and are capable of withstanding one or two contingency events, but they are fragile with respect to multiple contingency events unless the systems are readjusted between contingencies. With the shrinking margin in the current transmission system, it is likely to be more vulnerable to cascading outages than it was in the past, unless effective countermeasures are taken.

Figure 7.1. North American Power System Outages, 1984-1997. Note: The circles represent individual outages in North America between 1984 and 1997, plotted against the frequency of outages of equal or greater size over that period. Source: Adapted from John Doyle, California Institute of Technology, "Complexity and Robustness," 1999. Data from NERC.
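The heavy-tailed behavior described above can be made concrete with a small numerical illustration. The sketch below uses synthetic, hypothetical outage sizes (not the NERC data behind Figure 7.1) to show how an empirical "frequency of equal or greater size" curve is built, and why large events appear far more often than a normal (thin-tailed) fit would predict.

```python
# Illustrative sketch only (synthetic data, not Task Force data): ranks outage sizes
# the way Figure 7.1 does -- each outage plotted against the frequency of outages of
# equal or greater size -- and compares the observed tail with a normal-distribution fit.
import math

# Hypothetical outage sizes in customers affected (heavy-tailed sample).
outages = [3_000, 8_000, 12_000, 30_000, 55_000, 120_000,
           300_000, 750_000, 2_000_000, 7_500_000]
years = 14  # 1984-1997 observation window

mean = sum(outages) / len(outages)
std = math.sqrt(sum((x - mean) ** 2 for x in outages) / len(outages))

for size in sorted(outages):
    n_geq = sum(1 for x in outages if x >= size)
    empirical_rate = n_geq / years                    # events per year of this size or larger
    z = (size - mean) / std
    normal_tail = 0.5 * math.erfc(z / math.sqrt(2))   # P(X >= size) under a normal fit
    print(f"{size:>9,} customers: {empirical_rate:.2f}/yr observed, "
          f"normal fit predicts tail probability {normal_tail:.4f}")
```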

As evidenced by the absence of major transmission projects undertaken in North America over the past 10 to 15 years, utilities have found ways to increase the utilization of their existing facilities to meet increasing demands without adding significant high-voltage equipment. Without intervention, this trend is likely to continue. Pushing the system harder will undoubtedly increase reliability challenges. Special protection schemes may be relied on more to deal with particular challenges, but the system still will be less able to withstand unexpected contingencies.

A smaller transmission margin for reliability makes the preservation of system reliability a harder job than it used to be. The system is being operated closer to the edge of reliability than it was just a few years ago. Table 7.1 represents some of the changed conditions that make the preservation of reliability more challenging.

If nothing else changed, one could expect an increased frequency of large-scale events as compared to historical experience. The last and most extreme event shown in Figure 7.1 is the August 10, 1996, outage. August 14, 2003, surpassed that event in terms of severity. In addition, two significant outages in the month of September 2003 occurred abroad: one in England and one, initiated in Switzerland, that cascaded over much of Italy.

In the following sections, seven previous outages are reviewed and compared with the blackout of August 14, 2003: (1) Northeast blackout on November 9, 1965; (2) New York City blackout on July 13, 1977; (3) West Coast blackout on December 22, 1982; (4) West Coast blackout on July 2-3, 1996; (5) West Coast blackout on August 10, 1996; (6) Ontario and U.S. North Central blackout on June 25, 1998; and (7) Northeast outages and non-outage disturbances in the summer of 1999.

Outage Descriptions and Major Causal Factors

November 9, 1965: Northeast Blackout

This disturbance resulted in the loss of over 20,000 MW of load and affected 30 million people. Virtually all of New York, Connecticut, Massachusetts, Rhode Island, small segments of northern Pennsylvania and northeastern New Jersey, and substantial areas of Ontario, Canada, were affected. Outages lasted for up to 13 hours. This event resulted in the formation of the North American Electric Reliability Council in 1968.

A backup protective relay operated to open one of five 230-kV lines taking power north from a generating plant in Ontario to the Toronto area. When the flows redistributed instantaneously on the remaining four lines, they tripped out successively in a total of 2.5 seconds. The resultant power swings resulted in a cascading outage that blacked out much of the Northeast.

The major causal factors were as follows:

• Operation of a backup protective relay took a 230-kV line out of service when the loading on the line exceeded the 375-MW relay setting.
• Operating personnel were not aware of the operating set point of this relay.
• Another 230-kV line opened by an overcurrent relay action, and several 115- and 230-kV lines opened by protective relay action.
• Two key 345-kV east-west (Rochester-Syracuse) lines opened due to instability, and several lower voltage lines tripped open.
• Five of 16 generators at the St. Lawrence (Massena) plant tripped automatically in accordance with predetermined operating procedures.
• Following additional line tripouts, 10 generating units at Beck were automatically shut down by low governor oil pressure, and 5 pumping generators were tripped off by overspeed governor control.
• Several other lines then tripped out on under-frequency relay action.

July 13, 1977: New York City Blackout

This disturbance resulted in the loss of 6,000 MW of load and affected 9 million people in New York City. Outages lasted for up to 26 hours. A series of events triggering the separation of the Consolidated Edison system from neighboring systems and its subsequent collapse began when two 345-kV lines on a common tower in Northern Westchester were struck by lightning and tripped out. Over the next hour, despite Consolidated Edison dispatcher actions, the system electrically separated from surrounding systems and collapsed. With the loss of imports, generation in New York City was not sufficient to serve the load in the city.

Table 7.1. Changing Conditions That Affect System Reliability

Previous Conditions | Emerging Conditions
Fewer, relatively large resources | Smaller, more numerous resources
Long-term, firm contracts | Contracts shorter in duration; more non-firm transactions, fewer long-term firm transactions
Bulk power transactions relatively stable and predictable | Bulk power transactions relatively variable and less predictable
Assessment of system reliability made from stable base (narrower, more predictable range of potential operating states) | Assessment of system reliability made from variable base (wider, less predictable range of potential operating states)
Limited and knowledgeable set of utility players | More players making more transactions, some with less interconnected operation experience; increasing with retail access
Unused transmission capacity and high security margins | High transmission utilization and operation closer to security limits
Limited competition, little incentive for reducing reliability investments | Utilities less willing to make investments in transmission reliability that do not increase revenues
Market rules and reliability rules developed together | Market rules undergoing transition, reliability rules developed separately
Limited wheeling | More system throughput

Major causal factors were:

• Two 345-kV lines connecting Buchanan South to Millwood West experienced a phase B to ground fault caused by a lightning strike.
• Circuit breaker operations at the Buchanan South ring bus isolated the Indian Point No. 3 generating unit from any load, and the unit tripped for a rejection of 883 MW of load.
• Loss of the ring bus isolated the 345-kV tie to Ladentown, which had been importing 427 MW, making the cumulative resources lost 1,310 MW.
• 18.5 minutes after the first incident, an additional lightning strike caused the loss of two 345-kV lines, which connect Sprain Brook to Buchanan North and Sprain Brook to Millwood West. These two 345-kV lines share common towers between Millwood West and Sprain Brook. One line (Sprain Brook to Millwood West) automatically reclosed and was restored to service in about 2 seconds. The failure of the other line to reclose isolated the last Consolidated Edison interconnection to the Northwest.
• The resulting surge of power from the Northwest caused the loss of the Pleasant Valley to Millwood West line by relay action (a bent contact on one of the relays at Millwood West caused the improper action).
• 23 minutes later, the Leeds to Pleasant Valley 345-kV line sagged into a tree due to overload and tripped out.
• Within a minute, the 345-kV to 138-kV transformer at Pleasant Valley overloaded and tripped off, leaving Consolidated Edison with only three remaining interconnections.
• Within 3 minutes, the Long Island Lighting Co. system operator, on concurrence of the pool dispatcher, manually opened the Jamaica to Valley Stream tie.
• About 7 minutes later, the tap-changing mechanism failed on the Goethals phase-shifter, resulting in the loss of the Linden-to-Goethals tie to PJM, which was carrying 1,150 MW to Consolidated Edison.
• The two remaining external 138-kV ties to Consolidated Edison tripped on overload, isolating the Consolidated Edison system.
• Insufficient generation in the isolated system caused the Consolidated Edison island to collapse.

December 22, 1982: West Coast Blackout

This disturbance resulted in the loss of 12,350 MW of load and affected over 5 million people in the West. The outage began when high winds caused the failure of a 500-kV transmission tower. The tower fell into a parallel 500-kV line tower, and both lines were lost. The failure of these two lines mechanically cascaded and caused three additional towers to fail on each line. When the line conductors fell they contacted two 230-kV lines crossing under the 500-kV rights-of-way, collapsing the 230-kV lines.

The loss of the 500-kV lines activated a remedial action scheme to control the separation of the interconnection into two pre-engineered islands and trip generation in the Pacific Northwest in order to minimize customer outages and speed restoration. However, delayed operation of the remedial action scheme components occurred for several reasons, and the interconnection separated into four islands.

In addition to the mechanical failure of the transmission lines, analysis of this outage cited problems with coordination of protective schemes, because the generator tripping and separation schemes operated slowly or did not operate as planned. A communication channel component performed sporadically, resulting in delayed transmission of the control signal. The backup separation scheme also failed to operate, because the coordination of relay settings did not anticipate the power flows experienced in this severe disturbance.

In addition, the volume and format in which data were displayed to operators made it difficult to assess the extent of the disturbance and what corrective action should be taken. Time references to events in this disturbance were not tied to a common standard, making real-time evaluation of the situation more difficult.

July 2-3, 1996: West Coast Blackout

This disturbance resulted in the loss of 11,850 MW of load and affected 2 million people in the West. Customers were affected in Arizona, California, Colorado, Idaho, Montana, Nebraska, Nevada, New Mexico, Oregon, South Dakota, Texas, Utah, Washington, and Wyoming in the United States; Alberta and British Columbia in Canada; and Baja California Norte in Mexico. Outages lasted from a few minutes to several hours.

The outage began when a 345-kV transmission line in Idaho sagged into a tree and tripped out. A protective relay on a parallel transmission line also detected the fault and incorrectly tripped a second line. An almost simultaneous loss of these lines greatly reduced the ability of the system to transmit power from the nearby Jim Bridger plant. Other relays tripped two of the four generating units at that plant. With the loss of those two units, frequency in the entire Western Interconnection began to decline, and voltage began to collapse in the Boise, Idaho, area, affecting the California-Oregon AC Intertie transfer limit.

For 23 seconds the system remained in precarious balance, until the Mill Creek to Antelope 230-kV line between Montana and Idaho tripped by zone 3 relay, depressing voltage at Summer Lake Substation and causing the intertie to slip out of synchronism. Remedial action relays separated the system into five pre-engineered islands designed to minimize customer outages and restoration times. Similar conditions and initiating factors were present on July 3; however, as voltage began to collapse in the Boise area, the operator shed load manually and contained the disturbance.

August 10, 1996: West Coast Blackout

This disturbance resulted in the loss of over 28,000 MW of load and affected 7.5 million people in the West. Customers were affected in Arizona, California, Colorado, Idaho, Montana, Nebraska, Nevada, New Mexico, Oregon, South Dakota, Texas, Utah, Washington, and Wyoming in the United States; Alberta and British Columbia in Canada; and Baja California Norte in Mexico. Outages lasted from a few minutes to as long as nine hours.

Triggered by several major transmission line outages, the loss of generation from McNary Dam, and resulting system oscillations, the Western Interconnection separated into four electrical islands, with significant loss of load and generation. Prior to the disturbance, the transmission system from Canada south through the Northwest into California was heavily loaded with north-to-south power transfers. These flows were due to high Southwest demand caused by hot weather, combined with excellent hydroelectric conditions in Canada and the Northwest.

Very high temperatures in the Northwest caused two lightly loaded transmission lines to sag into untrimmed trees and trip out. A third heavily loaded line also sagged into a tree. Its outage led to the overload and loss of additional transmission lines. General voltage decline in the Northwest and the loss of McNary generation due to incorrectly applied relays caused power oscillations on the California to Oregon AC intertie. The intertie's protective relays tripped these facilities out and caused the Western Interconnection to separate into four islands. Following the loss of the first two lightly loaded lines, operators were unaware that the system was in an insecure state over the next hour, because new operating studies had not been performed to identify needed system adjustments.

June 25, 1998: Upper Midwest Blackout

This disturbance resulted in the loss of 950 MW of load and affected 152,000 people in Minnesota, Montana, North Dakota, South Dakota, and Wisconsin in the United States; and Ontario, Manitoba, and Saskatchewan in Canada. Outages lasted up to 19 hours.

A lightning storm in Minnesota initiated a series of events, causing a system disturbance that affected the entire Mid-Continent Area Power Pool (MAPP) Region and the northwestern Ontario Hydro system of the Northeast Power Coordinating Council. A 345-kV line was struck by lightning and tripped out. Underlying lower voltage lines began to overload and trip out, further weakening the system. Soon afterward, lightning struck a second 345-kV line, taking it out of service as well. Following the outage of the second 345-kV line, the remaining lower voltage transmission lines in the area became significantly overloaded, and relays took them out of service. This cascading removal of lines from service continued until the entire northern MAPP Region was separated from the Eastern Interconnection, forming three islands and resulting in the eventual blackout of the northwestern Ontario Hydro system.

Summer of 1999: Northeast U.S. Non-outage Disturbances

Load in the PJM system on July 6, 1999, was 51,600 MW (approximately 5,000 MW above forecast). PJM used all emergency procedures (including a 5% voltage reduction) except manually tripping load, and imported 5,000 MW from external systems to serve the record customer demand. Load on July 19, 1999, exceeded 50,500 MW. PJM loaded all available eastern PJM generation and again implemented emergency operating procedures from approximately 12 noon into the evening on both days.

During these record peak loads, steep voltage declines were experienced on the bulk transmission system. In each case, a voltage collapse was barely averted through the use of emergency procedures. Low voltage occurred because reactive demand exceeded reactive supply. High reactive demand was due to high electricity demand and high losses resulting from high transfers across the system. Reactive supply was inadequate because generators were unavailable or unable to meet rated reactive capability due to ambient conditions, and because some shunt capacitors were out of service.

Common or Similar Factors Among Major Outages

The factors that were common to some of the major outages above and the August 14 blackout include: (1) conductor contact with trees; (2) overestimation of dynamic reactive output of system generators; (3) inability of system operators or coordinators to visualize events on the entire system; (4) failure to ensure that system operation was within safe limits; (5) lack of coordination on system protection; (6) ineffective communication; (7) lack of "safety nets"; and (8) inadequate training of operating personnel. The following sections describe the nature of these factors and list recommendations from previous investigations that are relevant to each.

Conductor Contact With Trees

This factor was an initiating trigger in several of the outages and a contributing factor in the severity of several more. Unlike lightning strikes, for which system operators have fair storm-tracking tools, system operators generally do not have direct knowledge that a line has contacted a tree and faulted. They will sometimes test the line by trying to restore it to service, if that is deemed to be a safe operation. Even if it does go back into service, the line may fault and trip out again as load heats it up. This is most likely to happen when vegetation has not been adequately managed, in combination with hot and windless conditions.

In some of the disturbances, tree contact accounted for the loss of more than one circuit, contributing multiple contingencies to the weakening of the system. Lines usually sag into right-of-way obstructions when the need to retain transmission interconnection is high. High inductive load composition, such as air conditioning or irrigation pumping, accompanies hot weather and places higher burdens on transmission lines. Losing circuits contributes to voltage decline. Inductive load is unforgiving when voltage declines, drawing additional reactive supply from the system and further contributing to voltage problems.

Recommendations from previous investigations include:

• Paying special attention to the condition of rights-of-way following favorable growing seasons. Very wet and warm spring and summer growing conditions preceded the 1996 outages in the West.
• Careful review of any reduction in operations and maintenance expenses that may contribute to decreased frequency of line patrols or trimming. Maintenance in this area should be strongly directed toward preventive rather than remedial maintenance.

Dynamic Reactive Output of Generators

Reactive supply is an important ingredient in maintaining healthy power system voltages and facilitating power transfers. Inadequate reactive supply was a factor in most of the events. Shunt capacitors and generating resources are the most significant suppliers of reactive power. Operators perform contingency analysis based on how power system elements will perform under various power system conditions. They determine and set transfer limits based on these analyses. Shunt capacitors are easy to model because they are static. Modeling the dynamic reactive output of generators under stressed system conditions has proven to be more challenging. If the model is incorrect, estimated transfer limits will also be incorrect.

In most of the events, the assumed contribution of dynamic reactive output of system generators was greater than the generators actually produced, resulting in more significant voltage problems. Some generators were limited in the amount of reactive power they produced by over-excitation limits, or were necessarily derated because of high ambient temperatures. Other generators were controlled to a fixed power factor and did not contribute reactive supply in depressed voltage conditions. Under-voltage load shedding is employed as an automatic remedial action in some interconnections to prevent cascading, and could be used more widely.
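The gap between assumed and deliverable reactive capability that this section describes can be illustrated with a minimal sketch. The unit names, limits, and derating factors below are hypothetical; the point is only the bookkeeping: over-excitation limits, ambient derating, and fixed-power-factor control shrink the reserve that planning studies assumed, so transfer limits based on the assumed figure are optimistic.

```python
# Minimal sketch (hypothetical units and numbers): the dynamic reactive reserve assumed
# in planning studies versus what units can actually deliver once over-excitation limits,
# ambient derating, and fixed-power-factor control are accounted for.
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    assumed_mvar: float      # dynamic reactive output assumed in planning studies
    oel_limit_mvar: float    # over-excitation limiter ceiling
    ambient_derate: float    # fraction of capability lost at high ambient temperature
    fixed_pf: bool           # True if controlled to a fixed power factor

    def deliverable_mvar(self) -> float:
        if self.fixed_pf:
            return 0.0       # contributes no additional VArs as voltage sags
        return min(self.assumed_mvar, self.oel_limit_mvar) * (1.0 - self.ambient_derate)

units = [
    Unit("Plant A", assumed_mvar=300, oel_limit_mvar=250, ambient_derate=0.10, fixed_pf=False),
    Unit("Plant B", assumed_mvar=200, oel_limit_mvar=220, ambient_derate=0.15, fixed_pf=False),
    Unit("Plant C", assumed_mvar=150, oel_limit_mvar=150, ambient_derate=0.00, fixed_pf=True),
]

assumed = sum(u.assumed_mvar for u in units)
actual = sum(u.deliverable_mvar() for u in units)
print(f"Assumed dynamic reactive reserve:       {assumed:.0f} MVAr")
print(f"Deliverable under stressed conditions:  {actual:.0f} MVAr")
if actual < assumed:
    print("Transfer limits based on the assumed figure are optimistic.")
```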

Recommendations from previous investigations concerning voltage support and reactive power management include:

• Communicate changes to generator reactive capability limits in a timely and accurate manner for both planning and operational modeling purposes.
• Investigate the development of a generator MVAr/voltage monitoring process to determine when generators may not be following reported MVAr limits.
• Establish a common standard for generator steady-state and post-contingency (15-minute) MVAr capability definition; determine methodology, testing, and operational reporting requirements.
• Determine the generator service level agreement that defines generator MVAr obligation to help ensure reliable operations.
• Periodically review and field test the reactive limits of generators to ensure that reported MVAr limits are attainable.
• Provide operators with on-line indications of available reactive capability from each generating unit or groups of generators, other VAr sources, and the reactive margin at all critical buses. This information should assist in the operating practice of maximizing the use of shunt capacitors during heavy transfers and thereby increase the availability of system dynamic reactive reserve.
• For voltage instability problems, consider fast automatic capacitor insertion (both series and shunt), direct shunt reactor and load tripping, and under-voltage load shedding.
• Develop and periodically review a reactive margin against which system performance should be evaluated and used to establish maximum transfer levels.

System Visibility Procedures and Operator Tools

Each control area operates as part of a single synchronous interconnection. However, the parties with various geographic or functional responsibilities for reliable operation of the grid do not have visibility of the entire system. Events in neighboring systems may not be visible to an operator or reliability coordinator, or power system data may be available in a control center but not be presented to operators or coordinators as information they can use in making appropriate operating decisions.

Recommendations from previous investigations concerning visibility and tools include:

• Develop communications systems and displays that give operators immediate information on changes in the status of major components in their own and neighboring systems.
• Supply communications systems with uninterruptible power, so that information on system conditions can be transmitted correctly to control centers during system disturbances.
• In the control center, use a dynamic line loading and outage display board to provide operating personnel with rapid and comprehensive information about the facilities available and the operating condition of each facility in service.
• Give control centers the capability to display to system operators computer-generated alternative actions specific to the immediate situation, together with expected results of each action.
• Establish on-line security analysis capability to identify those next and multiple facility outages that would be critical to system reliability from thermal, stability, and post-contingency voltage points of view.
• Establish time-synchronized disturbance monitoring to help evaluate the performance of the interconnected system under stress, and design appropriate controls to protect it.

System Operation Within Safe Limits

Operators in several of the events were unaware of the vulnerability of the system to the next contingency. The reasons were varied: inaccurate modeling for simulation, no visibility of the loss of key transmission elements, no operator monitoring of stability measures (reactive reserve monitor, power transfer angle), and no reassessment of system conditions following the loss of an element and readjustment of safe limits.

Recommendations from previous investigations include:

• Following a contingency, the system must be returned to a reliable state within the allowed readjustment period. Operating guides must be reviewed to ensure that procedures exist to restore system reliability in the allowable time periods.

• Reduce scheduled transfers to a safe and prudent level until studies have been conducted to determine the maximum simultaneous transfer capability limits.
• Reevaluate processes for identifying unusual operating conditions and potential disturbance scenarios, and make sure they are studied before they are encountered in real-time operating conditions.

Coordination of System Protection (Transmission and Generation Elements)

Protective relays are designed to detect short circuits and act locally to isolate faulted power system equipment from the system—both to protect the equipment from damage and to protect the system from faulty equipment. Relay systems are applied with redundancy in primary and backup modes. If one relay fails, another should detect the fault and trip appropriate circuit breakers. Some backup relays have significant "reach," such that non-faulted line overloads or stable swings may be seen as faults and cause the tripping of a line when it is not advantageous to do so. Proper coordination of the many relay devices in an interconnected system is a significant challenge, requiring continual review and revision. Some relays can prevent resynchronizing, making restoration more difficult.

System-wide controls protect the interconnected operation rather than specific pieces of equipment. Examples include controlled islanding to mitigate the severity of an inevitable disturbance and under-voltage or under-frequency load shedding. Failure to operate (or misoperation of) one or more relays as an event developed was a common factor in several of the disturbances.

Recommendations developed after previous outages include:

• Perform system trip tests of relay schemes periodically. At installation the acceptance test should be performed on the complete relay scheme in addition to each individual component so that the adequacy of the scheme is verified.
• Continually update relay protection to fit changing system development and to incorporate improved relay control devices.
• Install sensing devices on critical transmission lines to shed load or generation automatically if the short-term emergency rating is exceeded for a specified period of time. The time delay should be long enough to allow the system operator to attempt to reduce line loadings promptly by other means.
• Review phase-angle restrictions that can prevent reclosing of major interconnections during system emergencies. Consideration should be given to bypassing synchronism-check relays to permit direct closing of critical interconnections when it is necessary to maintain stability of the grid during an emergency.
• Review the need for controlled islanding. Operating guides should address the potential for significant generation/load imbalance within the islands.

Effectiveness of Communications

Under normal conditions, parties with reliability responsibility need to communicate important and prioritized information to each other in a timely way, to help preserve the integrity of the grid. This is especially important in emergencies. During emergencies, operators should be relieved of duties unrelated to preserving the grid. A common factor in several of the events described above was that information about outages occurring in one system was not provided to neighboring systems.

Need for Safety Nets

A safety net is a protective scheme that activates automatically if a pre-specified, significant contingency occurs. When activated, such schemes involve certain costs and inconvenience, but they can prevent some disturbances from getting out of control. These plans involve actions such as shedding load, dropping generation, or islanding, and in all cases the intent is to have a controlled outcome that is less severe than the likely uncontrolled outcome. If a safety net had not been taken out of service in the West in August 1996, it would have lessened the severity of the disturbance from 28,000 MW of load lost to less than 7,200 MW. (It has since been returned to service.) Safety nets should not be relied upon to establish transfer limits, however.

Previous recommendations concerning safety nets include:

• Establish and maintain coordinated programs of automatic load shedding in areas not so equipped, in order to prevent total loss of power in an area that has been separated from the main network and is deficient in generation (an illustrative sketch of such a staged scheme appears at the end of this chapter). Load shedding should be regarded as an insurance program, however, and should not be used as a substitute for adequate system design.
• Install load-shedding controls to allow fast single-action activation of large-block load shedding by an operator.

Training of Operating Personnel

Operating procedures were necessary but not sufficient to deal with severe power system disturbances in several of the events. Enhanced procedures and training for operating personnel were recommended. Dispatcher training facility scenarios with disturbance simulation were suggested as well. Operators tended to reduce schedules for transactions but were reluctant to call for increased generation—or especially to shed load—in the face of a disturbance that threatened to bring the whole system down.

Previous recommendations concerning training include:

• Thorough programs and schedules for operator training and retraining should be vigorously administered.
• A full-scale simulator should be made available to provide operating personnel with "hands-on" experience in dealing with possible emergency or other system conditions.
• Procedures and training programs for system operators should include anticipation, recognition, and definition of emergency situations.
• Written procedures and training materials should include criteria that system operators can use to recognize signs of system stress and mitigating measures to be taken before conditions degrade into emergencies.
• Line loading relief procedures should not be relied upon when the system is in an insecure state, as these procedures cannot be implemented effectively within the required time frames in many cases. Other readjustments must be used, and the system operator must take responsibility to restore the system immediately.
• Operators' authority and responsibility to take immediate action if they sense the system is starting to degrade should be emphasized and protected.
• The current processes for assessing the potential for voltage instability and the need to enhance the existing operator training programs, operational tools, and annual technical assessments should be reviewed to improve the ability to predict future voltage stability problems prior to their occurrence, and to mitigate the potential for adverse effects on a regional scale.

Comparisons With the August 14 Blackout

The blackout on August 14, 2003, had several causes or contributory factors in common with the earlier outages, including:

• Inadequate vegetation management
• Failure to ensure operation within secure limits
• Failure to identify emergency conditions and communicate that status to neighboring systems
• Inadequate operator training
• Inadequate regional-scale visibility over the power system
• Inadequate coordination of relays and other protective devices or systems.

New causal features of the August 14 blackout include: inadequate interregional visibility over the power system; dysfunction of a control area's SCADA/EMS system; and lack of adequate backup capability to that system.
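The automatic load-shedding safety nets recommended earlier in this chapter are, at their core, staged threshold schemes. The following is a minimal sketch of such a staged under-frequency load-shedding program; the stages, setpoints, and load blocks are hypothetical and do not represent any interconnection's actual UFLS design.

```python
# Minimal sketch of a staged under-frequency load-shedding (UFLS) safety net.
# Stages, frequency setpoints, and load blocks are hypothetical illustrations only.
UFLS_STAGES = [          # (frequency threshold in Hz, fraction of load to shed)
    (59.3, 0.10),
    (59.0, 0.15),
    (58.7, 0.20),
]

def ufls_action(frequency_hz, already_shed):
    """Return the additional fraction of load to shed at this frequency reading."""
    shed = 0.0
    for threshold, block in UFLS_STAGES:
        if frequency_hz <= threshold and threshold not in already_shed:
            shed += block
            already_shed.add(threshold)   # each stage trips only once per event
    return shed

# Usage: replay a declining-frequency trace and accumulate the shed load.
shed_stages = set()
total_shed = 0.0
for f in (59.8, 59.4, 59.1, 58.9, 58.6):
    action = ufls_action(f, shed_stages)
    total_shed += action
    print(f"{f:.1f} Hz -> shed {action:.0%} of load (cumulative {total_shed:.0%})")
```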

8. Performance of Plants Affected by the Blackout

Introduction

On August 14, 2003, nine U.S. nuclear power plants experienced rapid shutdowns (reactor trips) as a consequence of the power outage. Seven nuclear power plants in Canada operating at high power levels at the time of the event also experienced rapid shutdowns. Four other Canadian nuclear plants automatically disconnected from the grid due to the electrical transient but were able to continue operating at a reduced power level and were available to supply power to the grid as it was restored by the transmission system operators. Six nuclear plants in the United States and one in Canada experienced significant electrical disturbances but were able to continue generating electricity. Many non-nuclear generating plants in both countries also tripped during the event. Numerous other nuclear plants observed disturbances on the electrical grid but continued to generate electrical power without interruption.

The Nuclear Working Group (NWG) was one of three Working Groups created to support the U.S.-Canada Power System Outage Task Force. The NWG was charged with identifying all relevant actions by nuclear generating facilities in connection with the outage. Nils Diaz, Chairman of the U.S. Nuclear Regulatory Commission (NRC), and Linda Keen, President and CEO of the Canadian Nuclear Safety Commission (CNSC), were co-chairs of the Working Group, with other members appointed from industry and various State and federal agencies.

In Phase I, the NWG focused on collecting and analyzing data from each affected nuclear power plant to determine what happened, and whether any activities at the plants caused or contributed to the power outage or involved a significant safety issue. Phase I culminated in the issuance of the Task Force's Interim Report, which reported that:

• The affected nuclear power plants did not trigger the power outage or inappropriately contribute to its spread (i.e., to an extent beyond the normal tripping of the plants at expected conditions).
• The severity of the grid transient caused generators, turbines, or reactor systems at the nuclear plants to reach protective feature limits and actuate automatic protective actions.
• The nuclear plants responded to the grid conditions in a manner consistent with the plant designs.
• The nuclear plants were maintained in a safe condition until conditions were met to permit the nuclear plants to resume supplying electrical power to the grid.
• For nuclear plants in the United States:
  - Fermi 2, Oyster Creek, and Perry tripped due to main generator trips, which resulted from voltage and frequency fluctuations on the grid. Nine Mile 1 tripped due to a main turbine trip due to frequency fluctuations on the grid.
  - FitzPatrick and Nine Mile 2 tripped due to reactor trips, which resulted from turbine control system low pressure due to frequency fluctuations on the grid. Ginna tripped due to a reactor trip which resulted from a large loss of electrical load due to frequency fluctuations on the grid. Indian Point 2 and Indian Point 3 tripped due to a reactor trip on low flow, which resulted when low grid frequency tripped reactor coolant pumps.
• For nuclear plants in Canada:
  - At Bruce B and Pickering B, frequency and/or voltage fluctuations on the grid resulted in the automatic disconnection of generators from the grid. For those units that were successful in maintaining the unit generators operational, reactor power was automatically reduced.

  - At Darlington, load swing on the grid led to the automatic reduction in power of the four reactors. The generators were, in turn, automatically disconnected from the grid.
  - Three reactors at Bruce B and one at Darlington were returned to 60% power. These reactors were available to deliver power to the grid on the instructions of the transmission system operator.
  - Three units at Darlington were placed in a zero-power hot state, and four units at Pickering B and one unit at Bruce B were placed in a Guaranteed Shutdown State.

The licensees' return to power operation followed a deliberate process controlled by plant procedures and regulations. Equipment and process problems, whether existing prior to or caused by the event, would normally be addressed prior to restart. The NWG is satisfied that licensees took an appropriately conservative approach to their restart activities, placing a priority on safety.

• For U.S. nuclear plants: Ginna, Indian Point 2, Nine Mile 2, and Oyster Creek resumed electrical generation on August 17. FitzPatrick and Nine Mile 1 resumed electrical generation on August 18. Fermi 2 resumed electrical generation on August 20. Perry resumed electrical generation on August 21. Indian Point 3 resumed electrical generation on August 22. Indian Point 3 had equipment issues (failed splices in the control rod drive mechanism power system) that required repair prior to restart. Ginna submitted a special request for enforcement discretion from the NRC to permit mode changes and restart with an inoperable auxiliary feedwater pump. The NRC granted the request for enforcement discretion.
• For Canadian nuclear plants: The restart of the Canadian nuclear plants was carried out in accordance with approved Operating Policies and Principles. Three units at Bruce B and one at Darlington were resynchronized with the grid within 6 hours of the event. The remaining three units at Darlington were reconnected by August 17 and 18. Units 5, 6, and 8 at Pickering B and Unit 6 at Bruce B returned to service between August 22 and August 25.

The NWG has found no evidence that the shutdown of the nuclear power plants triggered the outage or inappropriately contributed to its spread (i.e., to an extent beyond the normal tripping of the plants at expected conditions). All the nuclear plants that shut down or disconnected from the grid responded automatically to grid conditions. All the nuclear plants responded in a manner consistent with the plant designs. Safety functions were effectively accomplished, and the nuclear plants that tripped were maintained in a safe shutdown condition until their restart.

In Phase II, the NWG collected comments and analyzed information related to potential recommendations to help prevent future power outages. Representatives of the NWG, including representatives of the NRC and the CNSC, attended public meetings to solicit feedback and recommendations held in Cleveland, Ohio; New York City, New York; and Toronto, Ontario, on December 4, 5, and 8, 2003, respectively. Representatives of the NWG also participated in the NRC's public meeting to solicit feedback and recommendations on the Northeast blackout held in Rockville, Maryland, on January 6, 2004.

Additional details on both the Phase I and Phase II efforts are available in the following sections. Due to the major design differences between nuclear plants in Canada and the United States, the NWG decided to have separate sections for each country. This also responds to the request by the nuclear regulatory agencies in both countries to have sections of the report that stand alone, so that they can also be used as regulatory documents.

Findings of the U.S. Nuclear Working Group

Summary

The U.S. NWG found no evidence that the shutdown of the nine U.S. nuclear power plants triggered the outage, or inappropriately contributed to its spread (i.e., to an extent beyond the normal tripping of the plants at expected conditions). All nine plants that experienced a reactor trip were responding to grid conditions. The severity of the grid transient caused generators, turbines, or reactor systems at the plants to reach a protective feature limit and actuate a plant shutdown. All nine plants tripped in response to those conditions in a manner consistent with the plant designs. The nine plants automatically shut down in a safe fashion to protect the plants from the grid transient. Safety functions were effectively accomplished with few problems, and the plants were maintained in a safe shutdown condition until their restart.

The nuclear power plant outages that resulted from the August 14, 2003, power outage were triggered by automatic protection systems for the reactors or turbine-generators, not by any manual operator actions. The NWG has received no information that points to operators deliberately shutting down nuclear units to isolate themselves from instabilities on the grid. In short, only automatic separation of nuclear units occurred.

Regarding the 95 other licensed commercial nuclear power plants in the United States: 4 were already shut down at the time of the power outage, one of which experienced a grid disturbance; 70 operating plants observed some level of grid disturbance but accommodated the disturbances and remained on line, supplying power to the grid; and 21 operating plants did not experience any grid disturbance.

Introduction

The NRC, which regulates U.S. commercial nuclear power plants, has regulatory requirements for offsite power systems. These requirements address the number of offsite power sources and the ability to withstand certain transients. Offsite power is the normal source of alternating current (AC) power to the safety systems in the plants when the plant main generator is not in operation. The requirements also are designed to protect safety systems from potentially damaging variations (in voltage and frequency) in the supplied power. For loss of offsite power events, the NRC requires emergency generation (typically emergency diesel generators) to provide AC power to safety systems. In addition, the NRC provides oversight of the safety aspects of offsite power issues through its inspection program, by monitoring operating experience, and by performing technical studies.

Phase I: Fact Finding

Phase I of the NWG effort focused on collecting and analyzing data from each plant to determine what happened, and whether any activities at the plants caused or contributed to the power outage or its spread or involved a significant safety issue. To ensure accuracy, comprehensive coordination was maintained among the working group members and among the NWG, ESWG, and SWG.

The staff developed a set of technical questions to obtain data from the owners or licensees of the nuclear power plants that would enable them to review the response of the nuclear plant systems in detail. Two additional requests for more specific information were made for certain plants. The collection of information from U.S. nuclear power plants was gathered through the NRC regional offices, which had NRC resident inspectors at each plant obtain licensee information to answer the questions. General design information was gathered from plant-specific Updated Final Safety Analysis Reports and other documents.

Plant data were compared against plant designs by the NRC staff to determine whether the plant responses were as expected; whether they appeared to cause the power outage or contributed to the spread of the outage; and whether applicable safety requirements were met. In some cases supplemental questions were developed, and answers were obtained from the licensees to clarify the observed response of the plant. The NWG interfaced with the ESWG to validate some data and to obtain grid information, which contributed to the analysis. The NWG identified relevant actions by nuclear generating facilities in connection with the power outage.

Typical Design, Operational, and Protective Features of U.S. Nuclear Power Plants

Nuclear power plants have a number of design, operational, and protective features to ensure that the plants operate safely and reliably. This section describes these features so as to provide a better understanding of how nuclear power plants interact with the grid and, specifically, how nuclear power plants respond to changing grid conditions. While the features described in this section are typical, there are differences in the design and operation of individual plants which are not discussed.

Design Features of U.S. Nuclear Power Plants

Nuclear power plants use heat from nuclear reactions to generate steam and use a single steam-driven turbine-generator (also known as the main generator) to produce electricity supplied to the grid.

Connection of the plant switchyard to the grid. The plant switchyard normally forms the interface between the plant main generator and the electrical grid. The plant switchyard has multiple transmission lines connected to the grid system to meet the offsite power supply requirements for having reliable offsite power for the nuclear station under all operating and shutdown conditions.

Each transmission line connected to the switchyard has dedicated circuit breakers, with fault sensors, to isolate faulted conditions in the switchyard or the connected transmission lines, such as phase-to-phase or phase-to-ground short circuits. The fault sensors are fed into a protection scheme for the plant switchyard that is engineered to localize any faulted conditions with minimum system disturbance.

Connection of the main generator to the switchyard. The plant main generator produces electrical power and transmits that power to the offsite transmission system. Most plants also supply power to the plant auxiliary buses for normal operation of the nuclear generating unit through the unit auxiliary transformer. During normal plant operation, the main generator typically generates electrical power at about 22 kV. The voltage is increased to match the switchyard voltage by the main transformer, and the power flows to the switchyard through two power circuit breakers.

Power supplies for the plant auxiliary buses. The safety-related and nonsafety auxiliary buses are normally lined up to receive power from the main generator auxiliary transformer, although some plants leave some of their auxiliary buses powered from a startup transformer (that is, from the offsite power distribution system). When plant power generation is interrupted, the power supply automatically transfers to the offsite power source (the startup transformer). If that is not supplying acceptable voltage, the circuit breakers to the safety-related buses open, and the buses are reenergized by the respective fast-starting emergency diesel generators. The nonsafety auxiliary buses will remain deenergized until offsite power is restored.

Operational Features of U.S. Nuclear Power Plants

Response of nuclear power plants to changes in switchyard voltage. With the main generator voltage regulator in the automatic mode, the generator will respond to an increase of switchyard voltage by reducing the generator field excitation current. This will result in a decrease of reactive power, normally measured as mega-volts-amperes-reactive (MVAr), from the generator to the switchyard and out to the surrounding grid, helping to control the grid voltage increase. With the main generator voltage regulator in the automatic mode, the generator will respond to a decrease of switchyard voltage by increasing the generator field excitation current. This will result in an increase of reactive power (MVAr) from the generator to the switchyard and out to the surrounding grid, helping to control the grid voltage decrease. If the switchyard voltage goes low enough, the increased generator field current could result in generator field overheating. Over-excitation protective circuitry is generally employed to prevent this from occurring. This protective circuitry may trip the generator to prevent equipment damage.

Under-voltage protection is provided for the nuclear power plant safety buses, and may be provided on nonsafety buses and at individual pieces of equipment. It is also used in some pressurized water reactor designs on reactor coolant pumps (RCPs) as an anticipatory loss of RCP flow signal.

Protective Features of U.S. Nuclear Power Plants

The main generator and main turbine have protective features, similar to fossil generating stations, which protect against equipment damage. In general, the reactor protective features are designed to protect the reactor fuel from damage and to protect the reactor coolant system from over-pressure or over-temperature transients. Some trip features also produce a corresponding trip in other components; for example, a turbine trip typically results in a reactor trip above a low power setpoint.

Generator protective features typically include over-current, ground detection, differential relays (which monitor for conditions within a zone of protection defined by the location of the sensors, typically the main generator and all transformers connected directly to the generator output), electrical faults on the transformers connected to the generator, loss of the generator field, and a turbine trip. Turbine protective features typically include over-speed (usually set at 1980 rpm or 66 Hz), low bearing oil pressure, high bearing vibration, degraded condenser vacuum, thrust bearing failure, or generator trip. Reactor protective features typically include trips for over-power, abnormal pressure in the reactor coolant system, low reactor coolant system flow, low level in the steam generators or the reactor vessel, or a trip of the turbine.
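The voltage-regulator and over-excitation behavior described above can be summarized in a small sketch. The nominal voltage, regulator gain, and limiter setpoint below are assumed illustrative values rather than those of any actual plant; the sketch only captures the qualitative response: field current moves opposite to switchyard voltage, and a badly depressed voltage drives the field toward the over-excitation protection.

```python
# Illustrative sketch (hypothetical setpoints, not any plant's actual values) of the
# automatic voltage regulator behavior described above: field excitation moves opposite
# to switchyard voltage, and sustained over-excitation risks a protective generator trip.
NOMINAL_KV = 345.0          # assumed switchyard nominal voltage
FIELD_NOMINAL = 1.0         # per-unit field current at nominal conditions
OVEREXCITATION_LIMIT = 1.2  # per-unit field current the limiter will not tolerate for long

def avr_response(switchyard_kv):
    """Return the regulator's field-current order and whether protection would be challenged."""
    voltage_error = (NOMINAL_KV - switchyard_kv) / NOMINAL_KV
    field_order = FIELD_NOMINAL * (1.0 + 2.0 * voltage_error)   # assumed simple gain of 2
    return {
        "field_current_pu": round(field_order, 3),
        "mvar_direction": "export more MVAr" if voltage_error > 0 else "absorb MVAr",
        "overexcitation_trip_risk": field_order > OVEREXCITATION_LIMIT,
    }

# Usage: rising, nominal, sagging, and badly depressed switchyard voltages.
for kv in (355.0, 345.0, 330.0, 300.0):
    print(kv, avr_response(kv))
```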

Considerations on Returning a U.S. Nuclear Power Plant to Power Production After Switchyard Voltage Is Restored

The following are examples of the types of activities that must be completed before returning a nuclear power plant to power production following a loss of switchyard voltage.

• Switchyard voltage must be normal and stable from an offsite supply. Nuclear power plants are not designed for black-start capability (the ability to start up without external power).
• Plant buses must be energized from the switchyard and the emergency diesel generators restored to standby mode.
• Normal plant equipment, such as reactor coolant pumps and circulating water pumps, must be restarted.
• A reactor trip review report must be completed and approved by plant management, and the cause of the trip must be addressed.
• All plant technical specifications must be satisfied. Technical specifications are issued to each nuclear power plant as part of their license by the NRC. They dictate equipment which must be operable and process parameters which must be met to allow operation of the reactor. Examples of actions that were required following the events of August 14 include refilling the diesel fuel oil storage tanks, refilling the condensate storage tanks, establishing reactor coolant system forced flow, and cooling the suppression pool to normal operating limits. Surveillance tests must be completed as required by technical specifications (for example, operability of the low-range neutron detectors must be demonstrated).
• Systems must be aligned to support the startup.
• Pressures and temperatures for reactor startup must be established in the reactor coolant system for pressurized water reactors.
• A reactor criticality calculation must be performed to predict the control rod withdrawals needed to achieve criticality, where the fission chain reaction becomes self-sustaining due to the increased neutron flux. Certain neutron-absorbing fission products increase in concentration following a reactor trip (followed later by a decrease or decay). At pressurized water reactors, the boron concentration in the primary coolant must be adjusted to match the criticality calculation. Near the end of the fuel cycle, the nuclear power plant may not have enough boron adjustment or control rod worth available for restart until the neutron absorbers have decreased significantly (more than 24 hours after the trip).

It may require a day or more before a nuclear power plant can restart following a normal trip. Plant trips are a significant transient on plant equipment, and some maintenance may be necessary before the plant can restart. When combined with the infrequent event of loss of offsite power, additional recovery actions will be required. Safety systems, such as emergency diesel generators and safety-related decay heat removal systems, must be restored to normal lineups. These additional actions would extend the time necessary to restart a nuclear plant from this type of event.

Summary of U.S. Nuclear Power Plant Response to and Safety During the August 14 Outage

The NWG's review did not identify any activity or equipment issues at U.S. nuclear power plants that caused the transient on August 14, 2003. Nine nuclear power plants tripped within about 60 seconds as a result of the grid disturbance. Additionally, many nuclear power plants experienced a transient due to this grid disturbance.

Nuclear Power Plants That Tripped

The trips at nine nuclear power plants resulted from the plant responses to the grid disturbances. Following the initial grid disturbances, voltages in the plant switchyard fluctuated and reactive power flows fluctuated. As the voltage regulators on the main generators attempted to compensate, equipment limits were exceeded and protective trips resulted. This happened at Fermi 2 and Oyster Creek. Fermi 2 tripped on a generator field protection trip. Oyster Creek tripped due to a generator trip on high ratio of voltage relative to the electrical frequency.

Also, as the balance between electrical generation and electrical load on the grid was disturbed, the electrical frequency began to fluctuate. In some cases the electrical frequency dropped low enough to actuate protective features. This happened at Indian Point 2, Indian Point 3, and Perry. Perry tripped due to a generator under-frequency trip signal. Indian Point 2 and Indian Point 3 tripped when the grid frequency dropped low enough to trip reactor coolant pumps, which actuated a reactor protective feature.
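A minimal sketch of the frequency-related protective checks just described follows. The setpoints are assumed for illustration only (actual setpoints are plant-specific): a generator under-frequency trip of the kind reported at Perry, and a volts-per-hertz (over-excitation) check of the kind reported for Oyster Creek.

```python
# Minimal sketch (hypothetical setpoints, not actual plant values) of grid-related
# protective checks: a generator under-frequency trip and a volts-per-hertz trip.
NOMINAL_HZ = 60.0
UNDER_FREQUENCY_TRIP_HZ = 57.5       # assumed setpoint
VOLTS_PER_HERTZ_TRIP_PU = 1.10       # assumed V/Hz limit in per unit

def grid_protection_checks(voltage_pu, frequency_hz):
    """Return the protective actions this illustrative relay package would order."""
    trips = []
    if frequency_hz < UNDER_FREQUENCY_TRIP_HZ:
        trips.append("generator under-frequency trip")
    v_per_hz = voltage_pu / (frequency_hz / NOMINAL_HZ)
    if v_per_hz > VOLTS_PER_HERTZ_TRIP_PU:
        trips.append("volts-per-hertz (over-excitation) trip")
    return trips

# Usage: sample switchyard conditions during a severe grid transient.
for v, f in [(1.02, 60.0), (1.05, 58.9), (1.04, 57.2), (1.08, 58.0)]:
    print(f"V={v:.2f} pu, f={f:.1f} Hz -> {grid_protection_checks(v, f) or 'no trip'}")
```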

In other cases, the electrical frequency fluctuated and went higher than normal. Turbine control systems responded in an attempt to control the frequency. Equipment limits were exceeded as a result of the reaction of the turbine control systems to large frequency changes. This led to trips at FitzPatrick, Nine Mile 1, Nine Mile 2, and Ginna. FitzPatrick and Nine Mile 2 tripped on low pressure in the turbine hydraulic control oil system. Nine Mile 1 tripped on turbine light load protection. Ginna tripped due to conditions in the reactor following rapid closure of the turbine control valves in response to high frequency on the grid.

The Perry, Fermi 2, Oyster Creek, and Nine Mile 1 reactors tripped immediately after the generator tripped, although that is not apparent from the times below, because the clocks were not synchronized to the national time standard. The Indian Point 2 and 3, FitzPatrick, Ginna, and Nine Mile 2 reactors tripped before the generators. When the reactor trips first, there is generally a short time delay before the generator output breakers open. The electrical generation decreases rapidly to zero after the reactor trip. Table 8.1 provides the times from the data collected for the reactor trip times, and the time the generator output breakers opened (generator trip), as reported by the ESWG. Additional details on the plants that tripped are given below, and summarized in Table 8.2 on page 120.

Fermi 2. Fermi 2 is located 25 miles (40 km) northeast of Toledo, Ohio, in southern Michigan on Lake Erie. It was generating about 1,130 megawatts-electric (MWe) before the event. The reactor tripped due to a turbine trip. The turbine trip was likely the result of multiple generator field protection trips (overexcitation and loss of field) as the Fermi 2 generator responded to a series of rapidly changing transients prior to its loss. This is consistent with data that shows large swings of the Fermi 2 generator MVAr prior to its trip.

Offsite power was subsequently lost to the plant auxiliary buses. The safety buses were de-energized and automatically reenergized from the emergency diesel generators. The operators tripped one emergency diesel generator that was paralleled to the grid for testing, after which it automatically loaded. Decay heat removal systems maintained the cooling function for the reactor fuel.

The lowest emergency declaration, an Unusual Event, was declared at about 16:22 EDT due to the loss of offsite power. Offsite power was restored to at least one safety bus at about 01:53 EDT on August 15. The following equipment problems were noted: the Combustion Turbine Generator (the alternate AC power source) failed to start from the control room; however, it was successfully started locally. In addition, the Spent Fuel Pool Cooling System was interrupted for approximately 26 hours and reached a maximum temperature of 130 degrees Fahrenheit (55 degrees Celsius). The main generator was reconnected to the grid at about 01:41 EDT on August 20.

FitzPatrick. FitzPatrick is located about 8 miles (13 km) northeast of Oswego, NY, in northern New York on Lake Ontario. It was generating about 850 MWe before the event. The reactor tripped due to low pressure in the hydraulic system that controls the turbine control valves. Low pressure in this system typically indicates a large load reject, for which a reactor trip is expected. In this case the pressure in the system was low because the control system was rapidly manipulating the turbine control valves to control turbine speed, which was being affected by grid frequency fluctuations.

Immediately preceding the trip, both significant over-voltage and under-voltage grid conditions were experienced. Offsite power was subsequently lost to the plant auxiliary buses. The safety buses were deenergized and automatically reenergized from the emergency diesel generators.

The lowest emergency declaration, an Unusual Event, was declared at about 16:26 EDT due to the loss of offsite power. Decay heat removal systems maintained the cooling function for the reactor fuel. Offsite power was restored to at least one safety bus at about 23:07 EDT on August 14. The main generator was reconnected to the grid at about 06:10 EDT on August 18.

Table 8.1. U.S. Nuclear Plant Trip Times

Nuclear Plant     Reactor Trip (a)   Generator Trip (b)
Perry             16:10:25 EDT       16:10:42 EDT
Fermi 2           16:10:53 EDT       16:10:53 EDT
Oyster Creek      16:10:58 EDT       16:10:57 EDT
Nine Mile 1       16:11 EDT          16:11:04 EDT
Indian Point 2    16:11 EDT          16:11:09 EDT
Indian Point 3    16:11 EDT          16:11:23 EDT
FitzPatrick       16:11:04 EDT       16:11:32 EDT
Ginna             16:11:36 EDT       16:12:17 EDT
Nine Mile 2       16:11:48 EDT       16:11:52 EDT

(a) As determined from licensee data (which may not be synchronized to the national time standard).
(b) As reported by the Electrical System Working Group (synchronized to the national time standard).
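Because reactor trips and generator trips are reported separately in Table 8.1, the interval between them can be read directly from the table. The short Python script below is a reading aid rather than part of the Task Force analysis: it computes that interval for the plants whose times are reported to the second. The negative value it produces for Oyster Creek simply reflects footnote (a), which notes that licensee clocks were not synchronized to the national time standard.

    from datetime import datetime

    # Times transcribed from Table 8.1 (EDT); entries reported only to the minute are omitted.
    TRIP_TIMES = {
        "Perry":        ("16:10:25", "16:10:42"),
        "Fermi 2":      ("16:10:53", "16:10:53"),
        "Oyster Creek": ("16:10:58", "16:10:57"),
        "FitzPatrick":  ("16:11:04", "16:11:32"),
        "Ginna":        ("16:11:36", "16:12:17"),
        "Nine Mile 2":  ("16:11:48", "16:11:52"),
    }

    FMT = "%H:%M:%S"
    for plant, (reactor, generator) in TRIP_TIMES.items():
        delay = (datetime.strptime(generator, FMT) - datetime.strptime(reactor, FMT)).total_seconds()
        print(f"{plant}: generator trip {delay:+.0f} s relative to reactor trip")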

Ginna. Ginna is located 20 miles (32 km) northeast of Rochester, NY, in northern New York on Lake Ontario. It was generating about 487 MWe before the event. The reactor tripped due to Over-Temperature-Delta-Temperature. This trip signal protects the reactor core from exceeding temperature limits. The turbine control valves closed down in response to the changing grid conditions. This caused a temperature and pressure transient in the reactor, resulting in an Over-Temperature-Delta-Temperature trip.

Offsite power was not lost to the plant auxiliary buses. In the operators’ judgement, offsite power was not stable, so they conservatively energized the safety buses from the emergency diesel generators. Decay heat removal systems maintained the cooling function for the reactor fuel. Offsite power was not lost, and stabilized about 50 minutes after the reactor trip.

The lowest emergency declaration, an Unusual Event, was declared at about 16:46 EDT due to the degraded offsite power. Offsite power was restored to at least one safety bus at about 21:08 EDT on August 14. The following equipment problems were noted: the digital feedwater control system behaved in an unexpected manner following the trip, resulting in high steam generator levels; there was a loss of RCP seal flow indication, which complicated restarting the pumps; and at least one of the power-operated relief valves experienced minor leakage following proper operation and closure during the transient. Also, one of the motor-driven auxiliary feedwater pumps was damaged after running with low flow conditions due to an improper valve alignment. The redundant pumps supplied the required water flow.

The NRC issued a Notice of Enforcement Discretion to allow Ginna to perform mode changes and restart the reactor with one auxiliary feedwater (AFW) pump inoperable. Ginna has two AFW pumps, one turbine-driven AFW pump, and two standby AFW pumps, all powered from safety-related buses. The main generator was reconnected to the grid at about 20:38 EDT on August 17.

Indian Point 2. Indian Point 2 is located 24 miles (39 km) north of New York City on the Hudson River. It was generating about 990 MWe before the event. The reactor tripped due to loss of a reactor coolant pump that tripped because the auxiliary bus frequency fluctuations actuated the under-frequency relay, which protects against inadequate coolant flow through the reactor core. This reactor protection signal tripped the reactor, which resulted in turbine and generator trips.

The auxiliary bus experienced the under-frequency due to fluctuating grid conditions. Offsite power was lost to all the plant auxiliary buses. The safety buses were reenergized from the emergency diesel generators. Decay heat removal systems maintained the cooling function for the reactor fuel.

The lowest emergency declaration, an Unusual Event, was declared at about 16:25 EDT due to the loss of offsite power for more than 15 minutes. Offsite power was restored to at least one safety bus at about 20:02 EDT on August 14. The following equipment problems were noted: the service water to one of the emergency diesel generators developed a leak; a steam generator atmospheric dump valve did not control steam generator pressure in automatic and had to be shifted to manual; a steam trap associated with the turbine-driven AFW pump failed open, resulting in operators securing the turbine after 2.5 hours; loss of instrument air required operators to take manual control of charging and a letdown isolation occurred; operators in the field could not use radios; and the diesel generator for the Unit 2 Technical Support Center failed to function. Also, several uninterruptible power supplies in the Emergency Operations Facility failed. This reduced the capability for communications and data collection. Alternate equipment was used to maintain vital communications.1 The main generator was reconnected to the grid at about 12:58 EDT on August 17.

Indian Point 3. Indian Point 3 is located 24 miles (39 km) north of New York City on the Hudson River. It was generating about 1,010 MWe before the event. The reactor tripped due to loss of a reactor coolant pump that tripped because the auxiliary bus frequency fluctuations actuated the under-frequency relay, which protects against inadequate coolant flow through the reactor core. This reactor protection signal tripped the reactor, which resulted in turbine and generator trips.

The auxiliary bus experienced the under-frequency due to fluctuating grid conditions. Offsite power was lost to all the plant auxiliary buses. The safety buses were reenergized from the emergency diesel generators. Decay heat removal systems maintained the cooling function for the reactor fuel.

The lowest emergency declaration, an Unusual Event, was declared at about 16:23 EDT due to the loss of offsite power for more than 15 minutes. Offsite power was restored to at least one safety bus at about 20:12 EDT on August 14. The following equipment problems were noted: a steam generator safety valve lifted below its desired setpoint and was gagged; loss of instrument air, including failure of the diesel backup compressor to start and failure of the backup nitrogen system, resulted in manual control of atmospheric dump valves and AFW pumps needing to be secured to prevent overfeeding the steam generators; a blown fuse in a battery charger resulted in a longer battery discharge; a control rod drive mechanism cable splice failed, and there were high resistance readings on 345-kV breaker-1. These equipment problems required correction prior to startup, which delayed the startup. The diesel generator for the Unit 3 Technical Support Center failed to function. Also, several uninterruptible power supplies in the Emergency Operations Facility failed. This reduced the capability for communications and data collection. Alternate equipment was used to maintain vital communications.2 The main generator was reconnected to the grid at about 05:03 EDT on August 22.

Nine Mile 1. Nine Mile 1 is located 6 miles (10 km) northeast of Oswego, NY, in northern New York on Lake Ontario. It was generating about 600 MWe before the event. The reactor tripped in response to a turbine trip. The turbine tripped on light load protection (which protects the turbine against a loss of electrical load), when responding to fluctuating grid conditions. The turbine trip caused fast closure of the turbine valves, which, through acceleration relays on the control valves, create a signal to trip the reactor. After a time delay of 10 seconds, the generator tripped on reverse power.

The safety buses were automatically deenergized due to low voltage and automatically reenergized from the emergency diesel generators. Decay heat removal systems maintained the cooling function for the reactor fuel.

The lowest emergency declaration, an Unusual Event, was declared at about 16:33 EDT due to the degraded offsite power. Offsite power was restored to at least one safety bus at about 23:39 EDT on August 14. The following additional equipment problems were noted: a feedwater block valve failed “as is” on the loss of voltage, resulting in a high reactor vessel level; fuses blew in fire circuits, causing control room ventilation isolation and fire panel alarms; and operators were delayed in placing shutdown cooling in service for several hours due to lack of procedure guidance to address particular plant conditions encountered during the shutdown. The main generator was reconnected to the grid at about 02:08 EDT on August 18.

Nine Mile 2. Nine Mile 2 is located 6 miles (10 km) northeast of Oswego, NY, in northern New York on Lake Ontario. It was generating about 1,193 MWe before the event. The reactor scrammed due to the actuation of pressure switches which detected low pressure in the hydraulic system that controls the turbine control valves. Low pressure in this system typically indicates a large load reject, for which a reactor trip is expected. In this case the pressure in the system was low because the control system was rapidly manipulating the turbine control valves to control turbine speed, which was being affected by grid frequency fluctuations.

After the reactor tripped, several reactor level control valves did not reposition, and with the main feedwater system continuing to operate, a high water level in the reactor caused a turbine trip, which caused a generator trip. Offsite power was degraded but available to the plant auxiliary buses. The offsite power dropped below the normal voltage levels, which resulted in the safety buses being automatically energized from the emergency diesel generators. Decay heat removal systems maintained the cooling function for the reactor fuel.

The lowest emergency declaration, an Unusual Event, was declared at about 17:00 EDT due to the loss of offsite power to the safety buses for more than 15 minutes. Offsite power was restored to at least one safety bus at about 01:33 EDT on August 15. The following additional equipment problem was noted: a tap changer on one of the offsite power transformers failed, complicating the restoration of one division of offsite power. The main generator was reconnected to the grid at about 19:34 EDT on August 17.

Oyster Creek. Oyster Creek is located 9 miles (14 km) south of Toms River, NJ, near the Atlantic Ocean. It was generating about 629 MWe before the event. The reactor tripped due to a turbine trip. The turbine trip was the result of a generator trip due to actuation of a high Volts/Hz protective trip. The Volts/Hz trip is a generator/transformer protective feature. The plant safety and auxiliary buses transferred from the main generator supply to the offsite power supply following the plant trip.

Other than the plant transient, no equipment or performance problems were determined to be directly related to the grid problems.

Post-trip the operators did not get the mode switch to shutdown before main steam header pressure reached its isolation setpoint. The resulting MSIV closure complicated the operator’s response because the normal steam path to the main condenser was lost. The operators used the isolation condensers for decay heat removal. The plant safety and auxiliary buses remained energized from offsite power for the duration of the event, and the emergency diesel generators were not started. Decay heat removal systems maintained the cooling function for the reactor fuel. The main generator was reconnected to the grid at about 05:02 EDT on August 17.

Perry. Perry is located 7 miles (11 km) northeast of Painesville, OH, in northern Ohio on Lake Erie. It was generating about 1,275 MWe before the event. The reactor tripped due to a turbine control valve fast closure trip signal. The turbine control valve fast closure trip signal was due to a generator under-frequency trip signal that tripped the generator and the turbine and was triggered by grid frequency fluctuations. Plant operators noted voltage fluctuations and spikes on the main transformer, and the Generator Out-of-Step Supervisory relay actuated approximately 30 minutes before the trip. This supervisory relay senses a ground fault on the grid. The purpose is to prevent a remote fault on the grid from causing a generator out-of-step relay to activate, which would result in a generator trip. Approximately 30 seconds prior to the trip operators noted a number of spikes on the generator field volt meter, which subsequently went offscale high. The MVAr and MW meters likewise went offscale high.

The safety buses were deenergized and automatically reenergized from the emergency diesel generators. Decay heat removal systems maintained the cooling function for the reactor fuel. The following equipment problems were noted: a steam bypass valve opened; a reactor water clean-up system pump tripped; the off-gas system isolated, and a keep-fill pump was found to be air-bound, requiring venting and filling before the residual heat removal system loop A and the low pressure core spray system could be restored to service.

The lowest emergency declaration, an Unusual Event, was declared at about 16:20 EDT due to the loss of offsite power. Offsite power was restored to at least one safety bus at about 18:13 EDT on August 14. The main generator was reconnected to the grid at about 23:15 EDT on August 21. After the plant restarted, a surveillance test indicated a problem with one emergency diesel generator.3

Nuclear Power Plants With a Significant Transient

The electrical disturbance on August 14 had a significant impact on seven plants that continued to remain connected to the grid. For this review, significant impact means that these plants had significant load adjustments that resulted in bypassing steam from the turbine generator, opening of relief valves, or requiring the onsite emergency diesel generators to automatically start due to low voltage.

Nuclear Power Plants With a Non-Significant Transient

Sixty-four nuclear power plants experienced non-significant transients caused by minor disturbances on the electrical grid. These plants were able to respond to the disturbances through normal control systems. Examples of these transients included changes in load of a few megawatts or changes in frequency of a few-tenths Hz.

Nuclear Power Plants With No Transient

Twenty-four nuclear power plants experienced no transient and saw essentially no disturbances on the grid, or were shut down at the time of the transient.

General Observations Based on the Facts Found During Phase One

The NWG found no evidence that the shutdown of U.S. nuclear power plants triggered the outage or inappropriately contributed to its spread (i.e., to an extent beyond the normal tripping of the plants at expected conditions). This review did not identify any activity or equipment issues that appeared to start the transient on August 14, 2003. All nine plants that experienced a reactor trip were responding to grid conditions. The severity of the transient caused generators, turbines, or reactor systems to reach a protective feature limit and actuate a plant shutdown.

All nine plants tripped in response to those conditions in a manner consistent with the plant designs. All nine plants safely shut down. All safety functions were effectively accomplished, with few problems, and the plants were maintained in a safe shutdown condition until their restart.

Fermi 2, Nine Mile 1, Oyster Creek, and Perry tripped on turbine and generator protective features. FitzPatrick, Ginna, Indian Point 2 and 3, and Nine Mile 2 tripped on reactor protective features.

Nine plants used their emergency diesel generators to power their safety-related buses during the power outage. Offsite power was restored to the safety buses after the grid was energized and the plant operators, in consultation with the transmission system operators, decided the grid was stable. Although the Oyster Creek plant tripped, offsite power was never lost to their safety buses and the emergency diesel generators did not start and were not required. Another plant, Davis-Besse, was already shut down but lost power to the safety buses. The emergency diesel generators started and provided power to the safety buses as designed.

For the eight remaining tripped plants and Davis-Besse (which was already shut down prior to the events of August 14), offsite power was restored to at least one safety bus after a period of time ranging from about 2 hours to about 14 hours, with an average time of about 7 hours. Although Ginna did not lose offsite power, the operators judged offsite power to be unstable and realigned the safety buses to the emergency diesel generators.

The licensees’ return to power operation follows a deliberate process controlled by plant procedures and NRC regulations. Ginna, Indian Point 2, Nine Mile 2, and Oyster Creek resumed electrical generation on August 17. FitzPatrick and Nine Mile 1 resumed electrical generation on August 18. Fermi 2 resumed electrical generation on August 20. Perry resumed electrical generation on August 21. Indian Point 3 resumed electrical generation on August 22. Indian Point 3 had equipment issues (failed splices in the control rod drive mechanism power system) that required repair prior to restart. Ginna submitted a special request for enforcement discretion from the NRC to permit mode changes and restart with an inoperable auxiliary feedwater pump. The NRC granted the request for enforcement discretion.

Conclusions of the U.S. Nuclear Working Group

As discussed above, the investigation of the U.S. nuclear power plant responses during the blackout found no significant deficiencies. Accordingly, there are no recommendations here concerning U.S. nuclear power plants. Some areas for consideration on a grid-wide basis were discussed and forwarded to the Electric System Working Group for their review.

On August 14, 2003, nine U.S. nuclear power plants tripped as a result of the loss of offsite power. Nuclear power plants are designed to cope with the loss of offsite power (LOOP) through the use of emergency power supplies (primarily on-site diesel generators). The safety function of most concern during a LOOP is the removal of heat from the reactor core. Although the control rods have been inserted to stop the fission process, the continuing decay of radioactive isotopes in the reactor core produces a significant amount of heat for many weeks. If this decay heat is not removed, it will cause fuel damage and the release of highly radioactive isotopes from the reactor core. The failure of the alternating current emergency power supplies in conjunction with a LOOP is known as a station blackout. Failures of the emergency power supplies would seriously hinder the ability of the plant operators to carry out the required safety functions. Nuclear plants can cope with a station blackout for a limited time without suffering fuel damage. However, recovery of the grid or the restoration of an emergency power supply is needed for long-term decay heat removal. For this reason, the NRC considers LOOP events to be potential precursors to more serious situations. The risk of reactor core damage increases as the LOOP frequency or duration increases.
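To give a rough sense of the magnitude of the decay heat discussed above, the sketch below applies the Way-Wigner correlation, a standard textbook approximation for fission-product decay power after shutdown. The 0.062 coefficient, the assumed 3,400 MWt core rating, and the assumed one year of prior full-power operation are illustrative values only; they are not data for any of the plants reviewed in this chapter.

    # Approximate decay power as a fraction of pre-shutdown thermal power (Way-Wigner).
    def decay_heat_fraction(t_after_shutdown_s, t_at_power_s):
        return 0.062 * (t_after_shutdown_s ** -0.2 - (t_after_shutdown_s + t_at_power_s) ** -0.2)

    CORE_POWER_MWT = 3400.0          # assumed thermal rating of a large power reactor
    ONE_YEAR_S = 365 * 24 * 3600.0   # assumed prior operating period at full power

    for label, t in [("1 hour", 3600.0), ("1 day", 86400.0), ("1 week", 7 * 86400.0)]:
        frac = decay_heat_fraction(t, ONE_YEAR_S)
        print(f"{label:>7} after shutdown: ~{100 * frac:.1f}% of full power (~{frac * CORE_POWER_MWT:.0f} MWt)")

Even a week after shutdown this estimate remains on the order of several megawatts of heat, which is why long-term decay heat removal, and therefore either grid recovery or an emergency power supply, is essential.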

Table 8.2. Summary of Events for U.S. Nuclear Power Plants

Operating status at the time of the event and response to the event:
- Davis-Besse (near Toledo, OH), Unit 1: not operating; emergency diesels used.
- Fermi (near Toledo, OH), Unit 2: full power; reactor and turbine trip; emergency diesels used.
- James A. FitzPatrick (near Oswego, NY), Unit 1: full power; reactor and turbine trip; emergency diesels used.
- Ginna (near Rochester, NY), Unit 1: full power; reactor and turbine trip; emergency diesels used.
- Indian Point (near New York City, NY), Unit 2: full power; reactor and turbine trip; emergency diesels used.
- Indian Point (near New York City, NY), Unit 3: full power; reactor and turbine trip; emergency diesels used.
- Nine Mile Point (near Oswego, NY), Unit 1: full power; reactor and turbine trip; emergency diesels used.
- Nine Mile Point (near Oswego, NY), Unit 2: full power; reactor and turbine trip; emergency diesels used.
- Oyster Creek (near Toms River, NJ), Unit 1: full power; reactor and turbine trip.
- Perry (near Painesville, OH), Unit 1: full power; reactor and turbine trip; emergency diesels used.

Offsite power is considered the preferred power source for responding to all off-normal events or accidents. However, if the grid is operated in a stressed configuration, the loss of the nuclear plant generation may result in grid voltage dropping below the level needed for the plant safety loads. In that case, each plant is designed such that voltage relays will automatically disconnect the plant safety-related electrical buses from the grid and reenergize them from the emergency diesel generators (EDGs). Although the resultant safety system responses have been analyzed and found acceptable, the loss of offsite power reduces the plant’s safety margin. It also increases the risk associated with failures of the EDGs. For these reasons, the NRC periodically assesses the impact of grid reliability on overall nuclear plant safety.

The NRC monitors grid reliability under its normal monitoring programs, such as the operating experience program, and has previously issued reports related to grid reliability. The NRC is continuing with an internal review of the reliability of the electrical grid and the effect on the risk profile for nuclear power plants. The NRC will consider the implications of the August 14, 2003, Northeast blackout under the NRC’s regulations. The NRC is conducting an internal review of its station blackout rule, and the results of the August 14th event will be factored into that review. If there are additional findings, the NRC will address them through the NRC’s normal process.

Findings of the Canadian Nuclear Working Group

Summary

On the afternoon of August 14, 2003, southern Ontario, along with the northeastern United States, experienced a widespread electrical power system outage. Eleven nuclear power plants in Ontario operating at high power levels at the time of the event either automatically shut down as a result of the grid disturbance or automatically reduced power while waiting for the grid to be reestablished. In addition, the Point Lepreau Nuclear Generating Station in New Brunswick was forced to reduce electricity production for a short period.

The Canadian NWG (CNWG) was mandated to: review the sequence of events for each Canadian nuclear plant; determine whether any events caused or contributed to the power system outage; evaluate any potential safety issues arising as a result of the event; evaluate the effect on safety and the reliability of the grid of design features, operating procedures, and regulatory requirements at Canadian nuclear power plants; and assess the impact of associated regulator performance and regulatory decisions.

In Ontario, 11 nuclear units were operating and delivering power to the grid at the time of the grid disturbance: 4 at Bruce B, 4 at Darlington, and 3 at Pickering B. Of the 11 reactors, 7 shut down as a result of the event (1 at Bruce B, 3 at Darlington, and 3 at Pickering B). Four reactors (3 at Bruce B and 1 at Darlington) disconnected safely from the grid but were able to avoid shutting down and were available to supply power to the Ontario grid as soon as reconnection was enabled by Ontario’s Independent Market Operator (IMO).

New Brunswick Power’s Point Lepreau Generating Station responded to the loss of grid event by cutting power to 460 MW, returning to fully stable conditions at 16:35 EDT, within 25 minutes of the event. Hydro Québec’s (HQ) grid was not affected by the power system outage, and HQ’s Gentilly-2 nuclear station continued to operate normally.

Having reviewed the operating data for each plant and the responses of the power stations and their staff to the event, the CNWG concludes the following:

u None of the reactor operators had any advanced warning of impending collapse of the grid.

ã Trend data obtained indicate stable conditions until a few minutes before the event.

ã There were no prior warnings from Ontario’s IMO.

u Canadian nuclear power plants did not trigger the power system outage or contribute to its spread.

Rather they responded, as anticipated, in order to protect equipment and systems from the grid disturbances. Plant data confirm the following.

ã At Bruce B and Pickering B, frequency and/or voltage fluctuations on the grid resulted in the automatic disconnection of generators from the grid. For those units that were successful in maintaining the unit generators operational, reactor power was automatically reduced.

ã At Darlington, load swing on the grid led to the automatic reduction in power of the four reactors. The generators were, in turn, automatically disconnected from the grid.

ã Three reactors at Bruce B and one at Darlington were returned to 60% power. These reactors were available to deliver power to the grid on the instructions of the IMO.

ã Three units at Darlington were placed in a zero-power hot state, and four units at Pickering B and one unit at Bruce B were placed in a guaranteed shutdown state.

u There were no risks to health and safety of workers or the public as a result of the shutdown of the reactors.

ã Turbine, generator, and reactor automatic safety systems worked as designed to respond to the loss of grid.

ã Station operating staff and management followed approved Operating Policies & Principles (OP&Ps) in responding to the loss of grid. At all times, operators and shift supervisors made appropriately conservative decisions in favor of protecting health and safety.

The CNWG commends the staff of Ontario Power Generation and Bruce Power for their response to the power system outage. At all times, staff acted in accordance with established OP&Ps, and took an appropriately conservative approach to decisions.

During the course of its review, the CNWG also identified the following secondary issues:

u Equipment problems and design limitations at Pickering B resulted in a temporary reduction in the effectiveness of some of the multiple safety barriers, although the equipment failure was within the unavailability targets found in the OP&Ps approved by the CNSC as part of Ontario Power Generation’s licence.

u Existing OP&Ps place constraints on the use of adjuster rods to respond to events involving rapid reductions in reactor power. While greater flexibility with respect to use of adjuster rods would not have prevented the shutdown, some units, particularly those at Darlington, might have been able to return to service less than 1 hour after the initiating event.

u Off-site power was unavailable for varying periods of time, from approximately 3 hours at Bruce B to approximately 9 hours at Pickering A. Despite the high priority assigned by the IMO to restoring power to the nuclear stations, the stations had some difficulty in obtaining timely information about the status of grid recovery and the restoration of Class IV power. This information is important for Ontario Power Generation’s and Bruce Power’s response strategy.

u Required regulatory approvals from CNSC staff were obtained quickly and did not delay the restart of the units; however, CNSC staff was unable to immediately activate the CNSC’s Emergency Operation Centre because of loss of power to the CNSC’s head office building. CNSC staff, therefore, established communications with licensees and the U.S. NRC from other locations.

Introduction

The primary focus of the CNWG during Phase I was to address nuclear power plant response relevant to the power outage of August 14, 2003. Data were collected from each power plant and analyzed in order to determine: the cause of the power outage; whether any activities at these plants caused or contributed to the power outage; and whether there were any significant safety issues. In order to obtain reliable and comparable information and data from each nuclear power plant, a questionnaire was developed to help pinpoint how each nuclear power plant responded to the August 14 grid transients. Where appropriate, additional information was obtained from the ESWG and SWG.

u The operating data from each plant were compared against the plant design specifications to determine whether the plants responded as expected. Based on initial plant responses to the questionnaire, supplemental questions were developed, as required, to further clarify outstanding matters. Supplementary information on the design features of Ontario’s nuclear power plants was also provided by Ontario Power Generation and Bruce Power.

The CNWG also consulted a number of subject area specialists, including CNSC staff, to validate the responses to the questionnaire and to ensure consistency in their interpretation.

In addition to the stakeholder consultations discussed in the Introduction to this chapter, CNSC staff met with officials from Ontario’s Independent Market Operator on January 7, 2004.

Typical Design, Operational, and Protective Features of CANDU Nuclear Power Plants

There are 22 CANDU nuclear power reactors in Canada—20 located in Ontario at 5 multi-unit stations (Pickering A and Pickering B located in Pickering, Darlington located in the Municipality of Clarington, and Bruce A and Bruce B located near Kincardine). There are also single-unit CANDU stations at Bécancour, Québec (Gentilly-2), and Point Lepreau, New Brunswick.

In contrast to the pressurized water reactors used in the United States, which use enriched uranium fuel and a light water coolant-moderator, all housed in a single, large pressure vessel, a CANDU reactor uses fuel fabricated from natural uranium, with heavy water as the coolant and moderator. The fuel and pressurized heavy water coolant are contained in 380 to 480 pressure tubes housed in a calandria containing the heavy water moderator under low pressure. Heat generated by the fuel is removed by heavy water coolant that flows through the pressure tubes and is then circulated to the boilers to produce steam from demineralized water.

While the use of natural uranium fuel offers important benefits from the perspectives of safeguards and operating economics, one drawback is that it restricts the ability of a CANDU reactor to recover from a large power reduction. In particular, the lower reactivity of natural uranium fuel means that CANDU reactors are designed with a small number of control rods (called “adjuster rods”) that are only capable of accommodating power reductions to 60%. The consequence of a larger power reduction is that the reactor will “poison out” and cannot be made critical for up to 2 days following a power reduction. By comparison, the use of enriched fuel enables a typical pressurized water reactor to operate with a large number of control rods that can be withdrawn to accommodate power reductions to zero power.

A unique feature of some CANDU plants—namely, Bruce B and Darlington—is a capability to maintain the reactor at 60% full power if the generator becomes disconnected from the grid and to maintain this “readiness” condition if necessary for days. Once reconnected to the grid, the unit can be loaded to 60% full power within several minutes and can achieve full power within 24 hours.

As with other nuclear reactors, CANDU reactors normally operate continuously at full power except when shut down for maintenance and inspections. As such, while they provide a stable source of baseload power generation, they cannot provide significant additional power in response to sudden increases in demand. CANDU power plants are not designed for black-start operation; that is, they are not designed to start up in the absence of power from the grid.

Electrical Distribution Systems

The electrical distribution systems at nuclear power plants are designed to satisfy the high safety and reliability requirements for nuclear systems. This is achieved through flexible bus arrangements, high capacity standby power generation, and ample redundancy in equipment.

Where continuous power is required, power is supplied either from batteries (for continuous DC power, Class I) or via inverters (for continuous AC power, Class II). AC supply for safety-related equipment, which can withstand short interruption (on the order of 5 minutes), is provided by Class III power. Class III power is nominally supplied through Class IV; when Class IV becomes unavailable, standby generators are started automatically, and the safety-related loads are picked up within 5 minutes of the loss of Class IV power.

The Class IV power is an AC supply to reactor equipment and systems that can withstand longer interruptions in power. Class IV power can be supplied either from the generator through a transformer or from the grid by another transformer. Class IV power is not required for reactors to shut down safely.

In addition to the four classes of power described above, there is an additional source of power known as the Emergency Power System (EPS). EPS is a separate power system consisting of its own on-site power generation and AC and DC distribution systems whose normal supply is from the Class III power system. The purpose of the EPS system is to provide power to selected safety-related loads following common mode incidents, such as seismic events.
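The power classes described above can be summarized compactly. The sketch below is a reading aid that restates the text, not a representation of actual station control logic: the class descriptions and the 5-minute Class III pickup time are taken from the preceding paragraphs, and the function merely illustrates which supply carries the safety-related loads after a loss of Class IV power.

    # Summary of CANDU station electrical power classes, restated from the text above.
    POWER_CLASSES = {
        "Class I":   "Continuous DC power supplied from batteries.",
        "Class II":  "Continuous AC power supplied via inverters.",
        "Class III": "AC power for safety-related equipment that can withstand a short "
                     "interruption (about 5 minutes); normally fed through Class IV and "
                     "picked up by standby generators within 5 minutes if Class IV is lost.",
        "Class IV":  "AC supply from the unit generator or the grid for equipment that can "
                     "withstand longer interruptions; not required for safe shutdown.",
        "EPS":       "Separate emergency power system with its own on-site generation and "
                     "AC/DC distribution, for selected safety-related loads after common "
                     "mode incidents such as seismic events.",
    }

    def safety_load_supply(minutes_since_class_iv_loss):
        """Illustrative only: which source carries Class III safety-related loads."""
        if minutes_since_class_iv_loss < 5:
            return "batteries and inverters (Class I/II) while standby generators start"
        return "standby generators (Class III restored)"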

Protective Features of CANDU Nuclear Power Plants

CANDU reactors typically have two separate, independent and diverse systems to shut down the reactor in the event of an accident or transients in the grid. Shutdown System 1 (SDS1) consists of a large number of cadmium rods that drop into the core to decrease the power level by absorbing neutrons. Shutdown System 2 (SDS2) consists of high-pressure injection of gadolinium nitrate into the low-pressure moderator to decrease the power level by absorbing neutrons. Although Pickering A does not have a fully independent SDS2, it does have a second shutdown mechanism, namely, the fast drain of the moderator out of the calandria; removal of the moderator significantly reduces the rate of nuclear fission, which reduces reactor power. Also, additional trip circuits and shutoff rods have recently been added to Pickering A Unit 4 (Shutdown System Enhancement, or SDS-E). Both SDS1 and SDS2 are capable of reducing reactor power from 100% to about 2% within a few seconds of trip initiation.

Fuel Heat Removal Features of CANDU Nuclear Power Plants

Following the loss of Class IV power and shutdown of the reactor through action of SDS1 and/or SDS2, significant heat will continue to be generated in the reactor fuel from the decay of fission products. The CANDU design philosophy is to provide defense in depth in the heat removal systems.

Immediately following the trip and prior to restoration of Class III power, heat will be removed from the reactor core by natural circulation of coolant through the Heat Transport System main circuit following rundown of the main Heat Transport pumps (first by thermosyphoning and later by intermittent buoyancy induced flow). Heat will be rejected from the secondary side of the steam generators through the atmospheric steam discharge valves. This mode of operation can be sustained for many days with additional feedwater supplied to the steam generators via the Class III powered auxiliary steam generator feed pump(s).

In the event that the auxiliary feedwater system becomes unavailable, there are two alternate EPS powered water supplies to steam generators, namely, the Steam Generator Emergency Coolant System and the Emergency Service Water System. Finally, a separate and independent means of cooling the fuel is by forced circulation by means of the Class III powered shutdown cooling system; heat removal to the shutdown cooling heat exchangers is by means of the Class III powered components of the Service Water System.

CANDU Reactor Response to a Loss-of-Grid Event

Response to Loss of Grid

In the event of disconnection from the grid, power to shut down the reactor safely and maintain essential systems will be supplied from batteries and standby generators. The specific response of a reactor to disconnection from the grid will depend on the reactor design and the condition of the unit at the time of the event.

60% Reactor Power: All CANDU reactors are designed to operate at 60% of full power following the loss of off-site power. They can operate at this level as long as demineralized water is available for the boilers. At Darlington and Bruce B, steam can be diverted to the condensers and recirculated to the boilers. At Pickering A and Pickering B, excess steam is vented to the atmosphere, thereby limiting the operating time to the available inventory of demineralized water.

0% Reactor Power, Hot: The successful transition from 100% to 60% power depends on several systems responding properly, and continued operation is not guaranteed. The reactor may shut down automatically through the operation of the process control systems or through the action of either of the shutdown systems.

Should a reactor shutdown occur following a load rejection, both Class IV power supplies (from the main generator and the grid) to that unit will become unavailable. The main Heat Transport pumps will trip, leading to a loss of forced circulation of coolant through the core. Decay heat will be continuously removed through natural circulation (thermosyphoning) to the boilers, and steam produced in the boilers will be exhausted to the atmosphere via atmospheric steam discharge valves. The Heat Transport System will be maintained at around 250 to 265 degrees Celsius during thermosyphoning. Standby generators will start automatically and restore Class III power to key safety-related systems. Forced circulation in the Heat Transport System will be restored once either Class III or Class IV power is available.

When shut down, the natural decay of fission products will lead to the temporary buildup of neutron absorbing elements in the fuel. If the reactor is not quickly restarted to reverse this natural process, it will “poison-out.” Once poisoned-out, the reactor cannot return to operation until the fission products have further decayed, a process which typically takes up to 2 days.

Overpoisoned Guaranteed Shutdown State: In the event that certain problems are identified when reviewing the state of the reactor after a significant transient, the operating staff will cool down and depressurize the reactor, then place it in an overpoisoned guaranteed shutdown state (GSS) through the dissolution of gadolinium nitrate into the moderator. Maintenance will then be initiated to correct the problem.

Return to Service Following Loss of Grid

The return to service of a unit following any one of the above responses to a loss-of-grid event is discussed below. It is important to note that the descriptions provided relate to operations on a single unit. At multi-unit stations, the return to service of several units cannot always proceed in parallel, due to constraints on labor availability and the need to focus on critical evolutions, such as taking the reactor from a subcritical to a critical state.

60% Reactor Power: In this state, the unit can be resynchronized consistent with system demand, and power can be increased gradually to full power over approximately 24 hours.

0% Reactor Power, Hot: In this state, after approximately 2 days for the poison-out, the turbine can be run up and the unit synchronized. Thereafter, power can be increased to high power over the next day. This restart timeline does not include the time required for any repairs or maintenance that might have been necessary during the outage.

Overpoisoned Guaranteed Shutdown State: Placing the reactor in a GSS after it has been shut down requires approximately 2 days. Once the condition that required entry to the GSS is rectified, the restart requires removal of the guarantee, removal of the gadolinium nitrate through ion exchange process, heatup of the Heat Transport System, and finally synchronization to the grid. Approximately 4 days are required to complete these restart activities. In total, 6 days from shutdown are required to return a unit to service from the GSS, and this excludes any repairs that might have been required while in the GSS.

Summary of Canadian Nuclear Power Plant Response to and Safety During the August 14 Outage

On the afternoon of August 14, 2003, 15 Canadian nuclear units were operating: 13 in Ontario, 1 in Québec, and 1 in New Brunswick. Of the 13 Ontario reactors that were critical at the time of the event, 11 were operating at or near full power and 2 at low power (Pickering B Unit 7 and Pickering A Unit 4). All 13 of the Ontario reactors disconnected from the grid as a result of the grid disturbance. Seven of the 11 reactors operating at high power shut down, while the remaining 4 operated in a planned manner that enabled them to remain available to reconnect to the grid at the request of Ontario’s IMO. Of the 2 Ontario reactors operating at low power, Pickering A Unit 4 tripped automatically, and Pickering B Unit 7 was tripped manually and shut down. In addition, a transient was experienced at New Brunswick Power’s Point Lepreau Nuclear Generating Station, resulting in a reduction in power. Hydro Québec’s Gentilly-2 nuclear station continued to operate normally as the Hydro Québec grid was not affected by the grid disturbance.

Nuclear Power Plants With Significant Transients

Pickering Nuclear Generating Station. The Pickering Nuclear Generating Station (PNGS) is located in Pickering, Ontario, on the shores of Lake Ontario, 19 miles (30 km) east of Toronto. It houses 8 nuclear reactors, each capable of delivering 515 MW to the grid. Three of the 4 units at Pickering A (Units 1 through 3) have been shut down since late 1997. Unit 4 was restarted earlier this year following a major refurbishment and was in the process of being commissioned at the time of the event. At Pickering B, 3 units were operating at or near 100% prior to the event, and Unit 7 was being started up following a planned maintenance outage.

Pickering A. As part of the commissioning process, Unit 4 at Pickering A was operating at 12% power in preparation for synchronization to the grid. The reactor automatically tripped on SDS1 due to Heat Transport Low Coolant Flow, when the Heat Transport main circulating pumps ran down following the Class IV power loss. The decision was then made to return Unit 4 to the guaranteed shutdown state. Unit 4 was synchronized to the grid on August 20, 2003. Units 1, 2 and 3 were in lay-up mode.

Pickering B. The Unit 5 Generator Excitation System transferred to manual control due to large voltage oscillations on the grid at 16:10 EDT and then tripped on Loss of Excitation about 1 second later (prior to grid frequency collapse). In response to the generator trip, Class IV buses transferred to the system transformer and the reactor setback. The grid frequency collapse caused the System Service Transformer to disconnect from the grid, resulting in a total loss of Class IV power. The reactor consequently tripped on the SDS1 Low Gross Flow parameter followed by an SDS2 trip due to Low Core Differential Pressure.

The Unit 6 Generator Excitation System also transferred to manual control at 16:10 EDT due to large voltage oscillations on the grid and the generator remained connected to the grid in manual voltage control. Approximately 65 seconds into the event, the grid under-frequency caused all the Class IV buses to transfer to the Generator Service Transformer. Ten seconds later, the generator separated from the Grid. Five seconds later, the generator tripped on Loss of Excitation, which caused a total loss of Class IV power. The reactor consequently tripped on the SDS1 Low Gross Flow parameter, followed by an SDS2 trip due to Low Core Differential Pressure.

Unit 7 was coming back from a planned maintenance outage and was at 0.9% power at the time of the event. The unit was manually tripped after loss of Class IV power, in accordance with procedures and returned to guaranteed shutdown state.

Unit 8 reactor automatically set back on load rejection. The setback would normally have been terminated at 20% power but continued to 2% power because of the low boiler levels. The unit subsequently tripped on the SDS1 Low Boiler Feedline Pressure parameter due to a power mismatch between the reactor and the turbine.

The following equipment problems were noted. At Pickering, the High Pressure Emergency Coolant Injection System (HPECIS) pumps are designed to operate from a Class IV power supply. As a result of the shutdown of all the operating units, the HPECIS at both Pickering A and Pickering B became unavailable for 5.5 hours. (The design of Pickering A and Pickering B HPECIS must be such that the fraction of time for which it is not available can be demonstrated to be less than 10⁻³ years—about 8 hours per year. This was the first unavailability of the HPECIS for 2003.) In addition, Emergency High Pressure Service Water System restoration for all Pickering B units was delayed because of low suction pressure supplying the Emergency High Pressure Service Water pumps. Manual operator intervention was required to restore some pumps back to service.

Units were synchronized to the grid as follows: Unit 8 on August 22, Unit 5 on August 23, Unit 6 on August 25, and Unit 7 on August 29.

Darlington Nuclear Generating Station. Four reactors are located at the Darlington Nuclear Generation Station, which is on the shores of Lake Ontario in the Municipality of Clarington, 43 miles (70 km) east of Toronto. All four of the reactors are licensed to operate at 100% of full power, and each is capable of delivering approximately 880 MW to the grid.

Unit 1 automatically stepped back to the 60% reactor power state upon load rejection at 16:12 EDT. Approval by the shift supervisor to automatically withdraw the adjuster rods could not be provided due to the brief period of time for the shift supervisor to complete the verification of systems as per procedure. The decreasing steam pressure and turbine frequency then required the reactor to be manually tripped on SDS1, as per procedure for loss of Class IV power. The trip occurred at 16:24 EDT, followed by a manual turbine trip due to under-frequency concerns.

Like Unit 1, Unit 2 automatically stepped back upon load rejection at 16:12 EDT. As with Unit 1, there was insufficient time for the shift supervisor to complete the verification of systems, and faced with decreasing steam pressure and turbine frequency, the decision was made to shut down Unit 2. Due to under-frequency on the main Primary Heat Transport pumps, the turbine was tripped manually which resulted in an SDS1 trip at 16:28 EDT.

Unit 3 experienced a load rejection at 16:12 EDT, and during the stepback Unit 3 was able to sustain operation with steam directed to the condensers. After system verifications were complete, approval to place the adjuster rods on automatic was obtained in time to recover, at 59% reactor power. The unit was available to resynchronize to the grid.

Unit 4 experienced a load rejection at 16:12 EDT, and required a manual SDS1 trip due to the loss of Class II bus. This was followed by a manual turbine trip.

The following equipment problems were noted: Unit 4 Class II inverter trip on BUS A3 and subsequent loss of critical loads prevented unit recovery. The Unit 0 Emergency Power System BUS B135 power was lost until the Class III power was restored. (A planned battery bank B135 change out was in progress at the time of the blackout.)

Units were synchronized to the grid as follows: Unit 3 at 22:00 EDT on August 14; Unit 2 on August 17, 2003; Unit 1 on August 18, 2003; and Unit 4 on August 18, 2003.

Bruce Power. Eight reactors are located at Bruce Power on the eastern shore of Lake Huron between Kincardine and Port Elgin, Ontario. Units 5 through 8 are capable of generating 840 MW each. Presently these reactors are operating at 90% of full power due to license conditions imposed by the CNSC. Units 1 through 4 have been shut down since December 31, 1997. At the time of the event, work was being performed to return Units 3 and 4 to service.

Bruce A. Although these reactors were in guaranteed shutdown state, they were manually tripped, in accordance with operating procedures. SDS1 was manually tripped on Units 3 and 4, as per procedures for a loss of Class IV power event. SDS1 was re-poised on both units when the station power supplies were stabilized. The emergency transfer system functioned as per design, with the Class III standby generators picking up station electrical loads. The recently installed Qualified Diesel Generators received a start signal and were available to pick up emergency loads if necessary.

Bruce B. Units 5, 6, 7, and 8 experienced initial generation rejection and accompanying stepback on all four reactor units. All generators separated from the grid on under-frequency at 16:12 EDT. Units 5, 7, and 8 maintained reactor power at 60% of full power and were immediately available for reconnection to the grid.

Although initially surviving the loss of grid event, Unit 6 experienced an SDS1 trip on insufficient Neutron Over Power (NOP) margin. This occurred while withdrawing Bank 3 of the adjusters in an attempt to offset the xenon transient, resulting in a loss of Class IV power.

The following equipment problems were noted: An adjuster rod on Unit 6 had been identified on August 13, 2003, as not working correctly. Unit 6 experienced a High Pressure Recirculation Water line leak, and the Closed Loop Demineralized Water loop lost inventory to the Emergency Water Supply System.

Units were synchronized to the grid as follows: Unit 8 at 19:14 EDT on August 14, 2003; Unit 5 at 21:04 EDT on August 14; and Unit 7 at 21:14 EDT on August 14, 2003. Unit 6 was resynchronized at 02:03 EDT on August 23, 2003, after maintenance was conducted.

Point Lepreau Nuclear Generating Station. The Point Lepreau nuclear station overlooks the Bay of Fundy on the Lepreau Peninsula, 25 miles (40 km) southwest of Saint John, New Brunswick. Point Lepreau is a single-unit CANDU 6, designed for a gross output of 680 MW. It is owned and operated by New Brunswick Power.

Point Lepreau was operating at 91.5% of full power (610 MWe) at the time of the event. When the event occurred, the unit responded to changes in grid frequency as per design. The net impact was a short-term drop in output by 140 MW, with reactor power remaining constant and excess thermal energy being discharged via the unit steam discharge valves. During the 25 seconds of the event, the unit stabilizer operated numerous times to help dampen the turbine generator speed oscillations that were being introduced by the grid frequency changes. Within 25 minutes of the event initiation, the turbine generator was reloaded to 610 MW. Given the nature of the event that occurred, there were no unexpected observations on the New Brunswick Power grid or at Point Lepreau Generating Station throughout the ensuing transient.

Nuclear Power Plants With No Transient

Gentilly-2 Nuclear Station. Hydro Québec owns and operates Gentilly-2 nuclear station, located on the south shore of the St. Lawrence River opposite the city of Trois-Rivières, Québec. Gentilly-2 is capable of delivering approximately 675 MW to Hydro Québec’s grid. The Hydro Québec grid was not affected by the power system outage and Gentilly-2 continued to operate normally.

General Observations Based on the Facts Found During Phase I

Following the review of the data provided by the Canadian nuclear power plants, the CNWG concludes the following:

u None of the reactor operators had any advanced warning of impending collapse of the grid.

u Canadian nuclear power plants did not trigger the power system outage or contribute to its spread.

u There were no risks to the health and safety of workers or the public as a result of the concurrent shutdown of several reactors. Automatic safety systems for the turbine generators and reactors worked as designed. (See Table 8.3 for a summary of shutdown events for Canadian nuclear power plants.)

The CNWG also identified the following secondary issues:

u Equipment problems and design limitations at Pickering B resulted in a temporary reduction in the effectiveness of some of the multiple safety barriers, although the equipment failure was within the unavailability targets found in the OP&Ps approved by the CNSC as part of Ontario Power Generation’s license.

u Existing OP&Ps place constraints on the use of adjuster rods to respond to events involving rapid reductions in reactor power. While greater flexibility with respect to use of adjuster rods would not have prevented the shutdown, some units, particularly those at Darlington, might have been able to return to service less than 1 hour after the initiating event.

u Off-site power was unavailable for varying periods of time, from approximately 3 hours at Bruce B to approximately 9 hours at Pickering A. Despite the high priority assigned by the IMO to restoring power to the nuclear stations, the stations had some difficulty obtaining timely information about the status of grid recovery and the restoration of Class IV power. This information is important for Ontario Power Generation’s and Bruce Power’s response strategy.

Table 8.3. Summary of Shutdown Events for Canadian Nuclear Power Plants

Pickering NGS
- Unit 1: not operating at time of event (a)
- Unit 2: not operating at time of event
- Unit 3: not operating at time of event
- Unit 4: startup; reactor trip on SDS1 (b)
- Unit 5: full power; reactor trip on SDS1 and SDS2
- Unit 6: full power; reactor trip on SDS1 and SDS2
- Unit 7: startup; reactor trip on SDS1
- Unit 8: full power; reactor trip on SDS1

Darlington NGS
- Unit 1: full power; turbine trip; reactor trip on SDS1
- Unit 2: full power; turbine trip; reactor trip on SDS1
- Unit 3: full power; stepback to 60% power, available to supply grid
- Unit 4: full power; turbine trip; reactor trip on SDS1

Bruce Nuclear Power Development
- Unit 1: not operating at time of event
- Unit 2: not operating at time of event
- Unit 3: not operating at time of event; reactor trip on SDS1
- Unit 4: not operating at time of event; reactor trip on SDS1
- Unit 5: full power; stepback to 60% power, available to supply grid
- Unit 6: full power; reactor trip on SDS1
- Unit 7: full power; stepback to 60% power, available to supply grid
- Unit 8: full power; stepback to 60% power, available to supply grid

(a) Pickering A Unit 1 tripped as a result of electrical bus configuration immediately prior to the event which resulted in a temporary loss of Class II power.
(b) Pickering A Unit 4 also tripped on SDS-E.
Notes: Unit 7 at Pickering B was operating at low power, warming up prior to reconnecting to the grid after a maintenance outage. Unit 4 at Pickering A was producing at low power, as part of the reactor’s commissioning after extensive refurbishment since being shut down in 1997.

128 G U.S.-Canada Power System Outage Task Force G August 14th Blackout: Causes and Recommendations G restart of the units; however, CNSC staff was another concerning the use of adjuster rods during unable to immediately activate the CNSC’s future events involving the loss of off-site power. Emergency Operation Centre because of loss of These are presented in Chapter 10 along with the power to the CNSC’s head office building. Task Force’s recommendations on other subjects. CNSC staff, therefore, established communica- tions with licensees and the U.S. NRC from Despite some comments to the contrary, the other locations. CNWG’s investigation found that the time to restart the reactors was reasonable and in line Regulatory Activities Subsequent to the with design specifications for the reactors. There- Blackout fore, the CNWG made no recommendations for action on this matter. Comments were also made The actuation of emergency shutdown systems at regarding the adequacy of generation capacity in Bruce, Darlington and Pickering, and the impair- Ontario and the appropriate mix of technologies ment of the High Pressure Emergency Coolant for . This is a matter beyond Injection System (HPECIS) at Pickering are events the CNWG’s mandate, and it made no recommen- for which licensees need to file reports with the dations on this issue. Canadian Nuclear Safety Commission (CNSC), in accordance with Regulatory Standard S 99, “Reporting Requirements for Operating Nuclear Perspective of Power Plants.” Reports have been submitted by Ontario Power Generation (OPG) and Bruce Nuclear Regulatory Agencies Power, and are being followed up by staff from the on Potential Changes to the Grid CNSC as part of the CNSC’s normal regulatory process. This includes CNSC’s review and The NRC and the CNSC, under their respective approval, where appropriate, of any actions taken regulatory authorities, are entrusted with provid- or proposed to be taken to correct any problems in ing reasonable assurance of adequate protection of design, equipment or operating procedures identi- public health and safety. As the design and opera- fied by OPG and Bruce Power. tion of the electricity grid is taken into account when evaluating the safety analysis of nuclear As a result of further information about the event power plants, changes to the electricity grid must gathered by CNSC staff during followup inspec- be evaluated for the impact on plant safety. As the tions, the temporary impairment of the HPECIS at Task Force final recommendations result in Pickering has been rated by CNSC staff as Level 2 actions to affect changes, the NRC and the CNSC on the International Nuclear Event Scale, indicat- will assist by evaluating potential effects on the ing that there was a significant failure in safety safety of nuclear power plant operation. provisions, but with sufficient backup systems, or “defense-in-depth,” in place to cope with potential The NRC and the CNSC acknowledge that future malfunctions. Since August 2003, OPG has imple- improvements in grid reliability will involve coor- mented procedural and operational changes to dination among many groups. The NRC and the improve the performance of the safety systems at CNSC intend to maintain the good working rela- Pickering. tionships that have been developed during the Task Force investigation to ensure that we con- Conclusions of the Canadian Nuclear tinue to share experience and insights and work Working Group together to maintain an effective and reliable elec- tric supply system. 
Perspective of Nuclear Regulatory Agencies on Potential Changes to the Grid

The NRC and the CNSC, under their respective regulatory authorities, are entrusted with providing reasonable assurance of adequate protection of public health and safety. Because the design and operation of the electricity grid is taken into account when evaluating the safety analysis of nuclear power plants, changes to the electricity grid must be evaluated for their impact on plant safety. As the Task Force's final recommendations result in actions to effect changes, the NRC and the CNSC will assist by evaluating potential effects on the safety of nuclear power plant operation.

The NRC and the CNSC acknowledge that future improvements in grid reliability will involve coordination among many groups. The NRC and the CNSC intend to maintain the good working relationships that have been developed during the Task Force investigation to ensure that we continue to share experience and insights and work together to maintain an effective and reliable electric supply system.

Endnotes
1 Further details are available in the NRC Special Inspection Report dated December 22, 2003, ADAMS Accession No. ML033570386.
2 Further details are available in the NRC Special Inspection Report dated December 22, 2003, ADAMS Accession No. ML033570386.
3 Further details are available in the NRC Special Inspection Report dated October 10, 2003, ADAMS Accession No. ML032880107.


9. Physical and Cyber Security Aspects of the Blackout

Summary and Primary Findings

After the Task Force Interim Report was issued in November 2003, the Security Working Group (SWG) continued its efforts to investigate whether a malicious cyber event directly caused or significantly contributed to the power outage of August 14, 2003. These efforts included additional analyses of interviews conducted prior to the release of the Interim Report and additional consultations with representatives from the electric power sector. The information gathered from these efforts validated the SWG's Interim Report preliminary findings, and the SWG found no reason to amend, alter, or negate any of the information submitted to the Task Force for the Interim Report.

Specifically, further analysis by the SWG found no evidence that malicious actors caused or contributed to the power outage, nor is there evidence that worms or viruses circulating on the Internet at the time of the power outage had an effect on the power generation and delivery systems of the companies directly involved in the power outage. The SWG acknowledges reports of al-Qaeda claims of responsibility for the power outage of August 14, 2003. However, these claims are not consistent with the SWG's findings. SWG analysis also brought to light certain concerns respecting the possible failure of alarm software; links to control and data acquisition software; and the lack of a system or process for some grid operators to adequately view the status of electric systems outside of their immediate control.

After the release of the Interim Report in November 2003, the SWG determined that the existing data, and the findings derived from analysis of those data, provided sufficient certainty to exclude the probability that a malicious cyber event directly caused or significantly contributed to the power outage events. As such, further data collection efforts to conduct broader analysis were deemed unnecessary. While no additional data were collected, further analysis and interviews conducted after the release of the Interim Report allowed the SWG to validate its preliminary findings and to make recommendations on those findings:

- Interviews and analyses conducted by the SWG indicate that within some of the companies interviewed there are potential opportunities for cyber system compromise of Energy Management Systems (EMS) and their supporting information technology (IT) infrastructure. Indications of procedural and technical IT management vulnerabilities were observed in some facilities, such as unnecessary software services not denied by default, loosely controlled system access and perimeter control, poor patch and configuration management, and poor system security documentation (a simple illustrative audit of this kind is sketched following this list). This situation caused the SWG to support the promulgation, implementation, and enforcement of cyber and physical security standards for the sector. (See Recommendation 32, page 163.)

- A failure in a software program not linked to malicious activity may have significantly contributed to the power outage. Since the issuance of the Interim Report, the SWG consulted with the software program's vendor and confirmed that since the August 14, 2003, power outage, the vendor provided industry with the necessary information and mitigation steps to address this software failure. In Canada, a survey was posted on the Canadian Electricity Association (CEA) secure members-only web site to determine if the software was in use. The responses indicated that it is not used by Canadian companies in the industry. (See Recommendation 33, page 164.)
- Internal and external links from Supervisory Control and Data Acquisition (SCADA) networks to other systems introduced vulnerabilities. (See Recommendation 34, page 165.)

- In some cases, Control Area (CA) and Reliability Coordinator (RC) visibility into the operations of surrounding areas was lacking. (See Recommendation 35, page 165.)
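The IT management gaps noted in the first finding above, such as unnecessary software services left enabled by default, are the kind of condition that routine host audits can surface. The following sketch is illustrative only: it is not a tool used by the SWG, and the host, port list, and approved baseline are hypothetical. It shows one simple way to enumerate listening TCP services on a host and flag those that are not in an approved baseline.

```python
import socket

# Hypothetical audit target and port list; a real EMS/IT audit would use an
# approved host inventory and an authorized scanning tool.
HOST = "127.0.0.1"
COMMON_PORTS = {
    21: "ftp", 22: "ssh", 23: "telnet", 80: "http",
    135: "msrpc", 139: "netbios", 443: "https", 445: "smb",
    1433: "mssql", 3389: "rdp", 5900: "vnc",
}
APPROVED = {22, 443}  # services expected on this host (assumed baseline)

def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

if __name__ == "__main__":
    for port, name in sorted(COMMON_PORTS.items()):
        if is_listening(HOST, port):
            status = "approved" if port in APPROVED else "REVIEW: not in baseline"
            print(f"{HOST}:{port} ({name}) is listening - {status}")
```

In practice, any ports flagged for review would be reconciled against change-management records, consistent with the configuration-management concerns described in the finding above.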

The SWG's analysis is reflected in a total of 15 recommendations, two of which were combined with similar concerns by the ESWG (Recommendations 19 and 22); for the remaining 13, see Recommendations 32-44 (pages 163-169).

Overall, the SWG's final report was the result of interviews conducted with representatives of Cinergy, FirstEnergy, American Electric Power (AEP), PJM Interconnect, the Midwest Independent System Operator (MISO), the East Central Area Reliability Coordinating Agreement (ECAR), and GE Power Systems Division. These entities were chosen due to their proximity to the causes of the power outage, based on the analysis of the Electric System Working Group (ESWG). The findings contained in this report relate only to those entities surveyed. The final report also incorporates information gathered from third-party sources as well as federal security and intelligence communities.

SWG Mandate and Scope

It is widely recognized that the increased reliance on IT by critical infrastructure sectors, including the energy sector, has increased the vulnerability of these systems to disruption via cyber means. The ability to exploit these vulnerabilities has been demonstrated in North America. The SWG was comprised of United States and Canadian federal, state, provincial and local experts in both physical and cyber security, and its objective was to determine the role, if any, that a malicious cyber event played in causing, or contributing to, the power outage of August 14, 2003. For the purposes of its work, the SWG defined a "malicious cyber event" as the manipulation of data, software or hardware for the purpose of deliberately disrupting the systems that control and support the generation and delivery of electric power.

The SWG worked closely with the United States and Canadian law enforcement, intelligence and homeland security communities to examine the possible role of malicious actors in the power outage. A primary activity in this endeavor was the collection and review of available intelligence related to the power outage of August 14, 2003. The SWG also collaborated with the energy industry to examine the cyber systems that control power generation and delivery operations, the physical security of cyber assets, cyber policies and procedures, and the functionality of supporting infrastructure—such as communication systems and backup power generation, which facilitate the smooth running operation of cyber assets—to determine if the operation of these systems was affected by malicious activity. The SWG coordinated its efforts with those of the other Working Groups, and there was a significant interdependence on each group's work products and findings.

The SWG's focus was on the cyber operations of those companies in the United States involved in the early stages of the power outage timeline, as identified by the ESWG.

Outside of the SWG's scope was the examination of the non-cyber physical infrastructure aspects of the power outage of August 14, 2003. The Interim Report detailed the SWG's availability to investigate breaches of physical security unrelated to the cyber dimensions of the infrastructure on behalf of the Task Force, but no incidents came to the SWG's attention during its work. Also outside the scope of the SWG's work was analysis of the impacts the power outage had on other critical infrastructure sectors. Both Public Safety and Emergency Preparedness Canada and the U.S. Department of Homeland Security (DHS) examined these issues, but not within the context of the SWG.
Cyber Security in the Electricity Sector

The generation and delivery of electricity has been, and continues to be, a target of malicious groups and individuals intent on disrupting this system. Even attacks that do not directly target the electricity sector can have disruptive effects on electricity system operations.

Many malicious code attacks, by their very nature, are unbiased and tend to interfere with operations supported by vulnerable applications. One such incident occurred in January 2003, when the "Slammer" Internet worm took down monitoring computers at FirstEnergy Corporation's idled Davis-Besse nuclear plant. A subsequent report by the North American Electric Reliability Council (NERC) concluded that although the infection caused no outages, it blocked commands that operated other power utilities.1

This example, among others, highlights the increased vulnerability to disruption via cyber means faced by North America's critical infrastructure sectors, including the energy sector. Of specific concern to the United States and Canadian governments are the SCADA networks, which contain computers and applications that perform a wide variety of functions across many industries. In electric power, SCADA includes telemetry for status and control, as well as EMS, protective relaying and automatic generation control. SCADA systems were developed to maximize functionality and interoperability, with little attention given to cyber security. These systems, many of which were intended to be isolated, now find themselves, for a variety of business and operational reasons, either directly or indirectly connected to the global Internet. For example, in some instances, there may be a need for employees to monitor SCADA systems remotely. However, connecting SCADA systems to a remotely accessible network can present security risks. These risks include the compromise of sensitive operating information and the threat of unauthorized access to SCADA systems' control mechanisms.

Security has always been a priority for the electricity sector in North America; however, it is a greater priority now than ever before. CAs and RCs recognize that the threat environment is changing and that the risks are greater than in the past, and they have taken steps toward improving their security postures. NERC's Critical Infrastructure Protection Advisory Group has been examining ways to improve both the physical and cyber security dimensions of the North American power grid. This group is comprised of Canadian and U.S. industry experts in the areas of cyber security, physical security and operational security. The creation of a national SCADA program is now also under discussion in the U.S. to improve the physical and cyber security of these control systems. The Canadian Electricity Association's Critical Infrastructure Working Group is examining similar measures.

Information Collection and Analysis

After analyzing information already obtained from stakeholder interviews, telephone transcripts, law enforcement and intelligence information, and other ESWG working documents, the SWG determined that it was not necessary to analyze other sources of data on the cyber operations of those companies, such as log data from routers, intrusion detection systems, firewalls, EMS, change management logs, and physical security materials.

The SWG was divided into six sub-teams to address the discrete components of this investigation: Cyber Analysis, Intelligence Analysis, Physical Analysis, Policies and Procedures, Supporting Infrastructure, and Root Cause Liaison. The SWG organized itself in this manner to create a holistic approach to addressing each of the main areas of concern with regard to power grid vulnerabilities. Rather than analyze each area of concern separately, the SWG sub-team structure provided a more comprehensive framework in which to investigate whether malicious activity was a cause of the power outage of August 14, 2003. Each sub-team was staffed with Subject Matter Experts (SMEs) from government, industry, and academia to provide the analytical breadth and depth necessary to complete each sub-team's objective. A detailed overview of the structure and activities of each sub-team is provided below.
1. Cyber Analysis

The Cyber Analysis sub-team was led by the CERT® Coordination Center (CERT/CC) at Carnegie Mellon University and the Royal Canadian Mounted Police (RCMP). This team focused on analyzing and reviewing the electronic media of computer networks in which online communications take place. The sub-team examined these networks to determine if they were maliciously used to cause, or contribute to, the August 14, 2003, outage. Specifically, the SWG reviewed materials created on behalf of DHS's National Communication System (NCS). These materials covered the analysis and conclusions of their Internet Protocol (IP) modeling correlation study of Blaster (a malicious Internet worm first noticed on August 11, 2003) and the power outage. This NCS analysis supports the SWG's finding that viruses and worms prevalent across the Internet at the time of the outage did not have any significant impact on power generation and delivery systems.

The team also conducted interviews with vendors to identify known system flaws and vulnerabilities.

This sub-team took a number of steps, including reviewing NERC reliability standards to gain a better understanding of the overall security posture. Additionally, the sub-team participated in meetings in Baltimore on August 22 and 23, 2003. The meetings provided an opportunity for the cyber experts and the power industry experts to understand the details necessary to conduct an investigation.

Members of the sub-team also participated in the NERC/Department of Energy (DOE) Fact Finding meeting held in Newark, New Jersey on September 8, 2003. Each company involved in the outage provided answers to a set of questions related to the outage. The meeting helped to provide a better understanding of what each company experienced before, during and after the outage. Additionally, sub-team members participated in interviews with grid operators from FirstEnergy on October 8 and 9, 2003, and from Cinergy on October 10, 2003.

2. Intelligence Analysis

The Intelligence Analysis sub-team was led by DHS and the RCMP, which worked closely with Federal, State and local law enforcement, intelligence and homeland security organizations to assess whether the power outage was the result of a malicious attack.

SWG analysis provided no evidence that malicious actors—be they individuals or organizations—were responsible for, or contributed to, the power outage of August 14, 2003. Additionally, the sub-team found no indication of deliberate physical damage to power generating stations and delivery lines on the day of the outage, and there were no reports indicating the power outage was caused by a computer network attack.

Both U.S. and Canadian government authorities provide threat intelligence information to their respective energy sectors when appropriate. No intelligence reports prior to, during or after the power outage indicated any specific terrorist plans or operations against the energy infrastructure. There was, however, threat information of a general nature relating to the sector, which was provided to the North American energy industry by U.S. and Canadian Government agencies in late July 2003. This information indicated that al-Qaeda might attempt to carry out a physical attack involving explosions at oil production facilities, power plants or nuclear plants on the east coast of the U.S. during the summer of 2003. The type of physical attack described in the intelligence that prompted this threat warning is not consistent with the events causing the power outage, as there was no indication of a kinetic event before, during, or immediately after the power outage of August 14, 2003.

Despite all of the above indications that no terrorist activity caused the power outage, al-Qaeda publicly claimed responsibility for its occurrence:

- August 18, 2003: Al-Hayat, an Egyptian media outlet, published excerpts from a communiqué attributed to al-Qaeda. Al-Hayat claimed to have obtained the communiqué from the website of the International Islamic Media Center. The content of the communiqué asserts that the "brigades of Abu Fahes Al Masri had hit two main power plants supplying the East of the U.S., as well as major industrial cities in the U.S. and Canada, . . . its ally in the war against Islam (New York and Toronto) and their neighbors." Furthermore, the operation "was carried out on the orders of Osama bin Laden to hit the pillars of the U.S. economy," as "a realization of bin Laden's promise to offer the Iraqi people a present." The communiqué does not specify the way the alleged sabotage was carried out, but does elaborate on the alleged damage the sabotage caused to the U.S. economy in the areas of finance, transportation, energy and telecommunications.
Additional claims and commentary regarding the power outage appeared in various Middle Eastern media outlets:

- August 26, 2003: A conservative Iranian daily newspaper published a commentary regarding the potential of computer technology as a tool for terrorists against infrastructures dependent on computer networks, most notably water, electric, public transportation, trade organizations and "supranational" companies in the United States.

- September 4, 2003: An Islamist participant in a Jihadist chat room forum claimed that sleeper cells associated with al-Qaeda used the power outage as a cover to infiltrate the U.S. from Canada.

However, these claims as known are not consistent with the SWG's findings. They are also not consistent with congressional testimony by the Federal Bureau of Investigation (FBI). Larry A. Mefford, Executive Assistant Director in charge of the FBI's Counterterrorism and Counterintelligence programs, testified before the U.S. Congress on September 4, 2003, that:

"To date, we have not discovered any evidence indicating that the outage was a result of activity by international or domestic terrorists or other criminal activity."2

Mr. Mefford also testified that:

"The FBI has received no specific, credible threats to electronic power grids in the United States in the recent past and the claim of the Abu Hafs al-Masri Brigade to have caused the blackout appears to be no more than wishful thinking. We have no information confirming the actual existence of this group."3

Current assessments suggest that there are terrorists and other malicious actors who have the capability to conduct a malicious cyber attack with the potential to disrupt the energy infrastructure. Although such an attack cannot be ruled out entirely, an examination of available information and intelligence does not support any claims of a deliberate attack against the energy infrastructure on, or leading up to, August 14, 2003. The few instances of physical damage that occurred on power delivery lines were the result of natural events and not of sabotage. No intelligence reports prior to, during or after the power outage indicated any specific terrorist plans or operations against the energy infrastructure. No incident reports detail suspicious activity near the power generation plants or delivery lines in question.

3. Physical Analysis

The Physical Analysis sub-team was led by the United States Secret Service and the RCMP. These organizations have particular expertise in physical security assessments in the energy sector. The sub-team focused on issues related to how the cyber-related facilities of the energy sector companies were secured, including the physical integrity of data centers and control rooms, along with the security procedures and policies used to limit access to sensitive areas. Focusing on the facilities identified as having a causal relationship to the outage, the sub-team sought to determine if the physical integrity of these cyber facilities was breached, whether externally or by an insider, prior to or during the outage, and if so, whether such a breach caused or contributed to the power outage.

Although the sub-team analyzed information provided to both the ESWG and the Nuclear Working Groups, the Physical Analysis sub-team also reviewed information resulting from face-to-face meetings with energy sector personnel and site visits to energy sector facilities to determine the physical integrity of the cyber infrastructure. The sub-team compiled a list of questions covering location, accessibility, cameras, alarms, locks, fire protection and water systems as they apply to computer server rooms. Based on discussions of these questions during its interviews, the sub-team found no evidence that the physical integrity of the cyber infrastructure was breached. Additionally, the sub-team examined access and control measures used to allow entry into command and control facilities and the integrity of remote facilities.

The sub-team also concentrated on mechanisms used by the companies to report unusual incidents within server rooms, command and control rooms and remote facilities, and it addressed the possibility of an insider attack on the cyber infrastructure.

4. Policies and Procedures

The Policies and Procedures sub-team was led by DHS and Public Safety and Emergency Preparedness Canada. Personnel from these organizations have strong backgrounds in the fields of electric delivery operations, automated control systems including SCADA and EMS, and information security.
This sub-team focused on examining the overall policies and procedures that may or may not have been in place during the events leading up to and during the power outage of August 14, 2003. The policies that the team examined revolved centrally around the cyber systems of the companies identified in the early stages of the power outage. Of specific interest to the team were policies and procedures regarding the upgrade and maintenance (including system patching) of the command and control (C2) systems, including SCADA and EMS. The Policies and Procedures sub-team was also interested in the procedures for contingency operations and restoration of systems in the event of a computer system failure or a cyber event such as an active hack or the discovery of malicious code.

5. Supporting Infrastructure

The Supporting Infrastructure sub-team was led by a DHS expert with experience assessing supporting infrastructure elements such as water cooling for computer systems, backup power systems, heating, ventilation and air conditioning (HVAC), and supporting telecommunications networks. Public Safety and Emergency Preparedness Canada was the Canadian co-lead for this effort. This team analyzed the integrity of the supporting infrastructure and its role, if any, in the power outage on August 14, 2003. It sought to determine whether the supporting infrastructure was performing at a satisfactory level leading up to and during the power outage of August 14, 2003. In addition, the team verified with vendors whether there were maintenance issues that may have impacted operations prior to and during the outage.

The sub-team specifically focused on the following key issues in visits to each of the designated electrical entities:

1. Carrier/provider/vendor for the supporting infrastructure services and/or systems at select company facilities;
2. Loss of service before and/or after the power outage;
3. Conduct of maintenance activities before and/or after the power outage;
4. Conduct of installation activities before and/or after the power outage;
5. Conduct of testing activities before and/or after the power outage;
6. Conduct of exercises before and/or after the power outage; and
7. Existence of a monitoring process (log, checklist, etc.) to document the status of supporting infrastructure services.

6. Root Cause Analysis

The SWG Root Cause Liaison Sub-Team (SWG/RC) followed the work of the ESWG to identify potential root causes of the power outage. As these root cause elements were identified, the sub-team assessed with the ESWG any potential linkages to physical and/or cyber malfeasance. The final analysis of the SWG/RC team found no causal link between the power outage and malicious activity, whether physically or cyber initiated.
Cyber Timeline

The following sequence of events was derived from discussions with representatives of FirstEnergy and the Midwest Independent System Operator (MISO). All times are approximate.

The first significant cyber-related event of August 14, 2003, occurred at 12:40 EDT at the MISO. At this time, a MISO EMS engineer purposely disabled the automatic periodic trigger on the State Estimator (SE) application, an application that allows MISO to determine the real-time state of the power system for its region. The disablement of the automatic periodic trigger, a program feature that causes the SE to run automatically every five minutes, is a necessary operating procedure when resolving a mismatched solution produced by the SE. The EMS engineer determined that the mismatch in the SE solution was due to the SE model depicting Cinergy's Bloomington-Denois Creek 230-kV line as being in service, when it had actually been out of service since 12:12 EDT.

At 13:00 EDT, after making the appropriate changes to the SE model and manually triggering the SE, the MISO EMS engineer achieved two valid solutions.

At 13:30 EDT, the MISO EMS engineer went to lunch. However, he forgot to re-engage the automatic periodic trigger.

At 14:14 EDT, FirstEnergy's "Alarm and Event Processing Routine" (AEPR), a key software program that gives grid operators visual and audible indications of events occurring on their portion of the grid, began to malfunction. FirstEnergy grid operators were unaware that the software was not functioning properly. This software did not become functional again until much later that evening.

At 14:40 EDT, an Ops Engineer discovered the SE was not solving and went to notify an EMS engineer.

At 14:41 EDT, FirstEnergy's server running the AEPR software failed over to the backup server. Control room staff remained unaware that the AEPR software was not functioning properly.

At 14:44 EDT, a MISO EMS engineer, after being alerted by the Ops Engineer, re-activated the automatic periodic trigger and, for speed, manually triggered the program. However, the SE program again showed a mismatch.
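The State Estimator behavior described above—an automatic solution every five minutes that operators may suspend while resolving a mismatch—can be illustrated with a simple scheduler sketch. This is not MISO's EMS software; the class and function names below are hypothetical, and the sketch only shows how a trigger left disabled keeps the estimator from refreshing until someone re-engages it.

```python
import time

RUN_INTERVAL_SECONDS = 5 * 60  # the SE's automatic five-minute cycle


class StateEstimatorScheduler:
    """Toy periodic trigger for a state estimator (illustrative only)."""

    def __init__(self, solve_fn):
        self.solve_fn = solve_fn          # runs one SE solution when called
        self.auto_trigger_enabled = True  # the "automatic periodic trigger"
        self.last_solution_time = None

    def disable_auto_trigger(self):
        # Done deliberately while resolving a mismatched solution;
        # it must be re-enabled afterward or the SE stops refreshing.
        self.auto_trigger_enabled = False

    def enable_auto_trigger(self):
        self.auto_trigger_enabled = True

    def run_manual(self):
        # A one-off manual run, as at 13:00 EDT in the timeline above.
        self.last_solution_time = time.time()
        return self.solve_fn()

    def tick(self, now=None):
        """Called every cycle; runs the SE only if the trigger is enabled."""
        now = now if now is not None else time.time()
        stale = (self.last_solution_time is None
                 or now - self.last_solution_time >= RUN_INTERVAL_SECONDS)
        if self.auto_trigger_enabled and stale:
            self.last_solution_time = now
            return self.solve_fn()
        return None  # trigger disabled or not yet due: no new solution
```

In the sequence above, the equivalent of disable_auto_trigger() occurred at 12:40 EDT and enable_auto_trigger() not until 14:44 EDT, so apart from the manual run at 13:00 EDT the estimator produced no fresh solutions even as grid conditions changed.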

At 14:54 EDT, FirstEnergy's backup server failed. The AEPR continued to malfunction. The Area Control Error (ACE) calculation and strip charting routines malfunctioned, and the dispatcher user interface slowed significantly.

At 15:00 EDT, FirstEnergy used its emergency backup system to control the system and make ACE calculations. ACE calculations and control systems continued to run on the emergency backup system until roughly 15:08 EDT, when the primary server was restored.

At 15:05 EDT, FirstEnergy's Harding-Chamberlin 345-kV line tripped and locked out. FirstEnergy grid operators did not receive notification from the AEPR software, which continued to malfunction, unbeknownst to the FirstEnergy grid operators.

At 15:08 EDT, using data obtained at roughly 15:04 EDT (it takes roughly five minutes for the SE to provide a result), the MISO EMS engineer concluded that the SE mismatched due to a line outage. His experience allowed him to isolate the outage to the Stuart-Atlanta 345-kV line (which had tripped about an hour earlier, at 14:02 EDT). He took the Stuart-Atlanta line out of service in the SE model and got a valid solution.

Also at 15:08 EDT, the FirstEnergy primary server was restored. ACE calculations and control systems were now running on the primary server. The AEPR continued to malfunction, unbeknownst to the FirstEnergy grid operators.

At 15:09 EDT, the MISO EMS engineer went to the control room to tell the grid operators that he thought the Stuart-Atlanta line was out of service. Grid operators referred to their "Outage Scheduler" and informed the EMS engineer that their data showed the Stuart-Atlanta line was "up" and that the EMS engineer should depict the line as in service in the SE model. At 15:17 EDT, the EMS engineer ran the SE with the Stuart-Atlanta line "live," but the model again mismatched.

At 15:29 EDT, the MISO EMS engineer asked MISO grid operators to call PJM Interconnect, LLC to determine the status of the Stuart-Atlanta line. MISO was informed that the Stuart-Atlanta line had tripped at 14:02 EDT. The EMS engineer adjusted the model, which by this time had been updated with the 15:05 EDT Harding-Chamberlin 345-kV line trip, and came up with a valid solution.

At 15:32 EDT, FirstEnergy's Hanna-Juniper 345-kV line tripped and locked out. The AEPR continued to malfunction.

At 15:41 EDT, the lights flickered at FirstEnergy's control facility. This occurred because the facility had lost grid power and switched over to its emergency power supply.

At 15:42 EDT, a FirstEnergy dispatcher realized that the AEPR was not working and made technical support staff aware of the problem.

Endnotes
1 http://www.nrc.gov/reading-rm/doc-collections/news/2003/03-108.html.
2 http://www.fbi.gov/congress/congress03/mefford090403.htm.
3 http://www.fbi.gov/congress/congress03/mefford090403.htm.


10. Recommendations to Prevent or Minimize the Scope of Future Blackouts

Introduction bulk power systems. Market mechanisms should be used where possible, but in circum- As reported in previous chapters, the blackout on stances where conflicts between reliability and August 14, 2003, was preventable. It had several commercial objectives cannot be reconciled, direct causes and contributing factors, including: they must be resolved in favor of high reliabil- ity.2 u Failure to maintain adequate reactive power support 2. Regulators and consumers should recognize that reliability is not free, and that maintaining u Failure to ensure operation within secure limits it requires ongoing investments and operational expenditures by many parties. Regulated com- u Inadequate vegetation management panies will not make such outlays without u Inadequate operator training assurances from regulators that the costs will be recoverable through approved electric rates, u Failure to identify emergency conditions and and unregulated companies will not make such communicate that status to neighboring outlays unless they believe their actions will be systems profitable.3 u Inadequate regional-scale visibility over the 3. Recommendations have no value unless they bulk power system. are implemented. Accordingly, the Task Force Further, as discussed in Chapter 7, after each emphasizes strongly that North American gov- major blackout in North America since 1965, an ernments and industry should commit them- expert team of investigators has probed the causes selves to working together to put into effect the of the blackout, written detailed technical reports, suite of improvements mapped out below. Suc- and issued lists of recommendations to prevent or cess in this area will require particular attention minimize the scope of future blackouts. Yet sev- to the mechanisms proposed for performance eral of the causes of the August 14 blackout are monitoring, accountability of senior manage- strikingly similar to those of the earlier blackouts. ment, and enforcement of compliance with Clearly, efforts to implement earlier recommenda- standards. tions have not been adequate.1 Accordingly, the recommendations presented below emphasize 4. The bulk power systems are among the most comprehensiveness, monitoring, training, and critical elements of our economic and social enforcement of reliability standards when neces- infrastructure. Although the August 14 black- sary to ensure compliance. out was not caused by malicious acts, a number of security-related actions are needed to It is useful to think of the recommendations pre- enhance reliability. sented below in terms of four broad themes: Over the past decade or more, electricity demand 1. Government bodies in the U.S. and Canada, reg- has increased and the North American intercon- ulators, the North American electricity indus- nections have become more densely woven and try, and related organizations should commit heavily loaded, over more hours of the day and themselves to making adherence to high reli- year. In many geographic areas, the number of sin- ability standards paramount in the planning, gle or multiple contingencies that could create design, and operation of North America’s vast serious problems has increased. Operating the

G U.S.-Canada Power System Outage Task Force G August 14th Blackout: Causes and Recommendations G 139 grids at higher loadings means greater stress on Implementation of the recommendations pre- equipment and a smaller range of options and a sented below will address all remaining issues, shorter period of time for dealing with unexpected through the ongoing work of government bodies problems. The system operator’s job has become and agencies in the U.S. and Canada, the electric- more challenging, leading to the need for more ity industry, and the non-governmental institu- sophisticated grid management tools and more tions responsible for the maintenance of electric demanding operator training programs and certifi- reliability in North America. cation requirements.

The recommendations below focus on changes of Recommendations many kinds that are needed to ensure reliability, for both the summer of 2004 and for the years to Forty-six numbered recommendations are pre- follow. Making these changes will require higher sented below, grouped into four substantive areas. and broader awareness of the importance of reli- Some recommendations concern subjects that ability, and some of them may require substantial were addressed in some detail by commenters on new investments. However, the cost of not making the Interim Report or participants in the Task these changes, i.e., the cost of chronic large-scale Force’s two technical conferences. In such cases, blackouts, would be far higher than the cost of the commenters are listed in the Endnotes section addressing the problem. Estimates of the cost of of this chapter. Citation in the endnotes does not the August 14 blackout range between $4 and $10 necessarily mean that the commenter supports the billion (U.S.).4 position expressed in the recommendation. A “table of contents” overview of the recommenda- The need for additional attention to reliability is tions is provided in the text box on pages 141-142. not necessarily at odds with increasing competi- tion and the improved economic efficiency it Group I. Institutional Issues brings to bulk power markets. Reliability and eco- Related to Reliability nomic efficiency can be compatible, but this out- come requires more than reliance on the laws of 1. Make reliability standards mandatory physics and the principles of economics. It and enforceable, with penalties for non- requires sustained, focused efforts by regulators, compliance.5 policy makers, and industry leaders to strengthen and maintain the institutions and rules needed to Appropriate branches of government in the United protect both of these important goals. Regulators States and Canada should take action as required must ensure that competition does not erode to make reliability standards mandatory and incentives to comply with reliability require- enforceable, and to provide appropriate penalties for noncompliance. ments, and that reliability requirements do not serve as a smokescreen for noncompetitive practices. A. Action by the U.S. Congress The U.S. Congress should enact reliability legisla- The metric for gauging achievement of this goal— tion no less stringent than the provisions now making the changes needed to maintain a high included in the pending comprehensive energy level of reliability for the next decade or longer— bills, H.R. 6 and S. 2095. Specifically, these provi- will be the degree of compliance obtained with the sions would require that: recommendations presented below. The single u Reliability standards are to be mandatory and most important step in the United States is for the enforceable, with penalties for noncompliance. U.S. Congress to enact the reliability provisions in pending energy bills (H.R. 6 and S. 2095). If that u Reliability standards should be developed by an can be done, many of the actions recommended independent, international electric reliability below could be accomplished readily in the organization (ERO) with fair stakeholder repre- course of implementing the legislation. sentation in the selection of its directors and balanced decision-making in any ERO commit- Some commenters asserted that the Interim tee or subordinate organizational structure. Report did not analyze all factors they believe may (See text box on NERC and an ERO below.) 
have contributed to the August 14 blackout.

Overview of Task Force Recommendations: Titles Only

Group I. Institutional Issues Related to Reliability
1. Make reliability standards mandatory and enforceable, with penalties for noncompliance.
2. Develop a regulator-approved funding mechanism for NERC and the regional reliability councils, to ensure their independence from the parties they oversee.
3. Strengthen the institutional framework for reliability management in North America.
4. Clarify that prudent expenditures and investments for bulk system reliability (including investments in new technologies) will be recoverable through transmission rates.
5. Track implementation of recommended actions to improve reliability.
6. FERC should not approve the operation of new RTOs or ISOs until they have met minimum functional requirements.
7. Require any entity operating as part of the bulk power system to be a member of a regional reliability council if it operates within the council's footprint.
8. Shield operators who initiate load shedding pursuant to approved guidelines from liability or retaliation.
9. Integrate a "reliability impact" consideration into the regulatory decision-making process.
10. Establish an independent source of reliability performance information.
11. Establish requirements for collection and reporting of data needed for post-blackout analyses.
12. Commission an independent study of the relationships among industry restructuring, competition, and reliability.
13. DOE should expand its research programs on reliability-related tools and technologies.
14. Establish a standing framework for the conduct of future blackout and disturbance investigations.

Group II. Support and Strengthen NERC's Actions of February 10, 2004
15. Correct the direct causes of the August 14, 2003 blackout.
16. Establish enforceable standards for maintenance of electrical clearances in right-of-way areas.
17. Strengthen the NERC Compliance Enforcement Program.
18. Support and strengthen NERC's Reliability Readiness Audit Program.
19. Improve near-term and long-term training and certification requirements for operators, reliability coordinators, and operator support staff.
20. Establish clear definitions for normal, alert and emergency operational system conditions. Clarify roles, responsibilities, and authorities of reliability coordinators and control areas under each condition.
21. Make more effective and wider use of system protection measures.
22. Evaluate and adopt better real-time tools for operators and reliability coordinators.
23. Strengthen reactive power and voltage control practices in all NERC regions.
24. Improve quality of system modeling data and data exchange practices.
25. NERC should reevaluate its existing reliability standards development process and accelerate the adoption of enforceable standards.
26. Tighten communications protocols, especially for communications during alerts and emergencies. Upgrade communication system hardware where appropriate.
27. Develop enforceable standards for transmission line ratings.
28. Require use of time-synchronized data recorders.
29. Evaluate and disseminate lessons learned during system restoration.
30. Clarify criteria for identification of operationally critical facilities, and improve dissemination of updated information on unplanned outages.
31. Clarify that the transmission loading relief (TLR) process should not be used in situations involving an actual violation of an Operating Security Limit. Streamline the TLR process.

Overview of Task Force Recommendations: Titles Only (Continued)

Group III. Physical and Cyber Security of North American Bulk Power Systems
32. Implement NERC IT standards.
33. Develop and deploy IT management procedures.
34. Develop corporate-level IT security governance and strategies.
35. Implement controls to manage system health, network monitoring, and incident management.
36. Initiate a U.S.-Canada risk management study.
37. Improve IT forensic and diagnostic capabilities.
38. Assess IT risk and vulnerability at scheduled intervals.
39. Develop capability to detect wireless and remote wireline intrusion and surveillance.
40. Control access to operationally sensitive equipment.
41. NERC should provide guidance on employee background checks.
42. Confirm NERC ES-ISAC as the central point for sharing security information and analysis.
43. Establish clear authority for physical and cyber security.
44. Develop procedures to prevent or mitigate inappropriate disclosure of information.

Group IV. Canadian Nuclear Power Sector
45. The Task Force recommends that the Canadian Nuclear Safety Commission request Ontario Power Generation and Bruce Power to review operating procedures and operator training associated with the use of adjuster rods.
46. The Task Force recommends that the Canadian Nuclear Safety Commission purchase and install backup generation equipment.

u Reliability standards should allow, where applied more broadly. For example, in the appropriate, flexibility to accommodate Western U.S. and Canada, many members of the regional differences, including more stringent Western Electricity Coordinating Council (WECC) reliability requirements in some areas, but include clauses in contracts for the purchase of regional deviations should not be allowed to wholesale power that require the parties to com- lead to lower reliability expectations or ply with reliability standards. In the areas of the performance. U.S. and Canada covered by the Northeast Power u Coordinating Council (NPCC), parties found not to An ERO-proposed standard or modification to a be in compliance with NERC and NPCC reliability standard should take effect within the United requirements are subject to escalating degrees of States upon approval by the Federal Energy scrutiny by their peers and the public. Both of Regulatory Commission (FERC). these approaches have had positive effects. FERC u FERC should remand to the ERO for further should examine other approaches as well, and consideration a proposed reliability standard or work with state regulatory authorities to ensure a modification to a reliability standard that it disapproves of in whole or in part, with expla- nation for its concerns and rationale. B. Action by FERC NERC and the ERO In the absence of such reliability legislation, FERC If the proposed U.S. reliability legislation should review its statutory authorities under passes, the North American Electric Reliability existing law, and to the maximum extent permit- Council (NERC) may undertake various organi- ted by those authorities, act to enhance reliability zational changes and seek recognition as the by making compliance with reliability standards electric reliability organization (ERO) called for enforceable in the United States. In doing so, in H.R. 6 and S. 2095. For simplicity of presen- FERC should consult with state regulators, NERC, tation, the many forward-looking references and the regional reliability councils to determine below to “NERC” are intended to apply to the whether certain enforcement practices now in use ERO if the legislation is passed, and to NERC if in some parts of the U.S. and Canada might be the legislation is not passed.

142 G U.S.-Canada Power System Outage Task Force G August 14th Blackout: Causes and Recommendations G that any other appropriate actions to make reli- based on a surcharge in transmission rates. The ability standards enforceable are taken. purpose would be to ensure that NERC and the councils are appropriately funded to meet their Action by FERC under its existing authorities changing responsibilities without dependence on would not lessen the need for enactment of reli- the parties that they oversee. Note: Implementation ability legislation by the Congress. Many U.S. par- of this recommendation should be coordinated ties that should be required by law to comply with with the review called for in Recommendation 3 reliability requirements are not subject to the concerning the future role of the regional councils. Commission’s full authorities under the Federal Power Act. NERC’s current $13 million/year budget is funded C. Action by Appropriate Authorities in Canada as part of the dues that transmission owners, gen- erators, and other market participants pay to the The interconnected nature of the transmission ten regional reliability councils, which then fund grid requires that reliability standards be identical NERC. This arrangement makes NERC subject to or compatible on both sides of the Canadian/U.S. the influence of the reliability councils, which are border. Several provincial governments in Canada in turn subject to the influence of their control have already demonstrated support for mandatory areas and other members. It also compromises the and enforceable reliability standards and have independence of both NERC and the councils in either passed legislation or have taken steps to put relation to the entities whose actions they oversee, in place the necessary framework for implement - and makes it difficult for them to act forcefully ing such standards in Canada. The federal and and objectively to maintain the reliability of the provincial governments should work together and North American bulk power system. Funding with appropriate U.S. authorities to complete a NERC and the councils through a transmission framework to ensure that identical or compatible rate surcharge administered and disbursed under standards apply in both countries, and that means regulatory supervision would enable the organiza- are in place to enforce them in all interconnected tions to be more independent of the industry, with jurisdictions. little impact on electric bills. The dues that com- D. Joint Actions by U.S. and Canadian panies pay to the regional councils are passed Governments through to electricity customers today, so the net International coordination mechanisms should be impacts on customer bills from shifting to a rate developed between the governments in Canada surcharge would be minimal. and the United States to provide for government Implementation of the recommendations pre- oversight of NERC or the ERO, and approval and sented in this report will involve a substantial enforcement of reliability standards. increase in NERC’s functions and responsibilities, E. Memoranda of Understanding between U.S. and require an increase in NERC’s annual budget. or Canadian Government Agencies and The additional costs, however, would be small in NERC comparison to the cost of a single major blackout. Government agencies in both countries should decide (individually) whether to develop a memo- 3. 
Strengthen the institutional framework randum of understanding (MOU) with NERC that for reliability management in North would define the agency’s working relationship America.7 with NERC, government oversight of NERC activi- ties if appropriate, and the reliability responsibili- FERC, DOE and appropriate authorities in Canada ties of the signatories. should work with the states, NERC, and the indus- try, to evaluate and develop appropriate modifica- tions to the existing institutional framework for 2. Develop a regulator-approved mecha- reliability management. In particular, the affected nism for funding NERC and the regional government agencies should: reliability councils, to ensure their inde- A. Commission an independent review by quali- 6 pendence from the parties they oversee. fied experts in organizational design and man- U.S. and Canadian regulatory authorities should agement to address issues concerning how best work with NERC, the regional councils, and the to structure an international reliability organi- industry to develop and implement a new funding zation for the long term. mechanism for NERC and the regional councils

G U.S.-Canada Power System Outage Task Force G August 14th Blackout: Causes and Recommendations G 143 B. Based in part on the results of that review, sizes and resources. Some have grown from a reli- develop metrics for gauging the adequacy of ability council into an ISO or RTO (ERCOT and NERC’s performance, and specify the functions SPP), some span less than a single state (FRCC and of the NERC Board of Trustees and the proce- ERCOT) while others cover many states and prov- dure for selecting the members of the Board. inces and cross national boundaries (NPCC and C. Examine and clarify the future role of the WECC). Several cross reliability coordinator regional reliability councils, with particular boundaries. It is time to evaluate the appropriate attention to their mandate, scope, structure, size and scope of a regional council, the specific responsibilities, and resource requirements. tasks that it should perform, and the appropriate level of resources, expertise, and independence D. Examine NERC’s proposed Functional Model that a regional reliability council needs to perform and set minimum requirements under which those tasks effectively. This evaluation should NERC would certify applicants’ qualifications to perform critical functions. also address whether the councils as currently constituted are appropriate to meet future reliabil- E. Request NERC and the regional councils to sus- ity needs. pend designation of any new control areas (or sub-control areas) until the minimum require- D. NERC’s Functional Model ments in section D (above) have been estab- The transition to competition in wholesale power lished, unless an applicant shows that such markets has been accompanied by increasing designation would significantly enhance reli- diversity in the kinds of entities that need to be in ability. compliance with reliability standards. Rather than F. Determine ways to enhance reliability opera- resist or attempt to influence this evolution, tions in the United States through simplified NERC’s response—through the Functional organizational boundaries and resolution of Model—has been to seek a means of enabling reli- seams issues. ability to be maintained under virtually any insti- tutional framework. The Functional Model A and B. Reshaping NERC identifies sixteen basic functions associated with operating the bulk electric systems and maintain- The far-reaching organizational changes in the ing reliability, and the capabilities that an organi- North American electricity industry over the past zation must have in order to perform a given decade have already induced major changes in the function. (See Functional Model text box below.) nature of NERC as an organization. However, the process of change at NERC is far from complete. NERC acknowledges that maintaining reliability Important additional changes are needed such as in some frameworks may be more difficult or more the shift to enforceable standards, development of expensive than in others, but it stresses that as an effective monitoring capability, and funding long as some responsible party addresses each that is not dependent on the industry. These function and the rules are followed, reliability will changes will strengthen NERC as an organization. be preserved. 
By implication, the pros and cons of In turn, to properly serve overarching public pol- alternative institutional frameworks in a given icy concerns, this strengthening of NERC’s capa- region—which may affect aspects of electric bilities will have to be balanced with increased industry operations other than reliability—are government oversight, more specific metrics for matters for government agencies to address, not gauging NERC’s performance as an organization, NERC. and greater transparency concerning the functions of its senior management team (including its One of the major purposes of the Functional Board of Trustees) and the procedures by which Model is to create a vehicle through which NERC those individuals are selected. The affected gov- will be able to identify an entity responsible for ernment agencies should jointly commission an performing each function in every part of the three independent review of these and related issues to North American interconnections. NERC consid- aid them in making their respective decisions. ers four of the sixteen functions to be especially critical for reliability. For these functions, NERC intends, upon application by an entity, to review C. The Role of the Regional Reliability Councils the entity’s capabilities, and if appropriate, certify North America’s regional reliability councils have that the entity has the qualifications to perform evolved into a disparate group of organizations that function within the specified geographic area. with varying responsibilities, expertise, roles, For the other twelve functions, NERC proposes to

All sixteen functions are presently being performed to varying degrees by one entity or another today in all areas of North America. Frequently an entity performs a combination of functions, but there is great variety from one region to another in how the functions are bundled and carried out. Whether all of the parties who are presently performing the four critical functions would meet NERC's requirements for certification is not known, but the proposed process provides a means of identifying any weaknesses that need to be rectified.

At present, after protracted debate, the Functional Model appears to have gained widespread but cautious support from the diverse factions across the industry, while the regulators have not taken a position. In some parts of North America, such as the Northeast, large regional organizations will probably be certified to perform all four of the critical functions for their respective areas. In other areas, capabilities may remain less aggregated, and the institutional structure may remain more complex.

Sixteen Functions in NERC's Functional Model
• Operating Reliability
• Planning Reliability
• Balancing (generation and demand)
• Interchange
• Transmission service
• Transmission ownership
• Transmission operations
• Transmission planning
• Resource planning
• Distribution
• Generator ownership
• Generator operations
• Load serving
• Purchasing and selling
• Standards development
• Compliance monitoring

NERC regards four of these functions (Operating Reliability, Planning Reliability, Balancing, and Interchange) as especially critical to reliability. Accordingly, it proposes to certify applicants that can demonstrate that they have the capabilities required to perform those functions. The Operating Reliability authority would correspond to today's reliability coordinator, and the Balancing authority to today's control area operator.

Working with NERC and the industry, FERC and authorities in Canada should review the Functional Model to ensure that operating hierarchies and entities will facilitate, rather than hinder, efficient reliability operations. At a minimum, the review should identify ways to eliminate inappropriate commercial incentives to retain control area status that do not support reliability objectives; address operational problems associated with institutional fragmentation; and set minimum requirements with respect to the capabilities requiring NERC certification, concerning subjects such as:

1. Fully operational backup control rooms.
2. System-wide (or wider) electronic map boards or functional equivalents, with data feeds that are independent of the area's main energy management system (EMS).
3. Real-time tools that are to be available to the operator, with backups. (See Recommendation 22 below for more detail concerning minimum requirements and guidelines for real-time operating tools.)
4. SCADA and EMS requirements, including backup capabilities.
5. Training programs for all personnel who have access to a control room or supervisory responsibilities for control room operations. (See Recommendation 19 for more detail on the Task Force's views regarding training and certification requirements.)
6. Certification requirements for control room managers and staff.

E. Designation of New Control Areas

Significant changes in the minimum functional requirements for control areas (or balancing authorities, in the context of the Functional Model) may result from the review called for above. Accordingly, the Task Force recommends that regulatory authorities request NERC and the regional councils not to certify any new control areas (or sub-control areas) until the appropriate regulatory bodies have approved the minimum functional requirements for such bodies, unless an applicant shows that such designation would significantly enhance reliability.

F. Boundary and Seam Issues and Minimum Functional Requirements

Some observers believe that some U.S. regions have too many control areas performing one or more of the four critical reliability functions. In many cases, these entities exist to retain commercial advantages associated with some of these functions. The resulting institutional fragmentation and decentralization of control leads to a higher number of operating contacts and seams, complex coordination requirements, misalignment of control areas with other electrical boundaries and/or operating hierarchies, inconsistent practices and tools, and increased compliance monitoring requirements. These consequences hamper the efficiency and reliability of grid operations.

As shown above (text box on page 14), MISO, as reliability coordinator for its region, is responsible for dealing with 37 control areas, whereas PJM now spans 9 control areas, ISO-New England has 2, and the New York ISO, Ontario's IMO, Texas' ERCOT, and Québec's Trans-Energie are themselves the control area operators for their respective large areas. Moreover, it is not clear that small control areas are financially able to provide the facilities and services needed to perform control area functions at the level needed to maintain reliability. This concern applies also to the four types of entities that NERC proposes to certify under the Functional Model (i.e., Reliability Authority, Planning Authority, Balancing Authority, and Interchange Authority).

For the long term, the regulatory agencies should continue to seek ways to ensure that the regional operational frameworks that emerge through the implementation of the Functional Model promote reliable operations. Any operational framework will represent some combination of tradeoffs, but reliability is a critically important public policy objective and should be a primary design criterion.

4. Clarify that prudent expenditures and investments for bulk system reliability (including investments in new technologies) will be recoverable through transmission rates.8

FERC and appropriate authorities in Canada should clarify that prudent expenditures and investments by regulated companies to maintain or improve bulk system reliability will be recoverable through transmission rates.

In the U.S., FERC and DOE should work with state regulators to identify and resolve issues related to the recovery of reliability costs and investments through retail rates. Appropriate authorities in Canada should determine whether similar efforts are warranted.

Companies will not make the expenditures and investments required to maintain or improve the reliability of the bulk power system without credible assurances that they will be able to recover their costs.

5. Track implementation of recommended actions to improve reliability.9

In the requirements issued on February 10, 2004, NERC announced that it and the regional councils would establish a program for documenting completion of recommendations resulting from the August 14 blackout and other historical outages, as well as NERC and regional reports on violations of reliability standards, results of compliance audits, and lessons learned from system disturbances. The regions are to report on a quarterly basis to NERC. In addition, NERC intends to initiate by January 1, 2005 a reliability performance monitoring function that will evaluate and report on trends in bulk electric system reliability performance.

The Task Force supports these actions strongly. However, many of the Task Force's recommendations pertain to government bodies as well as NERC. Accordingly:

A. Relevant agencies in the U.S. and Canada should cooperate to establish mechanisms for tracking and reporting to the public on implementation actions in their respective areas of responsibility.

B. NERC should draw on the above-mentioned quarterly reports from its regional councils to prepare annual reports to FERC, appropriate authorities in Canada, and the public on the status of the industry's compliance with recommendations and important trends in electric system reliability performance.

The August 14 blackout shared a number of contributing factors with prior large-scale blackouts, confirming that the lessons and recommendations from earlier blackouts had not been adequately implemented, at least in some geographic areas.

Accordingly, parallel and coordinated efforts are needed by the relevant government agencies and NERC to track the implementation of recommendations by governments and the electricity industry. WECC and NPCC have already established programs that could serve as models for tracking implementation of recommendations.

6. FERC should not approve the operation of a new RTO or ISO until the applicant has met the minimum functional requirements for reliability coordinators.

The events of August 14 confirmed that MISO did not yet have all of the functional capabilities required to fulfill its responsibilities as reliability coordinator for the large area within its footprint. FERC should not authorize a new RTO or ISO to become operational until the RTO or ISO has verified that all critical reliability capabilities will be functional upon commencement of RTO or ISO operations.

7. Require any entity operating as part of the bulk power system to be a member of a regional reliability council if it operates within the council's footprint.10

The Task Force recommends that FERC and appropriate authorities in Canada be empowered through legislation, if necessary, to require all entities that operate as part of the bulk electric system to certify that they are members of the regional reliability council for all NERC regions in which they operate.

This requirement is needed to ensure that all relevant parties are subject to NERC standards, policies, etc., in all NERC regions in which they operate. Action by the Congress or legislative bodies in Canada may be necessary to provide appropriate authority.

8. Shield operators who initiate load shedding pursuant to approved guidelines from liability or retaliation.11

Legislative bodies and regulators should: 1) establish that operators (whether organizations or individuals) who initiate load shedding pursuant to operational guidelines are not subject to liability suits; and 2) affirm publicly that actions to shed load pursuant to such guidelines are not indicative of operator failure.

Timely and sufficient action to shed load on August 14 would have prevented the spread of the blackout beyond northern Ohio. NERC has directed all the regional councils in all areas of North America to review the applicability of plans for under-voltage load shedding, and to support the development of such capabilities where they would be beneficial. However, organizations and individual operators may hesitate to initiate such actions in appropriate circumstances without assurances that, provided their action is pursuant to previously approved guidelines, they will not be subject to liability suits or other forms of retaliation.

9. Integrate a "reliability impact" consideration into the regulatory decision-making process.12

The Task Force recommends that FERC, appropriate authorities in Canada, and state regulators integrate a formal reliability impact consideration into their regulatory decision-making to ensure that their actions or initiatives either improve or at minimum do no harm to reliability.

Regulatory actions can have unintended consequences. For example, in reviewing proposed utility company mergers, FERC's primary focus has been on financial and rate issues, as opposed to the reliability implications of such mergers. To minimize unintended harm to reliability, and aid the improvement of reliability where appropriate, the Task Force recommends that regulators incorporate a formal reliability impact consideration into their decision processes. At the same time, regulators should be watchful for use of alleged reliability impacts as a smokescreen for anti-competitive or discriminatory behavior.

10. Establish an independent source of reliability performance information.13

The U.S. Department of Energy's Energy Information Administration (EIA), in coordination with other interested agencies and data sources (FERC, appropriate Canadian government agencies, NERC, RTOs, ISOs, the regional councils, transmission operators, and generators) should establish common definitions and information collection standards. If the necessary resources can be identified, EIA should expand its current activities to include information on reliability performance.

Energy policy makers and a wide range of economic decision makers need objective, factual information about basic trends in reliability performance. EIA and the other organizations cited above should identify information gaps in federal data collections covering reliability performance and physical characteristics. Plans to fill those gaps should be developed, and the associated resource requirements determined. Once those resources have been acquired, EIA should publish information on trends, patterns, costs, etc. related to reliability performance.

11. Establish requirements for collection and reporting of data needed for post-blackout analyses.

FERC and appropriate authorities in Canada should require generators, transmission owners, and other relevant entities to collect and report data that may be needed for analysis of blackouts and other grid-related disturbances.

The investigation team found that some of the data needed to analyze the August 14 blackout fully was not collected at the time of the events, and thus could not be reported. Some of the data that was reported was based on incompatible definitions and formats. As a result, there are aspects of the blackout, particularly concerning the evolution of the cascade, that may never be fully explained. FERC, EIA and appropriate authorities in Canada should consult with NERC, key members of the investigation team, and the industry to identify information gaps, adopt common definitions, and establish filing requirements.

12. Commission an independent study of the relationships among industry restructuring, competition, and reliability.14

DOE and Natural Resources Canada should commission an independent study of the relationships among industry restructuring, competition in power markets, and grid reliability, and how those relationships should be managed to best serve the public interest.

Some participants at the public meetings held in Cleveland, New York and Toronto to review the Task Force's Interim Report expressed the view that the restructuring of electricity markets for competition in many jurisdictions has, itself, increased the likelihood of major supply interruptions. Some of these commenters assert that the transmission system is now being used to transmit power over distances and at volumes that were not envisioned when the system was designed, and that this functional shift has created major risks that have not been adequately addressed. Indeed, some commenters believe that restructuring was a major cause of the August 14 blackout.

The Task Force believes that the Interim Report accurately identified the primary causes of the blackout. It also believes that had existing reliability requirements been followed, either the disturbance in northern Ohio that evolved on August 14 into a blackout would not have occurred, or it would have been contained within the FE control area.

Nevertheless, as discussed at the beginning of this chapter, the relationship between competition in power markets and reliability is both important and complex, and careful management and sound rules are required to achieve the public policy goals of reasonable electricity prices and high reliability. At the present stage in the evolution of these markets, it is worthwhile for DOE and Natural Resources Canada (in consultation with FERC and the Canadian Council of Energy Ministers) to commission an independent expert study to provide advice on how to achieve and sustain an appropriate balance in this important area. Among other things, this study should take into account factors such as:

• Historical and projected load growth
• Location of new generation in relation to old generation and loads
• Zoning and NIMBY15 constraints on siting of generation and transmission
• Lack of new transmission investment and its causes
• Regional comparisons of impact of wholesale electric competition on reliability performance and on investments in reliability and transmission
• The financial community's preferences and their effects on capital investment patterns
• Federal vs. state jurisdictional concerns
• Impacts of state caps on retail electric rates
• Impacts of limited transmission infrastructure on energy costs, transmission congestion, and reliability
• Trends in generator fuel and wholesale electricity prices
• Trends in power flows, line losses, voltage levels, etc.

13. DOE should expand its research programs on reliability-related tools and technologies.16

DOE should expand its research agenda, and consult frequently with Congress, FERC, NERC, state regulators, Canadian authorities, universities, and the industry in planning and executing this agenda.

More investment in research is needed to improve grid reliability, with particular attention to improving the capabilities and tools for system monitoring and management. Research on reliability issues and reliability-related technologies has a large public-interest component, and government support is crucial. DOE already leads many research projects in this area, through partnerships with industry and research under way at the national laboratories and universities. DOE's leadership and frequent consultation with many parties are essential to ensure the allocation of scarce research funds to urgent projects, bring the best talent to bear on such projects, and enhance the dissemination and timely application of research results.

Important areas for reliability research include but are not limited to:

• Development of practical real-time applications for wide-area system monitoring using phasor measurements and other synchronized measuring devices, including post-disturbance applications.
• Development and use of enhanced techniques for modeling and simulation of contingencies, blackouts, and other grid-related disturbances.
• Investigation of protection and control alternatives to slow or stop the spread of a cascading power outage, including demand response initiatives to slow or halt voltage collapse.
• Re-evaluation of generator and customer equipment protection requirements based on voltage and frequency phenomena experienced during the August 14, 2003, cascade.
• Investigation of protection and control of generating units, including the possibility of multiple steps of over-frequency protection and possible effects on system stability during major disturbances.
• Development of practical human factors guidelines for power system control centers.
• Study of obstacles to the economic deployment of demand response capability and distributed generation.
• Investigation of alternative approaches to monitoring right-of-way vegetation management.
• Study of air traffic control, the airline industry, and other relevant industries for practices and ideas that could reduce the vulnerability of the electricity industry and its reliability managers to human error.

Cooperative and complementary research and funding between nations and between government and industry efforts should be encouraged.

14. Establish a standing framework for the conduct of future blackout and disturbance investigations.17

The U.S., Canadian, and Mexican governments, in consultation with NERC, should establish a standing framework for the investigation of future blackouts, disturbances, or other significant grid-related incidents.

Fortunately, major blackouts are not frequent, which makes it all the more important to study such events carefully to learn as much as possible from the experience. In the weeks immediately after August 14, important lessons were learned pertaining not only to preventing and minimizing future blackouts, but also to the efficient and fruitful investigation of future grid-related events.

Appropriate U.S., Canadian, and Mexican government agencies, in consultation with NERC and other organizations, should prepare an agreement that, among other considerations:

• Establishes criteria for determining when an investigation should be initiated.
• Establishes the composition of a task force to provide overall guidance for the inquiry. The task force should be international if the triggering event had international consequences.
• Provides for coordination with state and provincial governments, NERC and other appropriate entities.
• Designates agencies responsible for issuing directives concerning preservation of records, provision of data within specified periods to a data warehouse facility, conduct of onsite interviews with control room personnel, etc.
• Provides guidance on confidentiality of data.
• Identifies types of expertise likely to be needed on the investigation team.

Group II. Support and Strengthen NERC's Actions of February 10, 2004

On February 10, 2004, after taking the findings of the Task Force's investigation into the August 14, 2003, blackout into account, the NERC Board of Trustees approved a series of actions and strategic and technical initiatives intended to protect the reliability of the North American bulk electric system. (See Appendix D for the full text of the Board's statement of February 10.) Overall, the Task Force supports NERC's actions and initiatives strongly. On some subjects, the Task Force advocates additional measures, as shown in the next 17 recommendations.

15. Correct the direct causes of the August 14, 2003 blackout.18

NERC played an important role in the Task Force's blackout investigation, and as a result of the findings of the investigation, NERC issued directives on February 10, 2004 to FirstEnergy, MISO, and PJM to complete a series of remedial actions by June 30, 2004 to correct deficiencies identified as factors contributing to the blackout of August 14, 2003. (For specifics on the actions required by NERC, see Appendix D.)

The Task Force supports and endorses NERC's near-term requirements strongly. It recommends the addition of requirements pertaining to ECAR, and several other additional elements, as described below.

A. Corrective Actions to Be Completed by FirstEnergy by June 30, 2004

The full text of the remedial actions NERC has required that FirstEnergy (FE) complete by June 30 is provided in Appendix D. The Task Force recommends the addition of certain elements to these requirements, as described below.

1. Examination of Other FE Service Areas

The Task Force's investigation found severe reactive power and operations criteria deficiencies in the Cleveland-Akron area.

NERC:
Specified measures required in that area to help ensure the reliability of the FE system and avoid undue risks to neighboring systems. However, the blackout investigation did not examine conditions in FE service areas in other states.

Task Force:
Recommends that NERC require FE to review its entire service territory, in all states, to determine whether similar vulnerabilities exist and require prompt attention. This review should be completed by June 30, 2004, and the results reported to FERC, NERC, and utility regulatory authorities in the affected states.

2. Interim Voltage Criteria

NERC:
Required that FE, consistent with or as part of a study ordered by FERC on December 24, 2003,19 determine the minimum acceptable location-specific voltages at all 345 kV and 138 kV buses and all generating stations within the FE control area (including merchant plants). Further, FE is to determine the minimum dynamic reactive reserves that must be maintained in local areas to ensure that these minimum voltages are met following contingencies studied in accordance with ECAR Document 1.20 Criteria and minimum voltage requirements must comply with NERC planning criteria, including Table 1A, Category C3, and Operating Policy 2.21

Task Force:
Recommends that NERC appoint a team, joined by representatives from FERC and the Ohio Public Utility Commission, to review and approve all such criteria.

3. FE Actions Based on FERC-Ordered Study

NERC:
Required that when the FERC-ordered study is completed, FE is to adopt the planning and operating criteria determined as a result of that study and update the operating criteria and procedures for its system operators. If the study indicates a need for system reinforcement, FE is to develop a plan for developing such resources as soon as practical and develop operational procedures or other mitigating programs to maintain safe operating conditions until such time that the necessary system reinforcements can be made.

Task Force:
Recommends that a team appointed by NERC and joined by representatives from FERC and the Ohio Public Utility Commission should review and approve this plan.

4. Reactive Resources

NERC:
Required that FE inspect all reactive resources, including generators, and ensure that all are fully operational. FE is also required to verify that all installed capacitors have no blown fuses and that at least 98% of installed capacitors (69 kV and higher) are available for service during the summer of 2004.

Task Force:
Recommends that NERC also require FE to confirm that all non-utility generators in its area have entered into contracts for the sale of generation committing them to producing increased or maximum reactive power when called upon by FE or MISO to do so. Such contracts should ensure that the generator would be compensated for revenue losses associated with a reduction in real power sales in order to increase production of reactive power.

5. Operational Preparedness and Action Plan

NERC:
Required that FE prepare and submit to ECAR an Operational Preparedness and Action Plan to ensure system security and full compliance with NERC and ECAR planning and operating criteria, including ECAR Document 1.

Task Force:
Recommends that NERC require copies of this plan to be provided to FERC, DOE, the Ohio Public Utility Commission, and the public utility commissions in other states in which FE operates. The Task Force also recommends that NERC require FE to invite its system operations partners—control areas adjacent to FE, plus MISO, ECAR, and PJM—to participate in the development of the plan and agree to its implementation in all aspects that could affect their respective systems and operations.

6. Emergency Response Resources

NERC:
Required that FE develop a capability to reduce load in the Cleveland-Akron area by 1500 MW within ten minutes of a directive to do so by MISO or the FE system operator. Such a capability may be provided by automatic or manual load shedding, voltage reduction, direct-controlled commercial or residential load management, or any other method or combination of methods capable of achieving the 1500 MW of reduction in ten minutes without adversely affecting other interconnected systems. The amount of required load reduction capability may be modified to an amount shown by the FERC-ordered study to be sufficient for response to severe contingencies and if approved by ECAR and NERC.

Task Force:
Recommends that NERC require MISO's approval of any change in the amount of required load reduction capability. It also recommends that NERC require FE's load reduction plan to be shared with the Ohio Public Utilities Commission and that FE should communicate with all communities in the affected areas about the plan and its potential consequences.

7. Emergency Response Plan

NERC:
Required that FE develop an emergency response plan, including arrangements for deploying the load reduction capabilities noted above. The plan is to include criteria for determining the existence of an emergency and identify various possible states of emergency. The plan is to include detailed operating procedures and communication protocols with all the relevant entities including MISO, FE operators, and market participants within the FE area that have an ability to vary generation output or shed load upon orders from FE operators. The plan should include procedures for load restoration after the declaration that the FE system is no longer in an emergency operating state.

Task Force:
Recommends that NERC require FE to offer its system operations partners—i.e., control areas adjacent to FE, plus MISO, ECAR, and PJM—an opportunity to contribute to the development of the plan and agree to its key provisions.

8. Operator Communications

NERC:
Required that FE develop communications procedures for FE operating personnel to use within FE, with MISO and neighboring systems, and others. The procedure and the operating environment within the FE system control center should allow control room staff to focus on reliable system operations and avoid distractions such as calls from customers and others who are not responsible for operation of a portion of the transmission system.

Task Force:
Recommends that NERC require these procedures to be shared with and agreed to by control areas adjacent to FE, plus MISO, ECAR, and PJM, and any other affected system operations partners, and that these procedures be tested in a joint drill.

9. Reliability Monitoring and System Management Tools

NERC:
Required that FE ensure that its state estimator and real-time contingency analysis functions are used to execute reliably full contingency analyses automatically every ten minutes or on demand, and used to notify operators of potential first contingency violations.

Task Force:
Recommends that NERC also require FE to ensure that its information technology support function does not change the effectiveness of reliability monitoring or management tools in any way without the awareness and consent of its system operations staff.
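The ten-minute state-estimation and contingency-analysis cycle required above is, at bottom, a periodic screening loop: solve a base case from telemetry, remove each credible element one at a time, re-solve, and alert the operator to any post-contingency limit violation. The Python sketch below illustrates only that control flow, under stated assumptions: the solver functions are stubs, and the function names, element names, and loadings are invented for illustration rather than taken from any vendor EMS.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for EMS functions.  A real state estimator solves the
# network model from SCADA telemetry; here it just returns monitored-element
# loadings in percent of their normal rating.
def run_state_estimator() -> Dict[str, float]:
    return {"Line A-B 345 kV": 82.0, "Line B-C 138 kV": 61.0, "Line A-C 138 kV": 74.0}

def solve_contingency(base_case: Dict[str, float], outaged: str) -> Dict[str, float]:
    # Placeholder: a real tool re-solves the power flow with `outaged` removed.
    # Here the flow on the remaining elements is simply scaled up.
    return {name: loading * 1.25 for name, loading in base_case.items() if name != outaged}

def screen_first_contingencies(base_case: Dict[str, float],
                               limit_pct: float = 100.0) -> List[Tuple[str, str, float]]:
    """Simulate the loss of each monitored element in turn (N-1) and collect violations."""
    violations = []
    for outaged in base_case:
        for element, loading in solve_contingency(base_case, outaged).items():
            if loading > limit_pct:
                violations.append((outaged, element, loading))
    return violations

def contingency_cycle() -> None:
    """One pass of the cycle; an EMS scheduler would invoke this every ten minutes or on demand."""
    base_case = run_state_estimator()
    for outaged, element, loading in screen_first_contingencies(base_case):
        # Notify the operator of a potential first-contingency violation.
        print(f"ALERT: loss of {outaged} would load {element} to {loading:.0f}% of its rating")

contingency_cycle()
```

In a production system the cycle would be driven by the EMS scheduler, would check voltage as well as thermal limits, and would also run on operator demand.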
10. GE XA21 System Updates and Transition to New Energy Management System

NERC:
Required that until FE replaces its GE XA21 Energy Management System, FE should implement all current known fixes for the GE XA21 system necessary to ensure reliable and stable operation of critical reliability functions, and particularly to correct the alarm processor failure that occurred on August 14, 2003.

Task Force:
Recommends that NERC require FE to design and test the transition to its planned new energy management system to ensure that the system functions effectively, that the transition is made smoothly, that the system's operators are adequately trained, and that all operating partners are aware of the transition.

11. Emergency Preparedness Training for Operators

NERC:
Required that all reliability coordinators, control areas, and transmission operators provide at least five days of training and drills using realistic simulation of system emergencies for each staff person with responsibility for the real-time operation or reliability monitoring of the bulk electric system. This system emergency training is in addition to other training requirements. The term "realistic simulation" includes a variety of tools and methods that present operating personnel with situations to improve and test diagnostic and decision-making skills in an environment that resembles expected conditions during a particular type of system emergency.

Task Force:
Recommends that to provide effective training before June 30, 2004, NERC should require FE to consider seeking the assistance of another control area or reliability coordinator known to have a quality training program (such as IMO or ISO-New England) to provide the needed training with appropriate FE-specific modifications.

B. Corrective Actions to be Completed by MISO by June 30, 2004

1. Reliability Tools

NERC:
Required that MISO fully implement and test its topology processor to provide its operating personnel a real-time view of the system status for all transmission lines operating and all generating units within its system, and all critical transmission lines and generating units in neighboring systems. Alarms should be provided for operators for all critical transmission line outages and voltage violations. MISO is to establish a means of exchanging outage information with its members and adjacent systems such that the MISO state estimator has accurate and timely information to perform as designed. MISO is to fully implement and test its state estimation and real-time contingency analysis tools to ensure they can operate reliably no less than every ten minutes. MISO is to provide backup capability for all functions critical to reliability.

Task Force:
Recommends that NERC require MISO to ensure that its information technology support staff does not change the effectiveness of reliability monitoring or management tools in any way without the awareness and consent of its system operations staff.

2. Operating Agreements

NERC:
Required that MISO reevaluate its operating agreements with member entities to verify its authority to address operating issues, including voltage and reactive management, voltage scheduling, the deployment and redispatch of real and reactive reserves for emergency response, and the authority to direct actions during system emergencies, including shedding load.

Task Force:
Recommends that NERC require that any problems or concerns related to these operating issues be raised promptly with FERC and MISO's members for resolution.

C. Corrective Actions to be Completed by PJM by June 30, 2004

NERC:
Required that PJM reevaluate and improve its communications protocols and procedures between PJM and its neighboring control areas and reliability coordinators.

Task Force:
Recommends that NERC require definitions and usages of key terms be standardized, and non-essential communications be minimized during disturbances, alerts, or emergencies. NERC should also require PJM, MISO, and their member companies to conduct one or more joint drills using the new communications procedures.

D. Task Force Recommendations for Corrective Actions to be Completed by ECAR by August 14, 2004

1. Modeling and Assessments

Task Force:
Recommends that NERC require ECAR to reevaluate its modeling procedures, assumptions, scenarios and data for seasonal assessments and extreme conditions evaluations. ECAR should consult with an expert team appointed by NERC—joined by representatives from FERC, DOE, interested state commissions, and MISO—to develop better modeling procedures and scenarios, and obtain review of future assessments by the expert team.

2. Verification of Data and Assumptions

Task Force:
Recommends that NERC require ECAR to re-examine and validate all data and model assumptions against current physical asset capabilities and match modeled assets (such as line characteristics and ratings, and generator reactive power output capabilities) to current operating study assessments.

3. Ensure Consistency of Members' Data

Task Force:
Recommends that NERC require ECAR to conduct a data validation and exchange exercise to be sure that its members are using accurate, consistent, and current physical asset characteristics and capabilities for both long-term and seasonal assessments and operating studies.
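A data validation exercise of the kind recommended here amounts to comparing, asset by asset, the values carried in planning and operating models against the owners' current equipment records and flagging mismatches for correction. The sketch below shows one way such a cross-check could be organized; the record fields, tolerance, and sample values are assumptions for illustration, not an ECAR or NERC data format.

```python
from dataclasses import dataclass
from typing import Dict, Iterator

@dataclass
class LineRecord:
    summer_rating_mva: float   # normal summer thermal rating
    reactance_pct: float       # positive-sequence reactance on a common MVA base

def compare_records(modeled: Dict[str, LineRecord],
                    registry: Dict[str, LineRecord],
                    tolerance: float = 0.02) -> Iterator[str]:
    """Yield discrepancies between model data and asset records.

    A relative difference greater than `tolerance` (2% by default) on any field is
    flagged, as are assets present in one data set but not the other.
    """
    for name, model in modeled.items():
        actual = registry.get(name)
        if actual is None:
            yield f"{name}: in the model but missing from the asset registry"
            continue
        for field in ("summer_rating_mva", "reactance_pct"):
            m, a = getattr(model, field), getattr(actual, field)
            if abs(m - a) > tolerance * abs(a):
                yield f"{name}: {field} is {m} in the model but {a} in the asset registry"
    for name in registry.keys() - modeled.keys():
        yield f"{name}: in the asset registry but not represented in the model"

# Illustrative use with invented data for a single line.
modeled = {"Line X-Y 345 kV": LineRecord(summer_rating_mva=1195.0, reactance_pct=1.8)}
registry = {"Line X-Y 345 kV": LineRecord(summer_rating_mva=1310.0, reactance_pct=1.8)}
for issue in compare_records(modeled, registry):
    print(issue)
```

The same pattern extends naturally to generator reactive capability data and other modeled quantities named in the recommendation.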
E. Task Force Recommendation for Corrective Actions to be Completed by Other Parties by June 30, 2004

Task Force:
Recommends that NERC require each North American reliability coordinator, reliability council, control area, and transmission company not directly addressed above to review the actions required above and determine whether it has adequate system facilities, operational procedures, tools, and training to ensure reliable operations for the summer of 2004. If any entity finds that improvements are needed, it should immediately undertake the needed improvements, and coordinate them with its neighbors and partners as necessary.

The Task Force also recommends that FERC and government agencies in Canada require all entities under their jurisdiction who are users of GE/Harris XA21 Energy Management Systems to consult the vendor and ensure that appropriate actions have been taken to avert any recurrence of the malfunction that occurred on FE's system on August 14.

16. Establish enforceable standards for maintenance of electrical clearances in right-of-way areas.22

On February 10, the NERC Board directed the NERC Compliance Program and the regional councils to initiate a joint program for reporting all bulk electric system transmission line trips resulting from vegetation contact. Based on the results of these filings, NERC is to consider the development of minimum line clearance standards to ensure reliability.

The Task Force believes that more aggressive action is warranted. NERC should work with FERC, appropriate authorities in Canada, state regulatory agencies, the Institute of Electrical and Electronics Engineers (IEEE), utility arborists, and other experts from the U.S. and Canada to develop clear, unambiguous standards pertaining to maintenance of safe clearances of transmission lines from obstructions in the lines' right-of-way areas, and to develop a mechanism to verify compliance with the standards and impose penalties for noncompliance.

Ineffective vegetation management was a major cause of the August 14, 2003, blackout and it was also a causal factor in other large-scale North American outages such as those that occurred in the summer of 1996 in the western United States. Maintaining transmission line rights-of-way, including maintaining safe clearances of energized lines from vegetation, man-made structures, bird nests, etc., requires substantial expenditures in many areas of North America. However, such maintenance is a critical investment for ensuring a reliable electric system. For a review of current issues pertaining to utility vegetation management programs, see Utility Vegetation Management Final Report, March 2004.23

NERC does not presently have standards for right-of-way maintenance. However, it has standards requiring that line ratings be set to maintain safe clearances from all obstructions. Line rating standards should be reviewed to ensure that they are sufficiently clear and explicit. In the United States, National Electrical Safety Code (NESC) rules specify safety clearances required for overhead conductors from grounded objects and other types of obstructions, but those rules are subject to broad interpretation. Several states have adopted their own electrical safety codes and similar codes apply in Canada and its provinces. A mechanism is needed to verify compliance with these requirements and to penalize noncompliance.

A. Enforceable Standards

NERC should work with FERC, government agencies in Canada, state regulatory agencies, the Institute of Electrical and Electronics Engineers (IEEE), utility arborists, and other experts from the U.S. and Canada to develop clear, unambiguous standards pertaining to maintenance of safe clearances of transmission lines from obstructions in the lines' right-of-way areas, and procedures to verify compliance with the standards. States, provinces, and local governments should remain free to set more specific or higher standards as they deem necessary for their respective areas.

B. Right-of-Way Management Plan

NERC should require each bulk electric transmission operator to publish annually a proposed right-of-way management plan on its public website, and a report on its right-of-way management activities for the previous year. The management plan should include the planned frequency of actions such as right-of-way trimming, herbicide treatment, and inspections, and the report should give the dates when the rights-of-way in a given district were last inspected and corrective actions taken.

C. Requirement to Report Outages Due to Ground Faults in Right-of-Way Areas

Beginning with an effective date of March 31, 2004, NERC should require each transmission owner/operator to submit quarterly reports of all ground-fault line trips, including their causes, on lines of 115 kV and higher in its footprint to the regional councils. Failure to report such trips should lead to an appropriate penalty. Each regional council should assemble a detailed annual report on ground fault line trips and their causes in its area to FERC, NERC, DOE, appropriate authorities in Canada, and state regulators no later than March 31 for the preceding year, with the first annual report to be filed in March 2005 for calendar year 2004.

D. Transmission-Related Vegetation Management Expenses, if Prudently Incurred, Should be Recoverable through Electric Rates

The level of activity in vegetation management programs in many utilities and states has fluctuated widely from year to year, due in part to inconsistent funding and varying management support. Utility managers and regulators should recognize the importance of effective vegetation management to transmission system reliability, and that changes in vegetation management may be needed in response to weather, insect infestations, and other factors. Transmission vegetation management programs should be consistently funded and proactively managed to maintain and improve system reliability.

17. Strengthen the NERC Compliance Enforcement Program.

On February 10, 2004, the NERC Board of Trustees approved directives to the regional reliability councils that will significantly strengthen NERC's existing Compliance Enforcement Program. The Task Force supports these directives strongly, and recommends certain additional actions, as described below.24

A. Reporting of Violations

NERC:
Requires each regional council to report to the NERC Compliance Enforcement Program within one month of occurrence all "significant violations" of NERC operating policies and planning standards and regional standards, whether verified or still under investigation by the regional council. (A "significant violation" is one that could directly reduce the integrity of the interconnected power systems or otherwise cause unfavorable risk to the interconnected power systems.) In addition, each regional council is to report quarterly to NERC, in a format prescribed by NERC, all violations of NERC and regional reliability standards.

Task Force:
Recommends that NERC require the regional councils' quarterly reports and reports on significant violations be filed as public documents with FERC and appropriate authorities in Canada, at the same time that they are sent to NERC.

B. Enforcement Action by NERC Board

NERC:
After being presented with the results of the investigation of a significant violation, the Board is to require an offending organization to correct the violation within a specified time. If the Board determines that the organization is non-responsive and continues to cause a risk to the reliability of the interconnected power systems, the Board will seek to remedy the violation by requesting assistance from appropriate regulatory authorities in the United States and Canada.

Task Force:
Recommends that NERC inform the federal and state or provincial authorities of both countries of the final results of all enforcement proceedings, and make the results of such proceedings public.

C. Violations in August 14, 2003 Blackout

NERC:
The Compliance and Standards investigation team will issue a final report in March or April of 2004 of violations of NERC and regional standards that occurred on August 14. (Seven violations are noted in this report (pages 19-20), but additional violations may be identified by NERC.) Within three months of the issuance of the report, NERC is to develop recommendations to improve the compliance process.

Task Force:
Recommends that NERC make its recommendations available to appropriate U.S. federal and state authorities, to appropriate authorities in Canada, and to the public.

D. Compliance Audits

NERC:
Established plans for two types of audits, compliance audits and readiness audits. Compliance audits would determine whether the subject entity is in documented compliance with NERC standards, policies, etc. Readiness audits focus on whether the entity is functionally capable of meeting the terms of its reliability responsibilities. Under the terms approved by NERC's Board, the readiness audits to be completed by June 30, 2004, will be conducted using existing NERC rules, policies, standards, and NERC compliance templates. Requirements for control areas will be based on the existing NERC Control Area Certification Procedure, and updated as new criteria are approved.

Task Force:
Supports the NERC effort to verify that all entities are compliant with reliability standards. Effective compliance and auditing will require that the NERC standards be improved rapidly to make them clear, unambiguous, measurable, and consistent with the Functional Model.

E. Audit Standards and Composition of Audit Teams

NERC:
Under the terms approved by the Board, the regional councils are to have primary responsibility for conducting the compliance audits, under the oversight and direct participation of staff from the NERC Compliance Enforcement Program. FERC and other relevant regulatory agencies will be invited to participate in the audits, subject to the same confidentiality conditions as the other team members.

Task Force:
Recommends that each team should have some members who are electric reliability experts from outside the region in which the audit is occurring. Also, some team members should be from outside the electricity industry, i.e., individuals with experience in systems engineering and management, such as persons from the nuclear power industry, the U.S. Navy, the aerospace industry, air traffic control, or other relevant industries or government agencies. To improve the objectivity and consistency of investigation and performance, NERC-organized teams should conduct these compliance audits, using NERC criteria (with regional variations if more stringent), as opposed to the regional councils using regionally developed criteria.

F. Public Release of Compliance Audit Reports

Task Force:
Recommends that NERC require all compliance audit reports to be publicly posted, excluding portions pertaining to physical and cyber security according to predetermined criteria. Such reports should draw clear distinctions between serious and minor violations of reliability standards or related requirements.

18. Support and strengthen NERC's Reliability Readiness Audit Program.25

On February 10, 2004, the NERC Board of Trustees approved the establishment of a NERC program for periodic reviews of the reliability readiness of all reliability coordinators and control areas. The Task Force strongly supports this action, and recommends certain additional measures, as described below.

A. Readiness Audits

NERC:
In its directives of February 10, 2004, NERC indicated that it and the regional councils would jointly establish a program to audit the reliability readiness of all reliability coordinators and control areas within three years and continuing thereafter on a three-year cycle. Twenty audits of high-priority areas will be completed by June 30, 2004, with particular attention to deficiencies identified in the investigation of the August 14 blackout.

Task Force:
Recommends that the remainder of the first round of audits be completed within two years, as compared to NERC's plan for three years.

B. Public Release of Readiness Audit Reports

Task Force:
Recommends that NERC require all readiness audit reports to be publicly posted, excluding portions pertaining to physical and cyber security. Reports should also be sent directly to DOE, FERC, and relevant authorities in Canada and state commissions. Such reports should draw clear distinctions between serious and minor violations of reliability standards or related requirements.

19. Improve near-term and long-term training and certification requirements for operators, reliability coordinators, and operator support staff.26

In its requirements of February 10, 2004, NERC directed that all reliability coordinators, control areas, and transmission operators are to provide at least five days per year of training and drills in system emergencies, using realistic simulations, for each staff person with responsibility for the real-time operation or reliability monitoring of the bulk electric system. This system emergency training is in addition to other training requirements. Five days of system emergency training and drills are to be completed by June 30, 2004.

The Task Force supports these near-term requirements strongly. For the long term, the Task Force recommends that:

A. NERC should require training for the planning staff at control areas and reliability coordinators concerning power system characteristics and load, VAr, and voltage limits, to enable them to develop rules for operating staff to follow.

B. NERC should require control areas and reliability coordinators to train grid operators, IT support personnel, and their supervisors to recognize and respond to abnormal automation system activity.

C. NERC should commission an advisory report by an independent panel to address a wide range of issues concerning reliability training programs and certification requirements.

The Task Force investigation team found that some reliability coordinators and control area operators had not received adequate training in recognizing and responding to system emergencies. Most notable was the lack of realistic simulations and drills to train and verify the capabilities of operating personnel. Such simulations are essential if operators and other staff are to be able to respond adequately to emergencies. This training deficiency contributed to the lack of situational awareness and failure to declare an emergency on August 14 while operator intervention was still possible (before events began to occur at a speed beyond human control).

Control rooms must remain functional under a wide range of possible conditions. Any person with access to a control room should be trained so that he or she understands the basic functions of the control room, and his or her role in relation to those of others in the room under any conditions. Information technology (IT) staff, in particular, should have a detailed understanding of the information needs of the system operators under alternative conditions.

The Task Force's cyber investigation team noted in its site visits an increasing reliance by control areas and utilities on automated systems to measure, report on, and change a wide variety of physical processes associated with utility operations.27 If anything, this trend is likely to intensify in the future. These systems enable the achievement of major operational efficiencies, but their failure could cause or contribute to blackouts, as evidenced by the alarm failures at FirstEnergy and the state estimator deactivation at MISO.

Grid operators should be trained to recognize and respond more efficiently to security and automation problems, reinforced through the use of periodic exercises. Likewise, IT support personnel should be better trained to understand and respond to the requirements of grid operators during security and IT incidents.

NERC's near-term requirements for emergency preparedness training are described above. For the long term, training for system emergencies should be fully integrated into the broader training programs required for all system planners, system operators, their supervisors, and other control room support staff.

Advisory Report by Independent Panel on Industry Training Programs and Certification Requirements

Under the oversight of FERC and appropriate Canadian authorities, the Task Force recommends that NERC commission an independent advisory panel of experts to design and propose minimum training programs and certification procedures for the industry's control room managers and staff. This panel should be comprised of experts from electric industry organizations with outstanding training programs, universities, and other industries that operate large safety or reliability-oriented systems and training programs. (The Institute of Nuclear Power Operations (INPO), for example, provides training and other safety-related services to operators of U.S. nuclear power plants and plants in other countries.) The panel's report should provide guidance on issues such as:

1. Content of programs for new trainees
2. Content of programs for existing operators and other categories of employees
3. Content of continuing education programs and fraction of employee time to be committed to ongoing training
4. Going beyond paper-based, fact-oriented "knowledge" requirements for operators—i.e., confirming that an individual has the ability to cope with unforeseen situations and emergencies
5. In-house training vs. training by independent parties
6. Periodic accreditation of training programs
7. Who should certify trained staff?
8. Criteria to establish grades or levels of operator qualifications from entry level to supervisor or manager, based on education, training, and experience.

The panel's report should be delivered by March 31, 2005. FERC and Canadian authorities, in consultation with NERC and others, should evaluate the report and consider its findings in setting minimum training and certification requirements for control areas and reliability coordinators.

20. Establish clear definitions for normal, alert and emergency operational system conditions. Clarify roles, responsibilities, and authorities of reliability coordinators and control areas under each condition.28

NERC should develop by June 30, 2004 definitions for normal, alert, and emergency system conditions, and clarify reliability coordinator and control area functions, responsibilities, required capabilities, and required authorities under each operational system condition.

System operators need common definitions for normal, alert, and emergency conditions to enable them to act appropriately and predictably as system conditions change. On August 14, the principal entities involved in the blackout did not have a shared understanding of whether the grid was in an emergency condition, nor did they have a common understanding of the functions, responsibilities, capabilities, and authorities of reliability coordinators and control areas under emergency or near-emergency conditions.

NERC:
On February 10, 2004, NERC's Board of Trustees directed NERC's Operating Committee to "clarify reliability coordinator and control area functions, responsibilities, capabilities, and authorities" by June 30, 2004.

Task Force:
Recommends that NERC go further and develop clear definitions of three operating system conditions, along with clear statements of the roles and responsibilities of all participants, to ensure effective and timely actions in critical situations.

Designating three alternative system conditions (normal, alert, and emergency) would help grid managers to avert and deal with emergencies through preventive action. Many difficult situations are avoidable through strict adherence to sound procedures during normal operations. However, unanticipated difficulties short of an emergency still arise, and they must be addressed swiftly and skillfully to prevent them from becoming emergencies. Doing so requires a high level of situational awareness that is difficult to sustain indefinitely, so an intermediate "alert" state is needed, between "normal" and "emergency." In some areas (e.g., NPCC) an "alert" state has already been established.

21. Make more effective and wider use of system protection measures.29

In its requirements of February 10, 2004, NERC:

A. Directed all transmission owners to evaluate the settings of zone 3 relays on all transmission lines of 230 kV and higher.
B. Directed all regional councils to evaluate the feasibility and benefits of installing under-voltage load shedding capability in load centers.
C. Called for an evaluation within one year of its planning standard on system protection and control to take into account the lessons from the August 14 blackout.

The Task Force supports these actions strongly, and recommends certain additional measures, as described below.

A. Evaluation of Zone 3 Relays

NERC:
Industry is to review zone 3 relays on lines of 230 kV and higher.

Task Force:
Recommends that NERC broaden the review to include operationally significant 115 kV and 138 kV lines, e.g., lines that are part of monitored flowgates or interfaces. Transmission owners should also look for zone 2 relays set to operate like zone 3s.
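Part of what a zone 3 review looks for is load encroachment: an impedance relay cannot tell a remote fault from heavy load at depressed voltage, because the apparent impedance it measures is roughly V²/S and therefore shrinks as loading rises and voltage sags. The sketch below shows that screening arithmetic with assumed numbers; the reach, loading, and safety factor are illustrative values only, not settings or criteria from the actual review.

```python
import math

def apparent_impedance_ohms(v_kv_ll: float, mva: float, power_factor: float) -> complex:
    """Apparent impedance seen by a distance relay for balanced three-phase load.

    v_kv_ll:      line-to-line voltage at the relay location, in kV
    mva:          apparent power flowing through the terminal, in MVA
    power_factor: lagging load power factor
    """
    magnitude = v_kv_ll ** 2 / mva            # |Z| = V^2 / S, in primary ohms
    angle = math.acos(power_factor)           # load angle with respect to the voltage
    return complex(magnitude * math.cos(angle), magnitude * math.sin(angle))

def needs_review(z_load: complex, zone3_reach_ohms: float, safety_factor: float = 1.5) -> bool:
    """Crude screen: flag the setting if the load-point impedance is not at least
    `safety_factor` times the zone 3 reach (magnitudes only, ignoring the shape of
    the relay characteristic)."""
    return abs(z_load) < safety_factor * zone3_reach_ohms

# Assumed example: a 345 kV line carrying 1,400 MVA at 0.95 power factor with
# voltage depressed to 0.93 per unit, checked against an 80-ohm zone 3 reach.
z_load = apparent_impedance_ohms(345 * 0.93, 1400, 0.95)
print(f"apparent load impedance ~{abs(z_load):.0f} ohms")
print("flag for review" if needs_review(z_load, zone3_reach_ohms=80.0) else "adequate margin")
```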
The sound procedures during normal operations. regions should promote the installation of un- However, unanticipated difficulties short of an der-voltage load shedding capabilities within emergency still arise, and they must be addressed critical areas where beneficial, as determined swiftly and skillfully to prevent them from becom- by the studies to be effective in preventing or ing emergencies. Doing so requires a high level of containing an uncontrolled cascade of the situational awareness that is difficult to sustain power system. indefinitely, so an intermediate “alert” state is

158 G U.S.-Canada Power System Outage Task Force G August 14th Blackout: Causes and Recommendations G Task Force: year the real-time operating tools necessary for Recommends that NERC require the results of reliability operation and reliability coordination, the regional studies to be provided to federal including backup capabilities. The committee’s and state or provincial regulators at the same report is to address both minimum acceptable time that they are reported to NERC. In addi- capabilities for critical reliability functions and a guide to best practices. tion, NERC should require every entity with a new or existing UVLS program to have a The Task Force supports these requirements well-documented set of guidelines for opera- strongly. It recommends that NERC require the tors that specify the conditions and triggers for committee to: UVLS use. A. Give particular attention in its report to the development of guidance to control areas and C. Evaluation of NERC’s Planning Standard III reliability coordinators on the use of automated NERC: wide-area situation visualization display sys- Plans to evaluate Planning Standard III, System tems and the integrity of data used in those sys- tems. Protection and Control, and propose, by March 1, 2005, specific revisions to the criteria to ad- B. Prepare its report in consultation with FERC, dress adequately the issue of slowing or limit- appropriate authorities in Canada, DOE, and ing the propagation of a cascading failure, in the regional councils. The report should also light of the experience gained on August 14. inform actions by FERC and Canadian government agencies to establish minimum Task Force: functional requirements for control area opera- Recommends that NERC, as part of the review tors and reliability coordinators. of Planning Standard III, determine the goals and principles needed to establish an inte- The Task Force also recommends that FERC, DHS, and appropriate authorities in Canada should grated approach to relay protection for gener- require annual independent testing and certifica- ators and transmission lines and the use of un- tion of industry EMS and SCADA systems to ensure der-frequency and under-voltage load shed- that they meet the minimum requirements envi- ding (UFLS and UVLS) programs. An inte- sioned in Recommendation 3. grated approach is needed to ensure that at the local and regional level these interactive com- A principal cause of the August 14 blackout was a ponents provide an appropriate balance of lack of situational awareness, which was in turn risks and benefits in terms of protecting spe- the result of inadequate reliability tools and cific assets and facilitating overall grid sur- backup capabilities. In addition, the failure of FE’s vival. This review should take into account control computers and alarm system contributed the evidence from August 14 of some unin- directly to the lack of situational awareness. Like- tended consequences of installing Zone 3 re- wise, MISO’s incomplete tool set and the failure to lays and using manufacturer-recommended supply its state estimator with correct system data settings for relays protecting generators. It on August 14 contributed to the lack of situational should also include an assessment of the ap- awareness. The need for improved visualization propriate role and scope of UFLS and UVLS, capabilities over a wide geographic area has been a and the appropriate use of time delays in re- recurrent theme in blackout investigations. Some lays. 
wide-area tools to aid situational awareness (e.g., Recommends that in this effort NERC should real-time phasor measurement systems) have been work with industry and government research tested in some regions but are not yet in general organizations to assess the applicability of ex- use. Improvements in this area will require signifi- isting and new technology to make the inter- cant new investments involving existing or emerg- connections less susceptible to cascading out- ing technologies. ages. The investigation of the August 14 blackout revealed that there has been no consistent means 22. Evaluate and adopt better real-time across the Eastern Interconnection to provide an tools for operators and reliability coor- understanding of the status of the power grid out- dinators.30 side of a control area. Improved visibility of the status of the grid beyond an operator’s own area of NERC’s requirements of February 10, 2004, direct control would aid the operator in making adjust- its Operating Committee to evaluate within one ments in its operations to mitigate potential
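One element of the guidance called for in item A above is the integrity of the data feeding wide-area displays. The sketch below is illustrative only and is not drawn from any NERC requirement: it shows one simple freshness check a display system might apply before trusting a telemetry point. The point names and the 30-second threshold are hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Telemetry:
    point_id: str           # hypothetical identifier for a line flow or bus voltage
    value: float
    observed_at: datetime   # time of measurement, ideally GPS/UTC-synchronized

MAX_AGE = timedelta(seconds=30)   # assumed staleness threshold, for illustration only

def stale_points(points, now=None):
    """Return identifiers whose data is too old to be trusted on a wide-area display."""
    now = now or datetime.now(timezone.utc)
    return [p.point_id for p in points if now - p.observed_at > MAX_AGE]
```

Flagging stale or implausible inputs in this way is one reason the report stresses data integrity: a display silently fed old data can be worse than no display at all.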

The investigation of the August 14 blackout revealed that there has been no consistent means across the Eastern Interconnection to provide an understanding of the status of the power grid outside of a control area. Improved visibility of the status of the grid beyond an operator's own area of control would aid the operator in making adjustments in its operations to mitigate potential problems. The expanded view advocated above would also enable facilities to be more proactive in operations and contingency planning.

Annual testing and certification by independent, qualified parties is needed because EMS and SCADA systems are the nerve centers of bulk electric networks. Ensuring that these systems are functioning properly is critical to sound and reliable operation of the networks.

23. Strengthen reactive power and voltage control practices in all NERC regions.31

NERC's requirements of February 10, 2004 call for a reevaluation within one year of existing reactive power and voltage control standards and how they are being implemented in the ten NERC regions. However, by June 30, 2004, ECAR is required to review its reactive power and voltage criteria and procedures, verify that its criteria and procedures are being fully implemented in regional and member studies and operations, and report the results to the NERC Board.

The Task Force supports these requirements strongly. It recommends that NERC require the regional analyses to include recommendations for appropriate improvements in operations or facilities, and to be subject to rigorous peer review by experts from within and outside the affected areas.

The Task Force also recommends that FERC and appropriate authorities in Canada require all tariffs or contracts for the sale of generation to include provisions specifying that the generators can be called upon to provide or increase reactive power output if needed for reliability purposes, and that the generators will be paid for any lost revenues associated with a reduction of real power sales attributable to a required increase in the production of reactive power.

Reactive power problems were a significant factor in the August 14 outage, and they were also important elements in several of the earlier outages detailed in Chapter 7.32 Accordingly, the Task Force agrees that a comprehensive review is needed of North American practices with respect to managing reactive power requirements and maintaining an appropriate balance among alternative types of reactive resources.

Regional Analyses, Peer Reviews, and Follow-Up Actions

The Task Force recommends that each regional reliability council, working with reliability coordinators and the control areas serving major load centers, should conduct a rigorous reliability and adequacy analysis comparable to that outlined in FERC's December 24, 2003, Order33 to FirstEnergy concerning the Cleveland-Akron area. The Task Force recommends that NERC develop a prioritized list for which areas and loads need this type of analysis and a schedule that ensures that the analysis will be completed for all such load centers by December 31, 2005.

24. Improve quality of system modeling data and data exchange practices.34

NERC's requirements of February 10, 2004 direct that within one year the regional councils are to establish and begin implementing criteria and procedures for validating data used in power flow models and dynamic simulations by benchmarking model data with actual system performance. Validated modeling data shall be exchanged on an inter-regional basis as needed for reliable system planning and operation.

The Task Force supports these requirements strongly. The Task Force also recommends that FERC and appropriate authorities in Canada require all generators, regardless of ownership, to collect and submit generator data to NERC, using a regulator-approved template.

The after-the-fact models developed to simulate August 14 conditions and events found that the dynamic modeling assumptions for generator and load power factors in regional planning and operating models were frequently inaccurate. In particular, the assumptions of load power factor were overly optimistic—loads were absorbing much more reactive power than the pre-August 14 models indicated. Another suspected problem concerns modeling of shunt capacitors under depressed voltage conditions.

NERC should work with the regional reliability councils to establish regional power system models that enable the sharing of consistent and validated data among entities in the region. Power flow and transient stability simulations should be periodically benchmarked with actual system events to validate model data. Viable load (including load power factor) and generator testing programs are necessary to improve agreement between power flows and dynamic simulations and the actual system performance.
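The load power factor point above can be made concrete with a small calculation. The figures below are hypothetical and are not investigation data: for the same real power, a modestly lower power factor implies a much larger reactive power draw, which is why optimistic power factor assumptions understate reactive demand in planning and operating models.

```python
import math

def reactive_power_mvar(p_mw, power_factor):
    """Q = P * tan(arccos(pf)): reactive power drawn by a load of P MW at the given power factor."""
    return p_mw * math.tan(math.acos(power_factor))

# Hypothetical 1,000 MW load center:
q_modeled = reactive_power_mvar(1000.0, 0.95)   # about 329 MVAr if the model assumes pf = 0.95
q_actual  = reactive_power_mvar(1000.0, 0.85)   # about 620 MVAr if the real pf is 0.85
shortfall = q_actual - q_modeled                # roughly 290 MVAr of unmodeled reactive demand
```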

During the data collection phase of the blackout investigation, when control areas were asked for information pertaining to merchant generation within their area, the requested data was frequently not available because the control area had not recorded the status or output of the generator at a given point in time. Some control area operators also asserted that some of the data that did exist was commercially sensitive or confidential. To correct such problems, the Task Force recommends that FERC and authorities in Canada require all generators, regardless of ownership, to collect and submit generator data, according to a regulator-approved template.

25. NERC should reevaluate its existing reliability standards development process and accelerate the adoption of enforceable standards.35

The Task Force recommends that, with support from FERC and appropriate authorities in Canada, NERC should:

A. Re-examine its existing body of standards, guidelines, etc., to identify those that are most important and ensure that all concerns that merit standards are addressed in the plan for standards development.

B. Re-examine the plan to ensure that those that are the most important or the most out-of-date are addressed early in the process.

C. Build on existing provisions and focus on what needs improvement, and incorporate compliance and readiness considerations into the drafting process.

D. Re-examine the Standards Authorization Request process to determine whether, for each standard, a review and modification of an existing standard would be more efficient than development of wholly new text for the standard.

NERC has already begun a long-term, systematic process to reevaluate its standards. It is of the greatest importance, however, that this process not dilute the content of the existing standards, nor conflict with the right of regions or other areas to impose more stringent standards. The state of New York, for example, operates under mandatory and more stringent reliability rules and standards than those required by NERC and NPCC.36

Similarly, several commenters on the Interim Report wrote jointly that:

NERC standards are the minimum—national standards should always be minimum rather than absolute or "one size fits all" criteria. [Systems for] densely populated areas, like the metropolitan areas of New York, Chicago, or Washington, must be designed and operated in accordance with a higher level of reliability than would be appropriate for sparsely populated parts of the country. It is essential that regional differences in terms of load and population density be recognized in the application of planning and operating criteria. Any move to adopt a national, "one size fits all" formula for all parts of the United States would be disastrous to reliability....

A strong transmission system designed and operated in accordance with weakened criteria would be disastrous. Instead, a concerted effort should be undertaken to determine if existing reliability criteria should be strengthened. Such an effort would recognize the geo-electrical magnitude of today's interconnected networks, and the increased complexities deregulation and restructuring have introduced in planning and operating North American power systems. Most important, reliability should be considered a higher priority than commercial use. Only through strong standards and careful engineering can unacceptable power failures like the August 14 blackout be avoided in the future.37

26. Tighten communications protocols, especially for communications during alerts and emergencies. Upgrade communication system hardware where appropriate.38

NERC should work with reliability coordinators and control area operators to improve the effectiveness of internal and external communications during alerts, emergencies, or other critical situations, and ensure that all key parties, including state and local officials, receive timely and accurate information. NERC should task the regional councils to work together to develop communications protocols by December 31, 2004, and to assess and report on the adequacy of emergency communications systems within their regions against the protocols by that date.

On August 14, 2003, reliability coordinator and control area communications regarding conditions in northeastern Ohio were in some cases ineffective, unprofessional, and confusing. Ineffective communications contributed to a lack of situational awareness and precluded effective actions to prevent the cascade. Consistent application of effective communications protocols, particularly during alerts and emergencies, is essential to reliability.

Standing hotline networks, or a functional equivalent, should be established for use in alerts and emergencies (as opposed to one-on-one phone calls) to ensure that all key parties are able to give and receive timely and accurate information.

27. Develop enforceable standards for transmission line ratings.39

NERC should develop clear, unambiguous requirements for the calculation of transmission line ratings (including dynamic ratings), and require that all lines of 115 kV or higher be rerated according to these requirements by June 30, 2005.

As seen on August 14, inadequate vegetation management can lead to the loss of transmission lines that are not overloaded, at least not according to their rated limits. The investigation of the blackout, however, also found that even after allowing for regional or geographic differences, there is still significant variation in how the ratings of existing lines have been calculated. This variation—in terms of assumed ambient temperatures, wind speeds, conductor strength, and the purposes and duration of normal, seasonal, and emergency ratings—makes the ratings themselves unclear, inconsistent, and unreliable across a region or between regions. This situation creates unnecessary and unacceptable uncertainties about the safe carrying capacity of individual lines on the transmission networks. Further, the appropriate use of dynamic line ratings needs to be included in this review because adjusting a line's rating according to changes in ambient conditions may enable the line to carry a larger load while still meeting safety requirements.

28. Require use of time-synchronized data recorders.40

In its requirements of February 10, 2004, NERC directed the regional councils to define within one year regional criteria for the application of synchronized recording devices in key power plants and substations.

The Task Force supports the intent of this requirement strongly, but it recommends a broader approach:

A. FERC and appropriate authorities in Canada should require the use of data recorders synchronized by signals from the Global Positioning System (GPS) on all categories of facilities whose data may be needed to investigate future system disturbances, outages, or blackouts.

B. NERC, reliability coordinators, control areas, and transmission owners should determine where high speed power system disturbance recorders are needed on the system, and ensure that they are installed by December 31, 2004.

C. NERC should establish data recording protocols.

D. FERC and appropriate authorities in Canada should ensure that the investments called for in this recommendation will be recoverable through transmission rates.

A valuable lesson from the August 14 blackout is the importance of having time-synchronized system data recorders. The Task Force's investigators labored over thousands of data items to determine the sequence of events, much like putting together small pieces of a very large puzzle. That process would have been significantly faster and easier if there had been wider use of synchronized data recording devices.

NERC Planning Standard I.F, Disturbance Monitoring, requires the use of recording devices for disturbance analysis. On August 14, time recorders were frequently used but not synchronized to a time standard. Today, at a relatively modest cost, all digital fault recorders, digital event recorders, and power system disturbance recorders can and should be time-stamped at the point of observation using a Global Positioning System (GPS) synchronizing signal. (The GPS signals are synchronized with the atomic clock maintained in Boulder, Colorado by the U.S. National Institute of Standards and Technology.) Recording and time-synchronization equipment should be monitored and calibrated to assure accuracy and reliability.

It is also important that data from automation systems be retained at least for some minimum period, so that if necessary it can be archived to enable adequate analysis of events of particular interest.
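To illustrate why the GPS synchronization discussed above eases analysis, the sketch below shows how records from different recorders can be lined up directly once they share a common UTC time base. It is a simplified, hypothetical example, not a description of any particular recorder's interface.

```python
from bisect import bisect_left
from datetime import datetime

def nearest_sample(samples, t: datetime):
    """samples: list of (utc_timestamp, value) pairs sorted by time.
    Returns the sample closest to t; with GPS-disciplined clocks the same lookup
    works across recorders at different substations without manual reconciliation."""
    times = [s[0] for s in samples]
    i = bisect_left(times, t)
    candidates = samples[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(s[0] - t))
```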
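Recommendation 27's point about ambient assumptions can also be illustrated numerically. The following is a deliberately simplified steady-state heat balance (allowable current satisfies I²R = convective plus radiative cooling minus solar heating); it is not the full IEEE 738 rating method, and every coefficient value is a hypothetical placeholder chosen only to show the direction of the sensitivity.

```python
import math

def allowable_current_amps(q_convective_w_per_m, q_radiative_w_per_m,
                           q_solar_w_per_m, r_ohm_per_m):
    """Solve the simplified heat balance I^2 * R = q_c + q_r - q_s for I (terms per meter)."""
    net_cooling = max(q_convective_w_per_m + q_radiative_w_per_m - q_solar_w_per_m, 0.0)
    return math.sqrt(net_cooling / r_ohm_per_m)

# Hypothetical numbers only: a breezy, mild day versus a calm, hot day.
breezy_mild = allowable_current_amps(35.0, 12.0, 8.0, 8.0e-5)    # roughly 700 A
calm_hot    = allowable_current_amps(18.0, 12.0, 12.0, 8.0e-5)   # roughly 470 A
```

A static rating computed with the first set of assumptions would overstate what the line can safely carry on the second kind of day, which is the inconsistency the recommendation asks NERC to eliminate; conversely, a dynamic rating can release extra capacity when conditions are favorable.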

29. Evaluate and disseminate lessons learned during system restoration.41

In the requirements it issued on February 10, 2004, NERC directed its Planning Committee to work with the Operating Committee, NPCC, ECAR, and PJM to evaluate the black start and system restoration performance following the outage of August 14, and to report within one year the results of that evaluation, with recommendations for improvement. Within six months of the Planning Committee's report, all regional councils are to have reevaluated their plans and procedures to ensure an effective black start and restoration capability within their region.

The Task Force supports these requirements strongly. In addition, the Task Force recommends that NERC should require the Planning Committee's review to include consultation with appropriate stakeholder organizations in all areas that were blacked out on August 14.

The efforts to restore the power system and customer service following the outage were generally effective, considering the massive amount of load lost and the large number of generators and transmission lines that tripped. Fortunately, the restoration was aided by the ability to energize transmission from neighboring systems, thereby speeding the recovery.

Despite the apparent success of the restoration effort, it is important to evaluate the results in more detail to compare them with previous blackout/restoration studies and determine opportunities for improvement. Black start and restoration plans are often developed through study of simulated conditions. Robust testing of live systems is difficult because of the risk of disturbing the system or interrupting customers. The August 14 blackout provides a valuable opportunity to review actual events and experiences to learn how to better prepare for system black start and restoration in the future. That opportunity should not be lost.

30. Clarify criteria for identification of operationally critical facilities, and improve dissemination of updated information on unplanned outages.42

NERC should work with the control areas and reliability coordinators to clarify the criteria for identifying critical facilities whose operational status can affect the reliability of neighboring areas, and to improve mechanisms for sharing information about unplanned outages of such facilities in near real-time.

The lack of accurate, near real-time information about unplanned outages degraded the performance of state estimator and reliability assessment functions on August 14. NERC and the industry must improve the mechanisms for sharing outage information in the operating time horizon (e.g., 15 minutes or less), to ensure the accurate and timely sharing of outage data needed by real-time operating tools such as state estimators, real-time contingency analyzers, and other system monitoring tools.

Further, NERC's present operating policies do not specify adequately criteria for identifying those critical facilities within reliability coordinator and control area footprints whose operating status could affect the reliability of neighboring systems. This leads to uncertainty about which facilities should be monitored by both the reliability coordinator for the region in which the facility is located and by one or more neighboring reliability coordinators.
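Recommendation 30 calls for sharing unplanned-outage status within the operating time horizon. As a purely hypothetical sketch of the kind of message such a mechanism might exchange (the report does not prescribe any format), the fields below capture the minimum a neighboring reliability coordinator's state estimator would need:

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class OutageNotice:
    facility_id: str      # identifier used by the reporting control area (hypothetical)
    facility_type: str    # "line", "generator", "transformer", ...
    status: str           # "out_of_service" or "returned_to_service"
    observed_at: str      # UTC time of the status change, ISO 8601

def to_wire(notice):
    """Serialize for near-real-time exchange with neighboring reliability coordinators."""
    return json.dumps(asdict(notice))

example = OutageNotice("EXAMPLE-345KV-LINE", "line", "out_of_service",
                       datetime.now(timezone.utc).isoformat())
```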

31. Clarify that the transmission loading relief (TLR) process should not be used in situations involving an actual violation of an Operating Security Limit. Streamline the TLR process.43

NERC should clarify that the TLR procedure is often too slow for use in situations in which an affected system is already in violation of an Operating Security Limit. NERC should also evaluate experience to date with the TLR procedure and propose by September 1, 2004, ways to make it less cumbersome.

The reviews of control area and reliability coordinator transcripts from August 14 confirm that the TLR process is cumbersome, perhaps unnecessarily so, and not fast and predictable enough for use in situations in which an Operating Security Limit is close to or actually being violated. NERC should develop an alternative to TLRs that can be used quickly to address alert and emergency conditions.

Group III. Physical and Cyber Security of North American Bulk Power Systems

32. Implement NERC IT standards.

The Task Force recommends that NERC standards related to physical and cyber security should be understood as being included within the body of standards to be made mandatory and enforceable in Recommendation No. 1. Further:

A. NERC should ensure that the industry has implemented its Urgent Action Standard 1200; finalize, implement, and ensure membership compliance with its Reliability Standard 1300 for Cyber Security and take actions to better communicate and enforce these standards.

B. CAs and RCs should implement existing and emerging NERC standards, develop and implement best practices and policies for IT and security management, and authenticate and authorize controls that address EMS automation system ownership and boundaries.

Interviews and analyses conducted by the SWG indicate that within some of the companies interviewed there are potential opportunities for cyber system compromise of EMS and their supporting IT infrastructure. Indications of procedural and technical IT management vulnerabilities were observed in some facilities, such as unnecessary software services not denied by default, loosely controlled system access and perimeter control, poor patch and configuration management, and poor system security documentation.

An analysis of the more prevalent policies and standards within the electricity sector revealed that there is existing and expanding guidance on standards within the sector to perform IT and information security management.44 NERC issued a temporary standard (Urgent Action Standard 1200, Cyber Security) on August 13, 2003, and is developing the formal Reliability Standard 1300 for Cyber Security. Both start the industry down the correct path, but there is a need to communicate and enforce these standards by providing the industry with recommended implementation guidance. Implementation guidance regarding these sector-wide standards is especially important given that implementation procedures may differ among CAs and RCs.

In order to address the finding described above, the Task Force recommends:

• NERC:
  - Ensure that the industry has implemented its Urgent Action Standard 1200 and determine if the guidance contained therein needs to be strengthened or amended in the ongoing development of its Reliability Standard 1300 for Cyber Security.
  - Finalize, implement, and ensure membership compliance of its Reliability Standard 1300 for Cyber Security and take actions to better communicate and enforce these standards. These actions should include, but not necessarily be limited to:
    1. The provision of policy, process, and implementation guidance to CAs and RCs; and
    2. The establishment of mechanisms for compliance, audit, and enforcement. This may include recommendations, guidance, or agreements between NERC, CAs and RCs that cover self-certification, self-assessment, and/or third-party audit.
  - Work with federal, state, and provincial/territorial jurisdictional departments and agencies to regularly update private and public sector standards, policies, and other guidance.

• CAs and RCs:
  - Implement existing and emerging NERC standards.
  - Develop and implement best practices and policies for IT and security management drawing from existing NERC and government authorities' best practices.45 These should include, but not necessarily be limited to:
    1. Policies requiring that automation system products be delivered and installed with unnecessary services deactivated in order to improve "out-of-the-box security."
    2. The creation of centralized system administration authority within each CA and RC to manage access and permissions for automation access (including vendor management backdoors, links to other automation systems, and administrative connections).
  - Authenticate and authorize controls that address EMS automation system ownership and boundaries, and ensure access is granted only to users who have corresponding job responsibilities.

33. Develop and deploy IT management procedures.

CAs' and RCs' IT and EMS support personnel should develop procedures for the development, testing, configuration, and implementation of technology related to EMS automation systems and also define and communicate information security and performance requirements to vendors on a continuing basis. Vendors should ensure that system upgrades, service packs, and bug fixes are made available to grid operators in a timely manner.

Interviews and analyses conducted by the SWG indicate that, in some instances, there were ill-defined and/or undefined procedures for EMS automation systems software and hardware development, testing, deployment, and backup. In addition, there were specific instances of failures to perform system upgrade, version control, maintenance, rollback, and patch management tasks.

At one CA, these procedural vulnerabilities were compounded by inadequate, out-of-date, or non-existing maintenance contracts with EMS vendors and contractors. This could lead to situations where grid operators could alter EMS components without vendor notification or authorization as well as scenarios in which grid operators are not aware of or choose not to implement vendor-recommended patches and upgrades.

34. Develop corporate-level IT security governance and strategies.

CAs and RCs and other grid-related organizations should have a planned and documented security strategy, governance model, and architecture for EMS automation systems.

Interviews and analysis conducted by the SWG indicate that in some organizations there is evidence of an inadequate security policy, governance model, strategy, or architecture for EMS automation systems. This is especially apparent with legacy EMS automation systems that were originally designed to be stand-alone systems but that are now interconnected with internal (corporate) and external (vendors, Open Access Same Time Information Systems (OASIS), RCs, Internet, etc.) networks. It should be noted that in some of the organizations interviewed this was not the case and in fact they appeared to excel in the areas of security policy, governance, strategy, and architecture.

In order to address the finding described above, the Task Force recommends that CAs, RCs, and other grid-related organizations have a planned and documented security strategy, governance model, and architecture for EMS automation systems covering items such as network design, system design, security devices, access and authentication controls, and integrity management as well as backup, recovery, and contingency mechanisms.

35. Implement controls to manage system health, network monitoring, and incident management.

IT and EMS support personnel should implement technical controls to detect, respond to, and recover from system and network problems. Grid operators, dispatchers, and IT and EMS support personnel should be provided the tools and training to ensure that the health of IT systems is monitored and maintained.

Interviews and analysis conducted by the SWG indicate that in some organizations there was ineffective monitoring and control over EMS-supporting IT infrastructure and overall IT network health. In these cases, both grid operators and IT support personnel did not have situational awareness of the health of the IT systems that provide grid information both globally and locally. This resulted in an inability to detect, assess, respond to, and recover from IT system-related cyber failures (failed hardware/software, malicious code, faulty configurations, etc.).

In order to address the finding described above, the Task Force recommends:

• IT and EMS support personnel implement technical controls to detect, respond to, and recover from system and network problems.

• Grid operators, dispatchers, and IT and EMS support personnel be provided with the tools and training to ensure that:
  - The health of IT systems is monitored and maintained.
  - These systems have the capability to be repaired and restored quickly, with a minimum loss of time and access to global and internal grid information.
  - Contingency and disaster recovery procedures exist and can serve to temporarily substitute for systems and communications failures during times when EMS automation system health is unknown or unreliable.
  - Adequate verbal communication protocols and procedures exist between operators and IT and EMS support personnel so that operators are aware of any IT-related problems that may be affecting their situational awareness of the power grid.
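One simple technical control of the kind described in the bullets above is a heartbeat check on EMS subsystems, so that a silent failure is surfaced to both grid operators and IT support rather than going unnoticed. The sketch below is a minimal, hypothetical illustration; the subsystem names and the 60-second threshold are assumptions, not requirements from the report.

```python
import time

HEARTBEAT_LIMIT_S = 60   # assumed threshold, for illustration only

# Each subsystem refreshes its entry whenever it completes a processing cycle.
last_seen = {"alarm_processor": time.time(), "state_estimator": time.time()}

def overdue_subsystems(now=None):
    """Names of subsystems that have not reported within the heartbeat limit."""
    now = now if now is not None else time.time()
    return [name for name, t in last_seen.items() if now - t > HEARTBEAT_LIMIT_S]
```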

36. Initiate a U.S.-Canada risk management study.

In cooperation with the electricity sector, federal governments should strengthen and expand the scope of the existing risk management initiatives by undertaking a bilateral (Canada-U.S.) study of the vulnerabilities of shared electricity infrastructure and cross border interdependencies. Common threat and vulnerability assessment methodologies should also be developed, based on the work undertaken in the pilot phase of the current joint Canada-U.S. vulnerability assessment initiative, and their use promoted by CAs and RCs. To coincide with these initiatives, the electricity sector, in association with federal governments, should develop policies and best practices for effective risk management and risk mitigation.

Effective risk management is a key element in assuring the reliability of our critical infrastructures. It is widely recognized that the increased reliance on IT by critical infrastructure sectors, including the energy sector, has increased the vulnerability of these systems to disruption via cyber means. The breadth of the August 14, 2003, power outage illustrates the vulnerabilities and interdependencies inherent in our electricity infrastructure.

Canada and the United States, recognizing the importance of assessing the vulnerabilities of shared energy systems, included a provision to address this issue in the Smart Border Declaration,46 signed on December 12, 2001. Both countries committed, pursuant to Action Item 21 of the Declaration, to "conduct bi-national threat assessments on trans-border infrastructure and identify necessary protection measures, and initiate assessments for transportation networks and other critical infrastructure." These joint assessments will serve to identify critical vulnerabilities, strengths and weaknesses while promoting the sharing and transfer of knowledge and technology to the energy sector for self-assessment purposes.

A team of Canadian and American technical experts, using methodology developed by the Argonne National Laboratory in Chicago, Illinois, began conducting the pilot phase of this work in January 2004. The work involves a series of joint Canada-U.S. assessments of selected shared critical energy infrastructure along the Canada-U.S. border, including the electrical transmission lines and dams at Niagara Falls - Ontario and New York. The pilot phase will be completed by March 31, 2004.

The findings of the ESWG and SWG suggest that among the companies directly involved in the power outage, vulnerabilities and interdependencies of the electric system were not well understood and thus effective risk management was inadequate. In some cases, risk assessments did not exist or were inadequate to support risk management and risk mitigation plans.

In order to address these findings, the Task Force recommends:

• In cooperation with the electricity sector, federal governments should strengthen and expand the scope of the existing initiatives described above by undertaking a bilateral (Canada-U.S.) study of the vulnerabilities of shared electricity infrastructure and cross border interdependencies. The study should encompass cyber, physical, and personnel security processes and include mitigation and best practices, identifying areas that would benefit from further standardization.

• Common threat and vulnerability assessment methodologies should be developed, based on the work undertaken in the pilot phase of the current joint Canada-U.S. vulnerability assessment initiative, and their use promoted by CAs and RCs.

• The electricity sector, in association with federal governments, should develop policies and best practices for effective risk management and risk mitigation.

37. Improve IT forensic and diagnostic capabilities.

CAs and RCs should seek to improve internal forensic and diagnostic capabilities, ensure that IT support personnel who support EMS automation systems are familiar with the systems' design and implementation, and make certain that IT support personnel who support EMS automation systems are trained in using appropriate tools for diagnostic and forensic analysis and remediation.

Interviews and analyses conducted by the SWG indicate that, in some cases, IT support personnel who are responsible for EMS automation systems are unable to perform forensic and diagnostic routines on those systems. This appears to stem from a lack of tools, documentation and technical skills. It should be noted that some of the organizations interviewed excelled in this area but that overall performance was lacking.

In order to address the finding described above, the Task Force recommends:

• CAs and RCs seek to improve internal forensic and diagnostic capabilities as well as strengthen coordination with external EMS vendors and contractors who can assist in servicing EMS automation systems;

• CAs and RCs ensure that IT support personnel who support EMS automation systems are familiar with the systems' design and implementation; and

• CAs and RCs ensure that IT support personnel who support EMS automation systems have access to and are trained in using appropriate tools for diagnostic and forensic analysis and remediation.

38. Assess IT risk and vulnerability at scheduled intervals.

IT and EMS support personnel should perform regular risk and vulnerability assessment activities for automation systems (including EMS applications and underlying operating systems) to identify weaknesses, high-risk areas, and mitigating actions such as improvements in policy, procedure, and technology.

Interviews and analysis conducted by the SWG indicate that in some instances risk and vulnerability management were not being performed on EMS automation systems and their IT supporting infrastructure. To some CAs, EMS automation systems were considered "black box"47 technologies; and this categorization removed them from the list of systems identified for risk and vulnerability assessment.

39. Develop capability to detect wireless and remote wireline intrusion and surveillance.

Both the private and public sector should promote the development of the capability of all CAs and RCs to reasonably detect intrusion and surveillance of wireless and remote wireline access points and transmissions. CAs and RCs should also conduct periodic reviews to ensure that their user base is in compliance with existing wireless and remote wireline access rules and policies.

Interviews conducted by the SWG indicate that most of the organizations interviewed had some type of wireless and remote wireline intrusion and surveillance detection protocol as a standard security policy; however, there is a need to improve and strengthen current capabilities regarding wireless and remote wireline intrusion and surveillance detection. The successful detection and monitoring of wireless and remote wireline access points and transmissions are critical to securing grid operations from a cyber security perspective.

There is also evidence that although many of the organizations interviewed had strict policies against allowing wireless network access, periodic reviews to ensure compliance with these policies were not undertaken.

40. Control access to operationally sensitive equipment.

RCs and CAs should implement stringent policies and procedures to control access to sensitive equipment and/or work areas.

Interviews conducted by the SWG indicate that at some CAs and RCs operationally sensitive computer equipment was accessible to non-essential personnel. Although most of these non-essential personnel were escorted through sensitive areas, it was determined that this procedure was not always enforced as a matter of everyday operations.

In order to address the finding described above, the Task Force recommends:

• That RCs and CAs develop policies and procedures to control access to sensitive equipment and/or work areas to ensure that:
  - Access is strictly limited to employees or contractors who utilize said equipment as part of their job responsibilities.
  - Access for other staff who need access to sensitive areas and/or equipment but are not directly involved in their operation (such as cleaning staff and other administrative personnel) is strictly controlled (via escort) and monitored.

41. NERC should provide guidance on employee background checks.

NERC should provide guidance on the implementation of its recommended standards on background checks, and CAs and RCs should review their policies regarding background checks to ensure they are adequate.

Interviews conducted with sector participants revealed instances in which certain company contract personnel did not have to undergo background check(s) as stringent as those performed on regular employees of a CA or RC. NERC Urgent Action Standard Section 1207 Paragraph 2.3 specifies steps to remediate sector weaknesses in this area but there is a need to communicate and enforce this standard by providing the industry with recommended implementation guidance, which may differ among CAs and RCs.

In order to address the finding described above, the Task Force recommends:

• NERC provide guidance on the implementation of its recommended standards on background checks, especially as they relate to the screening of contracted and sub-contracted personnel.

• CAs and RCs review their policies regarding background checks to ensure they are adequate before allowing sub-contractor personnel to access their facilities.

42. Confirm NERC ES-ISAC as the central point for sharing security information and analysis.

The NERC ES-ISAC should be confirmed as the central electricity sector point of contact for security incident reporting and analysis. Policies and protocols for cyber and physical incident reporting should be further developed including a mechanism for monitoring compliance. There also should be uniform standards for the reporting and sharing of physical and cyber security incident information across both the private and public sectors.

There are currently both private and public sector information sharing and analysis initiatives in place to address the reporting of physical and cyber security incidents within the electricity sector. In the private sector, NERC operates an Electricity Sector Information Sharing and Analysis Center (ES-ISAC) specifically to address this issue. On behalf of the U.S. Government, the Department of Homeland Security (DHS) operates the Information Analysis and Infrastructure Protection (IAIP) Directorate to collect, process, and act upon information on possible cyber and physical security threats and vulnerabilities. In Canada, Public Safety and Emergency Preparedness Canada has a 24/7 operations center for the reporting of incidents involving or impacting critical infrastructure. As well, both in Canada and the U.S., incidents of a criminal nature can be reported to law enforcement authorities of jurisdiction.

Despite these private and public physical and cyber security information sharing and analysis initiatives, an analysis of policies and procedures within the electricity sector reveals that reporting of security incidents to internal corporate security, law enforcement, or government agencies was uneven across the sector. The fact that these existing channels for incident reporting—whether security- or electricity systems-related—are currently underutilized is an operating deficiency which could hamper the industry's ability to address future problems in the electricity sector.

Interviews and analysis conducted by the SWG further indicate an absence of coherent and effective mechanisms for the private sector to share information related to critical infrastructure with government. There was also a lack of confidence on the part of private sector infrastructure owners and grid operators that information shared with governments could be protected from disclosure under Canada's Access to Information Act (ATIA) and the U.S. Freedom of Information Act (FOIA). On the U.S. side of the border, however, the imminent implementation of the Critical Infrastructure Information (CII) Act of 2002 should mitigate almost all industry concerns about FOIA disclosure. In Canada, Public Safety and Emergency Preparedness Canada relies on a range of mechanisms to protect the sensitive information related to critical infrastructure that it receives from its private sector stakeholders, including the exemptions for third party information that currently exist in the ATIA and other instruments. At the same time, Public Safety and Emergency Preparedness Canada is reviewing options for stronger protection of CI information, including potential changes in legislation.

In order to address the finding described above, the Task Force recommends:

• Confirmation of the NERC ES-ISAC as the central electricity sector point of contact for security incident reporting and analysis.

• Further development of NERC policies and protocols for cyber and physical incident reporting including a mechanism for monitoring compliance.

• The establishment of uniform standards for the reporting of physical and cyber security incidents to internal corporate security, private sector sector-specific information sharing and analysis bodies (including ISACs), law enforcement, and government agencies.

• The further development of new mechanisms and the promulgation of existing48 Canadian and U.S. mechanisms to facilitate the sharing of electricity sector threat and vulnerability information across governments as well as between the private sector and governments.

• Federal, state, and provincial/territorial governments work to further develop and promulgate measures and procedures that protect critical, but sensitive, critical infrastructure-related information from disclosure.

43. Establish clear authority for physical and cyber security.

The Task Force recommends that corporations establish clear authority and ownership for physical and cyber security. This authority should have the ability to influence corporate decision-making and the authority to make physical and cyber security-related decisions.

Interviews and analysis conducted by the SWG indicate that some power entities did not implement best practices when organizing their security staff. It was noted at several entities that the Information System (IS) security staff reported to IT support personnel such as the Chief Information Officer (CIO).

Best practices across the IT industry, including most large automated businesses, indicate that the best way to balance security requirements properly with the IT and operational requirements of a company is to place security at a comparable level within the organizational structure. By allowing the security staff a certain level of autonomy, management can properly balance the associated risks and operational requirements of the facility.

44. Develop procedures to prevent or mitigate inappropriate disclosure of information.

The private and public sectors should jointly develop and implement security procedures and awareness training in order to mitigate or prevent disclosure of information by the practices of open source collection, elicitation, or surveillance.

SWG interviews and intelligence analysis provide no evidence of the use of open source collection, elicitation or surveillance against CAs or RCs leading up to the August 14, 2003, power outage. However, such activities may be used by malicious individuals, groups, or nation states engaged in intelligence collection in order to gain insights or proprietary information on functions and capabilities. Open source collection is difficult to detect and thus is best countered through careful consideration by industry stakeholders of the extent and nature of publicly-available information. Methods of elicitation and surveillance, by comparison, are more detectable activities and may be addressed through increased awareness and security training. In addition to prevention and detection, it is equally important that suspected or actual incidents of these intelligence collection activities be reported to government authorities.

In order to address the findings described above, the Task Force recommends:

• The private and public sectors jointly develop and implement security procedures and awareness training in order to mitigate disclosure of information not suitable for the public domain and/or removal of previously available information in the public domain (web sites, message boards, industry publications, etc.).

• The private and public sector jointly develop and implement security procedures and awareness training in order to mitigate or prevent disclosure of information by the practices of elicitation.

• The private and public sector jointly develop and implement security procedures and awareness training in order to mitigate, prevent, and detect incidents of surveillance.

• Where no mechanism currently exists, the private and public sector jointly establish a secure reporting chain and protocol for use of the information for suspected and known attempts and incidents of elicitation and surveillance.

Group IV. Canadian Nuclear Power Sector

The U.S. nuclear power plants affected by the August 14 blackout performed as designed. After reviewing the design criteria and the response of the plants, the U.S. members of the Nuclear Working Group had no recommendations relative to the U.S. nuclear power plants.

As discussed in Chapter 8, Canadian nuclear power plants did not trigger the power system outage or contribute to its spread. Rather, they disconnected from the grid as designed. The Canadian members of the Nuclear Working Group have, therefore, no specific recommendations with respect to the design or operation of Canadian nuclear plants that would improve the reliability of the Ontario electricity grid. The Canadian Nuclear Working Group, however, made two recommendations to improve the response to future events involving the loss of off-site power, one concerning backup electrical generation equipment to the CNSC's Emergency Operations Centre and another concerning the use of adjuster rods during future events involving the loss of off-site power.

The Task Force accepted these recommendations, which are presented below.

45. The Task Force recommends that the Canadian Nuclear Safety Commission request Ontario Power Generation and Bruce Power to review operating procedures and operator training associated with the use of adjuster rods.

OPG and Bruce Power should review their operating procedures to see whether alternative procedures could be put in place to carry out or reduce the number of system checks required before placing the adjuster rods into automatic mode. This review should include an assessment of any regulatory constraints placed on the use of the adjuster rods, to ensure that risks are being appropriately managed.

Current operating procedures require independent checks of a reactor's systems by the reactor operator and the control room supervisor before the reactor can be put in automatic mode to allow the reactors to operate at 60% power levels. Alternative procedures to allow reactors to run at 60% of power while waiting for the grid to be re-established may reduce other risks to the health and safety of Ontarians that arise from the loss of a key source of electricity. CNSC oversight and approval of any changes to operating procedures would ensure that health and safety, security, or the environment are not compromised. The CNSC would assess the outcome of the proposed review to ensure that health and safety, security, and the environment would not be compromised as a result of any proposed action.

46. The Task Force recommends that the Canadian Nuclear Safety Commission purchase and install backup generation equipment.

In order to ensure that the CNSC's Emergency Operations Center (EOC) is available and fully functional during an emergency situation requiring CNSC response, whether the emergency is nuclear-related or otherwise, and that staff needed to respond to the emergency can be accommodated safely, the CNSC should have backup electrical generation equipment of sufficient capacity to provide power to the EOC, telecommunications and Information Technology (IT) systems and accommodations for the CNSC staff needed to respond to an emergency.

The August 2003 power outage demonstrated that the CNSC's Emergency Operations Center, IT, and communications equipment are vulnerable if there is a loss of electricity to the Ottawa area.

Endnotes

1. In fairness, it must be noted that reliability organizations in some areas have worked diligently to implement recommendations from earlier blackouts. According to the Initial Report by the New York State Department of Public Service on the August 14, 2003 Blackout, New York entities implemented all 100 of the recommendations issued after the New York City blackout of 1977.

2. The need for a systematic recommitment to reliability by all affected organizations was supported in various ways by many commenters on the Interim Report, including Anthony J. Alexander, FirstEnergy; David Barrie, Hydro One Networks, Inc.; Joseph P. Carson, P.E.; Harrison Clark; F. J. Delea, J.A. Casazza, G.C. Loehr, and R. M. Malizewski, Power Engineers Seeking Truth; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; and Raymond K. Kershaw, International Transmission Company.

3. See supporting comments expressed by Anthony J. Alexander, FirstEnergy; Deepak Divan, SoftSwitching Technologies; Pierre Guimond, Canadian Nuclear Association; Hans Konow, Canadian Electricity Association; Michael Penstone, Hydro One Networks, Inc.; and James K. Robinson, PPL.

4. See "The Economic Impacts of the August 2003 Blackout," Electric Consumers Resource Council (ELCON), February 2, 2004.

5. The need for action to make standards enforceable was supported by many commenters, including David Barrie, Hydro One Networks, Inc.; Carl Burrell, IMO Ontario; David Cook, North American Electric Reliability Council; Deepak Divan, SoftSwitching Technologies; Charles J. Durkin, Northeast Power Coordinating Council; David Goffin, Canadian Chemical Producers' Association; Raymond K. Kershaw, International Transmission Company; Hans Konow, Canadian Electricity Association; Barry Lawson, National Rural Electric Cooperative Association; William J. Museler, New York Independent System Operator; Eric B. Stephens, Ohio Consumers' Counsel; Gordon Van Welie, ISO New England, Inc.; and C. Dortch Wright, on behalf of James McGreevey, Governor of New Jersey.

6. This recommendation was suggested by some members of the Electric System Working Group.

7. The need to evaluate and where appropriate strengthen the institutional framework for reliability management was supported in various respects by many commenters, including Anthony J. Alexander, FirstEnergy Corporation; David Barrie, Hydro One Networks, Inc.; Chris Booth, Experienced Consultants LLC; Carl Burrell, IMO Ontario; Linda Campbell, Florida Reliability Coordinating Council; Linda Church Ciocci, National Hydropower Association; David Cook, NERC; F.J. Delea, J.A. Casazza, G.C. Loehr, and R.M. Malizewski, Power Engineers Seeking Truth; Charles J. Durkin, Northeast Power Coordinating Council; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; Michael W. Golay, Massachusetts Institute of Technology; Leonard S. Hyman, Private Sector Advisors, Inc; Marija Ilic, Carnegie Mellon University; Jack Kerr, Dominion Virginia Power; Raymond K. Kershaw,

International Transmission Company; Paul Kleindorfer, University of Pennsylvania; Michael Kormos, PJM Interconnection; Bill Mittelstadt, Bonneville Power Administration; William J. Museler, New York Independent System Operator; James K. Robinson, PPL; Eric B. Stephens, Ohio Consumers’ Counsel; John Synesiou, IMS Corporation; Gordon Van Welie, ISO New England; Vickie Van Zandt, Bonneville Power Administration; and C. Dortch Wright, on behalf of James McGreevey, Governor of New Jersey.
8 Several commenters noted the importance of clarifying that prudently incurred reliability expenses and investments will be recoverable through regulator-approved rates. These commenters include Anthony J. Alexander, FirstEnergy Corporation; Deepak Divan, SoftSwitching Technologies; Stephen Fairfax, MTechnology, Inc.; Michael W. Golay, Massachusetts Institute of Technology; Pierre Guimond, Canadian Nuclear Association; Raymond K. Kershaw, International Transmission Company; Paul R. Kleindorfer, University of Pennsylvania; Hans Konow, Canadian Electricity Association; Barry Lawson, National Rural Electric Cooperative Association; and Michael Penstone, Hydro One Networks, Inc.
9 The concept of an ongoing NERC process to track the implementation of existing and subsequent recommendations was initiated by NERC and broadened by members of the Electric System Working Group. See comments by David Cook, North American Electric Reliability Council.
10 This recommendation was suggested by NERC and supported by members of the Electric System Working Group.
11 See comments by Jack Kerr, Dominion Virginia Power, and Margie Phillips, Pennsylvania Services Integration Consortium.
12 The concept of a “reliability impact consideration” was suggested by NERC and supported by the Electric System Working Group.
13 The suggestion that EIA should become a source of reliability data and information came from a member of the Electric System Working Group.
14 Several commenters raised the question of whether there was a linkage between the emergence of competition (or increased wholesale electricity trade) in electricity markets and the August 14 blackout. See comments by Anthony J. Alexander, FirstEnergy Corporation; F.J. Delea, J.A. Casazza, G.C. Loehr, and R.M. Malizewski, Power Engineers Seeking Truth; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; Brian O’Keefe, Canadian Union of Public Employees; Les Pereira; and John Wilson.
15 NIMBY: “Not In My Back Yard.”
16 Several commenters either suggested that government agencies should expand their research in reliability-related topics, or emphasized the need for such R&D more generally. See comments by Deepak Divan, SoftSwitching Technologies; Marija Ilic, Carnegie Mellon University; Hans Konow, Canadian Electricity Association; Stephen Lee, Electric Power Research Institute; James K. Robinson, PPL; John Synesiou, IMS Corporation; and C. Dortch Wright on behalf of Governor James McGreevey of New Jersey.
17 The concept of a standing framework for grid-related investigations was initiated by members of the Electric System Working Group, after noting that the U.S. National Aeronautics and Space Administration (NASA) had created a similar arrangement after the Challenger explosion in 1986. This framework was put to use immediately after the loss of the shuttle Columbia in 2003.
18 This subject was addressed in detail in comments by David Cook, North American Electric Reliability Council, and in part by comments by Anthony J. Alexander, FirstEnergy Corporation; Ajay Garg, Hydro One Networks, Inc.; George Katsuras, IMO Ontario; and Vickie Van Zandt, Bonneville Power Administration.
19 U.S. Federal Energy Regulatory Commission, 105 FERC ¶ 61,372, December 24, 2003.
20 See ECAR website, http://www.ecar.org/documents/document%201_6-98.pdf.
21 See NERC website, http://www.nerc.com/standards/.
22 The need to ensure better maintenance of required electrical clearances in transmission right-of-way areas was emphasized by several commenters, including Richard E. Abbott, arborist; Anthony J. Alexander, FirstEnergy Corporation; David Barrie, Hydro One Networks, Inc.; David Cook, North American Electric Reliability Council; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; Tadashi Mano, Tokyo Electric Power Company; Eric B. Stephens, Ohio Consumers’ Counsel; Vickie Van Zandt, Bonneville Power Administration; and Donald Wightman, Utility Workers Union of America.
23 Utility Vegetation Management Final Report, CN Utility Consulting, LLC, March 2004, commissioned by the U.S. Federal Energy Regulatory Commission to support the investigation of the August 14, 2003 blackout.
24 The need to strengthen and verify compliance with NERC standards was noted by several commenters. See comments by David Barrie, Hydro One Networks, Inc.; Carl Burrell, IMO Ontario; David Cook, North American Electric Reliability Council; and Eric B. Stephens, Ohio Consumers’ Counsel.
25 The need to verify application of NERC standards via readiness audits—before adverse incidents occur—was noted by several commenters. See comments by David Barrie, Hydro One Networks, Inc.; David Cook, North American Electric Reliability Council; Barry Lawson, National Rural Electric Cooperative Association; Bill Mittelstadt, Bonneville Power Administration; and Eric B. Stephens, Ohio Consumers’ Counsel.
26 The need to improve the training and certification requirements for control room management and staff drew many comments. See comments by David Cook, North American Electric Reliability Council; F.J. Delea, J.A. Casazza, G.C. Loehr, and R.M. Malizewski, Power Engineers Seeking Truth; Victoria Doumtchenko, MPR Associates; Pat Duran, IMO Ontario; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; George Katsuras, IMO Ontario; Jack Kerr, Dominion Virginia Power; Tim Kucey, National Energy Board, Canada; Stephen Lee, Electric Power Research Institute; Steve Leovy, personal comment; Ed Schwerdt, Northeast Power Coordinating Council; Tapani O. Seppa, The Valley Group, Inc.; Eric B. Stephens, Ohio Consumers’ Counsel; Vickie Van Zandt, Bonneville Power Administration; Don Watkins, Bonneville Power Administration; and Donald Wightman, Utility Workers Union of America.
27 This reliance, and the risk of an undue dependence, is often unrecognized in the industry.
28 Many parties called for a clearer statement of the roles, responsibilities, and authorities of control areas and reliability coordinators, particularly in emergency situations. See comments by Anthony J. Alexander, FirstEnergy Corporation; Chris Booth, Experienced Consultants LLC; Michael Calimano, New York ISO; Linda Campbell, Florida Reliability Coordinating Council; David Cook, North American Electric Reliability Council; F.J. Delea, J.A. Casazza, G.C. Loehr, and R.M. Malizewski, Power Engineers Seeking Truth; Mark Fidrych, Western Area Power Administration; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; Carl Hauser, Washington State University; Stephen Kellat; Jack Kerr, Dominion Virginia Power; Raymond K. Kershaw, International Transmission Company; Michael Kormos, PJM Interconnection; William J. Museler, New York Independent System Operator; Tapani O. Seppa, The Valley Group, Inc.; John Synesiou, IMS Corporation; Gordon Van Welie, ISO New England, Inc.; Vickie Van Zandt, Bonneville Power Administration; Kim Warren, IMO Ontario; and Tom Wiedman, Consolidated Edison. Members of the Electric System Working Group initiated the concept of defining an “alert” status, between “normal” and “emergency,” and associated roles, responsibilities, and authorities.
29 The need to make better use of system protection measures received substantial comment, including comments by James L. Blasiak, International Transmission Company; David Cook, North American Electric Reliability Council; Charles J. Durkin, Northeast Power Coordinating Council; F.J. Delea, J.A. Casazza, G.C. Loehr, and R.M. Malizewski, Power Engineers Seeking Truth; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; Gurgen and Spartak Hakobyan, personal study; Marija Ilic, Carnegie Mellon University; Shinichi Imai, Tokyo Electric Power Company; Jack Kerr, Dominion Virginia Power; Stephen Lee, Electric Power Research Institute; Ed Schwerdt, Northeast Power Coordinating Council; Robert Stewart, PG&E; Philip Tatro, National Grid Company; Carson Taylor, Bonneville Power Administration; Vickie Van Zandt, Bonneville Power Administration; Don Watkins, Bonneville Power Administration; and Tom Wiedman, Consolidated Edison.
30 The subject of developing and adopting better real-time tools for control room operators and reliability coordinators drew many comments, including those by Anthony J. Alexander, FirstEnergy Corporation; Eric Allen, New York ISO; Chris Booth, Experienced Consultants, LLC; Mike Calimano, New York ISO; Claudio Canizares, University of Waterloo (Ontario); David Cook, North American Electric Reliability Council; Deepak Divan, SoftSwitching Technologies; Victoria Doumtchenko, MPR Associates; Pat Duran, IMO Ontario; Bill Eggertson, Canadian Association for Renewable Energies; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; Jack Kerr, Dominion Virginia Power; Raymond K. Kershaw, International Transmission Company; Michael Kormos, PJM Interconnection; Tim Kucey, National Energy Board, Canada; Steve Lapp, Lapp Renewables; Stephen Lee, Electric Power Research Institute; Steve Leovy; Tom Levy; Peter Love, Canadian Energy Efficiency Alliance; Frank Macedo, Hydro One Networks, Inc.; Bill Mittelstadt, Bonneville Power Administration; Fiona Oliver, Canadian Energy Efficiency Alliance; Peter Ormund, Mohawk College; Don Ross, Prince Edward Island Wind Co-op Limited; James K. Robinson, PPL; Robert Stewart, PG&E; John Synesiou, IMS Corporation; Gordon Van Welie, ISO New England, Inc.; Vickie Van Zandt, Bonneville Power Administration; Don Watkins, Bonneville Power Administration; Chris Winter, Conservation Council of Ontario; and David Zwergel, Midwest ISO. The concept of requiring annual testing and certification of operators’ EMS and SCADA systems was initiated by a member of the Electric System Working Group. Also, see comments by John Synesiou, IMS Corporation.
31 The need to strengthen reactive power and voltage control practices was the subject of several comments. See comments by Claudio Canizares, University of Waterloo (Ontario); David Cook, North American Electric Reliability Council; F.J. Delea, J.A. Casazza, G.C. Loehr, and R.M. Malizewski, Power Engineers Seeking Truth; Stephen Fairfax, MTechnology, Inc.; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; Shinichi Imai and Toshihiko Furuya, Tokyo Electric Power Company; Marija Ilic, Carnegie Mellon University; Frank Macedo, Hydro One Networks, Inc.; and Tom Wiedman, Consolidated Edison. Several commenters addressed issues related to the production of reactive power by producers of power for sale in wholesale markets. See comments by Anthony J. Alexander, FirstEnergy Corporation; K.K. Das, PowerGrid Corporation of India, Limited; F.J. Delea, J.A. Casazza, G.C. Loehr, and R.M. Malizewski, Power Engineers Seeking Truth; Stephen Fairfax, MTechnology, Inc.; and Carson Taylor, Bonneville Power Administration.
32 See pages 107-108.
33 U.S. Federal Energy Regulatory Commission, 105 FERC ¶ 61,372, December 24, 2003.
34 The need to improve the quality of system modeling data and data exchange practices received extensive comment. See comments from Michael Calimano, New York ISO; David Cook, North American Electric Reliability Council; Robert Cummings, North American Electric Reliability Council; F.J. Delea, J.A. Casazza, G.C. Loehr, and R.M. Malizewski, Power Engineers Seeking Truth; Mark Fidrych, Western Area Power Administration; Jack Kerr, Dominion Virginia Power; Raymond K. Kershaw, International Transmission Company; Frank Macedo, Hydro One Networks, Inc.; Vickie Van Zandt, Bonneville Power Administration; Don Watkins, Bonneville Power Administration; and David Zwergel, Midwest ISO.
35 Several commenters addressed the subject of NERC’s standards in various respects, including Anthony J. Alexander, FirstEnergy Corporation; Carl Burrell, IMO Ontario; David Cook, North American Electric Reliability Council; F.J. Delea, J.A. Casazza, G.C. Loehr, and R.M. Malizewski, Power Engineers Seeking Truth; Charles J. Durkin, Northeast Power Coordinating Council; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; Jack Kerr, Dominion Virginia Power; James K. Robinson, PPL; Mayer Sasson, New York State Reliability Council; and Kim Warren, IMO Ontario.
36 See Initial Report by the New York State Department of Public Service on the August 14, 2003 Blackout (2004), and comments by Mayer Sasson, New York State Reliability Council.
37 F.J. Delea, J.A. Casazza, G.C. Loehr, and R.M. Malizewski, “The Need for Strong Planning and Operating Criteria to Assure a Reliable Bulk Power Supply System,” January 29, 2004.
38 The need to tighten communications protocols and improve communications systems was cited by several commenters. See comments by Anthony J. Alexander, FirstEnergy Corporation; David Barrie, Hydro One Networks, Inc.; Carl Burrell, IMO Ontario; Michael Calimano, New York ISO; David Cook, North American Electric Reliability Council; Mark Fidrych, Western Area Power Administration; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; Jack Kerr, Dominion Virginia Power; William Museler, New York ISO; John Synesiou, IMS Corporation; Vickie Van Zandt, Bonneville Power Administration; Don Watkins, Bonneville Power Administration; and Tom Wiedman, Consolidated Edison.
39 See comments by Tapani O. Seppa, The Valley Group, Inc.
40 Several commenters noted the need for more systematic use of time-synchronized data recorders. In particular, see David Cook, North American Electric Reliability Council; Ajay Garg and Michael Penstone, Hydro One Networks, Inc.; and Robert Stewart, PG&E.

41 The importance of learning from the system restoration experience associated with the August 14 blackout was stressed by Linda Church Ciocci, National Hydropower Association; David Cook, North American Electric Reliability Council; Frank Delea; Bill Eggertson, Canadian Association for Renewable Energies; Stephen Lee, Electric Power Research Institute; and Kim Warren, IMO Ontario.
42 The need to clarify the criteria for identifying critical facilities and improving dissemination of updated information about unplanned outages was cited by Anthony J. Alexander, FirstEnergy Corporation; and Raymond K. Kershaw, International Transmission Company.
43 The need to streamline the TLR process and limit its use to non-urgent situations was discussed by several commenters, including Anthony J. Alexander, FirstEnergy Corporation; Carl Burrell, IMO Ontario; Jack Kerr, Dominion Virginia Power; Raymond K. Kershaw, International Transmission Company; and Ed Schwerdt, Northeast Power Coordinating Council.
44 NERC Standards at www.nerc.com (Urgent Action Standard 1200, Cyber Security; Reliability Standard 1300, Cyber Security) and joint DOE/PCIB standards guidance at www.ea.doe.gov/pdfs/21stepsbooklet.pdf (“21 Steps to Improve Cyber Security of SCADA Networks”).
45 For example: “21 Steps to Improve Cyber Security of SCADA Networks,” http://www.ea.doe.gov/pdfs/21stepsbooklet.pdf.
46 Canadian reference: http://www.dfait-maeci.gc.ca/anti-terrorism/actionplan-en.asp; U.S. reference: http://www.whitehouse.gov/news/releases/2001/12/20011212-6.html.
47 A “black box” technology is any device, sometimes highly important, whose workings are not understood by or accessible to its user.
48 DOE Form 417 is an example of an existing, but underutilized, private/public sector information sharing mechanism.
