Watchdog Timers Keep Computer Failures from Having Catastrophic Results
Total Page:16
File Type:pdf, Size:1020Kb
Watchdog Timers Keep Computer Failures from Having Catastrophic Results Digital computers do only what they're told - or what they think they're told. It's not unheard of for a computer to suddenly latch up or get lost in a program with potentially catastrophic results if it's controlling a critical process. To minimize the damage of a computer failure, good designs include a watchdog timer - a device that monitors host activity and triggers appropriate action if it notices that the computer isn't functioning as expected. These devices come as low-cost ICs or modules and are not only easy to integrate into a system, but it's highly recommended you do so in any control application. Implementation strategies You can interface your system When implementing a watchdog Watchdog timers easily integrate to a watchdog timer in several reset, especially in control with Industrial I/O Control Systems configurations. An easy yet applications, examine the overall effective way to detect a computer impact of the sudden initialization derivative) algorithms and special fault is by programming one or of the I/O (hot start) and loss of data arithmetic functions. Here, a more digital output bits to toggle in RAM. Consider, for example, the watchdog alarm should alert a On/Off continuously, and if that bit simple case where a microcontroller skilled operator to manually control stops at one state longer than a acts as the brain of a toaster oven the situation until the computer designated time period, then the and controls temperature and again comes up to speed. This watchdog timer monitoring the toasting time. If the computer action is the most appropriate signal assumes a computer fault. latches up and is reset by a because it's unlikely you'd want to An easy way to hook up a watchdog watchdog while toasting a melted- shut down the power plant because in a PC-based system is to add a cheese sandwich, the fact that it was of a computer fault. However, if the digital I/0 board and configure one in the process of toasting and the computer is reset with a hot start of its output lines to toggle the elapsed toasting time would likely such as noted above, it might watchdog. be information previously contained initialize variables in the control When any watchdog detects a in RAM and might be difficult to algorithms and lose track of phase, failure, there are three major courses ascertain. Thus, such a design flow and other vital parameters. of action it can take: reset, alarm or should also contain circuitry to Shutdown in response to a shutdown. monitor the temperature to see if the watchdog signal is common for Reset or an attempt to reset is a oven is already hot when the system critical industrial-control common approach in less-than- resets, or use a nonvolatile memory applications where a computer fault critical applications such as running to store timing data. could have unpredictable results, a business application or simple Alarming is generally used and it's generally used in datalogging where a reset can along with other watchdog functions conjunction with the Alarm establish system recovery. In such and is common in applications function. Shutdown implies that a cases where a computer fault is where a human operator must human operator must investigate the recoverable, the fault and the action determine why a computer fault cause of the computer fault and of the watchdog timer are often occurred and manually clear the manually clear it. Note in this mode transparent to the user. On a PC, for alarm. A good example is a that the watchdog should deactivate instance, the watchdog could controller for a hydroelectric power devices being controlled, such as activate the reset line (the same as plant. Regulation of turbine flow motors or heaters, but not hitting <ctrl> <alt> <del>. and other parameters likely requires necessarily the computer. PID (proportional integral Depending on the type of equipment being controlled, it might be wise to have the watchdog initiate a predetermined shutdown sequence to prevent damage to a process or equipment. Which level of response you want—reset, alarm or shutdown—also plays a role in how you implement the watchdog timer. Consider, for example, watchdog ICs that are sometimes included on function cards for industrial PCs. These devices might monitor power- supply voltages and have a toggle input. When they detect a computer failure, they simply strobe a Reset line and don't have any means of activating an alarm or shutting down a process. Further, most watchdog IC designs place the device local to Figure 1 - Take care when decoding the address that activates the watchdog timer the system CPU, and because they so that it does not receive erroneously TRUE signals. don't monitor the control output, they leave much of the output watchdog—rather than reading just timer be a control output from one hardware unsecured. one address—is toggled by a or more data bits of a decoded Watchdogs with relay outputs number of addresses, specifically address (see Figure 1). are better suited to control hex $70, $74, $78, $7C, $F0, $F4, applications because they can be $F8 and $FC. The chances of a In control applications with more implemented local to the control runaway program hitting the one than one I/O rack, you can increase outputs. They can provide fault absolute address that toggles the system integrity by configuring a indication and the correct response watchdog timer and fools it into watchdog on an output of each rack to most failure modes, including thinking everything is functioning and cascading their outputs. This loss of power or control signal. It's properly is relatively low, but approach assures computer control also easy to cascade multiple chances of a false signal to the timer at each control branch, and thus it watchdogs with relay outputs. are greater as you increase the indicates loss of control at any rack However, these devices are number of addresses to which it (see Figure 2). relatively large and may be costly responds. Thus, it's recommended Another consideration when for systems that need only low that the output to the watchdog implementing watchdogs in control levels of system integrity. When implementing a watchdog Control Computer timer, take care when determining the system output it monitors. You Put I/O Lines should decode that signal to at least Protective the level of the control outputs, if Devices In not beyond. Many control systems Series I/O Rack #1 I/O Rack #2 require much less I/O than the computer is capable of, so you WDT #1 WDT #2 might be tempted not to perform I/O I/O Other address decoding to the highest Modules Modules Devices possible level. For instance, assume that a watchdog timer monitors an output bit mapped at binary address X111XX00 (where X indicates a don't care). Because for each one of Figure 2 - When working with multiple I/O racks, it is a good idea to cascade the those undefined bits you can come outputs from a watchdog timer located on each rack. up with an absolute address, the applications is to use an energized unknown address values—in which toggles the timer so that it believes dry contact as the output. One the computer might get stuck in a the system is operating properly. advantage of this approach is that subroutine or roam aimlessly Upon servicing the interrupt, loss of power is detected as a fault through the address space without though, program control returns condition, and Fail-Open is common getting back to the main program. from where it originally left—even in fault reporting because of It's not advisable to place code that if it's the wrong place. Under such constant continuity. toggles the watchdog in an circumstances you could toggle the Finally, note that some interrupt-service routine, for watchdog even though the main designers combine watchdogs with example; even if program control program loop never gets serviced. other means of securing a computer. gets lost in the address space, an To see an example of how to set You can, for instance, make interrupt takes higher priority and up a watchdog properly, consider frequent checksums of memory, sends program control to the the program in Figure 3, which uses look for bus errors or feed back appropriate servicing routine, which the ROM BASIC resident on an control outputs to verify their presence. The bottom line, though, REM MAIN PROGRAM LOOP is to never design a control 10 GOSUB 1000 : REM INITIALIZE I/O computer without a watchdog timer. 20 IF STRT=L THEN GOSUB 2000 : REM CHECK TEMP SWITCH/CONTROL HEATER 30 IF STRT=O THEN XBY(4011H=(XBY(4011H) AND. OBFH) : REM HEAT OFF Software aspects 40 WDT=ABS(WDT-1) : REM COMPLIMENT LAST WATCHDOG STATE 50 IF WDT=1 THEN XBY(4011H)=(XBY(4011H) .OR. 01H) : REM SET WDT=1 When setting up a program with 60 IF WDT=0 THEN XBY(4011H)=(SBY(4011H) .AND.FEH) : REM RESET WDT=0 watchdog-timer capability, keep 70 GOTO 20 several things in mind. Foremost, remember that perhaps the most useless type of watchdog is one REM INITIALIZATION SUBROUTINE implemented completely in 1000 ONEXT1 3000 : REM INITIALIZE START SWITCH INTERRUPT software. It should be apparent 1010 XBY(4011H)=OOH : REM INITIALIZE OUTPUTS even to a casual observer that such 1020 WDT=O : STRT=O : TEMP=O : REM INITIALIZE VARIABLES an approach implies that the 1030 RETURN microprocessor is alive enough to run the code section that indicates a REM HEAT CONTROL computer fault.