Synopsis V2.2 Single Event Effects Testing of the Intel Pentium III (P3) and AMD K7 Microprocessors

Jim Howard (1), Ken LaBel (2), Marty Carts (3), Ron Stattel (3) and Charlie Rogers (3)
1. Jackson and Tull Chartered Engineers, Washington DC 20018
2. NASA GSFC, Greenbelt, MD 20771
3. Raytheon ITSS, Greenbelt, MD 20771

Test Dates: December 18-20, 2000. Report Date: February 16, 2001.

Introduction

As part of the Remote Exploration and Experimentation Project, work was funded for the “Radiation Evaluation of the INTEL Pentium III and Merced Processors and Their Associated Bridge Chips.” As a next step in the completion of this work, Intel Pentium III and AMD K7 processors were re-tested at the proton facility at the Indiana University Cyclotron Facility (IUCF). There were three main objectives for this second proton test. The first was to investigate single-event-induced functional interrupts (SEFI) and single event upsets (SEU) more thoroughly, because the initial proton test did not adequately distinguish between SEFIs and SEUs. The second objective was to investigate the event rate for the various components of the processors under a variety of software and hardware configurations. The third objective was to continue to look at state-of-the-art processors (higher clock speeds of both P3 and K7) and to investigate how the event rates depend on the Device Under Test (DUT) clock speed. The remainder of this report details the test process, methodology and results from this second proton test.

Devices Tested

Pentium III devices with rated clock speeds of 750, 850 and 933 MHz and K7 microprocessors with rated clock speeds of 600, 650, 900 and 1000 MHz were tested. The Pentium III devices were manufactured by Intel and the K7 devices were manufactured by AMD. All devices were characterized prior to exposure. A listing of all devices used in this testing is given in Table I below.

Test Facility

Facility: Indiana University Cyclotron Facility
Proton Energy: 189.9 MeV incident on DUT structure
Flux: 2.0 x 10^6 to 1.5 x 10^8 protons/cm^2/s

Test Methods

Temperature: The test was conducted at room temperature. Intel P3 junction temperature was monitored using an on-die diode.

Test Hardware: The system hardware consists of two subsystems, the test controller and the DUT computer, in addition to the system cabling. These subsystems are described below. Following that is a description of the DUT processors (the P3 and the K7). Figure 1 illustrates the overall test configuration.

TABLE I
Device Under Test (DUT) Table

Device       Vendor  Speed    DUT Number  Package Markings
Pentium III  Intel   750 MHz  750_1       750/256/100/1.65V S1; 90260050-0092 MALAY; imc '99 SL456
Pentium III  Intel   850 MHz  850_1       850/256/100/1.65V S1; 10280400-0071 Philippines; imc '99 SL47M
Pentium III  Intel   850 MHz  850_2       850/256/100/1.65V S1; 10280400-0293 Philippines; imc '99 SL47M
Pentium III  Intel   933 MHz  933_1       933/256/133/1.7V S1; 00280415-0224 COSTA RICA; imc '99 SL47Q
K7           AMD     600 MHz  600_1       AMD-K7600MTR51B C; 219949147583
K7           AMD     650 MHz  650_1       AMD-K7650MTR51B A; 230015009833
K7           AMD     650 MHz  650_2       AMD-K7650MTR51B A; 210017540094
K7           AMD     900 MHz  900_1K      AMD-K7900MNR53B A; 210036542751
K7           AMD     1 GHz    1000_1      AMD-K7100MNR53B A; 710026014044
K7           AMD     1 GHz    1000_2      AMD-K7100MNR53B A; 710026019051

Test Controller (PXI) Subsystem The test controller hardware is based on the PXI specification. A description of the test controller, or PXI, subsystem and the cabling between the subsystems follows. The PXI subsystem, as shown in Figure 2, consists of the components located within the PXI chassis, the intrasystem cabling and the user interface. Within the PXI chassis resides an embedded controller (running Win98, Labview (LV) environment, and a custom LV application), a signal switch matrix, and two digital multimeters (DMMs) in the voltage measurement mode. The switch matrix provides two functions: The multiplexing of analog signals to one of the DMMs, and contact closures (pulling signal levels to ground). The other DMM is dedicated to monitoring one specific analog value and measures that without regular periodic switching so that it may measure more frequently with less delay. The cabling from the PXI subsystem to the user facility includes the keyboard/monitor/mouse extension (a CAT5 cable based extender from Cybex, Inc.), a network connection for data file access, and power, in the case that the PXI system requires a hard power cycle. The keyboard/monitor/mouse is located in the user facility. The system cabling includes cabling between the PXI and the DUT subsystems, cabling between each subsystem and the user facility, and intra-subsystem cabling. The intra-PXI cabling includes only the cables from the switch matrix to the adjacent DMM modules.

Figure 1. Block diagram illustrating the overall test configuration.

Most of the cabling between the PXI and the DUT subsystems leaves the PXI subsystem from the switch matrix (described further below). One exception is a serial (RS-232) cable bringing telemetry from the DUT computer to the PXI controller. The controller resides in the test chamber, though removed from the immediate area of the DUT computer by about 15’ and a corner of concrete in order to reduce exposure of the PXI subsystem to secondary neutrons.

DUT Computer Subsystem

The DUT computer subsystem consists of the components immediately connected to the motherboard without cabling, the components located nearby (e.g. disk drives), the intrasystem cabling, and the user interface. Directly attached to the commercial motherboard are the modified DUT processor (described later), a RAM module (DIMM), the video card, a DUT processor extender card and a PCI-bus memory board. The memory board was added on the hypothesis that even the simple RS-232 telemetry was subject to SEE. This board and the serial port each receive an identical copy of the telemetry stream; the contents of the memory board survive a soft reset of the DUT computer and can be read out later through the RS-232 port. This has proven very useful. The DUT computer motherboard resides in the test chamber, positioned directly in front of the proton beam but so that only the DUT processor is irradiated.
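Because the PCI memory board and the serial port are meant to hold identical copies of the telemetry, one simple way to use the two copies is to look for the first point at which they disagree. The following is a minimal sketch of that idea only; the function name and the byte-string representation of the streams are assumptions for illustration and are not taken from the test software.

```python
def first_divergence(serial_stream: bytes, pci_stream: bytes):
    """Return the index of the first differing byte, or None if identical.

    Hypothetical helper: both arguments are assumed to be raw captures of
    the same telemetry stream taken from the two storage paths.
    """
    for index, (serial_byte, pci_byte) in enumerate(zip(serial_stream, pci_stream)):
        if serial_byte != pci_byte:
            return index
    # One capture may simply be shorter (e.g. the serial link stopped first).
    if len(serial_stream) != len(pci_stream):
        return min(len(serial_stream), len(pci_stream))
    return None


if __name__ == "__main__":
    # Illustrative data only.
    print(first_divergence(b"KEEPALIVE 01\n", b"KEEPALIVE 01\nERR"))
```

A truncated serial capture with an intact PCI copy would point toward the serial path rather than the processor, which is the kind of distinction the duplicate telemetry is intended to support.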

Figure 2. Block diagram of the PXI subsystem.

Located nearby (~6 feet), but in what is expected to be a greatly reduced secondary neutron environment, are a standard PC ATX power supply (PS), a floppy and/or hard disk drive, and a Cybex keyboard/monitor extension. The DUT processor extender card is inserted between the DUT processor and the motherboard, extending the 242 DUT processor card contacts an additional 1.5” above the motherboard. This is done for two reasons. First, a clear line of sight for the proton beam is established (connectors and the system RAM are avoided). Second, the extension provides the best opportunity for monitoring DUT currents. The extender board is modified to insert a low resistance in series with the power traces as shown in the schematic in Figure 3. The schematic of the voltage and current sampling is duplicated three times on the extender card. Also shown is the P3 temperature sensing diode schematic. Cabling is added to monitor the developed voltages. A photograph of the modified extender card can be seen in Figure 4. Intra-DUT subsystem cabling includes the disk drive cables, an extension cable for the ATX PS including a tap for test controller on/off control, and the keyboard/monitor/mouse extension (a CAT5 cable based extender from Cybex, Inc.). The motherboard is modified to allow connection to two controlling signals, both momentary contact closures from the PXI switch matrix: motherboard power on/off (MotherPonoff), which is intended to be controlled by a desktop computer case front panel switch, and motherboard soft reset (MotherSR), which is sometimes implemented on desktop computers.

Figure 3. Schematic diagram for the DUT extender card. (Labels in the schematic: one of three supply monitors, with a sense resistance R between the motherboard supply and the DUT module and both sides routed to the PXI; Vcc_core: R = 1 mΩ; Vcc_L2: R = 2 mΩ; Vtt: R = 10 mΩ; on-die temperature sensing diode biased at 100 µA, with VD = ƒ(temp) routed to the PXI.)

Figure 4. Photograph showing the Pentium III extender card.

ATX PS on/off state is normally controlled by a constant signal from the motherboard (the ATX PS supplies a standby +5V to power such motherboard functions). This signal (PS_ON#) is, approximately, a latched toggle of the front panel signal, MotherPonoff. As modified, MotherPonoff still controls power-down functions within the motherboard, but its latched/toggled version is disconnected from the ATX power supply’s PS_ON# input so that this input can be controlled directly from the PXI. The ATX PS AC power is extended back to the user facility. The DUT software periodically reports to the PXI through the motherboard’s serial port via a null modem cable. The PCI memory board is a PCI plug-in card that makes memory available on the PCI bus. During testing, the DUT software writes the same data to this PCI memory as it writes to the serial port. Unlike the main memory, the contents of this memory persist through soft resets. One function of the DUT software is to read the contents of the PCI memory board and write it to the serial port. A comparison of the serial port telemetry and the PCI board telemetry allows some insight into the SEE failure locus.

InterSystem Cabling

Three previously described digital control signals leave the PXI to the DUT (subsystems) via shielded cable. Four twisted shielded pairs carry current/voltage samples from the DUT processor extender card to the PXI subsystem. The length of this cabling is about 15’.

Intel Pentium III

The DUT processor is a modification of a standard Pentium III (P3) for the 242-contact slot connector (SC242, also called “”) module. (The P3 is available both in this form and in a 370-pin zero insertion force socket (PGA370) form. The SC242 form is chosen for packaging/beam-access considerations.) An unmodified SC242 module, shown in Figure 5, has four discrete parts sandwiched together. In order of arrangement from “back” to “front”, they are a plastic backside cover, the printed circuit board (PCB), the heatsink, and a fan/front-side cover. The processor die is mounted top-side down onto an organic land grid array (OLGA), a small (1” x 1”) PCB (with die mounted directly to it and substrate exposed), which is in turn mounted to the PCB. The PCB also carries power distribution traces, high frequency and bulk power supply bypass capacitors, microprocessor identification (ID) and voltage identification jumpers. Level 2 cache memory (L2) is incorporated on-die for all P3 processors involved in this test (in some other versions of SC242 module processors, L2 resides within plastic packaged ICs mounted to the PCB).

Figure 5. Photographs of Pentium III SC242 processor showing fan side and front side.

Each DUT processor module is disassembled and reassembled with the heatsink/fan assembly translated to a position above the DUT to improve beam access. A thermal transfer plate (“heat pipe”) provides thermal conductivity from the DUT to the heatsink/fan assembly. Thermal pad material, rather than thermal grease, was originally used to reduce thermal resistance between the thermal plate and both the DUT and the heatsink, in order to avoid having to deal with activated lithium or silicon greases. It was found, however, that dry contact with a 1 mil copper sheet provides better heat conductivity than the thermal pad material. Since electrical resistance is not an issue, the DUTs were operated with the copper sheet. This structure is shown in Figure 6.

Figure 6. Photograph showing the modified Pentium III processor card.

Signals that are controlled by the PXI subsystem, as described above, or by the user from the user facility are:

Name               Destination                         Description
PS_ON#             ATX power supply                    Hold low (0 V) for PS on; Open = High = Off
MotherPonoff       Motherboard power switch connector  Pulse low (0 V) to toggle power on and off
MotherSR           Motherboard reset switch connector  Pulse low (0 V) to initiate reset
Command/Telemetry  COM1                                RS-232 carrying commands to the DUT subsystem (this also carries Telemetry to the PXI subsystem)
Keyboard           PS-2 keyboard port

Signals that are monitored by the PXI or directly by the users in the user facility are:

Name                     Source          Description
I_Vcc_core, Vcc_core     Extender board  Voltage samples of the DUT core current and voltage. Twisted shielded pair (TSP): pair is both sides of the 0.001 ohm current sampling resistance; shield is ground.
I_Vcc_L2, Vcc_L2         Extender board  Voltage samples of the DUT Level 2 cache current and voltage. TSP to both sides of the 0.010 ohm current sampling resistance; shield is ground.
I_Vtt, Vtt               Extender board  Voltage samples of the DUT termination voltage source’s current and voltage. TSP to both sides of the 0.010 ohm current sampling resistance; shield is ground.
V_temp                   Extender board  Voltage sample of the on-die temperature sensing diode. TSP.
Telemetry/Command        COM1            RS-232 carrying telemetry from the DUT subsystem (this also carries Command information from the PXI subsystem).
Telemetry and OS output  VGA card        Video carrying telemetry, and OS/BIOS boot output, to the user facility.
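Each current monitor listed above is a pair of voltage samples taken on either side of a known sampling resistance, so the supply current follows from Ohm's law. The sketch below illustrates that conversion using the resistance values quoted in the signal table; the function name, argument names and example voltages are assumptions for illustration, not measured data or part of the PXI application.

```python
# Sampling resistances in ohms, as quoted in the monitored-signal table.
SENSE_OHMS = {
    "Vcc_core": 0.001,
    "Vcc_L2": 0.010,
    "Vtt": 0.010,
}


def supply_current(rail, v_supply_side, v_dut_side):
    """Return the rail current in amperes from the two sampled voltages.

    Hypothetical helper: v_supply_side and v_dut_side are the voltages on
    the motherboard side and the DUT side of the sampling resistance.
    """
    return (v_supply_side - v_dut_side) / SENSE_OHMS[rail]


if __name__ == "__main__":
    # Illustrative voltages only: a 10 mV drop across 0.001 ohm is about 10 A.
    print(supply_current("Vcc_core", 1.660, 1.650))
```

The small resistance on the core rail keeps the voltage drop low at high current, at the cost of a small signal that must be measured differentially.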

AMD (K7)

The K7 DUT processor is a modification of a standard AMD K7. Like the Pentium III, it is available in an SC242 form, but the signal assignments and much of the functionality are entirely different, to the extent that the K7 requires the use of a different motherboard, different , and different DUT extender board. The construction of the K7 DUT processor module is similar to that of the Pentium III. The physical form of these AMD SC242 processors is called “Slot A”. In all instances of the SC242 form of the K7 processor, the L2 cache is located off-die on a separate chip. A DUT extender board, which had power group bussing for the P3, is modified to un-bus the P3 power groups and re-bus them appropriately for the K7 with current sampling capability. Each DUT processor module was disassembled and reassembled with a thermal plate to allow uniform beam access to the die, and the description of thermal issues in the P3 section applies to the K7 as well. This structure is shown in Figure 7. Signals that are controlled by the PXI subsystem are the same as for the Pentium III (PS_ON#, MotherPonoff, and MotherSR). The AMD K7 uses one less supply than the Intel P3 and does not have an on-die temperature sensing diode, so only two of the four TSP cables are used with the K7 DUTs.

Figure 7. Photograph showing the AMD K7 processor card, including heatsink modifications.

K7 signals monitored by the PXI or directly by the users in the user facility are:

Name                     Source          Description
I_Vcc_core, Vcc_core     Extender board  Voltage samples of the DUT core current and voltage. Twisted shielded pair (TSP) to both sides of the 0.001 ohm current sampling resistance; shield is ground.
I_Vcc_SRAM, Vcc_SRAM     Extender board  Voltage samples of the DUT SRAM current and voltage. TSP to both sides of the 0.010 ohm current sampling resistance; shield is ground.
Telemetry/Command        COM1            RS-232 carrying telemetry from the DUT subsystem (this also carries Command information from the PXI subsystem).
Telemetry and OS output  VGA card        Video carrying telemetry, and OS/BIOS boot output, to the user facility.

Test Software

The Pharlap embedded operating system is used by the DUT computer to execute the test code. The test code is written in Microsoft Visual C++ 6.0 professional edition with Pharlap add-ins and Pharlap 386 assembly. Instructions beyond 386 are added with macros. The Pharlap add-ins enable remote debugging of the code through the serial ports. The software executes with two or more threads. The main thread is executed upon booting the system from the floppy disk. For the task switching test (D), eight other threads are launched to test the switching between threads while the main thread goes into sleep mode checking for test completion every 0.1 seconds. When the test completes the eight iterations, the threads are terminated. The main thread displays a menu and waits for another test to be selected. For all of the other tests, a second thread is launched to run the test software and the main thread goes into sleep as in test D. When the test completes the second thread is terminated. The main thread displays a menu of all available tests to run as outlined below. The test software sends a keep alive to the PXI and the screen every second. If errors occur, the test software accumulates errors for one second and then dumps error codes to the PXI system and the screen.

Boot process:
• The Pharlap operating system and monitor are loaded from the floppy/hard disk.
• The test software (p3test.exe) is loaded from the floppy/hard disk.
• The test software is executed from random access memory.
• The test software displays all available tests to the PXI and the test computer's monitor.
• The program waits for input from the PXI or the test computer's keyboard.
• The selected test is executed until stopped by an escape from the PXI or the test computer's keyboard.

Description of the eight tests currently available in p3test.exe:

A: This test repeats the following steps until stopped. Errors are reported to the PXI and the screen once per second when operating at 660 MHz. The instruction timing is used to measure each second.
a> All of the registers are initialized from memory at checkData+rCode+20, where rCode is 0:ebx 4:ecx 8:edx 0xC:ebp 0x10:edi. Currently the memory is set to 0AAAAAAAAh.
b> The register r is compared with the memory at checkData+rCode.
c> If a miscompare results, an error is reported to the error buffer.
d> An attempt is made to set the register to the original value and the result of this attempt is stored as the new expected value.
e> Steps b through d are repeated for each register until one second finishes.
f> All errors are reported to the dump RAM, the PXI, and the screen.
g> If the escape code is received from the keyboard or the PXI then exit.
h> Go back to a.
An error consists of the following four dwords:
- rCode to show which register failed: 0:ebx 4:ecx 8:edx 0xC:ebp 0x10:edi
- The contents of the register.
- The expected contents of the register.
- The new contents of the register after attempting to reset the register to its original contents.
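The compare-and-restore logic of test A (steps b through d) and its four-dword error record can be sketched as follows. This is a simplified Python model with simulated registers, not the Pharlap assembly used in p3test.exe; the function names and the restore callback are assumptions introduced for illustration.

```python
PATTERN = 0xAAAAAAAA  # fill value quoted in the test description
R_CODES = {0x00: "ebx", 0x04: "ecx", 0x08: "edx", 0x0C: "ebp", 0x10: "edi"}


def check_registers(registers, expected, restore):
    """Run steps b-d once over every register and return the error records."""
    errors = []
    for r_code, name in R_CODES.items():
        observed = registers[name]
        if observed != expected[name]:
            # Attempt to restore the original value; the result of the attempt
            # becomes the new expected value, as in step d.
            registers[name] = restore(name, PATTERN)
            new_value = registers[name]
            # Record: rCode, observed contents, expected contents, new contents.
            errors.append((r_code, observed, expected[name], new_value))
            expected[name] = new_value
    return errors


def demo():
    """Simulate a single-bit upset in ecx (illustrative, not test data)."""
    registers = {name: PATTERN for name in R_CODES.values()}
    expected = dict(registers)
    registers["ecx"] ^= 0x1
    # The restore callback models a write that succeeds.
    return check_registers(registers, expected, lambda name, value: value)


if __name__ == "__main__":
    for record in demo():
        print([hex(word) for word in record])
```

Storing the result of the restore attempt as the new expected value means a stuck bit is reported once rather than on every subsequent pass.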

B: This test repeats the following steps until stopped. Errors are reported to the PXI and the screen once per second when operating at 660 MHz. The instruction timing is used to measure each second.
a> The arguments and expected results are loaded into RAM at checkData.
b> The function f is executed and the result compared to the expected result.
c> If a miscompare results, an error is reported to the error buffer.
d> Steps b and c are repeated for each function until one second finishes.
e> All errors are reported to the dump RAM, the PXI, and the screen.
f> If the escape code is received from the keyboard or the PXI then exit.
g> Go back to a.
An error consists of three quadwords:
- The first is the function identifier: 0:fadd 24:fsub 48:fmul 72:fdiv 96:fsqrt
- The second is the result of the operation.
- The third is the expected result.

C: This test loads 100000 locations of memory with an incrementing pattern and then compares the block to the incrementing pattern. It repeats this function until an escape is sent from the keyboard or the PXI. Errors are reported immediately to the PXI and the screen. A keep-alive is sent to the screen and the PXI every second.
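One pass of test C can be modeled as a write of the incrementing pattern followed by a compare. The sketch below is a Python illustration only; the corrupt hook is a hypothetical stand-in for a radiation-induced upset between the write and the compare, and the error tuple layout is an assumption rather than the format used by the test software.

```python
def memory_pattern_pass(size=100000, corrupt=None):
    """Write an incrementing pattern, then compare; return any miscompares.

    Each miscompare is reported as (location, observed value, expected value).
    """
    memory = list(range(size))  # load the incrementing pattern
    if corrupt is not None:
        corrupt(memory)  # hypothetical hook simulating an upset
    return [
        (location, value, location)
        for location, value in enumerate(memory)
        if value != location
    ]


if __name__ == "__main__":
    def flip_one_bit(memory):
        memory[5] ^= 0x1  # illustrative single-bit flip

    print(memory_pattern_pass(corrupt=flip_one_bit))
```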

D: This test launches seven tasks. Each task loops through incrementing a memory location from 0 to 11 counts. When all of the tasks complete, the memory locations are checked to see if they equal 11. It repeats this function until an escape is sent from the keyboard or the PXI. Errors are reported immediately to the PXI and the screen. A keep-alive is sent to the screen and the PXI every second.
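The structure of one iteration of test D can be illustrated with ordinary threads. This is a sketch using Python's threading module rather than Pharlap tasks; the function name and return convention are assumptions made for illustration.

```python
import threading


def run_task_switch_pass(num_tasks=7, final_count=11):
    """Launch the tasks, wait for them, and return indices of bad counters."""
    counters = [0] * num_tasks

    def task(index):
        # Each task owns one memory location and increments it to final_count.
        for _ in range(final_count):
            counters[index] += 1

    threads = [threading.Thread(target=task, args=(i,)) for i in range(num_tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # A task switch error would show up as a counter that is not final_count.
    return [i for i, count in enumerate(counters) if count != final_count]


if __name__ == "__main__":
    print(run_task_switch_pass())
```

Because each task writes only to its own location, any mismatch after the join points to a fault in execution or state handling rather than to a race between tasks.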

E: This test has 16K of instructions run in order. The instruction sequence is repeatedly incrementing the eax register from 0 to 4; checking after each increment to see that it has done so. If an error occurs the test is aborted and an error message is reported to the PXI and the screen.

F: This test is similar to test B except that the function tested is as follows: result=cos(cos(cos(sin(sin(sin(sqrt(sqrt(sqrt(sqrt(sqrt(sqrt(a*b)))))))))))) where a=0.123456789 and b=0.987654321
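The nested function of test F, and the distinction drawn later in Table II between results that differ by one bit and results that differ by several bits, can be sketched as below. The Python math library stands in for the processor's floating point instructions, so the computed value is not guaranteed to be bit-identical to the x87 result; both function names are assumptions for illustration.

```python
import math
import struct


def chained_function(a=0.123456789, b=0.987654321):
    """Evaluate cos(cos(cos(sin(sin(sin(sqrt^6(a*b))))))) as in test F."""
    x = a * b
    for _ in range(6):
        x = math.sqrt(x)
    for _ in range(3):
        x = math.sin(x)
    for _ in range(3):
        x = math.cos(x)
    return x


def differing_bits(result, expected):
    """Count the bits that differ between two IEEE 754 doubles."""
    result_bits = struct.unpack("<Q", struct.pack("<d", result))[0]
    expected_bits = struct.unpack("<Q", struct.pack("<d", expected))[0]
    return bin(result_bits ^ expected_bits).count("1")


if __name__ == "__main__":
    value = chained_function()
    print(value, differing_bits(value, value))
```

A long dependent chain like this one means an upset at any intermediate step propagates to the final result, which is then compared against the expected value.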

G: This test repeats the following steps until stopped. Errors are reported to the dump RAM, the PXI, and the screen once per second when operating at 650 MHz. The instruction timing is used to measure each second. The following functions are tested: 0:pxor 1:por 2:pmul 3:pmulh 4:padds 5:addps 6:divps 7:mulps. For the AMD only functions 0 through 3 are tested. The test flow is as follows:
a> The arguments and expected results are loaded into RAM at checkData.
b> The function f is executed and the result compared to the expected result.
c> If a miscompare results, an error is reported to the error buffer.
d> Steps b and c are repeated for each function until one second finishes.
e> All errors are reported to the dump RAM, the PXI, and the screen.
f> If the escape code is received from the keyboard or the PXI then exit.
g> Go back to a.

An error consists of three quadwords or five quadwords depending on the function:
- The first is the function identifier: 0:pxor 1:por 2:pmul 3:pmulh 4:padds 5:addps 6:divps 7:mulps
For functions 0 through 4:
- The second is the result of the operation.
- The third is the expected result.
For functions 5 through 7:
- The second and third are the result of the operation.
- The fourth and fifth are the expected result.

L: This test loops through tests A through G executing each for two seconds.

Notes:
(a) The cache can be turned off or on by pressing '@' from the test menu. The cache state switches between three settings:
- all caches off
- level 1 cache on and level 2 cache off (available for P3 only)
- both level 1 and level 2 caches on
(b) For Test C, the cache must be turned on.
(c) The code actively handles exceptions. If an exception occurs during the testing, the exception and its associated information are sent to telemetry and the process is restarted.

Table II, below, shows possible errors reported for each test and possible causes. By correlating the expected susceptibility of the components within each test to the reported errors, we can determine a statistical probability for each of the possible causes, shown on the right side of the table.

TABLE II
Pentium Test Error Interpretation Table

Test A
  Error reported: Miscompare report showing that the data is actually different
  Possible causes: Specified register bits flipped; Alias/Scratch register bits flipped; Arithmetic unit bits flipped; Data cache bits flipped
  Error reported: Miscompare report showing that the data is actually the same
  Possible causes: Instruction cache bits flipped; Arithmetic unit bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped

Test B
  Error reported: Miscompare report showing that the data is actually different by only one bit
  Possible causes: Data cache bits flipped; Floating point register bits flipped
  Error reported: Miscompare report showing that the data is different by several bits
  Possible causes: Data cache bits flipped; Floating point register bits flipped; FPU bits flipped; Instruction cache bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped
  Error reported: Miscompare report showing that the data is actually the same
  Possible causes: Instruction cache bits flipped; FPU bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped

Test C
  Error reported: Memory Error
  Possible causes: Data cache bits flipped; Register bits flipped; Alias/Scratch register bits flipped; Instruction cache bits flipped; Arithmetic unit bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped
  Error reported: Recurring Memory Error
  Possible causes: Register bits flipped; Alias/Scratch register bits flipped; Instruction cache bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped

Test D
  Error reported: Task Switch Error
  Possible causes: Instruction cache bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped; Data cache bits flipped; Register bits flipped; Alias/Scratch register bits flipped

Test E
  Error reported: Cache Error
  Possible causes: Instruction cache bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped; Register EAX bits flipped; Alias/Scratch register bits flipped

Test F
  Error reported: Miscompare report showing that the data is actually different by only one bit
  Possible causes: Data cache bits flipped; Floating point register bits flipped
  Error reported: Miscompare report showing that the data is different by several bits
  Possible causes: Data cache bits flipped; Floating point register bits flipped; FPU bits flipped; Instruction cache bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped
  Error reported: Miscompare report showing that the data is actually the same
  Possible causes: Instruction cache bits flipped; FPU bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped

Test G
  Error reported: Miscompare report showing that the data is actually different by only one bit
  Possible causes: Data cache bits flipped; Floating point register bits flipped
  Error reported: Miscompare report showing that the data is different by several bits
  Possible causes: Data cache bits flipped; Floating point register bits flipped; FPU bits flipped; Instruction cache bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped
  Error reported: Miscompare report showing that the data is actually the same
  Possible causes: Instruction cache bits flipped; FPU bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped

All tests
  Error reported: General Protection Fault
  Possible causes: Instruction cache bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped; Instruction pointer bits flipped; Arithmetic unit bits flipped; Data cache bits flipped (Page Table corruption)
  Error reported: Register reset to FFFFFFFF
  Possible causes: Register bits flipped; Alias/Scratch register bits flipped
  Error reported: General protection fault where CS:EIP = 18:FFFFFFFF
  Possible causes: Instruction pointer reset to FFFFFFFF
  Error reported: Illegal instruction
  Possible causes: Instruction cache bits flipped; Instruction pool (ROB)/Centralized Scheduler bits flipped; Instruction pointer bits flipped
  Other errors reported (no causes listed): Divide-by-zero exception; Debug Exception; Non-maskable interrupt; One-byte interrupt (INT3); Interrupt on overflow; BOUND interrupt; Device not available exception; Double fault; Coprocessor segment overrun; Invalid TSS; Segment not present exception; Stack fault; Page Fault; Coprocessor error

Test Methodology

Single Event Effects Test Process

The SEE test process must include methods to test for all aspects of single event effects (latchup, functional interrupts, upsets, etc.). As a number of these effects are sensitive to the software being run and may be sensitive to numerous other conditions, detailed control of the DUT is required. To this end, an extensive operating system would serve no purpose other than to add software overhead that cannot be controlled. Therefore, the system is booted into a minimal operating system (Pharlap) and a test executive is run that allows testing of the DUT at a very low level. A flow diagram of the main testing process is shown in Figure 8. The main part of this flow process, after the DUT is placed in a known operating state, is waiting for something to happen and then dealing with it. The flow describing this process is shown in Figure 9. Three general categories of events are expected: functional interrupts, system resets (radiation induced), and non-fatal errors (some error is produced but it does not immediately induce a functional interrupt or system reset). The remainder of the flow for all of these conditions shows the steps required to gain information about what exactly happened and to recover the DUT to a known state. The “Reboot Testing” called in this routine simply tries a soft reboot up to three times. If at that point the system has not recovered, a hard boot is then done. If a soft reboot is successful, then a “Memory Grab” routine is executed that recovers the telemetry stored in the Memory Card.

1. PXI Logoff (End of Run): Previous run is completed.
2. PXI Logon (Start of Run): Logging is started and a new test run begins.
3. Hard Boot (Pharlap/TEST EXEC “boot”): The test processor is hard-booted (so that a known startup condition exists), and the minimal OS and test software are started.
4. Test Exec Choice Made: Choice is made as to which test routine is to be run.
5. PXI Detects Telemetry (All Status OK, Report and Log): Test hardware detects that the P3 is up and running the test software correctly. This is logged and the operator is informed that the test is ready to go.
6. Beam On: Proton beam is turned on.
7. Something Happens Routine: Software loop executes while we wait for something to happen. This is then handled in the “Something Happens Routine”.
8. Beam Off: When the run is completed, the beam is turned off.
9. Dosimetry Collected: Beam dosimetry is collected and logged with the test results.

Figure 8. Flow diagram and description for main testing loop.
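The “Reboot Testing” recovery described in the Single Event Effects Test Process text (up to three soft reboots, then a hard boot, with a “Memory Grab” after a successful soft reboot) can be sketched as below. This is an illustrative Python outline, not the LabVIEW application; the three callables and the returned tuple are assumptions standing in for the PXI actions.

```python
def recover_dut(soft_reboot, hard_boot, memory_grab, max_soft_attempts=3):
    """Try soft reboots first; fall back to a hard boot if none succeed.

    soft_reboot() is assumed to return True when the DUT comes back up,
    hard_boot() power-cycles the DUT, and memory_grab() returns the
    telemetry recovered from the PCI memory card.
    """
    for attempt in range(1, max_soft_attempts + 1):
        if soft_reboot():
            # The memory card contents survive a soft reset, so read them out.
            return ("soft", attempt, memory_grab())
    hard_boot()
    return ("hard", max_soft_attempts, None)


if __name__ == "__main__":
    # Simulated DUT that recovers on the second soft reboot (illustrative).
    outcomes = iter([False, True])
    print(recover_dut(lambda: next(outcomes), lambda: None, lambda: b"telemetry"))
```

Trying the soft reboot first preserves the telemetry copy on the memory card, which would not be expected to survive a power cycle.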

Figure 9. Flow process after an event has been detected. (Box labels recoverable from the diagram, by branch. SEFI: hangs up, telemetry stops, PXI reports lock up, beam off, dosimetry collected, Reboot Testing Routine. System Reset: telemetry stops, PXI reports lock up, beam off, dosimetry collected, reboot detected. Non-fatal Error: telemetry reports error, PXI reports and logs error, software resets to previous condition, return to Something Happens Routine. Remaining boxes: Memory Grab Routine, PXI Logoff (End of Run), PXI Logon (Start of Run).)

Analysis Methodology

Because of the large test matrix, the data sets generated from Single Event Testing contain enormous amounts of data. It became necessary to write software to handle these large data sets and analyze the data under all the test conditions. A database was chosen as the best form for this analysis. The database initially reads all of the test conditions, then allows the user to scan through the telemetry files. The user marks locations within the telemetry files with error annotations that are then stored in the database with the associated test conditions (see Table I above). After each telemetry file has been annotated, the database can be queried via Structured Query Language (SQL) commands to extract only those conditions that are to be analyzed. Depending on the detail of the SQL commands, either the event rate (or cross section) can be calculated directly or the selected data from the database can be exported in tabular format for other software to continue the analysis. Figures 10 and 11 show sample screen shots for the telemetry file analysis and the SQL data extraction process, respectively.
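The kind of query described above can be illustrated with a small in-memory database. The schema, table names, column names and rows below are all hypothetical, since the actual database layout is not given in this report; the sketch only shows how annotated events and run conditions could be combined in SQL to give a cross section (events divided by total fluence).

```python
import sqlite3


def build_demo_db():
    """Create a hypothetical two-table layout with illustrative rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE runs (run_id INTEGER PRIMARY KEY, dut TEXT,
                           cache TEXT, test TEXT, fluence REAL);
        CREATE TABLE events (run_id INTEGER, event_type TEXT);
        """
    )
    db.executemany(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
        [(1, "P3", "off", "A", 1.0e10), (2, "P3", "off", "A", 2.0e10),
         (3, "P3", "on", "A", 1.0e9)],
    )
    db.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, "SEFI"), (2, "SEFI"), (2, "SEFI"), (3, "SEFI")],
    )
    return db


def cross_section(db, dut, cache, event_type):
    """Return events / total fluence (cm^2) for the selected conditions."""
    fluence = db.execute(
        "SELECT SUM(fluence) FROM runs WHERE dut = ? AND cache = ?",
        (dut, cache),
    ).fetchone()[0]
    count = db.execute(
        """SELECT COUNT(*) FROM events e JOIN runs r ON r.run_id = e.run_id
           WHERE r.dut = ? AND r.cache = ? AND e.event_type = ?""",
        (dut, cache, event_type),
    ).fetchone()[0]
    return count / fluence


if __name__ == "__main__":
    print(cross_section(build_demo_db(), "P3", "off", "SEFI"))
```

Summing fluence over all matching runs, including runs with no events, keeps the denominator correct when events are rare.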

Figure 10. Screen shot of the telemetry analysis software.

Figure 11. Screen shot of the SQL data extraction software.

Results

Single Event Latchup

Four different P3 processors (one 750 MHz, two 850 MHz and one 933 MHz) and six K7 processors (one 600 MHz, two 650 MHz, one 900 MHz and two 1000 MHz) were run at eleven different clock speeds (K7 processors were only run at rated speed as the motherboards were not able to be clocked down). During these tests the processors were running one of the tests in the test executive (tests were varied) and were exposed to proton fluences (per run) that varied from 1.5 x 10^5 to 2.0 x 10^10 protons/cm^2. The P3 parts were tested in 195 different conditions (processor speed, cache on/off, and software executing) and the K7 parts were tested in 130 different conditions. In all of the testing, no evidence of latchup was observed (presence of latchup would be indicated by a sharp increase in I_Vcc_core). There were two cases each for the P3 and the K7 parts that required a hard boot (power cycle) to resume normal operations. It should also be mentioned that some K7 processors had a high current transient on the core power supply. While these transients were very high (tens of amperes in some cases), no destructive events were ever observed. In fact, for most transients the processors continued to work through a series of transients before a reset event would occur. It should also be noted that these transients were not observed in all K7 parts, nor even consistently across a processor speed family. An attempt was made to determine if this was a foundry effect, but that information was not available on the part or from AMD.
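The distinction drawn above, between a sharp increase in I_Vcc_core that would indicate latchup and the short-lived transients seen on some K7 parts, can be illustrated with a simple classifier over sampled core current. The thresholds, the notion of a sustained run of samples, and the function name are all assumptions made for this sketch; the report does not state the criteria actually applied.

```python
def classify_current_trace(samples, baseline, jump_amps, sustain_samples):
    """Label a core-current trace as 'sustained', 'transient' or 'nominal'.

    Assumed criteria: a sample is elevated when it exceeds baseline by more
    than jump_amps; an elevated run of at least sustain_samples consecutive
    samples is treated as sustained (latchup-like), a shorter run as transient.
    """
    run_length = 0
    saw_transient = False
    for sample in samples:
        if sample - baseline > jump_amps:
            run_length += 1
            if run_length >= sustain_samples:
                return "sustained"
        else:
            if run_length > 0:
                saw_transient = True
            run_length = 0
    if run_length > 0:
        saw_transient = True
    return "transient" if saw_transient else "nominal"


if __name__ == "__main__":
    # Illustrative traces in amperes, not measured data.
    print(classify_current_trace([10, 10, 40, 10, 10], 10, 5, 3))
    print(classify_current_trace([10, 30, 30, 30, 30], 10, 5, 3))
```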

Single Event Functional Interrupts (SEFI)

Figures 12 and 13 show the SEFI cross sections of the Pentium III and AMD K7, respectively, as a function of the processor speed with various cache states. It is clear from these figures that the cache represents the most sensitive region of the device: enabling it increases the SEFI cross section by approximately a factor of 3 to 10. There is approximately a factor of three difference between the P3 and K7 SEFI cross sections, with the K7 being higher. For the P3, different rated processor speeds are shown with different symbols. The K7 parts were not clocked down, so the data points shown are for the rated processor speed. Figures 12 and 13 also show no dependence of the SEFI cross section on processor speed.

Figure 12. SEFI cross section of the Pentium III processor as a function of the operating clock speed. (Axes: SEFI Cross Section (cm^2) versus Processor Speed (MHz). Legend: L1 & L2 Cache On, L1 Cache Only, Cache Off; 750 MHz P3, 850 MHz P3, 933 MHz P3.)

Figure 13. SEFI cross section of the AMD K7 processor as a function of the operating clock speed. (Axes: SEFI Cross Section (cm^2) versus Processor Speed (MHz). Legend: Cache On, Cache Off.)

Single Event Upsets – Non-SEFI

As pointed out in the Test Methodology section, another category of events is exceptions. These are events that, if not handled in software, would lead to a SEFI. This data is shown in Figure 14. The cross section is similar to the SEFI cross section and again no speed dependence is observed.

Figure 14. Exception cross section of the Pentium III and AMD K7 processors as a function of the operating clock speed. (Axes: Exception Cross Section (cm^2) versus Operating Speed (MHz). Legend: Pentium III Cache On, Pentium III L1 Cache On, Pentium III Cache Off, AMD K7 Cache On, AMD K7 Cache Off.)

The final category of event is upsets. The various tests, A through G, were designed to look for upsets in the registers, caches, floating point and MMX units. The cross section data obtained from these tests for the P3 and K7 processors is shown in Table III. The main item to note is that very few events were actually observed. Collection of this data was problematic, as the SEFI rate was sufficiently high to limit the lengths of the runs. This data had to be collected with the cache off, or the SEFI rate would have been too high to collect any significant data.

TABLE III
Proton SEU (Cache Off) Results

DUT  Test  Number of Upsets  Fluence (protons/cm^2)  Cross Section (cm^2)
P3   A     1                 6.53 x 10^10            1.53 x 10^-11
P3   B     0                 5.3 x 10^10             < 1.89 x 10^-11
P3   C     0                 4.7 x 10^10             < 2.13 x 10^-11
P3   D     1                 7.7 x 10^10             1.3 x 10^-11
P3   F     1                 5.17 x 10^10            1.93 x 10^-11
P3   G     1                 3.98 x 10^10            2.51 x 10^-11
K7   A     0                 5.11 x 10^10            < 1.96 x 10^-11
K7   B     0                 1.4 x 10^10             < 7.14 x 10^-11
K7   C     0                 3.19 x 10^10            < 3.13 x 10^-11
K7   D     1                 3.47 x 10^10            2.88 x 10^-11
K7   F     3                 3.04 x 10^10            3.29 x 10^-11
K7   G     0                 2.85 x 10^10            < 3.51 x 10^-11
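The cross sections in Table III are consistent with dividing the number of upsets by the fluence, with runs that saw no upsets reported as an upper limit of one event over the fluence. The sketch below illustrates that arithmetic for two P3 rows; the function name and the returned tuple are assumptions for illustration, and the zero-event convention is inferred from the table values rather than stated in the text.

```python
def seu_cross_section(upsets, fluence):
    """Return (relation, cross section in cm^2) for one table row.

    With zero observed upsets the result is an upper limit of 1 / fluence,
    which matches the '<' entries in Table III.
    """
    if upsets == 0:
        return ("<", 1.0 / fluence)
    return ("=", upsets / fluence)


if __name__ == "__main__":
    # P3 test A: 1 upset at 6.53 x 10^10 protons/cm^2.
    relation, sigma = seu_cross_section(1, 6.53e10)
    print(relation, f"{sigma:.2e}")
    # P3 test B: 0 upsets at 5.3 x 10^10 protons/cm^2.
    relation, sigma = seu_cross_section(0, 5.3e10)
    print(relation, f"{sigma:.2e}")
```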