Avionics, Navigation, and Instrumentation

Introduction
Gail Chapline

Reconfigurable Redundancy
Paul Sollock

Shuttle Single Event Upset Environment
Patrick O'Neill

Development of Space Shuttle Main Engine Instrumentation
Arthur Hill

Unprecedented Rocket Engine Fault-Sensing System
Tony Fiorucci

Calibration of Navigational Aides Using Global Positioning
John Kiriazes

The Space Shuttle faced many vehicle control challenges during ascent, as did the Orbiter during on-orbit and descent operations. Such challenges required innovations such as fly-by-wire, redundancy for robust systems, open-loop main engine control, and navigational aides. These tools and concepts led to groundbreaking technologies that are being used today in other space programs and will be used in future space programs. Other government agencies as well as commercial and academic institutions also use these analysis tools. NASA faced a major challenge in the development of instruments for the Space Shuttle Main Engines—engines that operated at speeds, pressures, vibrations, and temperatures that were unprecedented at the time. NASA developed unique instruments and software supporting shuttle navigation and flight inspections. In addition, the general purpose computer used on the shuttle had static random access memory, which was susceptible to memory bit errors or bit flips from cosmic rays. These bit flips presented a formidable challenge as they had the potential to be disastrous to vehicle control.

Reconfigurable Redundancy—The Novel Concept Behind the World's First Two-Fault-Tolerant Integrated Avionics System

Space Shuttle Columbia successfully concluded its first mission on April 14, 1981, with the world's first two-fault-tolerant Integrated Avionics System—a system that represented a curious dichotomy of past and future technologies. On the one hand, many of the components, having been selected before 1975, were already nearing technical obsolescence. On the other hand, it used what were then-emerging technologies; e.g., time-domain-multiplexed data buses, fly-by-wire flight control, and digital autopilots, which provided a level of functionality and reliability at least a decade ahead of the avionics in either military or commercial aircraft. Beyond the technological "nuts and bolts" of the on-board system, two fundamental yet innovative precepts enabled and shaped the actual implementation of the avionics system. These precepts included the following:

• The entire suite of avionics functions, generally referred to as "subsystems"—data processing (hardware and software), navigation, flight control, displays and controls, communications and tracking, and electrical power distribution and control—would be programmatically and technically managed as an integrated set of subsystems. Given that new and unique types of complex hardware and software had to be developed and certified, it is difficult to overstate the role that approach played in keeping those activities on course and on schedule toward a common goal.

• A digital data processing subsystem comprised of redundant central processor units plus companion input/output units, resident software, digital data buses, and numerous remote terminal units would function as the core subsystem to interconnect all avionics subsystems. It also provided the means for the crew and ground to access all vehicle systems (i.e., avionics and non-avionics systems). There were exceptions to this, such as the landing gear, which was lowered by the crew via direct hardwired switches.

STS-1 launch (1981) from Kennedy Space Center, Florida. First crewed launch using two-fault-tolerant Integrated Avionics System.

Avionics System Patterned After Apollo; Features and Capabilities Unlike Any Other in the Industry

The preceding tenets were very much influenced by NASA's experience with the successful Apollo primary navigation, guidance, and control system. The Apollo-type guidance computer, with additional specialized input/output hardware, an inertial reference unit, a digital autopilot, fly-by-wire thruster control, and an alphanumeric keyboard/display unit represented a nonredundant subset of critical functions for shuttle avionics to perform. The proposed shuttle avionics represented a challenge for two principal reasons: an extensive redundancy scheme and a reliance on new technologies.

Shuttle avionics required the development of an overarching and extensive redundancy management scheme for the entire integrated avionics system, which met the shuttle requirement that the avionics system be "fail operational/fail safe"—i.e., two-fault tolerant with reaction times capable of maintaining safe computerized flight control in a vehicle traveling at more than 10 times the speed of high-performance military aircraft.

Shuttle avionics would also rely on new technologies—i.e., time-domain data buses, digital fly-by-wire flight control, digital autopilots for aircraft, and a sophisticated software operating system—that had very limited application in the aerospace industry of that time, even for noncritical applications, much less for "man-rated" usage. Simply put, no textbooks were available to guide the design, development, and flight certification of those technologies, and only a modicum of off-the-shelf equipment was directly applicable.

Why Fail Operational/Fail Safe?

Previous crewed spacecraft were designed to be fail safe, meaning that after the first failure of a critical component, the crew would abort the mission by manually disabling the primary system and switching over to a backup system that had only the minimum capability to return the vehicle safely home. Since the shuttle's basic mission was to take humans and payloads safely to and from orbit, the fail-operational requirement was intended to ensure a high probability of mission success by avoiding costly, early termination of missions.

Early conceptual studies of a shuttle-type vehicle indicated that vehicle atmospheric flight control required full-time computerized stability augmentation. Studies also indicated that in some atmospheric flight regimes, the time required for a manual switchover could result in loss of vehicle. Thus, fail operational actually meant that the avionics had to be capable of "graceful degradation" such that the first failure of a critical component did not compromise the avionic system's capability to maintain vehicle stability in any flight regime.

The graceful degradation requirement (derived from the fail-operational/fail-safe requirement) immediately provided an answer to how many redundant computers would be necessary. Since the only certain way to ensure timely graceful degradation—i.e., automatic detection and isolation of an errant computer—was some type of computerized majority-vote technique, a minimum of three computers would be required to retain operational status and continue the mission after one computer failure. Thus, four computers were required to meet the fail-operational/fail-safe requirement. That level of redundancy applied only to the computers. Triple redundancy was deemed sufficient for other components to satisfy the fail-operational/fail-safe requirement.
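The majority-vote arithmetic above is easy to make concrete. The sketch below is illustrative only—the function name, the exact-match comparison, and the data values are assumptions, not the flight software's actual logic—but it shows why three computers are the minimum for isolating one errant machine, and why a fourth keeps the set operational after that first failure.

```python
def vote_out_failures(outputs):
    """Identify errant computers by majority vote.

    outputs: dict mapping computer id -> the command value it computed.
    Returns the set of ids that disagree with the majority, or None
    when no majority exists (e.g., a two-on-two split).
    """
    tally = {}
    for cid, value in outputs.items():
        tally.setdefault(value, []).append(cid)
    majority = max(tally, key=lambda v: len(tally[v]))
    if len(tally[majority]) * 2 <= len(outputs):
        return None                    # no majority: the set cannot self-diagnose
    return {cid for cid, value in outputs.items() if value != majority}

# Four computers, one producing an errant command:
print(vote_out_failures({1: 0x41F2, 2: 0x41F2, 3: 0x41F2, 4: 0x0000}))  # -> {4}
# The surviving three still form a voting majority, so a second failure can
# also be detected and isolated: fail operational, then fail safe.
```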
Central Processor Units Were Available Off the Shelf—Remaining Hardware and Software Would Need to be Developed

The next steps included: selecting computer hardware that was designed for military use yet commercially available; choosing the actual configuration, or architecture, of the computer(s), data bus network, and bus terminal units; and then developing the unique hardware and software to implement the world's first two-fault-tolerant avionics.

In 1973, only two off-the-shelf computers offered the computational capability for the shuttle. Both computers were basic processor units—termed "central processor units"—with only minimal input/output functionality. NASA selected a vendor to provide the central processor units plus new companion input/output processors that would be developed to specifications provided by architecture designers. At the time, no proven best practices existed for interconnecting multiple computers, data buses, and bus terminal units beyond the basic active/standby manual switchover schemes.

Interconnections Were Key to Avionics Systems Success

Shuttle Systems Redundancy. The Shuttle systems elements diagram illustrates the eight "flight-critical" buses of the 24 buses on the Orbiter. Each of the four general purpose computers—a central processing unit paired with an input/output processor—carries 24 multiplex interface adapters, one per data bus, and serves as the primary controlling computer for two flight-critical buses (one flight forward, one flight aft) while listening on the others unless crew-initiated reconfiguration enables control capability. Each flight-critical multiplexer/demultiplexer has two multiplex interface adapters, primary and secondary ports connecting it to two data buses, control electronics, an assortment of modules selected to interface with devices in the region it supports, and connections to various vehicle subsystems.

Architecture designers for the shuttle avionics system had three goals: provide interconnections between the four computers to support a synchronization scheme; provide each computer access to every data bus; and ensure that the multiplexer/demultiplexers were sufficiently robust to preclude a single internal failure from preventing computer access to the systems connected to that multiplexer/demultiplexer.

To meet those goals, engineers designed the input/output processor to interface with all 24 data buses necessary to cover the shuttle. Likewise, each multiplexer/demultiplexer would have internal redundancy in the form of two independent ports for connections to two data buses. The digital data processing subsystem possessed eight flight-critical data buses and the eight flight-critical multiplexer/demultiplexers. They were essential to the reconfiguration capability. The total complement of such hardware on the vehicle consisted of 24 data buses, 19 multiplexer/demultiplexers, and an almost equal number of other types of specialized bus terminal units.
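The command/listen arrangement in the diagram can be modeled as a small assignment table. This is a simplified sketch—the identifiers and the round-robin reassignment policy are assumptions for illustration—but it mirrors the mechanism described above: one general purpose computer (GPC) commands each flight-critical bus, the others listen, and a failed computer's buses are reassigned by software to healthy machines.

```python
# Eight flight-critical buses, each commanded by one of four GPCs;
# every other GPC listens on the bus unless reconfiguration gives it control.
bus_commander = {f"FC{n}": ((n - 1) % 4) + 1 for n in range(1, 9)}
# -> FC1 and FC5 commanded by GPC 1, FC2 and FC6 by GPC 2, and so on.

def reconfigure(bus_commander, failed_gpc, healthy_gpcs):
    """Reassign every bus commanded by a failed GPC to the healthy GPCs,
    spreading the load round-robin (an assumed policy, for illustration)."""
    rotation = sorted(healthy_gpcs)
    i = 0
    for bus, gpc in bus_commander.items():
        if gpc == failed_gpc:
            bus_commander[bus] = rotation[i % len(rotation)]
            i += 1
    return bus_commander

reconfigure(bus_commander, failed_gpc=3, healthy_gpcs={1, 2, 4})
print(bus_commander)   # FC3 and FC7 are now commanded by healthy machines
```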

The architectural concept figured heavily in the design requirements for the input/output processor and two other new types of hardware "boxes" as well as the operating system software, all four of which had to be uniquely developed for the shuttle digital data processing subsystem. Each of those four development activities would eventually result in products that established new limits for the so-called "state of the art" in both hardware and software for aerospace applications.

In addition to the input/output processor, the other two new devices were the data bus transmitter/receiver units—referred to as the multiplex interface adapter—and the bus terminal unit, which was termed the "multiplexer/demultiplexer." NASA designated the software as the Flight Computer Operating System. The input/output processors (one paired with each central processor unit) were necessary to interface the units to the data bus network. The numerous multiplexer/demultiplexers would serve as the remote terminal units along the data buses to effectively interface all the various vehicle subsystems to the data bus network. Each central processor unit/input/output processor pair was called a general purpose computer.

The multiplexer/demultiplexer was an extraordinarily complex device that provided electronic interfaces for the myriad types of sensors and effectors associated with every system on the vehicle. The multiplex interface adaptors were placed internal to the input/output processors and the multiplexer/demultiplexers to provide actual electrical connectivity to the data buses. Multiplex interface adaptors were supplied to each manufacturer of all other specialized devices that interfaced with the serial data buses. The protocol for communication on those buses was also uniquely defined.

The central processor units later became a unique design for two reasons: within the first several months in the field, their reliability was so poor that they could not be certified for the shuttle "man-rated" application; and following the Approach and Landing Tests (1977), NASA found that the software for orbital missions exceeded the original memory capacity. The central processor units were all upgraded with a newer memory design that doubled the amount of memory. That memory flew on Space Transportation System (STS)-1 in 1981.

Although the computers were the only devices that had to be quad redundant, NASA gave some early thought to simply creating four identical strings with very limited interconnections. The space agency quickly realized, however, that the weight and volume associated with so much additional hardware would be unacceptable. Each computer needed the capability to access every data bus so the system could reconfigure and regain capability after certain failures. NASA accomplished such reconfiguration by software reassignment of data buses to different general purpose computers. The ability to reconfigure the system and regain lost capability was a novel approach to redundancy management.

Examination of a typical mission profile illustrates why NASA placed a premium on providing reconfiguration capability. Ascent and re-entry into Earth's atmosphere represented the mission phases that required automatic failure detection and isolation capabilities, while the majority of on-orbit operations did not require full redundancy when there was time to thoroughly assess the implications of any failures that occurred prior to re-entry. When a computer and a critical sensor on another string failed, the failed computer's string could be reassigned via software control to a healthy computer, thereby providing a fully functional operational configuration for re-entry.

The Costs and Risks of Reconfigurable Redundancy

The benefits of interconnection flexibility came with costs, the most obvious being the increased verification testing needed to certify that each configuration performed as designed. Those activities resulted in a set of formally certified system reconfigurations that could be invoked at specified times during a mission. Other less-obvious costs stemmed from the need to eliminate single-point failures. Interconnections offered the potential for failures that began in one redundant element and propagated throughout the entire redundant system—termed a "single-point failure"—with catastrophic consequences. Knowing such, system designers placed considerable emphasis on identification and elimination of failure modes with the potential to become single-point failures. Before describing how NASA dealt with potential catastrophic failures, it is necessary to first describe how the redundant digital data processing subsystem was designed to function.

Establishing Synchronicity

The fundamental premise for the redundant digital data processing subsystem operation was that all four general purpose computers were executing identical software in a time-synchronized fashion such that all received the exact same data, executed the same computations, got the same results, and then sent the exact same time-synchronized commands and/or data to other subsystems. Maintenance of synchronicity between general purpose computers was one of the truly unique features of the newly developed Flight Computer Operating System.

Shuttle Single Event Upset Environment

Five general purpose computers—the heart of the Orbiter's guidance, navigation, and flight control system—were upgraded in 1991. The iron core memory was replaced with modern static random access memory transistors, providing more memory and better performance. However, the static random access memory chips were susceptible to single event upsets: memory bit flips caused by high-energy nuclear particles. These single event upsets could be catastrophic to the Orbiter: the general purpose computers were critical to flight, and a single bit flip could disable a computer.

An error detection and correction code was implemented to "fix" flipped bits in a computer word by correcting any single erroneous bit. Whenever the system fixed a memory bit flip, the information was downlinked to flight controllers on the ground in Houston, Texas. Plotting the event times along the Orbiter's ground track revealed the pattern of bit flips around the Earth.

The bit flips correlated with the known space radiation environment. This phenomenon had significant consequences for error detection and correction codes, which could correct only one error in a word and would be foiled by a multi-bit error. In response, system architects selected the bits for each word from different chips, making it almost impossible for a single particle to upset more than one bit per word.
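The interplay between the correction code and the one-bit-per-chip word layout can be illustrated with a textbook Hamming(7,4) code. The text does not specify the actual EDAC code used in the general purpose computers, so treat this as a minimal sketch of the principle rather than the flight design.

```python
def hamming74_encode(d1, d2, d3, d4):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7:
    p1, p2, d1, p3, d2, d3, d4). Any single flipped bit is correctable."""
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(c):
    """Recompute the parity checks; the syndrome is the 1-based position
    of a single flipped bit (0 means the word is clean)."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # covers positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # covers positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # covers positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1         # flip the erroneous bit back
    return [c[2], c[4], c[5], c[6]]  # the recovered data bits

word = hamming74_encode(1, 0, 1, 1)
word[4] ^= 1                          # a "single event upset" flips one bit
assert hamming74_correct(word) == [1, 0, 1, 1]

# The one-bit-per-chip layout matters because a code like this corrects only
# one error per word: if a particle disturbs several adjacent cells in one
# chip, interleaving spreads the damage across words as correctable single bits.
```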

In all, the upgraded Orbiter general purpose computers performed flawlessly in spite of their susceptibility to ionizing radiation.

Map of single event upsets plotted along the Orbiter's ground track relative to Earth's magnetic equator. Single event upsets are indicated by yellow squares. Multi-bit single event upsets are indicated by red triangles. In these single events, anywhere from two to eight bits were typically upset by a single charged particle.

All four general purpose computers ran in a synchronized fashion that was keyed to the timing of the intervals when general purpose computers were to query the bus terminal units for data, then process that data to select the best data from redundant sensors, create commands, displays, etc., and finally output those command and status data to designated bus terminal units. That sequence (input/process/output) repeated 25 times per second. The aerodynamic characteristics of the shuttle dictated the 25-hertz (Hz) rate. In other words, the digital autopilot had to generate stability augmentation commands at that frequency for the vehicle to retain stable flight control. The four general purpose computers exchanged synchronization status approximately 350 times per second. The typical failure resulted in the computer halting anything resembling normal operation.
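The 25 Hz cadence and the much faster synchronization exchange can be sketched in a few lines. The frame and sync rates come from the text; the function names and structure below are assumptions for illustration only.

```python
FRAME_HZ = 25                 # input/process/output rate set by vehicle aerodynamics
SYNC_RATE = 350               # approximate sync-status exchanges per second
SYNCS_PER_FRAME = SYNC_RATE // FRAME_HZ   # = 14 exchanges within each 40 ms frame

def run_frame(query_terminal_units, select_best_data, compute_commands,
              output_commands, exchange_sync_status):
    """One illustrative 40 ms input/process/output frame."""
    raw = query_terminal_units()         # input: poll the bus terminal units
    best = select_best_data(raw)         # pick the best data from redundant sensors
    commands = compute_commands(best)    # process: flight control and guidance laws
    output_commands(commands)            # output: commands/status to terminal units
    for _ in range(SYNCS_PER_FRAME):
        exchange_sync_status()           # lockstep check with the other computers
```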

A fish-eye view of the multifunction electronic display subsystem—or "glass cockpit"—in the fixed-base Space Shuttle mission simulator at Johnson Space Center, Texas.

Early Detection of Failure

NASA designed the four general purpose computer redundant set to gracefully degrade from either four to three or from three to two members. Engineers tailored specific redundancy management algorithms for dealing with failures in other types of redundant subsystems based on knowledge of each subsystem's predominant failure modes and the overall effect on vehicle performance. NASA paid considerable attention to means of detecting subtle latent failure modes that might create the potential for a simultaneous scenario. Engineers scrutinized sensors such as gyros and accelerometers in particular for null failures. During orbital operation, the vehicle typically spent the majority of time in a quiescent flight control profile such that those sensors were operating very near their null points. Prior to re-entry, the vehicle executed some designed maneuvers to purposefully exercise those devices in a manner to ensure the absence of permanent null failures. The respective design teams for the various subsystems were always challenged to strike a balance between early detection of failures vs. nuisance false alarms, which could cause the unnecessary loss of good devices.

Decreasing Probability of Pseudo-simultaneous Failures

There was one caveat regarding the capability to be two-fault tolerant—the system was incapable of coping with simultaneous failures since such failures obviously defeat the majority-voting scheme. A nuance associated with the practical meaning of "simultaneous" warranted significant attention from the designers. It was quite possible for internal circuitry in complex electronics units to fail in a manner that wasn't immediately apparent because the circuitry wasn't used in all operations. This failure could remain dormant for seconds, minutes, or even longer before normal activities created conditions requiring use of the failed devices; however, should another unrelated failure occur that created the need for use of the previously failed circuitry, the practical effect was equivalent to two simultaneous failures. To decrease the probability of such pseudo-simultaneous failures, the general purpose computers and multiplexer/demultiplexers were designed to constantly execute cyclic self-test operations and cease operations if internal problems were detected.

Ferreting Out Potential Single-point Failures

Engineering teams conducted design audits using a technique known as failure modes effects analysis to identify types of failures with the potential to propagate beyond the bounds of the fault-containment region in which they originated. These studies led to the conclusion that the digital data processing subsystem was susceptible to two types of hardware failures with the potential to create a catastrophic condition, termed a "nonuniversal input/output error." As the name implies, under such conditions a majority of general purpose computers may not have received the same data and the redundant set may have diverged into a two-on-two configuration or simply collapsed into four disparate members.
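The cyclic self-test policy is simple to express as a loop. The structure below is an assumed illustration (the real units implemented this internally), but it captures the essential behavior: check continuously, and go silent on the first internal fault so a latent failure cannot lie dormant and later pair with an unrelated one.

```python
import time

def run_cyclic_self_tests(checks, cease_operations, period_s=0.1):
    """checks: zero-argument callables returning True while the unit is healthy.
    On the first failed check the unit stops operating (fail silent) rather
    than risk contributing to a pseudo-simultaneous failure later."""
    while True:
        for check in checks:
            if not check():
                cease_operations()
                return
        time.sleep(period_s)   # assumed test-cycle period, for illustration
```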

Loss of Two General Purpose Computers Tested Resilience

Shuttle avionics never encountered any type (hardware or software) of single-point failure in nearly 3 decades of operation, and on only one occasion did it reach the fail-safe condition. That situation occurred on STS-9 (1983) and demonstrated the resiliency afforded by reconfiguration.

While on orbit, two general purpose computers failed within several minutes of each other in what was later determined to be a highly improbable, coincidental occurrence of a latent generic hardware fault. By definition, the avionics was in a fail-safe condition, and preparations were begun for re-entry into Earth's atmosphere. Upon cycling power, one of the general purpose computers remained failed while the other resumed normal operation. Still, with that machine being suspect, NASA made the decision to continue preparation for the earliest possible return. As part of the preparation, sensors such as the critical inertial measurement units, which were originally assigned to the failed computer, were reassigned to a healthy one. Thus, re-entry occurred with a three-computer configuration and a full set of inertial measurement units, which represented a much more robust and safe configuration.

The loss of two general purpose computers over such a short period was later attributed to spaceflight effects on microscopic debris inside certain electronic components. Since all general purpose computers in the inventory contained such components, NASA delayed subsequent flights until sufficient numbers of those computers could be purged of the suspect components.

Space Shuttle Columbia (STS-9) makes a successful landing at Dryden Flight Research Center on Edwards Air Force Base runway, California, after reaching a fail-safe condition while on orbit.

Engineers designed and tested the topology, components, and data encoding of the data bus network to ensure that robust signal levels and data integrity existed throughout the network. Extensive laboratory testing confirmed, however, that two types of failures would likely create conditions resulting in eventual loss of all four computers.

The first type of failure, and the easiest to mitigate, was some type of physical failure causing either an open or a short circuit in a data bus. Such a condition would create an impedance mismatch along the bus and produce classic transmission line effects; e.g., signal reflections and standing waves, with the end result being unpredictable signal levels at the receivers of any given general purpose computer. The probability of such a failure was deemed to be extremely remote given the robust mechanical and electrical design as well as detailed testing of the hardware, before and after installation on the Orbiter.

The second type of problem was not so easily discounted. That problem could occur if one of the bus terminal units failed, thus generating unrequested output transmissions. Such transmissions, while originating from only one node in the network, would nevertheless propagate to each general purpose computer and disrupt the normal data bus signal levels and timing as seen by each general purpose computer. It should be mentioned that no amount of analysis or testing could eliminate the possibility of a latent, generic software error that could conceivably cause all four computers to fail. Thus, the program deemed that a backup computer, with software designed and developed by an independent organization, was warranted as a safeguard against that possibility.

This backup computer was an identical general purpose computer designed to "listen" to the flight data being collected by the primary system and make independent calculations that were available for crew monitoring. Only the on-board crew had the switches, which transferred control of all data buses to that computer, thereby preventing any "rogue" primary computers from "interfering" with the backup computer.

Its presence notwithstanding, the backup computer was never considered a factor in the fail-operational/fail-safe analyses of the primary avionics system, and—at the time of this publication—had never been used in that capacity during a mission.

Summary

The shuttle avionics system, which was conceived during the dawn of the digital revolution, consistently provided an exceptional level of dependability and flexibility without any modifications to either the basic architecture or the original innovative design concepts. While engineers replaced specific electronic boxes due to electronic component obsolescence or to provide improved functionality, they took great care to ensure that such replacements did not compromise the proven reliability and resiliency provided by the original design.

Development of Space Shuttle Main Engine Instrumentation

The Space Shuttle Main Engine operated at speeds and temperatures unprecedented in the history of spaceflight. How would NASA measure the engine's performance?

NASA faced a major challenge in the development of instrumentation for the main engine, which required a new generation capable of measuring—and surviving—its extreme operating pressures and temperatures. NASA not only met this challenge; the space agency also led the development of such instrumentation while overcoming numerous technical hurdles.

Initial Obstacles

The original main engine instrumentation concept called for compact flange-mounted transducers with internal redundancy, high stability, and a long, maintenance-free life. Challenges presented themselves immediately, however. Few instrumentation suppliers were interested in the limited market projected for the shuttle. Moreover, early engine testing disclosed that standard designs were generally incapable of surviving the harsh environments. Although the "hot side" temperatures were within the realm of jet engines, no sort of instrumentation existed that could handle both high temperatures and cryogenic environments down to -253°C (-423°F). Vibration environments with high-frequency spectrums extending beyond commercially testable ranges of 2,000 hertz (Hz) experienced several hundred times the force of gravity over almost 8 hours of an engine's total planned operational exposure. For these reasons, the endurance requirements of the instrumentation constituent materials were unprecedented.

Engine considerations such as weight, concern for leakage that might be caused by mounting bosses, and overall system fault tolerance prompted the need for greater redundancy for each transducer. Existing supplier designs, where available, were single-output devices that provided no redundancy. A possible solution was to package two or more sensors within a single transducer. But this approach required special adaptation to achieve the desired small footprint and weight.

NASA considered the option of strategically placing instrumentation devices and closely coupling them to the desired stimuli source. This approach prompted an appreciation of the inherent simplicity and reliability afforded by low-level output devices. The avoidance of active electronics tended to minimize electrical, electronic, and electromechanical part vulnerability to hostile environments. Direct mounting of transducers also minimized the amount of intermediate hardware capable of producing a catastrophic system failure response. Direct mounting, however, came at a price. In some situations, it was not possible to design transducers capable of surviving the severe environments, making it necessary to off-mount the device. Pressure measurements associated with the combustion process suffered from icing or blockage issues when hardware temperatures dropped below freezing. Purging schemes to provide positive flow in pressure tubing were necessary to alleviate this condition.

Several original system mandates were later shown to be ill advised, such as an early attempt to achieve some measure of standardization through the use of bayonet-type electrical connectors. Early engine-level and laboratory testing revealed the need for threaded connectors since the instrumentation components could not be adequately shock-isolated to prevent failures induced by excessive relative connector motion. Similarly, electromagnetic interference assessments and observed deficiencies resulted in a reconsideration of the need for cable overbraiding to minimize measurement disruption.

Problems also extended to the sensing elements themselves. The lessons of material incompatibilities or deficiencies were evident in the area of resistance temperature devices and thermocouples. The need for the stability of temperature measurements led to platinum-element resistance temperature devices being baselined for all thermal measurements.

Weakness Detection and Solutions

In some instances, the engine environment revealed weaknesses not normally experienced in industrial or aerospace applications. Some hardware successfully passed component-level testing only to experience problems at subsystem or engine-level testing. Applied vibration spectrums mimicked test equipment limitations, where frequency ranges typically did not extend beyond 2,000 Hz. The actual engine recognized no limits and continued to expose the hardware to energy above even 20,000 Hz. Therefore, a critical sensor resonance condition might only be excited during engine-level testing. Similarly, segmenting of component testing into separate vibration, thermal, and fluid testing deprived the instrumentation of experiencing the more-severe effect of combined exposures.

Aggressive engine performance and weight considerations also compromised the optimal sensor mountings. For example, it was not practical to include the prescribed straight section of tubing upstream from measuring devices, particularly for flow. This resulted in the improper loading of measuring devices, primarily within the propellant oxygen ducting. The catastrophic failure risks finally prompted the removal or relocation of all intrusive measuring devices downstream of the high-pressure oxygen turbopump.

Finally, the deficiencies of vibration redline systems were overcome as processing hardware and algorithms matured to the point where a real-time synchronous vibration redline system could be adopted, providing a significant increase in engine reliability.

The shuttle's reusability revealed failure modes not normally encountered, such as those ascribed to the differences between flight and ground test environments. It was subsequently found that the microgravity exposure of each flight allowed conductive particles within the instruments to migrate in a manner not experienced with units confined to terrestrial applications. Main engine pressure transducers experienced electrical shorts only during actual engine performance. During the countdown of Space Transportation System (STS)-53 (1992), a data spike from a high-pressure oxidizer turbopump secondary seal pressure transducer almost triggered an on-pad abort. Engineers used pressure transducers screened by particle impact noise detection and microfocus x-ray examination on an interim basis until a hardware redesign could be qualified.

Wire Failures Prompted System Redesign

High temperature measurements continued to suffer brittle fine-element wire failures until the condition was linked to operation above the material recrystallization temperature of 525°C (977°F), where excessive grain growth would result. The STS-51F (1985) in-flight engine shutdown caused by the failure of multiple resistance temperature devices mandated a redesign to a thermocouple-based system that eliminated the wire embrittlement problem.

High temperatures in some engine operating environments caused fine wires used in temperature devices to become brittle, thereby leading to failures. © Pratt & Whitney Rocketdyne. All rights reserved.

Effects of Cryogenic Exposure on Instrumentation

Cryogenic environments revealed a host of related material deficiencies. Encapsulating materials—necessary to provide structural support for fine wires within speed sensors—lacked resiliency at extreme low temperatures. The adverse effects of inadvertent exposure to liquefied gases within the shuttle's aft compartment produced functional failures due to excessively cold conditions. In April 1991, STS-37 was scrubbed when the high-pressure oxidizer turbopump secondary seal pressure measurement became erratic due to the damaging effects of cryogenic exposure of a circuit board.

Problems with cryogenics also extended to the externals of the instrumentation. Cryopumping—the condensation-driven pumping mechanism of inert gases such as nitrogen—severely compromised the ability of electrical connectors to maintain continuity. The normally inert conditions maintained within the engine system masked a problem with residual contamination of glassed resistive temperature devices used for cryogenic propellant measurements. Corrosive flux left over from the manufacturing process remained dormant for years until activated during extended exposures to the humid conditions at the launch site. STS-50 (1992) narrowly avoided a launch delay when a resistive temperature device had to be replaced just days before the scheduled launch date.

Expectations Exceeded

As the original main engine design life of 10 years was surpassed, part obsolescence and aging became a concern. Later designs used more current parts such as industry-standard electrical connectors. Some suppliers chose to invest in technology driven by the shuttle, which helped to ease the program's need for long-term part availability.

The continuing main engine ground test program offered the ability to use ongoing hot-fire testing to ensure that all flight hardware was sufficiently enveloped by older ground test units. Tracking algorithms and extensive databases permitted such comparisons.

Industry standards called for periodic recalibration of measuring devices. NASA excluded this from the Space Shuttle Main Engine Program at its inception to reduce maintenance for hardware not projected for use beyond 10 years. In practice, the hardware life was extended to the point that some engine components approached 40 years of use before the final shuttle flight. Aging studies validated the stable nature of instruments never intended to fly so long without recalibration.

Summary

While initial engine testing disclosed that instrumentation was a weak link, NASA implemented innovative and successful solutions that resulted in a suite of proven instruments capable of direct application on future rocket engines.

Unprecedented Rocket Engine Fault-Sensing System

The Space Shuttle Main Engine (SSME) was a complex system that used liquid hydrogen and liquid oxygen as its fuel and oxidizer, respectively. The engine operated at extreme levels of temperature, pressure, and turbine speed. At these levels, slight material defects could lead to high vibration in the turbomachinery. Because of the potential consequences of such conditions, NASA developed vibration monitoring as a means of assessing engine health.

The main engine used both low- and high-pressure turbopumps for fuel and oxidizer propellants. Low-pressure turbopumps served as propellant boost pumps for the high-pressure turbopumps, which in turn delivered fuel and oxidizer at high pressures to the engine main combustion chamber. The high-pressure pumps rotated at speeds reaching 36,000 rpm on the fuel side and 24,000 rpm on the oxidizer side. At these speeds, minor faults were exacerbated and could rapidly propagate to catastrophic engine failure.

During the main engine's 30-year ground test program, more than 40 major engine test failures occurred. High-pressure turbopumps were the source of a large percentage of these failures. Posttest analysis revealed that the vibration spectral data contained potential failure indicators in the form of discrete rotordynamic spectral signatures. These signatures were prime indicators of turbomachinery health and could potentially be used to mitigate catastrophic engine failures if assessed at high speeds and in real time.

NASA recognized the need for a high-speed digital engine health management system. In 1996, engineers at Marshall Space Flight Center (MSFC) developed the Real Time Vibration Monitoring System and integrated the system into the main engine ground test program. The system used data from engine-mounted accelerometers to monitor pertinent spectral signatures. Spectral data were produced and assessed every 50 milliseconds to determine whether specific vibration amplitude thresholds were being violated.

NASA also needed to develop software capable of discerning a failed sensor from an actual hardware failure. MSFC engineers developed the sensor validation algorithm—a software algorithm that used a series of rules and threshold gates based on actual vibration spectral signature content to evaluate the quality of sensor data every 50 milliseconds.

Outfitted with the sensor validation algorithm and additional software, the Real Time Vibration Monitoring System could detect and diagnose pertinent indicators of imminent main engine turbomachinery failure and initiate a shutdown command within 100 milliseconds.

NASA's Advanced Health Monitoring System software was integrated with the Space Shuttle Main Engine controller (shown by itself and mounted on the engine) in 2007. © Pratt & Whitney Rocketdyne. All rights reserved.
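A drastically simplified sketch of that redline logic appears below. The 50-millisecond assessment interval, the focus on discrete rotordynamic signatures, and the disqualification of failed sensors come from the text; the sampling rate, thresholds, and function names are invented for illustration. The synchronous frequency itself falls directly out of pump speed: 36,000 rpm / 60 = 600 Hz on the fuel side, and 24,000 rpm / 60 = 400 Hz on the oxidizer side.

```python
import numpy as np

SAMPLE_RATE = 10_240           # Hz; assumed accelerometer sampling rate
WINDOW = SAMPLE_RATE // 20     # 512 samples = one 50 ms assessment interval
SYNC_HZ = 36_000 / 60          # fuel-side synchronous frequency: 600 Hz
REDLINE_G = 5.0                # assumed vibration amplitude threshold, in g

def synchronous_amplitude(samples):
    """Spectral amplitude (g) near the synchronous frequency for one window."""
    spectrum = np.abs(np.fft.rfft(samples)) * 2 / len(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1 / SAMPLE_RATE)
    band = (freqs >= SYNC_HZ - 20) & (freqs <= SYNC_HZ + 20)
    return spectrum[band].max()

def sensor_is_valid(samples):
    """Crude stand-in for the sensor validation gates: reject dead,
    saturated, or corrupted channels before they vote on the redline."""
    return not np.any(np.isnan(samples)) and np.std(samples) > 1e-6

def redline_violated(sensor_windows):
    """sensor_windows: dict of sensor id -> latest 50 ms of samples.
    Shutdown is commanded only when every valid sensor exceeds the
    redline (a conservative, assumed voting rule)."""
    valid = [w for w in sensor_windows.values() if sensor_is_valid(w)]
    return bool(valid) and all(synchronous_amplitude(w) > REDLINE_G for w in valid)

t = np.arange(WINDOW) / SAMPLE_RATE
healthy = 0.5 * np.sin(2 * np.pi * SYNC_HZ * t)      # 0.5 g at 600 Hz
failing = 8.0 * np.sin(2 * np.pi * SYNC_HZ * t)      # 8 g at 600 Hz
print(redline_violated({"A": healthy, "B": healthy}))  # False
print(redline_violated({"A": failing, "B": failing}))  # True
```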

The Real Time Vibration Monitoring System operated successfully on more than 550 main engine ground tests with no false assessments and a 100% success rate on determining and disqualifying failed sensors from its vibration redlines. This, the first high-speed vibration redline system developed for a liquid rocket engine, supported the main engine ground test program throughout the shuttle era.

To prove that a vibration-based, high-speed engine health management system could be used for flight operations, NASA included a subscale version of the Real Time Vibration Monitoring System on Technology Flight Experiment 2, which flew on STS-96 (1999). This led to the concept of the SSME Advanced Health Management System as a means of extending this protection to the main engine during ascent.

The robust software algorithms and redline logic developed and tested for the Real Time Vibration Monitoring System were directly applied to the Advanced Health Management System and incorporated into a redesigned version of the engine controller.

The Advanced Health Management System's embedded algorithms continuously monitored the high-pressure turbopump vibrations generated by rotation of the pump shafts and assessed rotordynamic performance every 50 milliseconds. The system was programmed to initiate a shutdown command in fewer than 120 milliseconds if vibration patterns indicated an instability that could lead to catastrophic failure.

The system also used the sensor validation algorithm to monitor sensor quality and could disqualify a failed sensor from its redline suite or deactivate the redline altogether. Throughout the shuttle era, no other liquid rocket engine system in the world employed a vibration-based health management system that used discrete spectral components to verify safe operation.

Summary

The Advanced Health Management System, developed and certified by Pratt & Whitney Rocketdyne (Canoga Park, California) under contract to NASA, flew on numerous shuttle missions and continued to be active on all engines throughout the remainder of the shuttle flights.

Calibration of Navigational Aides Using Global Positioning

The crew members awakened at 5:00 a.m. After 10 days in orbit, they were ready to return to Earth. By 7:45 a.m., the payload bay doors were closed and they were struggling into their flight suits to prepare for descent. The commander called for a weather report and advice on runway selection. The shuttle could be directed to any one of three landing strips depending on weather at the primary landing site. Regardless of the runway chosen, the descent was controlled by systems capable of automatically landing the Orbiter. The Orbiter commander took cues from these landing systems, controlled the descent, and dropped the landing gear to safely land the Orbiter.

During their approach to the landing site, the Orbiter crew depended on a complex array of technologies, including a Tactical Air Navigation System and the Microwave Scanning Beam Landing System, to provide precision navigation. These systems were located at each designated landing site and had to be precisely calibrated to ensure a safe and smooth landing.

Touchdown Sites

Shuttle runways were strategically located around the globe to serve several purposes. After a routine mission, the landing sites included Kennedy Space Center (KSC) in Florida, Dryden Flight Research Center in California, and White Sands Test Facility in New Mexico. The transoceanic abort landing sites—intended for emergencies when the shuttle lost a main engine during ascent and could not return to KSC—were located in Zaragoza and Moron in Spain and in Istres in France. Former transoceanic abort landing sites included: Dakar, Senegal; Ben Guerir, Morocco; Banjul, The Gambia; Honolulu, Hawaii; and Andersen Air Force Base, Guam. NASA certified each site.

Error Sources

Because the ground portions of the Microwave Scanning Beam Landing and Tactical Air Navigation Systems contained moving mechanical components and depended on microwave propagation, inaccuracies could develop over time that might prove detrimental to a shuttle landing. For example, antennas could drift out of mechanical adjustment. Ground settling and external environmental factors could also affect the system's accuracy. Multipath and refraction errors could result from reflections off nearby structures, terrain changes, and day-to-day atmospheric variations.

Flight inspection data gathered by the NASA calibration team could be used to determine the source of these errors. Flight inspection involved flying an aircraft through the landing system coverage area and receiving time-tagged data from the systems under test. Those data were compared to an accurate aircraft positioning reference to determine error. Restoring integrity was easily achieved through system adjustment.

Global Positioning Satellite Position Reference for Flight Inspection

Technologies were upgraded several times since first using the Global Positioning Satellite (GPS)-enabled flight inspection system. The flight inspection system used an aircraft GPS receiver as a position reference. Differences between the system under test and the position reference were recorded, processed, and displayed in real time on board the aircraft. An aircraft position reference used for flight inspection had to be several times more accurate than the system under test. Stand-alone commercial GPS systems did not have enough accuracy for this purpose. Several techniques could be used to improve GPS positioning. Differential GPS used a ground GPS receiver installed over a known surveyed benchmark. Common mode error corrections to the GPS position were calculated and broadcast over a data link to the aircraft. After the received corrections were applied, the on-board GPS position accuracy was within 3 m (10 ft). A real-time accuracy within 10 cm (4 in.) was achieved by using a carrier-phase technique and tracking cycles of the L-band GPS carrier signal.

NASA built several versions of the flight inspection system customized to different aircraft platforms. Different NASA aircraft were used based on availability. These aircraft included NASA's T-39 jet (Learjet), a NASA P-3 turboprop, several C-130 aircraft, and even NASA's KC-135. Each aircraft was modified with shuttle landing system receivers and antennas. Several pallets of equipment were configured and tested to reduce the installation time on aircraft to one shift.

Synergy With the Federal Aviation Administration

In early 2000, NASA and the Federal Aviation Administration (FAA) entered into a partnership for flight inspection. The FAA had existing aircraft assets to perform its mission to flight-inspect US civilian and military navigation aids. The FAA integrated NASA's carrier-phase GPS reference along with shuttle-unique avionics and software algorithms into its existing control and display computers on several flight-inspection aircraft. The NASA/FAA partnership produced increased efficiency, increased capability, and reduced cost to the government for flight inspection of the shuttle landing aids.

Summary

NASA developed unique instrumentation and software supporting the shuttle navigation aids flight inspection mission. The agency developed aircraft pallets to operate, control, process, display, and archive data from several avionics receivers. They acquired and synchronized measurements from shuttle-unique avionics and aircraft platform avionics with precision time-tagged GPS position. NASA developed data processing platforms and software algorithms to graphically display and trend landing system performance in real time. In addition, a graphical pilot's display provided the aircraft pilot with runway situational awareness and visual direction cues. The pilot's display software, integrated with the GPS reference system, resulted in a significant reduction in mission flight time.
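The differential correction described under "Global Positioning Satellite Position Reference for Flight Inspection" amounts to a single vector subtraction, which the sketch below makes concrete. The 3 m accuracy figure comes from the text; the coordinate convention, values, and names are assumptions for illustration.

```python
def differential_correction(reference_fix, benchmark_truth, aircraft_fix):
    """Apply a common-mode GPS correction (illustrative only).

    reference_fix:   (x, y, z) position reported by the ground receiver
    benchmark_truth: (x, y, z) surveyed position of that same receiver
    aircraft_fix:    (x, y, z) position reported by the aircraft receiver
    All coordinates are meters in a common Earth-fixed frame.
    """
    # Error common to both receivers (satellite clocks, ephemeris, atmosphere):
    common_error = tuple(r - t for r, t in zip(reference_fix, benchmark_truth))
    # Broadcast over the data link and removed from the aircraft solution:
    return tuple(a - e for a, e in zip(aircraft_fix, common_error))

corrected = differential_correction(
    reference_fix=(1000.8, 2001.1, 499.4),
    benchmark_truth=(1000.0, 2000.0, 500.0),
    aircraft_fix=(5300.8, 7201.1, 2499.4),
)
print(corrected)   # -> (5300.0, 7200.0, 2500.0)
```

Carrier-phase operation refines this further by tracking cycles of the L-band carrier itself, which is how the system reached the roughly 10 cm real-time accuracy noted above.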
