Learning from Other People's Mistakes
Most satellite mishaps stem from engineering mistakes. To prevent the same errors from being repeated, Aerospace has compiled lessons that the space community should heed.

Paul Cheng and Patrick Smith

"It's always the simple stuff that kills you…. With all the testing systems, everything looked good."
—James Cantrell, main engineer for the joint U.S.-Russian Skipper mission, which failed because its solar panels were connected backward (Associated Press, 1996)

The computer onboard the Clementine spacecraft froze immediately after a thruster was commanded to fire. A "watchdog" algorithm designed to stop the thrusters from excessive firing could not execute, and Clementine's fuel ran out. The mission was lost. Based on this incident, engineers working on the Near Earth Asteroid Rendezvous (NEAR) program learned a key lesson: the watchdog function should be hard-wired in case of a computer shutdown. As it happened, NEAR suffered a similar computer crash during which its thrusters fired thousands of times, but each firing was instantly cut off by the still-operative watchdog timer. NEAR survived.

As this example illustrates, insights from past anomalies are of considerable value to design engineers and other program stakeholders. Information from failures (and near failures) can influence important design decisions and prevent the same mistakes from being made over and over. Contrary to popular belief, satellites seldom fail because of poor workmanship or defective parts. Instead, most failures are caused by simple engineering errors, such as an overlooked requirement, a unit mix-up, or even a typo in a document. Such blunders are often too subtle to be caught during routine verification and review because flight items can only be checked against known requirements. But by understanding the types of errors that can occur, mission planners can adjust their review processes to more effectively screen them out.

The Engineering Database

As part of its overall mission assurance role, Aerospace investigates anomalies on national security space missions and uses this information to help program managers improve their systems and processes. Throughout the development, launch, and deployment of defense space systems, engineers at Aerospace analyze all anomaly reports to identify engineering errors, workmanship escapes, model deficiencies, test inadequacies, defective parts, hostile environments, and other factors that may warrant correction. These notes are all collected in a comprehensive database, which now archives more than 20,000 launch vehicle and satellite anomalies that occurred in the factory, at the launch site, during flight, and in operation. For example, all off-nominal telemetry readings from a launch are logged into the database. A technical expert is assigned to explain each anomaly and why it occurred. Every investigation is tracked by the program office as part of Aerospace's flight-readiness verification responsibilities. Fault-management lessons, reliability impacts, and suggested corrective actions are documented.

This rigorous postflight analysis was largely prompted by the failure in 1999 of a Defense Support Program launch (DSP-19). The transfer vehicle for this mission, the Inertial Upper Stage (IUS), used thermal tape on the connector plug housing. The thermal tape was supposed to be wrapped with enough clearance to let the harness disengage. The assembly instructions stated that the tape shall be applied "within 0.5 inches of the mounting bracket flange" (instead of, for example, "no closer than 0.5 inches and no farther than 1.0 inch"). Unaware that the parts had to unfasten, and thinking that they should wrap the tapes as tightly as possible, the technicians applied the tapes so close to the flange that the separator jammed, preventing stage separation.
Afterwards, engineers reviewing the telemetry from previous IUS flights realized that the connector had jammed every time. In fact, seven of the previous flights dodged failure only because the taped connectors were jerked apart when they hit the allowable stops. Unfortunately, the warning signs in the telemetry data were not heeded.

[Photo, U.S. Air Force: The Genesis capsule, embedded in the sands of Utah after its parachute system failed to deploy in September 2004. The Mishap Investigation Board said the likely cause was a design error involving the orientation of gravity-switch devices.]

Gleaning Lessons from Failures

To disseminate information like this in a format that is readily absorbed, Aerospace began publishing a series of "Space Systems Engineering Lessons Learned" bulletins. To date, more than 120 of these papers have been produced. Each tells a failure story, spotlighting three questions: How did the mistake occur? What prevented its detection? Why did it bring down the entire system?

The reports also spell out the specific practices that could have prevented the mishap. For example, one lesson traced an instrument's power-supply malfunction in space. The problem was not caught on the ground because the test set supplied backup power, and would have been avoided if the test equipment provided metering to show the unit was unexpectedly drawing current from it. In another case, a payload was damaged during thermal vacuum testing because the test cable, which was not space-qualified, induced multipaction breakdown. Obviously, tools used in simulated space environments must be space-rated.

To help reviewers increase the odds of finding mistakes, Aerospace also parsed through the lessons-learned bulletins to generate a reviewer's checklist titled "100 Questions for Technical Review." Reviewers will more than earn their pay if just one of these items gets the response: "You know, we hadn't thought about that—we'd better check it!"

For example, one review question asks, "Has the analyst inspected the actual hardware?" This question derives from an incident where a satellite failed because the solar-array thermal model did not account for the harnesses that prevent heat dissipation; one look at the hardware would have made it obvious. Another asks, "Have designs been analytically established before testing?" This is prompted by an incident in which the spacecraft tumbled after deployment. The magnetic torque rods were set during test instead of being determined by analysis. Because the test engineer did not realize the Earth's magnetic North pole is in Antarctica, he set the phasing wrong.

Five Common Mistakes to Look Out For

The most common mistakes are gathered in an Aerospace report, "Five Common Mistakes Reviewers Should Look Out For." This document, available to U.S. government agencies and contractors, includes the "100 Questions" and all the lesson bulletins published through June 2007.

The report recommends that reviewers first ponder, "Could the sign be wrong?" Sign errors involving orientation and phasing (polarity) of equipment have caused numerous failures. For example, the TERRIERS spacecraft was lost because one of its torque coils had to be installed with a phase opposite of that of the other two coils—but the change was not incorporated into flight software. Likewise, a design mistake caused the solar panels on the Skipper spacecraft to be connected backward. During system integration, the magnitude of current flow between the solar array and battery was verified, but not the direction. Once in orbit, the battery drained, scuttling the mission.

More recently, the Genesis spacecraft, which brought back solar-wind samples, crashed. During reentry, aerodynamic deceleration was supposed to cause a parachute to unfold. Genesis's avionics design was derived from an earlier mission, Stardust, but was too complex to fit into Stardust's one-box package. Genesis designers cut and pasted Stardust's schematic into a new two-box design. The Stardust unit had been spin tested, but the two boxes in Genesis were difficult to spin together. So, believing that the design was flight proven, Genesis engineers simply verified the new assembly "by inspection." Unfortunately, nobody knew that a pair of deceleration sensors was direction-sensitive—even the Stardust drawings did not flag any polarity constraint. The sensors were by chance turned sideways in the new layout, and as a result, the parachute could not open.

The second major question is, "How will last-minute configuration changes be verified?" Satellites are frequently modified after factory testing. Placeholder blankets need to be swapped out, flight connectors mated, and databases updated. Some items, such as brackets to secure hardware during ascent, have to be installed at the launch site. Nonflight items such as test plugs and dust covers must be removed. Late changes, especially those made in the heat of a countdown, have caused several failures, in

Crosslink Fall 2007 · 21

U.S. Government Satellite and Launch Vehicle Failures from 1990 to 2006

"Space is unforgiving; thousands of good decisions can be undone by a single engineering flaw or workmanship error, and these flaws and errors can result in catastrophe."
—Defense Science Board, 2003

These charts show U.S. Government satellite (below) and launch vehicle (facing page) failures from 1990 to 2006. In contrast to common beliefs, mishaps are seldom due to poor workmanship, defective parts, or environmental factors. Notice that engineering mistakes—such as a typo (Contour), an overlooked requirement (MPL), and a unit mix-up (MCO)—caused the majority of failures. These embarrassing snafus are often so subtle that they are not caught during verification and review.

Launch Date | Program | Cause                                                              | Engineering Mistake | Technology Surprise
04/90       | Hubble  | Flawed manufacturing equipment misshaped the mirror. Quality       | x                   |
            |         | assurance relied on the same equipment and could not perceive the  |                     |
            |         | mistake, even though independent checks indicated a defect.        |                     |
07/92       | TSS-1   | Tether deployment mechanism was jammed by a bolt added at the      | x                   |
            |         | launch site.                                                       |                     |
09/92       | Mars    | Corroded braze jammed a regulator, causing a breach of the propulsion line.
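The watchdog lesson from the Clementine and NEAR incidents can be reduced to a question of where the cut-off logic lives: a software watchdog dies with the flight computer, while a hard-wired timer keeps clipping runaway firings regardless. The toy simulation below is purely illustrative (the fuel numbers and function names are invented for this sketch, not taken from either mission's flight software):

```python
def runaway_burn(commands, watchdog_alive):
    """Simulate a crashed flight computer erroneously commanding
    thruster firings. If a watchdog can still act, each firing is
    cut off almost immediately; otherwise every firing runs long.
    Returns the fuel remaining (starting from 100.0 units)."""
    fuel = 100.0
    for _ in range(commands):
        if fuel <= 0:
            break                # tank is dry; nothing left to fire
        if watchdog_alive:
            fuel -= 0.01         # firing instantly clipped by the timer
        else:
            fuel -= 1.0          # unchecked firing; the cut-off never comes
    return fuel

# Clementine-like case: the watchdog ran in software, so when the
# computer froze the watchdog froze with it and the tank drained.
clementine_fuel = runaway_burn(1000, watchdog_alive=False)

# NEAR-like case: the watchdog was hard-wired, so even with the
# computer down, each of the spurious firings was cut off in time.
near_fuel = runaway_burn(1000, watchdog_alive=True)
```

The design point is independence of failure modes: the protection mechanism must not share the failure mode of the thing it protects, which is why NEAR survived thousands of spurious firings that would otherwise have emptied the tank.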