
NASA Task Order NNN06AA01C

Interstellar Probe Reliability Engineering Discussion

Glen H. Fountain, Clayton A. Smith, Sally Whitley, Steve Jaskulek

The Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA.

Acknowledgement: Almost 200 professional scientists and engineers worldwide are actively working in support of Interstellar Exploration.

17 November 2020

Reliability Engineering Topics

§ Challenge of 50 years

§ Interplanetary mission duration experience

§ Overall framework to assess mission reliability
§ Robustness of Science Requirements and Instrument Suite
§ Reliability of bus

§ Physics of failure methods

Longevity Study Goal and Relevance to Other Long-Duration Missions

§ The Space Mission Life Time Study is part of the Interstellar Probe (ISP) initiative to study the environment well beyond the Heliosphere
§ An aspirational goal is to operate for 50 years or longer
§ The study has two principal goals:
  § Identify the processes for the flight system, the supporting ground infrastructure, and mission staffing that assure a successful outcome when mission success requires 50-plus years of operation
  § Provide information about current and past missions, with supporting analysis, that gives stakeholders the confidence necessary to support such a mission
§ Interstellar Probe can provide a path, and the basis for community support, for other proposed missions
§ Many missions under study for the upcoming National Academy Decadal Survey may require mission durations exceeding 20 years
§ The electronic parts industry is dominated by short-lifespan consumer products
§ The historical record will need to be augmented with analysis and testing to provide confidence for decades-long missions
§ A real trade space exists for space-based infrastructure: longevity vs. technology refresh
§ Failure-free, robust designs can support long-duration human missions where sparing up-mass and volume are significant constraints

Reliability Study Goals

§ Longevity and reliability analyses are part of the Interstellar Probe (ISP) initiative to study the environment well beyond the Heliosphere, with an aspirational goal of operating for 50 years or longer

§ Reliability assessment questions
  § What does the historical record tell us about long-duration missions?
  § What are the major technical challenges in building a long-lasting spacecraft?
  § What is the analytical framework for providing sufficient confidence to decision-makers and the science community?

§ Assess the reliability of a baseline design
§ Identify risk drivers and mitigations
§ Quantify the uncertainty of such a system

Building to Last

§ Can we make systems that last long times without maintaining them?

Oxford Bell (The Clarendon Dry Pile): set up in 1840 and still ringing. Its oscillation frequency is about 2 Hz; so far the bells have rung on the order of 10 billion times.
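A quick order-of-magnitude check of the ring count is sketched below. It assumes, as a simplification rather than a measured value, a constant 2 Hz and uninterrupted operation from 1840 to 2020; the function name is illustrative.

```python
# Order-of-magnitude check of the Oxford Bell ring count.
# Assumptions (not from the source): constant 2 Hz, no interruptions, 1840 to 2020.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year in seconds


def approximate_rings(frequency_hz: float, start_year: int, end_year: int) -> float:
    """Approximate number of oscillations between two years at a fixed frequency."""
    return frequency_hz * (end_year - start_year) * SECONDS_PER_YEAR


rings = approximate_rings(2.0, 1840, 2020)
print(f"{rings:.2e}")  # roughly 1.1e10, i.e., on the order of 10 billion
```

The result agrees with the "order of 10 billion" figure quoted above.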

Voyager 1 & 2 spacecraft – Launched in 1977 and still operational.

Source: University of Oxford Department of Physics webpage
§ These systems were not designed to last this long
§ Survivorship bias: use caution in taking these examples as proof

Literature Review

§ Identified 50+ papers relating to spacecraft lifetimes
§ Examples of missions discussed:

§ Matsumoto, S.K., “Voyager Interstellar Mission a Very Old Spacecraft on a Very Long Mission” (2016).
  § Top-level overview of operating configurations, FSW modifications, transitions of ground systems, etc.

§ Brown, N., N. Cohen, M. Cavanaugh and G. Richardson, “Spacecraft Lifetime Study” (2018).
  § Analysis of 283 spacecraft launched between 1980 and 2010: satellite life has increased and exceeds design life; however, the study does not properly account for spacecraft designed for lifetimes greater than 8 years

§ Weaver et al., “In-Flight Performance and Calibration of the Long Range Reconnaissance Imager (LORRI) for the New Horizons Mission” (2019).
  § Documents LORRI performance from shortly after launch (2006) through early 2019; the data show no change in instrument performance greater than 1% over that period.

Literature Review

§ Unlike the often-assumed constant-failure-rate models, spacecraft failure rates decrease over time
  § MIL-HDBK-217F is a de facto standard for determining failure rates and reliability for many systems and is pervasive throughout DoD and government (more on this later)

§ Better modeled as a Weibull distribution

Sarsfield, L. P. (1998). The Cosmos on a Shoestring: Small Spacecraft for Space and Science. Santa Monica, CA, RAND Corporation.
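The difference between a constant failure rate and a Weibull model shows up in the Weibull hazard function h(t) = (β/η)(t/η)^(β−1). A shape β < 1 gives a decreasing failure rate, β = 1 reduces to the constant-rate exponential model, and β > 1 gives wear-out. The sketch below is minimal; the 50-year scale and the shape values are illustrative assumptions, not fitted values.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative parameters only; scale in years.
for shape in (0.5, 1.0, 2.0):
    early = weibull_hazard(1.0, shape, scale=50.0)
    late = weibull_hazard(40.0, shape, scale=50.0)
    trend = "decreasing" if early > late else ("constant" if early == late else "increasing")
    print(shape, trend)
```

With shape 1.0 the hazard is the same at every time, which is the constant-failure-rate assumption; a shape below 1 corresponds to the decreasing failure rates reported for spacecraft.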

§ Spacecraft last longer than their required design life
  § Plot shows Actual Life (vertical axis) versus Design Life (horizontal axis) for all satellites
  § Points above the 45° upward-sloping light dotted line are satellites that have exceeded their design life
  § Red circles denote satellites that have died due to technical failures of components, depletion of station-keeping fuel, or loss of service/mission demand

Fox, G., R. Salazar, H. Habib-Agahi and G. F. Dubos (2013). A satellite mortality study to support space systems lifetime prediction. Aerospace Conference, 2013 IEEE, IEEE.

Time-Dependent Failure Models

§ Weibull fits generated for mission types and systems
  § Saleh, Joseph Homer, and Jean-François Castet, “Spacecraft reliability and multi-state failures: a statistical approach.” John Wiley & Sons, 2011.

§ Same behavior seen in Interplanetary spacecraft data

Historical Record

§ Interplanetary missions
  § 179 classified as interplanetary*
  § 71 missions after removal of launch failures, technology demonstrators, short-lived landers/impactors, etc.
  § Represents nearly 725 years of on-orbit experience

§ Large majority are still active or ended without failure

§ Mission failures not dominated by any one cause

§ Full analysis to be published at RAMS conference, January 2021

* SpaceTrak database. https://www.seradata.com/products/spacetrak/

Examination of Mission Duration

§ History suggests that spacecraft operational lifetimes frequently reflect intentional mission design decisions rather than poor reliability or limits in engineering capability
§ Spacecraft tend to last much longer than their design life
§ The majority of failures occurred after design life

[Figure: Design Life vs Actual Life - Interplanetary Spacecraft. Actual Life [years] plotted against Design Life [years] with a Design Life = Actual Life reference line; points marked Retired, Failed, or Active; labeled missions include Voyagers 1&2, Pioneer 6, Explorer 50, Cassini, and New Horizons.]

Examination of Mission Duration

§ Mission duration
  § Because many missions are still operating, or were still operational when retired, we cannot simply take the average of mission times
  § Survival analysis is a reliability engineering technique for evaluating a special type of positive-valued random variable with censored observations; failure times or survival times are the most common examples
  § A particular challenge in analyzing survival data is censoring, i.e., the observation of survival times is often incomplete
  § Right censoring (truncation), where the observation is terminated at a fixed time:
    § Spacecraft retired before a failure was observed
    § Spacecraft still active, no failure observed
§ Survival analysis shows mission duration to be Weibull distributed
  § Shape factor < 1, indicating a decreasing failure rate over time
  § Consistent with findings in the literature
  § Mean duration ~53 years

[Figure: Percentage of spacecraft survived, with the estimated Weibull distribution and 90% confidence limits]
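Survival analysis with right-censored data can be illustrated with a maximum-likelihood Weibull fit. The sketch below is minimal and runs on synthetic data, not the mission data set. It assumes independent lifetimes and non-informative censoring. Failures contribute the log-density and censored units contribute only the log-survival term. For a fixed shape the scale has a closed form, so only the shape is searched (golden-section search). Both choices are implementation conveniences here, not necessarily the study's method.

```python
import math
import random
from typing import Sequence, Tuple


def profile_loglik(shape: float, times: Sequence[float], failed: Sequence[bool]) -> float:
    """Right-censored Weibull log-likelihood, maximized over scale for a fixed shape.

    For fixed shape k the scale MLE satisfies scale**k = sum(t**k) / r,
    where r is the number of observed failures.
    """
    r = sum(failed)
    s = sum(t ** shape for t in times)
    sum_log_fail = sum(math.log(t) for t, f in zip(times, failed) if f)
    return r * math.log(shape) - r * math.log(s / r) + (shape - 1.0) * sum_log_fail - r


def fit_weibull_censored(times: Sequence[float], failed: Sequence[bool],
                         lo: float = 0.05, hi: float = 10.0, tol: float = 1e-6) -> Tuple[float, float]:
    """Return (shape, scale) MLEs via golden-section search on the profile likelihood."""
    r = sum(failed)
    if r == 0:
        raise ValueError("need at least one observed failure")
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile_loglik(c, times, failed), profile_loglik(d, times, failed)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile_loglik(c, times, failed)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile_loglik(d, times, failed)
    shape = (a + b) / 2.0
    scale = (sum(t ** shape for t in times) / r) ** (1.0 / shape)
    return shape, scale


# Synthetic data only: hypothetical true shape 0.7 and scale 50 years; any unit
# that has not failed by 30 years is right censored at 30 years.
rng = random.Random(42)
CENSOR_YEARS = 30.0
raw = [rng.weibullvariate(50.0, 0.7) for _ in range(5000)]
times = [min(t, CENSOR_YEARS) for t in raw]
failed = [t < CENSOR_YEARS for t in raw]

shape_hat, scale_hat = fit_weibull_censored(times, failed)
fitted_mean = scale_hat * math.gamma(1.0 + 1.0 / shape_hat)  # Weibull mean life
naive_mean = sum(times) / len(times)  # biased low: treats censored times as failures
print(f"shape={shape_hat:.2f} scale={scale_hat:.1f} mean={fitted_mean:.1f} naive={naive_mean:.1f}")
```

The naive average of observed times falls well below the fitted mean. This is why a simple average of mission times cannot be used when most observations are censored.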

Examination of Mission Duration

§ With the majority of the data set right censored, Bayesian analysis quantifies the uncertainty in the results

[Figure: Joint probability distribution of Weibull parameters; 5th percentile = 37 years, 95th percentile = 150 years]

§ Treat these results with caution: they do not account for the real physical limitations of hardware
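The Bayesian step can be sketched by evaluating the censored Weibull likelihood on a grid of (shape, scale) values and pushing the resulting posterior through the Weibull mean. This gives percentile bounds on the implied mean duration. The sketch is minimal. It assumes flat priors on a bounded grid and uses a small synthetic, staggered-censoring data set. The study's actual priors, grid, and data are not given here.

```python
import math
import random


def weibull_loglik(shape, scale, times, failed) -> float:
    """Right-censored Weibull log-likelihood: log-density for failures, log-survival for censored units."""
    ll = 0.0
    for t, f in zip(times, failed):
        if f:
            ll += math.log(shape / scale) + (shape - 1.0) * math.log(t / scale)
        ll -= (t / scale) ** shape
    return ll


def weighted_percentile(values, weights, q) -> float:
    """Smallest value whose cumulative posterior weight reaches fraction q."""
    pairs = sorted(zip(values, weights))
    target = q * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]


# Synthetic fleet (not the study's data): each unit is observed for a different
# number of years, mimicking staggered launch dates.
rng = random.Random(7)
lifetimes = [rng.weibullvariate(50.0, 0.7) for _ in range(60)]
windows = [rng.uniform(5.0, 40.0) for _ in range(60)]
times = [min(t, w) for t, w in zip(lifetimes, windows)]
failed = [t < w for t, w in zip(lifetimes, windows)]

# Flat prior over a bounded grid (an assumption).
shapes = [0.30 + 0.05 * i for i in range(30)]  # 0.30 .. 1.75
scales = [10.0 + 5.0 * j for j in range(59)]   # 10 .. 300 years
cells = [(k, s, weibull_loglik(k, s, times, failed)) for k in shapes for s in scales]
max_ll = max(ll for _, _, ll in cells)
weights = [math.exp(ll - max_ll) for _, _, ll in cells]       # unnormalized posterior
means = [s * math.gamma(1.0 + 1.0 / k) for k, s, _ in cells]  # Weibull mean life per cell

p05, p50, p95 = (weighted_percentile(means, weights, q) for q in (0.05, 0.50, 0.95))
print(f"mean life: 5th={p05:.0f}, median={p50:.0f}, 95th={p95:.0f} years")
```

A grid keeps the sketch dependency-free. A production analysis would more likely use MCMC and informed priors.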

Reliability Modeling

§ To augment reliability engineering products, special attention is given to failure mechanisms
  § Develop an understanding of how devices and materials can fail in the presence of various radiation and thermal environments, and characterize the physics of degradation processes out to 50 years
  § Provide inputs to a testing campaign that assures the project and sponsor of viability at 50 years
§ Design tests to discover the behavior of systems, subsystems, components, and materials at End of Life
  § Include tests to characterize lifetime uncertainties for components and materials
  § Employ various acceleration methods to test out to 50 years
§ Identify dependencies when various redundancy and/or hibernation schemes are explored
§ Evaluate system resilience with a view that includes potential failures, health monitoring, and fault management behavior
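One common acceleration method for thermally activated mechanisms is the Arrhenius model. It uses the acceleration factor AF = exp[(Ea/k)(1/T_use − 1/T_stress)], with temperatures in kelvin. The sketch below is minimal. The activation energy (0.7 eV) and the temperatures (25 °C use, 125 °C stress) are hypothetical. Other mechanisms, such as radiation or thermal cycling, need different acceleration models that are not shown.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K
HOURS_PER_YEAR = 24 * 365.25


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Arrhenius acceleration factor between a use and a stress temperature (Celsius inputs)."""
    t_use = t_use_c + 273.15
    t_stress = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress))


# Hypothetical mechanism: Ea = 0.7 eV, 25 C in use, 125 C in test.
af = arrhenius_af(0.7, 25.0, 125.0)
test_hours = 50 * HOURS_PER_YEAR / af  # stress hours equivalent to 50 years of use
print(f"AF = {af:.0f}, equivalent test time = {test_hours:.0f} h")
```

The result is very sensitive to Ea, so characterizing the activation energy is itself part of reducing lifetime uncertainty.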

Reliability Modeling

§ An over-arching model ties spacecraft, instruments, and science objectives together in order to evaluate the combinations of failures that represent a loss of mission
§ An event sequence diagram shows the logical flow to potential end-states (diamonds)
§ Each question shown is modeled with a fault tree
  § Provides a qualitative assessment of failure combinations
  § Provides structure to eventually quantify reliability

[Figure: Event sequence diagram. Does the Spacecraft Bus Fail? Yes: Loss of Mission; No: Are Threshold Mission Objectives Lost? Yes: Loss of Mission; No: Are Baseline Mission Objectives Lost? Yes: Threshold Mission; No: Baseline Mission. Boxes represent fault tree analyses that address the stated question.]
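The event sequence diagram reads as a short decision procedure. The sketch below is minimal. It assumes each question is answered by the top event of its own fault tree, passed in here as booleans. It also assumes that a "Yes" at the Threshold question leads to Loss of Mission, which is inferred from the diagram. The enum and function names are hypothetical.

```python
from enum import Enum


class EndState(Enum):
    LOSS_OF_MISSION = "Loss of Mission"
    THRESHOLD_MISSION = "Threshold Mission"
    BASELINE_MISSION = "Baseline Mission"


def classify_end_state(bus_failed: bool, threshold_lost: bool, baseline_lost: bool) -> EndState:
    """Walk the event sequence diagram; each flag stands for one fault tree's top event."""
    if bus_failed:
        return EndState.LOSS_OF_MISSION
    if threshold_lost:
        return EndState.LOSS_OF_MISSION
    if baseline_lost:
        return EndState.THRESHOLD_MISSION
    return EndState.BASELINE_MISSION


print(classify_end_state(False, False, True).value)  # Threshold Mission
```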

Mapping Science Requirements to Instruments

§ Map items in the Science Traceability Matrix (STM) to instruments and construct a logic model to evaluate the impact of failure combinations on the Threshold mission

§ Fault Tree Analysis is the logic construct of the requirements
  § Informs the design team of potential payload redundancies

§ The success criterion is defined as meeting the Threshold Mission
  § Definition of Threshold Mission: the minimum set of science requirements to be met for the mission to still be worth launching

§ Threshold Mission success criteria are as follows. The Interstellar Probe mission shall:
  § Answer at least two Science Questions under each Science Objective, and
  § Under each chosen Science Question, meet at least 1 (one) Priority 1 Measurement Objective
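The Threshold criterion can be expressed as a check over a nested objective → question → measurement structure. The sketch below is minimal and ignores mission phases. The STM fragment is hypothetical. Only Q1 follows the heliosphere-structure logic described in this section (MAG, PLS, EPS, and CRS all needed, plus ENA-H or ENA-M). The other questions and instrument assignments are invented for illustration.

```python
from typing import Callable, Dict, List, Set

# A measurement is achievable if this predicate holds for the surviving instruments.
Measurement = Callable[[Set[str]], bool]
# Hypothetical STM shape: objective -> question -> Priority 1 measurements.
STM = Dict[str, Dict[str, List[Measurement]]]


def threshold_met(stm: STM, surviving: Set[str], min_questions: int = 2) -> bool:
    """True if every objective has at least min_questions questions with
    at least one achievable Priority 1 measurement."""
    for questions in stm.values():
        answered = sum(
            1 for measurements in questions.values()
            if any(m(surviving) for m in measurements)
        )
        if answered < min_questions:
            return False
    return True


example_stm: STM = {
    "Objective 1": {
        "Q1": [lambda s: {"MAG", "PLS", "EPS", "CRS"} <= s and bool({"ENA-H", "ENA-M"} & s)],
        "Q2": [lambda s: "PWS" in s],
        "Q3": [lambda s: "LYA" in s, lambda s: "UV" in s],
    },
    "Objective 2": {
        "Q4": [lambda s: "NMS" in s],
        "Q5": [lambda s: "IDA" in s],
    },
}

all_instruments = {"MAG", "PLS", "EPS", "CRS", "ENA-H", "ENA-M", "PWS", "LYA", "UV", "NMS", "IDA"}
print(threshold_met(example_stm, all_instruments))                         # True
print(threshold_met(example_stm, all_instruments - {"PWS", "LYA", "UV"}))  # False: only Q1 left in Objective 1
```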

Fault Trees

§ Logical model of the relationship of the undesired event to more basic events
  § The top event of the fault tree is the undesired event
  § The bottom of the fault tree consists of the causal basic events
  § The logical relationships of the events are shown by logical symbols, or gates
§ Graphical representation of a Boolean expression of failure in terms of basic events
§ Analysis is in failure space
§ Gates
  § OR: either fails
  § AND: both fail
  § k of m: at least k fail
§ Qualitative results
  § A list of failure events such that, if they occur, then so does the top event (cut set)
  § Reduction using Boolean logic provides a list of minimal, necessary, and sufficient cut sets, i.e., the smallest combinations of events causing the top event

[Figure: OR gate example, the undesired event occurs if Instrument A or Instrument B fails; AND gate example, the undesired event occurs if Instrument A and Instrument B fail. Basic events: Loss of Instrument A, Loss of Instrument B.]

Decomposition of Science Matrix

§ Example decomposition*
  § Science Goal: Understand our Heliosphere as a Habitable Astrosphere and its Place in the Galaxy
  § Science Objective: Determine the physical processes that shape the heliosphere and their global manifestation
  § Science Question: What is the size, shape, and structure of the heliosphere?
  § Measurement Objective (Methodology), Priority 1: Determine the thickness of the heliosheath by detecting the TS and HP with in-situ measurements of magnetic fields, plasma density and temperature, energetic particles, PUIs, GCRs. Image the heliosheath structure in ENAs from an external vantage point beyond the HP.
  § Mission phases: Phase 1, Inner Heliosphere (1-70 AU); Phase 2, Outer Heliosphere (70-250 AU); Phase 3, Interstellar Medium (250 AU and beyond)
  § Instruments tracked in each phase: UV, IDA, VIR, IRM, PLS, EPS, CRS, NMS, LYA, NAC, MAG, PWS, ENA-Hi, ENA-M, ENA-Lo

* Row 6 in Interstellar_Probe_STM_Master_v11.1.WORKING.xlsx

§ Dots mean that if MAG, PLS, EPS, or CRS fails, or both ENA-H and ENA-M fail, in Phase 2, the measurement will not be achieved
§ No dots in Phase 1 signify that the measurements are not needed in Phase 1, only in Phase 2

[Figure: Fault tree overlaid on the STM row for the example. Top event Loss of Threshold Science (all 3 objectives must be met), decomposed into Loss of Science Objective 1 (2 of 8 questions must be answered), then Loss of Science Question 1, then Loss of Measurement 3 (only 1 Priority 1 measurement identified), then basic events Loss of MAG, PLS, EPS, CRS, ENA-H, ENA-M.]

§ The fault tree decomposes all elements of the science requirements down to the instruments

§ Measurement not required in Phase 1
§ In order to perform in Phase 2, an instrument must have survived Phase 1
§ A failure to perform is therefore a failure in Phase 2 or a failure in Phase 1

[Figure: Fault tree for Loss of Measurement 3 extended by phase. Each instrument loss (MAG, PLS, EPS, CRS, ENA-H, ENA-M) is decomposed into Loss in Phase 1 and Loss in Phase 2.]

Cut Sets for Example

§ Evaluating the Science Question “What is the size, shape, and structure of the heliosphere?” yields the following combinations of basic events that cause failure of the measurement:
  § Single point failures (internal redundancies within instruments not accounted for):
    § MAG-Phase 1
    § MAG-Phase 2
    § PLS-Phase 1
    § PLS-Phase 2
    § EPS-Phase 1
    § EPS-Phase 2
    § CRS-Phase 1
    § CRS-Phase 2
  § Two independent failures:
    § ENA-H-Phase 1 & ENA-M-Phase 1
    § ENA-H-Phase 1 & ENA-M-Phase 2
    § ENA-H-Phase 2 & ENA-M-Phase 1
    § ENA-H-Phase 2 & ENA-M-Phase 2
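These cut sets can be reproduced mechanically from the fault tree. The sketch below is minimal. It uses a recursive Boolean expansion with absorption: an OR gate unions its children's cut sets, an AND gate takes one cut set from every child, and supersets are dropped. It supports only OR and AND gates (a k-of-m gate could be expanded into an OR of ANDs). The tree encoding and function names are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Set, Tuple, Union

# A node is a basic-event name or a (gate, children) pair with gate "OR" or "AND".
Node = Union[str, Tuple[str, List["Node"]]]
CutSet = FrozenSet[str]


def minimize(sets: Set[CutSet]) -> Set[CutSet]:
    """Drop any cut set that strictly contains another (Boolean absorption)."""
    return {s for s in sets if not any(other < s for other in sets)}


def cut_sets(node: Node) -> Set[CutSet]:
    """Minimal cut sets of a fault tree by recursive Boolean expansion."""
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":  # any child's cut set fails the gate
        combined = set().union(*child_sets)
    elif gate == "AND":  # one cut set from every child is needed
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unsupported gate: {gate}")
    return minimize(combined)


def loss_of(instrument: str) -> Node:
    # Performing in Phase 2 requires surviving Phase 1, so a loss in either phase counts.
    return ("OR", [f"{instrument}-Phase 1", f"{instrument}-Phase 2"])


loss_of_measurement = ("OR", [
    loss_of("MAG"), loss_of("PLS"), loss_of("EPS"), loss_of("CRS"),
    ("AND", [loss_of("ENA-H"), loss_of("ENA-M")]),
])

result = cut_sets(loss_of_measurement)
for cs in sorted(result, key=lambda c: (len(c), sorted(c))):
    print(" & ".join(sorted(cs)))
```

The output lists the 8 single-event cut sets followed by the 4 ENA-H/ENA-M pairs, matching the list above.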

Result From Current STM

§ The process is repeated for every Science Objective through all Priority 1 measurements identified in the Science Traceability Matrix
§ The fault tree is evaluated for each objective and for the mission according to the success criteria rules
§ Boolean reduction generates cut sets:
  § 3 single point failures
  § 28 doubles
  § 12 triples
  § 307 with combinations of 4 or more

Cut Sets for Loss of Threshold Mission

Singles (3): {MAG-1}; {MAG-2}; {MAG-3}

Doubles (28): {ENA-HI-1, PLS-1}; {LYA-1, PLS-1}; {EPS-1, PLS-1}; {ENA-HI-2, PLS-1}; {ENA-HI-3, PLS-1}; {EPS-2, PLS-1}; {EPS-3, PLS-1}; {LYA-2, PLS-1}; {EPS-1, LYA-1}; {EPS-2, LYA-1}; {ENA-M-1, EPS-1}; {ENA-M-2, EPS-1}; {ENA-M-3, EPS-1}; {ENA-M-1, EPS-2}; {ENA-M-2, EPS-2}; {ENA-M-3, EPS-2}; {EPS-1, LYA-2}; {EPS-2, LYA-2}; {ENA-HI-1, PLS-2}; {LYA-1, PLS-2}; {EPS-1, PLS-2}; {ENA-HI-2, PLS-2}; {ENA-HI-3, PLS-2}; {EPS-2, PLS-2}; {EPS-3, PLS-2}; {LYA-2, PLS-2}; {EPS-1, PLS-3}; {EPS-2, PLS-3}

Triples (12): {CRS-2, PLS-1, PWS-1}; {CRS-2, PLS-2, PWS-1}; {CRS-2, PLS-1, PWS-2}; {CRS-2, PLS-2, PWS-2}; {CRS-1, PLS-1, PWS-1}; {CRS-1, PLS-2, PWS-1}; {CRS-1, PLS-1, PWS-2}; {CRS-1, PLS-2, PWS-2}; {CRS-3, PLS-1, PWS-1}; {CRS-3, PLS-2, PWS-1}; {CRS-3, PLS-1, PWS-2}; {CRS-3, PLS-2, PWS-2}
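Once basic-event probabilities are available, cut sets can be turned into a top-event estimate. The sketch below is minimal and shows two standard approximations: the rare-event sum and the min-cut upper bound. It assumes independent basic events and a hypothetical 5% loss probability for every event. It uses only a subset of the cut sets listed above. The source does not give these probabilities.

```python
import math
from typing import Dict, FrozenSet, Iterable

CutSet = FrozenSet[str]


def cut_set_probability(cut_set: CutSet, p: Dict[str, float]) -> float:
    """Probability that every basic event in the cut set occurs (independence assumed)."""
    return math.prod(p[event] for event in cut_set)


def rare_event_approx(cut_sets: Iterable[CutSet], p: Dict[str, float]) -> float:
    """First-order estimate: sum of cut-set probabilities."""
    return sum(cut_set_probability(c, p) for c in cut_sets)


def min_cut_upper_bound(cut_sets: Iterable[CutSet], p: Dict[str, float]) -> float:
    """1 - prod(1 - P(C_i)); never exceeds the rare-event sum."""
    return 1.0 - math.prod(1.0 - cut_set_probability(c, p) for c in cut_sets)


example_cut_sets = [frozenset(s) for s in (
    {"MAG-1"}, {"MAG-2"}, {"MAG-3"},
    {"ENA-HI-1", "PLS-1"}, {"LYA-1", "PLS-1"}, {"EPS-1", "PLS-1"},
    {"CRS-2", "PLS-1", "PWS-1"},
)]
p = {event: 0.05 for c in example_cut_sets for event in c}  # hypothetical values

approx = rare_event_approx(example_cut_sets, p)
bound = min_cut_upper_bound(example_cut_sets, p)
singles = rare_event_approx([c for c in example_cut_sets if len(c) == 1], p)
print(f"rare-event={approx:.4f}, upper bound={bound:.4f}, singles share={singles / approx:.0%}")
```

With comparable per-event probabilities, the single point failures dominate the estimate. This is why they are the first target for design changes.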

Lessons Learned From

§ A similar process was done for Parker Solar Probe
  § 3 objectives with 3 science questions each
  § Needed to answer 7 of 9 questions
§ At the instrument and sensor level, no single point failures were identified
  § Forced the science team to evaluate alternative methods of derivative analyses using other sensors
§ Further decomposition into instruments showed that failure of a power supply within an instrument would lead to loss of more than 2 questions, leaving fewer than the required 7 of 9
  § Separated the power supply into 2 units with separate interfaces to sensors
  § Removed the single point failure threat to loss of science

Spacecraft Reliability Model

§ The model of the spacecraft bus is also a fault tree, decomposed to all systems
§ The baseline design includes multiple redundancies throughout
  § Avionics contains redundant processors, memory, and network paths
  § The telecom system has internally redundant paths to all antennas from 2 radios
  § GNC sensors are redundant
  § Note: the two RTGs are not for redundancy but for power at end of life

[Figure: Top-level spacecraft fault tree. Loss of Spacecraft Bus is decomposed into Loss of Avionics; Loss of Guidance, Navigation, & Control; Loss of Power; Loss of Telecom; Loss of Propulsion; and Loss of Mechanisms & Structure.]
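The bus structure can be sketched as a series/parallel reliability calculation. The sketch is minimal. It assumes the top-level subsystems combine in series (any subsystem loss fails the bus) and that failures are independent, with no common-cause terms. All unit reliabilities are hypothetical. It models redundant avionics, GNC, and telecom strings in parallel. It places the two RTGs in series, since both are needed for end-of-life power.

```python
import math


def series(*reliabilities: float) -> float:
    """All elements must work: any single failure fails the system."""
    return math.prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one element must work: all must fail to fail the system."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


# Hypothetical 50-year reliabilities, for illustration only.
avionics = parallel(0.90, 0.90)     # redundant processor strings
gnc = parallel(0.92, 0.92)          # redundant GNC sensors
power = series(0.97, 0.97)          # both RTGs needed for end-of-life power: no redundancy
telecom = parallel(0.85, 0.85)      # redundant radios
propulsion = 0.95
mechanisms_structure = 0.99

bus = series(avionics, gnc, power, telecom, propulsion, mechanisms_structure)
print(f"bus reliability = {bus:.3f}")
```

The point of the RTG line is that adding a second unit for capacity lowers reliability, unlike adding a second unit for redundancy.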

Telecom System Example

§ A complex system is modeled to identify single point failures and combinations of failures that would fail the system
§ The analysis eventually goes down to the card or part level
§ Component probabilities of failure still need to be quantified

Singles: Fore LGA; Aft LGA; MGA; HGA; LGA Hybrid; CXS-1; HYB-1

Doubles: {DP-A, DP-B}; {SP3T-A, SP3T-B}; {DP-A, SP3T-B}; {IOS-A, SP3T-B}; {RDA-XEXCITR, SP3T-B}; {DP-B, SP3T-A}; {ISO-B, SP3T-A}; {SP3T-A, TWTA-B}; {DP-A, TWTA-B}; {IOS-A, TWTA-B}; {TWTA-A, TWTA-B}; {RDA-XEXCITR, TWTA-B}; {SP3T-B, TWTA-A}; {DP-B, TWTA-A}; {ISO-B, TWTA-A}; {DP-A, ISO-B}; {IOS-A, ISO-B}; {DP-B, IOS-A}; {LNA-A, LNA-B}

Reliability Modeling: Information on Uncertainties

§ Reliability is integrated with Physics of Failure (PoF) models and testing

§ Uncertainties are addressed up front and reduced with literature, heritage, and testing
§ Sources of uncertainties with PRA:
  § Lack of data for 50-year-old systems
  § Tech maturity
  § Tech design
  § System configuration
  § Con Ops (flight & ground)
  § Manufacturing processes
  § Requirements

[Figure: Reliability modeling flow. FMECA: bottom-up analysis of every element of the system. FTA: top-down analysis showing how elements can fail. PRA: develop potential scenarios that lead to undesired end-states, yielding a Probability of Success (with uncertainty). PoF: develop models of end-of-life failure modes and their interactions with parameters and performance metrics. Risk Informed Testing: analysis to show the impact of testing on PoF uncertainty reduction, leading to test program recommendations, with feedback of test results.]

Reliability Modeling

§ The testing regime changes the paradigm of testing
  § From a system collecting many hours in order to show confidence in the lifetime
  § To a system that provides confidence in the understanding of how the system can fail at end of life
§ Analysis indicates what confidence the project has in meeting the life requirement, in the form of a Probability of Failure or a Margin to Requirement

[Figure: Prior probability of lifetime shown against the Lifetime Requirement]
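The two metrics can be sketched for a single failure mechanism. The sketch is minimal. It assumes a Weibull life model of the kind a PoF analysis might produce. It represents parameter uncertainty by a small weighted set of possible scale values, once wide ("before testing") and once narrowed ("after testing"). All numbers are hypothetical. The narrowing is simply asserted rather than derived from test data. Defining margin as B10 life divided by the 50-year requirement is an assumption here, not the source's definition.

```python
import math
from typing import List, Tuple

REQUIREMENT_YEARS = 50.0


def prob_fail_before(t: float, shape: float, scale: float) -> float:
    """Weibull CDF: probability the item fails before time t."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def expected_prob_fail(t: float, shape: float, scale_dist: List[Tuple[float, float]]) -> float:
    """Failure probability averaged over a weighted set of possible scale values."""
    total = sum(w for _, w in scale_dist)
    return sum(w * prob_fail_before(t, shape, s) for s, w in scale_dist) / total


def b_life(fraction: float, shape: float, scale: float) -> float:
    """Time by which the given fraction of the population is expected to fail."""
    return scale * (-math.log(1.0 - fraction)) ** (1.0 / shape)


# Hypothetical wear-out mechanism (shape 2) with an uncertain scale in years.
SHAPE = 2.0
before_testing = [(60.0, 1.0), (120.0, 1.0), (240.0, 1.0)]  # wide uncertainty
after_testing = [(110.0, 1.0), (120.0, 1.0), (130.0, 1.0)]  # narrowed uncertainty

pf_before = expected_prob_fail(REQUIREMENT_YEARS, SHAPE, before_testing)
pf_after = expected_prob_fail(REQUIREMENT_YEARS, SHAPE, after_testing)
margin = b_life(0.10, SHAPE, 120.0) / REQUIREMENT_YEARS  # assumed margin definition
print(f"P(fail < 50 y): before={pf_before:.3f}, after={pf_after:.3f}; B10 margin={margin:.2f}")
```

In this example the nominal scale is unchanged, yet the probability of failure before the requirement drops once the low-scale tail is ruled out. This illustrates how reducing uncertainty, rather than accumulating hours, can build confidence.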

Physics of Failure Modeling

§ Leverage NASA’s and industry’s knowledge about the root causes and physical behavior of critical failure mechanisms in microelectronic devices
§ Develop/extend Physics of Failure (PoF) reliability modeling for items of interest:
  § General interconnect and printed circuit board components
  § Issues with small-feature-size ICs
  § High-voltage power components
  § Radioisotope Power Systems
  § Sensors & detectors
  § Thrusters / Valves / Mechanisms
  § Seals and lubricants
  § Translation of long-lived components (e.g., Voyager) to project future reliability
§ A program will need to augment its testing, modeling, and simulations to shrink uncertainty in projections going out 50 years
§ University of Maryland CALCE is also supporting formulation of an integrated framework

Mission Resilience

§ Establish a robust set of mission success criteria
  § Map science requirements to systems and mission phases
  § Ensure functional overlap where possible

§ Mission resilience is not just a matter of better or redundant hardware
  § Risk-based and targeted Fault Management design
  § Robust spacecraft autonomy
  § Functional redundancy and operational crossovers
  § Use of Artificial Intelligence onboard and within the ground segment

Moving Forward

§ As we engage with the larger community:

§ Are there more data sources we should be using?

§ Space environment PoF of current and near term technology

§ Lessons from previous long-duration missions

§ What are we not asking that we should?
§ Asking NASA what is required for them to have confidence that a mission like this is possible
