<<

Modeling Safe Mode Events Travis Imken Thomas Randolph Jet Propulsion Laboratory Jet Propulsion Laboratory California Institute of Technology California Institute of Technology 4800 Oak Grove Dr. 4800 Oak Grove Dr. Pasadena, CA 91109 Pasadena, CA 91109 818-354-5608 818-354-4871 Travis.Imken@jpl..gov [email protected]

Michael DiNicola Austin Nicholas Jet Propulsion Laboratory Jet Propulsion Laboratory California Institute of Technology California Institute of Technology 4800 Oak Grove Dr. 4800 Oak Grove Dr. Pasadena, CA 91109 Pasadena, CA 91109 818-354-2558 818-354-5608 [email protected] [email protected]

Abstract — Spacecraft enter a ‘safe mode’ to protect the vehicle failsafe mode. These ‘safe modes’ are architected to when a potentially harmful anomaly occurs. This minimally- guarantee a vehicle can wait in a safe operating state until functioning state isolates faults, establishes contact with Earth, ground operators can recover and restore nominal operations. and orients the vehicle into a power positive attitude until The characteristics of this safe state are common across operators intervene. Though ‘safings’ are inherently missions: isolate faults to prevent further propagation, unpredictable, mission teams build in time margin during operations to determine root causes and restore functionality. establish and maintain communications with Earth, orient the Planning and managing this margin is both critical and enabling vehicle in a power-positive and thermally stable on mission architectures dependent on near-continuous configuration, and be able to maintain this state indefinitely. operability – such as a low-thrust electric propulsion mission. Though some in-flight safe mode entries are planned, such as a reboot to initialize a flight software update, the majority of To better quantify the occurrences and severity of safe mode ‘safings’ are unpredicted events or conditions that trigger a anomalies, the Jet Propulsion Laboratory (JPL) has assembled system-level fault protection response. a database of safings from past and active missions. Currently nearly 240 records are captured from 21 beyond-Earth This research considers the subset of anomalies that result in missions, stemming from a collaboration between teams at JPL, safe mode entries. Spacecraft fault management is designed Ames Research Center, Goddard Space Flight Center, and the Johns Hopkins University Applied Physics Laboratory. This to handle a spectrum of faults and anomalies seen across paper discusses the event database, explores a statistical software, hardware, and payloads. When an anomaly is approach in modeling the occurrences and severity of safing detected, fault protection will first work to isolate the issue at events, presents a simulation technique, and details the component level. If the fault persists, subsystems or recommendations and future work to benefit future concepts. instruments can be isolated and marked as ‘sick’. At this tier, the issue will be noted in the next communications pass and the vehicle will continue with nominal operations. However, TABLE OF CONTENTS further fault containment failures will trigger a system-level response, often leading to a safe mode entry. Safing events 1. INTRODUCTION AND MOTIVATION ...... 1 stand apart from other flight anomalies because of the 2. SAFE MODE EVENT DATABASE ...... 2 technical investigation and engineering and management 3. MODELING OCCURRENCES AND RECOVERIES ... 3 team effort involved in the recovery process. The significant 4. APPLICABILITY TO FUTURE MISSIONS ...... 7 time spent to recover from each safing culminates into a 5. RECOMMENDATIONS AND FORWARD WORK ...... 9 significant impact on overall spacecraft rates of operability. 6. CONCLUSIONS ...... 10 Each safing event requires a coordinated and methodical ACKNOWLEDGEMENTS ...... 11 response to recover the spacecraft. Spacecraft teams work to REFERENCES...... 11 diagnose and understand the issue, ensuring that the anomaly BIOGRAPHY ...... 11 is not persistent and the planned operations can be resumed. The cumulative impacts of safing events are realized when the vehicle is in flight. To manage this, margins for 1. INTRODUCTION AND MOTIVATION operational outages are prepared throughout the development phases when designing trajectories, planning critical events, Since the development of the Galileo mission1, spacecraft and developing science campaigns. To better quantify these designers have implemented an onboard, self-detecting

1 Galileo has the first recorded (and located) event on November 28, 1989. 978-1-5386-2014-4/18/$31.00 ©2018 IEEE 1

margins on future mission concepts, a statistical approach is 1. 196 of the records are from anomalous events throughout applied to safing data from past and present missions to a mission’s primary and extended phases. 12 of these records model event occurrences and recovery durations. are ‘cascading’ events, where the spacecraft re-entered safe mode before recovery from the previous event was complete. Motivation Of these 196, 151 of the events have sufficient details and Initial research into modeling safing events was funded by context to reconstruct the diagnosis and recovery timeline. the proposed Asteroid Robotic Redirect Mission (ARRM). Ancillary data is also recorded for each event including ARRM’s concept baselined a 40 KW solar electric mission phase, vehicle location, and anecdotal information propulsion (EP) system to reach asteroid 2008 EV5. [1] Low- from the recovery process. Root causes, as specified and thrust EP technologies are uniquely enabling compared to recorded by each mission team, are captured and binned into chemical propulsion systems because they allow the flight software, hardware, operations, space environments, or system to achieve large changes in spacecraft velocity on a unknown categories. Unknown events are anomalies where comparatively small propellant mass. JPL has flown EP root cause is undeterminable by the mission team. technologies on the Deep Space 1 (DS1) and missions, The event database has been collected through a and has baselined their use on the planned Psyche mission collaboration between JPL, NASA’s Goddard Space Flight and the Next Orbiter (NeMO) mission concept. [2] [3] Center, NASA’s Ames Research Center, and the Johns EP engines produce low thrust levels, requiring near- Hopkins University Applied Physics Lab – please see the continuous periods of operability for maneuvers. Trajectories Acknowledgements section. Research is ongoing to capture may require weeks or months of thrusting at a time, additional details and to locate records for missions not yet increasing the likelihood that a safing anomaly will interrupt included in the database. the planned maneuver. For example, Dawn arrived at Ceres Table 1. The safe mode database captures events from 26 days late from a four day safing event and thrust outage. 168 years of cumulative flight time on missions [4] Similar to DS1 and Dawn, ARRM trajectory designers throughout the solar system. performed ‘missed thrust analyses’ to build robustness into maneuvers. This robustness is essential due to the non- Mission Launch Destination linearity of low thrust trajectories. These analyses model the impact of anomalies by injecting multi-day thrust outages Galileo 1989 Jupiter into the planned trajectory and recalculating new solutions. 1996 Mars While the missed thrust analyses give confidence a vehicle Cassini 1997 will reach the final destination, the flight system may realize lower system-level margins, such as requiring more Deep Space 1 1998 9969 Braille propellant or increasing the time-of-flight. [5] [6] Mars Climate Orbiter2 1998 Mars 2 EP missions are not the only architectures sensitive to safing Mars Polar Lander 1999 Mars events. Time-constrained missions, such as Europa Clipper, Stardust 1999 81P/Wild could be impacted by missed maneuver opportunities and Genesis 2001 Earth L1 brief science windows. Clipper will complete up to 45 flybys Mars Odyssey 2001 Mars of Europa on a 14.2 day orbit cadence. The mission uses a chemical propulsion system to adjust the trajectory as the Spitzer (Telescope) 2003 Earth Trailing spacecraft travels within 25 km of the icy moon’s surface. A 2005 Tempel 1 safe mode entry could cause missed maneuvers, introducing Mars Reconnaissance Obiter 2005 Mars a non-zero probability that Clipper could impact Europa, require a redesign of its trajectory, or miss science 2006 observations over a region of the icy moon that may not be Dawn 2007 Vesta, Ceres revisited within the mission’s lifetime. [7] Phoenix2 2007 Mars Missed thrust analyses on DS1 and Dawn were done using Kepler (Telescope) 2009 Earth Trailing engineering best estimates for periods of inoperability. [6] To Lunar Reconnaissance Orbiter 2009 Moon improve on this, ARRM sought to leverage nearly thirty years Juno 2011 Jupiter of beyond-Earth flight data. However, no comprehensive 2 database, literature, or analysis of events was located. Thus Mars Science Laboratory 2011 Mars began the task to create and explore this database. Mars Atmosphere and 2013 Mars Volatile Evolution OSIRIS-REx 2016 Bennu 2. SAFE MODE EVENT DATABASE The safe mode event database captures nearly 240 safe mode entries from 21 beyond-Earth missions, summarized in Table 2 Cruise phase of mission only 2

1. Safe 5. Nominal 6. Science mode 2. Safing realized 3. Recovery 4. Recovery operations (and operations event at comm contact start complete thrusting) resume resume

Discovery Command: Command: Command: Investigation, analysis, Human delay (pass Safe mode Restore nominal Restore science and corrective action factors cadence) exit operations sequences

Time Recovery duration

Inoperability period Figure 1. The safing event timeline is the framework for capturing anomaly details. The black lines are milestones and the colored blocks are periods of action. Blue periods are project specific and are quantifiable, testable, and can be levied through requirements on future missions. Green periods are statistical, derived from this research. The orange period encompasses anecdotal factors, such as days off, risk posture, recovery urgency, comm pass schedules, etc. Event Recovery Timeline Time between events is the elapsed time between two safe mode event occurrences. Since the flight durations of the The severity of a safing event is realized as an operational missions in the database range from months (Mars Science impact, requiring time to diagnose and recover the spacecraft Laboratory, etc.) to nearly two decades (Cassini), elapsed before resuming the nominal mission sequences. To capture time provides a more consistent and applicable metric than these details, a standardized timeline structure provides events per year. homogeneity across different missions. Figure 1 identifies the six milestones recorded for each event in the database: Recovery duration – highlighted in the green box below the timeline – is the elapsed time it takes the mission team to 1. Safe mode event – Safe mode entry asserted onboard diagnose and complete initial recovery. This period 2. Safing realized – Ground operators receive the first encompasses both subjective and objective recovery periods. indication of safe mode entry during a scheduled When sharing their mission’s safing data, many teams communications pass indicate that there was “no rush” or that recovery “could have been faster if needed”. Only a handful of these anecdotal 3. Recovery start – Start of command execution to restore notes are quantitatively recorded in the database research. As functionality as part of the planned safe mode recovery a result, the modeled recovery duration may overestimate the procedures (subjective milestone, defined by each recovery time if a future team needs to recover quickly. mission) The inoperability period – also highlighted below the 4. Recovery complete – Spacecraft restored to a nominally- timeline – captures the overall time impact of a safing event. operating state, releasing all safe mode constraints The spacecraft is defined as inoperable when not executing (subjective milestone, defined by each mission) the planned mission sequences or is inhibited from thrusting. Outside of this period, the spacecraft is considered operable 5. Nominal operations resume – Restart nominal mission and maneuverable. Summing all of the inoperability periods sequences and begin thrusting (if applicable) over a mission’s full duration gives the net inoperability rate, so the mission’s operability rate is therefore 1 minus 6. Science operations resume – Science sequences resume the inoperability rate. after restoring instrument functionality (if applicable)

The timeline is agnostic to each spacecraft’s location and 3. MODELING OCCURRENCES AND RECOVERIES does not immediately reflect the impact of round trip light time (RTLT) on the timeline periods. RTLT can be on the Initial data exploration and model development techniques order of hours, and accumulates as a mission executes have been applied to the event database to characterize top- required round-trip command sessions through the recovery level behaviors of the data and identify candidate models for process. Post processing can estimate the impact of RTLT further investigation. After fitting and validating these using the mission’s ephemeris from the day of the event. models, the selected model becomes part of a tool that enables portability of the mission-specific data to future Definitions architectures. Both time between events and recovery durations are investigated in this effort. Different statistical The timeline in Figure 1 is used to define three important distributions are fit to the empirical data to look for trends periods used in the modeling and simulation of safing events:

3

and represent the raw data in meaningful ways. The initial Developing the Datasets modeling is based on five foundational assumptions: Datasets for analysis and modeling are assembled directly  All missions are equal and identically distributed – from the safing event database with no manipulation unless Class, cost, launch year, destination, complexity, explicitly noted. Following the assumption that all missions instruments, propulsion type, and other factors are not and events are equal and from the same sample population, evaluated. all valid data points from all missions are combined into single, large datasets. Though the motivation of the research  All events are in the same population – Data is lumped comes from missed thrust analyses, the initial inspection into common datasets, meaning different numbers of shows no significant observed difference in behavior of events between missions are not considered. All events safing events on the two EP missions (Dawn and DS1) versus follow the same natural process. the other missions in the database. Histograms of the time between events and recovery duration datasets are in Figure  All events are random and independent – Variations in 2, overlaid with best-fits (discussed in the next subsection). elapsed time-of-flight variations are not considered (events that might be more common early in the mission) The time between events dataset is first organized into and past events have no influence on future events. mission-specific subsets by calculating the elapsed time between unplanned safing events within a mission’s flight  All events are equal – Influence of root cause, system duration. This methodology does not consider the impact of complexity, and other factors are not considered in other factors, such as root cause, occurrences of planned safe recovery durations. mode entries, or if the mission transitions between phases. This process treats cascading events (discussed previously)  All recoveries are perfect - Upon recovery, the as a single data point since nearly all of the cascading events spacecraft has the same probability of safing again as it happen within a day of the first event. Elapsed time from did prior to the event. launch to the first event is included as a valid elapsed time. Once each mission’s time between events subset is These assumptions provide a starting point for the initial calculated, records from all missions are combined to investigation and model development. Lumping the time assemble the full time between evets dataset. between events and recovery duration data points into independent and identically distributed populations creates The recovery duration dataset is assembled directly from the large datasets that can be assessed for statistical significance. recovery timelines associated with each safing event. Similar However, the appropriateness of these assumptions needs to to time between events, this process ignores the impact of be verified and potentially confounding variables will require other factors and treats cascading events as a one duration. further analysis. The Recommendations and Forward Work Some specific recovery duration data points are excluded; section discusses future efforts to test these assumptions and One multi-month safing event and all Galileo recovery understand their influence on the results. duration data is not evaluated per specific recommendations from team members involved on those projects.

Time Between Events Recovery Duration 90 40

80 All missions dataset 35 All missions dataset Weibull distribution fit Weibull distribution fit 70 30 60 25 50 20 40 15 30 20 10 10 5 0 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 0 2 4 6 8 10 12 Elapsed time between safing events - Years Recovery duration - Days

Figure 2. Histograms of both datasets are shown with fitted Weibull distributions. The longest time between events record is about 4.3 years (left), while no included event had a recovery duration longer than 10 days (right). 4

Normal Log-normal 5 5

4 4

3 3

2 2

1 1

0 0 -3 -2 -1 0 1 2 3 0 5 10 15 20 Normal distribution theoretical quantiles Log-normal distribution theoretical quantiles Exponential Weibull 5 5

4 4

3 3

2 2

1 1

0 0 0 1 2 3 4 5 6 0 1 2 3 4 5 6 Exponential distribution theoretical quantiles Weibull distribution theoretical quantiles Figure 3. Quantile-quantile plots visually compare the time between events dataset (y-axes) against the theoretical quantiles of four distributions (x-axes). Closeness-of-fit between the empirical data (blue dots) and the theoretical distribution (straight red line) is inspected. The same process is also done for the recovery duration dataset. Data Modeling and Distribution Fitting The Weibull is commonly used in reliability and failure engineering and is fully defined by the scale and shape To model the datasets in Figure 2, several different statistical parameters. The scale defines the magnitude of the horizontal distributions were applied and compared against the axis value, while the shape gives insight into the behavior of empirical data. Using the histograms as a starting point, both the sample population. The similarities between the datasets were compared to the normal, log-normal, exponential and Weibull seen in Figure 3 are not surprising; exponential, and Weibull distributions. This evaluation was When the Weibull shape parameter is 1, it mathematically done using the quantile-quantile (q-q) plots shown in Figure reduces to the exponential distribution. This extra parameter 3. The q-q plot is a technique where two distributions are gives the Weibull an additional degree of freedom to better compared graphically to investigate whether they come from fit the data. When the Weibull shape is 1 (exponential), it similar populations. If two distributions come from the same indicates that the event rate does not change as a function of population, then the two datasets should closely overlap. time. A shape less than 1 indicates a decreasing event rate Both the time between events and recovery duration datasets over time, while a shape greater than 1 highlights an (blue dots) were compared to theoretical distributions increasing rate. (straight red lines), as illustrated in the four sub plots of Figure 3. Both the normal and lognormal distributions fit The histograms in Figure 2 are shown as cumulative parts of the data, but quickly diverge. The exponential and distribution functions (CDFs) in Figure 4. The empirical data Weibull distributions have excellent closeness-of-fit for the (which falls in the same x-axis location as the histograms) is full datasets, with the Weibull ultimately providing a slightly shown in various colors to represent the contributions each better representation. mission anonymously. Since CDFs show cumulative probability, the empirical data is distributed evenly on the y- The equation of the Weibull probability distribution function axis and is overlaid with the integral of the Weibull fits. The (as plotted in Figure 2) is given in Equation 1, where k is the time between events CDF shows there is a 90% probability shape parameter and λ is the scale parameter. the next safing will occur within ~1.6 years of the previous event. Similarly, there is a 90% probability that the recovery 푒 푥 ≥0 duration of any event will be ~5.1 days or fewer. ( ) 푓 푥; 푘, 휆 = (1) 0 푥 <0 5

Elapsed Time Between Safing Events 100% Weibull Fit 90% Mission 1 (6) Mission 2 (3) 80% Mission 3 (9) Mission 4 (16) Mission 5 (26) 70% Mission 6 (3) Mission 7 (4) 60% Mission 8 (12) Mission 9 (10) 50% Mission 10 (16) Mission 11 (4) Mission 12 (21) 40% Mission 13 (1) Mission 14 (1) 30% Mission 15 (10) Mission 16 (2) Mission 17 (12) 20% Mission 18 (1) Mission 19 (1) 10% Mission 20 (8) Mission 21 (19) 0% 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 Elapsed time between safing events - Years Elapsed Time to Recover from Safe Mode 100%

90%

80%

70% Weibull Fit Mission 1 (6) 60% Mission 4 (16) Mission 6 (3) Mission 7 (4) 50% Mission 8 (11) Mission 9 (10) 40% Mission 10 (16) Mission 11 (4) 30% Mission 12 (20) Mission 13 (1) Mission 17 (11) 20% Mission 18 (1) Mission 19 (1) 10% Mission 20 (6) Mission 21 (10) 0% 0 1 2 3 4 5 6 7 8 9 10 11 12 Recovery Time: Elapsed time since safing realized on ground to safe mode exit - Days Figure 4. The Weibull fits for the time between events CDF (top) and the recovery duration CDF (bottom) closely match the empirical data. The colors anonymously highlight the contribution of each mission to both datasets, with totals shown in parentheses. 5 missions have no locatable recovery duration data. 6

The Weibull parameters for the time between events and 4. APPLICABILITY TO FUTURE MISSIONS recovery duration datasets are found using MATLAB’s Weibull maximum likelihood estimator (wblfit) and are Future missions can combine Weibull parameters with shown in Table 2. mission specific details to begin to quantify the cumulative impact of safing events on their architectures. This process Table 2. Weibull parameters fit to the empirical data addresses architecture enabling assumptions during mission formulation, such as depending on a minimum operability Distribution Scale Shape rate or planning for finite time at a destination. To illustrate the potential use in mission design, a simulation methodology Time between events (years) 0.62 0.87 helps to quantify the cumulative impact of safing events over Recovery duration (days) 2.46 1.14 a mission’s life. When driven by a Monte Carlo front-end, the

outputs of each simulation are used to generate a probability Though the Weibull was selected due to the closeness-of-fit distribution of the Minimum Operability Rate (MOR). to the empirical data, the calculated parameters provide initial Analyzing the MOR distribution allows mission designers to insight. Both distributions have a shape relatively close to 1, compare risk to performance, illustrating likelihood of highlighting a near constant occurrence rate and confirming achieving an overall operability rate or better over the the similarities to the exponential distribution seen in Figure mission’s full duration. Findings can also be valuable to 3. The time between events shape of 0.87 indicates a slight probabilistic risk assessments of spacecraft reliability and decrease in the day-to-day likelihood of safing again. This performance. could be explained by team operations and procedures becoming more routine and robust over time. The recovery A flowchart of the simulation methodology is shown in duration shape of 1.14 highlights an increasing recovery rate Figure 5. The simulation makes use of a Weibull-based each day after event entry. This may be due to initial pseudo-random number generator (wblrnd in MATLAB) difficulty in securing communications pass time immediately based on the distribution parameters found in the analysis. after discovery, and could represent the process of mission Both the first and subsequent safing occurrences are teams gathering data and working to better understand the determined using the time between events Weibull fit. When anomaly over time. an event occurs, the inoperability period is found by summing the discovery delay, recovery duration from the Weibull fit, The most significant impact of the calculated Weibull fits is and the time to restore nominal operations. The discovery the portability of the data for future use. With only the shape delay block accounts for pass length, where an anomaly may and scale parameters, a pseudo-random number generator can be realized near-instantly if it occurs during a scheduled pass. be used to return Weibull-distributed times between safing After totaling the inoperability period, the simulation then events and recovery times. This allows the safing data to be predicts the next safing occurrence and checks if the next leveraged in simulations and mission design independent of entry is considered a cascading event. When a cascading the raw datasets. While the Weibull parameters will change entry occurs, the simulation adds the previous, interrupted as more missions are included, deviations from the recovery duration to a re-simulated inoperability period for parameters presented here should be relatively minor due to the cascading safing entry. Multiple cascades are possible. In the comparatively large sample size of both datasets. the simulation, the cascading event occurrences are recorded as single events and single, potentially longer recovery

n-simulation Monte Carlo Yes Simulation inputs: Inoperability Period Cascading Communications cadence Discovery Recovery Restore Next safing event: Safe: Next event in Time to restore operations delay (uniform duration nominal Time between inoperability Baseline mission duration distribution) Weibull operations events Weibull period? Weibull parameters Yes Time-of-flight function: No First safing event: Increase mission Event within Trajectory missed thrust Time between duration, as mission duration? Missed/repeat science events Weibull applicable Resource margin impact No Simulation outputs: Analysis of n-simulations: Safing event count Distribution of minimum Each inoperability period operability rates and other Final mission duration performance metrics Time-of-flight impacts Figure 5. The simulation process enables performance modeling of a candidate mission architecture. This methodology combines the Weibull fits with mission-specific details to investigate minimum operability rates.

7

durations in the outputs. Metrics and impacts of cascades are possible during the mission duration, while 6 – 11 events are noted separately in the outputs. most probable. The right histogram shows the range of simulated inoperable periods from each safing entry. No Following a single event’s inoperability period, the durations are less than 12 hours, with the peak around 3.5 simulation enters a time-of-flight function. This interface days; this is driven by the simulation inputs for discovery allows external tools, such as trajectory optimization delay and command time to restore nominal operations. The software, to quantify impacts to the planned mission duration. right tail is driven by the recovery duration Weibull fit, For low-thrust mission architectures, the cumulative decaying to an inoperable period of 14 days (99.7th additional time-of-flight from the safing events may be percentile). significant enough that additional events could occur. The simulation will continue until the next safing event occurs at Over the Monte Carlo runs, the cumulative inoperability a point in time beyond each simulation run’s full mission period of the simulations ranged from 0 to 88 days (99.7th duration. percentile). The overall operability rate distribution is shown as a survivor function in Figure 7. For this simulation, the Simulating a Candidate Mission distribution shows all simulation runs converged above a MOR of 94%, descending to an expected near-zero likelihood A mission with a continuous multi-year thrusting phase is th simulated using a Monte Carlo model in MATLAB. The of achieving a 100% operability rate. The 99.7 percentile, inputs used in the simulations are shown in Table 3. Since no marked by the green star, maps to a 96.1% MOR. This is equivalent to at least 350.8 days of operability or no more specific trajectory is under consideration, a 4 hour pass every th three days and a 1:1 time-of-flight increase are baselined. than 14.2 days of inoperability per year. The 95 percentile lies at a 97.1% MOR, about 10.6 days of inoperability per Table 3. Simulation parameters for the sample mission year.

Metric Value While the operability rates presented here will change as investigations apply mission-specific constraints, these Baseline mission duration 6 years simulations begin to bound the expected operability rates of Discovery delay / Pass cadence Pass every 3 days beyond-Earth spacecraft. Coupled with the Weibull fits, this simulation methodology provides valuable insight into Pass length 4 hours plausible vehicle operability rates early in the formulation Time to restore nominal operations 12 hours process. A design team is able to target a desired operability Time-of-flight function increase, rate, quantify the likelihood of achieving that rate, and 1:1 relative to inoperability period compare the results against project risk posture and performance margins. One million simulations were performed in the model, totaling around 9 million safing events shown in Figure 6. With the 1:1 time-of-flight function, time-of-flights increased up to 6.24 years (99.7th percentile). The left histogram indicates that between 0 and 20 events (99.7th percentile) are

104 Total Safing Count Distribution 105 Inoperability Period Distribution 12 12 1 million ~9 million safings 10 simulations 10 in the Monte Carlo

8 8

6 6

4 4

2 2

0 0 0 4 8 12 16 20 24 0 3 6 9 12 15 18 Total events per simulation - counts Inoperable duration of each event - days Figure 6. The distribution of total mission safings (left) peaks at 8 to 9 events, while the tail of the distribution of the total inoperable period of each safing (right) has recovery durations that extend beyond 30 days due to cascading events. 8

Figure 7. The shape of the distribution is a convolution of the simulation inputs with the Weibull distributions. 94.2% is the lowest observed MOR, and 0.08% of the simulations had a 100% operability rate. Sensitivity Investigation multiple percent increases in MOR. More generally, the sensitivity analysis shows a driving relationship between To investigate sensitivities in MOR, the Monte Carlo model MOR and baseline mission duration. Shorter missions have is used to look at mission simulations with durations between comparatively lower MORs when evaluated at these high one and ten years and with a discovery delay ranging from percentiles. While shorter missions will have a higher one pass per day to one pass per week. The time to restore percentage of simulations with no safing events, the 99.7th nominal operations is kept constant at 12 hours per event, the and 95th percentile MORs of shorter missions are dominated pass length is 4 hours, and the time-of-flight function remains by simulation runs with many safing events. In these at a 1:1 duration increase. The results are shown in Figure 8. situations, the simulation duration is sufficiently short such The sensitivity investigation shows that MORs are plausible that long periods between events will fall outside the from 89% to 97% (99.7th percentile) depending on mission simulation’s mission duration and are unable to ‘boost’ the duration and communications cadence. Decreasing the MOR percentiles. Finally, increasing the time-of-flight discovery delay with more frequent passes results in notable, function beyond a 1:1 impact results in slightly higher MORs, since the baseline mission duration is effectively increased. However, there may be other impacts to vehicle margin and resources that would be quantified through other tools.

5. RECOMMENDATIONS AND FORWARD WORK The analysis, modeling, and sensitivity results lead to initial recommendations in managing inoperability period margin on future mission architectures. Previous publications on managing margins for low-thrust missions have highlighted the sensitivities between cost, propellant mass, flight time, and launch vehicle selection. [5] However, assumptions for missed thrust periods have traditionally been modeled as overall system performance rates (such as 95% duty cycle on Dawn) because the tools and data for deriving quantitative insights were not yet available. [6] With the foundation of the safing event database, future mission teams can use the Weibull fits and simulation methodology to develop confidence in minimum expected operability rates of their Figure 8. MOR is sensitive to baseline mission duration, own architectures. with the shortest missions expecting a MOR above 89%. 9

Overall, this body of work provides key takeaways that can probabilistic distributions within the datasets, such as begin to bound the behavior and impact of safing events on comparing trends between primary and extended missions, future mission architectures: cruise and orbit, etc. Exploring variations in ancillary factors such as destination, cost, time-of-flight, and root cause  Safing events are plausible on missions of all types, provides a first look into trends and differences. Beyond classes, destinations, and architectures with no observed binning, statistical techniques will be applied to measure the difference between electric and impulsive propulsion influence and relationships between missions in each dataset. missions. Confidence intervals can be applied to the Weibull distributions, and mixed Weibull distributions could improve  The mean of the time between events Weibull fit is about the closeness-of-fit to the outliers seen in the q-q plots. three events every two years. No included mission has Hypothesis tests such as the Kolmogorov-Smirnov, t-test, or recorded a recovery duration longer than 10 days, with chi-square test can look for statistical connections within the 90% of the records under ~5.1 days. datasets. These questions are the beginning of understanding the ‘weight’ of each mission on the resulting distribution fits.  Exploring MOR distributions through simulation While binning creates an absolute sorting, weighting establishes a tailorable methodology to understand a techniques could be applied for more tailored simulations. vehicle’s performance based on the Weibull fits. For example, the NeMO mission concept could consider Mars orbiter data more significantly than others.  MOR is sensitive to communications pass cadences on missions of all lengths, allowing performance margin to The simulation methodology will evolve with database and be bought when it is needed dataset improvements. Integration of trajectory software, such as MALTO, through the time-of-flight function is a Further Investigations major focus. While vectors of time between events and The analysis and findings presented here are the first recovery duration values can be generated and exported to exploration of the records captured in the safing event existing tools, closed-loop interfacing with trajectory database. Both the Weibull modeling and simulation optimization tools can fully capture the impacts of discovery techniques will continue to undergo a validation process to delays, round trip light time, and cascading safes. While the ensure that the outputs are appropriate, accurate, and best inclusion of trajectory optimizers will slow down the speed represent the collected mission data. of the Monte Carlo, the simulations are easily parallelized for high-performance computing environments. Processes to improve the interpretation and inclusion of all of the event records are ongoing. Recovery durations may Finally, the event database will continue to grow as new decrease slightly when the full impact of round trip light time missions are added and missing records are located. While is measured and factored in. However, insufficient data has the dates of safing event entries are relatively straightforward been located on the number of round-trip command sessions to locate through records and interviews, only half of the used on past missions. Techniques are being developed to missions have recorded recovery durations for all of their represent missions with no safing events (such as GRAIL) safing events. Beyond this, five of the missions have no and to include events that were explicitly excluded per recovery duration for any events, and nearly every safing recommendations from those mission teams. The impact of record is missing various details related to the safing event statistical censoring on the datasets has not been fully timeline. While different mission practices, lost records, and evaluated, which will factor in the time between a completed retired personnel mean that some of this data will never be mission’s last event and end-of-mission, or represent current locatable, the authors welcome assistance in verifying and operable periods on ongoing missions that may or may not growing the database. have a future event.

Another primary focus will revisit the initial assumptions to 6. CONCLUSIONS test the value of developing methodologies to improve the Safe mode events are the predominant contributor to periods representations of the raw dataset. While the assumptions that of spacecraft inoperability. Safings consume mission all data is identically distributed and independent are resources to diagnose, recover, and prevent further vehicle simplifications of each mission, all missions studied here faults. Exciting mission architectures, such as the NeMO were designed to survive and operate as beyond-Earth concept, the planned Psyche mission, and Europa Clipper are spacecraft. Though their top-level objectives are diverse, sensitive to the cumulative impact of safing events on the these vehicles share many common architectural elements. spacecraft’s planned minimum operability rate. To better Thus, developing and validating new conclusions may not understand and quantify this risk, a comprehensive database reveal significantly deeper insight than conclusions derived of spacecraft safe mode events from past and present under the initial assumptions. missions has been collected, stemming from a collaboration between JPL, NASA’s Ames Research Center, NASA’s Analysis of the datasets are expanding beyond the lumped Goddard Flight Center, and the Johns Hopkins University Weibull fits. Binning methods are used to create separate Applied Physics Laboratory. 10

Initial exploration and analysis of the events in the safing Johns Hopkins University Applied Physics Laboratory: database highlights novel research. A standardized timeline New Horizons – Chris Hersman. NASA Ames Research is introduced to provide homogeneity in collecting detailed Center: Kepler – Charlie Sobeck, Marcie Smith. NASA records between missions. Several initial assumptions are Goddard Space Flight Center: OSIRIS-REx and LRO – applied, allowing data from all missions to be combined in Dave Everett; MAVEN – John Nagy, Martin Houghton, Rich two large datasets for elapsed time between events and Burns. recovery durations. Various statistical fits are tested against the empirical data, with Weibull distributions providing the This work was carried out at the Jet Propulsion Laboratory, best closeness-of fit. The shape and scale parameters of the California Institute of Technology, under a contract with the two Weibull distributions are discussed and presented to National Aeronautics and Space Administration. enable portability of the datasets beyond the raw data.

These parameters and other mission-specific details are REFERENCES applied in simulations driven by a Monte Carlo model. A [1] NASA’s Asteroid Redirect Mission Concept Overview, candidate six year mission with realistic operations variables https://hou.usra.edu/meetings/lpsc2016/eposter/2217.pdf shows a minimum operability rate (MOR) of 96.1% is th plausible at the 99.7 percentile. A brief sensitivity [2] NASA Selects Five Mars Orbiter Concept Studies, July investigation highlights the range of plausible MORs by 2016. https://www.nasa.gov/press-release/nasa-selects- varying the baseline mission duration and pass cadences. The five-mars-orbiter-concept-studies investigation shows MORs up to 97% (99.7th percentile) are achievable through realistic optimizations of the spacecraft [3] Psyche, Mission to a Metal World, design and operations plan. In future applications, trajectory https://jpl.nasa.gov/missions/psyche/ optimization and other tools can be interfaced through simulation’s time-of-flight function to develop high-fidelity, [4] D. Grebow, G Whiffen, D. Han, and B. Kennedy, Dawn mission-tailored MOR distributions. Safing Approach to Ceres Redesign, AIAA/AAS Astrodynamics Specialist Conference, Sept. 2016. Some exploratory work remains to improve interpretation of records in the database. Future investigations will improve [5] D. Landau, T. Kowalkowski, J. Sims, T. Randolph, P. the modeling and simulation by revisiting the initial Timmerman, J. Chase, and D. Oh, Electric Propulsion assumptions and applying different statistical techniques to System Selection Process for Interplanetary Missions, validate the work presented here. This includes testing AIAA/AAU Astrodynamics Specialist Conference, Aug. hypotheses concerning event and recovery behaviors, 2008. uncovering potential sub-groupings in the event database, and exploring alternative modeling approaches. [6] M. Rayman, T. Fraschetti, C. Raymond, and C. Russell, Coupling of system resource margins through the use of Finally, this paper is the first of many investigations resulting electric propulsion: Implications in preparing for the from the records in the safing event database. Improved Dawn mission to Ceres and Vesta, Acta Astronautica, Jan. understanding of the occurrence and severity of safing events 2007. will be valuable to future missions, especially those enabled by electric propulsion architectures or time-constrained [7] T. Bayer, B. Buffington, J.F. Castet, M. Jackson, G. Lee, operations. Recommendations are presented to guide future K. Lewis, J. Kastner, K. Schimmels, and K. Kirby, mission architects, trajectory designers, and spacecraft Europa Mission Update: Beyond Payload Selection, 2017 operators as they model and predict the performance of their IEEE Aerospace Conference, March 2017. systems.

BIOGRAPHY ACKNOWLEDGEMENTS Travis Imken received a M.S. in The authors thank Kelli McCoy and Steve Franklin for Aerospace Engineering from the starting the safing event database. This research was made University of Texas at Austin in 2014. He possible due to the collaboration and assistance from the is in the Project Systems Engineering and following individuals. At the Jet Propulsion Laboratory: Formulation Section at JPL, supporting Dawn – Paul Fieseler, Marc Rayman, Dan Grebow, Bruce the project systems team on the 2018 Waggoner; MRO – Dan Johnston; Cassini – Laura Burke, InSight Mars Lander and serving as the Julie Webster; Mars Odyssey – Bruce McLaughlin, David project systems engineer on the Lehman; Juno – Tracy Drain; Spitzer – Eric Rice, Joseph RainCube “Ka-band radar in a CubeSat” Hunt; – Grant Faris; Deep Space 1 – Tracy Neilson; mission. Prior to this, Travis was involved with the ARRM Genesis and Stardust – Ed Hirst; MSL – David Oh; Galileo – mission and the Lunar Flashlight and NEA Scout deep space Robert Gounley. Records were also located through JPL’s CubeSats. He is also involved with JPL’s Innovation incident database, enabled by Ian Colwell’s analytics tool. 11

Foundry, serving as a systems engineer on Team X/Xc as well as a small satellite expert with the A Team.

Thomas Randolph received his B.S. in Aerospace Engineering from the University of Southern California and his M.S. in Mechanical and Aerospace Engineering from Princeton University while doing research in ion and magnetoplasmadynamic thrusters. After leaving school in 1992, Tom worked at Space Systems Loral where he was initially the Cognizant Engineer for the successful western qualification of the Russian SPT-100 Hall thruster and later the Electric Propulsion Product Manager for the first commercial flights of a Hall thruster on MBSAT. After starting work at JPL in 2003, Tom has been the Product Delivery Manager for the ST7 colloid microthruster system, the Project System Engineer on the Low Density Supersonic Decelerator Technology Demonstration Mission, the Technical Group Supervisor of the Project Systems Engineering Group, the Project Systems Engineer on the Asteroid Redirect Robotic Mission, and is currently the Project Manager for ASPIRE, Advanced Supersonic Parachute Inflation Research Experiments.

Michael DiNicola is a senior engineer at the Jet Propulsion Laboratory in the Systems Modeling, Analysis & Architectures Group. In his ten years at JPL he has been part of several proposals, risk analyses and model development efforts. He currently works on the Europa Clipper Project developing probabilistic models to assess Planetary Protection and science requirements. He also provides statistical analysis as part of the NASA Instrument Cost Model Team. Michael attained his B.S. in Mathematics from UCLA and M.A. in Mathematics from UCSD.

Austin Nicholas received a B.S. degree in Aerospace Engineering from University of Illinois at Urbana- Champaign in 2011 and an M.S. in Aeronautics and Astronautics from the Massachusetts Institute of Technology in 2013. He currently works for JPL in the Project Systems Engineering & Formulation Section. His primary assignment is in the Mars Program Formulation Office developing architectures and vehicle concepts for Mars Sample Return. Recent work has included solar electric propulsion spacecraft with co-optimized flight systems and trajectory, developed for the Next Mars Orbiter concept. At MIT, he worked on attitude and cluster control for a formation-flight Cubesat mission using electrospray propulsion and architecture exploration for low-cost human missions to the lunar surface.

12