
Impact

Editor: Michiel van Genuchten, Open Digital Dentistry, [email protected]

Editor: Les Hatton, Kingston University, [email protected]

Landing a Spacecraft on Mars

Gerard J. Holzmann

How much software does it take to land a spacecraft safely on Mars, and how do you make all that code reliable? In this column, Gerard Holzmann describes the software development process that was followed. —Michiel van Genuchten and Les Hatton

Thousands of people worked on design, construction, and testing of the hardware for NASA's latest mission to Mars. This hardware includes not just the rover itself with its science instruments, but also the cruise stage, which guided the rover to Mars, and the descent stage, with the intricate sky-crane mechanism that gently lowered the rover to the surface on 5 August 2012.

All this painstakingly developed, tested, and retested hardware is controlled by software. This software was written by a relatively small team of about 35 developers at NASA's Jet Propulsion Laboratory (JPL). Obviously, the control software is critically important to the mission's success, with any failure potentially leading to the loss of the spacecraft—as well as to headline news around the world.

The control software onboard the spacecraft consists of about 3 MLOC. Most of this code is written in C, with a small portion (mostly for surface navigation) in C++. The code executes on a radiation-hardened CPU. The CPU is a version of an IBM PowerPC 750, called RAD750, which is designed for use in space. It has 4 Gbytes of flash memory, 128 Mbytes of RAM, and runs at a clock speed of 133 MHz.

About 75 percent of the code is auto-generated from other formalisms, such as state-machine descriptions and XML files. The remainder was handwritten specifically for this mission, in many cases building on heritage code from earlier Mars missions.

The Curiosity rover is the seventh spacecraft that NASA has successfully landed on Mars. Previous spacecraft include

• two Viking landers in 1976,
• the Pathfinder minirover in 1996,
• the two Mars Exploration Rovers Spirit and Opportunity in 2004, and
• the Phoenix Mars lander, which was launched in 2007 but reused the design of the failed Mars Surveyor lander from 2001.

Each new mission is more complex and uses more control software than its predecessor. But that's putting it mildly. As in many other industries, code size is growing exponentially fast: each new mission to Mars uses more control software than all missions before it combined:

• the Viking landers had about 5 KLOC onboard,
• Pathfinder had 150 KLOC,

IEEE Software, March/April 2013

• the Phoenix lander had 300 KLOC,
• the Mars Exploration Rovers each had 650 KLOC, and
• the MSL rover upped the ante to 3 MLOC.

Sidebar: Compound Annual Growth Rate in Mars Missions' Code

The Compound Annual Growth Rate, as described in previous columns,1 in flight software for spacecraft over the last 36 years comes out at roughly 1.20—close to the median value of 1.16. The Mars Science Laboratory software is comparable to other safety-critical systems that were described in this column, such as the Tokyo Railway system,2 Honeywell's Flight Management System,3 and Airbus.4 What sets this system apart is that it must operate reliably at a distance of millions of miles from Earth, making it inaccessible to standard types of maintenance and repair.

Sidebar references
1. M. van Genuchten and L. Hatton, "Compound Average Growth Rate for Software," IEEE Software, vol. 29, no. 4, 2012, pp. 19–21.
2. K. Tomita and K. Ito, "Software in an Evolving Train Traffic Control System," IEEE Software, vol. 28, no. 2, 2011, pp. 19–21.
3. D. Avery, "The Evolution of Flight Management Systems," IEEE Software, vol. 28, no. 1, 2011, pp. 11–13.
4. S. Burger, O. Hummel, and M. Heinisch, "Airbus Cabin Software," IEEE Software, vol. 30, no. 1, 2013, pp. 21–25.

Figure A. The amount of flight code that is flown to land spacecraft on Mars has grown exponentially in the last 36 years (log-scale plot of code size in KLOC versus year, 1970–2015: Viking, Pathfinder, Phoenix, MER, MSL). Its Compound Annual Growth Rate comes out at roughly 1.20—close to the median value of 1.16 from previous columns.

There's clearly no single magic tool or technique that can be used to secure the reliability of any large and complex software application; rather, it takes good tools, workmanship, and a carefully managed process. The three control points in this process are prevention, detection, and containment.

Prevention
The best way to make software reliable is to prevent the introduction of defects from the start. We tried to do this in a number of ways.

First, we adopted a strong new coding standard for the mission (which was later also adopted as a common standard for all software development at JPL1). Although most software development projects use coding standards, we followed a somewhat different approach in the definition of this one. To define the rules in this standard, we first looked at everything that had gone wrong in previous space missions. We categorized the problems that could be attributed to software and devised a small set of rules that could prevent these classes of errors. Next, we focused on those rules for which we could mechanically check compliance—for example, with a static analyzer. Our coding standard captured those rules, and only those rules. Therefore, the rules in this coding standard cannot be silently ignored (as is often done with other standards). We mechanically checked compliance with all the rules on every build of the software. Any deviations were reported and became part of the input to the downstream code review process.

Second, we introduced a flight software developer certification course, focused in part on software risk and defensive coding techniques. Every software developer is required to complete this course and pass the exams before they can touch flight software. The course covers the coding standard's rationale, as well as general background on computer science principles and the basic structure of spacecraft control software. Some of this material is also presented to more senior managers at


JPL to secure a common knowledge base regarding the challenges of mission-critical software development (although in the latter case, the material is presented without the pressure of an exam at the end).

Detection
The next best thing to preventing defects is to detect them as early as possible in the software development cycle. To do this, we adopted a range of state-of-the-art static source code analyzers, paired with a new tool-based code review process.2

The challenges in conducting peer code reviews on millions of lines of code are well known.3 The process we adopted therefore shifted much of the burden of the routine checks (such as checks for common types of coding errors, compliance with the coding standard, or risky code patterns) to background tools. A complete integration build of all MSL flight software was performed nightly, with all checkers running over the code in parallel to the builds. We jointly used four different static analyzers with close to a hundred simpler custom-written checking scripts that verified compliance with various types of requirements that are harder to encode in static analyzers (such as rules against the use of tabs in code or rules for the types of header files that must or must not be used).

We used the static analyzers Coverity (www.coverity.com), Codesonar (www.grammatech.com), Uno,4 and, toward the end of the software development, the newer tool Semmle (http://semmle.com). Each of these tools has different strengths and tends to find different types of flaws in the code; there's surprisingly little overlap in the output. The results of the nightly analyses were made available in the single uniform interface of the Scrub tool,2 which also integrated peer comments collected during the code review phase for each module.

What was perhaps different about the code review process we followed was that we did all peer reviews offline rather than in face-to-face meetings. We required the module owner to respond to all reports (generated by tools or peers) with a simple agree, disagree, or discuss response. Agree meant that the module owner agreed with the finding and committed to changing the code. Disagree indicated a difference of opinion, where the module owner believed that the code was correct as previously written and shouldn't be changed. Discuss meant that the report was unclear and the owner needed more information before they could determine a fix or no-fix resolution.

At the end of the offline reviewing period for each module, we held a single face-to-face meeting with reviewers, module owner, and the flight software lead present to discuss any remaining disagreements. In almost two hundred peer code review sessions held for the MSL mission, approximately 80 percent of all peer comments and tool reports were accepted with an immediate agree from the module owner. Only the remaining 20 percent of the reports or tool warnings therefore required discussion in the review meetings, leading to a final resolution of either fix (which in some cases overruled an earlier disagree response from the module owner) or no fix (see Figure 1). In all, roughly 10,000 peer comments on the code were processed in this way, together with approximately 25,000 tool reports. As shown, the vast majority of these peer comments and tool reports led to changes in the flight code to either address an issue or to prevent a tool warning from recurring in later builds.

Figure 1. The percent of peer comments resulting in a code fix in the Mars Science Laboratory code review process between 2008 and 2012, by priority (high: 86.0 percent, medium: 84.4 percent, low: 82.3 percent), showing that all comments were taken equally seriously. The majority of comments led to changes in the code.

We added another layer of checking with the application of logic model-checking techniques (http://spinroot.com) to analyze key parts of the multithreaded code for possible race conditions or deadlocks. We analyzed five critical subsystems of the control software in this way. In one case our findings led to a complete redesign of the subsystem to prevent the types of problems that the model checker had uncovered.

Apart from the static analysis, peer code reviews, and logic model-checking


analyses, every software module had to pass rigorous unit and integration tests, both in a desktop software-only context with simulated hardware and on the real hardware in flight system testbeds. The requirement for the unit tests was, as is common in this type of application, to realize full code coverage. Note, though, that this requirement isn't always easy to comply with, especially when using defensive coding techniques. After all, defensive coding techniques often deal with cases that should be impossible under nominal circumstances but that protect against the consequences of as-yet unknown and unpredictable types of errors. Defensive code, then, can sometimes be flagged as unreachable, which would violate a coding rule that bans the inclusion of unreachable code. The solution in this case is to devise demonic tests that can trigger the defensive code and show the circumstances under which it could be executed.

Containment
The final layer of defense in a reliable software system is more structural in nature. It's the mechanism that targets defects that weren't prevented or detected earlier, to make sure that these residual defects do not spread beyond the module in which they occur and bring down the system as a whole.

One method to achieve this is to provide redundant backups. The principle of redundancy is well understood in hardware design and relatively easy to implement in those cases. We can, for instance, put a second set of thrusters on a spacecraft, or a second transmitter and receiver. The assumption here is that hardware failures in duplicated equipment are mostly uncorrelated.

This same principle is much harder to follow in software. Obviously, running the same software on multiple CPUs doesn't protect against software failures (a lesson that has been learned a few times too many in the past5).

The principle of containment was most prominently used in the software that controlled the critical landing sequence, which is often described as the "seven minutes of terror." The hardware for the spacecraft was designed to be "dual string," which means that many critical components are duplicated for reliability. This duplication includes the CPU; there are two completely separate CPU and memory subsystems.

The backup CPU is designed to take over control of the spacecraft if the main CPU fails. However, if the software caused the failure, it clearly wouldn't help much for the second CPU to execute precisely the same code after such a switch. During the landing sequence, the backup CPU therefore executed a simplified, stripped-down version of the landing software. That software version was called "Second-Chance" (although, within the software team, it was also referred to as "Last-Chance"). As we know now, the main CPU succeeded in guiding the spacecraft to a flawless landing on Mars, and this part of the system wasn't needed. To date, no significant anomalies have revealed themselves in the flight software.

Acknowledgments
The research described in this article was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.

References
1. JPL Coding Standard for Flight Software Written in the C Programming Language, Jet Propulsion Laboratory, California Institute of Technology, 2009; http://lars-lab.jpl.nasa.gov/jpl_coding_standard_c.pdf.
2. G.J. Holzmann, "Scrub: A Tool for Code Reviews," Innovations in Systems and Software Eng., vol. 6, no. 4, 2010, pp. 311–318.
3. G.W. Russell, "Experience with Inspection in Ultralarge-Scale Developments," IEEE Software, Jan. 1991, pp. 25–31.
4. G.J. Holzmann, "Static Source Code Checking for User-Defined Properties," Proc. 6th World Conf. Integrated Design and Process Technology (IDPT 02), 2002; http://spinroot.com/uno.
5. J.L. Lions, ARIANE 5: Flight 501 Failure, tech. report, Centre National d'Etudes Spatiales, 1996; www.di.unito.it/~damiani/ariane5rep.html.

Gerard J. Holzmann is a senior research scientist and fellow at NASA's Jet Propulsion Laboratory at the California Institute of Technology. He was a member of the Mars Science Laboratory flight software team. Contact him at [email protected].
