
Impact

Editor: Michiel van Genuchten, Open Digital Dentistry, [email protected]

Editor: Les Hatton, Kingston University, [email protected]

Landing a Spacecraft on Mars

Gerard J. Holzmann

How much software does it take to land a spacecraft safely on Mars, and how do you make all that code reliable? In this column, Gerard Holzmann describes the software development process that was followed. —Michiel van Genuchten and Les Hatton

Thousands of people worked on design, construction, and testing of the hardware for NASA's latest mission to Mars. This hardware includes not just the rover itself with its science instruments, but also the cruise stage, which guided the rover to Mars, and the descent stage, with the intricate sky-crane mechanism that gently lowered the rover to the surface on 5 August 2012.

All this painstakingly developed, tested, and retested hardware is controlled by software. This software was written by a relatively small team of about 35 developers at NASA's Jet Propulsion Laboratory (JPL). Obviously, the control software is critically important to the mission's success, with any failure potentially leading to the loss of the spacecraft—as well as to headline news around the world.

The control software onboard the spacecraft consists of about 3 MLOC. Most of this code is written in C, with a small portion (mostly for surface navigation) in C++. The code executes on a radiation-hardened CPU. The CPU is a version of an IBM PowerPC 750, called RAD750, which is designed for use in space. It has 4 Gbytes of flash memory, 128 Mbytes of RAM, and runs at a clock speed of 133 MHz.

About 75 percent of the code is auto-generated from other formalisms, such as state-machine descriptions and XML files. The remainder was handwritten specifically for this mission, in many cases building on heritage code from earlier Mars missions.

The Curiosity rover is the seventh spacecraft that NASA has successfully landed on Mars. Previous spacecraft include

• two Viking landers in 1976,
• the Pathfinder minirover in 1996,
• the two Mars Exploration Rovers Spirit and Opportunity in 2004, and
• the Phoenix Mars lander, which was launched in 2007 but reused the design of the failed Mars Surveyor lander from 2001.

Each new mission is more complex and uses more control software than its predecessor. But that's putting it mildly. As in many other industries, code size is growing exponentially fast: each new mission to Mars uses more control software than all missions before it combined:

• the Viking landers had about 5 KLOC onboard,
• Pathfinder had 150 KLOC,

IEEE Software, March/April 2013

• the Phoenix lander had 300 KLOC,
• the Mars Exploration Rovers each had 650 KLOC, and
• the MSL rover upped the ante to 3 MLOC.

Sidebar: Compound Annual Growth Rate in Mars Missions' Code

The Compound Annual Growth Rate, as described in previous columns,1 in flight software for spacecraft over the last 36 years comes out at roughly 1.20—close to the median value of 1.16. The Mars Science Laboratory software is comparable to other safety-critical systems that were described in this column, such as the Tokyo Railway system,2 Honeywell's Flight Management System,3 and Airbus.4 What sets this system apart is that it must operate reliably at a distance of millions of miles from Earth, making it inaccessible to standard types of maintenance and repair.

Sidebar references
1. M. van Genuchten and L. Hatton, "Compound Average Growth Rate for Software," IEEE Software, vol. 29, no. 4, 2012, pp. 19–21.
2. K. Tomita and K. Ito, "Software in an Evolving Train Traffic Control System," IEEE Software, vol. 28, no. 2, 2011, pp. 19–21.
3. D. Avery, "The Evolution of Flight Management Systems," IEEE Software, vol. 28, no. 1, 2011, pp. 11–13.
4. S. Burger, O. Hummel, and M. Heinisch, "Airbus Cabin Software," IEEE Software, vol. 30, no. 1, 2013, pp. 21–25.

Figure A. The amount of flight code that is flown to land spacecraft on Mars has grown exponentially in the last 36 years (log-scale plot of code size in KLOC versus year, 1970–2015: Viking, Pathfinder, Phoenix, MER, MSL). Its Compound Annual Growth Rate comes out at roughly 1.20—close to the median value of 1.16 from previous columns.

There's clearly no single magic tool or technique that can be used to secure the reliability of any large and complex software application; rather, it takes good tools, workmanship, and a carefully managed process. The three control points in this process are prevention, detection, and containment.

Prevention
The best way to make software reliable is to prevent the introduction of defects from the start. We tried to do this in a number of ways.

First, we adopted a strong new coding standard for the mission (which was later also adopted as a common standard for all software development at JPL1). Although most software development projects use coding standards, we followed a somewhat different approach in the definition of this one. To define the rules in this standard, we first looked at everything that had gone wrong in previous space missions. We categorized the problems that could be attributed to software and devised a small set of rules that could prevent these classes of errors. Next, we focused on those rules for which we could mechanically check compliance—for example, with a static analyzer. Our coding standard captured those rules, and only those rules. Therefore, the rules in this coding standard cannot be silently ignored (as is often done with other standards). We mechanically checked compliance with all the rules on every build of the software. Any deviations were reported and became part of the input to the downstream code review process.

Second, we introduced a flight software developer certification course, focused in part on software risk and defensive coding techniques. Every software developer is required to complete this course and pass the exams before they can touch flight software. The course covers the coding standard's rationale, as well as general background on computer science principles and the basic structure of spacecraft control software. Some of this material is also presented to more senior managers at


JPL to secure a common knowledge base regarding the challenges of mission-critical software development (although in the latter case, the material is presented without the pressure of an exam at the end).

Detection
The next best thing to preventing defects is to detect them as early as possible in the software development cycle. To do this, we adopted a range of state-of-the-art static source code analyzers, paired with a new tool-based code review process.2

The challenges in conducting peer code reviews on millions of lines of code are well known.3 The process we adopted therefore shifted much of the burden of the routine checks (such as checks for common types of coding errors, compliance with the coding standard, or risky code patterns) to background tools. A complete integration build of all MSL flight software was performed nightly, with all checkers running over the code in parallel to the builds. We jointly used four different static analyzers with close to a hundred simpler custom-written checking scripts that verified compliance with various types of requirements that are harder to encode in static analyzers (such as rules against the use of tabs in code or rules for the types of header files that must or must not be used).

We used the static analyzers Coverity (www.coverity.com), Codesonar (www.grammatech.com), Uno,4 and, toward the end of the software development, the newer tool Semmle (http://semmle.com). Each of these tools has different strengths and tends to find different types of flaws in the code; there's surprisingly little overlap in the output. The results of the nightly analyses were made available in the single uniform interface of the Scrub tool,2 which also integrated peer comments collected during the code review phase for each module.

What was perhaps different about the code review process we followed was that we did all peer reviews offline rather than in face-to-face meetings. We required the module owner to respond to all reports (generated by tools or peers) with a simple agree, disagree, or discuss response. Agree meant that the module owner agreed with the finding and committed to changing the code. Disagree indicated a difference of opinion, where the module owner believed that the code was correct as previously written and shouldn't be changed. Discuss meant that the report was unclear and the owner needed more information before they could determine a fix or no-fix resolution.

At the end of the offline reviewing period for each module, we held a single face-to-face meeting with reviewers, module owner, and the flight software lead present to discuss any remaining disagreements. In almost two hundred peer code review sessions held for the MSL mission, approximately 80 percent of all peer comments and tool reports were accepted with an immediate agree from the module owner. Only the remaining 20 percent of the reports or tool warnings therefore required discussion in the review meetings, leading to a final resolution of either fix (which in some cases overruled an earlier disagree response from the module owner) or no fix (see Figure 1). In all, roughly 10,000 peer comments on the code were processed in this way, together with approximately 25,000 tool reports. As shown, the vast majority of these peer comments and tool reports led to changes in the flight code to either address an issue or to prevent a tool warning from recurring in later builds.

Figure 1. The percent of peer comments resulting in a code fix in the Mars Science Laboratory code review process between 2008 and 2012, by priority (high: 86.0 percent, medium: 84.4 percent, low: 82.3 percent), showing that all comments were taken equally seriously. The majority of comments led to changes in the code.

We added another layer of checking with the application of logic model-checking techniques (http://spinroot.com) to analyze key parts of the multithreaded code for possible race conditions or deadlocks. We analyzed five critical subsystems of the control software in this way. In one case our findings led to a complete redesign of the subsystem to prevent the types of problems that the model checker had uncovered.

Apart from the static analysis, peer code reviews, and logic model-checking


analyses, every software module had to pass rigorous unit and integration tests, both in a desktop software-only context with simulated hardware and on the real hardware in flight system testbeds. The requirement for the unit tests was, as is common in this type of application, to realize full code coverage. Note, though, that this requirement isn't always easy to comply with, especially when using defensive coding techniques. After all, defensive coding techniques often deal with cases that should be impossible under nominal circumstances but that protect against the consequences of as-yet unknown and unpredictable types of errors. Defensive code, then, can sometimes be flagged as unreachable, which would violate a coding rule that bans the inclusion of unreachable code. The solution in this case is to devise demonic tests that can trigger the defensive code and show the circumstances under which it could be executed.

Containment
The final layer of defense in a reliable software system is more structural in nature. It's the mechanism that targets defects that weren't prevented or detected earlier, to make sure that these residual defects do not spread beyond the module in which they occur and bring down the system as a whole.

One method to achieve this is to provide redundant backups. The principle of redundancy is well understood in hardware design and relatively easy to implement in those cases. We can, for instance, put a second set of thrusters on a spacecraft, or a second transmitter and receiver. The assumption here is that hardware failures in duplicated equipment are mostly uncorrelated.

This same principle is much harder to follow in software. Obviously, running the same software on multiple CPUs doesn't protect against software failures (a lesson that has been learned a few times too many in the past5).

The principle of containment was most prominently used in the software that controlled the critical landing sequence, which is often described as the "seven minutes of terror." The hardware for the spacecraft was designed to be "dual string," which means that many critical components are duplicated for reliability. This duplication includes the CPU; there are two completely separate CPU and memory subsystems.

The backup CPU is designed to take over control of the spacecraft if the main CPU fails. However, if the software caused the failure, it clearly wouldn't help much for the second CPU to execute precisely the same code after such a switch. During the landing sequence, the backup CPU therefore executed a simplified, stripped-down version of the landing software. That software version was called "Second-Chance" (although, within the software team, it was also referred to as "Last-Chance"). As we know now, the main CPU succeeded in guiding the spacecraft to a flawless landing on Mars, and this part of the system wasn't needed. To date, no significant anomalies have revealed themselves in the flight software.

Acknowledgments
The research described in this article was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.

References
1. JPL Coding Standard for Flight Software Written in the C Programming Language, Jet Propulsion Laboratory, California Institute of Technology, 2009; http://lars-lab.jpl.nasa.gov/jpl_coding_standard_c.pdf.
2. G.J. Holzmann, "Scrub: A Tool for Code Reviews," Innovations in Systems and Software Eng., vol. 6, no. 4, 2010, pp. 311–318.
3. G.W. Russell, "Experience with Inspection in Ultralarge-Scale Developments," IEEE Software, Jan. 1991, pp. 25–31.
4. G.J. Holzmann, "Static Source Code Checking for User-Defined Properties," Proc. 6th World Conf. Integrated Design and Process Technology (IDPT 02), 2002; http://spinroot.com/uno.
5. J.L. Lions, ARIANE 5: Flight 501 Failure, tech. report, Centre National d'Etudes Spatiales, 1996; www.di.unito.it/~damiani/ariane5rep.html.

Gerard J. Holzmann is a senior research scientist and fellow at NASA's Jet Propulsion Laboratory at the California Institute of Technology. He was a member of the Mars Science Laboratory flight software team. Contact him at [email protected].
