Scalable Nonblocking Concurrent Objects for Mission Critical Code

Damian Dechev and Bjarne Stroustrup
Texas A&M University, College Station, TX 77843-3112, U.S.A.
[email protected] [email protected]

Abstract

The high degree of complexity and autonomy of future robotic space missions, such as Mars Science Laboratory (MSL), poses serious challenges in assuring their reliability and efficiency. Providing fast and safe concurrent synchronization is of critical importance to such autonomous embedded software systems. The application of nonblocking synchronization is known to help eliminate the hazards of deadlock, livelock, and priority inversion. Nonblocking programming techniques are notoriously difficult to implement, and they offer a variety of semantic guarantees and usability and performance trade-offs. The present software development and certification methodologies applied at NASA do not reach the level of detail of providing guidelines for the design of concurrent software. The complex task of engineering reliable and efficient concurrent synchronization is left to the programmer's ingenuity. A number of Software Transactional Memory (STM) approaches have gained wide popularity because of their easy-to-apply interfaces, but they currently fail to offer scalable nonblocking transactions. In this work we provide an in-depth analysis of the nonblocking synchronization semantics and their applicability in mission critical code. We describe a generic implementation of a methodology for scalable implementation of concurrent objects. Our performance evaluation demonstrates that our approach is practical and outperforms the application of nonblocking transactions by a large factor. In addition, we apply our Descriptor-based approach to provide a solution to the fundamental ABA problem. Our ABA prevention scheme, called the λδ approach, outperforms by a large factor the use of garbage collection for the safe management of each shared location. It offers speeds comparable to the application of the architecture-specific CAS2 instruction used for version counting. The λδ approach is an ABA prevention technique based on classification of concurrent operations and 3-step execution of a Descriptor object. A practical alternative to the application of CAS2 is particularly important for the engineering of embedded systems.

Categories and Subject Descriptors D.3.3 [Programming Languages]: Language Constructs and Features

General Terms Algorithms, Languages, Reliability

Keywords nonblocking synchronization, C++, ABA problem prevention, software transactional memory, autonomous space software

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
OOPSLA'09, October 25-29, 2009, Orlando, Florida, USA.
Copyright © 2009 ACM 978-1-60558-768-4/09/10...$10.00

1. Introduction

Robotic space mission projects, such as Mars Science Laboratory (MSL) [29], pose the challenging task of engineering some of the most complex real-time embedded software systems. The notion of concurrency is of critical importance for the design and implementation of such systems. The present software development and certification protocols (such as [26]) do not reach the level of detail of offering guidelines for the engineering of reliable concurrent software. In this work, we provide a detailed analysis of the state-of-the-art nonblocking programming techniques and derive a generic implementation for scalable lightweight concurrent objects that can help in implementing efficient and safe concurrent interactions in mission critical code.

1.1 Nonblocking Objects

The most common technique for controlling the interactions of concurrent processes is the use of mutual exclusion locks. A mutual exclusion lock guarantees thread-safety of a concurrent object by blocking all contending threads trying to access it, except the one holding the lock. In scenarios of high contention on the shared data, such an approach can seriously affect the performance of the system and significantly diminish its parallelism. For the majority of applications, the problem with locks is one of difficulty of providing correctness more than one of performance. The application of mutual exclusion locks poses significant safety hazards and incurs high complexity in the testing and validation of mission-critical software. Locks can be optimized in some scenarios by utilizing fine-grained locks or context-switching. Because of resource limitations, optimized lock mechanisms are not a desirable alternative for flight-qualified hardware [24]. Even for efficient locks, the interdependence of processes implied by the use of mutual exclusion introduces the dangers of deadlock, livelock, and priority inversion. The incorrect application of locks is hard to detect with the traditional testing procedures, and a program can be deployed and used for a long period of time before the flaws become evident and eventually cause anomalous behavior.

To achieve higher safety and gain performance, we suggest the application of nonblocking synchronization. A concurrent object is nonblocking if it guarantees that some process in the system will make progress in a finite number of steps [16]. An object that guarantees that each process will make progress in a finite number of steps is defined as wait-free. Obstruction-freedom [18] is an alternative nonblocking condition that ensures progress if a thread eventually executes in isolation. It is the weakest nonblocking property, and obstruction-free objects require the support of a contention manager to prevent livelocking.

1.2 Impact for Space Systems

Modern robotic space exploration missions, such as Mars Science Laboratory [29], are expected to embed a large array of advanced components and functionalities and perform a complex set of scientific experiments. The high degree of autonomy and increased complexity of such systems pose significant challenges in assuring the reliability and efficiency of their software. A survey on the challenges for the development of modern spacecraft software by Lowry [24] reveals that in July 1997 the Mars Pathfinder mission experienced a number of anomalous system resets that caused an operational delay and loss of scientific data. The follow-up analysis identified the presence of a priority inversion problem caused by the low-priority meteorological process blocking the high-priority bus management process. The software engineers found out that it would have been impossible to detect the problem with the black box testing applied at the time. A more appropriate priority inversion inheritance algorithm had been ignored due to its frequency of execution, the real-time requirements imposed, and its high cost incurred on the slower flight-qualified computer hardware. The subtle interactions in the concurrent applications of modern aerospace autonomous software are of critical importance to the system's safety and operation. The presence of a large number of concurrent autonomous processes implies an increased volume of interactions that are hard to predict and validate. Allowing fast and reliable concurrent synchronization is of critical importance to the design of autonomous spacecraft software.

1.3 Mission Data System

Mission Data System (MDS) [9] is the Jet Propulsion Laboratory's framework for designing and implementing complete end-to-end data and control autonomous flight systems. The framework focuses on the representation of three main software architecture principles:

(1) System control: a state-based control architecture with explicit representation of controllable states [8].
(2) Goal-oriented operation: control intent is expressed by defining a set of goals as part of a goal network [1].
(3) Layered data management: integrated data management and transport protocols [30].

In MDS a state variable provides access to the data abstractions representing the physical entities under control over a continuous period of time, spanning from the distant past to the distant future. As explained by Wagner [30], the implementation's intent is to define a goal timeline overlapping or coinciding with the timeline of the state variables. Computing the guarantees necessary for achieving a goal might require the lookup of past states as well as the computation of projected future states. MDS employs the concept of goals to represent control intent. Goals are expressed as a set of temporal constraints [5]. Each state variable is associated with exactly one state estimator whose function is to collect all available data and compute a projection of the state value and its expected transitions. Control goals are considered to be those that are meant to control external physical states. Knowledge goals are those goals that represent the constraints on the software system regarding a property of a state variable. Not all states are known at all times. The most trivial knowledge goal is the request for a state to be known, thus enabling its estimator. A data state is defined as the information regarding the available state and goal data and its storage format and location. The MDS platform considers data states an integral part of the control system rather than a part of the system under control. There are dedicated state variables representing the data states. In addition, data states can be controlled through the definition of data goals. A data state might store information such as location, formatting, compression, and transport intent and status of the data. A data state might not be necessary for every state variable. In a simple control system where no telemetry is used, the state variable implementation might as well store the information regarding the variable's value history and its extrapolated states. At its present state of design and implementation, MDS does not provide a concurrent synchronization mechanism for building safer and faster concurrent interactions. Elevating the level of efficiency and reliability in the execution of the concurrent processes is of particular significance to the implementation of the System Control and the Data Management modules of MDS. It is the goal of this paper to illustrate the trade-offs in the semantics and application of some advanced nonblocking techniques and to analyze their applicability in MDS.

The most ubiquitous and versatile data structure in the ISO C++ Standard Template Library (STL) [28] is vector, offering a combination of dynamic memory management and constant-time random access. Because of the vector's wide use and the challenging parallel implementation of its nonblocking dynamic operations, we illustrate the efficiency of each nonblocking approach discussed in this work with respect to its applicability for the design and implementation of a shared nonblocking vector. A number of pivotal concurrent applications in the Mission Data System [20] framework employ a shared STL vector (in all scenarios protected by mutual exclusion locks). Such is the Data Management Service library described by Wagner in [30].

2. Nonblocking Data Structures

Lock-free and wait-free algorithms exploit a set of portable atomic primitives such as the word-size Compare-and-Swap (CAS) instruction [13]. The design of nonblocking data structures poses significant challenges, and their development and optimization is a current topic of research [12], [16]. The Compare-And-Swap (CAS) atomic primitive (commonly known as Compare and Exchange, CMPXCHG, on the Intel x86 and Itanium architectures [21]) is a CPU instruction that allows a processor to atomically test and modify a single-word memory location. CAS requires three arguments: a memory location (Li), an old value (Ai), and a new value (Bi). The instruction atomically exchanges the value stored at Li with Bi, provided that Li's current value equals Ai. The result indicates whether the exchange was performed. For the majority of implementations the return value is the value last read from Li (that is, Bi if the exchange succeeded). Some CAS variants, often called Compare-And-Set, have a return value of type boolean. The hardware architecture ensures the atomicity of the operation by applying a fine-grained hardware lock such as a cache or a bus lock (as is the case for IA-32 [21]). The application of a CAS-controlled speculative manipulation of a shared location (Li) is a fundamental programming technique in the engineering of nonblocking algorithms [16] (an example is shown in Algorithm 1).

Algorithm 1 CAS-controlled speculative manipulation of Li
1: repeat
2:   value_type Ai = Li^
3:   value_type Bi = fComputeB
4: until CAS(Li, Ai, Bi) == Bi

In our pseudocode we use the symbols ^, &, and . to indicate pointer dereferencing, obtaining an object's address, and integrated pointer dereferencing and field access, respectively. When the value stored at Li is the control value of a CAS-based speculative manipulation, we call Li and Li^ the control location and the control value, respectively. We indicate the control value's type with the string value_type.
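Algorithm 1's speculative loop maps directly onto standard C++ atomics. The following is a present-day sketch (std::atomic postdates this paper); fComputeB and the concrete value_type are stand-ins for an application-specific function and a word-sized type:

```cpp
#include <atomic>

// value_type must fit in a single word that hardware CAS can exchange.
using value_type = long;

// Stand-in for the paper's fComputeB: derives the new value B_i.
// Here it ignores A_i, i.e. a "modify" routine rather than "read-modify".
value_type fComputeB(value_type /*Ai*/) {
    return 42;
}

// CAS-controlled speculative manipulation of L_i (cf. Algorithm 1):
// read A_i, compute B_i, and retry until the exchange succeeds.
void speculative_update(std::atomic<value_type>& Li) {
    value_type Ai = Li.load();          // A_i = L_i^
    value_type Bi;
    do {
        Bi = fComputeB(Ai);             // compute the new value
        // On failure, compare_exchange_weak reloads A_i for the retry.
    } while (!Li.compare_exchange_weak(Ai, Bi));
}
```

Note that compare_exchange_weak refreshes the expected value on failure, so the loop re-derives Bi from the freshly observed Ai on every iteration, exactly as the repeat/until structure of Algorithm 1 prescribes.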
The fields in the Write Descriptor Object ware CAS instruction can exchange atomically (typically the size store the target location as well as the old and the new values. of a single memory word). In the most common cases, value type (c) A boolean value indicating whether ωδ contains a pending is either an integer or a pointer value. In the latter case, the imple- write operation that needs to be completed. mentor might reserve two extra bits per each control value and use them for implementation-specific value marking [12]. This is pos- The use of a Descriptor allows an interrupting thread help the sible if we assume that the pointer values stored at Li are aligned interrupted thread complete an operation rather than wait for its and the two low-order bits have been cleared. In Algorithm 1, the completion. As shown in Algorithm 2, the technique is used to function fComputeB yields the new value Bi. Typically, Bi is a implement, using only two CAS instructions, a linearizable update value directly derived from the function’s arguments and is not de- of two memory locations: 1. a reference to a Descriptor Object (data pendent on the value stored at the control location. A routine where type pointer to δ stored in a location Lδ) and 2. an element of type Bi’s value is dependent on Ai would be a read-modify routine in value type stored in Li. In Step 1, Algorithm 2, we perform a CAS- contrast to the modify routine shown in Algorithm 1. based speculation of a shared location Lδ that contains a reference Linearizability [16] is a correctness condition for concurrent ob- to a Descriptor Object. The purpose of this CAS-based speculation jects: a concurrent operation is linearizable if it appears to execute in Step 1 is to replace an existing Descriptor Object with a new one. instantaneously in a given point of time τlin between the time τinv Step 1 executes in the following fashion: τ of its invocation and the time end of its completion. The literature 1. 
we read the value of the current reference to δ stored in L (line τ linearization point δ often refers to lin as a . The implementations 3); of many nonblocking data structures require the update of two or more memory locations in a linearizable fashion [4], [12]. The en- 2. if the current δ object contains a pending operation, we need to gineering of such operations (e.g. push back and resize in a shared help its completion (lines 4-5); dynamically resizable array) is critical and particularly challenging 3. we record the current value, Ai, at Li (line 7) and compute the in a CAS-based design. Harris et al. propose in [14] a software im- new value, Bi, to be stored in Li (line 8); plementation of a multiple-compare-and-swap (MCAS) algorithm based on CAS. This software-based MCAS algorithm has been ap- 4. a new ωδ object is allocated on the heap, initialized (by calling plied by Fraser in the implementation of a number of lock-free con- fωδ), and its fields Target, OldValue, and NewValue are set tainers such as binary search trees and skip lists [11]. The cost of (lines 9-12); the MCAS operation is expensive requiring 2M + 1 CAS instruc- 5. any additional state data stored in a Descriptor Object must be tions. Consequently, the direct application of the MCAS scheme computed (by calling fυδ). Such data might be a shared element is not an optimal approach for the design of lock-free algorithms. or a container’s size that needs to be modified (line 13); A common programming technique applied for the implementation 6. a new Descriptor Object is initialized containing the new Write Descriptor of the complex nonblocking operations is the use of a Descriptor and the new descriptor’s data. The new descriptor’s Object (Section 2.1). pending operation flag (WDpending) is set to true (lines 14-15); A number of advanced Software Transactional Memory (STM) libraries provide nonblocking transactions with dynamic lineariz- 7. 
we attempt a swap of the old Descriptor Object with the new one able operations [7], [27]. Such transactions can be utilized for the (line 16). Should the CAS fail, we know that there is another design of nonblocking containers [27]. As our performance evalu- process that has interrupted us and meanwhile succeeded to ation demonstrates, the high cost of the extra level of indirection modify Lδ and progress. We need to go back at the beginning and the conflict detection and validation schemes in STM systems of the loop and repeat all the steps. Should the CAS succeed, we does not allow performance comparable to that of a hand-crafted proceed with Step 2 and perform the update at Li. lock-free container that relies solely on the application of portable The size of a Descriptor Object is larger than a memory word. Thus, atomic primitives. Sections 2.4 and 2.7 describe in detail the imple- we need to store and manipulate a Descriptor Object through a mentation of a nonblocking shared vector using CAS-based tech- reference. Since the control value of Step 1 stores a pointer to a niques and STM, respectively. Section 3 provides analysis of the Descriptor Object, to prevent ABA (Section 2.4.1), all references suggested implementation strategies and discusses the performance to descriptors must be memory managed by a safe nonblocking evaluation of the two approaches. garbage collection scheme. We use the prefix µ for all variables that require safe memory management. In Step 2 we execute the 2.1 The Descriptor Object Write Descriptor, WD, in order to update the value at Li. Any The consistency model implied by the linearizability requirement is interrupting thread (after the completion of Step 1) detects the stronger than the widely applied Lamport’s sequential consistency pending flag of ωδ and, should the flag’s value be still positive, model [23]. According to Lamport’s definition, sequential consis- it proceeds to executing the requested update ωδ @ Li : Ai → Bi. 
tency requires that the results of a concurrent execution are equiv- There is no need to perform a CAS-based loop and the execution alent to the results yielded by some sequential execution (given the of a single CAS execution is sufficient for the completion of ωδ. fact that the operations performed by each individual processor ap- Should the CAS from Step 2 succeed, we have completed the two- pear in the sequential history in the order as defined by the pro- step execution of the Descriptor Object. Should it fail, we know gram). The pseudocode in Algorithm 2 shows the two-step execu- that there is an interrupting thread that has completed it already. tion of a Descriptor Object. In our nonblocking design, a Descriptor Object stores three types of information: 2.2 Nonblocking Concurrent Semantics The use of a Descriptor Object provides the programming tech- (a) Global data describing the state of the shared container (υδ), nique for the implementation of some of the complex nonblocking e.g. the size of a dynamically resizable array [4]. operations in a shared container, such as the push back, pop back, (b) A record of a pending operation on a given memory location. and reserve operations in a shared vector [4]. The use and execu- We call such a record requesting an update at a shared location tion of a Write Descriptor guarantees the linearizable update of two Li from an old value, old val, to a new value, new val, a Write or more memory locations. Algorithm 2 Two-step execution of a δ object linearizable history is one where: a. all invocations and responses can be reordered so that they are equivalent to a sequential history, 1: Step 1: place a new descriptor in Lδ 2: repeat b. the yielded sequential history must correspond to the semantic 3: δ µOldDesc = Lδ ^ requirements of the sequential definition of the object, and c. 
in 4: if µOldDesc.WDpending == true then case a given response precedes an invocation in the concurrent ex- 5: execute µOldDesc.WD 6: end if ecution, then it must precede it in the derived sequential history. It 7: value type Ai = Li^ is the last requirement that differentiates the consistency model im- 8: value type Bi = fComputeB plied by the definition of linearizability with Lamport’s sequential 9: ωδ WD = fωδ () consistency model and makes linearizability stricter. When two δ- 10: WD.Target = Li modifying operations (Oδ and Oδ ) are concurrent, according to 11: WD.OldElement = Ai 1 2 12: WD.NewElement = Bi Algorithm 2, Oδ1 precedes Oδ2 in the linearization history if and 13: υδ DescData = f () υδ only if Oδ1 completes Step 1, Algorithm 2 prior to Oδ2 . 14: δ µNewDesc = fδ (DescData, WD) Definition 4: We refer to the instant of successful execution of 15: µNewDesc.WDpending = true the global Descriptor exchange at Lδ (line 16, Algorithm 2) as τδ. 16: until CAS(Lδ , µOldDesc, µNewDesc) == µNewDesc 17: Definition 5: A point in the execution of a δ object that deter- 18: Step 2: execute the write descriptor mines the order of an ωδ-executing operation acting on location Li 19: if µNewDesc.WDpending then relative to other writer operations acting on the same location Li, 20: CAS(WD.Target, WD.OldElement, WD.NewElement) == WD.NewElement is referred to as the λδ-point (τλδ) of a Write Descriptor. 21: µNewDesc.WDPending = false The order of execution of the λδ-points of two concurrent ωδ- 22: end if executing operations determines their order in the linearization history. The λδ-point does not necessarily need to coincide with the operation’s linearization point, τlin. As illustrated in [4], τlin Definition 1: An operation whose success depends on the cre- can vary depending on the operations’ concurrent interleaving. The ation and execution of a Write Descriptor is called an ωδ-executing linearization point of a shared vector’s [4] δ-modifying operation operation. 
can be any of the three possible points: a. some point after τδ at The operation push back of a shared vector [4] is an example which some operation reads data form the Descriptor Object, b. of an ωδ-executing operation. Such ωδ-executing operations have τδ or c. the point of execution of the Write Descriptor, τwd (the lock-free semantics and the progress of an individual operation completion of Step 2, Algorithm 2). The core rule for a linearizable is subject to the contention on the shared locations Lδ and Li operation is that it must appear to execute in a single instant of (under heavy contention, the body of the CAS-based loop from time with respect to other concurrent operations. The linearization Step 1, Algorithm 2 might need to be re-executed). For a shared point need not correspond to a single fixed instruction in the body vector, operations such as pop back do not need to execute a Write of the operation’s implementation and can vary depending on the Descriptor [4]. Their progress is dependent on the state of the interrupts the operation experiences. In contrast, the λδ-point of global data stored in the Descriptor, such as the size of a container. an ωδ object corresponds to a single instruction in the objects’s Definition 2: An operation whose success depends on the state implementation, thus making it easier to statically argue about an of the υδ data stored in the Descriptor Object is a δ-modifying operation’s correctness. In the pseudo code in Algorithm 2 τλδ ≡ operation. τδ. A δ-modifying operation, such as pop back, needs only up- date the shared global data (the data of type υδ, such as size) 2.3 Implementation Concerns in the Descriptor Object (thus pop back seeks an atomic update We provide a brief summary of the most important implementation of only one memory location: Lδ). 
Since an ωδ-executing opera- concerns for the practical and portable design of a nonblocking data tion by definition always performs an exchange of the entire De- container. scriptor Object, every ωδ-executing operation is also δ-modifying. The semantics of a δ-modifying operation are lock-free and the 2.3.1 Portability progress of an individual operation is determined by the interrupts Virtually at the core of every known synchronization technique is by other δ-modifying operations. An ωδ-executing operation is also the application of a number of hardware atomic primitives. The se- δ-modifying but as is the case with pop back, not all δ-modifying mantics of such primitives vary depending on the specific hardware operations are ωδ-executing. Certain operations, such as the ran- platform. There are a number of architectures that support some dom access read and write in a vector [4], do not need to access hardware atomic instructions that can provide greater flexibility the Descriptor Object and progress regardless of the state of the de- such as the Load-Link/Store Conditional (LL/SC) supported by the scriptor. Such operations are non-δ-modifying and have wait-free PowerPC, Alpha, MIPS, and the ARM architectures or instructions semantics (thus no delay if there is contention at Lδ). that perform atomic writes to more than a single word in mem- Definition 3: An operation whose success does not depend on ory, such as the Double-Compare-And-Swap instruction (DCAS) the state of the Descriptor Object is a non-δ-modifying operation. [6]. The hardware support for such atomic instructions can vastly simplify the design of a nonblocking algorithm as well as offer im- 2.2.1 Concurrent Operations mediate solutions to a number of challenging problems such as the The semantics of a concurrent data structure can be based on a num- ABA problem. To maintain portability across a large number of ber of assumptions. 
Similarly to a number of fundamental studies in hardware platforms, the design and implementation of a nonblock- nonblocking design [16], [12], we assume the following premises: ing algorithm cannot rely on the support of such atomic primitives. each processor can execute a number of operations. This establishes The most common atomic primitive that is supported by a large a history of invocations and responses and defines a real-time order majority of hardware platforms is the single-word CAS instruction. between them. An operation O1 is said to precede an operation O2 It is not desirable to suggest a CAS2/DCAS-based ABA solution if O2’s invocation occurs after O1’s response. Operations that do for a CAS-based algorithm, unless the implementor explores the not have real-time ordering are defined as concurrent.A sequential optimization possibilities of the algorithm upon the availability of history is one where all invocations have immediate responses. A CAS2/DCAS. 2.3.2 Linearizability Guarantee Operation Description size type size() const Number of elements in the vector In a CAS-based design, a major difficulty is meeting the lineariz- size type capacity() const Number of available memory slots ability requirements for operations that require the update of more void reserve(size type n) Allocation of memory with capacity n than a single-word in the system’s shared memory. To cope with bool empty() const true when size = 0 this problem, it is possible to apply a combination of a number of T* operator[] (size type n) const returns the element at position n known techniques: T* front() returns the first element T* back() returns the last element a. 
2.3.2 Linearizability Guarantee

In a CAS-based design, a major difficulty is meeting the linearizability requirements for operations that require the update of more than a single word in the system's shared memory. To cope with this problem, it is possible to apply a combination of a number of known techniques:

a. Extra Level of Indirection: reference semantics [28] must be assumed in case the data being manipulated is larger than a memory word, or the approach relies on the application of smart pointers or garbage collection for each individual element in the shared container.

b. Descriptor Object: a Descriptor stores a record of a pending operation on a given shared memory location. It allows the interrupting threads to help the interrupted thread complete an operation rather than wait for its completion.

c. Descriptive Log: at the core of virtually all Software Transactional Memory implementations, the Descriptive Log stores a record of all pending reads and writes to the shared data. It is used for conflict detection, validation, and optimistic speculation.

d. Transactional Memory: a duplicate memory copy used to perform speculative updates that are invisible to all other threads until the linearization point of the entire transaction.

e. Optimistic Speculation: complex nonblocking operations often employ optimistic speculative execution in order to carry out the memory updates on a local or duplicate memory copy and commit once there are no conflicts with interfering operations. It is necessary to employ a methodology for unrolling all changes performed by the speculating operation, should there be conflicts during the commit phase.

To illustrate the complexity of a nonblocking design of a shared vector, Table 1 provides an analysis of the number of memory locations that need to be updated upon the execution of some of its basic operations.

Operation | Signature | Memory Locations
push_back | Vector × Elem → void | 2: element, size
pop_back | Vector → Elem | 1: size
reserve | Vector × size_t → Vector | n: all elements
read | Vector × size_t → Elem | none
write | Vector × size_t × Elem → Vector | 1: element
size | Vector → size_t | none

Table 1. Vector - Operations

2.3.3 Interfaces of the Concurrent Operations

According to the ISO C++ Standard [22], the STL containers' interfaces are inherently sequential. The next ISO C++ Standard [2] is going to include a concurrent memory model [3] and possibly a blocking threading library. In Table 2 we show a brief overview of some of the basic operations of an STL vector according to the current standard of the C++ programming language.

Operation | Description
size_type size() const | Number of elements in the vector
size_type capacity() const | Number of available memory slots
void reserve(size_type n) | Allocation of memory with capacity n
bool empty() const | true when size = 0
T* operator[](size_type n) const | Returns the element at position n
T* front() | Returns the first element
T* back() | Returns the last element
void push_back(const T&) | Inserts a new element at the tail
void pop_back() | Removes the element at the tail
void resize(n, t = T()) | Modifies the tail, making size = n

Table 2. Interfaces of STL Vector

Consider the sequence of operations applied to an instance, vec, of the STL vector: vec[vec.size()-1]; vec.pop_back();. In an environment with concurrent operations, we cannot have the guarantee that the element being deleted by the pop_back is going to be the element that had been read earlier by the invocation of operator[]. Such a sequential history is just one of several legal sequential histories that can be derived from the concurrent execution of the above operations. While the STL interfaces have proven to be efficient and flexible for a large number of applications [28], to preserve the semantic behavior implied by the sequential definitions of STL, one can either rely on a library with atomic transactions [7], [27] or alternatively define concurrent STL interfaces adequate with respect to the applied consistency model. In the example we have shown, it might be appropriate to modify the interface of the pop_back operation and return the element being deleted instead of the void return type specified in STL. Such an implementation efficiently combines two operations: reading the element to be removed from the container and removing the element. The ISO C++ implementation of pop_back() returns void so that the operation is optimal (and does not perform extra work) in the most general case: the deletion of the tail element. Should we prefer to keep the STL standard interface of void pop_back() in a concurrent implementation, the task of obtaining the value of the removed element in a concurrent nonblocking execution might be quite costly and difficult to implement. By analyzing the shared containers' usage, we have observed that combining related operations can deliver better usability and performance advantages in a nonblocking implementation. Other possibly beneficial combinations of operations are 1) a CAS-based read-modify-write at location Li that unifies a random access read and write at location Li, and 2) the push_back of a block of tail elements. Furthermore, as demonstrated by Dechev et al. [5], the application of static analysis tools such as The Pivot [5] can help in shifting some part of the concurrent implementation's complexity from the run-time of the system to the compile-time program analysis stage. The Pivot is a compiler-independent platform for static analysis and semantics-based transformation of the complete ISO C++ programming language and some advanced language features proposed for the next generation C++ [2].

2.4 Descriptor-based Shared Vector

In this section we present a brief overview of the most critical lock-free algorithms employed by a Descriptor-based shared vector (see [4] for the full set of the operations of the first lock-free dynamically resizable array). To help tail operations update the size and the tail of the vector (in a linearizable manner), the design presented in [4] suggests the application of a helper object, named Write Descriptor (WD), that announces a pending tail modification and allows interrupting threads to help the interrupted thread complete its operations. A pointer to the WD object is stored in the Descriptor together with the container's size and a reference counter required by the applied memory management scheme. The approach avoids storage relocation and its synchronization hazards by utilizing a two-level array. Whenever push_back exceeds the current capacity, a new memory block twice the size of the previous one is added. The remaining part of this section presents the pseudocode of the tail operations (push_back and pop_back) and the random access operations (read and write at a given location within the vector's bounds).

Push_back (add one element to end): The first step is to complete a pending operation that the current descriptor might hold.

Algorithm 3 push_back(vector, elem)
1: repeat
2:   desc_current ← vector.desc
3:   CompleteWrite(vector, desc_current.pending)
4:   if vector.memory[bucket] = NULL then
5:     AllocBucket(vector, bucket)
6:   end if
7:   wop ← new WriteDesc(At(desc_current.size), elem, desc_current.size)
8:   desc_next ← new Descriptor(desc_current.size + 1, wop)
9: until CAS(&vector.desc, desc_current, desc_next)
10: CompleteWrite(vector, desc_next.pending)

Algorithm 4 read(vector, i)
1: return At(vector, i)^

Algorithm 5 write(vector, i, elem)
1: At(vector, i)^ ← elem

The ABA problem must be considered in relation to the most commonly applied nonblocking programming techniques (such as the use of Descriptors) and correctness guarantees, and the possibilities for its detection and avoidance. At the present moment of time, eliminating the hazards of ABA in a nonblocking algorithm is left to the ingenuity of the software designer. As illustrated in Table 3, ABA can occur if a process P1 is interrupted at any time after it has read the old value (Ai) and before it attempts to execute the CAS instruction from Algorithm 1. An interrupting process (Pk) might change the value at Li to a new value, Bi. Afterwards, either Pk or any other process Pj ≠ P1 can eventually store Ai back to Li. When P1 resumes in Step 4, its CAS loop succeeds (a false positive execution) despite the fact that Li's value has meanwhile been manipulated.

Step | Action
Step 1 | P1 reads Ai from Li
Step 2 | Pk interrupts P1; Pk stores the value Bi into Li
Step 3 | Pj stores the value Ai into Li
Step 4 | P1 resumes; P1 executes a false positive CAS

Table 3. ABA at Li

Table 3. ABA at Li

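The descriptor exchange at the heart of Algorithm 3 can be sketched with C++11 atomics. This is a minimal single-word illustration of ours, not the paper's implementation: it stores int elements in a flat array, omits bucket allocation and bounds handling, and leaks retired Descriptor and WriteDesc objects where the paper would apply its memory management scheme:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

struct WriteDesc {
    int    old_val;
    int    new_val;
    size_t pos;
    bool   pending;
};

struct Descriptor {
    size_t     size;
    WriteDesc* pending;  // nullptr when no tail write is in flight
};

// Helping scheme: any thread that observes a pending write descriptor
// completes it before trying to install its own descriptor.
void complete_write(std::atomic<int>* mem, WriteDesc* wd) {
    if (wd && wd->pending) {
        int expected = wd->old_val;
        mem[wd->pos].compare_exchange_strong(expected, wd->new_val);
        wd->pending = false;
    }
}

void push_back(std::atomic<Descriptor*>& desc, std::atomic<int>* mem, int elem) {
    for (;;) {
        Descriptor* cur = desc.load();
        complete_write(mem, cur->pending);               // help, do not wait
        WriteDesc* wop = new WriteDesc{mem[cur->size].load(), elem,
                                       cur->size, true};
        Descriptor* next = new Descriptor{cur->size + 1, wop};
        if (desc.compare_exchange_strong(cur, next)) {   // announce, then
            complete_write(mem, next->pending);          // finish our write
            return;
        }
        delete next;                                     // conflict: retry
        delete wop;
    }
}
```

Note that, exactly as the text warns, this unprotected exchange of raw Descriptor pointers is vulnerable to ABA; the λδ scheme discussed later addresses that.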
In case that the storage capacity has reached its limit, new memory is allocated for the next memory bucket. Then, push_back defines a new Descriptor object and announces the current write operation. Finally, push_back uses CAS to swap the previous Descriptor object with the new one. Should CAS fail, the routine is re-executed. After succeeding, push_back finishes by writing the element.

Pop back (remove one element from end): Unlike push_back, pop_back does not utilize a Write Descriptor. It completes any pending operation of the current descriptor, reads the last element, defines a new descriptor, and attempts a CAS on the descriptor object.

Non-bound checking Read and Write at position i: The random access read and write do not utilize the descriptor, and their success is independent of the descriptor's value.

Algorithm 6 pop_back vector
1: repeat
2: desc_current ← vector.desc
3: CompleteWrite(vector, desc_current.pending)
4: elem ← At(vector, desc_current.size-1)
5: desc_next ← new Descriptor(desc_current.size-1, NULL)
6: until CAS(&vector.desc, desc_current, desc_next)
7: return elem

Algorithm 7 CompleteWrite vector, wop
1: if wop.pending then
2: CAS(At(vector, wop.pos), wop.value_old, wop.value_new)
3: wop.pending ← false
4: end if

2.4.1 The ABA Problem

The ABA problem [25] is fundamental to all CAS-based systems. The ABA problem is a false positive execution of a CAS-based speculation on a shared location Li (Algorithm 1). While of a simple nature and derived from the application of a basic hardware primitive, the ABA problem's occurrence is due to the intricate and complex interactions of the application's concurrent operations.

The ABA problem can corrupt the semantics of a Descriptor-based design of an unprotected nonblocking dynamic array [4]. As a common technique for overcoming the ABA problem, it has been suggested to use a version tag attached to each value [15]. Such an approach demands the application of the architecture-specific CAS2 instruction. ABA avoidance on CAS-based architectures has been typically limited to two possible approaches:

1. split a 32-bit memory word into a value portion and a counter portion (thus significantly limiting the usable address space or the range of values that can be stored) [10],

2. apply value semantics (by utilizing an extra level of indirection, i.e., create a unique pointer to each value to be stored) in combination with a memory management approach that disallows the reuse of potentially hazardous memory locations [17], [25] (thus imposing a significant performance overhead).

As reflected in our performance test results (Section 2.5), the usage of both an extra level of indirection and the heavy reliance on a nonblocking GC scheme for managing the Descriptor Objects and the references to value_type objects is very expensive with respect to the space and time complexity of a nonblocking algorithm. In the following section we present a 3-step execution of the Descriptor Object approach that helps us eliminate ABA, avoid the need for an extra level of indirection, and reduce the usage of the computationally expensive GC scheme.

2.5 ABA Prevention for Descriptor-based Lock-free Designs

In this section we study in detail and define the conditions that lead to ABA in a nonblocking Descriptor-based design. We investigate the relationship between the ABA hazards and the most commonly applied nonblocking programming techniques and correctness guarantees. Based on our analysis, we define a generic and practical condition, called the λδ approach, for ABA avoidance for lock-free linearizable designs. The λδ approach provides ABA prevention by the categorization of the concurrent operations. For complex operations (in a Descriptor-based design), the λδ approach identifies the critical point in an operation's execution and suggests a 3-step execution strategy for ABA avoidance.
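Approach 1 above, splitting a single memory word into a value part and a version counter, can be sketched as follows. The 16/16-bit split and the helper names are our own illustration of the idea, not code from the paper:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// One atomic word holds a 16-bit value and a 16-bit version counter.
// A value that is removed and later re-inserted carries a different
// version, so a stale CAS fails; the narrowed value range is the cost.
using Tagged = uint32_t;

constexpr Tagged pack(uint16_t value, uint16_t ver) {
    return (static_cast<uint32_t>(ver) << 16) | value;
}
constexpr uint16_t value_of(Tagged t)   { return t & 0xFFFFu; }
constexpr uint16_t version_of(Tagged t) { return t >> 16; }

// Replace expected_value with new_value only if the slot still holds
// the same value at the same version; the version is bumped on success.
bool cas_tagged(std::atomic<Tagged>& slot, uint16_t expected_value,
                uint16_t new_value) {
    Tagged cur = slot.load();
    if (value_of(cur) != expected_value) return false;
    Tagged next = pack(new_value, version_of(cur) + 1);
    return slot.compare_exchange_strong(cur, next);
}
```

A thread holding a stale snapshot pack(v, old_ver) fails its CAS even if the value v has meanwhile been restored, which is exactly the false positive the plain single-word CAS admits.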
cent years with the application of CAS for the development of non- Let us designate the point of time when a certain δ-modifying blocking programming techniques. Avoiding the hazards of ABA operation reads the state of the Descriptor Object by τreadδ , and imposes an extra challenge for a lock-free algorithm’s design and the instants when a thread reads a value from and writes a value implementation. To the best of our knowledge, the literature does into a location Li by τaccessi and τwritei , respectively. Table 4 not offer an explicit and detailed analysis of the ABA problem, its demonstrates the occurrence of ABA in the execution of a δ object Step Action new descriptor is placed in Li, any interrupting writer thread ac- Step 1 Oδ1 : τreadδ cessing Li is required to complete the remaining two steps in the Step 2 Oδ1 : τaccessi execution of the Write Descriptor. However, should the CAS ex- Step 3 O : τ δ1 δ ecution in Step 2 (line 28) fail, we have to unroll the changes at Step 4 Oδ : τread 2 δ Li performed in Step 1 by restoring Li’s old value preserved in Step 5 O : τ δ1 wd WD.OldElement (line 20) and retry the execution of the routine Step 6 O : τ i writei (line 21). To implement Algorithm 8, it is necessary to distinguish Step 7 O : τ δ2 wd between objects of type value type and δ. A possible solution is to require that all value type values are pointers and all pointer values Table 4. ABA occurrence in the execution of a Descriptor Object stored in Li are aligned with the two low-order bits cleared during their initialization. That way, we can use the two low-order bits for designating the type of the pointer values. Subsequently, every read with two concurrent δ-modifying operations (Oδ1 and Oδ2 ) and a must check the type of the pointer obtained from a shared memory concurrent write, Oi, at Li. We assume that the δ object’s imple- location prior to manipulating it. Once an operation succeeds at mentation follows Algorithm 2. 
The execution of Oδ1 , Oδ2 , and Oi completing Step 1, Algorithm 8, location Li contains a pointer to a proceeds in the following manner: δ object that includes both: Li’s previous value of type value type and a write descriptor WD that provides a record for the steps nec- (1) O reads the state of the current δ object as well as the current δ1 essary for the operation’s completion. Any non-δ-modifying oper- value at L , A (Steps 1-2, Table 4). Next, O proceeds with i i δ1 ation, such as a random access read in a shared vector, can obtain instantiating a new δ object and replaces the old descriptor with the value of L (of type value type) by accessing WD.OldElement the new one (Step 3, Table 4). i (thus going through a temporary indirection) and ignore the De- (2) Oδ1 is interrupted by Oδ2 . Oδ2 reads Lδ and finds the W Dpending scriptor Object. Upon the success of Step 3, Algorithm 8, the tem- flag’s value to be true (Step 4, Table 4). porary level of indirection is eliminated. Such an approach would preserve the wait-free execution of a non-δ-modifying operation. (3) Oδ1 resumes and completes the execution of its δ object by storing Bi into Li (Step 5, Table 4). The ωδ data type needs to be amended to include a field TempEle- ment (line 9, Algorithm 8) that records the value of the temporary (4) An interrupting operation, Oi, writes the value Ai into Li (Step δ pointer stored in Li. The cost of the λδ operation is 3 CAS ex- 6, Table 4). ecutions to achieve the linearizable update of two shared memory

(5) Oδ2 resumes and executes ωδ it has previously read, the ωδ’s locations (Li and Lδ). The implementation of our λδ-modifying CAS falsely succeeds (Step 6, Step 4). operation as shown in Algorithm 8 is similar to the execution of Harris et al.’s MCAS algorithm [14]. Just like our λδ-modifying The placement of the λδ-point plays a critical role for achieving approach, for an MCAS update of Lδ and Li, the cost of Har- ABA safety in the implementation of an ωδ-executing operation. ris et al.’s MCAS is at least 3 executions of the single-word CAS The λδ-point in Table 4 guarantees that the ωδ-executing operation instruction. Harris et al.’s work on MCAS [14] brings forward a O completes before O . However, at the time τ when O δ1 δ2 wd δ2 significant contribution in the design of lock-free algorithms, how- executes the write descriptor, O has no way of knowing whether δ2 ever, it lacks any analysis of the hazards of ABA and the way the O has completed its update at L or not. Since O ’s λδ-point δ1 i δ1 authors manage to avoid it. ≡ τδ, the only way to know about the status of Oδ1 is to read Lδ. Using a single-word CAS operation prevents Oδ2 from atomically checking the status of Lδ and executing the update at Li. 2.5.1 Performance Evaluation of the ABA Prevention Definition 6: A concurrent execution of one or more non-ωδ- Scheme executing δ-modifying operations with one ωδ-executing operation, We incorporated the presented ABA elimination approach in the Oδ1 , performing an update at location Li is ABA-free if Oδ1 ’s λδ- implementation of the nonblocking dynamically resizable array as point ≡ τaccessi . We refer to an ωδ-executing operation where its discussed in Section 2.4. Our test results indicate that the λδ ap- λδ-point ≡ τaccessi as a λδ-modifying operation. proach offers ABA prevention with performance comparable to the Assume that in Table 4 the Oδ1 ’s λδ-point ≡ τaccessi . 
As use of the platform-specific CAS2 instruction to implement ver- shown in Table 4, the ABA problem in this scenario occurs when sion counting. This finding is of particular value to the engineering there is a hazard of a spurious execution of Oδ1 ’s Write Descrip- of some embedded real-time systems where the hardware does not tor. Having a λδ-modifying implementation of Oδ1 allows any support complex atomic primitives such as CAS2 [24]. We ran per- non-ωδ-executing δ-modifying operation such as Oδ2 to check formance tests on an Intel IA-32 SMP machine with two 1.83 GHz Oδ1 ’s progress while attempting the atomic update at Li requested processor cores with 512 MB shared memory and 2 MB L2 shared by Oδ1 ’s Write Descriptor. Our 3-step descriptor execution ap- cache running the MAC 10.5.6 . In our perfor- proach offers a solution based on Definition 6. In an implemen- mance analysis we compare: tation with two or more concurrent ωδ-executing operations, each ωδ-executing operation must be λδ-modifying in order to eliminate the hazard of a spurious execution of an ωδ that has been picked up (1) λδ approach: the implementation of a vector with a λδ- by a collaborating operation. modifying push back and a δ-modifying pop back. Table 5 In Algorithm 8 we suggest a design strategy for the implemen- shows that in this scenario the cost of push back is three single- tation of a λδ-modifying operation. Our approach is based on a word CAS operations and pop back’s cost is one single-word 3-step execution of the δ object. While similar to Algorithm 2, the CAS instruction. approach shown in Algorithm 8 differs by executing a fundamental (2) All-GC approach: the use of an extra level of indirection and additional step: in Step 1 we store a pointer to the new descriptor in memory management for each element. Because of its perfor- Li prior to the attempt to store it in Lδ in Step 2. 
Since all δ objects mance and availability, we have chosen to implement and apply are memory managed, we are guaranteed that no other thread would Herlihy et al.’s Pass The Buck algorithm [19]. In addition, we attempt a write of the value µNewDesc in Li or any other shared use Pass The Buck to protect the Descriptor Objects for all of memory location. The operation is λδ-modifying because, after the the tested approaches. Algorithm 8 Implementing a λδ-modifying operation through a three-step execution of a δ object

1: Step 1: place a new descriptor in Li
2: value_type Bi = fComputeB
3: value_type Ai
4: ωδ WD = fωδ()
5: WD.Target = Li
6: WD.NewElement = Bi
7: υδ DescData = fυδ()
8: δ µNewDesc = fδ(DescData, WD)
9: WD.TempElement = &NewDesc
10: µNewDesc.WDpending = true
11: repeat
12: Ai = Li^
13: WD.OldElement = Ai
14: until CAS(Li, Ai, µNewDesc) == µNewDesc
15:
16: Step 2: place the new descriptor in Lδ
17: bool unroll = false
18: repeat
19: if unroll then
20: CAS(WD.Target, µNewDesc, WD.OldElement)
21: goto 3
22: end if
23: δ µOldDesc = Lδ^
24: if µOldDesc.WDpending == true then
25: execute µOldDesc.WD
26: end if
27: unroll = true
28: until CAS(Lδ, µOldDesc, µNewDesc) == µNewDesc
29:
30: Step 3: execute the Write Descriptor
31: if µNewDesc.WDpending then
32: CAS(WD.Target, WD.TempElement, WD.NewElement) == WD.NewElement
33: µNewDesc.WDpending = false
34: end if

Figure 1. Performance Results A (40+/40-/10w/10r)

Figure 2. Performance Results B (25+/25-/10w/40r)

(3) CAS2-based approach: the application of CAS2 for maintaining a reference counter for each element. A CAS2-based version counting implementation is easy to apply to almost any pre-existent CAS-based algorithm. While a CAS2-based solution is not portable and thus does not meet our goals, we believe that the approach is applicable for a large number of modern architectures. For this reason, it is included in our performance evaluation. In the performance tests, we apply CAS2 (and version counting) for updates at the shared memory locations at Li and a single-word CAS to update the Descriptor Object at Lδ.

Table 5 offers an overview of the shared vector's operations' relative cost in terms of the number and type of atomic instructions invoked per operation.

                 push_back       pop_back     read i        write i
1. λδ approach   3 CAS           1 CAS        atomic read   atomic write
2. All-GC        2 CAS + GC      1 CAS + GC   atomic read   atomic write + GC
3. CAS2-based    1 CAS2 + 1 CAS  1 CAS        atomic read   1 CAS2

Table 5. A Shared Vector's Operations Cost (Best Case Scenario)

We varied the number of threads, starting from 1, and exponentially increased their number to 64. Each thread executed 500,000 lock-free operations on the shared container. We measured the execution time (in seconds) that all threads needed to complete. Each iteration of every thread executed an operation with a certain probability (push_back (+), pop_back (-), random access write (w), random access read (r)). We show the performance graph for a distribution of +:40%, -:40%, w:10%, r:10% on Figure 1. Figure 2 demonstrates the performance results with less contention at the vector's tail, +:25%, -:25%, w:10%, r:40%. The number of threads is plotted along the x-axis, while the time needed to complete all operations is shown along the y-axis. According to the performance results, compared to the All-GC approach, the λδ approach delivers consistent performance gains in all possible operation mixes by a large factor: a factor of at least 3.5 in the cases with less contention at the tail and a factor of 10 or more when there is a high concentration of tail operations. These observations come as a confirmation of our expectations that introducing an extra level of indirection and the necessity to memory manage each individual element with PTB (or an alternative memory management scheme) to avoid ABA comes with a pricy performance overhead. The λδ approach offers an alternative by classifying the concurrent operations and introducing the notion of a λδ-point. The application of version counting based on the architecture-specific CAS2 operation is the most commonly cited approach for ABA prevention in the literature [15], [19]. Our performance evaluation shows that the λδ approach delivers performance comparable to the use of CAS2-based version counting.
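The low-order-bit type tagging that Section 2.5 uses to distinguish value_type pointers from descriptor pointers can be sketched as below. The enum names are ours, and the scheme assumes (as the text requires) that all stored pointers are at least 4-byte aligned so the two low-order bits are initially clear:

```cpp
#include <cassert>
#include <cstdint>

// Pointers to aligned heap objects have their two low-order bits free,
// so those bits can encode what the pointer refers to: a plain
// value_type element or a temporarily installed descriptor.
enum PtrKind : uintptr_t { VALUE = 0, DESCRIPTOR = 1, KIND_MASK = 3 };

inline void* tag(void* p, PtrKind k) {
    return reinterpret_cast<void*>(reinterpret_cast<uintptr_t>(p) | k);
}
inline PtrKind kind_of(void* p) {
    return static_cast<PtrKind>(reinterpret_cast<uintptr_t>(p) & KIND_MASK);
}
inline void* untag(void* p) {
    return reinterpret_cast<void*>(
        reinterpret_cast<uintptr_t>(p) & ~uintptr_t(KIND_MASK));
}
```

Every read checks kind_of() before dereferencing; a reader that encounters a DESCRIPTOR-tagged pointer fetches the element through the descriptor's OldElement field instead, as described above.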
Figure 3 illustrates the test results with a distribution containing predominantly random access read and write operations, +:10%, -:10%, w:40%, r:40%. Figure 4 reflects our performance evaluation on a vector's use with mostly random access read operations: +:20%, -:0%, w:20%, r:60%, a scenario often referred to as the most common real-world use of a shared container [11].

CAS2 is a complex atomic primitive, and its application comes with a higher cost when compared to the application of an atomic write or a single-word CAS. In the performance tests executed, we notice that in the scenarios where the random access write is invoked more frequently (Figures 3 and 4), the performance of the CAS2 version counting approach suffers a performance penalty and runs slower than the λδ approach by about 12% to 20%. According to our performance evaluation, the λδ approach is a systematic, effective, portable, and generic solution for ABA avoidance. The λδ scheme does not induce a performance penalty when compared to the architecture-specific application of CAS2-based version counting, and offers a considerable performance gain when compared to the use of All-GC.

Figure 3. Performance Results C (10+/10-/40w/40r)

Figure 4. Performance Results D (20+/0-/20w/60r)

2.6 STM-based Nonblocking Design

A variety of recent STM approaches [7], [27] claim safe and easy to use concurrent interfaces. The most advanced STM implementations allow the definition of efficient "large-scale" transactions, i.e. dynamic and unbounded transactions. The complex designs of such advanced STMs often come with an associated cost:

a. Two Levels of Indirection: a large number of the nonblocking designs require two levels of indirection in accessing data.

b. Linearizability: the linearizability requirements are hard to meet for an unbounded and dynamic STM. To achieve efficiency and reduce the complexity, all known nonblocking STMs offer the less demanding obstruction-free synchronization [18].

c. STM-oriented Programming Model: the use of STM requires the developer to be aware of the STM implementation and apply an STM-oriented Programming Model. The effectiveness of such programming models is a topic of current discussions in the research community.

d. Closed Memory Usage: both nonblocking and lock-based STMs often require a closed memory system [7].

e. Vulnerability of Large Transactions: in a nonblocking implementation, large transactions are subject to interference from contending threads and are more likely to encounter conflicts. Large blocking transactions can be subject to time-outs and requests to abort, or can introduce a bottleneck for the computation.

f. Validation: a validation scheme is an algorithm that ensures that none of the transactional code produces side-effects. Code containing I/O and exceptions needs to be reworked, and some class methods might require special attention. Consider a class hierarchy with a base class A and two derived classes B and C. Assume B and C inherit a virtual method f, and B's implementation is side-effect free while C's is not. A validation scheme needs to disallow a call to C's method f.

With respect to our design goals, the main problems associated with the application of STM are meeting the stricter requirements posed by the lock-free progress and safety guarantees and the overhead introduced by the application of an extra level of indirection and the costly conflict detection and validation schemes.

2.6.1 Obstruction-free Descriptor vs Lock-free Descriptor

To be able to reduce the complexity of implementing nonblocking transactions, the available nonblocking STM libraries often provide the weaker obstruction-free progress guarantee. Even for experienced software designers, understanding the subtle differences between lock-free and obstruction-free designs is challenging. To better illustrate how obstruction-free objects differ from lock-free objects, in Algorithm 9 we demonstrate the implementation of an obstruction-free Descriptor Object. While similar to the execution of a lock-free Descriptor Object (Section 2.1), the obstruction-free Descriptor Object from Algorithm 9 differs in two significant ways:

(1) No thread collaboration when executing the Write Descriptor: interrupting threads need not help interrupted threads complete. Obstruction-free execution guarantees that a thread will complete eventually in isolation. Thus, every time a thread identifies an interrupt, it can simply repeat its update routine until the sequence of instructions completes without interrupts.
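The obstruction-free pattern of no helping plus unrolling can be condensed into a sketch of a two-location update in the spirit of Algorithm 9. This is a simplified illustration of ours with plain int values, not the paper's code: a failed second CAS undoes the first write and the routine retries in isolation:

```cpp
#include <atomic>
#include <cassert>

// Update location Li first, then the descriptor location Ld.
// No thread helps another: on a conflict at Ld, the operation
// unrolls its change at Li and repeats its own routine.
bool of_update(std::atomic<int>& Li, std::atomic<int>& Ld,
               int oldElem, int newElem, int oldDesc, int newDesc) {
    for (;;) {
        int e = oldElem;
        if (!Li.compare_exchange_strong(e, newElem))
            return false;                    // Li no longer holds oldElem
        int d = oldDesc;
        if (Ld.compare_exchange_strong(d, newDesc))
            return true;                     // both updates took effect
        // Step 2 failed: unroll Step 1, observe the winning descriptor,
        // and retry the whole routine (repeat-until-uninterrupted).
        int undo = newElem;
        Li.compare_exchange_strong(undo, oldElem);
        oldDesc = d;
    }
}
```

Under contention this loop can retry indefinitely, which is precisely the livelock exposure, discussed below, that distinguishes obstruction-free from lock-free progress.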
Intuitively, a larger number of instructions in the execution routine implies a higher risk of interrupts.

(2) Unrolling: when an attempted update at Lδ fails, the operation needs to invoke a mechanism for unrolling any modifications it has performed to shared memory. In our obstruction-free Descriptor, the Write Descriptor stores the necessary information to execute an undo of Step 1, Algorithm 9, should the attempted update at Lδ in Step 2, Algorithm 9, fail. The unrolling approach requires that we store two types of objects in a shared location Li: elements of type value_type and Write Descriptors of type ωδ. To distinguish between these two types of objects, we need to employ bit marking of the unused low-order bits for all Write Descriptor objects temporarily stored at Li. When an interrupting thread recognizes an ωδ object, it can simply ignore its presence and obtain the element (of type value_type) by reading the field OldElement of the Write Descriptor.

Algorithm 9 Implementing a Descriptor Object with obstruction-free semantics
1: Step 1: update Li
2: δ µOldDesc = Lδ^
3: value_type Ai = Li^
4: value_type Bi = fComputeB
5: ωδ WD = fωδ()
6: WD.Target = Li
7: WD.NewElement = Bi
8: υδ DescData = fυδ()
9: δ µNewDesc = fδ(DescData, WD)
10: WD.TempElement = &NewDesc.WD
11: repeat
12: Ai = Li^
13: WD.OldElement = Ai
14: until CAS(Li, Ai, µNewDesc.WD) == µNewDesc.WD
15:
16: Step 2: place the new descriptor in Lδ
17: bool unroll = false
18: repeat
19: if unroll then
20: CAS(WD.Target, µNewDesc, WD.OldElement)
21: goto 1
22: end if
23: unroll = true
24: until CAS(Lδ, µOldDesc, µNewDesc) == µNewDesc
25:
26: Step 3: execute the write descriptor
27: CAS(WD.Target, WD.TempElement, WD.NewElement)

Obstruction-free objects that follow a design similar to Algorithm 9 eliminate the overhead an interrupting thread might experience when helping an interrupted thread. However, in scenarios with high contention, obstruction-free objects might experience frequent interrupts that could result in poor scalability and even livelocking.

Dynamic transactions are able to access memory locations that are not statically known. Unbounded transactions pose no limits on the number of locations being accessed. The basic techniques applied are the utilization of public records of concurrent operations and a number of conflict detection and validation algorithms that prevent side-effects and race conditions. To guarantee progress, transactions help those ahead of them by examining the public log record. The availability of nonblocking, unbounded, and dynamic transactions provides an alternative to CAS-based designs for the implementation of nonblocking data structures.

2.7 RSTM-based Vector

The Rochester Software Transactional Memory (RSTM) [27] is a word- and indirection-based C++ STM library that offers obstruction-free nonblocking transactions. As explained by the authors in [27], while helping provide lightweight committing and aborting of transactions, the extra level of indirection can cause a dramatic performance degradation due to the more frequent capacity and coherence misses in the cache. In this section we employ the RSTM library (version 4) to build an STM-based nonblocking shared vector. We chose RSTM for our experiment because of its flexible and efficient object-oriented C++ design, demonstrated high performance when compared to alternative STM techniques, and the availability of nonblocking transactions. In Algorithms 10, 11, 12, and 13, we present the RSTM-based implementations of the read, write, pop_back, and push_back operations, respectively. According to the RSTM API [27], access to shared data is achieved by utilizing four classes of shared pointers: 1) a shared object (class sh_ptr<T>) representing an object that is untouched by a transaction, 2) a read-only object (class rd_ptr<T>) referring to an object that has been opened for reading, 3) a writable object (class wr_ptr<T>) pointing to an object opened for writing by a transaction, and 4) a privatized object (class un_ptr<T>) representing an object that can be accessed by one thread at a time. These smart pointer templates can be instantiated only with data types derived from a core RSTM object class stm::Object. Thus, we need to wrap each element stored in the shared vector in a class STMVectorNode that derives from stm::Object. Similarly, we define a Descriptor class STMVectorDesc (derived from stm::Object) that stores the container-specific data such as the vector's size and capacity. The tail operations need to modify (within a single transaction) the last element and the Descriptor object (of type STMVectorDesc) that is stored in a location Ldesc. The vector's memory array is named mem. In the pseudo-code in Algorithms 12 and 13 we omit the details related to the management of mem (such as the resizing of the shared vector should the requested size exceed the container's capacity).

Algorithm 10 RSTM vector, operation read location p
1: BEGIN_TRANSACTION
2: rd_ptr<STMVectorNode> rp(mem[p])
3: result = rp→value
4: END_TRANSACTION
5: return result

Algorithm 11 RSTM vector, operation write v at location p
1: BEGIN_TRANSACTION
2: wr_ptr<STMVectorNode> wp(mem[p])
3: wp→val = v
4: sh_ptr<STMVectorNode> nv = new sh_ptr<STMVectorNode>(wp)
5: mem[p] = nv
6: END_TRANSACTION

Algorithm 12 RSTM vector, operation pop_back
1: BEGIN_TRANSACTION
2: rd_ptr<STMVectorNode> rp(mem[Ldesc→size-1])
3: sh_ptr<STMVectorDesc> desc = new sh_ptr<STMVectorDesc>(new STMVectorDesc(Ldesc→size-1))
4: result = rp→value
5: Ldesc = desc
6: END_TRANSACTION
7: return result

Algorithm 13 RSTM vector, operation push_back v
1: BEGIN_TRANSACTION
2: sh_ptr<STMVectorNode> nv = new sh_ptr<STMVectorNode>(new STMVectorNode(v))
3: sh_ptr<STMVectorDesc> desc = new sh_ptr<STMVectorDesc>(new STMVectorDesc(Ldesc→size+1))
4: mem[size] = nv
5: Ldesc = desc
6: END_TRANSACTION

3. Analysis and Results

To evaluate the performance of the discussed synchronization techniques, we analyze the performance of three approaches for the implementation of a shared vector:

(1) The RSTM-based nonblocking vector implementation as presented in Section 2.7.

(2) An RSTM lock-based execution of the vector's transactions. RSTM provides the option of running the transactional code in a lock-based mode using redo locks [27]. Though blocking and not meeting our goals for safe and reliable synchronization, we include the lock-based RSTM vector execution to gain additional insight about the relative performance gains or penalties that the discussed nonblocking approaches offer when compared to the execution of a lock-based, STM-based container.

(3) The hand-crafted Descriptor-based approach as presented in Section 2.4.

We ran performance tests on an Intel IA-32 SMP machine with two 1.83 GHz processor cores with 512 MB shared memory and 2 MB L2 shared cache, running the Mac OS X 10.5.6 operating system.
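For reference, the management of mem omitted above follows the two-level array of Section 2.4, where bucket k holds twice as many elements as bucket k-1 and no element is ever relocated. A sketch of the index arithmetic (ours; the first-bucket size of 8 is an assumption for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Bucket k holds FIRST_BUCKET_SIZE * 2^k elements, so the bucket for a
// logical index i is derived from the highest set bit of
// i + FIRST_BUCKET_SIZE, and growth never moves existing elements.
constexpr uint32_t FIRST_BUCKET_SIZE = 8;  // illustrative choice

inline uint32_t highest_bit(uint32_t v) {
    uint32_t b = 0;
    while (v >>= 1) ++b;
    return b;
}

// Computes {bucket, offset-within-bucket} for logical index i.
inline void locate(uint32_t i, uint32_t& bucket, uint32_t& offset) {
    uint32_t pos = i + FIRST_BUCKET_SIZE;
    uint32_t hb  = highest_bit(pos);
    bucket = hb - highest_bit(FIRST_BUCKET_SIZE);
    offset = pos ^ (1u << hb);  // clear the top bit: offset in bucket
}
```

Because buckets are only ever added (never resized in place), readers racing with a capacity extension keep valid pointers, which is what lets the hand-crafted vector avoid storage relocation and its synchronization hazards.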

100

A: 10+/10-/40w/40r time (s) 10 1000

1

100

0.1 1 2 4 8 16 32 threads

10 1 2 3 time (s)

1 Figure 7. Performance Results III

0.1 1 2 4 8 16 32 therads

Figure 5. Performance Results I (A: 10+/10-/40w/40r)

Figure 6. Performance Results II (B: 10+/10-/10w/70r)

We designed our experiments by generating a workload of the various operations. We varied the number of threads, starting from 1 and exponentially increasing their number to 32. Each thread executed 500,000 lock-free operations. We measured the execution time (in seconds) that all threads needed to complete. Each iteration of every thread executed an operation with a certain probability: push_back (+), pop_back (-), random access write (w), or random access read (r). We show the performance graph for a distribution of +:10%, -:10%, w:40%, r:40% in Figure 5. Figure 6 demonstrates the performance results in a read-many-write-rarely scenario, +:10%, -:10%, w:10%, r:70%. Figure 7 illustrates the test results for a distribution of +:25%, -:25%, w:12%, r:38%. The number of threads is plotted along the x-axis, while the time needed to complete all operations is shown along the y-axis. To increase the readability of the performance graphs, the y-axis uses a logarithmic scale with a base of 10.

Our test results indicate that in the large majority of scenarios the hand-crafted CAS-based approach outperforms the transactional memory approaches by a significant factor. The Descriptor-based approach offers simple application and fast execution. The STM-based design offers a flexible programming interface and easy-to-comprehend concurrent semantics. The main deterrent associated with the application of STM is the overhead introduced by the extra level of indirection and the application of costly conflict detection and validation schemes. According to our performance evaluation, the nonblocking RSTM vector demonstrates poor scalability, and its performance progressively deteriorates with the increased volume of operations and active threads in the system. In addition, RSTM transactions offer only obstruction-free semantics: to eliminate the hazard of livelock, software designers need to integrate a contention manager with the use of an STM-based container. Because of the limitations present in the state-of-the-art STM libraries [27], [7], we suggest that a shared vector design based on nonblocking CAS-based algorithms can better serve the demands for safe and reliable concurrent synchronization in mission critical code.

4. Conclusion

In this work we investigated how the application of nonblocking synchronization can help deliver more efficient and reliable concurrent synchronization in mission critical applications, such as the Mission Data System Project. We studied the challenging process of designing and implementing a nonblocking data container by applying 1) CAS-based synchronization and 2) Software Transactional Memory. We discussed the principles of nonblocking synchronization and demonstrated the application of both approaches by showing the implementation of a lock-free shared vector. Our performance evaluation concluded that, while difficult to design, CAS-based algorithms offer fast and scalable performance and in a large majority of scenarios outperform the alternative STM-based nonblocking or lock-based approaches by a significant factor. To facilitate the design of future lock-free algorithms and data structures, we provided a generic implementation of a Descriptor Object. Understanding the subtle differences between the two main types of nonblocking objects, lock-free objects and obstruction-free objects,
is a challenging task even to the most experienced software engineers. To better illustrate the advantages and shortcomings of each progress guarantee, we provided an obstruction-free design of the Descriptor object and contrasted its execution to its lock-free counterpart. In addition, we studied the ABA problem and the conditions leading to its occurrence in a Descriptor-based lock-free linearizable design. We offered a systematic and generic solution, called the λδ approach, that outperforms by a significant factor the use of garbage collection for the safe management of each shared location and offers speed of execution comparable to the application of the architecture-specific CAS2 instruction used for version counting. The literature does not offer a detailed analysis of the ABA problem and the general techniques for its avoidance in a lock-free linearizable design. Currently, the challenge of eliminating ABA is left to the ingenuity of the software designer. Having a practical alternative to the application of the architecture-specific CAS2 is of particular importance to the design of some modern embedded systems, such as the Mars Science Laboratory. This paper aimed at delivering a better understanding of the advantages (over mutual exclusion) as well as the usability and performance trade-offs of the modern nonblocking programming techniques that can be of critical importance for the engineering of reliable and efficient concurrent flight software.

Acknowledgments

We thank our colleagues William K. Reinholtz, Nicolas Rouquette, and David Wagner from the Jet Propulsion Laboratory's Mission Data System team, and Peter Pirkelbauer and Jaakko Järvi from Texas A&M University, for their collaboration and the directions they provided on this work.

References

[1] Barrett, A., Knight, R., Morris, J. and Rasmussen, R., Mission Planning and Execution Within the Mission Data System, Proceedings of the International Workshop on Planning and Scheduling for Space, 2004.

[2] Becker, P., Working Draft, Standard for Programming Language C++, ISO WG21 N2009, http://www.open-std.org/JTC1/SC22/WG21/, April, 2006.

[3] Boehm, H. and Adve, S., Foundations of the C++ Concurrency Memory Model, PLDI '08: Proceedings of the ACM SIGPLAN 2008 conference on Programming language design and implementation, ACM Press, 2008.

[4] Dechev, D., Pirkelbauer, P. and Stroustrup, B., Lock-Free Dynamically Resizable Arrays, Proceedings of the 10th International Conference on Principles of Distributed Systems, OPODIS 2006, Bordeaux, France, December 12-15, 2006, Lecture Notes in Computer Science, Springer, Volume 4305, pages 142-156.

[5] Dechev, D., Rouquette, N., Pirkelbauer, P. and Stroustrup, B., Verification and Semantic Parallelization of Goal-Driven Autonomous Software, Proceedings of ACM Autonomics 2008: 2nd International Conference on Autonomic Computing and Communication Systems, 2008.

[6] Detlefs, D., Flood, C., Garthwaite, A., Martin, P., Shavit, N. and Steele Jr., G., Even Better DCAS-Based Concurrent Deques, International Symposium on Distributed Computing, DISC 2000.

[7] Dice, D. and Shavit, N., Understanding Tradeoffs in Software Transactional Memory, Proceedings of the 2007 International Symposium on Code Generation and Optimization (CGO), San Jose, CA, 2007.

[8] Dvorak, D., Rasmussen, R. and Starbird, T., State Knowledge Representation in the Mission Data System, Proceedings of the IEEE Aerospace Conference, 2002.

[9] Dvorak, D., Ingham, M., Morris, J. and Gersh, J., Goal-Based Operations: An Overview, Proceedings of AIAA Infotech, 2007.

[10] Dvorak, D. and Reinholtz, K., Hard real-time: C++ versus RTSJ, OOPSLA '04: Companion to the 19th annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications, 2004.

[11] Fraser, K., Practical lock-freedom, Technical Report UCAM-CL-TR-579, University of Cambridge, Computer Laboratory, 2004, ISSN 1476-2986.

[12] Fraser, K. and Harris, T., Concurrent programming without locks, ACM Trans. Comput. Syst., Volume 25, Number 2, 2007, ISSN 0734-2071.

[13] Gifford, D. and Spector, A., Case study: IBM's system/360-370 architecture, Commun. ACM, Volume 30, Number 4, 1987, ISSN 0001-0782, pages 291-307.

[14] Harris, T., Fraser, K. and Pratt, I., A practical multi-word compare-and-swap operation, Proceedings of the 16th International Symposium on Distributed Computing, 2002.

[15] Hendler, D., Shavit, N. and Yerushalmi, L., A scalable lock-free stack algorithm, SPAA '04: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, Barcelona, Spain.

[16] Herlihy, M. and Shavit, N., The Art of Multiprocessor Programming, Morgan Kaufmann, March, 2008, ISBN 0123705916.

[17] Herlihy, M., Luchangco, V. and Moir, M., The Repeat Offender Problem: A Mechanism for Supporting Dynamic-Sized, Lock-Free Data Structures, DISC '02: Proceedings of the 16th International Conference on Distributed Computing, 2002.

[18] Herlihy, M., Luchangco, V. and Moir, M., Obstruction-Free Synchronization: Double-Ended Queues as an Example, ICDCS '03: Proceedings of the 23rd International Conference on Distributed Computing Systems, IEEE Computer Society, Washington, DC, USA, 2003.

[19] Herlihy, M., Luchangco, V., Martin, P. and Moir, M., Nonblocking memory management support for dynamic-sized data structures, ACM Trans. Comput. Syst., Volume 23, Number 2, 2005, ISSN 0734-2071, pages 146-196.

[20] Ingham, M., Rasmussen, R., Bennett, M. and Moncada, A., Engineering Complex Embedded Systems with State Analysis and the Mission Data System, Proceedings of the First AIAA Intelligent Systems Technical Conference, 2004.

[21] Intel, IA-32 Intel Architecture Software Developer's Manual, Volume 3: System Programming Guide, http://www.intel.com/cd/software/products/asmo-na/eng/272688.htm, 2004.

[22] ISO/IEC 14882 International Standard, American National Standards Institute, Programming languages, C++, September, 1998.

[23] Lamport, L., How to make a multiprocessor computer that correctly executes multiprocess programs, IEEE Trans. Comput., September, 1979.

[24] Lowry, M., Software Construction and Analysis Tools for Future Space Missions, International Conference on Tools and Algorithms for Construction and Analysis of Systems 2002, Lecture Notes in Computer Science, Springer, Volume 2280, pages 1-19.

[25] Michael, M. M., Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects, IEEE Trans. Parallel Distrib. Syst., Volume 15, Number 6, 2004, ISSN 1045-9219, pages 491-504.

[26] RTCA, Software Considerations in Airborne Systems and Equipment Certification (DO-178B), 1992.

[27] Spear, M., Shriraman, A., Dalessandro, L., Dwarkadas, S. and Scott, M., Nonblocking transactions without indirection using Alert-On-Update, http://www.cs.rochester.edu/research/synchronization/rstm/v4api.shtml, SPAA 2007: Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, San Diego, California, USA.

[28] Stroustrup, B., The C++ Programming Language, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2000, ISBN 0201700735.

[29] Volpe, R. and Peters, S., Rover Technology Development and Mission Infusion for the 2009 Mars Science Laboratory Mission, 7th International Symposium on Artificial Intelligence, Robotics, and Automation in Space, Nara, Japan, May, 2003.

[30] Wagner, D., Data Management in the Mission Data System, Proceedings of the IEEE System, Man, and Cybernetics Conference, 2005.