Evaluation of Failures Masking Across the Software Stack

Thiago Santini, Paolo Rech, Anderson Sartor, Ulisses B. Corrêa, Luigi Carro, and Flávio R. Wagner
Instituto de Informática
Universidade Federal do Rio Grande do Sul
Porto Alegre, RS, Brazil
Email: {tcsantini, prech, alsartor, ubcorrea, carro, fl[email protected]

Abstract—In this paper, we analyze how implementing an application in different software layers impacts the failure rate of embedded computing systems. We investigate an ARM-based System-on-Chip executing an application in three forms: on top of the Dalvik virtual machine, using Android's Java Native Interface (JNI), and as a native executable. The different versions are then exposed to a controlled neutron beam, and the outcomes of the resulting executions are logged and analyzed. We additionally classify observable failures based on the events observed in the logs during the time window of each failure. Our experimental results show that the Dalvik version presents the lowest failure rate, followed by the JNI version and then by the native version, suggesting that the higher the software layer, the higher the failure masking.

I. INTRODUCTION

Nowadays, Personal Mobile Devices (PMDs), such as tablets and smartphones, are becoming the mainstream of computing devices [1]. The amount of resources available in PMDs is continuously increasing, and it is very common to have PMDs powered by parallel embedded processors. Managing these resources requires a large software platform. PMD software platforms also support a faster time-to-market for third-party software, profoundly changing traditional software design and development for embedded systems.

For complex systems, one of the traditional approaches to speeding up the time-to-market of applications is to use high abstraction levels in the application design and implementation processes. These abstractions facilitate both the management of embedded resources and the software development process, making software more readable, easier to maintain, and highly portable [2]. In fact, embedded systems projects nowadays typically include object-oriented languages, like Java and C++. These high-abstraction languages are widely used even on resource-constrained microcontroller platforms, such as Arduino, whose official Software Development Kit (SDK) uses C++ in an object-oriented way [3].

Android, the dominant software platform in the PMD market [4], uses a four-layer software stack to help third-party practitioners develop applications. This software platform is intended for a broad range of devices (e.g., wearables, smartphones, TVs, automobile media centers). Since the Android platform stack abstracts several implementation details, code reuse becomes ubiquitous among different devices.

With the shrinking of transistor dimensions and the growing amount of resources available in modern devices, the radiation-induced error rate cannot be considered negligible, even in consumer electronics and PMDs. For instance, the user-observable Mean Time To Failure (MTTF) of the Apple iPhone 3 can be as short as 1 year when operating at commercial aircraft altitudes (i.e., 35,000 ft), considering only those failures that negatively impact the user experience [5]. The International Air Transport Association estimated that three billion passengers flew in 2013 [6]. Conservatively supposing that each passenger uses a smartphone for one hour per trip, such an MTTF translates to about 350,000 user-observable errors per year (3 × 10^9 device-hours divided by an MTTF of 8,760 hours is roughly 342,000), i.e., roughly 0.01% of users will experience an observable failure. As complexity is expected to increase in future generations, and parallelism is becoming the new computing standard, software stacks will become mandatory to ease application development and portability. In this scenario, software stack reliability needs to be carefully evaluated to understand the behaviour of applications executed in embedded systems when exposed to radiation.

The objective of this work is to evaluate how such large software abstraction stacks impact user-observable errors. To reach this goal, we developed three variants of a matrix multiplication application. The first variant is a full Java implementation executing on the Dalvik Virtual Machine (DVM). The second variant uses the Java Native Interface, with parts of the application's code implemented in a native shared library. The last variant is a native Linux application. Our experimental results show that the Dalvik version presents the lowest failure rate, followed by the JNI version and then by the native version, suggesting that the higher the software layer, the higher the failure masking.

This paper is organised as follows. Section II presents background on the Android software stack, and Section III gives an overview of our experimental setup. Section IV then presents and discusses the experimental results. Section V concludes the paper and outlines future work.

II. ANDROID SOFTWARE STACK

The Android platform is a software stack composed of four levels. The lowest level is the Linux kernel, which is responsible for task scheduling, device drivers, power management, resource access, and other low-level tasks. The second level comprises native libraries and the Android runtime. The Android runtime consists of core libraries and the Dalvik Virtual Machine (DVM). Dalvik's purpose is to provide a platform-independent programming environment that abstracts details of the underlying hardware and operating system. To do so, the Dalvik bytecode, called DEX, is interpreted on the target architecture while the application executes. In addition, the DVM includes a trace-based Just-In-Time (JIT) compiler that translates the bytecode of frequently executed paths into native instructions. The result of this translation is cached to avoid reinterpretation overhead. Moreover, the DVM was designed to run in memory-constrained environments and to allow multiple instances of the virtual machine, so every application runs in its own private instance. This provides security, isolation, and effective memory management. The third level is the application framework, which provides high-level services in the form of Java classes accessible through the Java Development Kit (JDK). The top level is the applications layer, where all applications available on the Android device reside and use the resources provided by the layers below.

A developer has multiple options when developing an Android application, each with its strengths and weaknesses. The application may be developed purely in Java code, with a combination of Java and native code, or purely in native code. Java applications run on the Dalvik Virtual Machine and therefore benefit from the portability and security it provides. However, the additional software layer (i.e., the virtual machine) may affect application performance. Java and native code can be combined through the Java Native Interface (JNI) framework or through Native Activities. The former provides an interface for calling native methods from the Java side. The latter consists of whole Android activities implemented in native code, which can be used along with Java activities. Through the JNI framework, Java code can interact with C or C++ code by calling methods implemented in native code. This allows the reuse of legacy code and can increase application performance in some situations. Nevertheless, JNI compromises the application's portability and security, since the native code must be compiled for each target architecture and no longer runs on the DVM. Furthermore, JNI introduces overhead because of context switches, which involve copying operands in memory between the Java and native sides. Applications developed purely in native code, referred to as ELF applications in this work, are ordinary C or C++ applications compiled to run on an Android device. ELF applications have low portability and low execution overhead, because the processor executes the code directly and the DVM does not need to interpret it.

From a radiation reliability point of view, increasing the abstraction level may significantly modify the error rate of an application. Moving from one level of abstraction to a higher one may benefit device reliability, because some errors could be masked. In fact, not all failures occurring at the physical level actually propagate to the output of an application.

… was left unused.

B. Software Under Test

As a benchmark, we selected matrix multiplication, since it is typically used in both safety-critical applications (e.g., filter and control operations) and user applications (e.g., media applications). One application execution was defined as 200 multiplications of 25 × 25 integer matrices, which keeps the run time and output throughput tractable. A greater workload would increase the probability of radiation-induced errors, eventually allowing more than one neutron to generate a failure in a single execution. As detailed in the next subsection, this must be avoided so that the experimentally observed error rate can be derated to the natural radiation environment. A smaller workload, on the other hand, would prevent the gathering of a statistically significant amount of data.

As shown in Algorithm 1, after DUT initialization the application starts. All matrices are initialized, and for each of the 200 sets of input matrices, Ai is multiplied by Bi. The resulting matrix C is then compared to the expected result Gi, and if they differ, a failure flag is raised. After the 200 matrix multiplications are completed, errors are reported and a new application execution is triggered.

ALGORITHM 1: Application under test.

setup_caches();
print_banner();
while True do                  // Application Start
    Fail ← False;
    for i ← 1 to 200 do        // Unrolled
        init(Ai, Bi, Gi);
    end
    for i ← 1 to 200 do        // Unrolled
        C ← Ai ∗ Bi;
        if C ≠ Gi then
            Fail ← True;
        end
    end
    print(Fail);
    // Application End
end

In total, three variants of this application were produced:

Dalvik: The application was entirely implemented in the Java language, from which an Android Application Package…
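The benchmark loop of Algorithm 1 can be sketched in Python as follows. This is a minimal, hypothetical model for illustration only. It is not the authors' Java, JNI, or ELF code. The names `init_inputs`, `application_execution`, and `faulty_matmul` are introduced here. The sketch also makes several simplifications. The golden results Gi are computed in software instead of being supplied in advance. Input initialization is hoisted out of the per-execution loop. The loops are not unrolled. The injected bit flip only emulates the effect of a radiation-induced error on one output element.

```python
import random

N = 25      # matrix dimension in the paper's benchmark (25 x 25 integers)
SETS = 200  # matrix multiplications per application execution


def matmul(a, b):
    """Naive integer matrix product C = A * B."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def init_inputs(seed, n=N, sets=SETS):
    """Hypothetical stand-in for init(A_i, B_i, G_i).

    Builds `sets` pseudo-random input pairs and their golden products G_i.
    On the real device G_i would be known in advance; computing it here is
    an assumption of this sketch.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(sets):
        a = [[rng.randint(0, 99) for _ in range(n)] for _ in range(n)]
        b = [[rng.randint(0, 99) for _ in range(n)] for _ in range(n)]
        data.append((a, b, matmul(a, b)))
    return data


def application_execution(data, multiply=matmul):
    """One pass of Algorithm 1's outer loop body; returns the Fail flag."""
    fail = False
    for a, b, g in data:
        c = multiply(a, b)
        if c != g:  # any mismatching element raises the failure flag
            fail = True
    return fail


def faulty_matmul(a, b):
    """Hypothetical fault model: a single bit flip in one output element."""
    c = matmul(a, b)
    c[0][0] ^= 1 << 3
    return c


if __name__ == "__main__":
    data = init_inputs(seed=1)
    print(application_execution(data))                 # fault-free run
    print(application_execution(data, faulty_matmul))  # corrupted run
```

With the default parameters (200 pairs of 25 × 25 matrices), the fault-free execution prints False and the corrupted one prints True. An endless driver, like Algorithm 1's `while True` loop, would call `application_execution` repeatedly and log each Fail flag.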
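The workload-sizing argument in Section III-B can be made quantitative with a simple model. Assume that failure-causing neutron events arrive as a Poisson process. The number of events per execution then has a mean λ proportional to flux × cross section × execution time, and the share of failing executions that contain two or more events grows with λ. The λ values below are hypothetical, since the paper's flux and cross sections are not given in this section.

```python
import math


def p_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i)
                     for i in range(k))


def multi_hit_share(lam):
    """Share of failing executions caused by two or more events.

    lam is the assumed expected number of failure-causing neutron events
    per execution (flux x cross section x execution time).
    """
    return p_at_least(2, lam) / p_at_least(1, lam)


# Hypothetical lam values; under constant flux, doubling the workload
# (execution time) doubles lam.
for lam in (0.001, 0.01, 0.1, 1.0):
    print(f"lambda={lam:<6} multi-hit share={multi_hit_share(lam):.4f}")
```

For small λ the share is close to λ/2, which is why short executions keep multiple-hit runs rare.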
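The masking argument at the end of Section II can be illustrated with a toy single-bit-flip experiment. The sketch below is an assumed, simplified model of logical masking, not the paper's experimental method. It flips each bit of a 32-bit intermediate value in turn and counts how many flips leave the downstream result unchanged. `keep_low_byte` and `above_threshold` are hypothetical consumers, chosen only to show that code which discards or summarizes data hides many bit flips.

```python
def masked_fraction(downstream, value, width=32):
    """Flip each of the `width` bits of an intermediate `value` in turn and
    return the fraction of flips that leave the output of `downstream`
    unchanged (i.e., that are masked)."""
    golden = downstream(value)
    masked = sum(1 for bit in range(width)
                 if downstream(value ^ (1 << bit)) == golden)
    return masked / width


def keep_low_byte(x):
    """Hypothetical consumer that only uses the 8 least significant bits."""
    return x & 0xFF


def above_threshold(x):
    """Hypothetical consumer that only reports a comparison result."""
    return x > 1000


print(masked_fraction(keep_low_byte, 12345))   # 0.75: bits 8..31 never reach the output
print(masked_fraction(above_threshold, 1500))  # 0.96875: only flipping bit 10 changes it
```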
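The scaling argument in the Introduction can be reproduced with a few lines of arithmetic. The sketch assumes a constant failure rate, so the expected number of failures equals total exposure time divided by the MTTF. The inputs are the figures quoted in the text: three billion passenger trips, one hour of smartphone use per trip, and an MTTF of one year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 h


def expected_failures(passenger_trips, hours_per_trip, mttf_hours):
    """Expected user-observable failures over a year, assuming a constant
    failure rate so that failures = total exposure time / MTTF."""
    return passenger_trips * hours_per_trip / mttf_hours


trips = 3e9  # IATA estimate of passengers in 2013, as quoted in the text
errors = expected_failures(trips, hours_per_trip=1.0, mttf_hours=HOURS_PER_YEAR)
print(f"{errors:,.0f} failures/year")    # 342,466
print(f"{errors / trips:.4%} of trips")  # 0.0114%
```

The result, about 342,000 failures per year, rounds to the quoted 350,000. It corresponds to roughly one failure per 8,760 trips.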
