IEEE TRANSACTIONS ON RELIABILITY, VOL. 65, NO. 3, SEPTEMBER 2016

To What Extent is Stress Testing of Android TV Applications Automated in Industrial Environments?

Bo Jiang, Member, IEEE, Peng Chen, Wing Kwong Chan, Member, IEEE, and Xinchao Zhang

Abstract—An Android-based smart television (TV) must reliably run its applications in an embedded program environment under diverse hardware resource conditions. Owing to the diverse hardware components used to build numerous TV models, TV simulators are usually not sufficiently high in fidelity to simulate various TV models and thus are only regarded as unreliable alternatives when stress testing such applications. Therefore, even though stress testing on real TV sets is tedious, it is the de facto approach to ensure the reliability of these applications in the industry. In this paper, we study to what extent stress testing of smart TV applications can be fully automated in industrial environments. To the best of our knowledge, no previous work has addressed this important question. We summarize the findings collected from ten industrial test engineers who have tested 20 such TV applications in a real production environment. Our study shows that the industry required test automation support for high-level GUI object controls and status checking, for the setup of resource conditions, and for the interplay between the two. With such support, 87% of the industrial test specifications of one TV model could be fully automated, and 71.4% of them were found to be fully reusable to test a subsequent TV model with major upgrades of hardware, operating system, and application. This represents a significant improvement, with margins of 28% and 38%, respectively, compared with stress testing without such support.

Index Terms—Android, automated testing, reliability, software reuse, stress testing, test case creation, TV.

Acronyms:

TV      Television.
CPU     Central processing unit.
MIC     Microphone.
ADB     Android debug bridge.
D-pad   Directional pad.
GUI     Graphical user interface.
API     Application programming interface.
OS      Operating system.
USB     Universal serial bus.
TTS     Text to speech.
SAPI    Speech application programming interface.
ANR     Application not responding.
TAST    Testing of Android-based Smart TVs.
ANOVA   Analysis of variance.

I. INTRODUCTION

Smart televisions (Smart TVs) [11] are widely used embedded systems [10], [13], and a major class of such embedded systems is Android-based Smart TVs.1 Different models, even within the same series of TVs, use diverse types of hardware components. Each model in this class includes much standard TV functionality, such as channel controls and preferences, as well as non-standard applications such as online game clients, web browsers, multimedia players, and photograph albums, in the form of Android applications, and executes them concurrently. At any one time, users may turn on zero or more nonstandard applications and keep these applications executing while watching TV, or the other way round. As we are going to present in Section II-A, stress testing of TV applications can be quite different from the testing of applications on Android smart phones. Moreover, a high degree of test automation is in high demand in industrial environments. We thus ask a couple of research questions.

First, what are the areas that test engineers consider most important, which require automation when stress testing TV applications? Second, to what extent do semi-automated test cases become fully automated for the purpose of stress testing?
To the best of our knowledge, no previous work has studied these two important questions. To answer them, in this paper, we report our seven-month case study on improving the degree of automation in the stress testing of TV applications at a major TV vendor.2

Specifically, we worked together with ten test engineers for seven months in testing 20 real-world applications on a real-world smart TV model manufactured by the same vendor

Manuscript received September 22, 2014; revised March 31, 2015; accepted September 21, 2015. Date of publication October 14, 2015; date of current version August 30, 2016. This work was supported in part by the National Natural Science Foundation of China under Grant 61202077, the Civil Aviation Special Fund under Grant MJ-S-2012-05 and Grant MJ-Y-2012-07, and the ECS and GRF of the Research Grants Council of Hong Kong under Grant 123512, Grant 125113, Grant 111313, and Grant 11201114. Associate Editor: W. E. Wong. (Corresponding author: Wing Kwong Chan.)
B. Jiang and P. Chen are with the School of Computer Science and Engineering, Beihang University, Beijing 100191, China (e-mail: [email protected]; [email protected]).
W. K. Chan is with the Department of Computer Science, City University of Hong Kong, Hong Kong (e-mail: [email protected]).
X. Zhang is with the Institute of Reliability, Si Chuan Chang Hong Electric Co., Mianyang 621000, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TR.2015.2481601
1 [Online]. Available: http://www.google.com/tv/
2 [Online]. Available: http://www.changhong.com


(March–September 2013) using real TV sets. These test engineers had two to five years of TV testing experience, with an average of three years. They had experience in both digital TV testing and smart TV testing. Each test engineer was assigned to test two TV applications for a TV model. Because the TV model was not the first model in the TV series, each application was associated with a set of test scripts. Our methodology was to first observe how the test engineers used their original Android testing toolkit (denoted by MT, which was an upgraded version of MobileTest [7] for Android) to create and maintain their automated test scripts based on the corresponding test case specifications (or test specifications for short). Following a typical recommended practice of code refactoring, when we observed that they repeatedly wrote similar code fragments, we asked questions about what they wanted to automate further to save their repetitive efforts. Moreover, on observing them abort the automation of a test specification, we also asked questions on why that particular test specification failed to be automated and what automation features would have to be available to make them willing to successfully write automated test scripts. We then added each identified feature to the same testing tool as a new API if the feature either was a piece of code produced via "method extraction" in code refactoring or controlled the available hardware resources in the program execution of a test script. We continuously enhanced and released the testing tool with added features, and the test engineers continuously used the latest releases of the testing tool to create and maintain their test cases. We note that, as new APIs became available to the test engineers, their subsequent test scripts might incorporate additional coding patterns that triggered the discovery of new APIs to be incorporated into newer versions of the testing tool. We continued this process of tool enhancement until all ten engineers found that further automation was unable to assist them in testing their assigned TV applications meaningfully (in their work environments). We then requested each test engineer to review the possible uses of the original version of the testing tool (i.e., MT) to create the test script(s) that were originally chosen not to be automated by the test engineers.

Finally, we measured the effects of the resultant test suites of these 20 applications produced through the latest version of the testing tool against the effects of the test suites using the original version of the testing tool (MT). For ease of reference, we refer to the latest version of the testing tool as TAST. We also note that the TV model tested via the latest version of the testing tool in the case study was being sold in the Mainland China market at the time of writing this article.

The results of the case study show the importance of providing a higher level of abstraction with resource controls in the stress testing of TV-based applications on real TV sets. First, the test engineers in the case study examined 1563 test specifications in total. Originally, with the assistance of MT, they fully automated 915 test specifications. On the other hand, with the assistance of TAST, they fully automated 1347 test specifications. The difference is 432, which represents roughly 47% more test cases. Moreover, using TAST, the test engineers fully automated 75.0%–96.3% (with a mean of 86.8%) of the test specifications of these 20 applications. This degree of automation was significantly higher than that using MT, which only achieved 50.0%–71.3% (with a mean of 58.7%).

Second, when applying the resultant test scripts to conduct a session of stress testing on a newer version of the same TV model series, we found that 55.0%–88.0% (with a mean of 71.4%) of the TAST test scripts could be completely reused. The amount of complete reuse is significantly higher than that achieved by MT, where only 31.0%–49.0% (with a mean of 40.7%) of the MT test scripts could be reused. Furthermore, out of these reusable test cases, 59%–63% of them do not cover any change in code, and 37%–41% of them cover some changes in code and still work correctly. Note that the newer TV set used a newer operating system and hardware, making the program environments not the same as the ones perceived by the test engineers when they wrote the test scripts based on an older TV set.

Third, being able to produce more automated test scripts means being able to produce more scenarios for stress testing. Stress testing requires the extensive execution of many test scripts. We also found that the test scripts provided via TAST exposed previously unknown bugs during the above-mentioned iterative process. Specifically, we found that the TAST test scripts exposed real faults from each application with a failure rate of 1.9%–4.1% (with a mean of 3.2%). We emphasize that the TV model analyzed in the case study was not the first model in the TV series to use the same test specifications for both stress testing and functional testing. This result shows that the new automated test scripts exposed unknown bugs that were not exposable before the arrival of TAST.

Last, but not least, the case study confirmed and validated the feasibility of the above methodology to produce an effective testing framework.

The main contribution of this paper is twofold: 1) this paper presents the first work on studying to what extent stress testing of TV applications can be automated; 2) it reports the first large-scale industrial case study on the stress testing of Android applications in an industrial environment.

We organize the rest of this paper as follows. Section II presents the motivation, including a motivating example, of our work. Section III describes the design of the resultant testing tool. Sections IV and V present a case study that validates the methodology and investigates the research questions that motivate this work. Section VI reviews the closely related work. Finally, Section VII concludes this paper.

II. MOTIVATION

A. Inadequacy of Testing Infrastructure

Testing smart TV applications is significantly different from testing smart phone applications. In this section, we present three areas of difference that we observed from conducting our case study to motivate this work further.

First, TVs are typically designed to be viewed ten feet away, and TV applications should provide the so-called "10-foot user experience" [18, p. 42], where user interface (UI) objects such as scroll bars may be too small for users to control precisely. As such, screen-based or touch-based inputs are seldom used, and the UI interaction of TV applications is mainly controlled by directional pads (D-pads), which usually contain the left key,
right key, up key, down key, and select key, as well as receiving voice commands. Hence, traditional Android application testing tools, which issue UI touch events to drive the device, are often invalid for testing TV applications. The need for voice-based input demands automatic voice input testing.

Second, traditional TVs have become mature and reliable through many decades of development. End users of smart TVs include people of all ages and all educational backgrounds, and they all require the TV (including its TV applications) to be reliable. Stress testing of TV applications should be much more comprehensive (e.g., including all sorts of dialect variations and voice tones of the same spoken language when testing against the voice input channel) than stress testing of typical Android applications.

Third, unlike a phone simulator, which is sufficient to test a typical Android phone application, TV simulators are seldom high in fidelity, making them an inferior and unreliable alternative to real TV sets when stress testing TV applications. Therefore, even though stress testing on real TV sets is tedious, it is the de facto approach to ensuring the reliability of these applications in the industry. Nonetheless, controlling a real TV set is much more tedious than controlling a simulator.

In summary, a testing tool for smart TV testing differs from a testing tool for smart phone testing in three aspects. First, it calls for D-pad input support and voice input support. Second, stress-testing support is crucial to ensure the TV is of high reliability for sale as consumer electronics. Third, smart TVs lack a high-fidelity simulator, which demands that the testing tool support testing on real TV sets.

B. Motivating Example

We use an example to illustrate the challenges in the stress testing of an application embedded in a smart TV. The scenario is as follows. Suppose that a Smart TV system has the following two applications: a media player and a voice assistant application that translates an audio stream into commands. Suppose further that a user is watching a local video using the media player. Ideally, upon the user pronouncing a word (e.g., T-V-B) to the smart TV, the voice assistant should receive and analyze the sound, followed by switching the TV screen to some other application (which shows the targeted TV channel).

Conducting a session of stress testing on the voice assistant can be challenging and tedious. Consider the scenario that a test engineer faces. First, the test engineer manually navigates the application launch panel of the TV to activate the media player to watch a local video clip. The engineer then sets up a resource-constrained Smart TV environment to prepare for the testing of the running TV as follows. To reduce the CPU and memory resources available to the voice assistant, the engineer invokes some computationally intensive applications and numerous other applications, respectively. The reason is that each Android application tends to behave like a small Linux process (e.g., occupying only 20 MB of memory), and a Smart TV may easily have two orders of magnitude more memory than that (e.g., 8 GB). A Smart TV is usually designed to launch a limited number of applications. As such, the engineer needs to install many other applications on the Smart TV under test to occupy internal memory, so that the memory left for the execution of the targeted application can be "small enough". Installing applications on a Smart TV involves downloading these applications from Android app markets (over the internet), with a user confirmation process for each. Each installation, however, takes some time (e.g., 1 min). Installing hundreds of applications on the Smart TV is thus a tedious process. After setting up the environment, the engineer spells out the required word according to a test specification and visually observes whether the TV screen switches to the required TV channel correctly and in time. The main problems are that the steps involved are only semi-automated, tedious, inaccurate, and nonrepresentative of the operating situations.

Our goal is to understand and testify to what extent automation can be feasible. Following the methodology presented in the last section, TAST simplified the testing steps involved by allowing the engineer to develop fully automated test scripts. Fig. 1 shows a fully automated TAST test script that serves the same purpose as the above tedious testing process.

[Fig. 1. Fully automated TAST test script for the stress-testing scenario in the motivating example.]

After importing the required TAST libraries (lines 2 to 5), the script connects to the TV under test (line 6) to instantiate a device object. It starts an instance of the media player (line 7) and controls the execution environment of the device object by consuming 90% of the CPU, memory, and network resources (lines 9 to 12). It then shows the resource consumption status (lines 14 to 19), and declares a sequence of voice commands and the test oracle for each command, respectively (lines 20 and 21). It sends each voice command sequentially to the device object, and verifies whether the corresponding applications have started successfully according to the above-defined test oracle (lines 22 to 25). Finally, the script releases the TV resources deliberately held by TAST (lines 28 to 30).

In TAST, the resource control is abstracted as properties of the device object. From the execution of line 10 to line 30, the CPU utilization level of the Smart TV is actively sustained at 90% (i.e., only 10% of the CPU resources are available to the execution from line 10 to line 30). Moreover, the communication between the Smart TV and TAST is completely abstracted away from the test script, and feedback from the Smart TV (e.g., memory usage or the active GUI control objects) is abstracted as data objects (e.g., memoryInfo) in the test script. As such, the test script can use the properties of such an object to control the execution of the test script.
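Since Fig. 1 is not reproduced here, the following is a minimal Python sketch of what such a test script could look like, reconstructed purely from the walkthrough above. Apart from sendVoiceCmd(), which the paper names in Fig. 5, every module, class, method, and application name below is our assumption, not the actual TAST API.

    # Hypothetical sketch of a Fig. 1-style TAST test script. All identifiers
    # are assumed for illustration; they are not the actual TAST API.
    from tast import Device  # assumed entry point of the TAST script library

    device = Device.connect("192.168.1.50")   # connect the TV under test
    device.startApp("MediaPlayer")            # start the media player

    device.cpu.consume(90)                    # hold CPU utilization at 90%
    device.memory.consume(90)                 # hold memory utilization at 90%
    device.network.consume(90)                # hold network utilization at 90%
    print(device.memory.info())               # show resource consumption status

    voice_cmds = ["TVB", "Weather"]           # voice command sequence
    oracle = ["TVBPlayer", "WeatherReport"]   # expected foreground applications

    for cmd, expected_app in zip(voice_cmds, oracle):
        device.sendVoiceCmd(cmd)                      # send one voice command
        assert device.waitForApp(expected_app, 10)    # verify the app started

    device.cpu.release()                      # release the resources that
    device.memory.release()                   # TAST deliberately held
    device.network.release()

Note how, in this sketch, the resource control appears as properties of the device object, matching the abstraction just described.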

There are two notes on this motivating example that deserve explanation. First, the motivating example mainly focuses on presenting why our TAST framework can automate some manual testing effort. We will discuss the reusability aspect of our TAST platform in RQ2 of our case study. Second, the use of hard-coded values is mainly for improving the readability of the test script to ease understanding. In fact, we have also provided a data-driven interface (API) for TAST, where the test scripts can read input data from files to enable data-driven testing.

III. RESULTANT TESTING TOOL: TAST

In Section I, we have presented the methodology to identify a feature needed for automation. In this section, we present the resultant tool produced via the methodology. We first present the overall design, and then enumerate each kind of feature that we have identified.

A. Overview of the TAST Architecture

In this section, we present the overall architecture of TAST. As depicted in Fig. 2, TAST consists of four layers in addition to an agent service embedded in the device.

[Fig. 2. Architecture of our testing platform TAST.]

At the bottom layer, TAST communicates with a Smart TV via a set of TAST-TV interfaces: the Android debug bridge (ADB) interface,3 a Socket-based voice interface, and a Socket-based Agent interface. The former two interfaces provide the input/output channels to mimic user interactions with the application under test (AUT) in the TV, and the third interface enables TAST to control the resources available to the execution environment of the AUT. These three TAST-TV interfaces are connected to three controllers (ADB Controller, Voice Controller, and Agent Controller, respectively) of TAST, as shown in Fig. 2.

The first TAST-TV interface, known as the ADB interface, is a standard interface common to all Android-based systems. Through this interface, the ADB controller in the next layer above controls a built-in testing component to issue user events to the AUT and capture screenshots of the AUT running in the Smart TV.

We note that most testing tools for Android applications share the above ADB interface. Our MT testing platform (i.e., MobileTest) is no exception. Like TAST, MobileTest was also built on the services provided by Monkey.

Owing to the need for voice control, simply using ADB is insufficient to provide the inputs to some TV applications adequately. Therefore, in response to this testing requirement, we added a new TAST-TV interface (the Socket-based voice interface) to the testing tool. The purpose of this interface is to send voice control commands to interact with the voice assistant application (e.g., Ciri4) in a smart TV.

3 [Online]. Available: http://developer.android.com/tools/help/adb.html
4 [Online]. Available: http://www.changhongglobal.com/egdch/1982_8755.htm
The test engineers expressed that stress testing of TV applications requires setting up the resources available for the applications to use. Therefore, we introduced a third TAST-TV interface (i.e., the Socket-based Agent interface) to the tool, which communicates with an agent service of TAST running on the Smart TV to control and monitor the utilization levels of various resources.

The layer above the TAST-TV interface layer is the controller layer, where we have the ADB controller, the voice controller, and the Agent controller. The ADB controller encapsulates the ADB interface, the voice controller encapsulates the socket-based voice interface, and the agent controller encapsulates the agent interface. These three controllers each provide a set of interfaces to support test script execution controlled by the interpreter in the layer above. This controller layer essentially encapsulates the interactions with the Android device such that the interpreter above can focus on the interpreted execution of test scripts per se.

The execution of each test script is handled by a test script interpreter, which forms the second top layer, as shown in Fig. 2.

The interpreter handles instructions sequentially along the execution trace of each test script. For every such instruction, it translates the instruction into lower level tasks and interacts with the Smart TV via the above set of TAST-TV interfaces to carry out all of these tasks.

The top layer of TAST shown in Fig. 2 is for test engineers to develop test scripts such as the test script shown in Fig. 1. The test management component is an abstraction for organizing and executing test scripts. The test script development environment and the script recorder tool are also in the top layer. The former provides an integrated test development environment for the test engineers to develop test scripts, while the latter is a classic test-script recorder tool that logs user interactions.

In Section III-B, we will present the new APIs identified through the methodology presented in Section I.

B. Events and Execution Trace Model

We have also highlighted in Fig. 1 that TAST provides methods that are specifically related to the TAST-TV interfaces (see Section III-A) in the form of its Application Programming Interfaces (APIs). As such, during the test script execution, the interpreter invokes a sequence of such API methods. We refer to every request to invoke such an API method as an event, and hence an execution trace is viewed as a sequence of such events. The interpreter executes along an execution trace sequentially.

The interpreter catches all unhandled exceptions thrown by a test script during the execution of the test script. It aims at rendering a layer of fault tolerance to the test environment of the AUT to improve the reliability and usability of TAST. Suppose that an unhandled exception occurrence is from the ADB Controller (occurring when the ADB connection is unexpectedly inactive). The interpreter will invoke the method that resets the ADB connection, followed by resending the event that generated the unhandled exception, within a predefined number of trial attempts. Similarly, suppose that an unhandled exception occurrence is from the Agent Controller. The interpreter invokes another method to restart the agent service residing in the smart TV, looks back along the execution trace to identify the latest resource consumption events, and reissues these events to the agent service so that the agent service can reestablish the required environmental resource settings. (We note that the number of trial attempts is a configuration parameter defined by tool users.) For other exception occurrences, the interpreter forwards the exception occurrences to its own program environment. The interpreter also throws an exception if any assertion statement (e.g., line 25 in Fig. 1) is violated.

The test management component catches all exceptions thrown from the interpreter, followed by marking the test script execution as failed. Otherwise, if the test script is executed without any exception, it marks the test script as passed.
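The following Python sketch illustrates this fault-tolerance policy. The class and method names are our assumptions, not the actual TAST implementation.

    # Sketch of the interpreter's fault-tolerance policy; all names assumed.
    class AdbConnectionError(Exception): pass   # raised by the ADB controller
    class AgentError(Exception): pass           # raised by the agent controller

    MAX_TRIALS = 3  # the predefined number of trial attempts (user-configurable)

    def dispatch(event, trace, adb_ctrl, agent_ctrl):
        """Send one event to its controller, recovering where possible."""
        for _ in range(MAX_TRIALS):
            try:
                return event.controller.send(event)       # normal execution path
            except AdbConnectionError:
                adb_ctrl.reset_connection()                # reset ADB, then resend
            except AgentError:
                agent_ctrl.restart_agent()                 # restart the agent service
                for e in trace.latest_resource_events():   # replay the latest resource
                    agent_ctrl.send(e)                     # consumption events
        raise RuntimeError("event still failing after retries")  # script marked failed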
troller (occurred when the ADB connection is unexpectedly in- Let us consider a scenario. Suppose that a test script needs active). The interpreter will invoke the method that to acquire the identity of a particular GUI window, which is resets the ADB connection followed by resending the event that the current focusing window among all Android applications generates the unhandled exception within a predefined number in the Smart TV. In the body of the test script, the test engi- neers need to write instructions to invoke an API method corre- of trial attempts. Similarly, suppose that an unhandled excep- sponding to the command of the monkey to retrieve tion occurrence is from the Agent Controller. The interpreter a string of texts, in which the string lists out the hash code of invokes another method, which is , to restart the the currently focused window. The test script should then ex- agent service residing in the smart TV and looks back to the ex- tract this hash code. Next, the test script should invoke another ecution trace to identify the latest resource consumption events API method corresponding to the command of the monkey and reissue these events to the agent service so that the agent ser- to the view server of the Android operating system to retrieve vice can reestablish the required environmental resources set- a list of currently active windows, which is again in the text ting. (We note that the number of trial attempts is a configura- string format. The test script then parses the text string to iden- tion parameter defined by tool users.) For other exception oc- tify the window identity fragments, and compares the text of currences, the interpreter forwards the exception occurrences to each window identity in turn to the above hash code to identify its own program environment. The interpreter also throws an the required window identity. In other words, to complete this exception if any assertion statement (e.g., line 25 in Fig. 1) is task, it is not merely a few method invocations of such methods; violated. rather, this task involves the development of program code to The test management component catches all exceptions link up these method invocations. thrown from the interpreter followed by marking the test script As mentioned in the introduction section, to determine the execution as failed. Otherwise, if the test script is executed automation features to support by our TAST tool, we asked a set without any exception, it marks the test script as passed. of prepared questions to the test engineers following a typical TAST defines three parallel TAST-TV interfaces. Corre- recommended practice of code refactoring. We summarize the spondingly, there are three primitive types of events, one for questionnaire in Table I. each TAST-TV interface. In Sections III-C–III-E, we present these three types of events in turn. 5[Online]. Available: http://developer.android.com/tools/help/monkey.html 1228 IEEE TRANSACTIONS ON RELIABILITY, VOL. 65, NO. 3, SEPTEMBER 2016

As mentioned in the introduction section, to determine the automation features to be supported by our TAST tool, we asked the test engineers a set of prepared questions, following a typical recommended practice of code refactoring. We summarize the questionnaire in Table I.

TABLE I
QUESTIONNAIRE TO TEST ENGINEERS

Based on the analysis of the collected answers to the questionnaire and some inspections of existing test scripts, we paired with the test engineers to identify a set of real-world useful code templates. Each code template is finally wrapped as a method in the TAST API. For instance, the functions in the above scenario have finally been wrapped as one single method, which not only obtains the window identity, but also creates (and returns) an object instance in the test script execution state that represents a corresponding control object of the "remote" AUT. The testing tool also automatically maintains the relations between this control object and its associated application, and exposes the relationship to the execution state of the test script. Also, to get the GUI control object with a specific control identity, there is a method that sends a command to the monkey to query the view server to obtain a tree of GUI controls. It then searches the tree to locate the required control object identity, and returns an object that models the matched control object. Hence, a test script can manipulate the control object or its associated application object using much less and simpler code.

Similarly, TAST provides methods to query the GUI states at all levels: the whole system level, single application level, single window level, single control level, and single property level. Invoking a lower level of query consumes less time before returning the result. Developers may invoke selected methods to implement their own test oracle strategies in their test scripts. TAST currently provides 36 such high-level GUI-object methods in total.

The above scenario also illustrates that the text returned by a command may contain much information, and there are programming efforts to extract the required pieces of text from it. TAST provides several API methods, each finer in granularity than what an underlying command provides, to query the corresponding properties of the control objects passed in as their parameters. In total, TAST currently provides 20 such high-level GUI-attribute methods.

We have presented how TAST handles the events (API methods) that each instructs TAST to query the current states of some control objects and pass back these states to the interpreter. Sometimes, a test script requires methods that each waits for the occurrence of a particular GUI state to appear. For instance, if a window is still invisible when a text input event to a control is received, the event will be lost, making the testing invalid. TAST provides methods for each level of GUI object (including application, window, and control) to wait for the occurrences of the corresponding GUI states. Specifically, if the interpreter generates such an event, the ADB controller checks every now and then whether the GUI object has changed to the expected state as specified in the method parameter. This period, say 3 s, is configurable in TAST. A match will resume the execution of the interpreter. Otherwise, if a timeout event raised by the interpreter occurs, the interpreter will raise an unhandled exception. With such a set of methods, a test script can use nonsleeping statements to synchronize its testing commands with the corresponding GUI controls.

We note that if a sleep command is issued instead, the sleep duration needed for one TV model may require fine-tuning, and the tuned duration may not be applicable to other TV models (e.g., those using processors with more cores), operating system versions, or other suites of applications to be installed in the TV. Sleep commands may thus make the test scripts significantly less reusable.

In summary, we found that supporting test scripts that manipulate applications through their GUIs requires support for identifying GUI elements at different levels of detail. We have identified 36 API methods to query GUI states from the whole TV set level down to the individual property level, which hide from the test engineers the internal navigation of the GUI object structures obtained from ADB. This allows more flexible coding support for checking and setting the GUI objects. We have also identified 20 API methods that each extract partial information from a lower level command. We found that synchronization between a state of a test script and a particular state of a particular GUI object is mandatory, and that it has frequently been missed in the test scripts generated by existing stress testing tools like MT. We shared this finding with the test engineers, and they agreed that the finding was consistent with their first-hand experiences.
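A sketch of the polling-based wait described above follows; the names are assumed, and the check period corresponds to the configurable 3-s period mentioned earlier.

    # Sketch of a waitFor-style synchronization method; all names assumed.
    import time

    def wait_for_state(query_state, expected, period=3.0, timeout=30.0):
        """Poll a GUI-state query until it matches, or raise on timeout."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if query_state() == expected:   # the GUI object reached the state
                return                      # resume the interpreter's execution
            time.sleep(period)              # configurable check period (say 3 s)
        raise TimeoutError("GUI object never reached the expected state")

For example, wait_for_state(window.is_visible, True) replaces a hard-coded sleep whose duration would otherwise need retuning for every TV model.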
D. Events for the Agent Interface

The agent interface bridges the agent controller and an agent service on the Android system deployed by the test engineers. When the interpreter generates an event for the agent interface, the interpreter will pass the event to the agent controller of TAST. There are two types of events: one for monitoring the resource utilization levels of the Android system, and another for the active setup and maintenance of these resource utilization levels. Any failure in processing such an event (e.g., being unable to attain a required memory usage level specified by the event) will trigger an exception by the agent controller.

1) Profiling Events: Our industrial collaborator specified five typical kinds of resource information on an Android system that they need to know in order to test Android applications in their industrial environment: memory usage statistics, CPU usage statistics, network usage statistics, USB storage usage statistics, and operating system (OS) information. We wanted to ensure our tool acquires the same information as their own profiling procedures. Therefore, we asked the test engineers to provide their procedures for acquiring such information.

We model their profiling procedures as methods in an extensible class hierarchy of TAST. Specifically, TAST wraps two Linux commands as the classes for profiling the memory and CPU usages, respectively. To get the network usage information, such as the uplink and downlink network speeds, TAST uses an Android API class to get the amounts of data sent and received per second, and calculates the current network speed (by adding up these two values) accordingly in a dedicated method. Another method invokes Android API classes to get the path of a USB device as well as its total space and available space. Lastly, an OS-information method directly wraps the corresponding Android API class. All of these events are sent through the agent interface to the agent service, which in turn invokes these Linux commands or Android APIs to obtain the corresponding outputs, and sends the outputs back to the agent controller. (We note that the agent service in the TV set is also an Android service.)
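Because the wrapped commands and Android classes are not named here, the following agent-side sketch assumes the typical Linux sources /proc/meminfo and /proc/stat purely for illustration.

    # Sketch of agent-side profiling; the data sources are assumed, since the
    # paper does not name the actual wrapped Linux commands or Android classes.
    def memory_usage():
        info = {}
        with open("/proc/meminfo") as f:           # assumed memory statistics source
            for line in f:
                key, _, value = line.partition(":")
                info[key] = value.strip()
        return info

    def cpu_jiffies():
        with open("/proc/stat") as f:              # assumed CPU statistics source
            fields = f.readline().split()[1:]      # the aggregate "cpu" line
        return [int(x) for x in fields]            # user, nice, system, idle, ...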

2) Controlling Events: It is essential to set up and maintain the amount of resources consumed at the level specified by an instruction in a test script. We now describe the strategy used by TAST to control memory, CPU, network, and USB usages.

To keep the memory consumption at a specified level (e.g., 90% of all memory), the agent service actively allocates and de-allocates memory blocks via Linux's native memory management library through the Java Native Interface, so that it bypasses the memory usage restriction imposed by the Android OS on the agent service. TAST wraps the procedure as a dedicated method. Memory usage control is valuable in testing memory-intensive applications such as games. For instance, it is critical to test whether an application runs or shuts down correctly even in an execution environment with a small amount of available memory.

[Fig. 3. Algorithm consumeCPU() for controlling CPU usage.]

A Smart TV is typically equipped with a multi-core processor (CPU). Fig. 3 shows the consumeCPU() algorithm for controlling the CPU usage. The algorithm first estimates the total amount of additional CPU load to be consumed (lines 5 to 6). Then, it starts the same number of threads as the number of CPU cores (lines 8 to 11). In each thread, the algorithm uses a busy loop and the sleep system call to consume a certain percentage of the CPU processing capability and to release the consumed CPU capability, respectively (lines 25 to 28). We have found in our industrial case study that this algorithm can effectively control the CPU usage between 10% and 98% when testing a TV application. This is extremely useful for stress testing a computationally intensive application, such as one playing or recording a video. In an environment with low CPU availability, these applications should either degrade their quality of service or quit gracefully instead of crashing or becoming nonresponsive to users.
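Since Fig. 3 is not reproduced here, the sketch below captures the duty-cycle idea behind consumeCPU(): one thread per core alternates between a busy loop and a sleep, so that the busy fraction of each time slice approximates the requested load. The slice length and helper structure are our assumptions, and a production agent would use native threads rather than Python ones.

    # Sketch of the duty-cycle idea behind consumeCPU(); details assumed.
    import os, threading, time

    def consume_cpu(percentage, slice_s=0.1):
        """Hold each core at roughly the given utilization percentage."""
        busy_s = slice_s * percentage / 100.0       # busy fraction of each slice

        def worker():
            while True:
                end = time.monotonic() + busy_s
                while time.monotonic() < end:       # busy loop: consume CPU
                    pass
                time.sleep(slice_s - busy_s)        # sleep: release CPU

        for _ in range(os.cpu_count() or 1):        # one thread per CPU core
            threading.Thread(target=worker, daemon=True).start()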
[Fig. 4. Algorithm consumeNetwork() for controlling network usage.]

Fig. 4 shows the consumeNetwork() algorithm. In essence, to control the network bandwidth usage, the algorithm starts several threads, each sending and receiving data via the HTTP client API to communicate with a Web server (in the testing lab) to consume the network bandwidth. The variable MAX_NETWORK_THREADS is the number of network threads needed to consume 100% of the available network bandwidth. The algorithm then controls the network bandwidth consumption by starting a portion of such threads, i.e., MAX_NETWORK_THREADS × percentage / 100 threads.

TAST also provides an API to determine a value for MAX_NETWORK_THREADS, which simply allocates an increasing number of threads (initially 1) until all of the available network bandwidth has just been consumed. This algorithm is useful for testing network-dependent applications such as online music players, online video streaming applications, or Web browsers. Instead of freezing and buffering endlessly, these applications should either reduce their bandwidth requirements or stop gracefully under adverse network conditions.

A similar algorithm controls the USB bandwidth. It is identical to consumeNetwork(), except for the following. First, the variable MAX_NETWORK_THREADS is replaced by a new variable MAX_USB_THREADS, which is the number of reader/writer threads needed to fully saturate the USB bandwidth. Second, the algorithm starts a writer thread and a reader thread in turn. Due to the similarity between the two algorithms, for brevity, we skip presenting the USB algorithm explicitly. Moreover, TAST has a method that determines a value for MAX_USB_THREADS, whose strategy is similar to that of the method that determines MAX_NETWORK_THREADS described above. TAST also has a method that consumes a certain percentage of the USB storage space, which simply writes dummy contents to new files kept on the USB device.

In summary, to improve the automation support of MT, TAST has been extended to include ten API methods to query and set up the amount of resources consumed in the execution environment.

E. Events for Voice Control via the Socket-Based Voice Interface

Here, we present the handling of events for voice commands. Test engineers may manually use the microphone on the remote controller of a TV to send a sound stream to the voice assistant application of the TV. TAST, on the other hand, converts the parameters (voice commands) of the voice-related methods in a test script into audio streams and sends these streams via the socket-based voice interface to the voice assistant application.

[Fig. 5. Workflow for methods sendVoiceCmd() and sendNoisyVoiceCmd() to help test applications with voice assistance.]

As shown in Fig. 5, if the voice controller receives an event for the MIC interface from the interpreter, it uses a Text-To-Speech (TTS) library to convert the text into a voice stream. As a requirement of Changhong, TAST is currently configured to use the Microsoft Speech API as the TTS library. Optionally, the voice controller then uses the DirectSound API for audio mixing so that a noisy home environment can be simulated. Finally, the resultant audio stream is sent to the voice assistant application via the socket interface using a custom protocol. TAST provides the methods sendVoiceCmd() and sendNoisyVoiceCmd() for perfect and noisy environments, respectively.
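A Python sketch of the Fig. 5 pipeline follows. The names sendVoiceCmd() and sendNoisyVoiceCmd() come from the paper; the TTS and mixing helpers and the length-prefixed wire format are our assumptions (TAST itself uses the Microsoft Speech API, DirectSound, and a custom socket protocol).

    # Sketch of the Fig. 5 voice-command pipeline; the helpers below and the
    # wire format are assumed, not TAST's actual custom protocol.
    import socket

    def tts_synthesize(text):                 # placeholder for the TTS library call
        return text.encode("utf-8")           # (a real TTS returns audio bytes)

    def mix(audio, noise):                    # placeholder for audio mixing
        return audio + noise                  # (simulates a noisy environment)

    def sendVoiceCmd(tv_addr, text, noise=None):
        audio = tts_synthesize(text)          # text -> voice stream via TTS
        if noise is not None:
            audio = mix(audio, noise)         # optional noise-mixing step
        with socket.create_connection(tv_addr) as s:   # tv_addr = (host, port)
            s.sendall(len(audio).to_bytes(4, "big"))   # assumed length prefix
            s.sendall(audio)                           # the audio stream itself

    def sendNoisyVoiceCmd(tv_addr, text, noise):
        sendVoiceCmd(tv_addr, text, noise)    # noisy variant reuses the pipeline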
IV. CASE STUDY

In this section, we present the empirical study to answer the research questions.

A. Background

The case study was conducted at the Reliability Institute of Changhong in Sichuan Province, China, during March–September 2013. This institute is the testing arm of the company and is responsible for testing various types of electronic products manufactured by the company, including Smart TVs. The company has used the MobileTest (MT) [7] tool since 2008, an evolving platform that has included Android application testing support since 2010. Since the testers were quite familiar with the test development environment of MT, they asked us to continue improving MT to better support Android TV application testing in this project.

The case study started with a series of requirement elicitation workshops between the researchers and more than 20 test engineers of the institute to identify the requirements of a testing tool. The company had developed a few smart TV models before this project commenced in 2013. Ten test engineers were eventually selected to join this project. Each was assigned two applications to test. The list of applications (see Table II) for the smart TV model had been predetermined by the company. As presented in Section I, we refer to the original version of the testing tool used by the test engineers as MT, and to the final version of the same testing tool as TAST.

For each such application, the test engineers had written in natural language a set of planned test cases (which we sometimes refer to as test specifications in this paper) for the existing smart TV models. Most of them were in the form of documentation for test engineers to interpret and follow when conducting manual testing.

Despite the significant value of industrial case studies for evaluation [12], we should note here that an industrial case study does not enjoy the same level of rigor and freedom as a controlled experiment conducted in a research laboratory. For instance, the number of test engineers in the testing project, the training schedule of the TAST framework, the available time budget for test case design and execution, the choice of TV models and applications, and other factors must conform to the actual setting of the involved industrial projects. However, an industrial case study can truly testify to what extent a research proposal can handle real-world settings and identify limitations for future research.


TABLE II
BENCHMARKS AND EXPERIMENTAL RESULTS ON PLANNED TEST CASE AUTOMATION

B. Research Questions

We aim to study three refined research questions through the case study.

RQ1: To what extent can the planned test cases be completely developed on TAST as fully automated test scripts for the stress testing of Android-based applications of a targeted smart TV model?

RQ2: To what extent can TAST-automated test cases be completely reused when conducting stress testing on the same application but installed in another smart TV model of the same TV series?

RQ3: Can TAST-automated test cases effectively detect unknown faults in the targeted TV models?

C. Application Benchmarks

In the case study, the institute provided data on the 20 Android-based TV applications listed in Table II for us to study the three research questions. To the best of our knowledge, these applications were also some of the most heavily tested applications by the testing team of the institute, and they collectively represented a set of the most frequently used applications of a smart TV manufactured by the company.

D. Experimental Setup

The researchers passively monitored a real-world testing project on a real-world TV model, 3D42A7000iC (L47). This model was one of the latest smart TVs running Android 4.0.1 manufactured by the company within the case study period. At the time of reporting the case study in this paper, this TV model was being sold in the retail market of China. We note that this was a real testing project in which the test engineers had to conduct testing according to the planned test cases (using their own approaches).

Ten (10) full-time test engineers were assigned to this testing project to conduct the system test. We ran a two-day training workshop session to educate these test engineers on how to operate the testing tool to write and run test scripts. They also learned the coding approach to data-driven testing (i.e., to keep data in a data source, and to populate the parameters in a test script by retrieving data rows from the associated data sources). Our test engineers were experienced in smart TV testing. They had evaluated various testing platforms, including Robotium, Appium, and Sikuli, before adopting the approach presented in this paper. Several of the test engineers also had three months of experience using Robotium to write test scripts in a previous testing project.

Specifically, each test engineer examined each planned test case of each assigned application. Based on his/her own professional experience, if the test engineer considered that the planned test case could be automated by MT, he/she labeled the MT test script automated; otherwise, manual.

Next, the test engineer attempted to implement each planned test case as a TAST test script. As expected, some test cases were eventually still not fully supported. If a test engineer found that the development of the corresponding test script was either not possible or too difficult and decided not to automate it using TAST, the test engineer marked the test case as manual, and otherwise as automated. The test engineers then executed each test script thus produced on the given TV model.

According to their feedback after the case study, TAST is easy to use, and the time cost of creating TAST-based test cases is small. They did not record the time spent on individual test cases, however, because in an industry setting, test engineers usually and flexibly switch between multiple tasks (e.g., answering unplanned calls or queries from others, or attending meetings). Such situations made accurately recording the time spent impractical.

To answer RQ1, we measured the ratio of "automated" test scripts to the total number of planned test cases with respect to each application. To ease our presentation, we refer to such "automated" test scripts marked by TAST and MT as TAST-automated and MT-automated test scripts, respectively.

After they had completed the above testing sessions, the institute provided a new smart TV model for us to evaluate the reusability of the test scripts developed by the test engineers. This new TV model was equipped with a different hardware configuration and an upgraded Android OS version (Android 4.1). Note that some of the applications were also updated accordingly for this new TV model. The test engineers ran each TAST-automated test script and each MT-automated test script on this new model.

To answer RQ2, we measured the ratio of such test scripts that could each successfully run on this new TV model without any modification to the total number of "automated" test scripts. Each TAST test script contained one or more assertion statements (e.g., line 25 in Fig. 1) as its test oracle. We also analyzed the dataset for RQ1 to identify whether there were any test scripts that violated any assertion statement or threw any exceptions in the course of execution. Each such test script is marked as a failed test script.

To answer RQ3, we measured the number of failed test scripts relative to the total number of TAST-automated test scripts. The TV models are real products. As such, the development engineers had also debugged the applications (or else, the TV model would not have been launched). We requested the development engineers to share with us some representative root causes (i.e., the faults) of the failures found that had not been detected without using TAST beforehand.

V. DATA ANALYSIS

A. Answering RQ1: Extent of Automation

The fourth to sixth columns in Table II show the number of planned test cases, the number of TAST-automated test scripts, and the number of MT-automated test scripts, respectively. We found that, on average, TAST and MT successfully automated 87% and 59% of all of the planned test cases, respectively. We also found that, for each application examined, the number of TAST-automated test cases exceeded the corresponding number of MT-automated test cases by 10% to 41%, with an average of 28%. This margin of difference is significant.

We further performed an ANOVA test using MATLAB to confirm whether the two datasets differ significantly from each other at the 5% significance level. The ANOVA test successfully rejected the null hypothesis that there was no difference between the two groups of data at the 5% significance level.
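The MATLAB details are not reproduced here; an equivalent one-way ANOVA on the two groups of per-application automation percentages can be sketched in Python. The values below are placeholders, not the actual Table II data.

    # One-way ANOVA comparing per-application automation percentages.
    # Illustrative only: the paper used MATLAB, and these values are
    # placeholders rather than the actual Table II data.
    from scipy.stats import f_oneway

    tast_pct = [75.0, 86.8, 96.3]   # placeholder TAST automation percentages
    mt_pct = [50.0, 58.7, 71.3]     # placeholder MT automation percentages

    f_stat, p_value = f_oneway(tast_pct, mt_pct)
    if p_value < 0.05:              # reject the null hypothesis at the 5% level
        print("significant difference: F = %.2f, p = %.4f" % (f_stat, p_value))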
Temple Run, Weibo, IM, Map, Calculator,andCar Race, quite To answer RQ3, we measured the number of failed test scripts many planned test cases required resource controls for stress to the total number of TAST-automated test scripts. The TV testing of the applications under diverse CPU, memory, and/or models are real products. As such, the development engineers USB bandwidth conditions. Like other capture-and-replay had also debugged the applications (or else, the TV model would tools, MT had no mechanism to automate the resource control not be launched). We requested the development engineer to function, and the test engineers chose to give up the automa- sharewithussomerepresentativeroot causes (i.e., the faults) tion of these test cases. (Note that in some test cases, the test of the failures found that have not been detected without using engineers actually wrote some Linux shell scripts to ease their TAST beforehand. testing while using MT.) On Browser and File Explorer,TAST automated significantly more test cases than MT also due to the V. D ATA ANALYSIS above reasons or their combinations. We had also measured the numbers of lines of code for the A. Answering RQ1: Extent of Automation TAST-automated test scripts and MT-automated test scripts. We The fourth to sixth columns in Table II show the number of found that the two types of test scripts resulted in 56 to 378 lines planned test case, the number of TAST-automated test, and the of code with a mean of 120 and a standard deviation of 30, and number of MT-automated test scripts, respectively. We found 80 to 460 lines of code with a mean of 223 and a standard de- that on average, TAST and MT successfully automated 87% viation of 75, respectively. The result indicated that TAST-au- and 59% of all of the planned test cases. We also found that tomated test cases tended to be significantly coarser in granu- for each application examined, the number of TAST-automated larity to complete the same tasks. The result is consistent with test cases was more than the corresponding number of MT-au- the purpose of the testing tool enhancement to provide a higher tomated test cases by 10% to 41%, with an average of 28%. This level test-script development platform than MT to assist test margin of difference is significant. engineers. We have further performed an ANOVA test using MATLAB Some test specifications were not automated. We found that to confirm whether the two datasets are different significantly almost all of them were due to their reliance on human involve- from each other at the 5% significance level. The ANOVA test ment. The test engineers deemed them to be too hard to auto- JIANG et al.: TO WHAT EXTENT IS STRESS TESTING OF ANDROID TV APPLICATIONS AUTOMATED IN INDUSTRIAL ENVIRONMENTS? 1233

TABLE III NUMBER OF TAST REUSABLE TEST SCRIPTS COVERING AND WITHOUT COVERING CHANGES

Fig. 6. Comparison of the percentage of reusable test cases.

B. Answering RQ2: Extent of Reuse

Fig. 6 shows the percentage of TAST-automated test scripts that were successfully reused in testing the same application on a newer TV model, and likewise for MT-automated test scripts. We refer to them as TAST-reusable test scripts and MT-reusable test scripts, respectively. The x-axis represents the benchmarks, and the y-axis is the percentage of test cases that have been successfully reused.

For each benchmark, there is a pair of bars. The solid bar in blue on the left is for the MT-reusable test scripts, and the pattern-filled bar (in green) on the right represents the TAST-reusable test scripts. The rightmost pair of bars shows the mean percentage of reusable test cases over all benchmarks. The result shows that TAST resulted in 55.0%–88.0% (with a mean of 71.4%) reusable test scripts, compared to 31.0%–49.0% (with a mean of 40.7%) when using MT. On average, the difference is 30.7%, which is significant. However, the standard deviation achieved by TAST is 11.78%, which is higher than that of MT (5.36%).

In the last section, we observed that TAST is able to automate significantly more test cases than MT, and hence the above difference may be conservative. If we take the percentage of automated test cases into account, then, on average, the test engineers were able to completely reuse about 62% (86.8% × 71.4%) and 24% (58.7% × 40.7%) of all test cases via TAST and MT, respectively. Encouragingly, the difference is 38%.

Moreover, for each benchmark, we also found that the bar for the TAST-automated test scripts was significantly longer than that for the MT-automated test scripts, indicating that the degree of reuse provided by TAST was usually significantly higher than that provided by MT.

Similar to what we did in the data analysis for RQ1, we also conducted an ANOVA test to confirm whether the amounts of reuse between the two types of automated test cases differ significantly at the 5% significance level. The ANOVA test successfully rejected the null hypothesis that there was no significant difference between the two groups of data.

The finding shows that the use of our APIs can not only improve the amount of automation (see RQ1) but also provide a higher potential for test case reuse. This finding has a strong implication for model-based program testing, because its clear advantage is the ability to reuse test cases across different programs under test.

We found that there are five possible sources of changes affecting script reusability in this case study: the change of hardware support (e.g., codec), the new constraints imposed by an OS upgrade (e.g., local SD card file access), the change of application logic (e.g., a class name), the change of application UI (e.g., the position or layout of the controls), and a mixture of them. Any of these may affect the reusability of a test script.

We have further analyzed these TAST-reusable test cases to check whether they covered any changes. As shown in Table III, the first column lists the applications, and the second column lists the number of reusable test scripts by TAST.

TABLE IV
SAMPLE CASES OF REUSABLE TEST CASES

We found five possible sources of changes affecting script reusability in this case study: the change of hardware support (e.g., a codec), the new constraints imposed by an OS upgrade (e.g., local SD card file access), the change of application logic (e.g., a class name), the change of application UI (e.g., the position or layout of controls), or a mixture of them may all affect the reusability of a test script.

We further analyzed the TAST-reusable test cases to check whether they covered any changes. As shown in Table III, the first column lists the applications, and the second column lists the number of reusable test scripts produced by TAST. The third and the fourth columns list the number of TAST-reusable test scripts covering/not covering changes. The fifth and sixth columns list the percentage of TAST-reusable test scripts covering/not covering changes with respect to all TAST-reusable test scripts, respectively. The last two rows list the mean and standard deviation of the corresponding columns, respectively.

In general, we found that 59%–63% of the reusable test scripts did not cover any change on the newer version of the TV model, and the remaining 37%–41% of the reusable test scripts each covered some changes and still operated correctly. The mean number of TAST-reusable test scripts over all applications is 48, of which, on average, 18 covered changes while 30 covered no changes.

We carefully examined the TAST scripts to understand why they survived the changes.

For the TV control, Setting, Ciri, Dictionary, Calendar, IM, Calculator, and File Explorer applications, the scripts heavily interact with the underlying Agent interfaces. Fortunately, the high-level abstraction of the TAST testing APIs encapsulates the changes in the hardware and operating system well, which makes the test scripts less susceptible to those changes.

For the Browser, Market, Weibo, News, and Email applications, which involve small UI changes in their relatively complex user interfaces, the position-independent control ID references used in the TAST scripts make them insensitive to small layout changes in the GUI. In contrast, the MT scripts, which use position-based control references, are very susceptible to those small GUI changes.

For the Online Video, Media Player, Photo Viewer, Temple Run, Map, CarRace, and Weather Report applications, which have relatively rich multimedia content, the testers can only use image comparison for result verification in MT. In TAST, however, they have the option to perform test result checking with assertions on GUI control properties. This gives TAST scripts great advantages over MT scripts in terms of reusability.

Table IV illustrates cases where at least one of TAST and MT successfully made the test script of the same test specification reusable. In particular, we did not find any test specification that MT made reusable but TAST did not. In the table, the second and the third columns show whether MT and TAST successfully reused the same test specification, and the fourth column illustrates an example of each category.
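To make the contrast concrete, the following self-contained sketch models a screen as a list of controls and shows why a coordinate-based reference breaks after a layout change while an ID-based reference survives. All class and method names here are illustrative stand-ins, not TAST's actual API.

```python
# Toy GUI model illustrating position-based vs. position-independent
# control references; names are illustrative, not TAST's actual API.

class Control:
    def __init__(self, cid, x, y):
        self.cid, self.x, self.y = cid, x, y

class Screen:
    def __init__(self, controls):
        self.controls = controls

    def control_at(self, x, y):    # MT-style lookup by raw coordinates
        return next((c for c in self.controls if (c.x, c.y) == (x, y)), None)

    def control_by_id(self, cid):  # TAST-style lookup by control ID
        return next((c for c in self.controls if c.cid == cid), None)

old_model = Screen([Control("btn_play", 412, 236)])
new_model = Screen([Control("btn_play", 412, 300)])  # button moved by a UI tweak

print(old_model.control_at(412, 236) is not None)       # True: script works
print(new_model.control_at(412, 236) is not None)       # False: script breaks
print(new_model.control_by_id("btn_play") is not None)  # True: script reusable
```

The same lookup indirection is what lets the TAST scripts survive the small layout changes in the Browser, Market, Weibo, News, and Email applications discussed above.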

We also note that TAST uses the properties of widgets as the source of test oracle information. This strategy may not be generally applicable, such as when a widget is incorrectly rendered on the screen or when other widgets occlude the target widget. To check faults such as inconsistencies between a set of displayed widgets and their in-memory representations, the image comparison strategy may be more valuable. On the other hand, in our case study, we found that for many test specifications, TAST scripts did result in test cases with a reusable implementation of test result checking while MT scripts did not. This may indicate that the use of widget properties as the source of test oracle information can be an effective strategy.
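The trade-off between the two oracle styles can be summarized as follows. This is a hedged sketch with illustrative function names rather than TAST's published API, and a byte-for-byte comparison stands in for a real image-diff algorithm.

```python
# Sketch of the two test-oracle styles discussed above; the names are
# illustrative, not TAST's actual API.

def property_oracle(widget, expected_text):
    # Assert on in-memory widget properties: robust to benign pixel-level
    # restyling, but blind to rendering faults (e.g., an occluded widget).
    return widget["visible"] and widget["text"] == expected_text

def image_oracle(screenshot, reference):
    # Compare the rendered screen against a reference capture: catches
    # rendering faults, but brittle under small, benign GUI changes.
    # A byte-for-byte check stands in for a real image-diff algorithm.
    return screenshot == reference

widget = {"text": "Playing", "visible": True}  # hypothetical in-memory state
print(property_oracle(widget, "Playing"))      # True
print(image_oracle(b"frame-1", b"frame-1"))    # True only for identical captures
```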
To understand the situation better, we examined these TAST-reusable test scripts and classified them into five categories based on the sources of changes they covered. Specifically, we classified each of these test scripts based on whether it covered a change of hardware, OS, application logic, application UI, or a mixture of them. We list their ratios (in both mean and standard deviation) in Table VI.

TABLE VI
CATEGORIZATION OF TAST-REUSABLE TEST SCRIPTS

We observe from Table VI that these test scripts most frequently cover changes in application UI or logic, followed by changes in OS and hardware. Moreover, around 8% of these test scripts cover changes in more than one source (as indicated by the rightmost column of the table). Finally, the standard deviations of the ratios are consistently no more than 5%, which is small.

We further note that there is a tiny fraction of test cases that are non-reusable despite not covering any change. We found that this was likely due to factors affecting application execution that were not completely controllable while the case study was conducted. For example, in the case study, the applications had to interact with external inputs such as Internet data and TV signal channels. Congestion of the Internet connection and interrupted TV signals may lead to unexpected outcomes that invalidate the corresponding test scripts.

In summary, based on our analysis of the TAST test scripts, we believe that our code refactoring approach, the high-level abstraction of the TAST testing APIs, the position-independent GUI control ID references, and the oracle checking strategy of TAST together enabled the higher degree of test script reuse observed in the case study.

C. Answering RQ3: Fault Detection Ability

We computed the failure rate, which was defined as the percentage of all TAST-automated test scripts that each exposed a failure. Fig. 7 shows the failure rate of the TAST-automated test scripts in the entire testing project of the TV model. Note that, because this was a real testing project, any fault exposed was fixed, and the same test script was then re-run to confirm the fix. Hence, we deemed that a test script exposed a failure if it detected at least one failure at any point in the entire testing project.

Fig. 7. Failure rate of the developed test suites.

We found that the TAST-automated test scripts were effective in exposing faults in the given TV model. The failure rate achieved in the first round of testing ranges from 1.8% to 4.1% (with a mean of 3.2% and a standard deviation of 0.68%), which is encouraging because the planned test cases had already been used to test these applications on earlier Smart TV models, and all faults that were not hard to expose had been cleared by the test engineers before our case study started. The result shows that TAST is likely to help the test engineers improve the quality of the software components of the TV model. As a reference, Table V summarizes five selected failure scenarios detected by TAST when testing the given TV model.
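The metric itself is simple to compute once each script carries a flag recording whether it ever exposed a failure during the project. The flags below are toy placeholders, not our project data.

```python
# Failure rate as defined above: the percentage of TAST-automated test
# scripts that each exposed at least one failure over the whole project.
# The flags below are toy placeholders, not the project data.
exposed_failure = [False, True, False, False, False,
                   False, False, True, False, False]  # one flag per script

failure_rate = 100.0 * sum(exposed_failure) / len(exposed_failure)
print("failure rate = %.1f%%" % failure_rate)  # 20.0% for this toy data
```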

TABLE V
SAMPLE FAILURE CASES

VI. RELATED WORK

There has been much progress toward full test automation and toward raising the level of abstraction to promote test script reusability. However, there is little research aiming to understand how to raise the level of abstraction of test scripts. Indeed, to the best of our knowledge, our work is the first empirical study in the area of stress testing of TV applications.

Improving test case reusability through repair is a popular strategy. Tiwari and Goel [20] reported a review study on reuse-oriented test approaches for reducing testing effort. Memon [14] proposed a GUI regression testing technique that performs automated test script reusability analysis on existing GUI test suites and fixes the non-usable test cases. JAutomate [1] used a screen-based image recognition approach to identify input locations and was able to simulate GUI control events (e.g., mouse movements) so that test scripts can be reused to test applications in which GUI control objects have been relocated. Our study examines test case reusability from a model-based perspective. Our aim is to abstract test code so that, at a certain abstraction level, there is no need to change the test code, and the specific way to deal with a particular application under test is handled internally by the runtime engine of the testing tool.

In test automation, formulating effective test oracles must be addressed. Many existing GUI testing tools use image comparison for test result verification. Katona et al. [9] proposed an automatic, black-box technique to test TVs on a final production line; the technique used a camera to capture the image shown on a TV screen for test verification. Marijan et al. [12] proposed an automatic system for functional failure detection on TVs. The system adopts an effective image comparison algorithm to compare the image captured from the TV under test with that from a reference TV. Their approach is applicable to testing TV systems with or without operating system support. MobileTest [7] was a typical capture-and-replay testing tool based on image comparison. It used the ADB interface to stimulate the device under test and adopted GUI screenshots for test result checking. Zhang et al. [23] compared two techniques addressing the test oracle problem: metamorphic testing and assertion checking. TAST provides test engineers with assertion checking APIs and can compare both images and GUI control objects. We will study metamorphic testing for Android apps in the future.

A branch of test automation that is related to our study is the generation of test cases. Merkel [15] analyzed and enhanced the adaptive random test case generation technique, which is applicable to testing general applications. Pomeranz [16] proposed to build effective functional test sequences by concatenating test subsequences from a generated test pool. Jeon et al. [6] proposed a symbolic execution framework for Dalvik bytecode; in this way, automatic test case generation through symbolic execution of Android bytecode becomes possible. Azim and Neamtiu [2] proposed to apply a static, taint-style dataflow analysis on the application bytecode to construct a high-level control flow graph that captures legal transitions among activities (app screens). They then performed systematic depth-first exploration on those graphs to generate events for testing Android applications. Choi et al. [3] proposed to use machine learning to learn a model of the application during testing and to use the learned model to generate user inputs that visit unexplored states of the app. Jensen et al. [5] proposed a two-phase technique for automatically generating event sequences: they first used concolic execution to build summaries of the event handlers of the application, and then used the summaries and the GUI model to generate targeted event sequences. Takala et al. [19] proposed a model-based approach for GUI testing of Android applications. Yeh et al. [22] proposed an approach that analyzes the GUI model during the testing process and then performs black-box Android testing based on the model. Yang et al. [21] proposed a grey-box approach to automatic GUI-model generation of mobile applications: they performed static analysis to identify a set of events supported by the application, and then systematically exercised the identified events on the application. As we have validated in our case study, test scripts that do not consider resources often lead to a significantly lower degree of automation. We believe that our work has a positive impact on research on test case generation.
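To sketch the exploration idea in the approach of Azim and Neamtiu [2] discussed above, the snippet below performs a depth-first walk over a hypothetical activity-transition graph and prints the generated event sequences. The graph and event labels here are illustrative; in the real technique, the graph is derived by static dataflow analysis of the app bytecode.

```python
# Depth-first exploration over a hypothetical activity-transition graph,
# in the spirit of Azim and Neamtiu [2]. The graph below is illustrative;
# the real one is derived by static analysis of the application bytecode.
transitions = {
    "Main":     [("press_play", "Player"), ("open_menu", "Settings")],
    "Player":   [("press_back", "Main")],
    "Settings": [("press_back", "Main")],
}

def dfs_explore(activity, visited, trace):
    visited.add(activity)
    for event, target in transitions.get(activity, []):
        print(" -> ".join(trace + [event]))   # one generated event sequence
        if target not in visited:
            dfs_explore(target, visited, trace + [event])

dfs_explore("Main", set(), [])
```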

Apart from the resource constraint issues, our study also shows the feasibility that, through a process of code refactoring, one can develop a testing approach that provides a significantly higher degree of test automation than is possible without such a process. This approach is orthogonal to the test automation strategies presented in the above work, which makes our work unique and complementary to them.

There are several works on testing embedded systems. Koong et al. [10] presented an automatic white-box testing environment for multi-core embedded systems. Their system can perform unit testing, coverage testing, and multi-core performance testing based on source code instrumentation and code generation techniques. TAST, in contrast, is a black-box testing environment. Mattiello-Francisco et al. [13] proposed a technique for integration testing of the timing constraints of real-time embedded software. They used formal models to describe the timing and interoperability specifications for test case generation. Different from their work, our study focuses on system-level testing instead of integration testing. Satoh [17] proposed to test context-sensitive networked applications by emulating the physical mobility of the underlying computing devices of these applications with logical mobility. Our strategy is to emulate the resource conditions in the execution environment of the application in the TV.

There are also several open-source and commercial testing tools for consumer electronics device testing. TestQuest6 is a non-intrusive automated test solution that provides comprehensive support for a wide range of electronic devices. TestQuest executes predefined actions and compares the output to valid states to determine whether the test was successful by simulating a “virtual” user. The Wind River UX Test Development Kit7 is a test development environment targeted at GUI-based testing for the Android platform. It is designed to assist in the validation of the user experience of a device by reproducing human interactions to test user interfaces.

Robotium8 is a well-designed testing framework suited for both white-box and black-box testing of Android applications. However, several problems with it prevented us from selecting it. First, it uses the Android instrumentation test framework, which limits its execution to the same process as the application under test, so it can only work with activities and views within the defined package. This would make it difficult to extend our tool in the future to test the interaction of several applications (APKs) within the same test case. Second, the APK re-signing process required by Robotium for black-box testing is tedious. We adopt Monkey because it provides a clean and adequate interface to support our basic testing requirements.

The monkeyrunner9 tool is a standard built-in tool of the Android SDK that provides APIs for writing programs that control an Android device or emulator from outside of Android code. It also uses Python as the scripting language. Owing to its immense impact on the Android development community, TAST also includes several libraries of this tool as its building blocks (e.g., device connection management).
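For readers unfamiliar with monkeyrunner, the short script below illustrates its flavor of driving a device from outside the Android code; the activity component name is a hypothetical placeholder.

```python
# Run with: monkeyrunner this_script.py
# Flavor of a monkeyrunner script; the component name below is a
# hypothetical placeholder for an application under test.
from com.android.monkeyrunner import MonkeyRunner, MonkeyDevice

device = MonkeyRunner.waitForConnection(30)   # wait up to 30 s for a device over ADB
device.startActivity(component="com.example.tvplayer/.MainActivity")
MonkeyRunner.sleep(2)                         # let the activity settle
device.press("KEYCODE_DPAD_DOWN", MonkeyDevice.DOWN_AND_UP)  # D-pad key, like a TV remote
snapshot = device.takeSnapshot()              # capture the screen
snapshot.writeToFile("after_down.png", "png") # save it for image-based checking
```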
Testdroid10 is a fully automated cloud suite for compatibility testing that facilitates Android developers in testing applications on multiple real devices at the same time. It provides a Testdroid recorder tool for recording user actions, generating reusable Android JUnit test cases, and running them in the Testdroid Cloud, which provides an online service for testing applications on real Android devices at the back end.

The Appium11 testing framework is an open-source, cross-platform tool for automating native, mobile web, and hybrid applications on the iOS and Android platforms. It is fairly language-independent, but its underlying implementation has constraints for Android app testing. On earlier Android versions, Appium depends on Android's instrumentation framework, where the limitation on testing multiple applications still applies. On later Android versions, Appium depends on the UiAutomator12 framework, which requires the applications under test to be designed with accessibility in mind.13 However, not all third-party applications used in our case study provide accessibility support for their customized UI components. This limits the applicability of Appium in our testing scenario.

The Sikuli14 framework automates GUI testing via image recognition powered by OpenCV to identify and control GUI components. Writing test scripts with Sikuli is simple and intuitive, but in terms of our own testing requirements it also has some limitations. First, since image recognition is used, the reusability of the test scripts may be affected by small GUI changes (e.g., a change of button style). Second, since it has no knowledge of system-internal information, it cannot help control the resources available to an application for stress testing.

Reliability testing is a key concern in testing consumer electronic products. Marijan et al. [11] proposed to derive a model based on usage profiles of such products. They then performed a reliability analysis based on the execution results of the test cases derived from such a model. Different from this work, TAST mainly provides a script-based, black-box test infrastructure to develop and execute test cases. Huang and Lin [4] proposed to improve software reliability modeling by taking the testing compression factor and the failure-to-fault relationship into the model. Their evaluation on real failure data shows that the proposed model has good failure prediction power.

6[Online]. Available: http://www.bsquare.com/products/testquest-automated-testing-platform
7[Online]. Available: http://www.youtube.com/watch?v=NhXMiit4Abo
8[Online]. Available: https://code.google.com/p/robotium/
9[Online]. Available: http://developer.android.com/tools/help/monkeyrunner_concepts.html
10[Online]. Available: http://testdroid.com/
11[Online]. Available: http://appium.io/introduction.html
12[Online]. Available: http://developer.android.com/tools/help/uiautomator/index.html
13[Online]. Available: http://developer.android.com/tools/testing/testing_ui.html
14[Online]. Available: http://www.sikuli.org/

VII. CONCLUSION

Many model-based testing strategies face the difficulty of translating their abstract test cases into concrete test cases. In this paper, we have examined this aspect from a reverse-engineering perspective. Specifically, we have reported an exploratory study. In the study, our testing methodology consisted of a number of steps to be conducted iteratively. First, we started by observing how test engineers wrote concrete test cases using an existing testing tool. Then, we identified pieces of code to become methods in the sense of “extract method” in code refactoring, and identified missing features in the testing tool that prevented test engineers from automating the test scripts further. The testing tool was then enriched with such methods as APIs (with the necessary runtime support). This exploratory process continued until the test engineers were (largely) satisfied with the current testing tool.
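As a hypothetical illustration of this “extract method” step (the scenario and names are ours, not code from the industrial scripts), a fragment of low-level remote-control presses repeated across recorded scripts can be folded into one intention-revealing method that the enriched tool later backs with a runtime API:

```python
# Before: low-level, position-dependent steps repeated across scripts.
def test_open_weather_before(device):
    device.press("KEYCODE_HOME")
    device.press("KEYCODE_DPAD_RIGHT")
    device.press("KEYCODE_DPAD_RIGHT")
    device.press("KEYCODE_ENTER")       # breaks if the launcher layout changes

# After: the fragment is extracted into one high-level method, which the
# enriched tool can implement with a position-independent runtime API.
def launch_app(device, app_id):
    device.press("KEYCODE_HOME")
    device.start_activity(app_id)       # hypothetical ID-based launch call

def test_open_weather_after(device):
    launch_app(device, "com.example.weather")
```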
We chose stress testing to study because fully automated test scripts were necessary; otherwise, meaningful stress testing could not be conducted. Moreover, as each newly automated test script can be applied to conduct a session of stress testing on a program, owing to the combinatorial effect, it creates many new stress testing scenarios for the program (together with the existing automated test scripts). As demonstrated in our case study, this testing methodology produced an effective testing framework that exposed previously unknown bugs in the applications, thereby improving the reliability of the TV product. Specifically, the case study demonstrated that at least 28% more test cases can be automated, that 38% more test cases can be reused across TV models of the same TV series, and that 3.2% of the automated test cases exposed previously unknown bugs in stress testing of the TV applications.
Our work also demonstrates a new research direction: bridging the gap between abstract test cases and concrete test cases can be a bottom-up process, and this process can be very effective. We have obtained many abstract test cases for each application. In the future, we will study how to formulate effective test models so that these abstract test cases can be automatically generated from such test models. In this way, we move one step closer to the goal of model-based program testing, where concrete and executable test cases can be generated from test models directly.

REFERENCES

[1] E. Alegroth, M. Nass, and H. Olsson, “JAutomate: A tool for system- and acceptance-test automation,” in Proc. IEEE 6th Int. Conf. Software Testing, Verification and Validation, Washington, DC, USA, pp. 439–446.
[2] T. Azim and I. Neamtiu, “Targeted and depth-first exploration for systematic testing of Android apps,” in Proc. ACM SIGPLAN Int. Conf. Object Oriented Programming Syst. Languages & Applicat., New York, NY, USA, 2013, pp. 641–660.
[3] W. Choi, G. Necula, and K. Sen, “Guided GUI testing of Android apps with minimal restart and approximate learning,” in Proc. ACM SIGPLAN Int. Conf. Object Oriented Programming Syst. Languages & Applicat., New York, NY, USA, 2013, pp. 623–640.
[4] C. Huang and C. Lin, “Analysis of software reliability modeling considering testing compression factor and failure-to-fault relationship,” IEEE Trans. Comput., vol. 59, no. 2, pp. 283–288, Feb. 2010.
[5] C. S. Jensen, M. R. Prasad, and A. Møller, “Automated testing with targeted event sequence generation,” in Proc. ACM Int. Symp. Software Testing Anal., New York, NY, USA, 2013, pp. 67–77.
[6] J. Jeon, K. K. Micinski, and J. S. Foster, “SymDroid: Symbolic execution for Dalvik bytecode,” Univ. of Maryland, College Park, MD, USA, Tech. Rep. CS-TR-5022, 2012.
[7] B. Jiang, X. Long, and X. P. Gao, “MobileTest: A tool supporting automatic black box test for software on smart mobile devices,” in Proc. 2nd Int. Workshop Autom. Software Test, Washington, DC, USA, 2007, pp. 37–43.
[8] B. Jiang, T. H. Tse, W. Grieskamp, N. Kicillof, Y. Cao, X. Li, and W. K. Chan, “Assuring the model evolution of protocol software specifications by regression testing process improvement,” Software: Practice and Experience, vol. 41, no. 10, pp. 1073–1103, 2011.
[9] M. Katona, I. Kastelan, V. Pekovic, N. Teslic, and T. Tekcan, “Automatic black box testing of television systems on the final production line,” IEEE Trans. Consumer Electron., vol. 57, no. 1, pp. 224–231, Jan. 2011.
[10] C. Koong, C. Shih, P. Hsiung, H. Lai, C. Chang, W. Chu, N. Hsueh, and C. Yang, “ATEMES: Automatic testing environment for multi-core embedded software,” J. Syst. Software, vol. 85, no. 1, pp. 43–60, 2012.
[11] D. Marijan, N. Teslic, V. Pekovic, and T. Tekcan, “An approach to achieving the reliability in TV embedded system,” in Proc. 4th Int. Conf. Secure Software Integration and Rel. Improvement Companion, 2010, pp. 13–17.
[12] D. Marijan, V. Zlokolica, N. Teslic, V. Pekovic, and T. Tekcan, “Automatic functional TV set failure detection system,” IEEE Trans. Consumer Electron., vol. 56, no. 1, pp. 125–133, 2010.
[13] M. F. Mattiello-Francisco, E. Martins, A. Cavalli, and E. T. Yano, “InRob: An approach for testing interoperability and robustness of real-time embedded software,” J. Syst. Software, vol. 85, no. 1, pp. 3–15, 2012.
[14] A. Memon, “Automatically repairing event sequence-based GUI test suites for regression testing,” ACM Trans. Software Eng. Methodol., vol. 18, no. 2, 2008, Art. ID 4.
[15] R. Merkel, “Analysis and enhancements of adaptive random testing,” Ph.D. dissertation, Swinburne Univ. of Technol., Swinburne, Australia.
[16] I. Pomeranz, “Concatenation of functional test subsequences for improved fault coverage and reduced test length,” IEEE Trans. Comput., vol. 61, no. 6, pp. 899–904, Apr. 2012.
[17] I. Satoh, “A testing framework for mobile computing software,” IEEE Trans. Software Eng., vol. 29, no. 12, pp. 1112–1121, Dec. 2003.
[18] A. Surya, D. Lee, M. Ohye, and P. Carff, “Building web apps for Google TV,” O'Reilly, pp. 30–40, 2011.
[19] T. Takala, M. Katara, and J. Harty, “Experiences of system-level model-based GUI testing of an Android application,” in Proc. 4th IEEE Int. Conf. Software Testing, Verification and Validation, Washington, DC, USA, 2011, pp. 377–386.
[20] R. Tiwari and N. Goel, “Reuse: Reducing test effort,” ACM SIGSOFT Software Eng. Notes, vol. 38, no. 2, pp. 1–11, 2013.
[21] W. Yang, M. Prasad, and T. Xie, “A grey-box approach for automated GUI-model generation of mobile applications,” in Proc. 16th Int. Conf. Fundamental Approaches to Software Eng., V. Cortellessa and D. Varró, Eds., 2013, pp. 250–265.
[22] C. Yeh, S. Huang, and S. Chang, “A black-box based Android GUI testing system,” in Proc. 11th Annu. Int. Conf. Mobile Syst., Applicat., and Services, New York, NY, USA, 2013, pp. 529–530.
[23] Z. Zhang, W. K. Chan, T. H. Tse, and P. Hu, “Experimental study to compare the use of metamorphic testing and assertion checking,” J. Software, vol. 20, no. 10, pp. 2637–2654, 2009.
[24] Siri, Sep. 2014. [Online]. Available: http://www.apple.com/ios/siri/
[25] Temple Run, Sep. 2014. [Online]. Available: https://play.google.com/store/apps/details?id=com.imangi.templerun

Bo Jiang (M'10) received the Ph.D. degree from The University of Hong Kong, Hong Kong, in 2011.
He is currently an Assistant Professor with the School of Computer Science and Engineering, Beihang University, Beijing, China. His research results have been reported in many international journals and conferences. His current research interests are software engineering in general, and embedded systems testing as well as program debugging in particular.
Prof. Jiang was the recipient of Best Paper Awards from COMPSAC'08, COMPSAC'09, and QSIC'11.
Peng Chen received the M.Sc. degree from Beihang University, Beijing, China, in 2015. His research interest is software quality assurance.

Wing Kwong Chan (M'05) received the B.Eng., M.Phil., and Ph.D. degrees from The University of Hong Kong, Hong Kong.
He is currently an Associate Professor with the Department of Computer Science, City University of Hong Kong. He is currently an editor of the Journal of Systems and Software and the International Journal of Creative Computing, and a guest editor of several international journals. His research results have been reported in more than 100 papers published in major international journals and conferences. His current research interest is to address the software testing and program analysis challenges faced in developing and operating large-scale applications.
Prof. Chan is a PC/RC member of FSE'14 and ICSE'15, a symposium chair of COMPSAC, and an area chair of ICWS'15.

Xinchao Zhang is a Senior Test Manager with the Reliability Institute of Sichuan Changhong Electric Company, Mianyang, China. His research interests include quality assurance and system reliability testing.