METTLE: a Metamorphic Testing Approach to Assessing and Validating Unsupervised Machine Learning Systems

Total Page:16

File Type:pdf, Size:1020Kb

METTLE: a Metamorphic Testing Approach to Assessing and Validating Unsupervised Machine Learning Systems 1 METTLE: a METamorphic Testing approach to assessing and validating unsupervised machine LEarning systems Xiaoyuan Xie, Member, IEEE, Zhiyi Zhang, Tsong Yueh Chen, Senior Member, IEEE, Yang Liu, Member, IEEE, Pak-Lok Poon, Member, IEEE and Baowen Xu, Senior Member, IEEE F Abstract—Unsupervised machine learning is the training of an artificial NSUPERVISED machine learning requires no prior intelligence system using information that is neither classified nor la- U knowledge and can be widely used in a large variety beled, with a view to modeling the underlying structure or distribution in a of applications such as market segmentation for targeting dataset. Since unsupervised machine learning systems are widely used customers [1], anomaly or fraud detection in banking [2], in many real-world applications, assessing the appropriateness of these grouping genes or proteins in biological process [3], deriv- systems and validating their implementations with respect to individual users’ requirements and specific application scenarios / contexts are ing climate indices from earth science data [4], and docu- indisputably two important tasks. Such assessment and validation tasks, ment clustering based on content [5]. More recently, unsu- however, are fairly challenging due to the absence of a priori knowledge pervised machine learning has also been used by software of the data. In view of this challenge, we develop a METamorphic testers in predicting software faults [6]. Testing approach to assessing and validating unsupervised machine This paper specifically focuses on clustering systems LEarning systems, abbreviated as METTLE. Our approach provides a (which refer to software systems that implement cluster- new way to unveil the (possibly latent) characteristics of various machine ing algorithms and are intended to be used in different learning systems, by explicitly considering the specific expectations and domains) Such a clustering system helps users partition requirements of these systems from individual users’ perspectives. To a given unlabeled dataset into groups (or clusters) based support METTLE, we have further formulated 11 generic metamorphic relations (MRs), covering users’ generally expected characteristics that on some similarity measures, so that data in the same should be possessed by machine learning systems. To demonstrate the cluster are more “similar” to each other than to data from viability and effectiveness of METTLE, we have performed an experiment different clusters. In artificial intelligence (AI) and data involving six commonly used clustering systems. Our experiment has mining, numerous clustering systems [7, 8, 9] have been shown that, guided by user-defined MR-based adequacy criteria, end developed and are available for public use. Thus, selecting users are able to assess, validate, and select appropriate clustering the most appropriate clustering system for use is an impor- systems in accordance with their own specific needs. Our investiga- tant concern from end users. (In this paper, end users, or tion has also yielded insightful understanding and interpretation of the simply users, refer to those people who are “causal” users behavior of the machine learning systems from an end-user software engineering’s perspective, rather than a designer’s or implementor’s of clustering systems. Although they have some hands-on perspective, who normally adopts a theoretical approach. experience on using such systems, they often do not possess a solid theoretical foundation on machine learning. These arXiv:1807.10453v4 [cs.SE] 11 Jun 2019 Index Terms—Unsupervised machine learning, clustering assessment, users come from different fields such as bioinformatics [17] clustering validation, metamorphic testing, metamorphic relation, end- and nuclear engineering [18]. Also, their main concern is user software engineering. the applicability of a clustering system in the users’ specific contexts, rather than the detailed logic of this system.) From a user’s perspective, this selection is not trivial [10], not only 1 INTRODUCTION because end users generally do not have very solid theoret- ical background on machine learning, but also because the • X. Xie and Z. Zhang are with the School of Computer Science, Wuhan selection task involves two complex issues as follows: University, Wuhan 430072, China. (Issue 1) The correctness of the clustering results is X. Xie is the corresponding author. E-mail: [email protected]. • T. Y. Chen is with the Department of Computer Science and Software a major concern for users. However, when evaluating a Engineering, Swinburne University of Technology, Hawthorn, VIC 3122, clustering system, there is not necessarily a correct solution Australia. or ”ground truth” that users can refer to for verifying the • Y. Liu is with the School of Computer Science and Engineering, Nanyang clustering result [11]. Furthermore, not only is the correct Technological University, Singapore 639798. • P.-L. Poon is with the School of Engineering and Technology, Central result difficult or infeasible to find, the interpretation of Queensland University Australia, Melbourne, VIC 3000, Australia. correctness varies from one user to another. This is because, • B. Xu is with the State Key Laboratory for Novel Software Technology, although data points are partitioned into clusters based on Nanjing University, Nanjing 210023, China. some similarity measures, the comprehension of ”similar- 2 ity” may vary among individual users. Given a cluster, one importance, it is unfortunate that both external and internal user may consider that the data in it are similar, yet another techniques generally do not consider the dynamic aspect of user may consider not. dataset transformation when testing clustering systems. (Issue 2) Despite the importance of the correctness of the We now turn to issue 2. There has not yet been a gener- clustering result, in many cases, users would probably be ally accepted and systematic methodology that allows end more concerned if a clustering system produces an output users to effectively assess the quality and appropriateness of that is appropriate or meaningful to their particular scenar- a clustering system for their particular applications. In tradi- ios of applications. This view is supported by the following tional software testing, test adequacy is commonly measured argument in [12]: by code coverage criteria to unveil necessary conditions ”... the major obstacle is the difficulty in evaluating a of detecting faults in the code (e.g., incorrect logic). In clustering algorithm without taking into account the this regard, clustering systems are harder to assess because context: why does the user cluster his data in the first the logic of a machine learning model is primarily learnt place, and what does he want to do with the cluster- from massive data. In view of this problem, a practically ing afterwards? We argue that clustering should not applicable adequacy criterion is in need to help a user assess be treated as an application-independent mathematical and validate the characteristics that a clustering system problem, but should always be studied in the context of should possess in a specific application scenario, so that the its end-use.” most appropriate system can be selected for use from this user’s perspective. As a reminder, the characteristics that a Regarding issue 1, it is well known as the oracle problem clustering system is “expected” to possess may vary across in software testing. This problem occurs when a test oracle different users. Needless to say, there is also no systematic (or simply an oracle) does not exist. Here, an oracle refers to methodology for users to validate the appropriateness of a a mechanism that can verify the correctness of the system clustering result in their own contexts. output [13]. In view of the oracle problem, users of unsu- In view of the above two challenging issues, we propose pervised machine learning rely on two types of validation a METamorphic Testing approach to assessing and validat- techniques (external and internal) to evaluate clustering ing unsupervised machine LEarning systems (abbreviated systems. Basically, external validation techniques evaluate as METTLE). To alleviate Issue 1, METTLE applies the frame- the output clusters based on some existing benchmarks; work of metamorphic testing (MT) [13], so that users are still while internal validation techniques adopt features inherent able to validate a clustering system even when the oracle to the data alone to validate the clustering result. problem occurs. In addition, MT is naturally considered to Both external and internal validation techniques suffer be a candidate solution for addressing the ripple effect of from some problems which affect their effectiveness and data transformation, sinceMT involves multiple inputs (or applicability. For external techniques, it is usually difficult datasets) which follow a specific transformation relation. By to obtain sufficient relevant benchmark data for compar- defining a set of metamorphic relations (MRs) (which capture ison [14, 15]. In most situations, the benchmarks selected the relations between multiple inputs (or datasets) and their for use are essentially those special cases in software ver- corresponding outputs (or clustering results)) to be used in ification and validation, thereby providing insufficient test MT, the dynamic perspective of a clustering system can adequacy, coverage, and diversity. This issue does not exist be properly assessed. Furthermore,
Recommended publications
  • How Effectively Does Metamorphic Testing Alleviate the Oracle Problem?
    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1 How Effectively does Metamorphic Testing Alleviate the Oracle Problem? Huai Liu, Member, IEEE, Fei-Ching Kuo, Member, IEEE,DaveTowey,Member, IEEE, and Tsong Yueh Chen, Member, IEEE Abstract— In software testing, something which can verify the does not exist, or is too expensive to be used. This problem, correctness of test case execution results is called an oracle. The referred to as the oracle problem, is a fundamental challenge for oracle problem occurs when either an oracle does not exist, or software testing, because it significantly restricts the applicability exists but is too expensive to be used. Metamorphic testing is a and effectiveness of most test case selection strategies. testing approach which uses metamorphic relations, properties of the software under test represented in the form of relations Metamorphic testing [5] is an approach to alleviating the ora- among inputs and outputs of multiple executions, to help verify cle problem. In metamorphic testing, some necessary properties the correctness of a program. This paper presents new empirical (hereafter referred to as metamorphic relations) are identified for evidence to support this approach, which has been used to alle- the software under test, usually from the software specifications. viate the oracle problem in various applications and to enhance These metamorphic relations provide a new perspective on ver- several software analysis and testing techniques. It has been ifying test results. Traditionally, after a test case is executed, observed that identification of a sufficient number of appropriate metamorphic relations for testing, even by inexperienced testers, its corresponding test output is verified using a test oracle.
    [Show full text]
  • How Effectively Does Metamorphic Testing Alleviate the Oracle Problem?
    How effectively does metamorphic testing alleviate the oracle problem? This is the Presentation version of the following publication Liu, Huai, Kuo, FC, Towey, D and Chen, TY (2014) How effectively does metamorphic testing alleviate the oracle problem? IEEE Transactions on Software Engineering, 40 (1). 4 - 22. ISSN 0098-5589 The publisher’s official version can be found at http://ieeexplore.ieee.org/document/6613484/?reload=true Note that access to this version may require subscription. Downloaded from VU Research Repository https://vuir.vu.edu.au/33046/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1 How Effectively does Metamorphic Testing Alleviate the Oracle Problem? Huai Liu, Member, IEEE, Fei-Ching Kuo, Member, IEEE,DaveTowey,Member, IEEE, and Tsong Yueh Chen, Member, IEEE Abstract— In software testing, something which can verify the does not exist, or is too expensive to be used. This problem, correctness of test case execution results is called an oracle. The referred to as the oracle problem, is a fundamental challenge for oracle problem occurs when either an oracle does not exist, or software testing, because it significantly restricts the applicability exists but is too expensive to be used. Metamorphic testing is a and effectiveness of most test case selection strategies. testing approach which uses metamorphic relations, properties of the software under test represented in the form of relations Metamorphic testing [5] is an approach to alleviating the ora- among inputs and outputs of multiple executions, to help verify cle problem.
    [Show full text]
  • Metamorphic Testing of Navigation Software: a Pilot Study with Google Maps
    University of Wollongong Research Online Faculty of Engineering and Information Faculty of Engineering and Information Sciences - Papers: Part B Sciences 2018 Metamorphic testing of navigation software: A pilot study with Google Maps Joshua Brown University of Wollongong, [email protected] Zhiquan Zhou University of Wollongong, [email protected] Yang-Wai Chow University of Wollongong, [email protected] Follow this and additional works at: https://ro.uow.edu.au/eispapers1 Part of the Engineering Commons, and the Science and Technology Studies Commons Recommended Citation Brown, Joshua; Zhou, Zhiquan; and Chow, Yang-Wai, "Metamorphic testing of navigation software: A pilot study with Google Maps" (2018). Faculty of Engineering and Information Sciences - Papers: Part B. 1347. https://ro.uow.edu.au/eispapers1/1347 Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: [email protected] Metamorphic testing of navigation software: A pilot study with Google Maps Abstract Millions of people use navigation software every day to commute and travel. In addition, many systems rely upon the correctness of navigation software to function, ranging from directions applications to self- driving machinery. Navigation software is difficulto t test because it is hard or very expensive to evaluate its output. This difficulty is generally known as the oracle problem, a fundamental challenge in software testing. In this study, we propose a metamorphic testing strategy to alleviate the oracle problem in testing navigation software, and conduct a case study by testing the Google Maps mobile app, its web service API, and its graphical user interface.
    [Show full text]
  • Evolution, Testing and Configuration of Variability Systems Intensive José Ángel Galindo Duarte
    Evolution, testing and configuration of variability systems intensive José Ángel Galindo Duarte To cite this version: José Ángel Galindo Duarte. Evolution, testing and configuration of variability systems intensive. Other [cs.OH]. Université Rennes 1, 2015. English. NNT : 2015REN1S008. tel-01187958 HAL Id: tel-01187958 https://tel.archives-ouvertes.fr/tel-01187958 Submitted on 28 Aug 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. ANNEE´ 2014 THESE` / UNIVERSITE´ DE RENNES 1 sous le sceau de l’Universite´ Europeenne´ de Bretagne En cotutelle Internationale avec Universite´ de Seville,´ Espagne pour le grade de DOCTEUR DE L’UNIVERSITE´ DE RENNES 1 Mention : informatique Ecole´ doctorale Matisse present´ ee´ par Jose´ Angel´ GALINDO DUARTE prepar´ ee´ a` l’unite´ de recherche IRISA-UMR6074 Institut de Recherche en Informatique et Systemes` Aleatoires´ INRIA These` soutenue a` Seville´ Evolution, testing le 4 mars 2015 and configuration devant le jury compose´ de : Miguel Toro Bonilla of variability Professeur de l’universite´ de Seville´ / examinateur Sergio Segura Rueda Professeur de l’universite´ de Seville´ / examinateur intensive systems Jean-Marc Jez´ equel´ Professeur de l’universite´ de Rennes 1/ examinateur Alexander Felferning Professeur de l’universite´ de Graz/ examinateur Juan Manuel Murillo Professeur de l’universite´ d’Extremadura/ rapporteur EVOLUTION, TESTING AND CONFIGURATION OF VARIABILITY INTENSIVE SYSTEMS JOSE´ A.
    [Show full text]
  • 4 Metamorphic Testing: a Review of Challenges and Opportunities
    Metamorphic Testing: A Review of Challenges and Opportunities TSONG YUEH CHEN and FEI-CHING KUO, Swinburne University of Technology HUAI LIU, Victoria University PAK-LOK POON, RMIT University DAVE TOWEY, University of Nottingham Ningbo China T. H. TSE, The University of Hong Kong ZHI QUAN ZHOU, University of Wollongong Metamorphic testing is an approach to both test case generation and test result verification. A central ele- ment is a set of metamorphic relations, which are necessary properties of the target function or algorithm in relation to multiple inputs and their expected outputs. Since its first publication, we have witnessed a rapidly increasing body of work examining metamorphic testing from various perspectives, including metamorphic relation identification, test case generation, integration with other software engineering techniques, andthe validation and evaluation of software systems. In this article, we review the current research of metamorphic testing and discuss the challenges yet to be addressed. We also present visions for further improvement of metamorphic testing and highlight opportunities for new research. 4 CCS Concepts: • Software and its engineering → Software verification and validation; Software test- ing and debugging; Additional Key Words and Phrases: Metamorphic testing, metamorphic relation, test case generation, oracle problem ACM Reference format: Tsong Yueh Chen, Fei-Ching Kuo, Huai Liu, Pak-Lok Poon, Dave Towey, T. H. Tse, and Zhi Quan Zhou. 2018. Metamorphic Testing: A Review of Challenges and Opportunities. ACM Comput. Surv. 51, 1, Article 4 (January 2018), 27 pages. https://doi.org/10.1145/3143561 This research was supported in part by a linkage grant of the Australian Research Council (project ID LP160101691) and a grant of the General Research Fund of the Research Grants Council of Hong Kong (project no.
    [Show full text]
  • Metamorphic Testing Techniques to Detect Defects in Applications Without Test Oracles
    Metamorphic Testing Techniques to Detect Defects in Applications without Test Oracles Christian Murphy Submitted in partial fulfillment of the Requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2010 c 2010 Christian Murphy All Rights Reserved Abstract Metamorphic Testing Techniques to Detect Defects in Applications without Test Oracles Christian Murphy Applications in the fields of scientific computing, simulation, optimization, machine learning, etc. are sometimes said to be “non-testable programs” because there is no reliable test oracle to indicate what the correct output should be for arbitrary input. In some cases, it may be impossible to know the program’s correct output a priori; in other cases, the creation of an oracle may simply be too hard. These applications typically fall into a category of software that Weyuker describes as “Programs which were written in order to determine the answer in the first place. There would be no need to write such programs, if the correct answer were known.” The absence of a test oracle clearly presents a challenge when it comes to detecting subtle errors, faults, defects or anomalies in software in these domains. As these types of programs become more and more prevalent in various aspects of everyday life, the dependability of software in these domains takes on increasing impor- tance. Machine learning and scientific computing software may be used for critical tasks such as helping doctors perform a medical diagnosis or enabling weather forecasters to more accurately predict the paths of hurricanes; hospitals may use simulation software to understand the impact of resource allocation on the time patients spend in the emergency room.
    [Show full text]
  • Metamorphic Testing of Navigation Software: a Pilot Study with Google Maps
    Proceedings of the 51st Hawaii International Conference on System Sciences j 2018 Metamorphic Testing of Navigation Software: A Pilot Study with Google Maps Joshua Brown, Zhi Quan Zhou£, and Yang-Wai Chow Institute of Cybersecurity and Cryptology School of Computing and Information Technology University of Wollongong Wollongong, NSW 2522, Australia Abstract—Millions of people use navigation software every day exceed 50% of the total development budget. Testing in- to commute and travel. In addition, many systems rely upon the volves executing the software under test (SUT) with a set of correctness of navigation software to function, ranging from test cases together with a mechanism against which the tester directions applications to self-driving machinery. Navigation can decide whether the outcomes of test case executions are software is difficult to test because it is hard or very expensive correct (that is, a test oracle). The oracle problem refers to evaluate its output. This difficulty is generally known as the to the situation where an oracle does not exist or it is oracle problem, a fundamental challenge in software testing. theoretically available but practically too expensive to be In this study, we propose a metamorphic testing strategy to applied. The oracle problem is a fundamental challenge in alleviate the oracle problem in testing navigation software, and software testing practice but this problem is often ignored conduct a case study by testing the Google Maps mobile app, its by the research community — compared with many other web service API, and its graphical user interface. The results aspects of testing such as automated test case generation, show that our strategy is effective with the detection of several the challenge of test oracle automation “has received signif- real-life bugs in Google Maps.
    [Show full text]
  • A Survey on Metamorphic Testing
    This is a repository copy of A Survey on Metamorphic Testing. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/110335/ Version: Accepted Version Article: Segura, S., Fraser, G., Sanchez, A.B. et al. (1 more author) (2016) A Survey on Metamorphic Testing. IEEE Transactions on Software Engineering, 42 (9). pp. 805-824. ISSN 0098-5589 https://doi.org/10.1109/TSE.2016.2532875 © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. Reuse Unless indicated otherwise, fulltext items are protected by copyright with all rights reserved. The copyright exception in section 29 of the Copyright, Designs and Patents Act 1988 allows the making of a single copy solely for the purpose of non-commercial research or private study within the limits of fair dealing. The publisher or other rights-holder may allow further reproduction and re-use of this version - refer to the White Rose Research Online record for this item. Where records identify the publisher as the copyright holder, users can verify any specific terms of use on the publisher’s website. Takedown If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing [email protected] including the URL of the record and the reason for the withdrawal request.
    [Show full text]
  • Metamorphic Testing: a Review of Challenges and Opportunities
    University of Wollongong Research Online Faculty of Engineering and Information Faculty of Engineering and Information Sciences - Papers: Part B Sciences 2018 Metamorphic Testing: A Review of Challenges and Opportunities Tsong Yueh Chen Swinburne University of Technology, [email protected] Fei-Ching Kuo Swinburne University of Technology, [email protected] Huai Liu Victoria University, [email protected] Pak-Lok Poon [email protected] Dave Towey United International College, University of Nottingham China Campus See next page for additional authors Follow this and additional works at: https://ro.uow.edu.au/eispapers1 Part of the Engineering Commons, and the Science and Technology Studies Commons Recommended Citation Chen, Tsong Yueh; Kuo, Fei-Ching; Liu, Huai; Poon, Pak-Lok; Towey, Dave; Tse, T. H.; and Zhou, Zhi Quan, "Metamorphic Testing: A Review of Challenges and Opportunities" (2018). Faculty of Engineering and Information Sciences - Papers: Part B. 975. https://ro.uow.edu.au/eispapers1/975 Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: [email protected] Metamorphic Testing: A Review of Challenges and Opportunities Abstract Metamorphic testing is an approach to both test case generation and test result verification. A central element is a set of metamorphic relations, which are necessary properties of the target function or algorithm in relation to multiple inputs and their expected outputs. Since its first publication, we have witnessed a rapidly increasing body of work examining metamorphic testing from various perspectives, including metamorphic relation identification, test case generation, integration with other software engineering techniques, and the validation and evaluation of software systems.
    [Show full text]
  • Metamorphic Testing: Testing the Untestable
    This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/MS.2018.2875968 Metamorphic Testing: Testing the Untestable Sergio Segura Department of Computer Languages and Systems, University of Seville, Spain. Dave Towey School of Computer Science, University of Nottingham Ningbo China, China. Zhi Quan Zhou Institute of Cybersecurity and Cryptology, School of Computing and Information Technology, University of Wollongong, Australia. Tsong Yueh Chen Department of Computer Science and Software Engineering, Swinburne University of Technology, Australia ABSTRACT. What if we could know that a program is buggy, even if we could not tell whether or not its observed output is correct? This is one of the key strengths of metamorphic testing, a technique where failures are not revealed by checking an individual concrete output, but by checking the relations among the inputs and outputs of multiple executions of the program under test. Two decades after its introduction, metamorphic testing has become a fully-fledged testing technique with successful applications in multiple domains, including online search engines, autonomous machinery, compilers, Web APIs, and deep learning programs, among others. This article serves as a hands-on entry point for newcomers to metamorphic testing, describing examples, possible applications, and current limitations, providing readers with the basics for the application of the technique in their own projects. Keywords: Software testing, metamorphic testing, oracle problem, test case generation. Introduction Suppose you are helping your son with his homework.
    [Show full text]
  • Metamorphic Robustness Testing of Google Translate
    Postprint of article in IEEE/ACM 5th International Workshop on Metamorphic Testing (MET’20), Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops (ICSEW’20), ACM, New York, NY Metamorphic Robustness Testing of Google Translate Dickson T. S. Lee Zhi Quan Zhou T. H. Tse Department of Computer Science School of Computing and IT Department of Computer Science The University of Hong Kong University of Wollongong The University of Hong Kong Pokfulam, Hong Kong Wollongong, NSW 2522, Australia Pokfulam, Hong Kong [email protected] [email protected] [email protected] ABSTRACT or stressful environmental conditions”. While developers strive to ensure that the software performs the intended functionality cor- Current research on the testing of machine translation software rectly for valid inputs, there is often a lack of negative testing to mainly focuses on functional correctness for valid, well-formed ensure that the software can also reasonably handle invalid inputs. inputs. By contrast, robustness testing, which involves the ability Suppose we want to translate the sentence “Peter LOVES of the software to handle erroneous or unanticipated inputs, is often apples.” into Chinese using Google Translate, which is available at overlooked. In this paper, we propose to address this important https://translate.google.com. We want to stress the word “LOVE” shortcoming. Using the metamorphic robustness testing approach, and hence type it in caps. We end the sentence with a period as per we compare the translations of original inputs with those of follow- usual practice. Suppose we have made a careless mistake by typing up inputs having different categories of minor typos.
    [Show full text]
  • Testing Web Enabled Simulation at Scale Using Metamorphic Testing
    Testing Web Enabled Simulation at Scale Using Metamorphic Testing John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Maria Lomeli, Erik Meijer, Silvia Sapora, and Justin Spahr-Summers FACEBOOK Inc. Abstract—We report on Facebook’s deployment of MIA (Meta- Since the decisions made from the results of these sim- morphic Interaction Automaton). MIA is used to test Facebook’s ulations are so fundamental, their predictions and the code Web Enabled Simulation, built on a web infrastructure of on which they are based acquire paramount importance. The hundreds of millions of lines of code. MIA tackles the twin problems of test flakiness and the unknowable oracle problem. consequences of (perhaps subtle, yet pivotal) bugs are clearly It uses metamorphic testing to automate continuous integration among the most impactful facing any software engineer. In and regression test execution. MIA also plays the role of a test order to improve testing of simulation-based systems, more bot, automatically commenting on all relevant changes submitted work is needed to address the problems of test flakiness and for code review. It currently uses a suite of over 40 metamorphic the oracle problem. test cases. Even at this extreme scale, a non–trivial metamorphic test suite subset yields outcomes within 20 minutes (sufficient for Based on metamorphic testing, we have been deploying an continuous integration and review processes). Furthermore, our automated test infrastructure for our web-enabled simulation, offline mode simulation reduces test flakiness from approximately WW, at Facebook, that tackles these twin problems of oracles 50% (of all online tests) to 0% (offline).
    [Show full text]