Testing and Validating Machine Learning Classifiers by Metamorphic

Total Page:16

File Type:pdf, Size:1020Kb

Testing and Validating Machine Learning Classifiers by Metamorphic The Journal of Systems and Software 84 (2011) 544–558 Contents lists available at ScienceDirect The Journal of Systems and Software journal homepage: www.elsevier.com/locate/jss Testing and validating machine learning classifiers by metamorphic testingଝ Xiaoyuan Xie a,d,e,∗, Joshua W.K. Ho b, Christian Murphy c,f, Gail Kaiser c, Baowen Xu e, Tsong Yueh Chen a a Centre for Software Analysis and Testing, Swinburne University of Technology, Hawthorn, Vic. 3122, Australia b Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA c Department of Computer Science, Columbia University, New York, NY 10027, USA d School of Computer Science and Engineering, Southeast University, Nanjing 210096, China e State Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University, Nanjing 210093, China f Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19103, USA article info abstract Article history: Machine learning algorithms have provided core functionality to many application domains – such as Received 10 January 2010 bioinformatics, computational linguistics, etc. However, it is difficult to detect faults in such applications Received in revised form 17 October 2010 because often there is no “test oracle” to verify the correctness of the computed outputs. To help address Accepted 25 November 2010 the software quality, in this paper we present a technique for testing the implementations of machine Available online 7 December 2010 learning classification algorithms which support such applications. Our approach is based on the tech- nique “metamorphic testing”, which has been shown to be effective to alleviate the oracle problem. Also Metamorphic testing presented include a case study on a real-world machine learning application framework, and a discussion Machine learning Test oracle of how programmers implementing machine learning algorithms can avoid the common pitfalls discov- Oracle problem ered in our study. We also conduct mutation analysis and cross-validation, which reveal that our method Validation has high effectiveness in killing mutants, and that observing expected cross-validation result alone is not Verification sufficiently effective to detect faults in a supervised classification program. The effectiveness of meta- morphic testing is further confirmed by the detection of real faults in a popular open-source classification program. © 2010 Elsevier Inc. All rights reserved. 1. Introduction The majority of the research effort in the domain of machine learning focuses on building more accurate models that can bet- Machine learning algorithms have provided core functionality ter achieve the goal of automated learning from the real world. to many application domains – such as bioinformatics, compu- However, to date very little work has been done on assuring the tational linguistics, etc. As these types of scientific applications correctness of the software applications that implement machine become more and more prevalent in society (Mitchell, 1983), learning algorithms. Formal proofs of an algorithm’s optimal qual- ensuring their quality becomes more and more crucial. ity do not guarantee that an application implements or uses the Quality assurance of such applications presents a challenge algorithm correctly, and thus software testing is necessary. because conventional software testing techniques are not always To help address the quality of machine learning programs, applicable. In particular, it may be difficult to detect subtle errors, this paper presents a technique for testing implementations faults, defects or anomalies in many applications in these domains of the supervised classification algorithms which are used by because it may be either impossible or too expensive to verify the many machine learning programs. Our technique is based on correctness of computed outputs, which is referred to as the oracle an approach called “metamorphic testing” (Chen et al., 1998), problem (Weyuker, 1982). which uses properties of functions such that it is possible to predict expected changes to the output for particular changes to the input. Although the correct output cannot be known in advance, if the change is not as expected, then a fault must ଝ exist. A preliminary version of this paper was presented at the 9th International Con- ference on Quality Software (QSIC 2009) (Xie et al., 2009). In our approach, we first enumerate the metamorphic relations ∗ Corresponding author at: Centre for Software Analysis and Testing, Swinburne that classifiers would be expected to demonstrate, then for a given University of Technology, Hawthorn, Vic. 3122, Australia. Tel.: +61 3 9214 8678; implementation determine whether each relation is a necessary fax: +61 3 9819 0823. property for the corresponding classifier algorithm. If it is, then fail- E-mail addresses: [email protected] (X. Xie), [email protected] (J.W.K. Ho), [email protected] (C. Murphy), [email protected] ure to exhibit the relation indicates a fault; if the relation is not a (G. Kaiser), [email protected] (B. Xu), [email protected] (T.Y. Chen). necessary property, then a deviation from the “expected” behav- 0164-1212/$ – see front matter © 2010 Elsevier Inc. All rights reserved. doi:10.1016/j.jss.2010.11.920 X. Xie et al. / The Journal of Systems and Software 84 (2011) 544–558 545 ior has been found. In other words, apart from verification, our the labels are unknown. In a classification algorithm, the system approach also supports validation. attempts to predict the label of each individual example. That is, In addition to presenting our technique, we describe a case study the testing data input is an unlabeled test case ts, and the aim is on a real-world machine learning application framework, Weka to determine its class label ct based on the data-label relationship (Witten and Frank, 2005), which is used as the foundation for many learned from the set of training samples S and the corresponding computational science tools such as BioWeka (Gewehr et al., 2007) class labels C, where ct ∈ L. in bioinformatics. Additionally a mutation analysis is conducted on Weka to investigate the effectiveness of our method. We also discuss how our findings can be of use to other areas. 2.2. Algorithms investigated The rest of this paper is organized as follows. Section 2 provides background information about machine learning and introduces In this paper, we only study the k-nearest neighbors classifier the specific algorithms that are evaluated. Section 3 discusses and the Naïve Bayes classifier, because of their popularity in the the metamorphic testing approach and the specific metamorphic machine learning community. However, it should be noted that relations used for testing machine learning classifiers. Section the oracle problem description and techniques described below are 4 presents the results of case studies demonstrating that the not specific to any particular algorithm, and as shown in our pre- approach can find faults in real-world machine learning appli- vious work (Chen et al., 2009; Murphy et al., 2008), our results are cations. Section 5 discusses empirical studies that use mutation applicable to the general case. analysis to systematically insert faults into the source code, and In k-nearest neighbors (k NN), for a training sample set S, sup- measures the effectiveness of metamorphic testing. Section 6 pose each sample has m attributes, att0, att1, ..., attm−1, and there presents related work, and Section 7 concludes. are n classes in S, {l0, l1, ..., ln−1}. The value of the test case ts is a0, a1, ..., am−1. k NN computes the distance between each training 2. Background sample and the test case. Generally k NN uses the Euclidean Dis- tance: for a sample si ∈ S, suppose the value of each attribute is sa0, ... In this section, we present some of the basics of machine sa1, , sam−1 , and the distance between si and ts is as follows: learning and the two algorithms we investigated (k-nearest neigh- bors and Naïve Bayes Classifier), as well as the terminology used m−1 2 (Alpaydin, 2004). Readers familiar with machine learning may skip dist(si,ts) = (saj − aj) . this section. j One complication in our work arose due to conflicting tech- nical nomenclature: “testing”, “regression”, “validation”, “model” After sorting all the distances, k NN selects the k nearest ones and other relevant terms have very different meanings to machine which are considered as the k nearest neighbors. Then k NN calcu- learning experts than they do to software engineers. Here we lates the proportion of each label in the k nearest neighbors, and employ the terms “testing”, “regression testing”, and “validation” the label with the highest proportion is assigned as the label of the as appropriate for a software engineering audience, but we adopt test case. the machine learning sense of “model”, as defined below. In the Naï ve Bayes Classifier (NBC), for a training sample set S, suppose each sample has m attributes, att , att , ..., att − , and 2.1. Supervised machine learning fundamentals 0 1 m 1 there are n classes in S, {l0, l1, ..., ln−1}. The value of the test case ts is a , a , ..., a − . The label of t is called l , and is to be predicted In general, input to a supervised machine learning classifier con- 0 1 m 1 s ts by NBC. sists of a set of training data that can be represented by two vectors ... NBC computes the probability of lts to be of class lk, when each of size k. One vector is for the k training samplesS = s0, s1, , sk−1 attribute value of t is a , a , ..., a − . NBC assumes that attributes and the other is for the corresponding class labelsC = c , c , ..., s 0 1 m 1 0 1 are conditionally independent with one another given the class c . Each sample s ∈ S is a vector of size m, which represents m k−1 label, therefore we have the equation: attributes from which to learn.
Recommended publications
  • How Effectively Does Metamorphic Testing Alleviate the Oracle Problem?
    This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1 How Effectively does Metamorphic Testing Alleviate the Oracle Problem? Huai Liu, Member, IEEE, Fei-Ching Kuo, Member, IEEE,DaveTowey,Member, IEEE, and Tsong Yueh Chen, Member, IEEE Abstract— In software testing, something which can verify the does not exist, or is too expensive to be used. This problem, correctness of test case execution results is called an oracle. The referred to as the oracle problem, is a fundamental challenge for oracle problem occurs when either an oracle does not exist, or software testing, because it significantly restricts the applicability exists but is too expensive to be used. Metamorphic testing is a and effectiveness of most test case selection strategies. testing approach which uses metamorphic relations, properties of the software under test represented in the form of relations Metamorphic testing [5] is an approach to alleviating the ora- among inputs and outputs of multiple executions, to help verify cle problem. In metamorphic testing, some necessary properties the correctness of a program. This paper presents new empirical (hereafter referred to as metamorphic relations) are identified for evidence to support this approach, which has been used to alle- the software under test, usually from the software specifications. viate the oracle problem in various applications and to enhance These metamorphic relations provide a new perspective on ver- several software analysis and testing techniques. It has been ifying test results. Traditionally, after a test case is executed, observed that identification of a sufficient number of appropriate metamorphic relations for testing, even by inexperienced testers, its corresponding test output is verified using a test oracle.
    [Show full text]
  • How Effectively Does Metamorphic Testing Alleviate the Oracle Problem?
    How effectively does metamorphic testing alleviate the oracle problem? This is the Presentation version of the following publication Liu, Huai, Kuo, FC, Towey, D and Chen, TY (2014) How effectively does metamorphic testing alleviate the oracle problem? IEEE Transactions on Software Engineering, 40 (1). 4 - 22. ISSN 0098-5589 The publisher’s official version can be found at http://ieeexplore.ieee.org/document/6613484/?reload=true Note that access to this version may require subscription. Downloaded from VU Research Repository https://vuir.vu.edu.au/33046/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING 1 How Effectively does Metamorphic Testing Alleviate the Oracle Problem? Huai Liu, Member, IEEE, Fei-Ching Kuo, Member, IEEE,DaveTowey,Member, IEEE, and Tsong Yueh Chen, Member, IEEE Abstract— In software testing, something which can verify the does not exist, or is too expensive to be used. This problem, correctness of test case execution results is called an oracle. The referred to as the oracle problem, is a fundamental challenge for oracle problem occurs when either an oracle does not exist, or software testing, because it significantly restricts the applicability exists but is too expensive to be used. Metamorphic testing is a and effectiveness of most test case selection strategies. testing approach which uses metamorphic relations, properties of the software under test represented in the form of relations Metamorphic testing [5] is an approach to alleviating the ora- among inputs and outputs of multiple executions, to help verify cle problem.
    [Show full text]
  • Metamorphic Testing of Navigation Software: a Pilot Study with Google Maps
    University of Wollongong Research Online Faculty of Engineering and Information Faculty of Engineering and Information Sciences - Papers: Part B Sciences 2018 Metamorphic testing of navigation software: A pilot study with Google Maps Joshua Brown University of Wollongong, [email protected] Zhiquan Zhou University of Wollongong, [email protected] Yang-Wai Chow University of Wollongong, [email protected] Follow this and additional works at: https://ro.uow.edu.au/eispapers1 Part of the Engineering Commons, and the Science and Technology Studies Commons Recommended Citation Brown, Joshua; Zhou, Zhiquan; and Chow, Yang-Wai, "Metamorphic testing of navigation software: A pilot study with Google Maps" (2018). Faculty of Engineering and Information Sciences - Papers: Part B. 1347. https://ro.uow.edu.au/eispapers1/1347 Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: [email protected] Metamorphic testing of navigation software: A pilot study with Google Maps Abstract Millions of people use navigation software every day to commute and travel. In addition, many systems rely upon the correctness of navigation software to function, ranging from directions applications to self- driving machinery. Navigation software is difficulto t test because it is hard or very expensive to evaluate its output. This difficulty is generally known as the oracle problem, a fundamental challenge in software testing. In this study, we propose a metamorphic testing strategy to alleviate the oracle problem in testing navigation software, and conduct a case study by testing the Google Maps mobile app, its web service API, and its graphical user interface.
    [Show full text]
  • Evolution, Testing and Configuration of Variability Systems Intensive José Ángel Galindo Duarte
    Evolution, testing and configuration of variability systems intensive José Ángel Galindo Duarte To cite this version: José Ángel Galindo Duarte. Evolution, testing and configuration of variability systems intensive. Other [cs.OH]. Université Rennes 1, 2015. English. NNT : 2015REN1S008. tel-01187958 HAL Id: tel-01187958 https://tel.archives-ouvertes.fr/tel-01187958 Submitted on 28 Aug 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. ANNEE´ 2014 THESE` / UNIVERSITE´ DE RENNES 1 sous le sceau de l’Universite´ Europeenne´ de Bretagne En cotutelle Internationale avec Universite´ de Seville,´ Espagne pour le grade de DOCTEUR DE L’UNIVERSITE´ DE RENNES 1 Mention : informatique Ecole´ doctorale Matisse present´ ee´ par Jose´ Angel´ GALINDO DUARTE prepar´ ee´ a` l’unite´ de recherche IRISA-UMR6074 Institut de Recherche en Informatique et Systemes` Aleatoires´ INRIA These` soutenue a` Seville´ Evolution, testing le 4 mars 2015 and configuration devant le jury compose´ de : Miguel Toro Bonilla of variability Professeur de l’universite´ de Seville´ / examinateur Sergio Segura Rueda Professeur de l’universite´ de Seville´ / examinateur intensive systems Jean-Marc Jez´ equel´ Professeur de l’universite´ de Rennes 1/ examinateur Alexander Felferning Professeur de l’universite´ de Graz/ examinateur Juan Manuel Murillo Professeur de l’universite´ d’Extremadura/ rapporteur EVOLUTION, TESTING AND CONFIGURATION OF VARIABILITY INTENSIVE SYSTEMS JOSE´ A.
    [Show full text]
  • 4 Metamorphic Testing: a Review of Challenges and Opportunities
    Metamorphic Testing: A Review of Challenges and Opportunities TSONG YUEH CHEN and FEI-CHING KUO, Swinburne University of Technology HUAI LIU, Victoria University PAK-LOK POON, RMIT University DAVE TOWEY, University of Nottingham Ningbo China T. H. TSE, The University of Hong Kong ZHI QUAN ZHOU, University of Wollongong Metamorphic testing is an approach to both test case generation and test result verification. A central ele- ment is a set of metamorphic relations, which are necessary properties of the target function or algorithm in relation to multiple inputs and their expected outputs. Since its first publication, we have witnessed a rapidly increasing body of work examining metamorphic testing from various perspectives, including metamorphic relation identification, test case generation, integration with other software engineering techniques, andthe validation and evaluation of software systems. In this article, we review the current research of metamorphic testing and discuss the challenges yet to be addressed. We also present visions for further improvement of metamorphic testing and highlight opportunities for new research. 4 CCS Concepts: • Software and its engineering → Software verification and validation; Software test- ing and debugging; Additional Key Words and Phrases: Metamorphic testing, metamorphic relation, test case generation, oracle problem ACM Reference format: Tsong Yueh Chen, Fei-Ching Kuo, Huai Liu, Pak-Lok Poon, Dave Towey, T. H. Tse, and Zhi Quan Zhou. 2018. Metamorphic Testing: A Review of Challenges and Opportunities. ACM Comput. Surv. 51, 1, Article 4 (January 2018), 27 pages. https://doi.org/10.1145/3143561 This research was supported in part by a linkage grant of the Australian Research Council (project ID LP160101691) and a grant of the General Research Fund of the Research Grants Council of Hong Kong (project no.
    [Show full text]
  • Metamorphic Testing Techniques to Detect Defects in Applications Without Test Oracles
    Metamorphic Testing Techniques to Detect Defects in Applications without Test Oracles Christian Murphy Submitted in partial fulfillment of the Requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2010 c 2010 Christian Murphy All Rights Reserved Abstract Metamorphic Testing Techniques to Detect Defects in Applications without Test Oracles Christian Murphy Applications in the fields of scientific computing, simulation, optimization, machine learning, etc. are sometimes said to be “non-testable programs” because there is no reliable test oracle to indicate what the correct output should be for arbitrary input. In some cases, it may be impossible to know the program’s correct output a priori; in other cases, the creation of an oracle may simply be too hard. These applications typically fall into a category of software that Weyuker describes as “Programs which were written in order to determine the answer in the first place. There would be no need to write such programs, if the correct answer were known.” The absence of a test oracle clearly presents a challenge when it comes to detecting subtle errors, faults, defects or anomalies in software in these domains. As these types of programs become more and more prevalent in various aspects of everyday life, the dependability of software in these domains takes on increasing impor- tance. Machine learning and scientific computing software may be used for critical tasks such as helping doctors perform a medical diagnosis or enabling weather forecasters to more accurately predict the paths of hurricanes; hospitals may use simulation software to understand the impact of resource allocation on the time patients spend in the emergency room.
    [Show full text]
  • Metamorphic Testing of Navigation Software: a Pilot Study with Google Maps
    Proceedings of the 51st Hawaii International Conference on System Sciences j 2018 Metamorphic Testing of Navigation Software: A Pilot Study with Google Maps Joshua Brown, Zhi Quan Zhou£, and Yang-Wai Chow Institute of Cybersecurity and Cryptology School of Computing and Information Technology University of Wollongong Wollongong, NSW 2522, Australia Abstract—Millions of people use navigation software every day exceed 50% of the total development budget. Testing in- to commute and travel. In addition, many systems rely upon the volves executing the software under test (SUT) with a set of correctness of navigation software to function, ranging from test cases together with a mechanism against which the tester directions applications to self-driving machinery. Navigation can decide whether the outcomes of test case executions are software is difficult to test because it is hard or very expensive correct (that is, a test oracle). The oracle problem refers to evaluate its output. This difficulty is generally known as the to the situation where an oracle does not exist or it is oracle problem, a fundamental challenge in software testing. theoretically available but practically too expensive to be In this study, we propose a metamorphic testing strategy to applied. The oracle problem is a fundamental challenge in alleviate the oracle problem in testing navigation software, and software testing practice but this problem is often ignored conduct a case study by testing the Google Maps mobile app, its by the research community — compared with many other web service API, and its graphical user interface. The results aspects of testing such as automated test case generation, show that our strategy is effective with the detection of several the challenge of test oracle automation “has received signif- real-life bugs in Google Maps.
    [Show full text]
  • A Survey on Metamorphic Testing
    This is a repository copy of A Survey on Metamorphic Testing. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/110335/ Version: Accepted Version Article: Segura, S., Fraser, G., Sanchez, A.B. et al. (1 more author) (2016) A Survey on Metamorphic Testing. IEEE Transactions on Software Engineering, 42 (9). pp. 805-824. ISSN 0098-5589 https://doi.org/10.1109/TSE.2016.2532875 © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Reproduced in accordance with the publisher's self-archiving policy. Reuse Unless indicated otherwise, fulltext items are protected by copyright with all rights reserved. The copyright exception in section 29 of the Copyright, Designs and Patents Act 1988 allows the making of a single copy solely for the purpose of non-commercial research or private study within the limits of fair dealing. The publisher or other rights-holder may allow further reproduction and re-use of this version - refer to the White Rose Research Online record for this item. Where records identify the publisher as the copyright holder, users can verify any specific terms of use on the publisher’s website. Takedown If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing [email protected] including the URL of the record and the reason for the withdrawal request.
    [Show full text]
  • Metamorphic Testing: a Review of Challenges and Opportunities
    University of Wollongong Research Online Faculty of Engineering and Information Faculty of Engineering and Information Sciences - Papers: Part B Sciences 2018 Metamorphic Testing: A Review of Challenges and Opportunities Tsong Yueh Chen Swinburne University of Technology, [email protected] Fei-Ching Kuo Swinburne University of Technology, [email protected] Huai Liu Victoria University, [email protected] Pak-Lok Poon [email protected] Dave Towey United International College, University of Nottingham China Campus See next page for additional authors Follow this and additional works at: https://ro.uow.edu.au/eispapers1 Part of the Engineering Commons, and the Science and Technology Studies Commons Recommended Citation Chen, Tsong Yueh; Kuo, Fei-Ching; Liu, Huai; Poon, Pak-Lok; Towey, Dave; Tse, T. H.; and Zhou, Zhi Quan, "Metamorphic Testing: A Review of Challenges and Opportunities" (2018). Faculty of Engineering and Information Sciences - Papers: Part B. 975. https://ro.uow.edu.au/eispapers1/975 Research Online is the open access institutional repository for the University of Wollongong. For further information contact the UOW Library: [email protected] Metamorphic Testing: A Review of Challenges and Opportunities Abstract Metamorphic testing is an approach to both test case generation and test result verification. A central element is a set of metamorphic relations, which are necessary properties of the target function or algorithm in relation to multiple inputs and their expected outputs. Since its first publication, we have witnessed a rapidly increasing body of work examining metamorphic testing from various perspectives, including metamorphic relation identification, test case generation, integration with other software engineering techniques, and the validation and evaluation of software systems.
    [Show full text]
  • Metamorphic Testing: Testing the Untestable
    This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/MS.2018.2875968 Metamorphic Testing: Testing the Untestable Sergio Segura Department of Computer Languages and Systems, University of Seville, Spain. Dave Towey School of Computer Science, University of Nottingham Ningbo China, China. Zhi Quan Zhou Institute of Cybersecurity and Cryptology, School of Computing and Information Technology, University of Wollongong, Australia. Tsong Yueh Chen Department of Computer Science and Software Engineering, Swinburne University of Technology, Australia ABSTRACT. What if we could know that a program is buggy, even if we could not tell whether or not its observed output is correct? This is one of the key strengths of metamorphic testing, a technique where failures are not revealed by checking an individual concrete output, but by checking the relations among the inputs and outputs of multiple executions of the program under test. Two decades after its introduction, metamorphic testing has become a fully-fledged testing technique with successful applications in multiple domains, including online search engines, autonomous machinery, compilers, Web APIs, and deep learning programs, among others. This article serves as a hands-on entry point for newcomers to metamorphic testing, describing examples, possible applications, and current limitations, providing readers with the basics for the application of the technique in their own projects. Keywords: Software testing, metamorphic testing, oracle problem, test case generation. Introduction Suppose you are helping your son with his homework.
    [Show full text]
  • Metamorphic Robustness Testing of Google Translate
    Postprint of article in IEEE/ACM 5th International Workshop on Metamorphic Testing (MET’20), Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops (ICSEW’20), ACM, New York, NY Metamorphic Robustness Testing of Google Translate Dickson T. S. Lee Zhi Quan Zhou T. H. Tse Department of Computer Science School of Computing and IT Department of Computer Science The University of Hong Kong University of Wollongong The University of Hong Kong Pokfulam, Hong Kong Wollongong, NSW 2522, Australia Pokfulam, Hong Kong [email protected] [email protected] [email protected] ABSTRACT or stressful environmental conditions”. While developers strive to ensure that the software performs the intended functionality cor- Current research on the testing of machine translation software rectly for valid inputs, there is often a lack of negative testing to mainly focuses on functional correctness for valid, well-formed ensure that the software can also reasonably handle invalid inputs. inputs. By contrast, robustness testing, which involves the ability Suppose we want to translate the sentence “Peter LOVES of the software to handle erroneous or unanticipated inputs, is often apples.” into Chinese using Google Translate, which is available at overlooked. In this paper, we propose to address this important https://translate.google.com. We want to stress the word “LOVE” shortcoming. Using the metamorphic robustness testing approach, and hence type it in caps. We end the sentence with a period as per we compare the translations of original inputs with those of follow- usual practice. Suppose we have made a careless mistake by typing up inputs having different categories of minor typos.
    [Show full text]
  • Testing Web Enabled Simulation at Scale Using Metamorphic Testing
    Testing Web Enabled Simulation at Scale Using Metamorphic Testing John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Maria Lomeli, Erik Meijer, Silvia Sapora, and Justin Spahr-Summers FACEBOOK Inc. Abstract—We report on Facebook’s deployment of MIA (Meta- Since the decisions made from the results of these sim- morphic Interaction Automaton). MIA is used to test Facebook’s ulations are so fundamental, their predictions and the code Web Enabled Simulation, built on a web infrastructure of on which they are based acquire paramount importance. The hundreds of millions of lines of code. MIA tackles the twin problems of test flakiness and the unknowable oracle problem. consequences of (perhaps subtle, yet pivotal) bugs are clearly It uses metamorphic testing to automate continuous integration among the most impactful facing any software engineer. In and regression test execution. MIA also plays the role of a test order to improve testing of simulation-based systems, more bot, automatically commenting on all relevant changes submitted work is needed to address the problems of test flakiness and for code review. It currently uses a suite of over 40 metamorphic the oracle problem. test cases. Even at this extreme scale, a non–trivial metamorphic test suite subset yields outcomes within 20 minutes (sufficient for Based on metamorphic testing, we have been deploying an continuous integration and review processes). Furthermore, our automated test infrastructure for our web-enabled simulation, offline mode simulation reduces test flakiness from approximately WW, at Facebook, that tackles these twin problems of oracles 50% (of all online tests) to 0% (offline).
    [Show full text]