Losing Confidence in Quality: Unspoken Evolution of Computer Vision Services

Alex Cummaudo∗, Rajesh Vasa∗, John Grundy†, Mohamed Abdelrazek‡ and Andrew Cain‡ ∗ Applied Artificial Intelligence Institute, Deakin University, Geelong, Australia † Faculty of Information Technology, Monash University, Clayton, Australia ‡ School of Information Technology, Deakin University, Geelong, Australia { ca, rajesh.vasa, mohamed.abdelrazek, andrew.cain }@deakin.edu.au, [email protected]

Abstract—Recent advances in artificial intelligence (AI) and ML typically involves use of statistical techniques that yield machine learning (ML), such as computer vision, are now avail- components with a non-deterministic external behaviour; that able as intelligent services and their accessibility and simplicity is, for the same given input, different outcomes may result. is compelling. Multiple vendors now offer this technology as services and developers want to leverage these advances However, developers, in general, are used to libraries and to provide value to end-users. However, there is no firm inves- small components behaving predictably, while systems that tigation into the maintenance and evolution risks arising from rely on ML techniques work on confidence intervals1 and use of these intelligent services; in particular, their behavioural probabilities. For example, the developer’s mindset suggests consistency and transparency of their functionality. We evaluated that an image of a border collie—if sent to three intelligent the responses of three different intelligent services (specifically computer vision) over 11 months using 3 different data sets, computer vision services—would return the label ‘dog’ consis- verifying responses against the respective documentation and as- tently with time regardless of which service is used. However, sessing evolution risk. We found that there are: (1) inconsistencies one service may yield the specific dog breed, ‘border collie’, in how these services behave; (2) evolution risk in the responses; another service may yield a permutation of that breed, ‘collie’, and (3) a lack of clear communication that documents these risks and another may yield broader results, such as ‘animal’; each and inconsistencies. We propose a set of recommendations to both 2 developers and intelligent service providers to inform risk and with results of varying confidence values. Furthermore, the assist maintainability. third service may evolve with time, and thus learn that the ‘animal’ is actually a ‘dog’ or even a ‘collie’. The outcomes Keywords-machine learning, intelligent service, computer vi- sion, quality assurance, evolution risk, documentation are thus behaviourally inconsistent between services provid- ing conceptually similar functionality. As a thought exercise, consider if the sub-string function were created using ML I.INTRODUCTION techniques—it would perform its operation with a confidence The availability of intelligent services has made AI tooling where the expected outcome and the AI inferred output match accessible to software developers and promises a lower entry as a probability, rather than a deterministic (constant) outcome. barrier for their utilisation. Consider state-of-the-art computer How would this affect the developers’ approach to using such vision analysers, which require either manually training a a function? Would they actively take into consideration the deep-learning classifier, or selecting a pre-trained model and non-deterministic nature of the result? deploying these into an appropriate infrastructure. Either are Myriad software quality models and SE practices advocate laborious in time, and require non-trivial expertise along with maintainability and reliability as primary characteristics; sta- a large data set when training or customisation is needed. bility, testability, fault tolerance, changeability and maturity In contrast, intelligent servi