Software Engineering for Machine-Learning Applications
INVITED CONTENT

Editor: Giuliano Antoniol, Polytechnique Montréal, [email protected]
Editor: Steve Counsell, Brunel University, [email protected]
Editor: Phillip Laplante, Pennsylvania State University, [email protected]

The Road Ahead

Foutse Khomh, Bram Adams, Jinghui Cheng, Marios Fokaefs, and Giuliano Antoniol

IEEE Software, September/October 2018. 0740-7459/18/$33.00 © 2018 IEEE. ComputingEdge, February 2019. 2469-7087/19/$33.00 © 2019 IEEE. Published by the IEEE Computer Society.

THE NEED AND desire for more automation and intelligence have led to breakthroughs in machine learning (ML) and artificial intelligence (AI), yet we still experience failures and shortcomings in the resulting software systems. The main reason is the shift in the development paradigm induced by ML and AI. Traditionally, software systems are constructed deductively, by writing down the rules that govern the system behaviors as program code. However, with ML techniques, these rules are inferred from training data (from which the requirements are generated inductively). This paradigm shift makes reasoning about the behavior of software systems with ML components difficult, resulting in software systems that are intrinsically challenging to test and verify.

Given the critical and increasing role of ML- and AI-based systems in our society, it's imperative for both the software engineering (SE) and ML communities to research and develop innovative approaches to address these challenges. In fact, the learned behavior of an ML-based system might be incorrect, even if the learning algorithm is implemented correctly, a situation in which traditional testing techniques are ineffective. A critical problem is how to effectively develop, test, and evolve such systems, given that they don't have (complete) specifications or even source code corresponding to some of their critical behaviors.

Motivated by these challenges, we organized the First Symposium on Software Engineering for Machine Learning Applications (SEMLA) at Polytechnique Montréal on 12 and 13 June 2018, with the kind support of Polytechnique Montréal's Department of Computer Engineering and Software Engineering, the Institute for Data Valorization (IVADO), SAP, and Red Hat. The event attracted around 160 participants from all over the world, including students, academics, and industrial practitioners.

SEMLA's main objective was to create a space in which SE and ML experts could come together to discuss challenges, new insights, and practical ideas regarding the engineering of ML- and AI-based systems. The program included talks and panels presented by renowned academic researchers and industrial practitioners, including keynote speakers David Parnas, Lionel Briand, and Yoshua Bengio. The full program is at http://semla.polymtl.ca. Here, we summarize some key challenges these experts identified.

System Accuracy

The first topic concerned the accuracy of systems built using ML and AI models, and the responsibilities of engineers building them. For example, one keynote speaker mentioned three categories of AI research:

• building programs that imitate human behavior to better understand human thinking (used in psychology research),
• building programs that play games well (challenging and fun), and
• demonstrating that practical computerized products can use the same methods that humans use (risky and often naive).

He stressed that researchers should be very concerned about AI systems in the third category because they can't guarantee 100 percent accuracy or correct answers in all cases. He also raised concerns that people are using the Turing test to falsely claim intelligence in systems. He commented, “Turing did not claim that his test was a test for artificial intelligence!”

In response, a leading AI expert stated that AI's goal is not to achieve 100 percent accuracy because

• humans are also far from 100 percent accuracy in their daily tasks, and
• AI technology's strength comes from the ability to abstract up from different factors of variation between environments, to obtain models that can generalize and transfer to situations that weren't encountered before.

He further explained that AI technologies' main challenge is the curse of dimensionality, that is, the need for sufficient, labeled data to cover all important factors (features) of a given problem. AI, in fact, needs more training data than humans do! Whereas the key properties of techniques such as deep learning (for example, compositionality, encoding into a simpler domain, and conditional computation) aim to reduce dimensionality's impact, applications of AI still risk being limited to domains in which labeled data is cheap. Although labeled data is somehow abundant in some SE domains (such as defect prediction), other domains (such as requirement elicitation) are more challenging. Overall, AI's full impact on SE is still unclear.

Because of AI and ML systems' intrinsic imperfection, one panelist argued, only harmless AI technology or applications should be released to the public, since the responsibility of every engineer is to protect the public. He also mentioned that the public should be informed accurately of the AI technology it's being exposed to. For example, instead of touting a “100 percent self-driving car,” automotive companies should advertise their products as “AI-assisted cars,” with a clear list of the ways in which AI is assisting.

Another panelist emphasized that AI isn't a panacea. He illustrated how simple techniques could give the illusion of AI, or how the blind application of AI wouldn't improve the workflow of workers. For example, in principle, an intelligent robot could easily replace a human worker to hand another worker the right tool for a given job, but not if the worker afterward throws the tool back on a pile. (The robot will have a hard time retrieving the right tool from an unordered pile.) However, using an intelligent robot to return tools in an ordered fashion (which is a different problem) could allow other robots later on to be deployed to hand over tools to workers. If a traditional computer science algorithm can solve a problem, we should just use that.

System Testing

The second hot topic our experts discussed was the difficulty of testing ML and AI systems. Our panelists debated whether we should tackle the testing of those systems the same way we do the testing of traditional systems, since an AI system's behavior might be incorrect even if the learning algorithms are implemented correctly.

One keynote speaker explained how in complex cyber-physical systems (CPSs), when no clear specifications of the intended systems exist (that is, humans have a lot of knowledge but can't formalize it), only AI can approximate the system's intended behavior by learning models from the available data. This is a clear improvement over the manual design of models and controllers. However, it pushes most of the risk toward the trained models' quality. So, how can we perform adequate quality assurance (QA) of AI models, given that the number of environments in which the models will be deployed is unlimited and that the human operator will require a detailed explanation of any failures?

Fortunately, we can use AI technology to reduce the search space of the environments to be tested, nudging QA techniques to those environments most likely to have failures or violate important safety constraints. Such an approach could even work in the system-of-systems context of CPSs, where each sensor and actuator must be validated not only in isolation but also in close integration with each other.

However, this QA doesn't guard against hardware failure. So, hardware systems should incorporate fault-tolerance mechanisms to cope with such failures. One audience participant also observed that hardware could incorporate fault-tolerance mechanisms to mitigate the effect of AI model errors, improving AI systems' robustness.

Another major challenge is that humans, once they've started trusting AI in their daily tasks, could

FOUTSE KHOMH is an associate professor at Polytechnique Montréal, where he leads the SWAT (Software Analytics and Technology) Lab. Contact him at [email protected].

MARIOS FOKAEFS is an assistant professor in Polytechnique Montréal's Department of Computer Engineering and Software Engineering. Contact him at [email protected].
BRAM ADAMS is an associate professor at Polytechnique Montréal, where he leads the
GIULIANO ANTONIOL is a professor of software engineering in Polytechnique Montréal's