
MLOps: Continuous Delivery for Machine Learning on AWS

December 21, 2020

Notices Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents current AWS product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved.

Contents

Introduction
Continuous delivery for machine learning
The different process steps of CD4ML
The technical components of CD4ML
AWS Solutions
Alteryx
Data governance and curation
Machine learning experimentation
Productionized ML pipelines
Model serving and deployment
Model testing and quality
Continuous improvement
Alteryx: Your journey ahead
Dataiku DSS
Access, understand, and clean data
Build machine learning
Deploy machine learning
Dataiku DSS: Your journey ahead
Domino Data Lab
Introduction
Enterprise data science workflows
Domino: Your journey ahead
KNIME
KNIME Software: creating and productionizing data science
KNIME: Your journey ahead
AWS reference architecture
Model building
Productionize the model
Testing and quality
Deployment
Monitoring and observability and closing the feedback loop
High-level AI services
AWS: Your journey ahead
Conclusion
Contributors
Resources
Document Revisions

Abstract

Artificial intelligence (AI) is expanding into standard business processes, resulting in increased revenue and reduced costs. As AI adoption grows, it becomes increasingly important for AI and machine learning (ML) practices to focus on production quality controls. Productionizing ML models introduces challenges that span organizations and processes, involving the integration of new and incumbent technologies.

This whitepaper outlines the challenge of productionizing ML, explains some best practices, and presents solutions. ThoughtWorks, a global software consultancy, introduces the idea of MLOps as continuous delivery for machine learning. The rest of the whitepaper details solutions from AWS, Alteryx, Dataiku, Domino Data Lab, and KNIME.

Are you Well-Architected?

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems on AWS. Using the Framework allows you to learn architectural best practices for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud.

In the Machine Learning Lens, we focus on how to design, deploy, and architect your machine learning workloads in the AWS Cloud. This lens adds to the best practices described in the Well-Architected Framework.

Introduction
by Christoph Windheuser, Global Head of Artificial Intelligence, ThoughtWorks
Danilo Sato, Head of Data & AI Services UK, ThoughtWorks

In modern software development, continuous delivery (CD) principles and practices have significantly improved the throughput of delivering software to production in a safe, continuous, and reliable way, and have helped to avoid big, disruptive, and error-prone deployments.

After machine learning (ML) techniques showed that they can provide significant value, organizations started to get serious about using these new technologies and tried to get them deployed to production. However, people soon realized that training and running a machine learning model on a laptop is completely different from running it in a production IT environment. A common problem is having models that only work in a lab environment and never leave the proof-of-concept phase. Nucleus Research published a 2019 report in which they analyzed 316 AI projects in companies ranging from 20-person startups to Fortune 100 global enterprises. They found that only 38% of AI projects made it to production. Further, projects that did make it to production often did so in a manual, ad hoc way, then becoming stale and hard to update.

Creating a process to operationalize machine learning systems enables organizations to leverage the new and endless opportunities of machine learning to optimize processes and products. However, it also brings new challenges. Using ML models in software development makes it difficult to achieve versioning, quality control, reliability, reproducibility, explainability, and auditability in that process. This happens because there is a higher number of changing artifacts to be managed in addition to the software code, such as the datasets, the machine learning models, and the parameters and hyperparameters used by such models. And the size of such artifacts can be orders of magnitude larger than that of the software code, making them harder to version and move between environments.

There are also organizational challenges. Different teams might own different parts of the process and have their own ways of working. Data engineers might be building pipelines to make data accessible, while data scientists can be researching and exploring better models. Machine learning engineers or developers then have to worry about how to integrate that model and release it to production. When these groups work in separate siloes, there is a high risk of creating friction in the process and delivering suboptimal results.


Figure 1 – The different personas usually involved in machine learning projects.

ThoughtWorks, an Advanced APN Consulting Partner, has more than 25 years of experience in professional software development. It is regarded as a strong driver in developing new processes like agile practices, continuous integration/continuous delivery (CI/CD), and DevOps.

MLOps extends DevOps into the machine learning space. It refers to the culture where people, regardless of their title or background, work together to imagine, develop, deploy, operate, and improve a machine learning system. In order to tackle the described challenges in bringing ML to production, ThoughtWorks has developed continuous delivery for machine learning (CD4ML), an approach to realize MLOps. In one of their first ML projects, ThoughtWorks built a price recommendation engine with CD4ML on AWS for AutoScout24, the largest online car marketplace in Europe. Today, CD4ML is standard at ThoughtWorks for ML projects.

Continuous delivery for machine learning

Continuous delivery has been the approach to bring automation, quality, and discipline to create a reliable and repeatable process of releasing software into production. Jez Humble, one of the authors of the seminal book Continuous Delivery1, states that:


"Continuous Delivery is the ability to get changes of all types—including new features, configuration changes, bug fixes, and experiments—into production, or into the hands of users, safely and quickly in a sustainable way".2

Continuous delivery applies to changes of all types, not just software code. With that in mind, we can extend its definition to incorporate the new elements and challenges that exist in real-world machine learning systems, an approach we are calling Continuous Delivery for Machine Learning (CD4ML).

“Continuous Delivery for Machine Learning (CD4ML) is a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time, in short adaptation cycles.”3

This definition includes the core principles to strive for. It highlights the importance of cross-functional teams with skill sets across different areas of specialization, such as data engineering, data science, or operations. It incorporates other sources of change beyond code and configuration, such as datasets, models, and parameters. It calls for an incremental and reliable process to make small changes frequently, in a safe way, which reduces the risk of big releases. Finally, it requires a feedback loop: the real-world data is continuously changing and the models in production are continuously monitored, leading to adaptations and improvements through retraining of the models and re-iteration of the whole process.

The different process steps of CD4ML

Figure 2 shows the end-to-end process of CD4ML. Let's have a closer look at the different steps.


Figure 2 – Continuous delivery for machine learning end-to-end process

Model building

Once the need for a machine learning system is found, data scientists will research and experiment to develop the best model, by trying different combinations of algorithms and tuning their parameters and hyperparameters. This produces models that can be evaluated to assess the quality of their predictions. The formal implementation of this model training process becomes the machine learning pipeline.

Having an automated and reproducible machine learning pipeline allows other data scientists to collaborate on the same code base, but also allows it to be executed in different environments, against different datasets. This provides great flexibility to scale out and track the execution of multiple experiments and ensures their reproducibility.

Model evaluation and experimentation

As the data science process is very research-oriented, it is common that there will be multiple experiments running in parallel, and many of them will never make their way to production. This will require an approach that keeps track of all the different experiments, different code versions, and potentially different datasets and parameters/hyperparameters attempted. Your CD4ML architecture will need to support tracking, visualizing, and comparing results from different runs, as well as the graduation and promotion of models that prove to be useful.


Productionize the model

Once a suitable model is found, you will need to decide how it will be served and used in production. It might be embedded and packaged within the application that consumes it, it might be deployed as a separate service, or it might be treated as data that is loaded at runtime.

Regardless of which pattern you choose, there will always be an implicit contract (API) between the model and how it is consumed. If the contract changes, it will cause an integration bug.
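Because this contract is easy to break silently, teams often encode it as an automated test that runs in the delivery pipeline. The following is a minimal sketch in Python, assuming a hypothetical price-recommendation endpoint and response fields; the actual URL, payload, and schema would come from your own serving setup.

```python
import requests

# Hypothetical endpoint and fields, used only to illustrate a contract test.
ENDPOINT = "https://models.example.com/price-recommender/invocations"

def test_prediction_contract():
    """Fail the pipeline if the model's request/response contract drifts."""
    payload = {"make": "audi", "model": "a4", "mileage_km": 52000, "year": 2017}
    response = requests.post(ENDPOINT, json=payload, timeout=5)

    assert response.status_code == 200
    body = response.json()
    # The consuming application depends on these fields and types.
    assert isinstance(body["predicted_price"], (int, float))
    assert body["currency"] == "EUR"
    assert 0.0 <= body["confidence"] <= 1.0
```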

Testing and quality

Machine learning workflows require different types of testing. Some are inherently non-deterministic and hard to automate, while others can be automated to add value and improve the overall quality of the system. Examples include data validation and data quality, component integration and contract testing, evaluating model quality metrics, and validating the model against bias and fairness.

While you cannot write a deterministic test to assert the model score from a given training run, the CD4ML process can automate the collection of such metrics and track their trend over time. This allows you to introduce quality gates that fail when a metric crosses a configurable threshold, and to ensure that models don’t degrade against known performance baselines.
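As an illustration, a quality gate of this kind can be a short script in the deployment pipeline. The sketch below is in Python; the metric files, the AUC metric, and the threshold are assumptions standing in for whatever your experiment tracking actually produces.

```python
import json
import sys

# Assumed artifacts: metric files written by the training pipeline.
BASELINE_FILE = "baseline_metrics.json"
CANDIDATE_FILE = "candidate_metrics.json"
MAX_DEGRADATION = 0.02  # configurable threshold for the quality gate

def quality_gate():
    with open(BASELINE_FILE) as f:
        baseline = json.load(f)
    with open(CANDIDATE_FILE) as f:
        candidate = json.load(f)

    degradation = baseline["auc"] - candidate["auc"]
    print(f"Baseline AUC={baseline['auc']:.4f}, candidate AUC={candidate['auc']:.4f}")

    if degradation > MAX_DEGRADATION:
        print("Quality gate FAILED: candidate degrades beyond the allowed threshold.")
        sys.exit(1)
    print("Quality gate passed.")

if __name__ == "__main__":
    quality_gate()
```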

Deployment

Once a good candidate model is found, it must be deployed to production. There are different approaches to do that with minimal disruption. You can have multiple models performing the same task for different partitions of the problem. You can have a shadow model deployed side by side with the current one to monitor its performance before promoting it. You can have competing models being actively used by different segments of the user base. Or you can have online learning models that are continuously improving with the arrival of new data.

Elastic cloud infrastructure is a key enabler for implementing these different deployment scenarios while minimizing any potential downtime, allowing you to scale the infrastructure up and down on demand as new models are rolled out.


Monitoring and observability and closing the feedback loop

Once the model is live, you will need the monitoring and observability infrastructure to understand how it is performing in production against real data. By capturing this data, you can close the data feedback loop. A human in the loop can analyze the new data captured from production, curate, and label it to create new training datasets for improving future models. This enables models to adapt and creates a process of continuous improvement.
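One common building block of that feedback loop is a scheduled drift check that compares production inputs against the training distribution. The sketch below uses a two-sample Kolmogorov-Smirnov test from SciPy; the file names, feature names, and sensitivity threshold are placeholders.

```python
import pandas as pd
from scipy.stats import ks_2samp

# Placeholder inputs: the training set and captured production requests.
training = pd.read_csv("training_features.csv")
production = pd.read_csv("captured_production_features.csv")

DRIFT_P_VALUE = 0.01  # configurable sensitivity of the drift alert

for column in ["mileage_km", "year", "engine_power"]:
    statistic, p_value = ks_2samp(training[column], production[column])
    drifted = p_value < DRIFT_P_VALUE
    print(f"{column}: KS={statistic:.3f}, p={p_value:.4f}, drift={'YES' if drifted else 'no'}")
```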

The technical components of CD4ML

To be able to perform these process steps in an automated and continuous way, CD4ML requires the right infrastructure and tooling. The following image shows the components of the CD4ML infrastructure.

Figure 3 – Technical components of CD4ML

Let's look at the different components of the CD4ML infrastructure in more detail.

Discoverable and accessible data

Good machine learning models require good data. And data must be easily discoverable and accessible. Data can be generated and collected from inside or outside your organization. There are several approaches to building a modern data platform, but it must be architected to consider the needs of different stakeholder groups: from application teams building systems that generate or collect data, to teams responsible for data governance, to business stakeholders that use such data for decision-making, and to data scientists who use it to build AI/ML systems.


Version control and artifact repositories

As in modern software engineering projects, version control and artifact repositories are essential for working efficiently as a team and for ensuring reproducibility of results. Machine learning projects bring the additional complexity of data as a new artifact.

Continuous delivery orchestration

To tie everything together, a good continuous delivery orchestration tool allows you to model the intricate dependencies and workflows that implement the end-to-end CD4ML process. It provides the means to automate the execution of deployment pipelines and to model the data governance process required to promote models from research to production.

Infrastructure for multiple environments

While not a hard requirement, using cloud infrastructure will bring many benefits to CD4ML:

• Elasticity and scalability to use more or fewer resources as needed

• The ability to use specialized hardware (such as GPUs) for training machine learning models more efficiently

• Access to specialized services to manage your data and machine learning platforms

• The ability to use a “pay-as-you-go” model based on actual usage of the underlying infrastructure

Model performance assessment tracking

Because data science and machine learning development can be highly experimental and repetitive, it is important to be able to track the different models, their parameters, and their results in a way that is transparent to the whole development team.

Model monitoring and observability

Machine learning models in production have to be continuously monitored to answer questions like: How does the model perform? What is the actual data fed into the model? Does the data have different characteristics than the training set? Is the performance deteriorating? Is there bias in the data or model performance?


AWS Solutions

CD4ML is a process for bringing automation, quality, and discipline to the practice of releasing ML software into production frequently and safely. It is not bound to a specific infrastructure or toolset.

CD4ML prescribes an approach to MLOps, which is a practice that spans the management of a production ML lifecycle. We present this approach because it has demonstrated success, and is one that incorporates modern, cloud-friendly software development practices.

This whitepaper showcases MLOps solutions from AWS and the following AWS Partner Network (APN) companies that can deliver on the previously mentioned requirements:

• Alteryx

• Dataiku

• Domino Data Lab

• KNIME

These solutions offer a broad spectrum of experiences, catering both to builders and to those who prefer no-to-low-code tools. AWS and the APN provide you with choices and the ability to tailor solutions that best fit your organization.

Alteryx
by Alex Sadovsky, Senior Director of Product Management, Alteryx
David Cooperberg, Senior Product Manager, Alteryx

As a global leader in analytics process automation (APA) software, Alteryx unifies analytics, data science, and business process automation in a single, end-to-end platform that accelerates digital transformation. Alteryx powers the analytics, insights, and models at thousands of the world’s largest companies. Its platform helps business analysts and data scientists across industries solve problems with data in a code-free environment.

The Alteryx platform consists of four components:


• Alteryx Connect – a collaborative data cataloging tool

• Alteryx Designer – desktop software that enables the assembly of code-free analytic workflows and apps

• Alteryx Server – an analytical hub that allows users to scale their analytic capabilities in the cloud or on premises on enterprise hardware

• Alteryx Promote – a deployable, containerized solution that enables the easy deployment of machine learning models as highly available REST APIs

The remainder of this section discusses how Alteryx can be used in conjunction with AWS to achieve a comprehensive CD4ML solution.

Data governance and curation

Figure 4 highlights how Alteryx Connect can be used to catalog data from disparate sources, including the datasets Alteryx offers as add-ons.

Figure 4 – How to catalog data sources with Alteryx Connect


Connect makes it easy for users to discover and understand relevant data assets. Once a data source is represented in Connect, users collaborate using social validation tools like voting, commenting, and sharing to highlight the usefulness and freshness of the data. Connect installs in a Windows Server environment running in Amazon EC2. Once Connect is installed, one or more of the 25+ existing database metadata loaders are used to add data sources. This includes loaders for Amazon Redshift and Amazon S3, and loaders for Postgres and MySQL that can load metadata from Amazon Aurora. If a data source lacks a metadata loader, Alteryx offers SDKs in multiple languages, as well as REST APIs, that make it easy for developers to write new loaders. Connect offers a cross-platform experience, allowing desktop Designer users and Server users to explore and utilize data assets based upon shared metadata.

Figure 5 – Data asset lineage in Alteryx Connect

Alteryx also offers the ability to augment user data with datasets from industry data providers. Alteryx Datasets can provide valuable location and business insights when combined with proprietary data; in the modeling realm, they most often contribute demographic and geographic features to models.

Machine learning experimentation

Figure 6 shows how Alteryx Designer can be used to import data for use in any of several predictive modeling and machine learning experimentation tool suites. Each tool suite caters to users with a different level of experience with machine learning and aims to upskill users through usage.


Figure 6 – Alteryx Designer offers several options for modeling and experimentation based on a user's level of experience

Once a data architecture is implemented and the appropriate data assets are identified, analytics can begin. Designer, a code-free and code-friendly development environment, enables analysts of all skill levels to create analytic workflows, including those that require machine learning. Designer can be installed on a local Windows machine.

The first step in any analytic process is to import data. Alteryx is agnostic to where and how data is stored, providing connectors to over 80 different data sources, including an AWS Starter Kit with connectors for Amazon Athena, Amazon Aurora, Amazon S3, and Amazon Redshift. Because Alteryx provides common ground for processing data from multiple sources, it is often a best practice for highly performant workloads to co-locate the data using preprocessing workflows. For example, to reduce future processing latency, you could move on-premises data to an AWS source. This can all be done with drag-and-drop, code-free data connector building blocks, avoiding the need to know any CLI/SQL intricacies of the underlying infrastructure, although the latter is possible as well.

Designer includes over 260 automation building blocks that enable the code-free processing of data. This includes building blocks for data prep, cleansing, blending, mapping, visualization, and modeling. Data cleansing, blending, and prep building blocks are often used before machine learning experimentation to prepare training, test, and validation datasets.

Figure 7 – Build complex analytic workflows in Alteryx Designer

Much of the data preprocessing that occurs before modeling can also be accomplished using Alteryx’s In-Database functionality. This functionality pushes data processing tasks down to the database and delays data import until that processing has completed and an in-memory action on the local machine is required.

Alteryx Designer provides users with several choices for machine learning:

• The Alteryx Predictive Suite offers code-free functionality for many descriptive, predictive, and prescriptive analytics tasks. Users can also customize the underlying R code that powers these building blocks to address their specific use cases.

• The Alteryx Intelligence Suite offers code-free functionality for building machine learning pipelines and additional functionality for text analytics. The Intelligence Suite also offers Assisted Modeling, an automated modeling product designed to help business analysts learn machine learning while building validated models that solve their specific business problems. Assisted Modeling is built on open-source libraries and provides users with the option to export their drag-and-drop or wizard-created models as Python scripts.


• Code-friendly building blocks that support R and Python allow users to write machine learning code that is embedded in an otherwise code-free workflow. These building blocks let users work with their preferred frameworks and libraries, and the built-in Jupyter notebook integration enables interactive data experimentation.

Figure 8 – Compare trained models in the Assisted Modeling leaderboard

Productionized ML pipelines

Figure 9 illustrates how Alteryx Server can be leveraged to operationalize workflows, including those that are used for data governance. Server offers a componentized installation experience that works natively in AWS.


Figure 9 – Alteryx Server can be installed easily in AWS to productionize machine learning and data governance workflows

As experimentation begins to yield promising results, it’s often time to scale modeling to support larger training data, hyperparameter tuning, and productionization.

Alteryx Server is a scalable environment that is used to manage and deploy analytic assets. It’s easy to append CPU-optimized machines to a Server cluster that can be specified for use by machine learning training pipelines. Executing long-running training jobs in Server offers users the flexibility to continue designing analytic workflows in Designer while the training job executes.

Server enables the scheduling and sequencing of analytic workflows. Each of these features can be used as part of CI/CD pipelines that ensure the quality of models that have been deployed to production. Using REST APIs, workflows can be programmatically triggered and monitored for status to integrate into established DevOps and CI/CD setups.
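For example, a CI/CD stage might trigger a published workflow and gate on its result. The sketch below is illustrative only: the endpoint paths, authentication header, and response fields are placeholders, not the actual Alteryx Server API, whose routes and signing scheme are described in the product documentation.

```python
import time
import requests

# Placeholder values; consult the Alteryx Server API documentation for the
# real routes and authentication scheme.
SERVER = "https://alteryx-server.example.com/api"
HEADERS = {"Authorization": "Bearer <token>"}
WORKFLOW_ID = "abc123"

# Trigger a workflow run from a CI/CD stage...
job = requests.post(f"{SERVER}/workflows/{WORKFLOW_ID}/jobs", headers=HEADERS, json={}).json()

# ...then poll its status and fail the pipeline if the run does not complete.
while True:
    status = requests.get(f"{SERVER}/jobs/{job['id']}", headers=HEADERS).json()
    if status["state"] in ("Completed", "Error"):
        break
    time.sleep(15)

if status["state"] != "Completed":
    raise RuntimeError(f"Workflow run failed: {status}")
```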


Alteryx Server can be installed in an on-premises data center or in the AWS Cloud, and supports single and multi-node configurations. It’s offered as an Amazon Machine Image (AMI) in the AWS Marketplace for easy one-click deployments. Customized instances can also be deployed in a private subnet using Amazon Virtual Private Cloud. Server offers many options for customization, one of which is the option to store Server metadata in a user-managed MongoDB instance, for which AWS offers a Quick Start. For detailed guidance, see Best Practices for Deploying Alteryx Server on AWS.

Alteryx Server offers built-in governance and version control of analytic assets, which can be used in place of or in addition to other source control solutions.

Model serving and deployment

Figure 10 shows how Alteryx Promote ties the platform together, offering a solution for model management, real-time model serving, and model monitoring.

Figure 10 – Alteryx Promote offers an MLOps solution providing model management and highly available, low-latency model serving


The Alteryx platform offers several options for model deployment. Promote is used primarily for real-time deployments, common for models that interact with web applications. Promote enables the rapid deployment of pre-trained machine learning models through easy-to-use Python and R client libraries or in a code-free manner using Alteryx Designer.

Models that have been deployed to a Promote cluster environment are packaged as Docker containers, replicated across nodes, and made accessible as highly available REST APIs that host in-memory inference methods. The number of replicas of each model is configurable, as is the number of nodes available in the Promote cluster. An internal load balancer spreads requests across the available replicas.

Figure 11 – Monitor the performance of your models in production with Promote

Like Server and Connect, Promote can be installed in an AWS Cloud environment or in an on-premises data center. The recommended setup also includes an external load balancer, such as Elastic Load Balancing, in order to distribute prediction requests across every Promote node. Promote is ideal for inference cases in which throughput is known in advance or can be adjusted on demand. While automatic scaling is technically possible, it’s beyond the intended use of the product.

Alteryx Server is the recommended solution for models that require batch inference on known existing hardware. Batch models can be packaged for prediction in workflow or analytic apps and can be scheduled to run in Server on compute-optimized nodes. Server’s workflow management functionality can also be leveraged to ensure that predictions are made only after up-to-date features have been generated through data preprocessing.


Additionally, users often find they need a hybrid of Alteryx and AWS solutions to deploy complex models at scale. One usage pattern we have observed is using our Assisted Modeling tool on the desktop to prototype a model on sample data. Using Designer and Server, clients prep and blend data from local sources and push the resulting data to Amazon S3. Then, the model code from Assisted Modeling can be pushed to SageMaker, where the model can be trained on the entire dataset resident in Amazon S3 and deployed as an API in the SageMaker ecosystem to take advantage of containerization, scaling, and serverless capabilities. As Alteryx focuses on approachable model building, this is often the best path for organizations that are light on data science but have strong DevOps or engineering resources.
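A rough sketch of the SageMaker half of that pattern, using the SageMaker Python SDK in script mode, is shown below. The training script name, S3 locations, IAM role, and instance types are placeholders; the exported Assisted Modeling code would need to follow SageMaker's script-mode conventions.

```python
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

# Placeholders: the Python script exported from Assisted Modeling, the S3
# prefix written by the Designer/Server data prep workflow, and your IAM role.
session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
train_data = "s3://my-bucket/prepared/training/"

estimator = SKLearn(
    entry_point="assisted_model.py",   # exported model-building code
    framework_version="0.23-1",
    instance_type="ml.m5.xlarge",
    role=role,
    sagemaker_session=session,
)

# Train on the full dataset in S3, then deploy the result as a managed endpoint.
estimator.fit({"train": train_data})
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```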

Model testing and quality

Alteryx enables model testing throughout the modeling and deployment process. During the experimentation phase, Predictive building blocks and Assisted Modeling report performance metrics and visualizations, making it possible to compare the generalizability of each model. Assisted Modeling also offers Explainable AI (XAI) reporting in the form of feature importance scores, calculated using the permutation importance approach.

During model deployment, it’s easy to add test data to a Promote deployment script. The testing step can be used to conditionally allow or disallow the deployment of that model version. New Promote model versions are initially hosted in logical development and staging environments, allowing users to run a new model in parallel with the previously running production model. Testers can set up their systems to make predictions on both the production and staging model versions before deciding to replace the production model, which can be done through an API. Promote also records all request and response data, making it possible for users to develop custom workflows that leverage that data to test for bias, fairness, and concept drift.

Continuous improvement

In addition to recording all incoming requests and their responses, Promote tracks aggregated metrics in Amazon Elasticsearch Service so administrators can observe the performance of the models they have deployed. Metrics for requests, errors, and latency over the previous month inform whether the model needs to be replicated further. Additional system utilization reporting helps administrators determine if additional nodes must be added to the Promote cluster.


Finally, users can export the historical request data to be analyzed for concept or data drift. These analyses can be performed in Alteryx Designer, scheduled to run in Server, and can kick off the CD pipeline if drift is detected.

Alteryx: Your journey ahead

As an end-to-end analytic process automation platform, Alteryx provides the data connectors, building blocks, and functionality to create and deploy modeling solutions, often with little to no coding required. Our open ecosystem in terms of APIs, third-party data connectors, and open-source solutions provides developers the ability to mix and match the Alteryx solution with AWS native components. This gives customers the freedom to deploy machine learning as best fits their business requirements.

Get started with our Intelligence Suite Starter Kit or an interactive demo of Alteryx Designer. Ready to scale? Learn about Best Practices for Deploying Alteryx Server on AWS and deploy Alteryx Server from the AWS Marketplace.

Dataiku DSS
by Greg Willis, Director of Solutions Architecture, Dataiku

Dataiku is one of the world’s leading AI and machine learning platforms, supporting agility in organizations’ data efforts via collaborative, elastic, and responsible AI, all at enterprise scale. At its core, Dataiku believes that in order to stay relevant in today’s changing world, companies need to harness Enterprise AI as a widespread organizational asset instead of siloing it into a specific team or role.

To make this vision of Enterprise AI a reality, Dataiku provides a unified user interface (UI) to orchestrate the entire machine learning lifecycle, from data connectivity, preparation, and exploration to machine learning model building, deployment, monitoring, and everything in between.

Dataiku was built from the ground up to support different user profiles collaborating in every step of the process, from data scientist to cloud architect to analyst. Point-and-click features allow those on the business side and other non-coders to explore data and apply automated machine learning (AutoML) using a visual interface. At the same time, robust coding features (including interactive Python, R, Spark, and SQL notebooks) ensure that data scientists and other coders are first-class citizens as well.


Figure 12 – Machine learning flow within Dataiku DSS, enabling users to visually orchestrate the ML lifecycle in collaboration with multiple AWS technologies

Access, understand, and clean data

Data discovery

After defining your business goal, the first step in the machine learning lifecycle is to discover and access data. Dataiku DSS makes it easy to find and use data through its built-in catalog feature and AWS Glue integration. Metadata for Amazon S3 data sources, which are created as part of a Dataiku DSS project, is automatically registered in AWS Glue and indexed within the internal Dataiku DSS catalog. This makes finding the right data as simple as searching for a few keywords. Metadata tags let you see which projects and use cases the data was previously associated with. Dataiku DSS supports a wide range of data sources, types, and formats. For a full list, see Connecting to data.


Figure 13 – Access, understand, and clean data, using Dataiku DSS in collaboration with AWS services.

Understanding and preparing data

Most of the effort in a machine learning project is spent preparing the data. So, building a coherent, reliable, and explainable data pipeline is perhaps the most important part of the machine learning lifecycle. Within Dataiku DSS, visual data preparation features allow you to easily create data cleansing, normalization, and feature engineering scripts in an interactive way. You can create these scripts directly in the project flow using recipes (either visual or code-based data transformation steps). Or you can experiment using a visual analysis that can be deployed back to the project flow as a visual recipe, once it is complete. Summary statistics and charts provide additional tools for understanding and explaining the data.


Figure 14 – Univariate analysis on Amazon S3 data variables

You can combine visual recipes and code recipes within the same flow to support real-time collaboration between different users working on the same project. For example, you might have a data engineer using an SQL recipe, a data scientist using an R recipe, and a business analyst using a visual recipe, all simultaneously contributing to different parts of the data pipeline. Integration with external code repositories and IDEs, such as PyCharm, Sublime, and RStudio, as well as built-in Jupyter notebooks, enables coders and developers to use the processes and tools they’re familiar with.

Figure 15 – Python code recipe in conjunction with a visual split recipe using Amazon S3 data
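For coders, a Python recipe like the one in Figure 15 typically follows a short read-transform-write skeleton. The sketch below is based on Dataiku's documented recipe API; the dataset names and the engineered feature are hypothetical.

```python
# Skeleton of a Dataiku DSS Python recipe; dataset names are hypothetical.
import dataiku
import numpy as np

# Read an upstream (for example, S3-backed) dataset as a pandas DataFrame.
orders = dataiku.Dataset("orders_prepared").get_dataframe()

# A feature engineering step contributed in code alongside visual recipes.
orders["order_value_log"] = np.log1p(orders["order_value"])

# Write the result back so downstream visual recipes and models can use it.
output = dataiku.Dataset("orders_features")
output.write_with_schema(orders)
```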

Data preparation for unstructured data, such as images, can be particularly challenging with regards to supervised learning, and often requires manually labeling the data before it can be used. Dataiku DSS includes an ML-assisted labeling tool that can help optimize labeling of images, sound, and tabular data through active learning.


Build machine learning

Model training

Once the data is cleansed, transformed, and enriched, it can then be used as the input to train a model. As part of the machine learning lifecycle, Dataiku DSS provides many built-in features for iteratively creating and training machine learning and deep learning models, including Visual ML, AutoML, code-based custom models, and SageMaker integration.

Figure 16 – Build machine learning, using Dataiku DSS in collaboration with AWS services

Business analysts and other non-technical users can use Visual and AutoML features to enhance project collaboration. Data scientists can also use these features to rapidly prototype simple models. Machine learning and deep learning models can be trained using various open source libraries, including Python (scikit-learn and XGBoost), Spark (MLLib), and Keras and TensorFlow. As a clear-box solution based on open-source algorithms, Dataiku DSS allows users to easily tune parameters and hyperparameters in order to optimize models trained using the Visual and AutoML features.

Experiment tracking

Because model training is an iterative process, Dataiku DSS keeps track of all previous experiments, including model versions, feature handling options, the algorithms selected, and the parameters and hyperparameters associated with them. Through the use of either the built-in Git repository or integration with an external repository, Dataiku DSS also manages code and project versions. This means that you can revert to a previous version of an experiment, model, or whole project, at the click of a button.

Figure 17 – Experiment tracking using Dataiku DSS

Validation

Before deploying a new model into production, or updating an existing one, it’s important to validate that both the data and model are consistent, performant, and working as expected. Dataiku DSS provides built-in metrics and checks to validate data, such as column statistics, row counts, file sizes, and the ability to create custom metrics. An Evaluate Recipe produces metrics (such as Precision, Recall, and F1) that can be used to test the performance of a trained model on unseen data.

Scalable infrastructure

By providing access to self-service, scalable, and performant infrastructure, cloud services afford many advantages when used as part of the machine learning lifecycle. Dataiku DSS can create and manage dynamic clusters for both Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon EMR. For example, the model training process can be transparently offloaded to automatically scaling Amazon EKS clusters (CPU-based or GPU-based) or SageMaker. Amazon EKS can be automatically used in conjunction with Python, R, and Spark jobs for model training and data preprocessing tasks. Dataiku DSS will create the required container images based on your specific project requirements and push those to your container repository, to be used by the EKS cluster. For more information, see Reference architecture: managed compute on EKS with Glue and Athena.


Deploy machine learning

Model deployment

Regardless of how we’ve trained our model (AutoML, Visual ML, custom code, etc.), the final step in the machine learning lifecycle is to deploy it to production. Within Dataiku DSS, this could involve deploying a single model, multiple complementary or competing models, partitioned models, or the model in conjunction with all of the data preprocessing steps.

Dataiku DSS provides infrastructure options to support these different deployment scenarios:

• Automation Node, for scheduling batch inference, including data preprocessing

• API Deployer and API Nodes, for deploying individual models as REST API services

Whichever deployment method is used, Dataiku DSS keeps track of the artifact versions to allow rollback.

The API Deployer manages the underlying infrastructure used for the API Nodes, whether they are physical servers, virtual machines (for example, EC2 instances), or containers managed by Kubernetes (for example, Amazon EKS). Dataiku DSS also supports exporting models to portable formats, such as PMML, ONNX, and Jar, so they can be deployed using alternative infrastructure options.


Figure 18 – Deploy machine learning using Dataiku DSS in collaboration with AWS services

Orchestration

Using a combination of the Automation Node, Scenarios, and the Public API, alongside native Git integration (GitHub, Bitbucket, etc.) and common CI/CD and configuration management techniques and tools, Dataiku DSS makes it simple to orchestrate the deployment of machine learning applications and the associated infrastructure and data preprocessing pipelines. For example, Dataiku DSS can dynamically create and manage an Amazon EKS cluster onto which it deploys your real-time machine learning REST API services, automatically retrain and redeploy a model based on key performance metrics, or rebuild an entire machine learning application.
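As a concrete illustration, an external CI/CD stage can trigger such a scenario through Dataiku's public API client. The sketch below is based on our reading of the dataikuapi package; treat the host, API key, project key, scenario ID, and method names as assumptions to verify against your DSS version.

```python
import dataikuapi

# Placeholder connection details and identifiers.
client = dataikuapi.DSSClient("https://dss.example.com:11200", "API_KEY")
project = client.get_project("CHURN_MODEL")

# Trigger the retrain-and-redeploy scenario defined in the project.
scenario = project.get_scenario("RETRAIN_AND_REDEPLOY")
scenario.run()  # DSS executes the scenario and records the run
```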

Model monitoring

Advanced model monitoring capabilities such as model drift detection, metrics and checks, and the Evaluate Recipe can be used to assess your model’s key performance indicators against real data. Using visual recipes in Dataiku DSS, you can easily create validation feedback loops to compute the true performance of a saved model against a new validation dataset with the option to automatically retrain and redeploy models.

The feedback mechanism in Dataiku DSS delivers the ability to determine whether an existing model merely needs to be retrained, or whether a better-performing challenger model should replace the existing production-deployed model instead. Alternatively, you can run A/B tests on your machine learning models to arrive at the best performing model.

Further, dedicated dashboards for monitoring global data pipelines and model performance in Dataiku allow for greater transparency and easier collaboration on MLOps.

Dataiku DSS: Your journey ahead

The Dataiku DSS image on the AWS Marketplace provides an easy way to start testing out these capabilities today. You can use a 14-day trial of the Enterprise Edition, including access to Amazon S3, Amazon Redshift, Amazon EMR, Spark on Kubernetes with Amazon EKS, and much more.

Figure 19 – Example architecture of Dataiku DSS components deployed in collaboration with AWS services to support MLOps.


Domino Data Lab
by Josh Poduska, Chief Data Scientist, Domino Data Lab

Introduction

Domino Data Lab was founded in 2013 to accelerate the work of code-first data scientists and help organizations run core components of their business on the models they develop. It pioneered the Data Science Platforms category, and today powers data science research at over 20% of Fortune 100 companies.

Domino brings order to the chaos of enterprise data science through:

● Instant access to compute – IT-approved, self-service, elastic

● Data, code, environment, and model management – automatically versioned, searchable, shareable, and always accessible

● Openness – open source and proprietary IDEs and analytical software, all containerized under one platform and deployed on premises or in the cloud

● Full reproducibility of research – central knowledge management framework, with all units of work tied together

● Collaboration – for teams and for enterprises, with access to comprehensive search capabilities to share all key assets and collaborate on projects

● Easy deployment – models, web apps, other data products

● Platform and project management – best-in-class governance and security for IT and data science leaders

Domino does not provide any proprietary machine learning algorithms or IDEs, and does not push any single type of distributed compute framework. Instead, Domino is focused on providing an open platform for data science, where large data science teams have access to the tools, languages, and packages that they want to use, plus scalable access to compute, in one central place. Domino’s code-driven and notebook-centric environment, combined with AWS’s expertise in large-scale enterprise model operations, makes for a natural technical pairing when optimizing the productionization of ML.


Enterprise data science workflows

1. Domino architecture

The Domino platform begins with containerization of ML workloads. Domino was the first data science platform to run all its workloads on Docker, and can now run natively on Kubernetes to future-proof its customers’ ability to use the next generation of tools and distributed compute frameworks.

Domino can be deployed in one of three ways:

• Customer hosted and managed, in which you bring your own Kubernetes cluster

• Customer hosted and Domino managed, in which you give Domino a dedicated VPC in your AWS account for a Kubernetes deployment

• Domino hosted and managed on a certified reference Kubernetes architecture

Domino is available on the AWS Marketplace.

When running Domino on Amazon Elastic Kubernetes Service (EKS), the architecture uses AWS resources to fulfill the Domino cluster requirements as follows:

Figure 20 – Domino on Amazon EKS


Core services for reproducibility, application services, and enterprise authentication run in persistent pods. User execution and production workloads are brokered to the compute grid in ephemeral pods. Execution resources scale automatically using the Kubernetes Cluster Autoscaler and Amazon EC2 Auto Scaling groups.

2. Environment management

Domino allows admins to manage Docker images and control their visibility across the organization, but image modification is not limited to platform administrators. All Domino users can self-serve environment management to create and modify Docker environments for their own use. Once created, these pre-configured environments are available to data scientists throughout the enterprise, based on configurable permission levels.

Figure 21 – Users can easily duplicate and modify an environment for their own use

3. Project management

Key, and often overlooked, aspects of ML productionization occur during the initial stages of a project. Leveraging prior work on similar projects is one such aspect. Because Domino automatically tracks all practitioner work, it offers a repository of past projects, code, experiments, and models from research scattered across the organization. Data scientists search this repository for prior art as they kick off collaborative project work.

Additionally, project parameters need to be defined for successful work in collaborative organizations. Teams and stakeholder roles must be set. Goals and milestones must be defined and then documented with reproducible artifacts once completed. Lifecycle stages should be chosen and tracked as a project progresses. Domino helps enforce these best practices and allows data science leaders to track project details in the Projects Portfolio. Everyone involved in the productionization of ML has visibility into the full set of research going on inside their organization.

Figure 22 – Projects Portfolio details all research and the status toward established goals

4. Data connections

With environment management, reproducibility, and project best practices set, collaborative data science research and productionization can proceed at a rapid pace. This begins with connecting to, iterating on, and managing data.

Data sources

Domino users can bring the data and files stored outside of Domino to their work inside of Domino. Domino is easily configured to connect to a wide variety of data sources, including Amazon S3, Amazon Redshift, and an Amazon EMR cluster. This involves loading the required client software and drivers for the external service into a Domino environment (many come pre-loaded), and loading any credentials or connection details into Domino environment variables.

For example, connecting to Amazon Redshift is usually done by creating a connection and retrieving the results of a query using Python or R. Because the Amazon Redshift database is likely secured with a user name and password, credentials can be stored as environment variables in the Domino project so they can be accessed at runtime without needing to include them explicitly in code. Popular libraries like psycopg2 and RPostgreSQL are used to manage the connection.
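A minimal sketch of that pattern is shown below; the environment variable names, schema, and query are illustrative.

```python
import os
import pandas as pd
import psycopg2

# Credentials come from Domino environment variables rather than source code;
# the variable names here are illustrative.
connection = psycopg2.connect(
    host=os.environ["REDSHIFT_HOST"],
    port=5439,
    dbname=os.environ["REDSHIFT_DB"],
    user=os.environ["REDSHIFT_USER"],
    password=os.environ["REDSHIFT_PASSWORD"],
)

# Pull a training extract straight into a DataFrame inside the workspace.
df = pd.read_sql("SELECT * FROM analytics.customer_features LIMIT 100000", connection)
connection.close()
```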

You can optionally configure AWS credential propagation, which allows Domino to automatically assume temporary credentials for AWS roles based on roles assigned to users in the upstream identity provider. Once a user has logged into the system, their short-lived access token is associated with any workloads executed during their session.

Domino-managed files and data

Saved data science work in Domino is automatically versioned, accessible to collaborators, and fully reproducible. These project files and data are stored in a centralized repository backed by a durable object storage system, such as Amazon S3 for AWS deployments. With interactive sessions involving notebooks or batch jobs, these files and data are automatically synchronized to the runtime containers when users want to do active work. By default, any changes to files are automatically versioned back to the centralized repository. Domino’s versioning system provides rich file difference information between revisions made inside and outside of experiments.

Sometimes a project requires or accumulates very large datasets, either in terms of hundreds of data files or very large-sized files. Typical examples include images for computer vision applications and event-based data that are accumulated regularly, such as financial time series data. For these use cases, Domino recommends storing the data in its purpose-built Domino Datasets, which is effectively filesystem storage designed for quick access to large file-based datasets. In AWS, this is backed by the scalable Amazon Elastic File System. At runtime, this Domino-managed data is mounted to execution containers exposing it as just another directory to data science practitioners.


5. Computing frameworks

When it’s time to process data, practitioners have the freedom to select the compute and environment that best meets their needs. Users easily choose from Domino-managed Docker images preconfigured to work on AWS GPU and CPU tiers. Included in this are the latest in NVIDIA configurations specific to data science workloads.

These environments can be configured to run a variety of computing frameworks, including Spark, Ray, Dask, SAS Viya, and the next big unknown breakthrough, effectively future-proofing the Domino platform as the single location to manage all data science work.

Domino’s Spark integration includes both Domino-managed ephemeral Spark clusters, called On-Demand Spark, and connections to persistent clusters that could be managed by Amazon EMR. On-Demand Spark supports fully containerized execution of Spark workloads on the Domino Kubernetes cluster. Users can interact with Spark interactively through a Domino workspace or in batch mode through a Domino job, as well as directly with spark-submit. Domino will ensure that the packages and dependencies are properly distributed across the cluster, removing a common source of DevOps headaches and wasted time. Code and the results associated with Spark jobs are fully tracked and reproducible in Domino, just as any other type of work is.
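The code itself is ordinary PySpark; the sketch below is a generic example (not Domino-specific) with placeholder S3 paths, and could run in a Domino workspace against an On-Demand Spark cluster or be handed to spark-submit.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Placeholder S3 locations; the cluster behind the session may be On-Demand
# Spark in Domino or a persistent Amazon EMR cluster.
spark = SparkSession.builder.appName("feature-aggregation").getOrCreate()

events = spark.read.parquet("s3://my-bucket/events/")

# Aggregate raw events into daily per-customer features.
daily_features = (
    events.groupBy("customer_id", F.to_date("event_time").alias("event_date"))
          .agg(F.count("*").alias("events"), F.sum("amount").alias("total_amount"))
)

daily_features.write.mode("overwrite").parquet("s3://my-bucket/features/daily/")
spark.stop()
```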

With compute and environments defined, data scientists have the freedom to do their coding in open or proprietary IDEs, such as Jupyter Lab, RStudio, SAS, MATLAB, Superset, VS Code, and more.

These IDEs are containerized on AWS to create a powerful workflow for organizations with heterogeneous coding teams.


Figure 23 – Domino offers a self-service selection of environments, compute, and IDEs

6. Model building

Data scientists write code to build models in Domino using their favorite statistical computing software and IDE as outlined in the previous section. Code is written in an interactive workspace and then executed in that same workspace, in a batch job, or in a scheduled job. Typically, model training (and all the preparation that leads up to it) is done via open-source packages in R and Python. SAS and MATLAB are also popular choices for model building in Domino. Domino’s API allows for the integration of batch jobs into production pipelines.

Both traditional ML and deep learning workflows are supported via Domino’s preconfigured environments for GPUs and deep learning frameworks, such as TensorFlow and PyTorch. Given the ease of building your own environment in Domino, new frameworks can be quickly integrated.


As would be expected with any Docker-based interactive workspace, Domino users can also leverage Amazon SageMaker tools to build models. Users simply call tools such as Amazon SageMaker Autopilot, or launch SageMaker training jobs, from a Domino workspace. This can be done from any IDE that supports a language with extensions for SageMaker. A common workflow is to kick off SageMaker calls via cells in a Jupyter notebook in a Domino interactive workspace. Training then happens in SageMaker exactly as it would when working directly with SageMaker.
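For instance, an Autopilot job can be started from a notebook cell with a few boto3 calls. The sketch below uses the boto3 SageMaker client; the job name, S3 locations, target column, and IAM role are placeholders.

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholders: job name, S3 locations, target column, and execution role.
sm.create_auto_ml_job(
    AutoMLJobName="churn-autopilot-demo",
    InputDataConfig=[{
        "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                        "S3Uri": "s3://my-bucket/train/"}},
        "TargetAttributeName": "churned",
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/autopilot-output/"},
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)

# Check progress from the same notebook.
status = sm.describe_auto_ml_job(AutoMLJobName="churn-autopilot-demo")
print(status["AutoMLJobStatus"])
```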

Domino’s central and open platform allows data scientists to seamlessly switch between tools while Domino keeps a constant track of the flow of research, down to the smallest details.

Figure 24 – The Experiment Tracker is one of Domino’s tools for central tracking, collaboration, and reproducibility of research

7. Productionizing models

Deploying in Domino

Data scientists can use Domino’s simplified deployment process to deploy and host models in Domino on AWS infrastructure, which provides an easy, self-service method for API endpoint deployment. When deploying in Domino, Domino manages the containers hosted on AWS. Endpoints are not automatically registered with AWS. For more details, see Video introduction to model publishing.


Deploying models with Domino’s deployment process has several advantages. First, the publishing process is simple and straightforward, and designed for data scientists to easily manage the process themselves. Second, Domino provides full model lineage down to the exact version of all software used to create the function that calls the model. Third, Domino can provide an overview of all assets and link those to individuals, teams, and projects.

Deploying outside Domino

Sometimes you may want to build models in Domino but host them in an environment outside Domino. This could be because you have already made investments in a production environment that supports very high scale or low latency, the production data may not be cleared for export outside of a particular environment, or you may want to control deployment using a custom CI/CD pipeline.

Domino allows you to export model images built in Domino to an external container registry. These images include all the information needed to run the model, such as model code, artifacts, environment, and project files. Domino exposes REST APIs to programmatically build and export the model image, which can be called by your CI/CD pipelines or integrated with workflow schedulers, such as Apache Airflow. By default, the images are built in Model API format. These images can be easily deployed in an environment that can run Docker containers.

You can also leverage Domino’s SageMaker Export feature. This API-based feature builds a Docker image for a given version of a Model API in an AWS SageMaker-compliant format and then exports it to Amazon ECR or any third-party container registry outside Domino. As part of the API request, users need to provide credentials for their registry to push an image to it. These credentials are not saved inside Domino and can have a time to live (TTL) attached.
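Once an exported image is in Amazon ECR, standing up a SageMaker endpoint from it takes only a few SageMaker API calls. The sketch below uses boto3 with placeholder image URI, role, and resource names.

```python
import boto3

sm = boto3.client("sagemaker")

# Placeholders: the ECR image produced by the export, an execution role,
# and the model/endpoint names.
image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/exported-models:churn-v3"
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

sm.create_model(ModelName="churn-v3",
                PrimaryContainer={"Image": image},
                ExecutionRoleArn=role)

sm.create_endpoint_config(
    EndpointConfigName="churn-v3-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "churn-v3",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
)

sm.create_endpoint(EndpointName="churn", EndpointConfigName="churn-v3-config")
```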


Figure 25 – Models can be deployed in Domino or exported to external environments

8. Model monitoring

After models are deployed, it’s critical that they are monitored to ensure that the data that’s presented for scoring is aligned with the data that was used to train the model. Proactive model monitoring allows you to detect economic changes, shifts in customer preferences, broken data pipelines, and any other factor that could cause your model to degrade.

Domino Model Monitor (DMM) automatically tracks the health of all deployed models, whether they were deployed in Domino, SageMaker, or another platform. DMM provides a single pane of glass for detecting anomalies in input data drift, prediction drift, and ground truth-based accuracy drift. If thresholds are exceeded, retraining and redeployment of models can be initiated via integration with the Domino API.


Figure 26 – Domino Model Monitor tracks the health of all models in production

Domino: Your journey ahead

Domino manages the complicated data science research and development (R&D) process in a way that amplifies the effectiveness of large and distributed teams. Combined with the scalable infrastructure and deployment expertise of AWS, the two technologies help data science organizations successfully manage the productionization of ML models.

Download a free trial today so you can see how Domino works with AWS to provide an open data science platform in the cloud. Full details on how you can purchase Domino, including pricing information and user reviews, can be found on the Domino page in the AWS Marketplace.


KNIME
by Jim Falgout, VP of Operations, KNIME

KNIME Software: creating and productionizing data science

KNIME is software used to create and productionize data science in one easy and intuitive environment. This enables every stakeholder in the data science process to focus on what they do best. KNIME supports the data science lifecycle from data discovery to production model deployment, all with a single workflow paradigm.

This section explains how one software environment enables individuals and organizations to tackle their data challenges.

Two tools to cover the entire data science lifecycle

KNIME software consists of two complementary, integrated products that interact for a complete data science solution.

KNIME Analytics Platform is where data engineers and data scientists get started with KNIME. Usually run on a desktop or laptop, KNIME provides a great way to get started with a no-code environment (although if users do want to code, that is possible). It’s a visual development environment used to build workflows that perform various functions such as data discovery, data engineering, data preparation, feature engineering and parameter optimization, model learning, as well as model testing and validation. As an open platform, KNIME supports other open source technologies such as H2O, Keras, TensorFlow, R and Python. KNIME's open approach ensures that the latest, trending technologies are quickly usable on the platform.

KNIME Server complements KNIME Analytics Platform by offering a collaborative environment for data science teams to work productively. It provides automation through job scheduling and a REST API for executing workflows. KNIME Server extends the reach of data science to all members of an organization with all levels of technical expertise via the KNIME WebPortal. Workflows built in KNIME Analytics Platform are deployed as end user applications and made available as web-based applications, exposing users to the right amount of detail and complexity. KNIME Server contains workers, or what we call Executors, that do the work of executing workflows. Executors can be run in Amazon EC2 Auto Scaling groups on AWS, elastically scaling to support your computation needs.


With these two software platforms, KNIME Software covers the entire data science cycle, as seen in Figure 27. No additional add-ons or plugins are needed.

Figure 27 – End-to-End data science integrating the creation and operations processes

All the features and functionality you need are available in one environment. KNIME is continuously developing new features and functionality, and whatever gets developed is, and will continue to be, designed to work seamlessly within the existing workflow paradigm.

Share and find solutions

The KNIME Hub is the place where the global community of KNIME users shares and finds solutions for their data science challenges and collaborates on KNIME workflows, nodes, and components.

Use KNIME on AWS

Both KNIME Analytics Platform and KNIME Server are offered in various forms in the AWS Marketplace. For more information about deploying KNIME Server on AWS, see KNIME Server on AWS Marketplace.

KNIME supports sourcing data from Amazon S3, Amazon Redshift, Amazon Relational Database Service, and Amazon Athena. These services are all integrated into KNIME as native nodes. With KNIME it’s easy to mix and match data from AWS services with other data sources as needed. KNIME supports a broad range of services within AWS, such as Amazon Translate, Amazon Comprehend, and Amazon Personalize. Additional integrations can be built within KNIME using Python and the Boto3 library. Wrapping that into a shareable component with KNIME makes it available for others to use. For example workflows, components, and nodes, see the KNIME Hub.
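As an illustration of that Boto3 extension point, here is a minimal sketch of the kind of script a KNIME Python node could wrap as a shareable component; it calls Amazon Translate on a text column. The region, column name, and the input_table/output_table variables are assumptions (the exact variable names depend on the KNIME Python node version).

# A sketch of a Boto3 call a KNIME Python Script node could wrap as a shareable
# component. Table/column names and the region are assumptions; input_table and
# output_table follow the KNIME Python node convention (names vary by version).
import boto3

translate = boto3.client("translate", region_name="us-east-1")  # assumed region

df = input_table.copy()  # pandas DataFrame handed in by the KNIME node

def translate_text(text):
    # Translate each row's text from German to English via Amazon Translate.
    result = translate.translate_text(
        Text=text, SourceLanguageCode="de", TargetLanguageCode="en"
    )
    return result["TranslatedText"]

df["translated"] = df["review_text"].apply(translate_text)  # assumed column name
output_table = df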

Figure 28 – Creating and deploying an Amazon Personalize recommendation model using KNIME

The workflow in Figure 28 demonstrates how easy it is to use Amazon Personalize with KNIME. It shows the entire cycle of importing data into Amazon Personalize, creating a model, deploying a campaign, and getting real-time recommendations. It also takes advantage of KNIME’s ability to prepare data for AWS Machine Learning services and post-process service results. Adding in KNIME Server to productionize integrated workflows completes the lifecycle.

Getting started with data science creation

KNIME Analytics Platform is the starting place for data scientists working in teams or on their own. KNIME provides nodes to connect to a broad set of data sources. Once connected, there are nodes to develop a deep understanding of your data, including sophisticated visualizations. The next step is data preparation (also called data wrangling). An estimated 80-90% of data science work is getting the data prepared, and KNIME provides a wide variety of nodes to do this.


Experimenting during the early phases of a project is a strength of KNIME. The visual development interface enables quick prototyping of different approaches and supports collaboration with others. Combining ML algorithms from KNIME, Python, R, H2O, and deep learning technology such as Keras and TensorFlow is easy. Users can mix and match the technologies that make the most sense for the problem at hand. Or they can use the technologies that they have experience working with.

Workflows created using a visual development environment are well suited for collaboration. Understanding the logic behind a visual workflow is more easily accomplished than browsing a large set of code, which makes KNIME workflows well suited for a data science task that requires reproducible results.

KNIME Server supports working within a team. The workflow repository within KNIME Server supports deploying workflows from the KNIME Analytics Platform. Deployed workflows are versioned, and changes can be tracked over time. Changes to a workflow across versions can be visually inspected, so users can quickly see what changes team members made to a workflow over time.

Building and training models

There are many ways to build models in KNIME Analytics Platform. The workflow in Figure 29 is simplified but it shows a common pattern of sourcing data, preparing the data for modeling, training a model, and applying a test dataset to determine the viability of the model.


Figure 29 – Using the KNIME Analytics Platform to train a random forest model

KNIME also supports more complex modeling techniques such as feature engineering, automated parameter optimization, model validation, and model interpretability.

Deploying models

Models are deployed within KNIME Server as workflows. The concept is that a model is not standalone but requires additional functionality, such as data pre-processing and post-processing. Wrapping a model in a KNIME workflow supports all those needs. Models wrapped in a KNIME workflow can be deployed to the KNIME Server as an API endpoint, run as a scheduled job, or made available for batch processing.

The process starts with creating a workflow to generate an optimal model. The Integrated Deployment nodes allow data scientists to capture the parts of the workflow needed for running in a production environment, including data creation and preparation as well as the model itself. These captured subsets are saved automatically as workflows with all the relevant settings and transformations. There is no limitation on this capture process; it can be as simple or as complex as required.

The workflow in Figure 30 shows the model training workflow used in Figure 29 with the Integrated Deployment nodes added. The data preparation and model application parts of the workflow are captured, stitched together, and persisted as a new workflow. In this case, the generated workflow with the embedded model is automatically deployed to a KNIME Server instance.

Figure 30 – Integrated Deployment of a model as a generated workflow

Integrated Deployment allows for combining model creation, including pre- and post-processing, into a single workflow that is deployment ready. As the design of the model preparation and training changes, those changes are automatically captured, and the workflow is regenerated and redeployed to your chosen target. Closing the gap between model readiness and production readiness as described here can drastically increase the productivity of data science teams.

Generated workflows can be persisted to the KNIME Server, an Amazon S3 bucket, the local KNIME environment, or anywhere KNIME can write data. In the preceding example, we show deploying the generated workflow directly to a KNIME Server.

Figure 31 – The generated workflow deploying the trained model

The workflow in Figure 31 is the generated, production-ready workflow. The workflow uses KNIME container nodes, which are placeholders in the workflow for data. Workflows generated by Integrated Deployment are ready for deployment into KNIME Server as API Endpoints. The Container Input node captures the payload from a REST API call. Similarly, the data in the Container Output node is used as the payload for the response.

KNIME provides integration with external software that can be used to integrate with CI/CD pipeline components, such as GitLab. For example, Figure 32 outlines the process of writing a generated workflow to Amazon S3 and using a Lambda trigger to automatically deploy the workflow to KNIME Server. The Lambda function can also be used to kick off a set of workflow tests within KNIME Server to validate the new workflow as part of a CI/CD flow. Likewise, workflows can be used to integrate with external systems.
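The following is a minimal sketch, under stated assumptions, of the Lambda piece of that flow: it reacts to a generated workflow landing in Amazon S3 and asks KNIME Server to run a validation workflow over its REST API. The KNIME Server URL, REST path, payload fields, and authentication handling are illustrative placeholders; check the KNIME Server REST API documentation for the actual endpoints.

# Sketch of a Lambda handler that reacts to a generated workflow landing in S3
# and asks KNIME Server to run a validation workflow. The KNIME Server URL,
# REST path, and credential handling are illustrative assumptions only.
import json
import os
import urllib.request

KNIME_SERVER = os.environ.get("KNIME_SERVER_URL", "https://knime.example.com")
AUTH_HEADER = os.environ.get("KNIME_AUTH_HEADER", "")   # e.g., a Basic auth value

def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Illustrative endpoint: trigger a test workflow for the new artifact.
        req = urllib.request.Request(
            f"{KNIME_SERVER}/rest/v4/jobs",               # placeholder path
            data=json.dumps({"workflow": "/tests/validate-model",
                             "parameters": {"bucket": bucket, "key": key}}).encode(),
            headers={"Authorization": AUTH_HEADER,
                     "Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            print("KNIME Server responded:", resp.status)
    return {"statusCode": 200}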

All of these nodes are available as native nodes, ready to use in KNIME Analytics Platform.


Figure 32 – Automating deployment and testing using Amazon S3 and Lambda

Continuous delivery and monitoring

As mentioned, KNIME Server supports deploying models, encased in workflows, as API endpoints that provide an inference service. Workflow API endpoints are a ubiquitous way to integrate model inferencing into any application or IT process. Model inferencing can also be accomplished in batch mode.

KNIME also supports continuous delivery of models and model monitoring using a concept we call the Model Process Factory. This provides a flexible and extensible means to monitor model performance and automatically build new, challenger models as needed. We’ve done all the work in putting it together, ready for you to use directly from the KNIME Hub.

Some highlights from this section include:

● The orchestration workflow, which is the director of the whole process and provides flexibility, allowing you to define the process to fit your specific needs.
● A number of workflows for initializing, loading, transforming, modeling, scoring, evaluating, deploying, monitoring, and retraining data analytics models.
● Best practices for packaging sub-workflows for quick, controlled, and safe reuse by other workflows.
● Workflows dedicated to checking whether model performance has fallen below a specified accuracy threshold and to retriggering its retraining as needed.


Figure 33 – The orchestration workflow of the Model Process Factory

Interpreting predictions of a model

Many times, end users of ML technology want to understand why a certain prediction or classification was made, because the result from a model inference may seem unintuitive to them. This is where model interpretability is key. KNIME has a comprehensive set of nodes that provide model interpretability. Figure 34 shows a workflow that uses the SHAP nodes to interpret the results of a decision tree model. Combining the interpretability nodes with visualization nodes can lead to a very useful web application for KNIME Server that can be shared with ML consumers.

All of these nodes are available as native nodes, ready to use in KNIME Analytics Platform.


Figure 34 – A workflow using the SHAP nodes to help interpret predictions of a model

KNIME: Your journey ahead

To learn more about KNIME, visit our website. To get started with KNIME on AWS, several product offerings are available in the AWS Marketplace, including a free trial offer for KNIME Server. To learn more about the offerings and how to get started, see KNIME Software on Amazon Web Services.

AWS reference architecture
by Dylan Tong, Machine Learning Partner Solutions Architect, AWS

In this section, we give builders the option to tailor their own AWS solution. As guidance, we prescribe an architecture informed by production successes, such as this implementation by WordStream and Slalom. The solution leans on AWS managed services so that your team can minimize operational overhead and focus on developing differentiated experiences.

We’ll explain how the solution comes together by walking through the CD4ML process from ML experimentation to production. We’ll present architecture diagrams organized across three disciplines: data science, data engineering, and DevOps, so that you get a sense for organizational ownership and collaboration.

Model building

As prescribed by CD4ML, the process starts with model building and the need for data curation.

Discoverable and accessible assets

Success in ML is rooted in a solid data architecture for managing datasets. This solution prescribes a data lake design. A data lake will enable your data engineers to curate datasets at scale, enabling ML development across your organization.

Many AWS services and partner technologies are designed to operate on this architecture. Thus, it provides a future-proof design for an evolving ML toolchain.


Figure 35 – An AWS data lake for facilitating discoverable and accessible data

Your organization will be able to get a data lake up and running quickly by using AWS Lake Formation. Lake Formation brings together core data services and centralizes data access controls to simplify data management.

Amazon S3 provides storage, AWS Glue provides data processing and a catalog, and Amazon Athena enables analysis on your data in Amazon S3. These fully-managed components provide core data governance and curation capabilities without the need to manage infrastructure.
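As a small example of how downstream tooling can query the curated data without managing infrastructure, the following sketch starts an Athena query with Boto3; the database, table, and result-bucket names are placeholders.

# A minimal sketch of querying curated data-lake tables with Athena via Boto3;
# the database, table, and result bucket names are placeholders.
import boto3

athena = boto3.client("athena", region_name="us-east-1")   # assumed region

response = athena.start_query_execution(
    QueryString="SELECT customer_id, churned FROM curated.customers LIMIT 100",
    QueryExecutionContext={"Database": "curated"},                    # placeholder
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print("Query started:", response["QueryExecutionId"])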

Once in place, the data lake supports a wide range of data ingestion methods to centralize your data sources. Your team can also augment your datasets through AWS Data Exchange. The exchange provides a diverse catalog of industry data sources, some of which can be used for ML, spanning financial services, economics, retail, climate, and beyond.

At some point, your data science teams will need support for data labeling activities. Amazon SageMaker Ground Truth will enable your team to scale and integrate labeling activities into your data lake. It provides integrated workflows with labeling services, like Amazon Mechanical Turk, to augment your private labeling workforce, automated labeling using active learning to reduce cost, and annotation consolidation to maintain label quality.


As your ML workloads increase and become more sophisticated, your organization will need a better way to share and serve features—the data used for model training and inference. You can augment your data lake with the Amazon SageMaker Feature Store, which allows your organization to ingest, store, and share standardized features from batch and streaming data sources.

Your team can use built-in data transformations from Amazon SageMaker Data Wrangler to quickly build feature engineering workflows that feed into the Feature Store. These features can be automatically cataloged and stored for offline use cases like training and batch inference. Online use cases that require low latency and high throughput also exist. Imagine your team has to deliver a service to predict credit fraud on real-time transactions. Your model might use transaction, location, and user history to make predictions. Your client app doesn’t have access to user history, so you serve these features from the online Feature Store.
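A minimal sketch of that ingestion path with the SageMaker Python SDK follows; the feature group name, columns, S3 URI, and role ARN are placeholders.

# A sketch of registering and ingesting features with the SageMaker Feature
# Store Python SDK; names, S3 URI, and the role ARN are placeholders.
import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()

# Assumed feature frame: one row per transaction with an event-time column.
df = pd.DataFrame({
    "transaction_id": ["t-1", "t-2"],
    "amount": [120.5, 43.0],
    "event_time": [1608500000.0, 1608500060.0],
})
df["transaction_id"] = df["transaction_id"].astype("string")  # required dtype

fg = FeatureGroup(name="transactions-features", sagemaker_session=session)
fg.load_feature_definitions(data_frame=df)
fg.create(
    s3_uri="s3://my-bucket/feature-store/",                   # offline store location
    record_identifier_name="transaction_id",
    event_time_feature_name="event_time",
    role_arn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    enable_online_store=True,                                 # low-latency reads
)
# In practice, wait for the feature group status to become Active before ingesting.
fg.ingest(data_frame=df, max_workers=1, wait=True)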

Model experimentation

With your data lake in place, you’ve established a foundation for enabling ML initiatives. Your data science teams can work with their business partners to identify opportunities for ML.

You’ll need to provide your data scientists with a platform for experimentation. Figure 36 presents the components in this solution that facilitate experimentation to prove the art of the possible. SageMaker provides ML practitioners with many capabilities in this area.


Figure 36 – Facilitating machine learning experimentation

Amazon SageMaker Studio provides you with a fully-managed IDE and a unified user experience across SageMaker. Your Studio environment is secured through AWS Identity and Access Management (IAM) or AWS SSO. Thus, you can authenticate as an AWS user or a corporate identity from a system like Active Directory.

Remember to apply security best practices at all times and across all environments. Your team should become familiar with the security in SageMaker, as outlined in the documentation. You should use IAM controls to govern minimal permissions and apply data protection best practices. You should deploy notebooks in an Amazon Virtual Private Cloud (Amazon VPC), and use Amazon VPC interface endpoints and a hybrid network architecture to enable private network security controls. Internet-free model training and serving options can also be used to support security-sensitive use cases.


Figure 37 – SageMaker Studio, a fully managed IDE with a one-console experience

Within Studio, your ML practitioners will launch elastic Studio notebooks. These notebooks are fully-managed and include pre-built environments that you can customize. You can select from a variety of popular ML libraries and frameworks, such as TensorFlow, PyTorch, MXNet, Spark ML, and Python libraries like scikit-learn, among others. ML problems come in many forms, as do the skill sets of your engineering resources. Having an arsenal of tools provides your ML practice with flexibility.

Also, experimentation workloads tend to evolve and they come in many forms. An environment that scales efficiently and seamlessly is important for rapid prototyping and cost management. The notebook’s compute resources can be adjusted without interrupting your work. For instance, for lightweight experimentation, you can run on burstable CPU and use SageMaker remote training services to minimize cost. Later, you might have a need to experiment with Deep Learning (DL) algorithms, and you could seamlessly add GPU for rapid prototyping.

Throughout ML experimentation, you can share and version your notebooks and scripts through SageMaker’s Git integration and collaboration features. This solution prescribes AWS CodeCommit, which provides a fully-managed, secured Git-based repository.

For ML problems on tabular datasets, you can opt to use Amazon SageMaker Autopilot to automate experimentation. Autopilot applies AutoML on data tables and delivers an optimized ML model for regression or classification problems. It does this by automating data analysis and generating candidate ML pipelines that explore a variety of featurization steps and algorithms. It parallelizes a large number of training jobs, performs hyperparameter tuning and model evaluation, and publishes a leaderboard to rank experiments. Autopilot doesn’t require your team to manage any infrastructure.
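A minimal sketch of launching such an Autopilot job with the SageMaker Python SDK follows; the role ARN, S3 locations, and target column are placeholders.

# A minimal sketch of launching an Autopilot (AutoML) job with the SageMaker
# Python SDK; the role ARN, S3 paths, and target column are placeholders.
import sagemaker
from sagemaker.automl.automl import AutoML

automl = AutoML(
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder role
    target_attribute_name="churned",                       # assumed label column
    output_path="s3://my-bucket/autopilot-output/",
    max_candidates=50,                                      # cap the search space
    sagemaker_session=sagemaker.Session(),
)

# Autopilot analyzes the table, generates candidate pipelines, and ranks them.
automl.fit(inputs="s3://my-bucket/curated/training.csv", wait=False)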

All training jobs are tracked by Amazon SageMaker Experiments. This includes tracking datasets, scripts, configurations, model artifacts and metrics associated with your training runs.

Now that you have an audit trail of all your organization’s experiments, your data scientists can reproduce results.

Productionize the model

As your team shifts from experimentation to productionizing models, you need to increase your focus on scalability, operational efficiency and quality controls.

Scale model training

The need to scale model training is often the first hurdle towards production. For instance, DL typically involves training complex models on large datasets. Without a distributed training cluster and GPUs, it could take days or even weeks rather than hours or minutes to train some models.

To scale and automate training jobs, your ML practitioners will use SageMaker’s zero-setup training services to automatically build training clusters, perform distributed training, and parallelize data ingestion on large datasets.

These training services provide options that range from code-free to full-control. Typically, a training process is launched by providing a script or configuring a built-in algorithm. If your data scientists desire full control, there’s the option to “bring-your-own-model” or algorithm by managing your own compatible container.

Your teams will also need the ability to scale the volume of training jobs in addition to scaling individual jobs. This is needed for hyperparameter optimization, a process that involves exploring algorithm configurations (hyperparameters) to improve a model’s effectiveness. The process requires training many models with varying hyperparameter values. Your data scientists can choose to use Automatic Model Tuning to automate and accelerate this process.
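The following sketch shows both ideas together: a managed training estimator and an Automatic Model Tuning job that explores hyperparameter ranges across parallel training jobs. The built-in XGBoost algorithm, metric name, role ARN, and S3 paths are assumptions chosen for illustration.

# A sketch of scaling out hyperparameter optimization with Automatic Model
# Tuning; the algorithm choice, role ARN, and S3 paths are placeholders.
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

role = "arn:aws:iam::123456789012:role/SageMakerRole"       # placeholder role

estimator = Estimator(
    image_uri=image_uris.retrieve("xgboost", region="us-east-1", version="1.2-1"),
    role=role,
    instance_count=2,                    # distributed training cluster
    instance_type="ml.m5.2xlarge",
    output_path="s3://my-bucket/models/",
    sagemaker_session=sagemaker.Session(),
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=200)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=20,            # total training jobs to explore
    max_parallel_jobs=4,    # run several jobs in parallel
)
tuner.fit({
    "train": TrainingInput("s3://my-bucket/train/", content_type="text/csv"),
    "validation": TrainingInput("s3://my-bucket/val/", content_type="text/csv"),
})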


Automate auditable pipelines

The other critical step towards production is to package your ML projects into ML pipelines. These pipelines address the need for orchestration, operational scalability, and auditability.

From within Studio, your data scientists can craft and experiment with different ML workflows using Amazon SageMaker Pipelines as shown in Figure 38.

Figure 38 – SageMaker Pipelines manages your ML workflows
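A condensed sketch of such a pipeline definition follows: a data preparation step feeding a training step, registered and started with the SageMaker Pipelines SDK. The scripts, role ARN, S3 paths, and algorithm are placeholders.

# A condensed sketch of a SageMaker Pipeline with a processing step and a
# training step; the scripts, role ARN, S3 paths, and algorithm are placeholders.
from sagemaker import Session, image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.processing import ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

role = "arn:aws:iam::123456789012:role/SageMakerRole"       # placeholder role
session = Session()

prep = SKLearnProcessor(framework_version="0.23-1", role=role,
                        instance_type="ml.m5.xlarge", instance_count=1,
                        sagemaker_session=session)
prep_step = ProcessingStep(
    name="PrepareData",
    processor=prep,
    code="preprocess.py",                                   # assumed script
    outputs=[ProcessingOutput(output_name="train",
                              source="/opt/ml/processing/train")],
)

xgb = Estimator(
    image_uri=image_uris.retrieve("xgboost", region="us-east-1", version="1.2-1"),
    role=role, instance_count=1, instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/models/", sagemaker_session=session,
)
train_step = TrainingStep(
    name="TrainModel",
    estimator=xgb,
    inputs={"train": TrainingInput(
        s3_data=prep_step.properties.ProcessingOutputConfig
                         .Outputs["train"].S3Output.S3Uri,
        content_type="text/csv")},
)

pipeline = Pipeline(name="ChurnPipeline", steps=[prep_step, train_step],
                    sagemaker_session=session)
pipeline.upsert(role_arn=role)   # register or update the pipeline definition
execution = pipeline.start()     # run on demand, on a schedule, or from an event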

Figure 39 illustrates the top-level pattern that your ML workflows will typically resemble. In each of these high-level stages, your data scientists—possibly in collaboration with data and DevOps engineers—will craft workflow steps as part of a SageMaker Pipeline.

Figure 39 – Common ML workflow pattern

During the Prepare Data stage, you’ll typically create steps to validate and apply feature engineering transforms to your data. The data prep workflows that your team creates with Data Wrangler can be exported for use in your production pipelines. Data prep steps could make use of services like Athena and capabilities like SageMaker Processing. Data processing steps can run on a remote Spark cluster and are a natural point for data scientists and data engineers to collaborate.

After data prep, your workflow will introduce one or more training steps to train models using the aforementioned training services. Consider adding hooks into your training job by using Amazon SageMaker Debugger. For instance, you can use built-in rules to stop a training job if it detects a vanishing gradient problem, avoiding wasted resources. To learn how Autodesk used the Debugger to optimize their training resources, see the blog post, Autodesk optimizes visual similarity search model in Fusion 360 with Amazon SageMaker Debugger.
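Attaching those hooks is a small change to the estimator, as the following sketch shows; the container image and role ARN are placeholders, and the built-in rules flag the issues so you can act on them (for example, by stopping the job).

# A sketch of attaching Debugger built-in rules to a training job; the image
# URI and role ARN are placeholders.
from sagemaker.debugger import Rule, rule_configs
from sagemaker.estimator import Estimator

rules = [
    Rule.sagemaker(rule_configs.vanishing_gradient()),   # flag vanishing gradients
    Rule.sagemaker(rule_configs.loss_not_decreasing()),  # flag stalled training
]

estimator = Estimator(
    image_uri="<training-image-uri>",                          # placeholder image
    role="arn:aws:iam::123456789012:role/SageMakerRole",       # placeholder role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    rules=rules,       # Debugger evaluates these rules while the job runs
)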

Additionally, use Debugger’s profiling functionality to obtain real-time, granular insights into your training resources. Figure 40 is a real-time heatmap generated by the profiler. It provides insights into resource utilization across CPU and GPU cores during training. These insights help you identify performance issues such as bottlenecks and cost reduction opportunities due to poor resource utilization. The Debugger will prescribe recommendations automatically to guide you through this process.

Figure 40 – Real-time Profiling of Training Resource Utilization using the SageMaker Profiler

Consider adding branches in your workflow to recompile your model for different target platforms using Amazon SageMaker Neo. Neo can reduce the size of your models and inference latency.

These steps are followed by model evaluation, testing and deployment stages that we’ll discuss later.

Figure 41 summarizes how the key systems at this stage come together. Your DevOps team will likely have a Continuous Integration/Continuous Delivery (CI/CD) pipeline in place to support existing software development practices. If not, you have the option to roll out the solution highlighted in Figure 41. SageMaker provides pipeline templates that can be deployed through AWS Service Catalog. This includes an integrated CI/CD pipeline built on AWS DevOps services.


Your team will commit your ML scripts, which include a SageMaker Pipeline. These scripts will run through your standard CI process. Once your scripts pass your automated tests and approval process, your new SageMaker Pipeline will be activated. The workflow will then execute either on demand, on a scheduled basis, or on event-based triggers.

Figure 41 – Deploying reproducible and auditable ML pipelines

What if your pipeline requirements extend beyond the scope of SageMaker Pipelines? For instance, you may need to manage pipelines for other AWS AI services. To minimize time-to-value, evaluate these other options: AWS Step Functions and the Data Science SDK, Apache Airflow, and Kubeflow Pipelines. These workflow managers have integrations with SageMaker.

Testing and quality

One of the goals of an MLOps practice is to improve quality controls and deliver trusted models. Like any piece of software, ML models are prone to defects, and there are consequences when defects leak into production.

Following training in Figure 39 are steps for model evaluation and testing. During this stage, you should create rules to enforce quality standards. For example, you could create a rule that ensures candidate models exceed performance baselines on metrics like accuracy. The probabilistic nature of ML models can introduce edge cases. Your team should brainstorm risky scenarios and create tests accordingly. For example, for some computer vision use cases, you should ensure your models act appropriately when exposed to explicit images.
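In a SageMaker Pipeline, such a rule can be expressed as a condition step that gates the rest of the workflow on an evaluation metric. The sketch below assumes an evaluation step, its property file, and a registration step are defined elsewhere in the pipeline script, and the exact JsonGet argument names can vary slightly between SDK versions.

# A sketch of a quality gate in a SageMaker Pipeline: continue only when the
# evaluation report's accuracy clears a baseline. evaluation_step,
# evaluation_report, and register_step are assumed to be defined elsewhere.
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep, JsonGet

accuracy_check = ConditionGreaterThanOrEqualTo(
    left=JsonGet(
        step=evaluation_step,                 # assumed ProcessingStep producing a report
        property_file=evaluation_report,      # assumed PropertyFile (evaluation.json)
        json_path="metrics.accuracy.value",
    ),
    right=0.90,                               # baseline your team agrees on
)

gate = ConditionStep(
    name="CheckAccuracyBaseline",
    conditions=[accuracy_check],
    if_steps=[register_step],                 # proceed to model registration
    else_steps=[],                            # otherwise stop the pipeline here
)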

Your team will also need to update your CI tests. Load tests should be in place to measure the performance of your model server endpoints. Changes to your ML model can affect latency and throughput. SageMaker provides Amazon CloudWatch metrics to assist you in this area, and AWS CodePipeline provides a variety of QA automation tools through partner integrations.

AI fairness and explainability

We’ve seen PR nightmares play out as a consequence of biased models. In turn, regulatory and business requirements are driving the need for fairness-aware ML and the ability to explain decisions driven by opaque models. Through Amazon SageMaker Clarify, your team has access to utilities for bias detection and Explainable AI (XAI).

Figure 42 illustrates where you can add bias detection into a SageMaker Pipeline. These bias evaluation steps can be incorporated into pre-training, post-training and online stages of your workflow.

You can create workflow steps to measure and mitigate bias in your data. For instance, the Class Imbalance metric can help you discover an underrepresentation of a disadvantaged demographic in your dataset that will cause bias in your model. In turn, you can attempt to mitigate the issue with techniques like resampling.

You can also create steps to identify prediction bias on your trained models. For instance, Accuracy Difference (AD) measures prediction accuracy differences between different classes in your models. A high AD is problematic in use cases such as facial recognition because it indicates that a group experiences mistaken identity disproportionately and this could have an adverse impact.
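A minimal sketch of running a pre-training bias analysis with the Clarify processor follows; the dataset locations, label column, and facet (sensitive attribute) are placeholder assumptions.

# A sketch of running a Clarify pre-training bias analysis; dataset locations,
# the label column, and the facet (sensitive attribute) are placeholders.
from sagemaker import clarify, Session

role = "arn:aws:iam::123456789012:role/SageMakerRole"       # placeholder role

processor = clarify.SageMakerClarifyProcessor(
    role=role, instance_count=1, instance_type="ml.m5.xlarge",
    sagemaker_session=Session(),
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/curated/train.csv",
    s3_output_path="s3://my-bucket/clarify/bias-report/",
    label="approved",                                        # assumed target column
    headers=["age_group", "income", "approved"],             # assumed schema
    dataset_type="text/csv",
)

bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],
    facet_name="age_group",                                  # assumed sensitive attribute
    facet_values_or_threshold=["under_25"],
)

# Pre-training metrics such as Class Imbalance are computed on the data itself.
processor.run_pre_training_bias(
    data_config=data_config, data_bias_config=bias_config, methods="all"
)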


Figure 42 – Bias detection in SageMaker workflows

SageMaker provides an implementation of SHAP, which can attribute the impact of input features on a prediction. These attributions can help explain opaque models holistically as well as each prediction made by the model. You can generate SHAP values as part of your workflow to complement your bias analysis. These values can help identify features that might be related to bias issues, or help defend the decisions proposed by your model.

More generally, SHAP values can be included in granular error analysis reports to attribute the impact of features on specific results. This helps your team understand the cause of incorrect predictions and helps validate that correct predictions aren’t due to flaws like leakage.

Deployment

Models that pass your quality controls should be added to your SageMaker catalog of trusted models. Next, your team should create workflows to deploy these models into production to support real-time and batch workloads.

Figure 43 illustrates how your system extends to support these scenarios. This phase in the lifecycle will require collaboration across teams.

Figure 43 – Deploy models using best practice deployment strategies


First, this is a good time to re-evaluate your SageMaker Pipelines and ensure your setup is optimized for both performance and cost. Evaluate Amazon Elastic Inference and AWS Inferentia to cost-optimize GPU inference workloads. Leverage multi-model endpoints and automatic scaling to improve resource utilization, and use the provided CloudWatch metrics to monitor the situation. You’re likely to have to re-train your model throughout the lifespan of your application, so use managed spot training when it’s suitable.

To support real-time predictions, you will create a workflow step to deploy an endpoint on SageMaker Hosting Services. Consider the following best practices; a minimal deployment sketch follows the list:

• For high availability, configure your endpoint to run on two or more instances.
• Set up an automatic scaling policy to adapt to evolving workloads.

• Use A/B testing to mitigate the risk of underperforming model variants.
• Use VPC interface endpoints for SageMaker Runtime to enhance network security.
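The sketch below illustrates the first two practices: a two-instance endpoint plus a target-tracking automatic scaling policy. The model object, endpoint name, and capacity limits are placeholders.

# A sketch of a highly available endpoint with automatic scaling; the model
# object, endpoint name, and capacity limits are placeholders.
import boto3

# Assume `model` is a sagemaker.model.Model produced by your pipeline.
predictor = model.deploy(
    initial_instance_count=2,                  # spread across Availability Zones
    instance_type="ml.c5.xlarge",
    endpoint_name="churn-endpoint",            # placeholder name
)

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/churn-endpoint/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=10,
)
autoscaling.put_scaling_policy(
    PolicyName="churn-endpoint-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)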

SageMaker Pipelines support cross-account deployments and strategies like one-box, canary, and blue-green so that you can deploy your models from your development environment into production in accordance with industry best practices.

Next, you can choose to attach your model endpoints to Amazon API Gateway for API management. Added benefits include edge caching, resource throttling, WebSocket support, and additional layers of security.

Alternatively, you can integrate your model endpoints with Amazon Aurora and Amazon Athena. Doing so will allow a data-centric application to blend in predictions using SQL, and reduce data pipeline complexity.

Your team can also build a workflow for batch inference. For instance, you might create a scoring pipeline that pre-calculates prediction probabilities for loan-default risk or customer churn likelihood. Following that, these predictions are loaded into a database to support predictive analytics.

You can create a batch inference step that uses batch transform. Batch transform is designed to scale out and support batch inference on large datasets. It’s optimized for throughput and you’re only billed on transient resource usage.
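A minimal batch transform sketch follows; the model object, S3 paths, and instance settings are placeholders.

# A sketch of a batch scoring job with batch transform; the model object,
# S3 paths, and instance settings are placeholders.
# Assume `model` is a sagemaker.model.Model produced by your pipeline.
transformer = model.transformer(
    instance_count=2,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/predictions/churn/",
    strategy="MultiRecord",            # pack many records per request for throughput
)

transformer.transform(
    data="s3://my-bucket/curated/customers.csv",
    content_type="text/csv",
    split_type="Line",                 # one record per line
)
transformer.wait()                      # resources are released when the job ends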


Monitoring and observability and closing the feedback loop

Finally, we need to ensure our system can detect problems in production and create a feedback loop that enables continuous improvement.

Model monitoring

As depicted in Figure 44, your SageMaker Pipelines should deploy a Model Monitor alongside your model endpoints to provide drift detection. Your team will choose between the default monitor or a custom monitoring job. When drift is detected, your monitors should be set up to trigger workflows that enable an automated training loop or alert your data scientists about the issue.
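As a sketch of what deploying the default monitor can look like with the SageMaker Python SDK, the snippet below baselines the training data and schedules hourly drift checks against a live endpoint. The endpoint name, S3 paths, and role ARN are placeholders, and data capture is assumed to be enabled on the endpoint.

# A sketch of enabling the default Model Monitor against a live endpoint;
# endpoint name, S3 paths, and role are placeholders. Data capture must already
# be enabled on the endpoint for the schedule to have input.
from sagemaker import Session
from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

role = "arn:aws:iam::123456789012:role/SageMakerRole"       # placeholder role

monitor = DefaultModelMonitor(
    role=role, instance_count=1, instance_type="ml.m5.xlarge",
    sagemaker_session=Session(),
)

# Baseline statistics and constraints computed from the training dataset.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/curated/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitoring/baseline/",
)

# Hourly drift checks of captured traffic against the baseline.
monitor.create_monitoring_schedule(
    monitor_schedule_name="churn-endpoint-monitor",
    endpoint_input="churn-endpoint",                        # placeholder endpoint
    output_s3_uri="s3://my-bucket/monitoring/reports/",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)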

Figure 44 – Production monitoring and continuous improvement

Traceability and auditing

In the event a defective model is detected, you can perform root cause analysis with the help of the model lineage tracking system. Your SageMaker Pipelines, whether they are training, real-time inference, or batch workflows, maintain an audit trail of the associated artifacts.


Critical artifacts like datasets, container environments, configurations, models, and endpoints are automatically linked together. Your team is now empowered to run a trace from an artifact version to all the associated assets.

As a best practice, your team should keep a snapshot of the data used to train each of your production model variants and create an Amazon S3 prefix that encodes version tracking metadata. The lineage tracking system currently uses the Amazon S3 location to track data artifacts. Thus, by following this best practice, your team will have traceability between model and dataset versions.

Human review and continuous improvement

To enable the continuous improvement of your models, your team should implement Amazon Augmented AI (A2I) workflows to support human review. Human review in ML is important for supporting sensitive use cases, such as loan approval automation. For these use cases, low confidence predictions should be delegated for human review and decision-making to mitigate risk. You can integrate A2I to enable human intervention and collect feedback for re-training your models.
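A small sketch of the hand-off to A2I follows: a low-confidence prediction is routed to a human review workflow. The confidence threshold, flow definition ARN, and payload fields are placeholder assumptions.

# A sketch of routing a low-confidence prediction to an Amazon A2I human review
# workflow; the threshold, flow definition ARN, and payload are placeholders.
import json
import uuid
import boto3

a2i = boto3.client("sagemaker-a2i-runtime")

CONFIDENCE_THRESHOLD = 0.70                                # assumed business rule
prediction = {"applicant_id": "a-123", "approve_probability": 0.55}

if prediction["approve_probability"] < CONFIDENCE_THRESHOLD:
    a2i.start_human_loop(
        HumanLoopName=f"loan-review-{uuid.uuid4()}",
        FlowDefinitionArn=(
            "arn:aws:sagemaker:us-east-1:123456789012:"
            "flow-definition/loan-review"                  # placeholder ARN
        ),
        HumanLoopInput={"InputContent": json.dumps(prediction)},
    )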

High-level AI services

In addition to SageMaker, AWS provides other AI services designed for building AI-enabled apps without requiring ML expertise.

Figure 45 – The Amazon ML stack and high-level AI services


These services provide domain-focused capabilities in areas like computer vision, forecasting, and chatbots. Capabilities typically come in two types:

• Turnkey capabilities that are production-ready from the get-go – These capabilities use ML models that are fully-managed by AWS.

• AutoML capabilities that deliver customized models – Users provide prepped data and configure AutoML jobs to create custom models designed to support specific use cases.

Productionizing AWS AI services

Services like Amazon Comprehend and Amazon Polly are examples of turnkey capabilities. No changes are needed to existing software engineering practices. These features are integrated into solutions like any other managed API.

Nonetheless, custom models are often needed to achieve business objectives. Thus, many AWS AI services have both turnkey and AutoML capabilities. Some services like Amazon Fraud Detector and Amazon Forecast are designed around AutoML.

Working with AI service custom models

During the development phase, your team can choose to interface with these AI services using their respective SDKs within SageMaker Studio, or use the AWS Management Console.

When you’re ready to deploy your AutoML solutions into production, your team should deploy them as automated workflows. These workflows have a common pattern, as shown in Figure 46.

Figure 46 – Logical workflow of an AutoML pipeline

Step Functions is a good choice for orchestrating these workflows to create serverless solutions. For human review workflows, A2I has turnkey support for Amazon Rekognition and Amazon Textract. The other AI services require customization.

For drift detection, you’ll need to implement a custom solution. You can look to APN Partners to help fill the gap. For instance, Digital Customer Experience partners like Amplitude and Segment integrate with Amazon Personalize to provide marketing and product analytics. These partner technologies could aid in drift detection on recommender models.

As your ML initiatives mature, your organization will adopt more ML tools, and your team will face the challenge of managing many pipelining tools. AWS provides an MLOps Framework that lays the foundation for unifying workflow technologies, such as SageMaker Pipelines and Step Functions, under a common management framework. The framework is designed to accelerate outcomes by serving workflow blueprints developed by AWS and our open source community.

AWS: Your journey ahead

The journey to MLOps maturity is an evolutionary process. The preceding solution can be built iteratively and deliver incremental ROI along the way. AWS is here to help you on this journey, along with our AWS Machine Learning Competency Partners, who have been vetted as the most capable at delivering ML projects on AWS.

Among these partners are specialists in MLOps. Many of these consulting partners have achieved key competencies in DevOps, data and analytics, and machine learning. Information about these partners has been consolidated in the Resources section. Use the provided links to learn more about the partners and how to contact them.

Conclusion

The APN Machine Learning community provides customers with a choice of outstanding solutions for MLOps. The following links can help you facilitate the next steps.

• AWS – Connect with vetted MLOps consultants and try Amazon SageMaker for free.

• Alteryx – Get started with the Alteryx Intelligence Suite Starter Kit.
• Dataiku – Launch a free trial of Dataiku DSS from the AWS Marketplace.

• Domino Data Lab – Launch a free trial of Domino Data Lab’s managed offering.

• KNIME – Launch a free trial of KNIME Server from the AWS Marketplace.
• ThoughtWorks – Learn more about ThoughtWorks’ CD4ML.


Contributors

• Christoph Windheuser, Global Head of Artificial Intelligence, ThoughtWorks
• Danilo Sato, Head of Data & AI Services UK, ThoughtWorks
• Alex Sadovsky, Senior Director of Product Management, Alteryx
• David Cooperberg, Senior Product Manager, Alteryx

• Greg Willis, Director of Solutions Architecture, Dataiku
• Josh Poduska, Chief Data Scientist, Domino Data Lab
• Jim Falgout, VP of Operations, KNIME

• Dylan Tong, Machine Learning Partner Solutions Architect, AWS

Resources

The following table lists AWS Machine Learning Competency Partners with MLOps expertise.

Partner | Geo | Partner Brief
Accenture | Global | Learn how Accenture Industrializes Machine Learning on AWS.
AllCloud | Israel, Germany | Learn about the AllCloud Data Practice.
DiUS | Australia, New Zealand, Asia Pacific | Learn about DiUS’s ML expertise.
Deloitte | Global | Learn about Deloitte’s MLOps expertise.
Inawisdom | UK, Middle East, Benelux | Learn about Inawisdom’s DevOps practice for ML.
Intellify | Australia, New Zealand, Asia Pacific | Learn about Intellify’s expertise in ML on AWS.
Kainos | UK | Learn about Kainos’s ML expertise.
Keepler | Spain, Portugal, Austria, Germany, Switzerland | Learn about Keepler’s ML expertise and offerings on AWS.
Max Kelsen | Australia, New Zealand, Asia Pacific | Learn about MLOps Best Practices with Amazon SageMaker & Kubeflow and Max Kelsen’s expertise.
1Strategy (TEKsystems) | North America | Learn about 1Strategy’s MLOps expertise and offerings.
Onica | Global | Learn about Onica’s MLOps Foundations offering.
Pariveda | North America | Learn about Pariveda’s ML expertise on AWS.
Peak.ai | UK, US | Learn about Peak’s MLOps expertise and offerings on AWS.
Provectus | North America | Learn about Provectus’s MLOps expertise and offerings.
Quantiphi | Global | Learn about Quantiphi’s MLOps expertise and offerings.
Reply | Italy, Germany, UK | Learn about Reply’s offerings on AWS.
Slalom | Global | Learn about Slalom’s MLOps expertise and offerings.
TensorIoT | Global | Learn about TensorIoT’s ML expertise on AWS.


Document Revisions

Date | Description
December 2020 | First publication

