ModelCI-e: Enabling Continual Learning in Deep Learning Serving Systems

Yizheng Huang∗ (Nanyang Technological University, [email protected]), Huaizheng Zhang∗ (Nanyang Technological University, [email protected]), Yonggang Wen (Nanyang Technological University, [email protected]), Peng Sun (SenseTime Group Limited, [email protected]), Nguyen Binh Duong TA (Singapore Management University, [email protected])

∗Both authors contributed equally to this research.

arXiv:2106.03122v1 [cs.DC] 6 Jun 2021

ABSTRACT

MLOps is about taking experimental ML models to production, i.e., serving the models to actual users. Unfortunately, existing ML serving systems do not adequately handle the dynamic environments in which online data diverges from offline training data, resulting in tedious model updating and deployment work. This paper implements a lightweight MLOps plugin, termed ModelCI-e (continuous integration and evolution), to address the issue. Specifically, it embraces continual learning (CL) and ML deployment techniques, providing end-to-end support for model updating and validation without serving-engine customization. ModelCI-e includes 1) a model factory that allows CL researchers to prototype and benchmark CL models with ease, 2) a CL backend to automate and orchestrate model updating efficiently, and 3) a web interface for an ML team to manage a CL service collaboratively. Our preliminary results demonstrate the usability of ModelCI-e and indicate that eliminating the interference between model updating and inference workloads is crucial for higher system efficiency.

ACM Reference Format:
Yizheng Huang, Huaizheng Zhang, Yonggang Wen, Peng Sun, and Nguyen Binh Duong TA. 2018. ModelCI-e: Enabling Continual Learning in Deep Learning Serving Systems. In Woodstock '18: ACM Symposium on Neural Gaze Detection, June 03–05, 2018, Woodstock, NY. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION

Machine Learning (ML), especially Deep Learning (DL), has become an essential component of many applications, ranging from video analytics [23] to resume assessment [14]. Consequently, inference accounts for a large proportion of ML product costs [3, 11]. To reduce inference costs, both industry and academia have invested a lot of effort in developing high-performance DL serving systems (e.g., TensorFlow Serving [18] and Clipper [5]). These systems streamline model deployment and provide optimizing mechanisms (e.g., batching) for higher efficiency, thus reducing inference costs.

We note that existing DL serving systems are not able to handle dynamic production environments adequately. The main reason comes from the concept drift [8, 10] issue: DL model quality is tied to offline training data and becomes stale soon after deployment to serving systems. For instance, spam patterns keep changing to avoid detection by DL-based anti-spam models. In another example, as fashion trends shift rapidly over time, a static model will produce inappropriate product recommendations. If a serving system cannot quickly adapt to such data shift, inference quality degrades significantly. Therefore, engineers must track system performance carefully and update the deployed DL models in time, which results in much tedious, manual work in the operations of DL serving systems.

Several attempts have been made to address the concept drift issue, as summarized in Table 1. These efforts generally consider the issue from either an algorithm perspective or a system perspective. The former primarily focuses on improving continual learning (CL), also referred to as lifelong learning [4], to maintain model accuracy, while the latter (e.g., AWS SageMaker [1]) pays more attention to developing a serving platform equipped with tools such as performance monitors and drift detectors. However, 1) these solutions still require users to perform cumbersome DL serving system customization work, including development, validation, and deployment; 2) none of them provides full support for delivering a high-quality CL service, which requires close collaboration between model designers and production (e.g., DevOps) engineers; and 3) how to efficiently orchestrate both updating (training) and inference workloads in an ML cluster has not been considered.

To fill these gaps, we propose a new system with the following design guidelines. First, the system should be easily integrated with existing serving systems. Second, the system should support the entire model updating workflow to foster team collaboration and automate most of the labor-intensive operations. Third, the system should be efficient when training and deploying CL in practice.

In this paper, we describe ModelCI-e (continuous integration [22] and evolution), an automated and efficient plugin that can be easily integrated into existing model serving systems (e.g., TensorFlow Serving, Clipper). We also conduct some preliminary studies to illustrate the potential challenges for the system (e.g., interference of training and inference jobs in a cluster), as well as future optimization directions. To the best of our knowledge, ModelCI-e is the first holistic plugin system that can benefit both DL system research and operations.

Table 1: Comparison of several continual learning (CL) frameworks.
Capabilities compared: continuous monitoring, continuous updating, drift detection, CL data cache, model history management, CL algorithm benchmark, service validation, CL algorithm zoo, and portability.
Continuum [19], AWS SageMaker [1], and River [16] each provide two of these nine capabilities, and Avalanche [13] provides three, whereas ModelCI-e provides all nine.

2 SYSTEM DESIGN AND IMPLEMENTATION

This section first summarizes the system workflow and then presents the functionalities and implementation details of the core components in ModelCI-e, as illustrated in Fig. 1.

Figure 1: ModelCI-e architecture. [Diagram: the plugin exposes a web dashboard and Python APIs (e.g., an annotator); its core comprises a data manager (scenario switch, sampler, SQL generator), a CL envelope containing the model factory (model template, loss library, metric store) and a training engine, an evaluator (CL profiler, service validator), and a CL server (collector, drift detector, cache) backed by historical data storage. The plugin sits between model development (using TensorFlow, PyTorch, etc.) and online serving applications (e.g., TensorFlow Serving, Triton Inference Server, deployed using K8s, Docker, etc.), which deliver online data back to it.]

2.1 Workflow

The complete workflow of developing a CL-based ML service can be split into two phases: the offline preparation phase and the online orchestration phase.

In the offline phase, researchers first leverage the built-in APIs of the model factory to customize a CL model with their favorite DL frameworks (e.g., PyTorch). The model is then tested and deployed to an existing serving engine like the Triton Inference Server.

In the online phase, the system selects a suitable CL method and prepares the corresponding training samples to improve the model online; these operations build a training job that is dispatched to suitable workers. Once the updating is accomplished, our system calls a service validator that implements many off-the-shelf testing methods to verify the updated model. If the newly updated model passes all checks, the system replaces the existing models with it. The whole process needs no human involvement, while the ML team can check and manage each step's output from the web interface.

2.2 System Implementation

Model Factory helps model researchers prototype and evaluate CL models quickly. It consists of three functions: model template, loss library, and metric store. (1) The model template is implemented atop PyTorch Lightning and contains high-level templates that make it easy to build popular CL applications (e.g., image and text recognition). (2) The loss library incorporates many popular CL loss functions, such as Elastic Weight Consolidation (EWC) [9] and Synaptic Intelligence (SI) [21], and provides Python APIs for researchers to upgrade their existing training loss functions to overcome catastrophic forgetting [15] in CL. (3) The metric store helps users evaluate CL services from two perspectives: learning capability and efficiency. The former includes metrics like BWT (backward transfer, which measures the influence of a CL method on previous tasks) [6], whereas the latter contains metrics such as computational efficiency, model size (MS) efficiency, etc.

CL Server is a lightweight Python service implemented with Flask that can be integrated with many serving engines (e.g., TensorFlow Serving). It implements 1) a data collector to collect and save online request data, which will be selectively utilized for human labeling and model upgrading, 2) an in-memory CL cache based on Redis to optimize the data sampling and loading
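To make the collector-plus-cache idea concrete, the following is a minimal, framework-agnostic sketch: a plain dict stands in for the Redis CL cache, there is no Flask routing, and every name (`OnlineDataCollector`, `collect`, `sample`, `sample_rate`) is illustrative rather than ModelCI-e's actual API.

```python
import json
import random


class OnlineDataCollector:
    """Sketch of the CL server's data collector and CL cache.

    A dict stands in for the Redis-backed cache; a real deployment
    would persist requests and expose this over HTTP (e.g., Flask).
    """

    def __init__(self, sample_rate=0.1, seed=None):
        self.sample_rate = sample_rate   # fraction staged for human labeling
        self.cache = {}                  # stand-in for the in-memory CL cache
        self.pending_labeling = []       # request ids set aside for humans
        self.rng = random.Random(seed)

    def collect(self, request_id, payload):
        # Save every online request (here: JSON blobs in the cache).
        self.cache[request_id] = json.dumps(payload)
        # Selectively stage a fraction of requests for human labeling.
        if self.rng.random() < self.sample_rate:
            self.pending_labeling.append(request_id)

    def sample(self, k):
        # Draw a training mini-batch from the cached online data.
        ids = self.rng.sample(list(self.cache), min(k, len(self.cache)))
        return [json.loads(self.cache[i]) for i in ids]
```

A model-updating job would call `sample()` to assemble training batches from recent online data instead of reloading the full offline dataset.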
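The loss library's EWC-style regularization can be sketched as follows. This is the standard EWC penalty (a quadratic term anchoring parameters to their post-task values, weighted by diagonal Fisher information), written with plain Python lists instead of PyTorch tensors for brevity; the class and method names are ours, not ModelCI-e's API.

```python
class EWCPenalty:
    """Elastic Weight Consolidation (EWC) regularizer sketch.

    After finishing a task, consolidate() snapshots the parameters and
    their (diagonal) Fisher information. penalty() then charges the new
    parameter values for drifting away from each snapshot, weighted by
    how important each parameter was to the old task:

        penalty = (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2
    """

    def __init__(self, lam=1.0):
        self.lam = lam        # regularization strength (lambda)
        self.anchors = []     # list of (theta_star, fisher) snapshots

    def consolidate(self, params, fisher):
        self.anchors.append((list(params), list(fisher)))

    def penalty(self, params):
        total = 0.0
        for theta_star, fisher in self.anchors:
            for p, p_star, f in zip(params, theta_star, fisher):
                total += f * (p - p_star) ** 2
        return 0.5 * self.lam * total
```

During continual updating, the total training loss would be the task loss plus `ewc.penalty(current_params)`, which is what lets the model learn new data without catastrophically forgetting old tasks.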
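The BWT metric in the metric store is conventionally computed from a task-accuracy matrix R, where R[i][j] is the accuracy on task j after training on task i; BWT averages how much accuracy on earlier tasks changed by the end of training. A minimal sketch (the function name is ours):

```python
def backward_transfer(acc):
    """Backward transfer (BWT) from a T x T accuracy matrix.

    acc[i][j] = accuracy on task j measured right after training on
    task i. BWT averages acc[T-1][i] - acc[i][i] over earlier tasks i;
    negative BWT indicates catastrophic forgetting.
    """
    T = len(acc)
    if T < 2:
        raise ValueError("BWT needs at least two tasks")
    return sum(acc[T - 1][i] - acc[i][i] for i in range(T - 1)) / (T - 1)
```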
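The validate-then-swap step of the online workflow (run off-the-shelf checks, replace the serving model only if every check passes) can be sketched as below; the function and check names are illustrative, not ModelCI-e's service-validator API.

```python
def validate_and_swap(candidate, current, checks):
    """Validate an updated model and decide which model should serve.

    checks is a list of callables (e.g., accuracy, latency, or schema
    tests) that each take the candidate model and return True on pass.
    Returns the model to keep serving plus a per-check report.
    """
    report = {check.__name__: bool(check(candidate)) for check in checks}
    passed = all(report.values())
    return (candidate if passed else current), report
```

The per-check report mirrors the paper's point that, although no human is in the loop, the ML team can still inspect each step's output from the web interface.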
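Finally, to illustrate the drift detection capability compared in Table 1: one simple detector flags drift when a rolling statistic of online data (e.g., a feature mean or model confidence) moves too far from its offline-training baseline. This toy mean-shift detector is entirely illustrative; the paper does not specify ModelCI-e's detection algorithm.

```python
from collections import deque


class MeanShiftDriftDetector:
    """Toy concept-drift detector: compares the rolling mean of an
    online signal against the mean observed on offline training data
    and flags drift when the gap exceeds a threshold."""

    def __init__(self, baseline_mean, threshold, window=100):
        self.baseline_mean = baseline_mean   # statistic from offline data
        self.threshold = threshold           # allowed divergence
        self.window = deque(maxlen=window)   # recent online observations

    def observe(self, value):
        # Returns True when the rolling mean has drifted past the threshold.
        self.window.append(value)
        rolling = sum(self.window) / len(self.window)
        return abs(rolling - self.baseline_mean) > self.threshold
```

In a ModelCI-e-like pipeline, a True result from such a detector would be what triggers the online model-updating workflow described in Section 2.1.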