
AutoText: An End-to-End AutoAI Framework for Text

Arunima Chaudhary, Alayt Issak∗, Kiran Kate, Yannis Katsis, Abel Valente, Dakuo Wang, Alexandre Evfimievski, Sairam Gurajada, Ban Kawas, Cristiano Malossi, Lucian Popa, Tejaswini Pedapati, Horst Samulowitz, Martin Wistuba, Yunyao Li

IBM Research AI

{arunima.chaudhary, yannis.katsis, dakuo.wang, sairam.gurajada, martin.wistuba}@ibm.com, [email protected], [email protected], [email protected], {kakate, evfimi, bkawas, lpopa, tejaswinip, samulowitz, yunyaoli}@us.ibm.com

∗Work done at IBM Research; now at the College of Wooster.
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Building models for natural language processing (NLP) tasks remains a daunting task for many, requiring significant technical expertise, effort, and resources. In this demonstration, we present AutoText, an end-to-end AutoAI framework for text, to lower the barrier of entry in building NLP models. AutoText combines state-of-the-art AutoAI optimization techniques and learning algorithms for NLP tasks into a single extensible framework. Through its simple, yet powerful UI, non-AI experts (e.g., domain experts) can quickly generate performant NLP models, with support to both control (e.g., via specifying constraints) and understand the learned models.

[Figure 1: AutoText Architecture. Frontend: initiate an experiment on labeled text data and inspect the resulting high-quality pipelines. Backend: search specification (declarative pipeline composition, neural architecture search), operator library (data preprocessors, featurizers, transformers/estimators), optimizers, and pipeline selection & model tuning.]

Introduction

Recent AI advances have led to a rising desire of businesses and organizations to apply AI to solve their problems (Mao et al. 2019). However, developing performant AI models remains a tedious, iterative process that includes data preparation, feature engineering, algorithm/model architecture selection, hyperparameter tuning, training, model evaluation, and comparison to other models. To lower the barrier of entry in AI, the research community has looked into automating this process with AutoML (Hutter, Kotthoff, and Vanschoren 2019) or AutoAI (Wang et al. 2019, 2020).

We present AutoText, an extensible end-to-end AutoAI framework, to democratize the development of Natural Language Processing (NLP) models. Its design considerations include: (a) Usability: Non-experts can create NLP models (referred to as pipelines) through a GUI, while expert users can fully control the process via a Notebook UI (this demo focuses on AutoText's GUI-based interaction). (b) Extensibility: AutoText employs an extensible architecture, where both the components of the pipelines and the pipeline discovery (known as optimization) algorithms are modeled as exchangeable modules. (c) Comprehensiveness: AutoText offers a comprehensive approach, combining various learning algorithms (incl. classical and deep learning approaches) and optimization approaches (incl. algorithm selection, hyperparameter tuning, and neural architecture search) into a single framework. (d) High Performance: AutoText is able to learn performant models that, in many cases, are comparable to or outperform state-of-the-art models for the corresponding tasks/datasets. A video demonstration of AutoText is available at https://youtu.be/tujB0BrYlBw.

Related Work. Despite the increasing popularity of AutoAI systems, we are not aware of any with the comprehensive support for text offered by AutoText. Google AutoML Natural Language (https://cloud.google.com/natural-language/automl) does not allow users to select model types, nor does it give insights into the produced model or into its tuning and hyperparameter optimization (Weidele et al. 2020). Amazon SageMaker Autopilot (https://aws.amazon.com/sagemaker/autopilot/) offers AutoML capabilities that can be applied to text, but does not provide intuitive visualizations of pipelines and their performance; it also lacks more advanced capabilities for neural architecture search (NAS). Other systems, such as Microsoft AzureML-BERT (https://azure.microsoft.com/en-us/services/machine-learning), focus on high-performance model building with BERT pretraining and fine-tuning, but do not offer simpler but faster models. H2O (https://www.h2o.ai/products/h2o-automl) is perhaps the closest AutoML framework to ours; however, it supports a limited range of learning algorithms (e.g., RF, GBM, GLM for classical ML and only MLP for deep learning), a small pipeline search space (only ensembles), and limited features for NLP (Word2Vec). Finally, while NAS is becoming an essential part of AutoML (He, Zhao, and Chu 2019), few systems focus on NLP; moreover, none combine NAS with the exploration of alternative pipelines that may include classical ML algorithms.

The AutoText Framework

AutoText comprises a frontend for easy interaction and a backend for automated model building. We now describe each of them in detail.

Frontend

End users can initiate the building of NLP models by uploading a labeled textual dataset and selecting the target column. AutoText's backend handles the rest of the process, including identifying the task type and a suitable evaluation metric, splitting the dataset into train/test, and discovering pipelines. (AutoText currently supports binary and multi-class classification and will be extended in the future with additional NLP tasks, such as entity extraction, relationship extraction, and others.) The best performing pipelines are then shown to the user to inspect. This includes both a graphical representation of each pipeline illustrating its components and a tabular leaderboard of pipelines sorted by performance.

Customizations. AutoText allows users to customize through its UI many aspects of the framework, including the types of learning algorithms explored, the number of pipelines returned, as well as constraints, such as a time budget for pipeline discovery. More advanced users can also use its Notebook interface to impose additional controls (e.g., override the default search space for individual operators).

Backend

The backend of AutoText includes: (1) Operator library: a set of operators that form the basic components of typical NLP pipelines; (2) Search specification: the definition of the pipeline search space; (3) Optimizers: a set of methods that perform joint algorithm search and hyperparameter optimization in the defined search space; and (4) Pipeline selection and model tuning: the main component, which picks the appropriate optimizer and coordinates the pipeline discovery process.

Operator Library. AutoText employs an extensible operator library, with support for three main operator classes:

• Data Preprocessors – operators that clean and/or transform data to prepare it for later processing. These include tokenization, lemmatization, part-of-speech (POS) tagging, normalization, simplification, spell correction, etc.

• Featurizers – non-trainable (or pretrained) operators that compute features for use by downstream operators. These include both classical features, such as tf-idf, and deep learning-based features, such as embeddings generated by GloVe, FastText, BERT, RoBERTa, and others.

• Transformers/Estimators – trainable operators that transform data to a new feature space or make final predictions. These include classical ML operators (TfIdfTransformer, SVM, XGBoost, etc.) and deep learning-based operators (BiLSTM, CNN, BERT, XLNet, DistilBERT, MLP, etc.).

The library can be easily extended with new operators to further increase the coverage of supported techniques.
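To illustrate this extension point, the following is a minimal, hypothetical sketch (not AutoText's actual operator API; the class name and the averaged-embedding logic are assumptions for illustration) of a new featurizer packaged as a scikit-learn-compatible transformer, the usual contract for plugging custom operators into AutoAI pipelines of this kind.

```python
# Hypothetical example: a simple averaged-word-embedding featurizer wrapped as a
# scikit-learn-compatible transformer. AutoText's real operator interface may differ;
# this only illustrates the "extensible operator library" idea.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class AvgEmbeddingFeaturizer(BaseEstimator, TransformerMixin):
    """Non-trainable featurizer: maps each text to the mean of its token embeddings."""

    def __init__(self, embeddings, dim=50):
        self.embeddings = embeddings  # dict: token -> 1-D numpy array of length `dim`
        self.dim = dim

    def fit(self, X, y=None):
        # Nothing to learn: the embeddings are pretrained.
        return self

    def transform(self, X):
        rows = []
        for text in X:
            vecs = [self.embeddings[tok] for tok in text.lower().split()
                    if tok in self.embeddings]
            rows.append(np.mean(vecs, axis=0) if vecs else np.zeros(self.dim))
        return np.vstack(rows)
```

Because it follows the standard fit/transform contract, such an operator can be chained with downstream transformers and estimators in the same way as the built-in featurizers.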
Search Specification. AutoText supports two ways to define the pipeline search space: (1) Declarative pipeline composition: leveraging the declarative LALE framework (Baudart et al. 2020), AutoText developers can write high-level specifications of the search space for each pipeline operator, as well as operator composition rules. (2) Neural architecture search (NAS) enables automatic pipeline composition for deep learning models.
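As a rough illustration of this declarative style, the sketch below composes a small search space with LALE's combinators. This is an assumption-laden example rather than AutoText's actual specification: it assumes the TfidfVectorizer, LogisticRegression, and RandomForestClassifier wrappers from lale.lib.sklearn are available and uses toy data in place of a user-uploaded dataset.

```python
# A minimal LALE-style sketch of a declarative search space (illustrative only).
from lale.lib.sklearn import TfidfVectorizer, LogisticRegression, RandomForestClassifier
from lale.lib.lale import Hyperopt

# ">>" chains operators into a pipeline; "|" leaves a choice for the optimizer to resolve.
planned = TfidfVectorizer >> (LogisticRegression | RandomForestClassifier)

# Toy labeled text data standing in for the user's uploaded dataset.
train_X = ["great movie", "loved the plot", "what a delight",
           "terrible acting", "boring and slow", "utterly disappointing"]
train_y = [1, 1, 1, 0, 0, 0]

# Joint algorithm selection and hyperparameter tuning over the declared space.
trained = planned.auto_configure(train_X, train_y, optimizer=Hyperopt,
                                 cv=3, max_evals=10)
```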
Optimizers. AutoText combines into a single framework multiple optimization techniques presented in the literature, including combined algorithm selection and hyperparameter tuning (CASH) (Thornton et al. 2013) and NAS (Wistuba, Rawat, and Pedapati 2019). This is done by integrating multiple optimizers, each suited to different scenarios, and selecting between them based on heuristics. These include Hyperopt (Bergstra et al. 2015), Hyperband (Li et al. 2017), ADMM (a CASH optimizer supporting black-box constraints, such as prediction latency or memory footprint) (Liu et al. 2020), TAPAS (Istrate et al. 2019), and NeuRex (Wistuba 2018).

Pipeline Selection and Model Tuning. Given a labeled dataset and optional user options, AutoText employs the appropriate optimizer to iteratively explore different pipeline configurations. During this process, the optimizer considers the search specification, potential user constraints, and the performance of previously explored configurations to determine the next pipeline to explore. This process continues until the user-specified constraints are satisfied or until a predetermined number of iterations is reached. The resulting pipelines are then shown on the frontend for the user to inspect.
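To make the CASH-style search loop concrete, here is a small, hypothetical sketch built directly on the Hyperopt library (it is not AutoText's internal optimizer interface, and the 20 Newsgroups data and the two candidate estimators are illustrative choices): the search space jointly encodes which estimator to use and its hyperparameters, and each iteration picks the next configuration based on the scores of previously evaluated ones.

```python
# Hypothetical CASH sketch with Hyperopt: the search space jointly encodes the choice
# of estimator and its hyperparameters; fmin iteratively proposes the next configuration
# based on the performance of previously evaluated ones.
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])

space = hp.choice("estimator", [
    {"name": "logreg", "C": hp.loguniform("lr_C", -4, 4)},
    {"name": "linearsvc", "C": hp.loguniform("svc_C", -4, 4)},
])

def objective(cfg):
    clf = (LogisticRegression(C=cfg["C"], max_iter=1000)
           if cfg["name"] == "logreg" else LinearSVC(C=cfg["C"]))
    pipe = make_pipeline(TfidfVectorizer(), clf)
    # Hyperopt minimizes the objective, so return negative cross-validated accuracy.
    return -cross_val_score(pipe, data.data, data.target, cv=3).mean()

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=20, trials=trials)
print(best)
```

In AutoText, user constraints such as a time budget or prediction latency additionally restrict which configurations are explored, as described above.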
Demonstration

We will demonstrate AutoText on a variety of datasets, including standard benchmarks, such as MR (Pang and Lee 2005) and MPQA (Wiebe, Wilson, and Cardie 2005), and additional data, such as the US Consumer Financial Complaints dataset (https://www.kaggle.com/cfpb/us-consumer-finance-complaints). During the demo, the audience will be able to gain the following insights: (a) experience how AutoText allows the quick generation of AI models; (b) understand common components of NLP models and find out which model architectures work best for particular tasks, by inspecting the discovered pipelines; and (c) explore trade-offs made by different estimators (such as the training time/performance trade-off of classical ML vs. deep learning algorithms (Li et al. 2020)) by customizing the experiment to only consider specific algorithms. Finally, the audience will also become familiar with AutoText's architecture and understand how different AutoAI techniques presented in the literature have been incorporated into a unified end-to-end AutoAI framework capable of generating performant NLP models.

References

Baudart, G.; Hirzel, M.; Kate, K.; Ram, P.; and Shinnar, A. 2020. Lale: Consistent Automated Machine Learning. In KDD Workshop on Automation in Machine Learning (AutoML@KDD). URL https://arxiv.org/abs/2007.01977.

Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; and Cox, D. D. 2015. Hyperopt: A Python Library for Model Selection and Hyperparameter Optimization. Computational Science & Discovery 8(1). URL http://dx.doi.org/10.1088/1749-4699/8/1/014008.

He, X.; Zhao, K.; and Chu, X. 2019. AutoML: A Survey of the State-of-the-Art. ArXiv abs/1908.00709.

Hutter, F.; Kotthoff, L.; and Vanschoren, J. 2019. Automated Machine Learning: Methods, Systems, Challenges. Springer Nature.

Istrate, R.; Scheidegger, F.; Mariani, G.; Nikolopoulos, D. S.; Bekas, C.; and Malossi, A. C. I. 2019. TAPAS: Train-less Accuracy Predictor for Architecture Search. In AAAI.

Li, J.; Li, Y.; Wang, X.; and Tan, W.-C. 2020. Deep or Simple Models for Semantic Tagging? It Depends on your Data. Proceedings of the VLDB Endowment 13(11).

Li, L.; Jamieson, K.; DeSalvo, G.; Rostamizadeh, A.; and Talwalkar, A. 2017. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization. The Journal of Machine Learning Research 18(1): 6765–6816.

Liu, S.; Ram, P.; Vijaykeerthy, D.; Bouneffouf, D.; Bramble, G.; Samulowitz, H.; Wang, D.; Conn, A.; and Gray, A. G. 2020. An ADMM Based Framework for AutoML Pipeline Configuration. In Conference on Artificial Intelligence (AAAI), 4892–4899. URL https://aaai.org/ojs/index.php/AAAI/article/view/5926.

Mao, Y.; Wang, D.; Muller, M.; Varshney, K. R.; Baldini, I.; Dugan, C.; and Mojsilović, A. 2019. How Data Scientists Work Together With Domain Experts in Scientific Collaborations: To Find The Right Answer Or To Ask The Right Question? Proceedings of the ACM on Human-Computer Interaction 3(GROUP): 1–23.

Pang, B.; and Lee, L. 2005. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), 115–124. Ann Arbor, Michigan: Association for Computational Linguistics. doi:10.3115/1219840.1219855. URL https://www.aclweb.org/anthology/P05-1015.

Thornton, C.; Hutter, F.; Hoos, H. H.; and Leyton-Brown, K. 2013. Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 847–855.

Wang, D.; Weisz, J. D.; Muller, M.; Ram, P.; Geyer, W.; Dugan, C.; Tausczik, Y.; Samulowitz, H.; and Gray, A. 2019. Human-AI Collaboration in Data Science: Exploring Data Scientists' Perceptions of Automated AI. Proceedings of the ACM on Human-Computer Interaction 3(CSCW): 1–24.

Wang, D.; Ram, P.; Weidele, D. K. I.; Liu, S.; Muller, M.; Weisz, J. D.; Valente, A.; Chaudhary, A.; Torres, D.; Samulowitz, H.; et al. 2020. AutoAI: Automating the End-to-End AI Lifecycle with Humans-in-the-Loop. In Proceedings of the 25th International Conference on Intelligent User Interfaces Companion, 77–78.

Weidele, D. K. I.; Weisz, J. D.; Oduor, E.; Muller, M.; Andres, J.; Gray, A.; and Wang, D. 2020. AutoAIViz: Opening the Blackbox of Automated Artificial Intelligence with Conditional Parallel Coordinates. In Proceedings of the 25th International Conference on Intelligent User Interfaces, 308–312.

Wiebe, J.; Wilson, T.; and Cardie, C. 2005. Annotating Expressions of Opinions and Emotions in Language. Language Resources and Evaluation 39(2-3): 165–210.

Wistuba, M. 2018. Deep Learning Architecture Search by Neuro-Cell-Based Evolution with Function-Preserving Mutations. In ECML/PKDD.

Wistuba, M.; Rawat, A.; and Pedapati, T. 2019. A Survey on Neural Architecture Search. CoRR abs/1905.01392. URL http://arxiv.org/abs/1905.01392.