Proceedings of the 53rd Hawaii International Conference on System Sciences | 2020
URI: https://hdl.handle.net/10125/63857 | 978-0-9981331-3-3 | Page 943 (CC BY-NC-ND 4.0)

Easy and Efficient Hyperparameter Optimization to Address Some Artificial Intelligence "ilities"

Trevor J. Bihl, Air Force Research Laboratory, USA, [email protected]
Joseph Schoenbeck, The Perduco Group, USA, [email protected]
Daniel Steeneck & Jeremy Jordan, Air Force Institute of Technology, USA, {Daniel.Steeneck; Jeremy.Jordan}@afit.edu

Abstract

Artificial Intelligence (AI) has many benefits, including the ability to find complex patterns, automation, and meaning making. Through these benefits, AI has revolutionized image processing, among numerous other disciplines. AI further has the potential to revolutionize other domains; however, this will not happen until we can address the "ilities": repeatability, explain-ability, reliability, use-ability, trust-ability, etc. Notably, many problems with the "ilities" are due to the artistic nature of AI algorithm development, especially hyperparameter determination. AI algorithms are often crafted products, with the hyperparameters learned experientially. As such, when applying the same algorithm to new problems, the algorithm may not perform due to inappropriate settings. This research aims to provide a straightforward and reliable approach to automatically determining suitable hyperparameter settings when given an AI algorithm. Results show that reasonable performance is possible, and end-to-end examples are given for three deep learning algorithms and three different data problems.

1. Introduction

Analytics and machine learning (ML), colloquially termed Artificial Intelligence (AI) [1], are becoming increasingly ubiquitous for classification and prediction across a broad spectrum of applications due to their ability to learn nonlinear patterns in data [1]. Fundamentally, AI/ML are complex algorithms that automate procedures based on statistics and nonlinear optimization [1] [2]. However, as a result of their complexity, issues exist in broadly adopting AI solutions [3].

Throughout the space of AI/ML, users must not only decide which algorithms to use, but also the settings for the selected algorithm, known as hyperparameters. This is a complex trade space due to ML methods being brittle and not robust to conditions outside of those on which they were trained. While attention is now given to hyperparameter selection [4] [5], in general, as mentioned in Mendenhall [6], there are "no hard-and-fast rules" for their selection. In fact, their selection is part of the "art of [algorithm] design" [6], as appropriate hyperparameters can depend heavily on the data under consideration itself. Thus, ML methods themselves are often hand-crafted and require significant expertise and talent to appropriately train and deploy.

[Figure 1. Conceptualization of a general hyperparameter problem: adapted from [2, pp. 312-313]. Five panels plot accuracy against the weights w, with optimum w*, for a) ε = εopt, b) ε << εopt, c) ε < εopt, d) ε > εopt, and e) ε >> εopt.]

A conceptual example of this problem is presented in Figure 1, adapted and extended from [2, pp. 312-313]. Here, one is attempting to optimize the learning rate ε, i.e., the rate at which an algorithm converges to a good solution (w*). Ideally, one would want to find ε = εopt, the optimal rate, depicted in Figure 1a, but this is largely impossible to find for any meaningful problems due to the existence of multiple local optima. A slower, sub-optimal convergence rate is good when ε is much smaller than εopt, Figure 1b, but this can take a long time (100s of hours or more in today's deep learning systems) to train. A reasonable, sub-optimal rate, when ε < εopt, Figure 1c, can be ideal since convergence is relatively quick and performance is stable. However, when the learning rate increases above the optimal rate, Figure 1d for ε > εopt and Figure 1e for ε much larger than εopt, highly oscillatory behaviors can be introduced that bound around local, or global, optima. However, the example in Figure 1 is simple and conceptual; due to the nature of both data and algorithms, there is a naturally complex relationship between hyperparameter settings and results.

The overall opacity of algorithms, the required knowledge of tricks of the trade, and general misunderstandings of how algorithms work are pervasive [7]. Due to these factors, as noted in [8], issues exist in using ML solutions due to the inability to address the ML "ilities" [9], e.g., the reliability, repeatability [8] [3], replicability [3], trust-ability [10], and explain-ability [11] of the algorithms. In general, the "ilities" can also be further expanded to include general software quality metrics, e.g. ISO 9126: functionality, reliability, usability, efficiency, maintainability, and portability [12].

The vast majority of these "ilities" relate to typical questions users ask of automation [13]:
• What is it doing?
• Why is it doing that?
• What will it do next?
Many of these questions are related to ad hoc algorithm development and deployment methods and a general misunderstanding of capabilities and operations [9]. To understand the "ilities", and likewise address these general questions, [9] presents Shaw's [14] software engineering framework as a general model of the maturity of engineering disciplines, presented in Figure 2, which identifies three stages in a field's development: craft, commercial, and professional engineering.

[Figure 2. Shaw's Model of the evolution of engineering disciplines, from [14]. Craft (virtuosos and talented amateurs; intuition and brute force; haphazard progress; casual transmission; extravagant use of available materials; manufacture for use rather than sale) evolves through production into commercial practice (skilled craftsmen; established procedure; pragmatic refinement; training in mechanics; economic concern for cost and supply of materials; manufacture for sale; market segmentation by product variety), which science takes to professional engineering (educated professionals; analysis and theory; progress relies on science; educated professional class; enabling new applications through analysis).]

As illustrated in Figure 2, craft involves highly experiential work, which involves tricks of the trade and various and haphazard approaches to transmit knowledge. When a discipline moves to more established procedures, production, and developing applications for sale, one has moved to a commercial stage. However, even at this stage, a lack of sound methodologies can limit reproducibility. When one adds scientific approaches to a discipline, one can take commercial art to professional science. At the professional engineering stage, many of the "ilities" are naturally met since understanding exists about the underlying approaches. As noted in [9], ML development can yield highly sophisticated commercial products that are still a result of craft and not science, as ML requires significant experience to get meaningful results. In contrast, professional engineering disciplines have established methodologies to rigorously develop products [9]. One such approach for this problem in ML is developing a defined process from which to determine the hyperparameters of interest.

While approaches such as CRISP-DM [15] (CRoss-Industry Standard Process for Data Mining) provide general end-to-end (business concept to deployment) processes for developing data mining solutions, these are high level in nature and do not address the complex space of ML algorithm decisions. What is missing is a general approach that spans the Data, Modeling, and Evaluation layers of CRISP-DM with hyperparameter optimization, which enables one to 1) select a dataset, 2) select an AI algorithm from literature or a library, and then 3) automatically determine workable hyperparameter settings without expert algorithmic knowledge.

While general solutions to this problem already exist, they are either 1) automated cloud-based approaches which provide almost unlimited computing resources (for example, DataRobot (https://www.datarobot.com/), AutoML (http://www.automl.org/), and SigOpt (https://sigopt.com/)), 2) automated methods which provide a predefined list of algorithms [16], or 3) theoretical approaches to find optimal hyperparameter settings [5]. Each of these solutions presents a challenge: 1) cloud-based solutions introduce security issues if one wishes to analyze proprietary or secured data, 2) automated approaches can be limited in their list of available algorithms, and 3) theoretical optimization approaches require additional subject matter expertise. For example, Snoek et al. [5] propose Bayesian Optimization (BO) for AI hyperparameter optimization; however, implementing [5] requires advanced knowledge of mathematics and algorithms.

The contributions of our paper are fourfold. First, we present a framework to automatically tune AI algorithm hyperparameters, extending upon [5] to create a simple and straightforward process when one is given a general AI algorithm. Next, we show how this approach provides workable results on different ML recognition tasks. Third, we draw on the software engineering body of knowledge, incorporate CRISP-DM, and show how this approach facilitates ML moving from craft-based implementations to more

extracting fully explainable results [11], we posit that what is of interest for most applications is trustable and reliable AI. Thus, one can consider AI/ML solutions in a similar way as service dogs which, though reliable
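The learning-rate trade-off conceptualized in Figure 1 can be reproduced on a toy problem. The sketch below is illustrative only (the quadratic loss, the chosen rates, and the function name are not from the paper): it minimizes f(w) = (w − 3)² by gradient descent, where each step multiplies the distance to the optimum by (1 − 2ε), so a small ε converges slowly (Figure 1b/1c), ε near the optimum converges almost immediately, and ε past the oscillation threshold diverges (Figure 1d/1e).

```python
def gradient_descent(lr, steps=50, w0=0.0):
    """Minimize f(w) = (w - 3)^2 with a fixed learning rate lr.

    The gradient is f'(w) = 2 * (w - 3), so each update multiplies the
    distance to the optimum w* = 3 by the factor (1 - 2 * lr).
    """
    w = w0
    for _ in range(steps):
        w -= lr * 2.0 * (w - 3.0)
    return w

# Distance from w* = 3 after 50 steps for several learning rates:
for lr in (0.05, 0.45, 0.5, 0.95, 1.05):
    print(f"lr = {lr:4.2f}  ->  |w - w*| = {abs(gradient_descent(lr) - 3.0):.3g}")
```

For this loss, ε = 0.05 and ε = 0.95 both converge slowly (the latter while oscillating in sign around w*), ε = 0.5 lands exactly on w*, and ε = 1.05 oscillates with growing amplitude and diverges, mirroring panels b through e.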
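The three-step workflow above (select a dataset, select an AI algorithm, automatically determine workable hyperparameters) is built in the paper on Bayesian optimization [5]. As a simpler, self-contained stand-in for that tuning step, the sketch below uses plain random search over a small discrete space; every name here (`random_search`, `fake_validation_score`, the search space itself) is a hypothetical illustration, not the paper's implementation.

```python
import random

def random_search(train_and_score, space, budget=20, seed=0):
    """Sample hyperparameter settings from `space` and keep the best.

    train_and_score: callable(config dict) -> validation score (higher is better)
    space: dict mapping hyperparameter name -> list of candidate values
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_score(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical stand-in for "train a model, return validation accuracy";
# it peaks at lr = 0.01 and batch = 32.
def fake_validation_score(cfg):
    return -abs(cfg["lr"] - 0.01) * 10 - abs(cfg["batch"] - 32) / 100

space = {"lr": [0.0001, 0.001, 0.01, 0.1], "batch": [16, 32, 64, 128]}
best_cfg, best_score = random_search(fake_validation_score, space, budget=40)
print(best_cfg, best_score)
```

Bayesian optimization replaces the blind sampling step with a surrogate model (typically a Gaussian process) that proposes the next configuration to try, which is what makes it far more sample-efficient but, as the paper notes, harder to implement without mathematical expertise.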