Towards Understanding the Challenges Faced by Machine Learning Software Developers and Enabling Automated Solutions

Total Page:16

File Type:pdf, Size:1020Kb

Towards Understanding the Challenges Faced by Machine Learning Software Developers and Enabling Automated Solutions Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2020 Towards understanding the challenges faced by machine learning software developers and enabling automated solutions Md Johirul Islam Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Recommended Citation Islam, Md Johirul, "Towards understanding the challenges faced by machine learning software developers and enabling automated solutions" (2020). Graduate Theses and Dissertations. 18149. https://lib.dr.iastate.edu/etd/18149 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. Towards understanding the challenges faced by machine learning software developers and enabling automated solutions by Md Johirul Islam A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Computer Science Program of Study Committee: Hridesh Rajan, Major Professor Gianfranco Ciardo Gurpur Prabhu Jin Tian Anuj Sharma The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred. Iowa State University Ames, Iowa 2020 Copyright c Md Johirul Islam, 2020. All rights reserved. ii DEDICATION To my teachers, family and friends, who made me realize the real purpose of education. iii TABLE OF CONTENTS Page LIST OF TABLES . vii LIST OF FIGURES . viii ACKNOWLEDGMENTS . .x ABSTRACT . xi CHAPTER 1. INTRODUCTION . .1 1.1 Contributions . .2 1.2 Outline . .3 CHAPTER 2. RELATED WORK . .4 2.1 Empirical Study Using Stack Overflow .............................4 2.2 Study of Challenges in ML . .5 2.3 Empirical Study of Bugs in Non-ML . .6 2.4 Empirical Study of Bugs in DNN . .6 2.5 Study of Fixing DNN Bugs . .7 2.6 Study of API-misuse Classification . .8 2.7 API Misuse Detection . .8 2.8 Model Testing . .9 CHAPTER 3. WHAT DO DEVELOPERS ASK ABOUT ML LIBRARIES? A LARGE-SCALE STUDY USING STACK OVERFLOW . 10 3.1 Introduction . 10 3.2 Methodology . 12 3.2.1 Classification of Questions . 13 3.2.2 Manual Labeling . 18 3.2.3 Threats to Validity . 19 3.3 Analysis and Results . 21 3.4 RQ1: Difficult stages . 22 3.4.1 Most difficult stage . 22 3.4.2 Data preparation . 24 3.5 RQ2: Nature of problems . 25 3.5.1 Type mismatch . 25 3.5.2 Shape mismatch . 26 3.5.3 Data Cleaning . 27 3.5.4 Model creation . 28 iv 3.5.5 Error/Exception . 29 3.5.6 Parameter selection . 30 3.5.7 Loss function selection . 31 3.5.8 Training accuracy . 32 3.5.9 Tuning parameter selection . 32 3.5.10 Correlation between libraries . 34 3.5.11 API Misuses in All ML Stages . 35 3.6 RQ3: Nature of libraries . 38 3.7 RQ4: Time consistency of difficulty . 40 3.8 Implications and Discussion . 42 3.8.1 Implications of RQ1 . 42 3.8.2 Implications of RQ2 . 43 3.8.3 Implications of RQ3 . 44 3.8.4 Implications of RQ4 . 44 3.9 Conclusion . 44 CHAPTER 4. A COMPREHENSIVE STUDY ON DEEP LEARNING BUG CHARACTERISTICS 46 4.1 Introduction . 46 4.2 Methodology . 47 4.2.1 Data Collection . 47 4.2.2 Classification . 49 4.2.3 Labeling the Bugs . 50 4.2.4 Types of Bugs in Deep Learning Software . 51 4.2.5 Classification of Root Causes of Bugs . 54 4.2.6 Classification of Effects of Bugs . 56 4.3 Frequent bug types . 57 4.3.1 Data Bugs . 57 4.3.2 Structural Logic Bugs . 59 4.3.3 API Bugs . 59 4.3.4 Bugs in Github Projects . 60 4.4 Root Cause . 61 4.4.1 Incorrect Model Parameter (IPS) . 61 4.4.2 Structural Inefficiency (SI) . 61 4.4.3 Unaligned Tensor (UT) . 63 4.4.4 Absence of Type Checking . 63 4.4.5 API Change . 64 4.4.6 Root Causes in Github Data . 64 4.4.7 Relation of Root Cause with Bug Type . 64 4.5 Impacts from Bugs . 65 4.5.1 Crash . 66 4.5.2 Bad Performance . 67 4.5.3 Incorrect Functionality . 67 4.5.4 Effects of Bugs in Github ............................... 68 4.6 Difficult Deep Learning stages . 68 4.6.1 Data Preparation . 68 4.6.2 Training Stage . 69 v 4.6.3 Choice of Model . 70 4.7 Commonality of Bug . 70 4.8 Evolution of Bugs . 72 4.8.1 Structural Logic Bugs Are Increasing . 73 4.8.2 Data Bugs Are Decreasing . 73 4.9 Threats to Validity . 74 4.10 Discussion . 75 4.11 Conclusion . 76 CHAPTER 5. REPAIRING DEEP NEURAL NETWORKS: FIX PATTERNS AND CHALLENGES 78 5.1 Introduction ..
Recommended publications
  • Intro to Tensorflow 2.0 MBL, August 2019
    Intro to TensorFlow 2.0 MBL, August 2019 Josh Gordon (@random_forests) 1 Agenda 1 of 2 Exercises ● Fashion MNIST with dense layers ● CIFAR-10 with convolutional layers Concepts (as many as we can intro in this short time) ● Gradient descent, dense layers, loss, softmax, convolution Games ● QuickDraw Agenda 2 of 2 Walkthroughs and new tutorials ● Deep Dream and Style Transfer ● Time series forecasting Games ● Sketch RNN Learning more ● Book recommendations Deep Learning is representation learning Image link Image link Latest tutorials and guides tensorflow.org/beta News and updates medium.com/tensorflow twitter.com/tensorflow Demo PoseNet and BodyPix bit.ly/pose-net bit.ly/body-pix TensorFlow for JavaScript, Swift, Android, and iOS tensorflow.org/js tensorflow.org/swift tensorflow.org/lite Minimal MNIST in TF 2.0 A linear model, neural network, and deep neural network - then a short exercise. bit.ly/mnist-seq ... ... ... Softmax model = Sequential() model.add(Dense(256, activation='relu',input_shape=(784,))) model.add(Dense(128, activation='relu')) model.add(Dense(10, activation='softmax')) Linear model Neural network Deep neural network ... ... Softmax activation After training, select all the weights connected to this output. model.layers[0].get_weights() # Your code here # Select the weights for a single output # ... img = weights.reshape(28,28) plt.imshow(img, cmap = plt.get_cmap('seismic')) ... ... Softmax activation After training, select all the weights connected to this output. Exercise 1 (option #1) Exercise: bit.ly/mnist-seq Reference: tensorflow.org/beta/tutorials/keras/basic_classification TODO: Add a validation set. Add code to plot loss vs epochs (next slide). Exercise 1 (option #2) bit.ly/ijcav_adv Answers: next slide.
    [Show full text]
  • Ways to Use Machine Learning Approaches for Software Development
    Ways to use Machine Learning approaches for software development Nicklas Jonsson Nicklas Jonsson VT 2018 Examensarbete, 30 hp Supervisor: Eddie wadbro Extern Supervisor: C4 Contexture Examiner: Henrik Bjorklund¨ Master of Science Programme in Computing Science and Engineering, 300 hp Abstract With the rise of machine learning and in particular deep learning enter- ing all different types of fields, including software development. It could be a bit hard to know where to begin to search for the tools when some- one wants to use machine learning for a one’s problems. This thesis has looked at some available technologies of today for applying machine learning to one’s applications. This thesis has looked at some of the available cloud services, frame- works, and libraries for machine learning and it presents three different implementation structures that can be used with these technologies for the problem of image classification. Acknowledgements I want to thank C4 Contexture for giving me the thesis idea, support, and supplying me with a working station. I also want to thank Lantmannen¨ for supplying me with the image data that was used for this thesis. Finally, I want to thank Eddie Wadbro for his guidance during this thesis and of course a big thanks to my family and friends for their support during this period of my life. 1(45) Content 1 Introduction 3 1.1 The Client 3 1.1.1 C4 Contexture PIM software 4 1.2 The data 4 1.3 Goal 5 1.4 Limitation 5 2 Background 7 2.1 Artificial intelligence, machine learning and deep learning.
    [Show full text]
  • Keras2c: a Library for Converting Keras Neural Networks to Real-Time Compatible C
    Keras2c: A library for converting Keras neural networks to real-time compatible C Rory Conlina,∗, Keith Ericksonb, Joeseph Abbatec, Egemen Kolemena,b,∗ aDepartment of Mechanical and Aerospace Engineering, Princeton University, Princeton NJ 08544, USA bPrinceton Plasma Physics Laboratory, Princeton NJ 08544, USA cDepartment of Astrophysical Sciences at Princeton University, Princeton NJ 08544, USA Abstract With the growth of machine learning models and neural networks in mea- surement and control systems comes the need to deploy these models in a way that is compatible with existing systems. Existing options for deploying neural networks either introduce very high latency, require expensive and time con- suming work to integrate into existing code bases, or only support a very lim- ited subset of model types. We have therefore developed a new method called Keras2c, which is a simple library for converting Keras/TensorFlow neural net- work models into real-time compatible C code. It supports a wide range of Keras layers and model types including multidimensional convolutions, recurrent lay- ers, multi-input/output models, and shared layers. Keras2c re-implements the core components of Keras/TensorFlow required for predictive forward passes through neural networks in pure C, relying only on standard library functions considered safe for real-time use. The core functionality consists of ∼ 1500 lines of code, making it lightweight and easy to integrate into existing codebases. Keras2c has been successfully tested in experiments and is currently in use on the plasma control system at the DIII-D National Fusion Facility at General Atomics in San Diego. 1. Motivation TensorFlow[1] is one of the most popular libraries for developing and training neural networks.
    [Show full text]
  • Lecture 6 Learned Feedforward Visual Processing Neural Networks, Deep Learning, Convnets
    William T. Freeman, Antonio Torralba, 2017 Lecture 6 Learned feedforward visual processing Neural Networks, Deep learning, ConvNets Some slides modified from R. Fergus We need translation invariance Lots of useful linear filters… Laplacian Gaussian derivative Gaussian Gabor And many more… High order Gaussian derivatives We need translation and scale invariance Lots of image pyramids… Gaussian Pyr Laplacian Pyr And many more: QMF, steerable, … We need … What is the best representation? • All the previous representation are manually constructed. • Could they be learnt from data? A brief history of Neural Networks enthusiasm time Perceptrons, 1958 Rosenblatt http://www.ecse.rpi.edu/homepages/nagy/PDF_chrono/2011_Na gy_Pace_FR.pdf. Photo by George Nagy 9 http://www.manhattanrarebooks-science.com/rosenblatt.htm Perceptrons, 1958 10 Perceptrons, 1958 enthusiasm time Minsky and Papert, Perceptrons, 1972 12 Perceptrons, 1958 enthusiasm Minsky and Papert, 1972 time Parallel Distributed Processing (PDP), 1986 14 XOR problem Inputs Output 0 0 0 1 0 1 0 1 1 0 1 1 1 0 0 1 PDP authors pointed to the backpropagation algorithm as a breakthrough, allowing multi-layer neural networks to be trained. Among the functions that a multi-layer network can represent but a single-layer network cannot: the XOR function. 15 Perceptrons, PDP book, 1958 1986 enthusiasm Minsky and Papert, 1972 time LeCun conv nets, 1998 Demos: http://yann.lecun.com/exdb/lenet/index.html 17 18 Neural networks to recognize handwritten digits? yes Neural networks for tougher problems? not really http://pub.clement.farabet.net/ecvw09.pdf 19 NIPS 2000 • NIPS, Neural Information Processing Systems, is the premier conference on machine learning.
    [Show full text]
  • Comparative Study of Deep Learning Software Frameworks
    Comparative Study of Deep Learning Software Frameworks Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah Research and Technology Center, Robert Bosch LLC {Soheil.Bahrampour, Naveen.Ramakrishnan, fixed-term.Lukas.Schott, Mohak.Shah}@us.bosch.com ABSTRACT such as dropout and weight decay [2]. As the popular- Deep learning methods have resulted in significant perfor- ity of the deep learning methods have increased over the mance improvements in several application domains and as last few years, several deep learning software frameworks such several software frameworks have been developed to have appeared to enable efficient development and imple- facilitate their implementation. This paper presents a com- mentation of these methods. The list of available frame- parative study of five deep learning frameworks, namely works includes, but is not limited to, Caffe, DeepLearning4J, Caffe, Neon, TensorFlow, Theano, and Torch, on three as- deepmat, Eblearn, Neon, PyLearn, TensorFlow, Theano, pects: extensibility, hardware utilization, and speed. The Torch, etc. Different frameworks try to optimize different as- study is performed on several types of deep learning ar- pects of training or deployment of a deep learning algorithm. chitectures and we evaluate the performance of the above For instance, Caffe emphasises ease of use where standard frameworks when employed on a single machine for both layers can be easily configured without hard-coding while (multi-threaded) CPU and GPU (Nvidia Titan X) settings. Theano provides automatic differentiation capabilities which The speed performance metrics used here include the gradi- facilitates flexibility to modify architecture for research and ent computation time, which is important during the train- development. Several of these frameworks have received ing phase of deep networks, and the forward time, which wide attention from the research community and are well- is important from the deployment perspective of trained developed allowing efficient training of deep networks with networks.
    [Show full text]
  • Comparative Study of Caffe, Neon, Theano, and Torch
    Workshop track - ICLR 2016 COMPARATIVE STUDY OF CAFFE,NEON,THEANO, AND TORCH FOR DEEP LEARNING Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah Bosch Research and Technology Center fSoheil.Bahrampour,Naveen.Ramakrishnan, fixed-term.Lukas.Schott,[email protected] ABSTRACT Deep learning methods have resulted in significant performance improvements in several application domains and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative study of four deep learning frameworks, namely Caffe, Neon, Theano, and Torch, on three aspects: extensibility, hardware utilization, and speed. The study is per- formed on several types of deep learning architectures and we evaluate the per- formance of the above frameworks when employed on a single machine for both (multi-threaded) CPU and GPU (Nvidia Titan X) settings. The speed performance metrics used here include the gradient computation time, which is important dur- ing the training phase of deep networks, and the forward time, which is important from the deployment perspective of trained networks. For convolutional networks, we also report how each of these frameworks support various convolutional algo- rithms and their corresponding performance. From our experiments, we observe that Theano and Torch are the most easily extensible frameworks. We observe that Torch is best suited for any deep architecture on CPU, followed by Theano. It also achieves the best performance on the GPU for large convolutional and fully connected networks, followed closely by Neon. Theano achieves the best perfor- mance on GPU for training and deployment of LSTM networks. Finally Caffe is the easiest for evaluating the performance of standard deep architectures.
    [Show full text]
  • Tensorflow, Theano, Keras, Torch, Caffe Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, Konstantin Shmelkov Introduction
    TensorFlow, Theano, Keras, Torch, Caffe Vicky Kalogeiton, Stéphane Lathuilière, Pauline Luc, Thomas Lucas, Konstantin Shmelkov Introduction TensorFlow Google Brain, 2015 (rewritten DistBelief) Theano University of Montréal, 2009 Keras François Chollet, 2015 (now at Google) Torch Facebook AI Research, Twitter, Google DeepMind Caffe Berkeley Vision and Learning Center (BVLC), 2013 Outline 1. Introduction of each framework a. TensorFlow b. Theano c. Keras d. Torch e. Caffe 2. Further comparison a. Code + models b. Community and documentation c. Performance d. Model deployment e. Extra features 3. Which framework to choose when ..? Introduction of each framework TensorFlow architecture 1) Low-level core (C++/CUDA) 2) Simple Python API to define the computational graph 3) High-level API (TF-Learn, TF-Slim, soon Keras…) TensorFlow computational graph - auto-differentiation! - easy multi-GPU/multi-node - native C++ multithreading - device-efficient implementation for most ops - whole pipeline in the graph: data loading, preprocessing, prefetching... TensorBoard TensorFlow development + bleeding edge (GitHub yay!) + division in core and contrib => very quick merging of new hotness + a lot of new related API: CRF, BayesFlow, SparseTensor, audio IO, CTC, seq2seq + so it can easily handle images, videos, audio, text... + if you really need a new native op, you can load a dynamic lib - sometimes contrib stuff disappears or moves - recently introduced bells and whistles are barely documented Presentation of Theano: - Maintained by Montréal University group. - Pioneered the use of a computational graph. - General machine learning tool -> Use of Lasagne and Keras. - Very popular in the research community, but not elsewhere. Falling behind. What is it like to start using Theano? - Read tutorials until you no longer can, then keep going.
    [Show full text]
  • DIY Deep Learning for Vision: the Caffe Framework
    DIY Deep Learning for Vision: the Caffe framework caffe.berkeleyvision.org github.com/BVLC/caffe Evan Shelhamer adapted from the Caffe tutorial with Jeff Donahue, Yangqing Jia, and Ross Girshick. Why Deep Learning? The Unreasonable Effectiveness of Deep Features Classes separate in the deep representations and transfer to many tasks. [DeCAF] [Zeiler-Fergus] Why Deep Learning? The Unreasonable Effectiveness of Deep Features Maximal activations of pool5 units [R-CNN] conv5 DeConv visualization Rich visual structure of features deep in hierarchy. [Zeiler-Fergus] Why Deep Learning? The Unreasonable Effectiveness of Deep Features 1st layer filters image patches that strongly activate 1st layer filters [Zeiler-Fergus] What is Deep Learning? Compositional Models Learned End-to-End What is Deep Learning? Compositional Models Learned End-to-End Hierarchy of Representations - vision: pixel, motif, part, object - text: character, word, clause, sentence - speech: audio, band, phone, word concrete abstract learning What is Deep Learning? Compositional Models Learned End-to-End figure credit Yann LeCun, ICML ‘13 tutorial What is Deep Learning? Compositional Models Learned End-to-End Back-propagation: take the gradient of the model layer-by-layer by the chain rule to yield the gradient of all the parameters. figure credit Yann LeCun, ICML ‘13 tutorial What is Deep Learning? Vast space of models! Caffe models are loss-driven: - supervised - unsupervised slide credit Marc’aurelio Ranzato, CVPR ‘14 tutorial. Convolutional Neural Nets (CNNs): 1989 LeNet: a layered model composed of convolution and subsampling operations followed by a holistic representation and ultimately a classifier for handwritten digits. [ LeNet ] Convolutional Nets: 2012 AlexNet: a layered model composed of convolution, + data subsampling, and further operations followed by a holistic + gpu representation and all-in-all a landmark classifier on + non-saturating nonlinearity ILSVRC12.
    [Show full text]
  • Deep Learning with Keras I
    Deep Learning with Keras i Deep Learning with Keras About the Tutorial Deep Learning essentially means training an Artificial Neural Network (ANN) with a huge amount of data. In deep learning, the network learns by itself and thus requires humongous data for learning. In this tutorial, you will learn the use of Keras in building deep neural networks. We shall look at the practical examples for teaching. Audience This tutorial is prepared for professionals who are aspiring to make a career in the field of deep learning and neural network framework. This tutorial is intended to make you comfortable in getting started with the Keras framework concepts. Prerequisites Before proceeding with the various types of concepts given in this tutorial, we assume that the readers have basic understanding of deep learning framework. In addition to this, it will be very helpful, if the readers have a sound knowledge of Python and Machine Learning. Copyright & Disclaimer Copyright 2019 by Tutorials Point (I) Pvt. Ltd. All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt. Ltd. The user of this e-book is prohibited to reuse, retain, copy, distribute or republish any contents or a part of contents of this e-book in any manner without written consent of the publisher. We strive to update the contents of our website and tutorials as timely and as precisely as possible, however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our website or its contents including this tutorial.
    [Show full text]
  • Caffe in Practice
    This Business of Brewing: Caffe in Practice caffe.berkeleyvision.org Evan Shelhamer github.com/BVLC/caffe from the tutorial by Evan Shelhamer, Jeff Donahue, Yangqing Jia, and Ross Girshick Deep Learning, as it is executed... What should a framework handle? Compositional Models Decompose the problem and code! End-to-End Learning Solve and check! Vast Space of Architectures and Tasks Define, experiment, and extend! Frameworks ● Torch7 ○ NYU ○ scientific computing framework in Lua ○ supported by Facebook ● Theano/Pylearn2 ○ U. Montreal ○ scientific computing framework in Python ○ symbolic computation and automatic differentiation ● Cuda-Convnet2 ○ Alex Krizhevsky ○ Very fast on state-of-the-art GPUs with Multi-GPU parallelism ○ C++ / CUDA library Framework Comparison ● More alike than different ○ All express deep models ○ All are nicely open-source ○ All include scripting for hacking and prototyping ● No strict winners – experiment and choose the framework that best fits your work ● We like to brew our deep networks with Caffe Why Caffe? In one sip… ● Expression: models + optimizations are plaintext schemas, not code. ● Speed: for state-of-the-art models and massive data. ● Modularity: to extend to new tasks and architectures. ● Openness: common code and reference models for reproducibility. ● Community: joint discussion, development, and modeling. CAFFE EXAMPLES + APPLICATIONS Share a Sip of Brewed Models demo.caffe.berkeleyvision.org demo code open-source and bundled Scene Recognition by MIT Places CNN demo B. Zhou et al. NIPS 14 Object Detection R-CNN: Regions with Convolutional Neural Networks http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb Full R-CNN scripts available at https://github.com/rbgirshick/rcnn Ross Girshick et al.
    [Show full text]
  • Intro to Keras
    Intro to Keras Justin Zhang July 2017 1 Introduction Keras is a high-level Python machine learning API, which allows you to easily run neural networks. Keras is simply a specification; it provides a set of methods that you can use, and it will use a backend (TensorFlow, Theano, or CNTK, as chosen by the user) to actually run your code. Like many machine learning frameworks, Keras is a so-called define-and-run framework. This means that it will define and optimize your neural network in a compilation step before training starts. 2 First Steps First, install Keras: pip install keras We'll go over a fully-connected neural network designed for the MNIST (clas- sifying handwritten digits) dataset. # Can be found at https://github.com/fchollet/keras # Released under the MIT License import keras from keras.datasets import mnist from keras .models import S e q u e n t i a l from keras.layers import Dense, Dropout from keras.optimizers import RMSprop First, we handle the imports. keras.datasets includes many popular machine learning datasets, including CIFAR and IMDB. keras.layers includes a variety of neural network layer types, including Dense, Conv2D, and LSTM. Dense simply refers to a fully-connected layer, and Dropout is a layer commonly used to ad- dress the problem of overfitting. keras.optimizers includes most widely used optimizers. Here, we opt to use RMSprop, an alternative to the classic stochas- tic gradient decent (SGD) algorithm. Regarding keras.models, Sequential is most widely used for simple networks; there is also a functional Model class you can use for more complex networks.
    [Show full text]
  • Auto-Keras: an Efficient Neural Architecture Search System
    Auto-Keras: An Efficient Neural Architecture Search System Haifeng Jin, Qingquan Song, Xia Hu Department of Computer Science and Engineering, Texas A&M University {jin,song_3134,xiahu}@tamu.edu ABSTRACT epochs are required to further train the new architecture towards Neural architecture search (NAS) has been proposed to automat- better performance. Using network morphism would reduce the av- ically tune deep neural networks, but existing search algorithms, erage training time t¯ in neural architecture search. The most impor- e.g., NASNet [41], PNAS [22], usually suffer from expensive com- tant problem to solve for network morphism-based NAS methods is putational cost. Network morphism, which keeps the functional- the selection of operations, which is to select an operation from the ity of a neural network while changing its neural architecture, network morphism operation set to morph an existing architecture could be helpful for NAS by enabling more efficient training during to a new one. The network morphism-based NAS methods are not the search. In this paper, we propose a novel framework enabling efficient enough. They either require a large number of training Bayesian optimization to guide the network morphism for effi- examples [6], or inefficient in exploring the large search space [11]. cient neural architecture search. The framework develops a neural How to perform efficient neural architecture search with network network kernel and a tree-structured acquisition function optimiza- morphism remains a challenging problem. tion algorithm to efficiently explores the search space. Intensive As we know, Bayesian optimization [33] has been widely adopted experiments on real-world benchmark datasets have been done to to efficiently explore black-box functions for global optimization, demonstrate the superior performance of the developed framework whose observations are expensive to obtain.
    [Show full text]