
Deep Bayesian Active Learning: A Brief Survey on Recent Advances

Salman Mohamadi
Computer Science and Electrical Engineering
West Virginia University
Morgantown, WV, USA
[email protected]

Hamidreza Amindavar
Electrical Engineering
Amirkabir University of Technology
Tehran, Iran
[email protected]

Abstract—Active learning frameworks offer efficient data annotation without remarkable accuracy degradation. In other words, active learning starts training the model with a small set of labeled data while exploring the space of unlabeled data in order to select the most informative samples to be labeled. Generally speaking, representing uncertainty is crucial in any active learning framework; deep learning methods, however, are not capable of either representing or manipulating model uncertainty. At the same time, from the real-world application perspective, uncertainty representation is getting more and more attention in the machine learning community. Deep Bayesian active learning frameworks, and generally any Bayesian active learning setting, provide practical considerations in the model that allow training with small data while representing the model uncertainty for further efficient training. In this paper, we briefly survey recent advances in Bayesian active learning and, in particular, deep Bayesian active learning frameworks.

Index Terms—Bayesian active learning, Deep learning, Posterior estimation, Bayesian inference, Semi-supervised learning

I. INTRODUCTION

Active learning is a framework in the area of machine learning in which the model starts training with a small amount of labeled data and then, in a sequential process, asks for more data samples from a pool of unlabeled data to be labeled. In fact, the key idea behind this framework is to achieve the desired accuracy while lowering the cost of labeling by efficiently asking for more labeled data. Therefore, compared to many other frameworks, active learning tries to achieve the same or higher accuracy using a smaller amount of labeled data [1], [2]. In contrast, semi-supervised learning addresses a relatively similar problem domain; however, there are differences between their learning paradigms. In more detail, semi-supervised learning frameworks use the unlabeled data for feature representations in order to better model the labeled data [3]–[5].

Classical methods and tools in the signal processing area, with an emphasis on parametric modeling, have been used in many domains with different types of data [7]–[9]. However, recent advances in machine learning, and in particular artificial neural networks, have shown that non-parametric models are capable of modeling almost any type of data at the cost of higher complexity. In this line, deep learning can be incorporated with classical tools and frameworks of machine learning, such as active learning, in order to improve performance.

Representing the uncertainty of either the embedding space or the output probability space is challenging, especially when deep learning tools and concepts are used. This challenge shows up in multiple scenarios in which the model uncertainty must be measured, such as the problems addressed by classical active learning and its various versions. In practice, we are interested in forming a desired output probability space in a typical active learning framework, which necessitates measuring and representing the model uncertainty. In recent years, multiple efforts have been made to introduce deep learning tools into the active learning framework. The authors of [6], in a pioneering effort, discussed that bringing deep learning tools into the active learning setting poses two major problems: uncertainty representation and the amount of data needed to train the model.
In the next two sections, in order to give a taste of the basic concepts and some of the theoretical background, a brief overview of the learning paradigm of active learning and of Bayesian neural networks is presented.

Fig. 1. Learning paradigm of active learning; as shown, at every iteration the training starts from scratch on the modified training data.

II. LEARNING PARADIGM IN ACTIVE LEARNING

Simply put, the goal of active learning is to minimize the cost of data annotation or labeling by efficiently selecting the unlabeled data to be labeled. In more detail, in every iteration of active learning, a new labeled data sample (or even a batch of data) is added to the training data, and the training process starts from scratch. This sequential training continues until the accuracy reaches the desired level. An overview of the learning cycle of active learning is shown in Fig. 1.

In each iteration, all of the unlabeled data samples are evaluated in the sense of uncertainty, and the best one is selected by a function, namely the acquisition function. Generally speaking, the acquisition function performs uncertainty sampling, diversity sampling, or both. While random acquisition is considered the baseline, there are several acquisition functions depending on the data setting; some of them, for image data, are presented and approximately formulated in [6].
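To make the loop above concrete, the following minimal sketch implements pool-based active learning with a pluggable acquisition function. It is our illustration, not code from any surveyed paper; the logistic regression classifier, the synthetic dataset, the budget of 20 queries, and the `entropy_acquisition` scorer are all assumptions made for the example.

```python
# Minimal pool-based active learning loop (illustrative sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def entropy_acquisition(probs):
    """Uncertainty sampling: predictive entropy of each pool sample."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
# Seed the labeled set with a few oracle-labeled samples per class.
labeled = [int(i) for c in (0, 1) for i in np.where(y == c)[0][:5]]
pool = [i for i in range(len(X)) if i not in labeled]

for iteration in range(20):                    # labeling budget
    # Retrain from scratch on the current labeled set (cf. Fig. 1).
    model = LogisticRegression(max_iter=1000)
    model.fit(X[labeled], y[labeled])

    # Score the pool with the acquisition function and query the most
    # uncertain sample (the lookup of y[best] is the oracle call).
    scores = entropy_acquisition(model.predict_proba(X[pool]))
    best = pool[int(np.argmax(scores))]
    labeled.append(best)
    pool.remove(best)
```

Random acquisition, the baseline mentioned above, corresponds to replacing the argmax with a uniform draw from the pool; diversity-aware acquisition functions would additionally penalize samples close to those already queried.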

III. CONVOLUTIONAL NEURAL NETWORKS WITH BAYESIAN PRIOR

Deep learning algorithms mostly rely on training a convolutional neural network (CNN). In fact, with the recent advancement of CNNs, one of their main advantages is that they capture the spatial information of the data [10]. Bayesian CNNs are capable of learning from small amounts of data while at the same time allowing model uncertainty representation using an acquisition function, which essentially makes them the key tool for active learning on big data with high-dimensional samples. Accordingly, the authors of [11] proposed a new version of CNNs with a Bayesian prior on the set of weights, i.e., a Gaussian prior $p(\mathbf{w})$, where $\mathbf{w} = \{W_1, W_2, \ldots, W_N\}$. Bayesian CNNs for classification tasks with a softmax layer are formulated with the likelihood model

\[
p(y = c \mid x, \mathbf{w}) = \mathrm{softmax}\big(f^{\mathbf{w}}(x)\big). \tag{1}
\]

However, in order to practically implement this model, Gal et al. [6] suggest approximate inference using stochastic regularization techniques, performed by applying dropout during training as well as at test time (to estimate the posterior). In more detail, they do this by finding a distribution $q^{*}_{\theta}(\mathbf{w})$ which, given a set of training data $D$, minimizes the Kullback-Leibler (KL) divergence between the estimated posterior and the exact posterior $p(\mathbf{w} \mid D)$. Finally, using Monte Carlo integration, we have

\[
\begin{aligned}
p(y = c \mid x, D) &= \int p(y = c \mid x, \mathbf{w})\, p(\mathbf{w} \mid D)\, d\mathbf{w} && (2)\\
&\approx \int p(y = c \mid x, \mathbf{w})\, q^{*}_{\theta}(\mathbf{w})\, d\mathbf{w} && (3)\\
&\approx \frac{1}{T} \sum_{t=1}^{T} p(y = c \mid x, \hat{\mathbf{w}}_{t}), && (4)
\end{aligned}
\]

with $q_{\theta}(\mathbf{w})$ the dropout distribution and $\hat{\mathbf{w}}_{t}$ a sample drawn from $q^{*}_{\theta}(\mathbf{w})$ [6].
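The Monte Carlo estimate in (4) is simple to realize in code: keep dropout active at test time and average the softmax outputs over $T$ stochastic forward passes. The sketch below is our illustration under assumed names; the toy one-layer CNN, the dropout rate of 0.5, and $T = 50$ are placeholders rather than the exact setup of [6] or [11].

```python
# Monte Carlo dropout estimate of p(y = c | x, D), cf. Eq. (4).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallBayesianCNN(nn.Module):
    """Toy CNN with dropout before the output weight layer."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.drop = nn.Dropout(p=0.5)       # kept stochastic at test time
        self.fc = nn.Linear(16 * 28 * 28, num_classes)

    def forward(self, x):
        h = F.relu(self.conv(x))
        h = self.drop(h.flatten(1))         # dropout before the weight layer
        return self.fc(h)                   # logits f^w(x)

@torch.no_grad()
def mc_dropout_predict(model, x, T=50):
    """Average softmax outputs over T stochastic forward passes; each
    pass amounts to drawing w_t from the dropout distribution."""
    model.train()                           # keep dropout layers active
    probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(T)])
    return probs.mean(dim=0)                # (1/T) * sum_t p(y | x, w_t)

model = SmallBayesianCNN()
x = torch.randn(4, 1, 28, 28)               # a batch of dummy images
predictive = mc_dropout_predict(model, x)   # approximate p(y | x, D)
```

Note that `model.train()` is used here only to keep the dropout layers stochastic; in a model with batch normalization, those layers would need to be handled separately.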
IV. REVIEW ON RECENT ADVANCES

Bayesian inference methods allow the introduction of probabilistic frameworks to machine learning and deep learning. The notion behind introducing these kinds of frameworks to machine learning is that learning from data is treated as inferring optimal or near-optimal models for data representation, such as automatic model discovery. In this sense, Bayesian methods, and here specifically Bayesian active learning methods, gain attention due to their ability to represent uncertainty and even generalize better on small amounts of data [20].

One of the main works on introducing model uncertainty measurement and manipulation to active learning is by Gal and Ghahramani [6]. In fact, the major contribution of this paper is the introduction of Bayesian uncertainty estimation to active learning in order to form a deep active learning framework. In more detail, deep learning tools are data hungry while active learning tends to use small amounts of data; moreover, generally speaking, deep learning is not suitable for uncertainty representation, while active learning relies on model uncertainty measurement or even manipulation. Understanding these fundamental differences, the authors found the Bayesian approach to be the solution. In fact, they refine the general active learning framework, which usually works with SVMs and small amounts of data, to scale well to high-dimensional data such as images in the big-data regime. In practice, the authors put a Bayesian prior on the kernels of a convolutional neural network serving as the training engine of the active learning framework. They refer to their previous work [21], suggesting that in order to have a practical Bayesian CNN, Bayesian inference could be done through approximate inference in the Bayesian CNN, which makes the solution computationally tractable. The interesting point is that they empirically showed that dropout is a Bayesian approximation which can be used as a way to introduce uncertainty to deep learning [22]. The point here is that dropout is not only used in the training process: they do inference by applying dropout before every weight layer during training, and also at test time to sample from the approximate posterior. Compared to other active learning methods addressing big image data, such as those using RBF kernels, this framework performs better.

In this line, Jedoui et al. [23] go even further in the level of model uncertainty by assuming that the output space is no longer mutually exclusive, for instance when there is more than one correct output for a single input. They empirically show that classical uncertainty sampling does not perform better than random sampling on such tasks, e.g., Visual Question Answering; therefore, referring to [6], [18], [19], [21], they take a similar strategy by using Bayesian uncertainty in a semantically structured embedding space rather than modeling the uncertainty of the output probability space. Referring to Gal and Ghahramani's works, they mention that dropout can be interpreted as a variational Bayesian approximation [21], [22], where the approximating distribution is a mixture of two Gaussians with small variances, the mean of one of the Gaussians being zero. The prediction uncertainty is caused by uncertainty in the weights, which can be measured through the approximate posterior using Monte Carlo integration.

The authors of [24] pose another, similar problem by introducing deep learning with relatively very large amounts of data and big networks into active learning, and suggest the necessity of systematic labeling requests in the form of batch active learning (a batch rather than a single sample in each active learning iteration). They offer batch active learning in order to address the problem that existing greedy algorithms become computationally costly and sensitive to the slightest changes of the model. The authors propose a model aimed at efficiently scaled active learning by well estimating the data posterior. They suggest scenarios in which more efficiency comes with one batch rather than one data sample at each iteration. In this paper, the authors take multiple active learning methods and different acquisition functions into account for their objective of efficient batch selection in the sense of sparsity, or sparse subset approximation. Moreover, they claim that, based on their experiments, reference [6], as a Bayesian approach, outperforms others in many problem settings. More specifically, with the same Bayesian active learning framework proposed by [6] for capturing uncertainty, they target the optimum batch selection by finding the data posterior; however, as the active learning setting does not provide access to the labels before querying the pool set, they take the expectation with respect to the current predictive posterior distribution. This work presents a closed-form solution consistent with the basic theoretical setting of reference [6].
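For contrast with the sparse-subset view of [24], the naive way to batch-ify the loop of Section II is to simply take the top-$B$ acquisition scores in each iteration. The sketch below is our illustrative baseline, not the method of [24]; its tendency to query near-duplicate points is precisely the problem that more careful batch constructions address.

```python
# Naive top-B batch acquisition (illustrative baseline only).
import numpy as np

def top_b_batch(scores, pool_indices, B=10):
    """Return the B pool indices with the largest acquisition scores.
    Near-duplicate samples receive near-identical scores, so this
    baseline often queries redundant batches."""
    order = np.argsort(scores)[::-1][:B]
    return [pool_indices[i] for i in order]

# Example: replaces the single-sample argmax in the Section II loop.
scores = np.random.rand(100)          # acquisition scores for the pool
batch = top_b_batch(scores, list(range(100)), B=10)
```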
Gal and Ghahramani [11] suggest a Bernoulli approximate variational inference method which prevents CNNs from over-fitting: by considering a Bayesian prior on the weights of the network, the model becomes capable of learning from small data with no over-fitting or higher computational complexity. This work can be considered one of the foundations of the development of deep Bayesian active learning.

The authors of [12], with an emphasis on the fact that the nature of active learning does not allow a thorough comparison of models and acquisition functions, explore more than four experiments with different models and acquisition functions for multiple natural language processing tasks, and finally show that deep Bayesian active learning consistently provides the best performance.

Kandasamy et al. [13] underscore the fact that classical methods for posterior estimation are query inefficient in the sense of estimating the likelihood. They suggest that a query-efficient approach would be posterior approximation using a Bayesian active learning framework. Considering a Gaussian prior, a utility function is formed as a measure of divergence between probabilities (here, densities), and at each time step the estimated most informative query is sent to an oracle; then the posterior is updated. Their experiments confirm the query efficiency of the approach.

Reference [14] suggests that since most distance metric learning methods are sensitive to the size of the data and select the training pairs randomly, they cannot return satisfactory results in many real-world problems. The authors address this problem by first introducing a Bayesian approach to distance metric learning, and then extending this framework with uncertainty modeling, which results in a Bayesian framework for active distance metric learning. This framework enables efficient pair selection for training as well as posterior probability approximation.

Reference [15] tries to combine the benefits of Bayesian active learning and semi-supervised learning by introducing an active expectation-maximization framework with pseudo-labels. The authors use the Monte Carlo dropout introduced in [6] to compute the probability outputs.

Zeng et al. [16] address the question of whether it is possible to measure the model uncertainty without fully Bayesian CNNs. Their results on several Bayesian CNNs confirm that in order to represent the model uncertainty, one only needs to apply the Bayesian prior to a few of the last layers before the output. With this setting, the model enjoys the benefits of both deterministic and Bayesian CNNs.

Lewenberg et al. [17] address the problem of active surveying using a Bayesian active learner. In fact, they use dimensionality reduction in an active learning framework, applying a Bayesian prior, in order to design a system that predicts the answers to unasked questions using limited sequential active question asking. Their framework outperforms several state-of-the-art frameworks based on enhanced linear regression in terms of prediction accuracy and response to missing data.

Houlsby et al. [18] propose a measure of predictive entropy which is then used in a classification framework based on Gaussian processes. Their method performs well compared to several similar classification frameworks, while the computational complexity is not greater than that of other methods. Finally, they extend their framework to Gaussian process preference learning by casting binary preference learning as a classification setting. One of the main advantages of this method is that it provides the desired accuracy at a relatively low computational complexity cost.
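The acquisition of Houlsby et al. [18], often referred to as BALD, scores a candidate by the mutual information between its label and the model parameters, i.e., the predictive entropy minus the expected conditional entropy; [6] approximates it with Monte Carlo dropout samples. The sketch below computes this quantity from an array of stochastic softmax outputs; the array name, shapes, and the Dirichlet-distributed dummy data are our assumptions for the example.

```python
# BALD acquisition from T stochastic forward passes (cf. [6], [18]):
# I(y; w) = H[ E_t p_t ] - E_t H[ p_t ], computed per pool sample.
import numpy as np

def bald_scores(mc_probs):
    """mc_probs: shape (T, n_pool, n_classes), softmax outputs of T
    dropout samples. Returns mutual-information scores, shape (n_pool,)."""
    eps = 1e-12
    mean_p = mc_probs.mean(axis=0)                              # E_t p_t
    h_mean = -np.sum(mean_p * np.log(mean_p + eps), axis=-1)    # H[E_t p_t]
    mean_h = -np.sum(mc_probs * np.log(mc_probs + eps), axis=-1).mean(axis=0)
    return h_mean - mean_h                                      # >= 0

# Example with dummy dropout samples: 20 passes, 100-point pool, 10 classes.
rng = np.random.default_rng(0)
mc_probs = rng.dirichlet(np.ones(10), size=(20, 100))           # (T, pool, C)
query = int(np.argmax(bald_scores(mc_probs)))                   # point to label
```

A high BALD score flags points on which the individual dropout models are each confident but disagree with one another, i.e., points whose labels are informative about the weights.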
V. CONCLUSION

In this paper, we surveyed recent advances in Bayesian active learning, with an emphasis on deep Bayesian active learning. Our main focus is on the works contributing to the theory of this problem domain; however, some interesting works on the application of Bayesian active learning are also surveyed. Bayesian inference approaches hold a very important place in machine learning, and recently much attention has shifted toward representing data, model, and even network structure uncertainty using these approaches. Bayesian active learning and its intersection with deep learning concepts provide very interesting frameworks in terms of theory and application.

REFERENCES

[1] Settles, Burr. "Active learning literature survey." University of Wisconsin-Madison Department of Computer Sciences, 2009.
[2] Sener, Ozan, and Silvio Savarese. "Active learning for convolutional neural networks: A core-set approach." arXiv preprint arXiv:1708.00489 (2017).
[3] Taherkhani, Fariborz, Nasser M. Nasrabadi, and Jeremy Dawson. "A deep face identification network enhanced by facial attributes prediction." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 553-560. 2018.
[4] Taherkhani, Fariborz, Hadi Kazemi, and Nasser M. Nasrabadi. "Matrix completion for graph-based deep semi-supervised learning." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 5058-5065. 2019.
[5] Taherkhani, Fariborz, Hadi Kazemi, Ali Dabouei, Jeremy Dawson, and Nasser M. Nasrabadi. "A weakly supervised fine label classifier enhanced by coarse supervision." In Proceedings of the IEEE International Conference on Computer Vision, pp. 6459-6468. 2019.
[6] Gal, Yarin, Riashat Islam, and Zoubin Ghahramani. "Deep Bayesian active learning with image data." arXiv preprint arXiv:1703.02910 (2017).
[7] Mohamadi, Salman, Hamidreza Amindavar, and S. M. Ali Tayaranian Hosseini. "ARIMA-GARCH modeling for epileptic seizure prediction." In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 994-998. IEEE, 2017.
[8] Mohamadi, Salman, Farhang Yeganegi, and Nasser M. Nasrabadi. "Detection and statistical modeling of birth-death anomaly." arXiv preprint arXiv:1906.11788 (2019).
[9] Mohamadi, Salman, Farhang Yeganegi, and Hamidreza Amindavar. "A new framework for spatial modeling and synthesis of genome sequence." arXiv preprint arXiv:1908.03342 (2019).
[10] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Communications of the ACM 60, no. 6 (2017): 84-90.
[11] Gal, Yarin, and Zoubin Ghahramani. "Bayesian convolutional neural networks with Bernoulli approximate variational inference." arXiv preprint arXiv:1506.02158 (2015).
[12] Siddhant, Aditya, and Zachary C. Lipton. "Deep Bayesian active learning for natural language processing: Results of a large-scale empirical study." arXiv preprint arXiv:1808.05697 (2018).
[13] Kandasamy, Kirthevasan, Jeff Schneider, and Barnabás Póczos. "Bayesian active learning for posterior estimation." In Proceedings of the 24th International Joint Conference on Artificial Intelligence, pp. 3605-3611. 2015.
[14] Yang, Liu, Rong Jin, and Rahul Sukthankar. "Bayesian active distance metric learning." arXiv preprint arXiv:1206.5283 (2012).
[15] Rottmann, Matthias, Karsten Kahl, and Hanno Gottschalk. "Deep Bayesian active semi-supervised learning." In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 158-164. IEEE, 2018.
[16] Zeng, Jiaming, Adam Lesnikowski, and Jose M. Alvarez. "The relevance of Bayesian layer positioning to model uncertainty in deep Bayesian active learning." arXiv preprint arXiv:1811.12535 (2018).
[17] Lewenberg, Yoad, Yoram Bachrach, Ulrich Paquet, and Jeffrey S. Rosenschein. "Knowing what to ask: A Bayesian active learning approach to the surveying problem." In AAAI, pp. 1396-1402. 2017.
[18] Houlsby, Neil, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. "Bayesian active learning for classification and preference learning." arXiv preprint arXiv:1112.5745 (2011).
[19] Karpathy, Andrej, Armand Joulin, and Li F. Fei-Fei. "Deep fragment embeddings for bidirectional image sentence mapping." In Advances in Neural Information Processing Systems, pp. 1889-1897. 2014.
[20] Ghahramani, Zoubin. "Probabilistic machine learning and artificial intelligence." Nature 521, no. 7553 (2015): 452-459.
[21] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Insights and applications." In Deep Learning Workshop, ICML, vol. 1, p. 2. 2015.
[22] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." In International Conference on Machine Learning, pp. 1050-1059. 2016.
[23] Jedoui, Khaled, Ranjay Krishna, Michael Bernstein, and Li Fei-Fei. "Deep Bayesian active learning for multiple correct outputs." arXiv preprint arXiv:1912.01119 (2019).
[24] Pinsler, Robert, Jonathan Gordon, Eric Nalisnick, and José Miguel Hernández-Lobato. "Bayesian batch active learning as sparse subset approximation." In Advances in Neural Information Processing Systems, pp. 6359-6370. 2019.