LEEP: A New Measure to Evaluate Transferability of Learned Representations
Cuong V. Nguyen 1, Tal Hassner 2, Matthias Seeger 1, Cedric Archambeau 1

1 Amazon Web Services  2 Facebook AI (work done before joining Facebook). Correspondence to: Cuong V. Nguyen <[email protected]>.

Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

We introduce a new measure to evaluate the transferability of representations learned by classifiers. Our measure, the Log Expected Empirical Prediction (LEEP), is simple and easy to compute: when given a classifier trained on a source data set, it only requires running the target data set through this classifier once. We analyze the properties of LEEP theoretically and demonstrate its effectiveness empirically. Our analysis shows that LEEP can predict the performance and convergence speed of both transfer and meta-transfer learning methods, even for small or imbalanced data. Moreover, LEEP outperforms recently proposed transferability measures such as negative conditional entropy and H scores. Notably, when transferring from ImageNet to CIFAR100, LEEP can achieve up to 30% improvement compared to the best competing method in terms of the correlations with actual transfer accuracy.

1. Introduction

Transferability estimation (Eaton et al., 2008; Ammar et al., 2014; Sinapov et al., 2015) is the problem of quantitatively estimating how easy it is to transfer knowledge learned from one classification task to another. Specifically, given a source task, represented by a labeled data set or a pre-trained model, and a target task, represented by a labeled data set, transferability estimation aims to develop a measure (or a score) that can tell us, ideally without training on the target task, how effectively transfer learning algorithms can transfer knowledge from the source task to the target task.

Answering this question is important, since good estimations of transferability can help us understand the relationships between tasks (Tran et al., 2019), select groups of highly transferable tasks for joint training (Zamir et al., 2018), or choose good source models for a given target task (Achille et al., 2019; Bao et al., 2019; Bhattacharjee et al., 2019). Previous approaches to transferability estimation often require running a transfer learning algorithm that involves expensive parameter optimization (Zamir et al., 2018; Achille et al., 2019), do not have a simple interpretation (Bao et al., 2019), or make strong assumptions about the data sets that limit their applicability (Zamir et al., 2018; Tran et al., 2019).

We propose a novel measure, called the Log Expected Empirical Prediction (LEEP), for transferability estimation of deep networks that overcomes all the shortcomings above. In contrast to previous approaches, LEEP scores are obtained without training on the target task, thus avoiding the expensive parameter optimization step. Additionally, they have a simple interpretation and can be applied in general settings to a wide range of modern deep networks.

In particular, LEEP scores are obtained from a source model and a target data set by making a single forward pass of the model through the target data. This is a simpler process than previous methods, such as Taskonomy (Zamir et al., 2018) and Task2Vec (Achille et al., 2019), where one has to re-train at least part of the source model on the target data set. Furthermore, LEEP has a simple interpretation: it is the average log-likelihood of the expected empirical predictor, a simple classifier that makes predictions based on the expected empirical conditional distribution between source and target labels. Finally, LEEP does not make any assumption on the source and target input samples, except that they have the same size. This is more general and applicable than previous work (Zamir et al., 2018; Tran et al., 2019), where source and target data sets were assumed to share the same input samples.
Contributions. We formally define LEEP and rigorously analyze it, both theoretically and empirically. We show two theoretical properties of the measure: (1) LEEP is upper bounded by the average log-likelihood of the optimal model, obtained by re-training the head classifier while freezing the feature extractor; (2) LEEP is related to the negative conditional entropy measure proposed by Tran et al. (2019). We conduct extensive experiments to evaluate our LEEP measure in several scenarios. We show that the measure is useful for predicting the performance of two commonly used transfer learning algorithms – head classifier re-training (Donahue et al., 2014; Razavian et al., 2014) and model fine-tuning (Agrawal et al., 2014; Girshick et al., 2014) – not only for large target data sets, but also for small or imbalanced target data sets that are difficult to use for re-training. We also show that LEEP can predict the convergence speed of the fine-tuning method for transfer learning.

We further demonstrate that LEEP can predict the performance of a recently developed meta-transfer learning method, the Conditional Neural Adaptive Processes (Requeima et al., 2019). Meta-transfer learning (Wei et al., 2018b; Sun et al., 2019; Requeima et al., 2019) is a framework for learning to transfer using several meta-training tasks. Importantly, to our knowledge, our work is the first to develop a transferability measure for meta-transfer learning.

We empirically compare our method with the very recent negative conditional entropy measure (Tran et al., 2019) and H scores (Bao et al., 2019). Our comparisons show that LEEP correlates better with actual transfer accuracy than these methods. Finally, we demonstrate the effectiveness of LEEP for the source model selection problem in comparison with the negative conditional entropy and H scores.

2. Log Expected Empirical Prediction

Consider transfer learning between two classification tasks: a source task, represented by a pre-trained model, and a target task, represented by a labeled data set. Formally, assume the source task requires learning a model that maps input instances from a domain X = R^N to labels in a finite label set Z. Further, assume that we have already trained such a model, which we call the source model, denoted by θ. We note that X can contain text or images, since they can be flattened to vectors in R^N. Transfer learning seeks to learn a model for a target task mapping inputs from X to some finite target label set Y, given a target data set D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x_i ∈ X and y_i ∈ Y, for this purpose. We emphasize that, unlike recent previous work (Zamir et al., 2018; Tran et al., 2019), the domain X in our setting is general: the input instances of the source and target tasks are only assumed to have the same dimension, such as ImageNet (Russakovsky et al., 2015) and CIFAR (Krizhevsky, 2009) images scaled to the same size.

The transfer learning setting above is very common in computer vision for leveraging expensive pre-trained deep learning models. A common approach is to learn a model for the target task that uses the representation learned on the source with a head classifier learned on the target (Donahue et al., 2014; Razavian et al., 2014; Zeiler & Fergus, 2014; Oquab et al., 2014; Whatmough et al., 2019). We call this the head re-training method. Another popular transfer learning method, called the fine-tuning method, fine-tunes the feature extractor while re-training the new head classifier on the target data set to get a model for the target task (Agrawal et al., 2014; Girshick et al., 2014; Chatfield et al., 2014; Dhillon et al., 2020). A minimal code sketch of the two methods is given below.
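To make the distinction concrete, here is a minimal PyTorch sketch of head re-training versus fine-tuning. It is ours, for illustration only: the ResNet-18 backbone, the 100-class target, and the learning rates are assumptions, not details from the paper.

import torch
import torch.nn as nn
from torchvision import models

num_target_classes = 100  # hypothetical size of the target label set Y

# Source model theta, pre-trained on the source task (here: ImageNet).
model = models.resnet18(pretrained=True)

# Replace the source head classifier with a new head for the target labels.
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# Head re-training: freeze the feature extractor, train only the new head.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc.")
head_optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2)

# Fine-tuning (the alternative): train all parameters, feature extractor
# included, typically with a smaller learning rate.
for param in model.parameters():
    param.requires_grad = True
finetune_optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

In both variants the source head is replaced by a new head for the target label set Y; the only difference is whether the feature extractor's parameters are updated during training on D.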
In this work, we study the transferability estimation problem, which aims to develop a measure (or a score) that can tell us, without training on the target data set, how effectively these transfer learning algorithms can transfer knowledge learned in the source model θ to the target task, using the target data set D. We emphasize that although transferability estimation can tell us the effectiveness of transfer learning, it is not a solution for transfer learning in itself. Instead, transfer learning is achieved by, e.g., the head re-training or fine-tuning methods above. In our experiments in Sec. 5, we will use these transfer learning methods to test our transferability measure.

We now describe our proposed measure, LEEP, which requires no expensive training on the target task and offers an a priori estimate of how well a model will transfer to the target task. The measure can be efficiently computed from the source model θ and the target data set D. The computation involves the following three steps.

Step 1: Compute dummy label distributions of the inputs in the target data set D. We apply θ to each input x_i in D to get the predicted distribution over Z, the label set of the source task. We denote this predicted distribution by θ(x_i), which is a categorical distribution over Z. Note that θ(x_i) is a dummy distribution over labels of the source task, since these labels may not be meaningful for the example x_i. For instance, if θ is a model pre-trained on ImageNet and D is the CIFAR data set, then θ(x_i) is a distribution over ImageNet labels, which may not be semantically related to the true label of x_i in the CIFAR data set.

Step 2: Compute the empirical conditional distribution P̂(y|z) of the target label y given the source label z. We next compute the empirical conditional distribution P̂(y|z) for all (y, z) ∈ Y × Z. To this end, we first compute the empirical joint distribution P̂(y, z) for all label pairs (y, z) ∈ Y × Z:

    P̂(y, z) = (1/n) Σ_{i : y_i = y} θ(x_i)_z,

where θ(x_i)_z is the probability of the source label z under the categorical distribution θ(x_i).
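Combining the steps, here is a minimal NumPy sketch of the computation; it is ours, not from the paper, and the function name, array layout, and integer label encoding are our own conventions. Step 3 follows the interpretation given in the introduction: the LEEP score is the average log-likelihood of the expected empirical predictor, which predicts target labels via the expected empirical conditional distribution.

import numpy as np

def leep_score(dummy_probs: np.ndarray, target_labels: np.ndarray) -> float:
    # dummy_probs:   n x |Z| array; row i is theta(x_i), the source model's
    #                predicted (dummy) distribution over source labels Z for
    #                target input x_i (Step 1, one forward pass over D).
    # target_labels: length-n array of target labels y_i in {0, ..., |Y|-1}.
    n, num_source_labels = dummy_probs.shape
    num_target_labels = int(target_labels.max()) + 1

    # Step 2a: empirical joint P_hat(y, z) = (1/n) * sum over i with y_i = y
    # of theta(x_i)_z.
    joint = np.zeros((num_target_labels, num_source_labels))
    for y in range(num_target_labels):
        joint[y] = dummy_probs[target_labels == y].sum(axis=0) / n

    # Step 2b: empirical conditional P_hat(y | z) = P_hat(y, z) / P_hat(z),
    # where the marginal P_hat(z) is the joint summed over y.
    marginal_z = joint.sum(axis=0, keepdims=True)   # shape 1 x |Z|
    conditional = joint / marginal_z                # shape |Y| x |Z|

    # Step 3 (per the introduction): the expected empirical predictor gives
    # p(y | x_i) = sum over z of P_hat(y | z) * theta(x_i)_z; LEEP is its
    # average log-likelihood on the target data.
    eep = dummy_probs @ conditional.T               # shape n x |Y|
    return float(np.mean(np.log(eep[np.arange(n), target_labels])))

Here dummy_probs would hold the softmax outputs collected during the single forward pass of Step 1, so the whole computation requires no training on the target task; a higher (less negative) LEEP score predicts more effective transfer from the source model to the target task.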