A Survey on Negative Transfer

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. -, NO. -, 2021

Wen Zhang, Lingfei Deng, Lei Zhang, Senior Member, IEEE, Dongrui Wu, Senior Member, IEEE

• Wen Zhang, Lingfei Deng and Dongrui Wu are with the Key Laboratory of the Ministry of Education for Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China.
• Lei Zhang is with the School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China.
• Wen Zhang and Lingfei Deng contributed equally to this work. Dongrui Wu is the corresponding author.

arXiv:2009.00909v4 [cs.LG] 9 Aug 2021

Abstract: Transfer learning (TL) utilizes data or knowledge from one or more source domains to facilitate the learning in a target domain. It is particularly useful when the target domain has very few or no labeled data, due to annotation expense, privacy concerns, etc. Unfortunately, the effectiveness of TL is not always guaranteed. Negative transfer (NT), i.e., leveraging source domain data/knowledge undesirably reduces the learning performance in the target domain, has been a long-standing and challenging problem in TL. Various approaches have been proposed in the literature to handle it. However, there does not exist a systematic survey on the formulation of NT, the factors leading to NT, and the algorithms that mitigate NT. This paper fills this gap, by first introducing the definition of NT and its factors, then reviewing about fifty representative approaches for overcoming NT, according to four categories: secure transfer, domain similarity estimation, distant transfer, and NT mitigation. NT in related fields, e.g., multi-task learning, lifelong learning, and adversarial attacks, is also discussed.

Index Terms: Negative transfer, transfer learning, domain adaptation, domain similarity

1 INTRODUCTION

A common assumption in traditional machine learning is that the training data and the test data are drawn from the same distribution. However, this assumption does not hold in many real-world applications. For example, two image datasets may contain images taken using cameras with different resolutions under different light conditions; different people may demonstrate strong individual differences in brain-computer interfaces [1]. Therefore, the resulting machine learning model may generalize poorly.

A conventional approach to mitigate this problem is to re-collect a large amount of labeled or partly labeled data, which have the same distribution as the test data, and then train a machine learning model on the new data. However, many factors may prevent easy access to such data, e.g., high annotation cost, privacy concerns, etc.

A better solution to the above problem is transfer learning (TL) [2], or domain adaptation (DA) [3], which tries to utilize data or knowledge from related domains (called source domains) to facilitate the learning in a new domain (called target domain). TL was first studied in educational psychology to enhance humans' ability to learn new tasks and to solve novel problems [4]. In machine learning, TL is mainly used to improve a model's ability to generalize in the target domain, which usually has zero or a very small number of labeled data. Many different TL approaches have been proposed, e.g., traditional (statistical) TL [5]–[9], deep TL [10], [11], adversarial TL [12], [13], etc.

Unfortunately, the effectiveness of TL is not always guaranteed, unless its basic assumptions are satisfied: 1) the learning tasks in the two domains are related/similar; 2) the source domain and target domain data distributions are not too different; and, 3) a suitable model can be applied to both domains. Violations of these assumptions may lead to negative transfer (NT), i.e., introducing source domain data/knowledge undesirably decreases the learning performance in the target domain, as illustrated in Fig. 1. NT is a long-standing and challenging problem in TL [2], [14], [15].

Fig. 1. Illustration of NT: introducing source domain data/knowledge decreases the target domain learning performance.

The following fundamental problems need to be addressed for reliable TL [2]: 1) what to transfer; 2) how to transfer; and, 3) when to transfer. Most TL research [3], [16] focused only on the first two, whereas all three should be taken into consideration to avoid NT. To our knowledge, NT was first studied in 2005 [14], and has received increasing attention recently [15], [17], [18]. Various ideas, e.g., finding similar parts of domains, evaluating the transferability of different tasks/models/features, etc., have been explored.

Though very important, there does not exist a comprehensive survey on NT. This paper aims to fill this gap, by systematically reviewing about fifty representative approaches to cope with NT. We mainly consider homogeneous and closed-set classification problems in TL, i.e., the source and target tasks are the same, and the target feature and label spaces are also unchanged during testing. This is the most studied TL scenario. We introduce the definition and factors of NT, methods that can avoid NT under theoretical guarantees, methods that can mitigate NT to a certain extent, and some related fields. Articles that do not explain their methods from the perspective of NT are not included in this survey, to keep it more focused.

The remainder of this paper is organized as follows. Section 2 introduces background knowledge in TL and NT. Section 3 proposes a scheme for reliable TL. Sections 4-7 review secure transfer, domain similarity estimation, distant transfer, and NT mitigation strategies, respectively. Section 8 introduces several related machine learning fields. Section 9 compares all reviewed approaches. Finally, Section 10 draws conclusions and points out some future research directions.

2 BACKGROUND KNOWLEDGE

This section introduces some background knowledge on TL and NT, including the notations, definitions and categorizations of TL, and factors of NT.

2.1 Notations and Definitions

We consider classifiers with $K$ categories, with an input feature space $\mathcal{X}$ and an output label space $\mathcal{Y}$. Assume we have access to one labeled source domain $\mathcal{S} = \{(x_s^i, y_s^i)\}_{i=1}^{n_s}$ drawn from $P_S(X, Y)$, where $X \subseteq \mathcal{X}$ and $Y \subseteq \mathcal{Y}$. The target domain consists of two sub-datasets: $\mathcal{T} = (\mathcal{T}_l, \mathcal{T}_u)$, where $\mathcal{T}_l = \{(x_l^j, y_l^j)\}_{j=1}^{n_l}$ consists of $n_l$ labeled samples drawn from $P_T(X, Y)$, and $\mathcal{T}_u = \{x_u^k\}_{k=1}^{n_u}$ consists of $n_u$ unlabeled samples drawn from $P_T(X)$. The main notations are summarized in Table 1.

TABLE 1
Main notations in this survey.

Notation          Description        Notation                      Description
$x$               Feature vector     $\ell(\cdot)$; $L(\cdot)$     Loss function
$y$               Label of $x$       $h$                           Hypothesis
$\mathcal{X}$     Feature space      $f$                           Classifier
$\mathcal{Y}$     Label space        $\theta$                      TL algorithm
$\mathcal{S}$     Source domain      $g$                           Feature extractor
$\mathcal{T}$     Target domain      $\epsilon$                    Error (risk)
$P(\cdot)$        Distribution       $n$                           Number of samples
$E(\cdot)$        Expectation        $K$                           Number of classes
$d(\cdot)$        Distance metric    $M$                           No. of source domains

In TL, the condition that the source and target domains are different (i.e., $\mathcal{S} \neq \mathcal{T}$) implies one or more of the following:

1) The feature spaces are different, i.e., $\mathcal{X}_S \neq \mathcal{X}_T$.
2) The label spaces are different, i.e., $\mathcal{Y}_S \neq \mathcal{Y}_T$.
3) The marginal probability distributions of the two domains are different.

2.2 TL Categorization

According to [2], TL approaches can be categorized into four groups: instance based, feature based, model/parameter based, and relation based.

Instance based approaches mainly focus on sample weighting, assuming the distribution discrepancy between the source and target domains is caused by a sample selection bias, which can be compensated by reusing a certain portion of the weighted source domain data [3], [8].

Feature based approaches aim to find a latent subspace or representation to match the two domains, assuming there exists a common space in which distribution discrepancies of different domains can be minimized [3].

Model/parameter based approaches transfer knowledge via parameters, assuming the distributions of model parameters in different domains are the same or similar [19], [20].

Relation based approaches assume that some internal logical relationships or rules in the source domain are preserved in the target domain.

More details on TL can be found in [2], [3], [21], [22].

2.3 NT

Rosenstein et al. [14] first discovered NT through experiments, and concluded that “transfer learning may actually hinder performance if the tasks are too dissimilar” and “inductive bias learned from the auxiliary tasks will actually hurt performance on the target task.” Pan et al. [2] also briefly mentioned NT in their TL survey: “When the source domain and target domain are not related to each other, brute-force transfer may be unsuccessful. In the worst case, it may even hurt the performance of learning in the target domain, a situation which is often referred to as negative transfer.”

Wang et al. [15] gave a mathematical definition of NT, and proposed a negative transfer gap (NTG) to determine whether NT happens.

Definition 1. (Negative transfer gap [15]). Let $\epsilon_T$ be the test error in the target domain, $\theta(\mathcal{S}, \mathcal{T})$ a TL algorithm between $\mathcal{S}$ and $\mathcal{T}$, and $\theta(\emptyset, \mathcal{T})$ the same algorithm without using any source domain information. Then, NT happens when $\epsilon_T(\theta(\mathcal{S}, \mathcal{T})) > \epsilon_T(\theta(\emptyset, \mathcal{T}))$, and the degree of NT can be evaluated by the NTG:

$$\mathrm{NTG} = \epsilon_T(\theta(\mathcal{S}, \mathcal{T})) - \epsilon_T(\theta(\emptyset, \mathcal{T})). \qquad (1)$$

Obviously, NT occurs if the NTG is positive. However, NTG may not always be computable. For example, in an unsupervised scenario, $\epsilon_T(\theta(\emptyset, \mathcal{T}))$ is impossible to compute due to the lack of labeled target data.
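As a rough illustration of how the NTG in (1) could be estimated when labeled target data are available, the following Python sketch compares the target test error of a model trained with and without source data. It is only a toy setup under stated assumptions, not the procedure of [15]: the nearest-centroid classifier, the naive pooling of source and labeled target samples standing in for the TL algorithm θ, the one-dimensional features, and all function names are hypothetical choices made for this example.

```python
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, int]  # (1-D feature value, class label); an assumption of this sketch


def fit_centroids(data: Sequence[Sample]) -> Dict[int, float]:
    """Train a nearest-centroid classifier: store the mean feature of each class."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids: Dict[int, float], x: float) -> int:
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def target_error(centroids: Dict[int, float], test: Sequence[Sample]) -> float:
    """Empirical target test error (fraction of misclassified test samples)."""
    wrong = sum(predict(centroids, x) != y for x, y in test)
    return wrong / len(test)


def theta(source: Sequence[Sample], target_labeled: Sequence[Sample]) -> Dict[int, float]:
    """Hypothetical stand-in for a TL algorithm: pool source and labeled target data.

    Passing an empty source corresponds to theta(emptyset, T) in Definition 1.
    """
    return fit_centroids(list(source) + list(target_labeled))


def negative_transfer_gap(
    source: Sequence[Sample],
    target_labeled: Sequence[Sample],
    target_test: Sequence[Sample],
) -> float:
    """NTG = error with source data minus error without source data, as in Eq. (1)."""
    err_with_source = target_error(theta(source, target_labeled), target_test)
    err_without_source = target_error(theta([], target_labeled), target_test)
    return err_with_source - err_without_source


if __name__ == "__main__":
    target_labeled: List[Sample] = [(0.0, 0), (1.0, 1)]
    target_test: List[Sample] = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1), (1.1, 1)]
    similar_source: List[Sample] = [(0.1, 0), (-0.1, 0), (0.9, 1), (1.1, 1)]
    shifted_source: List[Sample] = [(2.0, 0), (2.2, 0), (1.8, 0), (3.0, 1), (3.2, 1), (2.8, 1)]
    print("NTG, similar source:", negative_transfer_gap(similar_source, target_labeled, target_test))
    print("NTG, shifted source:", negative_transfer_gap(shifted_source, target_labeled, target_test))
```

In this toy setup, the shifted source pulls the class centroids away from the target data and yields a positive NTG, whereas the similar source leaves the target error unchanged. The sketch needs a labeled target test set, which is exactly what is missing in the unsupervised scenario mentioned above.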
