Multi-Source Domain Adaptation with Distribution Fusion and Relationship Extraction

Multi-Source Domain Adaptation with Distribution Fusion and Relationship Extraction 1st Keqiuyin Li 2nd Jie Lu 3rd Hua Zuo Faculty of Engineering and IT Faculty of Engineering and IT Faculty of Engineering and IT University of Technology Sydney University of Technology Sydney University of Technology Sydney Sydney, NSW, Australia Sydney, NSW, Australia Sydney, NSW, Australia [email protected] [email protected] [email protected] 4th Guangquan Zhang Faculty of Engineering and IT University of Technology Sydney Sydney, NSW, Australia [email protected] Abstract—Transfer learning is gaining increasing attention discrepancy of distributions between the source and target due to its ability to leverage previously acquired knowledge domains in a latent feature space, including adaptations of to assist in completing a prediction task in a similar domain. marginal distribution [10], conditional distribution [11] and While many existing transfer learning methods deal with single source and single target problem without considering the fact joint distribution [12]. that a target domain maybe similar to multiple source domains, Zellinger et al. [13] proposed a novel metric function this work proposes a multi-source domain adaptation method named central moment discrepancy to measure the distance based on a deep neural network. Our method contains common between probability distributions which can be solved without feature extraction, specific predictor learning and target predictor highly complex and costly kernel matrix computations. Zuo estimation. Common feature extraction explores the relationship between source domains and target domain by distribution fusion et al. [14], [15] used fuzzy system and granular computing and guarantees the strength of similar source domains during to achieve regression transfer in homogeneous and hetero- training, something which has not been well considered in ex- geneous feature spaces. Benefiting from the development of isting works. Specific predictor learning trains source tasks with deep learning, recent surveys employed deep neural networks cross-domain distribution constraint and cross-domain predictor to extract common features of source and target domains. constraint to enhance the performance of single source. Target predictor estimation employs relationship extraction and selective Based on adversarial learning, Long et al. [16] proposed strategy to improve the performance of the target task and to joint adaptation networks which used joint maximum mean avoid negative transfer. Experiments on real-world visual datasets discrepancy to align the joint distributions across domains in show the performance of the proposed method is superior to other hidden specific layers. Liu et al. [17] and Jang et al. [18] deep learning baselines. explored the transferability of deep feature representations and I. INTRODUCTION provided experiments and theory analysis on what and where to transfer in deep networks. Transfer learning [1], [2] has been explored for many years However, all these studies focused on single source domain and gained success in a variety of applications in real-world adaptation while, in practice, a target domain can be similar scenarios, such as nature language processing, computer vision to multiple source domains, and they may be different from and biological problems. The main goal of transfer learning each other but could provide richer information for transfer. is to improve the performance of the target task using the It is easy to see why multi-source domain adaptation now knowledge learned from a similar source domain, since the attracts greater attention. Zhao et al. [19] and Wen et al. [20] target training data is often difficult to collect or expensive measured the discrepancy by H-divergence and proposed an to label, especially where medicine is involved. Employing adversarial strategy based deep framework to solve multiple different transfer information, it can be divided into four cat- sources domain adaptation both for classification and regres- egories: instance-based method [3], feature-based method [4], sion problems. Guo et al. [21] proposed a mixture-of-experts parameter-based method [5] and relationship-based method method to do text classification and speech tagging with [6]. multiple sources, where Mahalanobia distance and confidence Based on feature and parameter transformation, one popular score were used to extract the relationship between each technique to achieve transfer learning is domain adaptation source domain and target domain. Ding et al. [22] tried to [7]–[9], which aims to tackle domain shift by reducing the achieve domain adaptation with multiple incomplete source This work was supported by the Australian Research Council under grant domains via low-rank matrix which could recover missing FL190100149.(Corresponding author: Jie Lu.) labels in source domains based on latent features from a target 978-1-7281-6926-2/20/$31.00 ©2020 IEEE domain. Redko et al. [23] and Xu et al. [24] solved multiple A. Common Feature Extraction sources domain adaptation under target shift and category In general, domain adaptation is an unsupervised learning shift, where source labels might not completely share labels task. For single source domain adaptation, given source do- of target domain or share labels with different proportions. i i ns j nt main Ds = f(xs; ys)gi=1 and target domain Dt = fxt gj=1, Zhu et al. [25] proposed a two-stage alignment framework for where xs; xt 2 X represent samples, ys 2 Y indicates multiple sources domain adaptation, in which domain-specific corresponding label of xs and ns; nt indicate the number of distribution alignment was used to reduce discrepancy between samples in source domain and target domain respectively. The source domains and target domain; domain-specific classifier main step is mapping distributions of source domain and target alignment was used to reduce difference among all classifiers. domain. One popular method to achieve this is maximum mean The main idea of all described multi-source domain adap- discrepancy (MMD) [10] which can be formulated as: tation begins with extracting common features of source do- 2 mains and target domain before training specific predictors of ns nt 1 X i 1 X j MMD(Xs;Xt) = φ(x ) − φ(x ) ; (1) each source domain, finally combining all specific predictors n s n t s i=1 t j=1 as the target predictor. The popular combination rules include H the average of source predictors and weighted average of where k·kH is reproducing kernel Hillbert space (RKHS) source predictors. Although some of these approaches have norm, φ is kernel-induced feature map. considered the connections between different source domains K In our setting, when there are K source domains fDsk gk=1, and target domain, they still treat each source domain equally to map original samples into a common feature space where during training. It has been proven that adding irrelevant domain adaptation can be achieved, we use a pre-trained deep source samples may lead to negative transfer and reduce the neural network Fp to collect latent features of all domains. At performance of the learning task [26]. Measuring relationship the same time, fine-tuning block Ffc, which is optimized by among source domains and target domain is necessary to avoid distribution fusion block Fd and specific domain adaptation negative performance. In this paper, in order to learn weights block (we will explain it in detail in section II-B), is added of multiple sources to get high performance of predictor in to fine-tune convolution layers in order to extract more robust target domain, following work [25] and domain matching latent representations. The extracted latent features are then method proposed by Li et al. [27], we proposed a deep neural used to complete specific domain adaptation. Common feature network based multi-source domain adaptation approach with extraction can be formulated as: distribution fusion and relationship extraction. This measures i i fc = Ffc(Fp(xs )); relatedness among sources and target during training and sk k guarantees that more similar sources contribute more to target f j = F (F (xj)); (2) ct fc p t task learning. Our main contributions are as follows: i = 1; 2; : : : ; nsk ; j = 1; 2; : : : ; nt: • We propose a method using distribution fusion and re- Rewrite (1) according to the proposed common feature lationship extraction to guarantee the strength of similar extractor and express it as multi-source setting, that is: source domains during learning a target task, something which is not well considered in existing methods; MMD(Xsk ;Xt) = 2 • We use a selective strategy to choose the best perfor- ns k nt (3) mance of the target task and to avoid negative transfer of 1 X i 1 X j φ(fc ) − φ(fc ) : n sk n t irrelevant sources. sk i=1 t j=1 H The structure of this paper is designed as follows: Section Before fine-tuning common features extracted by (2) using II describes the proposed model. Section III carries a series specific domain adaptation block and estimating RKHS dis- of experiments on real-world datasets and analyses the results. tance between each source domain and target domain, we first Our conclusion is given in section IV. add the distribution fusion block to optimize the parameters of the common feature fine-tuning block. So, corresponding II. PROPOSED METHOD loss function can be written as: L = MMD(X^ ;X ) The proposed method contains the following three parts: Fd s t 2 n (1) common feature extraction where pre-trained deep neural sk nt 1 X i 1 X j network, fine-tuning operation and distribution fusion strategy = Fd φ(fc ) − φ(fc ) n sk n t sk i=1 t j=1 are employed to collect robust features; (2) specific predictor H (4) 2 learning where specific domain adaptation and predictor of n K sk nt X 1 X 1 X each source domain are learned with cross-domain constraints; = ! φ(f i ) − φ(f j ) ; k cs ct (3) target predictor estimation where relationship extraction ns k nt k=1 k i=1 j=1 and selective strategy are used to obtain target predictor H without negative transfer.

Multi-Source Domain Adaptation with Distribution Fusion and Relationship Extraction

Originally, the Descendants of Hua Xia Were Not the Descendants of Yan Huang

Negotiation Philosophy in Chinese Characters

English Versions of Chinese Authors' Names in Biomedical Journals

Delong Zuo Associate Professor Department of Civil and Environmental Engineering Texas Tech University

From Translation to Adaptation: Chinese Language Texts and Early Modern Japanese Literature

Man Hua Zuo P N Ke Xian Li De Qiu Fang Wei J N Keion Shouno Zou Zh Yue Gu Ng Tiao Li Gett Robo Xu N Hu Sh Ng Mai Yue Gu Ng Ji Mian

Performing Chinese Contemporary Art Song

Names of Chinese People in Singapore

CJK NACO Searching

The Acquisition of Mandarin Prosody by American Learners of Chinese As a Foreign Language

Isolation Forest

1 a Syntactic Approach to the Grammaticalization of the Modal Marker Dāng 當 in Middle Chinese Barbara Meisterernst, National