
Class Selection for Partial Domain Adaptation

Fariba Zohrizadeh, Mohsen Kheirandishfard, Farhad Kamangar
Department of Computer Science and Engineering, University of Texas at Arlington
{fariba.zohrizadeh,mohsen.kheirandishfard,farhad.kamangar}@uta.edu

Abstract

Domain adaptation is the task of transferring knowledge from a labeled source dataset to an unlabeled target dataset. Partial domain adaptation (PDA) investigates the scenarios in which the target label space is a subset of the source label space. The main purpose of PDA is to identify the shared classes between the domains and promote learning transferable knowledge from these classes. Inspired by the idea of subset selection, we propose an adversarial PDA approach which aims to not only automatically select the most relevant subset of source domain classes but also ignore the samples that are less transferable across the domains. In the absence of target labels, the proposed approach is able to effectively learn domain-invariant feature representations, which in turn can facilitate and enhance the classification performance in the target domain. Empirical results on the Office-31 and Office-Home datasets demonstrate the high potential of the proposed approach in addressing different partial domain adaptation tasks.

1. Introduction

Deep neural networks have demonstrated superior performance in a variety of machine learning problems such as semantic segmentation [5, 11, 17], object detection, and classification [16, 24, 30]. These impressive achievements heavily depend on the availability of large amounts of labeled training data. However, in many applications, the acquisition of sufficient labeled data is difficult and time-consuming. One potential solution to reduce the labeling cost is to build an effective predictive model using richly-annotated datasets from different but related domains. However, this paradigm generally suffers from the domain shift between the distributions of the source and the target datasets. As a result, deep networks trained on labeled source datasets often exhibit unsatisfactory performance on the target domain classification task. In the absence of target labels, unsupervised domain adaptation (UDA) seeks to bridge different domains by learning feature representations that are both discriminative and domain-invariant [1, 12, 21].

Recently, various approaches have been proposed to combine domain adaptation and deep feature learning in a unified framework for exploiting more transferable knowledge across domains [6, 7, 15, 18, 31, 37] (see [34] for a comprehensive survey on deep domain adaptation methods). A class of deep domain adaptation methods aims to reduce the misfit between the distributions of the source and target domains by minimizing discrepancy measures such as maximum mean discrepancy [15, 18] and correlation distance [27, 29]. In this way, they map the domains into the same latent space, which results in learning feature representations that are domain-invariant. A new line of research has recently emerged which uses the idea of generative adversarial networks [13] to align feature distributions across the domains and learn discriminators that are able to predict the domain labels of different samples [19, 23, 35]. Specifically, these methods try to generate feature representations that are difficult for the discriminators to differentiate.

Despite the advantages offered by the existing UDA methods, they mostly exhibit superior performance in scenarios in which the source and target domains share the same label space. With the goal of considering more realistic cases, [4] introduced partial domain adaptation (PDA) as a new adaptation scenario which assumes the target domain label space is a subset of the source domain label space. The primary challenge in PDA is to identify and reject the source domain classes that do not appear in the target domain, known as outlier classes, since they may have a negative impact on the transfer performance [3, 22]. Addressing this challenge enables PDA methods to effectively transfer models learned on large labeled datasets (e.g., ImageNet) to small-scale datasets from different but related domains.

In this paper, we propose an adversarial approach for partial domain adaptation which aims to not only automatically reject the outlier source classes, but also down-weight the relative importance of irrelevant samples, i.e., those samples that are highly dissimilar across different domains. Our method uses the same network architecture as partial adversarial domain adaptation (PADA) [4] and incorporates two additional regularization terms to boost the target domain classification performance. Inspired by the idea of subset selection, the first regularization is a row-sparsity term on the output of the classifier, which promotes the selection of a small subset of classes that are in common between the source and target domains. The second regularization is a minimum entropy term which utilizes the output of the discriminator to down-weight the relative importance of irrelevant samples from both domains. We empirically observe that our method can effectively enhance the target classification accuracy on different PDA tasks.

2. Related Work

To date, various deep unsupervised domain adaptation methods have been developed to extract domain-invariant feature representations from different domains. Some studies [9, 15, 18, 20, 36] have proposed to minimize the maximum mean discrepancy between the source and target distributions. In [28], a correlation alignment (CORAL) method is proposed that utilizes a linear transformation to match the second-order statistics between the domains. [29] presented an extension of the CORAL method that aligns correlations of layer activations in deep networks by learning a non-linear transformation. Despite the practical success of the aforementioned methods in aligning the domain distributions, it has been shown that they are unable to completely eliminate the domain shift [7, 37].

Recently, adversarial learning has been widely employed to enhance the performance of UDA methods [2, 8, 10, 19, 32]. The basic idea behind the adversarial-based methods is to train a discriminator for predicting domain labels and a deep network for extracting features that are indistinguishable by the discriminator. By doing so, the discrepancy between the source and target domains can be efficiently eliminated, which results in significant improvement in the overall classification performance [8, 23, 32]. [39] developed an incremental adversarial scheme which gradually reduces the gap between the domain distributions by iteratively selecting high-confidence pseudo-labeled target samples to enlarge the training set.

Towards the task of PDA, several methods have recently been developed which simultaneously promote positive transfer from the common classes between the domains and alleviate the negative transfer from the outlier classes [3, 4, 38]. Selective adversarial network [3] trains separate domain discriminators for each source class to align the distributions of the source and target domains across the shared label space and to ignore the outlier classes. Partial adversarial domain adaptation (PADA) [4] proposed a new architecture which assigns a weight to each source domain class based on the target label predictions and automatically reduces the weights of the outlier classes. Importance weighted adversarial nets [38] develops a two-domain-classifier strategy to estimate the relative importance of the source domain samples.

Closely related to our work, transferable attention for domain adaptation (TADA) [35] proposed an attention-based mechanism for UDA which can highlight transferable regions or images. Unlike TADA, our method focuses on the PDA problem and utilizes a different network architecture with a novel loss that efficiently assigns weights to both classes and samples. Our method differs from PADA [4] in the sense that we incorporate two novel regularization terms which are not only able to discover and reject the outlier classes more effectively but also down-weight the relative importance of the irrelevant samples in the training procedure.

3. Problem Formulation

This section briefly reviews two well-established domain adaptation methods and then provides a detailed explanation of how our proposed method relates to them. Let $\{(x_s^i, y_s^i)\}_{i=1}^{n_s}$ be a set of $n_s$ sample points drawn i.i.d. from the source domain $\mathcal{D}_s$, where $x_s^i$ denotes the $i$-th source image with label $y_s^i$. Similarly, let $\{x_t^i\}_{i=1}^{n_t}$ be a set of $n_t$ sample points collected i.i.d. from the target domain $\mathcal{D}_t$, where $x_t^i$ indicates the $i$-th target image. To clarify notation, let $\mathcal{X} = \mathcal{X}_s \cup \mathcal{X}_t$ be the set of all images from both domains, where $\mathcal{X}_s = \{x_s^i\}_{i=1}^{n_s}$ and $\mathcal{X}_t = \{x_t^i\}_{i=1}^{n_t}$. The UDA methods assume that the source and target domains possess the same label space, denoted by $\mathcal{C}_s$ and $\mathcal{C}_t$, respectively. In the absence of target labels, the primary goal of UDA is to learn domain-invariant feature representations that can reduce the domain shift. One promising direction to achieve this goal is to train a domain adversarial neural network [8], which consists of a discriminator $G_d$ for predicting the domain labels, a feature extractor $G_f$ for confusing the discriminator by learning transferable feature representations, and a classifier $G_y$ that classifies the source domain samples. Training the adversarial network is equivalent to solving the following optimization problem

$$\max_{\theta_d}\;\min_{\theta_y,\theta_f}\;\;\frac{\lambda}{n}\sum_{x^i\in\mathcal{X}_s} L_y\big(G_y(G_f(x^i;\theta_f);\theta_y),\,y^i\big)\;-\;\frac{1}{n}\sum_{x^i\in\mathcal{X}} L_d\big(G_d(G_f(x^i;\theta_f);\theta_d),\,d^i\big),$$

where $n = n_s + n_t$, $\lambda > 0$ is a regularization parameter, $y^i$ is the one-hot class label of image $x^i$, and $d^i \in \{0,1\}$ denotes its domain label; $d^i = 0$ if $x^i$ belongs to the source domain and $d^i = 1$ otherwise. $L_y$ and $L_d$ are cross-entropy loss functions corresponding to the classifier $G_y$ and the domain discriminator $G_d$, respectively. Moreover, the variables $\theta_f$, $\theta_y$, and $\theta_d$ are the parameters associated with the networks $G_f$, $G_y$, and $G_d$, respectively. For brevity of notation, we drop the reference to the parameters $\theta_f$, $\theta_y$, and $\theta_d$ in the subsequent formulations.

Figure 1: Overview of the proposed adversarial network for partial transfer learning. The network consists of a feature extractor, a classifier, and a domain discriminator, denoted by $G_f$, $G_y$, and $G_d$, respectively. The blue and green arrows depict the source flow and the target flow. The loss functions $L_y$, $L_d$, $L_e$, and $L_\infty$ denote the classification loss, the discriminative loss, the entropy loss, and the selection loss. Best viewed in color.

As noted earlier, standard domain adaptation approaches assume that the source and target domains possess the same label space, i.e., $\mathcal{C}_s = \mathcal{C}_t$. This assumption may not be fulfilled in a wide range of practical applications in which $\mathcal{C}_s$ is large and diverse (e.g., ImageNet) and $\mathcal{C}_t$ only contains a small subset of the source classes, i.e., $\mathcal{C}_t \subset \mathcal{C}_s$. Under this assumption, aligning the domain distributions may not necessarily facilitate the classification task in the target domain due to the adverse effect of transferring information from the outlier classes $\mathcal{C}_s \setminus \mathcal{C}_t$ [3, 4]. Hence, the primary goal in partial domain adaptation is to learn a feature extractor that can align the distributions of the source and target domains across the shared label space and simultaneously identify and reject the outlier classes. A classifier trained along with such a feature extractor can generalize well to the target domain. To this end, PADA [4] proposed the following weighting procedure to highlight the shared classes and reduce the importance of the outlier classes

$$\gamma = \frac{1}{n_t}\sum_{i=1}^{n_t} \hat{y}_t^i,$$

where $\hat{y}_t^i = G_y(G_f(x_t^i))$ denotes the output of $G_y$ for the target sample $x_t^i$. The weighting vector $\gamma$ is further normalized as $\gamma \leftarrow \gamma / \max(\gamma)$ to show the relative weights of the classes. The weights associated with the outlier classes are expected to be much smaller than those of the shared classes, mainly because the target samples are significantly dissimilar to the samples belonging to the outlier classes. Ideally, $\gamma$ is expected to be a vector whose elements are non-zero except for those corresponding to the outlier classes. Given this weighting, PADA proposes to train the adversarial network by solving the following minimax optimization problem

$$\max_{\theta_d}\;\min_{\theta_y,\theta_f}\;\;\frac{\lambda}{n_s}\sum_{x^i\in\mathcal{X}_s} \gamma_{c_i}\, L_y\big(G_y(G_f(x^i)),\,y^i\big)\;-\;\frac{1}{n_s}\sum_{x^i\in\mathcal{X}_s} \gamma_{c_i}\, L_d\big(G_d(G_f(x^i)),\,d^i\big)\;-\;\frac{1}{n_t}\sum_{x^i\in\mathcal{X}_t} L_d\big(G_d(G_f(x^i)),\,d^i\big),$$

where $c_i = \arg\max_j y_j^i$ denotes the index of the largest entry of $y^i$.
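To make the weighting procedure concrete, the following is a minimal PyTorch-style sketch, not the authors' released code: function and tensor names such as `estimate_class_weights` and `target_logits` are illustrative assumptions. It shows how the class-weight vector $\gamma$ can be estimated from the target predictions and used to re-weight the per-sample losses of the PADA objective; in practice the max over $\theta_d$ and min over $\theta_f$ are typically realized with a gradient-reversal layer between $G_f$ and $G_d$, as in DANN [8].

```python
import torch
import torch.nn.functional as F

def estimate_class_weights(target_logits):
    """gamma: average softmax prediction over all target samples, normalized by
    its maximum, so shared classes get weights near 1 and outlier classes near 0."""
    probs = F.softmax(target_logits, dim=1)      # shape (n_t, |C_s|)
    gamma = probs.mean(dim=0)                    # shape (|C_s|,)
    return gamma / gamma.max()

def pada_losses(src_cls_logits, src_labels, src_dom_logits, tgt_dom_logits,
                gamma, lam=1.0):
    """Per-batch version of the weighted classification and domain losses; each
    source sample is scaled by the weight gamma_{c_i} of its ground-truth class."""
    w = gamma[src_labels]                        # gamma_{c_i} for every source sample
    cls_loss = lam * (w * F.cross_entropy(src_cls_logits, src_labels,
                                          reduction="none")).mean()
    # Domain labels: 0 for source samples, 1 for target samples.
    d_src = torch.zeros(src_dom_logits.size(0), dtype=torch.long,
                        device=src_dom_logits.device)
    d_tgt = torch.ones(tgt_dom_logits.size(0), dtype=torch.long,
                       device=tgt_dom_logits.device)
    dom_loss = (w * F.cross_entropy(src_dom_logits, d_src,
                                    reduction="none")).mean() \
               + F.cross_entropy(tgt_dom_logits, d_tgt)
    return cls_loss, dom_loss
```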

Besides the outlier classes, the irrelevant samples are inherently less transferable, and they may significantly degrade the target classification performance in different PDA tasks. In the next section, we present a novel algorithm to simultaneously identify and reject the outlier classes and down-weight the relative importance of the irrelevant samples.

4. Proposed Method

We adopt the same network architecture as PADA and employ two novel regularization terms to better align the source and target distributions across the shared classes and learn more transferable features. The first regularization is a row-sparsity term which promotes the selection of a small subset of source domain classes that appear in the target domain. This, in turn, encourages $\gamma$ to be a vector of zeros except for the elements corresponding to the shared classes. This selection regularization can be defined as follows

$$L_\infty(\mathcal{X}_t, \theta_f, \theta_y) = \frac{\mu}{|\mathcal{C}_s|}\,\Big\|\big[\,G_y(G_f(x_t^1)),\,\ldots,\,G_y(G_f(x_t^{|\mathcal{X}_t|}))\,\big]\Big\|_{1,\infty},$$

where $|\cdot|$ denotes the cardinality of its input set, $\|\cdot\|_{1,\infty}$ computes the sum of the infinity norms of the rows of an input matrix, and $\mu$ is a regularization parameter. Imposing the above term takes into account the relation between the entire set of target samples and encourages the classifier to generate a sparse output vector whose non-zero entries are located at the indices corresponding to the shared classes.

The second regularization term seeks to reduce the importance of irrelevant samples in the training procedure by leveraging the following entropy minimization term

$$L_e(\mathcal{X}_s, \mathcal{X}_t, \theta_f, \theta_d, \theta_y) = \frac{1}{n_s}\sum_{x^i\in\mathcal{X}_s} \gamma_{c_i}\big(1 + L_d^{e}(G_d(G_f(x^i)))\big)\, L_y^{e}(G_y(G_f(x^i))) \;+\; \frac{1}{n_t}\sum_{x^i\in\mathcal{X}_t} \big(1 + L_d^{e}(G_d(G_f(x^i)))\big)\, L_y^{e}(G_y(G_f(x^i))),$$

where $L_y^{e}$ and $L_d^{e}$ are the entropy loss functions corresponding to the classifier $G_y$ and the domain discriminator $G_d$, respectively. The above regularization encourages assigning higher weights to those samples whose domain labels are confidently predicted by the discriminator. This, in turn, reduces the relative importance of the irrelevant samples and helps to learn more transferable features for classification.

By integrating both regularization terms into the total loss function, our method can not only automatically identify and reject the outlier classes, but also down-weight the irrelevant samples that are inherently not transferable across domains. Figure 1 illustrates the architecture of our proposed network in detail.
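The two regularizers can be sketched as follows. This is a minimal PyTorch-style illustration rather than the authors' implementation: reading $L_d^{e}$ and $L_y^{e}$ as the Shannon entropies of the discriminator and classifier outputs, and names such as `selection_loss` and `src_dom_probs`, are our assumptions.

```python
import torch.nn.functional as F

def entropy(p, eps=1e-8):
    """Shannon entropy of each row of a probability matrix."""
    return -(p * (p + eps).log()).sum(dim=1)

def selection_loss(target_logits, mu=0.1):
    """Row-sparsity (selection) term L_inf: mu / |C_s| times the sum over classes
    of the largest probability each class receives on any target sample, i.e. the
    l_{1,inf} norm of the class-by-sample prediction matrix."""
    probs = F.softmax(target_logits, dim=1)          # shape (n_t, |C_s|)
    num_classes = probs.size(1)
    return (mu / num_classes) * probs.max(dim=0)[0].sum()

def entropy_loss(src_cls_logits, src_dom_probs, src_weights,
                 tgt_cls_logits, tgt_dom_probs):
    """Entropy term L_e: classifier entropy of every sample, scaled by
    (1 + entropy of the discriminator output); source samples are additionally
    scaled by the class weight gamma_{c_i} of their ground-truth class."""
    src_term = (src_weights
                * (1.0 + entropy(src_dom_probs))
                * entropy(F.softmax(src_cls_logits, dim=1))).mean()
    tgt_term = ((1.0 + entropy(tgt_dom_probs))
                * entropy(F.softmax(tgt_cls_logits, dim=1))).mean()
    return src_term + tgt_term
```

Under these assumptions, the total objective would combine the weighted PADA losses with `selection_loss` and `entropy_loss`; the paper does not spell out the combined formula, so the exact aggregation is left to the training loop.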

Figure 2: Example images of the Office-31 dataset (Amazon, Webcam, and DSLR domains).

5. Experiments

In this section, we conduct empirical experiments on two benchmark datasets to evaluate the efficacy of our approach, named SSPDA, for partial domain adaptation (PDA) across different tasks. The experiments are performed in an unsupervised setting, where the target labels are unknown. In what follows, we briefly explain the datasets, the PDA tasks, and the network hyperparameters used in the experiments.

5.1. Setup

Dataset: We evaluate the performance of SSPDA on two commonly used datasets for domain adaptation: Office-31 and Office-Home. The Office-31 object dataset [26] consists of 4,652 images from 31 classes, where the images are collected from three different domains: Amazon (A), Webcam (W), and DSLR (D). We follow the procedure presented in [4] to transfer knowledge from a source domain with 31 classes to a target domain with 10 classes. The results are provided as the target domain classification accuracy across six different PDA tasks: A → W, W → A, D → W, W → D, A → D, and D → A.

Office-Home [33] is a more complex dataset consisting of around 15,500 images collected from four distinct domains: Art (Ar), Clipart (Cl), Product (Pr), and Real-World (Rw), where each domain has 65 classes. Following the procedure presented in [4], we aim to transfer information from a source domain containing 65 classes to a target domain with 25 classes. The results on this dataset are also reported as the target classification accuracy on twelve pairs of source-target domains: Ar → Cl, Ar → Pr, Ar → Rw, Cl → Ar, Cl → Pr, Cl → Rw, Pr → Ar, Pr → Cl, Pr → Rw, Rw → Ar, Rw → Cl, and Rw → Pr.

For each of the aforementioned tasks, we report the average target classification accuracy of five independent runs with different initializations, as generated in [4].

Method            A→W     D→W     W→D     A→D     D→A     W→A     Avg
ResNet            54.52   94.57   94.27   65.61   73.17   71.71   75.64
DAN               46.44   53.56   58.60   42.68   65.66   65.34   55.38
DANN              41.35   46.78   38.85   41.36   41.34   44.68   42.39
ADDA              43.65   46.48   40.12   43.66   42.76   45.95   43.77
RTN               75.25   97.12   98.32   66.88   85.59   85.70   84.81
SAN               80.02   98.64   100     81.28   80.58   83.09   87.27
IWAN              76.27   98.98   100     78.98   89.46   81.73   87.57
PADA              86.54   99.32   100     82.17   92.69   95.41   92.69
SSPDA-selection   87.45   95.31   98.48   82.25   91.89   95.34   91.79
SSPDA-entropy     90.51   96.59   97.45   89.08   92.38   95.30   93.55
SSPDA             93.42   97.62   100     90.43   93.45   95.53   95.07

Table 1: Accuracy of partial domain adaptation tasks on Office-31 (ResNet-50).

Method   Ar→Cl   Ar→Pr   Ar→Rw   Cl→Ar   Cl→Pr   Cl→Rw   Pr→Ar   Pr→Cl   Pr→Rw   Rw→Ar   Rw→Cl   Rw→Pr   Avg
ResNet   38.57   60.78   75.21   39.94   48.12   52.90   49.68   30.91   70.79   65.38   41.79   70.42   53.71
DAN      44.36   61.79   74.49   41.78   45.21   54.11   46.92   38.14   68.42   64.37   45.37   68.85   54.48
DANN     44.89   54.06   68.97   36.27   34.34   45.22   44.08   38.03   68.69   52.98   34.68   46.50   47.39
RTN      49.37   64.33   76.19   47.56   51.74   57.67   50.38   41.45   75.53   70.17   51.82   74.78   59.25
PADA     51.95   67.00   78.74   52.16   53.78   59.03   52.61   43.22   78.79   73.73   56.60   77.09   62.06
SSPDA    52.31   68.35   80.17   50.79   51.29   60.87   56.68   42.53   79.15   70.94   56.43   78.92   62.37

Table 2: Accuracy of partial domain adaptation tasks on Office-Home (ResNet-50).

We compare the performance of SSPDA against several deep transfer learning methods: Deep Adaptation Network (DAN) [18], Domain Adversarial Neural Network (DANN) [8], Residual Transfer Networks (RTN) [20], Adversarial Discriminative Domain Adaptation (ADDA) [32], Importance Weighted Adversarial Nets (IWAN) [38], Selective Adversarial Network (SAN) [3], and Partial Adversarial Domain Adaptation (PADA) [4].

Parameters: We adopt ResNet-50 [14] pre-trained on ImageNet [25] as the backbone for the network $G_f$. We fine-tune the entire feature layers and apply back-propagation to train the domain discriminator $G_d$ and the classifier $G_y$. Throughout the experiments, the parameter $\lambda$ is set to 1.0 and 2.0 for the Office-31 and Office-Home datasets, respectively. Also, we set $\mu = 0.1$ for both datasets. Notice that since the classifier is not appropriately trained in the first few epochs, we gradually increase the parameter $\mu$ from 0 to 0.1. To minimize the loss function, we use mini-batch stochastic gradient descent (SGD) with a momentum of 0.95, and the learning rate is adjusted during SGD by $\eta = \eta_0 / (1 + \alpha\rho)^{\beta}$, where $\eta_0 = 10^{-2}$, $\alpha = 10$, $\beta = 0.75$, and $\rho$ is the training progress changing linearly from 0 to 1 [4, 8]. We use a batch size of 72 with 36 samples from each domain.
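For reference, the two schedules above can be written as below. This is a small sketch based on the stated formulas; the linear ramp used for $\mu$ is our assumption, since the text only states that $\mu$ is increased gradually from 0 to 0.1.

```python
def learning_rate(progress, eta0=1e-2, alpha=10.0, beta=0.75):
    """Annealed SGD learning rate eta = eta0 / (1 + alpha * progress)^beta,
    where progress grows linearly from 0 to 1 over training [4, 8]."""
    return eta0 / (1.0 + alpha * progress) ** beta

def mu_schedule(progress, mu_max=0.1):
    """Ramp the selection-loss weight mu from 0 up to mu_max; a linear ramp is
    assumed here, since the paper only says mu is increased gradually."""
    return mu_max * min(max(progress, 0.0), 1.0)
```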

5.2. Results

Tables 1 and 2 show the target domain classification accuracy of various methods on different PDA tasks, including the 6 tasks of the Office-31 dataset and the 12 tasks of the Office-Home dataset. All results are reported based on ResNet-50, and the scores of the competitor methods are obtained directly from [3, 4, 38].

Observe that some deep domain adaptation methods such as DAN and DANN exhibit worse performance than the standard ResNet-50 on a few PDA tasks in both datasets. This can be attributed to the fact that these methods aim to align the marginal distributions across the domains and hence are prone to the negative transfer resulting from the outlier classes. On the other hand, the PDA methods, such as PADA, SAN, and IWAN, achieve promising results on most of the PDA tasks since they leverage weighting mechanisms to highlight a subset of samples that are more transferable. By doing so, these methods can effectively mitigate transferring knowledge from the outlier source classes and promote learning from the shared classes between the domains, which in turn enhances the classification accuracy in the target domain.

Notice that SSPDA uses the same network architecture as PADA, but introduces a novel loss function to identify and reject the outlier classes and irrelevant samples. The results in Table 1 indicate that SSPDA performs better than or close to the state-of-the-art methods on all PDA tasks on the Office-31 dataset. In particular, it achieves considerable improvement on A → W and A → D, and increases the average accuracy over all tasks by 2.4%. Moreover, Table 2 shows that SSPDA maintains the performance of PADA and exhibits a slight improvement in the average classification accuracy over all partial domain adaptation tasks on the Office-Home dataset. The numerical results provided in the above tables imply that SSPDA has high potential in transferring semantic information and learning domain-invariant features in different partial domain adaptation tasks.

Ablation Study: To demonstrate the improvements obtained by each of the proposed regularizations, we conduct an ablation study by discarding either the selection regularization (SSPDA-selection) or the entropy minimization term (SSPDA-entropy). The results are reported in Table 1. It can be seen that both SSPDA-selection and SSPDA-entropy generally obtain better or comparable results relative to the baselines. In particular, SSPDA-entropy works better on some difficult tasks such as A → W and A → D.

6. Conclusion

This work presented an adversarial approach for the task of partial domain adaptation. The proposed approach minimizes a novel loss function to reduce the effect of the outlier classes and the irrelevant samples, which results in learning more transferable feature representations for classification. The experiments conducted on standard benchmark datasets demonstrate the high potential of our approach for partial domain adaptation tasks and highlight directions for future exploration and research. Future work will explore the effectiveness and the generalization power of our designed loss functions in different adversarial architectures.

References

[1] Mahsa Baktashmotlagh, Mehrtash T Harandi, Brian C Lovell, and Mathieu Salzmann. Unsupervised domain adaptation by domain invariant projection. In ICCV, 2013.
[2] Konstantinos Bousmalis, Nathan Silberman, David Dohan, Dumitru Erhan, and Dilip Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. In CVPR, 2017.
[3] Zhangjie Cao, Mingsheng Long, Jianmin Wang, and Michael I Jordan. Partial transfer learning with selective adversarial networks. In CVPR, 2018.
[4] Zhangjie Cao, Lijia Ma, Mingsheng Long, and Jianmin Wang. Partial adversarial domain adaptation. In ECCV, 2018.
[5] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE TPAMI, 40(4):834–848, 2018.
[6] Weijian Deng, Liang Zheng, Qixiang Ye, Guoliang Kang, Yi Yang, and Jianbin Jiao. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In CVPR, 2018.
[7] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. Decaf: A deep convolutional activation feature for generic visual recognition. In ICML, 2014.
[8] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(1):2096–2030, 2016.
[9] Muhammad Ghifary, W Bastiaan Kleijn, and Mengjie Zhang. Domain adaptive neural networks for object recognition. In PRICAI, pages 898–904. Springer, 2014.
[10] Muhammad Ghifary, W Bastiaan Kleijn, Mengjie Zhang, David Balduzzi, and Wen Li. Deep reconstruction-classification networks for unsupervised domain adaptation. In ECCV, 2016.
[11] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[12] Boqing Gong, Yuan Shi, Fei Sha, and Kristen Grauman. Geodesic flow kernel for unsupervised domain adaptation. In CVPR, 2012.
[13] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NeurIPS, 2014.
[14] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
[15] Hadi Kazemi, Sobhan Soleymani, Fariborz Taherkhani, Seyed Iranmanesh, and Nasser Nasrabadi. Unsupervised image-to-image translation using domain-specific variational information bound. In NeurIPS, 2018.

[16] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NeurIPS, 2012.
[17] Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen-Change Loy, and Xiaoou Tang. Semantic image segmentation via deep parsing network. In ICCV, 2015.
[18] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael Jordan. Learning transferable features with deep adaptation networks. In ICML, 2015.
[19] Mingsheng Long, Zhangjie Cao, Jianmin Wang, and Michael I Jordan. Conditional adversarial domain adaptation. In NeurIPS, 2018.
[20] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I Jordan. Unsupervised domain adaptation with residual transfer networks. In NeurIPS, 2016.
[21] Sinno Jialin Pan, Ivor W Tsang, James T Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199–210, 2011.
[22] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
[23] Zhongyi Pei, Zhangjie Cao, Mingsheng Long, and Jianmin Wang. Multi-adversarial domain adaptation. In AAAI, 2018.
[24] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NeurIPS, 2015.
[25] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. Imagenet large scale visual recognition challenge. IJCV, 115(3):211–252, 2015.
[26] Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In ECCV, 2010.
[27] Baochen Sun, Jiashi Feng, and Kate Saenko. Return of frustratingly easy domain adaptation. In AAAI, 2016.
[28] Baochen Sun, Jiashi Feng, and Kate Saenko. Correlation alignment for unsupervised domain adaptation. In Domain Adaptation in Computer Vision Applications, pages 153–171. Springer, 2017.
[29] Baochen Sun and Kate Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In ECCV, 2016.
[30] Christian Szegedy, Alexander Toshev, and Dumitru Erhan. Deep neural networks for object detection. In NeurIPS, 2013.
[31] Eric Tzeng, Judy Hoffman, Trevor Darrell, and Kate Saenko. Simultaneous deep transfer across domains and tasks. In ICCV, 2015.
[32] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In CVPR, 2017.
[33] Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, and Sethuraman Panchanathan. Deep hashing network for unsupervised domain adaptation. In CVPR, 2017.
[34] Mei Wang and Weihong Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135–153, 2018.
[35] Ximei Wang, Liang Li, Weirui Ye, Mingsheng Long, and Jianmin Wang. Transferable attention for domain adaptation. In AAAI, 2019.
[36] Hongliang Yan, Yukang Ding, Peihua Li, Qilong Wang, Yong Xu, and Wangmeng Zuo. Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation. In CVPR, 2017.
[37] Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. How transferable are features in deep neural networks? In NeurIPS, 2014.
[38] Jing Zhang, Zewei Ding, Wanqing Li, and Philip Ogunbona. Importance weighted adversarial nets for partial domain adaptation. In CVPR, 2018.
[39] Weichen Zhang, Wanli Ouyang, Wen Li, and Dong Xu. Collaborative and adversarial network for unsupervised domain adaptation. In CVPR, 2018.
