Learning with Sets in Multiple Instance Regression Applied to Remote Sensing

Learning with Sets in Multiple Instance Regression Applied to Remote Sensing Thomas Uriot European Space Agency Advanced Concepts Team [email protected] ABSTRACT 1 INTRODUCTION In this paper, we propose a novel approach to tackle the multiple Multiple instance learning (MIL) has been an active area of research instance regression (MIR) problem. This problem arises when the as of late. However, the most commonly studied MIL problem is the data is a collection of bags, where each bag is made of multiple one of multiple instance classification (MIC), where negative bags instances corresponding to the same unique real-valued label. Our strictly contain negative instances and positive bags contain at least goal is to train a regression model which maps the instances of one positive instance. A multitude of applications are covered by an unseen bag to its unique label. This MIR setting is common the MIC framework and it has been extensively applied to medical to remote sensing applications where there is high variability in imaging in a weakly supervised setting [33], [35], where each image the measurements and low geographical variability in the quan- is taken as a bag and sub-regions of the image are instances. The tity being estimated. Our approach, in contrast to most competing MIL framework has also been applied to analyzing videos [23], methods, does not make the assumption that there exists a prime where each video sequence is represented as a bag and each frame instance responsible for the label in each bag. Instead, we treat each or segment is an instance, and to image categorization [4] and bag as a set (i.e, an unordered sequence) of instances and learn to retrieval [38], [36]. map each bag to its unique label by using all the instances in each However, the MIL problem in a regression setting, where bag bag. This is done by implementing an order-invariant operation labels are real valued, has been surprisingly much less studied in characterized by a particular type of attention mechanism. This the literature. This may be due to the fact that the main data sources method is very flexible as it does not require domain knowledge found in MIL are images and text, which are more often encountered nor does it make any assumptions about the distribution of the in classification tasks. The main difference between the MIRand instances within each bag. We test our algorithm on five real world MIC problems lies in the fact that it is no longer possible to restrict datasets and outperform previous state-of-the-art on three of the oneself to finding a single positive instance, which renders the datasets. In addition, we augment our feature space by adding the problem more open-ended. The MIR problem first appeared in [21] moments of each feature for each bag, as extra features, and show where the authors made the assumption that each bag contained that while the first moments lead to higher accuracy, there isa a prime instance that was responsible for the bag’s label. They diminishing return. would first identify the prime instance in each bag and thenfita linear predictor on those instances. However, by only considering KEYWORDS the prime instance, we may get rid of a lot of useful information contained in the non-prime instances. For example, let us posit that Attention mechanism; Distribution regression; Multiple instance the instances in a bag follow a Gaussian distribution, a sensible regression; Neural networks; Remote sensing; Sample raw moments choice for the prime instance would be the mean, but in doing so, we would lose information about the variance, which could be ACM Reference Format: important in predicting the label. Thomas Uriot. 2019. Learning with Sets in Multiple Instance Regression Instead of the prime instance assumption, two main directions Applied to Remote Sensing. In KDD ’19: The 25th ACM SIGKDD International of tackling the MIR problem have emerged in the literature. The Conference on Knowledge Discovery & Data Mining, August 03–07, 2019, first direction consists in mapping the instances in each bagtoa Anchorage, Alaska USA. ACM, New York, NY, USA, 8 pages. https://doi.org/ arXiv:1903.07745v3 [stat.ML] 12 Mar 2020 10.1145/nnnnnnn.nnnnnnn new embedding space while losing the least information possible. For instance, in Szabó et al. [24] and in Law et al. [15], the authors propose to use kernel mean embedding [17] in order to summarize the information in each bag. While in Chen et al. [3], the authors Permission to make digital or hard copies of all or part of this work for personal or convert the MIR problem to a standard supervised learning problem classroom use is granted without fee provided that copies are not made or distributed by first mapping each bag into a feature space characterized bya for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the similarity measure between the instances present in each bag. The author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or second direction, rather than assuming that a single prime instance republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. is responsible for the bag label, looks at explicitly using more than KDD ’19, August 03–07, 2019, Anchorage, Alaska USA one instance per bag, whether it be as a weighted combination [29] © 2019 Copyright held by the owner/author(s). Publication rights licensed to the or as a prime cluster of instances [30]. Association for Computing Machinery. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00 In this paper, we propose to both map information from each bag https://doi.org/10.1145/nnnnnnn.nnnnnnn to a single vector by computing the first moments of each feature KDD ’19, August 03–07, 2019, Anchorage, Alaska USA Uriot and to learn a non-linear weighted combination of the instances the median, is called the instance-MIR. The instance-MIR method by implementing a neural network with attention mechanism [1]. essentially ignores the fact that the given problem can be framed in The idea is to treat each bag as an unordered sequence where the a MIR setup. In Wang et al. [31], the authors build on their previous elements of the sequence are the instances. Essentially, each bag work on predicting AOD and investigate a probabilistic framework is a set composed of a certain number of instances and we want by fitting a mixture model and learning the parameters using the to make the learning process invariant to permutations of the in- expectation-maximization (EM) algorithm. While the authors as- stances. This is achieved by using an order invariant aggregation sume that a unique prime instance is responsible for the bag label, operator (e.g, mean, median) which corresponds to a particular type each instance still contributes to the label proportionally to its prob- of attention mechanism described in Vinyals et al. [28]. In general ability of being the prime instance. This is in contrast to Ray and (e.g, in neural machine translation [16]), the instances used as input Page [21], where the authors also use the EM algorithm to select a to the attention mechanism follow a sequential ordering, while in unique prime instance (prime-MIR), rather than using a soft, prob- our case, we assume that the instances are unordered and indepen- abilistic weighted combination of instances. The authors in Wang dent. The bag label is thus fully parametrized by neural networks et al. [31] test their EM algorithm on the aforementioned AOD and and the output is insensitive to the ordering of the instances. crop yield datasets and achieve state-of-the-art results. They also We test our algorithm on 5 real-world datasets, stemming from evaluate the performance of previous MIR algorithms: prime-MIR remotely sensed data, which have previously been studied as a MIR [21], pruning-MIR [32], cluster-MIR [30] and two other baseline problem, and we compare our results to the current state-of-the-art algorithms, instance-MIR and aggregated-MIR. In this paper, we on those datasets. The first application consists in predicting aerosol compare our algorithm (attention-MIR) to the results obtained in optical depth (AOD)1 [8] - aerosols are fine airborne solid particles Wang et al. [31] on both datasets, and re-implement instance-MIR or liquid droplets in air, that both reflect and absorb incoming solar and aggregated-MIR. radiation - which was first attempted in Wang et al. [32]. The second The estimation of crop yields in a MIR framework was studied in application is the prediction of county-level crop yields2 [29] (wheat Wagstaff and Lane [29] and Wagstaff et. al [30], where the authors and corn) in Kansas between 2001 and 2005. Remotely sensed data used county-level crop (wheat and corn) data in Kansas and Califor- from satellite is a setting in which the MIR problem naturally arises nia between 2001 and 2005. In their first work, the authors proposed due to two reasons. Firstly, Sensors from the satellite will gather the cluster-MIR algorithm, which improved upon prime-MIR, by several noisy measurements due to the variability from the sensors clustering similar instances together in order to have a represen- themselves, and to the properties of the targeted area on Earth (e.g, tation of the bag structure and identify the most relevant instance surface and atmospheric effects). Secondly, aerosols have a very within each bag.

Load more