Name That Sculpture
Total Page:16
File Type:pdf, Size:1020Kb
Name that Sculpture Relja Arandjelovic´ Andrew Zisserman Department of Engineering Science Department of Engineering Science University of Oxford University of Oxford [email protected] [email protected] ABSTRACT 1. INTRODUCTION We describe a retrieval based method for automatically de- The goal of this work is to automatically identify both the termining the title and sculptor of an imaged sculpture. This sculptor and the name of the sculpture given an image of is a useful problem to solve, but also quite challenging given the sculpture, for example from a mobile phone. This is a the variety in both form and material that sculptures can capability similar to that offered by Google Goggles, which take, and the similarity in both appearance and names that can use a photo to identify certain classes of objects, and can occur. thereby carry out a text based web search. Our approach is to first visually match the sculpture and Being able to identify a sculpture is an extremely useful func- then to name it by harnessing the meta-data provided by tionality: often sculptures are not labelled in public places, Flickr users. To this end we make the following three con- or appear in other people’s photos without labels, or appear tributions: (i) we show that using two complementary vi- in our own photos without labels (and we didn’t label at the sual retrieval methods (one based on visual words, the other time we took them because we thought we would remember on boundaries) improves both retrieval and precision per- their names). Indeed there are occasionally pleas on the web formance; (ii) we show that a simple voting scheme on the of the form “Can anyone help name this sculpture?”. tf-idf weighted meta-data can correctly hypothesize a sub- set of the sculpture name (provided that the meta-data has Identifying sculptures is also quite challenging. Although first been suitably cleaned up and normalized); and (iii) we Google Goggles can visually identify objects such as land- show that Google image search can be used to query expand marks and some artwork, sculptures have eluded it to date [11] the name sub-set, and thereby correctly determine the full because the visual search engine used for matching does not name of the sculpture. “see” smooth objects. This is because the first step in visual matching is to compute features such as interest points, and The method is demonstrated on over 500 sculptors covering these are often completely absent on sculptures and so visual more than 2000 sculptures. We also quantitatively evalu- matching fails. ate the system and demonstrate correct identification of the sculpture on over 60% of the queries. We divide the problem of identifying a sculpture from a query image into two stages: (i) visual matching to a large dataset of images of sculptures, and (ii) textual labelling Categories and Subject Descriptors given a set of matching images with annotations. Figure 1 H.3.3 [Information Storage and Retrieval]: Information shows an example. That we are able to match sculptures in Search and Retrieval; H.3.1 [Information Storage and images at all, for the first stage, is a result of combining two Retrieval]: Content analysis and indexing; I.4.9 [Image complementary visual recognition methods. First, a method Processing and Computer Vision]: Applications for recognizing 3D smooth objects from their outlines in clut- tered images. This has been applied to the visual matching General Terms of smooth (untextured) sculptures from Henry Moore and Rodin [4], and is reviewed in section 3.2. Second, we note Algorithms, Experimentation, Performance that there is still a role for interest point based visual match- ing as some sculptures do have texture or can be identified Keywords from their surroundings (which are textured). Thus we also Image retrieval, Object recognition, Image labelling employ a classical visual word based visual recognition sys- tem. This is reviewed in section 3.1. The matching image set for the query image is obtained from the sets each of the two recognition systems returns (section 3.3). Permission to make digital or hard copies of all or part of this work for The other ingredients required to complete the identifica- personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies tion are a data set of images to match the query image to, bear this notice and the full citation on the first page. To copy otherwise, to and annotation (of the sculptor and sculpture name) for the republish, to post on servers or to redistribute to lists, requires prior specific images of this data set. For the annotated dataset we take permission and/or a fee. advantage of the opportunity to harness the knowledge in ICMR ’12, June 5-8, Hong Kong, China social media sites such as Facebook and Flickr. As is well Copyright c 2012 ACM 978-1-4503-1329-2/12/06 ...$10.00. known, such sites can provide millions of images with some form of annotation in the form of tags and descriptions – though the annotation can often be noisy and unreliable [19]. The second stage of the identification combines this meta- Query information associated with the matched image set in order to propose the name of the sculptor and sculpture. The pro- posed sculpture name is finally determined using a form of Bag query expansion from Google image search. Visual of Words The stages of the identification system are illustrated in fig- Boundaries ure 1. We describe the dataset downloaded from Flickr in section 2, and the method of obtaining the name from the meta-data and Google query expansion in section 4. Others have used community photo collections to identify objects in images [10, 12] and have dealt with the prob- lems of noisy annotations [14, 21]. In particular, Gammeter et al [10] auto-annotated images with landmarks such as “Arc de Triomphe” and “Statue of Liberty” using a stan- dard visual word matching engine. In [10], two additional ideas were used to resolve noisy annotations: first, the GPS of the image was used to filter results (both for the query and for the dataset); second, annotations were verified us- ing Wikipedia as an Oracle. Although we could make use of GPS this has not turned out to be necessary as (i) sculptures are often sufficiently distinctive without it, and (ii) sculp- Visual tures are sometimes moved to different locations (e.g. the Matching human figures of Gormley’s “Event Horizon” or Louise Bour- geois’ “Maman”) and so using GPS might harm recognition performance. Similarly, using Wikipedia to verify sculpture matches has not been found to be necessary, and also at the moment Wikipedia only covers a fraction of the sculptures that we consider. 2. DATASET The dataset provides both the library of sculpture images and the associated meta-data for labelling the sculptor and sculpture. A list of prominent sculptors was obtained from Wikipedia [1] (as of 24th November 2011 this contained 616 names). This contains sculptors such as “Henry Moore”, “Auguste Rodin”, “Michelangelo”, “Joan Mir´o”, and “Jacob Sculptor: Giambologna Epstein”. Near duplicates were removed from the list au- Keywords: centaur hercules tomatically by checking if the Wikipedia page for a pair of Perform Google Image Search sculptor names redirects to the same entry. Only Michelan- using the sculptor and keywords gelo was duplicated (as “Michelangelo” and “Michelangelo Buonarroti”). Flickr [2] was queried using this list, leading to 50128 mostly high resolution (1024 × 768) images. Figure 2 shows a ran- dom sample. For each of the images textual meta data is kept as well. It is obtained by downloading the title, descrip- tion and tags assigned to the image by the Flickr user who uploaded it. The textual query (i.e. sculptor name) used to retrieve an image is saved too. This forms the Sculptures Labelling 50K dataset used in this work. Unlike the recent Sculptures 6k dataset of [4] we did not bias our dataset towards smooth textureless sculptures. Extract sculpture name from titles 3. PARTICULAR OBJECT LARGE SCALE Sculptor: Giambologna Sculpture: Hercules and the Centaur Eurytion RETRIEVAL SYSTEM The first stage of the naming algorithm is to match the query Figure 1: Sculptor and sculpture identification: on- image to those images in the Sculptures 50k that contain the line system overview. same sculpture as the query. We briefly review here the two complementary visual retrieval engines that we have imple- The system returns a ranked list of images with each scored by the number of features (visual words) matched to the query. 3.2 Review of boundary descriptor large scale retrieval The boundary descriptor retrieval method [4] follows some of the elements of the visual word retrieval system, in that an inverted index is used on quantized descriptors, but in- stead of representing the entire image only the boundaries of certain objects are represented. This quantized boundary descriptor – the bag of boundaries (BoB) representation – is computed and stored for each image in the corpus. Then for a query image, images from the corpus are retrieved in a similar manner to the visual word system by first ranking on the similarity of the BoB descriptor, and then reranking a short list by the spatial compatibility between the query and retrieved image using an affine transformation computed on local boundary descriptors. Figure 2: Random sample from the Sculptures 50k The BoB representation is obtained in three stages: first, dataset. ‘relevant’ objects are segmented automatically to suppress mented. In each case, a visual query is used as the input and background clutter in the image; second, their boundaries the system returns a ranked list of matched images from the are described locally and at multiple scales; and, third, the dataset, where the ranking is based on the number of spatial boundary descriptors are vector quantized and the BoB rep- correspondences between the query and target image.