A 10 Million Image Database for Scene Recognition

Total Page:16

File Type:pdf, Size:1020Kb

A 10 Million Image Database for Scene Recognition This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPAMI.2017.2723009, IEEE Transactions on Pattern Analysis and Machine Intelligence 1 Places: A 10 million Image Database for Scene Recognition Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba Abstract—The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near- human semantic classification performance at tasks such as visual object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories, comprising a large and diverse list of the types of environments encountered in the world. Using the state-of-the-art Convolutional Neural Networks (CNNs), we provide scene classification CNNs (Places-CNNs) as baselines, that significantly outperform the previous approaches. Visualization of the CNNs trained on Places shows that object detectors emerge as an intermediate representation of scene classification. With its high-coverage and high-diversity of exemplars, the Places Database along with the Places-CNNs offer a novel resource to guide future progress on scene recognition problems. Index Terms—Scene classification, visual recognition, deep learning, deep feature, image dataset. F 1 INTRODUCTION Whereas most datasets have focused on object categories If a current state-of-the-art visual recognition system would (providing labels, bounding boxes or segmentations), here send you a text to describe what it sees, the text might read we describe the Places database, a quasi-exhaustive repos- something like: “There is a sofa facing a TV set. A person itory of 10 million scene photographs, labeled with 434 is sitting on the sofa holding a remote control. The TV is scene semantic categories, comprising about 98 percent of on and a talk show is playing”. Reading this, you would the type of places a human can encounter in the world. likely imagine a living-room. However, that scenery can Image samples are shown in Fig. 1 while Fig. 2 shows the very well happen in a resort by the beach. number of images per category, sorted in decreasing order. For an agent acting into the world, there is no doubt that Departing from Zhou et al. [1], we describe in depth the object and event recognition should be a primary goal of its construction of the Places Database, and evaluate the per- visual system. But knowing the place or context in which formance of several state-of-the-art Convolutional Neural the objects appear is as equally important for an intelligent Networks (CNNs) for place recognition. We compare how system to understand what might have happened in the past the features learned in a CNN for scene classification be- and what may happen in the future. For instance, a table have when used as generic features in other visual recogni- inside a kitchen can be used to eat or prepare a meal, while tion tasks. Finally, we visualize the internal representations a table inside a classroom is intended to support a notebook of the CNNs and discuss one major consequence of training or a laptop to take notes. a deep learning model to perform scene recognition: object A key aspect of scene recognition is to identify the place detectors emerge as an intermediate representation of the in which the objects seat (e.g., beach, forest, corridor, office, network [2]. Therefore, while the Places database does not street, ...). Although one can avoid using the place category contain any object labels or segmentations, it can be used by providing a more exhaustive list of the objects in the to train new object classifiers. picture and a description of their spatial relationships, a 1.1 The Rise of Multi-million Datasets place category provides the appropriate level of abstraction to avoid such a long and complex description. Note that one What does it take to reach human-level performance with could avoid using object categories in a description by only a machine-learning algorithm? In the case of supervised listing parts (i.e. two eyes on top of a mouth for a face). learning, the problem is two-fold. First, the algorithm must Like objects, places have functions and attributes. They are be suitable for the task, such as Convolutional Neural composed of parts and some of those parts can be named Networks in the large scale visual recognition [1], [3] and and correspond to objects, just like objects are composed Recursive Neural Networks for natural language processing of parts, some of which are nameable as well (e.g., legs, [4], [5]. Second, it must have access to a training dataset eyes). of appropriate coverage (quasi-exhaustive representation of classes and variety of exemplars) and density (enough • B. Zhou, A. Khosla, A. Oliva, A. Torralba are with the Computer samples to cover the diversity of each class). The optimal Science and Artificial Intelligence Laboratory, Massachusetts Institute space for these datasets is often task-dependent, but the of Technology, USA. rise of multi-million-item sets has enabled unprecedented • A. Lapedriza is with Universitat Oberta de Catalunya, Spain. performance in many domains of artificial intelligence. 0162-8828 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPAMI.2017.2723009, IEEE Transactions on Pattern Analysis and Machine Intelligence 2 veterinarians office elevator door fishpond windmill train station platform bedroom cafeteria watering hole corral amusement park staircase bar field road arch tower soccer field conference center shoe shop rainforest swimming pool street Indoor Nature Urban Fig. 1. Image samples from various categories of the Places Database (two samples per category). The dataset contains three macro-classes: Indoor, Nature, and Urban. Fig. 2. Sorted distribution of image number per category in the Places Database. Places contains 10,624,928 images from 434 categories. Category names are shown for every 6 intervals. The successes of Deep Blue in chess, Watson in “Jeop- MIT Indoor67 database [15] with 67 indoor categories and ardy!”, and AlphaGo in Go against their expert human the SUN (Scene Understanding, with 397 categories and opponents may thus be seen as not just advances in algo- 130,519 images) database [16] provided a larger coverage rithms, but the increasing availability of very large datasets: of place categories, but failed short in term of quantity of 700,000, 8.6 million, and 30 million items, respectively [6]– data needed to feed deep learning algorithms. To comple- [8]. Convolutional Neural Networks [3], [9] have likewise ment large object-centric datasets such as ImageNet [11], achieved near human-level visual recognition, trained on we build the Places dataset described here. 1.2 million object [10]–[12] and 2.5 million scene images [1]. Expansive coverage of the space of classes and samples Meanwhile, the Pascal VOC dataset [17] is one of the allows getting closer to the right ecosystem of data that a earliest image dataset with diverse object annotations in natural system, like a human, would experience. The history scene context. The Pascal VOC challenge has greatly ad- of image datasets for scene recognition also sees the rapid vanced the development of models for object detection and growing in the image samples as follows. segmentation tasks. Nowadays, COCO dataset [18] focuses on collecting object instances both in polygon and bounding box annotations for images depicting everyday scenes of 1.2 Scene-centric Datasets common objects. The recent Visual Genome dataset [19] The first benchmark for scene recognition was the Scene15 aims at collecting dense annotations of objects, attributes, database [13], extended from the initial 8 scene dataset in and their relationships. ADE20K [20] collects precise dense [14]. This dataset contains only 15 scene categories with annotation of scenes, objects, parts of objects with a large a few hundred images per class, and current classifiers are and open vocabulary. Altogether, annotated datasets further saturated, reaching near human performance with 95%. The enable artificial systems to learn visual knowledge linking 0162-8828 (c) 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPAMI.2017.2723009, IEEE Transactions on Pattern Analysis and Machine Intelligence 3 parts, objects and scene context. 2.2.1 Step 1: Downloading images using scene cate- gory and attributes From online image search engines (Google Images, Bing 2 PLACES DATABASE Images, and Flickr), candidate images were downloaded using a query word from the list of scene classes provided 2.1 Coverage of the categorical space by the SUN database [16]. In order to increase the diversity The first asset of a high-quality dataset is an expansive of visual appearances in the Places dataset, each scene class coverage of the categorical space to be learned. The strategy query was combined with 696 common English adjectives 1 of Places is to provide an exhaustive list of the categories of (e.g., messy, spare, sunny, desolate, etc.). In Fig. 3) we show environments encountered in the world, bounded by spaces some examples of images in Places grouped by queries. where a human body would fit (e.g.
Recommended publications
  • Bag-Of-Features Image Indexing and Classification in Microsoft SQL Server Relational Database
    Bag-of-Features Image Indexing and Classification in Microsoft SQL Server Relational Database Marcin Korytkowski, Rafał Scherer, Paweł Staszewski, Piotr Woldan Institute of Computational Intelligence Cze¸stochowa University of Technology al. Armii Krajowej 36, 42-200 Cze¸stochowa, Poland Email: [email protected], [email protected] Abstract—This paper presents a novel relational database ar- data directly in the database files. The example of such an chitecture aimed to visual objects classification and retrieval. The approach can be Microsoft SQL Server where binary data is framework is based on the bag-of-features image representation stored outside the RDBMS and only the information about model combined with the Support Vector Machine classification the data location is stored in the database tables. MS SQL and is integrated in a Microsoft SQL Server database. Server utilizes a special field type called FileStream which Keywords—content-based image processing, relational integrates SQL Server database engine with NTFS file system databases, image classification by storing binary large object (BLOB) data as files in the file system. Microsoft SQL dialect (Transact-SQL) statements I. INTRODUCTION can insert, update, query, search, and back up FileStream data. Application Programming Interface provides streaming Thanks to content-based image retrieval (CBIR) access to the data. FileStream uses operating system cache for [1][2][3][4][5][6][7][8] we are able to search for similar caching file data. This helps to reduce any negative effects images and classify them [9][10][11][12][13]. Images can be that FileStream data might have on the RDBMS performance.
    [Show full text]
  • Read and Write Images from a Database Using SQL Server
    HOWTO: Read and write images from a database using SQL Server Atalasoft is an imaging SDK and we do not directly have any dependencies on, or specific functionality for Database access (with the single exception of our DbImageSource class). However, since the typical means of image storage in a database is a BLOB (Binary Long Object), developers can take advantage of the AtalaImage.ToByteArray and AtalaImage.FromByteArray methods to use in their own code to store and retrieve image data to and from databases respectively. Atalasoft dotImage has streaming capabilities that can be used jointly with ADO.NET to read and write images directly to a database without saving to a temporary file. The following code snippets demonstrates this in C# and VB.NET. Please note that the code examples below are provided as a courtesy/convenience only and are not meant to be production code or represent the only way to access a database. The code is provided as is as but an example of how to get the needed data in a byte array suitable for use in storing to a database and how to take binary data containing an image back into an AtalaImage to use in our SDK. Write to a Database C# rivate void SaveToSqlDatabase(AtalaImage image) { SqlConnection myConnection = null; try { // Save image to byte array. // we will be storing this image as a Jpeg using our JpegEncoder with 75 quality // you could use any of our ImageEncoder classes to store as the respective image types byte[] imagedata = image.ToByteArray(new Atalasoft.Imaging.Codec.JpegEncoder(75)); // Create the SQL statement to add the image data.
    [Show full text]
  • Peoplesoft Financials/Supply Chain Management 9.2 (Through Update Image 40) Hardware and Software Requirements
    PeopleSoft Financials/Supply Chain Management 9.2 (through Update Image 40) Hardware and Software Requirements July 2021 PeopleSoft Financials/Supply Chain Management 9.2 (through Update Image 40) Hardware and Software Requirements Copyright © 2021, Oracle and/or its affiliates. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software
    [Show full text]
  • Powerhouse(R) 4GL Migration Planning Guide
    COGNOS(R) APPLICATION DEVELOPMENT TOOLS POWERHOUSE(R) 4GL FOR MPE/IX MIGRATION PLANNING GUIDE Migration Planning Guide 15-02-2005 PowerHouse 8.4 Type the text for the HTML TOC entry Type the text for the HTML TOC entry Type the text for the HTML TOC entry Type the Title Bar text for online help PowerHouse(R) 4GL version 8.4 USER GUIDE (DRAFT) THE NEXT LEVEL OF PERFORMANCETM Product Information This document applies to PowerHouse(R) 4GL version 8.4 and may also apply to subsequent releases. To check for newer versions of this document, visit the Cognos support Web site (http://support.cognos.com). Copyright Copyright © 2005, Cognos Incorporated. All Rights Reserved Printed in Canada. This software/documentation contains proprietary information of Cognos Incorporated. All rights are reserved. Reverse engineering of this software is prohibited. No part of this software/documentation may be copied, photocopied, reproduced, stored in a retrieval system, transmitted in any form or by any means, or translated into another language without the prior written consent of Cognos Incorporated. Cognos, the Cognos logo, Axiant, PowerHouse, QUICK, and QUIZ are registered trademarks of Cognos Incorporated. QDESIGN, QTP, PDL, QUTIL, and QSHOW are trademarks of Cognos Incorporated. OpenVMS is a trademark or registered trademark of HP and/or its subsidiaries. UNIX is a registered trademark of The Open Group. Microsoft is a registered trademark, and Windows is a trademark of Microsoft Corporation. FLEXlm is a trademark of Macrovision Corporation. All other names mentioned herein are trademarks or registered trademarks of their respective companies. All Internet URLs included in this publication were current at time of printing.
    [Show full text]
  • Eloquence: HP 3000 IMAGE Migration by Michael Marxmeier, Marxmeier Software AG
    Eloquence: HP 3000 IMAGE Migration By Michael Marxmeier, Marxmeier Software AG Michael Marxmeier is the founder and President of Marxmeier Software AG (www.marxmeier.com), as well as a fine C programmer. He created Eloquence in 1989 as a migration solution to move HP250/HP260 applications to HP-UX. Michael sold the package to Hewlett-Packard, but has always been responsible for Eloquence development and support. The Eloquence product was transferred back to Marxmeier Software AG in 2002. This chapter is excerpted from a tutorial that Michael wrote for HPWorld 2003. Email: [email protected] Eloquence is a TurboIMAGE compatible database that runs on HP-UX, Linux and Windows.Eloquence supports all the TurboIMAGE intrinsics, almost all modes, and they behave identically. HP 3000 applications can usually be ported with no or only minor changes. Compatibility goes beyond intrinsic calls (and also includes a performance profile.) Applications are built on assumptions and take advantage of specific behavior. If those assumptions are no longer true, the application may still work, but no longer be useful. For example, an IMAGE application can reasonably expect that a DBFIND or DBGET execute fast, independently of the chain length and that DBGET performance does not differ substantially between modes. If this is no longer true, the application may need to be rewritten, even though all intrinsic calls are available. Applications may also depend on external utilities or third party tools. If your application relies on a specific tool (let’s say, Robelle’s SUPRTOOL), you want it available (Suprtool already works with Eloquence). If a tool is not available, significant changes to the application would be required.
    [Show full text]
  • Turboimage/XL Database Management System Reference Manual
    TurboIMAGE/XL Database Management System Reference Manual HP e3000 MPE/iX Computer Systems Edition 8 Manufacturing Part Number: 30391-90012 E0701 U.S.A. July 2001 Notice The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this material, including, but not limited to, the implied warranties of merchantability or fitness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or for direct, indirect, special, incidental or consequential damages in connection with the furnishing or use of this material. Hewlett-Packard assumes no responsibility for the use or reliability of its software on equipment that is not furnished by Hewlett-Packard. This document contains proprietary information which is protected by copyright. All rights reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws. Restricted Rights Legend Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013. Rights for non-DOD U.S. Government Departments and Agencies are as set forth in FAR 52.227-19 (c) (1,2). Acknowledgments UNIX is a registered trademark of The Open Group. Hewlett-Packard Company 3000 Hanover Street Palo Alto, CA 94304 U.S.A. © Copyright 1985, 1987, 1989, 1990, 1992, 1994, 1997, 2000, 2001 by Hewlett-Packard Company 2 Contents 1. Introduction Data Security . 27 Rapid Data Retrieval and Formatting . 28 Program Development .
    [Show full text]
  • Oracle Multimedia Image PL/SQL API Quick Start
    Oracle Multimedia Image PL/SQL API Quick Start Introduction Oracle Multimedia is a feature that enables Oracle Database to store, manage, and retrieve images, audio, video, and other heterogeneous media data in an integrated fashion with other enterprise information. Oracle Multimedia extends Oracle Database reliability, availability, and data management to multimedia content in media-rich applications. This article provides simple PL/SQL examples that upload, store, manipulate, and export image data inside a database using Oracle Multimedia’s PL/SQL packages. Some common pitfalls are also highlighted. The PL/SQL package used here is available in Oracle Database release 12c Release 2 or later with Oracle Multimedia installed (the default configuration provided by Oracle Universal Installer). The functionality in this PL/SQL package is the same as the functionality available with the Oracle Multimedia relational interface and object interface. For more details refer to Oracle Multimedia Reference and Oracle Multimedia User’s Guide. The following examples will show how to store images within the database in BLOB columns so that the image data is stored in database tablespaces. Oracle Multimedia image also supports BFILEs (pointers to files that reside on the filesystem), but this article will not demonstrate the use of BFILEs. Note that BFILEs are read -only so they can only be used as the source for image processing operations (i.e. you can process from a BFILE but you can’t process into a BFILE). NOTE: Access to an administrative account is required in order to grant the necessary file system privileges. In the following examples, you should change the command connect sys as sysdba to the appropriate one for your system: connect sys as sysdba Enter password: password The following examples also connect to the database using connect ron Enter password: password which you should change to an actual user name and password on your system.
    [Show full text]
  • Using Free Pascal to Create Android Applications
    Using Free Pascal to create Android applications Michaël Van Canneyt January 19, 2015 Abstract Since several years, Free pascal can be used to create Android applications. There are a number of ways to do this, each with it’s advantages and disadvantages. In this article, one of the ways to program for Android is explored. 1 Introduction Programming for Android using Free Pascal and Lazarus has been possible since many years, long before Delphi unveiled its Android module. However, it is not as simple a process as creating a traditional Desktop application, and to complicate matters, there are several ways to do it: 1. Using the Android NDK. Google makes the NDK available for tasks that are CPU- intensive (a typical example are opengl apps), but warns that this should not be the standard way of developing Android apps. This is the way Delphi made it’s Android port: Using Firemonkey – which uses OpenGL to draw controls – this is the easiest approach. There is a Lazarus project that aims to do things in a similar manner (using the native-drawn widget set approach). 2. Using the Android SDK: a Java SDK. This is the Google recommended way to create Android applications. This article shows how to create an Android application using the Java SDK. 2 FPC : A Java bytecode compiler The Android SDK is a Java SDK: Android applications are usually written in Java. How- ever, the Free Pascal compiler can compile a pascal program to Java Bytecode. This is done with the help of the JVM cross-compiler. Free Pascal does not distribute this cross-compiler by default (yet), so it must be built from sources.
    [Show full text]
  • COMMUNICATOR 3000 MPE/Ix-Powerpatch 5 (Powerpatch Tape C.55.05) Based on Release 5.5
    COMMUNICATOR 3000 MPE/iX-PowerPatch 5 (PowerPatch Tape C.55.05) Based on Release 5.5 HP 3000 MPE/iX Computer Systems Volume 9, Issue 4 Customer Order Number 30216-90257 30216-90257 E0798 Printed in: U.S.A. July 1998 Notice The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this material, including, but not limited to, the implied warranties of merchantability or fitness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or for direct, indirect, special, incidental or consequential damages in connection with the furnishing or use of this material. Hewlett-Packard assumes no responsibility for the use or reliability of its software on equipment that is not furnished by Hewlett-Packard. This document contains proprietary information which is protected by copyright. All rights reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws. Acknowledgements Microsoft Windows ™, Windows 95 ™, Windows NT ™, Microsoft Access ™, Visual Basic ™, Visual C ++ ™, Visual FoxPro ™, and MS-Query ™are U.S. registered trademarks of Microsoft Corporation. ODBCLink/SE ™is a registered trademark of M. B. Foster Software Labs, Inc. Dr. DeeBeeSpy© 1995 Syware, Inc., all rights reserved. Axiant ™ and Impromptu ™ are registered trademarks of Cognos. Foxbase ™ is a registered trademark of Fox Software. Lotus ™ is a registered trademark of Lotus Development Corporation. Jetform is a registered trademark of Jetform Corporation. Paradox ™ is a registered trademark of Borland International Inc. PowerBuilder ™ is a registered trademark of Powersoft Corporation.
    [Show full text]
  • Decision Support and Data Warehousing in an HP 3000 Environment
    Paper# : 3011 Decision Support and Data Warehousing in an HP 3000 Environment. Christophe Jacquet Hewlett-Packard Co. 19447 Pruneridge Ave Cupertino, CA, 94087 Email: [email protected] Decision Support and Data Warehousing in an HP 3000 Environment. 3011 - 1 As corporate downsizing continues, flatter organizational structures give end users more decision making responsibilities. HP 3000 users can take advantage of powerful tools such as Image/SQL and ODBC drivers to gain direct access to the data in the databases. While these tools provide real benefits, the limitations of directly accessing operational data, in terms of performance, concurrency, ease of use and data quality, are rapidly becoming apparent. A current approach to Decision Support is the consolidation of company-wide data into a separately designed environment, commonly called a Data Warehouse. This paper covers the benefits and history of Data Warehouses and examines a variety of approaches available within the HP 3000 marketplace to implement such a Decision Support environment. It describes the elements needed to create a Data Warehouse environment from the Decision Support data store to the end user desktop, including extraction and cleansing tools. Data Warehousing is just one example of how enhancements to the MPE/iX operating environment permit HP 3000 customers to take full advantage of the many new technologies and approaches emerging in the Information Technology marketplace. Why is decision support important. In today’s world, companies have to be able to make decisions much faster than before. Making the right decisions at the right time can provide significant competitive advantage for a company.
    [Show full text]
  • Free Pascal and Lazarus Programming Textbook
    This page deliberately left blank. In the series: ALT Linux library Free Pascal and Lazarus Programming Textbook E. R. Alekseev O. V. Chesnokova T. V. Kucher Moscow ALT Linux; DMK-Press Publishers 2010 i UDC 004.432 BBK 22.1 A47 Alekseev E.R., Chesnokova O.V., Kucher T.V. A47 Free Pascal and Lazarus: A Programming Textbook / E. R. Alekseev, O. V. Chesnokova, T. V. Kucher M.: ALTLinux; Publishing house DMK-Press, 2010. 440 p.: illustrated.(ALT Linux library). ISBN 978-5-94074-611-9 Free Pascal is a free implementation of the Pascal programming language that is compatible with Borland Pascal and Object Pascal / Delphi, but with additional features. The Free Pascal compiler is a free cross-platform product implemented on Linux and Windows, and other operating systems. This book is a textbook on algorithms and programming, using Free Pascal. The reader will also be introduced to the principles of creating graphical user interface applications with Lazarus. Each topic is accompanied by 25 exercise problems, which will make this textbook useful not only for those studying programming independently, but also for teachers in the education system. The book’s website is: http://books.altlinux.ru/freepascal/ This textbook is intended for teachers and students of junior colleges and universities, and the wider audience of readers who may be interested in programming. UDC 004.432 BBK 22.1 This book is available from: The <<Alt Linux>> company: (495) 662-3883. E-mail: [email protected] Internet store: http://shop.altlinux.ru From the publishers <<Alians-kniga>>: Wholesale purchases: (495) 258-91-94, 258-91-95.
    [Show full text]
  • A Large-Scale Hierarchical Image Database
    ImageNet: A Large-Scale Hierarchical Image Database Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li and Li Fei-Fei Dept. of Computer Science, Princeton University, USA fjiadeng, wdong, rsocher, jial, li, [email protected] Abstract content-based image search and image understanding algo- rithms, as well as for providing critical training and bench- The explosion of image data on the Internet has the po- marking data for such algorithms. tential to foster more sophisticated and robust models and ImageNet uses the hierarchical structure of WordNet [9]. algorithms to index, retrieve, organize and interact with im- Each meaningful concept in WordNet, possibly described ages and multimedia data. But exactly how such data can by multiple words or word phrases, is called a “synonym be harnessed and organized remains a critical problem. We set” or “synset”. There are around 80; 000 noun synsets introduce here a new database called “ImageNet”, a large- in WordNet. In ImageNet, we aim to provide on aver- scale ontology of images built upon the backbone of the age 500-1000 images to illustrate each synset. Images of WordNet structure. ImageNet aims to populate the majority each concept are quality-controlled and human-annotated of the 80,000 synsets of WordNet with an average of 500- as described in Sec. 3.2. ImageNet, therefore, will offer 1000 clean and full resolution images. This will result in tens of millions of cleanly sorted images. In this paper, tens of millions of annotated images organized by the se- we report the current version of ImageNet, consisting of 12 mantic hierarchy of WordNet.
    [Show full text]