
DEGREE PROJECT, SECOND LEVEL, STOCKHOLM, SWEDEN 2015


TAXONOMY BASED IMAGE RETRIEVAL USING DATA FROM MULTIPLE SOURCES

JIMMY LARSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION (CSC)



Master’s Thesis at CSC
KTH Royal Institute of Technology, Sweden
Supervisor: Hedvig Kjellström
Examiner: Danica Kragic


Acknowledgements

This page is dedicated to everyone who has been involved in this work. I would thus like to start by acknowledging and thanking my professor at the university, Hedvig Kjellström, for accepting yet another project while already having many other projects to supervise, as well as for the feedback and time that she has given me. I would like to thank Danica Kragic, who accepted the position of examiner for this work while already having many other tasks to attend to. I would like to thank my supervisors at Findwise, Martin Nycander, Birger Rydback and Simon Stenström, for their help and supervision during my time at the company. I would also like to thank Findwise for accepting this project. Finally, I would like to thank my family who supported me during this time and who kept pushing me forward.

Abstract

With a multitude of images available on the Internet, how do we find what we are looking for? This project tries to determine how much the precision and recall of search queries are improved by applying a word taxonomy to traditional Text-Based Image Search and Content-Based Image Search. By applying a word taxonomy to different data sources, a strong keyword filter and a keyword extender were implemented and tested. The results show that, depending on the implementation, either the precision or the recall can be increased. By using a similar approach in real-life implementations, it is possible to push images with higher precision to the front while keeping a high recall value, thus increasing the experienced relevance of image search.

Referat

Taxonomy-Based Image Search

With the amount of images now available on the Internet, how can we still find what we are looking for? This thesis tries to determine how much image precision and image recall can be increased by applying a word taxonomy to traditional Text-Based Image Search and Content-Based Image Search. By applying a word taxonomy to different data sources, a strong word filter as well as a module that extends word lists can be created and tested. The results indicate that, depending on the implementation, either the precision or the recall can be improved. By using a similar method in a real scenario, it is therefore possible to move images with high precision further forward in the result list while retaining high recall, thereby increasing the experienced relevance of image search.

Contents

Acknowledgements

List of Figures

List of Tables

1 Introduction 1
  1.1 Concept 2
  1.2 Abbreviations 4
  1.3 Problem Statement 4
    1.3.1 Research Question 4
    1.3.2 Hypothesis 5
  1.4 Contributions 5
  1.5 Delimitations 5

I Background 7

2 Image Retrieval 9
  2.1 Early History 9
  2.2 Search Users 10
  2.3 Presentation 10

3 Content-Based Image Retrieval 13
  3.1 Semantic Gap 13
  3.2 Features 14
    3.2.1 Global Features 15
    3.2.2 Local Features 15
    3.2.3 Image Segmentation 15
  3.3 Visual Signature 15
  3.4 Learning Approaches 16
    3.4.1 Relevance Feedback 16
    3.4.2 Support Vector Machines 16
    3.4.3 Artificial Neural Networks 17
    3.4.4 Convolutional Neural Networks 17
    3.4.5 Random Forest 18

4 Text-Based Image Retrieval 19
  4.1 Relevant Information in Text 19
    4.1.1 Metadata 20
  4.2 Current Techniques 21
    4.2.1 Term Frequency and Inverse Document Frequency 21
    4.2.2 Natural Language Processing 21
    4.2.3 Part-of-Speech Tagging 22
    4.2.4 Stop Words 22
    4.2.5 Stemming and Lemmatization 22
  4.3 Indexing 22
  4.4 Word Taxonomy 23

5 Related Work 25

II Method 27

6 Architecture 29
  6.1 Data Retrieval 30
  6.2 Extraction of Relevant Information 30
  6.3 Content-Based Image Retrieval 31
    6.3.1 Classification 31
  6.4 Text-Based Image Retrieval 32
    6.4.1 Natural Language Processing 32
    6.4.2 Data Clean-Up 32
  6.5 WordNet Evaluation 32
  6.6 Search Platform 33
    6.6.1 Filters, Stemming and Tokenizing 33

7 Evaluation Method 35
  7.1 Evaluation Data 35
  7.2 Classifier Evaluation 36
  7.3 Evaluation Formulas 37
  7.4 Baseline and Comparison 38

III Results and Discussion 39

8 Results 41
  8.1 Average Precision 42
  8.2 Average Recall 42
  8.3 Average F Measures 43

9 Discussion and Conclusion 47
  9.1 Conclusions 48
  9.2 Future Work 49

Bibliography 51

Appendices 56

A Tags 57

List of Figures

1.1 The system concept portraying a simplified view of the full system. 3

4.1 A taxonomy example. 23

6.1 An extended figure portraying the system architecture. 29

6.2 Example of data that is extracted from a post. (Original blog post from Bites @ Animal Planet) 31

7.1 An image containing a single animal. Tag: sloth. (Figure from Bites @ Animal Planet) 36

7.2 An image containing two or more animals. Tags: dog, fox. (Figure from Bites @ Animal Planet) 36

7.3 A figure about positives and negatives. (Original figure from http://en.wikipedia.org/wiki/Precision_and_recall) 37

8.1 A graph of the averages. 41
8.2 A graph of the average precision scores. 42
8.3 A graph of the recall scores. 43
8.4 A graph of the F2 scores. 44
8.5 A graph of the F0.5 scores. 45
8.6 A graph of the F1 scores. 45

List of Tables

8.1 Average precision scores. 42
8.2 Average recall scores. 43
8.3 Average F scores. 44

Chapter 1

Introduction

Within the multitude of images currently available on the Internet, how can we possibly find what we are looking for? With image retrieval now being applied to health and medical applications [29, 40] as well as military and traffic surveillance [25], rapid progress is not only inevitable but also fascinating. Image retrieval has for a couple of decades been a discipline with a constant flow of research being done. Image retrieval is about making the images that are part of a computer system easily accessible through means such as search engines. In the late 1970s the focus of image retrieval was found in what we today call Text-Based Image Retrieval (TBIR), also known as Context-Based Image Retrieval, Meta-Data Image Retrieval or Keyword-Based Image Retrieval. Initially, with the databases being quite small, methods focused on so-called Database Management Systems (DBMS) [42] in which a user or an administrator would determine appropriate keywords for the images and store the keywords in a database.

With the rapid growth of available data online, however, the problem of individual subjectivity became evident: the different people who were determining appropriate keywords had their own subjective views of which keywords were appropriate. Another problem was the sheer amount of data which had to be processed. Manually determining keywords was no longer a feasible option, and thus Content-Based Image Retrieval (CBIR) was proposed. Content-Based Image Retrieval is a discipline with its origin in the field of Computer Vision. Content-Based Image Retrieval works by determining what an image may portray by looking at the image, its features, colors, textures and so on, and comparing that to already known data. The Text-Based Image Retrieval of today, on the other hand, is a field in which the information that surrounds an image is used as a basis for different Natural Language Processing (NLP) algorithms in order to determine, at least to some degree, what the image may portray.

While Content-Based and Text-Based Image Retrieval might not be enough, a strict word taxonomy applied to the results of such modern image retrieval systems might drastically increase the image retrieval precision, and as such, the Extended Java WordNet Library [1] will be used for this work. The original WordNet [39], which in essence is a lexical database for several languages, has the capability to,

given an input word, output information about said word. The output includes a description of the word, direct hyponyms of the word, inherited hypernyms and sister terms. The inherited hypernyms specifically can be seen as a tree structure with nodes and leaves. Using the WordNet tree structure feature on the results from the Content-Based Image Retrieval component and the results from the Text-Based Image Retrieval component, it is possible to find words which have nodes in common. The common nodes can then be used to augment the existing data or to filter out data which could be considered noise. As such, increasing the image recall at the cost of image precision, or increasing the image precision at the cost of image recall, should be possible. This work will therefore test different implementations of the WordNet tree structure on the results of Content- and Text-Based output to measure the precision and recall of a Taxonomy-Based Image Retrieval (TaBIR) system.

Following Chapter 1, which contains the introductory sections, the image retrieval background is split into three chapters. This was done in order to avoid confusion when talking about image retrieval from the viewpoint of two similar, yet very different methodologies, namely Content-Based Image Retrieval as opposed to Text-Based Image Retrieval. As such, Chapter 2 will briefly cover the history of image retrieval, what kinds of users might use search-related systems, and how different inputs and presentations might be used, all of which are shared by the two approaches. Chapter 3 will cover the specifics related to Content-Based Image Retrieval, while Chapter 4 will cover the specifics related to Text-Based Image Retrieval. In Section 4.4, background and information related to Word Taxonomy and WordNet will be available. Chapter 5 will contain information about work similar to this work, while Chapter 6 explains the architecture and method used in this work. Chapter 7 explains how the evaluation is done, while Chapters 8 and 9 contain the results, discussion, conclusions and future work.

1.1 Concept

Figure 1.1 portrays the concept of the system which will be implemented and evaluated in this work. A brief explanation of each box in the figure follows:

• Blog: The blog is, in the case of this project, a blog called Bites @ Animal Blog [43]. Each post in the blog is a source of data used in the project. More about the data can be found in Section 7.1.

• Crawler: The crawler component will access each blog post and send the HTML data of said post to the next component.

• Extractor: The extractor will fetch image links from the received data, as well as extract the surrounding text and any existing metadata. More about the extractor can be found in Section 6.2.


• Content-Based IR: This component will classify an image referred to by a URL, received from the extractor. See Section 6.3 for more information about this component.

• Text-Based IR: This component will perform Part-of-Speech tagging (PoS tagging), clean up the received texts and attempt to find keywords in the text describing the image. Section 6.4 contains further information regarding the Text-Based IR component.

• WordNet Evaluation: The WordNet component will create word taxonomy hierarchies on the data received from the previous components. The WordNet component will then attempt to filter out, merge or enrich the data depending on the hierarchy results. Further reading about this component can be found in Section 6.5.

• Search Platform: The search platform will perform stemming and tf-idf on the data.

• Interface: The final component is a simple interface for presenting the data.

Figure 1.1: The system concept portraying a simplified view of the full system.


1.2 Abbreviations

• TaBIR - Taxonomy-Based Image Retrieval
• CBIR - Content-Based Image Retrieval
• CLUE - CLUster-based rEtrieval of images
• DBMS - Database Management Systems
• TBIR - Text-Based Image Retrieval, also known as Context-Based Image Retrieval, Meta-Data Image Retrieval and Keyword-Based Image Retrieval
• QBIC - Query by Image Content
• ANN - Artificial Neural Networks
• CNN - Convolutional Neural Networks
• HCI - Human-Computer Interaction
• IDF - Inverse Document Frequency
• NLP - Natural Language Processing
• SVM - Support Vector Machine
• PoS - Part-of-Speech
• IR - Information Retrieval
• RF - Relevance Feedback
• ST - Semantic Table
• TF - Term Frequency

1.3 Problem Statement

There are two main methodologies in image retrieval: Text-Based Image Retrieval and Content-Based Image Retrieval. In CBIR the main obstacle is known as the Semantic Gap [57], which will be explained in detail in Section 3.1. In TBIR the main problem is that of finding text that is actually relevant to an image. As each field has its problems, each also has its limitations; limitations which this work will attempt to narrow down.

1.3.1 Research Question

Using a combination of Text-Based Image Search results and Content-Based Image Search results, how much can the precision and/or the recall of search queries be improved with the use of a word taxonomy?


1.3.2 Hypothesis

By using the data retrieved from a TBIR system and the data retrieved from a CBIR system, the combination of the two into a Taxonomy-Based Image Retrieval system will increase the recall or the precision of image retrieval, depending on how the system is implemented.

1.4 Contributions

The purpose of this work is to evaluate how well a state-of-the-art CNN-based CBIR and a state-of-the-art TBIR, combined into a Taxonomy-Based Image Retrieval system, perform in image search on available online data. Should the resulting data determine that Taxonomy-Based Image Search can improve the precision or the recall of images, the same methodology may be used on other, similar systems. The contributions of this work will thus be a method for a Taxonomy-Based Image Retrieval system which uses data retrieved from TBIR and CBIR to improve the precision and/or recall of image search. Other contributions will include suggestions which may further the research of image search systems.

1.5 Delimitations

Certain limitations will be put in place so as not to overextend past the purpose of this work. These limitations may be based on time, hardware, software, available data as well as risks to the integrity of the work.

• This work will NOT be about the development of new computer vision or deep learning algorithms. Already finished components from these fields will be used, as the main focus is on Information Retrieval, not on Computer Vision.

• No monetary expenses will be made for the specific purpose of this work.

• Query precision and recall of images will be the main focus of this work.

• The evaluation dataset will be made from one or several blogs on the Internet.

• The dataset category will be mainly animals and data from blogs related to animals.

• The data will be in English.

• Only tags relevant to animals will be used in the evaluation data. An image will not be evaluated using tags such as sky, wall, tree, cute, beautiful but rather using tags such as cat, dog, bird for good evaluation in a specific area (see Section 3.1, Semantic Gap, and Section 7.1, Evaluation Data).


Part I

Background


Chapter 2

Image Retrieval

Image retrieval is an interesting area with roots in everything from Computer Vision, Databases and Information Retrieval, as mentioned by Smeulders et al. [57], to Semantic and Language fields where NLP plays a big role. This first background chapter will briefly cover the history of image retrieval as well as information regarding the users of these systems and how image query responses can be presented.

2.1 Early History

TBIR is an area which has been researched since the late 1970s [47]. In the early stages of TBIR, the use of image annotations in database management systems (DBMS) was proposed with relatively good success. That is, one would manually annotate images according to what they portrayed and then use the annotations as a basis for the image search [32, 53]. An example of such can be found in A Relational Database System for Images [5]. With the rapid expansion of the Internet, however, two very specific problems soon arose. The first problem was the matter of the subjectivity of each individual who was inputting the annotations into the system. While one individual might perceive an image in one way, another individual might perceive the image as something where other annotations would have been preferred. The second problem that arose was that of labour. To manually annotate each image is simply not feasible as systems grow larger. To cope with these problems, CBIR was proposed.

CBIR is a discipline originating from the field of Computer Vision. In the works of Smeulders et al. [57], CBIR is referred to as a discipline born from Computer Vision and Information Retrieval as well as databases. Other well-known names within the field, such as M.S. Lew, mention in Lew et al. [31] how the early years of CBIR were generally based upon works from the field of Computer Vision. In 1991, in the early years of CBIR, the efficiency of using colors and histograms was discovered and published in an article written by M.J. Swain and D.H. Ballard, known as Color Indexing [59]. Color histograms have since been considered the foundation of CBIR [66]. The publication [59] has, according to Google Scholar [20], been cited more than 6275 times since its publication in 1991. Within the discipline of CBIR several impressive advancements have since been made, ranging from normalized cuts and image segmentations [54] to improvements in visual signatures, feature extractions and relevance feedback [48, 63, 67] to convolutional neural networks [55, 60] and annual competitions within the field of CBIR [50].

2.2 Search Users

How a user perceives a system depends very much on the expectations of said user. Smeulders et al. [57] have tried to classify users into three groups depending on what the user's end goal might be:

• Users who browse by association. These users do not have a specific goal but rather work on constantly refining their search through several iterations of associated images. Associated images might be images taken from similar sources, images rated in a similar manner after going through relevance feedback, or simply images having similar keywords or labels.

• Users who search. In this case the user is looking for a specific image or something very similar to what the user has in mind.

• Users who categorise. These users might have a reference image and then try to look for images of the same category.

Datta et al., in Image Retrieval: Ideas, Influences, and Trends of the New Age [8], state that the intent of the user and the clarity of said intent affect whatever expectations a user might hold of a search system. Datta et al. then augment the categorisation of users made by Smeulders et al., but refer to them as browsers, surfers and searchers.

2.3 Presentation

One of the most important aspects of image retrieval is how the results are presented, as it does not matter how well the image retrieval works if the results are not shown in an easily accessible manner. As such, different presentation methods may be preferred in different situations. Below are some of the more relevant methods [8] used for presenting the images:

• Chronological order - Using this presentation method, images are shown in their chronological order. A chronological order might for example be according to the date on which the photos were taken or uploaded to a system. This order is seen more frequently in private albums, as photos are usually automatically time-stamped, or in services where a timeline is featured.


• Clustered ordering - This method presents the images based upon their clustering in the database. A good example of when clustering can be used is after utilising the CLUE approach, which will be mentioned in Section 3.1.

• Relevance ordered - The relevance ordered presentation is the most common method [8] used in search systems. In a relevance ordered presentation the relevance to a query is calculated and the most relevant responses are shown first.


Chapter 3

Content-Based Image Retrieval

The idea behind Content-Based Image Retrieval (CBIR) is that one should, with the use of different features within an image, be able to acquire visual signatures. Visual signatures can be used to find other, similar signatures, and thus similar images. One of the major problems common to all of the CBIR areas, however, is the so-called semantic gap, which is the gap between the high-level semantics used by humans and the low-level semantics used by machines. Even so, CBIR is an exciting discipline that has seen rapid advancements in recent years. An example of this is that the annual Large Scale Visual Recognition Challenge, performed for the first time in 2010, has seen a reduction in image classification error from 28.2% in 2010 to 6.7% in 2014 [50]. The reduction in classification errors can be seen in the switch from SVM, random forest and so on, to CNN. This chapter is split into different sections where each section describes an important factor in regards to CBIR. Section 3.1 will cover the semantic gap, one of the main issues in regards to CBIR. Section 3.2 will explain the importance of features, while Section 3.3 will explain visual signatures, the important part in which the features are used. Section 3.4 will cover the attractive concept of learning in CBIR systems and will also cover some of the different approaches related to the subject.

3.1 Semantic Gap

One of the most difficult obstacles within the field of CBIR is the obstacle known as the Semantic Gap. Smeulders et al. call it a critical problem [57] and it is one of the obstacles currently hampering the advancement of CBIR. The semantic gap refers to the gap between the high-level semantics that humans use on a daily basis and the low-level semantics used by computers. If a human wanted to describe an image of a dog playing on an open field of grass, it would be a simple matter of saying “The image portrays a dog playing on an open field of grass.”. The level of semantics used by the human is called high-level semantics. We use words such as dog, open field, playing and grass because, to a human, all of those concepts are well known and easily recognisable at a glance. A computer, however, does not know at a glance what a dog is, nor the concept of playing. To a computer, a dog might be a collection of ellipses in a certain order and maybe a few specific colors. That is, to a computer, a dog is a collection of features that in low-level semantics try to describe what a human would call a dog. Different techniques have been proposed and tested in order to diminish the semantic gap. Examples include:

• Query By Image Content (QBIC) - in which an example image is used as a reference to find other images with similar features [14].

• Text-based assistance - One paper [64] describes using Latent Semantic Indexing in order to find features written in text and then combining that with CBIR.

• CLUster-based rEtrieval of images (CLUE) - Chen et al. [6] use the proposed CLUE method in which features that are usually seen in combination are clustered together for increased precision.

In addition to these methods, Liu et al. suggest, in a survey [32], other methods such as:

• Object Ontology - High-level semantics being mapped to low-level semantics. An example could be, mapping the keyword football to a circular shape and perhaps some specific texture.

• Semantic Tables (ST) - The opposite of Object Ontology. ST is about mapping low-level semantics to high-level keywords.

• Machine Learning - Machine Learning is in its own right an extensive subject, containing approaches such as Inductive Logic Programming, Support Vector Machines, Bayesian Networks, Genetic Algorithms, Artificial Neural Networks and Convolutional Neural Networks. This report will briefly cover the subject of Convolutional Neural Networks (CNN) in a later section.

3.2 Features

Features is the term used to denote the low-level semantics used by computers. Features include the colors, whose use was introduced by M.J. Swain and D.H. Ballard in Color Indexing [59], as well as the general shapes and textures that make up the image. That is, in an image: what colors, shapes and textures are present? Several features, combined together, are what form an image. Various techniques have been proposed for the retrieval of specific features, such as the “blobworld” [2] representation or later the wavelet-based texture retrieval [11] which made use of generalised Gaussian densities and the Kullback-Leibler distance.
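A minimal sketch of global color feature extraction in the spirit of Swain and Ballard's color indexing [59], assuming the Pillow and NumPy Python packages are available; the choice of 8 bins per channel and the function names are illustrative, not taken from this work.

    import numpy as np
    from PIL import Image

    def color_histogram(path, bins_per_channel=8):
        # Joint RGB histogram used as a simple global feature vector.
        rgb = np.asarray(Image.open(path).convert("RGB"))
        hist, _ = np.histogramdd(
            rgb.reshape(-1, 3),
            bins=(bins_per_channel,) * 3,
            range=((0, 256),) * 3,
        )
        hist = hist.flatten()
        return hist / hist.sum()  # normalise so images of any size compare

    def histogram_intersection(h1, h2):
        # The similarity measure proposed in Color Indexing [59]:
        # 1.0 for identical histograms, lower for dissimilar ones.
        return np.minimum(h1, h2).sum()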


3.2.1 Global Features

Global features are a sub-domain belonging to the features domain. The global features consist of an accumulated set of features which form an overall “impression” of the image. The overall impression can be the average base color, average tone level or any other such overall impression.

3.2.2 Local Features

In comparison to global features, local features are, as the name suggests, features specific to local areas of the image. The image is divided into several smaller areas in which features are then examined. During this phase, features may include textures within certain areas, color differences, as well as other features that may differ from area to area.

3.2.3 Image Segmentation

Image Segmentation in CBIR is mainly used as a method to examine shapes within an image. By segmenting an image into thin segments, the general shape of an object within the image can be found. Techniques such as Normalized Cuts [54] exist for this purpose. Other techniques include:

• Chain coded string [15]

• UNL Fourier features [45]

• Zernike moments [26]

However, since the purpose of this report is not a study of segmentation techniques, further explanations will be omitted, but it should be noted that an extensive study on segmentation techniques is available in Babu M. Mehtre et al. [38].

3.3 Visual Signature

In CBIR, a visual signature is, in effect, a set of visual features of an image which can be used in order to find other signatures with similar features. Visual signatures can be extracted using features present in the image together with a segmentation technique. The visual signature is, by the techniques of today, a necessity in order to find similar images using CBIR. Step by step, an image is initially processed by dividing the image into different parts. Once the image has been divided into parts, features are examined as global or local features, which are then used in the acquisition of a visual signature. Once the signature has been acquired, several motivational factors exist in the choice of images to be returned. R. Datta et al. [8] summarize the motivational factors in five points which have been quoted below:

• Agreement with semantics


• Robustness to noise (invariant to perturbations)

• Computational efficiency (ability to work in real time and in large scale)

• Invariance to background (allowing region-based querying)

• Local linearity (i.e., following triangle inequality in a neighborhood)

3.4 Learning Approaches

In CBIR, the potential of letting a system learn what is right or wrong, or what the user wants and does not want, is seen as very attractive indeed. As mentioned in Section 3.1, one of the problems of CBIR is the semantic gap. If the system can learn what the high-semantic user is looking for in a query, then that allows for good improvements in the system precision when retrieving images in response to said query. Different options are available when it comes to teaching a system. Some of the possible options are mentioned below.

3.4.1 Relevance Feedback

Relevance Feedback (RF) is a widely used technique that spans several research fields. The basics of relevance feedback in regards to CBIR begin with a response to a query returning an image. The user may then give feedback to the system by letting the system know whether the image was relevant or not relevant. If the response was relevant to the query, a positive weight may be added for similar responses, while a negative weight may be added in the opposite case, that is, when the image was not relevant. Note, however, that certain techniques in RF do not make use of negative weights to the same extent as some other techniques. This can be seen in “Relevance feedback in content-based image retrieval: some recent advances” by X.S. Zhou and T.S. Huang [67]. The idea behind RF is that the system will eventually move towards results which are more relevant to the user, according to the feedback that is received.

3.4.2 Support Vector Machines

Support Vector Machines (SVM) belong to a category of learning models known as supervised learning. In SVM the goal is achieved by having a learning algorithm analyse data in order to learn to recognise patterns. The training data is essentially a set of examples where each example is marked as belonging to one of two categories. The SVM then creates a model according to learned patterns, and uses the model to classify which category new examples should end up in. An excellent introduction to SVM can be found in the book An Introduction to Support Vector Machines and Other Kernel-based Learning Methods written by N. Cristianini and J. Shawe-Taylor [7].


3.4.3 Artificial Neural Networks

Artificial Neural Networks (ANN) is a technique in which humans try to mimic the properties of the brain. Several important factors make ANN desirable in CBIR, where learning can heavily affect the performance of a system. Amongst several other factors, the properties of learning ability as well as adaptability are highly esteemed in CBIR. In an implementation, and in an attempt to mimic the human brain, the ANN can be viewed as a directed weighted graph in the sense that neurons are nodes with directed edges connecting them [22]. In 1943 McCulloch and Pitts [37] defined the artificial neuron architecture, the McCulloch-Pitts neuron, also known as the Threshold Logic Unit (TLU), in three steps (a minimal code sketch of such a unit follows at the end of this subsection):

1) One or more connections bring activation signals from other nodes.
2) A node, acting as a processing unit, sums the inputs and applies a function to the data.
3) The node sends the result through an output line to the other connected nodes.

F. Rosenblatt later introduced the Perceptron [46] in which a layer of McCulloch-Pitts neurons act as inputs and, from there, feed data forward to an output layer of McCulloch-Pitts neurons. This is also the basic idea of Single-Layer Neural Networks. In regards to ANN, different node patterns can be created in order to accomplish different goals. Depending on the pattern used, the ANN will belong to one of two main categories. If the ANN does not contain loops, it is called a feed-forward network. To the feed-forward category belong techniques such as:

• Single-layer perceptron
• Multilayer perceptron

On the other hand, if the ANN does contain loops, it is known as a feedback network, or recurrent network. To this category belong techniques such as:

• Competitive networks
• Hopfield network
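The sketch promised above: a McCulloch-Pitts style threshold logic unit in Python, following the three listed steps; the weights and threshold shown are illustrative values, not taken from this work.

    def tlu(inputs, weights, threshold):
        # Steps 1-2: gather incoming signals, sum the weighted inputs and
        # apply a step function. Step 3 is the returned output signal.
        activation = sum(x * w for x, w in zip(inputs, weights))
        return 1 if activation >= threshold else 0

    # With these weights the unit computes logical AND of two binary inputs.
    print(tlu([1, 1], [0.5, 0.5], 1.0))  # -> 1
    print(tlu([1, 0], [0.5, 0.5], 1.0))  # -> 0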

3.4.4 Convolutional Neural Networks

In recent years, advances in Convolutional Neural Networks (CNN) have led to rapid advancements in regards to CBIR. A CNN is a variant of a feed-forward ANN, and LeCun et al. [28] have for a while spoken highly of the uses of CNN and its advantages within CBIR. In a CNN, an input layer such as an image is analysed by several CNN layers, where each CNN layer is one of three types [17]:

• Convolutional - The convolutional layer is a layer consisting of neurons (nodes) in a rectangular grid-like pattern. Each neuron will take its input from a rectangular area of the previous layer.

• Max-Pooling - This layer may be used after a convolutional layer. Its task is to produce a single output by sub-sampling a rectangular block in the convolutional layer prior to it.


• Fully-Connected - This layer connects all the attained information from the previous layers. This is done by taking all of the neurons in the previous layer and connecting them to every neuron in the fully-connected layer.

By creating models consisting of these different types of layers, the CNN is able to extract features from the source data as well as to make predictions as to where new data might end up being classified. With the attention on CNN in recent years, other new interesting approaches such as the Region-CNN (R-CNN) [19] and different deep convolution methods [55, 60] have garnered interest. The deep convolution method by Karen Simonyan and Andrew Zisserman focuses on adding convolution depth by reducing the sizes of the convolution filters [55]. In the latest Large Scale Visual Recognition Challenge [50], the two top teams both used deep convolution in their entries. For CNN, an excellent framework known as Caffe [23] exists. Caffe is well documented and is constantly being updated, making it one of the top candidates for state-of-the-art CNN frameworks.

3.4.5 Random Forest

Random Forest, which was proposed by Leo Breiman in Random Forests [4], is an approach which improves the classification rate in comparison to traditional decision trees. The traditional decision tree is structured so that the leaves represent different classes while the branches that connect the leaves act as conjunctions leading to one leaf or another. It has been shown that decision tree algorithms have the potential to consistently perform better than other classification methods such as maximum likelihood and linear discriminant function classifiers [16]. In another publication, known as Bagging [3], Breiman states that the introduction of a voting system, called an ensemble in machine learning, when performing classifications can significantly improve the classification precision. Breiman then proceeds to use the ensemble reasoning as a part of his Random Forest. What Breiman in essence proposed with Random Forest was the following [4]: generate a large number of decision trees; the most popular classification is then decided by an ensemble vote.

Chapter 4

Text-Based Image Retrieval

In Text-Based Image Retrieval the interesting part is not an entire document, but rather the small part of a document, the Relevant Information in Text, which might describe an image. Different methods to find the relevant text can be found simply by inspecting several webpages. Some methods include looking in the HTML headers, title fields, ALT-fields and text in close proximity to the image, while other methods make use of the old DBMS models in which meta-data is stored in a database. Once potentially relevant text has been found, Natural Language Processing algorithms and other approaches, such as the use of Term Frequency and Inverse Document Frequency, can then be applied so that only the most relevant information is left behind for the Indexing stage.

In Section 4.1 there will be explanations regarding the Relevant Information in Text. Section 4.1 will also mention Metadata, while Section 4.2 is about different techniques used in Information Retrieval systems, such as NLP, as well as methods used in formal search systems, such as tf-idf. Section 4.3 will consist of an explanation regarding Indexing and different platforms used for Indexing in modern search systems.

4.1 Relevant Information in Text

In a traditional Information Retrieval system such as an Internet search engine, many problems exist but can be circumvented using different methods. Problems regarding spelling can be circumvented using spell correction algorithms, while problems regarding phonetics can be partially circumvented using phonetic algorithms. Since the problem of this report is not a problem of spelling or phonetic algorithms, however, more detailed explanations will be omitted but can be read about further in Techniques for Automatically Correcting Words in Text [27] as well as An Introduction to Information Retrieval [34].

While in a traditional Internet search engine the information that is relevant to a document, that is, the source of text used for indexing, may be found anywhere on the site, the same is not always true for a system whose purpose is to find images and the text that describes those images. In a website document containing 10000 words and two pictures, it is possible that only a few sentences' worth of text is related to the images. The main issue in TBIR is basically to find out what words or sentences are related to what pictures. For this there is no straightforward solution, even though different options and approaches are available. After inspecting the source code of a few webpages, some different possible approaches could be seen, such as:

• ALT-text - The intended purpose of the ALT-text feature is for users to add a descriptive text to an image. Most users do not use this feature but if used, it can be a good source of information regarding the image.

• Image as reference point - Using the image as a point of reference in the website, it is possible to look for k number of sentences before the image as well as after the image and assume that they are related to the image.

• Titles - The titles in a webpage might hint at what the images could be about. One common feature of most of the traditional search engines is that the title is used as the most important factor [18] affecting the relevance of a document.

• Headers - Some headers might in a few words describe a section in which an image is located.

• Nearby <p> tags - If the image is within a text <p> tag, or if there are text <p> tags in close proximity to the image in the HTML, then it is reasonable to assume that they might contain information regarding the image.

4.1.1 Metadata

In IR systems, Metadata or METAtags refers to the underlying information which can be used for various tasks such as sorting or searching. For search engines, metadata such as keywords and descriptions, which was popularised by the now shut-down search engine Altavista, has existed for a long time. Due to extensive abuse of metadata, however, such as listing false information in the metadata or simply repeating the same keyword hundreds of times to appear more relevant [18], many of the search engines that exist today have stopped using user- or author-written metadata and have instead moved toward automatic metadata generation, in which the metadata is generated based upon content. Many corporations, however, still make use of user- or author-written metadata in their internal systems, as they are not affected by the previously mentioned abuse. For more information in regards to metadata generation, a study of two metadata generators can be found in Metadata Extraction and Harvesting: A Comparison of Two Automatic Metadata Generation Applications by J. Greenberg [21].


4.2 Current Techniques

As mentioned in Section 4.1, the problem in TBIR is more a problem of finding out what text is relevant than finding information about the whole document. Even so, techniques such as tf-idf, removal of stop words, stemming and other NLP techniques still play a major role in the precision and performance of modern TBIR systems.

4.2.1 Term Frequency and Inverse Document Frequency

If one assumes the goal to be the following: find out which documents in a set of N documents are most relevant to the query “The White Rabbit”, there are some methods by which this can be accomplished. To begin with, one could disregard all of the documents not containing the query words “The”, “White” and “Rabbit”. One could then simply count the number of times that the words “The”, “White” and “Rabbit” occur in the documents, that is, the term frequency (tf) of each word, and return the documents with the highest tf. By doing so, however, certain words such as “The”, which is a very common word even when it is not used in the context of “White” and “Rabbit”, would cause some documents to seem more related to the query than they really are. For this, the inverse document frequency (idf) is used. The idf is a way of lowering the impact of words which are common across the document collection and raising the impact of words which are uncommon. The tf is, as mentioned, simply the frequency of a term, denoted tf_{t,d} where t is the term and d is a document. The idf for a term t can be calculated according to Formula 4.1, where N is the same as mentioned earlier and df_t is the number of documents in which the term t occurs. Thus, by using tf and idf in what is called tf-idf_{t,d}, calculated using Formula 4.2, it is possible to find documents which are related to the query. More about this subject can be read in an extensive study called Term Weighting Approaches in Automatic Text Retrieval [51], written by Gerard Salton and Chris Buckley, and in a book written by C. Manning et al. called An Introduction to Information Retrieval [34].

\mathrm{idf}_t = \log \frac{N}{\mathrm{df}_t} \qquad (4.1)

\text{tf-idf}_{t,d} = \mathrm{tf}_{t,d} \cdot \mathrm{idf}_t \qquad (4.2)
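A minimal Python sketch of Formulas 4.1 and 4.2 on a toy collection; the three example documents are invented for illustration.

    import math

    docs = ["the white rabbit ran across the field",
            "the dog chased the white rabbit",
            "a dog played on the field"]
    tokenized = [d.split() for d in docs]
    N = len(tokenized)

    def tf(term, doc):
        return doc.count(term)  # raw term frequency tf_{t,d}

    def idf(term):
        df = sum(1 for doc in tokenized if term in doc)  # df_t
        return math.log(N / df) if df else 0.0  # Formula 4.1

    def tf_idf(term, doc):
        return tf(term, doc) * idf(term)  # Formula 4.2

    # "the" occurs in every document, so idf = log(3/3) = 0 and its tf-idf
    # vanishes, while the rarer "rabbit" keeps a positive weight.
    print(tf_idf("the", tokenized[0]), tf_idf("rabbit", tokenized[0]))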

4.2.2 Natural Language Processing

NLP can be seen as a process in which the goal is to turn the natural language used by humans into a language which is easier for the computer to understand. NLP consists of different methods for the computer to understand human language, such as Part-of-Speech tagging (PoS tagging), stemming, stop words, and so on.


4.2.3 Part-of-Speech Tagging

PoS tagging, also known as grammatical tagging, is the process in which a computer may receive a sentence and assign a grammatical tag to each word in the sentence. A grammatical tag might be a tag such as verb, adjective, noun and so on. One such example of a PoS tagger is the Stanford PoS tagger, which is part of the Stanford NLP [35, 36].

4.2.4 Stop Words

Stop words are words which, from a query perspective, are of so little value that they are removed from the system. For example, if there exists a set of N documents and every one of these N documents contains hundreds of occurrences of a word w, and w itself does not add any additional information to a query, it might be considered a stop word and thus be removed from the system. Examples of words which are frequently considered stop words are “a”, “it”, “the”, “was” and so on [35].

4.2.5 Stemming and Lemmatization

In almost every language, different grammatical forms are used depending on the scenario in which they occur. The meaning of the words close, closing, closed, closes is the same in all of the cases, but the use depends on the situation or whether the conversation is about the past, present or future. Stemming is used in order to remove the variations of a word. An example could be the words cat, cats, cat's, cats'; after the stemming process they would all become cat. The use of stemming allows for reduced redundancy in the database as well as easier query creation. If a picture of a cat playing with a ball of strings was inserted into the database and the descriptive text The cat's playing with a ball of strings was found, then the stemming would change cat's into cat, thus allowing a user to find the image by entering the query cat.

One of the early stemming algorithms, known as the Lovins stemmer [33], was introduced in 1968 by Julie Beth Lovins. The stemming algorithm, which contained 294 suffixes, was mainly designed for stemming of scientific texts [65]. The original Porter stemmer [44], however, containing only approximately 60 suffixes, was evaluated and shown to perform at least as well as more complicated stemming algorithms [30], making it a good choice within the field of Information Retrieval. While stemming acts upon a set of rules and suffixes, lemmatization, on the other hand, also takes note of the context by performing more complex tasks. The words is, was, am and being would in lemmatization all turn into be [13].
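A minimal sketch of stemming and lemmatization using NLTK, assumed here purely for illustration; the nltk package and its wordnet corpus must be installed.

    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    for word in ["cats", "closing", "closed", "closes"]:
        print(word, "->", stemmer.stem(word))  # -> cat, close, close, close

    # Lemmatization takes the part of speech into account:
    # "was" treated as a verb becomes "be".
    lemmatizer = WordNetLemmatizer()
    print(lemmatizer.lemmatize("was", pos="v"))  # -> be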

4.3 Indexing

The action of indexing is one in which the input document, after going through different techniques such as tf-idf, word stemming, removal of stop words, application of NLP algorithms and so on, is stored into a database. Currently, many open platforms are available which handle the database and allow for easy and efficient storing and accessing of indexed data. Examples of such platforms are:

• Solr [58] - An open source search engine based on Lucene. The engine is written in Java and allows for easy, yet powerful, input and output manipulation.

• ElasticSearch [13] - An open source search engine that also comes with powerful analytic options. Similarly to Solr, ElasticSearch is based on Lucene.

• Dezi [10] - A project written in Perl, similar in functionality to Solr and ElasticSearch.

4.4 Word Taxonomy

Taxonomy is considered the practice of classification. One may, for example, classify the word “dog”. The word “dog” may belong to the “canine” class, which may not only contain other classes such as “wolf” and “fox” but may in turn also belong to the “carnivore” class, and so on. Figure 4.1 shows an example of a taxonomy.

Figure 4.1: A taxonomy example.

On the subject of taxonomy and information retrieval, the work known as WordNet [39] is of extra interest for this work. WordNet is a manually constructed taxonomy of words for, mainly, the English language. If an input word exists in the WordNet lexicon, an output in the form of a synonym set (synset) is returned. Each item in the synset contains useful information such as the sense in which a word might occur, its synonyms, and whether the word is a noun, verb, adjective etc. More importantly, however, the synset also contains the hyponyms, meronyms, holonyms and hypernyms of a given word (a code sketch of such a lookup follows after the list below).

• Hyponym - lower level classes of a word, i.e. more specific. “Dog” is a hyponym of “canine”.

• Meronym - something that is a part of something else. “Paw” is a meronym of “canine”.

• Holonym - a term to which another term may belong. “Canine” is a holonym of the “Canidae family (dogs; wolves; jackals; foxes)”.

• Hypernym - higher level classes of a word, i.e. more generalised. “Canine” is a hypernym of “dog”.
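The lookup sketch referred to above, using NLTK's WordNet interface rather than the Extended Java WordNet Library used in this work; the idea of walking the hypernym tree is the same.

    from nltk.corpus import wordnet as wn

    dog = wn.synsets("dog")[0]  # first sense of "dog"
    print(dog.hypernyms())      # direct hypernyms, e.g. the canine synset

    # Walk the inherited hypernyms up to the root of the taxonomy.
    node = dog
    while node.hypernyms():
        node = node.hypernyms()[0]
        print(node.name())

    # The smallest common node of two words, e.g. "dog" and "fox".
    fox = wn.synsets("fox")[0]
    print(dog.lowest_common_hypernyms(fox))  # e.g. [Synset('canine.n.02')]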

Chapter 5

Related Work

Since its introduction, a lot of work on Information Retrieval has been done using WordNet. In 2005, WordNet was used to prune CBIR features [24]. This was done by calculating several similarity measures which were combined using the Dempster-Shafer Evidence Combination. A work by G. Varelas et al. [61] is similar to this project. Their work made use of text descriptions and image descriptions which were then sent to a WordNet component. The text descriptions were, similarly to this work, alt text, titles, captions and image file names. The image descriptions, however, focused on image features such as frequency spectrum and moment invariants, in comparison to this work, which uses actual classifications for an approach more in line with information retrieval. In their WordNet component they looked for semantic similarities using similarity measures, while this work focuses on high precision through strict filtering. Their work also assumes that the most common WordNet sense is always the one of interest, thus ignoring other senses. In their future work section, however, it was discussed that it would be interesting to work on sense disambiguation to determine which of the senses is actually the one looked for. This project iterates through all of the senses and looks for results in all of them.

S. Zinger et al. [68] used WordNet in a pruning approach to create portrayable objects which would then represent a cluster of images that had been clustered together using CBIR. X. Wang et al. [62] used a contextual weighting approach on vocabulary tree based image retrieval, while J. Deng et al. [9] used hierarchical semantics, in the form of semantic attributes on images, to retrieve images similar to a query image. In 2011, M. Douze et al. [12] combined attributes with the image descriptor based on the Fisher vector [52]. The Fisher vector alone has been shown to outperform the Bag of Features (BoF) [56] approach, and M. Douze et al. managed to show improved performance with the attribute and Fisher combination.

At the time of writing, and to the best of our knowledge, there are no other works using the Caffe CBIR classifier in combination with traditional image retrieval techniques and a WordNet based taxonomy filter.


Part II

Method


Chapter 6

Architecture

This chapter will contain sections where each section is dedicated to one of the components in the implemented system architecture. The design of the system is as seen in Figure 6.1.

Figure 6.1: An extended figure portraying the system architecture.


6.1 Data Retrieval

For the crawling and indexing of data on the Internet, a tool known as Norconex HTTP Collector [41] was used upon recommendation from the supervisors at Findwise. Norconex is a tool that is easy to set up and allows for flexible crawling and indexing of webpages. In Norconex it is easy to set up filters, and it was therefore a simple matter of getting only the data which was of interest for this particular work. Henceforth, each set of data, such as a blog post, will be referred to using the Information Retrieval term document. For the storing of each raw document as well as of processed documents, the Findwise-created Indexsvc, short for Index Service, was used. Indexsvc works well with Norconex and supports a good interface for managing and accessing data. In terms of functioning as a location for storing information, however, it is no different from any other storage tool or method.

6.2 Extraction of Relevant Information

The document which the crawler has retrieved and stored in the Indexsvc has to be analyzed, during which only the relevant information should be extracted. First, a few criteria were set as to whether the document would be processed further or discarded:

• An image - If no images could be found then the document would be discarded as it is of no relevance to image retrieval.

• An unspecified image size or a size of at least 100x100 - Since many websites use images with specified dimensions of 1x1 pixels to pre-load data which is used later on the website, images with specified dimensions of less than 100x100 were ignored. Images with no specified size are accepted.

For each image I_i in a set of n acquired images, the following information was to be sent on for further processing in accordance with modern Information Retrieval techniques (a sketch of such an extraction step follows after the list):

• Title

• Alt-text

• Source URL

• <p> tags in “close” proximity to the image

• First x number of characters in the data field

• Image links
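A minimal sketch of such an extraction step using the BeautifulSoup Python package; the field names and the "close proximity" rule (here simply the two <p> tags following the image) are assumptions for illustration, not the exact Findwise implementation.

    from bs4 import BeautifulSoup

    def extract(html, n_chars=500):
        soup = BeautifulSoup(html, "html.parser")
        title = soup.title.get_text() if soup.title else ""
        records = []
        for img in soup.find_all("img"):
            # Discard pre-load pixels according to the size criterion above.
            w, h = img.get("width"), img.get("height")
            if w and h and w.isdigit() and h.isdigit() \
                    and (int(w) < 100 or int(h) < 100):
                continue
            records.append({
                "title": title,
                "alt": img.get("alt", ""),
                "src": img.get("src", ""),
                "nearby_p": [p.get_text() for p in
                             img.find_all_next("p", limit=2)],
                "head": soup.get_text()[:n_chars],  # first x characters
            })
        return records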


Figure 6.2: Example of data that is extracted from a post. (Original blog post from Bites @ Animal Planet)

6.3 Content-Based Image Retrieval

For this work, the CBIR component is one of the core architectural components. As mentioned in Section 1.5, this work is not about the development of new computer vision or deep learning algorithms, and as such, the Caffe [23] framework is used in an off-the-shelf manner. The CBIR component classifies each image I_i, i ∈ [1...n], and returns the result for I_i as a vector of the form [(x_1, y_1)...(x_n, y_n)], where each (x, y) tuple contains a predicted class x and a certainty value y, y ∈ [0...1], with which the classifier thinks that the predicted class x is correct. In order to avoid too much clutter, only the top five predictions are used.
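A minimal sketch of such an off-the-shelf classification using the pycaffe interface; the model, prototxt and label file paths are hypothetical placeholders, and this is not the thesis code.

    import numpy as np
    import caffe

    net = caffe.Classifier("deploy.prototxt", "model.caffemodel")
    labels = [line.strip() for line in open("synset_words.txt")]

    def classify_top5(image_path):
        image = caffe.io.load_image(image_path)
        probs = net.predict([image])[0]     # one certainty value per class
        top5 = np.argsort(probs)[::-1][:5]  # keep the five best predictions
        return [(labels[i], float(probs[i])) for i in top5]  # [(x, y), ...]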

6.3.1 Classification

The classification can, as mentioned in Section 3.4, be done using different methods. In Section 3.4, SVM and CNN were mentioned as two machine learning approaches both capable of predicting classes. For this work, CNN was chosen. The reasons for using CNN lie partly with CNN being a very interesting approach that, as previously mentioned, has garnered great attention within the methodology of CBIR. Another reason for using CNN was the Caffe framework, which allows for quick, yet good, off-the-shelf use of feature extraction and classification.

Dataset

The classifier uses a pre-trained neural network, trained on the ILSVRC12 [49] dataset, as the dataset is included in the Caffe framework. As no actual training is done in this project, the only training that the CBIR component has is from the out-of-the-box Caffe framework. A different dataset is used for the evaluation and will be described further in Section 7.1.

6.4 Text-Based Image Retrieval

Similarly to the CBIR component, the TBIR component is also one of the core components of this work. The TBIR component performs NLP actions as well as clean-up and pruning of the extracted text.

6.4.1 Natural Language Processing

For the NLP functionality of the TBIR component, the Stanford NLP [36] was used. With reference to the delimitation “Only tags relevant to animals will be used in the evaluation data” in Section 1.5, the NLP component performs PoS tagging on the extracted text, discarding all non-noun words. Words such as running, beautiful and calm are therefore discarded.
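A minimal sketch of the noun filter, assuming NLTK's PoS tagger in place of the Stanford tagger used in this work; tags starting with "NN" mark nouns in the Penn Treebank tag set.

    from nltk import pos_tag, word_tokenize

    def nouns_only(text):
        tagged = pos_tag(word_tokenize(text))
        return [word for word, tag in tagged if tag.startswith("NN")]

    print(nouns_only("A beautiful dog running calmly across the green field"))
    # e.g. ['dog', 'field'] - running, beautiful and calmly are discarded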

6.4.2 Data Clean-Up For the remaining steps, it is also important that the data does not contain charac- ters which might interfere with the other components. As such, two main actions are taken:

• Non-letter characters are removed.

• Characters are made into lower-case.

Other clean-up actions such as removal of multi-spacing are also done in order to improve the chances of finding a word in the Word Taxonomy component.
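A minimal sketch of these clean-up actions using Python regular expressions; the exact rules of the thesis component are not reproduced here.

    import re

    def clean(text):
        text = text.lower()                       # characters to lower-case
        text = re.sub(r"[^a-z ]", " ", text)      # drop non-letter characters
        return re.sub(r" +", " ", text).strip()   # remove multi-spacing

    print(clean("The cat's playing -- with 2 balls!"))
    # -> "the cat s playing with balls"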

6.5 WordNet Evaluation

The Word Taxonomy component, in this architecture called WordNet Evaluation, is the last of the main components of this work. The WordNet Evaluation component is implemented in two versions. The first implementation focuses on recall, while the second focuses on precision.

For the first implementation, a taxonomy hierarchy of one CBIR classification and one TBIR keyword is retrieved. The component then, for each document, such as an image with its related text and metadata, iterates through both of the hierarchies to look for the smallest common node. The result list for the document is extended by the keyword, the classification, the common node and synonyms for all of the involved nodes.

The second implementation, instead of iterating through both of the hierarchies, only iterates through one of them, while keeping the other at input level, thus creating a filter that will ignore all non-matching nodes. The TBIR was chosen as the static hierarchy due to the probability of a higher precision and recall compared to the CBIR. Tests were performed in both directions, however, meaning that the CBIR was also kept as the static hierarchy, but this did, as expected, result in lower precision and recall and was thus discarded. As in the first implementation, the synonyms for all involved nodes are added to the result. A sketch of both variants is given below.
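A minimal sketch of the two variants using NLTK's WordNet in place of the Extended Java WordNet Library used in this work; the function names and details are illustrative.

    from nltk.corpus import wordnet as wn

    def extend_keywords(tbir_word, cbir_word):
        # Recall-oriented variant: iterate through both hierarchies, add the
        # smallest common node and the synonyms of all involved nodes.
        result = {tbir_word, cbir_word}
        for s1 in wn.synsets(tbir_word):
            for s2 in wn.synsets(cbir_word):
                for common in s1.lowest_common_hypernyms(s2):
                    result.update(common.lemma_names())
                    result.update(s1.lemma_names())
                    result.update(s2.lemma_names())
        return result

    def keep_classification(tbir_word, cbir_word):
        # Precision-oriented variant: iterate through the CBIR hierarchy
        # only, keeping the TBIR keyword at input level as a strict filter.
        for synset in wn.synsets(cbir_word):
            for path in synset.hypernym_paths():
                if any(tbir_word in node.lemma_names() for node in path):
                    return True
        return False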

6.6 Search Platform

The search platform used in this work is ElasticSearch. ElasticSearch was chosen due to its support for multiple indexes as well as its built-in tools for stemming, tokenizing and tf-idf. In addition, ElasticSearch is easy to set up, use and distribute, should very large databases become an issue.

6.6.1 Filters, Stemming and Tokenizing

Each index in the search platform used the following functionality (a configuration sketch follows after the list):

• Stop words - Stop words were removed from all of the index fields in order to reduce the amount of redundant data.

• Standard Tokenizer - The ElasticSearch standard tokenizer was chosen due to its good support and performance for the English language.

• Porter stemming - The Porter stemmer was chosen due to its good performance and on the basis of being the stemmer recommended by ElasticSearch for documents containing English data.
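A minimal sketch of matching index settings using the elasticsearch Python client; the index and analyzer names are illustrative, while "stop" and "porter_stem" are the built-in ElasticSearch token filters for stop word removal and Porter stemming.

    from elasticsearch import Elasticsearch

    es = Elasticsearch()
    es.indices.create(index="images", body={
        "settings": {"analysis": {"analyzer": {
            "english_porter": {
                "type": "custom",
                "tokenizer": "standard",  # the standard tokenizer
                "filter": ["lowercase", "stop", "porter_stem"],
            }
        }}}
    })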


Chapter 7

Evaluation Method

Henceforth, for the sake of presenting the results, the Text-Based Image Retrieval component will be denoted TBIR. The Content-Based Image Retrieval component will be denoted CBIR, while the improved version that makes use of information from both sources as well as a taxonomy will be denoted TaBIR, short for Taxonomy Based Image Retrieval.

7.1 Evaluation Data

The evaluation data consists of approximately 900 blog posts which were accessed on 2015-03-20. The posts were accessed by crawling Bites @ Animal Planet [43] using a crawl depth of 30, resulting in 823 images. Each image I_i, i ∈ [1...n], n = 823, was manually examined and tagged by its content. Due to the many tagging possibilities, and in order to remain as objective as possible, a limitation (see Section 1.5) was put in place: the tags would strictly contain the animals that can be seen in the images. As such, tags such as wall, water and tree, and subjective tags such as cute, beautiful and weird, were not used. Images containing one animal were tagged with said animal (see Figure 7.1), while images containing two or more animals were tagged with all animals present in the image, as seen in Figure 7.2. The ground truth data of this project is thus the n images collected using the above-mentioned, manually tagged method.


Figure 7.1: An image containing a single animal. Tag: sloth. (Figure from Bites @ Animal Planet)

Figure 7.2: An image containing two or more animals. Tags: dog, fox. (Figure from Bites @ Animal Planet)

From the evaluation data, tags applied to fewer than 20 images are not evaluated, since such tags would have a high impact on the results without sufficient data to support that impact. See Appendix A for a full list of the tags used and the number of occurrences of each tag.

7.2 Classifier Evaluation

Considering the ground truth set of n images acquired according to the method described in Section 7.1, the system will, for each image I_i, i ∈ [1...n], attempt to classify I_i. Given a query, the result of a classification belongs to one of four categories. To explain this, consider Figure 7.3 and a search query for the term "dog" while reading the following bullet points; a categorisation sketch in code follows the list.

• True positive: A true positive is an image which would show up should the query be performed on the system classified data as well as on the ground truth data. The number of true positives is calculated according to Formula 7.1.


• False positive: A false positive is an image which would show up should the query be performed on the system classified data, but not on the ground truth data.

• True negative: A true negative is an image which would not show up, no matter if the query is performed on the system classified data, or the ground truth data.

• False negative: A false negative is an image which would not show up should the query be performed on the system classified data but would show up should the query be performed on the ground truth data.
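The sketch below restates the four categories as a function, assuming that each image carries a set of system-assigned keywords and a set of ground-truth tags; all names are illustrative.

def categorize(system_keywords, truth_tags, query):
    # Classify one image into tp/fp/tn/fn with respect to a query term.
    retrieved = query in system_keywords  # would the system return the image?
    relevant = query in truth_tags        # should it be returned?
    if retrieved and relevant:
        return 'tp'
    if retrieved and not relevant:
        return 'fp'
    if not retrieved and relevant:
        return 'fn'
    return 'tn'

print(categorize({'dog', 'canine'}, {'dog'}, 'dog'))  # -> 'tp'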

Figure 7.3: A figure about positives and negatives. (Original figure from http://en.wikipedia.org/wiki/Precision_and_recall)

7.3 Evaluation Formulas

If one assumes, according to Section 7.2, that for each image I_i a classification C_i exists, then ∀I_i ∃C_i, i ∈ [1...n]. As such, for each classification, given a query q, each image will end up in one of four categories C_{i,q} ∈ {tp, fp, tn, fn}, i ∈ [1...n]. The numbers of true positives, false positives, true negatives and false negatives are calculated according to Formulas 7.1 - 7.4.


tp_q = \sum_{i=1}^{n} C_{i,q}, \quad C_{i,q} = tp, \quad i \in [1 \ldots n]    (7.1)

tn_q = \sum_{i=1}^{n} C_{i,q}, \quad C_{i,q} = tn, \quad i \in [1 \ldots n]    (7.2)

fp_q = \sum_{i=1}^{n} C_{i,q}, \quad C_{i,q} = fp, \quad i \in [1 \ldots n]    (7.3)

fn_q = \sum_{i=1}^{n} C_{i,q}, \quad C_{i,q} = fn, \quad i \in [1 \ldots n]    (7.4)

The evaluation measures used in this work are specified below.

The precision (pr) of a query is calculated from the true positives (tp) and false positives (fp) of the response. The tp of the response, divided by the sum of tp and fp, yields a value ranging from 0 to 1, indicating how many of the returned documents are correct.

pr = \frac{tp}{tp + fp}    (7.5)

The recall (rc), on the other hand, also makes use of the false negatives (fn). The tp of the response, divided by the sum of tp and fn, is the resulting rc value, also ranging from 0 to 1. In this case, the value indicates how many of the correct images the query was able to retrieve.

rc = \frac{tp}{tp + fn}    (7.6)

The F-measures F1, F2 and F0.5, which put emphasis on balance, recall and precision respectively, are calculated according to the following formula:

F_\beta = (1 + \beta^2) \cdot \frac{pr \cdot rc}{\beta^2 \cdot pr + rc}    (7.7)

Since each formula operates on a specific search query, the average result is calculated for each measure respectively.
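For completeness, Formulas 7.5 - 7.7 transcribe directly into code; the counts used below are made up for illustration.

def precision(tp, fp):
    return tp / (tp + fp)  # Formula 7.5

def recall(tp, fn):
    return tp / (tp + fn)  # Formula 7.6

def f_measure(pr, rc, beta=1.0):
    return (1 + beta**2) * (pr * rc) / (beta**2 * pr + rc)  # Formula 7.7

pr = precision(tp=45, fp=5)   # illustrative counts
rc = recall(tp=45, fn=11)
print(pr, rc, f_measure(pr, rc, beta=2))  # F2 puts emphasis on recall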

7.4 Baseline and Comparison

For this work, the higher of two baselines will be used. The first baseline is the CBIR component on its own; the second is the TBIR component on its own. These two baselines will be compared to the results from TaBIR, which utilises information retrieved from both the TBIR and CBIR components.

Part III

Results and Discussion


Chapter 8

Results

This chapter contains all of the results from the implementations. Precision TaBIR is the implementation which focuses on precision over recall, while Recall TaBIR is the implementation which focuses on recall over precision. In Figure 8.1, the CBIR and TBIR implementations can be seen and compared to the precision and recall implementations of TaBIR. The following sections contain graphs and resulting values for each of the measures, while the results are discussed further in Chapter 9.

Figure 8.1: A graph of the averages.


8.1 Average Precision

The average precision was calculated using Formula 7.5. As can be seen in Figure 8.2 and Table 8.1, the TBIR component performed relatively well compared to the CBIR and Recall TaBIR, with an average precision of 0.77. The Precision TaBIR implementation, however, managed an average of 0.90. The CBIR and the Recall TaBIR implementations performed comparatively poorly, with precision values of 0.53 and 0.56 respectively.

Table 8.1: Average precision scores.

CBIR              0.53
TBIR              0.77
Precision TaBIR   0.90
Recall TaBIR      0.56

Figure 8.2: A graph of the average precision scores.

8.2 Average Recall

The average recall was, in contrast to the average precision, calculated using Formula 7.6. In Figure 8.3 and Table 8.2 one can clearly see that, once again, the TBIR performed well, with a recall score of 0.79. The Precision TaBIR performed worse in this case, with a score of 0.45, just slightly higher than the CBIR which scored 0.44. The Recall TaBIR, however, managed to score an average of 0.89, thus performing slightly better than the TBIR.

Table 8.2: Average recall scores.

CBIR              0.44
TBIR              0.79
Precision TaBIR   0.45
Recall TaBIR      0.89

Figure 8.3: A graph of the recall scores.

8.3 Average F Measures

Considering that image retrieval depends on both precision and recall, it was considered appropriate to also include the F measures, which translate into a single score based on precision and recall. Looking at Figure 8.4, one can see the average results of the F2 measure. The F2 measure, which puts emphasis on recall, shows that the top-performing TBIR and Recall TaBIR achieved F2 scores of 0.78 and 0.79 respectively, while the CBIR and Precision TaBIR only got average scores of 0.43 and 0.49 respectively.

In Figure 8.5 the F0.5 scores can be seen. The F0.5 measure, which puts more emphasis on precision, shows an average CBIR score of 0.45, a TBIR score of 0.77, and Precision TaBIR and Recall TaBIR scores of 0.68 and 0.60 respectively.

F1 puts emphasis neither on recall nor on precision but is rather a balanced measure between the two. Figure 8.6 shows the CBIR, TBIR, Precision TaBIR and Recall TaBIR at 0.43, 0.77, 0.56 and 0.68 respectively. Table 8.3 contains all of the previously mentioned F scores.

Table 8.3: Average F scores.

                  F2     F0.5   F1
CBIR              0.43   0.45   0.43
TBIR              0.78   0.77   0.77
Precision TaBIR   0.49   0.68   0.56
Recall TaBIR      0.79   0.60   0.68

Figure 8.4: A graph of the F2 scores.


Figure 8.5: A graph of the F0.5 scores.

Figure 8.6: A graph of the F1 scores.


Chapter 9

Discussion and Conclusion

Starting from the results of the previous chapter: if one assumes that the only goal is increased precision of the query result, then the Precision TaBIR was able to achieve a precision score of 0.90, compared to the TBIR which scored 0.77 and the Recall TaBIR which scored 0.56. If the goal is instead recall, then the previous chapter shows that the Recall TaBIR scored 0.89, compared to the TBIR which scored 0.79 and the Precision TaBIR which scored 0.45. Thus, the results show that depending on how the TaBIR is implemented, it is possible to increase either the recall at the cost of precision, or vice versa.

The results from the different implementations are not unexpected, as the recall-focused implementation adds keywords while the precision-focused implementation filters out keywords from the text-based implementation. The precision-focused implementation, which in simple terms uses the CBIR component to filter out keywords of the TBIR component, is thus directly affected by the recall of the CBIR component. If the TaBIR system were to be implemented in e.g. a company, the CBIR component should first and foremost be trained on data that the company is expected to use, rather than on general datasets, in order to maximize the recall of the CBIR component which is then used by the TaBIR.

Once again looking at the results from the previous chapter, one can see that although either the precision or the recall is high in the two implementations, the F scores are not too impressive. Only in the F2 measure was one of the TaBIR implementations able to outdo the TBIR component, and then only by a very small margin. While the F measure is a good measuring methodology, the results are not always as clear as the numbers might suggest. In the instance of the Internet, where the number of images might be beyond counting, not being able to recall some of the images might not be important, as long as the images that are shown are precise responses to the query. On a smaller scale, however, such as in the system of a small company, recall might instead be very important. With this reasoning, the following question may be asked: "what should be shown?". In Section 2.3, three methods were mentioned, with the most common being relevance-ordered presentation. Relevance-ordered presentation applied to the Precision TaBIR could show the true potential of TaBIR. In such a system, one could perform a query on both the Precision TaBIR data and the TBIR data, and simply apply a boosting value to the Precision TaBIR, which performed better than the TBIR precision-wise. In a relevance-ordered presentation, this would force images that are more relevant to the query to show up first, while other results are pushed to the back. The recall would thus go up through the use of the TBIR data, while the Precision TaBIR puts the more relevant results in front.
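As a sketch of this idea in ElasticSearch's query DSL, one could boost matches on the Precision TaBIR keywords over plain TBIR matches; the field names and the boost value are assumptions.

# Both fields are queried; matches on the (assumed) Precision TaBIR field are
# boosted so they rank first, while TBIR matches preserve recall.
query = {
    'query': {
        'bool': {
            'should': [
                {'match': {'tabir_precision_keywords': {'query': 'dog', 'boost': 3.0}}},
                {'match': {'tbir_keywords': 'dog'}}
            ]
        }
    }
}
# This body would be passed to the search endpoint,
# e.g. es.search(index='images', body=query).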

9.1 Conclusions

How much can the precision and/or recall of image search on the Internet be increased by applying a word taxonomy to data retrieved from a Text- and Content-Based Image Retrieval system individually?

The TBIR system extracts surrounding keywords and metadata, such as the title and alt-text, before applying NLP techniques such as PoS tagging, stemming and stop word lists. The CBIR component classifies the images before sending the classifications on to the component handling the taxonomy. The component handling the taxonomy was implemented in two versions. The first version focused on improving the recall of image search and was thus named Recall Taxonomy Based Image Retrieval (Recall TaBIR); the second focused on improving the precision and was thus called Precision Taxonomy Based Image Retrieval (Precision TaBIR). Recall TaBIR creates a hypernym hierarchy tree for every word received from the CBIR and the TBIR, and iterates through them in search of common taxonomy nodes. In Precision TaBIR, the same hierarchies are built, but the iteration is done in only one of the hierarchy trees, for the purpose of creating a strong filter which removes all dissimilar keywords.

The results in Chapter 8 show that while it is possible to increase either the precision or the recall depending on the implementation, the other factor will drop as a result. The F score (Table 8.3), which takes both recall and precision into account, shows that while the two implementations are able to compete in either precision or recall, they do not perform as well when both recall and precision are taken into account. Only one of the two TaBIR implementations was able to score higher than the TBIR component, and then only by a slight margin in the F2 measure (see Table 8.3 or Figure 8.4).

While the F score of TaBIR is not very good, the precision of Precision TaBIR was still able to outperform both the TBIR and CBIR baselines. One very important factor to consider is that while the F scores of the two TaBIR implementations are not as high as the baselines, the true power of TaBIR might be seen in a relevance-ordered search platform which utilises the TaBIR results as well as the TBIR and possibly also the CBIR results. By applying a boosting value to the TaBIR results, one would be able to retain a high recall while pushing high-precision results to the front, thus improving the image search experience.

9.2 Future Work

As for future work, increasing the size of the evaluation dataset would be interesting, in order to see how the system performs when there are thousands or even tens of thousands of images in the system.

Performing CBIR training on a specific type of data, to see how TaBIR could perform in a real-life corporate implementation, would also be interesting; e.g., assuming that the system would be used at a car retailer or a car enthusiast site, the CBIR component could be trained on different cars before running the system.

Another interesting approach would be to use a recent similarity measure algorithm, rather than filtering on the different data sources, to see how that affects the system.

A user study comparing the CBIR, TBIR and TaBIR components in a relevance-sorted presentation could potentially determine whether there is indeed more to the F scores than the numbers alone suggest, and whether the true potential of TaBIR can indeed be found in a combination of several methodologies, such as TaBIR with TBIR and possibly also CBIR.

It would be interesting to try out different types of architectures in the CBIR component, as the architecture of a deep learning network can drastically change the outcome of the system. Further research in word taxonomies might drastically improve the accuracy of Taxonomy Based Image Search systems.


Bibliography

[1] Aliaksandr Autayeu. extJWNL - Extended Java WordNet Library. http://extjwnl.sourceforge.net/. Accessed: 2015-04-03.

[2] Serge Belongie, Chad Carson, Hayit Greenspan, and Jitendra Malik. Color- and texture-based image segmentation using EM and its application to content-based image retrieval. In Computer Vision, 1998. Sixth International Conference on, pages 675–682. IEEE, 1998.

[3] Leo Breiman. Bagging predictors. Machine learning, 24(2):123–140, 1996.

[4] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.

[5] Ning-San Chang and King Sun Fu. A relational database system for images. Springer, 1980.

[6] Yixin Chen, James Ze Wang, and Robert Krovetz. An unsupervised learning approach to content-based image retrieval. In Signal Processing and Its Applications, 2003. Proceedings. Seventh International Symposium on, volume 1, pages 197–200. IEEE, 2003.

[7] Nello Cristianini and John Shawe-Taylor. An introduction to support vector machines and other kernel-based learning methods. Cambridge university press, 2000.

[8] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys (CSUR), 40(2):5, 2008.

[9] Jia Deng, Alexander C Berg, and Li Fei-Fei. Hierarchical semantic indexing for large scale image retrieval. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 785–792. IEEE, 2011.

[10] Dezi. Dezi - REST search platform. http://dezi.org/. Accessed: 2015-03-07.

[11] Minh N Do and Martin Vetterli. Wavelet-based texture retrieval using generalized gaussian density and kullback-leibler distance. Image Processing, IEEE Transactions on, 11(2):146–158, 2002.


[12] Matthijs Douze, Arnau Ramisa, and Cordelia Schmid. Combining attributes and fisher vectors for efficient image retrieval. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 745–752. IEEE, 2011.

[13] Elasticsearch. Elasticsearch. https://www.elastic.co. Accessed: 2015-03-07.

[14] Christos Faloutsos, Ron Barber, Myron Flickner, Jim Hafner, Wayne Niblack, Dragutin Petkovic, and William Equitz. Efficient and effective querying by image content. Journal of intelligent information systems, 3(3-4):231–262, 1994.

[15] Herbert Freeman and Larry S. Davis. A corner-finding algorithm for chain- coded curves. IEEE Transactions on Computers, 26(3):297–303, 1977.

[16] Mark A Friedl and Carla E Brodley. Decision tree classification of land cover from remotely sensed data. Remote sensing of environment, 61(3):399–409, 1997.

[17] Andrew Gibiansky. Convolutional neural networks. http://andrew.gibiansky.com/blog/machine-learning/convolutional-neural-networks/. Accessed: 2015-03-05.

[18] Tony Gill. Metadata and the world wide web. Introduction to metadata: pathways to digital information, Los Angeles, Calif.: Getty Information Institute, 9:18, 1998.

[19] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 580–587. IEEE, 2014.

[20] Google. Google scholar. https://scholar.google.com/. Accessed: 2015-03-01.

[21] Jane Greenberg. Metadata extraction and harvesting: A comparison of two automatic metadata generation applications. Journal of Internet Cataloging, 6(4):59–82, 2004.

[22] Anil K Jain, Jianchang Mao, and KM Mohiuddin. Artificial neural networks: A tutorial. Computer, 29(3):31–44, 1996.

[23] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.

[24] Yohan Jin, Latifur Khan, Lei Wang, and Mamoun Awad. Image annotations by combining multiple evidence & wordnet. In Proceedings of the 13th annual ACM international conference on Multimedia, pages 706–715. ACM, 2005.


[25] Young-Kee Jung, Kyu-Won Lee, and Yo-Sung Ho. Content-based event retrieval using semantic scene interpretation for automated traffic surveillance. Intelligent Transportation Systems, IEEE Transactions on, 2(3):151–163, 2001.

[26] Alireza Khotanzad and Yaw Hua Hong. Invariant image recognition by zernike moments. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 12(5):489–497, 1990.

[27] Karen Kukich. Techniques for automatically correcting words in text. ACM Computing Surveys (CSUR), 24(4):377–439, 1992.

[28] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[29] Thomas M Lehmann, MO Gold, Christian Thies, Benedikt Fischer, Klaus Spitzer, Daniel Keysers, Hermann Ney, Michael Kohnen, Henning Schubert, and Berthold B Wein. Content-based image retrieval in medical applications. Methods of Information in Medicine, 43(4):354–361, 2004.

[30] Martin Lennon, David S Peirce, Brian D Tarry, and Peter Willett. An evaluation of some conflation algorithms for information retrieval. Journal of information Science, 3(4):177–183, 1981.

[31] Michael S Lew, Nicu Sebe, Chabane Djeraba, and Ramesh Jain. Content-based multimedia information retrieval: State of the art and challenges. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 2(1):1–19, 2006.

[32] Ying Liu, Dengsheng Zhang, Guojun Lu, and Wei-Ying Ma. A survey of content-based image retrieval with high-level semantics. Pattern Recognition, 40(1):262–282, 2007.

[33] Julie B Lovins. Development of a stemming algorithm. MIT Information Processing Group, Electronic Systems Laboratory, 1968.

[34] Christopher D Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to information retrieval, volume 1. Cambridge university press, Cambridge, 2008.

[35] Christopher D Manning and Hinrich Schütze. Foundations of statistical natural language processing. MIT press, 1999.

[36] Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55–60, 2014.


[37] Warren S McCulloch and Walter Pitts. A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4):115–133, 1943.

[38] Babu M Mehtre, Mohan S Kankanhalli, and Wing Foon Lee. Shape measures for content based image retrieval: a comparison. Information Processing & Management, 33(3):319–337, 1997.

[39] George A Miller. Wordnet: a lexical database for english. Communications of the ACM, 38(11):39–41, 1995.

[40] Henning Müller, Nicolas Michoux, David Bandon, and Antoine Geissbuhler. A review of content-based image retrieval systems in medical applications – clinical benefits and future directions. International journal of medical informatics, 73(1):1–23, 2004.

[41] Norconex. Norconex. http://www.norconex.com/. Accessed: 2015-03-10.

[42] Virginia E Ogle and Michael Stonebraker. Chabot: Retrieval from a relational database of images. Computer, 28(9):40–48, 1995.

[43] Animal Planet. Bites @ Animal Planet. http://blogs.discovery.com/bites-animal-planet/. Accessed: 2015-03-20.

[44] Martin F Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.

[45] TW Rauber and AS Steiger-Garção. Shape description by unl fourier features - an application to handwritten character recognition. In Pattern Recognition, 1992. Vol. II. Conference B: Pattern Recognition Methodology and Systems, Proceedings., 11th IAPR International Conference on, pages 466–469. IEEE, 1992.

[46] Frank Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6):386, 1958.

[47] Yong Rui, Thomas S Huang, and Shih-Fu Chang. Image retrieval: Current techniques, promising directions, and open issues. Journal of visual communication and image representation, 10(1):39–62, 1999.

[48] Yong Rui, Thomas S Huang, Michael Ortega, and Sharad Mehrotra. Relevance feedback: a power tool for interactive content-based image retrieval. Circuits and Systems for Video Technology, IEEE Transactions on, 8(5):644–655, 1998.

[49] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, 2014.


[50] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge. CoRR, abs/1409.0575, 2014.

[51] Gerard Salton and Christopher Buckley. Term-weighting approaches in automatic text retrieval. Information processing & management, 24(5):513–523, 1988.

[52] Jorge Sánchez, Florent Perronnin, Thomas Mensink, and Jakob Verbeek. Compressed fisher vectors for large-scale image classification. Rapport de recherche RR-8209, INRIA, 2013.

[53] Ishwar K Sethi, Ioana L Coman, and Daniela Stan. Mining association rules between low-level image features and high-level concepts. In Aerospace/Defense Sensing, Simulation, and Controls, pages 279–290. International Society for Optics and Photonics, 2001.

[54] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(8):888–905, 2000.

[55] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[56] Josef Sivic and Andrew Zisserman. Video google: A text retrieval approach to object matching in videos. In Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, pages 1470–1477. IEEE, 2003.

[57] Arnold WM Smeulders, Marcel Worring, Simone Santini, Amarnath Gupta, and Ramesh Jain. Content-based image retrieval at the end of the early years. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(12):1349–1380, 2000.

[58] Apache Solr. Solr - popular open source platform built on apache lucene. http://lucene.apache.org/solr/. Accessed: 2015-03-07.

[59] Michael J Swain and Dana H Ballard. Color indexing. International journal of computer vision, 7(1):11–32, 1991.

[60] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.

[61] Giannis Varelas, Epimenidis Voutsakis, Paraskevi Raftopoulou, Euripides GM Petrakis, and Evangelos E Milios. Semantic similarity methods in wordnet and their application to information retrieval on the web. 2005.


[62] Xiaoyu Wang, Ming Yang, Timothee Cour, Shenghuo Zhu, Kai Yu, and Tony X Han. Contextual weighting for vocabulary tree based image retrieval. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 209–216. IEEE, 2011.

[63] Yong Wang, Tao Mei, Shaogang Gong, and Xian-Sheng Hua. Combining global, regional and contextual features for automatic image annotation. Pattern Recognition, 42(2):259–266, 2009.

[64] Thijs Westerveld. Image retrieval: Content versus context. In RIAO, pages 276–284. Citeseer, 2000.

[65] Peter Willett. The porter stemming algorithm: then and now. Program, 40(3):219–223, 2006.

[66] John M Zachary Jr and Sitharama S Iyengar. Content based image retrieval systems. In Application-Specific Systems and Software Engineering and Technology, 1999. ASSET'99. Proceedings. 1999 IEEE Symposium on, pages 136–143. IEEE, 1999.

[67] Xiang Sean Zhou and Thomas S Huang. Relevance feedback in content-based image retrieval: some recent advances. Information Sciences, 148(1):129–137, 2002.

[68] Svitlana Zinger, Christophe Millet, Benoit Mathieu, Gregory Grefenstette, Patrick Hède, and Pierre-Alain Moëllic. Extracting an ontology of portrayable objects from wordnet. In MUSCLE/ImageCLEF workshop on Image and Video retrieval evaluation, pages 17–23, 2005.

Appendix A

Tags

This appendix contains the tags and the number of times each tag was used in the tagging process.

(’dog’, 296) (’cat’, 115) (’rabbit’, 23) (’lion’, 22) (’sloth’, 21) (’elephant’, 21) (’bird’, 18) (’polar bear’, 18) (’panda’, 16) (’fish’, 15) (’shark’, 13) (’alligator’, 11) (’hamster’, 10) (’rhino’, 8) (’horse’, 8) (’cheetah’, 7) (’hippo’, 6) (’monkey’, 6) (’catfish’, 6) (’goat’, 6) (’leopard’, 6) (’whale’, 6) (’bear’, 5) (’wolf’, 5) (’reindeer’, 5) (’tiger’, 5) (’seal’, 5) (’crocodile’, 4) (’owl’, 4) (’penguin’, 4) (’snake’, 4) (’eagle’, 4) (’monster’, 4) (’frog’, 4) (’chimpanzee’, 4) (’otter’, 4) (’bee’, 4) (’snow leopard’, 3) (’turtle’, 3) (’fox’, 3) (’ape’, 3) (’camel’, 2) (’lamprey’, 2) (’bat’, 2) (’walrus’, 2) (’orca’, 2) (’anaconda’, 2) (’red panda’, 2) (’kangaroo’, 2) (’orangutan’, 2) (’porpoise’, 2) (’lungfish’, 2) (’dolphin’, 2) (’osprey’, 2) (’deer’, 2) (’pig’, 2) (’beaver’, 2) (’lamb’, 2) (’koala’, 2) (’’, 2) (’groundhog’, 2) (’gorilla’, 2) (’white sturgeon’, 1) (’mongoose’, 1) (’blue lobster’, 1) (’snow monkey’, 1) (’snow fox’, 1) (’cougar’, 1) (’ferret’, 1) (’giraffe’, 1) (’llama’, 1) (’jaguar’, 1) (’parrot’, 1) (’spider’, 1) (’goose’, 1) (’zebra’, 1) (’leopard seal’, 1) (’cockroach’, 1) (’muskox’, 1) (’goldfish’, 1) (’slug’, 1) (’racoon’, 1) (’hellbender’, 1) (’tortoise’, 1) (’cow’, 1) (’octopus’, 1) (’donkey’, 1) (’weasel’, 1) (’duck’, 1) (’sheep’, 1) (’hen’, 1) (’squirrel’, 1) (’pangolin’, 1) (’skua’, 1) (’armadillo’, 1) (’coati’, 1) (’lizard’, 1) (’lynx’, 1) (’pelican’, 1) (’dragonfly’, 1) (’guinea pig’, 1) (’warthog’, 1) (’mouse’, 1) (’maggot’, 1) (’manatee’, 1) (’manta ray’, 1) (’buffalo’, 1) (’hedgehog’, 1)

Number of tags: 108
Total sum of tags: 823
