Image Retrieval Using Image Captioning
San Jose State University
SJSU ScholarWorks
Master's Projects, Master's Theses and Graduate Research
Spring 5-20-2019

Image Retrieval Using Image Captioning
Nivetha Vijayaraju, San Jose State University

Part of the Artificial Intelligence and Robotics Commons, and the Databases and Information Systems Commons

Recommended Citation
Vijayaraju, Nivetha, "Image Retrieval Using Image Captioning" (2019). Master's Projects. 687.
DOI: https://doi.org/10.31979/etd.vm9n-39ed
https://scholarworks.sjsu.edu/etd_projects/687

A Thesis Presented to The Faculty of the Department of Computer Science, San Jose State University, In Partial Fulfillment of the Requirements for the Degree Master of Science
By Nivetha Vijayaraju
May 2019

The Designated Project Committee Approves the Project Titled Image Retrieval Using Image Captioning
By Nivetha Vijayaraju
APPROVED FOR THE DEPARTMENT OF COMPUTER SCIENCE
SAN JOSE STATE UNIVERSITY
Spring 2019
Dr. Robert Chun, Department of Computer Science
Dr. Katerina Potika, Department of Computer Science
Dr. Thomas Austin, Department of Computer Science

© 2019 NIVETHA VIJAYARAJU
ALL RIGHTS RESERVED

Abstract

The rapid growth in the availability of the Internet and smartphones has increased the use of social media in recent years. This increased usage has in turn resulted in exponential growth in the number of digital images available.
Therefore, image retrieval systems play a major role in fetching images relevant to users' queries. These systems should also be able to handle the massive growth of data and take advantage of emerging technologies such as deep learning and image captioning. This report examines the purpose of image retrieval and surveys past research in the area. It also analyzes gaps in that research and describes the role of image captioning in image retrieval systems. Finally, it proposes a new methodology that uses image captioning to retrieve images, presents the results of this method, and compares them with past research.

Keywords: Image retrieval, deep learning, image captioning

Acknowledgments

I would like to thank my advisor, Dr. Robert Chun, for his continued support and for providing the guidance necessary to work on this project. I would also like to thank him for teaching me the core skills needed to succeed and for reviewing this research topic. I am also grateful to my committee members, Dr. Katerina Potika and Dr. Thomas Austin, for their suggestions and support.

TABLE OF CONTENTS

CHAPTER
1. Introduction
2. Background
   2.1 Features in Images
   2.2 Object Detection
   2.3 Image Segmentation
   2.4 Deep Learning in Image Captioning
3. Related Work
   3.1 Types of Image Retrieval
   3.2 Approaches in Image Captioning
4. Proposal
   4.1 Flow of Implementation
   4.2 Image Captioning System
5. Data Preparation
   5.1 Flickr8k Dataset
   5.2 Wang's Database
   5.3 Image Data Preparation
   5.4 Caption Data Generation
6. Image Captioning Model
   6.1 Loading Data
7. Model Definition
   7.1 Model Architecture
   7.2 Fitting the Model
8. Evaluation of Image Captioning Model
   8.1 BLEU Score Evaluation
9. Image Retrieval Using Image Captioning
   9.1 Caption Generation
   9.2 Image Retrieval
10. Evaluation of Image Retrieval
   10.1 Precision
   10.2 Recall
   10.3 F1 score
11. Conclusion and Future Work
   11.1 Conclusion
   11.2 Future work
References

LIST OF FIGURES

1. Histogram of an image
2. Image and its HOG Descriptor
3. Image and its Canny Edges
4. Object Detection in an Image
5. Segmentation in an Image
6. Artificial Neural Network
7. An Example of a Convolutional Layer
8. An Example of Pooling Layer
9. An Example of a Fully Connected Layer
10. Sample Recurrent Neural Network
11. Hashing Based Image Retrieval
12. Content-Based Image Retrieval
13. Sketch-Based Image Retrieval
14. Image Retrieval using Image Captioning
15. Sample Flickr8k image and its captions
16. Sample images in Wang’s database
17. Feature Extraction in images using VGG
18. Training phase of image and caption data
19. Merge architecture for Image Captioning
20. Summary of the model
21. Sample BLEU score results