Semantic Multimedia Modelling & Interpretation for Annotation
Total Page:16
File Type:pdf, Size:1020Kb
Semantic Multimedia Modelling & Interpretation for Annotation Irfan Ullah Submitted in partial fulfilment of the Requirements for the degree of Doctor of Philosophy School of Engineering and Information Sciences MIDDLESEX UNIVERSITY May, 2011 © 2011 Irfan Ullah All rights reserved II Declaration I certify that the thesis entitled Semantic Multimedia Modelling and Interpretation for Annotation submitted for the degree of doctor of philosophy is the result of my own work and I have fully cited and referenced all material and results that are not original to this work. Name …………………………IRFANULLAH …. [print name] Signature …………………… …… MAY 24TH, 2011 Date………………………………. III Abstract “If we knew what it was we were doing, it would not be called research, would it?” Albert Einstein The emergence of multimedia enabled devices, particularly the incorporation of cameras in mobile phones, and the accelerated revolutions in the low cost storage devices, boosts the multimedia data production rate drastically. Witnessing such an iniquitousness of digital images and videos, the research community has been projecting the issue of its significant utilization and management. Stored in monumental multimedia corpora, digital data need to be retrieved and organized in an intelligent way, leaning on the rich semantics involved. The utilization of these image and video collections demands proficient image and video annotation and retrieval techniques. Recently, the multimedia research community is progressively veering its emphasis to the personalization of these media. The main impediment in the image and video analysis is the semantic gap, which is the discrepancy among a user’s high-level interpretation of an image and the video and the low level computational interpretation of it. Content-based image and video annotation systems are remarkably susceptible to the semantic gap due to their reliance on low-level visual features for delineating semantically rich image and video contents. However, the fact is that the visual similarity is not semantic similarity, so there is a demand to break through this dilemma through an alternative way. The semantic gap can be narrowed by counting high-level and user-generated information in the annotation. High- level descriptions of images and or videos are more proficient of capturing the semantic meaning of multimedia content, but it is not always applicable to collect this information. It is commonly agreed that the problem of high level semantic annotation of multimedia is still far from being answered. This dissertation puts forward approaches for intelligent multimedia semantic extraction for high level annotation. This dissertation intends to bridge the gap between the visual features and semantics. It proposes a framework for annotation enhancement and refinement for the object/concept annotated images and videos datasets. The entire theme is to first purify the datasets from noisy keyword and then expand the concepts lexically and commonsensical to fill the vocabulary and lexical gap to achieve IV high level semantics for the corpus. This dissertation also explored a novel approach for high level semantic (HLS) propagation through the images corpora. The HLS propagation takes the advantages of the semantic intensity (SI), which is the concept dominancy factor in the image and annotation based semantic similarity of the images. As we are aware of the fact that the image is the combination of various concepts and among the list of concepts some of them are more dominant then the other, while semantic similarity of the images are based on the SI and concept semantic similarity among the pair of images. Moreover, the HLS exploits the clustering techniques to group similar images, where a single effort of the human experts to assign high level semantic to a randomly selected image and propagate to other images through clustering. The investigation has been made on the LabelMe image and LabelMe video dataset. Experiments exhibit that the proposed approaches perform a noticeable improvement towards bridging the semantic gap and reveal that our proposed system outperforms the traditional systems. V Acknowledgement “No duty is more urgent than that of returning thanks [Schaff and Wace, 2002].” ST. AMBROSE Bishop of Milan (340 – 397) I was once told that PhD is a long journey of transformation from a ‘novice’ to ‘professional’ researcher. To succeed, an individual can never do it alone; there must be someone who is there for them, providing all the helping hands and supports. I can’t agree more and as for my case, I have a lot of people to thank. I owe my gratitude to all those people who have made this thesis possible. It is a pleasure to thank them all in my humble acknowledgment In the first place, I am heartily thankful to my supervisor Dr. Jonathan Loo (Middlesex University) for helpful discussions, guidance and support throughout my PhD studies. His scientific intuition and rigorous research attitude have enriched me with the in-depth understanding of the subject and the passion of dedicating myself to research, which I will benefit from in the long run. His gracious support in many aspects has enabled me to overcome difficulties and finish my study. I also extend my thanks to my second supervisor Dr. Martin Loomes. Further, I would like to express my special gratitude to my colleague and best friend Nida Aslam, whose dedication, love, and persistent care has diminished all the difficulties of studying and living in the unfamiliar land far away from home. I am grateful to my financial supporters, including the Kohat University of Science & Technology, Pakistan. Many thanks also go to my friends shafi Ullah khan, Tahir Naeem who accompanied me during the last three years, making my time in London most enjoyable. Finally, I pay tribute to the constant support of my family, but especially of my parents, Brother, sister and my wife & kids (Mahnoor & Lutfullah) for their untiring efforts and praying. Without them, my whole studies would have been impossible and whose sacrifice, I can never repay. This one is for you! . VI Dedication I dedicate this thesis to my late Father, (may his soul rest in peace) for their unconditional support and prayers, tireless love and motivation throughout my studies. Everything I have accomplished, I owe to him. VII List of associated Publications 1. Irfan Ullah, Nida Aslam, Jonathan Loo, RoohUllah "Adding Semantics to the reliable Object Annotated Image Databases " World Conference on Information Technology, 2010. 2. Irfan Ullah, Nida Aslam, Jonathan Loo, RoohUllah “A Framework for High Level Semantic Annotation Using Trusted Object Annotated Dataset" IEEE International Symposium on Signal Processing and Information Technology December 15-18, 2010 - Luxor – Egypt. 3. I.Khan, N. Aslam, K.K. Loo, “Semantic Annotation Gap: where to put the responsibility”, International Journal of Digital Content Technology and its Applications (JDCTA), ISSN: 1975-9339, Vol. 3, No. 1, March 2009. 4. I.Khan, N. Aslam, K.K. Loo, “Semantic Multimedia Annotation: Text Analysis”, International Journal of Digital Content Technology and its Applications (JDCTA), ISSN: 1975-9339, Vol. 3, No. 2, June 2009. 5. Irfanullah, Nida Aslam, Jonathan Loo, Roohullah. “A Framework for Image Annotation Enhancement & Refining Using Knowledge Bases”. Springerlink International Journal on Digital Libraries (Submitted Feb 2011). 6. Irfanullah, Nida Aslam, Jonathan Loo, Roohullah. “Semantic Space Enhancement and Refinement for Video using Knowledgebases”. Elsevier Journal of King Saud University – Computer and Information. (Submitted Feb 2011) 7. Irfanullah, Nida Aslam, Jonathan Loo, Roohullah. “Exploiting the Semantic Intensity for High Level Semantic Annotation of Images”. Elsevier Journal of Visual Communication and Image Representation (Submitted Feb 2011) 8. Irfanullah, Nida Aslam, Jonathan Loo, Roohullah. A survey on exploiting knowledge bases for visual media annotation. VIII Table of Contents LIST OF FIGURES ............................................................................................................................................................. XII LIST OF TABLES ............................................................................................................................................................ XVII LIST OF ACRONYMS .................................................................................................................................................... XVIII CHAPTER 01 - INTRODUCTION .................................................................................................................... 1 1.1 INTRODUCTION ................................................................................................................................... 3 1.2 MOTIVATION AND APPLICATION ........................................................................................................ 8 1.3 RESEARCH AIMS AND OBJECTIVES ..................................................................................................... 10 1.4 THE EXISTING PROBLEMS AND CHALLENGES ..................................................................................... 11 1.5 RESEARCH DIRECTIONS ..................................................................................................................... 14 1.5.1 HIGH-LEVEL SEMANTIC CONCEPTS AND LOW-LEVEL VISUAL FEATURES ............................................................... 14 1.5.2 VARIATION OF OBJECTS IN THE MULTIMEDIA