The Pennsylvania State University

The Graduate School

Eberly College of Science

POTENTIAL ENERGY BASED IMAGE RETRIEVAL

A Thesis in

Statistics

By

Qi Fang

© 2013 Qi Fang

Submitted in Partial Fulfillment

of the Requirements

for the degree of

Master of Science

August 2013

The thesis of Qi Fang was reviewed and approved* by the following:

Jia Li

Professor of Statistics

Professor of Computer Science and Engineering (by courtesy)

Thesis Advisor

Le Bao

Assistant Professor of Statistics

David Hunter

Professor and Department Head of Statistics

*Signatures are on file in the Graduate School


Abstract

Due to the large-scale use of digital cameras and easy access to the Internet, the number of images available on the Internet has exploded over the last twenty years and continues to grow.

As a result, effective content-based image retrieval approaches are needed to help analyze and understand images at this scale.

Image retrieval approaches depend greatly on similarity measures. Several similarity measures are popular for image retrieval, for example, Mallows distance and integrated region matching (IRM). Mallows distance is a metric, but IRM is much faster to compute. This thesis introduces a new measure for image similarity computation: potential energy distance.

Potential energy distance is also a metric, but it is much faster to compute than Mallows distance.

To evaluate the performance of potential energy distance, we conduct experiments comparing it with Mallows distance and IRM. In our experiments, we use the MIR Flickr dataset, which contains 25,000 images, and we evaluate the similarity measures from the perspectives of accuracy, speed, and robustness. The results show that potential energy distance performs comparably to Mallows distance and IRM in accuracy, is similar to IRM and much faster than Mallows distance in speed, and is fairly robust to image alterations.


Table of Contents

List of Figures

List of Tables

Chapter 1 Introduction

1.1. Image Retrieval

1.2. Similarity Measures

1.3. Related Research

1.4. Research Questions

Chapter 2 Background Knowledge

2.1. Image Description

2.1.1. Image Segmentation

2.1.2. Feature Selection

2.2. Similarity Measures

2.2.1. Mallows Distance

2.2.2. Integrated Region Matching

Chapter 3 Potential Energy Distance

Chapter 4 Experiments

4.1. Data Preparation

4.1.1. Datasets

4.1.2. Image Signature Construction

4.2. Evaluation Measures

4.3. Accuracy

4.3.1. Image Queries

4.3.2. Image Categorization

4.4. Speed

4.5. Robustness

Chapter 5 Conclusion

Bibliography


List of Figures

Figure 1. Different Similarity Measures and Techniques for Different Types of Signatures [1]

Figure 2. Mallows Distance Region Matching [65]

Figure 3. IRM Matching Process [43]

Figure 4. Workflow of Image Signature Construction

Figure 5. Average and Weighted Precisions of Top 10 Retrieved Images in Image Query

Figure 6. Average and Weighted Precisions of Top 50 Retrieved Images in Image Query

Figure 7. Average and Weighted Precisions of Top 100 Retrieved Images in Image Query

Figure 8. Average and Weighted Precisions of Top 10 Retrieved Images in Image Categorization

Figure 9. Average and Weighted Precisions of Top 20 Retrieved Images in Image Categorization

Figure 10. Average and Weighted Precisions of Top 30 Retrieved Images in Image Categorization

Figure 11. Average and Weighted Precisions of Top 40 Retrieved Images in Image Categorization

Figure 12. Average and Weighted Precisions of Top 50 Retrieved Images in Image Categorization

Figure 13. Average and Weighted Precisions of Top 60 Retrieved Images in Image Categorization

Figure 14. Average and Weighted Precisions of Top 70 Retrieved Images in Image Categorization

Figure 15. Average and Weighted Precisions of Top 80 Retrieved Images in Image Categorization

Figure 16. Average and Weighted Precisions of Top 90 Retrieved Images in Image Categorization

Figure 17. Average and Weighted Precisions of Top 100 Retrieved Images in Image Categorization

Figure 18. Average and Weighted Precisions of Category "Animals" with Different Numbers of Retrieved Images

Figure 19. Rank of the Target Image and PE Distance with Image Brightening

Figure 20. Rank of the Target Image and PE Distance with Image Darkening

Figure 21. Rank of the Target Image and PE Distance with Image Blurring

Figure 22. Rank of the Target Image and PE Distance with Image Sharpening

Figure 23. Rank of the Target Image and PE Distance with More Saturation

Figure 24. Rank of the Target Image and PE Distance with Less Saturation

Figure 25. Rank of the Target Image and PE Distance with Image Random Spread

Figure 26. Rank of the Target Image and PE Distance with Image Pixelization


List of Tables

Table 1. General Topics and Subtopics in MIR Flickr Dataset

Table 2. Image Categories in MIR Flickr Dataset

Table 3. Evaluation Confusion Matrix

Table 4. CPU Time of Different Image Matching Approaches


Chapter 1 Introduction

1.1. Image Retrieval

Over the past twenty years, the volume of images on the Internet has grown explosively. With the invention and widespread adoption of digital devices such as cameras and smartphones, the number of pictures generated by ordinary users has increased dramatically. Additionally, the improved speed and ease of Internet access and the popularity of social websites have encouraged users to share pictures online. As a result, the number of images available online is tremendous and continues to grow. For example, by August 2011, Flickr was hosting more than 6 billion images, and this number has grown steadily since. Similarly, approximately 2.5 billion photos are uploaded to Facebook each month, around 50 billion cumulatively.

This rapidly growing volume of images requires significant effort to understand, categorize, and retrieve, leading to growth and prosperity in the area of image retrieval. Image retrieval, a technology that helps users understand and organize images by content, requires new techniques and systems and involves researchers from a variety of disciplines. It has also encouraged growth in related research areas, such as computer vision, information retrieval, human-computer interaction, Web mining, data mining, database systems, statistics, and psychology [1].

As the number of images increases, the variation in the visual and semantic content of images also increases significantly. This variation in content, together with the sheer number of images, makes image computation and understanding more difficult. While these factors provide great opportunities for image retrieval and understanding, they also challenge the field. Hence, despite sustained efforts to study image retrieval, it is still difficult to interpret images, and the need for efficient image retrieval approaches remains urgent.

Image retrieval approaches can be divided broadly into two categories: text-based image retrieval and content-based image retrieval. Text-based approaches rely on textual descriptions of images and retrieve images based on similarity measures over those descriptions. Content-based approaches analyze images at the pixel level and retrieve images based on similarity measures over visual content. Text-based approaches are widely adopted: major search engines, such as Google, MSN, and Yahoo, use them by extracting filenames, tags, and other text associated with images. But text-based approaches fall short in some situations, for instance when images have no text descriptions, have descriptions at different levels of abstraction, or have subjective descriptions.

To avoid these problems, content-based image retrieval does not rely on text descriptions.

Unfortunately, content-based approaches have their own shortcomings: they rely on visual similarity as a stand-in for semantic similarity, and the two often fail to coincide because of the semantic gap [2]. The semantic gap exists because the information extracted from the visual data differs from the interpretation the visual data have for users in specific situations. Images may exist by themselves, so image signatures rely only on data features; image interpretation, however, is contextual, and users in specific situations look for images with specific objects or messages. As long as these two sides differ, the semantic gap persists; thus, to improve retrieval results, the semantic gap must be reduced.


The content-based image retrieval process can be broken into two steps: image description and image similarity measurement. First, the image description step generates image signatures. Then, the image similarity measurement step uses those signatures to evaluate the distance between images. Image description is critical to image retrieval because it extracts the visual and semantic information of images, and a good image description can help improve image retrieval results.

The image description step can be further broken into two steps: image feature extraction and image signature construction. Image feature extraction can be treated as a preprocessing step for image signature construction. First, different types of image features are extracted globally or locally. Then, image signatures are constructed through different approaches, mathematically or adaptively. Region-based image signature construction is a popular approach: it first segments images into regions and then constructs image signatures from those regions. The region-based approach treats each image segment as an image object, which is close to human vision, and uses the features of each segment to build the image signature. We introduce more background on this step in Chapter 2.

After image signatures are constructed, we must determine how to use them to measure the similarity between images. A similarity measure quantifies how one image differs from other images; similarity techniques use a "distance" to measure the dissimilarity between a pair of images. When retrieving images with a query image, the results are searched and ranked starting from the smallest distance. Likewise, when retrieving images with the same semantic topics, images are categorized based on the closeness of their distances.
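To make the ranking step concrete, here is a minimal Python sketch (function and variable names are ours, not from the thesis) that orders a candidate set by increasing distance to a query, with the distance function left pluggable:

```python
import numpy as np

def rank_by_distance(query, candidates, distance):
    """Rank candidate images by increasing distance to the query.

    `query` and each element of `candidates` are image signatures;
    `distance` is any pairwise dissimilarity function (e.g., IRM,
    Mallows distance, or potential energy distance).
    """
    d = np.array([distance(query, c) for c in candidates])
    order = np.argsort(d)  # smallest distance first
    return order, d[order]
```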

In this thesis, we introduce and evaluate a new similarity measure for image retrieval, potential energy distance. To evaluate its performance, we conduct experiments assessing the accuracy, speed, and robustness of the retrieved results, and we compare these results with two popular image similarity measures, Mallows distance and IRM.

In the following sections of this chapter, we first summarize the features and approaches of similarity measures in Section 1.2. Then, we review the literature on similarity measures in Section 1.3. Finally, we identify the research questions of the thesis in Section 1.4.

1.2. Similarity Measures

Different types of image features require different similarity measures. Figure 1 summarizes the measures and techniques suited to different types of image signatures. As mentioned before, region-based signatures are a popular and widely used approach, and the related similarity measures and techniques have been extensively studied. In this thesis, we use region-based signatures, constructing each image signature as a set of vectors with associated weights.

Figure 1. Different Similarity Measures and Techniques for Different Types of Signatures [1]


The survey in [1] summarizes image similarity measures by their design philosophies. These techniques can be categorized based on the following features:

- Whether image features are represented as vectors, non-vectors, or ensembles;

- Whether images are compared globally or locally;

- Whether image similarity measures are in linear or nonlinear space;

- Whether image similarity measures are deterministic, stochastic, or fuzzy; and

- Whether image similarity measures use adaptive learning approaches.

During the past decades, various image similarity measures have been proposed; we review them in Section 1.3. To evaluate the performance of similarity measures, different evaluation criteria have also been proposed, covering the following perspectives:

- Precision. Precision evaluates the extent to which a similarity measure coincides with image semantic content and human vision.

- Speed. Speed evaluates the computational efficiency of a similarity measure.

- Robustness. Robustness evaluates the extent to which a similarity measure is invariant to image alterations.

- Object-level comparison. Object-level comparison treats each image region as an object and compares images based on their regions.

- Metric. If a measure is a metric, the distance between a pair of images satisfies the triangle inequality, and the distance is zero only if the images are identical.

In this thesis, we introduce potential energy distance as a new image similarity measure. Potential energy distance was first proposed by Gabor J. Szekely in 1985. It is a metric for measuring the distance between probability distributions, so it can be applied to measure image similarities. Potential energy distance is fast to compute because it assumes the image probability distributions are independent and does not require optimal matching.

1.3. Related Research

Various similarity measurement techniques have been proposed. They can be roughly categorized into feature-based matching, region-based matching, and summarized local feature-based matching.

Feature-based matching was proposed first for image similarity measurement. Feature-based matching techniques compare image feature sets using similarity functions. Depending on the selected features, these techniques fall into different groups. Generic feature-matching approaches compare observed image feature sets [3]. Object-silhouette-based matching approaches compare image shape features [4]. Structural feature-matching approaches compare hierarchically ordered sets of features [5]. Salient feature-matching approaches compare interesting points in images [6]. Semantic-level matching approaches compare images using their semantic context [7]. Learning-based matching approaches generate image semantics from learning experience instead of simply detecting visual features for different semantic terms [8, 9]; they become practical when the number of images is large.

Different similarity functions have been proposed to measure the distance between different image features. These feature-based matching approaches are computationally efficient, but they may be unable to represent complex image semantics when images contain multiple objects. In recent years, single-feature vector-based image retrieval approaches have been further studied using nonlinear manifolds and geodesic distances [10-16]; it is argued that nonlinear subspaces correspond better with human perception.

Region-based matching approaches were proposed because single feature vectors cannot represent complex image semantics. Region-based approaches segment images into sets of regions, treat each segmented region as an object in the original image, and compare image similarities based on the similarities between the sets of regions. Thus, region-based approaches depend greatly on the quality of image segmentation. Ideally, each image object would be segmented into its own region; in reality, however, accurate image segmentation is very challenging. Optimally selecting a subset of regions based on users' interests helps avoid inaccurate segmentation, and probabilistic representation provides an alternative way to increase robustness and reduce the limitations of region-based approaches.

Summaries of local feature vectors are also used to represent images. Instead of representing an image by a set of segmented regions, these approaches use codebooks and probability density functions to generate image signatures. Codebooks are normally generated by vector quantization. Density functions can be generated by fitting a Gaussian mixture model [17], and Kullback-Leibler distance is used to measure the distance between distributions [18]. Approaches using summaries of local feature vectors are explored in [18-23].

Besides the approaches mentioned above, other similarity measures have been proposed based on categorizing images or on adaptive learning from user inputs [24-30]. Some probabilistic frameworks integrate image signature generation and image similarity measurement to minimize image retrieval errors [31-34].

Among these approaches, region-based matching has been widely used in the current decade. Because we apply region-based matching techniques in this thesis, we mainly review research on them below. When comparing a pair of images, region-based matching approaches use region-based similarity measures to compute the distances between image regions and then combine the distances between the regions to obtain the distance between the images. While it is easy to define the distance between a pair of single vectors, it is not easy to define the distance between sets of vectors.

For image similarity measurement, various distance functions are used. Euclidean distance is a widely used distance function, defined as:

D(X, Y) = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} (X_i - Y_j)^2}    (1)

where X = (X_1, X_2, \ldots, X_n) and Y = (Y_1, Y_2, \ldots, Y_m) are n-dimensional and m-dimensional vectors, respectively. More generally, the Minkowski distance calculates the p-norm distance and is defined as:

D(X, Y) = \left( \sum_{i=1}^{n} \sum_{j=1}^{m} |X_i - Y_j|^p \right)^{1/p}    (2)

Weighted Euclidean distance is also a widely used distance function. A weighted Euclidean distance-based matching approach assigns a weight to each pair of regions; the weight measures the significance of correlating that pair of regions during the image-matching process. The distance between a pair of images is then represented by the aggregated weighted Euclidean distance over all pairs of regions. Weighted Euclidean distance is defined as:

D(X, Y) = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} w_{i,j} (X_i - Y_j)^2}    (3)

where w_{i,j} is the weight between region X_i and region Y_j, and w_{i,j} satisfies 0 < w_{i,j} < 1 and \sum_{i=1}^{n} \sum_{j=1}^{m} w_{i,j} = 1.
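As an illustration, here is a minimal Python sketch (our own, with hypothetical names) of the aggregated weighted Euclidean distance in Equation (3), where each image is a set of region feature vectors and W is a given matching-weight matrix:

```python
import numpy as np

def weighted_euclidean(X, Y, W):
    """Aggregated weighted Euclidean distance between two region sets.

    X: (n, d) region feature vectors of one image
    Y: (m, d) region feature vectors of the other image
    W: (n, m) matching-weight matrix whose entries sum to 1
    """
    # Squared Euclidean ground distance between every pair of regions.
    diff = X[:, None, :] - Y[None, :, :]   # shape (n, m, d)
    sq = np.sum(diff ** 2, axis=-1)        # shape (n, m)
    return np.sqrt(np.sum(W * sq))
```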

Other distances commonly used in image retrieval include the Canberra distance, angular distance, Czekanowski coefficient [35], inner product, Dice coefficient, cosine coefficient, Jaccard coefficient [36], Hausdorff distance, perceptually modified Hausdorff distance, weighted correlation distance [37], and quadratic form distance [38]. We give their definitions below, where X = (X_1, X_2, \ldots, X_n) and Y = (Y_1, Y_2, \ldots, Y_m) represent n-dimensional and m-dimensional vectors, respectively.

The Canberra distance is applicable only to vectors with non-negative components, which suits color-based vector similarity measurement. It is defined as:

D(X, Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{|X_i - Y_j|}{|X_i| + |Y_j|}    (4)

Angular distance measures the angle between the direction from the origin to one object and the direction from the origin to another object. Angular distance is meaningful for measuring image similarity because, in color space, similar colors have parallel orientations while different colors point in different directions. Angular distance is defined as:

\theta(X, Y) = 1 - \frac{2}{\pi} \cos^{-1} \left( \frac{\vec{X}_i \cdot \vec{Y}_j}{|\vec{X}_i| \, |\vec{Y}_j|} \right)    (5)


The Czekanowski coefficient is also applicable only to vectors with non-negative components. It is defined as:

D(X, Y) = \frac{2 \sum_{i=1}^{n} \sum_{j=1}^{m} \min(X_i, Y_j)}{\sum_{i=1}^{n} \sum_{j=1}^{m} (X_i + Y_j)}    (6)

Inner product is defined as:

D(X, Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} X_i Y_j    (7)

Dice coefficient is defined as:

D(X, Y) = \frac{2 \sum_{i=1}^{n} \sum_{j=1}^{m} X_i Y_j}{\sum_{i=1}^{n} X_i^2 + \sum_{j=1}^{m} Y_j^2}    (8)

Cosine coefficient is defined as:

D(X, Y) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} X_i Y_j}{\sqrt{\sum_{i=1}^{n} X_i^2} \, \sqrt{\sum_{j=1}^{m} Y_j^2}}    (9)

Jaccard coefficient is defined as:

D(X, Y) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} X_i Y_j}{\sum_{i=1}^{n} X_i^2 + \sum_{j=1}^{m} Y_j^2 - \sum_{i=1}^{n} \sum_{j=1}^{m} X_i Y_j}    (10)
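For reference, the sketch below implements several of these measures in their common single-vector form, for two equal-length, non-negative feature vectors x and y; this is our simplification of the double-sum forms above, and the clipping guard in the angular distance is our addition:

```python
import numpy as np

def canberra(x, y):
    return np.sum(np.abs(x - y) / (np.abs(x) + np.abs(y)))

def angular(x, y):
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # Clip to [-1, 1] to guard against floating-point drift.
    return 1 - (2 / np.pi) * np.arccos(np.clip(cos, -1.0, 1.0))

def dice(x, y):
    return 2 * np.dot(x, y) / (np.dot(x, x) + np.dot(y, y))

def cosine(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def jaccard(x, y):
    xy = np.dot(x, y)
    return xy / (np.dot(x, x) + np.dot(y, y) - xy)
```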

The Hausdorff distance considers only the centroid structure of the vector features and does not consider the weights of the vectors. The Hausdorff distance is defined as:

D(X, Y) = \max \left( \max_i \min_j d(X_i, Y_j), \; \max_j \min_i d(Y_j, X_i) \right)    (11)

where d(X_i, Y_j) is a ground distance function between X_i and Y_j.
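A minimal sketch of Equation (11) for two sets of region feature vectors, assuming the Euclidean ground distance (any ground distance could be substituted):

```python
import numpy as np

def hausdorff(X, Y):
    """Hausdorff distance between two sets of region feature vectors."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)  # (n, m)
    forward = d.min(axis=1).max()   # largest nearest-neighbor distance from X to Y
    backward = d.min(axis=0).max()  # largest nearest-neighbor distance from Y to X
    return max(forward, backward)
```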


The perceptually modified Hausdorff distance extends the Hausdorff distance by considering both the centroid structure of the vector features and the vector weights. It is defined as:

D(X, Y) = \max \left( \frac{\sum_{i=1}^{n} p_i \min_j d(X_i, Y_j)}{\sum_{i=1}^{n} p_i}, \; \frac{\sum_{j=1}^{m} p_j \min_i d(Y_j, X_i)}{\sum_{j=1}^{m} p_j} \right)    (12)

where d(X_i, Y_j) is again a ground distance function between X_i and Y_j, and p_i and p_j are the weights of X_i and Y_j.

The weighted correlation distance is defined as:

D(X, Y) = 1 - \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} p_i \, p_j \, s(X_i, Y_j)}{\sqrt{\sum_{i=1}^{n} p_i \, s(X_i, X_i)} \, \sqrt{\sum_{j=1}^{m} p_j \, s(Y_j, Y_j)}}    (13)

where

s(X_i, Y_j) = \begin{cases} 1 - \frac{d(X_i, Y_j)}{2R}, & \text{if } d(X_i, Y_j) \le 2R \\ 0, & \text{otherwise} \end{cases}    (14)

d(X_i, Y_j) is a ground distance function between X_i and Y_j, p_i and p_j are the weights of X_i and Y_j, and R is the maximum cluster radius.

Quadratic form distance is used to overcome the Minkowski distance's insensitivity to interactions across dimensions. It is defined as:

D(X, Y) = \sqrt{(X - Y)^T A (X - Y)}    (15)

where A is an n × n positive-definite matrix [37]. If A is diagonal, the quadratic form distance reduces to the weighted Euclidean distance, and if A is the identity matrix, it reduces to the Euclidean distance.
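A one-function sketch of Equation (15); the comment notes the diagonal and identity special cases described above:

```python
import numpy as np

def quadratic_form(x, y, A):
    """Quadratic form distance between vectors x and y.

    A must be positive definite. With A diagonal this reduces to a
    weighted Euclidean distance; with A = I, to the Euclidean distance.
    """
    diff = x - y
    return np.sqrt(diff @ A @ diff)
```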


We now introduce several basic matching techniques, including Mallows distance matching and IRM. These techniques vary in how they assign matching weights between regions, how they derive the distance between images from the distances between regions, and whether they match regions one-to-one or many-to-many.

One-to-one matching allows one region to match only one region across two images: each region in the query image is matched with the best-matching region in the target image [39]. After all regions are matched, the similarity between the images is the aggregate weighted similarity over all pairs of matched regions. Due to the challenge of image segmentation, a segmented region may not correspond to a single image object, so one-to-one matching is not robust against inaccurate segmentation. Many-to-many matching was proposed to solve this issue: it allows one region to match several regions across two images. Mallows distance and IRM both use many-to-many matching.

The Mallows distance matching method uses weighted Euclidean distance as its distance function. Mallows distance is a distance function for probability distributions on a metric space. Its goal is to find the minimum expected difference between the probability distributions of a pair of images, optimized over all joint distributions [40]. Because the weight of a region indicates the significance of the region, and the sum of the matching weights of a region indicates that region's total influence on all regions of the other image, it is natural that these two quantities should be equal [40]. The Mallows distance matching approach therefore seeks the minimum image distance subject to these relations between region weights and matching weights.


It is worth noting that the earth mover's distance (EMD) is a special case of Mallows distance [41]. The EMD-based approach is also a soft matching technique that uses sets of image regions: EMD treats image matching as transforming one distribution into another with minimum effort, solved by linear optimization [42]. EMD coincides with Mallows distance when the region weights are probabilities.

IRM also uses an aggregated weighted Euclidean distance to compute the distance between images, but unlike Mallows distance, IRM does not try to minimize the distance between images. Instead, it matches the pair of regions with the smallest distance at the highest priority and assigns the maximum valid weight as the matching weight for that pair. This process repeats until all regions are matched. Because of this, IRM is significantly faster than Mallows distance and performs no worse in image retrieval [43].

The Hausdorff distance is also used to compute the distance between sets of regions. The Hausdorff distance-based approach uses the maximum distance among all matched regions as the distance between images [44]; the Hausdorff distance is then the larger of the two directed distances, one computed in each direction between the pair of images.

In recent years, enriched image retrieval approaches based on the above matching techniques have been proposed. These approaches improve retrieval results by generating image signatures, region weights, and image segments in more sophisticated ways. Wang, et al. proposed using an initial classification and a combination of different features for image retrieval [43]. Zhang and Zhang proposed retrieving images under the assumption that regions are generated probabilistically from hidden underlying semantic concepts [45]. Jing, et al. proposed an efficient approach that uses vector quantization to build a region codebook from training images [46]. Du and Wang first used feature-based vectors to categorize images and then used region-based vectors and IRM to narrow the image retrieval scope [47].

As mentioned before, accurate image segmentation is very challenging, and some approaches have been proposed to reduce the dependency of image retrieval on segmentation results. Chen and Wang proposed a variation of IRM that uses fuzziness to counter inaccurate segmentation [48]. Amores, et al. proposed using context and boosting techniques for image model learning and categorization without relying on segmentation accuracy [49]. Also working against inaccurate segmentation, Hoiem, et al. suggested a windowed search over location and scale [50]. Dagli and Huang proposed using rectangular blocks to roughly separate foreground and background regions and then retrieving images using only the foreground regions [51].

Other region-based similarity measures have been proposed for image retrieval. Mehul, et al. proposed complex wavelet structural similarity, a measure robust to small image rotations and translations [52]. Beecks, et al. proposed the Signature Quadratic Form Distance, which bridges the gap between the traditional Quadratic Form Distance and image feature signatures [53-55]. Vazquez, et al. proposed measuring image similarity with the normalized compression distance, first used to measure the similarity between strings [56].

Another way to cope with inaccurate segmentation is to select a subset of regions based on users' interests. When users search for images, they may be interested in only a portion of the query image rather than the entire image; such situations call for region-based queries, and region-based query approaches have been proposed for region-based retrieval [44, 57, 58].


1.4. Research Questions

The research goal of this thesis is to introduce a new technique for measuring image similarity. To evaluate the performance of this new measure, we compare it with widely used benchmark measures.

In the previous section, we listed several factors for evaluating the performance of similarity measures: accuracy, speed, robustness, object-level comparison, and whether the measure is a metric. Szekely demonstrated that potential energy distance is a metric [59]. In this thesis, we apply potential energy distance with region-based matching techniques, measuring image similarity based on the distances between image regions; in other words, we compare images at the object level. To evaluate the remaining performance aspects of potential energy distance, we propose the following research questions:

Q1. Does potential energy distance help improve the accuracy of image retrieval?

Q2. Does potential energy distance help improve the speed of image retrieval?

Q3. Is potential energy distance robust enough for image retrieval?

To answer these questions, we implement experiments that evaluate the accuracy, speed, and robustness of potential energy distance. In Chapter 2, we introduce background knowledge, including image segmentation techniques, region-based feature construction techniques, and the image similarity measures Mallows distance and IRM. In Chapter 3, we discuss the details of potential energy distance. Our experiments, including data preparation, experiment design, and results under different evaluation approaches, are presented in Chapter 4, and we answer the research questions and conclude in Chapter 5.


Chapter 2 Background Knowledge

In this chapter, we provide the background knowledge used in our approach. Image retrieval can be broken into two steps: image feature construction and image similarity measurement. Because image features serve as inputs, image feature construction can be treated as the preprocessing step for image similarity measurement. To retrieve images, we first need to obtain image signatures based on the content and semantics of the images. In Section 2.1, we introduce approaches to generating image signatures, including image segmentation and image feature construction. In Section 2.2, we introduce two widely used similarity measurement techniques, Mallows distance and IRM.

2.1. Image Description

Image descriptions are critical to image retrieval because they provide visual and semantic information, whereas the original images are pixel arrays that poorly describe image content. Image descriptions are used as the input data for image similarity measurement, and good image descriptions can help improve image retrieval results.

Image description generation can be broken into two steps: image feature extraction and image signature construction. Because image features are used as input for signature construction, feature extraction can be treated as a preprocessing step. The workflow begins when sets of image features are extracted globally or locally; image signatures are then generated from those features, either mathematically or adaptively through active learning or user feedback.

Since 2000, various image feature extraction and image signature construction approaches have been proposed, and their invention has directly promoted the development of new image similarity measurement techniques. In this thesis, we use region-based image signatures. In Section 2.1.1, we introduce image segmentation techniques, and in Section 2.1.2, we introduce region-based image signature construction approaches.

2.1.1. Image Segmentation

Based on how image features are extracted, current approaches can be roughly divided into three categories: histogram-based, color-layout-based, and region-based extraction [43]. Histogram approaches extract image features as color histograms or distributions; while fast and easy to compute, they are sensitive to image alterations. Color-layout approaches may be less sensitive to image alterations: they first partition images into blocks and then extract features from each block, but this sharply increases the computational complexity. Region-based approaches were proposed to overcome the shortcomings of both. They represent images at the object level by treating each segmented region as an image object, which is closer to human vision [43], and because they can identify objects in images, they are also more robust to image alterations.


Region-based image features have received much interest over the past decade.

The first step in obtaining region-based image features is to segment images into regions. Image segmentation techniques usually apply clustering to group image pixels, using image features such as color, shape, and texture as the clustering feature vectors. In this way, pixels with similar features are grouped into the same cluster, and each cluster of pixels is subsequently treated as a segmented region [60]. The ideal segmentation result is one in which each region represents an object in the original image, giving correspondences between segmented regions and physical objects [61].

Image segmentation techniques emphasize the balance between segmentation quality and computational complexity. They have been extensively studied and are commonly used in image analysis, including image comparison, image retrieval, and image semantic analysis [1, 62]. In this section, we introduce the image segmentation technique applied in our experiment, Multi-Stage Agglomerative Connectivity Constrained Clustering (MS-A3C).

Multi-Stage Agglomerative Connectivity Constrained Clustering Approach

The Multi-Stage Agglomerative Connectivity Constrained Clustering (MS-A3C) approach was proposed by Jia Li in 2011 [63]. It aims to solve the connectivity problem in image segmentation.

The generic requirement of clustering is to group similar objects into the same cluster while separating dissimilar objects into different clusters. For image segmentation, however, generic clustering approaches cannot guarantee that pixels in the same segment are geometrically connected. The MS-A3C approach was proposed to solve this problem by ensuring that each segmented region is spatially connected: if the pixels in a segmented region are connected, then the region is connected.

The MS-A3C approach first uses a top-down k-means clustering step to obtain over-segmented images and then uses a bottom-up agglomerative clustering step to merge the small segments into larger, connected segments. The top-down step gathers the initial small segments and identifies the connectivity among them, grouping pixels by the color feature. The bottom-up step then collects connected segments by gradually merging the initial small segments, but only if they are connected. Visually similar segments are merged first, using a distance combining color, edge, and location features; the remaining segments are then merged gradually, using a more complex distance combining color-based distance, edge-based distance, a balanced-partition measure, and a jaggedness measure.
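The toy sketch below illustrates only this two-stage idea, namely k-means over-segmentation followed by agglomerative merging restricted to spatially adjacent segments; it is not Li's MS-A3C implementation, and the merging distance here uses mean color alone rather than the full combined distance:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def over_segment(pixels, shape, k=50):
    """Stage 1 (illustrative): k-means on color to over-segment.

    pixels: (h*w, 3) LUV colors; returns an (h, w) label map.
    """
    _, labels = kmeans2(pixels.astype(float), k, minit="++")
    return labels.reshape(shape)

def merge_connected(labels, pixels, target_segments):
    """Stage 2 (illustrative): repeatedly merge the most similar pair of
    spatially adjacent segments, so every segment stays connected."""
    h, w = labels.shape
    colors = pixels.reshape(h, w, 3).astype(float)
    while len(np.unique(labels)) > target_segments:
        ids = np.unique(labels)
        means = {i: colors[labels == i].mean(axis=0) for i in ids}
        # Collect pairs of segments that touch horizontally or vertically.
        adjacent = set()
        left, right = labels[:, :-1], labels[:, 1:]
        for a, b in zip(left[left != right], right[left != right]):
            adjacent.add((min(a, b), max(a, b)))
        top, bottom = labels[:-1, :], labels[1:, :]
        for a, b in zip(top[top != bottom], bottom[top != bottom]):
            adjacent.add((min(a, b), max(a, b)))
        # Merge the adjacent pair with the closest mean colors.
        a, b = min(adjacent,
                   key=lambda e: np.linalg.norm(means[e[0]] - means[e[1]]))
        labels[labels == b] = a
    return labels
```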

Compared with other state-of-the-art image segmentation approaches, MS-A3C has been shown to be more accurate and much faster [63].

2.1.2. Feature Selection

Image features are used to describe the visual content of images. Currently, commonly adopted image features include color, texture, shape, and salient points [1]. When combined, these features can provide joint image signatures.


- Color. Color is a very active area for image features and is good for describing images recorded in frontal view. Color is normally represented as a three-dimensional color vector. LUV color is more widely used than RGB color because LUV corresponds better with human vision: L is the luminance of the color, and U and V are the chromatic components.

- Texture. Texture is usually formed from wavelet coefficients in high-frequency bands. As a feature, texture is good at capturing the granularity and repetitive patterns of image surfaces.

- Shape. Shape is an efficient feature that is robust for image segments.

- Salient points. Salient points are effective under image affine transformations and illumination changes.

Color and texture features are more robust for digital images. Shape and salient points are useful for recognizing objects, but they are less robust for digital images because they are sensitive to noisy backgrounds, changing angles, and occlusion, making it difficult to discover themes and match images. In this thesis, we use only the color feature.

Features can be extracted globally or locally. Global feature extraction obtains image features from the overall characteristics of an image, while local feature extraction obtains a set of features for every pixel or small block by averaging feature values across its neighbors. Global feature extraction normally outperforms local feature extraction in speed, for both feature extraction and image similarity measurement, but it may be sensitive to location and thus too rigid to represent images [1]. In this thesis, we use a global feature extraction approach.


After image features are extracted, we must determine how to use them to construct image signatures. A natural way is to associate image features with a distribution. Various types of distributions are used to construct image signatures:

- Discrete distribution. Discrete distributions are commonly used for histograms and region-based feature vectors. An image with a histogram or region-based signature can be regarded as a set of vectors, each assigned a weight; thus, an image can be represented by a set of feature vectors and the corresponding weights. Normally, the weights of all regions sum to 1.

- Continuous distribution. Continuous distributions are more commonly used for local feature vectors because they can provide more support vectors and more accurate results [18].

- Stochastic distribution. Stochastic distributions consider the spatial dependence of local feature vectors [64].

Although continuous and stochastic distributions may suit local features or other special kinds of features, their computational cost increases greatly. Since we use region-based feature vectors in this thesis, we use a discrete distribution to construct image signatures.

More formally, a region-based color feature vector f is represented as:

f = (\bar{l}, \bar{u}, \bar{v})    (16)

where \bar{l}, \bar{u}, \bar{v} are the average L, U, V colors of all pixels in the region. The weight p of a region is obtained by:

p = \frac{\text{number of pixels in the region}}{\text{number of pixels in the image}}    (17)


Overall, an image I is represented as:

I = \{(f_1, p_1), (f_2, p_2), \ldots, (f_n, p_n)\}    (18)

where f_1, f_2, \ldots, f_n are the feature vectors, p_1, p_2, \ldots, p_n are the associated weights, and n is the number of image regions.

Our approach uses two steps to construct image signatures. First, we segment images into regions. Second, we compute the average LUV color vector and the associated weight for each region. In this way, we represent an image as a set of color feature vectors along with the distribution of the segmented regions.
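A minimal sketch of this construction, assuming the pixels have already been converted to LUV space and the segmentation step has produced an integer label map:

```python
import numpy as np

def image_signature(luv, labels):
    """Build a region-based signature (Equations 16-18).

    luv: (h, w, 3) array of LUV pixel values
    labels: (h, w) integer segment labels from the segmentation step
    Returns F (per-region mean colors) and P (area weights summing to 1).
    """
    ids = np.unique(labels)
    F = np.array([luv[labels == i].mean(axis=0) for i in ids])     # (n, 3)
    P = np.array([(labels == i).sum() for i in ids], dtype=float)
    P /= P.sum()  # region weights are area fractions and sum to 1
    return F, P
```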

2.2. Similarity Measures

After we obtain image signatures, the next step is to measure the similarity between a pair of images. Distance is used to represent the dissimilarity between images, and it should be consistent with human visual perception. Defining the distance between a pair of images is equivalent to defining the distance between sets of vectors, because every feature vector in a set corresponds to an image region. Although the distance between two vectors is easily defined, for example by Euclidean distance, the distance between sets of vectors is not [1].

The distance between a pair of images is normally obtained either by directly applying distance functions or by optimizing them. Various distance functions have been proposed to measure image distance; Section 1.3 reviewed the related research on distance functions and similarity measures. In this section, we introduce two widely used benchmark similarity measures, Mallows distance and IRM.

2.2.1. Mallows Distance

Mallows distance was first introduced by Mallows in 1972 [40]. The Mallows distance between two images is obtained by matching each region of one image with each region of the other, as shown in Figure 2. The goal of Mallows distance is to discover the minimum expected difference between the feature vectors of the images; to achieve this, the key step of Mallows matching is to seek optimal matching weights between regions, which Mallows distance finds by linear programming.

Figure 2. Mallows Distance Region Matching [65]


The distance between two regions is calculated using the n-dimensional Euclidean distance. The distance between a pair of images is represented as:

D(I_1, I_2) = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} w_{i,j} \, d(f_i, f_j)    (19)

where w_{i,j} is the matching weight between region i in image I_1 and region j in image I_2, and d(f_i, f_j) is the Euclidean distance between the feature vectors of region i and region j.

The goal of Mallows distance is to minimize the expected difference by assigning appropriate matching weights w_{i,j} between each pair of regions:

D(I_1, I_2) = \min_{w_{i,j}} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} w_{i,j} \, d(f_i, f_j)    (20)

The constraints of Mallows matching come from the relations between region weights and matching weights: the sum of the matching weights between one region in an image and all regions in the other image equals the weight of that region. This constraint is natural because the region weight measures the importance of a region, while the sum of its matching weights measures the total influence of this region on all its matching regions. The weight of a region thus equals the sum of all its matching weights:

\sum_{j=1}^{n_2} w_{i,j} = p_i    (21)

\sum_{i=1}^{n_1} w_{i,j} = p_j    (22)

where w_{i,j} is the matching weight between region i and region j, and p_i and p_j are the weights of region i and region j.

Also, as mentioned in Section 2.1.2, the weight of a region is the fraction of the image area it covers, so all region weights sum to 1:

\sum_{i=1}^{n_1} p_i = 1    (23)

\sum_{j=1}^{n_2} p_j = 1    (24)

Finally, the basic requirement on the weights must be satisfied:

w_{i,j} \ge 0    (25)

Mallows distance is proved to be a metric [40], so it can be applied to image similarity measurement. The computation of Mallows distance is usually costly because it matches all regions between a pair of images: computing it is equivalent to solving a minimum cost flow problem, with complexity O(N^3 \log N) [66].
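As an illustration, the sketch below solves the optimization of Equation (20) subject to constraints (21)-(25) with an off-the-shelf linear programming solver; specialized transportation or min-cost-flow solvers are normally used in practice:

```python
import numpy as np
from scipy.optimize import linprog

def mallows(F1, P1, F2, P2):
    """Mallows distance between two region-based signatures.

    F1: (n, d) region features, P1: (n,) weights summing to 1
    (likewise F2, P2). Minimizes sum_ij w_ij * d_ij subject to
    row sums = P1, column sums = P2, and w_ij >= 0.
    """
    n, m = len(P1), len(P2)
    d = np.linalg.norm(F1[:, None, :] - F2[None, :, :], axis=-1)
    c = d.ravel()                        # cost vector over the w_ij
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):                   # row-sum constraints (21)
        A_eq[i, i * m:(i + 1) * m] = 1
    for j in range(m):                   # column-sum constraints (22)
        A_eq[n + j, j::m] = 1
    b_eq = np.concatenate([P1, P2])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun
```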

2.2.2. Integrated Region Matching

As introduced in previous sections, region-based image retrieval approaches segment images into sets of regions and treat each region as an image object. After the image objects are extracted, region-based approaches compute the distance between images based on the distance between the sets of objects. Ideally, each segmented region represents one semantic object in the original image, but in reality such results are difficult to obtain; one reason is that objects in images are 2D, while objects in reality are 3D [2].

Many region-based similarity measures compare images by comparing individual image regions [67, 68]. The shortcoming of such approaches results from the difficulty of image segmentation, specifically that a segmented region may not represent a single object, so approaches that compare individual regions may not tolerate poor segmentation results. On the other hand, it is argued that a similarity measure that takes all image regions into consideration during image comparison is more robust to segmentation results [43].

IRM was proposed by Wang, et al. in 2000. IRM takes all image regions into consideration when calculating the overall similarity between images, applying two matching principles. First, IRM allows one-to-many region matching, which addresses the issue that an image object may be split into multiple regions by inaccurate segmentation. Second, IRM matches the most similar pair of regions first and then assigns the maximum valid weight to this pair of regions.

Like Mallows distance, IRM uses Euclidean distance, but the goal of IRM is not to minimize the overall difference by optimizing matching weights between image regions. Instead, IRM keeps looking for the most similar pair of regions and matches that pair first with the maximum valid weight, which is the smaller of the available weights of the two matching regions. After a pair of regions is matched, the maximum valid weight is subtracted from the available weights of both regions. If no weight remains for a region, the region is considered fully matched; otherwise, the remaining weight is used in subsequent matching. This process is repeated until all region weights are consumed. Finally, the overall similarity between a pair of images is calculated as the weighted sum of the similarities between the matched pairs of regions. The process of IRM is illustrated in Figure 3; note that IRM allows one region in an image to be matched with several regions in another image.


Figure 3. IRM Matching Process [43]

More formally, the distance between a pair of images is represented as:

D(I_1, I_2) = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} w_{i,j} \, d(f_i, f_j)    (26)

where w_{i,j} is the matching weight between region i in image I_1 and region j in image I_2, and d(f_i, f_j) is the Euclidean distance between the feature vectors of region i and region j.

First, the distances between all pairs of regions are computed. IRM applies the "most similar highest priority" principle and subtracts the maximum valid weight after each region match. The maximum valid weight is defined as the smaller available weight of the two matching regions:

w_{i,j} = \min(p_i, p_j)    (27)

where p_i and p_j are the available weights of region i and region j. Without loss of generality, suppose regions i' and j' are matched and p_{i'} \le p_{j'}; after the match, the available weight of region i' is 0 and the available weight of region j' is p_{j'} - w_{i',j'}. In summary, after each match we have:

\sum_{j=1}^{n_2} w_{i,j} = p_i, \quad i \ne i'    (28)

\sum_{j: j \ne j'} w_{i,j} = 0, \quad i = i'    (29)

\sum_{i: 1 \le i \le n_1,\, i \ne i'} w_{i,j} = p_j, \quad j \ne j'    (30)

\sum_{i: 1 \le i \le n_1,\, i \ne i'} w_{i,j} = p_{j'} - p_{i'}, \quad j = j'    (31)

Also, the basic requirements on the weights must be satisfied:

\sum_{j=1}^{n_2} w_{i,j} = p_i    (32)

\sum_{i=1}^{n_1} w_{i,j} = p_j    (33)

\sum_{i=1}^{n_1} p_i = 1    (34)

\sum_{j=1}^{n_2} p_j = 1    (35)

w_{i,j} \ge 0    (36)

After each region match, IRM excludes the exhausted region from further matching because it has no available weight and is considered fully matched; the other region is kept for further matching if it still has available weight. In this way, IRM enables one region in an image to match multiple regions in another image. This process continues recursively until the weights of all regions are assigned.
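A compact Python sketch of the matching loop just described; because the region distances never change, visiting region pairs in globally sorted order and skipping exhausted regions is equivalent to repeatedly picking the most similar pair with available weight:

```python
import numpy as np

def irm(F1, P1, F2, P2):
    """IRM distance via the 'most similar highest priority' rule.

    F1: (n, d) region features, P1: (n,) weights summing to 1
    (likewise F2, P2). Follows Equations (26) and (27).
    """
    d = np.linalg.norm(F1[:, None, :] - F2[None, :, :], axis=-1)
    p, q = P1.astype(float), P2.astype(float)
    total = 0.0
    for flat in np.argsort(d, axis=None):   # pairs by increasing distance
        i, j = np.unravel_index(flat, d.shape)
        w = min(p[i], q[j])                  # maximum valid weight
        if w > 0:
            total += w * d[i, j]
            p[i] -= w
            q[j] -= w
    return total
```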

IRM is significantly faster to compute than Mallows distance because it does not optimize the matching weights to achieve the minimum overall image distance as the Mallows approach does. The complexity of IRM is O(N^2). Furthermore, Wang, et al. showed that IRM performs as well as, if not better than, Mallows distance in retrieval accuracy. Notably, however, IRM is not a metric.


Chapter 3 Potential Energy Distance

Newton's potential energy theory holds that the energy of an object or a system arises from the position of the body or the arrangement of the components of the system [59]. Inspired by this theory, Szekely proposed potential energy distance in 1985 to measure the distance between statistical probability distributions. The analogy treats statistical observations as heavenly bodies subject to a statistical potential energy, which gives rise to a distance between them. Potential energy distance is thus a measure for multivariate distributions based on the Euclidean distance between statistical observations.

Potential energy distance has been applied to various areas of statistics, such as testing multivariate normality, testing the multi-sample hypothesis of equal distributions, change point detection, multivariate independence (including distance covariance and Brownian covariance), and scoring rules [59].

Potential energy distance is defined as

D_d(X, Y) = 2E\|X - Y\|^d - E\|X - X'\|^d - E\|Y - Y'\|^d    (37)

where X = (X_1, X_2, \ldots, X_m) and Y = (Y_1, Y_2, \ldots, Y_n) are multivariate distributions, X' is an independent and identical copy of X, Y' is an independent and identical copy of Y, \|X\| = (X^T X)^{1/2} is the Euclidean norm of X, and d \ge 1.

Szekely proved that if the following conditions are satisfied:

(1) X and Y are independent random vectors;

(2) X and X' are independent and identically distributed;

(3) Y and Y' are also independent and identically distributed;

(4) d is a constant such that E\|X\|^d and E\|Y\|^d are finite;

then the energy distance between the distributions of X and Y satisfies:

D_d(X, Y) = 2E\|X - Y\|^d - E\|X - X'\|^d - E\|Y - Y'\|^d \ge 0    (38)

Szekely focused on two special cases: d = 1 and d = 2. He noted that d = 1 outperforms d = 2 when the distributions differ but the cluster means are the same or close to each other. Moreover, when 0 < d < 2, the distance between two statistical observations is zero if and only if the two observations have the same distribution; that is, for 0 < d < 2, potential energy distance is a metric. When d = 2, this property no longer holds: the distance can be zero even when X and Y have different distributions, since for d = 2 it depends only on the means.

In this thesis, we apply potential energy distance with d = 1:

D(X, Y) = \sqrt{2E\|X - Y\| - E\|X - X'\| - E\|Y - Y'\|}    (39)

where E\|X - Y\| is the expected distance between X and Y:

E\|X - Y\| = \sum_i \sum_j w_{i,j} \, d_{i,j}    (40)

Here w_{i,j} is the matching weight between region i and region j, and d_{i,j} is the Euclidean distance between region i and region j. Because X and Y, X and X', and Y and Y' are all independently distributed, the matching weight between X and Y satisfies:


w_{i,j} = p_i \, p_j    (41)

where p_i and p_j are the weights of region i and region j. Thus, Equation (39) is equivalent to:

D(X, Y) = \sqrt{2 \sum_{i=1}^{m} \sum_{j=1}^{n} p_i p_j d_{i,j} - \sum_{i=1}^{m} \sum_{i'=1}^{m'} p_i p_{i'} d_{i,i'} - \sum_{j=1}^{n} \sum_{j'=1}^{n'} p_j p_{j'} d_{j,j'}}    (42)

where m, n, m', n' are the numbers of segmented regions of images X, Y, X', Y'. The complexity of the potential energy distance approach is O(N^2).
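Equation (42) reduces to three weighted all-pairs distance sums, so the computation is straightforward; a minimal sketch:

```python
import numpy as np

def expected_distance(F1, P1, F2, P2):
    """E||X - Y|| under independent matching, w_ij = p_i * p_j
    (Equations 40 and 41)."""
    d = np.linalg.norm(F1[:, None, :] - F2[None, :, :], axis=-1)
    return P1 @ d @ P2

def potential_energy(F1, P1, F2, P2):
    """Potential energy distance between two signatures (Equation 42)."""
    e_xy = expected_distance(F1, P1, F2, P2)
    e_xx = expected_distance(F1, P1, F1, P1)
    e_yy = expected_distance(F2, P2, F2, P2)
    # The radicand is non-negative in theory; guard against float error.
    return np.sqrt(max(2 * e_xy - e_xx - e_yy, 0.0))
```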

In this thesis, we use the distance in Equation (42) to measure the similarity between images. Because potential energy distance does not require optimizing the matching weights to achieve a minimum overall distance between images, as Mallows distance does, this approach should be computationally efficient. To evaluate the computational performance of potential energy distance, we implement an experiment in the next chapter to measure the speed of this approach, comparing potential energy distance with Mallows distance and IRM. Additionally, we evaluate the performance of potential energy distance in terms of accuracy and robustness.


Chapter 4 Experiments

In this chapter, we evaluate the performance of potential energy distance in image retrieval from different aspects and compare it with two benchmark approaches: Mallows distance and IRM. In Section 4.1, we introduce the dataset used in our experiment and explain how we pre-processed it to generate image signatures. The measures used for performance evaluation are presented in Section 4.2. Finally, in Sections 4.3 through 4.5, we evaluate and compare the image retrieval results from the perspectives of accuracy, speed, and robustness before concluding the chapter.

4.1. Data Preparation

4.1.1. Datasets

In our experiment, we use the MIR Flickr benchmark image dataset, built by Mark J. Huiskes and Michael S. Lew in 2008. The dataset contains 25,000 images downloaded from the Flickr photo sharing website. All images in the dataset are grouped into categories, and the tag and EXIF information associated with the images is available for processing.

The 25,000 images in the MIR Flickr dataset were uploaded by thousands of Flickr users over a period of approximately 15 months. The images are representative of a generic domain, including people, animals, outdoor scenes, portraits, autos, structures, and so on. The dataset is guaranteed to include only images with high "interestingness": it contains the 500 most interesting images from each day of the 15-month period [69]. The interestingness score measures the quality of an image using factors such as where the clickthroughs on the image come from, who comments on it and when, and who marks it as a favorite.

The topics of the MIR Flickr dataset are generated along two dimensions:

- Relevance level. The topics in this dataset are built gradually from the possibly relevant level to the actually relevant level in two steps. First, images are interpreted with topics in a very wide sense; the topics assigned to each image should capture the content of the image for various potential image searches. These topics are called potential labels, and they act as a common topic collection that makes subsequent annotation under narrower interpretations faster. Second, images are categorized using only those potential topics that the annotators judge the images to be really relevant to.

- Abstraction level. The topics in this dataset are built from the general topic level to the specific topic level in two steps. The image collection is initially annotated with general topics, selected so that a small set of topics covers the images in the dataset. Subtopics are then assigned to the images that have a potential label for the corresponding general topic. Table 1 lists the general topics and subtopics used in the MIR Flickr dataset.


Table 1. General Topics and Subtopics in MIR Flickr Dataset

General Topic          Subtopics
sky                    clouds
water                  sea/ocean, river, lake
people                 portrait, boy/man, girl/woman, baby
night                  -
plant life             tree, flower
animals                dog, bird
man-built structures   architecture, building, house, city/urban, bridge, road/street
sunset                 -
indoor                 -
transports             car

The general topics and subtopics are then grouped jointly into 24 categories. Table 2 lists all the categories and the number of images in each.

Table 2. Image Categories in MIR Flickr Dataset

ID   Category      Size of Category
1    Animals       3216
2    Baby          259
3    Bird          742
4    Car           1177
5    Clouds        3700
6    Dog           684
7    Female        6184
8    Flower        1823
9    Food          990
10   Indoor        8313
11   Lake          791
12   Male          6081
13   Night         2711
14   People        10373
15   Plant Life    8763
16   Portrait      3931
17   River         894
18   Sea           1322
19   Sky           7912
20   Structures    9992
21   Sunset        2135
22   Transport     2850
23   Tree          4683
24   Water         3331

4.1.2. Image Signature Construction

In this section, we describe how we pre-process the image dataset to generate image signatures. The workflow of the data pre-processing is illustrated in Figure 4.


Figure 4. Workflow of Image Signature Construction

The steps of the image signature construction workflow are as follows:

Step 1: Image format conversion. The images in the MIR Flickr dataset are in JPEG format. In this step, we convert the JPEG images to PPM format for ease of processing and analysis. PPM contains little information besides basic color, but because we use only color features to generate image signatures in our experiment, the loss of other features in the format conversion does not affect our experiment.

Step 2: Image segmentation. In our experiment, we use region-based features to construct image signatures. In this step, we use the MS-A3C approach introduced in Section 2.1.1 to perform image segmentation, using the open-source package provided by Li [63]. The package is implemented in C for Linux.

Step 3: Image signature construction. After we obtain the segments of an image, we generate its image signature from the color feature vectors of the set of segmented regions. The color feature vector of a region is obtained by averaging the LUV colors of all pixels in the region, and the weight of a region is the fraction of the whole image that the region covers. The feature vectors and associated weights of an image's regions are represented as:

\[ F = \begin{pmatrix} f_{1,L} & f_{1,U} & f_{1,V} \\ \vdots & \vdots & \vdots \\ f_{m,L} & f_{m,U} & f_{m,V} \end{pmatrix}, \qquad P = \begin{pmatrix} p_1 \\ \vdots \\ p_m \end{pmatrix} \qquad (43) \]

where F is the feature matrix of the image, m is the number of segmented regions, f_{i,L}, f_{i,U}, f_{i,V} are the LUV features of region i, P is the weight vector of the image, and p_i is the weight of region i.
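A minimal sketch of this construction follows, assuming the segmentation step yields a per-pixel integer label map; the rgb2luv conversion from scikit-image is our choice for the example, not necessarily the implementation used in the thesis.

    import numpy as np
    from skimage.color import rgb2luv

    def image_signature(rgb, labels):
        """Build (F, P) of Equation (43) from an RGB image (H x W x 3)
        and a per-pixel region label map (H x W) from the segmenter."""
        luv = rgb2luv(rgb)                   # convert to LUV color space
        region_ids = np.unique(labels)
        m = len(region_ids)
        F = np.zeros((m, 3))
        P = np.zeros(m)
        for i, r in enumerate(region_ids):
            mask = labels == r
            F[i] = luv[mask].mean(axis=0)    # average LUV color of region i
            P[i] = mask.sum() / labels.size  # weight = coverage of region i
        return F, P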

4.2. Evaluation Measures

To evaluate the performance of image retrieval approaches, we first need to define which retrieved images are considered a "match". Usually, images are considered a match if they belong to the same category or their annotations are similar, meaning they share the same themes. In our experiment, we consider two images a match if they belong to the same category. Because the MIR Flickr dataset has already been divided into 24 categories, we use these categories to decide whether images match.

As mentioned previously, we evaluate the potential energy measure in terms of accuracy, speed, and robustness. In this thesis, we use precision and weighted precision to evaluate retrieval accuracy, CPU time to evaluate retrieval speed, and the rank of the original image together with the potential energy distance between the altered and original images to evaluate retrieval robustness.


Precision is a popular measure for performance evaluation. It can be computed from the confusion matrix described by Kohavi and Provost in 1998 [70]. The confusion matrix is a 2-by-2 matrix whose elements summarize actual and predicted classifications. Table 3 displays the structure and elements of a confusion matrix:

Table 3. Evaluation Confusion Matrix

                         Correct result
                         E1                     E2
Obtained result   E1     tp (true positive)     fp (false positive)
                  E2     fn (false negative)    tn (true negative)

The elements of the confusion matrix fall into four categories: tp (true positive) is the number of positive objects correctly labeled, fp (false positive) is the number of negative objects incorrectly labeled, fn (false negative) is the number of positive objects incorrectly labeled, and tn (true negative) is the number of negative objects correctly labeled.

Precision is the proportion of the number of positive objects correctly labeled (tp) to the number of all objects labeled as positive (tp + fp). A high precision means that objects labeled as positive are highly relevant. Precision is defined as:

\[ \text{precision} = \frac{tp}{tp + fp} \qquad (44) \]

Weighted precision takes into account the ranks of matched images. Images retrieved earlier are considered more similar to the query image because they have smaller distances to it. Weighted precision is the weighted percentage of matched images: it assigns larger weights to images retrieved earlier and smaller weights to images retrieved later.

Weighted precision is defined as:

\[ \text{weighted precision} = \frac{1}{n} \sum_{k=1}^{n} \frac{tp_k}{tp_k + fp_k} \qquad (45) \]

where n is the number of retrieved images, tp_k is the number of positive objects correctly labeled among the first k retrieved images, and fp_k is the number of negative objects incorrectly labeled among the first k retrieved images.
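As a concrete rendering of Equations (44) and (45), the following sketch computes both measures from a list of match/non-match flags in retrieval (rank) order; note that tp_k + fp_k = k, since each of the first k retrieved images is either a true or a false positive.

    def precision(matches):
        """Equation (44): fraction of retrieved images matching the query."""
        return sum(matches) / len(matches)

    def weighted_precision(matches):
        """Equation (45): mean of the precisions at ranks 1..n, which
        weights images retrieved earlier more heavily."""
        tp, total = 0, 0.0
        for k, match in enumerate(matches, start=1):
            tp += match
            total += tp / k   # tp_k / (tp_k + fp_k), since tp_k + fp_k = k
        return total / len(matches)

    # Example: the first and third of four retrieved images match the query.
    # precision([True, False, True, False])          -> 0.5
    # weighted_precision([True, False, True, False]) -> (1 + 1/2 + 2/3 + 1/2)/4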

4.3. Accuracy

In this section, we evaluate the accuracy of potential energy distance in two experiments: image queries and image categorization.

4.3.1. Image Queries

In the image query experiment, we compare potential energy distance with two popular image retrieval measures, Mallows distance and IRM. To provide numeric results, we randomly select 480 sample images from the 24 categories, 20 images per category. Image retrieval is performed over all 25,000 images in the MIR Flickr dataset. A retrieved image is considered a match if and only if it is in the same category as the query image. Also, because precision and weighted precision vary with the number of retrieved images, we compute precision and weighted precision for the top 10, top 20, …, top 100 retrieved images separately.
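The experimental loop can be sketched as follows. Here pe_distance is a hypothetical helper standing in for whichever measure is under test, signatures maps image IDs to precomputed signatures, and categories maps image IDs to their (possibly multiple) category labels; details such as whether the query image itself is excluded from the ranking are omitted.

    import random

    def run_query_experiment(signatures, categories, pe_distance, k=100):
        """For 20 random queries per category, rank all images by distance
        to the query and record match flags for the top k."""
        results = {}
        all_ids = list(signatures)
        all_cats = {c for labels in categories.values() for c in labels}
        for cat in sorted(all_cats):
            members = [i for i in all_ids if cat in categories[i]]
            for query in random.sample(members, 20):
                ranked = sorted(all_ids, key=lambda i: pe_distance(
                    signatures[query], signatures[i]))
                # a retrieved image matches iff it shares the query's category
                results[(cat, query)] = [cat in categories[i]
                                         for i in ranked[:k]]
        return results  # feed each flag list to precision()/weighted_precision()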


Due to space limitations, in this section we display and analyze results only for the top 10, top 50, and top 100 retrieved images in each category. In the image categorization section, we display and analyze experiment results for the top 10, top 20, …, top 100 retrieved images in each category, because that experiment uses all images in the MIR Flickr dataset as query samples. Figures 5 - 7 list the average precisions and weighted precisions of the top 10, top 50, and top 100 retrieved images of each category.

[Two panels: average precision (left) and weighted precision (right) by category ID, 1 - 24. Legend averages: Mallows 0.592 / 0.599, IRM 0.597 / 0.600, PE 0.625 / 0.643.]

Figure 5. Average and Weighted Precisions of Top 10 Retrieved Images in Image Query

As shown in Figure 5, overall potential energy distance performs slightly better than Mallows distance and IRM. The average precision of potential energy distance is higher than Mallows distance by 0.033 and higher than IRM by 0.028. The weighted precision of potential energy distance is higher than Mallows distance by 0.044 and higher than IRM by 0.043.

We can also see that among all image categories, potential energy distance performs significantly better than the other two approaches in both average precision and weighted precision for the categories "Car," "Indoor," "Lake," "Night," and "Transport," and performs significantly better in weighted precision for the category "Sunset." However, Mallows distance and IRM perform significantly better than potential energy distance in average precision and weighted precision for the categories "Clouds," "People," "Plant Life," and "Portrait."

[Two panels: average precision (left) and weighted precision (right) by category ID, 1 - 24. Legend averages: Mallows 0.590 / 0.598, IRM 0.588 / 0.596, PE 0.650 / 0.647.]

Figure 6. Average and Weighted Precisions of Top 50 Retrieved Images in Image Query

From Figure 6, we can see that overall potential energy distance performs slightly better than Mallows distance and IRM. The average precision of potential energy distance is higher than Mallows distance by 0.06 and higher than IRM by 0.062. The weighted precision of potential energy distance is higher than Mallows distance by 0.049 and higher than IRM by 0.051. Among all 24 image categories, potential energy distance performs significantly better than the other two approaches in average precision and weighted precision for the categories "Lake" and "Transport."


[Two panels: average precision (left) and weighted precision (right) by category ID, 1 - 24. Legend averages: Mallows 0.593 / 0.595, IRM 0.588 / 0.592, PE 0.649 / 0.649.]

Figure 7. Average and Weighted Precisions of Top 100 Retrieved Images in Image Query

Figure 7 reveals that overall potential energy distance performs slightly better than Mallows distance and IRM. The average precision of potential energy distance is higher than Mallows distance by 0.056 and higher than IRM by 0.061. The weighted precision of potential energy distance is higher than Mallows distance by 0.054 and higher than IRM by 0.057.

We can also see that among all 24 image categories, potential energy distance performs significantly better than the other two approaches in average precision and weighted precision for the category "Lake."

Also, from Figures 5 - 7, we can see that average precision and weighted precision in each category vary only slightly with the number of retrieved images. We analyze this result in more detail in the next section, where we examine all of the top 10, top 20, …, top 100 retrieved results.


4.3.2. Image Categorization

In the image categorization experiment, each image in the MIR Flickr dataset is used as a query image, and image retrieval is performed over all images in the dataset. We still consider a retrieved image a match if and only if it is in the same category as the query image. As in Section 4.3.1, we compute precision and weighted precision for the top 10, top 20, …, top 100 retrieved images separately, because precision and weighted precision vary with the number of retrieved images. Figures 8 - 17 list the average precisions and weighted precisions of the top 10, top 20, …, top 100 retrieved images in each category.


Figure 8. Average and Weighted Precisions of Top 10 Retrieved Images in Image Categorization


Figure 9. Average and Weighted Precisions of Top 20 Retrieved Images in Image Categorization



Figure 10. Average and Weighted Precisions of Top 30 Retrieved Images in Image Categorization


Figure 11. Average and Weighted Precisions of Top 40 Retrieved Images in Image Categorization


Figure 12. Average and Weighted Precisions of Top 50 Retrieved Images in Image Categorization



Figure 13. Average and Weighted Precisions of Top 60 Retrieved Images in Image Categorization


Figure 14. Average and Weighted Precisions of Top 70 Retrieved Images in Image Categorization


Figure 15. Average and Weighted Precisions of Top 80 Retrieved Images in Image Categorization



Figure 16. Average and Weighted Precisions of Top 90 Retrieved Images in Image Categorization


Figure 17. Average and Weighted Precisions of Top 100 Retrieved Images in Image Categorization

For this experiment, average precision and weighted precision do not vary much with the number of retrieved images, as shown in Figures 8 - 17. Take the category "Animals" as an example: Figure 18 shows the trends of average precision and weighted precision for this category as the number of retrieved images increases.



Figure 18. Average and Weighted Precisions of Category "Animals" with Different Numbers of Retrieved Images

From Figure 18, we can see that the average precisions and weighted precisions do not vary much as the number of retrieved images increases. This trend holds for the other categories of the MIR Flickr dataset as well. Since the number of retrieved images has little effect on the results of this experiment, we analyze only the results of the top 100 retrieved images.

From Figure 17, we can see that overall potential energy distance performs slightly better than either Mallows distance or IRM. The average precision of potential energy distance is higher than Mallows distance by 0.035 and higher than IRM by 0.037. The weighted precision of potential energy distance is higher than Mallows distance by 0.04 and higher than IRM by 0.042.

We can also see that among all 24 image categories, the average precision of potential energy distance is higher than the other two approaches by over 0.05 for the categories "Bird," "Clouds," "Lake," and "Sea," and lower than the other two approaches for the category "Indoor." For weighted precision, potential energy distance is higher than the other two approaches by over 0.05 for the categories "Bird," "Clouds," "Lake," "Sea," and "Sky," and lower than the other two approaches for the category "Indoor."

In summary, for both image query and image categorization, potential energy distance performs similarly to Mallows distance and IRM overall, performing better for some categories and worse for others.

4.4. Speed

In this experiment, image retrieval is performed on an Intel Core i3 2.40 GHz computer running Linux. The image segmentation step takes most of the experiment time. The images in the MIR Flickr dataset are around 500 - 700 KB each, and approximately 45 hours are required to segment all 25,000 images and generate their signatures.

The process of image matching is much faster. To compare the image matching performance of potential energy distance with Mallows distance and IRM, we randomly select 1000 images from the MIR Flickr dataset as query images. Image retrieval is performed over all images in the dataset. Table 4 shows the average image matching time of the three approaches.

Table 4. CPU Time of Different Image Matching Approaches

                      Mallows    IRM        Potential energy
Time (milliseconds)   3.46388    0.01734    0.01618

From Table 4, we see that potential energy distance and IRM perform similarly in speed, and both outperform Mallows distance.
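For reference, the per-match CPU time can be measured with a harness like the following sketch; pe_distance is the same hypothetical helper as before, and the exact timing protocol used in the thesis may differ.

    import random, time

    def average_match_time_ms(signatures, distance_fn, n_queries=1000):
        """Average CPU time (ms) of one signature-to-signature match."""
        ids = list(signatures)
        queries = random.sample(ids, n_queries)
        n_pairs = 0
        start = time.process_time()            # CPU time, not wall clock
        for q in queries:
            for i in ids:
                distance_fn(signatures[q], signatures[i])
                n_pairs += 1
        return 1000 * (time.process_time() - start) / n_pairs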


4.5. Robustness

In this section, extensive experiments are carried out to evaluate the robustness of potential energy distance. We randomly select 6 images as experiment samples and, after applying various image alterations, calculate the rank of the original image and the potential energy distance between the altered image and the original image. The image alterations used in this experiment, each applied at several degrees of severity, include image brightening, image darkening, image blurring with a Gaussian filter, image sharpening, image saturation, image random spread, and image pixelization. Figures 19 - 26 list the ranks of the target images and the potential energy distances for the different image alterations at different severities. Each color in these figures represents a different experiment sample. Severities are explored from small degrees of alteration up to the degrees at which the potential energy distance approach is no longer robust, that is, the point at which the original image receives a high rank and lies at a large distance from the altered image.
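A few of these alterations can be sketched with Pillow as below; the library choice and the mapping from "percentage of variation" to enhancement factors are our assumptions, since the thesis does not specify the tool used.

    from PIL import Image, ImageEnhance, ImageFilter

    def brighten(img, pct):    # e.g., pct = 10 for 10 percent brightening
        return ImageEnhance.Brightness(img).enhance(1 + pct / 100)

    def darken(img, pct):
        return ImageEnhance.Brightness(img).enhance(1 - pct / 100)

    def blur(img, radius):     # Gaussian blur with the given radius
        return img.filter(ImageFilter.GaussianBlur(radius))

    def saturate(img, pct):    # pct > 0 adds saturation, pct < 0 removes it
        return ImageEnhance.Color(img).enhance(1 + pct / 100)

    # Each altered image is segmented, converted to a signature, and used
    # as a query; we then record the rank of the original image and its
    # potential energy distance from the altered version.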


Figure 19. Rank of the Target Image and PE Distance with Image Brightening



Figure 20. Rank of the Target Image and PE Distance with Image Darkening


Figure 21. Rank of the Target Image and PE Distance with Image Blurring


Figure 22. Rank of the Target Image and PE Distance with Image Sharpening



Figure 23. Rank of the Target Image and PE Distance with More Saturation


Figure 24. Rank of the Target Image and PE Distance with Less Saturation


Figure 25. Rank of the Target Image and PE Distance with Image Random Spread



Figure 26. Rank of the Target Image and PE Distance with Image Pixelization

From Figures 19 - 26, we see that potential energy distance is fairly robust to image alterations. Overall, potential energy distance is robust to approximately 10 percent image brightening, 10 percent darkening, blurring with a 19 × 19 Gaussian filter, 70 percent sharpening, 30 percent increased saturation, 15 percent decreased saturation, random spread by 30 pixels, and pixelization by 30 pixels.


Chapter 5 Conclusion

As the volume of images on the Internet continues to explode, the visual and semantic content of images grows to cover an ever wider range of topics. These factors greatly challenge image retrieval, so efficient image retrieval techniques are in great demand. In particular, techniques must keep pace with this growth in order to retrieve high-quality results while keeping retrieval fast.

In this thesis, we introduce a new similarity measure, the potential energy distance, which measures the distance between probability distributions. Because potential energy distance assumes the probabilities are independently distributed and does not require optimal matching weights between regions during the image matching process, it is fast to compute. Potential energy distance is also a metric.
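To make the computational advantage concrete, the following sketch computes an energy-type distance between two weighted region signatures in a single pass over region pairs, with no optimization of matching weights. It follows Székely's energy statistics [59]; the exact form used in Chapter 3 may differ in details such as scaling.

    import numpy as np

    def pe_distance(F1, P1, F2, P2):
        """Energy-type distance between signatures (F, P) of Equation (43).
        Every region pair contributes directly, so the cost is a fixed
        O(m1 * m2) with no optimal-matching step."""
        def avg_dist(A, wa, B, wb):
            # weighted mean Euclidean distance over all region pairs
            d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
            return wa @ d @ wb
        return (2 * avg_dist(F1, P1, F2, P2)
                - avg_dist(F1, P1, F1, P1)
                - avg_dist(F2, P2, F2, P2))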

To evaluate the performance of potential energy distance, we conducted experiments to evaluate its accuracy, speed, and robustness. Additionally, we compared potential energy distance with two widely used benchmark measures, Mallows distance and IRM.

In our experiment, we used the MIR Flickr dataset, which contains 25,000 images grouped into 24 categories. During the data pre-processing step, we used segment-based features to construct image signatures: we first employed the MS-A3C approach to segment images into sets of regions and then generated image signatures from the color feature vectors of the image regions. The color feature vector of a region was obtained by averaging the LUV colors of all pixels in the region, and the weight of a region is the fraction of the whole image that it covers.


To evaluate the accuracy of potential energy distance, we conducted image query and image categorization experiments. In the image query experiment, we randomly selected 20 images from each of the 24 categories and performed image retrieval over all 25,000 images. In the image categorization experiment, we used all 25,000 images as query images and performed image retrieval over all 25,000 images. In both experiments, precision and weighted precision were calculated for each category. Also, because precision and weighted precision may vary with the number of retrieved images, we retrieved and analyzed the top 10, top 20, …, top 100 matched images. To evaluate the speed of potential energy distance, we randomly selected 1000 images as query images, performed image retrieval over all 25,000 images, and computed the average image matching time of the three measures. To evaluate the robustness of potential energy distance, we randomly selected a set of query images and applied different image alterations to them. The ranks of the target images and the potential energy distances between the altered and target images were then computed.

Finally, we answer the research questions proposed in Section 1.4:

For the research question "Q1. Does potential energy distance help improve the accuracy of image retrieval?" The answer is yes for some categories. Our experiment showed that potential energy distance performs better than Mallows distance and IRM for some categories, such as

"Bird" and "Clouds," but performs worse than Mallows distance and IRM for other categories, such as “Indoor.”

For the research question "Q2. Does potential energy distance help improve the speed of image retrieval?" The answer is yes. Our experiment showed that potential energy distance is fast to

54 compute. Moreover, potential energy distance performs similarly to IRM and much faster than

Mallows distance.

For the research question "Q3. Is potential energy distance robust for image retrieval?" The answer is yes. Our experiment showed that potential energy distance is fairly robust to different image alternations.

Our work is limited in that we can only conclude that potential energy distance performs close to IRM and much better than Mallows distance in speed; we cannot determine whether potential energy distance performs better than Mallows distance or IRM in accuracy. The number of images and the number of categories in the dataset we used are too small to settle the comparison of retrieval results. Further research on larger datasets, such as the Corel dataset, Caltech101 and Caltech256 [70], and the MIR Flickr 1 million dataset [69], is needed to evaluate the performance of potential energy distance in image retrieval.


Bibliography

1. Datta, R., et al., Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys (CSUR), 2008. 40(2): p. 5.

2. Smeulders, A.W.M., et al., Content-based image retrieval at the end of the early years. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2000. 22(12): p. 1349-1380.

3. Swain, M.J. and D.H. Ballard, Color indexing. International journal of computer vision, 1991. 7(1): p. 11-32.

4. Del Bimbo, A. and P. Pala, Visual image retrieval by elastic matching of user sketches. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 1997. 19(2): p. 121-132.

5. Wilson, R.C. and E.R. Hancock, Structural matching by discrete relaxation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 1997. 19(6): p. 634-648.

6. Wolfson, H.J. and I. Rigoutsos, Geometric hashing: An overview. Computational Science & Engineering, IEEE, 1997. 4(4): p. 10-21.

7. Fagin, R. Combining fuzzy information from multiple systems. in Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. 1996. ACM.

8. Wu, Y., Q. Tian, and T.S. Huang. Discriminant-EM algorithm with application to image retrieval. in Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on. 2000. IEEE.

9. Weber, M., M. Welling, and P. Perona, Unsupervised learning of models for recognition. Computer Vision-ECCV 2000, 2000: p. 18-32.

10. He, J., et al. Manifold-ranking based image retrieval. in Proceedings of the 12th annual ACM international conference on Multimedia. 2004. ACM.

11. Vasconcelos, N. and A. Lippman, A multiresolution manifold distance for invariant image similarity. Multimedia, IEEE Transactions on, 2005. 7(1): p. 127-142.

12. He, J., et al. Mean version space: a new active learning method for content-based image retrieval. in Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval. 2004. ACM.


13. He, X. Incremental semi-supervised subspace learning for image retrieval. in Proceedings of the 12th annual ACM international conference on Multimedia. 2004. ACM.

14. He, X., W.-Y. Ma, and H.-J. Zhang. Learning an image manifold for retrieval. in Proceedings of the 12th annual ACM international conference on Multimedia. 2004. ACM.

15. Zhou, D., et al., Ranking on data manifolds. Advances in neural information processing systems, 2004. 16: p. 169-176.

16. Silva, V.d. and J.B. Tenenbaum, Global versus local methods in nonlinear dimensionality reduction. Advances in neural information processing systems, 2002. 15: p. 705-712.

17. Hastie, T., R. Tibshirani, and J. Friedman, The elements of statistical learning: data mining, inference, and prediction. 2001.

18. Do, M.N. and M. Vetterli, Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance. Image Processing, IEEE Transactions on, 2002. 11(2): p. 146-158.

19. Iqbal, Q. and J. Aggarwal, Retrieval by classification of images containing large manmade objects using perceptual grouping. Pattern recognition, 2002. 35(7): p. 1463-1479.

20. Zhu, L., et al. Keyblock: An approach for content-based image retrieval. in Proceedings of the eighth ACM international conference on Multimedia. 2000. ACM.

21. Mathiassen, J., A. Skavhaug, and K. Bø, Texture similarity measure using Kullback-Leibler divergence between gamma distributions. Computer Vision—ECCV 2002, 2002: p. 19-49.

22. Pi, M., M.K. Mandal, and A. Basu, Image retrieval based on histogram of fractal parameters. Multimedia, IEEE Transactions on, 2005. 7(4): p. 597-605.

23. Laaksonen, J., M. Koskela, and E. Oja, PicSOM-self-organizing image retrieval with MPEG-7 content descriptors. Neural Networks, IEEE Transactions on, 2002. 13(4): p. 841-853.

24. Wu, G., E.Y. Chang, and N. Panda. Formulating context-dependent similarity functions. in Proceedings of the 13th annual ACM international conference on Multimedia. 2005. ACM.

25. Natsev, A. and J.R. Smith. A study of image retrieval by anchoring. in Multimedia and Expo, 2002. ICME'02. Proceedings. 2002 IEEE International Conference on. 2002. IEEE.


26. Theoharatos, C., et al., A generic scheme for color image retrieval based on the multivariate Wald-Wolfowitz test. Knowledge and Data Engineering, IEEE Transactions on, 2005. 17(6): p. 808-819.

27. Zhang, Q., S.A. Goldman, and W. Yu, Content-Based Image Retrieval Using Multiple-Instance Learning.

28. Wang, G., D. Hoiem, and D. Forsyth, Learning image similarity from flickr groups using fast kernel machines. 2012.

29. Chechik, G., et al. An online algorithm for large scale image similarity learning. in Proc. NIPS. 2009. Citeseer.

30. Jiang, S., X. Song, and Q. Huang, Relative image similarity learning with contextual information for Internet cross-media retrieval. Multimedia Systems, 2013: p. 1-13.

31. Jin, R. and A.G. Hauptmann. Using a probabilistic source model for comparing images. in Image Processing. 2002. Proceedings. 2002 International Conference on. 2002. IEEE.

32. Vasconcelos, N. and A. Lippman. A probabilistic architecture for content-based image retrieval. in Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on. 2000. IEEE.

33. Vasconcelos, N., On the efficient evaluation of probabilistic similarity functions for image retrieval. Information Theory, IEEE Transactions on, 2004. 50(7): p. 1482-1496.

34. Ning, J., et al., Interactive image segmentation by maximal similarity based region merging. Pattern Recognition, 2010. 43(2): p. 445-456.

35. Androutsos, D., K. Plataniotiss, and A.N. Venetsanopoulos. Distance measures for color image retrieval. in Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on. 1998. IEEE.

36. Chen, Z. and B. Zhu, Some Formal Analysis of Rocchio’s Similarity-Based Relevance Feedback Algorithm. Algorithms and Computation, 2000: p. 129-150.

37. Beecks, C., M.S. Uysal, and T. Seidl. A comparative study of similarity measures for content-based multimedia retrieval. in Multimedia and Expo (ICME), 2010 IEEE International Conference on. 2010. IEEE.

38. Faloutsos, C., et al., Efficient and effective querying by image content. Journal of intelligent information systems, 1994. 3(3): p. 231-262.

39. Ardizzoni, S., I. Bartolini, and M. Patella. Windsurf: Region-based image retrieval using wavelets. in Database and Expert Systems Applications, 1999. Proceedings. Tenth International Workshop on. 1999. IEEE.


40. Mallows, C., A note on asymptotic joint normality. The Annals of Mathematical Statistics, 1972. 43(2): p. 508-515.

41. Rubner, Y., C. Tomasi, and L.J. Guibas, The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision, 2000. 40(2): p. 99-121.

42. Rubner, Y., C. Tomasi, and L.J. Guibas. A metric for distributions with applications to image databases. in Computer Vision, 1998. Sixth International Conference on. 1998. IEEE.

43. Wang, J.Z., J. Li, and G. Wiederhold, SIMPLIcity: Semantics-sensitive integrated matching for picture libraries. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2001. 23(9): p. 947-963.

44. Ko, B. and H. Byun. Integrated region-based image retrieval using region's spatial relationships. in Pattern Recognition, 2002. Proceedings. 16th International Conference on. 2002. IEEE.

45. Zhang, R. and Z. Zhang. Hidden semantic concept discovery in region based image retrieval. in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on. 2004. IEEE.

46. Jing, F., et al., An efficient and effective region-based image retrieval framework. Image Processing, IEEE Transactions on, 2004. 13(5): p. 699-709.

47. Du, Y. and J.Z. Wang. A scalable integrated region-based image retrieval system. in Image Processing, 2001. Proceedings. 2001 International Conference on. 2001. IEEE.

48. Chen, Y. and J.Z. Wang, A region-based fuzzy feature matching approach to content-based image retrieval. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2002. 24(9): p. 1252-1267.

49. Amores, J., et al. Boosting contextual information in content-based image retrieval. in Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval. 2004. ACM.

50. Hoiem, D., et al. Object-based image retrieval using the statistical structure of images. in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on. 2004. IEEE.

51. Dagli, C. and T.S. Huang. A framework for grid-based image retrieval. in Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. 2004. IEEE.

52. Sampat, M.P., et al., Complex wavelet structural similarity: A new image similarity index. Image Processing, IEEE Transactions on, 2009. 18(11): p. 2385-2401.


53. Beecks, C., M.S. Uysal, and T. Seidl. Signature quadratic form distances for content-based similarity. in Proceedings of the 17th ACM international conference on Multimedia. 2009. ACM.

54. Beecks, C., M.S. Uysal, and T. Seidl. Signature quadratic form distance. in Proceedings of the ACM International Conference on Image and Video Retrieval. 2010. ACM.

55. Beecks, C., M.S. Uysal, and T. Seidl. Efficient k-nearest neighbor queries with the signature quadratic form distance. in Data Engineering Workshops (ICDEW), 2010 IEEE 26th International Conference on. 2010. IEEE.

56. Vázquez, P.-P. and J. Marco, Using Normalized Compression Distance for image similarity measurement: an experimental study. The Visual Computer, 2011: p. 1-22.

57. Carson, C., et al., Blobworld: Image segmentation using expectation-maximization and its application to image querying. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2002. 24(8): p. 1026-1038.

58. Joshi, D., M. Naphade, and A. Natsev. A greedy performance driven algorithm for decision fusion learning. in Image Processing, 2007. ICIP 2007. IEEE International Conference on. 2007. IEEE.

59. Szekely, G.J., Potential and kinetic energy in statistics. Lecture Notes, Budapest Institute, 1989.

60. Bozkaya, T. and M. Ozsoyoglu. Distance-based indexing for high-dimensional metric spaces. in ACM SIGMOD Record. 1997. ACM.

61. Berretti, S., A. Del Bimbo, and E. Vicario. Modeling spatial relationships between color sets. in Content-based Access of Image and Video Libraries, 2000. Proceedings. IEEE Workshop on. 2000. IEEE.

62. Assfalg, J., A. Del Bimbo, and P. Pala. Using multiple examples for content-based image retrieval. in Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on. 2000. IEEE.

63. Li, J., Agglomerative connectivity constrained clustering for image segmentation. Statistical Analysis and Data Mining, 2011. 4(1): p. 84-99.

64. Li, J. and J.Z. Wang, Studying digital imagery of ancient paintings by mixtures of stochastic models. Image Processing, IEEE Transactions on, 2004. 13(3): p. 340-353.

65. Li, J. and J.Z. Wang, Real-time computerized annotation of pictures. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2008. 30(6): p. 985-1002.

66. Shirdhonkar, S. and D.W. Jacobs. Approximate earth mover’s distance in linear time. in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. 2008. IEEE.


67. Carson, C., et al. Blobworld: A system for region-based image indexing and retrieval. in Visual Information and Information Systems. 1999. Springer.

68. Ma, W.-Y. and B. Manjunath. Netra: A toolbox for navigating large image databases. in Image Processing, 1997. Proceedings., International Conference on. 1997. IEEE.

69. Huiskes, M.J., B. Thomee, and M.S. Lew. New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative. in Proceedings of the international conference on Multimedia information retrieval. 2010. ACM.

70. Kohavi, R. and F. Provost, Glossary of terms. Machine Learning, 1998. 30(2-3): p. 271-274.
