Cairn Detection in Southern Arabia Using a Supervised Automatic Detection Algorithm and Multiple Sample Data Spectroscopic Clustering

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By
Jared Michael Schuetter, M.S.
Graduate Program in Statistics

The Ohio State University
2010

Dissertation Committee:
Professor Tao Shi, Co-Advisor
Professor Prem Goel, Co-Advisor
Professor Joy McCorriston
Professor Yoon Lee
Professor Stuart Ludsin, GFR

Copyright by
Jared Michael Schuetter
2010

ABSTRACT

Excavating cairns in southern Arabia is a way for anthropologists to understand which factors led ancient settlers to transition from a pastoral lifestyle and tribal narrative to the formation of states that exist today. Locating these monuments has traditionally been done in the field, relying on eyewitness reports and costly searches through the arid landscape.

In this thesis, an algorithm for automatically detecting cairns in satellite imagery is presented. The algorithm uses a set of filters in a window based approach to eliminate background pixels and other objects that do not look like cairns. The resulting set of detected objects constitutes fewer than 0.001% of the pixels in the satellite image, and contains the objects that look the most like cairns in imagery. When a training set of cairns is available, a further reduction of this set of objects can take place, along with a likelihood-based ranking system.

To aid in cairn detection, the satellite image is also clustered to determine landform classes that tend to be consistent with the presence of cairns. Due to the large number of pixels in the image, a subsample spectral clustering algorithm called "Multiple Sample Data Spectroscopic clustering" is used. This multiple sample clustering procedure is motivated by perturbation studies on single sample spectral algorithms.
The studies, presented in this thesis, show that sampling variability in the single sample approach can cause an unsatisfactory level of instability in clustering results. The multiple sample data spectroscopic clustering algorithm is intended to stabilize this perturbation by combining information from different samples. While sampling variability is still present, the use of multiple samples mitigates its effect on cluster results.

Finally, a step-through of the cairn detection algorithm and satellite image clustering is given for an image in the Hadramawt region of Yemen. The top ranked detected objects are presented, and a discussion of parameter selection and future work follows.

Dedicated to Michelle, who has patiently waited for 5 years to see this finished dissertation, and to Claudia, who gave me the deadline for finishing it.

ACKNOWLEDGMENTS

First and foremost, I would like to thank my co-advisors, Dr. Shi and Dr. Goel. Both of you have provided a great deal of input for the material presented in this thesis. In addition, I appreciate your continued optimism that a solution for any problem will turn up eventually. On multiple occasions, I have met with each of you to share the results of my most recent abject failure at clustering and/or cairn detection to have you suggest a number of other possible approaches to try. I'm starting to learn that you can't make any progress without failing a few times along the way.

I would also like to thank the NSF-HSD team. I am grateful to Dr. McCorriston for her unwavering excitement about the project, her leadership, and her confidence that I will eventually come up with something that will work in the cairn detection algorithm. Thank you to Matt for always making sure you understand the latest detection techniques, providing your insights as an anthropologist, and taking all of those horrible meeting notes.
Finally, I'm especially thankful to Jihye, who has on numerous occasions provided me with last-minute satellite imagery for some crazy detection technique I've come up with, spent a great deal of her time helping me create (and recreate) the cairn training set, and processed imagery so that I can use Matlab for the detection algorithm.

I am also grateful to Dr. Lee for agreeing to serve on both my candidacy and dissertation committees, and for being one of the best professors I've had at OSU. In addition, I want to thank Drs. Notz, Santner, and Dean for allowing me to moonlight as a computer experimenter. I also appreciate the friendships I've made with other graduate students in the department, including Danel, Candace, Jenny, Josh, Arun, Soma, and especially Mallik, who will really get a kick out of the ridiculous title of my dissertation. Finally, I would like to thank Dr. Stasny, whose confidence in my abilities is the reason I even came to OSU. I wasn't sure if I had the chops to get a Ph.D, but she sure was, and I appreciate it.

Last but not least, I would like to thank my family for their support over the last 5 years. I am especially grateful to Michelle, who has been a source of inspiration (and motivation), and still makes me feel like the luckiest guy in the world.

VITA

May 1998       duPont Manual High School
May 2002       B.S. Math & Education, Denison University
August 2005    M.S. Applied Statistics, Bowling Green State University

FIELDS OF STUDY

Major Field: Statistics

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
Vita
List of Tables
List of Figures

Chapters:

1. INTRODUCTION
    1.1 The NSF-HSD Project
    1.2 Multiple Sample Data Spectroscopic Clustering
    1.3 Combining Clustering and Cairn Detection

2. PROJECT OVERVIEW
    2.1 Southern Arabia in the Holocene
    2.2 Goals of the Project
    2.3 Cairn Detection

3. TECHNIQUES FOR DETECTING OBJECTS IN IMAGERY
    3.1 Introduction
    3.2 Basic Image Processing Techniques
        3.2.1 Point Operators
        3.2.2 Template Operators
        3.2.3 Group/Window Operators
    3.3 Edge Detection
        3.3.1 Edge Vector Formulation
        3.3.2 Improvements on Edge Detection
        3.3.3 Other Edge Detection Techniques
    3.4 Shape Matching and Extraction
        3.4.1 Basic Techniques
        3.4.2 Template Matching
        3.4.3 Hough Transform Methods
        3.4.4 Deformable Templates
    3.5 Post-Detection Processing

4. CAIRN DETECTION
    4.1 Introduction
    4.2 Blob Detection
    4.3 Vegetation Removal
    4.4 Size Metrics
    4.5 Measuring Circularity
        4.5.1 Hough Transform Circle Fitting
        4.5.2 Boundary Extraction
        4.5.3 Circularity Calculation
    4.6 Reduction to Cairn Region
    4.7 Assigning Cairn Likelihoods
    4.8 Algorithm Summary
    4.9 Results for Polygon 9
    4.10 Discussion

5. TECHNIQUES FOR CLUSTERING DATA
    5.1 Introduction
        5.1.1 Data Clustering
        5.1.2 Applications of Clustering
    5.2 Clustering Algorithms
        5.2.1 Clustering by Central Tendency
        5.2.2 Model Based Clustering
        5.2.3 Spectral Clustering Algorithms
        5.2.4 Measuring the Quality of Results
    5.3 Spectral Clustering for Large Datasets
        5.3.1 Sparse Matrix Representations
        5.3.2 Single Subsample Approximation

6. MULTIPLE SAMPLE DATA SPECTROSCOPIC CLUSTERING
    6.1 Algorithm Overview
    6.2 Sparse Extension for Faster Computation
    6.3 Performance on Real and Simulated Datasets
        6.3.1 Comparison to Single Subsample Approach
        6.3.2 Image Segmentation Applications
        6.3.3 Sparse Extension vs. Full Extension Comparison
        6.3.4 Parameter Selection
    6.4 Conclusions

7. REDUCTION OF FALSE DETECTIONS BY CLUSTERING
    7.1 Introduction
    7.2 Satellite Image Clustering
        7.2.1 Size Reduction for Computation
        7.2.2 Equalized DEM Measure
        7.2.3 Algorithm Summary
    7.3 Cluster Results for Polygon 9
    7.4 Discussion

8. SUMMARY AND FUTURE WORK
    8.1 Algorithm Summaries
        8.1.1 The Cairn Detection Algorithm
        8.1.2 The Multiple Sample