Search Improvement Within the Geospatial Web in the Context of Spatial Data Infrastructures
2012 22

Aneta Jadwiga Florczyk

Search improvement within the geospatial web in the context of spatial data infrastructures

Doctoral Thesis
Department: Informática e Ingeniería de Sistemas
Advisors: Zarazaga Soria, Francisco Javier; López Pellicer, Francisco Javier
UNIVERSIDAD DE ZARAGOZA
Repositorio de la Universidad de Zaragoza – Zaguan http://zaguan.unizar.es

SEARCH IMPROVEMENT WITHIN THE GEOSPATIAL WEB IN THE CONTEXT OF SPATIAL DATA INFRASTRUCTURES

Aneta Jadwiga Florczyk

PhD DISSERTATION

RESEARCH ADVISORS
Dr. Francisco Javier Zarazaga–Soria
Dr. Francisco Javier López–Pellicer

May 2012

Computer Science and Systems Engineering Department
Universidad de Zaragoza

© Copyright by author, copyrightyear All Rights Reserved

Acknowledgments

I would like to thank all members of the IAAA research group of the University of Zaragoza, to which I had the pleasure of belonging during the thesis development, for their collaboration and support in every aspect, and especially my research advisors, F. Javier Zarazaga–Soria and F. Javier López–Pellicer, for their endless forbearance and patience.

Also, I would like to express my gratitude to the members of the IFGI of the University of Muenster, Germany, and every other person I met there, who made an effort to make my stay a wonderful experience, which has contributed to my professional and personal development.
I feel especially grateful to Professor Werner Kuhn, Tomi Kauppinen and Patrick Maué for their help in this development.

I would like to thank the people who have reviewed this thesis. Despite all of their help, I take full responsibility for any errors or omissions herein.

Finally, I would particularly like to thank my parents for their unconditional support during the work on this thesis. They always managed to find the right words to encourage me to follow the path I have chosen, even if this meant personal sacrifices for them.

Contents

Acknowledgments

1 Context and research issues
  1.1 Context
  1.2 Motivation
  1.3 Problem statement
  1.4 Research questions
  1.5 Methodology
  1.6 Scope
  1.7 Contributions
  1.8 Thesis structure

2 Enhanced search for a geospatial entity
  2.1 Introduction
  2.2 Terminology
  2.3 Geocoding Web services
  2.4 Quality of geocoding service
    2.4.1 Web service
    2.4.2 Spatial data quality
    2.4.3 Geocoding quality
  2.5 Compound geocoder
  2.6 Address geocoding for urban management
    2.6.1 Data model
    2.6.2 Geospatial resources
    2.6.3 Geocoding framework
    2.6.4 Application
  2.7 Summary

3 Content–based semantics for geospatial resources
  3.1 Introduction
  3.2 Geoidentifier
    3.2.1 Linking geographic features
    3.2.2 Spatial search
    3.2.3 Administrative geography of Spain
    3.2.4 OGC services catalog application
  3.3 Semantics of the WMS layer
    3.3.1 Related work
    3.3.2 The method
    3.3.3 Experiment
    3.3.4 Implementation of the method as a WPS
  3.4 Application of the methods developed
  3.5 Summary

4 Semantic characterisation of Geospatial Web resources
  4.1 Introduction
  4.2 Web community and geographic metadata
  4.3 Knowledge Generator
    4.3.1 Workflow
    4.3.2 Architecture
  4.4 Applying the Knowledge Generator
    4.4.1 Method for coverage estimation
    4.4.2 Prototype
    4.4.3 Experiment
    4.4.4 Evaluation of the content–based heuristic
    4.4.5 Improvement discussion
  4.5 Summary

5 Geospatial Web Search Engine
  5.1 Introduction
  5.2 Domain searches on the Web
    5.2.1 Searching for Web services
    5.2.2 Searching for Geospatial Web resources
  5.3 Searching via a general search engine
    5.3.1 Search goals
    5.3.2 Search strategy
  5.4 Non–expert user support
    5.4.1 Faceted classification
    5.4.2 Faceted search and browsing
  5.5 Domain–specific search user interface
    5.5.1 Web service community
    5.5.2 Geospatial community
  5.6 Geospatial Web Search Engine
    5.6.1 Facets for search, browsing and navigational interface
    5.6.2 Search User Interface design
    5.6.3 OWS search and indexing procedure
    5.6.4 Architecture and Implementation
  5.7 System Evaluation
    5.7.1 Precision testing
    5.7.2 Interface Evaluation
  5.8 Summary

6 Conclusions
  6.1 Summary of Contributions
  6.2 Future Work

A KnowledgeGenerator: Prototype details
B SKOS classification schemes for faceted–search.
C Generation of OWS dataset summaries.

Bibliography

List of Tables

1.1 Examples of SDIs.
1.2 OGC Web Service interface specifications relevant in this work.
2.1 Geocoding parameters of used Web services.
3.1 Modelling the spatial object.
3.2 Number of layers per each layer type for the layer collection used in the experiment.
3.3 Parameters of GetMap request for definition of the spatial constraint according to the WMS service version and CRS.
3.4 Values of functions of the content collecting procedure per iteration.
3.5 Image analysis parameters.
3.6 The results of the performed experiment.
4.1 Geospatial meta elements used in Web pages.
4.2 Example of the Hhip heuristic results.
4.3 Metadata model and mapping to the HTML meta elements.
4.4 Classification of Web pages in the corpus according to Web site characteristics.
4.5 Classification of Web pages in the corpus according to the coverage estimated.
4.6 Summary of metadata extraction.
4.7 Trimmed corpus. ..