Retrieving Address-based Locations from the Web Dirk Ahlers Susanne Boll OFFIS Institute for Information Technology University of Oldenburg Oldenburg, Germany Germany ahlers@offis.de
[email protected] ABSTRACT user’s query. In the following, we discuss the issue of de- Geospatial search for the Web determines the relation of tecting location information in the content of common Web documents’ contents to a location within a region. For some pages to accurately geotag Web information. pedestrian scenarios, information at a higher granularity Today, the location information is already out on the Web down to individual buildings is necessary. In this paper, and present in a variety of forms, ranging from a mention of a we describe a process for the extraction and simultaneous place name to full addresses or even geographic coordinates verification of precise addresses on German Web pages by a [6]. Location information can be found on common Web validating parser. We describe how an address-level location content with detailed addresses [12] to extensive mapping extraction can be aided by an extensive use of previous geo- services with open APIs for data access. graphic knowledge and the use of its structure. The analysis To meet the challenge of location-based Web search, we of address structure, components and dependencies leads to implemented our own geospatial Web search engine as de- the design of a geoparser that determines valid addresses scribed in [1]. Common technologies such as Web crawler within unstructured Web content. We further discuss some and indexing systems are used, but each extended with tech- noteworthy issues that arise within the process.