Using XML Languages for Modeling and Web-Visualization of Geographical Legacy Data
Brigitte Mathiak, Andreas Kupfer and Karl Neumann
Institut für Informationssysteme - TU Braunschweig
Mühlenpfordtstr. 23, 38100 Braunschweig, Germany
{mathiak, kupfer, neumann}@idb.cs.tu-bs.de

Abstract

In our aim to modernize geographical legacy data from the German office of geographical survey with XML languages, we first modeled the data in GML, a standard language for geography markup. The resulting model was then used as a template for a Java application that assembles the necessary information from the legacy data and writes it into a GML document. In the second part of our work we mapped GML into scalable vector graphics (SVG) using the extensible style sheet language for transformation (XSLT). The resulting SVG graphics closely resemble official maps.

Keywords: GML, SVG, XSLT, geographical data, XML

1. Introduction

The Internet is becoming more and more important for the exchange of geographical data and its computer-based usage (Kraak and Brown, 2000). In this context, special techniques for unifying geographical data formats are necessary to provide means for the format-independent exchange of the data (Fidalgo et al., 2003). XML (Harold and Means, 2002; Goldfarb and Prescod, 2003) is gaining more and more importance in the field of Internet data exchange, so it is natural to apply XML to geographical data as well.

XML provides the base for GML (Geography Markup Language), a language specified by the Open Geospatial Consortium (OGC, 2001), an international consortium of about 200 companies and organizations. GML was designed as an open standard to permit the exchange of geographical data without establishing too many constraints on how the data is structured. GML works with XML Schema (van der Vlist, 2003), defining several schema types which can be used or derived in a customized model.
It also provides a theoretical framework for the structure, yet leaves enough flexibility to fit almost every given data set.

Our goal is to improve the usability of data from the German office of geographical survey by using XML. This legacy data has not been structurally changed for over a decade, which makes it a challenging starting point for a GML model. In addition to defining the model, we implemented a Java application which transforms the legacy data into the GML model.

As a further step to demonstrate the improved usability, we decided to attempt a visualization (Kraak and Ormeling, 2002) of GML using further XML languages. The obvious choice for the graphical description language was SVG (scalable vector graphics) (W3C, 2000). For the transformation from GML to SVG (Guo et al., 2003) we used a third XML language: extensible style sheet language for transformation (XSLT) (Kay, 2001; W3C, 1999; Bex et al., 2002). XSLT documents contain instructions for the transformation of XML documents into differently structured XML documents. We use the open source XSLT processor Xalan (Apache, 2003) for the transformation, but any conformant XSLT processor will do as well. Cascading style sheets (CSS) (Meyer, 2002) control the style of the various cartographic objects. Adjustments to the map's style can easily be made by changing the CSS or, in more complicated cases, the standardized XSLT templates.

This paper gives a short overview of the mentioned XML languages and the structure of the geographical legacy data. We also discuss the applications of the languages in the transformation process. In section 2 we describe GML, our customized GML model and how we transformed the legacy data into it. Sections 3 and 4 give introductions to SVG and XSLT and describe how we used them to produce map-like graphics from the GML data. Section 5 contains a summary of the whole process.

2. Geography Markup Language

GML, or Geography Markup Language, is an XML-based encoding standard for geographic information developed by the Open Geospatial Consortium (OGC). The main elements of the GML modeling structure are geographic features, which represent real-world geographical objects. Their geometric properties are modeled by a uniform geometric definition using two-dimensional coordinates. The current version of GML is GML 3.1 (OGC, 2004), though the version used in this case study is GML 2.0 (OGC, 2001). GML 2.0 is defined in XML Schema (XSD) and is divided into three parts for structuring purposes: the geometry schema (geometry.xsd), the feature schema (feature.xsd) and the XLinks schema (xlinks.xsd), which provides support for linking the geographical data with other sources and for referencing within and between GML documents. All three schemas are provided by the OGC.

Geometries and Features in GML

The geometric properties of GML are strictly defined. There are five geometric types, all derived from one abstract GML geometry type. A "Point" contains one pair of coordinates, while a "LineString" contains two or more pairs which are connected by straight lines. A "Box" is defined by its lower left corner and upper right corner. A "LinearRing" is a closed line circuit of at least three different points; since the last point must additionally be the same as the first point, a ring has at least four coordinate pairs in total. In contrast to a "LinearRing", a "Polygon" in GML may possess holes, that is, inner boundaries that mark areas cut out from the area of the outer boundary. A "Polygon" therefore consists of one outer boundary and possibly one or more inner boundaries.

[Figure: UML classes Abstract Geometry, Point, LineString, Box, Polygon and Linear Ring; a Polygon has one outerBoundary and any number of innerBoundary rings.]
Figure 1. Polygon with holes and the UML schema of the GML geometry

While the form of the geometric description is precisely defined, the form of the actual features is not.
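The linear ring rule stated above (a closed circuit with at least four coordinate pairs) can be sketched as a small Java check. This is only an illustration under stated assumptions: the class name LinearRingCheck and the array-based coordinate representation are hypothetical and are not taken from the GML schemas or from the application described in this paper. The sketch also does not verify that the ring contains at least three different points.

```java
/** Hypothetical helper for illustration; not part of GML or of the paper's application. */
final class LinearRingCheck {

    /**
     * Returns true if the coordinate pairs form a closed ring of at least
     * four pairs. Each entry of coords is assumed to be {x, y}.
     * Distinctness of the points is not checked in this sketch.
     */
    static boolean isValidRing(double[][] coords) {
        if (coords.length < 4) {
            return false; // three points plus the repeated first point
        }
        double[] first = coords[0];
        double[] last = coords[coords.length - 1];
        // The ring must end exactly where it starts.
        return first[0] == last[0] && first[1] == last[1];
    }

    public static void main(String[] args) {
        double[][] triangle = {{0, 0}, {4, 0}, {4, 3}, {0, 0}};
        double[][] openLine = {{0, 0}, {4, 0}, {4, 3}, {1, 1}};
        System.out.println(isValidRing(triangle));
        System.out.println(isValidRing(openLine));
    }
}
```

A polygon check could reuse the same method for the outer boundary and for every inner boundary, since all of them are linear rings.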
Rather than prescribing feature types, GML offers a structural framework that can be fitted to the scenario it is meant to describe. Every model is made up of features. These features have properties, and each property has a name, a type and a value. The structure of a feature, that is, the number and types of its properties, is defined by its type. A feature collection is also considered a feature: the contained features are the values of the collection's property. The contained features may in turn contain other features, thereby building a hierarchy.

The feature types actually used are user-defined in an XML Schema. The advantage of XML Schema is that it allows object-oriented handling of the types, such as derivation or the combination of several types into one substitution group. In addition to the rules given by the schemata, GML states some rules that the schemata cannot express directly, such as naming conventions and the obligatory derivation from the base types. Within these rules, GML allows users to model their own version of GML that fits their individual needs.

This flexibility is much needed, as geographical data is not very standardized and may contain very different kinds of meta-information. The uniform representation of the data, especially the geometric data, allows some degree of interchangeability with other GML models, while the framework remains general enough to model most geographic scenarios.

Table 1. Extracts of the respective files corresponding to the object "Bachstr."

file name          lines containing 86118065
objecttype.txt     86118065, 3101 (means street)
name.txt           86118065, Bachstr.
coordinates.txt    86118065, 4437960.070, 5331818.450, 4437967.200, 5331825.410
                   86118065, 4437952.980, 5331812.550, 4437960.070, 5331818.450

Modeling landscape data with GML

Since GML already models the geography and the base feature outline, the main focus for modeling lies in the definition of new, customized feature types and their relationships.
To customize the feature types well, we must first look at the form of the legacy data.

The so-called Digital Landscape Model (DLM) is part of a German project whose purpose is to digitize the surface of the Federal Republic of Germany topographically and cartographically. It has now been running for over 15 years with great success, yet most of the specification for the data was made in the initial phase of the project, hence 15 years ago, with no systematic changes since that time. The landscape data, for example, has a catalogue that encodes the object types as four-digit numbers. The dataset for a street therefore does not contain the word "street", but instead the code 3101, which can be decoded into "street" using the non-database catalogue. There are about 200 object types and over 1000 attributes, which contain additional information for the object types, like "width of street" for a street or "kind of vegetation" for a forest. The attributes are encoded in a similar way as the object types, using a three-letter code for the attribute type and four digits for its value.

The original DLM data can be downloaded from the Internet for test purposes. An additional program, EDBS-Extra (Rinner, 2001), is used to transform the original format into a more accessible format, which uses a quasi-relational framework for the data. The transformed information is stored in the file corresponding to the type of information that is extracted, and all files are linked by the ID of the object that the information belongs to (see the example in Table 1).

In order to model this data structure in terms of GML, a complex type called DLMMemberType is introduced, of which the object types like "street" are to be instances. The objects themselves are modeled as features; therefore DLMMemberType is derived from AbstractFeatureType, the GML base type for all features.
The DLMMemberType should be able to contain all available data from the DLM to avoid information loss.
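
The file layout shown in Table 1 suggests how an application might assemble one feature: collect all rows that share an object ID and write them out as GML. The following Java sketch illustrates that idea only; it is not the application described in this paper. The class LegacyToGml, its method names, the element name Object and the attribute objectType are hypothetical placeholders, and the sketch assumes that each row of coordinates.txt holds one line segment as x1, y1, x2, y2. The gml-prefixed element names and the fid attribute are intended to follow GML 2 naming. XML escaping and namespace declarations are omitted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch for illustration; not the authors' application. */
final class LegacyToGml {

    /** Returns the fields after the ID for every row that starts with the given ID. */
    static List<String[]> rowsFor(String id, List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s*,\\s*");
            if (fields.length > 1 && fields[0].equals(id)) {
                rows.add(Arrays.copyOfRange(fields, 1, fields.length));
            }
        }
        return rows;
    }

    /** Turns one coordinates row (x1, y1, x2, y2) into "x,y x,y" text. */
    static String toGmlCoordinates(String[] xy) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 1 < xy.length; i += 2) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(xy[i]).append(',').append(xy[i + 1]);
        }
        return sb.toString();
    }

    /** Assembles one feature element; assumes the ID occurs in the object type rows. */
    static String toFeature(String id, List<String> types,
                            List<String> names, List<String> coords) {
        String code = rowsFor(id, types).get(0)[0]; // four-digit object type code
        StringBuilder sb = new StringBuilder();
        sb.append("<Object fid=\"").append(id)
          .append("\" objectType=\"").append(code).append("\">\n");
        for (String[] name : rowsFor(id, names)) {
            sb.append("  <gml:name>").append(name[0]).append("</gml:name>\n");
        }
        for (String[] segment : rowsFor(id, coords)) {
            sb.append("  <gml:lineStringProperty><gml:LineString><gml:coordinates>")
              .append(toGmlCoordinates(segment))
              .append("</gml:coordinates></gml:LineString></gml:lineStringProperty>\n");
        }
        return sb.append("</Object>").toString();
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("86118065, 3101");
        List<String> names = Arrays.asList("86118065, Bachstr.");
        List<String> coords = Arrays.asList(
            "86118065, 4437960.070, 5331818.450, 4437967.200, 5331825.410",
            "86118065, 4437952.980, 5331812.550, 4437960.070, 5331818.450");
        System.out.println(toFeature("86118065", types, names, coords));
    }
}
```

Keeping the coordinate values as strings avoids rounding the original survey coordinates while they are copied from the legacy files into the GML document.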