GML VALIDATION BASED ON NORWEGIAN STANDARD

Heng Min

BACKGROUND

- GIS plays an important part in people's lives.
- Geographic data quality matters, and data validation is one method of measuring data quality.
- There are hundreds of geographic data formats; GML serves as one standard format for exchanging data between them.
- Different countries define different models of the geographic world and have different requirements on geospatial data quality; this work follows the Norwegian standard.

MOTIVATION AND BENEFIT

- Controlling geographic data quality can optimize the usage of the data, bring more order to the GIS market, and make the GIS industry more attractive.
- Four groups benefit: data maintainers, software developers, standards associations, and end users.
- Data maintainers are responsible for offering high-quality datasets to users. A dataset needs validation before it is saved, sent, or shared.
- Software developers go through planning, implementation, testing, documenting, deployment, and maintenance to develop complete software, and planning is one of the most important of these processes.
- Standards associations may consider standardizing GML validation.
- End users may receive datasets from unknown sources and of unknown quality. In addition, end users can use validation to find errors in a dataset and correct them.

CONTRIBUTION

- Build a GML validation framework that combines existing XML validation with additional "application-specific" rules.
- Help bring GML into official use in Norway.
- Validate not only received datasets but also services, e.g. a transactional Web Feature Service (WFS-T).

RESEARCH QUESTIONS

What is data validation? Data validation is the process of ensuring that a program operates on clean, correct, and useful data. It uses routines, often called "validation rules", that check the correctness, meaningfulness, and security of data input to the system.
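The notion of routines called "validation rules" can be expressed as a small Java interface. The sketch below is a hypothetical illustration and not the thesis implementation: `ValidationRule`, `ValidationError` and `Validator` are names assumed here. Each rule inspects a dataset and returns the errors it finds. The validator runs every rule and collects the results, so an empty result means the data passed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a rule-based validation framework; all names are illustrative.

// One problem found by a rule: which rule, which element, and why.
final class ValidationError {
    final String rule;
    final String elementId;
    final String message;

    ValidationError(String rule, String elementId, String message) {
        this.rule = rule;
        this.elementId = elementId;
        this.message = message;
    }

    @Override
    public String toString() {
        return rule + " [" + elementId + "]: " + message;
    }
}

// A validation rule checks one aspect of a dataset of type T and reports every violation.
interface ValidationRule<T> {
    String name();

    List<ValidationError> check(T dataset);

    // Convenience factory so that simple rules can be written as lambdas.
    static <T> ValidationRule<T> of(String name, Function<T, List<ValidationError>> body) {
        return new ValidationRule<T>() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public List<ValidationError> check(T dataset) {
                return body.apply(dataset);
            }
        };
    }
}

// Runs all registered rules and collects their errors; an empty list means the data passed.
final class Validator<T> {
    private final List<ValidationRule<T>> rules = new ArrayList<>();

    Validator<T> add(ValidationRule<T> rule) {
        rules.add(rule);
        return this;
    }

    List<ValidationError> validate(T dataset) {
        List<ValidationError> errors = new ArrayList<>();
        for (ValidationRule<T> rule : rules) {
            errors.addAll(rule.check(dataset));
        }
        return errors;
    }
}
```

With this shape, new checks such as the product-specification rules can be added as further rules without changing the validator.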

What are the processes of developing the validation system? As with any other software, the processes mainly include planning, implementation, testing and documenting, deployment, and maintenance, regardless of which development model is used.

When should data validation be used? Data validation is used before datasets are saved in a database. It should also be repeated whenever the dataset changes, the product specification changes, or the real world changes.

METHODOLOGY AND STRUCTURE

- The goal of the project is to develop a GML validation system. The system has three specific goals: requirements determination, approach, and result reporting, and all of the literature research aims at these goals. The method is first to find all relevant resources, and then to analyze them and choose the most suitable solution.

- The figure shows the main contribution each subject makes to the final goal.

METHODOLOGY

[Figure: contribution of each subject to the final goal. Geodata Modeling: road network product specification, simple features, GML, inconsistency problem. Geodata Quality: data quality, relevant ISO standards. Related Work: SOSI-Control, ESRI ArcGIS, GeoTools. Goals: requirements determination, approach, results handling.]

GEOGRAPHIC DATA MODELING

- The real world is complicated. A data model is a set of constructs for describing and representing parts of the real world in a digital computer system; what the computer holds is a simplified version of the real world. Going from reality to physical storage requires two modeling processes: conceptual modeling and logical (operational) modeling.
- Conceptual modeling extracts the relevant elements from part of the real world, structures them with a conceptual modeling language (UML), and explains the definitions and constraints in a document.
- Logical modeling uses certain rules to realize and express the content of the conceptual model in a form the computer can understand.

NORWEGIAN ROAD NETWORK PRODUCT SPECIFICATION

- The road network product specification is the conceptual model for the road network domain in Norway. It includes a UML model and descriptions of the object definitions, constraints, value domains, etc.
- The product specification is assumed to cover and represent the real-world road network phenomenon perfectly, which means that the real world and the product specification are consistent.
- Some constraints and content defined in the product specification cannot be described by the logical model (the GML schema).
- The product specification is therefore an important resource for defining GML validation requirements.

SIMPLE FEATURES

- Simple features define a geometry object model to describe the geospatial elements of the real world.

- Simple features also define a series of operations on the geometry classes.

GML

- The Geography Markup Language (GML) is the XML grammar defined by the Open Geospatial Consortium (OGC) to express geographical features. GML serves as a modeling language for geographic systems as well as an open interchange format for geographic transactions on the Internet.
- GML uses Simple Features geometries.
- Structure: application schemas are built on the core schemas, and GML instances conform to an application schema.
- The 28 standard core schemas define the capabilities of GML and are the necessary framework for creating application schemas. An application schema serves a specific use.
- Primitives: feature, geometry, coordinate reference system, time, topology, dynamic feature, coverage, unit of measure, etc.

INCONSISTENCY PROBLEM

DATA QUALITY ---- ISO STANDARDS

RELATED WORK ---- ESRI ARCGIS

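The example ESRI rule on this slide, "Two lines must not overlap", reduces to a segment-level test. Two segments overlap when they are collinear and share a stretch of positive length. Touching at a single point or crossing does not count. The sketch below is a plain-Java illustration written for this text, not ESRI's or JTS's implementation. `LineOverlapCheck` is a hypothetical name. The sketch handles 2D coordinates only and uses a fixed absolute tolerance that would need tuning to the coordinate units of real data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java stand-in for the "lines must not overlap" rule on 2D polylines.
final class LineOverlapCheck {
    // Absolute tolerance; real data would need a value suited to its coordinate units.
    private static final double EPS = 1e-9;

    // Returns "i-j" for every pair of lines (by list index) that share a stretch of segment.
    static List<String> overlappingPairs(List<double[][]> lines) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            for (int j = i + 1; j < lines.size(); j++) {
                if (linesOverlap(lines.get(i), lines.get(j))) {
                    pairs.add(i + "-" + j);
                }
            }
        }
        return pairs;
    }

    // Brute force: compares every segment of a with every segment of b.
    static boolean linesOverlap(double[][] a, double[][] b) {
        for (int s = 0; s + 1 < a.length; s++) {
            for (int t = 0; t + 1 < b.length; t++) {
                if (segmentsOverlap(a[s], a[s + 1], b[t], b[t + 1])) {
                    return true;
                }
            }
        }
        return false;
    }

    // Collinear segments whose shared part has positive length overlap;
    // touching at a single point or crossing does not count.
    static boolean segmentsOverlap(double[] p1, double[] p2, double[] q1, double[] q2) {
        if (Math.abs(cross(p1, p2, q1)) > EPS || Math.abs(cross(p1, p2, q2)) > EPS) {
            return false; // q1 or q2 is off the line through p1 and p2
        }
        double dx = p2[0] - p1[0];
        double dy = p2[1] - p1[1];
        double len2 = dx * dx + dy * dy;
        if (len2 < EPS) {
            return false; // degenerate (zero-length) segment
        }
        // Position of q1 and q2 along p1->p2: 0 at p1, 1 at p2.
        double t1 = ((q1[0] - p1[0]) * dx + (q1[1] - p1[1]) * dy) / len2;
        double t2 = ((q2[0] - p1[0]) * dx + (q2[1] - p1[1]) * dy) / len2;
        double lo = Math.max(0.0, Math.min(t1, t2));
        double hi = Math.min(1.0, Math.max(t1, t2));
        return hi - lo > EPS;
    }

    // z-component of (b - a) x (c - a); zero when a, b and c are collinear.
    private static double cross(double[] a, double[] b, double[] c) {
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]);
    }
}
```

The pairwise loop is quadratic in the number of segments. For a national road dataset, candidate pairs would first be narrowed with a spatial index, such as the `STRtree` that JTS provides.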
- ESRI ArcGIS is an integrated collection of GIS software products. It provides a standards-based platform for spatial analysis, data management, and mapping.
- The topology rules defined by ESRI are very useful for this thesis.
- They cover many topology rules commonly used across applications, as well as some rules that suit only certain applications, so the set is fairly complete.
- For instance, "Two lines must not overlap" requires that lines do not overlap with lines in the same feature class. This rule is used where line segments should not be duplicated.

RELATED WORK ---- SOSI-CONTROL

- SOSI-Control is the data quality control software currently used in Norway for the SOSI data format. It checks SOSI datasets against certain rules from the product specification.
- It is a quality control system similar to the one in this thesis. It therefore offers experience of how GML validation should work and how to report validation results. It is also a shortcut to discovering the rules that validate the consistency between a dataset and its product specification.
- The GML validation goes beyond SOSI-Control because it conforms to the ISO standards and adds more complete topology checking.

RELATED WORK ---- GEOTOOLS

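This slide describes a pre-processing step that turns a GML dataset into separate geometry objects before JTS can compare them. That step might look roughly like the following sketch, which uses only the JDK's DOM parser. `GmlGeometryExtractor` is a hypothetical name. The sketch assumes GML 3.2 (namespace `http://www.opengis.net/gml/3.2`), with line geometries encoded as `gml:LineString` elements that carry a `gml:posList`. In a real implementation, the resulting coordinate arrays would then be converted into JTS geometries.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical pre-processing sketch: assumes GML 3.2 LineStrings with a gml:posList.
final class GmlGeometryExtractor {
    static final String GML_NS = "http://www.opengis.net/gml/3.2";

    // Maps each gml:LineString's gml:id to its vertices ({x, y} or {x, y, z} per vertex).
    static Map<String, double[][]> extractLineStrings(String gml) {
        Map<String, double[][]> result = new LinkedHashMap<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match elements by namespace URI
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(gml)));
            NodeList lines = doc.getElementsByTagNameNS(GML_NS, "LineString");
            for (int i = 0; i < lines.getLength(); i++) {
                Element line = (Element) lines.item(i);
                Element posList = (Element) line.getElementsByTagNameNS(GML_NS, "posList").item(0);
                if (posList == null) {
                    continue; // other encodings (e.g. gml:pos per vertex) are not handled here
                }
                String dimAttr = posList.getAttribute("srsDimension");
                int dim = dimAttr.isEmpty() ? 2 : Integer.parseInt(dimAttr);
                String[] tokens = posList.getTextContent().trim().split("\\s+");
                if (tokens.length % dim != 0) {
                    throw new IllegalArgumentException("posList length is not a multiple of srsDimension");
                }
                double[][] coords = new double[tokens.length / dim][dim];
                for (int k = 0; k < tokens.length; k++) {
                    coords[k / dim][k % dim] = Double.parseDouble(tokens[k]);
                }
                result.put(line.getAttributeNS(GML_NS, "id"), coords);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read GML: " + e.getMessage(), e);
        }
        return result;
    }
}
```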
- GeoTools is free software (LGPL), a GIS toolkit for developing standards-compliant (OGC) solutions.
- The JTS Topology Suite is a toolkit containing a number of classes for topology checking. It conforms to the Simple Features geometry classes and spatial relations. For example, geometry1.overlaps(geometry2) returns the Boolean value true or false.
- JTS spatial predicates such as overlaps compare two geometries at a time. To build GML validation on JTS, the dataset therefore needs a pre-processing step that extracts the information from the dataset and stores it with the geometry object as the unit.

RESULT ---- VALIDATION REQUIREMENTS

- Besides the XML validation requirements, GML validation needs some additional requirements.
- Not all data quality elements defined in the ISO standards can be validated, but the logical consistency element can.
- All topology rules should be included.
- Requirements should not be redundant.
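The first two requirements in the list that follows, well-formedness and validity against the schema, can be checked with the XML APIs that ship with the JDK. The sketch below is illustrative, and `XmlChecks` is a hypothetical name. Its test uses a toy schema, whereas a real GML application schema imports the OGC GML core schemas, which then have to be resolvable (for example, from local copies).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for the two XML-level requirements, using only JDK APIs.
final class XmlChecks {
    // Well-formedness: returns null if the text parses as XML, else the parser's message.
    static String wellFormednessError(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors without printing them
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Schema validity (W3C XML Schema): returns null if valid, else the first error.
    // A broken schema is also reported through the returned message.
    static String schemaError(String xml, String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```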

- The GML file is well-formed (XML validation).
- The GML file is valid against its schema (XML validation).
- The GML file conforms to its product specification:
  1. All coordinates in the GML file must lie within the defined boundary.
  2. The segments of one road must be continuous and consistent.
  3. The serial number of each element must be unique.
  4. 2D and 3D coordinates must not be mixed in one feature.
  5. All blind nodes must be found, i.e. end points of a road that do not connect to another road.
- Topological consistency rules.

RESULT ---- APPROACH
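Rules 1, 3 and 4 of the product-specification requirements listed above need no topology library. They fall under what the approach assigns to plain Java programming: realizing functions that are beyond JTS. The following is a minimal sketch that assumes features have already been extracted into a hypothetical `Feature` class, holding an identifier plus its vertices. The bounding-box limits are taken as parameters, since the defined boundary values are not given here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory feature: an identifier plus its vertices, each {x, y} or {x, y, z}.
final class Feature {
    final String id;
    final double[][] coords;

    Feature(String id, double[][] coords) {
        this.id = id;
        this.coords = coords;
    }
}

// Hypothetical checks for product-specification rules 1, 3 and 4.
final class ProductSpecRules {
    // Rule 1: every coordinate lies inside the box [minX, maxX] x [minY, maxY].
    static List<String> checkBounds(List<Feature> features, double minX, double minY, double maxX, double maxY) {
        List<String> errors = new ArrayList<>();
        for (Feature f : features) {
            for (double[] c : f.coords) {
                if (c[0] < minX || c[0] > maxX || c[1] < minY || c[1] > maxY) {
                    errors.add("out of bounds: " + f.id);
                    break; // report each feature once
                }
            }
        }
        return errors;
    }

    // Rule 3: identifiers (serial numbers) must be unique across the dataset.
    static List<String> checkUniqueIds(List<Feature> features) {
        List<String> errors = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (Feature f : features) {
            if (!seen.add(f.id)) {
                errors.add("duplicate id: " + f.id);
            }
        }
        return errors;
    }

    // Rule 4: all vertices of one feature have the same dimension (no 2D/3D mixing).
    static List<String> checkConsistentDimension(List<Feature> features) {
        List<String> errors = new ArrayList<>();
        for (Feature f : features) {
            for (double[] c : f.coords) {
                if (c.length != f.coords[0].length) {
                    errors.add("mixed 2D/3D: " + f.id);
                    break;
                }
            }
        }
        return errors;
    }
}
```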

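Rules 2 and 5 concern how road geometries connect. The hypothetical `RoadConnectivityRules` sketch below checks them in plain Java under these assumptions:

- Segment continuity uses a small absolute tolerance.
- Blind-node detection compares end points exactly.
- The network is noded, so roads meet only at their end points.

A road end that touches the middle of another road would therefore be reported as blind. That case is where a JTS predicate such as `touches` could help.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical connectivity checks; every road or segment is an array of {x, y} vertices.
final class RoadConnectivityRules {
    private static final double EPS = 1e-9;

    // Rule 2: each segment of one road must start where the previous segment ends.
    // Returns the indices of segments that do not connect to their predecessor.
    static List<Integer> discontinuities(List<double[][]> segmentsOfOneRoad) {
        List<Integer> gaps = new ArrayList<>();
        for (int i = 1; i < segmentsOfOneRoad.size(); i++) {
            double[][] previous = segmentsOfOneRoad.get(i - 1);
            double[] end = previous[previous.length - 1];
            double[] start = segmentsOfOneRoad.get(i)[0];
            if (Math.abs(end[0] - start[0]) > EPS || Math.abs(end[1] - start[1]) > EPS) {
                gaps.add(i);
            }
        }
        return gaps;
    }

    // Rule 5: a blind node is a road end point shared with no other road end point.
    // End points are compared exactly; roads are assumed to meet only at end points.
    static List<String> blindNodes(List<double[][]> roads) {
        Map<String, Integer> endpointCount = new LinkedHashMap<>(); // keeps first-seen order
        for (double[][] road : roads) {
            endpointCount.merge(key(road[0]), 1, Integer::sum);
            endpointCount.merge(key(road[road.length - 1]), 1, Integer::sum);
        }
        List<String> blind = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : endpointCount.entrySet()) {
            if (entry.getValue() == 1) {
                blind.add(entry.getKey());
            }
        }
        return blind;
    }

    private static String key(double[] p) {
        return "(" + p[0] + ", " + p[1] + ")";
    }
}
```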
- The approach consists of three parts: XML validation, the JTS Topology Suite, and Java programming.
- XML validation checks not only that the GML file is well-formed but also other properties, such as validity against its schema.
- Topology validation is built on the JTS Topology Suite.
- Java programming first provides compatibility with JTS, second implements the functions that are beyond JTS, and third does the pre-processing needed before using JTS.

RESULT ---- VALIDATION RESULT HANDLING

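The text report described on this slide can be sketched as grouping the found errors by type, counting them, and listing their locations. `FoundError` and `TextReport` are hypothetical names. The ISO-conformant metadata report is not covered by this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical error record: the kind of error and where it occurred.
final class FoundError {
    final String type;
    final String location;

    FoundError(String type, String location) {
        this.type = type;
        this.location = location;
    }
}

// Hypothetical text report: total count, then per error type the count and locations.
final class TextReport {
    static String build(List<FoundError> errors) {
        Map<String, List<String>> locationsByType = new TreeMap<>(); // types sorted by name
        for (FoundError e : errors) {
            locationsByType.computeIfAbsent(e.type, k -> new ArrayList<>()).add(e.location);
        }
        StringBuilder report = new StringBuilder();
        report.append("Total errors: ").append(errors.size()).append('\n');
        for (Map.Entry<String, List<String>> entry : locationsByType.entrySet()) {
            report.append(entry.getKey()).append(": ").append(entry.getValue().size())
                  .append(" at ").append(String.join(", ", entry.getValue())).append('\n');
        }
        return report.toString();
    }
}
```

Suppose the input holds two "duplicate id" errors at r1 and r4 and one "mixed 2D/3D" error at r2. The report then starts with "Total errors: 3", followed by one line per error type.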
- The validation result can be handled in two ways.
- One is a common text report containing statistics on what kinds of errors were found, how many there are, and where they are.
- The other is to report the result as metadata attached to the dataset. This report format should conform to the ISO standards.

CONCLUSION AND FUTURE WORK

- The thesis builds a framework for GML validation based on the Norwegian standard. The framework covers determining validation requirements, realizing the approach, and handling results. The GML validation checks a GML file against the GML schema and extra rules. A validated GML file is consistent with the corresponding product specification and follows certain topology rules. The choices made while building the framework are discussed along the way.
- Future work could include:
  1. Developing GML validation software based on the Norwegian standard for all domains;
  2. Developing the GML validation system based on other standards;
  3. Developing validation systems for other data formats;
  4. A correction system that runs after the GML validation process.

THANK YOU! QUESTIONS?