Research Collection

Doctoral Thesis

Updating of cartographic road databases by image analysis

Author(s): Zhang, Chunsun

Publication Date: 2003

Permanent Link: https://doi.org/10.3929/ethz-a-004660753

Rights / License: In Copyright - Non-Commercial Use Permitted


Updating of Cartographic Road Databases by Image Analysis

Dr. sc. techn. Chunsun Zhang

Zürich, 2003

This publication is an edited version of:

Diss. ETH No. 14934

Updating of Cartographic Road Databases by Image Analysis

A dissertation submitted to the
Swiss Federal Institute of Technology Zurich
for the degree of Doctor of Technical Sciences

presented by
Chunsun Zhang
M.Sc., Liaoning Technical University

born 30 March 1968
citizen of China

accepted on the recommendation of
Prof. Dr. Armin Grün, examiner
Dr. Emmanuel Baltsavias, co-examiner
Prof. Dr. Christian Heipke, co-examiner

November 2002

Updating of Cartographic Road Databases by Image Analysis
Chunsun Zhang

Copyright © 2003, Chunsun Zhang
All rights reserved

Published by:
Institute of Geodesy and Photogrammetry
Swiss Federal Institute of Technology (ETH)
CH-8093 Zürich

ISBN 3-906467-41-4

FOREWORD

In recent years, the holdings of digital topographic data worldwide have multiplied dramatically. Alongside the demand for efficient initial data acquisition, the problem of updating quickly arose. In cooperation with the Federal Office of Topography, Bern, we took up a topic of high practical as well as scientific relevance: the improvement and updating of nationwide road networks. The road data held at the Federal Office were obtained by digitising the 1:25,000 national map. As classical cartographic base data, they are of limited metric accuracy, exist only in two dimensions, and are, on the whole, not fully up to date. Within the research project ATOMI (Automated reconstruction of Topographic Objects from aerial images using vectorized Map Information), we jointly set ourselves the task of extracting these existing road data, as far as possible fully automatically, from colour aerial images at a scale of 1:16,000.

The author of this work, Chunsun Zhang, has, in painstaking detail, developed, implemented and tested an algorithmic framework that makes this task of 3-D road updating feasible for the first time, even under practical constraints. Through the consistent exploitation of all available a priori road data and a variety of image information (edges, homogeneous image regions, shadows, DSM/DTM, colour, road marks, etc.), the author succeeds in producing results of a quality not seen before. The road network is generated following the concept of a combined bottom-up/top-down strategy: starting from extracted elementary geometric image primitives, higher-level geometric structures are derived successively, with constant use of model knowledge about the object “road”, until finally the complete 3-D road network is obtained.
Through numerous tests, the author demonstrates not only that he has found an innovative scientific solution, but also that it withstands the strict conditions of practice. With this work, Chunsun Zhang has achieved a breakthrough in this sector of automatic image analysis. His work will therefore serve as a reference for years to come. Nevertheless, some interesting problem areas remain, for example robust road extraction in inner-city areas.

It is my pleasure to congratulate Chunsun Zhang on his outstanding work. I wish him all the best for his further career and a continuation of the success he has achieved with this study. I wish the readers much pleasure in reading this dissertation.

Zürich, February 2003
Prof. Dr. Armin Grün

ABSTRACT

This thesis addresses the improvement and updating of cartographic road databases by image analysis. Research on this topic is motivated mainly by the demand for digital landscape models that conform to reality and by the need for efficient data acquisition and updating in geographic information systems (GIS). Aerial imagery provides the perfect medium to capture geospatial information; accordingly, object extraction from aerial images is a fundamental photogrammetric operation. Despite substantial work in the photogrammetry and computer vision communities during the last two decades, fully automatic methods are still far out of reach. Thus, semi-automatic methods have been developed; in these, the optimization of the interaction between the operator and the computer is a crucial task.

A recent tendency that aims at easing automation and improving the results is the integration of existing geodatabases into image processing. The effect of this integration is twofold: the existing information provides a rough model of the scene, which helps the automation process, while the old road database is revised and updated with the latest information from the aerial images.

In this dissertation, a system for automatic extraction of 3-D road networks from stereo aerial images is presented, which integrates knowledge processing of colour image data and existing digital geodatabases. A great deal of effort has been made to increase the success rate and the reliability of the extraction results. This is achieved by extracting high-quality features and cues, which are then carefully combined. The main features and cues are 3-D straight edges, road regions, shadows, road marks and zebra crossings, and DSM blobs. The system uses and fuses multiple cues about road existence, together with existing information sources, to generate and group road primitives.
This fusion provides not only complementary but also redundant information about road existence, in order to account for errors and incomplete results of the low-level image analysis. The knowledge from the existing geodatabases and from road design rules includes information about each individual road as well as about the topology of the whole road network. It is employed to restrict the search space, to treat each road subclass differently, to check the plausibility of multiple possible hypotheses, and to derive reliability criteria.

The presented system essentially consists of the following main components: feature and cue extraction, road primitive generation and grouping, road junction and road network construction, and system performance evaluation. Each of them is important and possesses particular features, which are fully elaborated in different parts of the thesis.

Edges are extracted in the stereo images and are then aggregated and processed to generate straight edge segments. Each edge segment is attributed with geometric and photometric properties. In order to transform the 2-D edge segments to 3-D object space, an efficient and robust straight edge segment matching method has been developed. The method exploits the rich attributes of edge segments as well as the edge structure information to achieve consistent results.

The colour images are segmented by a clustering algorithm to find road regions. The original RGB image data are transformed into different colour spaces to enhance features. In addition, the principal component transformation is applied to analyse the original image data and to select the appropriate image bands for clustering. The DSM data are also employed to support road extraction: DSM blobs are detected directly from the DSM data by a Multiple Height Bin method, in which the DSM heights are grouped into consecutive bins of a certain size. Road marks and zebra crossings are usually present on main roads and are good indications of road existence.
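The Multiple Height Bin idea mentioned above can be illustrated with a small sketch: the normalised DSM (DSM minus DTM) is sliced into consecutive height bins, and connected regions that are large enough within a bin are kept as above-ground blobs. This is only an illustration, not the thesis's implementation; the bin size, minimum height and minimum region size are invented parameters, and regions are found with a plain 4-connected flood fill.

```python
import numpy as np
from collections import deque

def detect_blobs(dsm, dtm, bin_size=2.0, min_height=2.5, min_cells=4):
    """Sketch of a Multiple-Height-Bin style blob detector.

    The normalised DSM (DSM - DTM) is partitioned into consecutive
    height bins; connected regions of at least `min_cells` cells within
    a bin are marked as above-ground blobs.  All thresholds are
    illustrative, not the thesis's actual parameters.
    """
    ndsm = dsm - dtm
    rows, cols = ndsm.shape
    blobs = np.zeros((rows, cols), dtype=bool)
    lo, top = min_height, float(ndsm.max())
    while lo <= top:
        hi = lo + bin_size
        in_bin = (ndsm >= lo) & (ndsm < hi)   # cells falling into this bin
        seen = np.zeros((rows, cols), dtype=bool)
        for r in range(rows):
            for c in range(cols):
                if in_bin[r, c] and not seen[r, c]:
                    # 4-connected flood fill collecting one region
                    region, queue = [], deque([(r, c)])
                    seen[r, c] = True
                    while queue:
                        y, x = queue.popleft()
                        region.append((y, x))
                        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and in_bin[ny, nx] and not seen[ny, nx]):
                                seen[ny, nx] = True
                                queue.append((ny, nx))
                    if len(region) >= min_cells:   # ignore isolated spikes
                        for y, x in region:
                            blobs[y, x] = True
        lo = hi
    return blobs
```

Regions smaller than `min_cells` within their height bin are discarded as noise spikes; realistic parameter values would depend on the DSM resolution and the object sizes of interest.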
The road marks are treated as linear objects and extracted using an image line model, while zebra crossings are extracted as clusters with distinct colour and a certain size.

All the information derived from the existing geodatabases, the images and the DSM data is used to extract roads. The main features of road primitive generation and grouping are direct modelling in 3-D, extensive use of multiple and redundant cues, and the combination of 2-D and 3-D processing. The first step in road extraction is the exclusion of irrelevant features. The road primitives are generated from edges or road marks in object space. Several techniques have been developed to infer missing 3-D road sides. Gaps caused by occlusions and shadows are bridged using the information of the existing road vectors. In each step, the extracted cues are employed to ensure reliable generation of primitives and rejection of false hypotheses. The primitives are then connected to extract roads by maximizing a merit function. The function combines various measures for the primitives and gaps as well as the shape information of the existing road vectors. Thus, the road segments are selected and connected with gaps bridged, while false hypotheses are rejected. Based on the extracted roads, the road junctions are generated.

Highways and main roads are also extracted using the detected road marks and zebra crossings. In rural areas, the roads extracted using road marks are also used to verify the extraction results obtained using edges. In complex areas, such as cities or city centres, the road sides are often heavily occluded, and sometimes it is impossible to identify them; however, some of these roads are successfully extracted by exploiting road marks.

Finally, an analysis of the road reconstruction results is carried out. In order to test the performance of the developed system, various datasets from different landscapes in Switzerland and Belgium are used.
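To make the merit-maximising linking described above concrete, here is a deliberately simplified sketch: road-segment candidates are intervals along the existing road centreline, each carrying a single scalar merit, and a chain of non-overlapping candidates is selected that maximises the summed merits minus a penalty proportional to the bridged gap length. The scalar merits and the gap cost are invented for the example; the thesis's actual merit function combines several geometric and photometric measures together with the shape of the existing road vectors.

```python
def link_segments(candidates, road_length, gap_cost=0.5):
    """Select a chain of non-overlapping segment candidates along a road
    axis, maximising  sum(merits) - gap_cost * total bridged length.

    `candidates` is a list of (start, end, merit) tuples with positions
    measured along the existing road centreline (purely illustrative).
    """
    cands = sorted(candidates)          # order by start position
    best = {}                           # memo: index -> (value, chain)

    def value_ending_at(i):
        if i in best:
            return best[i]
        s, _e, m = cands[i]
        # option 1: start a new chain, paying for the gap from the road start
        v, chain = m - gap_cost * s, [i]
        # option 2: append candidate i to the best legal predecessor chain
        for j in range(i):
            _ps, pe, _pm = cands[j]
            if pe <= s:                 # j ends before i starts
                pv, pchain = value_ending_at(j)
                vj = pv + m - gap_cost * (s - pe)
                if vj > v:
                    v, chain = vj, pchain + [i]
        best[i] = (v, chain)
        return best[i]

    overall, picked = 0.0, []           # the empty chain scores zero
    for i in range(len(cands)):
        v, chain = value_ending_at(i)
        v -= gap_cost * (road_length - cands[i][1])   # gap to the road end
        if v > overall:
            overall, picked = v, chain
    return overall, [cands[i] for i in picked]
```

Because an empty chain scores zero, roads whose candidates cannot outweigh their gap penalties are rejected rather than forced through, which mirrors the rejection of false hypotheses described above.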
The experiments and the evaluation against precise reference data show that more than 93% of the roads in rural areas are correctly reconstructed by the system, and that the achieved accuracy of the road centerlines is better than 1 m both in planimetry and in height. The developed system can serve as an automatic tool for extracting roads in rural areas for digital road database production.

SUMMARY

This thesis deals with the improvement and updating of cartographic road databases. The aim of the research in this field is the generation of realistic digital landscape models together with efficient data acquisition and updating within geographic information systems (GIS). Aerial images are a suitable medium for capturing spatial data; object extraction from aerial images is therefore one of the fundamental tasks of photogrammetry. Despite extensive research in photogrammetry and computer vision over the last two decades, fully automatic methods remain out of reach for the near future. Semi-automatic methods have therefore been developed; a crucial task here is the optimisation of the interaction between operator and computer. Recent developments aim at easing automation and improving the results through the integration of existing geodatabases into image processing. This integration has a twofold effect: the existing information provides a rough model of the scene, which supports automated processing, while the existing data holdings are revised and updated with the latest information from aerial images.

In this dissertation, a system for the automatic extraction of three-dimensional road networks from colour aerial images with the aid of existing digital geodatabases is presented. Considerable effort was invested in increasing the success rate and the reliability of the extraction results. This is achieved through the use of expressive features and cues and their careful combination. The main features are straight 3-D edges, road regions, shadows, road marks, zebra crossings, and blobs in the digital surface model.
The system uses and fuses several indicators of road existence as well as existing data sources in order to generate and group road primitives. This fusion provides not only complementary but also redundant information about existing roads, in order to uncover errors and incomplete results of the image analysis. The existing geodatabases and the knowledge of road design principles contain information about each individual road as well as about the topology of the whole road network. This knowledge is used to restrict search spaces, to treat individual road classes differently, to check the plausibility of different hypotheses, and to derive reliability measures.

The presented system essentially consists of the following components: extraction of features and cues, generation and grouping of road primitives, linking of roads, construction of the road network, and analysis of the results and the system performance. Each component is important and has particular characteristics, which are treated and discussed in detail in the corresponding sections of the thesis.

Linear elements are extracted from the stereo images, aggregated, and used to derive straight edges. Each edge element is attributed with geometric and photometric properties. To transform the two-dimensional edge elements into three-dimensional object space, an efficient and robust method for edge matching was developed. The method exploits both the attributes of the edge elements and their structure in order to achieve consistent results.

Road regions are sought in the colour images by applying a clustering algorithm. The original RGB image data are transformed into different colour spaces to enhance features. In addition, the principal component transformation is applied to analyse the original image data and to select suitable colour channels for clustering. DSM data are likewise used in the road extraction: the blobs are detected with a “Multiple Height Bin” method, which can be applied directly to the DSM.

Road marks and zebra crossings, which are usually present on main roads, are well suited as indicators of such roads. The road marks are treated as linear objects and extracted by means of line detection in image space, while zebra crossings are recognised as groups of elements of a certain colour and size. All the information available in the form of the geodatabase, the images and the DSM data is used for road extraction.
The most important characteristics of the generation and grouping of road primitives are: direct modelling in object space, extensive use of multiple and redundant cues, and the combination of 2-D and 3-D processing. The first step of the road extraction is to exclude irrelevant features. The road primitives are generated from edges or road marks in object space. Several techniques were developed to add missing 3-D road sides. Gaps caused by occlusions or shadows are completed using vectors from the existing dataset. In each step, the extracted cues are used to generate reliable road primitives and to reject erroneous hypotheses. The primitives are linked into roads by maximising a merit function. This function combines several measures of the primitives and gaps as well as the shape of the already existing road vector data. In this way, road segments are selected and gaps are closed, while errors are eliminated. Starting from the extracted roads, road junctions are generated.

Motorways and main roads are extracted using the detected road marks and zebra crossings. In rural areas, the road marks are also used to verify the results of the extraction of the road boundaries. Under more difficult conditions, as they occur in settlements and inner-city areas, the road boundaries are often occluded, and in some cases their reconstruction is no longer possible. Nevertheless, some of these roads can be reconstructed by means of the road marks. Finally, an analysis of the results of the road extraction was carried out.
To assess the performance of the developed system, several datasets with different landscape structures from Switzerland and Belgium were used. The comparison of the processing results for the test datasets with accurate reference data showed that in rural areas more than 93% of the roads were correctly reconstructed by the system, and that the achieved accuracy of the road centrelines is better than 1 m both in planimetry and in height. The developed system can successfully be employed for the automatic extraction of roads in rural areas for the production of road databases.
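The comparison against reference data summarised above is commonly performed with buffer-based quality measures (completeness, correctness, and the RMS error of the matched parts). The sketch below uses this standard buffer scheme with densely sampled centreline points and a 1 m tolerance; it illustrates the kind of evaluation reported and is not necessarily the exact procedure used in the thesis.

```python
import math

def _min_dist(p, polyline):
    """Shortest distance from point p to a polyline [(x, y), ...]."""
    best = float("inf")
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        dx, dy = x2 - x1, y2 - y1
        length2 = dx * dx + dy * dy
        t = 0.0 if length2 == 0 else max(
            0.0, min(1.0, ((p[0] - x1) * dx + (p[1] - y1) * dy) / length2))
        best = min(best, math.hypot(p[0] - (x1 + t * dx), p[1] - (y1 + t * dy)))
    return best

def buffer_scores(extracted_pts, reference_pts,
                  extracted_line, reference_line, buf=1.0):
    """Buffer-method completeness / correctness / RMS (illustrative).

    completeness: share of reference points within `buf` of the extraction;
    correctness:  share of extracted points within `buf` of the reference;
    rms:          RMS distance of the matched extracted points.
    """
    matched_ref = [p for p in reference_pts
                   if _min_dist(p, extracted_line) <= buf]
    dists = [_min_dist(p, reference_line) for p in extracted_pts]
    matched = [d for d in dists if d <= buf]
    completeness = len(matched_ref) / len(reference_pts)
    correctness = len(matched) / len(extracted_pts)
    rms = (math.sqrt(sum(d * d for d in matched) / len(matched))
           if matched else None)
    return completeness, correctness, rms
```

With an extraction covering only half of a straight reference road at a constant 0.5 m offset, this scheme reports partial completeness, full correctness, and an RMS of 0.5 m for the matched part.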

Table of Contents

1. Introduction ...... 1

1.1 Motivation ...... 1
1.2 Project ATOMI and Research Aims ...... 2
1.3 Thesis Organization ...... 3

2. Review of Previous Work on Road Extraction ...... 5

2.1 Overview ...... 5
2.2 Automatic Methods ...... 8
2.3 Semi-automatic Methods ...... 15
2.4 Automatic Methods using Maps ...... 18
2.5 Summary of Existing Methods ...... 21

3. Data Constraints and General Strategy...... 23

3.1 Input Data Description ...... 23
3.2 3-D Road Extraction Strategy ...... 27

4. Feature and Cue Extraction ...... 33

4.1 3-D Straight Edge Generation ...... 33
4.1.1 Review of Related Work ...... 33
4.1.1.1 Edge Extraction ...... 34
4.1.1.2 Edge Segment Matching ...... 35
4.1.2 Overview of Strategy ...... 39
4.1.3 Edge Extraction ...... 40
4.1.4 Straight Edge Matching ...... 41
4.1.4.1 Construction of Match Pool and Computation of Match Score ...... 41
4.1.4.2 Structural Matching with Probability Relaxation ...... 46
4.1.5 3-D Straight Edge Generation ...... 49
4.1.5.1 3-D Edge Computation ...... 49
4.1.5.2 3-D Straight Edge Fitting ...... 52
4.1.6 Results and Discussion ...... 53
4.2 Image Segmentation for Road Region Separation ...... 56

4.2.1 The Clustering Algorithm ...... 57
4.2.2 Selection of Image Data ...... 58
4.2.3 Results and Discussion ...... 62
4.3 DSM and DTM Analysis ...... 64
4.4 Road Mark and Zebra Crossing Detection ...... 68
4.4.1 Review of Related Work ...... 69
4.4.2 Image Line Model and Line Extraction ...... 71
4.4.3 Zebra Crossing Detection ...... 75
4.5 Summary ...... 78

5. 3-D Road Reconstruction...... 79

5.1 Finding 3-D Parallel Road Sides ...... 79
5.1.1 Removal of Irrelevant Edges ...... 80
5.1.2 Forming Parallel Edges ...... 82
5.1.3 Evaluation of the Area between Parallel Edges ...... 83
5.2 Evaluation of Missing Road Sides and Gap Bridging ...... 85
5.2.1 Classification of Missing Road Sides ...... 86
5.2.2 Reconstruction of Missing Road Sides ...... 92
5.2.2.1 Reconstruction of Missing Road Sides by Extension of Parallel Edges ...... 92
5.2.2.2 Hypothesising Missing Road Sides ...... 93
5.2.3 Evaluation of Road Segment Candidates ...... 94
5.2.4 Gap Definition and Evaluation ...... 97
5.2.4.1 Gap Definition ...... 97
5.2.4.2 Gap Evaluation ...... 99
5.3 3-D Road Segment Linking for 3-D Road Reconstruction ...... 101
5.3.1 Align Road Segment Candidates ...... 101
5.3.2 Road Segment Linking for Road Reconstruction ...... 102
5.4 3-D Road Reconstruction using Road Marks ...... 104
5.4.1 Evaluation of Road Marks ...... 105
5.4.2 Road Mark Linking for Road Reconstruction ...... 106
5.5 Road Junction and Road Network Generation ...... 107
5.6 Results and Discussion ...... 108
5.6.1 Results of Road Reconstruction using Edges ...... 108
5.6.2 Results of Road Reconstruction using Road Marks ...... 112
5.6.3 Results of Road Junction and Road Network Generation ...... 113

6. Performance Evaluation ...... 119

6.1 Review of Related Work ...... 119
6.2 Internal Quality Evaluation ...... 121
6.3 Scheme for External Evaluation ...... 126
6.4 Experiments and Results ...... 129
6.5 Summary ...... 133

7. Conclusions and Outlook ...... 135

7.1 Summary ...... 135
7.1.1 Summary of Feature and Cue Extraction ...... 136
7.1.2 Summary of 3-D Road Reconstruction ...... 137
7.2 Conclusions ...... 138
7.3 Outlook ...... 142

APPENDIX ...... 145

A VEC25 Road Attributes ...... 145
B Derivation of Knowledge from Existing Geodatabase ...... 147

BIBLIOGRAPHY...... 151

ACKNOWLEDGMENTS

List of Figures

Figure 3-1. Road images and VEC25 roads ...... 25
Figure 3-2. General strategy for road extraction ...... 29
Figure 3-3. Derived information from existing geodatabase ...... 29
Figure 3-4. Algorithms and features used in road extraction system ...... 31
Figure 3-5. User interface for the developed 3-D road extraction system ...... 31

Figure 4-1. Scheme for the generation of 3-D straight edge segments ...... 39
Figure 4-2. The epipolar band defines search space for edge segment ...... 42
Figure 4-3. Search space for edge segment parallel to epipolar line ...... 43
Figure 4-4. Definition of weight function ...... 44
Figure 4-5. Ellipses representing the spread of a, b in edge flanking regions ...... 45
Figure 4-6. Graph representation of the structure of edge segments in image ...... 47
Figure 4-7. Relations between a pair of edge segments ...... 48
Figure 4-8. Pixel correspondence for 3-D computation ...... 50
Figure 4-9. Pixel grouping for 3-D computation ...... 50
Figure 4-10. Definition of X’O’Z’ space ...... 51
Figure 4-11. Plot of the computed 3-D edge chains using the three methods in X’O’Z’ space ...... 52
Figure 4-12. Edge detection and straight edge extraction ...... 53
Figure 4-13. Straight edge segment extraction and matching ...... 54
Figure 4-14. Straight edge segment extraction and matching ...... 55
Figure 4-15. GUI based quality evaluation ...... 55
Figure 4-16. A scene RGB image ...... 59
Figure 4-17. PCA transformed images ...... 61
Figure 4-18. Greenness and shadow images ...... 62
Figure 4-19. Clustering result ...... 63
Figure 4-20. Clustering cannot separate unpaved road from bare soil ...... 63
Figure 4-21. Example of DSM, DTM data showing presence and absence of above-ground objects ...... 65
Figure 4-22. Geometrical interpretation of the MHB techniques ...... 67
Figure 4-23. Detected above-ground objects in Figure 4-24 using MHB method ...... 67
Figure 4-24. Detected above-ground objects using MHB method ...... 68
Figure 4-25. A typical road image with road marks and zebra crossing ...... 68
Figure 4-26. Intensity profile of road marks ...... 71
Figure 4-27. Example for road mark detection through line extraction ...... 75
Figure 4-28. Procedures for zebra crossing detection ...... 76
Figure 4-29. Example for zebra crossing detection ...... 77

Figure 5-1. Definition of VEC25 road error buffer ...... 80
Figure 5-2. Two cases should be avoided ...... 80
Figure 5-3. Effect of edge deduction ...... 81

Figure 5-4. 3-D parallel edge segment ...... 82
Figure 5-5. The found parallel 3-D straight edges ...... 83
Figure 5-6. Road marks verify parallel edges belonging to road ...... 84
Figure 5-7. Examples of the found PRSPs ...... 85
Figure 5-8. Type 1 of missing road sides ...... 86
Figure 5-9. Type 2 of missing road sides ...... 87
Figure 5-10. Type 3 of missing road sides ...... 87
Figure 5-11. Type 4 of missing road sides ...... 88
Figure 5-12. Type 5 of missing road sides ...... 88
Figure 5-13. Type 6 of missing road sides ...... 89
Figure 5-14. Type 7 of missing road sides ...... 89
Figure 5-15. Type 8 of missing road sides ...... 90
Figure 5-16. Type 9 of missing road sides ...... 90
Figure 5-17. Type 10 and Type 11 of missing road sides ...... 91
Figure 5-18. Extension of PRSP using 3-D edges ...... 92
Figure 5-19. Extension of PRSP using 2-D edges ...... 93
Figure 5-20. Generation of the opposite sides of 3-D straight edge ...... 94
Figure 5-21. Criteria to select edge segments to hypothesize missing road sides ...... 94
Figure 5-22. Gap definition ...... 98
Figure 5-23. nDSM confirms shadow pixels belonging to road ...... 100
Figure 5-24. Align PRSPs using distance criterion ...... 101
Figure 5-25. Align PRSPs using VEC25 road ...... 102
Figure 5-26. Trees generated at both sides of a PRSP ...... 104
Figure 5-27. Definition of regions beside road marks ...... 105
Figure 5-28. Classify road marks at road borders ...... 106
Figure 5-29. Quadrilateral generation for evaluation of road mark linking ...... 106
Figure 5-30. Junction generation by road intersection ...... 107
Figure 5-31. Road extraction in rural area using edges in the test sites in Switzerland ...... 109
Figure 5-32. Road extraction in rural area using edges in the test sites in Belgium ...... 110
Figure 5-33. Road extraction using edges in suburban, urban areas in the test sites in Switzerland, and in villages in the test sites in Belgium ...... 111
Figure 5-34. Road extraction using road marks in the test sites in Switzerland ...... 112
Figure 5-35. Road extraction using road marks in Belgium test site ...... 113
Figure 5-36. Results of reconstruction of road junctions in the test sites of Switzerland ...... 114
Figure 5-37. Results of reconstruction of road junctions in the test sites in Belgium ...... 113
Figure 5-38. Problematic cases of road extraction and junction generation ...... 116
Figure 5-39. Reconstructed road networks in the test sites in Switzerland ...... 117
Figure 5-40. Reconstructed road network in the test site in Belgium ...... 118

Figure 6-1. Internal quality assessment through overall quality measures ...... 124
Figure 6-2. Internal quality assessment through gap-based quality measures ...... 125

Figure 6-3. Internal evaluation for roads with road marks ...... 126
Figure 6-4. Computation of the projection of a point onto a straight segment in object space ...... 127
Figure 6-5. Compute matched extraction: case 1 ...... 128
Figure 6-6. Compute matched extraction: case 2 ...... 129
Figure 6-7. Compute matched extraction: case 3 ...... 129
Figure 6-8. Extracted roads and reference data in the test site in Switzerland ...... 131
Figure 6-9. Extracted roads and reference data in the test site of Belgium ...... 132
Figure 6-10. Roads are not extracted in villages in the test site of Switzerland ...... 133
Figure 6-11. Different delineation of highway in map, extracted results and reference data ...... 133

Figure B-1. Processing geodatabase data to derive roads and road attributes ...... 147
Figure B-2. Classify VEC25 roads according to landcover information ...... 148
Figure B-3. Road junction and its attributes ...... 149

List of Tables

Table 2-1. Summary of landcover types and image resolutions in reviewed methods ...... 8
Table 2-2. Summary of automatic road extraction methods ...... 15
Table 2-3. Summary of semi-automatic road extraction methods ...... 18
Table 2-4. Summary of methods using additional sources ...... 21

Table 3-1. Image specifications of ATOMI dataset ...... 23
Table 3-2. Image specifications of Belgium dataset ...... 27

Table 4-1. Evaluation of three methods for 3-D computation ...... 51
Table 4-2. Quantitative evaluation of straight edge matching ...... 56

Table 5-1. Summary of the methods for reconstruction of missing road sides ...... 92
Table 5-2. Criteria of the specific reliability measures for different types of road segment candidates ...... 96

Table 6-1. Test scene descriptions ...... 130
Table 6-2. Quality measures ...... 130

Table A-1. VEC25 road attributes ...... 145

1. INTRODUCTION

1.1 Motivation

In modern map production, a shift has taken place from maps stored in analogue form on paper or film to digital databases containing topographic information. A digital topographic database is an essential part of a GIS. Recently, there has been a need to generate digital landscape models that conform to reality and do not include map generation effects. This allows the integration of additional object classes and information compared to those in traditional topographic maps, as well as the inclusion of the third dimension. In addition, the demand for digital data, especially on buildings and roads, for various applications is increasing, and the requirements for accuracy, completeness and up-to-date status are rising accordingly. To cope with higher product demands, increase productivity and cut cost and time requirements, automation tools should be employed in production. As aerial images are the major source of primary data, automated aerial image analysis can obviously lead to significant benefits.

In the last two decades, numerous approaches and strategies have been developed for the automation of object extraction from aerial images. Although existing approaches that target specific subtasks show various degrees of success, fully automated methods usually fail to provide good-quality results and are still far out of reach. Therefore, semi-automatic methods have been developed and are often used successfully in various projects. In semi-automatic approaches, the optimization of the interaction between operator and computer is a crucial task.

To ease the automation of aerial image analysis, a priori information can be used. This usually includes data from maps, GIS and other geodatabases. In addition to existing data, a priori information can include models, constraints about the objects and the scene, rules, etc.
A clear tendency that aims at easing automation and improving results is the combination of different input data that provide complementary, but also redundant, information and cues about the existence, shape, size, etc. of an object.

The automation of 3-D road reconstruction from aerial imagery by integrating existing geodatabases as well as other data sources is the topic of this investigation. Various existing and emerging applications require up-to-date, accurate and sufficiently attributed road databases, including car navigation, tourism, traffic and fleet management and monitoring, intelligent transportation systems, internet-based map services, location-based services, etc. The fact that vendors of commercial photogrammetric, remote sensing and GI systems do not offer anything regarding the automation of road extraction (not even rudimentary semi-automatic methods) stresses the importance of this research topic. In Europe, interest in automated road extraction is also exemplified by a proposal to start a EuroSDR (previously OEEPE) -initiated large Network of Excellence on acquisition and maintenance of 3-D geospatial information (including roads) within the 6th EU Framework Programme, and by the establishment of a EuroSDR Working Group on road database extraction, refinement and update (http://www.bauv.unibw-muenchen.de/institute/inst10/oeepe). Furthermore, in 2002 twelve organisations from National Mapping Agencies (NMAs), road administrations and private-sector key players of the road data market submitted the HERDS (Harmonised European Road Data Solution) project proposal for EC funding. Also, in the European Territorial Management Information Infrastructure project, roads are mentioned together with elevation and hydrography as the only objects commonly agreed to be important enough to be defined as reference data needed by most applications (see http://www.ec-gis.org/etemii/reports/chapter1.pdf).

1.2 Project ATOMI and Research Aims

The work presented in this thesis has been carried out within the framework of the ATOMI project, a cooperation between the Swiss Federal Office of Topography (L+T) and the Institute of Geodesy and Photogrammetry (IGP), ETH Zurich. ATOMI stands for Automated reconstruction of Topographic Objects from aerial images using vectorized Map Information. Its aim is to use aerial images, DTM data and automated procedures to improve vector data (road centerlines, building outlines) from digitised 1:25,000 topomaps by fitting them to the real landscape (removing generalisation effects), updating them, improving the planimetric accuracy to 1 m and providing height information with 1-2 m accuracy. The topology of the existing dataset, with the exception of error correction, should be maintained. The whole procedure should be implemented as a standalone software package, able to import and export data as used at L+T. It should be quasi-operational, fast and, most importantly, reliable. We do not aim at full automation, but the “correct” results should be really correct, to avoid manually checking the whole dataset. More details about ATOMI can be found in Eidenbenz et al. (2000).

This thesis deals with road reconstruction in the project ATOMI. The task is to develop and evaluate a road extraction system which includes a set of robust and efficient methods to process the raw image data and the existing information. The result of this system will be a road network that should be as complete as possible and fulfil the accuracy requirements of ATOMI. The results will be used to improve the old road map.

Road extraction from aerial images is a difficult task, and the difficulty depends on the complexity of the scenes.
Although the developed road reconstruction strategy can, to a certain extent, also treat roads in complex contexts such as urban areas, the results in such areas are too poor for a practical application in the framework of ATOMI; we therefore concentrate on road reconstruction in open rural areas. Secondly, we only treat the roads contained in the existing road database; that is, the detection of new roads which are not in the existing road database is not included.

1.3 Thesis Organization

This thesis comprises seven main chapters. After this introduction, Chapter 2 provides an overview of existing road extraction approaches. The chapter describes the different existing methods and supplies the reader with many references where more details can be found. It provides the necessary background on road extraction techniques, thus permitting a better understanding of, and comparison with, our developed system.

The input data is introduced in Chapter 3. The image data used and the existing geodatabase, as well as other data sources such as the DTM and DSM, are described. Based on the input data, our strategic choices and the system architecture are presented.

Chapter 4 describes several algorithms for feature and cue extraction. Finding 3-D edges is a crucial component of our system. The generation of 3-D edges is explained in Section 4.1; this includes edge extraction, edge matching and 3-D edge generation. Next, we analyse the color image data to find the road regions. This is realised by an algorithm for image segmentation, detailed in Section 4.2. In Section 4.3, we analyse the height data to separate above-ground objects from ground objects to support road extraction. Road marks and zebra crossings are good indications of the existence of certain roads, such as highways, first-class roads and some roads in urban areas; the methods for road mark and zebra crossing detection are presented in Section 4.4.

In Chapter 5, the procedures for 3-D reconstruction of roads using the extracted features and cues are described. This is achieved by finding road segments from 3-D straight edges or road marks and linking them in sequence. The general scheme is to use knowledge obtained from the existing database and the extracted cues as much as possible, and as early as possible, to exclude irrelevant features and provide reliable primitives, followed by a linking process.
Finally, the road network is reconstructed through the generation of road junctions by intersecting the extracted roads with the aid of the topology information of the existing road database.

In Chapter 6, we present our methods for system performance evaluation. Both internal self-diagnosis and external evaluation of the obtained results are discussed. Several criteria for internal evaluation are defined, which take into account the information of the reconstructed results as well as the primitives that make up the extracted road. The external evaluation against manually measured road centerlines is conducted in a semi-automatic manner. We use several quality measures to assess completeness and geometrical accuracy, and present a method for their computation. The results of the evaluation are reported.

The last part, Chapter 7, draws conclusions about the suitability of the developed road extraction system for the updating of road maps and discusses the advantages, limits and perspectives of such a road network extraction system. Finally, recommendations for further research are given.

2. REVIEW OF PREVIOUS WORK ON ROAD EXTRACTION

On the way towards automatic mapping, spatial data acquisition and updating, the automatic extraction of roads from digital imagery has drawn considerable attention in recent years. A variety of techniques has emerged in the literature. Unfortunately, there is no unifying theoretical background behind all these techniques. The approaches developed differ considerably due to differences in strategies, type and resolution of input images, the primitives employed for road identification, experiment configurations, ways of processing, general assumptions, etc. In this chapter, a large number of relevant publications on road extraction is reviewed.

2.1 Overview

Road extraction approaches are generally classified according to the degree of automation. A typical automatic system for road extraction consists of road finding, road following/tracking/tracing, and road linking. A semi-automatic approach requires interaction between the algorithm and an operator. In contrast to automatic methods, the road initialization is given by a human operator, so a semi-automatic approach does not have to consider the problem of road finding. Furthermore, the given road approximation can often be used to reduce the search space. A trend to employ a priori information for road extraction has been observed recently. This usually includes data from maps, GIS and other geodatabases. In addition to existing data, a priori information can include models, constraints about objects and scenes, and rules. Such approaches are more promising for providing complete and reliable results, because the prior information can support the image interpretation process by restricting the search space, applying realistic constraints, etc.

In the following, a review of approaches for road extraction is given. We review the approaches in three categories: automatic methods, semi-automatic methods and automatic methods using additional sources, presented in Section 2.2, Section 2.3 and Section 2.4 respectively. In each category, the approaches are reviewed chronologically. When several papers about the same road extraction procedure have been published by the same group, the most complete or latest one is selected for the review given in this chapter. The approaches are also summarized in Table 2-2, Table 2-3 and Table 2-4 respectively.

When reviewing previous work one should realize that a successful interpretation depends not only on the strategy and techniques used for road extraction, but also on the type of images to which they are applied. Firstly, the resolution of the images has a great influence on the representation of both roads and other objects.
Many details recognizable in high resolution images become unclear and finally disappear as the image resolution gets lower. In low resolution images, roads appear as lines or narrow, less homogeneous surfaces. The disadvantage of such a representation is that roads can easily be confused with other linear structures in images. Nevertheless, by suppressing small road details, the low resolution road extraction problem is largely reduced to the general problem of line extraction. In high resolution images, the presence of many characteristic road details provides more evidence for the existence of a road. More reasons for accepting or rejecting a given hypothesis can be found from the analysis of these details. On the other hand, the details of roads can change greatly, and the larger number of details increases the complexity of the road detection process. Thus, the details to be used have to be chosen carefully. Besides the selection of road details, the contribution of the different details to verifying the presence of a road also has to be considered.

Secondly, depending on the scene under consideration (e.g. urban or rural), the road network can have a different degree of complexity in e.g. density, shape, etc. Roads in urban or suburban areas have a quite different appearance from roads in forest or open rural areas. The differences in appearance are partly consequences of the different relations between roads and neighbouring objects. In urban and forest areas, knowledge about geometry and radiometry alone is often insufficient to extract roads.

The resolution and scene complexity of the test images used in the reviewed approaches are listed in Table 2-1. Regarding scene complexity, a distinction is made between urban, suburban and rural areas. This information serves as a look-up table for understanding some of the choices made in the approaches reviewed in the following sections.

Authors | Landcover | Ground resolution
Bajcsy and Tavakoli (1976) | Rural | 57 by 79 m
Quam (1978) | Rural | 1-3 m
Nagao and Matsuyama (1980) | Rural, suburban | 0.5 m
Fischler et al. (1981) | Rural | Unknown, low resolution
Zhu and Yeh (1986) | Rural | 3 to 4 m
Canning et al. (1987) | Rural | Unknown, low resolution
McKeown and Denlinger (1988) | Urban, suburban and rural | 1 to 3 m
Ton et al. (1989) | Rural | 30 by 30 m
Matsuyama and Hwang (1990) | Suburban | Unknown
Fua and Leclerc (1990) | Suburban | Unknown
Van Cleynenbreugel et al. (1990) | Rural | 10 m, 20 m
Wang et al. (1992) | Suburban | 10 m
Zlotnick and Carnine (1993) | Rural and suburban | 1 to 3 m
Heipke et al. (1994) | Rural | 0.23 m
Stilla and Hajdu (1994) | Urban | Unknown
Airault and Jamet (1994) | Rural | 0.5-1.0 m
Plietker (1994) | Rural | 0.45 m
Heipke et al. (1995) | Rural | 0.23 m
Vosselman and de Knecht (1995) | Rural | 1.6 m
Grün and Li (1995, 1997a, 1997b) | Urban, suburban and rural | 0.6 to 10 m
Steger (1996) | Rural | 3.6 m
Gong and Wang (1996) | Suburban | 1.6 m
Ruskone (1996) | Suburban and rural | 0.45 m
Trinder et al. (1997) | Rural | 0.75 m
Baumgartner et al. (1997) | Rural | 0.23 m
Mayer et al. (1997) | Rural | 0.23 m
Vosselman and de Gunst (1997) | Rural | 1.6 m
Barzohar and Cooper (1997) | Suburban | Unknown, low resolution
Bordes et al. (1997) | Rural | 0.5 m
Wiedemann et al. (1998) | Rural | 16 m
Zafiropoulos and Schenk (1998) | Rural | 10 m
Fischler and Heller (1998) | Rural | Unknown
Klang (1998) | Rural | 10 m
Fiset et al. (1998) | Urban | 10 m
Bueckner (1998) | Rural | Unknown, low resolution
Wiedemann (1999) | Rural | 16 m
Heipke et al. (2000) | Rural | 0.8 m
Price (1999) | Urban | Unknown, high resolution
Trinder et al. (2000) | Rural | Unknown, high resolution
Hu et al. (2000) | Rural | Unknown, low resolution
Fortier et al. (2001) | Rural | 1.5 m
Doucette et al. (2001) | Rural | 1.0 m
Hinz et al. (2001) | Urban | 0.07 m
Agouris et al. (2001) | Rural | 5.0 m
Dial et al. (2001) | Suburban and urban | 1.0 m and 4.0 m
Wiedemann (2002) | Rural | 16 m
Baumgartner et al. (2002) | Rural | 0.2-3.0 m
Willrich (2002) | Rural | 1.7 m
Barsi et al. (2002) | Rural | 0.4 m
Zhao et al. (2002) | Rural and urban | 1 m

Table 2-1. Summary of landcover types and image resolutions in the reviewed methods.

2.2 Automatic Methods

An early attempt is reported in Bajcsy and Tavakoli (1976), where roads are extracted from Landsat-1 MSS images with a ground resolution of 57 by 79 meters. They first determine the approximate intensity range of roads and then perform a threshold operation on the whole image. Road finding is based on matching image points with 52 templates, assuming roads are line-like and 1 to 3 pixels wide. The matched parts are connected into line segments using constraints such as curvature and distance between road points. The limitation of this approach is that it requires the road surface to exhibit a specific spectral property.

An image analysis system is presented in Nagao and Matsuyama (1980). They start with image smoothing and extract regions based on spectral properties. They then focus on the elongated regions which do not belong to vegetation or water. These regions are connected based on their spectral and geometric attributes to form roads and road networks.

Fischler et al. (1981) propose an approach to overcome the weakness of thresholding. The approach employs the DRO (Duda Road Operator) to detect lines and edges. Two operators are used: type I offers high accuracy in identifying roads without determining their precise outline; type II offers high precision in delineating features. Type I uses four masks (horizontal, vertical and two diagonal) to detect road candidates by combining a brightness uniformity measure along the road with measures of the contrast between the road and adjacent areas. The results of type I are image features with a high probability of belonging to road parts. These results are used as road hypotheses, which are connected using the results of type II. Dynamic programming then searches for a minimum-cost path, which provides the final road.
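The minimum-cost path idea underlying the dynamic programming step of Fischler et al. (1981) can be illustrated with a minimal sketch. This is not the original implementation: the cost image and the one-column step model with a ±1 row smoothness constraint are simplifying assumptions for illustration only.

```python
import numpy as np

def min_cost_road_path(cost):
    """Trace a minimum-cost path from the left to the right image border.
    cost[r, c] is low where a pixel is likely to lie on a road; each step
    moves one column to the right and at most one row up or down."""
    rows, cols = cost.shape
    acc = cost.astype(float).copy()           # accumulated cost
    back = np.zeros((rows, cols), dtype=int)  # backtracking pointers
    for c in range(1, cols):
        for r in range(rows):
            lo, hi = max(0, r - 1), min(rows, r + 2)
            prev = acc[lo:hi, c - 1]
            k = int(np.argmin(prev))
            acc[r, c] = cost[r, c] + prev[k]
            back[r, c] = lo + k
    # Backtrack from the cheapest end point in the last column.
    r = int(np.argmin(acc[:, -1]))
    path = [r]
    for c in range(cols - 1, 0, -1):
        r = int(back[r, c])
        path.append(r)
    path.reverse()
    return path  # row index of the path in every column
```

Given a cost image in which road hypotheses are cheap and the background is expensive, the recovered path follows the hypothesized road even across short expensive gaps, which is exactly why dynamic programming is attractive for road linking.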
The approach performs well most of the time but has some weaknesses: it is sensitive to road orientation and to sharp changes of the road direction, and has certain contrast problems (Fischler et al. 1981).

Zhu and Yeh (1986) propose an approach to extract roads from aerial images with a ground resolution of 3 to 4 m. They first extract anti-parallel linear edge segments, from which road seeds are selected by setting appropriate thresholds on the width and grey level of the area bounded by the segments. Roads are then extracted by a road growing algorithm starting from the ends of the road seeds. In case of a gap between two adjacent road seeds, a set of rules is applied either to fill the gap or to stop the road growing process.

A road detection method is introduced in Canning et al. (1987). They use a symbolic labelling of pixels as the local property criterion and search for maximal sets of pixels whose labels are consistent with the hypothesis of a narrow road. In their method, each pixel is operated upon with a set of masks describing the possible road segments passing through the given pixel. The consistency links between the masks of neighbouring pixels are then used to construct all linear features (including roads) in the image.

In Ton et al. (1989), a road identification system for Landsat TM 4 images is presented. It consists of a road seed detection algorithm, a conceptually parallel road growing method and a rule-based segment labelling and merging phase. The road detector is similar to the DRO, where road pixels are determined based on the intensities, directions and the local neighbourhood of the pixels. The detected road seeds are then extended to form roads. The roads are merged using knowledge-based rules. The knowledge used relies on the appearance of roads with regard to the image resolution and on geometric properties (shape, length) of the extracted segments.
Matsuyama and Hwang (1990) start with a simple image segmentation based on grey value thresholding, which produces regions. Candidate road parts and houses are selected by evaluating region shapes and contrasts. The road parts are then extended and linked to form roads. They also use the spatial relationships between houses and roads to search for driveways.

Wang et al. (1992) present a gradient direction profile analysis method to extract road networks from SPOT High Resolution Visible (HRV) panchromatic data. The algorithm finds digital ridges by computing gradient directions and profiles along the gradient directions. The road network is then detected with segmentation, noise removal and thinning. The method requires a strong spectral contrast of the roads, and the spectral characteristics must vary little along the length of the road. Thus, difficulties arise where roads pass through areas of bare soil or where the contrast is low. Furthermore, it cannot handle occlusions.

In Zlotnick and Carnine (1993), a road finding algorithm is presented. Firstly, edges are extracted from the image, and road center hypotheses are computed as the points between parallel edges. The pixels of the road center hypotheses are connected to generate continuous trajectories. The trajectories with locally dense road center hypotheses are assumed to belong to roads. They are then delineated by a so-called ISP algorithm and further processed by applying a smoothness constraint. Finally, a post-linking is conducted to connect the fragmented roads by applying geometrical constraints such as distance criteria, smoothness of the link, etc.

Heipke et al. (1995) propose a strategy to extract roads in multi-resolution images. Two different resolutions of the same aerial image are used. In the coarse resolution, roads are modelled as bright lines and extracted by a combination of local and global thresholding.
In the fine resolution, edges are extracted. The edges are grouped to form parallels, and the region between them is checked. Roads are then extracted by combining the results from both resolutions through a set of rules. This multi-resolution approach was later adapted and extended by Baumgartner et al. (1997), Mayer et al. (1997), Hinz et al. (2001) and other researchers.
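Several of the approaches above (e.g. Zhu and Yeh 1986, Zlotnick and Carnine 1993) rest on pairing anti-parallel edges that bound a road. A minimal sketch of such a pairing test follows; the width range and angle tolerance are hypothetical parameters chosen for illustration, not values from any of the cited papers.

```python
import numpy as np

def is_antiparallel_pair(p1, d1, p2, d2,
                         min_width=3.0, max_width=15.0, angle_tol_deg=10.0):
    """Test whether two edges form an anti-parallel pair bounding a road.
    p1, p2: points on each edge; d1, d2: unit gradient directions
    (pointing from dark to bright)."""
    d1, d2 = np.asarray(d1, float), np.asarray(d2, float)
    # The gradients on the two road sides must point in opposite directions.
    if np.dot(d1, d2) > -np.cos(np.radians(angle_tol_deg)):
        return False
    # Road width: separation of the edges measured along the gradient direction.
    width = abs(np.dot(np.asarray(p2, float) - np.asarray(p1, float), d1))
    return bool(min_width <= width <= max_width)
```

Edges passing this test yield road seeds, whose midpoints give road center hypotheses in the spirit of Zlotnick and Carnine (1993).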

In Gong and Wang (1996), an image classification method is employed for road detection from airborne images with 1.6 m ground resolution. They argue that a road surface in high resolution images is a light, continuous and homogeneous region rather than a thin line. The road is detected by classification followed by noise removal, thinning and pruning. The quality of the results largely depends on the noise removal step, since house roofs, driveways etc. have spectral properties similar to the road surface and therefore disturb the detection.

Ruskone (1996) extends the semi-automatic road extraction method developed by Airault and Jamet (1994) to an automatic one by automatically detecting road seeds from aerial images. This is done by a watershed segmentation algorithm on a gradient image. Road segments are extracted using the homogeneity of the road surface bounded by two edges. The road network is constructed based on general knowledge about road network topology and geometry, and the hypothesized connections between road segments are checked. The main contribution of the work is the validation part, which assesses the reliability of every generated arc so that an operator may focus his attention on difficult portions and save time in reliable areas. For this purpose, two algorithms are developed. The first is realized through supervised learning for different kinds of objects (not only roads) in different kinds of contexts, to classify each road hypothesis and to compute a probability of belonging to the road class. The other is based on vehicle detection.

A differential geometric approach is presented in Steger (1996). The method was originally designed for the extraction of curvilinear lines from images and is applied to extract road center lines. For each pixel in the image, the second-order Taylor polynomial is computed by convolving the image function with derivatives of Gaussian smoothing kernels.
Line points are required to have a vanishing gradient and a high curvature in the direction perpendicular to the line. The paper presents a theoretical foundation for line extraction in scale space; the relation between the width of the searched roads and the image scale is stated explicitly, and a method for computing the image scale is also given. Finally, the individual line points are connected into lines and junctions by a linking algorithm. It is argued that the algorithm can detect roads with sub-pixel accuracy and that the bias caused by asymmetry can be removed. However, the method does not take advantage of road properties; therefore, gaps appear when roads locally lose their line properties due to occlusions, shadows or poor contrast.

Baumgartner et al. (1997) extend the idea of Heipke et al. (1995) and extract roads from multi-resolution images. Firstly, bright lines are extracted in an image of reduced resolution using the approach of Steger (1996), while edges are extracted in the original high resolution image. Road side hypotheses are generated using the line and edge features of both resolution levels and explicit geometric and radiometric knowledge about roads. Road sides are used to construct quadrilaterals representing road parts. Neighbouring road parts are chained into road segments. Roads are constructed by grouping road segments and closing gaps between them.
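The principle behind the differential geometric line detector of Steger (1996) used in several of the approaches above can be sketched compactly: smooth with Gaussian derivative kernels, take the across-line direction from the Hessian, and keep points where the second derivative is strongly negative (bright line) and the first directional derivative vanishes within the pixel. This is a simplified illustration of the principle, not Steger's implementation; the kernel radius and the curvature threshold are assumed parameters.

```python
import numpy as np

def _gauss_kernels(sigma):
    """1-D Gaussian kernel and its first and second derivatives."""
    r = int(3.0 * sigma + 0.5)
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-x * x / (2.0 * sigma * sigma))
    g /= g.sum()
    dg = -x / (sigma * sigma) * g
    ddg = (x * x - sigma * sigma) / sigma**4 * g
    return g, dg, ddg

def _sep_conv(img, ky, kx):
    """Separable 2-D convolution: kernel kx along rows, ky along columns."""
    tmp = np.apply_along_axis(lambda v: np.convolve(v, kx, mode='same'), 1, img)
    return np.apply_along_axis(lambda v: np.convolve(v, ky, mode='same'), 0, tmp)

def line_points(img, sigma=1.5, threshold=0.05):
    """Return sub-pixel bright-line points (x, y): pixels where the second
    derivative across the line is strongly negative and the first directional
    derivative vanishes within the pixel."""
    img = np.asarray(img, dtype=float)
    g, dg, ddg = _gauss_kernels(sigma)
    rx, ry = _sep_conv(img, g, dg), _sep_conv(img, dg, g)
    rxx, ryy = _sep_conv(img, g, ddg), _sep_conv(img, ddg, g)
    rxy = _sep_conv(img, dg, dg)
    points = []
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            H = np.array([[rxx[y, x], rxy[y, x]],
                          [rxy[y, x], ryy[y, x]]])
            evals, evecs = np.linalg.eigh(H)
            i = int(np.argmax(np.abs(evals)))
            if evals[i] >= -threshold:        # not a bright-line point
                continue
            nx, ny = evecs[0, i], evecs[1, i]  # across-line direction
            # Offset along n where the directional derivative vanishes.
            t = -(rx[y, x] * nx + ry[y, x] * ny) / evals[i]
            if abs(t * nx) <= 0.5 and abs(t * ny) <= 0.5:
                points.append((x + t * nx, y + t * ny))
    return points
```

Applied to a synthetic image containing a single bright horizontal line, the sketch returns points at sub-pixel positions along that line; a subsequent linking step, as in Steger's work, would chain them into curves.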

Mayer et al. (1997) also employ the multi-resolution approach of Heipke et al. (1995) and extract lines in the coarse image using the line extraction algorithm of Steger (1996). They then take the extracted lines as input to ribbon Snakes to verify roads and discriminate them from other line-type objects by means of the constancy of their width. Gaps caused by shadows etc. are bridged with so-called Ziplock Snakes (Neuenschwander et al., 1995).

In Trinder et al. (1997), highways are automatically extracted with a knowledge-based algorithm. They start with edge detection and linking to form segments based on proximity and collinearity. Anti-parallel pairs are then determined by searching for segments with opposite gradient directions and approximately the same spatial direction. The anti-parallel pairs are grouped to form road-like features if they are close to each other and have the same width, the same gradient direction, similar grey values, similar orientations at their nearest ends, and a small difference in height. The knowledge used for recognition is that a road is a long feature with a defined width, inward gradient direction and high intensity value. However, besides the need for many threshold settings, the algorithm may also fail at road junctions and occluded parts. A method to generate knowledge based on machine learning is also proposed but not tested.

Barzohar and Cooper (1997) use a geometric-stochastic road model and a combined multi-hypothesis generalized Kalman filter to track roads in aerial images. Autoregressive processes are designed to model the road centerline, road width, grey level of the road surface, and edge strength. Road candidates are obtained by dynamic programming. The combined generalized Kalman filter tracks a road among candidates in three different cases: no occlusion, partial occlusion and complete occlusion.
The algorithm finds the best road extension through calculation of the maximum a posteriori estimate of the road geometry from the image data.

The extraction of a road network from MOMS-2P data is presented in Wiedemann et al. (1998). They extract lines from the image using the method developed by Steger (1996) and construct a weighted graph from the lines and the gaps between them. Road network generation is carried out by calculating the “best path” between various pairs of points which are assumed to lie on the road network with high probability. The approach uses radiometric information for line extraction and weighting, and topological information for the network generation. Since the road model does not always fit the actual road because of occlusions, the road network cannot be extracted completely.

The SRI road extraction system is presented in Fischler and Heller (1998). The system involves three distinct processes: low resolution road detection and recognition, high resolution 3-D refinement and attribution, and interactive editing. In low resolution images, road points are detected and assembled into segments, which are then processed by a semantic filter taking mainly geometrical attributes into account, such as length, directional consistency, smoothness etc. The remaining segments are linked using a set of criteria. The results are projected into object space by monoplotting against a terrain elevation model, and then back-projected into the entire collection of high resolution images. Using additional data sources, the visibility of each segment in the images is determined. Finally, roads are extracted from the selected images using Snakes.

In Bueckner (1998), roads are extracted from different sensor data, mainly for image registration. The image roads from the different sensors are modelled as long strips bounded by anti-parallel edges whose widths and luminance lie in certain ranges.
He first computes a gradient image from which the most likely road edge pixels are determined. These pixels are then grouped into long segments. Finally, an A* algorithm is employed to expand from the ends of the segments so that isolated segments are combined. This procedure is conducted on the different sensor images independently. The junctions resulting from the intersection of the extracted road segments are used for image registration.

An approach to complete an automatically extracted road network is reported by Wiedemann (1999). This is necessary since road networks extracted by automatic methods are generally incomplete and fragmented. The idea is based on the fact that a road network is designed to allow fast, cheap, efficient and secure transport. Preliminary link hypotheses are defined by a so-called detour factor, computed as the ratio of the distance along the shortest path between two nodes within the existing network and the distance along a hypothetical optimal path between the two nodes. Only the hypotheses with locally maximal detour factors are selected and evaluated based on the image data. If a link hypothesis is accepted, it is inserted into the road network.

Heipke et al. (2000) discuss different aspects of image analysis and propose a framework for scene interpretation. The authors argue that the key to successful image analysis is the reduction of complexity by employing global and local context. A strategy for road extraction in CIR imagery is demonstrated which utilizes a multiscale approach consisting of an enhanced multispectral classification at a coarse scale and object extraction at a fine level of image resolution. This method is further enhanced by explicit modelling of local context knowledge, given by trees and especially rows of trees occluding roads, and therefore provides more complete results for road network extraction.

A road extraction system for urban areas is presented in Hinz et al. (2001).
The system exploits global and local context at coarse and fine image scales as well as additional data sources such as DSM data to support road finding and road extraction. After the urban regions are determined by a texture-based image classification at coarse scale, building outlines and DSM valleys are extracted in urban areas from the DSM data. Interest regions are found by fusing the DSM valleys with the detected road markings, image edges, and dark ribbons extracted from a low-resolution gradient image. A grouping process then iteratively connects consecutive markings and constructs lane segments from parallel marking groups. The resulting lane segments are validated by checking their interior for grey value homogeneity. Since cars on the road disturb the homogeneity and cause gaps, a car detection procedure is developed. The verified lane segments are connected to construct lanes, and parallel and collinear lanes are aggregated to set up road segments. Finally, the road segments are linked and road junctions are constructed to form a road network.

Doucette et al. (2001) demonstrate a self-organising road map algorithm for road extraction from high resolution multispectral imagery. The method is essentially a spatial clustering technique adapted to identify and link elongated regions. It takes a classified high resolution multispectral image as input. The spatial clusters are identified by a K-medians clustering algorithm. A minimum-spanning-tree based technique is then used to link convergent spatial cluster locations to derive road topology.

A road extraction approach from IKONOS imagery is presented in Dial et al. (2001). The method uses the separate types of information present in the multispectral and panchromatic imagery of IKONOS data. Image classification is first applied to the multispectral imagery to generate a mask which eliminates non-road-like pixels; then, at each unmasked pixel of the panchromatic image, a set of rectangles whose principal axes lie at different angles from the horizontal is defined using the road width. The variances in the rectangles are computed, and the local minima are searched for. Each pixel is thus attributed with the directional minima and the number of minima. Adjacent pixels are clustered to form regions based on pixel attributes. The regions are merged using a set of rules to extract roads and road junctions.

Several methods are dedicated to extracting road junctions from images. Barsi et al. (2002) introduce a method to detect road junctions in orthoimages. The junction model is created using a neural network with junction samples collected in images. Several parameters describing junctions are thus obtained, including grey value statistics and edge information in the junction areas. The neural network is then applied with the junction model to detect other road junctions in images. In Wiedemann (2002), a method is developed to improve the junction extraction of previously obtained road extraction results.
He observes that the approach developed in Wiedemann and Hinz (1999) sometimes delivers cycles instead of crossing points. Firstly, small cycles are selected and each cycle is replaced by a new node. Then, the road segments in the vicinity of a junction are combined and described by parameterized cubic curves, which are further evaluated using a linear fuzzy function. The crossing corresponding to the combination with the largest evaluation value is taken as the solution for the junction.
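Among the network-level criteria reviewed above, the detour factor of Wiedemann (1999) is simple to make concrete. The following sketch assumes a plain adjacency-dictionary representation of the road network and a straight-line distance as the "hypothetical optimal path"; these data structures are illustrative, not those of the original system.

```python
import heapq
import math

def network_distance(adj, s, t):
    """Shortest-path distance through the road network (Dijkstra).
    adj: {node: [(neighbour, edge_length), ...]}."""
    dist = {s: 0.0}
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == t:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return math.inf

def detour_factor(adj, coords, s, t):
    """Ratio of the network distance to the straight-line ('optimal')
    distance between two nodes; a large value suggests a missing link."""
    return network_distance(adj, s, t) / math.dist(coords[s], coords[t])
```

Node pairs with locally maximal detour factors then become link hypotheses to be evaluated against the image data, as in Wiedemann's completion scheme.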

Bajcsy and Tavakoli (1976): Road linking and noise removal after thresholding
Nagao and Matsuyama (1980): Road region finding by spectrum-based segmentation and shape analysis, followed by region linking
Fischler et al. (1981): Edge detection followed by minimum cost path determination
Zhu and Yeh (1986): Road seed finding followed by road growing and gap filling
Canning et al. (1987): Road pixel detection and linking using a set of masks
Ton et al. (1989): Line detection, road growing, rule-based segment labeling and merging
Matsuyama and Hwang (1990): Image segmentation followed by road region extending and linking
Wang et al. (1992): Road tracking by gradient direction profile analysis plus thinning and noise removal
Zlotnick and Carnine (1993): Detection of road center hypotheses between antiparallel intensity edges, and grouping of these hypotheses to produce continuous, smooth road seeds
Heipke et al. (1995): Multi-resolution approach; line and edge extraction followed by rule-based grouping
Gong and Wang (1996): Classification followed by thinning and noise removal
Ruskone (1996): Salient road tracking by homogeneity after seed detection; road network construction by geometry and topology
Steger (1996): Line detection by scale analysis
Baumgartner et al. (1997): Road segments generated by combining results from line and edge extraction; roads constructed from road segments by grouping
Mayer et al. (1997): Road extraction by ribbon Snakes using line extraction as input, with gaps closed by Ziplock Snakes
Trinder et al. (1997): Road extraction through anti-parallel pair generation and rule-based grouping
Barzohar and Cooper (1997): Road finding by a geometric-stochastic model, with extension through a combined multi-hypothesis generalized Kalman filter
Wiedemann et al. (1998): Finding the best path after line extraction
Fischler and Heller (1998): Road point detection and linking in low resolution, followed by ribbon Snakes refinement in high resolution
Bueckner (1998): Edge extraction, grouping, and edge segment linking
Wiedemann (1999): Completion of the extracted road network by hypothesizing and evaluating links between free ends
Heipke et al. (2000): Road extraction by line extraction and grouping, with gap filling supported by rows of trees
Hinz et al. (2001): Finding interest regions from DSM valleys, then exploiting local context such as edges, road markings and cars to find homogeneous regions that form lanes and road segments
Doucette et al. (2001): Finding and linking spatial clusters in spectrally classified high resolution multispectral imagery
Dial et al. (2001): Elimination of non-road-like pixels by multispectral classification; classification and merging of road pixels based on directional spectral variances
Barsi et al. (2002): Neural network for road junction detection
Wiedemann (2002): Improvement of road junctions in road extraction results by geometric modelling

Table 2-2. Summary of automatic road extraction methods.

2.3 Semi-automatic Methods

Quam (1978) presents a method for road tracking from high resolution imagery. After a start point, initial direction and road width are supplied by an operator, a road intensity profile model is generated, and parabolic extrapolation is used to predict the future road trajectory. The road point is then determined by cross-correlation of the intensity profile at the predicted position with the model. If the correlation peak is poor, the pixel is marked as a potential anomaly, and the method guesses ahead another step by parabolic extrapolation. In case of a large number of anomaly points, a road surface change is assumed and a new profile model is created. The road tracking is then repeated with the new model.

McKeown and Denlinger (1988) develop an automatic road follower (ARF) which uses multiple cooperative methods for road detection and extraction. This system allows cooperation among low level processes and aggregation of information by high level analysis. The two low level processes are a profile correlation tracker and an edge follower. Given a road starting point, initial direction and width, each low level method works independently to establish a model of the road centerline, road width and other local properties. The intermediate level processes are composed of several modules which detect and report road features such as overpasses, width changes, junctions, surface material changes, vehicles, occlusions etc. They also monitor the state of the low level methods and evaluate the success of each method.

Fua and Leclerc (1990) apply a smooth curve model to find road boundaries in aerial images with Snakes. A road is modelled as a ribbon whose smoothly curved edges are parallel. A ribbon is implemented as a polygonal curve. The photometric energy is defined as the sum of the edge strengths along the two boundary curves. The deformation energy is derived from the smoothness and constant width of the road. The constant width is expressed as the width difference at neighbouring vertices, which should be minimized.

Heipke et al. (1994) propose a semi-automatic road extraction system for aerial images. An operator provides a starting point and an initial direction, the gradient in a small image window is computed, and edges of single pixel width are extracted by thresholding and skeletonization. The road is then extracted by an edge following algorithm using predefined direction matrices. In case of gaps, larger search matrices are used. The extracted road edge pixels are represented by a polygon through a raster to vector conversion algorithm.

In Airault and Jamet (1994), the identification of an image area as a road object is determined by local optimization of a cost function which mainly takes into account the local homogeneity and the anisotropic aspect of the road homogeneity. After manual input of one point and one direction as initialization, paths of varying length and direction are hypothesized and evaluated using a homogeneity criterion. The road axis is determined using both detected edges and an a priori geometric model of the road.

Grün and Li (1995, 1997a, 1997b) develop two semi-automatic systems to extract roads from satellite and aerial images, based on either dynamic programming or LSB-Snakes. Initial seed points are given manually at characteristic road positions. For dynamic programming, the image is first processed by a wavelet transformation. A generic road model including photometric and geometric properties of a road is embedded into a dynamic programming algorithm to maximize a merit function. In LSB-Snakes, the method of active contour models is formulated in a least squares approach, and least squares template matching (Grün and Baltsavias, 1985) is extended by using a deformable model.
LSB-Snakes can also simultaneously use any number of images (two or more) to provide a 3-D solution. With more images involved in road extraction, occlusions can often be handled, since occluded road parts may be visible in the other images. Furthermore, through the least squares approach, the precision and reliability of the results can be assessed.

In Vosselman and de Knecht (1995), using an initial starting point and direction plus the road width, road positions are computed by least squares matching of the gray value profiles with a road surface model (represented in gray values). The results of the matching are used by a Kalman filter to update the parameters of the path model that describes the positions and shapes of the roads. Both techniques have a solid statistical foundation; decisions about the usefulness of results can be based on statistical tests. However, problems can occur because the assumption that a road has a constant curvature is not correct.

Color based energy modelling for road extraction is presented in Zafiropoulos and Schenk (1998). Color energy is embedded into active contour models in two ways: one transforms the color image content into color intensity and color contrast; the other transforms the color content based on the fused gradient field, thus giving an indication of abrupt changes of the image function together with possible detection and localization of boundaries. However, due to the absence of geometric constraints, the performance of the algorithm still needs to be improved.

The approach proposed by Price (1999) deals with the extraction of suburban regular street grids. The road network is modelled as a combination of grids of rather regular size. Road junctions are connected via individual road segments of approximately constant width and height. An operator initializes the grid spacing and orientation by manually selecting three points which are the centers of three junctions. The road segments are then refined and evaluated by simultaneously matching their sides to the image edges. This process propagates the grid across the entire scene. During the final verification, height information obtained from stereo matching and contextual knowledge are used to adjust the positions of consecutive road segments and remove short road portions.

Trinder et al. (2000) extract roads from aerial images by integrating the principles of the active contour model and simulated annealing. After approximate points along the road are supplied by an operator, the initial position is represented as a B-Spline. The search capacity of simulated annealing is then exploited to evolve a minimum energy state in a rectangular window of the B-Spline for extracting roads.

Hu et al. (2000) extract roads based on template matching and a neural network after a set of seed points is provided by an operator. The road profiles are matched with a predefined template by cross correlation. Points with local maxima are then fitted to a quadratic curve in a local neighbourhood defined by the road width in order to eliminate non-road pixels. Finally, the road pixels are linked via a neural network by global optimization that takes radiometric and geometric constraints into account.

In Zhao et al. (2002), roads are extracted from IKONOS imagery semi-automatically. Firstly, the image is classified using commercial remote sensing software to generate a road mask image which excludes non-road pixels. Edges are then extracted and traced to form straight lines. Long lines with slow direction change are taken as road seeds. After a start point is supplied by an operator, the next road point is determined by matching a template with the road mask image and the road seeds. If the matching is poor, the operator is required to assign a control point, and road extraction is resumed by taking the control point as a new start point.

A prototype system for semi-automatic road extraction is presented in Baumgartner et al. (2002). The tracking algorithm is in the style of the approach developed in Vosselman and de Knecht (1995). An operator measures the first segment of a road axis by providing two points; the road is then tracked by profile matching. In case of multiple poor matches, the system stops tracking, reports the reason for stopping to the operator, and offers a choice of appropriate interactions.
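The profile-based trackers summarized above (Quam, 1978; Vosselman and de Knecht, 1995; Baumgartner et al., 2002) share a common loop: predict the next road point, extract a cross profile, correlate it with a reference profile, and correct the position. A minimal sketch follows; all names, parameters and the simple stop rule are my own assumptions, not taken from any of the cited systems.

```python
import numpy as np

def track_road(img, start, direction, ref_profile, step=5.0,
               min_corr=0.8, max_steps=50):
    """Sketch of road tracking by cross-profile matching: extrapolate along
    the travel direction, sample a gray value profile perpendicular to it,
    and shift the point to the offset of the best-correlating window."""
    half = len(ref_profile) // 2
    pos = np.asarray(start, dtype=float)
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    normal = np.array([-d[1], d[0]])          # perpendicular to the road
    path = [pos.copy()]
    for _ in range(max_steps):
        pred = pos + step * d                 # extrapolated next road point
        offs = np.arange(-half - 4, half + 5) # profile wider than template
        ys = np.clip(np.round(pred[0] + offs * normal[0]).astype(int),
                     0, img.shape[0] - 1)
        xs = np.clip(np.round(pred[1] + offs * normal[1]).astype(int),
                     0, img.shape[1] - 1)
        prof = img[ys, xs]
        best_s, best_r = None, -1.0
        for s in range(len(prof) - len(ref_profile) + 1):
            win = prof[s:s + len(ref_profile)]
            r = np.corrcoef(win, ref_profile)[0, 1]
            if np.isfinite(r) and r > best_r:
                best_r, best_s = r, s
        if best_s is None or best_r < min_corr:
            break                             # poor match: tracking stops
        shift = offs[best_s] + half           # center offset of best window
        pos = pred + shift * normal
        path.append(pos.copy())
    return np.array(path)
```

In a real system, a poor match would trigger anomaly handling (Quam), a Kalman filter prediction (Vosselman and de Knecht) or operator interaction (Baumgartner et al.) rather than a simple stop.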

Quam (1978): Road tracking by profile matching together with anomaly detection to support a dynamically changing road model
McKeown and Denlinger (1988): Road tracking by profile analysis together with a search for parallel edges
Fua and Leclerc (1990): Ribbon Snakes with a smooth curve model
Heipke et al. (1994): Edge extraction and following
Airault and Jamet (1994): Tracking with homogeneous surface
Vosselman and de Knecht (1995): Tracking using a Kalman filter and gray value profile analysis
Grün and Li (1995, 1997a, 1997b): 1. Road sharpening followed by model-driven linear feature extraction based on dynamic programming; 2. LSB-Snakes, a combination of least squares template matching and B-Spline Snakes
Zafiropoulos and Schenk (1998): Snakes with color images
Price (1999): Matching of operator-initialized road segments to image edges in regular street grid scenes
Trinder et al. (2000): Snakes with energy evolved by simulated annealing
Hu et al. (2000): Profile template matching and neural network optimization
Zhao et al. (2002): Image classification and edge extraction together with template matching
Baumgartner et al. (2002): Tracking by road profile matching

Table 2-3. Summary of semi-automatic road extraction methods.

2.4 Automatic Methods using Maps

van Cleynenbreugel et al. (1990) use an existing road database as a logical framework to search for new roads. The relationship between the extracted line pieces is established based on this framework. Apart from the road map, a DTM is also used to verify constraints on the maximum allowed slope for connectable line segments.

Stilla and Hajdu (1994) present a map-guided approach for road detection from high resolution images. The system describes the map contents as a so-called image description graph, which is then input for knowledge-based image analysis. Based on the knowledge described in this graph, expectations are defined for attribute values of objects in the image. The knowledge from the map in general does not influence the result of the image analysis for the generation of the target objects, but rather the processing sequence in which objects are searched, depending on their presence in the map. The authors state that map knowledge significantly reduces the processing time. If the map is outdated and needs to be updated, the processing time is probably not reduced, since expectations for changes are not defined.

Plietker (1994) uses georeferenced orthophotos to update an existing road database. The existing roads are used to restrict search regions and provide certain parameters, such as road class and road width, for the road extraction process. Edges are extracted in the vicinity of the roads, and parallel edges with more or less homogeneous regions between them are found. They are then grouped based on collinearity and considered as road segments. The author also points out that this simple method can only handle some roads; more sophisticated algorithms have to be developed in order to achieve good results.

Vosselman and de Gunst (1997) update the road network using an old road database.
The outline of the road in the database is compared to cross profiles taken from the image at the corresponding position. From this analysis the changed part of a road is detected, a hypothesis for a new link road is generated, and a new junction is located from which the new road is tracked. They develop generalized and specialized models for road recognition. Both models use the cross-correlation coefficient to detect changed road parts and new link roads. The specialized model uses additional information, such as road widths from a manual of road design standards. However, the models do not take occlusions, e.g. by trees or cars, into account, which sometimes leads to false detections. The authors state that more knowledge is needed in order to improve the system.

Bordes et al. (1997) extract roads from aerial images using a cartographic road database. The existing database is employed to build a semantic, image-independent road model. To this end, different classes of roads are defined according to their attributes, and each class is associated with the most consistent road extraction model. Parameters of the model, such as the road width, are tuned according to the road attributes in the existing road database. The road extraction algorithms adopted are road following by profile analysis, developed by McKeown and Denlinger (1988), and road following by homogeneity criteria, developed by Airault and Jamet (1994). The processing starts from the easy road segments and propagates thereafter. The "easiness" of a road segment is deduced from the road characteristics, its location and its appearance in the image. This easiness level takes into account the a priori reliability of the road segments in the existing road database and image tokens computed from the end points and midpoint of the segment. The prediction of the a priori reliability is based on cartographic generalization rules which take into account the significance of the road. The image tokens are computed based either on the radiometric homogeneity or on the radiometric profile characteristics.

Klang (1998) proposes a method for the detection of changes between an existing road database and a newly registered satellite image, rectified to an orthophoto. The existing road centerlines in vector format are converted to a raster image, and a road is represented as a chain of pixels with start and end nodes. The satellite image is pre-processed using the method proposed by Steger (1996) to enhance image lines, including roads. For each road, its start and end nodes are matched to the image data either by least squares matching or by cross correlation. Based on this matching, the pixels of the road between the start and end nodes are shifted accordingly. The road location is then extracted by Ziplock Snakes from the enhanced image. New roads which are not in the existing road database are extracted by a simple tracking algorithm after seed point detection.

Fiset et al. (1998) conduct automatic map revision by a map-image matching technique. The matching is performed using a multi-layer perceptron (MLP) trained to recognize road segments in SPOT-HRV panchromatic images corresponding to the road database being treated. Two template matching methods using the trained MLP weight matrix are developed, one for road segments and one for road junctions. The first method leaves some locations of straight segments unexamined in the image due to the limited search directions. The second method achieves good results in matching road junctions, but gives poor results when retracing the road segments in the image according to the shift values found for the junctions at both ends.
The authors then combine the products obtained from the segment matching and the junction matching to improve the map-image matching performance.

Fortier et al. (2001) use a method similar to Klang (1998) for the correction and updating of road map data from georeferenced aerial images. They also use Snakes to correct the position of the existing road network. The initialization of the Snakes is taken from the existing roads. In order to bring the initialization closer to the imaged road, they first detect road junctions in the image using the line detection method developed by Steger (1996), then match the ends of the existing roads to the detected junctions, and conduct a re-localization based on the match. Starting from detected image junctions that belong to the existing road network, new roads are also searched for by a line following algorithm.

Agouris et al. (2001) develop so-called differential Snakes for change detection in road segments. These Snakes extend the standard Snakes by introducing an additional energy term which describes the discrepancy between the current Snakes solution and the pre-existing road shape information. In addition to the road shape vector, uncertainty measures for the nodes of the road vector are assumed to be available. Change is detected if and only if the new image supports the notion that the object has moved beyond the stochastic range of the pre-existing information. If the image content only weakly suggests a very small move well within the stochastic limits of the pre-existing information, no change is detected; instead, the standard Snakes are started to try to improve the accuracy of the road segments.

Willrich (2002) presents a system to update road data by comparing the road vectors from a GIS database to orthophotos. The GIS data provides road properties, e.g. road geometry as well as road attributes like road type (highway, single/multi track, road, and path), road width and road material (asphalt, concrete). The scene is also classified into several classes, such as rural, urban and forest. She uses the method developed in Wiedemann et al. (1998) to extract roads. The road properties are passed to the road extraction algorithm for parameter setting. Since the road extraction method used was developed for low resolution imagery, the original high resolution orthophotos are resampled to 1.7 m pixel size. After the roads that exist in the GIS are extracted, new roads are also treated by taking the extracted roads as reliable road parts.
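The decision rule of the differential Snakes of Agouris et al. (2001), as described above, can be reduced to a simple per-node test: flag change only when the new solution leaves the stochastic range of the old road vector. A minimal sketch under my own assumptions (the threshold factor k and all names are invented for illustration):

```python
import numpy as np

def changed_nodes(old_nodes, sigmas, new_nodes, k=3.0):
    """Flag a node as changed only if its displacement from the pre-existing
    road vector exceeds k times the node's positional uncertainty."""
    disp = np.linalg.norm(np.asarray(new_nodes, dtype=float)
                          - np.asarray(old_nodes, dtype=float), axis=1)
    return disp > k * np.asarray(sigmas, dtype=float)
```

A node that moves only slightly, well within k sigma, is treated as accuracy improvement rather than change, matching the behaviour described in the paragraph above.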

van Cleynenbreugel et al. (1990): Searching for new roads using existing road data; verifying extracted roads with a DTM to constrain the maximum allowed slope
Stilla and Hajdu (1994): Inferring road presence using a knowledge base created from a map
Plietker (1994): Extracting and grouping edges in the vicinity of existing roads
Vosselman and de Gunst (1997): Updating the road database by junction detection and road tracking
Bordes et al. (1997): Updating roads by profile analysis and homogeneity criteria, using roads in the road database as approximations
Klang (1998): Matching the digital road map with the image, followed by Snakes
Fiset et al. (1998): Map revision via the combination of road junction and road segment matching through a multi-layer perceptron
Fortier et al. (2001): Matching road segments and junctions in the road database with the image, followed by Snakes
Agouris et al. (2001): Differential Snakes for change detection, using map information as approximation
Willrich (2002): Updating road data using the method of Wiedemann et al. (1998), with road properties provided by GIS data for parameter setting

Table 2-4. Summary of approaches using maps.

2.5 Summary of Existing Methods

Existing approaches show individually that the use of road models and varying strategies for different types of scenes are promising. In low resolution images, roads are usually extracted as line features, while in high resolution images a road is treated as a ribbon, and roads are usually extracted using edges. Most methods are developed to extract roads in rural areas; few of them can also handle roads in suburban and/or urban areas. Early approaches are mostly devoted to developing low level processing algorithms that extract road-like structures, and extract roads based on simple road models. Because of the absence of an exact road model, roads whose appearances are not covered by the used road model will not be found. Recent research raises the problem of modelling more directly, defines more complex road models which take into account knowledge about geometry, radiometry, topology and context, and develops more robust methods to handle complex cases. Multiple images can be employed to account for occlusions (Grün and Li, 1997b). Data from different sources is often useful, e.g., DSM information helps to remove false road hypotheses (Price, 1999; Hinz et al., 2001). Context information has proven to be very important (Baumgartner et al., 1997; Heipke et al., 2000). Results also show that the combination of several road extractors performs better than either extractor alone (McKeown and Denlinger, 1988).

Automatic approaches usually start from road detection. The prevalent road properties used in local tests are the spectrum of road materials, intensity edges at road sides, and the road surface intensity profiles. Usually, the whole image is in the same state of processing. Since so far there is no automated method that is perfect in the sense that all roads are detected, commission and omission errors usually cannot be avoided. An interesting approach in automatic road extraction is multi-resolution analysis, but the resolution factor must be carefully chosen in order not to reject in the first step roads which will not be easy to recognize later.

Semi-automatic approaches present a real interest for map production, since full automation is not realistic in the near future. However, the optimization of the interaction between operator and computer is a crucial task. This has never been addressed so far. Some semi-automatic approaches require substantial human interaction.

Several applications use maps as a knowledge source. The map information is used in various ways. van Cleynenbreugel et al. (1990) use the map to validate the detected roads, while in Vosselman and de Gunst (1997) the information in maps is used to search for new roads. In Klang (1998), Fiset et al. (1998) and Fortier et al. (2001), the map is updated by map-image matching. In Bordes et al. (1997) and Agouris et al. (2001), the map information is used as an approximation to start a tracking or Snakes optimization process. These methods do not make full use of the existing information. An existing road database, even if outdated, not only provides approximations, but usually also other information, such as road attributes, global context, etc., which allows different algorithms using various features to be applied to different types of roads. In addition, other geodata such as DSM and DTM data may also be incorporated to provide complementary and redundant information to support the extraction process and improve the results.

3. DATA CONSTRAINTS AND GENERAL STRATEGY

In this chapter, the strategy for road extraction from aerial imagery is presented. The concept behind the designed strategy, which is also briefly described in Zhang and Baltsavias (2000), is discussed. The goal of this thesis is to build an automatic and robust system to extract road networks from aerial imagery by integrating an existing geodatabase and other data sources. In order to increase the success rate and the reliability of the results, the system contains a set of processing tools for feature and cue extraction, and makes use of the available information as much as possible. In the following, the input data sources are first described. Based on the available data, the strategy for 3-D road extraction is then outlined.

3.1 Input Data Description

Stereo aerial images, existing geodatabases as well as other data such as a DTM and a derived DSM are the input for our road reconstruction system. In this section, the characteristics of the image data and the contents of the geodatabases as well as of the DTM and DSM data are described. The image data is from the ATOMI project and was provided by the Swiss Federal Office of Topography. Table 3-1 summarizes the image specifications.

Number of color channels: 3 (RGB)
Scan resolution: 14 µm (Zeiss SCAI)
Camera focal length: ~300 mm
Flying height: ~5400 m
Image scale: ~1:15,800
Ground resolution: ~0.23 m
Forward/side overlap: 60% / 25%
Orientations: known

Table 3-1. Image specifications of ATOMI dataset.
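As a quick plausibility check of Table 3-1 (my own arithmetic, not part of the thesis), the ground pixel size follows from the scan resolution multiplied by the image scale denominator:

```python
# Hypothetical consistency check of the approximate values in Table 3-1.
scan_resolution_m = 14e-6     # 14 µm scan resolution
scale_denominator = 15800     # image scale ~1:15,800
ground_resolution_m = scan_resolution_m * scale_denominator
print(round(ground_resolution_m, 2))
```

This yields roughly 0.22 m, consistent with the ~0.23 m listed in the table given that all values there are approximate.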

The exterior orientation parameters are taken from the results of a bundle adjustment. The MATCH-T and IGP software PVD are used to perform the interior orientation of the digitized images.

The existing geodatabase is available in digital format. In Switzerland, this database is called VEC25 and is in Arc/Info coverage format, containing road data as well as other geographic objects. VEC25 is generated by digitization of the 1:25,000 topographic maps using a semi-automatic procedure. The RMS error of VEC25 is around 2.5-7.5 m and the maximum one ca. 12.5 m (based on empirical values), including generalisation. The VEC25 roads are topologically correct, but due to their partly automated extraction some errors might exist. Transportation objects in VEC25 include roads, railways, bridges, tunnels etc. In this thesis only roads are treated. In VEC25, roads are categorized into 8 classes, namely highway, 1_class, 2_class, ..., 6_class and Q_class. The classification corresponds to the legend of the national map 1:25,000 and is based on the importance of the roads in the transportation network. Generally, the geometrical properties of roads, such as width, curvature etc., depend on the importance of the roads. However, there are no explicit values of such attributes for each class of roads (a detailed description of the VEC25 road attributes can be found in Appendix A). The investigation with the image data shows the following facts:

• Highways usually have several lanes, and are much wider than roads of the other classes.

• Roads of the first and second classes are also well constructed for transportation. Generally, they have constant widths, which are above ca. 6 m and 4 m, respectively.

• The widths of roads of the other classes vary from ca. 2 m to ca. 10 m. For instance, a Q_class road can be as wide as 9 m.
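For parameter setting, the width observations above could be encoded as a simple lookup. This is a hypothetical sketch: the dictionary, the upper bounds and all names are my own assumptions, not VEC25 attributes (which, as noted, carry no explicit width values).

```python
# Expected road width ranges in metres per VEC25 class, derived from the
# observations above; the upper bounds are assumptions for illustration.
VEC25_WIDTH_RANGE_M = {
    "highway": (10.0, 25.0),  # several lanes, much wider (assumed bounds)
    "1_class": (6.0, 12.0),   # width above ca. 6 m (upper bound assumed)
    "2_class": (4.0, 8.0),    # width above ca. 4 m (upper bound assumed)
    "other":   (2.0, 10.0),   # 3..6_class and Q_class: ca. 2-10 m
}

def expected_width_range(road_class):
    """Return the (min, max) width in metres expected for a road class."""
    return VEC25_WIDTH_RANGE_M.get(road_class, VEC25_WIDTH_RANGE_M["other"])
```

Such a table lets a road extraction algorithm pick, e.g., profile lengths or parallel-edge search distances per class instead of using one global width.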

In VEC25, a landcover map is also included, in which residential areas and forests are distinguished. Note that VEC25 does not distinguish between village, suburban and urban areas. However, they can probably be separated by the size of the residential areas. Roads of different classes can exist in various landcover types, but generally roads of Q_class are in residential areas, while roads of the 3, 4, 5 and 6_classes are more dominant in rural and forest areas.

Other input data includes a national DTM with 25 m grid spacing and an accuracy of 2.5 m in lowlands and 10 m in the Alps. In addition, DSM data is generated from the stereo images using MATCH-T of Inpho with 1-2 m grid spacing.

Figure 3-1 shows some road images with the VEC25 roads overlaid as yellow lines. It can be seen that the VEC25 roads actually stay in the vicinity of the image roads. The general shapes of the VEC25 roads are correct, but problems can occur, especially at junctions and squares. It is also noted that the VEC25 roads may pass on top of buildings, trees etc. As far as road reconstruction is concerned, the VEC25 roads provide the following information:

• existence of image roads

• image roads are close to the VEC25 roads

• image roads have shapes similar to the VEC25 roads

Figure 3-1. Road images and the VEC25 roads (panels a-l). See text for explanations.

In images of such resolution, roads are not projected as lines as in low resolution images. Objects on roads, such as cars, road marks and zebra crossings, are also visible. The appearance of roads in images strongly depends on the context. Roads in urban and suburban areas have quite a different appearance from roads in forest and open rural areas. The differences in appearance are partly consequences of the different relations between roads and the neighbouring objects. From Figure 3-1, we observe:

• In open rural areas, the road sides are generally clearly visible. Roads usually appear as ribbons bounded by two sides (a). There are few buildings in rural areas; they cast shadows on roads and occlude them if they stand close to the roads (b). Trees are another source of occlusion (c). Some roads are dirty (d), while some are actually paths (e). Paths are generally categorized as 5_class or 6_class roads. Usually, a strip of vegetation is observed at the center of a path.

• In forest areas, roads are usually invisible in aerial images (f). When roads run along forest borders, only parts of them can be seen.

• Compared to open rural areas, more trees and buildings appear in suburban areas (g). Roads appear as ribbons, but the road sides become more broken; only parts of the road sides are well defined.

• In urban areas, the density of roads and buildings increases compared to the other landcover types (h). Roads are usually heavily occluded by the buildings along them, and more cars are found on the roads. In most cases, the road sides are not well defined.

• All highways and 1_class roads have road marks and zebra crossings, as well as other traffic signs (i and j). These objects can also be found on most 2_class and some Q_class roads (k). Usually, they are not present on roads of the other classes.

• The spectral properties of some roads may vary (l).

We also tested our road extraction system on a Belgian dataset provided by the National Geographic Institute (NGI). This dataset likewise includes stereo imagery, an existing geodatabase and a DTM. The image specifications are listed in Table 3-2. Both exterior and interior orientation parameters are provided by NGI. The images are black and white and of rather poor radiometric quality. The pixel footprint (ca. 30 cm) is larger than in the ATOMI data, the occlusions are larger due to the 15 cm focal length, and no colour is available. The DTM of the test regions is provided; it has a grid interval of 40 m and an RMS error of around 10 m. No DSM is available. The existing geodatabase is also provided as Arc/Info coverages, including a road database and landcover information. The RMS error of the road database is around 9 m, with a maximum of around 25 m. The landcover map does not distinguish villages; they are included in the open rural area. The contents of the road database differ slightly from the ATOMI data (e.g. a different definition of road classes), so some modifications were made in our software to derive and read the necessary input data. The road database contains the following attributes: road class, road width, number of lanes, and road type. The attributes road width and number of lanes are related to the road class, while road type indicates whether the road is dirt or asphalted. However, in many cases there are significant deviations from these parameters, and they had to be relaxed for the test. Also, the given road type is not always correct.

Number of channels             1
Scanner and scan resolution    PS1, 15 µm
Camera focal length            ~150 mm
Image scale                    ~1:21,000
Ground resolution              ~0.31 m
Forward overlap                60%
Orientations                   known

Table 3-2. Image specifications of the Belgium dataset.

3.2 3-D Road Extraction Strategy

Figure 3-1 indicates the difficulties of extracting roads from aerial imagery. Roads take various forms in the image, depending on their own characteristics and their context. As noted in Chapter 2, many road models for road extraction have been developed. These models are linked to the characteristics of the imagery (resolution in particular) and of the roads; most often, a road model chosen for one type of road or landscape is not suitable for another. Roads that fit the model are extracted, but where a road, as it appears in the image, does not correspond to the model, the extraction system may produce many omissions and false detections. The defined road models should therefore be generic enough to reach a good rate of completeness, but precise enough to avoid false detections. In manual extraction, the modelling is based on the operator's experience and involves high-level semantic knowledge. Automatic processes likewise require fine semantic knowledge about topographic objects. One of the simplest ways to build a semantic model is to use external knowledge provided by, e.g., an existing map or geodatabase.

As the literature review in the previous chapter shows, many road extraction algorithms become inefficient due to the complexity of aerial images. We take two approaches to overcome this problem. On the one hand, we design more robust image analysis algorithms by making use of various kinds of knowledge. In particular, we consider the integration of existing geodatabase information with colour image data. The information in the existing database provides a model of the scene which is most often imprecise, uncertain and out of date. Colour images give the current situation of the scene, but are very complex to analyse without the aid of other auxiliary information.
The information provided by the existing geographic database can therefore help in understanding the scene, while the images provide the actual data needed to improve and update the old road database. On the other hand, we enlarge the set of image processing algorithms, so that for the detection of each specific road a suitable set of algorithms can be selected and used. Various cues supporting road existence and road reconstruction are thus extracted from the images; they are linear and region features, in 2-D or 3-D. Our strategy for automatic extraction of a 3-D road network from aerial images is therefore to integrate knowledge processing of colour image data and existing digital geodatabases. The system relies strongly on the following three aspects:

- Use and fusion of multiple cues about object existence and of existing information sources. The basic cues used are edges, DSM blobs, colour and shadows, while secondary ones, such as signalisation strips and the context between buildings and roads, are also used. All cues have associated relevant attributes.

- Use of existing knowledge, "rules" and models to restrict the search space, treat each object subclass differently, check the plausibility of multiple possible hypotheses, and derive reliability criteria. Given the known object class, e.g. highway or first-class road, different value ranges for the attributes of these objects and certain rules can be used, e.g. road width, horizontal and vertical curvature of roads, signalisation, the rule that no road can cross a highway at the same level, etc. The knowledge database is automatically updated and refined using information gained from image analysis. The road model includes geometric, radiometric, topological and contextual attributes.

- Object-oriented approach in multiple object layers. A first, hierarchical, object layer divides an object class, e.g.
transportation network, into various subclasses, e.g. road classes, railway lines, bridges, pathways etc. A second object layer divides the objects into subclasses according to landcover and terrain relief, since these factors considerably influence the ability to reconstruct objects; e.g. roads are divided into those inside forest, at forest borders, in open rural areas and in urban areas (and possibly city centers). The terrain relief can be used in assessing the plausibility of object parameters such as horizontal and vertical curvature. Each subclass of the above two layers has appropriate attributes, a respective knowledge/rule database, and processing methods. The processing proceeds from the easiest subclasses to the most difficult ones.

Our 3-D road extraction system makes full use of the available information about the scene and contains a set of image analysis tools. The management of the different information and the selection of image analysis tools are controlled by a knowledge-based system. The general strategy is shown in Figure 3-2.
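As a minimal illustration of such per-subclass rules, the value ranges and plausibility check below can be encoded as a lookup table. The class names, attribute names and numbers are invented for illustration; they are not the actual contents of the ATOMI knowledge base.

```python
# Hypothetical per-class rule table; value ranges are illustrative only.
ROAD_RULES = {
    "highway": {"width_m": (15.0, 40.0), "max_curvature": 0.005, "has_marks": True},
    "1_class": {"width_m": (7.0, 15.0),  "max_curvature": 0.02,  "has_marks": True},
    "5_class": {"width_m": (2.0, 5.0),   "max_curvature": 0.1,   "has_marks": False},
}

def plausible(road_class, measured_width, measured_curvature):
    """Check a road hypothesis against the rules stored for its class."""
    rules = ROAD_RULES[road_class]
    lo, hi = rules["width_m"]
    return lo <= measured_width <= hi and measured_curvature <= rules["max_curvature"]
```

Such a table would be refined as image analysis confirms or contradicts the stored ranges.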

[Figure 3-2 shows a flowchart: a controller coordinates cue extraction from the stereo colour aerial images (2-D and 3-D straight edges, road regions, shadows, road marks), DSM/DTM analysis, the road database and knowledge base derived from the L+T geodatabase (road class, road type, road marks, landcover, geometry, width, length, horizontal and vertical curvature, topology, road design rules and other knowledge), cue combination, 2-D and 3-D interaction, road reconstruction with results and accuracy estimation, and road network generation.]

Figure 3-2. General strategy for road network extraction.

The initial knowledge base is established from the information extracted from the VEC25 data and from road design rules. It includes information for each individual road as well as for the topology of the whole road network, as listed in Figure 3-3, where (a) describes each road and (b) the road junctions of the VEC25 road network. The Z values of the coordinates in (a) come from the DTM or DSM data. (a) and (b) are linked via road_id. The item connect_type in (b) indicates whether a road enters or leaves the junction. The detailed procedures for deriving this information from the VEC25 data can be found in Appendix B.

(a)
road_id   coordinates   class     landcover
1         x,y,z ...     2_class   rural
2         x,y,z ...     3_class   residential
3         x,y,z ...     5_class   rural

(b)
junction_id   road_id   connect_type
1             1         1
1             2         1
1             3         -1

Figure 3-3. Derived information from the existing geodatabase.

The initial knowledge thus offers a geometric, topological and contextual description of the road network in the scene. This information is organized in object-oriented multiple object layers, i.e. roads are divided into various subclasses according to road type, land cover and terrain relief. It provides global information on the road network topology and a local description for each road. We therefore avoid developing a general road model; instead, a specific model can be derived for each road segment. This model provides the approximate location of a road in the scene, as well as the road attributes, such as road class, land cover, presence of road marks, and the possible geometry (width, length, horizontal and vertical curvature, and so on). The knowledge base is automatically updated and refined using information gained from image analysis.

Road extraction then proceeds from the easiest subclasses to the most difficult ones, guided by the knowledge base. Each road is processed with the methods appropriate to its model: certain features and cues are extracted from the images, and the road is extracted by a proper fusion of multiple cues. This fusion provides not only complementary but also redundant information, to account for errors and incomplete results. With stereo images, the system makes an early transition from 2-D image space to 3-D object space by image matching, so road hypotheses can be generated directly in 3-D object space right from the beginning. This not only enables more realistic geometric criteria for creating hypotheses, but also largely reduces the search space and speeds up the process. Since neither 2-D nor 3-D procedures alone are sufficient to solve the road extraction problem, we extract the road network through the mutual interaction of 2-D and 3-D procedures.
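The road and junction records of Figure 3-3 can be sketched as two linked data structures along the following lines. The field names are assumptions for illustration; only the layout (records linked via road_id, with connect_type marking entering/leaving roads) is taken from the figure.

```python
from dataclasses import dataclass

@dataclass
class RoadRecord:
    road_id: int
    coordinates: list        # list of (x, y, z) vertices; z taken from DTM or DSM
    road_class: str
    landcover: str

@dataclass
class JunctionRecord:
    junction_id: int
    road_id: int             # links to RoadRecord.road_id
    connect_type: int        # +1: road enters the junction, -1: road leaves it

def roads_at_junction(junction_id, junctions, roads):
    """Collect the road records meeting at one junction via the road_id link."""
    ids = [j.road_id for j in junctions if j.junction_id == junction_id]
    return [r for r in roads if r.road_id in ids]
```

A query such as `roads_at_junction` is the kind of topological access the knowledge base must support when checking network consistency.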
In summary, the main steps of road extraction are: building up the knowledge base for each VEC25 road; finding 3-D straight edges in the vicinity of the VEC25 road; multispectral classification of the images; separation of above-ground and ground objects using the DSM and DTM; extraction of other features and cues according to the road class at hand; combination of the various cues, guided by the knowledge base, to find plausible groups of road primitives and link them to extract the road; and updating the road database and refining the knowledge base with the extracted results. The algorithms and features of our 3-D road extraction system are outlined in Figure 3-4. Except for the procedures deriving information from VEC25, all functions from feature and cue extraction to road reconstruction are implemented in a standalone software package running on SGI platforms. Figure 3-5 shows the user interface, through which the system reads input data and displays the stereo imagery, old road vectors, intermediate processing results and reconstructed roads. The extracted road centerlines, together with the computed road attributes such as lengths and widths, are saved in 3-D Arc/Info Shapefile format, which is readily imported into existing GIS software. The software can run in batch and interactive modes. In the latter case, an operator can select a road and perform the reconstruction step by step, which allows the quality of the intermediate results to be checked. This interactive operation also forms the basis for post-editing.
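As a minimal illustration of the cue combination step, the sketch below fuses per-cue evidence scores by a weighted sum and a threshold. The cue names, weights and threshold are invented for illustration; the actual system combines cues with rules from the knowledge base rather than a single fixed formula.

```python
# Hypothetical weights over the basic cues; they sum to 1.0 here for clarity.
CUE_WEIGHTS = {"edges_3d": 0.35, "road_region": 0.25, "road_marks": 0.25, "on_ground": 0.15}

def road_support(cue_scores):
    """Weighted combination of per-cue evidence in [0, 1]; missing cues score 0."""
    return sum(w * cue_scores.get(name, 0.0) for name, w in CUE_WEIGHTS.items())

def accept_hypothesis(cue_scores, threshold=0.5):
    """Accept a road hypothesis only if the combined evidence is strong enough."""
    return road_support(cue_scores) >= threshold
```

The redundancy noted above shows up here directly: a road missing one cue (e.g. occluded marks) can still be accepted if the remaining cues carry enough weight.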

[Figure 3-4 shows, as a flowchart: feature extraction from the stereo colour aerial images (zebra crossings, 2-D straight edges, 2-D road marks); image matching (3-D straight edges, 3-D road marks); image classification (road regions, shadows, vegetation, buildings); nDSM generation from DSM/DTM (ground and above-ground areas); and information derivation from the geodatabase with roads and coarse landcover classes (road geometry, road attributes, road topology, landcover).]

Figure 3-4. Input data, processing methods, and features and cues used in our road extraction system.

Figure 3-5. User interface of the developed 3-D road extraction system.

4. FEATURE AND CUE EXTRACTION

Image analysis systems for road extraction must have highly developed image processing and feature extraction abilities in order to process digital images and measure the various properties of features. In our system, many different features have to be computed to characterize a road: road sides, road surface, road centerline and road marks. The quality of these measurements, obtained by image processing and feature extraction, has a crucial effect on the higher-level recognition process; the early stages of feature extraction must therefore be as accurate as possible to obtain good results. In this chapter, several developed algorithms are described. They are used in our system to extract the different features and cues that support road extraction. Finding 3-D edges is a crucial component of our system; the generation of 3-D edges is explained in Section 4.1, with its main steps of edge extraction, edge matching and 3-D edge generation. We then analyse the colour image data to find the road regions. This is realised by an image segmentation algorithm, whose details are reported in Section 4.2. Since DTM and DSM data are available, we analyse this height information in Section 4.3 to separate above-ground objects from ground objects in support of road extraction. Road marks and zebra crossings are good indications of the existence of certain roads, such as highways, first-class roads and some roads in urban areas. The methods for road mark and zebra crossing detection are presented in Section 4.4.

4.1 3-D Straight Edge Generation

3-D edges have been widely used in digital photogrammetry and computer vision to automatically recognise and extract man-made objects from aerial imagery. However, generating 3-D edges from images is a difficult task: straight edges have to be extracted from the images, and the correspondences of the extracted edges have to be found by stereo matching. Although edge extraction and edge matching are two fundamental processes in digital photogrammetry and computer vision, they remain active research topics due to the complexity of image data. In the following, we first review previous work on edge extraction and edge segment matching in Section 4.1.1, then propose our scheme for the generation of 3-D edges in Section 4.1.2. The algorithms for edge extraction and edge matching are described in Sections 4.1.3 and 4.1.4, respectively. Section 4.1.5 reports the method for computing 3-D straight edges from the matched 2-D edge segments, and Section 4.1.6 presents and discusses the results produced by our system.

4.1.1 Review of Related Work

Straight edges are the most popular features in images of man-made objects. Since they often correspond to important properties of 3-D objects, such as object boundaries, and have a simple mathematical representation and meaningful attributes, they have been widely used as the key features in many applications, such as man-made object extraction from high resolution aerial images (Airault et al., 1994; Lin et al., 1995; Haala and Hahn, 1995; Bignone et al., 1996; Baumgartner et al., 1997; Fischer et al., 1997), object recognition (Bellaire, 1995; Beis and Lowe, 1997; Lanser et al., 1997) and pose estimation (Lowe, 1992; Kumar and Hanson, 1994; Zhengyou, 1995; Taylor and Krieger, 1995; Lanser, 1997). Although effective feature extraction remains a non-trivial problem, straight edges are generally easier to detect and extract robustly and automatically from images, thus simplifying the correspondence problem, reducing computational cost and limiting matching ambiguities. Further, the position and orientation of a straight edge are easier to determine to subpixel accuracy (Liu and Huang, 1988; Mitiche and Haberlrich, 1989; Schwermann, 1994). Finally, straight edge features capture more image information than points, and straight edge-based algorithms use the redundant edge points to obtain increased accuracy of the measured quantities. For these reasons, a vast amount of literature is devoted to straight edge extraction and matching.
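To illustrate how redundant edge points yield subpixel position and orientation, a straight edge can be fitted by a principal-axis (total least squares) fit: the centroid gives the position and the dominant eigenvector of the point covariance gives the direction. This is a generic textbook technique, sketched here for illustration, not the specific estimator of the cited works.

```python
import numpy as np

def fit_straight_edge(points):
    """Fit a straight line to edge pixels; points is an (N, 2) array-like.

    Returns the centroid (subpixel position on the line) and the line's
    orientation angle; the dominant eigenvector of the covariance matrix of
    the point cloud is the edge direction (its sign is arbitrary).
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    direction = eigvecs[:, np.argmax(eigvals)]
    angle = np.arctan2(direction[1], direction[0])
    return centroid, angle
```

Because every edge pixel contributes to the fit, noise in individual pixels averages out, which is the accuracy gain referred to above.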

4.1.1.1 Edge Extraction

The Hough transformation is probably the most widely used approach (Duda and Hart, 1972; Ballard, 1981). In Hough techniques, edge points vote (in a transformed parameter space) for all of the straight lines to which they could belong; line parameters receiving sufficient votes correspond to lines in the image. There are two major difficulties with the Hough method: it does not yield any information on the location of the line segment being considered, and it responds equally to a line segment composed of connected points and to a set of collinear but non-connected points, as long as they contain the same number of points. An alternative approach, proposed by Zucker (1977), is based on relaxation techniques; it uses neighbouring edge pixels to weaken or strengthen the probability that an edge pixel belongs to a boundary.

Montanari (1971) proposes a method to detect edge segments based on dynamic programming optimizing a merit function; the merit figure is based on image intensity, curve curvature and curve length. The algorithm detects edges in a synthetic image in which some edges are hardly visible due to noise. This method is extended by Ballard and Sklansky (1973), who use gradient magnitude, gradient direction and a closure measure in the evaluation function; they apply the method with directional search to extract circular tumours in chest radiographs.

The method of Nevatia and Babu (1980) thins and thresholds edge points, and links them into curves that are then approximated by sets of straight edge segments. Weiss and Boldt (1986), in contrast, demonstrate a method which produces straight edge segments by grouping smaller segments together, progressively building longer ones.
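Returning to the Hough approach discussed above, its voting scheme in the rho–theta parameterization can be sketched as follows. The discretization (numbers of theta and rho bins, rho range) is arbitrary here, chosen only for illustration.

```python
import numpy as np

def hough_votes(points, n_theta=180, rho_max=100, n_rho=200):
    """Accumulate Hough votes: each point votes for every line
    rho = x*cos(theta) + y*sin(theta) passing through it."""
    acc = np.zeros((n_theta, n_rho), dtype=int)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    for x, y in points:
        rhos = x * np.cos(thetas) + y * np.sin(thetas)
        bins = np.round((rhos + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        valid = (bins >= 0) & (bins < n_rho)
        acc[np.nonzero(valid)[0], bins[valid]] += 1
    return acc, thetas

# A peak in acc identifies a dominant line, but, as the difficulties above
# note, not where along that line the supporting segment lies.
```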

Chien and Fu (1990) argue that Ballard and Sklansky's evaluation function is designed too specifically for one type of application. They develop a more general criterion function using both local (e.g. gradient) and global (e.g. curvature) components, minimize it with a modified decision-tree search, and apply the technique to determine cardiac boundaries in chest X-ray images.

A widely used edge detector is proposed by Canny (1986). He detects edges by first smoothing the image with an isotropic 2-D Gaussian and then extracting the gradient at each position by computing finite differences. A non-maximum suppression operation on the gradient magnitude is usually applied to yield a thinned representation of the edges. Edge pixels are then grouped into segments by a hysteresis thresholding technique.

Burns et al. (1986) extract straight edge segments by grouping pixels into so-called line-support regions based on the similarity of their gradient orientations. The structure of the intensity surface is then used to determine the location and properties of the edge segments.

In Boldt (1989), the edge pixels are detected using zero-crossings of the Laplacian, and the edge segments are extracted with a hierarchical grouping algorithm. The grouping process proceeds iteratively, using measures of collinearity and connectedness; each iteration yields a set of increasingly longer edge segments. The final segment features can be filtered according to length and contrast values supplied by the user.

In Nelson (1994), the object boundaries are extracted using a stick-growing method. He utilizes both gradient magnitude and direction information, and incorporates explicit linear and end-stop terms. These terms are combined non-linearly to produce an energy landscape in which local minima correspond to linear features that can be represented as line segments.
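The grouping idea of Burns et al. can be sketched as follows: pixels are bucketed by quantized gradient orientation, and connected pixels sharing a bucket form one line-support region (a candidate straight-edge segment). This is a simplified sketch; the original method also uses overlapping bucket partitions and fits the intensity surface within each region.

```python
import numpy as np
from collections import deque

def line_support_regions(image, n_buckets=8, min_magnitude=1e-3):
    """Label 4-connected groups of pixels with similar gradient orientation."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    buckets = (orientation / (2 * np.pi) * n_buckets).astype(int) % n_buckets
    mask = magnitude >= min_magnitude          # ignore flat areas
    labels = np.zeros(image.shape, dtype=int)
    next_label = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        next_label += 1                        # flood-fill a new support region
        labels[start] = next_label
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= rr < image.shape[0] and 0 <= cc < image.shape[1]
                        and mask[rr, cc] and not labels[rr, cc]
                        and buckets[rr, cc] == buckets[r, c]):
                    labels[rr, cc] = next_label
                    queue.append((rr, cc))
    return labels
```

Each labelled region can then be summarized by a fitted line, as in the principal-axis fit sketched earlier in this section.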

4.1.1.2 Edge Segment Matching

A vast variety of image matching techniques has been developed over the last few decades. Detailed reviews of image matching in computer vision can be found in Barnard and Fischler (1982), Faugeras (1993) and Jones (1997); Dhond and Aggarwal (1989) review structure from stereo from 1981 to 1989. Reviews of image matching in digital photogrammetry can be found in Hannah (1988), Li (1989), Baltsavias (1991) and Heipke (1996). In this section, only straight line (straight edge segment) matching methods are discussed in more detail.

Medioni and Nevatia (1984) develop a system which initially chooses potential line matches based on similarity of orientation and determines the best matches through an iterative relaxation strategy designed to enforce the continuity constraint. The intent of their system is to register image pairs which differ only by translational motion of the viewer relative to the whole scene; orientational similarity and continuity are effective constraints for this class of images. The same authors describe a similar technique for matching straight lines in stereo image pairs (Medioni and Nevatia, 1985): potential matches are limited to those satisfying the epipolar constraint and exhibiting similar contrast and orientation, and the quality of the matches is re-evaluated iteratively, in a manner similar to relaxation, using the continuity constraint.

Price (1984) describes a technique for matching closed contours, where the contours are composed of sequentially numbered sets of straight line segments. The lines are matched using constraints imposed by their neighbouring segments on the contour: the angle between neighbouring segments in one image must be similar to the angle between the matching segments in the other image, and line segments which are sequential in one image must match segments which are sequential in the other.
A straight line matching algorithm relying only on the similarity of a set of descriptive features of the two lines is demonstrated in McIntosh and Mutch (1988). The features are computed from the image regions from which the lines are extracted. A match function, defined on the feature differences between two lines, determines the strength of their similarity; pairs which are mutually best matches under this function are considered corresponding lines. The difficulty lies in tuning the weights between the features in the match function, and repeated patterns cannot be handled. Another problem occurs when a single line in one image corresponds to two or more lines in the other image, which usually happens when a line in one image is fragmented into two or more shorter segments in the other.

Ayache and Faverjon (1987) propose a line matching method which first selects a small set of left-to-right matches and then attempts to grow this set to include nearby features, using the heuristic that the disparity of neighbouring image features varies only slowly. The method is implemented as a depth-first tree search with a hypothesize-and-test strategy. It is elegant because it avoids exhaustive search, but there is no guarantee that it finds either the largest or the best set of available matches.

Line segment matching through feature grouping and maximal cliques is presented in Horaud and Skordas (1989). For each line segment in one image, they compute the range of the expected geometric characteristics of its potential assignments in the other image, obtaining a number of left-feature-to-right-feature pairs. These pairs constitute the nodes of a correspondence graph, whose arcs (relations between nodes) are built taking into account both compatibilities and incompatibilities between nodes.
This is done by applying a set of rules derived from a feature grouping process. The stereo correspondence problem is then solved by finding maximal cliques in the graph that maximize a benefit function, defined using the similarity measure between the lines within an assignment.

In Sherman and Peleg (1990), the point matching constraints are reformulated and applied to contour segment matching. These constraints are used to select potential matches for each contour segment, and matching decisions are then made according to local support accumulated from the neighbourhood. The strategy is to accept matches in an incremental manner: the most probable candidates are matched first and are used, with the aid of the reformulated constraints, to restrict other potential matches in their neighbourhood. Neighbourhood support is based on the ordering and disparity gradient limit constraints, the latter quantified so that neighbouring matches can influence each other's relative strengths.

Nasrabadi (1992) matches curve segments using a coarse-to-fine strategy. At each level, the local feature characteristics of a curve are represented using the generalized Hough transformation. Potential matches are found with a similarity measure defined by comparing the Hough accumulator values of the segments in the left and right images; a fuzzy relaxation technique is used to resolve ambiguities. To bring the curve segments at the finer level into correspondence, the pixel disparities from the coarse level are used to guide the search process.

Baillard et al. (1992) propose to match straight lines over multiple views (>2), using the photometric neighbourhood of the lines for disambiguation. The epipolar geometry is used to provide a point-to-point correspondence on the putatively matched line segments over two views.
The similarity of the lines' neighbourhoods is then assessed by cross-correlation at the corresponding points. For every putative pair with a high correlation score, they predict the position of the line in the third image using the trifocal tensor (Spetsakis and Aloimonos, 1990; Sha'ashua, 1994; Hartley, 1995), and then verify the putative matches using geometric and photometric constraints.

Bignone (1995) matches straight edge segments in multiple images. He does not perform edge extraction in all images; instead, edge segments are extracted in one image only, and for each segment the correspondences in the other images are sought by maximizing an edginess measure along the epipolar strip. The matches are then merged, and only the consistent ones are kept by applying geometric and photometric constraints.

Cho (1996) uses straight line matching for automatic orientation. The correspondence problem of straight lines across images is solved through relational matching, achieved by an A* search with heuristics such as unit ordering and a modified forward checking.

A method for line matching between two images of an image sequence is proposed by Chang and Aggarwal (1997). They consider feature matching between two views as a "temporal grouping" process, in addition to the traditional spatial groups established by perceptual grouping in a single image. The perceptual grouping hypotheses are evaluated using a statistical inference paradigm. The matching algorithm uses a relaxation labelling paradigm with two cooperative relaxation processes: in the temporal grouping process, spatial groups are used to compute support for the temporal matching, while in the spatial grouping process, temporal groups are used to verify spatial neighbourhood relations.

Raymond and Ho (1998) propose a multi-level dynamic programming method for stereo line matching.
They first compute a local similarity measure for each line pair by comparing geometric line properties. At the first level, the correspondences for pairs with a high local similarity measure are determined by dynamic programming, with an objective function that maximizes the local similarity measures among the pairs. These matched segments are used to compute a global similarity measure and to support the matching of the remaining line segment pairs at the next level, where the objective function is redefined to maximize both the local and the global similarity measures.

A line feature matching technique based on an eigenvector approach is proposed in Sang et al. (2000). They apply a structural compatibility test to reduce the number of possible matches and generate a set of finite candidate models by combining line segments. They then apply modal analysis, in which Gaussian-weighted proximity matrices for the reference model and the candidate models are constructed to record the relative distance and angle information between the line features of each model. The final matches are determined by comparing the modes of the proximity matrices of the two models. This method is non-iterative and does not depend on initial conditions.

Bigand et al. (2000) use the fuzzy integral for line segment matching; the method primarily aims at industrial image analysis. The matches for the line segments are determined by a similarity constraint which is associated with a local matching process and computed using a fuzzy integral. By using fuzzy marginal evaluations of the attributes and the fuzzy measures, distortions introduced by partial occlusion can be handled.

Shao (2000) presents a method to match segment features in multiple images. Multiple-view epipolar geometry is used to reduce the search space and the candidates.
The edge segments with strong gradient magnitude are processed first, using a cross-correlation method to provide a set of initial matches. These initial matches are filtered through a relaxation procedure and subsequently used to predict missing matches caused by occlusions, image noise, or simply improper thresholds in the feature extraction process. The prediction is realized by a local affine transformation determined from the initial matches. Newly found matches immediately become supporting evidence for neighbouring feature correspondences, and remaining ambiguities are resolved by the relaxation procedure.

Straight lines, together with other image features such as points and regions, are also used as primitives in structural matching. Wang (1994) develops a structural matching method using lines and points: the best match between the structural descriptions of the left and right images is found by maximum likelihood estimation. He applies this technique to non-metric images as well as to SPOT and MOMS remotely sensed images.

Fradkin and Ethrog (1994) first generate a feature hierarchy consisting of line segments, parallel segments and closed polygons. The hierarchy is described as an attributed relational graph, where features are represented by nodes and the links between nodes correspond to feature relations. A top-down matching algorithm utilizing the maximal clique technique propagates matching results through the hierarchy levels, employing various hierarchical and topological relationships established between the features.

A multi-primitive hierarchical stereo analysis system is demonstrated in Marapane et al. (1994); it consists of three integrated subsystems: a region-based analysis module, a linear segment-based analysis module, and an edge pixel-based analysis module.
Stereo analysis is performed in multiple stages, incorporating multiple primitives and utilizing a hierarchical control strategy. Results at higher levels of the hierarchy are used as guidance at lower levels. Straight line matching is also used to match image data with 2-D or 3-D models for various applications, such as object recognition and localization. This kind of work is not reviewed here; references are given in Bhanu, 1984; Faugeras and Hebert, 1986; Grimson and Lozano-Perez, 1987; Breuel, 1989; Stein and Medioni, 1992; Christmas et al., 1995; Williams et al., 1997; etc.

4.1.2 Overview of Strategy

In this section, we present our scheme for the generation of 3-D straight edge segments. The strategy is shown in Figure 4-1. The main objective is to derive robust and efficient methods that process raw image data to extract edge segments, find the edge correspondences across images, and transfer them to 3-D object space. The scheme is designed as a bottom-up process, in which available information and organizational processes are introduced at several layers of processing.

Figure 4-1. Scheme for the generation of 3-D straight edge segments. (Processing pipeline: input image pair → image preprocessing → straight edge segment extraction → determination of search space → computation of similarity measures → matching pool of possible matches → structural matching by probability relaxation → 3-D edge computation and straight line fit, yielding the 3-D straight edges.)

The core of the scheme consists of two main components: edge extraction and edge segment matching. They are two separate processes in the system; each of them is important and possesses particular features. On the other hand, the two stages are related to each other in the sense that the feature extraction and the computation of the feature attributes must be such that the second stage, the correspondence finding, is easy, insensitive to errors and precise.

Edge extraction: The stereo images are first processed by a Wallis filter (Baltsavias, 1991). The filter enhances the edge features in the images and thus facilitates the edge extraction process. Furthermore, since the filter is applied to both stereo images with the same parameters, the naturally occurring brightness and contrast differences are corrected. Edge detection is then conducted in the filtered images using the Canny operator. Straight edge segments are extracted based on the procedures developed by Henricsson (1996). Thus, for each straight edge segment, its geometrical attributes, i.e. position, length and orientation, are obtained. With color image data, we are also able to compute the photometric and chromatic statistics in the left and right regions of the edge. All this information is used in the subsequent matching process.

Edge matching: After the edge segments are extracted from the images, a process is started to find the correspondences for the edge segments. In order to reduce the computational cost and enhance the robustness, we employ a two-phase scheme. First, we look in a restricted space for candidates for each edge segment; this search space is defined using the epipolar geometry constraint and the available rough DTM or DSM data. The comparison with each candidate is made using the edge attributes. A matching score is computed as a weighted combination of various criteria. The candidates as well as the matching scores are stored in the so-called matching pool. The algorithm then exploits edge structure information in the images to achieve a global optimization; therefore, not only the individual edge segments, but also the edge structures are matched across images.
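The Wallis filter mentioned above normalizes the local mean and standard deviation of the image towards target values, which equalizes contrast across the stereo pair. The following is a minimal, deliberately unoptimized sketch of such a local normalization; the window size, target statistics and blending factors are illustrative assumptions, not the parameters used in the thesis, and a production version would use integral images or box filters for speed.

```python
import math

def wallis_filter(img, win=31, target_mean=127.0, target_std=50.0,
                  brightness=0.5, contrast=0.8):
    """Wallis-style local contrast enhancement (simplified sketch).

    Each pixel is remapped so that the local window mean and standard
    deviation are pulled towards the target values.  `brightness` and
    `contrast` in [0, 1] blend between the original and target statistics.
    """
    h, w = len(img), len(img[0])
    r = win // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # gather the local window, clipped at the image border
            vals = [img[yy][xx]
                    for yy in range(max(0, y - r), min(h, y + r + 1))
                    for xx in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            gain = contrast * target_std / (
                contrast * s + (1 - contrast) * target_std + 1e-9)
            offset = brightness * target_mean + (1 - brightness) * m
            out[y][x] = (img[y][x] - m) * gain + offset
    return out
```

Because the same target statistics are imposed on both images, corresponding regions end up with comparable brightness and contrast, which is the property the matching stage relies on.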

4.1.3 Edge Extraction

In our system, we use the Canny edge operator (Canny, 1986) to detect edges. A technique developed in Henricsson (1996) is employed to group the detected edge pixels into curves which are finally approximated by a series of straight edge segments. This method was chosen because it uses a large amount of information to achieve redundancy, which also accounts for the high quality of the results, thus providing a more accurate definition of the image features. In addition, the extracted straight edges are attributed with the image information in the regions beside the edge segments (flanking regions). These edge attributes are useful, as demonstrated in Henricsson and Baltsavias (1997), where they are used to group the edge segments that belong to the same buildings. After the edge map is obtained by applying the Canny edge operator, a threshold is computed to discard non-edge pixels or unimportant image features. This is done using an algorithm developed in Voorhees and Poggio (1987) by estimating the noise in the image. The edge pixel aggregation is a sequential process: it aggregates the significant contours before the weaker ones, starting from the edge pixels that are highly suitable to serve as seeds for the process. The suitability value is obtained using an approach similar to Noble (1992). During the linking phase, a directional search mask is applied, which extends two pixels from the current position; thus smaller gaps are bridged. The method uses extracted key-points as one of the stop conditions. The key-points are defined as strong 2-D intensity variations, including line ends, corners, junctions, and others that reflect prominent events in the course of a boundary. This property thus allows dividing contours into meaningful sub-segments. The extracted contours are then approximated by a series of straight edge segments using a technique described in Section 4.1.5.2.
The flanking regions are constructed for each extracted straight edge segment, and the attributes of the flanking regions are computed after an RGB to Lab color space conversion. The photometric region attributes are computed by analysing the L component, while the chromatic region attributes are primarily derived from the a and b components. It should be noted that the effects of outliers (disturbances) in the flanking regions should be removed. This is realized by applying a minimum volume ellipsoid (MVE) estimator (Rousseeuw and Leroy, 1987). This estimator returns the estimated mean vector, the scatter matrix and a list of samples that are considered as inliers. In summary, we extract the straight edge segments from the images using the above approach. Each edge segment has a set of attributes, describing its geometrical, photometric and chromatic properties in the image. The attributes are listed below: • Start and end points

• Orientation

• Length

• Median values of L, a, b bands

• The scatter matrix of a, b
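The attribute set above can be held in a simple record per segment, with the geometric attributes derived from the end points. The following sketch uses illustrative names of our own choosing; it only mirrors the information listed in the text, not the thesis' actual data structures.

```python
import math
from dataclasses import dataclass

@dataclass
class EdgeSegment:
    """Container for the attributes of one straight edge segment.

    Field names are illustrative; the text lists the same information:
    start/end points, orientation, length, Lab medians, a-b scatter matrix.
    """
    x1: float
    y1: float
    x2: float
    y2: float
    median_Lab: tuple = (0.0, 0.0, 0.0)           # median L, a, b of a flanking region
    scatter_ab: tuple = ((1.0, 0.0), (0.0, 1.0))  # 2x2 scatter matrix of (a, b)

    @property
    def length(self) -> float:
        return math.hypot(self.x2 - self.x1, self.y2 - self.y1)

    @property
    def orientation(self) -> float:
        # angle of the segment in radians, measured from the image x-axis
        return math.atan2(self.y2 - self.y1, self.x2 - self.x1)
```

Deriving length and orientation from the end points keeps the stored attributes consistent by construction.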

4.1.4 Straight Edge Matching

In this section, we present the method we developed for edge segment matching. The method exploits the rich attributes of the edge segments and the geometrical structure information of the edge segments in the images. The rich attributes include the geometrical description of the edge segments and the photometric information in the flanking regions of the edges. Epipolar constraints as well as the existing height information are used to restrict the search space. A matching score for a possible match is computed by comparing the attributes of the edge segments. The method does not rely on the matching score alone to deliver the results; instead, the matching scores are used as prior information in the subsequent structural matching. Therefore, both the individual edge segments and the structures of the edge segments across the images are matched. We thus avoid the inconsistency problems that arise when thresholds are used to decide upon a match. The structural matching is realized through probability relaxation.

4.1.4.1 Construction of Match Pool and Computation of the Similarity Score

1. Determination of search space: Obviously, without constraints applied, in order to find a correspondence for an edge segment in one image, all the edge segments in the other image have to be considered. This results in a serious computational problem; in addition, the quality and reliability of the matching results are largely reduced if geometric constraints are not employed. In our system, we use epipolar geometry to limit the search space. The epipolar line for a point is computed with the known orientation parameters. Therefore, the two end points of an edge segment in the left image generate two epipolar lines in the right image, and the search for the correspondence(s) of this edge segment can be limited to the band defined by these two epipolar lines. Figure 4-2 illustrates this idea. For an edge segment pq extracted in the left image, ep and eq are the corresponding epipolar lines of the points p and q in the right image. The correct match(es) for pq in the right image should lie in the band defined by ep and eq. If a DTM or DSM is available, the search space for the correspondence can be further restricted (Baltsavias, 1991). Let us take point p as an example. With the known image orientation information, we can establish the image ray pO1 on which the position of p in object space must lie. By intersecting the image ray with the DTM/DSM, we obtain P, the projection of p on the DTM/DSM. Since the DTM/DSM data are not accurate and have a certain error, the correct position of p in object space should lie below or above P, i.e. between P1 and P2 along the image ray pO1. If we back-project P1 and P2 to the right image, the image rays P1O2 and P2O2 will intersect the epipolar line ep at p1 and p2 respectively. The correspondence of the point p should lie between p1 and p2. Similarly, for point q, its correspondence can be found between q1 and q2.


Figure 4-2. The epipolar band p1p2q2q1 defines the search space for the edge segment pq.

Therefore, the search space for the correspondence(s) of the edge segment pq is restricted to the quadrilateral p1p2q2q1 in the right image. Any edge segment included (even partially) in this quadrilateral is a possible candidate, if it intersects the two epipolar lines within this quadrilateral. For example, in Figure 4-2, the edge segments i, j, k are accepted as candidates and will be compared with the edge segment pq in the left image for similarity measurement, while r is rejected because it intersects eq outside q1, q2. It can be seen from Figure 4-2 that the height of this search band decreases with decreasing edge length and decreasing orientation difference to the epipolar lines. Thus, the proposed technique is not applicable to edge segments that are parallel to the epipolar lines, since in this case p1, p2, q1, q2 lie on a single epipolar line (see Figure 4-3). In this case, we define the search space by a rectangle around the epipolar line enclosing p1, p2, q1, q2 (the dashed rectangle). The edge segments included in this rectangle and having small directional differences to the epipolar line are selected as candidates. In Figure 4-3, the edge segments i and j are taken, while k is rejected. In our implementation, the width of the rectangle and the threshold for the directional difference are set to 10 pixels and 5 degrees respectively.
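The core of the band test can be sketched as follows. We assume the two epipolar lines are already available as coefficient triples (a, b, c) of a·x + b·y + c = 0 (in the system they are derived from the orientation parameters); the sketch below replaces the full quadrilateral/intersection test of Figure 4-2 by the simpler check that a candidate's endpoints lie between the two lines, which is an assumption of this illustration, not the thesis' exact test.

```python
def line_eval(line, pt):
    """Signed value of a*x + b*y + c for line (a, b, c) at point (x, y)."""
    a, b, c = line
    x, y = pt
    return a * x + b * y + c

def in_epipolar_band(ep_p, ep_q, candidate):
    """Accept a candidate segment ((x1, y1), (x2, y2)) if both endpoints
    lie between the epipolar lines ep_p and ep_q of the left segment's
    endpoints.  Simplified stand-in for the quadrilateral test."""
    for pt in candidate:
        s_p, s_q = line_eval(ep_p, pt), line_eval(ep_q, pt)
        # between the two lines: the two signed values have opposite signs
        if s_p * s_q > 0:
            return False
    return True
```

A point exactly on one of the lines evaluates to zero and is accepted, which matches the inclusive band of Figure 4-2.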


Figure 4-3. Search space for an edge segment parallel to the epipolar line.

2. Computation of matching scores: For each pair of edge segments that satisfies the above epipolar constraints, their rich attributes are used to compute a similarity score. The similarity score is a weighted combination of various criteria. The similarity measurement for length is defined as the ratio of the minimum length of the two edges to the maximum one:

V_len = min(L_len, R_len) / max(L_len, R_len)    (4.1)

The angles of the edge segments must be compared in a slightly different way, because only the absolute difference between the two angle values is relevant. This difference is compared to the expected maximum difference, in a ratio form analogous to equation (4.1). The maximum value T_ang is a predefined threshold for the maximum angle difference. Since in aerial images the κ angle has a large effect on the edge angles, we rotate the edge segments by κ in the left and right image respectively before computing the similarity score. The similarity score for the angles of the edge segments is given by

V_ang = ω · (T_ang − |L_ang − R_ang|) / T_ang    (4.2)

ω is a weight related to the lengths of the edge segments, given by ω = ω_l · ω_r, where ω_l and ω_r are computed for the edge segments in the left and right image respectively. They are defined as a piecewise linear function (see also Figure 4-4):

ω = 0                        if len < 5
ω = (len − 4) / (t − 4)      if 5 ≤ len < t
ω = 1                        if len ≥ t

where t is set to 10.0 in our implementation.
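Equations (4.1), (4.2) and the piecewise weight can be sketched directly; the default angle threshold below is an assumed value for illustration only, since the thesis does not state T_ang at this point.

```python
def v_len(l_len, r_len):
    """Length similarity, equation (4.1)."""
    return min(l_len, r_len) / max(l_len, r_len)

def weight(length, t=10.0):
    """Piecewise-linear length weight of Figure 4-4 (0 below 5 px, 1 above t)."""
    if length < 5:
        return 0.0
    if length >= t:
        return 1.0
    return (length - 4.0) / (t - 4.0)

def v_ang(l_ang, r_ang, l_len, r_len, t_ang=15.0):
    """Angle similarity, equation (4.2); t_ang (degrees) is an assumed threshold."""
    w = weight(l_len) * weight(r_len)         # omega = omega_l * omega_r
    d = abs(l_ang - r_ang)
    return w * max(t_ang - d, 0.0) / t_ang
```

Clamping the numerator at zero keeps the score non-negative when the angle difference exceeds the threshold.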


Figure 4-4. Definition of the weight function.

The same weight is applied in the computation of the following flanking region similarity scores. We do not only compute a similarity score of the edge geometric properties, but also of the photometric and chromatic properties in the edge flanking regions. The photometric and chromatic edge region attributes include the median and standard deviation of the L band, and the median and scatter matrix of the chromatic components, i.e. the (a, b) data. For the computation of the matching scores of the photometric and chromatic attributes, the comparison with each candidate edge segment is made only in the common overlapping area, i.e. ignoring length differences and shifts between the edge segments. The common areas are determined using epipolar geometry. We first compare the medians of the L, a, and b bands in the left and right flanking regions for a pair of edge segments in the two images. The similarity measurement is defined as the ratio of the minimum median divided by the maximum one. For example, the left region median similarity measurement of the L band for an edge segment pair is computed as:

C_Ll = min[median(left image), median(right image)] / max[median(left image), median(right image)]    (4.3)

This computation is also applied to the right regions of the L band. Similarly, the median similarity measurements of the a and b bands are obtained. Then, we average these scores of the L, a, b bands into one region similarity measurement, i.e. we obtain the left and right region similarity measurements Cl and Cr for an edge pair. In our matching algorithm, we do not assume that the edge pair has the same contrast; instead, we only request that at least one side of the edge pair demonstrates similar brightness. Thus, if both Cl and Cr are less than a predefined value, the two edge segments are treated as different, and we stop computing similarity scores for them. Finally, the similarity scores for the medians Cl and Cr are multiplied with the weight, where the weight is defined as in Figure 4-4, i.e.

V_ml = ω · Cl    for the left region
V_mr = ω · Cr    for the right region    (4.4)

The chromatic property of a region in the (a, b) data is represented by the scatter matrix as

c = | c11  c12 |
    | c21  c22 |    (4.5)

We further describe this property as an ellipse, as illustrated in Figure 4-5. The two ellipses represent the scatter matrices in the left and right flanking regions of the edge l respectively. The shape of an ellipse is determined by its axes and orientation, derived from the scatter matrix. Thereby, the similarity of the region chromatic properties can be assessed by comparing the shapes of the respective ellipses. The orientation and the roundness of the ellipse are given by equation (4.6) and equation (4.7) respectively.


Figure 4-5. Ellipses representing the spread of a, b in the flanking regions of a straight edge segment l. The ellipses are described by their axes and orientations.

Dir = (1/2) · atan(2 · c12 / (c11 − c22))    (4.6)

Q = 4 · Det(c) / tr²(c)    (4.7)

The similarity measurement of the chromatic property is performed by comparing the orientations and roundnesses of the ellipses in the left and right regions of an edge segment pair. The similarity score of the ellipse orientation is computed with a form similar to equation (4.2). For the ellipse roundness, we use a form similar to equation (4.1).
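Equations (4.6) and (4.7) can be sketched as follows. Note one assumption of this sketch: it uses the two-argument atan2 variant so that the orientation is also defined when c11 = c22, which the single-argument form of equation (4.6) does not handle.

```python
import math

def ellipse_params(c):
    """Orientation (eq. 4.6) and roundness (eq. 4.7) of the scatter ellipse.

    c is a 2x2 scatter matrix ((c11, c12), (c21, c22)).  Roundness is 1.0
    for a circle and tends to 0 as the ellipse degenerates to a line.
    """
    (c11, c12), (c21, c22) = c
    direction = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    det = c11 * c22 - c12 * c21
    tr = c11 + c22
    roundness = 4.0 * det / (tr * tr)
    return direction, roundness
```

The roundness equals 4·λ1·λ2/(λ1 + λ2)² in terms of the eigenvalues, which is why it peaks at 1.0 for equal axes.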

All the scores from the comparison of the geometrical and photometric attributes of the edge segments lie between 0 and 1, and the total similarity score is the weighted average of all scores. The similarity score computation starts from the longer edge segments, while very short ones (< 5 pixels) are ignored.

4.1.4.2 Structural Matching with Probability Relaxation

After performing the similarity measurement computation, we construct a matching pool and attach a similarity score to each edge pair. However, one still has problems to determine the best matches. The difficulty lies in how to decide on a threshold and how to treat the case when a line is broken or partially occluded. In addition, matching using a very local comparison of line attributes does not necessarily give results that are consistent in a local neighbourhood. To solve these problems, we employ structural matching in our system. Structural matching establishes a correspondence from the primitives of one structural description to the primitives of a second structural description (Haralick and Shapiro, 1993). A structural description is defined by a set of primitives and their interrelationships, e.g. the structural description of a digital image consists of the image features and the relations among them. Many methods for structural matching were developed in the past (Vosselman, 1992). The early attempts rely on graph searching techniques with heuristic measures. More recent efforts are based on energy minimization using simulated annealing, mean field theory or deterministic annealing, and relaxation labelling. The pioneering work in relaxation labelling is normally credited to Waltz (1975), who considered the problem of line drawing interpretation studied earlier by Clowes (1971) and Huffman (1971). His formulation of the consistent labelling problem allows only the unambiguous interpretation of line segments. This is achieved by sequentially filtering out inconsistent label pairs on connected segments. This approach was made popular by Rosenfeld et al. (1976), who showed that Waltz's filtering can be carried out in parallel and could therefore be implemented as a network of processors, each associated with one object in the image. An extensive theoretical underpinning of the consistent labelling problem has been given by Mackworth (1977), Haralick et al. (1979), Henderson (1979), Shapiro et al. (1981), Nudel (1983) and others. An excellent survey can be found in Kittler and Illingworth (1985). A lot of research has been devoted to various aspects of relaxation labelling, such as edge and feature enhancement (Krishnakumar et al., 1990; Duncan et al., 1992; Hancock and Kittler, 1990; Alnuweiri and Kumar, 1991), object labelling (Udupa and Ajjanagadde, 1990; Grün and Wang, 1998), texture segmentation (Hsiao and Sawchuk, 1989), and shape and stereo matching (Goshtasby and Page, 1984; Ibison and Zapalowski, 1986; Vosselman, 1992; Christmas et al., 1995; Yaonan, 1994). In our system, the structural matching for edge segment correspondence is realized through probability relaxation. The structure of the edge segments extracted from an image is described as an attributed relational graph, where the nodes of the graph represent straight edge segments, and the links between the nodes represent the relations (see Figure 4-6; for an explanation of the relations, see also Figure 4-7 on page 48).


Figure 4-6. Graph representation of the structure of edge segments in an image.

We represent the straight edge segments in the left image as a set L, L = {li}, i = 1, 2, …, n, and the straight edge segments in the right image as a set R, R = {rj}, j = 1, 2, …, m. The mapping from the left graph to the right one is represented as T. Assuming the right type of mapping T, we seek the probability that li matches rj, i.e. the matching problem becomes the computation of a conditional probability P{li = rj | T}. Note that here the equals sign means "matches". Using the Bayes formula, we can write:

P{li = rj | T} = P{li = rj, T} / P{T}    (4.8)

By applying the total probability theorem, we obtain

P{li = rj | T} = [ Σ_{r1∈R} … Σ_{r(i−1)∈R} Σ_{r(i+1)∈R} … Σ_{rm∈R} P{l1 = r1, …, li = rj, …, T} ]
               / [ Σ_{r1∈R} … Σ_{ri∈R} … Σ_{rm∈R} P{l1 = r1, …, li = ri, …, T} ]    (4.9)

i.e. the numerator sums the joint probability over all assignments of the remaining segments with li fixed to rj, while the denominator sums over all assignments.

We assume that the relationship between (li, rj) and (lh, rk) is independent of the relations of the other pairs, and that (li = rj) does not by itself provide any information on (lh = rk) (Christmas et al., 1995). Factorizing the numerator and denominator in equation (4.9), we obtain:

P{li = rj | T} = P{li = rj} · Q(li = rj) / Σ_{h=1}^{m} P{li = rh} · Q(li = rh)    (4.10)

where

Q(li = rj) = Π_{h=1, h≠i}^{n} Σ_{k=1}^{m} P{T(li, rj; lh, rk) | li = rj, lh = rk} · P{lh = rk}    (4.11)

The value of Q expresses the support that is given to the hypothesized match (li = rj) by its neighbouring edge segments, taking into consideration the relations between them.
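The normalized update of equation (4.10) can be sketched as a single relaxation step over a probability table; the support values Q (computed from the compatibility function of equation (4.11)) are taken as given here, which is an assumption of this illustration.

```python
def relaxation_step(P, Q):
    """One probability-relaxation update in the form of equation (4.10).

    P[i][j]: current probability that left segment i matches right segment j.
    Q[i][j]: neighbourhood support for that match (from eq. 4.11).
    Returns the renormalized probabilities P[i][j]*Q[i][j] / sum_h P[i][h]*Q[i][h].
    """
    P_new = []
    for Pi, Qi in zip(P, Q):
        s = sum(p * q for p, q in zip(Pi, Qi))
        if s == 0:
            P_new.append(list(Pi))   # no support at all: keep the row unchanged
        else:
            P_new.append([p * q / s for p, q in zip(Pi, Qi)])
    return P_new
```

Iterating this step sharpens the distribution of each row towards the candidates with the strongest structural support, which is exactly the effect exploited by the iterative scheme described next.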

The solution of the problem of edge matching defined in equation (4.8) can be obtained by combining equation (4.10) and equation (4.11) in an iterative scheme (Rosenfeld et al., 1976; Hummel and Zucker, 1983; Kittler, 1986; Christmas et al., 1995). Thus,

P^(t+1){li = rj} = P^(t){li = rj} · Q^(t)(li = rj) / Σ_{rh∈R} P^(t){li = rh} · Q^(t)(li = rh)    (4.12)

The previously computed similarity scores are taken as the initial probabilities P^(0){li = rj} for a possible match pair li and rj. The constructed match pool greatly speeds up the probability relaxation process because only the edge segments in the match pool are involved. P{T(li, rj; lh, rk) | li = rj, lh = rk} is called the "compatibility function". It is in the range between 0 and 1.0, quantifying the compatibility between the match (li = rj) and a neighbouring match (lh = rk). If two pairs of potential matches share the same type of relation, they are defined as compatible, since they structurally support each other; otherwise, if two pairs violate some basic matching constraints, like uniqueness, relative position, etc., they are considered incompatible. The compatibility function plays an important role in the process of structural matching. In our system, it is defined using the difference of the geometrical relation measurements between (li, lh) in the left image and (rj, rk) in the right image. In the following, we first define the geometrical relations of a pair of edge segments, then we derive the compatibility function using these measurements. We use four different measurements to describe the mutual relations between the edge segments li and lh, two for angle relations and two for distance relations (Sang et al., 2000). The four relations are illustrated in Figure 4-7.

• r1(li, lh): α

• r2(li, lh): β

• r3(li, lh): AB/CD

• r4(li, lh): 4(AB+CD)/(AC+AD+BC+BD)


Figure 4-7. Relations between a pair of edge segments. (a) angle relations, (b) distance relations. See text for details. 4.1 3-D Straight Edge Generation 49

The first measurement r1 is α, the angle between li and lh. β is the angle between li and the line joining the center points of li and lh. The third measure is the ratio between AB and CD. The measure r4 used in our implementation is the ratio of the sum of the edge segment lengths to the average distance between the end points of the edge segments. Note that r2 and r3 are directional, while r1 and r4 are not. The above four relations describe the relative orientation, position, length and the distance between two edge segments respectively. The compatibility function is then defined using the differences between the relational measurements of the two pairs (li, lh) and (rj, rk) as

cos∆α · cos∆β · min[r3(li, lh), r3(rj, rk)] / max[r3(li, lh), r3(rj, rk)] · min[r4(li, lh), r4(rj, rk)] / max[r4(li, lh), r4(rj, rk)]    (4.13)

The compatibility measure yields the maximum value 1.0 when the relations between (li, lh) and (rj, rk) are the same. It decreases as the differences increase. When the value of the compatibility measure becomes 0, it indicates that the two pairs are independent of each other.
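Equation (4.13) can be sketched as follows. The clamping at zero is an assumption of this sketch, added so that the result stays in the stated [0, 1] range when an angle difference exceeds 90 degrees.

```python
import math

def _ratio(a, b):
    """min/max ratio used for the r3 and r4 terms of equation (4.13)."""
    return min(a, b) / max(a, b)

def compatibility(rel_left, rel_right):
    """Compatibility of two candidate matches, equation (4.13) (sketch).

    rel_left  = (alpha, beta, r3, r4) measured between (li, lh) in the left image,
    rel_right = the same four relations measured between (rj, rk) in the right image.
    Angles are in radians; the result is clamped to [0, 1].
    """
    da = rel_left[0] - rel_right[0]
    db = rel_left[1] - rel_right[1]
    value = (math.cos(da) * math.cos(db)
             * _ratio(rel_left[2], rel_right[2])
             * _ratio(rel_left[3], rel_right[3]))
    return max(0.0, value)
```

Identical relation tuples give the maximum value 1.0, and the value decays smoothly as the angle and ratio differences grow, matching the behaviour described above.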

4.1.5 3-D Straight Edge Generation

After the edge segments are matched across the images, our next step is to compute the 3-D straight edges in object space. This can be done by computing the 3-D positions of the edge pixels on the matched edge segments. A straight line in object space is also straight in aerial images; however, the inverse does not hold. Consider, for example, a road that goes uphill and downhill: its sides can appear as straight lines in the images. Our procedure to generate 3-D straight edges is to first compute the 3-D positions of the edge pixels of the matched edge segments. This can be done for each pixel in the overlap length of the corresponding edges. Thus, we obtain a chain of ordered 3-D points; we then segment the obtained chain into 3-D straight lines using a split-and-merge algorithm. These two procedures are described in Section 4.1.5.1 and Section 4.1.5.2 respectively.

4.1.5.1 3-D Edge Computation

The 3-D positions of the edge pixels are computed by the well-known photogrammetric forward intersection. The main concern here is to find the corresponding pixels on the matched edge segments. We examined the following three methods. In Figure 4-8, we represent edge pixels as small squares, and the straight edge segment l as a solid line; l is generated from the pixel chain.
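Forward intersection of two corresponding image rays can be sketched as finding the midpoint of the shortest segment joining the two (generally skew) rays. This is one common formulation, assumed here for illustration; the rays are given by their projection centres and direction vectors in object space, which in practice come from the image coordinates and the known orientation parameters.

```python
def forward_intersect(o1, d1, o2, d2):
    """Approximate forward intersection of two image rays (sketch).

    o1, o2: projection centres; d1, d2: ray direction vectors (3-D lists).
    Returns the midpoint of the shortest segment joining the two rays.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = [x - y for x, y in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    den = a * c - b * b
    if abs(den) < 1e-12:
        raise ValueError("rays are parallel")
    s = (b * e - c * d) / den          # parameter along ray 1
    t = (a * e - b * d) / den          # parameter along ray 2
    p1 = [x + s * y for x, y in zip(o1, d1)]
    p2 = [x + t * y for x, y in zip(o2, d2)]
    return [(x + y) / 2 for x, y in zip(p1, p2)]
```

When the rays truly intersect, the two closest points coincide and the midpoint is the intersection itself; with measurement noise, the midpoint is a reasonable least-distance estimate.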


Figure 4-8. Pixel correspondence for 3-D computation. The rectangles represent edge pixels. A is an edge pixel on an edge segment in the left image (not shown) corresponding to the edge segment l in the right image. The grey line ep is the epipolar line of A in the right image. A' is the intersection of l with ep. The correspondence of A in the right image should be on the pixel chain and close to A'.

For a pixel A in the left image, its epipolar line ep (shown in grey) in the right image can be computed using the given orientation parameters. The correspondence of A in the right image should be on the pixel chain, and close to A', the intersection of ep with l. In the first method, we take a 3 by 3 window centred on A', and obtain the candidate matches for A from the edge pixels on the chain and inside the window. We then compute the orthogonal distances from the candidates to ep (the dashed line), and take the pixel with the smallest distance as the correspondence of A. The above method works fine except in the case shown in Figure 4-9, where neighbouring edge pixels A, B and C are aligned parallel to the epipolar line. This occurs when the road edge has a direction close to the direction of the epipolar line. Even the numbers of the edge pixels in the left and right images may differ (Figure 4-9(b)). Therefore, the method may find one and the same point as the correspondence for A, B and C, resulting in a saw-tooth chain in object space.


Figure 4-9. Pixel grouping for 3-D computation. Edge pixels A, B and C in the left image are aligned parallel to the epipolar line. They are grouped and the group center is B. We only find the correspondence and compute the 3-D position for the group center B. (a) The correspondence of B is B'. A', B' and C' are grouped and the group center is B'. B' is taken as the corresponding position of B. (b) The numbers of the edge pixels in the left and right image differ. The correspondence of B is C'. A' and C' are grouped and the group center is taken as the corresponding position of B.

We apply a "pixel grouping" method (method 2) to improve the solution in this case. That is, we put the pixels A, B and C into a group, and only find the correspondence and compute the 3-D position for the group center. The pixel grouping procedure is applied to the edges in the left and right images if necessary. Thus, the pixel group center in the left image in Figure 4-9(a) and Figure 4-9(b) is B; its corresponding position in the right image is B' for the case of Figure 4-9(a), and the middle of A' and C' for Figure 4-9(b). Another method (method 3) computes the 3-D positions directly from the fitted 2-D straight edge segments. We select points along the straight edge segment in the left image at an interval of 1 pixel. Then, for each point, we take the intersection of ep with the corresponding straight edge segment in the right image as its correspondence. We applied the above three methods in different cases, and compared the results with manually extracted 3-D edge segments. This is done by comparing the computed 3-D edge positions with the reference data, and calculating the coordinate differences (see Chapter 6 for details of the computation). Table 4-1 summarises the RMS errors from the comparison for a straight vertical edge (case 1) and a straight edge with 20 degrees difference to the epipolar line (case 2). The three methods deliver similar results for case 1, while for case 2 the improvements with method 2 and method 3 are obvious, especially in the Z direction.

                  DX        DY        DZ
Case 1  method 1  0.136755  0.011757  0.085529
        method 2  0.136355  0.011684  0.085155
        method 3  0.136420  0.011729  0.097824
Case 2  method 1  0.056595  0.122870  0.610071
        method 2  0.052046  0.113061  0.437204
        method 3  0.055884  0.122865  0.452631

Table 4-1. Comparison of 3 methods for 3-D edge computation (unit: m)

The 3-D edge chains computed using the above three methods for case 2 are plotted in Figure 4-11 in the 2-dimensional X'O'Z' space. This space is illustrated in Figure 4-10. The X' axis is defined as the projection of the chain onto the XOY plane, and Z' is parallel to Z. The origin O' is the projection of the first point of the chain onto the XOY plane.


Figure 4-10. Definition of the X'O'Z' space.

Figure 4-11. Plot of the computed 3-D edge chains in X'O'Z' space for a straight edge segment with 20 degrees difference to the epipolar line using the three methods. The results of method 1, method 2 and method 3 are plotted in black, red and green respectively.

In Figure 4-11, the chains computed by the above three methods are plotted in black, red and green respectively. It can be observed that the saw-tooth pattern in the black chain is mostly avoided by method 2, and the green chain of method 3 is much smoother than the black or red ones. It is worth noting that the proposed method cannot compute 3-D positions for edge segments that are parallel to the epipolar line, even if they are matched by our algorithm. In this case, their 3-D positions are computed from the DSM data by monoplotting.

4.1.5.2 3-D Straight Edge Fitting

The curve segmentation problem has been addressed extensively in the computer vision literature. A popular segmentation method is the Hough transform (Ballard and Brown, 1982). This method tries to find straight lines in a sparse set of points. In our application, the points are already organized along the edges; thus, the Hough transform would unnecessarily increase the computational complexity. Ramer (1972) presents a simple algorithm to approximate planar curves by polygons. He bases his approximation on a minimum offset criterion. Pavlidis and Horowitz (1974) use a least squares algorithm to fit straight lines to portions of the curve, and iterate a split-and-merge procedure to refine the initial segmentation. Grimson and Pavlidis (1985) find the breakpoints of a curve by comparing the original to a smoothed version of the curve. Discontinuities are then easily detected, and regular curve fitting is performed only between discontinuities. Fischler and Bolles (1986) describe two methods: one of them passes a "stick" of a certain width and length over the curve, and the other looks at the curve from different "views", followed by a selection of breakpoints according to the maximum votes obtained from these views. Both methods are based on segmenting the curve over different scales, and on perceptual organization. Grimson (1989) suggests an approach which is a combination of split-and-merge and the ψ-s algorithm. Wuescher and Boyer (1991) describe an algorithm based on a constant curvature criterion. We adapted the algorithm described in Lowe (1987) for partitioning the curve and generating straight lines. The method recursively subdivides a chain at the point with the largest deviation from the line connecting its end points; this process is repeated and produces a tree of possible subdivisions.
Then, unwinding the recursion back up the tree, a decision is made at each junction point as to whether to replace the current low level description with a single higher level segment, using the same deviation threshold. The algorithm originally works in 2-D; we extend it to 3-D for the generation of 3-D straight edge segments. The computed 3-D edge pixel chain is first transformed into X'O'Z' space (see Figure 4-10), yielding a 2-D curve, which is then processed by the split-and-merge algorithm. Once the split points of the curve are found, the points are transformed back to the original O-XYZ object space, and the pixel chain between two adjacent split points is fitted to a 3-D straight line.
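The split stage of this recursive partitioning can be sketched as follows. This is a simplified 2-D illustration of the adapted algorithm (the deviation test and the recursion follow the description above, while the merge pass back up the tree is omitted):

```python
import numpy as np

def split_polyline(points, max_dev):
    """Recursively split a chain of 2-D points at the point of maximum
    perpendicular deviation from the chord joining its end points.
    Returns the breakpoint indices, including both ends (a simplified
    sketch of the split stage; the merge pass is omitted)."""
    pts = np.asarray(points, float)
    (fx, fy), (lx, ly) = pts[0], pts[-1]
    dx, dy = lx - fx, ly - fy
    norm = np.hypot(dx, dy)
    if norm == 0.0:
        # Closed chain: use the distance to the (coinciding) end points.
        dev = np.hypot(pts[:, 0] - fx, pts[:, 1] - fy)
    else:
        # Perpendicular distance of every point to the chord.
        dev = np.abs(dx * (pts[:, 1] - fy) - dy * (pts[:, 0] - fx)) / norm
    idx = int(dev.argmax())
    if dev[idx] <= max_dev or len(pts) <= 2:
        return [0, len(pts) - 1]          # chain is straight enough
    left = split_polyline(pts[: idx + 1], max_dev)
    right = split_polyline(pts[idx:], max_dev)
    return left[:-1] + [i + idx for i in right]
```

For a chain tracing a right angle, the corner is returned as the single interior breakpoint, and the chain pieces between breakpoints can then be fitted to straight lines.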

4.1.6 Results and Discussion

The proposed method for 3-D straight edge generation described in the previous sections has been implemented, and many tests have been carried out over different terrain and land-cover types, including rural, suburban, urban and hilly areas. In this section, some test results are presented to illustrate the effectiveness of the developed algorithm. The image data used are extracted from the aerial images of the ATOMI project. Figure 4-12 shows the results of edge detection and straight edge extraction.

a b

c d

Figure 4-12. Edge detection and straight edge extraction. (a) Original image and the detected edge pixels, (b) preprocessed image and the detected edge pixels, (c) extracted straight edge segments from the original image, (d) extracted straight edge segments from the preprocessed image.

In Figure 4-12(a) edge pixels (shown in red) are detected from the original image, while in Figure 4-12(b) edge pixels are detected in the image preprocessed by a Wallis filter. The extracted straight edge segments from the original image and the preprocessed image are presented as white lines in Figure 4-12(c) and Figure 4-12(d) respectively. For edge detection and straight edge extraction, the same parameters are applied to the original and the preprocessed images. The parameters include the sigma of the Canny operator, a parameter for the computation of the edge detection threshold, and the maximum allowable deviation for straight edge extraction. It can be seen from Figure 4-12(b) that the image noise is smoothed and the image features are enhanced by the filter. Thus, a certain number of edge pixels due to noise are excluded, while the quality of the edge pixels corresponding to image features is improved. Consequently, better straight edge segments are obtained from the preprocessed image (see the edge segments at the road sides and the building boundaries). Figure 4-13 and Figure 4-14 show examples of straight edge extraction and matching, where the extracted straight edge segments are presented as white lines, and the matched edge segments are shown in blue.
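The Wallis filter used for preprocessing adjusts each pixel so that the local mean and local contrast approach target values. A minimal numpy sketch follows; the window size, target values and gain limits here are illustrative, not the settings used in the thesis:

```python
import numpy as np

def _box_mean(a, win):
    """Mean over a win x win window (odd win), edge-padded separable box filter."""
    pad = win // 2
    a = np.pad(a, pad, mode="edge")
    c = np.cumsum(a, axis=0)
    a = (c[win - 1:] - np.concatenate([np.zeros((1, c.shape[1])), c[:-win]])) / win
    c = np.cumsum(a, axis=1)
    a = (c[:, win - 1:]
         - np.concatenate([np.zeros((c.shape[0], 1)), c[:, :-win]], axis=1)) / win
    return a

def wallis_filter(img, win=31, target_mean=127.0, target_std=50.0,
                  contrast=0.85, brightness=0.5):
    """Sketch of a Wallis filter: pushes the local mean and standard
    deviation (over a win x win window) towards target values.
    Parameter names and defaults are illustrative assumptions."""
    img = np.asarray(img, float)
    m = _box_mean(img, win)
    s = np.sqrt(np.maximum(_box_mean(img * img, win) - m ** 2, 0.0))
    gain = contrast * target_std / (contrast * s + (1.0 - contrast) * target_std)
    offset = brightness * target_mean + (1.0 - brightness) * m
    return np.clip((img - m) * gain + offset, 0.0, 255.0)
```

The effect is that low-contrast regions are amplified and the image statistics become locally comparable, which benefits a fixed-parameter edge detector such as the Canny operator.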

a b

Figure 4-13. Straight edge segment extraction and matching. White lines: extracted straight edge segments. Blue lines: matched edge segments. (a) left image, (b) right image.

a b

Figure 4-14. Straight edge segment extraction and matching. White lines: extracted straight edge segments. Blue lines: matched edge segments. (a) left image, (b) right image.

In order to allow a better inspection of the quality of edge extraction and matching, we implemented a GUI tool within our road extraction system. It allows an operator to select a straight edge segment and check visually how well the straight edge is generated from the edge pixel chain, and it displays the candidate match(es) and the correct match(es) determined by the algorithms. One example of such GUI-based inspection is given in Figure 4-15, where Figure 4-15(a) and Figure 4-15(b) are image patches of the left image and right image respectively.

a b

Figure 4-15. GUI based quality evaluation. White and blue lines are the extracted and the matched straight edge segments respectively. (a) left image. Red: pixel chain for a straight edge segment. (b) right image. Red: candidate matches; Yellow: correct match.

The white lines in the images are the extracted straight edge segments; the matches are shown in blue. The red line in the left image is the pixel chain fitted to the straight edge segment currently selected by the operator; the red lines in the right image are the candidates, while the correct match determined by the algorithm is shown in yellow. For the two test images, we manually counted the number of extracted edge segments, the number of found matches and the number of correct matches to assess the performance of the algorithm. The performance evaluation for Figure 4-13 and Figure 4-14 is presented in Table 4-2.

            Number of Extracted Edge Segments     Number of        Number of
            Left Image         Right Image        Found Matches    Correct Matches
Dataset 1        81                 75                  41               40
Dataset 2       789                809                 460              455

Table 4-2. Quantitative evaluation of straight edge segment matching.

The matching approach has also been applied to building extraction in the ATOMI project (Niederöst, 2000), and to ground-based cloud image matching for 3-D cloud mapping (Seiz, 2002). The matching approach has a high success rate and, most importantly, is very reliable. It makes use of rich attributes for matching, including edge geometric properties and the photometric and chromatic properties of the edge flanking regions. This is an advantage over other approaches that only use edge geometry and grey scale information. The developed structural matching method achieves locally consistent results and allows matching in the case of partially occluded edges, broken edges, etc. The use of the similarity score prior to structural matching greatly speeds up the process. Although used here for straight edges, this method can easily be extended to arbitrary edges, or even points, if some of the matching criteria (feature attributes) are excluded or adapted.
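As a toy illustration of combining geometric and flanking-region attributes into a single similarity score, one might weight normalized attribute differences as below. The attribute set, normalizations and weights are assumptions for illustration, not the actual score used by the system:

```python
def match_score(edge_a, edge_b, weights=None):
    """Toy similarity score for a candidate edge pair: a weighted sum of
    geometric similarity (orientation, length) and flanking-region
    photometric similarity. All choices here are illustrative."""
    w = weights or {"angle": 0.4, "length": 0.2, "flank": 0.4}
    # Orientation difference, normalized to [0, 1] (0 = identical direction).
    d_ang = abs(edge_a["angle"] - edge_b["angle"]) / 90.0
    # Length ratio (1 = equal lengths).
    r_len = (min(edge_a["length"], edge_b["length"])
             / max(edge_a["length"], edge_b["length"]))
    # Mean intensity difference of the flanking regions, normalized by 255.
    d_flank = abs(edge_a["flank_mean"] - edge_b["flank_mean"]) / 255.0
    return (w["angle"] * (1.0 - d_ang)
            + w["length"] * r_len
            + w["flank"] * (1.0 - d_flank))   # higher = more similar
```

A score of this kind can be used to rank candidates cheaply before the more expensive structural matching stage.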

4.2 Image Segmentation for Road Region Separation

Image segmentation is a key step in a number of approaches to image analysis. The purpose of image segmentation is to partition the image space into meaningful regions. There are many methods for image segmentation in the image analysis literature; some of them have also been employed to support man-made object extraction using various image data (Gong and Wang, 1996; Mason and Baltsavias, 1997; Heipke et al., 2000; Dial et al., 2001). In this section, we describe the procedure used in our system to segment color aerial images in order to find road regions. For this purpose, a data clustering method, ISODATA (Iterative Self-Organizing Data Analysis Technique), has been implemented. The original RGB color images are transformed into different color spaces, and appropriate image data are selected. In the following, we present the procedure for performing ISODATA in Section 4.2.1. In

Section 4.2.2 we analyse the original RGB image to determine appropriate image data for clustering. Finally, the results of image segmentation are presented in Section 4.2.3.

4.2.1 The Clustering Algorithm

Clustering implies a grouping of pixels in multispectral space. Pixels belonging to a particular cluster are therefore spectrally similar. In order to quantify this relationship it is necessary to devise a similarity measure. Many similarity metrics have been proposed, but those commonly used in clustering procedures are usually simple distance measures in multispectral space. The most frequently encountered is the Euclidean distance. If x_1 and x_2 are two pixels whose similarity is to be checked, then the Euclidean distance between them is

d(x_1, x_2) = ||x_1 - x_2|| = [(x_1 - x_2)^t (x_1 - x_2)]^{1/2} = [\sum_{i=1}^{N} (x_{1i} - x_{2i})^2]^{1/2}    (4.14)

where N is the number of spectral components. By using a distance measure it should be possible to determine clusters in the data. Often, however, there can be several acceptable cluster assignments of the data, so that once a candidate clustering has been found it is desirable to have a means by which the quality of the clustering can be measured. The availability of such a measure should allow one cluster assignment of the data to be chosen over all others.

A common clustering criterion or quality indicator is the sum of squared error (SSE) measure. It is defined as

SSE = \sum_{C_i} \sum_{x \in C_i} (x - m_i)^t (x - m_i) = \sum_{C_i} \sum_{x \in C_i} ||x - m_i||^2    (4.15)

where m_i is the mean of the ith cluster and x \in C_i is a pattern assigned to that cluster. The outer sum is over all clusters. This measure computes the cumulative distance of each pattern from its cluster center for each cluster individually, and sums those measures over all clusters. If it is small, the distances from the patterns to their cluster means are all small, and the clustering would be regarded favourably.

As indicated in Richards and Jia (1999), SSE has a theoretical minimum of zero, which corresponds to all clusters containing only a single data point. As a result, if an iterative method is used to seek the natural clusters or spectral classes in a set of data, then it has a guaranteed termination point, at least in principle. In practice it may be too expensive to allow natural termination. Instead, iterative procedures are often stopped when an acceptable degree of clustering has been achieved. Once clustering is completed, or at any suitable intervening stage, the clusters can be examined to see whether

• a cluster can be split into two new clusters. This is done by prespecifying a standard deviation in each spectral band beyond which a cluster should be halved.

• some clusters are so close together that they represent an unnecessary division of the data, and thus they should be merged.
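The SSE criterion of (4.15) is straightforward to compute once an assignment is given; a small numpy sketch:

```python
import numpy as np

def sse(pixels, labels, means):
    """Sum of squared errors (4.15): total squared distance of each pixel
    vector from the mean of the cluster it is assigned to."""
    diff = (np.asarray(pixels, float)
            - np.asarray(means, float)[np.asarray(labels)])
    return float((diff ** 2).sum())
```

Comparing the SSE of alternative cluster assignments allows the more favourable clustering to be chosen, as discussed above.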

Therefore, for a complete operation of the clustering, the following parameters should be specified:

• the desired number of clusters

• the maximum number of iterations

• the minimum number of pixels in a cluster

• the maximum standard deviation to initiate cluster splitting

• the minimum distance between cluster centers to initiate cluster merging.

With the above parameters specified, the algorithm runs in the following steps (Haala and Brenner, 1999):

1. Each pixel is assigned to one of the predefined clusters by the criterion defined in (4.15).
2. The means of the new clusters are computed from the pixels assigned in step 1.
3. Aggregated clusters are split if the standard deviation is larger than the specified threshold. Clusters with a center distance smaller than the predefined parameter are merged.

The above steps are repeated until the maximum number of iterations is reached. The results of the ISODATA algorithm of course depend on the quality of the input data. If the data in feature space are distributed in almost isolated natural groups, these clusters can be detected very reliably. In the next section, we analyse the original RGB color image, and select appropriate data for the clustering procedure.
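Steps 1-3 can be sketched as follows. This is a simplified ISODATA loop; the parameter defaults and the exact split/merge details (splitting along the band of largest spread, pairwise center merging) are illustrative assumptions:

```python
import numpy as np

def isodata(pixels, n_clusters=3, max_iter=10, min_size=2,
            max_std=10.0, min_dist=5.0, seed=0):
    """Simplified ISODATA sketch: assign pixels to the nearest cluster mean,
    recompute means, split clusters whose per-band standard deviation exceeds
    max_std, merge clusters whose centers are closer than min_dist."""
    x = np.asarray(pixels, float)
    rng = np.random.default_rng(seed)
    means = x[rng.choice(len(x), n_clusters, replace=False)]
    for _ in range(max_iter):
        # Step 1: nearest-mean assignment (Euclidean distance, eq. 4.14).
        labels = np.linalg.norm(x[:, None] - means[None], axis=2).argmin(axis=1)
        new_means = []
        for k in range(len(means)):
            member = x[labels == k]
            if len(member) < min_size:
                continue                      # discard clusters that are too small
            m, s = member.mean(axis=0), member.std(axis=0)
            if s.max() > max_std and len(member) >= 2 * min_size:
                # Step 3a: split along the band with the largest spread.
                shift = np.eye(len(m))[s.argmax()] * s.max()
                new_means += [m + shift, m - shift]
            else:
                new_means.append(m)           # Step 2: updated cluster mean
        # Step 3b: merge cluster centers closer than min_dist.
        merged = []
        for m in new_means:
            for j, o in enumerate(merged):
                if np.linalg.norm(m - o) < min_dist:
                    merged[j] = (m + o) / 2.0
                    break
            else:
                merged.append(m)
        means = np.array(merged)
    labels = np.linalg.norm(x[:, None] - means[None], axis=2).argmin(axis=1)
    return means, labels
```

On two well-separated point groups, the loop converges to one cluster per group regardless of the initial means, because a cluster spanning both groups is split and empty clusters are discarded.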

4.2.2 Selection of Image Data

A common practice for the selection of input data is that each kind of data should efficiently provide useful information, and that the data should have good separability among them, i.e. there should be maximum complementary information and minimum redundant information. The latter is also a computational consideration. Clustering cost increases with the number of features used to describe pixel vectors in multispectral space, i.e. with the number of spectral bands associated with a pixel. Therefore, it is necessary to ensure that no more features than necessary are utilized when performing clustering. Features that do not aid discrimination, by contributing little to the separability of spectral classes, should be discarded since they represent a cost burden.

We are interested in the following regions in the scene under analysis: roads, vegetation, shadows and buildings. The surfaces of buildings may have different appearances in the scene. In addition to grey building roofs, red and/or orange roofs are also typical in European areas. Figure 4-16 shows a typical RGB image of a suburban scene.

Figure 4-16. A scene RGB image.

Usually, the clustering operation can be conducted directly on this RGB image. However, several available techniques can be applied to enhance features such as shadows and vegetation so that they are more isolated in feature space. The redundancy of the image data can be checked by examining the covariance matrix between the spectral bands. Suppose x is a pixel vector defined as x = (x_1, x_2, ..., x_n)^t, where n is the number of spectral bands. The covariance matrix is given by

Cov^x = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - m)(x_i - m)^t    (4.16)

where x_i is an individual pixel vector, m is the mean vector, and N is the total number of pixels. If there is correlation between the responses in a pair of spectral bands, the corresponding off-diagonal element in the covariance matrix will be large by comparison to the diagonal terms. On the other hand, if there is little correlation, the off-diagonal terms will be close to zero. This behaviour can also be described in terms of the correlation matrix R, whose elements are related to those of the covariance matrix by

R_{ij} = Cov^x_{ij} / (Cov^x_{ii} \cdot Cov^x_{jj})^{1/2}    (4.17)

where R_{ij} is an element of the correlation matrix, and Cov^x_{ij} are elements of the covariance matrix. R_{ij} describes the correlation between band i and band j. The covariance matrix and correlation matrix between the R, G, B bands of Figure 4-16 are

Cov^x = | 2584.84  1812.31  1405.22 |        R = | 1.00  0.92  0.88 |
        | 1812.31  1507.81  1145.55 |            | 0.92  1.00  0.94 |
        | 1405.22  1145.55   987.70 |            | 0.88  0.94  1.00 |

It is noted that for this image the data in the different bands are highly correlated. This suggests that the original RGB image contains redundant information, and that not all of its bands are suitable for clustering. A further processing step can be applied to this image data: the principal component transformation (PCT). The PCT transformed image preserves the essential information content of the image with a reduced number of transformed dimensions. Thus, the purpose of the PCT is to transform the original image into a new coordinate system in which the feature bands have zero correlation, i.e. the covariance matrix is diagonal. If the vectors describing the pixels are represented as y in the transformed coordinate system, then it is desired to find a transformation T of the original coordinates such that

y = T x    (4.18)

It is not difficult to infer that T is actually the transposed matrix of the eigenvectors of Cov^x. The eigenvalues of Cov^x can be computed by solving the equation

|Cov^x - \lambda I| = 0    (4.19)

where I is the identity matrix. With the eigenvalues \lambda_i, the elements of T can be found from

(Cov^x - \lambda_i I) t_i = 0    (4.20)
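Equations (4.16) through (4.20) can be combined into a short numerical sketch, using numpy's symmetric eigensolver in place of solving (4.19) and (4.20) directly:

```python
import numpy as np

def correlation(cov):
    """Correlation matrix (4.17) from a covariance matrix."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

def pct(pixels):
    """Principal component transformation: T is the transposed eigenvector
    matrix of the band covariance (4.16), so the transformed bands y = Tx
    (4.18) are uncorrelated; the eigenvalues are the component variances."""
    x = np.asarray(pixels, float)            # shape (N, n_bands)
    cov = np.cov(x, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)     # solves (4.19)/(4.20) numerically
    order = eigval.argsort()[::-1]           # largest variance first
    T = eigvec[:, order].T
    return x @ T.T, eigval[order]
```

Applied to the covariance matrix reported above, `correlation` reproduces the listed correlation values; for any input data, the covariance of the transformed bands returned by `pct` is diagonal up to numerical error.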

The eigenvalues of the covariance matrix of Figure 4-16 are

4798.25    212.84    69.26

The first component thus accounts for about 94% of the total variance of the data in the image. The PCT transformed images of Figure 4-16 are shown in Figure 4-17, where the information redistribution is illustrated. The last two components appear dull and poor in contrast; the contrast visible in the figure is the result of a contrast enhancement applied to these components for display purposes, which serves to highlight their poor signal to noise ratio. On the other hand, most of the data spread is in the direction of the first principal component, and this component contains most of the information in the image.

a c b

Figure 4-17. PCT transformed image. (a) first component, (b) second component, (c) third component.

The ratio of different spectral bands from the same image also finds many applications in image analysis. A good example is the use of the ratio of the infrared band to the red band as a vegetation index. Similarly, we define a greenness measure using bands R and G, computed as

(G - R) / (G + R)    (4.21)

The greenness measure defined above enhances vegetation in an RGB image. The greenness image for Figure 4-16 is presented in Figure 4-18(a). As can be seen, the vegetation has high grey values and appears very bright in the image, while other objects have low grey values and are thus generally dark. In order to enhance shadows in the image, an RGB to HSI color space transformation is performed, and the S component is selected. The S component of Figure 4-16 is shown in Figure 4-18(b). The bright areas in the image correspond to shadows.
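Both cues are simple per-pixel computations; a numpy sketch follows. The HSI saturation formula used below is the common definition S = 1 - 3·min(R,G,B)/(R+G+B), assumed here to match the transformation used in the text:

```python
import numpy as np

def greenness(r, g):
    """Greenness measure (4.21): (G - R) / (G + R), high for vegetation."""
    r, g = np.asarray(r, float), np.asarray(g, float)
    return (g - r) / np.maximum(g + r, 1e-9)

def hsi_saturation(r, g, b):
    """S component of the HSI color space, using the common definition
    S = 1 - 3 * min(R, G, B) / (R + G + B) (an assumption; the thesis
    selects S because shadow areas appear bright in it)."""
    rgb = np.stack([np.asarray(c, float) for c in (r, g, b)])
    return 1.0 - 3.0 * rgb.min(axis=0) / np.maximum(rgb.sum(axis=0), 1e-9)
```

A vegetation pixel with G well above R yields a greenness near the upper end of the range, while a neutral grey pixel has saturation zero.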

a b

Figure 4-18. (a) Greenness image, (b) S component image in HSI color space.

4.2.3 Results and Discussion

The result of the clustering algorithm applied to the scene of Figure 4-16, using the first component of the PCT, the greenness image and the S component of the HSI space, is presented in Figure 4-19. Five classes are determined. They correspond to road regions, green objects, shadow areas, dark roofs and red roofs. With this classification, we avoid the difficulty of selecting thresholds for image segmentation. Problems exist when some pixels on grey building roofs have a spectral response similar to roads; these pixels are falsely classified into the road class. These errors can be overcome by employing height information (see Section 4.3).

With such a data set and clustering algorithm, we sometimes encounter problems in separating road regions from bare soil, especially when the roads are not paved. Figure 4-20 shows such a case. The original image is presented in Figure 4-20(a). Figure 4-20(b), Figure 4-20(c) and Figure 4-20(d) are the first component image of the PCT, the greenness image and the S component image respectively. The clustering result is shown in Figure 4-20(e), where the road surface and the bare soil are clustered into a single class.

Clustering can be applied to any number of bands of image data. The appropriate band selection can be determined by image analysis techniques, such as the PCT. Furthermore, with multispectral images, more information can be employed to make the features more prominent, thus improving the clustering results. An alternative method to improve image segmentation is to use supervised classification. The training samples can be obtained with the aid of an existing GIS (Heipke et al., 2000). However, this is not done in this dissertation.

Legend road green objects dark roofs red roofs shadows

Figure 4-19. Clustering results.

a b c

d e

Figure 4-20. Clustering cannot separate unpaved roads from bare soil. (a) original RGB image, (b) the first component of PCT transformed image, (c) greenness image, (d) the S component from HSI color space, (e) clustering results. The road and the bare soil are clustered into a single class.

4.3 DSM and DTM Analysis

DSMs have recently found many applications in digital photogrammetry, such as orthophoto generation and building extraction. As described in Section 4.1.4, the DSM or DTM has been used in straight edge matching to restrict the search space. They can also be used to support road extraction. A DSM is a geometric description or reconstruction of the physically sensed surface; it ideally models man-made objects as well as the terrain, and thus provides information about objects which are characterized by their relative heights with respect to their surroundings. Subtracting the DTM from the DSM results in a so-called normalized DSM (nDSM), which enables the extraction of above-ground objects, including buildings and trees. The nDSM is an important cue in building detection and extraction. Depending on the quality of the original DSM and the derived nDSM, buildings can be extracted by using the nDSM alone (Weidner and Förstner, 1995; Vosselman, 1999; Maas, 1999) or in combination with optical images (Baltsavias et al., 1995; Haala and Brenner, 1999; Nevatia et al., 1999; Niederöst, 2000; Vögtle and Steinle, 2000; Cord et al., 2001). The latest advances in building detection using DSMs can be found in Baltsavias et al. (2001).

The nDSM can also be employed in a 3-D road extraction system to verify whether a 3-D straight edge is on the ground. Since road sides are on the ground, edges above the ground are discarded from further processing; the computational complexity is thereby reduced, and many false hypotheses are avoided from the beginning. This information can also be used to reason whether a region is on the ground. Furthermore, it can compensate for missing information in classification data. The overall approach in this section is to use height information to partition an image into regions which are potentially buildings and trees, thereby making the separation of above-ground objects and ground objects possible.
DSM data can be generated from stereo images (Krzystek, 1991), or directly by a laser scanner sensor (Wehr and Lohr, 1999) or airborne interferometric SAR (Dowman, 2001). The quality of the DSM is an important issue in 3-D reconstruction methods. Baltsavias et al. (1996) and Grün et al. (2000) compare the performance of different digital photogrammetric systems in the automatic generation of digital surface models. A comparison of data acquisition and processing between photogrammetry and airborne laser scanning is presented in Baltsavias (1999); the quality of DSMs from laser scanning and the quality of DTMs derived from laser scanning are reported in Huising and Pereira (1998) and Petzold et al. (1999). As described in Section 3.1, in our work DSMs with 1-2m grid size, generated from aerial imagery with PHODIS TS, are used. A national DTM with 25m grid spacing and an accuracy of 2.5m in the lowland and 10m in the Alps is also available.

Figure 4-21 shows a scene in a lowland area with its corresponding DSM and DTM. The DSM and DTM are shown as grey scale images; the higher the DSM/DTM values are, the brighter they appear in the images. The scene is rather flat. Figure 4-21(b) illustrates the presence of standing objects such as buildings and trees, which, as expected, appear in the DSM but not in the DTM.

a b c

Figure 4-21. Example of DSM and DTM data showing presence and absence of above-ground objects. (a) Image, (b) DSM, (c) DTM.

Such a DTM is quite helpful for certain image processing tasks, as in its use for straight edge matching. However, the accuracy of the DTM is not good enough for a reliable separation of above-ground objects and ground objects. Subtracting this DTM from the DSM may result in false detections or in some above-ground objects being missed. Therefore, in order to achieve a good quality normalized DSM, we propose to extract above-ground objects from the DSM data only, thus avoiding the introduction of errors from the DTM data.

Several methods for detecting above-ground objects, particularly buildings, from the DSM have been introduced so far. Weidner and Förstner (1995) apply morphological operators to the DSM data to compute an approximation of the terrain surface using grey scale opening. The extraction of above-ground objects is then simply a morphological top-hat transformation of the DSM followed by thresholding. This method is extended by Eckstein (1995), who develops a dual rank operator to select above-ground objects. The main problem with the above methods is defining the optimal operator size, i.e. the size of the window (or circle) that has to be examined. In addition, the methods also encounter problems in sloped terrain. Using a window entirely contained within a building's outlines results in roof points falsely set as ground points. On the other hand, the operator size should be small enough to preserve small forms of the topographic surface. Using a large window size raises the probability of false detection. In addition, with increasing operator size, the terrain features get smoothed, and a horizontal shift of terrain features is observed.
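The morphological approach of Weidner and Förstner described above can be sketched in a few lines using SciPy's grey-scale morphology; the window size and height threshold here are illustrative, and this is not the method ultimately adopted in this work:

```python
import numpy as np
from scipy.ndimage import grey_opening

def above_ground_mask(dsm, win=15, min_height=3.0):
    """Sketch of the morphological method: a grey-scale opening with a
    window larger than the biggest building approximates the terrain;
    the top-hat (DSM minus opening), thresholded at a minimum object
    height, marks above-ground pixels."""
    terrain = grey_opening(dsm, size=(win, win))   # terrain approximation
    return (dsm - terrain) > min_height            # top-hat + threshold
```

The sketch also makes the operator-size dilemma discussed above concrete: a window smaller than a building leaves its roof in the "terrain", while a very large window smooths and shifts genuine terrain forms.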
In summary, the window size of the operator should be large enough to prevent the operator from running onto roofs, and on the other hand small enough to preserve small forms of the topographic ground surface. In order to overcome this problem, Schiewe (2000) proposes a method called compressed opening. He performs openings from the top and bottom of the DSM simultaneously using different window sizes. The procedure is repeated with correspondingly varying window sizes until the opening results from the top and bottom are identical. This method is computationally very expensive, since at each level the results of the openings from both sides have to be checked and compared. Vosselman (2000) develops a slope based method to derive a DTM from laser altimetry data, where a point is classified as a terrain point if there is no other point such that the height difference between these points is larger than the allowed maximum height difference at the distance between these points. He also shows that this method is closely related to the erosion operator used in grey scale mathematical morphology. Again, the difficulty remains of determining the parameters that have the same meaning as the operator size in the above methods.
In this work, I propose to use the Multiple Height Bin (MHB) method developed by Baltsavias et al. (1995). In the MHB method the DSM heights are grouped into consecutive bins of a certain size. This results in a segmentation of the DSM into relatively few regions that are always closed and easy to extract. The method avoids the selection of an operator size. Furthermore, it operates only on high objects, thus leaving the topographic ground surface unaffected. The method is simple, fast and very effective. The only parameters required are the minimum building height (usually 3-4 m) and the minimum and maximum building sizes. Niederöst (2000) extends this method for automatic building detection, and the details of the method can be found therein. Here only the outline of the method is described.

If the DSM is viewed as a function of the planimetric coordinates X and Y, i.e. Z = F(X, Y), the method can be interpreted as one of placing planes parallel to the XOY plane; each plane then slices the function in the areas of intersection, resulting in sets of binary bins. The sizes and the centers of the bins are computed. Figure 4-22 illustrates this idea, where two planes at Z = H_i and Z = H_j are used to slice the function. At each slice level, the bins obtained from the previous level are traced. If the locations of the centers of a bin at consecutive levels do not differ much, the bin height exceeds the minimum building height, and the bin size is in a certain range, then the bin is considered as an above-ground object, i.e. it is either a building or a tree. The MHB method proceeds from the top of the DSM to the bottom. In summary, the procedure of the MHB method can be described as below:

(1) Slice the DSM from the top.
(2) Compute the sizes and centers of the obtained bins.
(3) Slice the DSM at the next level. Trace the bins obtained from the previous level; select bins belonging to above-ground objects; compute the sizes and centers of the new bins.
(4) Repeat step (3) until the bottom of the DSM is reached.


Figure 4-22. Geometrical interpretation of the MHB technique. Planes at heights H_i and H_j slice the DSM, resulting in bins. Consecutive bins are traced to find above-ground objects.

Figure 4-23 shows the results obtained by applying the MHB method on the DSM data of the scene in Figure 4-21. The detected bins (corresponding to above-ground objects) are superimposed on the scene image in white.

a b c

Figure 4-23. Detected above-ground objects using the MHB method. (a) Image, (b) DSM, (c) detected above-ground objects superimposed on the image in white.

The above-ground objects, including buildings and trees, are successfully detected. The road region is free from bins; this means that, according to the DSM analysis, the road region, including the areas shadowed by the standing trees, is on the terrain.
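Steps (1) through (4) of the MHB method can be sketched as follows. This is a simplified version: the tracing of bins between levels by nearest center, and all parameter values, are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import label, center_of_mass

def mhb(dsm, bin_size=1.0, min_height=3.0, min_size=4, max_size=500,
        max_center_shift=2.0):
    """Simplified Multiple Height Bin sketch: slice the DSM from the top
    with horizontal planes, label the binary bins at each level, and trace
    each bin downwards; a bin whose center stays stable over at least
    min_height of slices and whose size lies in [min_size, max_size] is
    marked as an above-ground object."""
    mask = np.zeros(dsm.shape, bool)
    tracks = []                                  # (center, level where bin appeared)
    for h in np.arange(dsm.max(), dsm.min(), -bin_size):
        regions, n = label(dsm >= h)             # steps (1)/(3): slice and label bins
        new_tracks = []
        for k in range(1, n + 1):
            region = regions == k
            center = np.array(center_of_mass(region))   # step (2): bin center
            prev = next((t for t in tracks
                         if np.linalg.norm(center - t[0]) <= max_center_shift),
                        None)
            top = prev[1] if prev else h         # trace bin from the previous level
            if top - h >= min_height and min_size <= region.sum() <= max_size:
                mask |= region                   # bin qualifies as above-ground
            new_tracks.append((center, top))
        tracks = new_tracks
    return mask
```

Because the slicing stops before reaching the lowest DSM level, the ground surface itself is never segmented, which reflects the property noted above that the method operates only on high objects.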

Figure 4-24 is another example in a complex scene with dense buildings. The above-ground objects are correctly detected by the MHB method. As in Figure 4-23, there are no detected bins on the road regions.

a b c

Figure 4-24. Detected above-ground objects using the MHB method. (a) Image, (b) DSM, (c) detected above-ground objects superimposed on image in white.

4.4 Road Mark and Zebra Crossing Detection

Road marks and zebra crossings are good indications of road existence. They are generally found on main roads and on roads in urban areas. Both have a distinct color (usually white or yellow). In high resolution images, road marks appear as thin white lines of a certain width, while zebra crossings appear as yellow strips. Road marks give the road direction and the road centerline, while zebra crossings define the local road width and local road direction. Thus, they can be used to guide the road extraction process or to verify the extraction results. In addition, in many cases the correct road centerlines can even be derived directly from the road marks and/or zebra crossings that are present. This is especially useful when the road sides are occluded or not well defined, such as in cities or city centers.

Figure 4-25. A typical urban road image with road marks and zebra crossing.

Figure 4-25 shows a typical road image with road marks and a zebra crossing. The width of the road marks is constant; this information can be obtained from road construction rules.

However, the length of road marks may vary for different traffic guidance purposes. The color of the zebra crossing is distinctive and is rarely found in other places in images. Generally this color can only be found on some roads. Thus, the detection of zebra crossings can be done as follows: segment the image using color information, and analyse the shape of the derived clusters. The same approach cannot be applied to road mark detection, because white objects are manifold in the world and in aerial imagery. A low threshold in segmentation will result in many false hypotheses, while with a high threshold the road marks might not be detected at all. In our system, we treat road marks as white linear objects, i.e. thin white lines, whose widths are assumed to be bounded and known. Therefore, we can apply line extraction techniques for road mark detection. In this way, we avoid using any hard threshold in the detection process. Moreover, the road marks are detected with subpixel accuracy. In the following, we discuss relevant previous approaches to line detection in Section 4.4.1. The image line model and the line extraction technique used in our system are described in Section 4.4.2. In Section 4.4.3 we present our procedure for zebra crossing extraction.

4.4.1 Review of Related Work

The basic procedure for line extraction is line point detection followed by linking. In this sense, line extraction is similar to edge extraction. They differ in the detection techniques and in the criteria for linking. In edge extraction, an edge pixel is determined by the magnitude of the gradient, and the gradient information, such as magnitude and orientation, is used to link the detected edge pixels into a chain. This information, however, is not available in line extraction. An appropriate mathematical model for estimating the plausibility and the orientation of lines must therefore be proposed.

Various approaches for line extraction can be found in the literature. Some of them are designed for binary images (Smith, 1987; Argialas, 1992; Kaichang, 1992). Steger (1998) categorizes the existing approaches for line detection in grey scale images into four groups, namely gray-value-based approaches, differential geometric approaches, special-filter-based approaches and line-model-based approaches. Gray-value-based approaches extract lines by considering the grey values of the image only and use purely local criteria, e.g. local grey value differences. In special-filter-based approaches the filters are designed to enhance linear structures, so that lines can be easily obtained by thresholding. Differential geometric approaches treat images as surfaces, and extract lines as ridges by using various differential geometric features. The line-model-based approaches are quite similar to the differential geometric approaches, except that an explicit line model is used, e.g. a line has a constant width, or a constant contrast to the background, or it is bounded by an edge on each side.

Gerig et al. (1993) and Sato et al. (1997) try to enhance lines in the images so that they can be extracted using simple thresholding. In Gerig et al. (1993) the enhancement is conducted by using the second derivatives of the Gaussian, while Sato et al.
(1997) first obtain the line direction and the response from the Hessian matrix of an image filtered with Gaussian smoothing kernels; the enhancement is then done by applying a transformation to the obtained principal second derivatives which makes linear structures more prominent, provides continuity in the line direction in case of signal loss, and suppresses unwanted structures such as blobs or surfaces having a line profile. The approach given in Koller et al. (1994) is based on the model that a line has two corresponding edges, one on each side of the line, which can be extracted by applying two edge detection filters that are the first derivatives of the Gaussian with different sign. The authors combine the two respective responses in a non-linear way, resulting in a sharp maximum of the filter at the position of the line points. The directions in which the two edge operators have to be applied are determined from the eigenvectors of the Hessian matrix of the image. Lines can also be extracted by differential geometric approaches. In this scheme, the image is regarded as an intensity landscape, and represented by a function z = f(x, y). Within this interpretation of the image, lines can be defined to be ridges. Numerous attempts have been made to construct ridges for use in image and shape analysis; consequently many definitions for ridges can be found in the literature. A classification of the various definitions for ridges is presented in Eberly et al. (1993). Haralick et al. (1983) use a facet model to extract ridge points from images. They fit a cubic polynomial to the image data in a window. The authors derive an explicit formula for the angle that maximizes the absolute value of the second directional derivative. The ridge points are then obtained from a quadratic function that is generated from the polynomial in the detected direction and by setting the directional derivative to zero. Wang et al.
(1993) use a similar approach for optical character recognition (OCR), where third order Chebyshev polynomials are employed to fit the image over a window for each pixel. A Hessian matrix is then formed by partial derivatives of the image, and the ridge direction is obtained. The ridge direction is further discretized to the pixel grid, and the ridge point is located by searching the local maximum perpendicular to the ridge direction based on comparing grey values in appropriate directions. Steger (1996) also uses a differential geometric approach to detect curvilinear structures in 2-D and 3-D images. Gaussian masks are used to estimate the derivatives of the image, and the algorithm scales to lines of arbitrary widths. By using explicit models for lines and various types of line profile models, the bias in the extracted lines can be predicted and removed. The linking of the detected ridge points starts from the point with the maximum second derivative; lines are then constructed by adding the appropriate neighbour to the current lines. Since the local direction of the line is fairly accurately estimated by the algorithm, only three neighbouring points that are compatible with this direction are examined. The choice of an appropriate neighbour to be added to the line is based on the distance between the respective point locations and the angle difference of the two points.

4.4.2 Image Line Model and Line Extraction

We adopt the standard mathematical notation to describe the image as an intensity landscape; thus the image is represented as a function z = f(x, y), where x and y are the coordinates of the pixel and z is the grey value. The Hessian H is defined as

H = | ∂²G/∂x²   ∂²G/∂x∂y |
    | ∂²G/∂y∂x  ∂²G/∂y²  |   (4.22)

We use ω₁ and ω₂ to represent the unit eigenvectors of H, and λ₁ and λ₂ the corresponding eigenvalues, with |λ₁| ≥ |λ₂|. Immediately we have: 1) since H is symmetric, λ₁ and λ₂ are real, and ω₁ and ω₂ are orthogonal to each other; 2) the second derivative of f in the direction of ω is

f″ = ωᵀ H ω   (4.23)

therefore ω₁ and ω₂ are the directions in which the second directional derivatives are extrema, and λ₁ and λ₂ are the values of these extrema. We adopt the definition of a ridge in Eberly et al. (1993) as the “height definition”. The ridge is thought of as a path one follows in the mountains, where there is always a drop on both the left and the right side. This definition is also used in Maintz (1996), where he calls it an “intuitive ridge”. It coincides with our case of road marks on the road surface: the grey values of road marks are higher than those of the road surface on each side of the road marks. If we represent the image as an intensity landscape, the gradient always points in the direction of the steepest ascent. This is illustrated in Figure 4-26. Points 2 and 3 are non-ridge points; their gradients generally point towards the ridge. Point 1 is on the ridge, where the gradient is aligned with the ridge. For each point on the ridge, a bar profile is observed in the direction perpendicular to the ridge. Therefore the second derivative of the image intensity function perpendicular to the ridge direction at a ridge point has a local minimum; thus the direction perpendicular to the ridge corresponds to the largest negative eigenvalue of the Hessian matrix H, i.e. ω₁.


Figure 4-26. Intensity profile of road marks. Black point: ridge point. Grey points: non-ridge points.
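The role of the Hessian eigen-decomposition described above can be illustrated with a small numeric sketch (the Hessian values here are invented purely for illustration; numpy is assumed):

```python
import numpy as np

# Illustrative Hessian at a ridge point of a bright line running along the
# y axis: strong negative curvature across the line (x direction), almost
# none along it. The numbers are made up for this example.
H = np.array([[-8.0, 0.0],
              [0.0, -0.5]])

lam, omega = np.linalg.eigh(H)  # eigenvalues in ascending order
w1 = omega[:, 0]                # eigenvector of the most negative eigenvalue

# eq. (4.23): the second directional derivative along w1 equals lambda_1,
# and w1 points across the ridge (here: along the x axis)
f_pp = w1 @ H @ w1
```

The eigenvector of the most negative eigenvalue indeed points across the bright ridge, in agreement with the discussion of Figure 4-26.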

In order to find ω₁ and ω₂ for each ridge point, the first and second partial derivatives of the discrete image need to be estimated. Several algorithms have been proposed in the literature. In our work, the method developed in Busch (1994) is adopted. The method uses a facet model to represent the image around the ridge points, and describes the image in the form of a continuous function as a polynomial of second order. This method is chosen because it is fast; in addition, although it may create problems in the extraction of thick lines, it delivers quite good results for thin line extraction. Since the road marks in the images in our work are only a few pixels wide, this method is expected to provide a fairly precise position of the road marks. The polynomial function of row and column coordinates takes the form

f(x, y) = k₀ + k₁x + k₂y + k₃x² + k₄xy + k₅y²   (4.24)

The coefficients of the polynomial are determined by a least squares fit of the polynomial to the image data around the ridge point. The Hessian matrix H is then equivalent to

H = | 2k₃  k₄ |
    | k₄  2k₅ |   (4.25)

The computation of the eigenvalues and eigenvectors can be done by solving the characteristic equation of H. The angle corresponding to ω₁ is the direction perpendicular to the ridge, and is given by

α = ½ atan( (2 ∂²G/∂x∂y) / (∂²G/∂y² − ∂²G/∂x²) ) = ½ atan( 2k₄ / (2(k₅ − k₃)) )   (4.26)

As we observed in Figure 4-26, the profile at a ridge point in the direction α is bar shaped; thus we describe this profile as a parabola by

g(r) = a + br + cr²   (4.27)

We now compute the coefficients a, b, and c. We denote the directional derivative of f at the point (x, y) in the direction α by f′_α(x, y). It is defined as

f′_α = lim(h→0) [ f(x + h sin α, y + h cos α) − f(x, y) ] / h   (4.28)

It follows directly from this definition that

f′_α = (∂f/∂x)(x, y) sin α + (∂f/∂y)(x, y) cos α   (4.29)

The curve g can be obtained by cutting the surface f(x, y) with a plane that is orientated in the desired direction α and is orthogonal to the row-column plane. Therefore, the derivatives of g are the directional derivatives of f. To cut the surface f at (x, y) with a plane in direction α, we simply require x = r sin α, y = r cos α. This produces the curve f_α(r).

f_α(r) = k₀ + (k₁ sin α + k₂ cos α) r + (k₃ sin²α + k₄ sin α cos α + k₅ cos²α) r²   (4.30)

By comparing equation (4.27) and equation (4.30), we immediately obtain

b = k₁ sin α + k₂ cos α
c = k₃ sin²α + k₄ sin α cos α + k₅ cos²α   (4.31)

When g′(r) = 0, we obtain the position of the line point by

r₀ = −b / (2c)   (4.32)

A point is declared to be a line point if its position falls within the pixel's boundary and the absolute value of c is significantly large. In summary, the procedure for the computation of ridge points is:
1. Fit equation (4.24) to the grey values in the image window
2. Determine the direction perpendicular to the ridge using equation (4.26)
3. Compute b, c and the line point position by equation (4.31) and equation (4.32)
Therefore, for each line point, we obtain the following information:
• The local direction of the line

• The second directional derivative in the direction perpendicular to the line

• The precise sub-pixel location of the line
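The line point computation summarized above can be sketched as follows (a sketch only, assuming numpy; the window size and the curvature threshold `c_min` are our own choices, not values from the thesis). Instead of evaluating equation (4.26) directly, the cross-line direction is taken as the eigenvector of the most negative eigenvalue of H, which selects the same extremal direction:

```python
import numpy as np

def detect_line_point(window, c_min=-0.5):
    """Facet-model line point test (eqs. 4.24-4.32) on an odd-sized grey
    value window centred on the candidate pixel. Returns (r0, direction)
    for a bright line point, or None. c_min is an assumed threshold."""
    half = window.shape[0] // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    x = xs.ravel().astype(float)
    y = ys.ravel().astype(float)
    # least squares fit of f(x,y) = k0 + k1 x + k2 y + k3 x^2 + k4 x y + k5 y^2
    A = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])
    k0, k1, k2, k3, k4, k5 = np.linalg.lstsq(
        A, window.ravel().astype(float), rcond=None)[0]
    # Hessian of the fitted polynomial, eq. (4.25)
    H = np.array([[2 * k3, k4], [k4, 2 * k5]])
    lam, omega = np.linalg.eigh(H)
    dx, dy = omega[:, 0]             # direction of the most negative eigenvalue
    # profile g(r) = a + b r + c r^2 along (dx, dy), eqs. (4.27)-(4.31)
    b = k1 * dx + k2 * dy
    c = k3 * dx * dx + k4 * dx * dy + k5 * dy * dy
    if c > c_min:                    # not a salient bright line profile
        return None
    r0 = -b / (2.0 * c)              # eq. (4.32)
    if abs(r0) > 0.5:                # sub-pixel position outside the pixel
        return None
    return r0, (dx, dy)
```

For a window containing a bright vertical line through the centre column, the function returns r₀ ≈ 0 with the cross-line direction along the x axis; a flat window yields no line point.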

We adopt the method described in Steger (1996) to link the individual line points into lines. The extracted points are sorted according to their second directional derivatives. The algorithm starts from the pixel with the maximum second directional derivative, and adds the appropriate neighbour points to the current line. Since the line point detection algorithm yields a fairly accurate estimate of the local direction of the line, only points in three directions are examined. The choice of the appropriate neighbour to be added to the line depends on the distance between the respective sub-pixel point locations and the angle difference of the two points. This adding process is continued until no more line points are found in the current neighbourhood, or a line point is reached that has already been added to another line. A detailed description of the procedure can be found in Steger (1996). The above described algorithms are employed to extract road marks. In practice, the fact that road marks are white and brighter than the surrounding road surface gives us the possibility to simplify the extraction process. Thus, the white pixels are first detected by the extraction of regions with predefined spectral pattern responses, and the line extraction process described in the previous paragraph is only applied to the detected white pixels. In RGB imagery, different colors are obtained by mixing different shades of the three basic colors Red, Green and Blue. White is obtained by mixing high intensities in all three bands. Therefore, the RGB image is decomposed into its three bands, resulting in three gray scale images. Each of these images represents the shade of the corresponding basic color required to form a certain output color in the original image. Knowing the final color, we can define the required ranges of the shades in the Red, Green, and Blue images that output this particular color.
The Red, Green and Blue images can be binarized according to the criterion whether the appropriate shade range exists or not. Binarizing the images requires tolerance factors that can be obtained from prior knowledge about color formation or via some training sets. Multiplying the binary images, we obtain a final binary image that signifies whether the desired color exists in the original image or not. Let T_k signify the required response of the sensor to output a white color in the red, green and blue bands; then for the detection of white pixels, the RGB image can be binarized as follows:

B_k(i, j) = 1 if g(i, j) > T_k, and 0 otherwise   (4.33)

where B_k is the binarized image containing the candidates for the object in that specific band, and g(i, j) is the gray value associated with the (i, j) spatial location. Before multiplication of the obtained binary images B_k, an additional constraint is applied: for a white pixel, the differences of the intensity values between the R, G, B bands should be limited, i.e.

|g_m − g_n| < T,   m, n = 1, 2, 3,   m ≠ n   (4.34)

The pixels satisfying (4.33) and (4.34) are considered as possible road mark pixels in the original RGB image. In order that road mark pixels are not excluded by the above binarization procedure, the three thresholds T_k and the threshold T for the band difference are rather relaxed. The line extraction process is then conducted only on the pixels determined by the above process. The extracted lines are further processed by the procedures described in Section 4.1.5.2, so that a set of straight lines is obtained. The straight lines are described by the following attributes:
• starting and ending points

• orientation

• the average of second directional derivatives
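The binarization of equations (4.33) and (4.34) can be sketched as follows (numpy assumed; the default thresholds mirror the relaxed example values of 150 and 50 used in the text):

```python
import numpy as np

def white_pixel_mask(rgb, t_band=(150, 150, 150), t_diff=50):
    """Candidate road mark pixels: bright in all three bands (eq. 4.33)
    and with limited differences between the bands (eq. 4.34)."""
    r, g, b = (rgb[..., k].astype(np.int32) for k in range(3))
    bright = (r > t_band[0]) & (g > t_band[1]) & (b > t_band[2])
    similar = (np.abs(r - g) < t_diff) & (np.abs(r - b) < t_diff) \
              & (np.abs(g - b) < t_diff)
    return bright & similar
```

The returned boolean mask restricts the subsequent line extraction to the candidate white pixels.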

The line detection is conducted on the left and the right image separately, and the 3-D lines are generated by the structural matching method we developed (see Section 4.1.4).

Figure 4-27 gives an example of road mark detection applied to the image of Figure 4-25. The segmented image is shown in Figure 4-27(a). The thresholds used for binarization are 150 for the R, G and B bands; the threshold for the band difference is selected as 50. It is observed in Figure 4-27(a) that many non-road-mark pixels remain, including vehicles, road shoulders, etc. Figure 4-27(b) is the result obtained by the line extraction procedure. The extracted lines are shown in black superimposed on the original image. All the road marks are extracted as bright lines by the line extraction algorithm, while many other bright objects are avoided. Still, some white objects that have a similar spectral response and similar widths to road marks are also extracted as lines, see for instance the walls of the houses beside the road. These randomly distributed bright linear objects, however, can be discarded by the procedures of cue combination.


Figure 4-27. Example of road mark detection through line extraction. (a) segmented image, (b) extracted thin white lines (candidate road marks) superimposed on image.

4.4.3 Zebra Crossing Detection

Zebra crossings are composed of several thin yellow strips of similar length and width. These strips are aligned at a certain interval, forming a rectangle crossing the road at a right angle. Thus, one axis of the rectangle defines the road width, while the other axis is parallel to the road, i.e. indicates the local road direction. Furthermore, the center of the rectangle generally corresponds to the road center (see Figure 4-25, a road image showing road marks and a zebra crossing). Our procedure for zebra crossing detection is presented in Figure 4-28. As in road mark detection, the color information is first used for image segmentation. The thresholding process described in Section 4.2.3 is applied. For yellow zebra crossings, only the intensities of bands R and G are used, because yellow can be obtained by mixing red with high intensity green shades. Thus, the thresholding process is applied to the R and G bands of the road image. The difference of the pixel grey values between the R and G bands should be limited. Again, the thresholds are relaxed in order to include all the zebra crossing pixels. A morphological closing is then applied to bridge the gaps between the thin strips. The structure size for the closing is not difficult to determine, since the width of the thin strips and the interval between them can be obtained from road construction rules. With connected component labelling, a set of clusters is obtained. Only the clusters of a certain size are kept, while the small ones are discarded. Then, the shape of each cluster is analysed. The rectangle-like clusters are selected as zebra crossings. The center and the short and long axes of the detected zebra crossings are computed using spatial moments.

Figure 4-28. Procedures for zebra crossing detection (input images → image segmentation → morphological closing → connected labelling → spatial moments computation → yellow zebra: center, long and short axes).

In the continuous domain, moments are defined as

m_ij = ∫₋∞^∞ ∫₋∞^∞ xⁱ yʲ f(x, y) dx dy   (4.35)

In the discrete domain of image raster data, the integrals have to be replaced by sums:

m_ij = Σ_x Σ_y xⁱ yʲ f(x, y)   (4.36)

The center of the detected cluster is given by

x̄ = m₁₀ / m₀₀,   ȳ = m₀₁ / m₀₀   (4.37)

The axes and the orientation of the cluster are computed from the second order central moments. The central moments in the discrete domain are expressed as

m_ij = Σ_x Σ_y (x − x̄)ⁱ (y − ȳ)ʲ f(x, y)   (4.38)

and the direction of the short axis is given by

θ = ½ atan( 2m₁₁ / (m₂₀ − m₀₂) )   (4.39)

From the center of the cluster, the short axis can be obtained by tracing along the direction θ to the border of the cluster. Similarly, tracing along the direction perpendicular to θ, the long axis of the cluster can be determined. Note that the center of this cluster indicates the road center in the zebra crossing area, while the long and short axes define the local road width and direction respectively. The zebra crossing detection applied to Figure 4-25 is shown in Figure 4-29. Figure 4-29(a) is the segmented image using color information. As in road mark detection, the thresholds used in the R and G bands are 150; the threshold for the band difference is selected as 50. Figure 4-29(b) shows the image after morphological closing. Although several clusters are obtained, only one cluster has a significant size in this case, and its size is very close to the rectangle defined by the length of the thin strips and the road width. The center, the long axis and the short axis of the cluster are shown in Figure 4-29(c) superimposed on the original image. Figure 4-29(d) is an enlarged portion of Figure 4-29(c). The center of the zebra crossing is represented as a red circle. Obviously, it is located on the road center line. The green line is the short axis of the cluster; it corresponds to the road direction. The long axis, shown as a blue line in Figure 4-29(d), is perpendicular to the road direction, and defines the road width in the zebra crossing area.
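The moment computations of equations (4.35) to (4.39) can be sketched for a binary cluster as follows (numpy assumed; f(x, y) is 1 inside the cluster and 0 outside):

```python
import numpy as np

def cluster_center_and_direction(mask):
    """Center (eq. 4.37) and axis direction theta (eq. 4.39) of a binary
    cluster, from spatial and second order central moments (eqs. 4.36, 4.38)."""
    ys, xs = np.nonzero(mask)
    x = xs.astype(float)
    y = ys.astype(float)
    m00 = float(x.size)              # sum of f(x, y) over the cluster
    xc = x.sum() / m00               # m10 / m00
    yc = y.sum() / m00               # m01 / m00
    m20 = ((x - xc) ** 2).sum()      # central moments, eq. (4.38)
    m02 = ((y - yc) ** 2).sum()
    m11 = ((x - xc) * (y - yc)).sum()
    theta = 0.5 * np.arctan2(2.0 * m11, m20 - m02)   # eq. (4.39)
    return (xc, yc), theta
```

For a rectangular cluster elongated along the image x axis, the center falls on the rectangle's midpoint and θ comes out as 0.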


Figure 4-29. Example of zebra crossing detection. (a) segmented image, (b) after morphological closing, (c) the center, long axis and short axis superimposed on the original image, (d) enlarged zebra crossing image with the detected center and axes.
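The segmentation, closing and labelling steps of the pipeline in Figure 4-28 can be sketched as follows (numpy and scipy assumed; the structuring element size and the minimum cluster size are our own illustrative choices, not values from the thesis):

```python
import numpy as np
from scipy import ndimage

def zebra_candidate_clusters(r_band, g_band, t=150, t_diff=50, min_size=50):
    """Threshold the R and G bands (yellow = high red plus high green with
    a limited band difference), bridge the gaps between the strips with a
    morphological closing, then keep only the large connected clusters."""
    r = r_band.astype(np.int32)
    g = g_band.astype(np.int32)
    yellow = (r > t) & (g > t) & (np.abs(r - g) < t_diff)
    closed = ndimage.binary_closing(yellow, structure=np.ones((5, 5), bool))
    labels, n = ndimage.label(closed)
    sizes = ndimage.sum(closed, labels, index=np.arange(1, n + 1))
    keep = [i + 1 for i, s in enumerate(sizes) if s >= min_size]
    return labels, keep
```

Applied to an image patch containing several parallel bright strips, the closing merges them into a single rectangle-like cluster, whose shape can then be analysed with the spatial moments above.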

4.5 Summary

In this chapter, several approaches for the extraction of various features from images have been presented. Edges are extracted, and straight edge segments are computed. With a stereo matching algorithm, the correspondences of straight edge segments across images are found. 3-D straight edges are then computed from the matched edge segments; three 3-D computation methods are implemented and compared. Image segmentation is achieved by data clustering. The original RGB image is analysed to determine the appropriate data for clustering, so that the interesting features are more prominent and easier to separate, while the amount of input data for the clustering is limited. Above-ground objects and ground objects are separated using the available height data. This is achieved by a Multiple Height Bin method. Road marks and zebra crossings are also detected and extracted in the images. The methods employ the specific spectral responses of road marks and zebra crossings, and exploit their shapes to achieve results of good quality. The extracted road marks are transformed to object space through the developed matching algorithm.

5. 3-D ROAD RECONSTRUCTION

This chapter describes the procedures for the 3-D reconstruction of roads using the extracted features and cues. The process consists of different intermediate and interrelated processes; each process provides more abstract and more object-related information to the process at the next higher level. The main strategy is to use knowledge obtained from the existing database and extracted cues to 1) exclude irrelevant features as much and as early as possible; 2) provide primitives that most probably belong to the road. Each VEC25 road (defined as the part between two junctions) is back projected onto the images using the existing orientation parameters. Thus, the image patches enclosing the road are defined using the position of the road and the maximum error of the prior road database. According to our road extraction strategy introduced in Chapter 3, the road extraction system focuses on these image regions, and a set of data processing tools described in Chapter 4 is activated to extract features and cues. The most important features are 3-D straight edges, since the road sides are among them. In Section 5.1, 3-D parallel overlapping edges are found and evaluated to find the possible road sides that are parallel. Due to occlusions or shadows, roads might be broken and some road sides may not be visible in the images. The road segments with only one side visible in one or two images in the occluded/shadowed areas can be recovered by the techniques introduced in Section 5.2. In addition, gaps are formed and evaluated. In Section 5.3, road segments are linked to reconstruct the road. This is done by finding an optimal path among the road segment candidates that maximizes a merit function. Highways, first class roads and most second class roads usually have road marks on them. These roads can also be extracted using the detected road marks and zebra crossings. The procedures to extract roads using road marks are described in Section 5.4.
In Section 5.5, we deal with road junction generation. This is realized by intersecting the extracted roads with the aid of the topology of the VEC25 data. With the extracted roads and road junctions, the road network is reconstructed. The proposed method and the various processes discussed in this chapter are all implemented. Finally, in Section 5.6, results for the reconstructed roads, road junctions and road networks are presented.

5.1 Finding 3-D Parallel Road Sides

Roads appear in high resolution images, such as those used in our work, with two parallel boundaries. Thus, our first step is to find, from the edge segments, the possible road sides that are parallel. We start with an edge reduction process to remove irrelevant edges in Section 5.1.1. This process not only results in a significant reduction of computational complexity, but also avoids many false hypotheses right from the beginning. The 3-D parallel edges are then formed from the remaining edges, and they are evaluated to find road segments. These procedures are covered in Section 5.1.2 and Section 5.1.3 respectively.

5.1.1 Removal of Irrelevant Edges

Since the VEC25 road and its maximum error are known a priori, we restrict our working areas to the vicinity of the VEC25 road, i.e. edges outside the VEC25 road error buffer are not considered further. The definition of the VEC25 road error buffer is illustrated in Figure 5-1. The grey line is the VEC25 road; the real road is represented as a grey region. The VEC25 road error buffer lies between the two solid lines; it is defined in object space using the VEC25 road vector, the maximum error and the approximate road width.


Figure 5-1. Definition of the VEC25 road error buffer. The VEC25 road and the correct road are represented as a grey line and a grey region. The VEC25 road error buffer is defined between the two solid black lines; i, j, k are extracted straight edge segments.

Since the shapes of the VEC25 roads are generally correct, edges having a large directional difference to the VEC25 road segments are discarded. The directions of the edges are compared with the VEC25 road segments, and the directional differences between the edges and the VEC25 road segments are computed. If an edge covers several VEC25 road segments, the direction of the edge is compared with that of each covered segment, and the minimum directional difference is taken. The edges with directional differences above a prespecified threshold are discarded. The threshold should be rather relaxed so that the road sides are not excluded. In Figure 5-1, edge k is kept, while edges i and j are discarded. Since roads are on the ground, edges on above-ground objects are removed by checking against the nDSM, i.e. edges on nDSM blobs are removed. Furthermore, edges whose slope exceeds the maximum allowable slope are also discarded. The maximum allowable slope can be found in road design rules. These two cases are shown in Figure 5-2, where both edges i and j are discarded.

Figure 5-2. Two cases to be avoided. (a) Straight edge segment i is on an nDSM blob, (b) the slope of straight edge segment j exceeds the maximum allowable slope.
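The directional check against the VEC25 segments can be sketched as follows (numpy assumed; the segment representation and the 30 degree threshold are illustrative choices, the thesis only states that the threshold is relaxed):

```python
import numpy as np

def direction_deg(p, q):
    """Direction of the segment p -> q in degrees, folded to [0, 180)."""
    return np.degrees(np.arctan2(q[1] - p[1], q[0] - p[0])) % 180.0

def angle_diff(a, b):
    """Smallest difference between two undirected directions (degrees)."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)

def keep_edge(edge, covered_vec25_segments, max_diff_deg=30.0):
    """Keep an edge if its minimum directional difference to the VEC25
    segments it covers stays below a deliberately relaxed threshold."""
    e_dir = direction_deg(*edge)
    return min(angle_diff(e_dir, direction_deg(*s))
               for s in covered_vec25_segments) <= max_diff_deg
```

An edge nearly parallel to a covered VEC25 segment survives the filter; an edge roughly perpendicular to all of them is discarded.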

Figure 5-3 shows the effect of the edge reduction process. In Figure 5-3(a) and Figure 5-3(b), the edges extracted from the images are represented as white lines. The irrelevant edges are removed, and the remaining edges are presented in Figure 5-3(c) and Figure 5-3(d). It is observed that the number of edges is significantly reduced.


Figure 5-3. Effect of edge reduction. The extracted straight edge segments and the VEC25 roads are shown as white and yellow lines respectively. (a) and (b): the straight edge segments extracted from the road images, (c) and (d): the remaining straight edge segments after the edge reduction process.

5.1.2 Forming Parallel Edges

A pair of 3-D straight edge segments i and j are considered parallel if they have similar orientations in 3-D space (see Figure 5-4), i.e. their directional difference in the XY plane and their slope difference are small. The parallelism measure is defined as

S_pg = S_α · S_β   (5.1)

where S_α and S_β are given by

S_α = (T_α − |α₁ − α₂|) / T_α,   S_β = (T_β − |β₁ − β₂|) / T_β   (5.2)

α₁ and α₂ are the directions of i and j in the XY plane, and β₁, β₂ are their slopes. T_α and T_β are the maximum allowable directional difference and slope difference. They are both set to 10 degrees in our implementation.
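Equations (5.1) and (5.2) can be sketched as follows (plain Python; all angle arguments in degrees):

```python
def parallel_measure(alpha1, alpha2, beta1, beta2, t_alpha=10.0, t_beta=10.0):
    """S_pg of eqs. (5.1)-(5.2). alpha1, alpha2 are the edge directions in
    the XY plane, beta1, beta2 their slopes, all in degrees. T_alpha and
    T_beta are 10 degrees as stated in the text. A non-positive factor
    means the pair fails the corresponding parallelism test."""
    s_alpha = (t_alpha - abs(alpha1 - alpha2)) / t_alpha
    s_beta = (t_beta - abs(beta1 - beta2)) / t_beta
    return s_alpha * s_beta
```

Identical directions and slopes give the maximum score of 1.0, and the score decreases linearly as the differences approach the tolerances.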


Figure 5-4. 3-D parallel edge segments i and j. α₁ and α₂ are their directions in the XY plane, β₁ and β₂ are their slopes.

In addition, i and j must overlap, and the distance w between i and j must be within a certain range; the minimum and maximum distances depend on the road class defined in the VEC25 data. The heights of i and j should also be similar, so that cases where i and j are parallel but have a big height difference are avoided. The height similarity measure is defined as

S_ph = (T_h − |h₁ − h₂|) / T_h   (5.3)

h₁ and h₂ are the heights of the middle points of i and j, and T_h is the maximum allowable height difference. Figure 5-5 shows the 3-D parallel edges found for the road images in Figure 5-3; they are shown as blue lines.

Figure 5-5. The parallel 3-D straight edge segments found, shown as blue lines superimposed on the images.

5.1.3 Evaluation of the Area between Parallel Edges

The found 3-D parallel edges include many overlapping segments; some of them actually correspond to road edges and some are edges of other ground objects, e.g. agricultural areas. To eliminate unwanted segments and confirm the correct road sides, the 3-D parallel edges are evaluated, and several criteria are used as a first screening process to retain only those most likely to be road edges. The evaluation is carried out in image space using the knowledge obtained from cues. The procedures are described in the following. The 2-D edges corresponding to the 3-D parallel edges form quadrilaterals in image space. The quadrilaterals are checked against the image segmentation result and the nDSM data. Only the quadrilaterals that are on the ground and belong to the class road are considered as road, while those belonging to the class vegetation or containing above-ground objects are immediately rejected. Suppose there are in total N pixels in a quadrilateral, N_h is the number of pixels that are not on nDSM blobs (the nDSM is generated in object space; here it is back projected into the images), and N_r is the number of road pixels. Then, the degree to which the quadrilateral belongs to the class road is computed as

S_pr = (N_r / N) · (N_h / N)   (5.4)

The computation of S_pr is conducted on the stereo images. The larger value of S_pr from the two images is taken as an attribute of the 3-D parallel edges. Only the 3-D parallel edges whose S_pr value is above a prespecified threshold are accepted. On most roads, especially on 1_class and 2_class roads, this threshold should be close to 1.0. Since dirt roads in rural areas may contain a central strip of vegetation, the threshold should not be too strict in order to detect such road segments. In our system, this value is set to 0.7. Furthermore, the strip should possess a similar orientation to the quadrilateral, and the distances between the strip and the two edges should be similar. From (5.1), (5.3) and (5.4), a general measure for a pair of 3-D parallel edge segments can be computed as

S_prsp = S_pg · S_ph · S_pr   (5.5)
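Equations (5.3) to (5.5) combine into the overall measure as sketched below (plain Python; the function and variable names are ours):

```python
def height_similarity(h1, h2, t_h):
    """S_ph of eq. (5.3): h1, h2 are the heights of the edge midpoints,
    t_h the maximum allowable height difference."""
    return (t_h - abs(h1 - h2)) / t_h

def road_class_degree(n_road, n_ground, n_total):
    """S_pr of eq. (5.4): fraction of road pixels times fraction of
    on-ground pixels in the quadrilateral."""
    return (n_road / n_total) * (n_ground / n_total)

def prsp_measure(s_pg, s_ph, s_pr):
    """Overall measure S_prsp of eq. (5.5)."""
    return s_pg * s_ph * s_pr
```

A quadrilateral entirely on the ground and mostly covered by road pixels scores close to 1.0, and any weak factor pulls the combined measure down multiplicatively.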

When the road under processing is a first class or second class road, a procedure checks whether the quadrilaterals contain any detected road marks. If a contained road mark has the same orientation as the quadrilateral, and the distances between the road mark and the two edges are similar, then the 3-D parallel edges are considered to be hypotheses for road sides. An example is presented in Figure 5-6, where the extracted road marks (black lines) verify that the 3-D parallel edges are road sides.


Figure 5-6. Road marks verify that the parallel edges belong to the road. The black lines are the extracted road marks; the parallel edges (shown as blue lines) are confirmed to be road sides.

From the 2-D edges, the edge gradient magnitudes and orientations are collected. They are computed as the averages of the gradient magnitude and orientation of each pixel on the edge. The found 3-D parallel edges have a high probability of belonging to a road; they are called Possible Road Sides that are Parallel (PRSP). Each PRSP thus is described by a pair of 3-D straight edges and their corresponding 2-D edges, and holds a set of attributes:
• coordinates of the start and end points of the overlapping edges (both in 2-D and 3-D)

• coordinates of the center line of the 3-D parallel edges

• length and width of the PRSP

• measure S_prsp
• existence of road marks

• means of edge gradient magnitudes of the parallel edges

• means of edge gradient orientations of the parallel edges
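The PRSP attribute set listed above can be collected in a record type, sketched here with illustrative field names (not taken from the thesis implementation):

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3D = Tuple[float, float, float]

@dataclass
class PRSP:
    """Possible Road Sides that are Parallel: one pair of matched 3-D
    edges with their 2-D correspondences and the attributes listed above."""
    endpoints_3d: List[Point3D]            # start/end of the overlapping edges
    center_line: Tuple[Point3D, Point3D]   # centre line of the parallel pair
    length: float
    width: float
    s_prsp: float                          # combined measure, eq. (5.5)
    has_road_marks: bool
    mean_gradient_magnitude: Tuple[float, float]   # one value per edge
    mean_gradient_orientation: Tuple[float, float]
```

Grouping the attributes in one record keeps the later evaluation and linking steps independent of how the edges were originally extracted.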

Figure 5-7 shows the found PRSPs for the road images in Figure 5-3(a) and Figure 5-3(b).

Figure 5-7. Examples of the found PRSPs, shown as blue lines superimposed on the images.

5.2 Evaluation of Missing Road Sides and Gap Bridging

In this section, we describe the methods to find more possible 3-D road sides. These procedures are necessary because not all 3-D road segments can be obtained from the procedures described in the previous sections, the reason being that some of the 3-D road sides are missing. The absence of 3-D road sides can be caused by shadows or occlusions, or the road sides may actually not exist, e.g. in an area where a parking lot is situated next to the road. Depending on the relations between the road segments and the neighbouring objects, the sun angle, the viewing direction, and the presence of moving cars on the road, there are various types of missing road sides in images. In the following, we first classify the types of missing road sides based on the existence of 2-D and/or 3-D edges in the occluded/shadowed areas, and examine how the different types of missing road sides can be recovered. Then, we propose procedures to recover the missing 3-D road sides for the reconstruction of the occluded/shadowed road segments in Section 5.2.2. The recovered road sides and the reconstructed road segments are further checked using the extracted cues to ensure that they really belong to roads. This is done in Section 5.2.3. As a special case of missing road sides, gaps are treated and bridged. The procedures are presented in Section 5.2.4.

5.2.1 Classification of Missing Road Sides

We have investigated the typical missing road sides in the processed images. In Figure 5-8 - Figure 5-17, the image patches of the occluded/shadowed road segments (where the missing road sides occur) are shown together with the 2-D and/or 3-D edges extracted in the problematic areas. In the figures, the 2-D and 3-D straight edge segments are shown as white and blue lines respectively, and are labelled in lower and upper case. 1 and 2 are the labels of the road sides. The quadrilaterals drawn with thin red lines are the generated PRSPs, and are labelled I, II. The dashed blue lines in the figures are the proposed solutions for the missing road sides. In the first type of missing road sides (T1), the occlusions/shadows occur at the end of a road segment, and a part of one road side is invisible in both images. An example is shown in Figure 5-8: a part of side 1 at the end of the road segment is occluded by trees, while the other road side (side 2) is visible in both images (see Figure 5-8(a) and (b)). Therefore, one PRSP (I) is generated for a part of the road segment (Figure 5-8(c)). The missing road side can be recovered by extending the 3-D edge AB to C; the rest of the road segment can then be reconstructed using the extension BC and the 3-D edge segment EF.


Figure 5-8. T1 of missing road sides: occlusions/shadows occur at the end of a road segment; a part of one side of the road segment disappears in both images, while the other side is visible in both images. (a) and (b) image patches in the left and right images showing that side 1 is partially occluded, while side 2 is visible. (c) The missing road side can be recovered by extending the 3-D edge AB to C, and the occluded road segment can be reconstructed using BC and EF.

The second type (T2) of missing road sides is similar to T1 (Figure 5-8): the occlusion/shadow occurs at the end of the road segment, and side 1 is partially occluded in both images. The difference is that in this case side 2 of the road segment is visible in the left image, but only partly visible in the right image, as shown in Figure 5-9(a) and (b). Again, we can generate a PRSP for a part of the road segment. In order to reconstruct the occluded road segment, we first extend the 3-D edge AB to C as shown in Figure 5-9(c), which is the 3-D position of the end point c of the 2-D edge ac. Then, DE can be extended to F. The occluded part of the road segment can therefore be reconstructed by BC and EF.


Figure 5-9. T2 of missing road sides: occlusions/shadows occur at the end of a road segment; one of the road sides is occluded in both images, while the other side is visible in one image but partially occluded in the other. (a) and (b) image patches in the left and right images showing that side 1 is partially occluded in both images, while side 2 is visible in the left image but partially occluded in the right image. (c) The missing road sides can be recovered by extending AB to C and DE to F.

There are cases where the occlusions/shadows occur along a road segment, breaking one road side into several fragments. Based on the existing 3-D edges, two or more PRSPs can be generated for the road segment. The missing road side and the occluded part can be recovered by connecting the adjacent PRSPs at both sides of the problematic area. An example is given in Figure 5-10, where the missing road side is caused by partial occlusions in both images (Figure 5-10(a) and (b)). As shown in Figure 5-10(c), two PRSPs I and II are generated, and the missing road side and the occluded part of the road segment can be reconstructed by connecting I and II. We define this kind of missing road side as T3.


Figure 5-10. T3 of missing road sides: occlusions/shadows occur along a road segment; one road side is broken into several fragments, while the other road side is visible in both images. (a) and (b) image patches in the left and right images showing that side 1 is visible in both images, while side 2 is partially occluded. (c) The missing road side is recovered by connecting the sides of the neighbouring PRSPs I and II.

A situation similar to T3 is shown in Figure 5-11. In this case, a part of a road segment between two PRSPs is occluded by a moving car (Figure 5-11(a)). Thus, one road side of this part is visible only in one image (Figure 5-11(b)), while the other side appears in both images. The occluded road segment can also be recovered by directly connecting the neighbouring PRSPs.


Figure 5-11. T4 of missing road sides: occlusions/shadows occur along a road segment; one road side between two PRSPs is occluded by a moving object and is only visible in one image. (a) image patch in the left image, a part of the road side is missing, (b) image patch in the right image. (c) The missing road side is recovered by connecting the sides of the neighbouring PRSPs I and II.

Figure 5-12 is an example of T5 of missing road sides, where one of the road sides is visible in the images, while the other side does not exist because it is not defined in reality (see Figure 5-12(a)). In the figure, only the image patch of the left image is shown. In this case, the missing road side is hypothesized using the visible side and the width information of the neighbouring PRSP. That is, the solution is given by shifting the visible side towards its opposite side, with the shift value equal to the width of the neighbouring PRSP (see the dashed blue lines in Figure 5-12(b)).


Figure 5-12. T5 of missing road sides: one road side is visible in both images, while the other side does not exist. (a) image patch showing that the road segment has only one side; the other side is not defined. (b) The missing road side is inferred using the visible side and the width information of the neighbouring PRSP (the dashed blue lines).

In T6 of missing road sides, one of the road sides is visible in both images, while the other side is occluded in both images. An example is given in Figure 5-13. The method used to treat T5 can be applied. The hypothesized road side is shown in Figure 5-13(c) as a dashed blue line.


Figure 5-13. T6 of missing road sides: one road side is visible in both images, while the other side does not appear in either image. (a) and (b) image patches in the left and right images showing that one road side is visible, while the other side is invisible in both images. (c) The missing road side is inferred using the visible side and the width of the neighbouring PRSP (the dashed blue lines).

An even worse case than T6 is also observed: besides one road side being totally invisible in both images, the other side appears only in one image (see Figure 5-14(a) and (b)). To handle this case, the visible 2-D edge (ab in Figure 5-14(c)) is transformed into 3-D using the DTM. Then, the missing road side is recovered by shifting the transformed 3-D edge as done for T5 and T6.


Figure 5-14. T7 of missing road sides: one road side is invisible in both images, while the other side is only visible in one image. (a) image patch in the left image, both road sides are occluded, (b) image patch in the right image, one road side is visible, (c) the visible 2-D road side ab is transformed to 3-D using the DTM, and then used to recover the missing road sides as done for T5 and T6 (the dashed blue lines).

A case of T8 of missing road sides is shown in Figure 5-15. In this area, road side 1 appears in both images. Road side 2 is visible in the left image, but occluded in the right image (Figure 5-15(a) and (b)). Therefore, we obtain 3-D information for side 1, while side 2 is only available in 2-D. We can transform the 2-D edge into 3-D using the DTM data. The visible 3-D edge together with the transformed edge delivers the 3-D road segment in this area (Figure 5-15(c)).


Figure 5-15. T8 of missing road sides: one road side is visible in both images, while the other side is only visible in one image. (a) image patch in the left image, road sides 1 and 2 are visible, (b) image patch in the right image, road side 1 is visible and side 2 is occluded. (c) The 2-D edge is transformed into 3-D using the DTM, and forms overlapping parallel edges with the visible 3-D edge to deliver the 3-D road segment.

An example of T9 of missing road sides is given in Figure 5-16, where both road sides of a road segment are visible in one image (Figure 5-16(a)) but invisible in the other (Figure 5-16(b)). Thus, for the road segment we obtain a pair of parallel overlapping 2-D edges in one image. They are transformed into 3-D using the DTM and used to reconstruct the corresponding road segment.


Figure 5-16. T9 of missing road sides: both road sides are visible in one image, but invisible in the other image. (a) image patch in the left image showing that both road sides are visible, (b) image patch in the right image, both road sides are occluded by trees. The 2-D overlapping parallel edges found in the left image are transformed into object space using the DTM to reconstruct the road segment.

Cases where both road sides of a road segment appear in neither image are also observed. One such case is that the road surface is visible in the images, but the road sides do not exist. This usually occurs in urban areas, but can also be found in small villages in rural areas (T10, see Figure 5-17(a) and (b)). The other case (T11, see Figure 5-17(c) and (d)) is that the road sides exist in reality, but are completely occluded/shadowed in both images by neighbouring objects. Therefore, no edges corresponding to the road sides are available. In these cases, the problematic areas of the roads are actually gaps. Thus, gaps are special cases of missing road sides. Since they cannot be recovered from the images, they will be treated in Section 5.2.4 in gap generation.


Figure 5-17. T10 and T11 of missing road sides: gaps. (a) and (b) road sides are not defined in the gap areas; the road surfaces are visible in the images. (c) and (d) road segments are totally occluded; the road surfaces are not visible in the images.

The above analysis shows that in the problematic areas some of the missing road sides can be recovered and the corresponding road segments can be reconstructed. This is possible only when at least one of the road sides exists in at least one image. The types T1 to T9 of missing road sides can be further summarized as follows: (1) the missing road sides are collinear with the neighbouring PRSPs, i.e. one of the sides of the problematic area shares the same 3-D or 2-D straight edge segment with the PRSPs. Thus, the missing sides can be recovered by extending the existing 3-D/2-D straight edges of the PRSPs (T1 and T2), or by directly connecting the neighbouring PRSPs (T3 and T4). (2) the missing road sides belong to road segments where there is no PRSP. Thus, the missing road sides can only be inferred using the isolated 3-D edges (T5, T6) or 2-D edges (T7) together with the width information of the neighbouring PRSPs. (3) the parallel road sides exist only in one image; thus, they either lack full 3-D information (T8) or lose the 3-D information entirely (T9). The missing 3-D information can be obtained from the DTM. It is worth noting that we do not try to exhaustively list all the cases of occluded/shadowed road segments. Indeed, many cases can be more complex. However, the missing road sides in the occluded/shadowed areas are usually combinations of the types listed above, and can thus be treated by combining the above methods.

5.2.2 Reconstruction of Missing Road Sides

As can be seen from the last section, different types of occluded/shadowed road segments can be reconstructed using different methods to recover the missing road sides. The methods to recover the different types of missing road sides are summarized in Table 5-1.

Method                                        Applied Types
Extension of 3-D/2-D edges                    T1, T2
Linking adjacent PRSPs                        T3, T4
Hypothesizing using the visible 3-D/2-D edges T5, T6, T7
Forming 3-D parallel edges                    T8, T9

Table 5-1. Summary of methods for the reconstruction of missing road sides.

Thus, our system first tries to extend the PRSPs as much as possible using the 3-D and 2-D edges for the cases of T1 and T2, and links the adjacent PRSPs for T3 and T4. Then, the isolated 3-D and 2-D edges are checked to reconstruct road segments of T8 and T9. Finally, the remaining edges are used to try to infer the missing road sides for T5, T6 and T7. Linking the neighbouring PRSPs for the cases of T3 and T4 is straightforward. To generate road segment candidates for the cases of T8 and T9, the methods described in Section 5.1.2 and Section 5.1.3 can be applied. In the following, the procedures to reconstruct road segments by extension of PRSPs using 3-D/2-D edges and by hypothesizing road sides using visible edges are presented.

5.2.2.1 Reconstruction of Missing Road Sides by Extension of PRSPs

The procedure to extend a PRSP using a 3-D edge is shown in Figure 5-18. In the figure, region I represents a PRSP formed by the overlapping parallel 3-D straight edges AB and CD. The PRSP is extended to C' as long as region II is also a road.


Figure 5-18. Extension of a PRSP using 3-D edge information. See text for details.

The position of C' is determined by the intersection of DC with the plane that is perpendicular to BA and passes through A. Suppose the direction vectors of BA and DC are $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$; the straight edge DC can then be expressed as

$$\frac{x - x_c}{a_2} = \frac{y - y_c}{b_2} = \frac{z - z_c}{c_2} \qquad (5.6)$$

and the plane perpendicular to BA and passing through A is given by

$$a_1 (x - x_a) + b_1 (y - y_a) + c_1 (z - z_a) = 0 \qquad (5.7)$$

where $(x_c, y_c, z_c)$ and $(x_a, y_a, z_a)$ are the coordinates of C and A respectively. From equations (5.6) and (5.7), the coordinates of C' can be computed. Figure 5-19 shows the procedure to extend a PRSP using a 2-D edge. We use upper case to represent 3-D edge segments, and lower case for 2-D edge segments. Suppose that eg is a 2-D straight edge segment, and ef is occluded in the other image. Thus, we can only generate a 3-D straight edge for segment gf, which is shown as AB in the figure. CD is another 3-D straight edge parallel to AB. Region I is a PRSP formed by the overlapping area of AB and CD.


Figure 5-19. Extension of a PRSP using 2-D edge information. See text for details.

To reconstruct the occluded road segment (shown as region III), the PRSP is extended to D'. The 3-D position of point e is obtained by intersecting the image ray passing through e with the 3-D edge AB. D' can then be determined similarly to C' in Figure 5-18.
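As a minimal sketch, the C' computation from eqs. (5.6) and (5.7) reduces to a standard line-plane intersection. The function name and the example coordinates are ours, not from the thesis implementation:

```python
def extend_to_plane(C, d_dc, A, d_ba):
    """Intersect the line through C with direction d_dc (edge DC) with the
    plane through A whose normal is d_ba (direction of edge BA), following
    eqs. (5.6) and (5.7). Returns the intersection point C', or None if
    the line is (near-)parallel to the plane."""
    n_dot_d = sum(n * d for n, d in zip(d_ba, d_dc))
    if abs(n_dot_d) < 1e-12:       # degenerate: DC parallel to the plane
        return None
    t = sum(n * (a - c) for n, a, c in zip(d_ba, A, C)) / n_dot_d
    return tuple(c + t * d for c, d in zip(C, d_dc))

# Example: a horizontal road, with DC and BA both along the x axis;
# the extension stops level with A (cf. Figure 5-18).
C_prime = extend_to_plane(C=(60.0, 0.0, 400.0), d_dc=(1.0, 0.0, 0.0),
                          A=(100.0, 6.0, 400.0), d_ba=(1.0, 0.0, 0.0))
```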

5.2.2.2 Hypothesizing Missing Road Sides

To recover the missing road sides of T5, T6 and T7, the system hypothesizes the missing road sides using the existing 2-D or 3-D edges and the width taken from the already found PRSPs. The procedure is illustrated in Figure 5-20, where AB is a single 3-D edge whose right side (with respect to the direction A->B) is road; its opposite side A'B' is generated by shifting AB in the direction perpendicular to AB in the planimetric plane. The shift value is taken from the width of the PRSP which most probably belongs to the road. The Z values of the coordinates of A' and B' are taken from A and B respectively. In case the existing edges are in 2-D, they are first converted into 3-D using the DTM, and are then used to hypothesize the missing road sides.


Figure 5-20. Generation of the opposite side of a 3-D straight edge. AB is a single 3-D straight edge segment; its opposite side A'B' is generated by shifting AB in the direction perpendicular to AB in the planimetric plane. The Z values of the coordinates of A' and B' are taken from A and B respectively.

It should be noted that not all the isolated edges are used to hypothesize the missing road sides. Before the decision is made to take an edge segment to hypothesize its opposite part, the edge is checked. Firstly, shadow edges are not suitable for this purpose, because they usually do not correspond to road sides. Secondly, one side of the edge should belong to the class road, at least in one image, and the shifting is conducted in the direction toward this side. Furthermore, priority is given to edges with a grass/road transition (see Figure 5-21).
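A planimetric sketch of the shift illustrated in Figure 5-20; the helper and its `side` parameter are hypothetical, and the road-side class check and PRSP width lookup described above are assumed to have been done already:

```python
import math

def opposite_side(A, B, width, side=+1):
    """Hypothesize the missing road side A'B' by shifting edge AB by
    `width` perpendicular to AB in the planimetric (XY) plane; the Z
    values of A and B are kept (cf. Figure 5-20). side=+1 shifts to the
    left of the direction A->B, side=-1 to the right."""
    (xa, ya, za), (xb, yb, zb) = A, B
    dx, dy = xb - xa, yb - ya
    norm = math.hypot(dx, dy)
    # unit normal of A->B in the XY plane
    nx, ny = -dy / norm * side, dx / norm * side
    A_prime = (xa + width * nx, ya + width * ny, za)
    B_prime = (xb + width * nx, yb + width * ny, zb)
    return A_prime, B_prime

# a 50 m edge along x, shifted by a 6 m PRSP width
Ap, Bp = opposite_side((0.0, 0.0, 400.0), (50.0, 0.0, 401.0), width=6.0)
```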


Figure 5-21. Criteria to select edge segments to hypothesize missing road sides. (a) shadow edges are not used, (b) one side of the edge should belong to the class road, (c) edges with a grass/road transition are preferred.

5.2.3 Evaluation of the Road Segment Candidates

In the previous sections we introduced the methods to reconstruct the road segments in the problematic areas. Based on the types of the missing road sides, corresponding types of road segment candidates (RSCs) are obtained. In order to ensure that the obtained road segment candidates belong to roads, they are evaluated using the knowledge obtained from cues. The attribute values describing the quality of the RSCs are computed; they include additional evidence about the presence of the road. Attributes common to all RSCs are the evaluations using the image segmentation data and the nDSM. In addition, specific attributes for each type of RSC are also computed. The specific attributes are mainly the relations between the RSC and its adjacent PRSPs; they provide reliability measures for the RSC. The evaluation using the image segmentation data and the nDSM is carried out in image space and object space respectively. However, the evaluation process also uses the interactions between these two spaces. In the image segmentation result, grass and trees fall into the same class; they can be separated here using the nDSM data, i.e., if a pixel of the vegetation class lies on an nDSM blob, it is considered a tree pixel.

Firstly, the RSC is checked against the nDSM in object space. The two sides of the RSC form a quadrilateral on the nDSM. We collect and count the numbers of pixels of ground objects, trees, and buildings. The nDSM blobs contain buildings and trees; they can be separated with the image segmentation data, i.e., if a pixel lies on an nDSM blob, it is projected into image space to verify whether it is a tree pixel or a building pixel. We then compute a measure of the degree to which the RSC is on the ground. Suppose there are in total M pixels in the quadrilateral; the measure is computed as

$$S_{ground} = 1 - \frac{M_b}{M} \qquad (5.8)$$

where $M_b$ is the number of building pixels. It is required that there are no buildings between the sides of the RSC. However, trees are allowed, because some of the missing road sides are caused by tree occlusions (tree leaves above the road). The evaluation with the image segmentation data is conducted in image space. The two sides of an RSC in the image also form a quadrilateral. In order to accept the RSC as a road, there should be no grass found in the quadrilateral, because most roads are grass free. If shadow is found in the quadrilateral, each shadow pixel has to be verified as being on the ground or not by projecting it onto the nDSM. For each quadrilateral, the pixels of road, vegetation, tree, ground shadow and building are collected, and the degree to which the quadrilateral belongs to a road is computed as

$$S_{image} = \frac{N_r + N_s + N_t}{N} \qquad (5.9)$$

where N is the total number of pixels in the quadrilateral, and $N_r$, $N_s$ and $N_t$ are the numbers of pixels of road, ground shadow and tree respectively. This evaluation is carried out in the left and right images separately; the larger value is then taken as an attribute of the RSC. In order to treat dirt roads where a central grass strip is present, the grass pixels are also collected. It is required that the cluster of grass pixels has an orientation similar to that of the quadrilateral, and that the distances between the cluster and the edges of the quadrilateral are similar.
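Under the definitions above, eqs. (5.8) and (5.9) amount to simple pixel-count ratios; a sketch with illustrative counts (function names are ours):

```python
def s_ground(n_building, n_total):
    """Eq. (5.8): fraction of the RSC quadrilateral on the nDSM that is
    not covered by building pixels."""
    return 1.0 - n_building / n_total

def s_image(n_road, n_shadow, n_tree, n_total):
    """Eq. (5.9): fraction of road, ground-shadow and tree pixels in the
    image-space quadrilateral."""
    return (n_road + n_shadow + n_tree) / n_total

# Illustrative quadrilateral with 1000 pixels:
# 700 road, 150 ground shadow, 100 tree, 0 building, 50 grass.
g = s_ground(0, 1000)                 # no buildings: 1.0
im = s_image(700, 150, 100, 1000)     # 950 of 1000 supporting pixels
```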

Besides the measures $S_{ground}$ and $S_{image}$, each type of RSC also receives specific measures to assess its reliability. The criteria for the measures are listed in Table 5-2 according to the type of RSC. In the following, the definition and computation of the measures are described.

Type of RSC          Criteria for Specific Measures
T1, T2               • Extension should not be too long relative to the PRSP
T3, T4               • Adjacent PRSPs should be collinear and have similar widths
                     • The RSC should not be too long relative to the longer PRSP
T5, T6, T7, T8, T9   • An RSC close to and collinear or curvilinear with a PRSP is more reliable

Table 5-2. Criteria of specific reliability measures for different types of RSC.

The length ratio between the PRSP and the extension is taken into account to assess the reliability of the RSC. A short extension from a long PRSP is more reliable than a long extension from a short PRSP.

$$S_{ext} = \frac{l_{PRSP}}{l_{PRSP} + l_{ext}} \qquad (5.10)$$

where $l_{PRSP}$ and $l_{ext}$ are the lengths of the PRSP and the extension respectively. For RSCs of type T3 and T4, the two adjacent PRSPs at both sides of the RSC should have similar widths and be collinear in object space. A measure for this purpose is defined as

$$S_{link1} = \frac{t_w - \Delta w}{t_w} \cdot \frac{t_\alpha - \Delta\alpha}{t_\alpha} \qquad (5.11)$$

where $\Delta w$ and $\Delta\alpha$ are the width difference and orientation difference between the two adjacent PRSPs, and $t_w$ and $t_\alpha$ are the maximum allowable width difference and orientation difference respectively. They are set to 1.5 m and 15.0 degrees in our system. In addition, a measure similar to (5.10) is also applied to the RSCs of type T3 and T4, i.e., a shorter RSC with longer collinear PRSPs is more reliable. The measure is defined in (5.12) as the ratio of the length of the longer PRSP to the sum of the lengths of the PRSP and the RSC.

$$S_{link2} = \frac{l_{PRSP}}{l_{PRSP} + l_{RSC}} \qquad (5.12)$$

In the case of an RSC generated by hypothesizing a missing road side using a 3-D/2-D edge, the RSC is checked for whether it is close to and curvilinear with a PRSP. An isolated RSC is less reliable than an RSC close to and collinear with a PRSP. This reliability measure is defined as

$$S_{shift} = \frac{t_l - l}{t_l} \cdot \frac{t_\alpha - \Delta\alpha}{t_\alpha} \qquad (5.13)$$

where l is the distance between the RSC and its nearest PRSP, and $\Delta\alpha$ is their orientation difference. $t_l$ and $t_\alpha$ are predefined values, set to 20.0 m and 45 degrees. The measure defined in (5.13) is also applied to the RSCs of types T7 and T8.
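The specific measures (5.10), (5.11) and (5.13) can be sketched as small helper functions with the thresholds given in the text (function and parameter names are ours):

```python
def s_ext(l_prsp, l_ext):
    """Eq. (5.10): a short extension from a long PRSP is more reliable."""
    return l_prsp / (l_prsp + l_ext)

def s_link1(dw, da, t_w=1.5, t_a=15.0):
    """Eq. (5.11): width/orientation agreement of the two adjacent PRSPs
    (thresholds 1.5 m and 15 degrees, as stated in the text)."""
    return (t_w - dw) / t_w * (t_a - da) / t_a

def s_shift(dist, da, t_l=20.0, t_a=45.0):
    """Eq. (5.13): proximity and collinearity of a hypothesized RSC to
    its nearest PRSP (thresholds 20 m and 45 degrees, as in the text)."""
    return (t_l - dist) / t_l * (t_a - da) / t_a

# identical widths and orientations give the maximal link measure
best = s_link1(0.0, 0.0)          # 1.0
short_ext = s_ext(80.0, 20.0)     # 0.8: 20 m extension of an 80 m PRSP
```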

Finally, a total measure $S_{RSC}$ is computed for each RSC based on the measures defined in (5.8) - (5.13). It is computed as the product of $S_{ground}$, $S_{image}$, and the specific measures. Similarly to PRSPs, each RSC is described by a pair of 3-D straight parallel lines, and holds a set of attributes:

• coordinates of the start and end points of the 3-D parallel lines

• coordinates of the center line of the 3-D parallel lines

• length and width of the RSC

• measure $S_{RSC}$

• means of edge gradient magnitudes and orientations of the visible edges

5.2.4 Gap Definition and Evaluation

This section describes the procedures to form and evaluate gaps between neighbouring PRSPs¹. A gap between a pair of neighbouring PRSPs that belong to a road actually represents a road part where no road side is available in the images. Several methods have been proposed in the literature to define and bridge gaps. Heipke et al. (2000) search for local context, e.g., rows of trees, while in Mayer et al. (1997) gaps are closed by Ziplock Snakes. Other methods first generate the gaps, which are then checked using geometric and photometric information (Baumgartner et al., 1997). This last method is also used in our system.

5.2.4.1 Gap Definition

Usually, a gap is formed by directly linking the two neighbouring PRSPs. In our system, the shape of the VEC25 road corresponding to the gap area is also used to form the gap. In the following, the gap formed by direct linking is called M1Gap, and the one formed using the VEC25 road shape M2Gap. An example of an M1Gap is shown in Figure 5-22(a). It is formed by directly linking the end point of P and the start point of Q, where P and Q are neighbouring PRSPs. Thus, the gap is closed by a straight line. An M2Gap is generated by adapting the shape of the VEC25 road, as shown in Figure 5-22(b).

1. From this section on, we use the same term "PRSP" to represent both PRSPs and RSCs, since both are road primitives, geometrically described by a pair of 3-D parallel lines and holding a set of attributes. We only distinguish them where necessary.


Figure 5-22. Gap definition. The PRSPs and their centerlines are represented as black lines. The grey lines are the VEC25 roads. The dashed black lines are the formed gap areas, with their centerlines also shown as dashed black lines. $\alpha_v$ and $\alpha_c$ are the turning angles of the VEC25 road and the formed gap respectively. (a) gap formed by directly linking PRSPs P and Q, (b) gap formed by adapting the shape of the VEC25 road.

Firstly, the points $p_2(x_p, y_p, z_p)$ and $q_1(x_q, y_q, z_q)$ are projected onto the VEC25 road segments; the projections are labelled $p_2'$ and $q_1'$ respectively. The coordinate differences $dx_p$, $dy_p$ between $p_2$ and $p_2'$, and $dx_q$, $dy_q$ between $q_1$ and $q_1'$, are computed. Then, the VEC25 road vertices between $p_2'$ and $q_1'$ are shifted accordingly. The shift values are given by

$$\delta x_i = dx_p + \lambda (dx_q - dx_p)$$
$$\delta y_i = dy_p + \lambda (dy_q - dy_p) \qquad (5.14)$$

where i = II, III, IV..., and

$$\lambda = \frac{\overline{p_2' i}}{\overline{p_2'\,II} + \overline{II\,III} + \overline{III\,IV} + \overline{IV\,q_1'}} \qquad (5.15)$$

i.e. $\lambda$ is the distance along the road from $p_2'$ to vertex i, normalized by the total length from $p_2'$ to $q_1'$.

Thus, the shift of vertex II is mainly influenced by $dx_p$ and $dy_p$. This influence decreases for vertices closer to $q_1'$, where the shift values are gradually dominated by $dx_q$ and $dy_q$. Note that the shifts are conducted in the planimetric plane. The heights of the shifted vertices are computed by the following procedure. Firstly, $p_2$ and $q_1$ are projected onto the

DTM, and the DTM heights $z_{p,dtm}$ and $z_{q,dtm}$ for $p_2$ and $q_1$ are obtained; the differences $dz_p$ between $z_p$ and $z_{p,dtm}$, and $dz_q$ between $z_q$ and $z_{q,dtm}$, are computed. Then, each vertex is projected onto the DTM to obtain a DTM height; this height is further shifted according to $dz_p$ and $dz_q$ and used as the height of the vertex. The shift value is determined similarly to (5.14) and (5.15).
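The planimetric vertex shift of eqs. (5.14)-(5.15) can be sketched as follows, with λ taken as the normalized along-path distance from p2' (our reading of (5.15); function and variable names are ours):

```python
import math

def shift_gap_vertices(vertices, dxy_p, dxy_q):
    """Shift the VEC25 vertices between p2' and q1' per eqs. (5.14)-(5.15):
    each vertex is moved by a blend of the end-point offsets dxy_p and
    dxy_q, weighted by its normalized along-path distance lambda.
    `vertices` is the planimetric polyline from p2' (first) to q1' (last)."""
    # cumulative along-path distance from p2' to each vertex
    dist = [0.0]
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        dist.append(dist[-1] + math.hypot(x1 - x0, y1 - y0))
    total = dist[-1]
    shifted = []
    for (x, y), d in zip(vertices, dist):
        lam = d / total                               # eq. (5.15)
        dx = dxy_p[0] + lam * (dxy_q[0] - dxy_p[0])   # eq. (5.14)
        dy = dxy_p[1] + lam * (dxy_q[1] - dxy_p[1])
        shifted.append((x + dx, y + dy))
    return shifted

# the first vertex receives the full p-offset, the last the full q-offset
out = shift_gap_vertices([(0, 0), (50, 0), (100, 0)], (2, 0), (0, 2))
```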

It is easy to see that M1Gaps are effective on straight roads with short gap lengths, while M2Gaps are more useful for long and curved occlusion areas. Whether an M1Gap or an M2Gap is used to close a gap is determined by an evaluation process.

5.2.4.2 Gap Evaluation

Firstly, the shape of the formed gap should approximately comply with that of the VEC25 road. The degree to which the gap complies with the VEC25 road is measured based on the differences of their turning angles. It is defined as

$$S_{gs} = 1.0 - \frac{\overline{\Delta\alpha}}{T_{ta}} \qquad (5.16)$$

where $T_{ta}$ is a predefined value, set to 45 degrees in our system, and $\overline{\Delta\alpha}$ is the average difference of the turning angles, computed as

$$\overline{\Delta\alpha} = \frac{1}{n} \sum_{i=1}^{n} |\alpha_c - \alpha_v|_i \qquad (5.17)$$

where n is the number of vertices, and $\alpha_v$ and $\alpha_c$ are the turning angles of the VEC25 road and the formed gap respectively. Since an M1Gap is straight, in order to compute $\overline{\Delta\alpha}$ the vertices of the VEC25 road in the gap area are projected onto the gap. The definition of $\alpha_c$ and $\alpha_v$ can be found in Figure 5-22. In addition, the slope difference between consecutive segments should be within a certain range. A measure related to the slope difference in the gap area is given by

$$S_{gv} = \frac{1}{n} \sum_{i=1}^{n} \frac{T_{sd} - \Delta\beta_i}{T_{sd}} \qquad (5.18)$$

where $\Delta\beta$ is the slope difference between neighbouring segments, and $T_{sd}$ is the maximum allowable slope difference. The formed gap is back-projected onto the images and further evaluated in image space. In order for the gap to be accepted as road, one of the following conditions has to be satisfied: (1) the gap contains a road region, or (2) the gap contains a shadow region or shadow mixed with a road region, or (3) the gap is caused by tree occlusion, or (4) road marks are extracted within the gap. For the evaluation of the gap, the procedures introduced in Section 5.2.3 to evaluate RSCs with image segmentation data and the nDSM are applied, and a measure is computed as

$$S_{gi} = \frac{N_r + N_s + N_t - N_b - N_g}{N} \qquad (5.19)$$

where N is the total number of pixels in the gap, and $N_r$, $N_s$, $N_t$, $N_b$, $N_g$ are the numbers of pixels of road, shadow, tree, building and grass respectively. $S_{gi}$ yields the value 1.0 when the gap is a road. It takes a negative value when the gap is dominated by grass or buildings. To demonstrate the effectiveness of this evaluation process and the necessity of using multiple cues, an example is given in Figure 5-23. The dashed yellow lines in Figure 5-23(a) define the gap area between a pair of PRSPs. This area is found to be a mixture of road and shadow pixels from the image segmentation result, as shown in Figure 5-23(b). From Figure 5-23(c), where the detected above-ground objects are shown in white superimposed on the image, it can be observed that there are no above-ground objects over the gap. Thus, the shadow pixels in the gap, as well as the gap itself, are confirmed to belong to the road.


Figure 5-23. nDSM data confirm that shadow pixels belong to a road. (a) The dashed yellow lines define the gap between two PRSPs, (b) the image segmentation result shows that the gap area is a mixture of road and shadow, (c) the shadow pixels in the gap area are on the ground. The above-ground objects are shown in white superimposed on the image.

Finally, the link probability of a pair of PRSPs evaluated by the gap is defined as

$$S_g = S_{gs} \cdot S_{gv} \cdot S_{gi} \qquad (5.20)$$

$S_g$ equal to 1.0 means that the two adjacent PRSPs should be linked. As the value decreases, the link probability becomes lower. When $S_g$ is close to 0 or takes a negative value, e.g., when the gap lies on buildings or on a grass field, the two PRSPs should not be linked. The above evaluation process is conducted for both the M1Gap and the M2Gap, and the one with the larger $S_g$ value is taken.
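Collecting eqs. (5.16)-(5.20), the gap score is the product of three sub-measures; a sketch with hypothetical inputs (function names are ours):

```python
def s_gs(avg_turn_diff, t_ta=45.0):
    """Eq. (5.16): shape agreement of the gap with the VEC25 road."""
    return 1.0 - avg_turn_diff / t_ta

def s_gv(slope_diffs, t_sd):
    """Eq. (5.18): average clipped slope-difference measure."""
    return sum((t_sd - db) / t_sd for db in slope_diffs) / len(slope_diffs)

def s_gi(n_r, n_s, n_t, n_b, n_g, n_total):
    """Eq. (5.19): goes negative when buildings/grass dominate the gap."""
    return (n_r + n_s + n_t - n_b - n_g) / n_total

def link_probability(gap_counts, avg_turn_diff, slope_diffs, t_sd):
    """Eq. (5.20): S_g = S_gs * S_gv * S_gi."""
    return s_gs(avg_turn_diff) * s_gv(slope_diffs, t_sd) * s_gi(*gap_counts)

# a straight, flat gap fully covered by road pixels scores 1.0
S_g = link_probability((1000, 0, 0, 0, 0, 1000), 0.0, [0.0, 0.0], 10.0)
```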

It should be noted that since the gap is constructed either by direct linking of PRSPs or by adapting the shape of the VEC25 road, it may not always correctly reflect the missing part of the road. Thus, in our implementation, if a gap is very long, for example longer than 120 m, a low reliability is assigned to it even if $S_g$ is high, indicating that a manual check by an operator is needed after the road extraction.

5.3 3-D Road Segment Linking for Road Reconstruction

With the extracted PRSPs, the road is reconstructed by linking the PRSPs belonging to the road. Although the PRSPs are evaluated, false hypotheses might exist. Therefore, the goal of linking is twofold. First of all, the extracted road segments should be aggregated as far as possible to complete the road extraction. This implies that the PRSPs belonging to the road should be selected and connected, and the gaps should be bridged by the linking algorithm. Secondly, this also implies that the PRSPs not belonging to the road should be rejected. Therefore, the algorithm must be very selective about which PRSPs it adds to the road. In the following section, the PRSPs are first aligned in the correct order with the aid of the VEC25 road. We then introduce the method to link the PRSPs in Section 5.3.2. The road is found by searching for an optimal path that has maximal length while having a shape similar to that of the VEC25 road.

5.3.1 Align PRSPs with aid of the VEC25 Road

In order to link PRSPs, they must first be aligned in the correct order so that the end point of one PRSP is linked with the start point of the next PRSP. Usually, this can be done by checking the distances as shown in Figure 5-24(a). P and Q are two PRSPs, and the black lines p1p2 and q1q2 are their centerlines. Since in this case dist(p2,q1) is shorter than dist(p2,q2), a decision is made that P will be linked with Q by connecting p2 and q1.



Figure 5-24. Align PRSPs using distance criterion. P and Q are PRSPs, their centerlines are represented as black lines. The dashed lines show the distances between the vertices of the centerlines of P and Q. (a) P and Q are correctly aligned using the distance criterion. (b) Distance criterion does not always work well in occluded curved roads. The correct road centerline is represented as thin black line. P and Q will be wrongly aligned if the distance criterion is used.

1. dist(x, y) stands for the distance between points x and y.

This usually works fine, but problems might occur on curved roads. An example is given in Figure 5-24(b). The road (in black) is occluded between p2 and q1. Since here dist(p2,q1) is longer than dist(p2,q2), a wrong decision will be made if the distance criterion is used. This problem is avoided in our system with the aid of the VEC25 road. The VEC25 road is represented as sequential straight segments, each segment defined by two adjacent vertices. In Figure 5-25(a), the VEC25 road is shown as a grey line with its segments numbered; the black solid circles represent its vertices, with I, II, III ... the vertex numbers. As in Figure 5-24, the black lines p1p2 and q1q2 are the centerlines of two PRSPs. We first check in which VEC25 road segments the points p1, p2, q1, q2 are located. This can be done by projecting the points onto the nearest VEC25 road segments. With this information, aligning the PRSPs becomes easy and reliable. For example, since p1 is located in segment 1 and p2 in segment 2, the start point and the end point of PRSP P are assigned to p1 and p2 respectively. Similarly, q1 and q2 are the start point and the end point of Q. In case both end points of a PRSP are in the same segment of the VEC25 road, the alignment can be done by checking the distances between the segment vertex and the projections. This is illustrated in Figure 5-25(b). Since dist(II,p'1) is shorter than dist(II,p'2), the start point and end point of P are assigned to p1 and p2 respectively. Similarly, the start and end points of Q can be determined.
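The projection-based alignment can be sketched as follows. The helper names and the 2-D point representation are assumptions made for illustration; the thesis works in 3-D, but the projection logic is the same:

```python
# Sketch of aligning a PRSP with the VEC25 road by projecting its centerline
# endpoints onto the nearest VEC25 segment (Section 5.3.1). Illustrative only.

def project_param(p, a, b):
    """Parameter t of the projection of p onto segment a-b, clamped to [0,1]."""
    ax, ay = b[0] - a[0], b[1] - a[1]
    px, py = p[0] - a[0], p[1] - a[1]
    denom = ax * ax + ay * ay
    t = (px * ax + py * ay) / denom if denom > 0 else 0.0
    return max(0.0, min(1.0, t))


def locate_on_polyline(p, vertices):
    """Index of the VEC25 segment nearest to p, and the position along the road."""
    best = (None, float("inf"), 0.0)
    for i in range(len(vertices) - 1):
        a, b = vertices[i], vertices[i + 1]
        t = project_param(p, a, b)
        proj = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        d2 = (p[0] - proj[0]) ** 2 + (p[1] - proj[1]) ** 2
        if d2 < best[1]:
            best = (i, d2, i + t)   # i + t orders points along the whole road
    return best[0], best[2]


def orient_prsp(p1, p2, vertices):
    """Return (start, end) of a PRSP so that it follows the VEC25 direction."""
    _, s1 = locate_on_polyline(p1, vertices)
    _, s2 = locate_on_polyline(p2, vertices)
    return (p1, p2) if s1 <= s2 else (p2, p1)
```

The continuous parameter `i + t` orders all projected points along the whole polyline, which also covers the case of both endpoints falling into the same VEC25 segment.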


Figure 5-25. Align PRSPs using the VEC25 road. The black lines are the centerlines of PRSPs, the VEC25 road is represented as grey lines. (a) P, Q are in different VEC25 road segments, (b) P and Q are in the same VEC25 road segment.

5.3.2 Road Segment Linking for Road Reconstruction

After the PRSPs are aligned and the gaps between them are evaluated, the next task is to link the PRSPs to extract the road. The linking process is straightforward: it is based on the best-first method and runs iteratively. Linking starts from both ends of a given PRSP. Linkable candidates are searched for in its neighbourhood. The candidates should have widths similar to the PRSP. The width similarity measure is defined by

S_w = 1 − ∆w / T_w    (5.21)

where ∆w is the width difference and T_w is the maximal allowable width variation. This value is set to 1.5 m in our current implementation. If there is one candidate whose start point is the same as the end point of the PRSP, a direct link is made. Otherwise, the candidate with the largest S_g value is linked. The linking is then repeated on the added PRSP until no more PRSPs can be added. This results in a chain of PRSPs with the gaps closed. The above linking process is then conducted on the remaining PRSPs; thus, finally, several chains may be obtained. The longest one is taken as the result for the extracted road. The following is worth noting on the selection of starting PRSPs. In our implementation, the PRSPs are ranked according to their attributes. Priority is given to those which are long, have strong edges, and whose edge orientations point outward from the region bounded by the PRSP. The rank value is computed by

R_prsp = S_prsp + (Mag / (2.0 ⋅ t_edge)) ⋅ (l / 30.0)    (5.22)

S_prsp is defined in (5.5), Mag is the gradient magnitude of the sides of the PRSP, t_edge is the threshold used to extract edge pixels in edge detection, and l is the length of the PRSP. Furthermore, PRSPs with low reliabilities are not suitable as starting PRSPs; for example, RSCs will not be used as starting PRSPs for grouping. The linking process introduced above can be implemented in a more general manner by a linking function. The definition of the linking function is based on the following considerations. Firstly, the extraction result should be as long as possible, aggregating road segments and bridging gaps. Secondly, since the shape of the VEC25 road is generally correct, the resulting path should possess a shape similar to the VEC25 road.
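Under the reading of Eq. (5.22) given above, the ranking of starting PRSPs could be sketched as follows; the function names and the dictionary layout are illustrative assumptions, not the thesis code:

```python
# Sketch of the starting-PRSP rank of Eq. (5.22):
# R_prsp = S_prsp + (Mag / (2.0 * t_edge)) * (l / 30.0).
# Long PRSPs with strong edges receive higher rank values.

def rank_prsp(s_prsp, mag, t_edge, length):
    """Rank value of a PRSP; all symbols mirror the thesis notation."""
    return s_prsp + (mag / (2.0 * t_edge)) * (length / 30.0)


def pick_start(prsps):
    """prsps: list of dicts with keys s_prsp, mag, t_edge, length, is_rsc.
    RSCs (low-reliability segments) are excluded as starting PRSPs."""
    candidates = [p for p in prsps if not p.get("is_rsc", False)]
    return max(candidates,
               key=lambda p: rank_prsp(p["s_prsp"], p["mag"],
                                       p["t_edge"], p["length"]))
```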

Suppose P_i is a PRSP and P_i+1 an adjacent PRSP of P_i; the function is defined as

L(P_i, P_i+1) = Σ ( l_i ⋅ S_prsp^i + l_i,i+1 ⋅ S_g + l_i+1 ⋅ S_prsp^(i+1) ⋅ S_w )    (5.23)

where l_i and l_i+1 are the lengths of the centerlines of P_i and P_i+1, and l_i,i+1 is the gap length between them. The above measure is a sum of the weighted lengths of the PRSPs and gaps. The function obviously gives high values to long curves with a shape similar to the VEC25 road. Furthermore, if P_i+1 is determined by S_g as the best linking candidate of P_i, i.e. S_g is close to 1.0, then the function serves as local grouping. Thus, the defined function possesses the properties of both local grouping and global optimization.
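A minimal sketch of evaluating the linking function along a candidate path, together with the width-similarity measure S_w of Eq. (5.21) that weights each successor PRSP. The alternating PRSP/gap list is an assumed data layout for illustration, not the thesis code:

```python
# Sketch of the linking function of Eq. (5.23): a sum of weighted PRSP lengths
# and gap lengths along a path. S_w of Eq. (5.21) weights successor PRSPs.

T_W = 1.5  # maximal allowable width variation in metres (thesis value)


def width_similarity(w1, w2, t_w=T_W):
    """S_w = 1 - |w1 - w2| / T_w; 1.0 for identical widths (Eq. 5.21)."""
    return 1.0 - abs(w1 - w2) / t_w


def linking_value(path):
    """path: alternating list of dicts [prsp, gap, prsp, ...].
    PRSPs carry (length, s_prsp, s_w); gaps carry (length, s_g).
    The first PRSP has no predecessor, so its s_w defaults to 1.0."""
    total = 0.0
    for item in path:
        if item["kind"] == "prsp":
            total += item["length"] * item["s_prsp"] * item.get("s_w", 1.0)
        else:  # gap
            total += item["length"] * item["s_g"]
    return total
```

Long, well-supported paths with plausible gaps accumulate high values, which is what the subsequent optimization maximizes.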

From the definition of the linking function, the goal now is to find a subset among all PRSPs that maximizes the linking function. The subset that provides the maximum value is the extracted road. Similar to the local grouping described above, we start from a PRSP with a high rank value and search for candidates in its neighbourhood. The difference to the local grouping is that no decision is made at this stage as to which PRSP will be connected. Instead, each candidate searches for its successors in its own neighbourhood. Finally, a tree is constructed at each side of the PRSP (see Figure 5-26).


Figure 5-26. Trees generated at both sides of a PRSP P. The PRSPs are represented as black dots.

The maximum and the corresponding subset can be found by searching the trees and comparing the obtained values. Since the PRSPs are evaluated by the accumulated cues, there are very few false hypotheses among them. This is especially true for roads in rural areas; thus, the number of combinations is not high. Nevertheless, the method has been implemented using a multistage optimization approach, as proposed by Sha'ashua and Ullman (1988) and Li (1997).
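The tree search can be sketched as a simple recursion over candidate successors; the node layout is an assumed illustration of enumerating root-to-leaf paths, not the cited multistage optimization scheme itself:

```python
# Sketch of searching a linking tree (Figure 5-26) for the path that maximizes
# the accumulated linking value. Each node holds its score contribution (its
# weighted PRSP length plus the preceding gap term). Structure is illustrative.

def best_path(node):
    """node: dict with 'id', 'score', and 'children' (successor nodes).
    Returns (value, path_of_ids) of the best root-to-leaf path."""
    best_val, best_tail = 0.0, []
    for child in node.get("children", []):
        val, tail = best_path(child)
        if val > best_val:
            best_val, best_tail = val, tail
    return node["score"] + best_val, [node["id"]] + best_tail
```

Running this on the trees grown from both ends of the starting PRSP and comparing the returned values yields the maximizing subset of PRSPs.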

5.4 3-D Road Reconstruction using Road Marks

Highways, first class roads and most second class roads usually have road marks on them. These roads can be extracted using the detected road marks and zebra crossings. In rural areas, the road extracted using road marks can also be used to verify the extraction results obtained with PRSPs. In complex areas, such as a city or city center, the road sides are generally heavily occluded, and sometimes it is impossible to identify them. However, some of these roads can be successfully extracted by exploiting road marks. In Section 4.4, the algorithms for road mark and zebra crossing detection were introduced. The road marks are detected as thin bright lines; thus, in addition to road marks, other objects appearing as thin bright lines in the image are also extracted. These objects must be excluded from road extraction. First of all, the line extraction can be restricted to regions that are within the VEC25 road error buffer, on the ground (as determined by the nDSM), and belong to the road class (as determined by image segmentation). Secondly, lines with orientations significantly different from that of the VEC25 road are discarded. Finally, lines with large slopes are removed. The methods used are the same as introduced in Section 5.1.1.

The remaining lines are further checked and verified by other cues. The verified road marks are then linked to extract the road. These procedures are detailed in Section 5.4.1 and Section 5.4.2 respectively.

5.4.1 Evaluation of Road Marks

In this section, we describe the procedure to evaluate the extracted thin lines. Through this process, the road marks are verified, and the verified road marks are classified according to their position on the road, i.e., whether they are in the center of the road or at its borders. For each line, two regions are constructed, one on each side, in image space (see Figure 5-27). The regions are constructed parallel to the line and have the same length as the line. The widths of the regions are set to 8 pixels (around 2.0 m in object space). The rationale for this value is that highways, first and second class roads are wider than 4.0 m. If zebra crossings are present in the road images and are extracted, the widths of the extracted zebra crossings are used instead.


Figure 5-27. Definition of regions beside road marks. The road marks are represented as black lines; the widths of the regions are around 2 m in object space.

The pixels of road, vegetation, shadow and building are collected and counted in each region. Then, the measure defined in equation (5.4) is computed. If the values of this measure in both regions are close to 1.0, the line is accepted as a road mark, and it is in the center of the road. Otherwise, if only one region demonstrates this property, it is considered a road mark but has to be further checked as to whether it is in the center or at the road border. This is achieved with the method introduced in Section 5.2.3. We compute a measure S_er = S_ground ⋅ S_image for this region (S_ground and S_image are defined in equations (5.8) and (5.9)). If the value of S_er is close to 1.0, this road mark is considered as possibly in the road center. Otherwise, if grass is found in the region, the road mark is at the road border.

If both regions of a line are questionable, the S_er values for the regions are computed. Whether the line is a road mark or not is determined by the following rules: (1) if grass is found in both regions, it is not a road mark;

(2) if grass is found in one region while S_er equals 1.0 in the other region, it is a road mark at the road border;

(3) if the S_er values in both regions equal 1.0, it is a road mark in the center of the road.

The next step is to align the road marks with the aid of the VEC25 road, using the method introduced in Section 5.3.1. The road marks at road borders are then further classified as to whether they are at the left or right border of the road. For example, in Figure 5-28, region 2 of road mark P is road while region 1 is not; thus P can be determined to be at the left border of the road. Similarly, road mark Q in Figure 5-28 is at the right border of the road.
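Rules (1)-(3) can be sketched as a small classifier. The region representation, the tolerance on "S_er close to 1.0", and the fallback for cases not covered by the three rules are illustrative assumptions:

```python
# Sketch of the decision rules (1)-(3) for lines whose side regions are both
# questionable (Section 5.4.1). Inputs per region: whether grass was found,
# and the measure S_er = S_ground * S_image. Names are illustrative.

def classify_road_mark(region1, region2, tol=0.1):
    """Each region is a dict: {'grass': bool, 's_er': float}.
    Returns 'not_a_mark', 'border', or 'center'."""
    g1, g2 = region1["grass"], region2["grass"]
    ok1 = region1["s_er"] >= 1.0 - tol   # S_er "close to 1.0"
    ok2 = region2["s_er"] >= 1.0 - tol
    if g1 and g2:
        return "not_a_mark"              # rule (1)
    if (g1 and ok2) or (g2 and ok1):
        return "border"                  # rule (2)
    if ok1 and ok2:
        return "center"                  # rule (3)
    return "not_a_mark"                  # assumed fallback for unclear cases
```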



Figure 5-28. Classification of road marks at road borders. The grey line is the VEC25 road. Region 2 of road mark P is road while region 1 is not; thus P is at the left border of the road. Similarly, road mark Q is at the right border of the road.

5.4.2 Road Mark Linking for Road Reconstruction

The road marks must be linked for road extraction. The method developed in Section 5.3.2 for PRSP linking is employed. The road marks in the center of the road, at the left border, and at the right border are linked separately. For a reliable linking, the area between neighbouring road marks is checked and evaluated. Firstly, a quadrilateral is constructed as shown in Figure 5-29. P and Q are neighbouring road marks in the center of a road, pe is the end point of P, and qs is the start point of Q. Two lines are generated from pe and qs, perpendicular to P and Q respectively. The two lines thus define a quadrilateral. The method used to evaluate gaps between neighbouring PRSPs is applied to the formed quadrilateral.



Figure 5-29. Quadrilateral generation for the evaluation of road mark linking.

Finally, the road marks are linked using the method introduced in Section 5.3.2. When a zebra crossing is present, the grouping and linking start from the road mark that is closest to the center of the zebra crossing and has an orientation similar to the short axis of the zebra crossing. If a road is extracted both by road marks and by PRSPs, the two results are compared to verify the extraction.

5.5 Road Junction and Road Network Generation

Road junctions are important features of the road network. However, it is even more difficult to model and extract road junctions from images than roads. In our system, we do not try to model the road junctions; instead, we reconstruct junctions from the extracted roads, i.e. road junctions are generated by intersecting the extracted roads. This is illustrated in Figure 5-30, where the grey lines represent the extracted road centerlines and the solid circle is the road junction generated by the intersection of the roads. Once the junction point is obtained, the extension areas (the dashed lines) of the extracted roads are checked. If these extensions are considered as roads, the reconstructed junction is accepted.




Figure 5-30. Junction generation by road intersection. The grey lines l1, l2, l3 are the extracted roads. The junction is determined by the intersection of l1, l2, l3 in object space.

The knowledge of which roads are needed for the reconstruction of junction J comes from the VEC25 data. As described in Chapter 3, we inherit the road network topology from the VEC25, i.e., for each road junction, the information about how many and which roads connect to it is recorded. In the following, the algorithm to compute the junction coordinates is given.

Suppose p1, p2 are the last two points on the centerline of the extracted road l1, with coordinates (x_p1, y_p1, z_p1) and (x_p2, y_p2, z_p2). The vector {a1, b1, c1} for p1p2 is then given by {x_p2 − x_p1, y_p2 − y_p1, z_p2 − z_p1}. Suppose the coordinates of the junction are (x, y, z). Then, the following equation can be obtained

(x − x_p2) / a1 = (y − y_p2) / b1 = (z − z_p2) / c1 = t1    (5.24)

i.e.

x = x_p2 + a1 t1
y = y_p2 + b1 t1    (5.25)
z = z_p2 + c1 t1

which can be written in matrix form as

A X − L = 0    (5.26)

where

    | 1 0 0 −a1 |        | x  |        | x_p2 |
A = | 0 1 0 −b1 |,   X = | y  |,   L = | y_p2 |    (5.27)
    | 0 0 1 −c1 |        | z  |        | z_p2 |
                         | t1 |

For each road connecting to J, we can derive an equation of the form (5.26). Stacking these equations, the coordinates of J as well as the unknowns t_i are finally given by

X = (A^T A)^(-1) A^T L    (5.28)
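A pure-Python sketch of the least-squares intersection of Eqs. (5.24)-(5.28): one block of three equations per road is stacked, and the normal equations are solved by Gaussian elimination. The data layout (a list of end points and direction vectors) is an assumption for illustration:

```python
# Least-squares junction intersection (Eqs. 5.24-5.28). Each road contributes
# the equations x - a*t_k = x_p2, y - b*t_k = y_p2, z - c*t_k = z_p2, with
# unknowns (x, y, z, t_1, ..., t_n). Normal equations are solved directly.

def junction(roads):
    """roads: list of (p2, direction) with p2 = (x, y, z) the last centerline
    point and direction = (a, b, c) the vector p1->p2. Returns (x, y, z)."""
    n = len(roads)
    m = 3 + n                          # unknowns: x, y, z, t_1..t_n
    ata = [[0.0] * m for _ in range(m)]
    atl = [0.0] * m
    for k, (p2, d) in enumerate(roads):
        for axis in range(3):          # x, y, z equations of road k
            row = [0.0] * m
            row[axis] = 1.0
            row[3 + k] = -d[axis]
            for i in range(m):
                atl[i] += row[i] * p2[axis]
                for j in range(m):
                    ata[i][j] += row[i] * row[j]
    # Solve ata * X = atl by Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atl[col], atl[piv] = atl[piv], atl[col]
        for r in range(m):
            if r != col and ata[r][col] != 0.0:
                f = ata[r][col] / ata[col][col]
                for c in range(col, m):
                    ata[r][c] -= f * ata[col][c]
                atl[r] -= f * atl[col]
    x = [atl[i] / ata[i][i] for i in range(m)]
    return tuple(x[:3])
```

With two or more roads, the system is overdetermined and the solution is the least-squares junction point of Eq. (5.28).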

The proposed method for junction generation is simple and fast, and it can handle most road junctions in rural areas. However, it might create errors for complex roads, e.g. a circular junction; thus, further investigation to model road junctions is needed. With the extracted roads and road junctions, the road network is obtained. This 3-D road network data is saved in the 3-D Arc/Info Shapefile format, which is readily imported by existing GIS software. The results inherit the other attributes from the VEC25 data, with the road lengths updated and the road widths appended.

5.6 Results and Discussion

The described system has been implemented as a standalone software package with a graphical user interface running on the SGI platform. We have tested the system on different ATOMI datasets covering various landscapes and terrains. In the test sites, almost all road types found in Switzerland occur. We also tested our road extraction system on a Belgian dataset provided by the National Geographic Institute (NGI). In the following, typical results of the road extraction in the test sites of Switzerland and Belgium are given.

5.6.1 Results of Road Reconstruction from Edges

Figure 5-31 shows the reconstructed roads in rural areas using edges in the test sites in Switzerland. The VEC25 roads are shown in yellow, the white lines are the straight edge segments after the edge deduction process, the blue lines are the found PRSPs, and the extracted road centerlines are presented in pink. The results show that the developed system works very well. Generally, roads in rural areas are correctly and completely extracted by our system.




Figure 5-31. Road extraction in rural areas using edges in the test sites in Switzerland. The VEC25 roads are shown as yellow lines, the white lines are the straight edge segments after the edge deduction process, the blue lines are the found PRSPs. The extracted road centerlines are presented as pink lines. (a) and (b) normal roads in rural areas, (c) a path with a strip of vegetation at the center, (d) a 2_class road, (e) a road whose sides do not exist at the turning area, (f) a curved road.

The results of the reconstructed roads in the test sites of Belgium are shown in Figure 5-32. We did not change anything in our system except the image segmentation procedure: with the black and white imagery, we only cluster a single band of data to find the road regions. Most of the roads in rural areas are correctly reconstructed.

Figure 5-32. Road extraction in rural areas using edges in the test sites in Belgium.

The edge based method was also tested to reconstruct roads in suburban and urban areas. Several results are shown in Figure 5-33. It can be seen that the performance of the system is not stable in urban areas. Some roads are correctly reconstructed (see images (a) and (b) in the figure); however, many roads are only partially extracted (images (c) and (d)), or even totally missed (e). In these images, the road edges are not well defined or are too fragmented, so only few or no 3-D parallel edges belonging to the roads can be obtained. Therefore, although some or all road regions are found by the system, a precise delineation of the road sides or road centerlines cannot be obtained. (f) and (g) show that our system also fails to extract some roads in villages in the Belgian test sites. It can be predicted that the performance of the system will decrease further when working in city centers.






Figure 5-33. Road extraction using edges in suburban and urban areas in the test sites in Switzerland, and in villages in the test sites in Belgium. See text for explanations.

5.6.2 Results of Road Reconstruction using Road Marks

Figure 5-34 shows examples of road reconstruction using road marks. The roads in the images belong to the highway, 1_class or 2_class categories. Roads (a) and (b) in this figure are in a rural area; both the road centerline and the border lines are extracted. These roads can also be extracted using edges (see Figure 5-31(d) for road (b)).





Figure 5-34. Road extraction using road marks in the test sites in Switzerland. See text for explanations.

The advantages of road extraction based on road marks appear in applications in complex areas, such as urban areas. Since the primitives used for road reconstruction in this method are road marks, even if the road sides are not well defined or totally missing, the road centerlines can still be correctly and reliably extracted, as shown in (d) and (e). Image (c) shows the results for a highway; the lane border lines are extracted. In the Belgian test sites, only a few roads are first class or second class roads. Because the image quality is poor, the road marks are not recognizable. One example of a first class road image is shown in Figure 5-35(a). It is difficult even for a human to identify the road marks. Although a Wallis filter has been applied to enhance the road marks (see (b)), only a few road marks are extracted (c), and the road thus cannot be extracted.


Figure 5-35. Road marks are not extracted on a first class road in the Belgian test site due to poor image quality; the road is thus not extracted. (a) Road image, (b) the Wallis enhanced image, showing the difficulty of identifying road marks, (c) only a few road marks are extracted (the black lines).

5.6.3 Results of Road Junction and Road Network Generation

Figure 5-36 shows examples of road junction generation in the test sites in Switzerland. The yellow lines are the VEC25 roads. Obviously, the quality of the reconstructed road junctions depends on that of the extracted roads. When the roads connected to a junction are well extracted, the method delivers good results. Thus, in rural areas, almost all road junctions can be correctly reconstructed. This is also confirmed by the experimental results in the test sites of Belgium, as shown in Figure 5-37. Figure 5-38 shows cases where road junctions cannot be reconstructed by the proposed method, because the roads connected to the junctions are not completely extracted.

Figure 5-36. Results of road junction reconstruction in the test sites in Switzerland.

Figure 5-37. Results of road junction reconstruction in the test sites in Belgium.

Figure 5-38. Problematic cases of road extraction and junction generation. Yellow lines are the VEC25 roads, the extracted roads are shown as pink lines. The road junctions cannot be reconstructed by the developed method since the roads are not well extracted due to the complexity of the junction areas.

With the extracted roads and road junctions, the road network is generated. Examples are shown in Figure 5-39 and Figure 5-40. Figure 5-39 contains road networks in rural areas in the test sites in Switzerland; Figure 5-40 shows a reconstructed road network in the Belgian test site. The scenes in Figure 5-39(a) and Figure 5-40 are generally flat, while the scene in Figure 5-39(b) is hilly. The results show that the performance of our system is not influenced by terrain changes.



Figure 5-39. Reconstructed road networks in the test sites of Switzerland.

Figure 5-40. Reconstructed road network in the test site of Belgium.

6. PERFORMANCE EVALUATION

This chapter discusses the performance evaluation of the developed road extraction system. Internal self-diagnosis and external evaluation of the obtained results are of major importance for the relevance of automatic systems in practical applications. However, only relatively little work has been carried out in this area; it is briefly reviewed in Section 6.1. In Section 6.2, several factors which contribute to the internal self-diagnosis are discussed. The measures for self-diagnosis should take into account the reliability of the generated road primitives as well as the consistency between them. External evaluation is conducted by comparing the extracted roads with manually measured reference data. The quality measures used to assess the extraction results are defined in Section 6.3. Finally, in Section 6.4, quantitative evaluation results for two test sites are presented.

6.1 Review of Related Work

An internal evaluation can be based upon the traffic light paradigm (Förstner, 1996): a green light stands for a result found to be correct as far as the diagnosis tool is concerned, a yellow light implies that further checking is necessary, and a red light means an incorrect result. Ruskone (1996) develops two algorithms to assess the reliability of the extracted road segments. The first is realized through supervised learning for different kinds of objects (not only roads) in different kinds of contexts, classifying each road hypothesis and computing its probability of belonging to the road class. The other is based on vehicle detection with a neural network classifier to confirm the road existence. In the LSB-Snakes algorithm developed in Grün and Li (1997a, 1997b), the occluded road areas are automatically detected in four overlapping images. This is done by using iterative weight functions, in which the weights of the observations (photometric and geometric) in the adjustment are related to the ratio of their residuals to the variance factor. The algorithm starts with a flat weight function. After a certain number of iterations, the weight function becomes steep, so that the weights of observations with big residuals get smaller; thus the influence of blunders is reduced, and the results of the algorithm are not affected by occlusions or shadows. Wiedemann and Hinz (1999) first divide the extracted road network into independent road sections at road junctions and high curvature points. Then, the internal evaluation of the road sections is conducted by a fuzzy fusion of quality measures such as length, mean curvature, and length of gaps. The method is further extended in Hinz et al. (2002) by introducing a global road network quality measure, which can help a user decide quickly whether the whole scene has to be reprocessed.
In addition, in order to classify the results following the traffic light paradigm based on the fuzzy values, a length-weighted histogram technique is applied.

External evaluation is usually done in terms of geometric quality, completeness and correctness. The extraction results and the reference data are matched and compared with each other. The coordinate differences between the computed corresponding points of the two datasets are used to calculate means, root mean squares and/or standard deviations. Completeness measures the percentage of the reference data that is covered by the extracted roads, while correctness is the percentage of the extracted roads covered by the reference data. Airault and Jamet (1995) evaluate a semi-automatic road extraction method in terms of the geometric quality of the result and of the data capture time compared with manual data capture. The geometric quality is assessed by means, root mean squares and standard deviations. For this purpose, the extracted roads and the reference data are represented as two graphs, and each vertex in one graph is matched to the other graph by looking for the nearest neighbour. The time required by the semi-automatic system at different stages is recorded and compared with that of the manual data capture process to assess the effectiveness of the system. In McKeown et al. (2000), the evaluation is directed towards measuring the quality of semi-automatic road extraction with different levels of manual intervention. The reference data is generated by a procedure starting at manually selected points, followed by automatic road tracking and manual editing. Roads are extracted as regions, and the matching of the extracted data with the reference data is carried out using an intersection operation. Only the completeness and correctness of the extracted data are further considered. Heipke et al. (1998) proposed a methodology for the evaluation of automated road extraction algorithms. A set of quality measures is defined concerning correctness, completeness, quality, redundancy, RMS difference, etc.
In order to match the two road networks, a buffer is constructed around the reference roads or the extracted roads. The buffer width is chosen as half the road width. The parts of the extracted roads within the buffer of the reference data are considered as matched extraction. This implies that a road is considered correctly extracted if the extracted road axis lies between the road sides. The matching is performed in the raster domain by converting both datasets into raster format. The proposed methodology is not only used to evaluate the performance of a single road extraction algorithm, but has also been applied to compare the performance of different road extraction systems. A similar evaluation scheme has also been used by Fischer and Bolles (1998) and Harvey (1999). In Fischer and Bolles (1998), the authors observed that the reference data might include ambiguous road-like entities captured by an operator; such entities are not involved in the evaluation. Although both the extracted and the reference data are in 3-D in their work, the evaluation is performed in 2-D, and the matching is carried out in the raster domain using a buffer. The buffer width is defined based on the variation between the reference models produced by different topographers; however, this value is not explicitly stated in their report. In Harvey (1999), two road tracking systems using manually supplied or automatically detected starting points are evaluated. In the work of Fischer and Bolles (1998) and Harvey (1999), only completeness and correctness have been considered.
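The buffer-based matching described above can be sketched in rasterized form. The grid representation, the square (rather than circular) buffer, and all names are simplifying assumptions for illustration:

```python
# Sketch of buffer-based matching in the raster domain:
# completeness = matched reference / all reference,
# correctness  = matched extraction / all extraction.
# Roads are given as sets of (row, col) raster cells.

def dilate(cells, radius):
    """Grow a set of (row, col) cells by a square buffer of the given radius."""
    out = set()
    for (r, c) in cells:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                out.add((r + dr, c + dc))
    return out


def completeness_correctness(extracted, reference, buffer_radius=1):
    """Reference cells inside the extraction buffer count as matched reference;
    extracted cells inside the reference buffer count as matched extraction."""
    ref_buf = dilate(reference, buffer_radius)
    ext_buf = dilate(extracted, buffer_radius)
    completeness = len(reference & ext_buf) / len(reference)
    correctness = len(extracted & ref_buf) / len(extracted)
    return completeness, correctness
```

In practice the buffer radius would correspond to half the road width, as in Heipke et al. (1998).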

Wiedemann (2002) uses a similar evaluation scheme as in Heipke et al. (1998) to assess road junction extraction results. The extracted junctions and the reference junctions are matched by finding the nearest neighbour within a buffer. Then, completeness is defined as the ratio of the number of matched reference junctions to the number of all reference junctions, and correctness represents the percentage of correctly extracted road junctions with respect to all extracted junctions. The RMS difference is also computed to assess the geometric accuracy of the extracted junctions.

6.2 Internal Quality Evaluation

Internal evaluation refers to the self-diagnosis of an automatic system. The internal evaluation process usually provides precision and reliability measures of the results. The ability of self-diagnosis is necessary for any object extraction system in practical applications, since no system is expected to always deliver perfect results. Usually, human operators are needed to check and edit the results returned by the extraction system. Thus, the reliability values provided by internal evaluation will be of great help to assist and guide the operator during post-editing, because he can focus his attention only on the results with low reliability. Internal evaluation can be based on the traffic light paradigm proposed by Förstner (1996). In order to compute internal evaluation measures, the necessary information that accounts for the precision and reliability of the partial reconstruction results should be collected and then combined to derive a single value. Thus, the first important point is the choice of criteria that permit internal quality evaluation. Then the criteria values have to be combined to compute the internal evaluation measures for an extracted road. The combination scheme should be devised such that both errors of commission (keeping grossly erroneous results) and errors of omission (excluding correct results) are avoided. As a road is reconstructed by the extraction and connection of road primitives, including PRSPs, RSCs and gaps, the internal evaluation essentially combines the information of the reconstructed result as well as of the primitives that make up the extracted road. Thus, we introduce two types of measures: an overall quality measure of the extraction result, and measures for the primitives. If a result does not pass the overall quality test, a further test is conducted to find in which segments the errors occur. The traffic light paradigm is then applied to each road segment.
In addition to assessing the reliability of each individual road, a road extraction system should also include measures to assess the reliability of the reconstructed road junctions. This might be done by comparing the distance between the reconstructed junctions and the junctions in the VEC25 road; the image information and the nDSM data around the reconstructed road junctions can also be used. This work has not been done in this dissertation and will not be included in the following. The overall quality of the extraction result can be obtained from the following criteria: (1) length difference between the extraction result and the VEC25 road

(2) shape difference between the extraction result and the VEC25 road
(3) total length of PRSPs relative to the length of the extraction result
(4) integrated quality measures of the PRSPs, RSCs and gaps which make up the result
(5) coordinate differences between the extraction result and the VEC25 road

We can thus define a set of measures using these criteria to assess the overall quality of the extraction result. In the current implementation, due to the limited time available, only measures related to the first three criteria are defined and computed:

m_1 = \frac{\text{length of extraction}}{\text{length of VEC25 road}} \qquad (6.1)

m_2 = 1.0 - \frac{\Delta\alpha}{T_s} \qquad (6.2)

where \Delta\alpha is the average absolute difference of the turning angles between the extraction result and the VEC25 road, and T_s is set to 45.0 degrees.

m_3 = \frac{\text{length of PRSPs}}{\text{length of extraction}} \qquad (6.3)

A low value of m1 indicates that only part of the road has been extracted by the system; further investigation is then required to check whether the rest of the road no longer exists or the system simply could not extract it. In the latter case, a manual operation is needed to complete the road extraction. This can also be assessed by the distance between the end points of the extracted road and the junctions of the VEC25 road: a large distance also indicates the need for inspection or manual extraction. m2 measures the shape similarity between the extraction result and the VEC25 road. Since the shape of the VEC25 road is generally correct, a small value of m2 usually implies that there might be errors in the extraction result. m3 measures the share of the total PRSP length in the extraction result. A small value of m3 means the result might not be reliable, because gaps dominate the result, and thus the road might be occluded too much. In the current system, we do not combine the three measures into a single value. Instead, in order to accept an extraction result as correct, all three measures should be larger than 0.70. If any one of them is lower than 0.30, the result is considered incorrect. The remaining results are taken as uncertain. The above defined measures give the overall quality of the extracted results. Based on their values, they tell whether a road should be manually completed or edited. However, these measures are usually not enough for a practical road extraction system: they cannot indicate in which segments the problems actually are. Thus, a further detailed investigation is necessary to check the components that compose the extracted road. We place emphasis on the gap areas in the result. As gaps represent road parts that are occluded by neighbouring objects or shadows, parallel road sides or road marks in the gap areas are invisible.
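As an illustration, the three measures (equations 6.1 to 6.3) and the accept/uncertain/incorrect decision described above can be sketched as follows. This is a minimal sketch; the function names are ours, not part of the system:

```python
def overall_quality(len_extracted, len_vec25, len_prsps, delta_alpha, t_s=45.0):
    """Compute the three overall quality measures (equations 6.1-6.3)."""
    m1 = len_extracted / len_vec25   # length ratio against the VEC25 road
    m2 = 1.0 - delta_alpha / t_s     # shape similarity from turning angles
    m3 = len_prsps / len_extracted   # PRSP coverage of the result
    return m1, m2, m3

def classify(m1, m2, m3):
    """Accept if all measures exceed 0.70, reject if any is below 0.30,
    otherwise mark the result as uncertain (to be checked by an operator)."""
    if min(m1, m2, m3) > 0.70:
        return "green"   # accepted as correct
    if min(m1, m2, m3) < 0.30:
        return "red"     # rejected as incorrect
    return "yellow"      # uncertain, operator check
```

For the accepted example of Figure 6-1(a), (m1, m2, m3) = (0.95, 0.80, 0.99) yields "green", while an m1 of 0.33 as in Figure 6-1(b) yields "yellow".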
In our system, the gaps are closed either by direct linking of neighbouring PRSPs or by adapting the shape of the corresponding VEC25 road, so they may not always correctly reflect the missing parts of the road. Thus, especially long gaps have to be checked by an operator. Furthermore, a wrong gap in the extraction result usually means that either the PRSP(s) connected by the gap is (are) wrong, or the two PRSPs should not be linked directly. Finding a wrong gap will therefore also help to locate the wrong PRSPs related to it, thus facilitating manual correction. Based on this consideration, we define an internal quality measure using gap information as

q = w_s \cdot S_{gs} + w_g \cdot S_{gg} \qquad (6.4)

where S_gs is the measure defined in equation (5.16), and S_gg is the percentage of non-grass, non-building pixels in the gap area. w_s and w_g are weights. If the shape of the existing road is reliable, w_s can be set larger than w_g. In the current test, w_s and w_g are selected as 0.6 and 0.4 respectively. In order to classify the extraction results following the traffic light paradigm based on the value of q, two thresholds t_u and t_l are required: results with q larger than t_u are accepted, results with q less than t_l are considered incorrect and discarded, while those with q values between t_l and t_u are checked/verified by an operator. The selection of t_u and t_l is thus very important, especially for t_u, because in real applications the "correct" results determined by an automatic system should really be correct, so that the operator does not have to check them. On the other hand, a smaller value of t_l might be appropriate. Although a smaller t_l will include some incorrect results in the "yellow light" class, this class has to be checked by the operator anyway, so a smaller t_l will not have a negative influence on the production quality. In the current test in our system, t_u and t_l are selected as 0.75 and 0.40 respectively.
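The gap measure of equation (6.4) and its traffic-light thresholds can be sketched as follows (helper names are ours; the weights and thresholds are the values given above):

```python
def gap_quality(s_gs, s_gg, w_s=0.6, w_g=0.4):
    """q = w_s * S_gs + w_g * S_gg (equation 6.4)."""
    return w_s * s_gs + w_g * s_gg

def traffic_light(q, t_u=0.75, t_l=0.40):
    """Green: accept; red: discard; yellow: operator check/verification."""
    if q > t_u:
        return "green"
    if q < t_l:
        return "red"
    return "yellow"
```

A gap with a very different shape from the VEC25 road, such as the one in Figure 6-2(d) with q = 0.32, falls below t_l and is discarded.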
The above internal evaluation method has been implemented and applied to the test dataset of the project ATOMI. In the following, several examples of internal self-diagnosis are given. In the figures, the yellow lines are the VEC25 roads, the correctly extracted and accepted results are shown as green lines, while those that have to be further investigated/verified are represented as blue lines. The red lines indicate incorrect results.

Figure 6-1 shows examples of results using the overall quality measures. The values of m1, m2 and m3 of the result in (a) are 0.95, 0.80 and 0.99 respectively; the result is thus accepted by the system. In (b), the m1 value is low (0.33), indicating that only part of the road is extracted; an operator has to check whether the rest of the road no longer exists or the system simply could not extract it. The road shown in (c) is mostly occluded by trees, leading to a low value of m3 (0.51). This result will be checked and verified by an operator.


Figure 6-1. Internal quality assessment through the overall quality measures. (a) accepted result, (m1, m2, m3) = (0.95, 0.80, 0.99); (b) and (c) results that have to be checked/verified: in (b) m1 is low (0.33), only part of the road is extracted, while in (c) m3 is low (0.51), the road being largely occluded.

In Figure 6-2, we show the quality assessment of the results using the defined gap-based quality measure. In (a) and (b), the gaps (between the two red dots) are not very long, and the closed gaps have a shape similar to the VEC25 roads. Thus, although the gap areas are occluded by trees and shadows, they still receive relatively large values of the quality measure. The q values for the gaps 1, 2 and 3 in (a) are 0.98, 0.95 and 0.93 respectively, and 0.87 for the gap in (b). They are therefore accepted as correct results. The gap shown in (c) is extremely long (127.3 m); although its shape is similar to the corresponding VEC25 road, it has to be checked and verified by the operator. The gap in (d) is bridged by M1Gap (formed by direct linking of the neighbouring PRSPs, see Section 5.2.4.1), because in this case M2Gap (formed by adapting the shape of the corresponding VEC25 road) contains non-tree above-ground objects, which is not allowed in the extraction process. The q value of the gap in (d) is extremely low (0.32), because it possesses a shape different from the corresponding VEC25 road. Thus, the result of the bridged gap is discarded, and the gap area should be manually extracted by the operator.


Figure 6-2. Internal quality assessment through the gap-based quality measure. (a) and (b) show short gaps (between the two red dots) with a shape similar to the VEC25 road. The q values for the gaps 1, 2 and 3 in (a) are 0.98, 0.95 and 0.93 respectively, and 0.87 for the gap in (b). They receive high quality values and are accepted. (c) An extremely long gap (127.3 m) closed by adapting the shape of the corresponding VEC25 road; although its shape is similar to the VEC25 road, it still needs to be verified by an operator. (d) A gap with a shape different from the VEC25 road; it receives a very low reliability (q = 0.32), is discarded, and the gap area will be manually extracted by an operator.

For 1_class and 2_class roads, the reliability of the extraction results is also assessed by comparing the results obtained using parallel edges with those obtained using road marks. If the extraction results from the two types of primitives are similar, the extracted road centerlines are considered reliable. In contrast, if the two results differ, the system gives the result a low reliability and asks the operator to check it. One example is shown in Figure 6-3. In (a), both the road marks and the road edges are clearly visible, and the road extraction results using parallel edges and road marks coincide. Thus, the results are correct and accepted. This is not the case in (b), where the road is heavily occluded by trees. The road is extracted only by using road marks. Thus, the system gives the result a low reliability, indicating that it has to be verified by the operator.


Figure 6-3. Internal evaluation for roads with road marks. (a) The results from road marks and parallel edges coincide. The results have high reliability, and are accepted. (b) The road is extracted only by using road marks. The result has to be verified by an operator. 6.3 Scheme for External Evaluation

The external evaluation is conducted by comparing the extracted results with manually measured reference data. The difference between the two datasets is taken as an indication of the performance of the road extraction system. The quality measures used in this work aim at assessing completeness and correctness as well as geometrical accuracy. Completeness measures the percentage of the reference data that is covered by the extracted roads, i.e.

\text{completeness} = \frac{\text{length of matched reference}}{\text{length of reference}} \qquad (6.5)

Correctness is the percentage of the extracted roads covered by the reference data; it is computed by

\text{correctness} = \frac{\text{length of matched extraction}}{\text{length of extraction}} \qquad (6.6)

In order to evaluate the shape difference between the extraction results and the reference data, and to detect cases where the results wiggle around the reference data, a measure based on the differences of the turning angles of the two datasets would be suitable. As seen in previous sections, this criterion has been employed in the road extraction process and for the internal self-diagnosis. While it is quite effective as an indication of shape similarity for a single road, a measure based on this criterion that yields a single value for the shape quality of the whole road network has not yet been defined. This evaluation will therefore not be included in the following. In addition, the quality of the reconstructed road junctions should be assessed, too. This might be done by comparing the distance between the reconstructed road junctions and the junctions in the VEC25 road data. This evaluation has not been done in this dissertation and will not be included. The geometrical quality is assessed by the mean and RMS of the distances between the extracted road and the reference data. Thus, the vertices of a road in one dataset have to be matched on the corresponding road in the other dataset, and the geometrical measures are computed using the coordinate differences between the vertices in one dataset and their nearest neighbours in the other dataset. The RMS expresses the geometrical accuracy of the extracted road data. In this work, the external evaluation is conducted directly in object space in a semi-automatic manner: the identification of corresponding roads in the two datasets is made by an operator, while the quality measures are computed automatically.
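Equations (6.5) and (6.6) and the mean/RMS statistics reduce to simple ratios and sums over the matched lengths and coordinate differences; a sketch (helper names are ours):

```python
import math

def completeness(len_matched_reference, len_reference):
    """Equation (6.5): share of the reference covered by the extraction."""
    return len_matched_reference / len_reference

def correctness(len_matched_extraction, len_extraction):
    """Equation (6.6): share of the extraction covered by the reference."""
    return len_matched_extraction / len_extraction

def mean_and_rms(differences):
    """Mean and RMS of per-vertex coordinate differences along one axis."""
    n = len(differences)
    mean = sum(differences) / n
    rms = math.sqrt(sum(d * d for d in differences) / n)
    return mean, rms
```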
Given a pair of corresponding roads in the two datasets, each vertex on the extracted road (ER) is projected onto the reference road (RR), and the projection is taken as the corresponding point of the vertex. The computation of the projection is shown in Figure 6-4. Assume A is a vertex on ER and l is a segment of RR. The projection of A on l, A', is obtained by the intersection of l with the plane passing through A and perpendicular to l.


Figure 6-4. Computation of the projection of a point onto a straight segment in object space. A is a point and l is a 3-D straight segment. Φ is the plane passing through A and perpendicular to l. A' is the projection of A on l, obtained by the intersection of l with Φ.

The geometrical quality can be calculated using the coordinate differences between the vertices and their correspondences. Note that we do not create a buffer around the reference data or the extracted road; all the vertices on the roads are used to compute the geometrical quality.
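The projection in Figure 6-4 amounts to dropping a perpendicular from A onto the line through the segment; a sketch assuming NumPy (the function name is ours):

```python
import numpy as np

def project_point_on_segment(a, p0, p1):
    """Project point A onto the 3-D straight segment l from p0 to p1 by
    intersecting l with the plane through A perpendicular to l (Figure 6-4).
    Returns the projection A' and the line parameter t (0 <= t <= 1 means
    A' falls inside the segment)."""
    a, p0, p1 = (np.asarray(v, dtype=float) for v in (a, p0, p1))
    d = p1 - p0                           # direction of l = normal of plane Φ
    t = np.dot(a - p0, d) / np.dot(d, d)  # intersection parameter of l with Φ
    return p0 + t * d, t
```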

A vertex on ER is considered correct if and only if it satisfies the following conditions:
(1) the distance between the vertex and its projection on a horizontal plane is less than 1.0 m
(2) the distance between the vertex and its projection in the Z direction is less than 2.0 m
Here 1.0 m and 2.0 m are the required accuracies for road extraction in the ATOMI project. Based on the found correspondences of vertices, the lengths (distances between the two matched vertices) of the matched extraction and the matched reference can be computed. We describe below the procedure to compute the matched extraction. Suppose A, B are two vertices of a segment in ER, and A' and B' are their projections on RR. We consider the following three cases to compute the length of the matched extraction for the segment AB. Case 1: both A and B are correct vertices (see Figure 6-5). If there is no vertex between A' and B' (Figure 6-5(a)), then AB is a correctly extracted road segment, i.e. matched extraction. If vertices exist between A' and B', they are projected on AB. If these vertices satisfy conditions (1) and (2) above (see Figure 6-5(b)), AB is also a correctly extracted road segment. If there are vertices that violate condition (1) or (2), only part of AB is correct (Figure 6-5(c)). Suppose there is only one such vertex C' between A' and B'. We search from C' towards A' to find the first point P' that satisfies conditions (1) and (2). Similarly, Q' is found in the segment C'B', searching from C' towards B'. The corresponding points of P' and Q' on AB are P and Q; then AP and BQ are parts of the correctly extracted road, while PQ is not.


Figure 6-5. Computation of the matched extraction: case 1.

Case 2: either A or B is incorrect; for instance, the vertex A in Figure 6-6 is not correct. We search from A towards B to find the first point satisfying the required accuracy (point C). We then treat segment CB as we do for AB in Figure 6-5 to find the correctly extracted road segments between C and B.


Figure 6-6. Computation of the matched extraction: case 2.

Case 3: neither A nor B is correct. If there is no vertex between A' and B' (Figure 6-7(a)), then AB is unmatched extraction. If vertices exist, they are projected on AB. If the vertices violate condition (1) or (2) (see Figure 6-7(b)), AB is also unmatched extraction. If there are vertices that satisfy conditions (1) and (2), part of AB is correct (Figure 6-7(c)). Assume there is only one such vertex C' between A' and B'. We search from C' towards A' to find the first point P' that satisfies the required conditions. Similarly, Q' is found in the segment C'B', searching from C' towards B'. The corresponding points of P' and Q' on AB are P and Q respectively; then PQ is a matched extraction.


Figure 6-7. Computation of the matched extraction: case 3.

The procedure to find the matched reference is similar; the only difference is that the segments of the reference data are checked as to whether they lie within the buffer of the extracted road.
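The per-vertex correctness test that drives the three cases above (1.0 m in planimetry, 2.0 m in height) can be sketched as:

```python
import math

def vertex_is_correct(vertex, projection, max_xy=1.0, max_z=2.0):
    """A vertex matches its projection iff the horizontal distance is below
    1.0 m and the height difference below 2.0 m (the ATOMI accuracy
    requirements). vertex and projection are (X, Y, Z) tuples."""
    dxy = math.hypot(vertex[0] - projection[0], vertex[1] - projection[1])
    dz = abs(vertex[2] - projection[2])
    return dxy < max_xy and dz < max_z
```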

6.4 Experiments and Results

The proposed scheme for external evaluation has been used to evaluate the results of our road reconstruction system in two test sites in Switzerland and Belgium. In this section, the evaluation results and a detailed analysis are presented. The characteristics of the test scenes are shown in Table 6-1. Both scenes are in rural areas; the Swiss scene contains several small villages.

Test scene       Description                        Length of reference data (km)
Swiss Scene      hilly, open rural with villages    14.70
Belgium Scene    flat, open rural                   12.86

Table 6-1. Test scene descriptions.

Figure 6-8 and Figure 6-9 show the images, the reference data, and the results delivered by our road reconstruction system. In both figures, (a) is the scene image, (b) the manually extracted reference data, and (c) the roads extracted by our system. Note that road extraction and evaluation have been carried out only in rural areas. The evaluation results for the two test sites are listed in Table 6-2.

Quality measure         Swiss Scene    Belgium Scene
completeness            93%            97.6%
correctness             96.3%          98.1%
Mean (m)    DX          -0.08           0.04
            DY           0.07          -0.11
            DZ           0.19           0.27
RMS (m)     DX           0.45           0.57
            DY           0.31           0.72
            DZ           0.62           0.89

Table 6-2. Quality measures.


Figure 6-8. Extracted roads and reference data in the test site in Switzerland. Road extraction and evaluation are only conducted in open rural areas. (a) Image, (b) reference data, (c) extracted roads.


Figure 6-9. Extracted roads and reference data in the test site in Belgium. (a) Image, (b) reference data, (c) extracted roads.

Table 6-2 shows that our road extraction system delivers quite good and stable results. Almost all roads are reconstructed; both correctness and completeness are very high. The geometrical accuracy in planimetry and height is better than 1 m, fulfilling the accuracy requirements of the project ATOMI. The completeness measure indicates that not all roads are extracted, especially in the Swiss scene. This is because some roads in villages are not extracted, as shown in Figure 6-10.


Figure 6-10. Roads not extracted in villages in the test site in Switzerland. Green lines are the reference data; extracted roads are represented as pink lines.

It is worth noting that in the Swiss test site the highways are not evaluated. This is because the delineations of highways in the extracted results and in the reference data differ considerably (see Figure 6-11). In fact, there is no uniform manner of depicting a highway: it is represented as a single line in maps of scale 1:25,000 (a); it is described by the lane border lines in our extracted results (b); while in the reference data it is depicted by the road sides (c).


Figure 6-11. Different delineations of a highway in map, extracted results and reference data. (a) a highway represented as a single line in a map of scale 1:25,000, (b) a highway extracted by lane border lines, (c) a highway represented by road sides in reference data. 6.5 Summary

In this chapter, we discussed the issues of internal self-diagnosis and external evaluation of the obtained results. The choice of criteria for internal self-diagnosis was discussed. The internal evaluation should take into account information about the reconstructed result as well as about the primitives that make up the extracted road. We proposed two types of quality measures. The overall quality of the extraction result is assessed by comparing the shape and length of the extracted road with those of the VEC25 road. We also use the components of the extracted road, especially the gaps, to evaluate the quality of the result, which can tell, if the result is uncertain or incorrect, where the problems actually are. Furthermore, for roads with road marks, the quality is also assessed by comparing the results achieved using road edges with those achieved using road marks. The tests show that the defined measures are effective to a certain extent. For the external evaluation, we used several quality measures to assess completeness and correctness as well as geometrical accuracy, and presented a method for their computation. The evaluation results in two test sites show that the proposed quality measures adequately capture the impression obtained when visually inspecting the extracted road data. Moreover, the result of the external evaluation is an indication of the applicability of the road extraction system: it can serve as an automatic road reconstruction tool in open rural areas.

7. CONCLUSIONS AND OUTLOOK

In this dissertation, a new system has been presented for the 3-D reconstruction of road net- works from stereo aerial imagery, which integrates processing of color image data and existing geodatabases. We have focused on the entire process chain from derivation of knowledge from an existing geodatabase, extraction of features and cues from images and height data, to the generation of road primitives, grouping of primitives for road extraction, reconstruction of road junctions and road networks, and evaluation of system performance. In this chapter, we first summarize the major achievements of the developed system in Section 7.1. Based on the discussions, conclusions are drawn in Section 7.2. Finally, in Section 7.3 recommendations for further research are given.

7.1 Summary

The most outstanding characteristics of the developed system are the integration of an existing geodatabase in image analysis and the extraction and fusion of multiple cues to support road extraction. The system exploits the existing geodatabase to initiate the road extraction process. Roads are divided into various subclasses according to road type and land cover. Thus, the system uses existing knowledge, image context, rules and models to restrict the search space, treats each road subclass differently, checks the plausibility of multiple possible hypotheses, and derives reliable criteria, and therefore provides reliable results. The system contains a set of feature extraction tools to extract various cues about road existence, and fuses multiple cues and existing information sources. This fusion provides not only complementary but also redundant information to account for errors and incomplete results. Working on stereo images, the system makes an early transition from 2-D image space to 3-D object space. The road hypotheses are generated directly in 3-D object space. This not only enables us to apply more geometric criteria to create hypotheses, but also largely reduces the search space and speeds up the process. The hypotheses are evaluated in the 2-D images using accumulated knowledge. Whenever 3-D features are incomplete or entirely missing, 2-D information from the stereo images is used to infer the missing features. By incorporating multiple knowledge sources, problematic areas caused by shadows, occlusions etc. can often be handled. Based on the extracted roads, road junctions are generated. The reconstructed 3-D road networks are saved in Arc/Info Shapefile format, so they can be readily imported into existing software packages for practical use. Many tests of the system have been carried out using various datasets in different landscapes.
The experiments show that more than 93% of the roads in rural areas are correctly and reliably extracted by the developed system, and the achieved accuracy of the road centerlines is better than 1 m both in planimetry and height. This indicates that the developed system can serve as an automatic tool to extract roads in rural areas for digital road data production.

7.1.1 Summary of Feature and Cue Extraction

An important component of the developed system is feature and cue extraction. Several sophisticated data processing techniques have been developed, and many experiments were performed for extracting features useful for the discrimination of objects and the support of road extraction. These features have facilitated the reliable reconstruction of roads in aerial images. 3-D straight edges are prominent in man-made environments and usually correspond to man-made object boundaries in images. We developed robust and efficient methods that process the raw image data to extract edge segments, find edge correspondences across images, and transfer them to 3-D object space. The scheme is designed as a bottom-up process, in which available information and organizational processes are introduced at several layers of processing. First, edge pixels are detected and then aggregated to generate edges, with small gaps bridged based on the criteria of proximity and collinearity. The edges are checked to derive straight edge segments. For each straight edge segment, we compute geometric and photometric attributes. The photometric and chromatic properties are estimated from the "L", "a" and "b" channels after an RGB to Lab color space conversion and include the median and the scatter matrix. With the extracted edge segments and the computed edge attributes, a matching process is started to find the correspondences for the edge segments across images. In order to reduce the computational cost and enhance robustness, we employ a two-phase scheme. The candidates for each edge segment are searched for in a restricted space, which is defined using the epipolar geometry constraint and available rough DTM or DSM data. The comparison with each candidate is made using the edge attributes. A matching score for each candidate match is computed using a weighted combination of various criteria. The candidates as well as the matching scores are stored in a so-called matching pool.
The algorithm then exploits edge structure information in the images to achieve consistent results; therefore, not only the individual edge segments but also the edge structures are matched across images. This is realized through structural matching with probability relaxation. The 3-D positions of edge pixels are computed by photogrammetric forward intersection using the corresponding edge pixels from the matched edge segments. We examined and compared three methods for pixel correspondence in the computation of the 3-D coordinates of edge pixels. A data clustering algorithm has been implemented in our system to segment color aerial images and find road regions. In order to achieve a good and fast image segmentation, the input data to the clustering algorithm should hold enough information, while the amount of data for clustering should be limited. For this purpose, the original RGB images are transformed into different color spaces to enhance features such as shadows and vegetation, so that they are more isolated in feature space. We also applied the principal component transformation to analyse the original RGB image data and select the appropriate image bands for clustering. It was found that most of the data spread is in the direction of the first principal component, which accounts for most of the total variance of the data in the image. The first component is thus selected for data clustering, while the last two components are discarded. We also exploit height data to support road extraction. Because a DSM ideally models the man-made objects as well as the terrain, subtracting the DTM from the DSM results in a normalized DSM (nDSM), which enables the extraction of above-ground objects, including buildings and trees. The nDSM is employed in our road extraction system to check whether a 3-D straight edge or a region is on the ground.
Since the available DTM data is not always accurate enough, we extract above-ground objects directly from the DSM data, thus avoiding the introduction of errors from the DTM. In our system, the nDSM is generated using the Multiple Height Bin method, in which the DSM heights are grouped into consecutive bins of a certain size. This results in a segmentation of the DSM into relatively few regions that are always closed and easy to extract. The method is simple, fast and very effective, and avoids the difficulty of parameter setting: the only parameters required are the minimum building height and the minimum and maximum building sizes. Furthermore, it only operates on high objects, thus leaving the topographic ground surface unaffected. Road marks and zebra crossings are good indications of the existence of roads. They can be used to extract roads or to verify the extraction results. In addition, in many cases the correct road centerlines can even be derived directly from the present road marks and/or zebra crossings. This is especially useful when the road sides are occluded or not well-defined, such as in cities or city centers. In our system, we treat road marks as white linear objects and apply line extraction techniques for road mark detection. The road marks are first detected by thresholding in the R, G, B channels of the image, and then extracted using an image line model. The shape of an image line is represented as a second order polynomial, fitted to the grey values as a function of the pixel's row and column coordinates. We then compute the local line direction using this model and take the profile in the direction perpendicular to the line. The profile is described as a parabola, from which the precise position of each line point is obtained. The detected line points with similar direction and second derivative are linked. Straight lines are obtained by least squares fitting, and the 3-D straight lines are generated by the developed structural matching algorithm.
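The sub-pixel line position from the parabolic profile can be sketched as follows. This illustrates the principle only; the actual line model additionally fits the polynomial over the pixel's row and column coordinates:

```python
import numpy as np

def subpixel_line_position(profile):
    """Fit a parabola g(x) = a*x^2 + b*x + c to the grey-value profile
    perpendicular to the line and return the sub-pixel offset of its
    extremum relative to the profile centre."""
    x = np.arange(len(profile)) - len(profile) // 2
    a, b, _ = np.polyfit(x, profile, 2)   # second order polynomial fit
    return -b / (2.0 * a)                 # vertex of the parabola
```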
For the detection of zebra crossings, the image is segmented using color information to extract the thin lines which compose the zebra crossing. Then, the gaps between the zebra stripes are bridged by morphological closing. We thus obtain several clusters by connected component labelling. The clusters are analysed, and rectangle-like clusters of a certain size are selected as zebra crossings.
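The principal component band selection described earlier in this section (projecting the RGB bands onto the axis of largest variance) can be sketched as follows, assuming NumPy; the function name is ours:

```python
import numpy as np

def first_principal_component(rgb):
    """Project an RGB image (rows x cols x 3) onto its first principal
    component, the band direction that carries most of the total variance,
    as selected for data clustering."""
    pixels = rgb.reshape(-1, 3).astype(float)
    pixels -= pixels.mean(axis=0)          # centre each band
    cov = np.cov(pixels, rowvar=False)     # 3 x 3 band covariance matrix
    _, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    pc1 = eigvecs[:, -1]                   # axis of largest variance
    return (pixels @ pc1).reshape(rgb.shape[:2])
```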

7.1.2 Summary of 3-D Road Reconstruction

The extracted features and cues together with the information derived from the existing geodatabase are used to extract roads. This is achieved by finding road primitives and linking them in sequence. The road primitives are derived from straight edges or road marks.

The reconstruction consists of different intermediate and interrelated processes, whereby every process provides more abstract and more object-related information to the process at the next higher level. The main strategy is to use knowledge from the existing geodatabase and the extracted cues to exclude irrelevant features as much and as early as possible, and to provide reliable primitives that most probably belong to a road. We start with a feature deduction process using knowledge from the existing geodatabase, road design rules, as well as the extracted cues. Thereby, edges or lines that are above the ground, far away from the old road vectors, do not comply with the directions of the old road vector, or violate the road design rules are discarded. This process results not only in a significant reduction of computational complexity but also in the rejection of many false hypotheses right from the beginning. Possible road sides that are parallel are formed using 3-D straight edges, and verified by the nDSM and the image segmentation data. The system is robust enough to handle cases where one road side is missing. This is done either by extension of PRSPs using 2-D and 3-D edges or by hypothesizing the missing road sides using single road edges. One of the difficulties in road extraction is the treatment of gaps caused by occlusions and shadows. In our system, the gaps are bridged using the shape of the old road vectors, which is quite robust and efficient, especially on occluded curved roads. During processing, we make extensive use of complementary and redundant cues to ensure the reliable generation of primitives and the rejection of false hypotheses. The primitives are then connected to extract the roads by maximizing a merit function. The function combines various measures for the primitives and gaps as well as the shape information of the existing road vectors. Thus, road segments are selected and connected with gaps bridged, while the false hypotheses are rejected.
Based on the extracted roads, the road junctions are generated. Highways, first class roads and most second class roads are also extracted using the detected road marks and zebra crossings. In rural areas, the roads extracted using road marks also serve to verify the results of edge-based extraction. In complex areas, such as cities or city centers, the road sides are generally heavily occluded, and sometimes it is impossible to identify them. However, some of these roads are successfully extracted by exploiting road marks.

7.2 Conclusions

All functions mentioned above enable the system to perform efficient and reliable road extraction from aerial imagery. It is shown in this dissertation that the developed system fulfils the requirements of the ATOMI project and achieves the goals stated in Section 1.2. Thus, from an application point of view, the major result of this dissertation is that the developed system can be used for automatic road network extraction from aerial imagery in rural areas for digital road data production. In the following, conclusions are drawn regarding the specific aspects of the system. Compared to other work (see Chapter 2), many more tests were performed, which makes the conclusions more reliable.

The utilization of the existing geodatabase proved to be useful. It provides the approximate position of the road in the image and thus significantly reduces the search space. The road width information can be used to adjust parameters for hypothesis generation, while the road class attribute enables the system to exploit different features (road edges, road marks) to extract different classes of roads. In addition, the shape knowledge of the existing roads is quite helpful in guiding the road extraction process. It is shown that in problematic areas the shape information can be employed to effectively bridge gaps caused by occlusions or shadows. The topology of the road database facilitates the reconstruction of the road junctions. Furthermore, the existing road can also be used to assess the quality of the reconstruction results to a certain extent.

The fusion of information from different cues is one of the key components in our system to achieve reliable road reconstruction. This is a major advantage of our system over most other reported methods, which depend on a single cue only.
In fact, the cue fusion process plays an important role in our system and is applied in various processes, from irrelevant feature reduction and hypothesis generation to gap evaluation. The nDSM and image segmentation data have been effectively employed to exclude from further processing edges that lie on above-ground objects, fall in grass fields or have a large slope; consequently, many false hypotheses are avoided at a very early stage of the whole process chain. The use and fusion of redundant information results in reliable generation of hypotheses. The extracted road marks verify that the generated PRSPs are actually roads. Some missing 3-D road sides can be reliably inferred only when multiple cues are available. In addition, the fusion of redundant information can be used to compensate for incomplete results and to account for errors. For instance, shadows on roads are confirmed by the nDSM to actually belong to the roads. In conclusion, the use and fusion of multiple cues is one of the important keys to reducing the complexity of image analysis, accounting for errors and incomplete results, and increasing the success rate and reliability of road reconstruction.

Working directly in 3-D object space for hypothesis generation and road reconstruction is an advantage of our system. More geometric criteria and realistic rules can be applied and many false hypotheses can be avoided, thus increasing the success rate and reliability of the results. Two techniques are applied to recover missing road sides. Such processes are necessary because not all 3-D road sides are available, due to shadows or occlusion. The proposed evaluation methods are important and suitable to ensure that the recovered results are roads. Gaps are bridged by directly linking neighbouring road primitives or by adapting the shape of the existing roads. The latter proved to be more efficient, especially on occluded curved roads.
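The early filtering of irrelevant features can be illustrated with a small predicate. The following sketch is hypothetical, not the thesis implementation: the edge field names and all thresholds (0.5 m nDSM height, 15 m buffer around the old road vector, 30° direction tolerance, 12% maximum grade) are invented examples.

```python
# Hypothetical sketch of the early feature reduction step: a 3-D edge is
# kept only if it lies on the ground (nDSM), falls inside a buffer
# around the old road vector, roughly agrees with its direction, and
# respects road design rules (maximum grade). Thresholds are examples.

def keep_edge(edge, road_azimuth_deg,
              max_ndsm=0.5, max_dist=15.0,
              max_angle_diff=30.0, max_grade=0.12):
    if edge["ndsm_height"] > max_ndsm:       # lies on an above-ground object
        return False
    if edge["dist_to_vector"] > max_dist:    # too far from the old road
        return False
    # Direction difference, treating edges as undirected (mod 180 deg)
    diff = abs(edge["azimuth_deg"] - road_azimuth_deg) % 180.0
    if min(diff, 180.0 - diff) > max_angle_diff:
        return False
    dz = abs(edge["z_end"] - edge["z_start"])
    if dz / edge["length"] > max_grade:      # steeper than a road can be
        return False
    return True
```

Each rejected edge removes an entire family of later pairing hypotheses, which is why this filter pays off so early in the process chain.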
The road primitives are connected to extract the roads by maximizing a merit function. The defined linking function combines various measures for the road primitives and gaps as well as the shape information of the existing road, thus resulting in long curves with a shape similar to the existing road. The test results show that various roads are correctly reconstructed, including dirt roads, curved roads and partially occluded roads.

The system can also handle some of the roads in urban areas. In general, however, the success rate is lower than in rural areas. The main problem is that in urban areas the road sides are usually not well defined or totally invisible due to shadows and occlusions; thus, even if the system finds the road regions from image segmentation data and nDSM, a precise delineation of road sides or road centerlines is still impossible.

It is shown that the reconstruction of junctions is possible by intersection of the reconstructed roads. Thus, to reconstruct a junction, at least two roads connecting to the junction under process must be correctly extracted; otherwise, the junction cannot be reconstructed by the proposed method. This usually occurs in difficult cases, where the junction areas are heavily occluded. In addition, the method might not properly handle complex junctions, such as roundabouts. However, it is shown that most of the junctions in rural areas can be correctly reconstructed, resulting in a more complete road network.

Several measures have been defined for self-diagnosis and external performance evaluation. For internal evaluation, the information of the VEC25 roads proved to be useful. m1 (see the definition in Section 6.2) is quite effective at finding cases where roads are not completely extracted. m2 and m3 might not always deliver reliable results.
A low value of m3 indicates that the result is dominated by gaps, but this does not necessarily mean that the reconstructed road is incorrect. More problems can be caused by m2, since it depends strongly on the shape of the existing road. Because of the generalization effect, the shape of the existing roads, especially at junction areas, might not be correct. We also proposed other measures to assess the overall reliability of the extracted results; however, they are not implemented yet. The measure q focuses particularly on the gap areas. The idea is reasonable because reconstruction errors most often occur at gaps, but for the same reason as for m2 it may not always be effective. A better use of the existing VEC25 roads to assess the reliability of the extraction results might be to first learn the difference between the VEC25 roads and the reference data through a certain learning process, and then apply the obtained knowledge to the extraction results in other images. The existence of multiple cues might be a good indicator of the reliability of the reconstructed results. For example, first class or second class roads reconstructed using parallel edges can be confirmed by the results using road marks. A method is also proposed, but not implemented, to assess the reliability of the reconstructed road junctions.

The quality measures defined for external evaluation adequately capture the impression obtained when visually inspecting the extracted road data. The computation is conducted in 3-D in a semi-automatic manner. This could be improved if a 3-D buffering technique were developed. We do not try to evaluate the shape or detect cases where the results wiggle around the reference data; the measure m2 defined for internal evaluation might be a choice for this purpose. The evaluation of the quality of the reconstructed road junctions is not done either.
Performance evaluation, especially internal self-diagnosis, remains the weakest part of this dissertation.

Good performance of an object extraction system naturally depends on the algorithms for feature and cue extraction. The individual algorithms proposed in this dissertation exhibit only very few remaining problems. The performance of the developed straight edge segment matching method shows that the method achieves a high success rate and, most importantly, is very reliable. This is also confirmed in other applications (Niederöst, 2000; Seiz, 2002). The use of rich attributes for matching, including edge geometric properties and the photometric and chromatic properties of the edge flanking regions, is an advantage over other approaches that only use edge geometry and gray scale information. Locally consistent results have been achieved by structural matching, which allows matching in the case of partially occluded edges, broken edges, etc. This method can easily be extended to arbitrary edges, or even points, if some of the matching criteria (feature attributes) are excluded and adapted. For the computation of the position of the straight edges, it is found in Section 4.3 that good results can be achieved by directly using the matched edge segments. The remaining problem in 3-D position computation concerns edges parallel to the epipolar line: although in this case the edge segments are matched, the 3-D position cannot be obtained. This can be solved by using multiple overlapping images.

The clustering method for image segmentation proved to be suitable for our purpose. In general, good results have been achieved. It is also shown that some errors might exist in the segmentation results; for instance, the method cannot always discriminate unpaved roads from bare soil.
Although this does not affect the overall performance of road extraction much, this problem can be avoided by using multispectral imagery, texture information or more advanced algorithms.

The separation of above-ground objects and ground objects from DSM data is possible. The MHB method is preferred because it is fast and effective, and it avoids the difficulty of selecting parameters. Since the DSM is generated in a commercial photogrammetric system, errors exist which might create problems in separating above-ground objects from the bare earth. The use of more precise height data, e.g. laser scanning data, can help.

It is shown that the line extraction method can be applied to extract road marks on first class roads, second class roads and highways. The method for road mark extraction is robust and fast, and the extracted results have sub-pixel accuracy. The color information allows a rough segmentation of road mark pixels from the background. Understandably, some objects that have a spectral response and width similar to road marks are also extracted. This does not create much of a problem in road extraction, as they can be discarded by the cue combination procedures. Zebra crossings have been extracted by the proposed method. The prior knowledge about the color of zebra crossings simplifies the detection process. It is shown that the width and the orientation of a zebra crossing can be extracted. Such information is useful: the width of zebra crossings can be used as a reliable parameter to generate PRSPs, and linking can start from primitives that are close to a zebra crossing and have a similar orientation.

It is also shown that the developed system can work on black-and-white images without DSM data. The performance on a Belgian dataset shows that good results have been achieved. This is because there are few occlusions and shadows on roads in open rural areas.
In addition, the use of geometric criteria and of information from the existing roads, including positions and shape, plays an important role in the system. It is also shown that many roads in small villages cannot be extracted, because the road sides in these areas are difficult to identify in such images; even road marks cannot be extracted. It is also more difficult to assess the reliability of the extraction results with such a dataset, because the redundant information is not available.

7.3 Outlook

We believe that we have demonstrated a usable model of an automatic road extraction system which integrates the processing of color image data and existing geodatabases. This study is an important step in the development of research on man-made object extraction from aerial images. There are still many interesting issues which can be, and sometimes need to be, improved. Future work should be directed towards improving each individual component, wherever possible, as well as towards system-related and conceptual improvements. The main topics which can contribute to the advancement of our system are:

(1) Improvement of image segmentation and nDSM generation. In the current implementation, the image segmentation has been done by a clustering algorithm; it can be improved by adopting more sophisticated image analysis algorithms, such as supervised image classification. Regarding nDSM generation, the algorithm can be further refined. In addition, more accurate data, such as those from laser scanning, can be used to precisely locate and delineate above-ground object boundaries.

(2) Improvement of the method, or development of a new method, for road junction generation to handle complex cases.

(3) Automatic derivation of quality control measures. In our system, the quality is assessed using the difference between the extracted road and the old road vector, and the information of the road primitives that make up the results. Further research is needed to find more general and more robust measures which allow following the traffic light principle.

(4) Improvement of methods to evaluate the generated road hypotheses and gaps. In the current implementation, this is done by counting pixels of certain objects, or by looking for road marks. More robust methods have to be developed by making better use of context.

(5) Integration of more components and processes in the framework to treat roads in complex cases, as they may appear in suburban and urban areas.
In the current system, roads are extracted mainly by finding road sides and road marks. This is quite effective in rural areas where road sides are generally visible. However, this also indicates the limitations of our system: it cannot handle cases where many road sides are missing or not well defined, or where road marks are not present. Thus, other methods have to be developed to infer road sides from additional cues, existing knowledge or manual input.

In addition to the further research, our future work will also include the following aspects of software development in the framework of the project ATOMI:

(1) Post-filtering of results to remove small errors, smooth road centerlines, etc.

(2) More automated and batch processing of larger input datasets.

(3) Program modifications to be able to use only orthoimages instead of stereo pairs.

(4) Investigations to examine the cost/benefit of using color and DSM, and of 1:30,000 images with a 15 cm lens in rural areas.

(5) Investigations using laser-derived DTMs/DSMs.

APPENDIX

A. VEC25 road attributes

Information on the VEC25 road network can be found on the website of the Swiss Federal Office of Topography (http://www.swisstopo.ch/). The attributes for roads are listed in Table A-1.

Attribute name   Description                                  Examples
ObjectId         Explicit and stable identification code      327, 689
ObjectOrigin     Origin of the data
ObjectVal        Kind of object                               1_class, 2_class
YearOfChange     Year of map update (year of aerial photo)    1976, 1998
BridgeType       Type of bridge                               Bruecke, Steg
TunnelType       Type of tunnel                               Galerie, Tunnel

Table A-1. VEC25 road attributes.

B. Derivation of knowledge from existing geodatabase

In the ATOMI project, the existing geodatabase is available for the whole territory of Switzerland in Arc/Info coverage format. It therefore has to be processed in order to derive the necessary road information, which is then organized in a structure suitable as input to the road extraction system. This structure should be designed such that it facilitates the subsequent processing.

The geodatabase contains various geographic objects, including buildings, houses, communication and hydrographic networks and landcover areas. All these objects are represented by vector data and have semantic attributes. Our first task is to classify roads according to landcover. The landcover information of the road, together with the road vector data and road class information, is stored in the designed structure. This is done by an automatic procedure developed in the Arc/Info environment. The procedure is illustrated in Figure B-1.

[Figure B-1 flowchart: the geodatabase is split into residential areas, forest areas, forest borders and the road network; residential areas (villages, urban) and forest borders are buffered; the union of these buffer zones yields a landcover map, which is combined with the road network by an Identity overlay in Arc/Info, giving as output the roads with attributes.]

Figure B-1. Processing geodatabase data to derive roads and road attributes in Arc/Info.

Firstly, the roads and the landcover information are extracted from the database. The landcover information distinguishes residential areas and forest areas; the remaining areas are considered open rural areas. Depending on their size, residential areas can be categorized into village, town or city. In order to locate the roads at forest borders, a forest border buffer zone is created; the buffer width is estimated from the RMS error of the geodatabase. The landcover map is overlaid with the road network by spatial analysis in Arc/Info, and the landcover information for each road is derived. An example of road classification using landcover information is given in Figure B-2, where open rural areas, residential areas and forests are represented as white, orange and green regions, while roads in open rural areas, in residential areas, inside forests and along forest borders are depicted as blue, yellow, black and pink lines, respectively. Therefore, for each road we obtain:

• geometric description (polyline)

• road class

• type of landcover

Figure B-2. Classification of the VEC25 roads according to landcover information. Open rural areas, residential areas and forests are represented as white, orange and green regions. Roads in open rural areas, in residential areas, inside forests and along forest borders are depicted as blue, yellow, black and pink lines, respectively.
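The classification behind Figure B-2 can be illustrated in miniature. The sketch below is a hypothetical stand-in for the Arc/Info overlay, not the actual procedure: regions are reduced to axis-aligned boxes, a road is classified by its midpoint only, and all coordinates and the 25 m buffer width are invented examples.

```python
# Toy sketch of landcover classification of roads (the real processing
# uses Arc/Info buffer/union/identity overlays on polygon coverages).

RESIDENTIAL = (0, 0, 50, 50)     # (xmin, ymin, xmax, ymax), made-up
FOREST = (100, 0, 200, 100)
BUFFER = 25.0                    # forest-border buffer width (from RMS error)

def inside(box, x, y, margin=0.0):
    xmin, ymin, xmax, ymax = box
    return (xmin - margin <= x <= xmax + margin
            and ymin - margin <= y <= ymax + margin)

def near_border(box, x, y, margin):
    """Within `margin` of the box boundary, on either side."""
    return inside(box, x, y, margin) and not inside(box, x, y, -margin)

def landcover_of(road):
    """Classify a road (list of (x, y) vertices) by its midpoint."""
    xs = [p[0] for p in road]
    ys = [p[1] for p in road]
    x, y = sum(xs) / len(xs), sum(ys) / len(ys)
    if near_border(FOREST, x, y, BUFFER):
        return "forest_border"
    if inside(FOREST, x, y):
        return "forest"
    if inside(RESIDENTIAL, x, y):
        return "residential"
    return "open_rural"
```

The priority order matters: a road running along the forest edge must be labelled "forest_border" even though it also lies inside or near the forest polygon.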

In addition to the information for each road, the road network topology information is also derived from the database. For each road junction, the following information is obtained (see also Figure B-3):

• geometric description (point)

• the number of roads connected

• roads connected, and the connection type, i.e. whether the road comes to or leaves from the junction


Figure B-3. A road junction and its attributes. The road junction and the roads are represented as a dot and solid black lines, respectively. The dashed arrow lines indicate roads coming to or leaving from the junction.

The topology information is useful for road network generation after the roads have been extracted from the imagery.

The information derived for each road, as well as for the road network, composes the initial knowledge database of the road extraction system. It is stored in a special data structure which has two separate parts: the first part is for roads and contains the road geometry and attributes; the other part holds the road network topology. The two parts are related via the road IDs.
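The two-part structure might be sketched as follows. The field names and types are invented for illustration and do not reproduce the actual implementation; only the separation into a road part and a topology part, linked by road IDs, reflects the description above.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the knowledge database: roads (geometry +
# attributes) in one part, network topology in the other, linked by IDs.

@dataclass
class Road:
    road_id: int
    polyline: list      # [(x, y), ...] geometric description
    road_class: str     # e.g. "1_class", "2_class"
    landcover: str      # "open_rural", "residential", "forest", "forest_border"

@dataclass
class Junction:
    point: tuple        # (x, y) geometric description
    # Each arm refers to a road by ID plus its connection type
    arms: list = field(default_factory=list)  # [(road_id, "incoming"/"outgoing"), ...]

    def degree(self):
        """Number of roads connected to this junction."""
        return len(self.arms)
```

Keeping the topology as ID references, rather than nesting road records inside junctions, lets an extracted road be updated once and remain consistent in every junction that refers to it.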

BIBLIOGRAPHY

Agouris, P., Stefanidis, A., Gyftakis, S. (2001). Differential Snakes for Change Detection in Road Segments. Photogrammetric Engineering and Remote Sensing, 67 (12):1391-1399.

Airault, S., Ruskone, R., Jamet, O. (1994). Road Detection from Aerial Images: A Cooperation Between Local and Global Methods. Satellite Remote Sensing I, SPIE Vol. 2315, pp. 508-518.

Airault, S., Jamet, O. (1995). Evaluation of Operationality of a Semi-automatic Road Network Capture Process. Digital Photogrammetry and Remote Sensing '95, SPIE Vol. 2646, pp. 180-191.

Alnuweiri, H.M., Prasanna Kumar, V.K. (1991). Fast Image Labeling Using Local Operators on Mesh-Connected Computers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13 (2):202-207.

Argialas, D.P., Krishnamurthy, S. (1990). Detection of Lines and Circles in Maps and Engineering Drawings. International Archives of Photogrammetry and Remote Sensing, Vol. 26, Part B3, pp. 392-399.

Atalay, V., Yilmaz, M.U. (1998). A Matching Algorithm Based on Linear Features. Pattern Recognition Letters, 19 (9):857-867.

Ayache, N., Faverjon, B. (1987). Efficient Registration of Stereo Images by Matching Graph Descriptions of Edge Segments. International Journal of Computer Vision, 1 (2):107-131.

Baillard, C., Schmid, C., Zisserman, A., Fitzgibbon, A. (1999). Automatic Line Matching and 3D Reconstruction of Buildings from Multiple Views. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3-2W5, pp. 69-80.

Baird, H.S. (1985). Model-based Image Matching using Location. The MIT Press, Cam- bridge, MA.

Bajcsy, R., Tavakoli, M. (1976). Computer Recognition of Roads from Satellite Pictures. IEEE Transactions on System, Man and Cybernetics, 6 (9):623-637.

Ballard, D.H. (1981). Generalizing the Hough Transform to Detect Arbitrary Shapes. Pattern Recognition, 13 (2):111-122.

Ballard, D.H., Brown, C.M. (1987). Computer Vision. Prentice-Hall, Inc., Englewood Cliffs, NJ.

Baltsavias, E.P. (1991). Multiphoto Geometrically Constrained Matching. Ph.D. Dissertation, Report No. 49, Institute of Geodesy and Photogrammetry, ETH Zurich, Switzerland.

Baltsavias, E.P., Mason, S., Stallmann, D. (1995). Use of DTMs/DSMs and Orthoimages to Support Building Extraction. In: Grün, A., Kuebler, O., Agouris, P. (Eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images, Birkhäuser Verlag, Basel, pp. 189-198.

Baltsavias, E.P., Mason, S., Li, H., Sinning, M. (1996). Automatic DSM Generation by Digital Photogrammetry: Example Morteratsch Glacier. Survey World, 4 (2):18-21.

Baltsavias, E.P. (1999). A Comparison between Photogrammetry and Laser Scanning. ISPRS Journal of Photogrammetry and Remote Sensing, 54 (2/3):83-94.

Baltsavias, E.P., Grün, A., Van Gool, L. (2001). Automatic Extraction of Man-Made Objects From Aerial and Space Images (III). A. A. Balkema Publishers, Lisse.

Barnard, S.T., Fischler, M.A. (1982). Computational Vision. Technical Note 261, SRI International.

Barzohar, M., Cohen, M., Ziskind, I., Cooper, D.B. (1997). Fast Robust Tracking of Curvy Partially Occluded Roads in Clutter in Aerial Images. In: Grün, A., Baltsavias, E.P., Henricsson, O. (Eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images (II), Birkhäuser Verlag, Basel, pp. 277-286.

Baumgartner, A., Steger, C., Wiedemann, C., Mayer, H., Eckstein, W., Ebner, H. (1996). Update of Roads in GIS from Aerial Imagery: Verification and Multi-resolution Extraction. International Archives of Photogrammetry and Remote Sensing, Vol. 31, Part B3, pp. 53-58.

Baumgartner, A., Steger, C., Mayer, H., Eckstein, W. (1997). Multi-resolution, Semantic Objects and Context for Road Extraction. In: Förstner, W., Pluemer, L. (Eds.), Semantic Modelling for the Acquisition of Topographic Information from Images and Maps, Birkhäuser Verlag, Basel, pp. 140-156.

Baumgartner, A., Hinz, S., Wiedemann, C. (2002). Efficient Methods and Interfaces for Road Tracking. International Archives of Photogrammetry and Remote Sensing, Vol. 34, Part 3B, pp. 309-312.

Barsi, A., Heipke, C., Willrich, F. (2002). Junction Extraction by Artificial Neural Network System - JEANS. International Archives of Photogrammetry and Remote Sensing, Vol. 34, Part 3B, pp. 18-21.

Beis, J.S., Lowe, D.G. (1997). Shape Indexing Using Approximate Nearest-neighbour Search in High-dimensional Spaces. Proceedings of Computer Vision and Pattern Recognition, pp. 1000-1006.

Bellaire, G. (1995). Hashing with a Topological Invariant Feature. Proceedings of 2nd Asian Conference on Computer Vision, Vol. 2, pp. 598-602.

Bhanu, B. (1984). Representation and Shape Matching of 3-D Objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6 (3):340-350.

Bienenstock, E. (1988). Neural-like Graph-Matching Techniques for Image Processing. In: Anderson, D.Z. (Ed.), Neural Information Processing Systems, Addison-Wesley, Reading, MA, pp. 211-235.

Bigand, A., Bouwmans, T., Dubus, J.P. (2001). A New Stereomatching Algorithm Based on Linear Features and the Fuzzy Integral. Pattern Recognition Letters, 22 (2):133-146.

Bignone, F. (1995). Segment Stereo Matching and Coplanar Grouping. Technical Report BIWI-TR-165, Image Science Lab, Institute for Communication Technology, ETH Zurich, Switzerland.

Bignone, F., Henricsson, O., Fua, P., Stricker, M. (1996). Automatic Extraction of Generic House Roofs from High Resolution Aerial Imagery. In: Buxton, B., Cipolla, R. (Eds.), Fourth European Conference on Computer Vision, Vol. 1064 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, pp. 85-96.

Boldt, M., Weiss, R., Riseman, E. (1989). Token-based Extraction of Straight Lines. IEEE Transactions on Systems, Man and Cybernetics, 19 (6):1581-1594.

Bordes, G., Giraudon, G., Jamet, O. (1997). Road Modeling Based on a Cartographic Database for Aerial Image Interpretation. In: Förstner, W., Pluemer, L. (Eds.), Semantic Modelling for the Acquisition of Topographic Information from Images and Maps, Birkhäuser Verlag, Basel, pp. 23-139.

Boyer, K.L., Kak, A.C. (1988). Structural Stereopsis for 3-D Vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10 (2):146-166.

Breuel, T.M. (1989). Adaptive Model Base Indexing. Proceedings of DARPA Image Understanding Workshop, pp. 805-814.

Bueckner, J. (1998). Model Based Road Extraction for the Registration and Interpretation of Remote Sensing Data. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 4, pp. 85-90.

Burns, J., Hanson, A.R., Riseman, E.M. (1986). Extracting Straight Lines. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8 (4):425-455.

Canning, J., Kim, J.J., Rosenfeld, A. (1987). Symbolic Pixel Labeling for Curvilinear Feature Detection. Proceedings of DARPA Image Understanding Workshop, pp. 242-256.

Canny, J.F. (1986). A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8 (6):679-698.

Chang, Y.L., Aggarwal, J.K. (1997). Line Correspondences from Cooperating Spatial and Temporal Grouping Processes for a Sequence of Images. Computer Vision and Image Understanding, 67 (2):186-201.

Cho, W. (1996). Relational Matching for Automatic Orientation. International Archives of Photogrammetry and Remote Sensing, Vol. 31, Part B3, pp. 111-119.

Christmas, W., Kittler, J., Petrou, M. (1995). Structural Matching in Computer Vision using Probabilistic Relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17 (8):749-764.

van Cleyenbreugel, J., Fierens, F., Suetens, P., Oosterlinck, A. (1990). Delineating Road Structures on Satellite Imagery by a GIS Guided Technique. Photogrammetric Engineering and Remote Sensing, 56 (6):893-898.

Clowes, H.B. (1971). On Seeing Things. Artificial Intelligence, 2 (1):79-116.

Cord, M., Jordan, M., Cocquerez, J.P. (2001). Accurate Building Structure Recovery from High Resolution Aerial Imagery. Computer Vision and Image Understanding, 82 (1):138-173.

Davis, L.S. (1979). Shape Matching using Relaxation Techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1 (1):60-72.

Dhond, U., Aggarwal, J. (1989). Structure from Stereo - A Review. IEEE Transactions on Systems, Man and Cybernetics, 19 (6):1489-1510.

Dial, G., Gibson, L., Poulsen, R. (2001). IKONOS Satellite Imagery and its Use in Automated Road Extraction. In: Baltsavias, E.P., Grün, A., Van Gool, L. (Eds.), Automatic Extraction of Man-Made Objects from Aerial and Space Images (III), A. A. Balkema Publishers, Lisse, pp. 357-367.

Doucette, P., Agouris, P., Stefanidis, A., Musave, M. (2001). Self-organised Clustering for Road Extraction in Classified Imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 55 (1):347-358.

Dowman, I. (2001). Airborne Interferometric SAR: A Review of the State of the Art and of OEEPE Activities. Proc. of OEEPE Workshop on Airborne Laserscanning and Interferometric SAR for Detailed Digital Elevation Models, CD-ROM, pp. 12-22.

Duda, R.O., Hart, P.E. (1972). Use of the Hough Transformation to Detect Lines and Curves in Pictures. Communications of the ACM, 15 (1):11-15.

Duncan, J.H., Birkholzer, T. (1992). Reinforcement of Linear Structure using Parameterized Relaxation Labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14 (5):502-515.

Eberly, D., Gardner, R., Morse, B., Pizer, S., Scharlach, C. (1993). Ridges for Image Analysis. Technical Report TR93-055, Department of Computer Science, University of North Carolina, Chapel Hill, NC.

Eckstein, W., Munkelt, O. (1995). Extracting Objects from Digital Terrain Models. In: Schenk, T. (Ed.), Remote Sensing and Reconstruction for Three-Dimensional Objects and Scenes, SPIE Vol. 2572, pp. 43-51.

Eidenbenz, C., Kaeser, C., Baltsavias, E.P. (2000). ATOMI - Automated Reconstruction of Topographic Objects from Aerial Images using Vectorized Map Information. International Archives of Photogrammetry and Remote Sensing, Vol. 33, Part B3/1, pp. 462-471.

Faugeras, O. (1993). Three-dimensional Computer Vision - A Geometric Viewpoint. Artificial Intelligence Series. The MIT Press, Cambridge, MA.

Faugeras, O.D., Hebert, M. (1986). The Representation, Recognition, and Locating of 3-D Objects. International Journal of Robotics Research, 5 (3):27-52.

Fischler, M.A., Tenenbaum, J.M., Wolf, H.C. (1981). Detection of Roads and Linear Structures in Low-resolution Aerial Imagery using a Multi-source Knowledge Integration Technique. Computer Graphics and Image Processing, 15:201-223.

Fischler, M.A., Bolles, R.C. (1986). Perceptual Organization and Curve Partitioning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8 (1):100-105.

Fischer, A., Kolbe, T.H., Lang, F. (1997). Integration of 2D and 3D Reasoning for Building Reconstruction using a Generic Hierarchical Model. In: Förstner, W., Pluemer, L. (Eds.), Semantic Modeling for the Acquisition of Topographic Information from Images and Maps, Birkhäuser Verlag, Basel, pp. 159-180.

Fischler, M.A., Heller, A.J. (1998). Automated Techniques for Road Network Modeling. Proceedings of DARPA Image Understanding Workshop, pp. 501-516.

Fischler, M.A., Bolles, R.C. (1998). Evaluation of a Road-Centerline Data Model. Proceedings of DARPA Image Understanding Workshop, pp. 493-498.

Fiset, R., Cavayas, F., Mouchot, M., Solaiman, B., Desjardins, R. (1998). Map-image Matching using a Multi-layer Perceptron: the Case of the Road Network. ISPRS Journal of Photogrammetry and Remote Sensing, 53 (2):76-84.

Förstner, W. (1996). 10 Pros and Cons Against Performance Characterization of Vision Algorithms. In: European Conference on Computer Vision, Workshop on Performance Characteristics of Vision Algorithms, pp. 13-29.

Fortier, M.F.A., Ziou, D., Armenakis, C., Wang, S. (2001). Automated Correction and Updating of Road Database from High Resolution Imagery. Canadian Journal of Remote Sensing, 27 (1):76-89.

Fradkin, M., Ethrog, U. (1994). Feature Matching for Automatic Generation of Distortion- less Digital Orthophoto. Integrating Photogrammetric Techniques with Scene Analy- sis and Machine Vision III, SPIE Vol. 3072, pp. 153-164. 156 BIBLIOGRAPHY

Fua, P., Leclerc, Y.G. (1990). Model Driven Edge Detection. Machine Vision and Applications, 3 (1):45-56.

Gerig, G., Koller, T., Szekely, G., Brechbuehler, C., Kuebler, O. (1993). Symbolic Description of 3-D Structures Applied to Cerebral Vessel Tree Obtained from MR Angiography Volume Data. In: Barrett, H.H., Gmitro, A.F. (Eds.), Information Processing in Medical Imaging, Vol. 687 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, pp. 94-111.

Ghahraman, D.E., Wong, A.K.C., Au, T. (1980). Graph Optimal Monomorphism Algorithms. IEEE Transactions on Systems, Man and Cybernetics, 10 (4):181-188.

Goshtasby, A., Page, C.V. (1984). Image Matching by a Probabilistic Labeling Process. Proceedings of International Conference on Pattern Recognition, pp. 307-309.

Gong, P., Wang, J. (1996). Road Network Extraction from Airborne Digital Camera Data. Proceedings of the Second International Airborne Remote Sensing Conference and Exhibition: Technology, Measurement & Analysis, Vol. III, pp. 387-395.

Grimson, W.E.L., Pavlidis, T. (1985). Discontinuity Detection for Visual Surface Reconstruction. Computer Vision, Graphics and Image Processing, 30:316-330.

Grimson, W.E.L., Lozano-Perez, T. (1987). Localizing Overlapping Parts by Searching the Interpretation Tree. IEEE Transactions on Pattern Analysis and Machine Intelligence, 9 (4):469-482.

Grimson, W.E.L. (1989). On the Recognition of Curved Objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (6):632-642.

Grün, A., Baltsavias, E.P. (1985). Adaptive Least Squares Correlation with Geometrical Constraints. Proceedings of Computer Vision for Robots, SPIE Vol. 595, pp. 72-82.

Grün, A., Li, H. (1995). Road Extraction from Aerial and Satellite Images by Dynamic Programming. ISPRS Journal of Photogrammetry and Remote Sensing, 50 (4):11-20.

Grün, A., Li, H. (1997a). Linear Feature Extraction with 3-D LSB-Snakes. In: Grün, A., Baltsavias, E.P., Henricsson, O. (Eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images (II), Birkhäuser Verlag, Basel, pp. 287-298.

Grün, A., Li, H. (1997b). Semi-Automatic Linear Feature Extraction by Dynamic Programming and LSB-Snakes. Photogrammetric Engineering and Remote Sensing, 63 (8):985-995.

Grün, A., Wang, X. (1998). CC-Modeler: A Topology Generator for 3-D City Models. ISPRS Journal of Photogrammetry and Remote Sensing, 53 (5):286-295.

Grün, A., Baer, S., Buehrer, T. (2000). DTMs derived automatically from DIPs. Geoinformatics, 5:36-39.

Haala, N., Hahn, M. (1995). Data Fusion for the Detection and Reconstruction of Buildings. In: Grün, A., Kuebler, O., Agouris, P. (Eds.), Automatic Extraction of Man-Made Objects from Aerial and Space Images, Birkhäuser Verlag, Basel, pp. 211-220.

Haala, N., Brenner, C. (1999). Extraction of Buildings and Trees in Urban Environments. ISPRS Journal of Photogrammetry and Remote Sensing, 54 (2/3):130-137.

Hancock, E.R., Kittler, J. (1990). Edge Labeling using Dictionary-based Relaxation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12 (2):165-181.

Hannah, M.J. (1988). Digital Stereo Image Matching Techniques. International Archives of Photogrammetry and Remote Sensing, Vol. 27, Part B3, pp. 280-293.

Haralick, R.M., Shapiro, L.G. (1979). The Consistent Labeling Problem: Part II. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1 (2):173-184.

Haralick, R.M., Shapiro, L.G. (1992). Computer and Robot Vision, Volume 1. Addison- Wesley, Reading, MA.

Hartley, R.I. (1995). A Linear Method for Reconstruction from Lines and Points. Proceedings of International Conference on Computer Vision, pp. 882-887.

Harvey, W.A. Jr. (1999). Performance Evaluation for Road Extraction. Proceedings of ISPRS Workshop on 3D Geospatial Data Production: Meeting Application Requirements, pp. 175-185.

Heipke, C., Englisch, A., Speer, T., Stier, S., Kutka, R. (1994). Semi-Automatic Extraction of Roads from Aerial Images. International Archives of Photogrammetry and Remote Sensing, Vol. 30, Part 4, pp. 353-360.

Heipke, C., Steger, C., Multhammer, R. (1995). A Hierarchical Approach to Automatic Road Extraction From Aerial Imagery. In: McKeown, D.M., Dowman, I. (Eds.), Integrating Photogrammetric Techniques with Scene Analysis and Machine Vision II, SPIE Vol. 2486, pp. 222-231.

Heipke, C. (1996). Overview of Image Matching Techniques. OEEPE Workshop on the Application of Digital Photogrammetric Workstations. OEEPE Official Publications, No. 33, pp. 173-189. http://phot.epfl.ch/workshop/wks96/art_3_1.html. Accessed on 23 November, 2002.

Heipke, C., Mayer, H., Wiedemann, C., Jamet, O. (1998). External Evaluation of Automatically Extracted Road Axes. Photogrammetrie, Fernerkundung, Geoinformation, 2:81-94.

Heipke, C., Pakzad, K., Straub, B.M. (2000). Image Analysis for GIS Data Acquisition. Photogrammetric Record, 16 (9):963-985.

Henderson, G. (1984). A Note on Discrete Relaxation. Computer Vision, Graphics and Image Processing, 28:384-388.

Henricsson, O. (1996). Analysis of Image Structures using Color Attributes and Similarity Relations. Ph. D. Dissertation, Report No. 59, Institute of Geodesy and Photogrammetry, ETH Zurich, Switzerland.

Henricsson, O., Baltsavias, E.P. (1997). An Evaluation of 3-D Building Reconstruction with ARUBA. In: Grün, A., Baltsavias, E.P., Henricsson, O. (Eds.), Automatic Extraction of Man-Made Objects from Aerial and Space Images (II), Birkhäuser Verlag, Basel, pp. 65-76.

Hinz, S., Baumgartner, A., Mayer, H., Wiedemann, C., Ebner, H. (2001). Road Extraction Focussing on Urban Areas. In: Baltsavias, E.P., Grün, A., Gool, L.V. (Eds.), Automatic Extraction of Man-Made Objects from Aerial and Space Images (III), A. A. Balkema Publishers, Lisse, pp. 255-266.

Hinz, S., Wiedemann, C., Ebner, H. (2002). Self-diagnosis within Automatic Road Network Extraction. International Archives of Photogrammetry and Remote Sensing, Vol. 34, Part 2, pp. 185-191.

Horaud, R., Skordas, T. (1989). Stereo Correspondence Through Feature Grouping and Maximal Cliques. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (11):1168-1180.

Hsiao, J.Y., Sawchuk, A.A. (1989). Supervised Textured Image Segmentation using Feature Smoothing and Probabilistic Relaxation Techniques. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11 (12):1279-1292.

Hu, S., Zhang, Z., Zhang, J. (2000). An Approach of Semiautomatic Road Extraction from Aerial Images Based on Template Matching and Neural Network. International Archives of Photogrammetry and Remote Sensing, Vol. 33, Part B3/2, pp. 994-999.

Huffman, D.A. (1971). Impossible Objects as Nonsense Sentences. In: Meltzer, B., Michie, D. (Eds.), Machine Intelligence, Vol. 6, Edinburgh University Press, Edinburgh, pp. 295-323.

Huising, E.J., Pereira, L.M.G. (1998). Errors and Accuracy Estimates of Laser Data Acquired by Various Laser Scanning Systems for Topographic Application. ISPRS Journal of Photogrammetry and Remote Sensing, 53 (3):245-261.

Ibison, M.C., Zapalowski, L. (1986). On the Use of Relaxation Labeling in the Correspondence Problem. Pattern Recognition Letters, 4:103-110.

Jones, G. (1997). Constraints, Optimization, and Hierarchy: Reviewing Stereoscopic Correspondence of Complex Features. Computer Vision and Image Understanding, 65 (1):57-78.

Kaichang, D., Zongjian, L., Jian, L. (1990). A Thematic Reading System. International Archives of Photogrammetry and Remote Sensing, Vol. 26, Part B4, pp. 237-243.

Kittler, J., Illingworth, J. (1985). Relaxation Labeling Algorithms - A Review. Image and Vision Computing, 3 (4):206-216.

Klang, D. (1998). Automatic Detection of Changes in Road Database Using Satellite Imagery. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 4, pp. 293-298.

Koller, T.M., Gerig, G., Szekely, G., Dettwiler, D. (1994). Multi-scale Detection of Curvilinear Structures in 2-D and 3-D Image Data. Technical Report BIWI-TR-153, Communication Technology Laboratory, ETH Zurich, Switzerland.

Krishnakumar, N., Sitharama, S., Holyer, R., Lybanon, M. (1990). Feature Labeling in Infrared Oceanographic Images. Image and Vision Computing, 8 (2):142-147.

Krzystek, P. (1991). Fully Automatic Measurement of Digital Terrain Models. Proceedings of 43rd Photogrammetric Week, pp. 203-214.

Kumar, R., Hanson, A.R. (1994). Robust Methods for Pose Determination. Proceedings of NSF/ARPA Workshop on Performance versus Methodology in Computer Vision, pp. 41-57.

Lanser, S., Zierl, C., Munkelt, O., Radig, B. (1997). MORAL - A Vision-based Object Recognition System for Autonomous Mobile Systems. Proceedings of Computer Analysis of Images and Patterns ’97, Vol. 1296 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, pp. 33-41.

Li, H. (1997). Semi-automatic Road Extraction from Satellite and Aerial Images. Ph. D. Dissertation, Report No. 61, Institute of Geodesy and Photogrammetry, ETH Zurich, Switzerland.

Li, M.X. (1989). Hierarchical Multi-point Matching with Simultaneous Detection and Location of Breaklines. Ph. D. Dissertation, Department of Photogrammetry, Royal Institute of Technology, Stockholm, Sweden.

Lin, C., Huertas, A., Nevatia, R. (1995). Detection of Buildings from Monocular Images. In: Grün, A., Kuebler, O., Agouris, P. (Eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images, Birkhäuser Verlag, Basel, pp. 125-134.

Liu, Y., Huang, T.S., Faugeras, O.D. (1990). Determination of Camera Location from 2D and 3D Line and Point Correspondences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12 (1):28-37.

Lowe, D.G. (1987). Three-dimensional Object Recognition from Single Two-dimensional Images. Artificial Intelligence, 31 (3):355-395.

Lowe, D.G. (1992). Robust Model-based Motion Tracking Through the Integration of Search and Estimation. International Journal of Computer Vision, 8 (2):113-122.

Mackworth, A.K. (1977). Consistency in Networks of Relations. Artificial Intelligence, 8 (1):99-118.

Mason, S., Baltsavias, E.P. (1997). Image-Based Reconstruction of Informal Settlements. In: Grün, A., Baltsavias, E.P., Henricsson, O. (Eds.), Automatic Extraction of Man-Made Objects from Aerial and Space Images (II), Birkhäuser Verlag, Basel, pp. 97-108.

Maas, H.-G. (1999). Closed Solution for the Determination of Parametric Building Models from Invariant Moments of Airborne Laserscanner Data. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3-2W5, pp. 193-199.

Marapane, S.B., Trivedi, M.M. (1994). Multi-Primitive Hierarchical (MPH) Stereo Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16 (3):227-240.

Matsuyama, T., Hwang, V.S. (1990). SIGMA: A Knowledge-Based Aerial Image Understanding System. Plenum Press, New York.

Mayer, H., Laptev, I., Baumgartner, A., Steger, C. (1997). Automatic Road Extraction Based on Multi-scale Modelling, Context and Snakes. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3-2W3, pp. 106-113.

McIntosh, J.H., Mutch, K.M. (1988). Matching Straight Lines. Computer Vision, Graphics and Image Processing, 43 (3):386-408.

McKeown, D.M., Denlinger, J.L. (1988). Cooperative Methods for Road Tracking in Aerial Imagery. Proceedings of IEEE Computer Vision and Pattern Recognition Conference, pp. 662-672.

McKeown, D.M., Bulwinkle, T., Cochran, S., Harvey, W., McGlone, C., Shufelt, J. (2000). Performance Evaluation for Automatic Feature Extraction. International Archives of Photogrammetry and Remote Sensing, Vol. 33, Part B2, pp. 379-394.

Medioni, G., Nevatia, R. (1984). Matching Images Using Linear Features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6 (8):675-685.

Medioni, G., Nevatia, R. (1985). Segment-based Stereo Matching. Computer Vision, Graphics and Image Processing, 31 (1):2-18.

Mitiche, A., Habelrih, G. (1989). Integration of Straight Line Correspondences Using Angular Relations. Pattern Recognition, 22 (3):299-308.

Montanari, U. (1971). On the Optimal Detection of Curves in Noisy Pictures. Communications of the ACM, 14 (5):335-345.

Nagao, M., Matsuyama, T. (1980). A Structural Analysis of Complex Aerial Photographs. In: Nadler, M. (Ed.), Advanced Application in Pattern Recognition, Plenum Press, New York, Vol. 1, pp. 1-199.

Nasrabadi, N.M. (1992). A Stereo Vision Technique Using Curve-segments and Relaxation Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14 (5):566-572.

Nelson, R.C. (1994). Finding Line Segments by Stick Growing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16 (5):519-523.

Neuenschwander, W., Fua, P., Szekely, G., Kubler, O. (1995). From Ziplock Snakes to Velcro Surfaces. In: Grün, A., Kuebler, O., Agouris, P. (Eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images, Birkhäuser Verlag, Basel, pp. 105-114.

Nevatia, R., Babu, K. (1980). Linear Feature Extraction and Description. Computer Graphics and Image Processing, 13 (3):257-269.

Nevatia, R., Huertas, A., Kim, Z. (1999). The MURI Project for Rapid Feature Extraction in Urban Areas. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3-2W5, pp. 3-14.

Niederöst, M. (2000). Reliable Reconstruction of Buildings for Digital Map Revision. International Archives of Photogrammetry and Remote Sensing, Vol. 33, Part B3, pp. 635-642.

Noble, J.A. (1992). Finding Half Boundaries and Junctions in Images. Image and Vision Computing, 10 (4):219-232.

Nudel, B. (1983). Consistent-labeling Problems and Their Algorithms: Expected Complexities and Theory Based Heuristics. Artificial Intelligence, 21 (1/2):135-178.

Quam, L.H. (1978). Road Tracking and Anomaly Detection in Aerial Imagery. Proceed- ings of DARPA Image Understanding Workshop, pp. 51-55.

Pavlidis, T., Horowitz, S.L. (1974). Segmentation of Plane Curves. IEEE Transactions on Computers, 23 (8):860-870.

Petzold, B., Reiss, P., Stoessel, W. (1999). Laser Scanning - Survey and Mapping Agencies Are Using A New Technique for the Derivation of Digital Terrain Models. ISPRS Journal of Photogrammetry and Remote Sensing, 54 (2/3):95-104.

Plietker, B. (1994). Semi-automatic Revision of Street Objects in ATKIS Database DLM 25/1. International Archives of Photogrammetry and Remote Sensing, Vol. 30, Part 4, pp. 311-317.

Price, K.E. (1984). Matching Closed Contours. Proceedings of the 7th International Conference on Pattern Recognition, pp. 990-992.

Price, K.E. (1999). Road Grid Extraction and Verification. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3-2W5, pp. 101-106.

Ramer, U. (1972). An Iterative Procedure for the Polygonal Approximation of Plane Curves. Computer Graphics and Image Processing, 1:244-256.

Ramond, K.K.Y., Ho, W.P. (1998). A Multi-level Dynamic Programming Method for Stereo Line Matching. Pattern Recognition Letters, 19 (9):839-855.

Richards, J.A., Jia, X. (1999). Remote Sensing Digital Image Analysis. Springer-Verlag, Berlin.

Rosenfeld, A., Hummel, R., Zucker, S. (1976). Scene Labelling by Relaxation Operations. IEEE Transactions on Systems, Man and Cybernetics, 6 (6):420-433.

Rousseeuw, P.J., Leroy, A.M. (1987). Robust Regression and Outlier Detection. John Wiley and Sons, New York.

Ruskone, R. (1996). Road Network Automatic Extraction by Local Context Interpretation: Application to the Production of Cartographic Data. Ph. D. Thesis, Marne-La-Vallee University, France.

Sang, H.P., Kyoung, M.L., Sang, U.L. (2000). A Line Feature Matching Technique Based on an Eigenvector Approach. Computer Vision and Image Understanding, 77 (3):263-283.

Sato, Y., Nakajima, S., Atsumi, H., Koller, T., Gerig, G., Yoshida, S., Kikinis, R. (1997). 3D Multi-scale Line Filter for Segmentation and Visualization of Curvilinear Structures in Medical Images. In: Troccaz, J., Grimson, E., Moesges, R. (Eds.), Computer Vision, Virtual Reality and Robotics in Medicine and Medical Robotics and Computer-Assisted Surgery, Vol. 1205 of Lecture Notes in Computer Science, Springer-Verlag, Berlin, pp. 213-222.

Shao, J., Mohr, R., Fraser, C. (2000). Multi-image Matching Using Segment Features. International Archives of Photogrammetry and Remote Sensing, Vol. 33, Part B3, pp. 837-844.

Schiewe, J. (2000). Improving the Integration of Digital Surface Models. International Archives of Photogrammetry and Remote Sensing, Vol. 33, Part B3, pp. 807-814.

Schwermann, R. (1994). Automatic Image Orientation and Object Reconstruction Using Straight Lines in Close-range Photogrammetry. International Archives of Photogrammetry and Remote Sensing, Vol. 30, Part 5, pp. 349-356.

Seiz, G., Baltsavias, E.P., Grün, A. (2002). Cloud Mapping From the Ground: Use of Photogrammetric Methods. Photogrammetric Engineering and Remote Sensing, 68 (9):941-951.

Shapiro, L.G., Haralick, R.M. (1981). Structural Description and Inexact Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3 (9):504-519.

Sha’ashua, A. (1994). Trilinearity in Visual Recognition by Alignment. Proceedings of European Conference of Computer Vision, Vol. 1, pp. 479-484.

Sha’ashua, A., Ullman, S. (1988). Structural Saliency: The Detection of Globally Salient Structures Using a Locally Connected Network. Proceedings of 2nd International Conference on Computer Vision, pp. 321-327.

Sherman, D., Peleg, S. (1990). Stereo by Incremental Matching of Contours. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12 (11):1102-1106.

Smith, R.W. (1987). Computer Processing for Line Images: A Survey. Pattern Recogni- tion, 20 (1):7-15.

Spetsakis, M.E., Aloimonos, J. (1990). Structure from Motion Using Line Correspondences. International Journal of Computer Vision, 4 (3):171-183.

Steger, C. (1996). An Unbiased Detector of Curvilinear Structures. Technical Report FGBV-96-03, FGBV, Informatik IX, Technical University Munich.

Stein, F., Medioni, G. (1992). Structural Indexing: Efficient 3-D Object Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14 (2):125-145.

Stein, F., Medioni, G. (1992). Structural Indexing: Efficient 2-D Object Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14 (12):1198-1204.

Stilla, U., Hajdu, A. (1994). Map-aided Structural Analysis of Aerial Images. International Archives of Photogrammetry and Remote Sensing, Vol. 30, Part 3/2, pp. 475-482.

Taylor, C.J., Kriegman, D.J. (1995). Structure and Motion from Line Segments in Multiple Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17 (11):1021-1032.

Ton, J., Jain, A.K., Enslin, W.R., Hudson, W.D. (1989). Automatic Road Identification and Labeling in Landsat 4 Images. Photogrammetria, 43:257-276.

Trinder, J.C., Wang, Y., Sowmya, A., Palhang, M. (1997). Artificial Intelligence in 3-D Feature Extraction. In: Grün, A., Baltsavias, E.P., Henricsson, O. (Eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images (II), Birkhäuser Verlag, Basel, pp. 257-266.

Trinder, J.C., Maulik, U., Bandyopadhyay, S. (2000). Semi-automated Feature Extraction Using Simulated Annealing. International Archives of Photogrammetry and Remote Sensing, Vol. 33, Part B3/2, pp. 905-908.

Udupa, J.K., Ajjanagadde, V.G. (1990). Boundary and Object Labeling in 3D Images. Computer Vision, Graphics, and Image Processing, 51 (3):355-369.

Voegtle, T., Steinle, E. (2000). 3-D Modelling of Buildings Using Laser Scanning and Spectral Information. International Archives of Photogrammetry and Remote Sensing, Vol. 33, Part B3, pp. 927-934.

Vosselman, G. (2000). Slope Based Filtering of Laser Altimetry Data. International Archives of Photogrammetry and Remote Sensing, Vol. 33, Part B3, pp. 935-942.

Voorhees, H., Poggio, T. (1987). Detecting Blobs as Textons in Natural Images. Proceedings of DARPA Image Understanding Workshop, pp. 892-899.

Vosselman, G., Knecht, J.D. (1995). Road Tracing by Profile Matching and Kalman Filtering. In: Grün, A., Kuebler, O., Agouris, P. (Eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images, Birkhäuser Verlag, Basel, pp. 255-264.

Vosselman, G., Gunst, M.D. (1997). Updating Road Maps by Context Reasoning. In: Grün, A., Baltsavias, E.P., Henricsson, O. (Eds.), Automatic Extraction of Man-made Objects from Aerial and Space Images (II), Birkhäuser Verlag, Basel, pp. 267-276.

Vosselman, G. (1999). Building Reconstruction Using Planar Faces in Very High Density Height Data. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3-2W5, pp. 87-92.

Waltz, D.L. (1975). Understanding Line Drawings of Scenes with Shadows. In: Winston, P.H. (Ed.), The Psychology of Computer Vision, McGraw-Hill, New York, USA.

Wang, J., Paul, M.T., Philip, J.H. (1992). Road Network Detection from SPOT Imagery for Updating Geographical Information System in the Rural-urban Fringe. International Journal of Geographical Information Systems, 6 (2):141-157.

Wang, L., Pavlidis, T. (1993). Direct Gray-scale Extraction of Features for Character Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15 (10):1053-1067.

Wang, Y. (1994). Structural Matching for Non-Metric Images. Integrating Photogrammetric Techniques with Scene Analysis and Machine Vision III, SPIE Vol. 3072, pp. 143-152.

Wehr, A., Lohr, U. (1999). Airborne Laser Scanning - An Introduction and Overview. ISPRS Journal of Photogrammetry and Remote Sensing, 54 (2/3):68-82.

Weidner, U., Förstner, W. (1995). Towards Automatic Building Extraction from High Resolution Digital Elevation Models. ISPRS Journal of Photogrammetry and Remote Sensing, 50 (4):38-49.

Weiss, R., Boldt, M. (1986). Geometric Grouping Applied to Straight Lines. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 489-495.

Wiedemann, C. (1999). Completion of Automatically Extracted Road Networks Based on the Function of Roads. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3-2W5, pp. 209-215.

Wiedemann, C. (2002). Improvement of Road Crossing Extraction and External Evaluation of the Extraction Results. International Archives of Photogrammetry and Remote Sensing, Vol. 34, Part 3B, pp. 297-300.

Wiedemann, C., Heipke, C., Mayer, H., Hinz, S. (1998). Automatic Extraction and Evaluation of Road Network from MOMS-2P Imagery. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3/1, pp. 285-291.

Wiedemann, C., Hinz, S. (1999). Automatic Extraction and Evaluation of Road Networks from Satellite Imagery. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3-2W5, pp. 95-100.

Williams, M.L., Wilson, R.C., Hancock, E.R. (1997). Multiple Graph Matching with Bayesian Inference. Pattern Recognition Letters, 18 (11/13):1275-1281.

Willrich, F. (2002). Quality Control and Updating of Road Data by GIS-driven Road Extraction from Imagery. International Archives of Photogrammetry and Remote Sensing, Vol. 34, Part 4, pp. 761-767.

Zhang, Y. (1994). Integration of Segmentation and Stereo Matching. Ph. D. Dissertation, Technical University of Delft, The Netherlands.

Zafiropoulos, P., Schenk, T. (1998). Color-based Energy Modeling for Road Extraction. International Archives of Photogrammetry and Remote Sensing, Vol. 32, Part 3/1, pp. 408-417.

Zhang, C., Baltsavias, E.P. (2000). Knowledge-Based Image Analysis for 3-D Edge Extraction and Road Reconstruction. International Archives of Photogrammetry and Remote Sensing, Vol. 33, Part B3/2, pp. 1008-1015.

Zhang, Z. (1995). Estimating Motion and Structure from Correspondences of Line Segments between Two Perspective Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17 (12):1129-1139.

Zhao, H., Kumagai, J., Nakagawa, M., Shibasaki, R. (2002). Semi-automatic Road Extraction from High-resolution Satellite Image. International Archives of Photogrammetry and Remote Sensing, Vol. 34, Part 3B, pp. 406-411.

Zhu, M., Yeh, P. (1986). Automatic Road Network Detection on Aerial Photographs. Proceedings of CVPR’86 Computer Society Conference on Computer Vision and Pattern Recognition, pp. 34-40.

Zlotnick, A., Carnine, P.D. Jr. (1993). Finding Road Seeds in Aerial Images. Computer Vision, Graphics, and Image Processing, 57 (2):243-260.

Zucker, S., Hummel, R., Rosenfeld, A. (1977). An Application of Relaxation Labelling to Line and Curve Enhancement. IEEE Transactions on Computers, 26 (4):28-37.

ACKNOWLEDGEMENTS

First of all, I would like to thank my supervisor, Prof. Dr. Armin Grün, for letting me work on the project ATOMI, for providing me with an excellent environment, and for his direct support of large parts of my research. His serious and significant research had a great influence on my work.

Many thanks go to my direct supervisor, Dr. Emmanuel Baltsavias, for his kind supervision and many good ideas. Special thanks for all the fruitful discussions, invaluable suggestions and critical remarks. His serious and significant research and hard-working spirit have had a great influence on my work and my future life. I would also like to thank him for his efforts to ensure my financial support during my studies.

I am indebted to my co-referent, Prof. Dr. Christian Heipke of the University of Hanover, Germany, for evaluating this dissertation and for all the inspiration and critical guidance he provided in the limited time he could spare for me. I have been much influenced by his publications and work on road extraction.

I would like to thank my former advisor, Prof. Dr. Shunji Murai of the University of Tokyo, Japan, for his kind supervision, invaluable suggestions, fruitful discussions, constant encouragement and great support.

Many thanks are also due to Dr. He Changcui of FAO and Mrs. Zhang Peihong for their constant encouragement and great support.

The project presented in this dissertation was financially supported by the Swiss Federal Office of Topography, Bern. I also thank the National Geographic Institute of Belgium for letting me use their data in this dissertation.

I would like to thank all the members of the ATOMI project, in particular Mr. Christoph Kaeser, Dr. Stefan Voser and Mr. Liam O’Sullivan from L+T, for fruitful discussions on road extraction and valuable suggestions from a user’s point of view.
I would like to acknowledge all my colleagues at IGP who have helped me in countless ways, in particular Nicola D’Apuzzo, Wang Xinhua, Markus Niederöst, Jochen Willneff, Maria Pateraki, Zhang Li, Daniela Poli, Fabio Remondino, Gabriela Seiz, Martin Sauerbier and Karsten Lambers. All these years I have enjoyed their friendliness, their help, their good spirit and their jokes. Beat Rueedin, our computer administrator, provided constant and reliable system support and solutions. The helpfulness and friendliness of the institute secretaries, especially Mrs. Gertrud Rothenberger, are also greatly appreciated.

Last but not least, I wish to express my deepest gratitude to my parents, my wife Wu Zhenghong and my brother for their understanding and enduring patience during this very trying period. I could never have completed this dissertation without their support. In particular, I am greatly indebted to my wife for the time this study took away from her during these years.