
Dr. Andriani Skopeliti holds a Dipl. Eng. and a Dr. Eng. in Cartography from the National Technical University of Athens (School of Rural and Surveying Engineering). She has been working at the Cartography Laboratory (NTUA) as a researcher and teaching assistant for 12 years. She has extensive research experience in cartography and GIS. She has co-authored many journal and conference papers and has participated in several European and national projects.

Professor Lysandros Tsoulos

Cartography Laboratory

National Technical University of Athens

Lysandros Tsoulos is Professor of Cartography at the Department of Surveying Engineering, National Technical University of Athens [NTUA]. He holds a degree in Surveying Engineering from the University of Thessalonica and a Ph.D. from the NTUA. In 1975 he joined the Hellenic Navy Hydrographic Service [HNHS], where he worked for 17 years (Directorate of Cartography and the HNHS Computing Center). In 1992 he was elected a member of the faculty of the School of Surveying Engineering at the NTUA. Currently he is director of the NTUA Geomatics Center. He has coordinated a number of national and EU research projects. His research interests include cartographic design, composition and generalization, expert systems in cartography and digital atlases.

Thalia Tzamakou, born in Athens in 1980, received her Dipl. Eng. from the School of Rural and Surveying Engineering (NTUA) in 2004. She is currently a student in the Geoinformatics postgraduate studies programme. She is a member of the Technical Chamber of Greece.

A COMPARATIVE STUDY OF THE INFLUENCE OF SIMPLIFICATION ALGORITHMS ON LINEAR FEATURES' SHAPE AND POSITION

Dr. Andriani Skopeliti, Prof. Lysandros Tsoulos & Thalia Tzamakou

Cartography Laboratory, Faculty of Rural and Surveying Engineering

National Technical University of Athens

Email: [email protected], [email protected], [email protected]

Abstract

This paper presents a comparative study of several simplification algorithms and their influence on two elements inherent to linear features: shape and horizontal position. Algorithms, which use different criteria for vertex elimination, are applied to lines with different shape complexity, using a range of tolerance values to produce generalization results for a number of map scales. The evaluation is performed through a quantitative assessment of the results, utilizing shape distortion measures and horizontal position displacement measures.

1. Problem Definition

The scope of this paper is to study the influence of line simplification algorithms on the shape and the horizontal position of the resulting lines based on three factors:

• The algorithm, as a mechanism that alters the geometry of a line by eliminating a number of vertices

• The tolerance value, as a factor that determines the severity of the algorithm's influence

• The character/shape of the cartographic line subject to the simplification process.

In accordance with the general research plan envisioned by the authors (Skopeliti 2001, Skopeliti & Tsoulos 2001), the experiment is conducted in the following framework:

a. Simplification algorithms: seven (7) widely used simplification algorithms (Euclidean Distance, Reumann - Witkam, Zhao - Saalfeld, Douglas - Peucker, Visvalingam - Whyatt, Wall - Danielson and Latecki - Lakamper), which use different geometric criteria and process the cartographic line in various ways, are compared with respect to their influence on cartographic lines.

b. Tolerance values: an incrementally changing range of tolerance values is used. This enables detailed monitoring of each algorithm's effect on the cartographic line.

c. Data: segments of the US coastline, with varying line characteristics and complexity at scale ~1:40 000, comprise the test data set.

d. Measures: measures introduced by the authors are used to describe quantitatively the shape of the simplified lines, along with adopted measures for horizontal position displacement.

The paper is organized into five sections: The first section elaborates on the simplification algorithms used in the study and the way they are applied. The measures used for the description of the shape and the displacement of the horizontal position of the simplified lines follow. The third section refers to the data set of lines of varying complexity used in the experiment. In the fourth section the results of the comparative study of the simplification algorithms in terms of shape change and horizontal position displacement are analysed, taking into account the line shape complexity. Finally, conclusions and suggestions for further research are drawn.

2. Line Simplification Algorithms

Seven (7) simplification algorithms are used in this study. Each algorithm utilizes a measure/tolerance associated with the geometry of the line and processes the line locally, unconstrained extended locally, or globally, through the elimination of a number of vertices. The algorithms used, along with their basic characteristics, appear in Table 1.

Algorithm | Measure | Processing
Euclidean Distance [ed] | Distance | Local
Reumann - Witkam (Reumann and Witkam, 1974) [rw] | Bandwidth | Unconstrained extended local
Zhao - Saalfeld (Zhao and Saalfeld, 1997) [zs] | Bandwidth | Unconstrained extended local
Douglas - Peucker (Douglas and Peucker, 1973) [dp] | Distance perpendicular to the baseline | Global
Visvalingam - Whyatt (Visvalingam and Whyatt, 1993) [vw] | Area | Global
Wall - Danielson (Wall and Danielson, 1984) [wd] | Area | Unconstrained extended local
Latecki - Lakamper (Latecki and Lakamper, 1999) [ll] | Angle & Distance | Global

Table 1. Simplification algorithms
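To make the "local" processing mode concrete, the following minimal Python sketch implements a distance-criterion algorithm in the spirit of the Euclidean Distance [ed] entry in Table 1. It is an assumed formulation for illustration, not the implementation used in the study; the exact tolerance semantics of the original may differ.

```python
import math

def euclidean_distance_simplify(line, tol):
    """Local criterion: walk along the line and drop every vertex that lies
    closer than tol to the last retained vertex. Endpoints are always kept."""
    kept = [line[0]]
    for p in line[1:-1]:
        if math.dist(p, kept[-1]) >= tol:
            kept.append(p)
    kept.append(line[-1])
    return kept
```

Because only the distance to the previously kept vertex is examined, the decision for each vertex is strictly local; no global property of the line influences it.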

The detailed description of each of these algorithms is beyond the scope of this paper and can be found in the original publications mentioned above or elsewhere (e.g. McMaster 1987; Shi & Cheung 2006).
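By contrast with the local criterion above, a "global" algorithm examines the whole line. As an illustration, here is a minimal recursive sketch of the classic Douglas - Peucker formulation (a common textbook version, not necessarily the implementation used in this study):

```python
import math

def perp_dist(p, a, b):
    """Perpendicular distance from point p to the baseline through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def douglas_peucker(line, tol):
    """Global processing: find the vertex farthest from the anchor-floater
    baseline; if its offset exceeds tol, keep it and recurse on both halves,
    otherwise drop all interior vertices."""
    if len(line) < 3:
        return list(line)
    dists = [perp_dist(p, line[0], line[-1]) for p in line[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__)
    if dists[i] <= tol:
        return [line[0], line[-1]]
    split = i + 1
    left = douglas_peucker(line[:split + 1], tol)
    right = douglas_peucker(line[split:], tol)
    return left[:-1] + right
```

The share of retained vertices, 100 * len(simplified) / len(original), is the quantity used below to put the seven algorithms on a common footing.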

Since the tolerance represents a different entity for each algorithm (e.g. distance vs. area) and is applied differently by each, the range of values utilized varies considerably. As a result, the comparison of the algorithms' influence cannot be based on tolerance values, and another measure is required to quantify this influence and allow comparison of the results. This measure indicates the percentage of the initial vertices making up the vector line that are retained after the simplification process; it will be called hereafter the vertex percentage. Utilizing the vertex percentage, the comparison of the influence of different algorithms on a line is feasible. Several researchers have used this measure to compare simplification algorithms (e.g. McMaster 1987, 1989; Shi and Cheung 2006). In this paper, the vertex percentage is constrained to the range from 100% to 35% for all the algorithms applied. This approach sets a common lower boundary on the extent of simplification applied to the lines and limits the algorithms' effect. However, for comparing the results of more than one line for each algorithm, the vertex percentage cannot be used, since it differs for each line. Thus, the tolerance values for each algorithm are transformed into a common range, e.g. 0.1-1.

3. Measures for comparison of line simplification results

3.1 Line shape measures

The authors previously developed a methodology for the parametric description of the shape of natural linear features, i.e. coastlines, streams (Skopeliti & Tsoulos, 2001). The fundamental concept in this approach was the use of measures calculated at different levels of resolution. Utilizing the complete set of measures/parameters found in the literature (Buttenfield 1991, Bernhardt 1992, McMaster 1986, Jasinski 1991, etc.) and an extended data set with line segments varying considerably in shape, a subset of measures describing linear features' shape was selected (Skopeliti et al., 2005).

Data at scale 1:40 000 were derived from the National Oceanic and Atmospheric Administration's Medium Resolution Digital Vector Shoreline (NOAA, 2003). The line segments selected had a minimum length and the required variety in shape, and they were characterized as: very smooth, smooth, sinuous and very sinuous. In order to support statistical processing, the data set was composed of 133 line segments. Principal Components Analysis (PCA) was performed on the measures computed for all the linear segments in the data set. Based on the "scree" test and the "latent root" criterion, three factors were retained, representing 91% of the variance of the tested measures. Taking into account the PCA results and cartographic judgment, the three (3) measures adopted for the parametric description of lines' shape are: the "average magnitude angularity", the "error variance" and the "average angularity". The selected measures exhibit low correlation between them and can therefore be used in a cluster analysis procedure for the classification of lines into groups of similar complexity. For hierarchical cluster analysis, the Ward method and the Euclidean distance are utilized, and the lines in the dataset are successfully classified into the above-mentioned four (4) categories/groups.
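The exact formulations of the angularity measures are given in the cited literature. As a rough sketch only, assuming "average angularity" means the mean signed turning angle at interior vertices and "average magnitude angularity" the mean absolute turning angle (both are assumed readings, not the authors' published definitions):

```python
import math

def turning_angles(line):
    """Signed turning angle, in degrees, at each interior vertex."""
    angs = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(line, line[1:], line[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        d = math.degrees(a2 - a1)
        angs.append((d + 180) % 360 - 180)  # wrap to [-180, 180)
    return angs

def average_angularity(line):
    """Mean signed turning angle (assumed reading of 'average angularity')."""
    a = turning_angles(line)
    return sum(a) / len(a)

def average_magnitude_angularity(line):
    """Mean absolute turning angle (assumed reading)."""
    a = turning_angles(line)
    return sum(map(abs, a)) / len(a)
```

On a smooth line both values stay small; on a sinuous line the magnitude average grows even when left and right turns cancel in the signed average, which is why the two measures carry different information.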

The assessment of the quality of linear features after generalization calls for the quantitative description of the resulting lines' shape. The shape change of a linear feature with respect to its original form can be assessed by describing the resulting shape with the above group of measures. In cluster analysis, the distance between two lines in the parameter space implies similarity; the distance between the original and the generalized line therefore implies shape change. Based on the above, two (2) measures are adopted for the quantitative assessment of shape change due to generalization: a. the shape of the generalized line, described by the above-mentioned group of measures, and b. the change in the generalized line's shape in comparison with its original shape (implemented as the distance between the original and the generalized line in the parameter space).

Figure 1. Experimental data classified in four (4) groups: smooth, smooth with strong arc, sinuous and very sinuous

3.2 Measures for horizontal position displacement

The research community has identified a number of measures for the assessment of the positional displacement between the original and the generalized line. In this study the average Euclidean distance from the original to the generalized line or from the generalized to the original line, the Hausdorff distance (Abbas et al., 1995) and the ratio of the area between the original and the generalized line to the length of the original line (McMaster, 1987) are used.
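The paper's displacement measures operate between lines; as a simplified, vertex-based stand-in (an approximation for illustration, not the exact line-to-line computation used in the study), the average and Hausdorff distances can be sketched as:

```python
import math

def avg_distance(src, dst):
    """Average, over src vertices, of the distance to the nearest dst vertex
    (a discrete stand-in for the average line-to-line distance)."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

def hausdorff(a, b):
    """Discrete Hausdorff distance between two vertex sets: the worst-case
    nearest-neighbour distance, taken in both directions."""
    d_ab = max(min(math.dist(p, q) for q in b) for p in a)
    d_ba = max(min(math.dist(p, q) for q in a) for p in b)
    return max(d_ab, d_ba)
```

Note that avg_distance is directional: avg_distance(original, generalized) generally differs from avg_distance(generalized, original), which is exactly the asymmetry commented on in Section 6.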

4. Data set used for the comparison of the line simplification algorithms

A subset of the data set (38 lines) from the previous analysis is used. Lines are classified in four (4) groups: smooth, smooth with strong arc, sinuous and very sinuous (Figure 1).

5. Comparative study of the line simplification algorithms

5.1 Simplification algorithms and their effect on the shape of the lines

In this section, the algorithms' influence on the shape of lines for a common range of vertex percentage is assessed based on: a. the average shape value of the original lines and b. the average shape value of the simplified lines for each tolerance value and algorithm.

Algorithm | Or. lines avg. shape value | Max tolerance value (m) | Min tolerance value (m) | Tolerance increment (m) | Vertex percentage range (%) | Max shape value | Min shape value | Avg. shape value | Avg. shape change
ed | 30.31 | 785 | 1 | 1 | 100-35 | 30.31 | 18.95 | 23.84 | 21%
dp | 30.31 | 86.75 | 0.25 | 0.25 | 100-35 | 30.31 | 26.60 | 28.23 | 7%
rw | 30.31 | 188.85 | 0.1 | 0.25 | 100-35 | 30.31 | 19.59 | 24.78 | 18%
zs | 30.31 | 85.6 | 0.1 | 0.5 | 100-35 | 29.12 | 21.34 | 25.87 | 15%
vw | 30.31 | 19981 | 1 | 20 | 100-35 | 30.31 | 24.87 | 26.96 | 11%
wd | 30.31 | 15141 | 1 | 20 | 100-35 | 30.31 | 13.99 | 19.58 | 35%
ll | 30.31 | 47801 | 1 | 100 | 100-35 | 30.31 | 10.20 | 15.20 | 50%

Table 2. Summary of qualitative results on the simplified lines' shape change and the range of tolerance values for each algorithm

Table 2 shows the range of tolerance values, the vertex percentage range, the characteristic shape values of the simplified lines and the average shape change for each algorithm. Ranked by their influence on the simplified lines' shape for a common range of vertex percentage, whether measured by the minimum shape value, the average shape value or the percentage of average shape change, the algorithms perform in the following order: Douglas - Peucker, Visvalingam - Whyatt, Zhao - Saalfeld, Reumann - Witkam, Euclidean Distance, Wall - Danielson, Latecki - Lakamper. Three columns of Table 2 list the minimum and maximum tolerance values applied for each algorithm and the relevant increment. The considerable difference in these values is expected, since they represent different geometric entities. The difference in the range of tolerance values used by the Zhao - Saalfeld and the Reumann - Witkam algorithms is also notable. Although the tolerance represents the same geometric measure in both algorithms, a bandwidth, it is applied in a different way, resulting in faster elimination of vertices by the former; smaller tolerance values are thus needed to achieve a given vertex percentage. The tolerance values for the Zhao - Saalfeld algorithm are similar to those of the Douglas - Peucker algorithm.
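The ranking just reported can be reproduced directly by sorting the average shape change column of Table 2:

```python
# Average shape change per algorithm, in percent, taken from Table 2
avg_shape_change = {"ed": 21, "dp": 7, "rw": 18, "zs": 15,
                    "vw": 11, "wd": 35, "ll": 50}

# Rank algorithms from least to most shape distortion
ranking = sorted(avg_shape_change, key=avg_shape_change.get)
# dp, vw, zs, rw, ed, wd, ll
```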

Figure 2. Average shape values and transformed tolerance values in the range of 0.1 – 1 for each algorithm (linear or logarithmic trend lines fitted per algorithm)

In Figure 2, the average shape value for each algorithm and tolerance value is plotted in a single graph for all the algorithms. This is achieved by transforming the tolerance values utilized by each algorithm into a common range (0.1 – 1), applying an algorithm-specific linear transformation to the original tolerance values. In this graph, one can observe that a straight trend line fits five of the algorithms, while a logarithmic trend line fits the remaining two. The algorithms following the linear model are: Douglas - Peucker, Visvalingam - Whyatt, Zhao - Saalfeld, Reumann - Witkam and Euclidean Distance. The linear model implies that the shape of the simplified lines changes at a constant rate. The algorithms following the logarithmic model are Wall - Danielson and Latecki - Lakamper. The logarithmic model implies that the shape of the simplified lines changes rapidly at first and then stabilizes. It must be pointed out that this classification of the algorithms, based on the trend and rate of shape change, is consistent with their previously presented classification (the one based on their influence on the lines' shape).

5.2 Simplification algorithms and their effect on the shape of groups of lines with different line complexity

In this section, the existence of a relationship between the algorithms' influence and the shape of the lines, grouped according to their shape, is examined. In Figure 3, the average shape and the maximum tolerance value are shown for each group and algorithm. The average shape values are clearly distinguishable and form a zone for each group across the various algorithms. Regarding the tolerance values used to maintain a vertex percentage range between 100% and 35%, there is one common pattern for the Euclidean Distance, Wall - Danielson and Latecki - Lakamper algorithms and another for the Douglas - Peucker, Visvalingam - Whyatt, Zhao - Saalfeld and Reumann - Witkam algorithms. In the first case, the maximum tolerance values decline as the complexity of the lines in the groups rises. For the Euclidean Distance algorithm, this is because the resolution of the digitized lines in the spatial database is related to the complexity of the lines; as a result, different tolerance values are needed to achieve the same vertex retention percentage for the smoother lines, which were digitized at a smaller resolution. The Wall - Danielson algorithm, which accumulates the area formed by consecutive triplets of vertices until a limit is reached, uses a measure whose values increase as the complexity of the line decreases. The same is true for the Latecki - Lakamper algorithm, which uses a measure based on three consecutive vertices, equal to the product of the lengths between the vertices and the turning angle: lengths and angles take small values on the more complex lines and higher values on the smoother ones.
In the second case, the algorithms share a common pattern: the highest tolerance values occur for Group 3, which consists of more complex lines but with details of less amplitude than Group 4; medium values for Group 2, which consists of smooth lines with an arc and thus with amplitude; and the smallest values for Group 1, which consists of smooth lines without an arc. This observation is logical for these four algorithms, which use the bandwidth or the distance perpendicular to the baseline as the criterion for vertex elimination.

A graph created for each algorithm, plotting the applied tolerance values against the average shape value for each group of lines, shows that the trend lines for the different groups do not differ from those of the individual lines. This demonstrates that the pattern of influence of these simplification algorithms, which simply eliminate vertices according to a geometric criterion, is common for each line they process within a group, independently of the lines' shape characteristics.

Figure 3. Average shape values (lines) and maximum tolerance values (bars, logarithmic scale) for each algorithm and each group of lines

6. Horizontal position and algorithms

The displacement of the position of the lines resulting from the simplification algorithms is quantified by the measures described in section 3.2. In Table 3, only the values of the average distance from the original to the generalized line and the average Hausdorff distance are reported. The average distance from the generalized to the original line is always smaller than the average distance from the original to the generalized line and takes values quite similar to the ratio of the area between the original and the generalized line to the length of the original line. For this reason, only the two measures that report the average and the maximum horizontal position change are commented on.

Measure | dp | vw | zs | rw | ed | wd | ll
Average distance from the original to the simplified line (m) | 11.4 | 15.9 | 27.0 | 27.5 | 29.7 | 57.9 | 60.2
Average Hausdorff distance (m) | 37.3 | 60.9 | 82.4 | 95.2 | 89.5 | 152.2 | 203.5

Table 3. Horizontal position change measures

Regarding the influence of the algorithms on the horizontal position for a common range of vertex percentage, it is obvious from Table 3 that both measures rank the algorithms in the following order: Douglas - Peucker, Visvalingam - Whyatt, Zhao - Saalfeld, Reumann - Witkam, Euclidean Distance, Wall - Danielson and Latecki - Lakamper. Moreover, when the position displacement values are plotted against the transformed tolerance values, the algorithms fall into two groups based on two factors: a. the magnitude of the horizontal position change, expressed by the average distance or the Hausdorff distance, and b. the model of the trend line fitted to these values. The first group consists of the Douglas - Peucker, Visvalingam - Whyatt, Zhao - Saalfeld, Reumann - Witkam and Euclidean Distance algorithms, whose horizontal position displacement values are smaller and display a linear trend. The second group consists of the Wall - Danielson and Latecki - Lakamper algorithms, whose horizontal position displacement values are greater and display a logarithmic trend.
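The two ingredients of this analysis, mapping each algorithm's tolerances onto a common range and fitting a linear or logarithmic trend, can be sketched as follows. This is a plain least-squares sketch under the assumption of a min-max linear mapping; the authors' exact transformation and fitting procedure are not specified in the paper.

```python
import math

def transform_tolerances(tols, lo=0.1, hi=1.0):
    """Map an algorithm-specific tolerance range linearly onto a common
    interval (assumed min-max mapping) so all algorithms share one axis."""
    tmin, tmax = min(tols), max(tols)
    return [lo + (t - tmin) * (hi - lo) / (tmax - tmin) for t in tols]

def fit(xs, ys, f=lambda x: x):
    """Least-squares fit of y = a + b*f(x); returns (a, b, r_squared).
    f = identity gives the linear model, f = math.log the logarithmic one."""
    fx = [f(x) for x in xs]
    n = len(xs)
    mx, my = sum(fx) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in fx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(fx, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(fx, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot
```

Comparing the r_squared of the two models per algorithm would reproduce the split reported above: roughly linear growth of displacement for dp, vw, zs, rw and ed, and logarithmic growth for wd and ll.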

7. Conclusions

This paper elaborates on the results of a case study comparing seven well-known simplification algorithms with regard to their influence on the shape and the horizontal position displacement of the cartographic line. Lines of various shapes and degrees of complexity are used; a range of tolerance values is applied with each algorithm in order to achieve a common vertex percentage range. Several conclusions are drawn regarding: a. the severity of the algorithms' effect on the shape and the horizontal position displacement of the lines, b. the trend of change for these values in relation to the tolerance values and c. the relation between the algorithms' behavior and the shape of the original line. In the future, the authors intend to explore these issues in more detail by including more sophisticated algorithms (e.g. Li and Openshaw 1992, Wang and Muller 1998). The results will be evaluated visually in order to establish a relationship between the quantitative simplification results and cartographic judgment. This will ultimately lead to the creation of a knowledge base for the "automatic" generalization of linear features.

8. References

Abbas, I, Grussenmeyer, P & Hottier, P 1995, 'Contrôle de la planimétrie d'une base de données vectorielles: une nouvelle méthode basée sur la distance de Hausdorff: la méthode du contrôle linéaire', Bulletin S.F.P.T. no. 137 (1995-1), pp. 6-11.

Bernhardt, M C 1992, 'Quantitative characterization of cartographic lines for generalization', MSc Thesis, Department of Geodetic Science and Surveying, Ohio State University, Columbus, Ohio.

Buttenfield, B 1985, ‘Treatment of cartographic line’, Cartographica 22(2): 1-26.

Douglas, D & Peucker, T 1973, ‘Algorithms for the reduction of the number of points required to represent a digitized line or its caricature’, The Canadian Cartographer, vol. 10, pp. 112–122.

Jasinski, M J, 1990, ‘The comparison of complexity measures for cartographic lines’, NCGIA Report no. 90-1, NCGIA, Santa Barbara, CA.

Latecki, L J & Lakamper, R 1999, ‘Polygon evolution by vertex deletion’ in Nielsen, M Johansen P, Olsen, O F & Weickert, J Editors, 1999, ‘Scale-Space Theories in Computer Vision, Proceedings of International Conference on Scale-Space'99’, vol. LNCS 1682.

Li, Z & Openshaw, S 1992, ‘Algorithms for automated line generalization based on a natural principle of objective generalization’, IJGIS, 6 (5), pp. 373-389.

McMaster, R B, 1986, ‘A statistical analysis of mathematical measures for linear simplification’, The American Cartographer, vol. 13(2), pp. 103-116.

McMaster, R B 1987, ‘Automated line generalization’, Cartographica 24(2), pp. 74-111.

National Oceanic & Atmospheric Administration (NOAA) 2003, 'Medium Resolution Digital Vector Shoreline', http://sposerver.nos.noaa.gov/projects/shoreline/shoreline.html.

Reumann, K. & Witkam, A P M 1974, ‘Optimizing curve segmentation in computer graphics’, In Proceedings of International Computing Symposium, North- Holland Publishing Company, Amsterdam, pp. 467–472.

Shi & Cheung 2006, ‘Performance Evaluation of Line Simplification Algorithms for Vector Generalization’, The Cartographic Journal, vol. 43 no. 1, pp. 27–44.

Skopeliti, A 2001, ‘Error Assessment in Cartographic Data Bases’, PhD Thesis, Cartography Laboratory, National Technical University of Athens, Greece.

Skopeliti, A & Tsoulos, L 2001, 'A Methodology for the Assessment of Generalization Accuracy', Fourth Workshop on Progress in Automated Map Generalization, August 2-4, 2001, Beijing, China.

Skopeliti, A, Tsoulos, L & Tzamakou, T 2005, 'The Shape of Linear Features – Structure Definition and Processing Over the Web', in Proceedings GIS PLANET 2005, ISBN 972-97367-5-8, Estoril, Portugal, 30 May – 2 June 2005.

Visvalingam, M & Whyatt, J D 1993, ‘Line generalization by repeated elimination of points’, The Cartographic Journal, 30 (1), pp. 46-51.

Wall, K & Danielsson, P E 1984, ‘A Fast Sequential Method for Polygonal Approximation of Digitized Curves’, Computer Vision, Graphics and Image Processing, vol. 28, pp. 220- 227.

Wang, Z & Muller, J C 1998, ‘Line generalization based on analysis of shape characteristics’, Cartography and GIS, vol. 25 (1).

Zhao, Z & Saalfeld, A 1997, ‘Linear-time sleeve-fitting polyline’, In Autocarto 13, ACSM/ASPRS’97 Technical Papers, Seattle, Washington, April 1997, vol. 5, pp. 214–223.
