
Scientific Research and Essays Vol. 6(13), pp. 2850-2857, 4 July, 2011 Available online at http://www.academicjournals.org/SRE DOI: 10.5897/SRE11.565 ISSN 1992-2248 ©2011 Academic Journals

Full Length Research Paper

Clustering analysis of the districts in Erzurum for traffic accidents between 2002 and 2007

Ahmet TORTUM 1*, Nuriye KABAKUS 1, M. Yasin CODUR 1, Ahmet ATALAY 2, and Necla ULUGTEKIN 3

1 Faculty of Engineering, Atatürk University, 25240, Erzurum, Turkey.
2 Vocational College of [...], Atatürk University, 25530, Narman, Erzurum, Turkey.
3 Faculty of Civil Engineering, Technical University, 34469, Istanbul, Turkey.

Accepted 8 June, 2011

In this study, clustering analysis was carried out using data on road traffic accidents (RTAs) that occurred in the districts of Erzurum, Turkey, between 2002 and 2007. The analysis covers eighteen districts of the province of Erzurum. Road surface situation, solstice (time of day), vehicle type and number of RTAs were used in the clustering analysis. The analysis was performed with both the traditional k-means and the fuzzy c-means techniques. With both techniques, the districts were divided into five clusters, and five risk levels were identified from the center values of the clusters. The risk levels of the districts were shown in thematic maps produced with geographical information systems (GIS). The thematic maps show the cluster membership of each district according to both the traditional k-means and the fuzzy c-means techniques. The results of the two techniques were compared. It was observed that the fuzzy c-means technique gives results at least as accurate and consistent as the k-means technique. It was also determined that GIS is advantageous for showing and understanding the results on thematic maps.

Key words: Clustering analysis, K-means, Fuzzy c-means, geographical information systems, road traffic accident.

INTRODUCTION

Road traffic accidents (RTAs) remain one of the most important problems in our country. Each year, far more than 5000 RTAs occur in Turkey, resulting in 5000 deaths and 160,000 injuries (Anonymous, 2006). These numbers reflect only the deaths and injuries recorded at the accident location. In many European countries, a follow-up period after the accident is used, and the deaths and injuries that occur within this period are also registered. Therefore, the actual number of deaths and injuries in our country is higher than the registered statistics.

In this study, clustering analysis was carried out on the RTAs that occurred in the province of Erzurum from 2002 to 2007. Clustering is defined as "a mathematical technique designed for revealing classification structures in the data collected in the real world phenomena" (Mirkin, 1996). In other words, the main purpose of clustering is to find out the classification structure of the data. The term "classification" was defined by Platts (1980) as "ordering or arranging objects into groups or sets on the basis of their similarities or relationships" (Budayan et al., 2009).

The clustering analysis was done in two different forms in this study: first with the traditional k-means method and second with the fuzzy c-means method. The variables used were the number of RTAs, the road surface situation at the accident location, the vehicle type involved in the accident and the solstice (time of day) at the time of the accident. GIS software was used to present the results of the clustering analysis as thematic maps of the districts. The province of Erzurum has nineteen districts, of which eighteen are examined in this study. Because the district of Pazaryolu is a new district, no data are available for it; Pazaryolu is shown in white in the thematic maps.

*Corresponding author. E-mail: [email protected]

The purpose of this study is both to group similar districts of Erzurum according to RTAs and to compare

Table 1. The database of this study.

District name   Number of RTAs   Score of road surface   Score of        Score of vehicle
                (NRTAs)          situation (SRSS)        solstice (SS)   type (SVT)
Erzurum         2087              3.12                    5.64            3.55
Aşkale           289              0.16                    0.50            0.25
Ilıca             61             -0.22                   -0.41           -0.15
[...]            192             -0.05                    0.04            0.05
Pasinler         160             -0.08                   -0.04           -0.14
Köprüköy          49             -0.25                   -0.45           -0.35
[...]             13             -0.35                   -0.50           -0.32
Hınıs             31             -0.30                   -0.45           -0.27
Karayazı          11             -0.32                   -0.53           -0.32
[...]             15             -0.32                   -0.52           -0.32
Karaçoban          4             -0.36                   -0.54           -0.32
Narman            25             -0.32                   -0.47           -0.38
[...]              7             -0.31                   -0.42           -0.32
İspir             46             -0.25                   -0.45           -0.27
[...]            170              0.76                    0.00            0.07
Pazaryolu          0              0.00                    0.00            0.00
[...]              9             -0.35                   -0.53           -0.32
Tortum            54             -0.24                   -0.44           -0.24
Şenkaya           34             -0.31                   -0.44           -0.22

Table 1. Cluster center values according to traditional k-means clustering.

            Traditional k-means: cluster number
Variable    1         2         3          4         5
NRTAs       289.00    14.38     2087.00    174.00    48.80
SRSS        0.16      -0.33     3.12       0.21      -0.25
SS          0.50      -0.50     5.64       0.00      -0.44
SVT         0.25      -0.32     3.55       -0.01     -0.25
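As a quick consistency check on the k-means center values, each center should equal the component-wise mean of the districts assigned to it. The sketch below recomputes three of the centers from rows of the database table. Which rows belong to which cluster is an assumption inferred from the cluster descriptions in the Results section, and the names `assumed_members`, `mean_vector` and `max_deviation` are hypothetical helpers, not part of the study.

```python
# Rows of the database table as (NRTAs, SRSS, SS, SVT). The grouping of rows
# into clusters is an assumption inferred from the Results text.
assumed_members = {
    "center_174.00": [
        (192, -0.05, 0.04, 0.05),
        (160, -0.08, -0.04, -0.14),
        (170, 0.76, 0.00, 0.07),
    ],
    "center_48.80": [
        (61, -0.22, -0.41, -0.15),
        (49, -0.25, -0.45, -0.35),
        (46, -0.25, -0.45, -0.27),
        (54, -0.24, -0.44, -0.24),
        (34, -0.31, -0.44, -0.22),
    ],
    "center_14.38": [
        (13, -0.35, -0.50, -0.32),
        (31, -0.30, -0.45, -0.27),
        (11, -0.32, -0.53, -0.32),
        (15, -0.32, -0.52, -0.32),
        (4, -0.36, -0.54, -0.32),
        (25, -0.32, -0.47, -0.38),
        (7, -0.31, -0.42, -0.32),
        (9, -0.35, -0.53, -0.32),
    ],
}

# Center values as reported in the k-means center table.
reported_centers = {
    "center_174.00": (174.00, 0.21, 0.00, -0.01),
    "center_48.80": (48.80, -0.25, -0.44, -0.25),
    "center_14.38": (14.38, -0.33, -0.50, -0.32),
}


def mean_vector(rows):
    """Component-wise arithmetic mean of equally long tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def max_deviation(name):
    """Largest absolute gap between a recomputed and a reported center."""
    computed = mean_vector(assumed_members[name])
    return max(abs(c - r) for c, r in zip(computed, reported_centers[name]))


for name in reported_centers:
    print(name, [round(v, 3) for v in mean_vector(assumed_members[name])])
```

Under this assumed grouping, every recomputed component agrees with the reported center to within two-decimal rounding, which supports reading the centers as plain cluster means of the tabulated values.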

the results of different clustering techniques, namely traditional cluster analysis and fuzzy c-means.

MATERIALS AND METHODS

The database used in this study was obtained from the RTA statistics for 2002 to 2007, which are compiled from police records. The database consists of the number of RTAs, the number of RTAs by road surface situation, the number of RTAs by solstice and the number of vehicles of each type involved in accidents. Aggregated spatial data were used, because the x and y coordinates of RTA locations are not recorded in the registrations in Turkey.

Road surface situations are dry, wet, muddy, snowy and icy. Solstice takes the values daytime, night and twilight. Vehicle types are bicycle, horse-drawn vehicle, motor bicycle, motorcycle, minibus, pickup, truck, tow truck, bus, tractor, land vehicle, special-purpose vehicle, work machine, ambulance, tanker and train.

Factor analysis was applied to the variables road surface situation, vehicle type and solstice. For each variable, factor scores were obtained by factor analysis, so that each variable is expressed by a single number. These factor scores were used in the clustering analysis. The database used in this study is shown in Table 1.

Factor analysis is a way to reduce a large number of variables into a smaller number of comprehensible dimensions (factors). Principal components analysis, which was used in the present study, makes it possible for a few dimensions to account for most of the information in a large data set, especially if there is substantial redundancy. A principal components solution has the property that each component is independent of all the others. There are no underlying assumptions; each principal component is an exact weighted sum of the original variables. The principal components model was chosen because it provides the most direct representation of the data.

In the clustering analysis, both the traditional k-means and the fuzzy c-means methods were used, and the two methods were compared on their results. In traditional k-means, an object belongs to exactly one cluster. In fuzzy c-means, an object may belong to several clusters: the method assigns each object a degree of membership in every cluster, and the membership degrees of an object sum to one (Isık and Camurcu, 2007).

In traditional methods, an object either is in a cluster or is not. In reality, an object is neither completely inside a cluster nor completely outside it. That is to say, the object has a degree of membership in the cluster, and this is what fuzzy algorithms capture: they identify the membership of each object. Fuzzy algorithms therefore carry more information than traditional algorithms (Kocyigit and Korurek, 2005; Aydin et al., 2006).

Traditional k-means method

The k-means method divides the units into clusters and estimates the parameters of the clusters. It proceeds in the following steps (Ozdamar, 2002; Tatlıdil, 1996).

1. Based on the information available, the values of the p variables of the first k observations are taken as the initial cluster means. The distances of all units to the cluster means are calculated.
2. Each of the remaining n-k observations is assigned to the nearest cluster mean, with distances usually calculated as Euclidean distances, and the cluster means are recalculated after each assignment.
3. Units are assigned to the k clusters so that the variance is minimum within clusters and maximum between clusters.
4. The process is repeated until the within-cluster covariance matrix is minimum and the convergence criterion is met, or until the change in variance becomes very small.

Fuzzy c-means method

Since the concept of fuzzy sets (Zadeh, 1965) was introduced, fuzzy clustering has been widely discussed, studied and applied in various areas, and different fuzzy clustering methods have been developed. One widely used algorithm, the fuzzy c-means algorithm, was first presented by Dunn (1974) and further developed by Bezdek (1981). Subsequent revisions came from Roubens (1982), Gath and Geva (1989), Gu and Dubuisson (1990), Xie and Beni (1991) and Höppner et al. (2000), but the most commonly used algorithm has remained Bezdek's fuzzy c-means (Budayan et al., 2009). In this method, the following procedure is used for cluster analysis:

Step 1: The data are a data series or pattern series X = {x1, x2, x3, ..., xn}. In general, c is identified (2 [...] 1 is the fuzziness index.

The fuzziness of the memberships is controlled by m, which takes values higher than 1. The closer m is to 1, the crisper the membership values. As m becomes progressively higher, the resulting memberships become fuzzier (Hammah and Curran, 1998). Pal and Bezdek (1995) advised that m should take values between 1.5 and 2.5, and that the number of clusters should be between 2 and [...]. More detailed information about the algorithm can be found in Bezdek (1981).

RESULTS AND DISCUSSION

In this study, the districts were grouped using the traditional k-means and fuzzy c-means methods. The aim of the study is to identify similar districts according to RTAs. The number of clusters was set to five in both traditional k-means and fuzzy c-means. The variables used in the analysis were the number of RTAs, road surface situation, vehicle type and solstice. For these variables, the clustering results of the two methods are very similar to each other; thematic maps are shown for both the traditional k-means clusters and the fuzzy c-means clusters.

The district of Pazaryolu is shown in white in the maps (Figures 1, 2, 3 and 4), because Pazaryolu has no data from 2002 to 2007. The districts were divided into five clusters using both the traditional k-means method and the fuzzy c-means method (Figures 1 and 3). These maps show the clusters with different patterns. The center values of the clusters obtained by traditional k-means and fuzzy c-means are given in Tables 1 and 3. The risk degrees of the districts were assigned from the center values of the clusters in both methods (Tables 2 and 4).

Figure 1. Clusters of districts according to the traditional k-means method.

According to the cluster center values, risk maps were produced (Figures 2 and 4). Risk maps demonstrated that [...] cluster is the district of Aşkale, the third cluster is the districts of Pasinler, Oltu and Horasan, the fourth cluster is the districts of Ilıca, İspir, Tortum, Köprüköy and Şenkaya, and the fifth cluster is the districts of Çat, Tekman, Hınıs, Karayazı, Narman, Uzundere and Olur. The only difference between the two clustering methods is that the district of Hınıs is in the fifth cluster according to traditional k-means but in the fourth cluster according to fuzzy c-means (Figures 1 and 3).

Figure 2. Risk map of districts according to k-means clustering.

The fuzzy c-means clustering results were found to be similar to the k-means clustering results; only the district of Hınıs was placed differently by fuzzy c-means than by k-means (Figures 1 and 3). The same holds for the risk maps (Figures 2 and 4).
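The different placement of Hınıs can be illustrated with a small worked example. The sketch below takes the Hınıs row of the database table and the tabulated cluster centers of both methods, finds the nearest k-means center, and evaluates the standard fuzzy c-means membership formula (memberships proportional to distance raised to the power -2/(m-1)). It assumes plain Euclidean distance on the variables exactly as tabulated and m = 2; the paper states neither, so this is an illustration rather than a reproduction of the authors' computation. The function names are hypothetical.

```python
import math

# Hınıs row of the database table: (NRTAs, SRSS, SS, SVT).
hinis = (31, -0.30, -0.45, -0.27)

# Cluster centers as tabulated for each method.
kmeans_centers = [
    (289.00, 0.16, 0.50, 0.25),
    (14.38, -0.33, -0.50, -0.32),
    (2087.00, 3.12, 5.64, 3.55),
    (174.00, 0.21, 0.00, -0.01),
    (48.80, -0.25, -0.44, -0.25),
]
fcm_centers = [
    (173.58, 0.23, -0.00, -0.00),
    (12.33, -0.33, -0.50, -0.32),
    (2087.00, 3.12, 5.64, 3.55),
    (288.87, 0.16, 0.50, 0.25),
    (48.90, -0.25, -0.44, -0.25),
]


def nearest_center(point, centers):
    """Index of the center with the smallest Euclidean distance (hard assignment)."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership degrees of one point; m = 2 is assumed."""
    dists = [math.dist(point, c) for c in centers]
    if min(dists) == 0.0:
        # A point lying exactly on a center belongs fully to that center.
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


hard = nearest_center(hinis, kmeans_centers)
u = fcm_memberships(hinis, fcm_centers)

print("nearest k-means center (NRTAs):", kmeans_centers[hard][0])
for center, degree in zip(fcm_centers, u):
    print("FCM center NRTAs", center[0], "membership", round(degree, 3))
```

Under these assumptions the nearest k-means center is the one with 14.38 RTAs, while the largest fuzzy membership goes to the center with 48.90 RTAs, with the second-largest membership only slightly smaller. Hınıs therefore sits almost midway between two clusters, which is the kind of borderline case that a hard assignment hides and a membership degree exposes.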

Figure 3. Clusters of districts according to the fuzzy c-means method.

According to both traditional k-means and fuzzy c-means, the risk maps coincide with the cluster maps. The risk maps comprise five risk levels: the first level indicates the districts with the lowest risk and the fifth level the districts with the highest risk (Figures 2 and 4). The risk levels are ordered from very low to very high and were identified from the center values of the clusters in both the traditional k-means and fuzzy c-means methods (Tables 2 and 4).

Figure 4. Risk map of districts according to fuzzy c-means clustering.

Conclusion

In this study, clustering analysis of the districts of Erzurum was carried out using data on the RTAs that occurred between 2002 and 2007. The clustering analysis was performed in two different forms, traditional k-means and fuzzy c-means. The aim of this paper is to identify similar districts with respect to RTAs. Five clusters were obtained with both traditional k-means and fuzzy c-means. The districts of Erzurum center and Aşkale were identified as having high risk for 2002 to 2007. Districts with high development, population and traffic density have high risk for RTAs.

The results obtained from the two clustering methods are similar. It was observed that fuzzy c-means is more reliable than traditional k-means in this study, because fuzzy c-means is less affected by initial values than traditional k-means (Atalay and Tortum, 2010).

Geographical information systems (GIS) are a very important and comprehensive management tool for traffic safety. GIS software was used to present the results of this study, and presenting them as thematic maps made them easy to understand. This study has demonstrated that it is necessary to implement further preventative measures in the districts identified as high risk in this research.
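The remark that fuzzy c-means is little affected by initial values can be probed with a small experiment. The sketch below is a minimal pure-Python version of the fuzzy c-means iteration in Bezdek's standard form (centers as means weighted by membership to the power m, memberships from inverse distances), run from several random initial membership matrices. The toy one-dimensional data, the choice m = 2 and the function name `fuzzy_c_means_1d` are assumptions made for illustration; this is not the authors' data or code.

```python
import random


def fuzzy_c_means_1d(xs, c, m=2.0, max_iter=200, tol=1e-9, seed=0):
    """Alternate the standard center and membership updates on 1-D data."""
    rng = random.Random(seed)
    # Random initial memberships, normalised so that each row sums to one.
    u = []
    for _ in xs:
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = [0.0] * c
    for _ in range(max_iter):
        # Center update: mean of the data weighted by membership ** m.
        for i in range(c):
            weights = [u[j][i] ** m for j in range(len(xs))]
            centers[i] = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        # Membership update: inverse distances with exponent 2 / (m - 1).
        new_u = []
        for x in xs:
            dists = [max(abs(x - v), 1e-12) for v in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Toy data with two well-separated groups (not taken from the paper).
toy = [0.0, 1.0, 2.0, 100.0, 101.0, 102.0]

# Run from five different random starts and compare the sorted centers.
runs = [sorted(fuzzy_c_means_1d(toy, 2, seed=s)[0]) for s in range(5)]
for centers in runs:
    print([round(v, 4) for v in centers])
```

On this toy data every start ends at essentially the same pair of centers, one near each group. That only illustrates the fuzzy c-means side; it does not by itself establish the comparison with k-means reported by Atalay and Tortum (2010).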

Table 2. Values by risk degree according to traditional k-means.

            Traditional k-means: risk degree
Variable    1         2         3         4         5
NRTAs       14.38     48.80     289.00    174.00    2087.00
SRSS        -0.33     -0.25     0.16      0.21      3.12
SS          -0.50     -0.44     0.50      0.00      5.64
SVT         -0.32     -0.25     0.25      -0.01     3.55

Table 3. Cluster center values according to fuzzy c-means clustering.

            Fuzzy c-means: cluster number
Variable    1         2         3          4         5
NRTAs       173.58    12.33     2087.00    288.87    48.90
SRSS        0.23      -0.33     3.12       0.16      -0.25
SS          -0.00     -0.50     5.64       0.50      -0.44
SVT         -0.00     -0.32     3.55       0.25      -0.25

Table 4. Values by risk degree according to fuzzy c-means clustering.

            Fuzzy c-means: risk degree
Variable    1         2         3         4         5
NRTAs       12.33     48.90     173.58    288.87    2087.00
SRSS        -0.33     -0.25     0.23      0.16      3.12
SS          -0.50     -0.44     -0.00     0.50      5.64
SVT         -0.32     -0.25     -0.00     0.25      3.55

ACKNOWLEDGEMENT

The work described in this paper was supported by a grant from the Scientific and Technological Research Council of Turkey (TUBITAK) under project number 108M046.

REFERENCES

Anonymous (2006). Traffic Accidents Statistics, Turkey Statistical Institution. Ankara, Turkey. (www.tuik.gov.tr).
Atalay A, Tortum A (2010). According to Traffic Accidents Between 1997-2006 Years Clustering Analysis of Provinces in Turkey. Pamukkale Univ. J. Eng. Sci., 16(3): 335-343.
Aydın AC, Tortum A, Yavuz M (2006). Prediction of concrete elastic modulus using adaptive neuro-fuzzy inference system. Civil Eng. Environ. Syst., 23: 295-309.
Bezdek JC (1981). Pattern recognition with fuzzy objective function algorithms. Norwell, MA, USA: Kluwer Academic Publishers, p. 272.
Budayan C, Dikmen I, Birgonul MT (2009). Comparing the performance of traditional cluster analysis, self-organizing maps and fuzzy C-means method for strategic grouping. Expert Syst. Appl., 36: 11772-11781.
Dunn JC (1974). Well separated clusters and optimal fuzzy partitions. J. Cybern., 4(3): 95-104.
Gath I, Geva AB (1989). Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell., 11(7): 773-780.
Gu T, Dubuisson B (1990). Similarity of classes and fuzzy clustering. Fuzzy Sets Syst., 34(2): 213-221.
Hammah RE, Curran JH (1998). Fuzzy cluster algorithm for the automatic identification of joint sets. Int. J. Rock Mech. Min. Sci., 35(7): 889-905.
Hoppner F, Klawonn F, Kruse R, Runkler T (2000). Fuzzy Cluster Analysis. John Wiley & Sons, Chichester, p. 300.
Isık M, Camurcu AY (2007). Practically performance fixing of K-means, K-medoids and fuzzy C-means algorithms. Istanbul Commerce University, Nat. Appl. Sci. J., 6(11): 31-45.
Kocyigit Y, Korurek M (2005). Classification of EMG signals by using wave transformation and fuzzy logic classifier. ITU J., 4(3): 25-31.
Ozdamar K (2002). Statistical Data Analyses with Package Programs. 4. Press, Kaan Bookstore. Eskişehir.
Pal NR, Bezdek JC (1995). On cluster validity for the fuzzy C-means model. IEEE Trans. Fuzzy Syst., 3(3): 370-379.
Roubens M (1982). Fuzzy clustering algorithms and their cluster validity. Eur. J. Oper. Res., 10(3): 294-301.
Sen Z (2004). Modeling principles with Fuzzy Logic in Engineering. Water Foundation Publications, Istanbul.
Tatlıdil H (1996). Practical Many Variables Statistical Analysis. Cem Offset. Ankara.
Xie XL, Beni G (1991). A validity measure for fuzzy clustering. IEEE Trans. Patt. Anal. Mach. Intell., 13(8): 841-847.
Zadeh LA (1965). Fuzzy sets. Inf. Control., 8(3): 338-353.