Clustering Analysis of the Districts in Erzurum for Traffic Accidents Between 2002 and 2007
Scientific Research and Essays Vol. 6(13), pp. 2850-2857, 4 July, 2011
Available online at http://www.academicjournals.org/SRE
DOI: 10.5897/SRE11.565
ISSN 1992-2248 ©2011 Academic Journals

Full Length Research Paper

Ahmet TORTUM 1*, Nuriye KABAKUS 1, M. Yasin CODUR 1, Ahmet ATALAY 2, and Necla ULUGTEKIN 3

1Faculty of Engineering, Atatürk University, 25240, Erzurum, Turkey.
2Vocational College of Narman, Atatürk University, 25530, Narman, Erzurum, Turkey.
3Faculty of Civil Engineering, Istanbul Technical University, 34469, Istanbul, Turkey.

Accepted 8 June, 2011

In this study, clustering analysis was carried out using data on road traffic accidents (RTAs) that occurred in the districts of Erzurum, Turkey, between 2002 and 2007. Eighteen districts of the province of Erzurum were analyzed. The number of RTAs, road surface situation, solstice and vehicle type were used as clustering variables. The analysis was performed with both the traditional k-means and the fuzzy c-means techniques, and with both techniques the districts were divided into five clusters. Five risk levels were then identified from the cluster center values. The risk levels of the districts were displayed on thematic maps produced with geographical information systems (GIS). These maps show the cluster to which each district belongs under the traditional k-means and fuzzy c-means techniques. The results of the two techniques were compared. The fuzzy c-means technique was observed to give results at least as accurate and consistent as those of the k-means technique. GIS was also found to be advantageous for presenting and understanding the results on thematic maps.

Key words: Clustering analysis, K-means, Fuzzy c-means, geographical information systems, road traffic accident.
*Corresponding author.

INTRODUCTION

Road traffic accidents (RTAs) remain one of the most important problems in Turkey. Each year, more than 5000 RTAs occur in Turkey, resulting in 5000 deaths and 160,000 injuries (Anonymous, 2006). These figures reflect only the deaths and injuries recorded at the accident location. In many European countries, casualties are followed up for a period after the accident, and deaths and injuries occurring within that period are also registered. The actual numbers of deaths and injuries in Turkey are therefore higher than the registered statistics.

In this study, clustering analysis was carried out on the RTAs that occurred in the province of Erzurum between 2002 and 2007. Clustering is defined as "a mathematical technique designed for revealing classification structures in the data collected in the real world phenomena" (Mirkin, 1996). In other words, the main purpose of clustering is to find the classification structure of the data. Platts (1980) defined the term "classification" as "ordering or arranging objects into groups or sets on the basis of their similarities or relationships" (Budayan et al., 2009).

The clustering analysis was carried out in two ways in this study: first with the traditional k-means method and then with the fuzzy c-means method. The variables used were the number of RTAs, the road surface situation at the accident location, the vehicle type involved in the accident and the solstice at the time of the accident. GIS software was used to present the results of the clustering analysis as thematic maps of the districts. The province of Erzurum has nineteen districts, eighteen of which are examined in this study. Pazaryolu is a new district and no data are available for it, so it is shown in white on the thematic maps. The purpose of this study is both to group similar districts of Erzurum according to their RTAs and to compare the results of two clustering techniques, namely traditional k-means and fuzzy c-means.

MATERIALS AND METHODS

The database used in this study was obtained from the RTA statistics for 2002 to 2007, which were compiled from police records. It contains the total number of RTAs and the numbers of RTAs by road surface situation, by solstice and by vehicle type. Aggregated spatial data were used because the x and y coordinates of RTA locations are not recorded in accident registrations in Turkey.

The road surface situation is dry, wet, muddy, snowy or icy. Solstice, that is, the light condition at the time of the accident, is daytime, night or twilight. The vehicle types are bicycle, horse-drawn vehicle, motor bicycle, motorcycle, minibus, pickup, truck, tow truck, bus, tractor, land vehicle, special-purpose vehicle, work machine, ambulance, tanker and train. Factor analysis was applied to the road surface situation, vehicle type and solstice variables. A factor score was obtained for each of them so that each variable could be expressed as a single number, and these factor scores were used in the clustering analysis. The database used in this study is shown in Table 1.

Table 1. The database of this study (NRTAs: number of RTAs; SRSS: score of road surface situation; SS: score of solstice; SVT: score of vehicle type).

| District | NRTAs | SRSS | SS | SVT |
|---|---|---|---|---|
| Erzurum | 2087 | 3.12 | 5.64 | 3.55 |
| Aşkale | 289 | 0.16 | 0.50 | 0.25 |
| Ilıca | 61 | -0.22 | -0.41 | -0.15 |
| Horasan | 192 | -0.05 | 0.04 | 0.05 |
| Pasinler | 160 | -0.08 | -0.04 | -0.14 |
| Köprüköy | 49 | -0.25 | -0.45 | -0.35 |
| Çat | 13 | -0.35 | -0.50 | -0.32 |
| Hınıs | 31 | -0.30 | -0.45 | -0.27 |
| Karayazı | 11 | -0.32 | -0.53 | -0.32 |
| Tekman | 15 | -0.32 | -0.52 | -0.32 |
| Karaçoban | 4 | -0.36 | -0.54 | -0.32 |
| Narman | 25 | -0.32 | -0.47 | -0.38 |
| Olur | 7 | -0.31 | -0.42 | -0.32 |
| İspir | 46 | -0.25 | -0.45 | -0.27 |
| Oltu | 170 | 0.76 | 0.00 | 0.07 |
| Pazaryolu | 0 | 0.00 | 0.00 | 0.00 |
| Uzundere | 9 | -0.35 | -0.53 | -0.32 |
| Tortum | 54 | -0.24 | -0.44 | -0.24 |
| Şenkaya | 34 | -0.31 | -0.44 | -0.22 |

Table 2. Cluster center values obtained with traditional k-means clustering.

| Variable | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 |
|---|---|---|---|---|---|
| NRTAs | 289.00 | 14.38 | 2087.00 | 174.00 | 48.80 |
| SRSS | 0.16 | -0.33 | 3.12 | 0.21 | -0.25 |
| SS | 0.50 | -0.50 | 5.64 | 0.00 | -0.44 |
| SVT | 0.25 | -0.32 | 3.55 | -0.01 | -0.25 |

Factor analysis is a way of reducing a large number of variables to a smaller number of comprehensible dimensions (factors). Principal components analysis, which was used in the present study, allows a few dimensions to account for most of the information in a large data set, especially when there is substantial redundancy. In a principal components solution, each component is uncorrelated with all the others. There are no underlying assumptions; each principal component is an exact weighted sum of the original variables. The principal components model was chosen because it provides the most direct representation of the data.

In the clustering analysis, both the traditional k-means and the fuzzy c-means methods were used, and their results were compared. In traditional k-means, each object belongs to exactly one cluster. In fuzzy c-means, an object may belong to several clusters: the method determines the degree to which each object belongs to each cluster, and the membership degrees of an object sum to one (Isık and Camurcu, 2007).

In traditional methods, an object either belongs to a cluster or does not. In reality, an object is rarely entirely inside or entirely outside a cluster. It belongs to each cluster to some degree, and this is what fuzzy algorithms model. Because fuzzy algorithms determine the degree of membership of each object, they carry more information than traditional algorithms (Kocyigit and Korurek, 2005; Aydin et al., 2006).

The fuzziness of the memberships is controlled by the parameter m, which takes values greater than 1. The closer m is to 1, the crisper the membership values. As m increases, the resulting memberships become fuzzier (Hammah and Curran, 1998).
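The fuzzy c-means update equations are not reproduced in this section. The following Python sketch illustrates how the parameter m and the "memberships sum to one" constraint interact in practice. It assumes the standard fuzzy c-means formulation, which alternates two updates: membership-weighted cluster centers, and memberships u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)). The function name fuzzy_c_means, the random initialization, the tolerance and the toy data are hypothetical choices for illustration and are not the authors' implementation. NumPy is assumed to be available.

```python
import numpy as np


# Hypothetical helper for illustration: a standard fuzzy c-means sketch,
# not the implementation used by the authors.
def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Return memberships U (n x c) and cluster centers (c x p)."""
    if m <= 1.0:
        raise ValueError("m must be greater than 1")
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each object's row sums to one.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Centers: membership-weighted means of the objects.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean object-to-center distances, floored to avoid division by zero.
        d = np.fmax(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), 1e-12)
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    # Final centers consistent with the final memberships.
    W = U ** m
    centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, centers


X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
U, centers = fuzzy_c_means(X, c=2, m=2.0)
print(U.round(3))       # one row of memberships per object
print(U.sum(axis=1))    # each row sums to one (up to rounding)
```

Each object receives a membership in every cluster rather than a single label. Rerunning with m closer to 1 pushes each object's largest membership toward 1, while a larger m spreads the memberships more evenly across clusters.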
Pal and Bezdek (1995) advised that m should take values between 1.5 and 2.5, and that the number of clusters should be between 2 and. More detailed information about the algorithm can be found in Bezdek (1981).

Traditional k-means method

The k-means method divides the units into clusters and estimates the parameters of each cluster. It proceeds in the following steps (Ozdamar, 2002; Tatlıdil, 1996):

1. Based on the available information, the values of the p variables of the first k observations are taken as the initial cluster means, and the distances of all units to these cluster means are calculated.
2. Each of the remaining n - k observations is assigned to the nearest cluster mean, and the cluster means are recalculated after each assignment. Distances are usually calculated as Euclidean distances.
3. When all units have been assigned to the k clusters, the variance should be minimal within clusters and maximal between clusters.
4. Reassignment of units continues until the within-cluster covariance matrix is minimized, that is, until the convergence criterion is met or the change in variance becomes very small.

RESULTS AND DISCUSSION

In this study, the districts were grouped using the traditional k-means and fuzzy c-means methods, with the aim of identifying districts that are similar in terms of their RTAs. The number of clusters was determined to be five for both the traditional k-means and the fuzzy c-means methods. The variables used in the analysis were the number of RTAs, road surface situation, vehicle type and solstice. Because the two methods produced the same clustering results for these variables, a single set of thematic maps represents both the traditional k-means clusters and the fuzzy c-means clusters.
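The four k-means steps listed under "Traditional k-means method" can be read as a MacQueen-style procedure. First, the clusters are seeded with the first k observations. Next, the remaining observations are assigned one at a time to the nearest mean, and that mean is updated after each assignment. Finally, all units are reassigned repeatedly until no unit changes cluster. The Python sketch below is one hedged interpretation of those steps under that reading. The name kmeans_first_k, the max_iter cap and the toy data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


# Hypothetical helper for illustration: one reading of steps 1-4,
# not the implementation used by the authors.
def kmeans_first_k(X, k, max_iter=100):
    """Return hard cluster labels (n,) and cluster means (k x p)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of units")
    # Step 1: the first k observations are the initial cluster means.
    centers = X[:k].copy()
    counts = np.ones(k)
    labels = np.empty(n, dtype=int)
    labels[:k] = np.arange(k)
    # Step 2: assign each remaining observation to the nearest mean (Euclidean)
    # and update that mean immediately as a running average.
    for i in range(k, n):
        j = int(np.argmin(np.linalg.norm(centers - X[i], axis=1)))
        labels[i] = j
        counts[j] += 1
        centers[j] += (X[i] - centers[j]) / counts[j]
    # Steps 3-4: reassign every unit to its nearest mean and recompute the
    # means until the assignment stops changing (convergence).
    for _ in range(max_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the previous mean if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels, centers


labels, centers = kmeans_first_k([[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]], k=2)
print(labels)  # [0 1 0 1 0 1]
```

Step 1 seeds the clusters with the first k rows, so the final partition can depend on the order in which the units are listed. Different orderings may converge to different local solutions.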