
Journal of the Korean Society of Marine Environment & Safety, Research Paper
Vol. 21, No. 3, pp. 253-258, June 30, 2015, ISSN 1229-3431(Print) / ISSN 2287-3341(Online)
http://dx.doi.org/10.7837/kosomes.2015.21.3.253

Estimating Suitable Probability Distribution Function for Multimodal Traffic Distribution Function

Sang-Lok Yoo* ․ Jae-Yong Jeong** ․ Jeong-Bin Yim** * Graduate school of Mokpo National Maritime University, Mokpo 530-729, Korea ** Professor, Mokpo National Maritime University, Mokpo 530-729, Korea

Abstract : The purpose of this study is to find a suitable probability distribution function for complex distributions such as multimodal ones. The normal distribution is broadly used as the assumed probability distribution function. However, complex multimodal data are very hard to estimate with the normal distribution function alone, and errors may arise when other distribution functions, including the normal distribution function, are used. In this study, we sought a well-fitting probability distribution function for a multimodal area using AIS(Automatic Identification System) observation data gathered in Mokpo port throughout 2013. According to the chi-squared test, the gaussian mixture model(GMM) fits better than the other distribution functions tested, namely the extreme value, generalized extreme value, logistic, and normal distributions. GMM was thus found to be the suitable model for multimodal data of maritime traffic flow distribution. This should allow the probability density function for collision probability and traffic flow distribution to be calculated more precisely in the future.

Key Words : Probability distribution function, Multimodal, Gaussian mixture model, Normal distribution, Maritime traffic flow

1. Introduction

Maritime traffic flow is affected by the volume of traffic, tidal current, wave height, and so on. Analyzing maritime traffic flow is very important for evaluating the hazard of each route and the collision probability. Therefore, estimating the probability density function(pdf) is crucial to enhancing the safety of maritime traffic.

In previous research, Silveira et al.(2013) studied the collision probability and traffic pattern off the coast of Portugal, but they only plotted the distributions of navigation speed and location and counted the traffic. Giuliana et al.(2013) detected anomalies by applying a kernel-based estimate to traffic density on the Italian coast. Fangliang et al.(2012) analyzed elements such as navigation speed and traffic distance in waterways of the Netherlands and Shanghai, and fitted them with the normal and log-normal distribution functions. Qiang et al.(2014) analyzed traffic characteristics by fitting navigation speed in the Singapore Strait with several distributions, including the Weibull distribution. Liu et al.(2013) examined traffic flow with the normal and exponential distribution functions derived from the distribution of traffic time and speed. Some studies estimated the collision probability for ships meeting head-on or passing each other by applying the normal distribution (Fujii et al., 1974). In addition, the proximity to a hazard, defined by AASHTO(American Association of State Highway and Transportation Officials) and in the regulations on maritime traffic safety audits, has been calculated from the navigation distance to estimate the collision probability with the normal distribution function (Yim, 2010; Yim and Kim, 2010; AASHTO, 2014).

Normally, studies assume that the probability density function of vessel traffic is a normal distribution function. However, complex multimodal data are very hard to estimate with the normal distribution function alone, and errors may also arise when distribution functions other than the normal distribution function are used. The GMM(Gaussian Mixture Model), which combines multiple normal distributions, is very useful for analyzing complex distributions such as multimodal ones.

The GMM has been used as an analysis tool in various fields such as biology, economics, business administration, physics, astronomy, and engineering. In particular, GMM is widely used to estimate the probability density function from multivariate data (Ravindra et al., 2010; Gonzalez-Longatt et al., 2012). This study adopts GMM to examine and estimate the distribution of vessels.

* First Author : [email protected], 061-241-2750
Corresponding Author : [email protected], 061-240-7170

2. Method of Research

2.1 Scope of Study Area

This study used AIS observation data collected in Mokpo port over 1 year, from January 1 to December 31, 2013. As shown in Fig. 1, the study area is Mokpogu, through which vessels pass.

[Fig. 1. Scope of study area (Mokpo port, Korea). Axes: Lon (°), Lat (°). Legend: data, center, datum line, study area.]

2.2 Procedure of Study

The process of this study is shown in Fig. 2. Vessels were classified into entry and departure, the average position (34.7656°N, 126.2926°E) of vessels was set as the center point, and the distance between the location of each vessel and the center point was then calculated.

To test goodness of fit, we applied the chi-squared($\chi^2$) test. The $\chi^2$ test indicated that GMM fits in this case, so we applied different types of GMM and selected the fit model with the Akaike Information Criterion(AIC) and Bayesian Information Criterion(BIC). The desirable GMM was chosen through this process.

[Fig. 2. Study procedure to select the suitable traffic distribution function.]

3. Estimation of Probability Distribution Function

3.1 Examining Probability Distribution Function for Test

There are various types of probability distribution function. Since this study expresses the data $x$ as positive(+) and negative(-) values, distributions whose support is restricted to $x>0$ or $0 \le x \le 1$, such as the beta distribution, are excluded. The given sample data were analyzed with the extreme value distribution(EV), generalized extreme value distribution(GEV), logistic distribution, normal distribution, and gaussian mixture model(GMM).

Using the fitdist and fitgmdist functions in MATLAB(2014a), we drew Fig. 4. Formulae (1) to (5) show the probability density function(pdf) of each of the 5 distribution functions, following MATLAB(MATLAB, 2014a; MATLAB, 2014b; MATLAB, 2014c).

$f_{EV}(x)$ (extreme value pdf for sample data $x$) can be described as formula (1).

$$ f_{EV}(x) = \frac{1}{\sigma} \exp\left(\frac{x-\mu}{\sigma}\right) \exp\left(-\exp\left(\frac{x-\mu}{\sigma}\right)\right) \tag{1} $$

Where $\mu$ and $\sigma$ are a location parameter and a scale parameter.

$f_{GEV}(x)$ (generalized extreme value pdf for $x$) can be depicted as formula (2).

$$ f_{GEV}(x) = \begin{cases} \dfrac{1}{\sigma} \exp\left(-\left(1+k\dfrac{x-\mu}{\sigma}\right)^{-1/k}\right) \left(1+k\dfrac{x-\mu}{\sigma}\right)^{-1-1/k} & \text{if } k \neq 0 \\[2ex] \dfrac{1}{\sigma} \exp\left(-\exp\left(-\dfrac{x-\mu}{\sigma}\right) - \dfrac{x-\mu}{\sigma}\right) & \text{if } k = 0 \end{cases} \tag{2} $$

Where $k$ is the shape parameter of the pdf.

Also, $f_{log}(x)$ (logistic pdf for $x$) can be depicted as formula (3).

$$ f_{log}(x) = \frac{\exp\left(\frac{x-\mu}{\sigma}\right)}{\sigma \left(1+\exp\left(\frac{x-\mu}{\sigma}\right)\right)^{2}} \tag{3} $$

$f_{N}(x)$ (normal pdf for $x$) can be depicted as formula (4).

$$ f_{N}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \tag{4} $$

And $f_{GMM}(x)$ (GMM pdf for $x$) can be depicted as formula (5).

$$ f_{GMM}(x) = \sum_{m=1}^{M} c_m \frac{1}{\sqrt{2\pi\sigma_m^2}} \exp\left(-\frac{(x-\mu_m)^2}{2\sigma_m^2}\right) \tag{5} $$

Where $\mu_m$ and $\sigma_m^2$ stand for the mean and variance of the m-th gaussian distribution. $c_m$ is the mixture coefficient of the m-th gaussian distribution, which means its ratio of the given data, that is, the probability that one sample belongs to the m-th gaussian distribution.

3.2 Goodness of Fit

The $\chi^2$ values were compared to evaluate the GMM, EV, GEV, logistic, and normal distribution functions. First, the range of the estimated distribution is divided into k intervals, i.e., [a0, a1), [a1, a2), ⋯, [ak-1, ak), and the count Nj (j=1, 2, ⋯, k) is obtained for each interval to compute the $\chi^2$ test statistic, where Nj is the number of samples Xi in the j-th interval. Assuming that the samples follow the hypothesized distribution, $p_j$ (the expected proportion of Xi in the j-th interval) is calculated and the test statistic is obtained with formula (6)(Wikipedia, 2015), where n is the total number of samples.

$$ \chi^2 = \sum_{j=1}^{k} \frac{(N_j - n p_j)^2}{n p_j} \tag{6} $$

For the sample data of outbound vessels in July, $\chi^2$ for each distribution function is shown in Table 1. GMM stands out because its $\chi^2$ is lower than those of the other distribution functions. As shown in Fig. 3, GMM also follows the sample data more closely than the other models, which confirms that GMM fits the test data.

[Fig. 3. Distribution fitting. Panel title: Probability Density Function of outbound(July, 2013). Curves: EV, GEV, Logistic, Normal, GMM. x-axis: Distance from the center; y-axis: Probability (×10^-3).]

Table 1. $\chi^2$ of models

          EV     GEV    Logistic  Normal  GMM
$\chi^2$  65.53  27.39  117.47    87.59   24.97

3.3 Selecting suitable Gaussian Mixture Model and Estimating Parameter

Various gaussian mixture models were applied to select the optimal model. The GMMs are shown in Fig. 4, where GMM2 means a mixture of 2 gaussian models, GMM3 of 3, GMM4 of 4, GMM5 of 5, and GMM6 of 6 gaussian models.

[Fig. 4. Various gaussian mixture model fitting. Panel title: Probability Density Function of outbound(July, 2013). Curves: GMM2, GMM3, GMM4, GMM5, GMM6. x-axis: Distance from the center; y-axis: Probability (×10^-3).]
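The goodness-of-fit comparison of Section 3.2 can be illustrated with a minimal Python sketch of formulas (5) and (6). It is not the authors' MATLAB code. The mixture parameters are the July outbound values reported in Table 3, with the second parameter row treated as variances (an assumption), and the samples are synthetic draws from that mixture rather than AIS data. The interval edges and the helper names `gmm_cdf` and `chi_square` are likewise chosen only for illustration.

```python
import math
import random

def normal_cdf(x, mu, var):
    """CDF of a normal distribution with mean mu and variance var."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))

def gmm_cdf(x, means, variances, weights):
    """CDF of the mixture in formula (5): weighted sum of normal CDFs."""
    return sum(c * normal_cdf(x, m, v)
               for m, v, c in zip(means, variances, weights))

def chi_square(samples, edges, cdf):
    """Formula (6): sum over intervals of (N_j - n*p_j)^2 / (n*p_j)."""
    n = len(samples)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        n_j = sum(1 for x in samples if lo <= x < hi)  # observed count N_j
        p_j = cdf(hi) - cdf(lo)                        # expected proportion p_j
        total += (n_j - n * p_j) ** 2 / (n * p_j)
    return total

# Table 3 values (outbound, July); treating the second row as variances is an assumption.
means = [-256.0, 27.0, 348.0]
variances = [90807.0, 31636.0, 8611.0]
raw_weights = [0.07, 0.59, 0.35]                 # rounded values, sum to 1.01
weights = [c / sum(raw_weights) for c in raw_weights]

# Synthetic stand-in for the AIS distances: draws from the mixture itself.
random.seed(0)
samples = []
for _ in range(2000):
    j = random.choices(range(3), weights=weights)[0]
    samples.append(random.gauss(means[j], math.sqrt(variances[j])))

# Intervals [a_0, a_1), ..., [a_{k-1}, a_k) with open-ended tails.
edges = ([float("-inf")]
         + [float(a) for a in range(-700, 601, 100)]
         + [float("inf")])

# Single normal fitted by the sample mean and variance, for comparison.
mu = sum(samples) / len(samples)
var = sum((x - mu) ** 2 for x in samples) / len(samples)

chi2_gmm = chi_square(samples, edges,
                      lambda x: gmm_cdf(x, means, variances, weights))
chi2_normal = chi_square(samples, edges, lambda x: normal_cdf(x, mu, var))
print("chi2 GMM:", round(chi2_gmm, 2), "chi2 normal:", round(chi2_normal, 2))
```

On such multimodal samples the single normal distribution yields a much larger statistic than the mixture, which is the kind of contrast reported in Table 1.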

However, the more gaussian models are mixed, the more parameters are created, so an overfitting problem arises. For this reason, formulas (7) and (8) were used to calculate AIC and BIC, which address the overfitting problem (Akaike, 1974; Schwarz, 1978).

$$ AIC = 2k - 2\ln(\hat{L}) \tag{7} $$

$$ BIC = -2\ln(\hat{L}) + k \cdot \ln(n) \tag{8} $$

n : sample size
k : number of estimated parameters in the model
$\hat{L}$ : maximized value of the likelihood function for the model

A model could be composed with an unnecessarily large number of parameters. However, a penalty is imposed in this case to prevent an overly complex model. This is the so-called principle of parsimony. For AIC the penalty is 2k, and for BIC the penalty is k·ln(n). The penalty of BIC is therefore much heavier than that of AIC, since ln(n) is much larger than 2 when n is large. For these reasons, we adopted BIC.

The comparison between AIC and BIC for each GMM for the outbound vessels in July is shown in Table 2. Under the AIC criterion, GMM6 is desirable. However, due to the overfitting problem, we chose BIC as the criterion and selected GMM3 as the fit model.

Table 2. AIC & BIC of models

     GMM2   GMM3   GMM4   GMM5   GMM6
AIC  22855  22813  22803  22797  22793
BIC  22882  22857  22862  22873  22885

Table 3 shows the parameters calculated by the fitgmdist function in MATLAB(MATLAB, 2014c) for GMM3, the fit model for outbound vessels in July.

Table 3. Model parameters of outbound(July, 2013)

Month  μ (m)          σ²                  c
Jul    -256, 27, 348  90807, 31636, 8611  0.07, 0.59, 0.35

GMM3 forms gaussian distributions centered at -256 m, 27 m, and 348 m, whose shares of the data are 0.07, 0.59, and 0.35 respectively. The data are clustered around μ2(27 m) and μ3(348 m), which have large mixture coefficients of 0.59 and 0.35 respectively. Therefore, we can regard the parameters μ1, μ2, ⋯, μm as the mainly used routes.

Tables 4 and 5 show, for each month, the suitable GMM under the BIC criterion and its parameters, separately for inbound and outbound vessels. From April to June, GMM4 was fit for both inbound and outbound vessels, and from October to December, GMM4 was fit for inbound vessels and GMM3 was fit for outbound vessels.

For the sample data of inbound vessels in May, GMM4 is fit, as shown in Fig. 5. It forms gaussian distributions at -820 m, -77 m, 206 m, and 422 m, with mixture coefficients(c) of 0.04, 0.08, 0.37, and 0.51 respectively.

[Fig. 5. Gaussian mixture model fitting. Panel title: Probability Density Function of inbound(May, 2013). Legend: Gaussian four peak. x-axis: Distance from the center; y-axis: Probability (×10^-3).]

The goal of modeling is to obtain a probability distribution that expresses the distribution of the given sample well. In reality, however, it is nearly impossible to describe a sample distribution with a single model. The alternative is to use GMM, which can approximate various data sets with multiple gaussian distribution functions, so GMM is considered the fit model for maritime traffic distribution.
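The model selection described in this section can be sketched in a few lines of Python. The functions `aic` and `bic` follow formulas (7) and (8); their names, and the small numeric check of the penalty terms, are illustrative assumptions. The criterion values are copied from Table 2, so the sketch only reproduces the selection step, not the likelihood computation.

```python
import math

def aic(log_likelihood, k):
    """Formula (7): AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Formula (8): BIC = -2 ln(L) + k ln(n)."""
    return -2 * log_likelihood + k * math.log(n)

# Values reported in Table 2 (outbound vessels, July 2013).
models = ["GMM2", "GMM3", "GMM4", "GMM5", "GMM6"]
aic_values = [22855, 22813, 22803, 22797, 22793]
bic_values = [22882, 22857, 22862, 22873, 22885]

# The preferred model is the one with the smallest criterion value.
best_by_aic = min(zip(aic_values, models))[1]
best_by_bic = min(zip(bic_values, models))[1]
print("AIC selects", best_by_aic, "and BIC selects", best_by_bic)

# The BIC penalty k*ln(n) exceeds the AIC penalty 2k once ln(n) > 2,
# i.e. for any sample size n of 8 or more.
k_example, n_example = 8, 1000  # hypothetical values for illustration only
assert k_example * math.log(n_example) > 2 * k_example
```

With the Table 2 values, the smallest AIC belongs to GMM6 and the smallest BIC to GMM3, matching the choice made in the text.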

Table 4. Type & parameters of GMM(inbound)

Month      Model  μ (m)                 σ²                        c
January    GMM3   -506, 138, 371        49836, 18334, 5391        0.03, 0.42, 0.55
February   GMM3   -491, 131, 353        49166, 19606, 5689        0.03, 0.39, 0.58
March      GMM3   -453, 139, 382        48398, 12698, 5332        0.03, 0.46, 0.51
April      GMM4   -801, -65, 167, 403   3251, 72003, 14619, 4650  0.03, 0.08, 0.40, 0.50
May        GMM4   -820, -77, 206, 422   1158, 78926, 18766, 5538  0.04, 0.08, 0.37, 0.51
June       GMM4   -832, -93, 203, 417   1301, 75950, 19262, 5168  0.04, 0.11, 0.40, 0.45
July       GMM3   -663, 128, 394        17829, 28514, 5980        0.02, 0.44, 0.54
August     GMM3   -544, 132, 387        44564, 24727, 5084        0.02, 0.43, 0.55
September  GMM3   -434, 114, 364        93227, 22399, 5479        0.02, 0.39, 0.59
October    GMM4   -406, 109, 304, 410   84949, 21094, 2652, 2077  0.03, 0.41, 0.31, 0.25
November   GMM4   -786, 126, 318, 436   19294, 24470, 3232, 2840  0.05, 0.39, 0.22, 0.34
December   GMM4   -883, -223, 134, 373  957, 90563, 18813, 5490   0.03, 0.05, 0.37, 0.55

Table 5. Type & parameters of GMM(outbound)

Month      Model  μ (m)                 σ²                        c
January    GMM3   -263, 18, 330         81755, 19655, 8463        0.05, 0.55, 0.40
February   GMM3   -582, 42, 333         53887, 26608, 8000        0.02, 0.65, 0.33
March      GMM3   -305, 42, 346         77354, 20997, 9480        0.04, 0.61, 0.35
April      GMM4   -822, 2, 100, 386     2151, 43166, 24065, 7250  0.03, 0.29, 0.42, 0.26
May        GMM4   -846, -74, 102, 396   529, 70612, 26628, 7885   0.05, 0.15, 0.54, 0.26
June       GMM4   -858, -76, 83, 378    227, 70779, 24670, 8041   0.05, 0.14, 0.47, 0.34
July       GMM3   -256, 27, 348         90807, 31636, 8611        0.07, 0.59, 0.35
August     GMM3   -195, 15, 321         72002, 25428, 10571       0.14, 0.47, 0.40
September  GMM3   -319, -21, 298        106900, 29953, 10735      0.04, 0.46, 0.50
October    GMM3   -298, 8, 290          86916, 28847, 8442        0.05, 0.48, 0.47
November   GMM3   -903, 41, 359         169, 39418, 7984          0.02, 0.62, 0.36
December   GMM3   -915, 56, 372         217, 38698, 6903          0.03, 0.66, 0.31
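The parameters in Tables 4 and 5 were obtained with MATLAB's fitgmdist, which fits mixtures by expectation-maximization (EM). As a rough illustration of how means, variances, and mixture coefficients of a one-dimensional GMM can be estimated, the following Python sketch implements a bare-bones EM loop. It is not the authors' code: the function name `fit_gmm_1d`, the quantile-based initialization, the fixed iteration count, and the synthetic two-route data are all assumptions made for the example.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Normal pdf with mean mu and variance var, as in formula (4)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, m, iterations=50):
    """Estimate means, variances, and mixture coefficients of an m-component GMM."""
    n = len(data)
    ordered = sorted(data)
    # Initial guess: means at evenly spaced quantiles, shared variance, equal weights.
    means = [ordered[int((i + 0.5) * n / m)] for i in range(m)]
    mean_all = sum(data) / n
    var_all = sum((x - mean_all) ** 2 for x in data) / n
    variances = [var_all] * m
    weights = [1.0 / m] * m
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j])
                    for j in range(m)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate the parameters from the responsibilities.
        for j in range(m):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j,
                1e-6)  # floor keeps a component from collapsing to zero width
    log_lik = sum(
        math.log(sum(weights[j] * normal_pdf(x, means[j], variances[j])
                     for j in range(m)))
        for x in data)
    return means, variances, weights, log_lik

# Synthetic stand-in for distances from the center point: two well-separated routes.
random.seed(1)
data = ([random.gauss(-300.0, 50.0) for _ in range(600)]
        + [random.gauss(300.0, 50.0) for _ in range(400)])

means, variances, weights, log_lik = fit_gmm_1d(data, m=2)
fitted = sorted(zip(means, weights))
print([(round(mu), round(c, 2)) for mu, c in fitted])
```

The returned log-likelihood is the quantity ln(L̂) that enters formulas (7) and (8), with k = 3m - 1 free parameters for a one-dimensional mixture of m components.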

4. Conclusions

The normal distribution is broadly used as the assumed probability distribution function. However, complex multimodal data are very hard to estimate with the normal distribution function alone, and errors may arise when other distribution functions, including the normal distribution function, are used.

In this study, we sought a well-fitting probability distribution function for a multimodal area using AIS observation data gathered in Mokpo port throughout 2013.

As a result, GMM fits better than the other distribution functions tested, namely the EV, GEV, logistic, and normal distributions. GMM was found to be the suitable model for multimodal data of maritime traffic flow distribution. Data were clustered around the means(μ) that have large mixture coefficients(c), so we can regard the parameters μ1, μ2, ⋯, μm as the mainly used routes.

This should allow the probability density function for collision probability and traffic flow distribution to be calculated more precisely in the future. We hope this advance will help enhance navigation safety and vessel traffic services.

Acknowledgements

This Project was supported by Honam Sea Grant R&D Program fund of 2015.

References

[1] AASHTO(2014), LRFD Bridge Design Specifications, Customary U.S. Units, 7th Edition, pp. 141-161.
[2] Akaike, H.(1974), A New Look at the Statistical Model Identification, IEEE Transactions on Automatic Control, Vol. 19, No. 6, pp. 716-723.
[3] Fangliang, X., H. Ligteringen, C. V. Gulijk and B. Ale(2012), AIS Data Analysis for Realistic Ship Traffic Simulation Model, Proceedings of IWNTM' 2012, pp. 44-49.
[4] Fujii, Y., H. Yamanouchi and N. Mizuki(1974), A Study Factors Affecting the Frequency of Accidents in Marine Traffic, Journal of Navigation, pp. 239-247.
[5] Giuliana, P., M. Vespe and K. Bryan(2013), Vessel Pattern Knowledge Discovery from AIS Data : A Framework for Anomaly Detection and Route Prediction, Entropy, Vol. 15, pp. 2218-2245.
[6] Gonzalez-Longatt, F. M., J. L. Rueda, I. Erlich, D. Bogdanov and W. Villa(2012), Identification of Gaussian Mixture Model using Mean Mapping Optimization: Venezuelan Case, 2012 3rd IEEE PES Innovative Smart Grid Technologies Europe(ISGT Europe), pp. 1-6.
[7] MATLAB(2014a), Programming, MATLAB Version 8.3(R2014a).
[8] MATLAB(2014b), Statistical Toolbox : Fit Probability Distribution object to data, MATLAB Version 8.3(R2014a).
[9] MATLAB(2014c), Statistical Toolbox : Fit Gaussian Mixture Distribution to data, MATLAB Version 8.3(R2014a).
[10] Liu, Z. B., Y. H. Fu and Y. S. Cong(2013), The Simulation of Vessel Traffic Flow Based on Congruential Generator, International Conference on Remote Sensing, Environment and Transportation Engineering(RSETE), pp. 179-182.
[11] Qiang, M., J. Weng and S. Li(2014), Analysis of AIS-based Vessel Traffic Characteristics in the Singapore Strait, 93rd Annual Meeting of Transportation Research Board, pp. 1-19.
[12] Ravindra, S., B. C. Pal and R. A. Jabr(2010), Statistical Representation of Distribution System Loads Using Gaussian Mixture Model, IEEE Transactions on Power Systems, Vol. 25, No. 1, pp. 29-37.
[13] Schwarz, G. E.(1978), Estimating the Dimension of a Model, Annals of Statistics, Vol. 6, No. 2, pp. 461-464.
[14] Silveira, P. A. M., A. P. Teixeira and C. Guedes Soares(2013), Use of AIS Data to Characterise Marine Traffic Patterns and Ship Collision Risk off the Coast of Portugal, Journal of Navigation, Vol. 66, pp. 879-898.
[15] Yim, J. B.(2010), Development of Collision Risk Evaluation Model Between Passing Vessel and Mokpo Harbour Bridge, Journal of Korean Navigation and Port Research, Vol. 34, No. 6, pp. 405-415.
[16] Yim, J. B. and D. H. Kim(2010), Statistical Parameter Estimation to Calculate Collision Probability Between Mokpo Harbor Bridge and Passing Vessels, Journal of Korean Navigation and Port Research, Vol. 34, No. 8, pp. 609-614.
[17] Wikipedia(2015), Tutorial for Goodness of fit, http://en.wikipedia.org/wiki/Goodness_of_fit.

Received : 2015. 05. 14.
Revised : 2015. 06. 12.
Accepted : 2015. 06. 26.