Estimating Suitable Probability Distribution Function for Multimodal Traffic Distribution Function

Journal of the Korean Society of Marine Environment & Safety, Research Paper
Vol. 21, No. 3, pp. 253-258, June 30, 2015, ISSN 1229-3431(Print) / ISSN 2287-3341(Online)
http://dx.doi.org/10.7837/kosomes.2015.21.3.253

Sang-Lok Yoo* ․ Jae-Yong Jeong** ․ Jeong-Bin Yim**
* Graduate school of Mokpo National Maritime University, Mokpo 530-729, Korea
** Professor, Mokpo National Maritime University, Mokpo 530-729, Korea
* First Author : [email protected], 061-241-2750 Corresponding Author : [email protected], 061-240-7170

Abstract : The purpose of this study is to find a suitable probability distribution function for complex, multimodal data. The normal distribution is widely assumed as the probability distribution function. However, multimodal data are very hard to describe with the normal distribution alone, and errors may arise when other single distribution functions, including the normal distribution, are used. In this study, we searched for a well-fitting probability distribution function in a multimodal area, using AIS (Automatic Identification System) observation data gathered in Mokpo port over the year 2013. According to the chi-squared statistic, the Gaussian mixture model (GMM) fits better than the other distribution functions tested, namely the extreme value, generalized extreme value, logistic, and normal distributions. GMM was thus found to be the suitable model for multimodal data of maritime traffic flow distribution. This result should allow the probability density function used for collision probability and traffic flow distribution to be calculated more precisely in the future.

Key Words : Probability distribution function, Multimodal, Gaussian mixture model, Normal distribution, Maritime traffic flow

1. Introduction

Maritime traffic flow is affected by the volume of traffic, tidal current, wave height, and so on. Analyzing maritime traffic flow is very important for evaluating the hazard of each route and the collision probability. Therefore, estimating the probability density function (pdf) is crucial to enhancing the safety of maritime traffic.

In previous research, Silveira et al. (2013) studied the collision probability and traffic pattern on the coast of Portugal, but they only drew histograms of navigation speed and location distribution and counted the traffic. Giuliana et al. (2013) estimated anomalies by applying kernel density estimation to traffic density on the Italian coast. Fangliang et al. (2012) analyzed elements such as navigation speed and traffic distance in waterways of the Netherlands and Shanghai, and fitted them with normal and log-normal distribution functions. Qiang et al. (2014) analyzed traffic characteristics by fitting navigation speed in a Singaporean channel with beta and Weibull distributions. Liu et al. (2013) examined traffic flow with normal and exponential distribution functions derived from the distributions of traffic time and speed.

Some studies estimated the collision probability for ships in meeting or passing situations by assuming a normal distribution (Fujii et al., 1974). The proximity toward a hazard, defined in AASHTO (American Association of State Highway and Transportation Officials) and in the regulations of the maritime traffic safety audit, was calculated from the navigation distance in order to estimate the collision probability with a normal distribution function (Yim, 2010; Yim and Kim, 2010; AASHTO, 2014). Studies thus normally assume that the probability density function of vessel traffic is a normal distribution. However, multimodal data are very hard to describe with the normal distribution alone, and errors may also arise when other single distribution functions are used instead.

The GMM (Gaussian Mixture Model), which combines multiple normal distributions, is very useful for analyzing complex distributions such as multimodal ones. The GMM has been used as an analysis tool in various fields such as biology, economics, business administration, physics, astronomy, and engineering. In particular, GMM is widely used to estimate the probability density function of multivariate data (Ravindra et al., 2010; Gonzalez-Longatt et al., 2012). This study adopts GMM to examine the frequency distribution of vessels and to estimate its parameters.

2. Method of Research

2.1 Scope of Study Area

This study covers one year, from January 1 to December 31, 2013, and uses AIS observation data from Mokpo port. As shown in Fig. 1, the study area lies in Mokpogu, through which vessels pass.

[Figure: axes Lon(°) and Lat(°); legend: data, center, datum line, study area]
Fig. 1. Scope of study area (Mokpo port, Korea).

2.2 Procedure of Study

The process of this study is shown in Fig. 2. Vessels were classified into entry and departure, the average position of the vessels (34.7656°N, 126.2926°E) was set as the center point, and the distance between the location of each vessel and the center point was then calculated.

To test goodness of fit, we applied the chi-squared ($\chi^2$) test. The test showed that GMM fits in this case, so we applied GMMs with different numbers of components and selected the suitable model with the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The desirable GMM was chosen through this process.

Fig. 2. Study procedure to select the suitable traffic distribution function.

3. Estimation of Probability Distribution Function

3.1 Examining Probability Distribution Function for Test

There are various types of probability distribution function. Because the sample data $x$ in this study take both positive (+) and negative (-) values, distributions that require $x > 0$ or $0 \le x \le 1$, such as the gamma distribution and the beta distribution, are excluded.

The sample data were analyzed using the extreme value distribution (EV), the generalized extreme value distribution (GEV), the logistic distribution, the normal distribution, and the Gaussian mixture model (GMM). By using the fitdist and fitgmdist functions in MATLAB (2014a), we drew Fig. 4. Formulae (1) to (5) give the probability density function (pdf) of each of the five distribution functions, following MATLAB (MATLAB, 2014a; MATLAB, 2014b; MATLAB, 2014c).

$f_{EV}(x)$ (the extreme value pdf for sample data $x$) can be described as formula (1).

$$f_{EV}(x) = \frac{1}{\sigma}\exp\!\left(\frac{x-\mu}{\sigma}\right)\exp\!\left(-\exp\!\left(\frac{x-\mu}{\sigma}\right)\right) \tag{1}$$

Where $\mu$ and $\sigma$ denote a location parameter and a scale parameter.

$f_{GEV}(x)$ (the generalized extreme value pdf for $x$) can be described as formula (2).

$$f_{GEV}(x) = \begin{cases} \dfrac{1}{\sigma}\exp\!\left(-\left(1+\xi\dfrac{x-\mu}{\sigma}\right)^{-1/\xi}\right)\left(1+\xi\dfrac{x-\mu}{\sigma}\right)^{-1-1/\xi} & \text{if } \xi \ne 0 \\[2ex] \dfrac{1}{\sigma}\exp\!\left(-\exp\!\left(-\dfrac{x-\mu}{\sigma}\right)-\dfrac{x-\mu}{\sigma}\right) & \text{if } \xi = 0 \end{cases} \tag{2}$$

Where $\xi$ denotes the shape parameter of the pdf.

Also, $f_{log}(x)$ (the logistic pdf for $x$) can be described as formula (3).

$$f_{log}(x) = \frac{\exp\!\left(\dfrac{x-\mu}{\sigma}\right)}{\sigma\left(1+\exp\!\left(\dfrac{x-\mu}{\sigma}\right)\right)^{2}} \tag{3}$$

$f_{N}(x)$ (the normal pdf for $x$) can be described as formula (4).

$$f_{N}(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \tag{4}$$

And $f_{GMM}(x)$ (the GMM pdf for $x$) can be described as formula (5).

$$f_{GMM}(x) = \sum_{m=1}^{M} c_m \frac{1}{\sigma_m\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu_m)^{2}}{2\sigma_m^{2}}\right) \tag{5}$$

Where $\mu_m$ and $\sigma_m$ stand for the mean and standard deviation of the $m$-th Gaussian distribution, and $M$ is the number of Gaussian distributions in the mixture. $c_m$ is the mixture coefficient of the $m$-th Gaussian distribution, that is, the share of the given data attributed to it, or the probability that one sample is drawn from the $m$-th Gaussian distribution.

[Figure: "Probability Density Function of outbound (July, 2013)"; x-axis: Distance from the center; y-axis: Probability (x 10^-3); legend: EV, GEV, Logistic, Normal, GMM]
Fig. 3. Distribution fitting.

Table 1. $\chi^2$ of models

            EV      GEV     Logistic   Normal   GMM
$\chi^2$    65.53   27.39   117.47     87.59    24.97
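As an illustration of formula (5), the following minimal Python sketch evaluates a one-dimensional Gaussian mixture pdf and checks numerically that it integrates to one. The function names (`normal_pdf`, `gmm_pdf`, `integrate`) and the two-component parameter values are hypothetical and chosen only for the example; they are not the fitted values of this study, and the study itself used MATLAB rather than Python.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal pdf of formula (4) with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def gmm_pdf(x, weights, means, sigmas):
    """GMM pdf of formula (5): weighted sum of normal pdfs."""
    return sum(c * normal_pdf(x, mu, s)
               for c, mu, s in zip(weights, means, sigmas))


def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule, used only to sanity-check the pdf."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


# Hypothetical two-component mixture (illustrative values, not from the paper).
weights = [0.3, 0.7]      # mixture coefficients c_m, summing to 1
means = [-200.0, 150.0]   # mu_m, distance from the center
sigmas = [80.0, 120.0]    # sigma_m, standard deviations

# A valid pdf should integrate to approximately 1 over a wide enough range.
area = integrate(lambda x: gmm_pdf(x, weights, means, sigmas), -2000.0, 2000.0)
print(f"integral of GMM pdf: {area:.6f}")
```

Because the mixture coefficients sum to one and each component is itself a pdf, the mixture is a pdf as well; with two well-separated means, as in the values assumed here, the resulting curve is bimodal.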
3.2 Goodness of Fit

The $\chi^2$ values were compared to evaluate the GMM, EV, GEV, logistic, and normal distribution functions. First, the range of the estimated distribution is divided into $k$ intervals, i.e., $[a_0, a_1), [a_1, a_2), \cdots, [a_{k-1}, a_k)$, and the count $N_j$ ($j = 1, 2, \cdots, k$) is calculated for each interval to compute the test statistic, where $N_j$ is the number of samples $X_i$ in the $j$-th interval. Assuming that the samples follow the tested distribution, $p_j$ (the expected ratio of $X_i$ in the $j$-th interval) is calculated, and the test statistic is obtained with formula (6) (Wikipedia, 2015), where $n$ is the sample size.

$$\chi^2 = \sum_{j=1}^{k} \frac{(N_j - n p_j)^2}{n p_j} \tag{6}$$

For the sample data of outbound vessels in July, $\chi^2$ for each distribution function is shown in Table 1. GMM stands out because its $\chi^2$ is lower than those of the other distribution functions. As shown in Fig. 3, the GMM curve also follows the sample data more closely than the other models, which confirms that GMM is the suitable choice.

3.3 Selecting suitable Gaussian Mixture Model and Estimating Parameter

Various Gaussian mixture models were applied to select the optimal model. They are shown in Fig. 4, where GMM2 means a mixture of 2 Gaussian models, GMM3 of 3, GMM4 of 4, GMM5 of 5, and GMM6 of 6 Gaussian models.

[Figure: "Probability Density Function of outbound (July, 2013)"; x-axis: Distance from the center; y-axis: Probability (x 10^-3); legend: GMM2, GMM3, GMM4, GMM5, GMM6]
Fig. 4. Various gaussian mixture model fitting.

However, the more Gaussian models are mixed, the more parameters are created, which raises the problem of overfitting. For this reason, formulae (7) and (8) were used to calculate AIC and BIC, which penalize the number of parameters and so guard against overfitting (Akaike, 1974; Schwarz, 1978).

$$AIC = -2\ln L + 2d \tag{7}$$

$$BIC = -2\ln L + d \cdot \ln n \tag{8}$$

$n$ : sample size
$d$ : number of estimated parameters in the model
$L$ : maximized value of the likelihood function for the model

Table 3. Model parameters of outbound (July, 2013)

Month   Component means   Component dispersion parameters   Mixture coefficients
Jul     -256, 27, 348     90807, 31636, 8611                0.07, 0.59, 0.35

Tables 4 and 5 show the traffic data for each month with the GMM selected by the BIC criterion, with the parameters classified into inbound and outbound vessels. From April to June, GMM4 was suitable for both inbound and outbound vessels, and from October to December, GMM4 was suitable for inbound vessels and GMM3 for outbound vessels.
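The statistic of formula (6) and the criteria of formulae (7) and (8) can be sketched in a few lines of Python. Everything below is illustrative: the function names, the sample size, and the log-likelihood values are hypothetical and are not results of this study. The parameter count `3 * m - 1` is the usual count for a one-dimensional GMM with `m` components (m means, m variances, m - 1 free mixture coefficients) and is an assumption here, since the paper does not state how parameters were counted.

```python
import math


def chi_squared_statistic(observed_counts, expected_probs):
    """Formula (6): sum over intervals of (N_j - n*p_j)^2 / (n*p_j)."""
    n = sum(observed_counts)
    return sum((N_j - n * p_j) ** 2 / (n * p_j)
               for N_j, p_j in zip(observed_counts, expected_probs))


def aic(log_likelihood, num_params):
    """Formula (7): AIC = -2 ln L + 2 d."""
    return -2.0 * log_likelihood + 2.0 * num_params


def bic(log_likelihood, num_params, n):
    """Formula (8): BIC = -2 ln L + d ln n."""
    return -2.0 * log_likelihood + num_params * math.log(n)


def gmm_num_params(m):
    """Assumed parameter count for a 1-D GMM with m components."""
    return 3 * m - 1


# Hypothetical counts in k = 4 intervals and equal expected ratios.
chi2 = chi_squared_statistic([10, 20, 30, 40], [0.25, 0.25, 0.25, 0.25])

# Hypothetical maximized log-likelihoods for GMM2 ... GMM6 (not paper data).
n_samples = 1000
log_likelihoods = {2: -6900.0, 3: -6850.0, 4: -6845.0, 5: -6843.0, 6: -6842.0}

aic_scores = {m: aic(ll, gmm_num_params(m)) for m, ll in log_likelihoods.items()}
bic_scores = {m: bic(ll, gmm_num_params(m), n_samples)
              for m, ll in log_likelihoods.items()}

# The model with the smallest criterion value is selected.
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)
print(f"chi2 = {chi2:.2f}, AIC selects GMM{best_by_aic}, BIC selects GMM{best_by_bic}")
```

With these made-up values the two criteria disagree, which illustrates why the choice of criterion matters: for sample sizes of this order, BIC charges each extra parameter ln n rather than 2, so it tends to select fewer components than AIC.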
