Table of contents

Statistical analysis
Measures of statistical central tendencies
Measures of variability
  Aleatory uncertainties
  Epistemic uncertainties
  The range
  The mean difference
  The variance
  The standard deviation
  The coefficient of variation
Measures of uncertainty
  Systems of events
  Entropy
Random (stochastic) variables
Discontinuous (discrete) random variables
  Moments of discrete random variables
  Probability distributions of discrete random variables
    Binomial distribution
    Poisson distribution
Continuous random variables
  Probability Density Function
  Cumulative Distribution Function
  Moments of continuous random variables
  Probability distributions of continuous random variables
    Uniform distribution
    Simpson's (triangular) distribution
    Normal distribution
    Lognormal distribution
    Shifted exponential
    Gamma distribution
    Shifted Rayleigh
    Type I Largest value (Gumbel) distribution
    Type III Smallest values (for ε = 0 it is known as the Weibull distribution)
    Beta distribution
    Type I Smallest values distribution
    Type II Largest value
Combinations of random variables

Measures of statistical central tendencies

Measures of statistical central tendencies of a set of data x1, x2, ..., xN locate only the centre of a distribution of measures. Other measures are often needed to describe the data. The mean is most often used to describe the central tendency. Mean has two related meanings in statistics:
• the arithmetic mean
• the expected value of a random variable.
In mathematics and statistics, the arithmetic mean is often referred to as simply the mean or the average; the term "arithmetic mean" is preferred in mathematics and statistics. The arithmetic mean of a data set x1, x2, ..., xN is defined analytically as follows:

$$\mu(x) = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Measures of variability

Statistics uses summary measures to describe the amount of variability or spread in a set of data x1, x2, ..., xN. The variability applies to the extent to which data points in a statistical distribution or data set diverge from the average or mean value, as well as to the extent to which these data points differ from each other. There are several commonly used measures of variability: range, mean difference, variance and standard deviation, as well as the combined measure of variability defined as the coefficient of variation with respect to the mean value. Uncertainty represents a state of having limited knowledge where it is impossible to describe exactly the existing state or a future outcome, or where more than one outcome is possible. The uncertainty (doubt) in statistics represents the estimated amount or percentage by which an observed or calculated value may differ from the true value. Uncertainties can be distinguished as being either aleatory or epistemic.

Aleatory uncertainties

Objective or external or irreducible uncertainty arises because of the natural, unpredictable variability of the wave and wind climate or of ship operations. The inherent uncertainty normally cannot be reduced, although knowledge of the phenomena may help in quantifying it.

Epistemic uncertainties

Epistemic uncertainty is due to a lack of knowledge about the climate properties. The epistemic (or subjective or internal or modelling) uncertainty can be reduced with sufficient study, better measurement facilities, more observations or improved modelling; therefore, expert judgments may be useful in its reduction.

A measure of statistical dispersion or deviation is a real number that is zero if all the data are identical, and increases as the data becomes more diverse. It cannot be less than zero. Most measures of dispersion have the same scale as the quantity being measured. In other words, if the measurements have units, such as metres or seconds, the measure of dispersion has the same units.

Basic measures of dispersion include:
• Range
• Mean difference
• Variance
Additional measures are:
• Standard deviation – the square root of the variance
• Coefficient of variation – the standard deviation divided by the mean value

(See Excel example: GraduationRateofNavalArchitectureinZagreb) The example presents the statistical properties of the input and output rates of the numbers of students of naval architecture at the Faculty of Mechanical Engineering and Naval Architecture at the University of Zagreb.

[Figure: Studij brodogradnje (naval architecture studies) – numbers of students enrolled (Upisano) and graduated (Diplomiralo) per year (Godina).]

The range

In statistics, the range is the length of the smallest interval which contains all the data of a dataset x1, x2, ..., xN. The range is calculated by subtracting the smallest observation (sample minimum $S_{min}$) from the greatest (sample maximum $S_{max}$) and indicates the statistical dispersion: $R = S_{max} - S_{min}$. The range, in the sense of the difference between the highest and lowest scores, is also called the crude range. The midrange point, i.e. the point halfway between the two extremes, is an indicator of the central tendency of the data. It is not appropriate for small samples.

The mean difference

In probability theory and statistics, the mean difference is used as a measure of how far the numbers of a dataset x1, x2, ..., xN are spread out from each other. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). For a random variable X = x1, x2, ..., xN with mean value μ, the mean difference of X is:

$$MD(x) = \frac{1}{N}\sum_{i=1}^{N} \left| x_i - \mu \right|$$

and the relative mean difference is then

$$RMD(x) = \frac{MD(x)}{\mu}$$

The variance

In probability theory and statistics, the variance is another indicator used as a measure of how far a set of numbers are spread out from each other. For a random variable X with expected value (mean) μ = E[X], the variance of X is:

$$Var(x) = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$

Proof:

$$Var(x) = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \frac{2\mu}{N}\sum_{i=1}^{N} x_i + \mu^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$

The standard deviation

The widely used measure of variability or diversity in statistics and probability theory is the standard deviation. It shows how much variation or "dispersion" there is from the "average". The standard deviation is the square root of the variance:

$$\sigma(X) = \sqrt{Var(X)}$$

The standard deviation, unlike the variance, is expressed in the same units as the data.

The coefficient of variation

Other measures of dispersion are dimensionless (scale-free); they have no units even if the variable itself has units. In widest use is the coefficient of variation, defined as follows:

$$COV(X) = \frac{\sigma(X)}{\mu(X)}$$

For measurements with percentage as the unit, the coefficient of variation and the standard deviation will have percentage points as the unit.
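All of the measures above can be computed directly from their definitions. The following minimal Python sketch (a small hypothetical sample, not taken from the cited Excel examples; population forms dividing by N, as in the formulas above) evaluates the mean, range, mean difference, variance, standard deviation and coefficient of variation:

```python
# Minimal sketch: measures of central tendency and variability
# for a small hypothetical data set (population definitions, /N).
import math

x = [4.2, 5.1, 3.8, 5.6, 4.9, 4.4]            # hypothetical sample
N = len(x)

mean = sum(x) / N                              # arithmetic mean
r = max(x) - min(x)                            # range R = Smax - Smin
md = sum(abs(xi - mean) for xi in x) / N       # mean difference MD(x)
var = sum((xi - mean) ** 2 for xi in x) / N    # variance
sd = math.sqrt(var)                            # standard deviation
cov = sd / mean                                # coefficient of variation

print(f"mean={mean:.3f} range={r:.3f} MD={md:.3f} "
      f"var={var:.3f} sd={sd:.3f} COV={cov:.3f}")
```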

Measures of uncertainty

Systems of events

Random events are in general considered as abstract concepts and the relations among events are characterized axiomatically. The algebraic structure of the set of events turns out to be a Boolean algebra.

The disjoint random events $E_i$ with probabilities $p_i = p(E_i)$, $i = 1, 2, \dots, N$, configure a system $S_N$ in the form of an N-element finite scheme:

$$S_N = \begin{pmatrix} E_1 & E_2 & \cdots & E_j & \cdots & E_N \\ p_1 = p(E_1) & p_2 = p(E_2) & \cdots & p_j = p(E_j) & \cdots & p_N = p(E_N) \end{pmatrix}$$

The probability of a system of events $S_N$ is then in general $p(S_N) = \sum_{i=1}^{N} p_i \leq 1$. For a complete distribution $p(S_N) = 1$. A system of N events $E_1, E_2, \dots, E_N$ is called a complete system of events if the following axioms hold:

(a) $E_k \neq \emptyset \quad (k = 1, 2, \dots, N)$

(b) $E_j E_k = \emptyset \quad (\text{for } j \neq k)$

(c) $E_1 + E_2 + \cdots + E_N = I$

The "∅" in (a) and (b) denotes an impossible event and "I" in (c) denotes a sure event. The fact that $E_j$ and $E_k$ are mutually exclusive is expressed in (b). Axiom (c) states that at least one of the events $E_k$, $k = 1, 2, \dots, N$, occurs.

Entropy

Uncertainty of a single stochastic event E with known probability p = p(E) ≠ 0 plays a fundamental role in information theory. To each probability can be assigned the equivalent number of probabilities or events $\nu(E) = 1/p(E)$. The entropy of a single stochastic event E can be interpreted according to Wiener (1948) either as a measure of the information yielded by the event or of how unexpected the event was, and can be defined as the logarithm of the equivalent number of events $\nu(E)$ as follows:

$$H(E) = \log_2 \nu(E) = \log_2 \left[ 1/p(E) \right] = -\log_2 p(E)$$

The unit of unexpectedness $H(1/2) = 1$ expresses, for example, how unexpected it is to get a tail when flipping a coin. More important than the unexpectedness of a single stochastic event are the uncertainties of systems of N events. The uncertainty of a complete system S of N events can be expressed as the weighted sum of the unexpectedness of all events by Shannon's entropy (Shannon and Weaver, 1949), as follows:

$$H_N(S) = \sum_{j=1}^{N} p_j \log \nu_j = \sum_{j=1}^{N} p_j \log(1/p_j) = -\sum_{j=1}^{N} p_j \log p_j$$

The uncertainty of an incomplete system S of N events can be defined as the limiting case of the Renyi entropy (1970) of order 1, as shown:

$$H_N^{R1}(S) = -\frac{1}{p(S)} \sum_{j=1}^{N} p_j \log p_j$$

The definition of the unit of uncertainty according to Renyi (1970) is not more and not less arbitrary than the choice of the unit of some physical quantity. E.g., if the logarithm applied is of base two, the unit of entropy is denoted as one "bit". One bit is the uncertainty of a system of two equally probable events. If the natural logarithm is applied, the unit is denoted as one "nit". Outcomes with zero probability do not change the uncertainty; by convention, 0 log 0 = 0.

Some characteristics of the probabilistic uncertainty measures and properties of the entropy are summarized next. The entropy $H_N(S)$ is equal to zero when the state of the system S can be surely predicted, i.e., no uncertainty exists at all. This occurs when one of the probabilities of events $p_i$, $i = 1, 2, \dots, N$, is equal to one, say $p_k = 1$, and all other probabilities are equal to zero, $p_j = 0$, $j \neq k$. The entropy is maximal when all events are equally probable, $p_i = 1/N$ for $i = 1, 2, \dots, N$, and it amounts to $H_N(S)_{max} = \log N$, which is the Hartley entropy (1928). Hartley's entropy corresponds to the Renyi entropy of order 0 (1970). The entropy increases as the number of events increases. The entropy does not depend on the sequence of events: $H_N(p_1, p_2, \dots, p_N) = H_N(p_{k(1)}, p_{k(2)}, \dots, p_{k(N)})$, where k is an arbitrary permutation of $(1, 2, \dots, N)$. The uniqueness theorem by Khinchin (1957) states that the entropy is the only function that measures the probabilistic uncertainty of systems of events in agreement with the human experience of uncertainty.
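These properties are easy to verify numerically. A minimal Python sketch follows (the cited workbook U1-EntropyDieCoin is an Excel file; the fair die and fair coin systems here are the obvious equivalents, and the incomplete system is a hypothetical example):

```python
# Minimal sketch: Shannon entropy of complete systems of events and the
# order-1 Renyi entropy of an incomplete system. Base-2 logs, so units are bits.
import math

def shannon_entropy(p):
    """H_N(S) = -sum p_j log2 p_j, with the convention 0 log 0 = 0."""
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)

coin = [0.5, 0.5]                # two equally probable events
die = [1 / 6] * 6                # six equally probable events
certain = [1.0, 0.0, 0.0]        # state surely predictable

print(shannon_entropy(coin))     # 1.0 bit (maximal for N=2: log2 2)
print(shannon_entropy(die))      # ~2.585 bits (Hartley entropy log2 6)
print(shannon_entropy(certain))  # 0.0 - no uncertainty at all

def renyi_entropy_order1(p):
    """Limiting Renyi entropy of order 1 for an incomplete system, p(S) < 1."""
    return -sum(pj * math.log2(pj) for pj in p if pj > 0) / sum(p)

print(renyi_entropy_order1([0.3, 0.3]))   # incomplete system with p(S) = 0.6
```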

(see Excel example U1-EntropyDieCoin)

Random (stochastic) variables

Deterministic variables are normally described by their properties:
• N – nominal value (the exact value)
and possibly with tolerances:
• T – tolerance
• t = T/N – relative tolerance

Description of the characteristics of random variables:
• μ = O·N + N = (1 + O)·N – mean value
• O = μ/N − 1 – mean deviation from the nominal value (bias)
• Var = σ² – variance
• σ = Var½ – standard deviation
• COV = σ/μ – coefficient of variation
• F – probability distribution: PDF – probability density function, CDF – cumulative distribution function

Empirically, it is possible for practical purposes to relate the tolerance of deterministic variables to the standard deviation of random variables: T = nσ. For example, supposing a normal probability distribution, for n = 3 fewer than 27 samples out of 10000 are expected to be outside the tolerable margins. In other words, the probability is 99.73% that a random sample will be within the prescribed margins of ±3σ.
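The quoted figure follows directly from the normal cumulative distribution. A minimal sketch using only the Python standard library (the normal CDF is available in closed form through math.erf):

```python
# Minimal sketch: probability of a normal sample falling within +/- n sigma.
import math

def normal_within(n_sigma):
    """P(|X - mu| <= n*sigma) for a normally distributed X."""
    return math.erf(n_sigma / math.sqrt(2.0))

p_in = normal_within(3.0)
print(f"P(within 3 sigma) = {p_in:.4%}")                       # 99.73%
print(f"expected outside per 10000: {10000 * (1 - p_in):.1f}")  # ~27
```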

(See Excel example: MSproperty-plating-statistics) The example presents the statistical analysis of the mechanical properties of mild shipbuilding steel (MS) for rolled plates and profiles, obtained by tensile testing in the Laboratory for experimental mechanics at the Faculty of Mechanical Engineering and Naval Architecture at the University of Zagreb.

(See Excel example: MSproperty-profils-statistics)

Discontinuous (discrete) random variables

Definition: A discontinuous (discrete) random variable takes the values x1, x2, ... with probabilities p(x1), p(x2), ... having the property $\sum_i p(x_i) = 1$.

Moments of discrete random variables

$$m_r = \sum_i x_i^r\, p(x_i)$$

$$M_r = \sum_i (x_i - \mu)^r\, p(x_i)$$

$$\mu = \sum_i x_i\, p(x_i) \quad \text{(expectation)}$$

$$\sigma^2 = V(x) = \sum_i (x_i - \mu)^2\, p(x_i) \quad \text{(variance)}$$
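A minimal sketch of these definitions for a fair die (an illustrative choice, not one of the cited Excel examples):

```python
# Minimal sketch: raw and central moments of a discrete random variable (fair die).
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6
assert abs(sum(ps) - 1.0) < 1e-12           # property: sum p(x_i) = 1

def raw_moment(r):
    """m_r = sum x_i^r p(x_i)"""
    return sum(x**r * p for x, p in zip(xs, ps))

mu = raw_moment(1)                          # expectation

def central_moment(r):
    """M_r = sum (x_i - mu)^r p(x_i)"""
    return sum((x - mu)**r * p for x, p in zip(xs, ps))

var = central_moment(2)                     # variance
print(mu, var)                              # 3.5 and ~2.9167
```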

Probability distributions of discrete random variables

Binomial distribution

$$P(x) = \binom{n}{x} p^x q^{n-x}, \qquad \text{Mean} = np, \qquad \text{Sigma} = \sqrt{npq}$$

(see Excel example DD1-DistributionBinomial)

[Figure: Binomial distribution PDF and CDF for n = 25, p = 0.5 (Mean = 12.5, Sigma = 2.5).]

[Figure: Binomial distribution PDFs for n = 2, 5, 10, 20, 50 and p = 0.25, 0.50, 0.75.]
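The same numbers can be reproduced with SciPy instead of the cited Excel workbook; a minimal sketch for the plotted case n = 25, p = 0.5:

```python
# Minimal sketch: binomial distribution via scipy.stats for n=25, p=0.5.
from scipy.stats import binom

n, p = 25, 0.5
dist = binom(n, p)

print(dist.mean())     # np = 12.5
print(dist.std())      # sqrt(npq) = 2.5
print(dist.pmf(12))    # P(x = 12), PDF value near the mode
print(dist.cdf(15))    # cumulative probability P(x <= 15)
```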

Poisson distribution

$$P(x) = \frac{m^x}{x!}\, e^{-m}, \qquad m = np > 0$$

$$\text{Mean} = m, \qquad \text{Variance} = m, \qquad \text{Sigma} = \sqrt{m}, \qquad \text{COV} = 1/\sqrt{m}$$

(see Excel example DD2-DistributionPoisson)

[Figure: Poisson distribution – probability function and cumulative distribution function.]
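A minimal sketch of these properties, also illustrating the Poisson distribution as the limit of the binomial for large n and small p with m = np held fixed (the n = 1000 case here is an illustrative assumption):

```python
# Minimal sketch: Poisson moments and the binomial-to-Poisson limit.
from scipy.stats import binom, poisson

m = 5.0
n, p = 1000, m / 1000          # hypothetical large-n, small-p binomial

print(poisson(m).mean())        # m
print(poisson(m).var())         # variance = m as well
print(poisson(m).std())         # sqrt(m), hence COV = 1/sqrt(m)

# The two probability functions nearly coincide:
for x in (2, 5, 8):
    print(x, binom(n, p).pmf(x), poisson(m).pmf(x))
```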

Continuous random variables

A random variable is called continuous if it can assume all possible values in the possible range of the random variable. For a continuous random variable the value of the variable is never an exact point; it is always in the form of an interval, however small the interval may be.

Probability Density Function (PDF)

The probability function of a continuous random variable is called the probability density function, or briefly p.d.f. It is denoted by f(x); the product f(x)·Δx represents the probability that the random variable X takes a value between x and x + Δx, where Δx is a very small change in X.

Cumulative Distribution Function (CDF)

In terms of the probability density function, the cumulative distribution function is defined as:

$$CDF = F(x) = \int_{-\infty}^{x} f(u)\, du$$

(see Excel example DC3-DistributionNormal)

[Figure: Normal distribution PDF and CDF, Mean = 2, Sigma = 1.]

Moments of continuous random variables

$$m_r = \int_{-\infty}^{+\infty} x^r f(x)\, dx$$

$$M_r = \int_{-\infty}^{+\infty} (x - \mu)^r f(x)\, dx$$

$$\mu = \int_{-\infty}^{+\infty} x f(x)\, dx \quad \text{(expectation)}$$

$$\sigma^2 = V(x) = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\, dx \quad \text{(variance)}$$
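These integrals can be checked numerically. The sketch below (an illustration, not part of the course's Excel material) evaluates the expectation and variance integrals for the normal density of the example above (mean 2, sigma 1) with SciPy's quad:

```python
# Minimal sketch: moment integrals of a continuous random variable,
# evaluated numerically for a normal PDF with mean 2 and sigma 1.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = norm(loc=2.0, scale=1.0).pdf

mu = quad(lambda x: x * f(x), -np.inf, np.inf)[0]             # expectation
var = quad(lambda x: (x - mu)**2 * f(x), -np.inf, np.inf)[0]  # variance

print(mu, var)   # ~2.0 and ~1.0, as expected
```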

Probability distributions of continuous random variables

Uniform distribution

$$f(x) = \frac{1}{b-a}, \qquad F(x) = \frac{x-a}{b-a}$$

$$\mu = \frac{a+b}{2}, \qquad \sigma = \frac{b-a}{2\sqrt{3}}$$

(see Excel example DC6-DistributionUniform)

Simpson’s (triangular) distribution

For the symmetric triangular distribution on [a, b], with peak ordinate

$$f_b = \frac{2}{b-a}$$

at the midpoint, the rising branch ($a \leq x \leq (a+b)/2$) is

$$f(x) = f_b \cdot \frac{2(x-a)}{b-a}, \qquad F(x) = f_b \cdot \frac{(x-a)^2}{b-a}$$

$$\mu = \frac{a+b}{2}, \qquad \sigma = \frac{b-a}{2\sqrt{6}}$$

(see Excel example DC7-DistributionSimpson)

[Figure: Simpson's (triangular) distribution PDF and CDF.]

Normal distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)$$

Standard normal probability density and cumulative probability:

$$\varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2}, \qquad \Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-\frac{1}{2}t^2}\, dt, \qquad u = \frac{x-\mu}{\sigma}, \qquad \sigma > 0$$

(see Excel example DC3-DistributionNormal)

[Figure: Normal distribution PDF and CDF, Mean = 2, Sigma = 1.]
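As a quick check of the standardization F(x) = Φ((x − μ)/σ), the following sketch (hypothetical point x = 3.5, with the same mean and sigma as the plotted example) evaluates both sides with scipy.stats.norm:

```python
# Minimal sketch: F(x) of a general normal equals Phi of the standardized u.
from scipy.stats import norm

mu, sigma = 2.0, 1.0
x = 3.5                                    # hypothetical evaluation point

lhs = norm(loc=mu, scale=sigma).cdf(x)     # F(x) of the general normal
rhs = norm().cdf((x - mu) / sigma)         # Phi(u) of the standard normal
print(lhs, rhs)                            # identical: ~0.9332
```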

Lognormal distribution

$$f(x) = \frac{1}{x\,\sigma_y \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu_y}{\sigma_y}\right)^2}, \qquad F(x) = \Phi\!\left(\frac{\ln x - \mu_y}{\sigma_y}\right), \qquad u = \frac{\ln x - \mu_y}{\sigma_y}$$

$$\sigma_y^2 = \ln\!\left[1 + \sigma_x^2/\mu_x^2\right], \qquad \mu_y = \ln\!\left[\mu_x^2 \big/ \sqrt{\sigma_x^2 + \mu_x^2}\right] = \ln \mu_x - \frac{\sigma_y^2}{2}$$

$$\mu_x = e^{\mu_y + \frac{\sigma_y^2}{2}}, \qquad \sigma_x = e^{\mu_y + \frac{\sigma_y^2}{2}} \sqrt{e^{\sigma_y^2} - 1}$$

(see Excel example DC4-DistributionLogNormal)

[Figure: Lognormal distribution PDF and CDF, Mean-x = 8, Sigma-x = 5 (Mean-y = 1.91, Sigma-y = 0.547).]

(see Excel example DC4-DistributionLogNormal-MildSteel)

[Figure: Lognormal distribution of the yield stress of mild shipbuilding steel, Mean-x = 268, Sigma-x = 30.5 (Mean-y = 5.58, Sigma-y = 0.114).]
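The moment conversion above is easy to script. A minimal sketch, applied to the mild-steel yield stress example (Mean-x = 268, Sigma-x = 30.5) and cross-checked against scipy.stats.lognorm, whose s and scale parameters correspond to σy and exp(μy):

```python
# Minimal sketch: converting (mu_x, sigma_x) of a lognormal variable to the
# underlying normal parameters (mu_y, sigma_y), and back via scipy.stats.
import math
from scipy.stats import lognorm

mu_x, sigma_x = 268.0, 30.5            # yield stress example from the figure

sigma_y = math.sqrt(math.log(1.0 + (sigma_x / mu_x) ** 2))
mu_y = math.log(mu_x) - 0.5 * sigma_y ** 2
print(mu_y, sigma_y)                   # ~5.58 and ~0.114, as in the figure

dist = lognorm(s=sigma_y, scale=math.exp(mu_y))
print(dist.mean(), dist.std())         # recovers ~268 and ~30.5
```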

Shifted exponential

$$f(x) = \lambda e^{-\lambda(x - x_0)}, \qquad F(x) = 1 - e^{-\lambda(x - x_0)}, \qquad \lambda > 0$$

$$\mu = x_0 + \frac{1}{\lambda}, \qquad \sigma = \frac{1}{\lambda}, \qquad \lambda = \frac{1}{\mu - x_0}$$

Gamma distribution

$$f(x) = \frac{\lambda(\lambda x)^{k-1}}{\Gamma(k)}\, e^{-\lambda x}, \qquad F(x) = \frac{\Gamma(k, \lambda x)}{\Gamma(k)}, \qquad \lambda > 0, \quad k > 0$$

$$\Gamma(k) = \int_0^{\infty} e^{-u}\, u^{k-1}\, du \quad \text{(gamma function)}; \qquad \Gamma(k+1) = k\,\Gamma(k), \quad \Gamma(k) = (k-1)! \ \text{for integer } k$$

$$\Gamma(k, x) = \int_0^{x} e^{-u}\, u^{k-1}\, du \quad \text{(incomplete gamma function)}$$

$$\mu = \frac{k}{\lambda}, \qquad \sigma = \frac{\sqrt{k}}{\lambda}, \qquad \lambda = \frac{\mu}{\sigma^2}, \qquad (\lambda\sigma)^2 = k$$

With the scale parameter θ = 1/λ the distribution may equivalently be written as:

$$f(x) = \frac{x^{k-1}}{\theta^k\, \Gamma(k)}\, e^{-x/\theta}, \qquad F(x) = \frac{\gamma(k, x/\theta)}{\Gamma(k)}$$

$$\mu = \theta k, \qquad \sigma = \theta\sqrt{k}, \qquad k = \left(\frac{\mu}{\sigma}\right)^2, \qquad \theta = \frac{\sigma^2}{\mu}$$

(see Excel example DC9-DistributionGamma)

[Figure: Gamma distribution PDF and CDF.]

Shifted Rayleigh

$$f(x) = \frac{x - x_0}{\alpha^2}\, e^{-\frac{1}{2}\left(\frac{x - x_0}{\alpha}\right)^2}, \qquad F(x) = 1 - e^{-\frac{1}{2}\left(\frac{x - x_0}{\alpha}\right)^2}$$

$$\mu = x_0 + \alpha\sqrt{\frac{\pi}{2}}, \qquad \sigma = \alpha\sqrt{2 - \frac{\pi}{2}}$$

Type I Largest value (Gumbel)

$$f(x) = \alpha_n\, e^{-\alpha_n(x - u_n)}\, e^{-e^{-\alpha_n(x - u_n)}}, \qquad F(x) = e^{-e^{-\alpha_n(x - u_n)}}, \qquad \alpha_n > 0$$

$$\mu = u_n + \frac{0.5772}{\alpha_n}, \qquad \sigma = \frac{\pi}{\alpha_n\sqrt{6}}$$

$$u_n = \mu - \frac{0.5772}{\alpha_n}, \qquad \alpha_n = \frac{\pi}{\sigma\sqrt{6}}$$

(see Excel example DC5-DistributionGumbel)

[Figure: Type I Largest value (Gumbel) distribution PDF and CDF for Mean = 100, Sigma = 20.]

Type III Smallest values (for ε = 0 it is known as the Weibull distribution)

$$f(x) = \frac{k}{u_1}\left(\frac{x}{u_1}\right)^{k-1} e^{-\left(\frac{x}{u_1}\right)^k}, \qquad F(x) = 1 - e^{-\left(\frac{x}{u_1}\right)^k}, \qquad k > 0$$

$$\mu = u_1 \cdot \Gamma\!\left(1 + \frac{1}{k}\right), \qquad \sigma = u_1 \cdot \sqrt{\Gamma\!\left(1 + \frac{2}{k}\right) - \Gamma^2\!\left(1 + \frac{1}{k}\right)}$$

Type III Smallest values with lower bound ε

$$f(x) = \frac{k}{u_1 - \varepsilon}\left(\frac{x - \varepsilon}{u_1 - \varepsilon}\right)^{k-1} e^{-\left(\frac{x - \varepsilon}{u_1 - \varepsilon}\right)^k}, \qquad F(x) = 1 - e^{-\left(\frac{x - \varepsilon}{u_1 - \varepsilon}\right)^k}, \qquad k > 0$$

$$\mu = \varepsilon + (u_1 - \varepsilon)\,\Gamma\!\left(1 + \frac{1}{k}\right), \qquad \sigma = (u_1 - \varepsilon)\sqrt{\Gamma\!\left(1 + \frac{2}{k}\right) - \Gamma^2\!\left(1 + \frac{1}{k}\right)}$$

Beta distribution

$$f(x) = \frac{(x-a)^{q-1}(b-x)^{r-1}}{B(q,r)\,(b-a)^{q+r-1}}, \qquad q > 0, \quad r > 0$$

$$B(q, r) = \frac{\Gamma(q)\,\Gamma(r)}{\Gamma(q+r)} \quad \text{(beta function)}$$

$$\mu = a + \frac{q(b-a)}{q+r}, \qquad \sigma = \frac{b-a}{q+r}\sqrt{\frac{qr}{q+r+1}}$$

Type I Smallest values distribution

$$f(x) = \alpha_1\, e^{\alpha_1(x - u_1)}\, e^{-e^{\alpha_1(x - u_1)}}, \qquad F(x) = 1 - e^{-e^{\alpha_1(x - u_1)}}, \qquad \alpha_1 > 0$$

$$\mu = u_1 - \frac{0.5772}{\alpha_1}, \qquad \sigma = \frac{\pi}{\alpha_1\sqrt{6}}$$

Type II Largest value

$$f(x) = \frac{k}{u_0}\left(\frac{u_0}{x}\right)^{k+1} e^{-\left(\frac{u_0}{x}\right)^k}, \qquad F(x) = e^{-\left(\frac{u_0}{x}\right)^k}, \qquad k > 0$$

$$\mu = u_0 \cdot \Gamma\!\left(1 - \frac{1}{k}\right), \qquad \sigma = u_0 \cdot \sqrt{\Gamma\!\left(1 - \frac{2}{k}\right) - \Gamma^2\!\left(1 - \frac{1}{k}\right)}$$

Combinations of random variables

For linear combinations

$$Y = a_1 X_1 + a_2 X_2 + \dots + a_k X_k$$

of random variables $X_1, X_2, \dots, X_k$ with given arithmetic means $\mu_1, \mu_2, \dots, \mu_k$ and variances $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2$:

Theorem 1: The mean value of the linear combination of random variables is the sum of the mean values of the components:

$$E(a_1 X_1 + a_2 X_2 + \dots + a_k X_k) = a_1 E(X_1) + a_2 E(X_2) + \dots + a_k E(X_k)$$

$$\mu = a_1\mu_1 + a_2\mu_2 + \dots + a_k\mu_k$$

Theorem 2: The variance of the linear combination of mutually independent random variables is the sum of the variances of the components:

$$\sigma^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \dots + a_k^2\sigma_k^2$$
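Both theorems are easy to verify by simulation. A minimal sketch with a hypothetical combination Y = 2X1 + 3X2 of independent normal and uniform variables:

```python
# Minimal sketch: Monte Carlo check of the two theorems for Y = 2*X1 + 3*X2
# with independent X1 ~ Normal(5, 1) and X2 ~ Uniform(0, 6).
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
a1, a2 = 2.0, 3.0

x1 = rng.normal(5.0, 1.0, n)        # mu1 = 5, sigma1^2 = 1
x2 = rng.uniform(0.0, 6.0, n)       # mu2 = 3, sigma2^2 = 36/12 = 3
y = a1 * x1 + a2 * x2

print(y.mean())                     # Theorem 1: 2*5 + 3*3 = 19
print(y.var())                      # Theorem 2: 4*1 + 9*3 = 31
```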