Table of contents

Statistical analysis
Measures of statistical central tendencies
Measures of variability
  Aleatory uncertainties
  Epistemic uncertainties
  The range
  The mean difference
  The variance
  The standard deviation
  The coefficient of variation
Measures of uncertainty
  Systems of events
  Entropy
Random (stochastic) variables
Discontinuous (discrete) random variables
  Moments of discrete random variables
  Probability distributions of discrete random variables
    Binomial distribution
    Poisson distribution
Continuous random variables
  Probability Density Function
  Cumulative Distribution Function
  Moments of continuous random variables
  Probability distributions of continuous random variables
    Uniform distribution
    Simpson's (triangular) distribution
    Normal distribution
    Lognormal distribution
    Shifted exponential
    Gamma distribution
    Shifted Rayleigh
    Type I Largest value (Gumbel) distribution
    Type III Smallest values (for ε = 0 it is known as the Weibull distribution)
    Beta distribution
    Type I Smallest values distribution
    Type II Largest value
Combinations of random variables

Measures of statistical central tendencies

Measures of statistical central tendencies of a set of data x1, x2, ..., xN locate only the centre of a distribution of measures. Other measures are often needed to describe the data. The mean is most often used to describe the central tendency. Mean has two related meanings in statistics:
• the arithmetic mean
• the expected value of a random variable.
In mathematics and statistics, the arithmetic mean is often referred to as simply the mean or the average; the term "arithmetic mean" is preferred in mathematics and statistics. The arithmetic mean of a data set x1, x2, ..., xN is defined analytically as follows:

$$\mu(x) = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Measures of variability

Statistics uses summary measures to describe the amount of variability or spread in a set of data x1, x2, ..., xN. The variability applies to the extent to which data points in a statistical distribution or data set diverge from the average or mean value, as well as to the extent to which these data points differ from each other. There are several commonly used measures of variability: range, mean difference, variance and standard deviation, as well as the combined measure of variability defined as the coefficient of variation with respect to the mean value. Uncertainty represents a state of having limited knowledge where it is impossible to describe exactly the existing state or a future outcome, or where more than one outcome is possible. The uncertainty (doubt) in statistics represents the estimated amount or percentage by which an observed or calculated value may differ from the true value. Uncertainties can be distinguished as being either aleatory or epistemic.

Aleatory uncertainties

Objective or external or irreducible uncertainty arises because of the natural, unpredictable variability of the wave and wind climate or of ship operations. The inherent uncertainty normally cannot be reduced, although knowledge of the phenomena may help in quantifying it.

Epistemic uncertainties

Epistemic uncertainty is due to a lack of knowledge about the climate properties. The epistemic (or subjective or internal or modelling) uncertainty can be reduced with sufficient study, better measurement facilities, more observations or improved modelling; therefore, expert judgments may be useful in its reduction.

A measure of statistical dispersion or deviation is a real number that is zero if all the data are identical, and increases as the data becomes more diverse. It cannot be less than zero. Most measures of dispersion have the same scale as the quantity being measured. In other words, if the measurements have units, such as metres or seconds, the measure of dispersion has the same units.

Basic measures of dispersion include:
• Range
• Mean difference
• Variance
Additional measures are:
• Standard deviation – the square root of the variance
• Coefficient of variation – the standard deviation divided by the mean value

(See Excel example: GraduationRateofNavalArchitectureinZagreb) The example presents the statistical properties of the input and output rates of the numbers of students of naval architecture at the Faculty of Mechanical Engineering and Naval Architecture at the University of Zagreb.

[Figure: Studij brodogradnje (naval architecture studies) – numbers of students enrolled (Upisano) and graduated (Diplomiralo) per year (Godina).]

The range

In statistics, the range is the length of the smallest interval which contains all the data of a dataset x1, x2, ..., xN. The range is calculated by subtracting the smallest observation (sample minimum $S_{min}$) from the greatest (sample maximum $S_{max}$) and indicates the statistical dispersion: $R = S_{max} - S_{min}$. The range, in the sense of the difference between the highest and lowest scores, is also called the crude range. The midrange point, i.e. the point halfway between the two extremes, is an indicator of the central tendency of the data. It is not appropriate for small samples.

The mean difference

In probability theory and statistics, the mean difference is used as a measure of how far the numbers of a dataset x1, x2, ..., xN are spread out from each other. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). For a random variable X = x1, x2, ..., xN with mean value μ, the mean difference of X is:

$$MD(x) = \frac{1}{N}\sum_{i=1}^{N} \left| x_i - \mu \right|$$

and the relative mean difference is then

$$RMD(x) = \frac{MD(x)}{\mu}$$

The variance

In probability theory and statistics, the variance is another indicator used as a measure of how far a set of numbers are spread out from each other. For a random variable X with expected value (mean) μ = E[X], the variance of X is:

$$Var(x) = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$

Proof:

$$Var(x) = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \frac{2\mu}{N}\sum_{i=1}^{N} x_i + \mu^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$

The standard deviation

The widely used measure of variability or diversity in statistics and probability theory is the standard deviation. It shows how much variation or "dispersion" there is from the "average". The standard deviation is the square root of the variance:

$$\sigma(X) = \sqrt{Var(X)}$$

The standard deviation, unlike the variance, is expressed in the same units as the data.

The coefficient of variation

Other measures of dispersion are dimensionless (scale-free); they have no units even if the variable itself has units. In widest use is the coefficient of variation, defined as follows:

$$COV(X) = \frac{\sigma(X)}{\mu(X)}$$

For measurements with percentage as the unit, the coefficient of variation and the standard deviation will have percentage points as the unit.
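All of the measures above can be computed directly from their definitions. The following minimal Python sketch (a small hypothetical sample, not taken from the cited Excel examples; population forms dividing by N, as in the formulas above) evaluates the mean, range, mean difference, variance, standard deviation and coefficient of variation:

```python
# Minimal sketch: measures of central tendency and variability
# for a small hypothetical data set (population definitions, /N).
import math

x = [4.2, 5.1, 3.8, 5.6, 4.9, 4.4]            # hypothetical sample
N = len(x)

mean = sum(x) / N                              # arithmetic mean
r = max(x) - min(x)                            # range R = Smax - Smin
md = sum(abs(xi - mean) for xi in x) / N       # mean difference MD(x)
var = sum((xi - mean) ** 2 for xi in x) / N    # variance
sd = math.sqrt(var)                            # standard deviation
cov = sd / mean                                # coefficient of variation

print(f"mean={mean:.3f} range={r:.3f} MD={md:.3f} "
      f"var={var:.3f} sd={sd:.3f} COV={cov:.3f}")
```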

Measures of uncertainty

Systems of events

Random events are in general considered as abstract concepts and the relations among events are characterized axiomatically. The algebraic structure of the set of events turns out to be a Boolean algebra.

The disjoint random events $E_i$ with probabilities $p_i = p(E_i)$, $i = 1, 2, \dots, N$, configure a system $S_N$ in the form of an N-element finite scheme:

$$S_N = \begin{pmatrix} E_1 & E_2 & \cdots & E_j & \cdots & E_N \\ p_1 = p(E_1) & p_2 = p(E_2) & \cdots & p_j = p(E_j) & \cdots & p_N = p(E_N) \end{pmatrix}$$

The probability of a system of events $S_N$ is then in general $p(S_N) = \sum_{i=1}^{N} p_i \leq 1$. For a complete distribution $p(S_N) = 1$. A system of N events $E_1, E_2, \dots, E_N$ is called a complete system of events if the following axioms hold:

(a) $E_k \neq \emptyset \quad (k = 1, 2, \dots, N)$

(b) $E_j E_k = \emptyset \quad (\text{for } j \neq k)$

(c) $E_1 + E_2 + \cdots + E_N = I$

The "∅" in (a) and (b) denotes an impossible event and "I" in (c) denotes a sure event. The fact that $E_j$ and $E_k$ are mutually exclusive is expressed in (b). Axiom (c) states that at least one of the events $E_k$, $k = 1, 2, \dots, N$, occurs.

Entropy

Uncertainty of a single stochastic event E with known probability p = p(E) ≠ 0 plays a fundamental role in information theory. To each probability can be assigned the equivalent number of probabilities or events $\nu(E) = 1/p(E)$. The entropy of a single stochastic event E can be interpreted according to Wiener (1948) either as a measure of the information yielded by the event or of how unexpected the event was, and can be defined as the logarithm of the equivalent number of events $\nu(E)$ as follows:

$$H(E) = \log_2 \nu(E) = \log_2 \left[ 1/p(E) \right] = -\log_2 p(E)$$

The unit of unexpectedness $H(1/2) = 1$ expresses, for example, how unexpected it is to get a tail when flipping a coin. More important than the unexpectedness of a single stochastic event are the uncertainties of systems of N events. The uncertainty of a complete system S of N events can be expressed as the weighted sum of the unexpectedness of all events by Shannon's entropy (Shannon and Weaver, 1949), as follows:

$$H_N(S) = \sum_{j=1}^{N} p_j \log \nu_j = \sum_{j=1}^{N} p_j \log(1/p_j) = -\sum_{j=1}^{N} p_j \log p_j$$

The uncertainty of an incomplete system S of N events can be defined as the limiting case of the Renyi entropy (1970) of order 1, as shown:

$$H_N^{R1}(S) = -\frac{1}{p(S)} \sum_{j=1}^{N} p_j \log p_j$$

The definition of the unit of uncertainty according to Renyi (1970) is not more and not less arbitrary than the choice of the unit of some physical quantity. E.g., if the logarithm applied is of base two, the unit of entropy is denoted as one "bit". One bit is the uncertainty of a system of two equally probable events. If the natural logarithm is applied, the unit is denoted as one "nit". Outcomes with zero probability do not change the uncertainty; by convention, 0 log 0 = 0.

Some characteristics of the probabilistic uncertainty measures and properties of the entropy are summarized next. The entropy $H_N(S)$ is equal to zero when the state of the system S can be surely predicted, i.e., no uncertainty exists at all. This occurs when one of the probabilities of events $p_i$, $i = 1, 2, \dots, N$, is equal to one, say $p_k = 1$, and all other probabilities are equal to zero, $p_j = 0$, $j \neq k$. The entropy is maximal when all events are equally probable, $p_i = 1/N$ for $i = 1, 2, \dots, N$, and it amounts to $H_N(S)_{max} = \log N$, which is the Hartley entropy (1928). Hartley's entropy corresponds to the Renyi entropy of order 0 (1970). The entropy increases as the number of events increases. The entropy does not depend on the sequence of events: $H_N(p_1, p_2, \dots, p_N) = H_N(p_{k(1)}, p_{k(2)}, \dots, p_{k(N)})$, where k is an arbitrary permutation of $(1, 2, \dots, N)$. The uniqueness theorem by Khinchin (1957) states that the entropy is the only function that measures the probabilistic uncertainty of systems of events in agreement with the human experience of uncertainty.
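These properties are easy to verify numerically. A minimal Python sketch follows (the cited workbook U1-EntropyDieCoin is an Excel file; the fair die and fair coin systems here are the obvious equivalents, and the incomplete system is a hypothetical example):

```python
# Minimal sketch: Shannon entropy of complete systems of events and the
# order-1 Renyi entropy of an incomplete system. Base-2 logs, so units are bits.
import math

def shannon_entropy(p):
    """H_N(S) = -sum p_j log2 p_j, with the convention 0 log 0 = 0."""
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)

coin = [0.5, 0.5]                # two equally probable events
die = [1 / 6] * 6                # six equally probable events
certain = [1.0, 0.0, 0.0]        # state surely predictable

print(shannon_entropy(coin))     # 1.0 bit (maximal for N=2: log2 2)
print(shannon_entropy(die))      # ~2.585 bits (Hartley entropy log2 6)
print(shannon_entropy(certain))  # 0.0 - no uncertainty at all

def renyi_entropy_order1(p):
    """Limiting Renyi entropy of order 1 for an incomplete system, p(S) < 1."""
    return -sum(pj * math.log2(pj) for pj in p if pj > 0) / sum(p)

print(renyi_entropy_order1([0.3, 0.3]))   # incomplete system with p(S) = 0.6
```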

(see Excel example U1-EntropyDieCoin)

Random (stochastic) variables

Deterministic variables are normally described by their properties:
• N – nominal value (the exact value)
and possibly with tolerances:
• T – tolerance
• t = T/N – relative tolerance

Description of the characteristics of random variables:
• μ = O·N + N = (1 + O)·N – mean value
• O = μ/N − 1 – mean deviation from the nominal value (bias)
• Var = σ² – variance
• σ = Var½ – standard deviation
• COV = σ/μ – coefficient of variation
• F – probability distribution: PDF – probability density function, CDF – cumulative distribution function

Empirically, it is possible for practical purposes to relate the tolerance of deterministic variables to the standard deviation of random variables: T = nσ. For example, supposing a normal probability distribution, for n = 3 fewer than 27 samples out of 10000 are expected to be outside the tolerable margins. In other words, the probability is 99.73% that a random sample will be within the prescribed margins of ±3σ.
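The quoted figure follows directly from the normal cumulative distribution. A minimal sketch using only the Python standard library (the normal CDF is available in closed form through math.erf):

```python
# Minimal sketch: probability of a normal sample falling within +/- n sigma.
import math

def normal_within(n_sigma):
    """P(|X - mu| <= n*sigma) for a normally distributed X."""
    return math.erf(n_sigma / math.sqrt(2.0))

p_in = normal_within(3.0)
print(f"P(within 3 sigma) = {p_in:.4%}")                       # 99.73%
print(f"expected outside per 10000: {10000 * (1 - p_in):.1f}")  # ~27
```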

(See Excel example: MSproperty-plating-statistics) The example presents the statistical analysis of the mechanical properties of mild shipbuilding steel (MS) for rolled plates and profiles, obtained by tensile testing in the Laboratory for experimental mechanics at the Faculty of Mechanical Engineering and Naval Architecture at the University of Zagreb.

(See Excel example: MSproperty-profils-statistics)

Discontinuous (discrete) random variables

Definition: A discontinuous (discrete) random variable takes the values x1, x2, ... with probabilities p(x1), p(x2), ... having the property $\sum_i p(x_i) = 1$.

Moments of discrete random variables

$$m_r = \sum_i x_i^r\, p(x_i)$$

$$M_r = \sum_i (x_i - \mu)^r\, p(x_i)$$

$$\mu = \sum_i x_i\, p(x_i) \quad \text{(expectation)}$$

$$\sigma^2 = V(x) = \sum_i (x_i - \mu)^2\, p(x_i) \quad \text{(variance)}$$
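A minimal sketch of these definitions for a fair die (an illustrative choice, not one of the cited Excel examples):

```python
# Minimal sketch: raw and central moments of a discrete random variable (fair die).
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6
assert abs(sum(ps) - 1.0) < 1e-12           # property: sum p(x_i) = 1

def raw_moment(r):
    """m_r = sum x_i^r p(x_i)"""
    return sum(x**r * p for x, p in zip(xs, ps))

mu = raw_moment(1)                          # expectation

def central_moment(r):
    """M_r = sum (x_i - mu)^r p(x_i)"""
    return sum((x - mu)**r * p for x, p in zip(xs, ps))

var = central_moment(2)                     # variance
print(mu, var)                              # 3.5 and ~2.9167
```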

Probability distributions of discrete random variables

Binomial distribution

$$P(x) = \binom{n}{x} p^x q^{n-x}, \qquad \text{Mean} = np, \qquad \text{Sigma} = \sqrt{npq}$$

(see Excel example DD1-DistributionBinomial)

[Figure: Binomial distribution PDF and CDF for n = 25, p = 0.5 (Mean = 12.5, Sigma = 2.5).]

[Figure: Binomial distribution PDFs for n = 2, 5, 10, 20, 50 and p = 0.25, 0.50, 0.75.]
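The same numbers can be reproduced with SciPy instead of the cited Excel workbook; a minimal sketch for the plotted case n = 25, p = 0.5:

```python
# Minimal sketch: binomial distribution via scipy.stats for n=25, p=0.5.
from scipy.stats import binom

n, p = 25, 0.5
dist = binom(n, p)

print(dist.mean())     # np = 12.5
print(dist.std())      # sqrt(npq) = 2.5
print(dist.pmf(12))    # P(x = 12), PDF value near the mode
print(dist.cdf(15))    # cumulative probability P(x <= 15)
```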

Poisson distribution

$$P(x) = \frac{m^x}{x!}\, e^{-m}, \qquad m = np > 0$$

$$\text{Mean} = m, \qquad \text{Variance} = m, \qquad \text{Sigma} = \sqrt{m}, \qquad \text{COV} = 1/\sqrt{m}$$

(see Excel example DD2-DistributionPoisson)

[Figure: Poisson distribution – probability function and cumulative distribution function.]
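A minimal sketch of these properties, also illustrating the Poisson distribution as the limit of the binomial for large n and small p with m = np held fixed (the n = 1000 case here is an illustrative assumption):

```python
# Minimal sketch: Poisson moments and the binomial-to-Poisson limit.
from scipy.stats import binom, poisson

m = 5.0
n, p = 1000, m / 1000          # hypothetical large-n, small-p binomial

print(poisson(m).mean())        # m
print(poisson(m).var())         # variance = m as well
print(poisson(m).std())         # sqrt(m), hence COV = 1/sqrt(m)

# The two probability functions nearly coincide:
for x in (2, 5, 8):
    print(x, binom(n, p).pmf(x), poisson(m).pmf(x))
```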

Continuous random variables

A random variable is called continuous if it can assume all possible values in the possible range of the random variable. For a continuous random variable the value of the variable is never an exact point; it is always in the form of an interval, however small the interval may be.

Probability Density Function (PDF)

The probability function of a continuous random variable is called the probability density function, or briefly p.d.f. It is denoted by f(x); the product f(x)·Δx represents the probability that the random variable X takes a value between x and x + Δx, where Δx is a very small change in X.

Cumulative Distribution Function (CDF)

In terms of the probability density function, the cumulative distribution function is defined as:

$$CDF = F(x) = \int_{-\infty}^{x} f(u)\, du$$

(see Excel example DC3-DistributionNormal)

[Figure: Normal distribution PDF and CDF, Mean = 2, Sigma = 1.]

Moments of continuous random variables

$$m_r = \int_{-\infty}^{+\infty} x^r f(x)\, dx$$

$$M_r = \int_{-\infty}^{+\infty} (x - \mu)^r f(x)\, dx$$

$$\mu = \int_{-\infty}^{+\infty} x f(x)\, dx \quad \text{(expectation)}$$

$$\sigma^2 = V(x) = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\, dx \quad \text{(variance)}$$
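These integrals can be checked numerically. The sketch below (an illustration, not part of the course's Excel material) evaluates the expectation and variance integrals for the normal density of the example above (mean 2, sigma 1) with SciPy's quad:

```python
# Minimal sketch: moment integrals of a continuous random variable,
# evaluated numerically for a normal PDF with mean 2 and sigma 1.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = norm(loc=2.0, scale=1.0).pdf

mu = quad(lambda x: x * f(x), -np.inf, np.inf)[0]             # expectation
var = quad(lambda x: (x - mu)**2 * f(x), -np.inf, np.inf)[0]  # variance

print(mu, var)   # ~2.0 and ~1.0, as expected
```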

Probability distributions of continuous random variables

Uniform distribution

$$f(x) = \frac{1}{b-a}, \qquad F(x) = \frac{x-a}{b-a}$$

$$\mu = \frac{a+b}{2}, \qquad \sigma = \frac{b-a}{2\sqrt{3}}$$

(see Excel example DC6-DistributionUniform)

Simpson’s (triangular) distribution

For the symmetric triangular distribution on [a, b], with peak ordinate

$$f_b = \frac{2}{b-a}$$

at the midpoint, the rising branch ($a \leq x \leq (a+b)/2$) is

$$f(x) = f_b \cdot \frac{2(x-a)}{b-a}, \qquad F(x) = f_b \cdot \frac{(x-a)^2}{b-a}$$

$$\mu = \frac{a+b}{2}, \qquad \sigma = \frac{b-a}{2\sqrt{6}}$$

(see Excel example DC7-DistributionSimpson)

[Figure: Simpson's (triangular) distribution PDF and CDF.]

Normal distribution

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad F(x) = \Phi\!\left(\frac{x-\mu}{\sigma}\right)$$

Standard normal probability density and cumulative probability:

$$\varphi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2}, \qquad \Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-\frac{1}{2}t^2}\, dt, \qquad u = \frac{x-\mu}{\sigma}, \qquad \sigma > 0$$

(see Excel example DC3-DistributionNormal)

[Figure: Normal distribution PDF and CDF, Mean = 2, Sigma = 1.]
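As a quick check of the standardization F(x) = Φ((x − μ)/σ), the following sketch (hypothetical point x = 3.5, with the same mean and sigma as the plotted example) evaluates both sides with scipy.stats.norm:

```python
# Minimal sketch: F(x) of a general normal equals Phi of the standardized u.
from scipy.stats import norm

mu, sigma = 2.0, 1.0
x = 3.5                                    # hypothetical evaluation point

lhs = norm(loc=mu, scale=sigma).cdf(x)     # F(x) of the general normal
rhs = norm().cdf((x - mu) / sigma)         # Phi(u) of the standard normal
print(lhs, rhs)                            # identical: ~0.9332
```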

Lognormal distribution

$$f(x) = \frac{1}{x\,\sigma_y \sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu_y}{\sigma_y}\right)^2}, \qquad F(x) = \Phi\!\left(\frac{\ln x - \mu_y}{\sigma_y}\right), \qquad u = \frac{\ln x - \mu_y}{\sigma_y}$$

$$\sigma_y^2 = \ln\!\left[1 + \sigma_x^2/\mu_x^2\right], \qquad \mu_y = \ln\!\left[\mu_x^2 \big/ \sqrt{\sigma_x^2 + \mu_x^2}\right] = \ln \mu_x - \frac{\sigma_y^2}{2}$$

$$\mu_x = e^{\mu_y + \frac{\sigma_y^2}{2}}, \qquad \sigma_x = e^{\mu_y + \frac{\sigma_y^2}{2}} \sqrt{e^{\sigma_y^2} - 1}$$

(see Excel example DC4-DistributionLogNormal)

[Figure: Lognormal distribution PDF and CDF, Mean-x = 8, Sigma-x = 5 (Mean-y = 1.91, Sigma-y = 0.547).]

(see Excel example DC4-DistributionLogNormal-MildSteel)

[Figure: Lognormal distribution of the yield stress of mild shipbuilding steel, Mean-x = 268, Sigma-x = 30.5 (Mean-y = 5.58, Sigma-y = 0.114).]
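The moment conversion above is easy to script. A minimal sketch, applied to the mild-steel yield stress example (Mean-x = 268, Sigma-x = 30.5) and cross-checked against scipy.stats.lognorm, whose s and scale parameters correspond to σy and exp(μy):

```python
# Minimal sketch: converting (mu_x, sigma_x) of a lognormal variable to the
# underlying normal parameters (mu_y, sigma_y), and back via scipy.stats.
import math
from scipy.stats import lognorm

mu_x, sigma_x = 268.0, 30.5            # yield stress example from the figure

sigma_y = math.sqrt(math.log(1.0 + (sigma_x / mu_x) ** 2))
mu_y = math.log(mu_x) - 0.5 * sigma_y ** 2
print(mu_y, sigma_y)                   # ~5.58 and ~0.114, as in the figure

dist = lognorm(s=sigma_y, scale=math.exp(mu_y))
print(dist.mean(), dist.std())         # recovers ~268 and ~30.5
```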

Shifted exponential

$$f(x) = \lambda e^{-\lambda(x - x_0)}, \qquad F(x) = 1 - e^{-\lambda(x - x_0)}, \qquad \lambda > 0$$

$$\mu = x_0 + \frac{1}{\lambda}, \qquad \sigma = \frac{1}{\lambda}, \qquad \lambda = \frac{1}{\mu - x_0}$$

Gamma distribution

$$f(x) = \frac{\lambda(\lambda x)^{k-1}}{\Gamma(k)}\, e^{-\lambda x}, \qquad F(x) = \frac{\Gamma(k, \lambda x)}{\Gamma(k)}, \qquad \lambda > 0, \quad k > 0$$

$$\Gamma(k) = \int_0^{\infty} e^{-u}\, u^{k-1}\, du \quad \text{(gamma function)}; \qquad \Gamma(k+1) = k\,\Gamma(k), \quad \Gamma(k) = (k-1)! \ \text{for integer } k$$

$$\Gamma(k, x) = \int_0^{x} e^{-u}\, u^{k-1}\, du \quad \text{(incomplete gamma function)}$$

$$\mu = \frac{k}{\lambda}, \qquad \sigma = \frac{\sqrt{k}}{\lambda}, \qquad \lambda = \frac{\mu}{\sigma^2}, \qquad (\lambda\sigma)^2 = k$$

With the scale parameter θ = 1/λ the distribution may equivalently be written as:

$$f(x) = \frac{x^{k-1}}{\theta^k\, \Gamma(k)}\, e^{-x/\theta}, \qquad F(x) = \frac{\gamma(k, x/\theta)}{\Gamma(k)}$$

$$\mu = \theta k, \qquad \sigma = \theta\sqrt{k}, \qquad k = \left(\frac{\mu}{\sigma}\right)^2, \qquad \theta = \frac{\sigma^2}{\mu}$$

(see Excel example DC9-DistributionGamma)

[Figure: Gamma distribution PDF and CDF.]

Shifted Rayleigh

$$f(x) = \frac{x - x_0}{\alpha^2}\, e^{-\frac{1}{2}\left(\frac{x - x_0}{\alpha}\right)^2}, \qquad F(x) = 1 - e^{-\frac{1}{2}\left(\frac{x - x_0}{\alpha}\right)^2}$$

$$\mu = x_0 + \alpha\sqrt{\frac{\pi}{2}}, \qquad \sigma = \alpha\sqrt{2 - \frac{\pi}{2}}$$

Type I Largest value (Gumbel)

$$f(x) = \alpha_n\, e^{-\alpha_n(x - u_n)}\, e^{-e^{-\alpha_n(x - u_n)}}, \qquad F(x) = e^{-e^{-\alpha_n(x - u_n)}}, \qquad \alpha_n > 0$$

$$\mu = u_n + \frac{0.5772}{\alpha_n}, \qquad \sigma = \frac{\pi}{\alpha_n\sqrt{6}}$$

$$u_n = \mu - \frac{0.5772}{\alpha_n}, \qquad \alpha_n = \frac{\pi}{\sigma\sqrt{6}}$$

(see Excel example DC5-DistributionGumbel)

[Figure: Type I Largest value (Gumbel) distribution PDF and CDF for Mean = 100, Sigma = 20.]

Type III Smallest values (for ε = 0 it is known as the Weibull distribution)

$$f(x) = \frac{k}{u_1}\left(\frac{x}{u_1}\right)^{k-1} e^{-\left(\frac{x}{u_1}\right)^k}, \qquad F(x) = 1 - e^{-\left(\frac{x}{u_1}\right)^k}, \qquad k > 0$$

$$\mu = u_1 \cdot \Gamma\!\left(1 + \frac{1}{k}\right), \qquad \sigma = u_1 \cdot \sqrt{\Gamma\!\left(1 + \frac{2}{k}\right) - \Gamma^2\!\left(1 + \frac{1}{k}\right)}$$

Type III Smallest values with lower bound ε

$$f(x) = \frac{k}{u_1 - \varepsilon}\left(\frac{x - \varepsilon}{u_1 - \varepsilon}\right)^{k-1} e^{-\left(\frac{x - \varepsilon}{u_1 - \varepsilon}\right)^k}, \qquad F(x) = 1 - e^{-\left(\frac{x - \varepsilon}{u_1 - \varepsilon}\right)^k}, \qquad k > 0$$

$$\mu = \varepsilon + (u_1 - \varepsilon)\,\Gamma\!\left(1 + \frac{1}{k}\right), \qquad \sigma = (u_1 - \varepsilon)\sqrt{\Gamma\!\left(1 + \frac{2}{k}\right) - \Gamma^2\!\left(1 + \frac{1}{k}\right)}$$

Beta distribution

$$f(x) = \frac{(x-a)^{q-1}(b-x)^{r-1}}{B(q,r)\,(b-a)^{q+r-1}}, \qquad q > 0, \quad r > 0$$

$$B(q, r) = \frac{\Gamma(q)\,\Gamma(r)}{\Gamma(q+r)} \quad \text{(beta function)}$$

$$\mu = a + \frac{q(b-a)}{q+r}, \qquad \sigma = \frac{b-a}{q+r}\sqrt{\frac{qr}{q+r+1}}$$

Type I Smallest values distribution

$$f(x) = \alpha_1\, e^{\alpha_1(x - u_1)}\, e^{-e^{\alpha_1(x - u_1)}}, \qquad F(x) = 1 - e^{-e^{\alpha_1(x - u_1)}}, \qquad \alpha_1 > 0$$

$$\mu = u_1 - \frac{0.5772}{\alpha_1}, \qquad \sigma = \frac{\pi}{\alpha_1\sqrt{6}}$$

Type II Largest value

$$f(x) = \frac{k}{u_0}\left(\frac{u_0}{x}\right)^{k+1} e^{-\left(\frac{u_0}{x}\right)^k}, \qquad F(x) = e^{-\left(\frac{u_0}{x}\right)^k}, \qquad k > 0$$

$$\mu = u_0 \cdot \Gamma\!\left(1 - \frac{1}{k}\right), \qquad \sigma = u_0 \cdot \sqrt{\Gamma\!\left(1 - \frac{2}{k}\right) - \Gamma^2\!\left(1 - \frac{1}{k}\right)}$$

Combinations of random variables

For linear combinations

$$Y = a_1 X_1 + a_2 X_2 + \dots + a_k X_k$$

of random variables $X_1, X_2, \dots, X_k$ with given arithmetic means $\mu_1, \mu_2, \dots, \mu_k$ and variances $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2$:

Theorem 1: The mean value of the linear combination of random variables is the sum of the mean values of the components:

$$E(a_1 X_1 + a_2 X_2 + \dots + a_k X_k) = a_1 E(X_1) + a_2 E(X_2) + \dots + a_k E(X_k)$$

$$\mu = a_1\mu_1 + a_2\mu_2 + \dots + a_k\mu_k$$

Theorem 2: The variance of the linear combination of mutually independent random variables is the sum of the variances of the components:

$$\sigma^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \dots + a_k^2\sigma_k^2$$
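Both theorems are easy to verify by simulation. A minimal sketch with a hypothetical combination Y = 2X1 + 3X2 of independent normal and uniform variables:

```python
# Minimal sketch: Monte Carlo check of the two theorems for Y = 2*X1 + 3*X2
# with independent X1 ~ Normal(5, 1) and X2 ~ Uniform(0, 6).
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
a1, a2 = 2.0, 3.0

x1 = rng.normal(5.0, 1.0, n)        # mu1 = 5, sigma1^2 = 1
x2 = rng.uniform(0.0, 6.0, n)       # mu2 = 3, sigma2^2 = 36/12 = 3
y = a1 * x1 + a2 * x2

print(y.mean())                     # Theorem 1: 2*5 + 3*3 = 19
print(y.var())                      # Theorem 2: 4*1 + 9*3 = 31
```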