Quick viewing(Text Mode)

Medical Statistics II. Sample Characteristics

Medical Statistics II. Sample Characteristics

MedicalMedical StatisticsStatistics II.II. SampleSample CharacteristicsCharacteristics

Assoc. Prof. Katarína Kozlíková, RN., PhD. IMPhBPhITM FM CU in Bratislava [email protected] ContentContent

 Introduction  Sample Characteristics  Measures of Location   Structural Characteristics  Measures of Variability  Connected with Averages  Connected with Structural Characteristics  Measures of Shape  Skewness  Excess Kurtosis  Literature

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 2 IntroductionIntroduction

 The aim of this lecture is to familiarize students with the sample characteristics that are most commonly used in the evaluation of medical experiments and occur in medical scientific and professional literature  It follows up the lecture Medical Statistics I – Basic Terminology and uses the terminology explained there  To practice the lectured topics and for further study is recommended to use the literature from the list of literature

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 3 SampleSample CharacteristicsCharacteristics

 A descriptive characteristics  A number calculated from a statistical set in a well defined way (using a formula)  The aim of descriptive characteristics  Numbers obtained in this way are used to characterise a set consisting of many numbers by only a few numbers   A part of statistics that deals with descriptive characteristics  Sample characteristics  Descriptive characteristics applied to a sample  There exist three groups of descriptive statistics  Measures of location  Measures of variability  Measures of shape

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 4 MeasuresMeasures ofof LocationLocation MeasuresMeasures ofof LocationLocation -- IntroductionIntroduction

 In the most common case, the obtained data are organised as a set of numbers  Quantitative data are usually clustered around a  They are organised according some (statistically defined) distribution  A central tendency  A single number intended to typify the numbers in the set  Measures of location are used for estimation of location of this distribution, or its „middle“, that is, where on the (numerical) axis – on the scale is the position of the central tendency

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 6 MeasuresMeasures ofof LocationLocation –– CentralCentral TendencyTendency (1)(1)

 A central tendency can be estimated in following way  If all the numbers in the list are the same, then this number should be used  If the numbers are not the same, the is calculated by combining the numbers from the list in a specific way and computing a single number as being the average of the list  The method of calculation depends on the distribution of data in the set  In this lecture, we assume a of data generally  A single number is then used for consecutive calculations

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 7 MeasuresMeasures ofof LocationLocation –– CentralCentral TendencyTendency (2)(2)

 A central tendency is a measure of the "middle" or "typical" value of a data set  Several types of averages are used according to what kind of data are represented by the numbers  (averages)  Median  Mode  Quantiles

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 8 MeasuresMeasures ofof LocationLocation -- AveragesAverages

 Take into account the values of all measurements of the quantity  Used for normally distributed quantitative data  The most used means  Arithmetic (weighted)  Geometric mean (weighted)  (weighted)  Weighted means are used for very large samples, for sorted data (data in classes), for data of different „weights“

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 9 ArithmeticArithmetic MeanMean (Mean(Mean ))

 The best estimate of the measured quantity X from n measurements  Calculated as the sum of all measured values and then divided by the number of the measurements

x + x + x + ... + x 1 n = 1 2 3 n or another x = ⋅ x x notation ∑ i n n i =1

Used nomenclature: x : average () n : sample size th Σ xi : i measurement of the quantity X : summation sign i : measurement number (i = 1, 2, ..., n)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 10 ArithmeticArithmetic MeanMean -- PropertiesProperties

 The sum of differences of all values from the arithmetic mean equals zero  Suitable to check the calculation  The sum of squared differences of all values from the arithmetic mean is less then the sum of squared differences of all values from any other value  Used for calculations of and  An arithmetic mean of arithmetic means of subsets of a set is not the arithmetic mean of the whole set  In such case, the has to be used

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 11 WeightedWeighted ArithmeticArithmetic MeanMean (Weighted(Weighted Mean)Mean)

 If different „weights“ (frequencies) of input data have to be considered k ()n ⋅ x n ⋅ x + n ⋅ x + ... + n ⋅ x ∑ j j = 1 1 2 2 k k or another j =1 x notation x = + + + k n1 n2 ... nk ∑ nj j =1 where k n + n + ... + n = n or another n = n 1 2 k notation ∑ j j =1 Used nomenclature: x : mean (arithmetic, weighted) n : sample size th xj : midpoint of the j class k : number of classes th Σ nj : frequency („weight”) of the j class : summation sign j : class number (j = 1, 2, ..., k)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 12 GeometricGeometric MeanMean

 Suitable for data that can be expressed using coefficients (for example, mean increase, percentage, area, volume – positive values of the measured quantity having an absolute zero)  Obtained by multiplying n positive numbers and then taking th the n root 1  n  n = n ⋅ ⋅ ⋅ ⋅ = xG x1 x2 x3 ... xn or another notation xG ∏ xi   i =1  1 n ⋅ ∑ logz ()xi = n i=1 or a notation using logarithms xG z Used nomenclature: Π xG : geometric mean : multiplication sign th Σ xi : i measurement of the quantity X : summation sign i : measurement number (i = 1, 2, ..., n) z : basis of logarithm n : sample size © Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 13 ComparisonComparison ooff tthehe ArithmeticArithmetic MeanMean aandnd tthehe GeometricGeometric MeanMean

Sum of individual items Sum of means Product of individual items Product of means

x + x + x x = 1 2 3 = x = 3 x ⋅ x ⋅ x = 3 G 1 2 3 + + = 3 ⋅ ⋅ = = 4 18 3 = 25 ≅ 4 18 3 3 3 = 3 216 = 6 ≅ 8.3

Figure 1: An example of an arithmetic mean (left) and a geometric mean (right) calculated from the same data (4; 18; 3).

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 14 WeightedWeighted GeometricGeometric MeanMean

 If different „weights“ (frequencies) of input data have to be considered

k n = n n1 ⋅ n2 ⋅ ⋅ nk or another = j xG x1 x2 ... xk notation xG n ∏ x j j =1

where k n + n + ... + n = n or another n = n 1 2 k notation ∑ j j =1 Used nomenclature:

xG : geometric mean n : sample size th xj : midpoint of the j class k : number of classes th Σ nj : frequency („weight”) of the i class : summation sign j : class number (j = 1, 2, ..., k)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 15 HarmonicHarmonic MeanMean

 Appropriate for situations when the average of rates or ratios is desired (for example, given effort at open time consumption, parallel connection of electrical resistances)  Calculated as the reciprocal of the arithmetic mean of the reciprocals of all measured values −1 n n = or another  1 −1  xH =  ⋅  1 1 1 1 notation xH  ∑ xi  + + + ... +  n i =1  x1 x2 x3 xn

Used nomenclature: : harmonic mean n : sample size xH th Σ xi : i measurement of the quantity X : summation sign i : measurement number (i = 1, 2, ..., n)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 16 WeightedWeighted HarmonicHarmonic MeanMean

 If different „weights“ (frequencies) of input

data have to be considered k + + + + n = n1 n2 n3 ... nk ∑ j xH or another j =1 n1 n2 n3 nk notation x = + + + ... + H k x x x x nj 1 2 3 k ∑ j =1 x j where k n + n + ... + n = n or another n = n 1 2 k notation ∑ j j =1 Used nomenclature:

xH : harmonic mean n : sample size th xj : midpoint of the j class k : number of classes th Σ nj : frequency („weight”) of the j class : summation sign j : class number (j = 1, 2, ..., k)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 17 MeasuresMeasures ofof LocationLocation –– StructuralStructural CharacteristicsCharacteristics

 They are based on distribution of elements in an ordered set , that is, they are based on the ranks of the measurements , and not on the values of individual measurements  The most used structural characteristics:  Median  Mode  Quantiles:  Quartiles  Deciles  Percentiles

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 18 MedianMedian

 Divides an ordered sample into two equally sized parts (with the same probability 0.50)  Is calculated according to the below mentioned formulas depending on whether the sample size is odd or even

+ xn xn +1 ~ = ~ = 2 2 Odd n:x xn+1 Even n: x 2 2

Used nomenclature: ~x : median n : sample size th xn : n measurement of the quantity X in the sorted sample

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 19 MedianMedian -- ExamplesExamples

Number of patients for different doctors in the city XY Number of patients for different doctors in the city XZ Number of patients Number of patients 45 45 40 40 35 35 30 30 25 25 20 20 15 15 10 10 5 5 0 0 ABCDEFGHIJKDoctor ABCDEFGHIJDoctor

Figure 2: Examples of medians. In the case of odd number of elements in an ordered sample (the graph on the left), the median equals the „middle“ value (value of the item F). In the case of even number of elements in an ordered sample (the graph on the right), the median is calculated from the two „middle“ values (average of values of the items E and F).

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 20 MedianMedian -- SortedSorted SamplesSamples (can(can bebe omittedomitted withoutwithout lossloss ofof continuitycontinuity ))

 In a sorted and ordered sample, the median interval (the median class, in which the median is) is established, from which the median is interpolated ~ h  n  ~ = ~ + ⋅  − cum  x xL ~ n()−1 n  2 

Used nomenclature: ~ ~x : median h : width of the median interval ~ : lower end of the median interval xL ~ n : frequency of the median interval n : sample size cum : cumulative frequency of the interval before the median interval n()−1

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 21 ModeMode

 The most frequent value in the sample ˆx

Number of outstanding package of different drugs in one day Number of packages 45 40 35 30 25 20 15 10 5 0 ABCDEFGHIJDrug

Figure 3: An example of a mode. The mode is represented by the item G.

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 22 ModeMode –– SortedSorted SampleSample (can be omitted without loss of continuity )

 In an ordered sample, the modal interval (the class with the highest frequency) is found, from which the mode can be interpolated in any of the two ways − nˆ(+1) nˆ nˆ(−1) xˆ = xˆ + hˆ ⋅ or xˆ = xˆ + hˆ ⋅ L + L − − nˆ(−1) nˆ(+1) 2nˆ nˆ(−1) nˆ(+1)

Used nomenclature: xˆ : mode hˆ : length of the modal interval

xˆL: lower end of the modal interval nˆ : frequency of the modal interval

nˆ()−1 : frequency of the interval before the modal interval

nˆ()+1 : frequency of the interval after the modal interval

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 23 SymmetricSymmetric DistributionDistribution

 The mean, the median, and the mode have the same value ~ = = Age distribution of patients with influenza x x ˆx Number of patients 40

30 Histogram Polygon 20 x ≅ 43 ~x ≅ 43 10 xˆ ≅ 43

0 < 5 5 - 10 10 - 15 - 20 - 25 - 30 - 35 - 40 - 45 - 50 - 55 - 60 - 65 - 70 - 75 - > 80 15 20 25 30 35 40 45 50 55 60 65 70 75 80 Age [y]

Figure 4: A histogram and a polygon of a symmetric distribution. © Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 24 AsymmetricAsymmetric DistributionDistribution LeftLeft--SidedSided

 At the numerical axis, the mode is at most to the left (has the highest count, but a low value), the mean is at most to the right (has a high value) and the median is in the middle Age distribution of patients with allergy Number of patients ~ 60 < < 50 Histogram ˆx x x Polygon 40

30 xˆ ≅ 22.6 ~ 20 x ≅ 32.9 10 x ≅ 35.4

0 < 5 5 - 10 - 15 - 20 - 25 - 30 - 35 - 40 - 45 - 50 - 55 - 60 - 65 - 70 - 75 - > 80 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 Age [y] Figure 5: A histogram and a polygon of an asymmetric leftsided distribution. © Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 25 AsymmetricAsymmetric DistributionDistribution RightRight--SidedSided

 At the numerical axis, the mean is at most to the left (has a low value), the mode is at most to the right (has the highest count and a high value) and the median is in the middle

Age distribution of patients with osteoporosis Number of patients x < ~x < xˆ 60 50 Histogram Polygon

40

30 x ≅ 53.9 20 ~x ≅ 56.8 10 xˆ ≅ 62.3 0 < 5 5 - 10 - 15 - 20 - 25 - 30 - 35 - 40 - 45 - 50 - 55 - 60 - 65 - 70 - 75 - > 80 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 Age [y] Figure 6: A histogram and a polygon of an asymmetric rightsided distribution. © Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 26 ApproximateApproximate RelationRelation BetweenBetween Mode,Mode, Mean,Mean, andand MedianMedian

 The distance between the mean and the mode is approximately three times larger than is the distance between the median and the mean

ˆx − x ≅ 3 ⋅ (~x − x )

 The relation can be used to estimate the mode or to check the correctness of its calculation  The mean and the median can usually be calculated more easily

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 27 QuantileQuantile

(m)  Quantile K j divides the of a into m parts  m denotes the number of parts of the set  j is the order number of the quantile  j = 0, 1, 2, ...., m1, m  Each part has the probability 1/m  Quantile value indicates how much of the set has a value less than or equal to the quantile value  Quantile – special names: (m)  Lower j = 1: K1 (m)  Upper j = m 1: Km−1

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 28 CalculationCalculation ooff tthehe QuantileQuantile (can(can bebe omittedomitted withoutwithout lossloss ofof continuitycontinuity )) + (m) = xk xl Kj 2 ⋅ ( + ) ⋅ =  j n 1  =  j n k   and l   where  m   m 

⋅ ( + ) ⋅ ( + ) ⋅ ⋅  j n 1  = j n 1  j n = j n   and   if  m  m  m  m

 j ⋅ (n+1)  j ⋅ n and = + else k =   +1 l   1  m   m  Used nomenclature: n : sample size  : floor function (rounds a number down to the next integer) th xk : k ordered measurement of the quantity X th xl : l ordered measurement of the quantity X © Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 29 TheThe MostlyMostly UsedUsed QuantilesQuantiles

 Quartile Qj  Divides an ordered sample into m = 4 parts, all with probability 0.25

 Sextile Sj  Divides an ordered sample into m = 6 parts, all with probability 1/6

 Decile Dj  Divides an ordered sample into m = 10 parts, all with probability 0.10

 Percentile Pj  Divides an ordered sample into m = 100 parts, all with probability 0.01

Examples = = = = ~ Q1 P25 Q2 D5 P50 x = Q3 P75 © Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 30 MeasuresMeasures ofof VariabilityVariability MeasuresMeasures ofof VariabilityVariability –– IntroductionIntroduction (1)(1)

 Statistical data as numeric variables are always various (they vary, have different values )  A small degree of variability means a small mutual diversity of values of a given variable (the values are very similar)  In this case, the central tendencies such as mean, median or mode, are good characteristics of the general size of the value of the variable in the sample  High variability means a large mutual difference among values of a given variable  In this case, the calculated parameters are good characteristics of the general size of the value of the variable in the sample

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 32 MeasuresMeasures ofof VariabilityVariability –– IntroductionIntroduction (2)(2)

 Characteristics of the middle of the set (the central tendencies )  Only indicate the location information of a statistical sample on numerical axis  They do not inform  How are the values scattered around the centre in the set  Whether there are any outliers in the sample  Measures of variability (variability characteristics)  Reflect the distribution of values of the variable around the mean value of the set

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 33 MeasuresMeasures ofof VariabilityVariability ConnectedConnected withwith AveragesAverages

 Take into account the values of the measurements of the quantity X  The mostly used measures:  Standard deviation  Mean deviation (Average error)  of the mean  Coefficient of variation  Dispersion coefficient

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 34 VarianceVariance (Sample(Sample Variance)Variance)

 Characterizes the scatter of measurements around the sample mean  Calculated as sum of squared deviations of random variables in a random sample from their sample mean divided by the number of terms in the sum minus one n − 2 ∑ (xi x ) s2 = i =1 x n − 1 Used nomenclature: 2 sx : sample variance n : sample size x : average (arithmetic mean) n1 : degrees of freedom th Σ xi : i measurement of the quantity X : summation sign i : measurement number (i = 1, 2, ..., n)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 35 DoDo NotNot ConfuseConfuse thethe SampleSample VarianceVariance withwith thethe PopulationPopulation Variance!Variance!

 Population variance

N N − µ 2 ∑(xi ) ∑ xi 2 i =1 i =1 σ = where µ = x N N

Used nomenclature: σ2 x : population variance N : population size µ : true value of the mean Σ : summation sign th xi : i measurement of the quantity X i : measurement number (i = 1, 2, ..., n)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 36 WeightedWeighted VarianceVariance (Sample(Sample Variance)Variance)

 Characterizes the dispersion of all measurements around the weighted arithmetic mean of the sample  It is calculated analogously to the "normal" mean, but takes into account the frequency of each class k − 2 ⋅ ∑(x j x) nj 2 = s = k 1 where n + n + ... + n = n x n − 1 1 2 k Used nomenclature: 2 sx : sample variance x :mean (weighted) th xj : midpoint of the j class n : sample size th nj : frequency („weight”) of the j class k : number of classes j : class number (j = 1, 2, ..., k) Σ : summation sign

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 37 StandardStandard DeviationDeviation (Sample(Sample StandardStandard Deviation)Deviation)

 Characterizes the scatter of measurements around the sample mean  Calculated as the square root of the sample variance n − 2 ∑ (xi x ) i =1 or another = 2 s = expression sx sx x n − 1

Used nomenclature: 2 sx: sample standard deviationsx :sample variance x : average (arithmetic mean) n : sample size th xi : i measurement of the quantity X n1 : degrees of freedom i : measurement number (i = 1, 2, ..., n) Σ : summation sign

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 38 DoDo NotNot ConfuseConfuse TheThe SampleSample StandardStandard DeviationDeviation WithWith TheThe PopulationPopulation StandardStandard Deviation!Deviation!

 The square root of the variance

N N (x − µ)2 ∑ i ∑ xi = σ = i 1 i =1 x where µ = N N

Used nomenclature: σ x: population standard deviation N : population size µ : true value of the mean Σ : summation sign th xi : i measurement of the quantity X i : measurement number (i = 1, 2, ..., n)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 39 WeightedWeighted StandardStandard DeviationDeviation (Sample(Sample StandardStandard DeviationDeviation ))

 Characterizes the dispersion of all measurements around the weighted arithmetic mean of the sample  It is calculated analogously to the "normal" mean, but takes into account the frequency of each class k − 2 ⋅ ∑(x j x) nj = s = k 1 where n + n + ... + n = n x n − 1 1 2 k Used nomenclature:

s x: sample standard deviationx :mean (weighted) th xj : midpoint of the j class n : sample size th nj : frequency („weight”) of the j class k : number of classes j : class number (j = 1, 2, ..., k) Σ : summation sign

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 40 MeanMean DeviationDeviation (Average(Average Error)Error)

n  Mean absolute deviation − ∑ xi x s = i =1 abs n

 Can be calculated also

from median or from mode n n ~ − x − x ∑ xi xˆ ∑ i = = = i 1 s = i 1 sabs abs n n

Used nomenclature:

Sabs : mean deviation n : sample size x : average (arithmetic mean) Σ : summation sign ~x : medianxˆ : mode th xi : i measurement of the quantity X i : measurement number (i = 1, 2, ..., n)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 41 StandardStandard ErrorError ooff tthehe MeanMean (Error(Error iinn tthehe Mean)Mean)

 Used for comparison of sample means (from the same population) n (x − x )2 s ∑ i = x i =1 sex or se = n x (n − 1) ⋅ n

Used nomenclature:

sx: sample standard deviationsex : sample standard error x : average (arithmetic mean) n : sample size th Σ xi : i measurement of the quantity X : summation sign i : measurement number (i = 1, 2, ..., n)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 42 WeightedWeighted ErrorError inin thethe MeanMean

 Used for comparison of sample means (from the same population)

k − 2 ⋅ ∑(x j x) nj k se = j =1 n = n x − ⋅ where ∑ j (n 1) n j =1

Used nomenclature: : standard error of the meanx :mean (weighted) sex th xj : midpoint of the j class n : sample size th nj : frequency („weight”) of the j class k : number of classes j : class number (j = 1, 2, ..., k) Σ : summation sign

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 43 CoefficientCoefficient ooff VariationVariation (1)(1) (Sample(Sample CoefficientCoefficient ooff Variation)Variation)

 It is a characteristic of variability (probability distribution) of a random variable  In general, it is defined as the ratio of the standard deviation and the (absolute) value of the mean (average) s V = x ⋅100% x x

Used nomenclature:

Vx : variability coefficientsx : sample standard deviation x : average (arithmetic mean) n : sample size

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 44 CoefficientCoefficient ofof VariationVariation ((22))

 Is a relative measure of variability  Is not affected by the absolute values of the character  Is used  For a mutual comparison of variability of two or more sets with significantly different levels of values (for example, mass in grams and kilograms)  For statistical control (laboratory results)

x © Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 45 CoefficientCoefficient ofof DispersionDispersion (can(can bebe omittedomitted withoutwithout lossloss ofof continuitycontinuity ))

 Is a relative measure of variability (probability distribution) of a random variable  In general, it is defined as the mean deviation calculated from the median and the (absolute) value of the median n 1 ⋅ − ~ ∑ xi x n = d = i 1 x ~x

Used nomenclature: ~ dx : coefficient of dispersion x : median th xi : value of the i measurement n : sample size

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 46 MeasuresMeasures ofof VariabilityVariability ConnectedConnected withwith StructuralStructural CharacteristicsCharacteristics

 Take into account the ranks of the measurements of the quantity X in an ordered sample  The mostly used measures  Ranges  Range (Sample range)  Interquantile range  Interquartile range  Interquartile deviation

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 47 RangeRange (Sample(Sample Range)Range)

 The difference between the highest value (maximum) and the lowest value (minimum) of the sample  The largest ordered statistics minus the smallest ordered statistics = − R xmax xmin

Used nomenclature: R : range

xmax : the maximal value of the measured quantity X xmin : the minimal value of the measured quantity X

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 48 InterInter--QuantileQuantile RangeRange

 The difference between the upper and the lower quantile = (m) − (m) RK Km−1 K1

 A nonparametric measure of variability  It does not take into account the size of all values file  Excludes the impact of outliers  In the case of deciles, in this interval would lie 80% of values (D9 – D1)  In the case of percentiles, in this interval would lie 98% of values (P99 – P1)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 49 InterInter--QuartileQuartile RangeRange

 The difference between the upper quartile and the lower quartile = − RQ Q3 Q1

 A nonparametric measure of variability  A special case of the interquantile range  The interval, in which lies the half of the sample set  It is often used with the median  Is often used to find outliers in the sample

 Outliers are observations that fall below Q1 1.5 ∙ RQ or above Q3 + 1.5 ∙ RQ

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 50 InterInter--QuartileQuartile RangeRange –– ExampleExample

Figure 7: Box plot (plot at the top) with an

interquartile range (R Q = IQR = Q3 –Q1) and a probability density function of a normal N(0,σ2) population

Available at: http://en.wikipedia.org/wiki/Interquartile_range [17. 9. 2013]

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 51 QuantileQuantile DeviationDeviation

 Determined as an average of positive deviations of neighbouring quantiles (m) − (m) Km−1 K1  After editing, the formula is R ()m = K m − 2

 Measure of variability, which is neither affected by extremely high nor by extremely low values  It takes into account the variability of intermediate values  In the case of deciles, variability − of 80% of values = D9 D1 RD 8  In the case of percentiles, variability 8 of 98% of values P − P R = 99 1 P 98 98

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 52 QuartileQuartile DeviationDeviation

 Determined as an average of positive deviations of neighbouring quartiles Q − Q R = 3 1 Q 2 2

 It is a measure of the spread through the middle half of a distribution  It is neither influenced by extremely high nor extremely low scores  Quartile deviation is an ordinal statistic and is most often used in conjunction with the median

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 53 MeasuresMeasures ofof ShapeShape SkewnessSkewness (1)(1)

 Skewness measures the direction and degree of asymmetry of the distribution of a random variable  There are more measures of skewness, the most used is the coefficient of skewness  For the selection of the normal distribution, the skewness is calculated according to one of the equivalent formulas

n n 1 3 ⋅ ()− ⋅ ()− 3 ∑ xi x n ∑ xi x n = As = i 1 or As = i =1 3 3 sx  n  ()− 2 ∑ xi x   i =1 

 It is defined only for sx ≠ 0 and n ≥ 3

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 55 SkewnessSkewness (2)(2)

 If the calculated skewness  Equals zero ( As = 0)  It characterizes a symmetric distribution (fig. 2 and 7)  Is positive ( As > 0)  It characterizes right sided skewness, that is, left sided asymmetric distribution, when the mean is greater than the median (fig. 3)  Is negative ( As < 0)  It characterizes left sided skewness, that is, right sided asymmetric distribution, when the mean is smaller than median (fig. 4)

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 56 ExcessExcess KurtosisKurtosis (1)(1)

 Kurtosis characterizes the „peakedness“ (flatness or sharpness) of a distribution  There are more measures of kurtosis, the mostly used is the excess kurtosis  For the sample of the normal distribution, the excess kurtosis is calculated according to one of the equivalent formulas

n n 4 1 4 ⋅ ()− ⋅ ()− n ∑ xi x ∑ xi x i =1 n i =1 or = − Exc = − 3 Exc 2 3 4  n  sx ()− 2 ∑ xi x   i =1 

 Ii is defined only for sx ≠ 0 and n ≥ 4

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 57 ExcessExcess KurtosisKurtosis (2)(2)

 If the calculated excess kurtosis  Equals zero ( Exc = 0)  It characterizes a normal distribution (fig. 7)  Is positive ( Exc > 0)  It characterizes a more peaked distribution (with a sharp peak) than the normal distribution, therefore, more values are concentrated around the central tendency, less values are at the tails of the distribution  Is negative ( Exc < 0)  It characterizes a less peaked distribution (with a rounded peak) than the normal distribution, therefore, less values are concentrated around the central tendency, more values are at the tails of the distribution

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 58 LiteratureLiterature

 HAWKINS, D. Biomeasurement: Understanding, Analysing and Communicating Data in the Biosciences. Oxford University Press : New York, 2007. 300 p. ISBN 9780199265152.  CHAJDIAK, J. Štatistika jednoducho. Statis : Bratislava, 2003. 88 s. ISBN 80 8565928.  KOZLÍKOVÁ, K. Základy spracovania biomedicínskych meraní I. Asklepios : Bratislava, 2003. 88 s. ISBN 8071670642.  KOZLÍKOVÁ, K. Lekárska štatistika I. – Základné pojmy. [Cit. 1. 10. 2012]. Dostupné na internete: http://portal.fmed.uniba.sk/  KOZLÍKOVÁ, K., MARTINKA, J. Základy spracovania biomedicínskych meraní II. Asklepios : Bratislava, 2009. 204 s. ISBN 9788071761374.  KOZLÍKOVÁ, K., MARTINKA, J. Theory And Tasks For Practicals On Medical Biophysics. Librix : Brno, 2010. 248 p. ISBN 9788073998811.  KOZLÍKOVÁ, K., MARTINKA, J., KNEZOVIĆ, R. Otázky na overenie znalostí z vybraných kapitol lekárskej fyziky a biofyziky. EQUILIBRIA : Košice, 2013. 174 p. ISBN 9788081431050.  STN ISO 35341. Štatistika. Slovník a značky. Časť 1: Všeobecné štatistické termíny a termíny používané v teórii pravdepodobnosti. Slovenský ústav technickej normalizácie : Bratislava, 2008. 90 s.

Comment: If not stated else, the author of the text is the author of the figures and graphs as well.

© Katarína Kozlíková, 2013 2013/14 Medical Statistics II Sample characteristcs 59