Medical Statistics II. Sample Characteristics
Assoc. Prof. Katarína Kozlíková, RN., PhD.
IMPhBPhITM FM CU in Bratislava

Content
Introduction
Sample Characteristics
Measures of Location
  Averages
  Structural Characteristics
Measures of Variability
  Connected with Averages
  Connected with Structural Characteristics
Measures of Shape
  Skewness
  Excess Kurtosis
Literature
© Katarína Kozlíková, 2013. 2013/14, Medical Statistics II: Sample Characteristics

Introduction
The aim of this lecture is to familiarize students with the sample characteristics that are most commonly used in the evaluation of medical experiments and that occur in medical scientific and professional literature.
It follows up on the lecture Medical Statistics I – Basic Terminology and uses the terminology explained there.
To practise the lectured topics and for further study, the literature in the list at the end is recommended.
Sample Characteristics
A descriptive characteristic: a number calculated from a statistical set in a well-defined way (using a formula).
The aim of descriptive characteristics: numbers obtained in this way are used to characterise a set consisting of many numbers by only a few numbers.
Descriptive statistics: the part of statistics that deals with descriptive characteristics.
Sample characteristics: descriptive characteristics applied to a sample.
There are three groups of descriptive statistics:
  Measures of location
  Measures of variability
  Measures of shape
Measures of Location

Measures of Location - Introduction
In the most common case, the obtained data are organised as a set of numbers.
Quantitative data are usually clustered around a central tendency; they are organised according to some (statistically defined) distribution.
A central tendency is a single number intended to typify the numbers in the set.
Measures of location are used to estimate the location of this distribution, or its "middle", that is, the position of the central tendency on the (numerical) axis or scale.
Measures of Location – Central Tendency (1)
A central tendency can be estimated in the following way:
  If all the numbers in the list are the same, then this number should be used.
  If the numbers are not the same, the average is calculated by combining the numbers from the list in a specific way and computing a single number as the average of the list.
The method of calculation depends on the distribution of the data in the set. In this lecture, we generally assume a normal distribution of the data.
The single number is then used for subsequent calculations.
Measures of Location – Central Tendency (2)
A central tendency is a measure of the "middle" or "typical" value of a data set.
Several types of averages are used, according to the kind of data represented by the numbers:
  Means (averages)
  Median
  Mode
  Quantiles
Measures of Location - Averages
Averages take into account the values of all measurements of the quantity.
They are used for normally distributed quantitative data.
The most used means:
  Arithmetic mean (weighted)
  Geometric mean (weighted)
  Harmonic mean (weighted)
Weighted means are used for very large samples, for sorted data (data in classes), and for data of different "weights".
Arithmetic Mean (Mean)
The best estimate of the measured quantity X from n measurements.
Calculated as the sum of all measured values divided by the number of measurements:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \quad \text{or, in another notation,} \quad \bar{x} = \frac{1}{n} \cdot \sum_{i=1}^{n} x_i$$

Used nomenclature:
  $\bar{x}$: average (arithmetic mean)
  $n$: sample size
  $x_i$: i-th measurement of the quantity X
  $\Sigma$: summation sign
  $i$: measurement number (i = 1, 2, ..., n)
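The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `arithmetic_mean` and the blood-pressure values are assumptions made for the example, not part of the lecture.

```python
def arithmetic_mean(values):
    """Return the sum of the measurements divided by their number."""
    if not values:
        raise ValueError("at least one measurement is required")
    return sum(values) / len(values)


# Hypothetical systolic blood pressures in mmHg (illustration only).
pressures = [118, 122, 130, 126, 124]
print(arithmetic_mean(pressures))  # 124.0
```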
Arithmetic Mean - Properties
The sum of the differences of all values from the arithmetic mean equals zero. This is suitable for checking the calculation.
The sum of the squared differences of all values from the arithmetic mean is less than the sum of the squared differences of all values from any other value. This is used in the calculation of the variance and the standard deviation.
The arithmetic mean of the arithmetic means of subsets of a set is, in general, not the arithmetic mean of the whole set (unless, for example, all subsets have the same size). In such a case, the weighted arithmetic mean has to be used.
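The three properties can be checked numerically. The sketch below uses two hypothetical subsets of unequal size; the helper names `mean` and `squared_deviations` are assumptions introduced for the illustration.

```python
def mean(values):
    return sum(values) / len(values)


def squared_deviations(values, centre):
    """Sum of squared differences of all values from `centre`."""
    return sum((x - centre) ** 2 for x in values)


# Hypothetical subsets of unequal size (illustration only).
group_a = [2.0, 4.0]               # mean 3.0
group_b = [6.0, 8.0, 10.0, 12.0]   # mean 9.0
whole = group_a + group_b          # mean 7.0

# Property 1: the deviations from the mean sum to zero.
deviation_sum = sum(x - mean(whole) for x in whole)

# Property 2: the sum of squared deviations is smaller at the mean
# than at another value (here the mean shifted by 0.5).
at_mean = squared_deviations(whole, mean(whole))
elsewhere = squared_deviations(whole, mean(whole) + 0.5)

# Property 3: the plain mean of the subset means ignores the subset sizes;
# weighting each subset mean by its size recovers the overall mean.
mean_of_means = mean([mean(group_a), mean(group_b)])
weighted = (len(group_a) * mean(group_a)
            + len(group_b) * mean(group_b)) / len(whole)

print(deviation_sum, at_mean < elsewhere, mean_of_means, weighted)
```

Here the plain mean of the two subset means is 6.0, while the weighted value and the mean of the whole set are both 7.0.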
Weighted Arithmetic Mean (Weighted Mean)
Used if different "weights" (frequencies) of the input data have to be considered:

$$\bar{x} = \frac{n_1 \cdot x_1 + n_2 \cdot x_2 + \dots + n_k \cdot x_k}{n_1 + n_2 + \dots + n_k} \quad \text{or, in another notation,} \quad \bar{x} = \frac{\sum_{j=1}^{k} (n_j \cdot x_j)}{\sum_{j=1}^{k} n_j}$$

where

$$n_1 + n_2 + \dots + n_k = n \quad \text{or, in another notation,} \quad n = \sum_{j=1}^{k} n_j$$

Used nomenclature:
  $\bar{x}$: mean (arithmetic, weighted)
  $n$: sample size
  $x_j$: mid point of the j-th class
  $k$: number of classes
  $n_j$: frequency ("weight") of the j-th class
  $\Sigma$: summation sign
  $j$: class number (j = 1, 2, ..., k)
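For data sorted into classes, the weighted mean can be sketched as follows. The class mid points and frequencies are hypothetical, and `weighted_mean` is a name chosen for this example.

```python
def weighted_mean(midpoints, frequencies):
    """Weighted arithmetic mean of class mid points x_j with frequencies n_j."""
    if len(midpoints) != len(frequencies):
        raise ValueError("one frequency per class is required")
    n = sum(frequencies)  # sample size n = n_1 + ... + n_k
    if n <= 0:
        raise ValueError("the total frequency must be positive")
    return sum(n_j * x_j for x_j, n_j in zip(midpoints, frequencies)) / n


# Hypothetical age classes 20-30, 30-40, 40-50, 50-60 years.
midpoints = [25, 35, 45, 55]
frequencies = [5, 10, 10, 5]
print(weighted_mean(midpoints, frequencies))  # 40.0
```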
Geometric Mean
Suitable for data that can be expressed using coefficients (for example, mean increase, percentage, area, volume; positive values of a measured quantity that has an absolute zero).
Obtained by multiplying n positive numbers and then taking the n-th root:

$$\bar{x}_G = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdot \dots \cdot x_n} \quad \text{or, in another notation,} \quad \bar{x}_G = \sqrt[n]{\prod_{i=1}^{n} x_i}$$

or, in a notation using logarithms,

$$\bar{x}_G = z^{\frac{1}{n} \cdot \sum_{i=1}^{n} \log_z (x_i)}$$

Used nomenclature:
  $\bar{x}_G$: geometric mean
  $\Pi$: multiplication sign
  $x_i$: i-th measurement of the quantity X
  $\Sigma$: summation sign
  $i$: measurement number (i = 1, 2, ..., n)
  $z$: base of the logarithm
  $n$: sample size

Comparison of the Arithmetic Mean and the Geometric Mean
Arithmetic mean (the sum of the individual items equals the sum of the means):

$$\bar{x} = \frac{x_1 + x_2 + x_3}{3} = \frac{4 + 18 + 3}{3} = \frac{25}{3} \cong 8.3$$

Geometric mean (the product of the individual items equals the product of the means):

$$\bar{x}_G = \sqrt[3]{x_1 \cdot x_2 \cdot x_3} = \sqrt[3]{4 \cdot 18 \cdot 3} = \sqrt[3]{216} = 6$$
Figure 1: An example of an arithmetic mean (left) and a geometric mean (right) calculated from the same data (4; 18; 3).
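The example of Figure 1 can be reproduced with a short sketch that uses the logarithmic notation of the geometric mean (natural logarithm, z = e). The function name `geometric_mean` is an assumption made for the illustration.

```python
import math


def geometric_mean(values):
    """Geometric mean computed through logarithms (base e)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


data = [4, 18, 3]  # the data of Figure 1
print(round(sum(data) / len(data), 1))  # 8.3
print(round(geometric_mean(data), 6))   # 6.0
```

Working with logarithms avoids very large intermediate products when many values are multiplied.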
Weighted Geometric Mean
Used if different "weights" (frequencies) of the input data have to be considered:

$$\bar{x}_G = \sqrt[n]{x_1^{n_1} \cdot x_2^{n_2} \cdot \dots \cdot x_k^{n_k}} \quad \text{or, in another notation,} \quad \bar{x}_G = \sqrt[n]{\prod_{j=1}^{k} x_j^{n_j}}$$

where

$$n_1 + n_2 + \dots + n_k = n \quad \text{or, in another notation,} \quad n = \sum_{j=1}^{k} n_j$$

Used nomenclature:
  $\bar{x}_G$: geometric mean
  $n$: sample size
  $x_j$: mid point of the j-th class
  $k$: number of classes
  $n_j$: frequency ("weight") of the j-th class
  $\Sigma$: summation sign
  $j$: class number (j = 1, 2, ..., k)
Harmonic Mean
Appropriate for situations when the average of rates or ratios is desired (for example, given effort at open time consumption, parallel connection of electrical resistances).
Calculated as the reciprocal of the arithmetic mean of the reciprocals of all measured values:

$$\bar{x}_H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \dots + \frac{1}{x_n}} \quad \text{or, in another notation,} \quad \bar{x}_H = \left( \frac{1}{n} \cdot \sum_{i=1}^{n} x_i^{-1} \right)^{-1}$$

Used nomenclature:
  $\bar{x}_H$: harmonic mean
  $n$: sample size
  $x_i$: i-th measurement of the quantity X
  $\Sigma$: summation sign
  $i$: measurement number (i = 1, 2, ..., n)
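As a small illustration of averaging rates, the sketch below computes the harmonic mean of two speeds driven over the same distance. The scenario and the name `harmonic_mean` are assumptions for this example, not taken from the lecture.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("the harmonic mean needs positive values")
    return len(values) / sum(1 / v for v in values)


# Hypothetical trip: the same distance driven at 60 km/h and back at 40 km/h.
speeds = [60, 40]
print(round(harmonic_mean(speeds), 6))  # 48.0
print(sum(speeds) / len(speeds))        # 50.0
```

The average speed over the whole trip is 48 km/h, not the arithmetic mean of 50 km/h, because more time is spent at the lower speed.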
Weighted Harmonic Mean
Used if different "weights" (frequencies) of the input data have to be considered:

$$\bar{x}_H = \frac{n_1 + n_2 + n_3 + \dots + n_k}{\frac{n_1}{x_1} + \frac{n_2}{x_2} + \frac{n_3}{x_3} + \dots + \frac{n_k}{x_k}} \quad \text{or, in another notation,} \quad \bar{x}_H = \frac{\sum_{j=1}^{k} n_j}{\sum_{j=1}^{k} \frac{n_j}{x_j}}$$

where

$$n_1 + n_2 + \dots + n_k = n \quad \text{or, in another notation,} \quad n = \sum_{j=1}^{k} n_j$$

Used nomenclature:
  $\bar{x}_H$: harmonic mean
  $n$: sample size
  $x_j$: mid point of the j-th class
  $k$: number of classes
  $n_j$: frequency ("weight") of the j-th class
  $\Sigma$: summation sign
  $j$: class number (j = 1, 2, ..., k)
Measures of Location – Structural Characteristics
They are based on the distribution of elements in an ordered set, that is, on the ranks of the measurements and not on the values of the individual measurements.
The most used structural characteristics:
  Median
  Mode
  Quantiles:
    Quartiles
    Deciles
    Percentiles
Median
Divides an ordered sample into two equally sized parts (each with the same probability 0.50).
It is calculated according to the formulas below, depending on whether the sample size is odd or even:

$$\text{Odd } n: \ \tilde{x} = x_{\frac{n+1}{2}} \qquad \text{Even } n: \ \tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$$

Used nomenclature:
  $\tilde{x}$: median
  $n$: sample size
  $x_n$: n-th measurement of the quantity X in the sorted sample
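The two cases can be sketched as follows. The formulas use ranks counted from 1, whereas Python lists are indexed from 0, which is why the indices in the code are shifted by one. The function name and the data are assumptions for the illustration.

```python
def median(values):
    """Median of a sample: middle value (odd n) or mean of the two middle values (even n)."""
    if not values:
        raise ValueError("at least one measurement is required")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # rank (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2  # ranks n / 2 and n / 2 + 1


# Hypothetical numbers of patients (illustration only).
print(median([12, 5, 30, 8, 20]))  # 12
print(median([12, 5, 30, 8]))      # 10.0
```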
Median - Examples
[Two bar charts: "Number of patients for different doctors in the city XY" (left, doctors A to K) and "Number of patients for different doctors in the city XZ" (right, doctors A to J); vertical axis: number of patients; horizontal axis: doctor.]
Figure 2: Examples of medians. In the case of an odd number of elements in an ordered sample (the graph on the left), the median equals the "middle" value (the value of the item F). In the case of an even number of elements in an ordered sample (the graph on the right), the median is calculated from the two "middle" values (the average of the values of the items E and F).
Median - Sorted Samples (can be omitted without loss of continuity)
In a sorted and ordered sample, the median interval (the median class, in which the median lies) is established, from which the median is interpolated:

$$\tilde{x} = \tilde{x}_L + \frac{\tilde{h}}{\tilde{n}} \cdot \left( \frac{n}{2} - cum\, n_{(-1)} \right)$$

Used nomenclature:
  $\tilde{x}$: median
  $\tilde{h}$: width of the median interval
  $\tilde{x}_L$: lower end of the median interval
  $\tilde{n}$: frequency of the median interval
  $n$: sample size
  $cum\, n_{(-1)}$: cumulative frequency of the interval before the median interval
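The interpolation can be sketched for classes of equal width. The assumption made here is that the median class is the first class whose cumulative frequency reaches n/2; the class limits and frequencies are hypothetical.

```python
def grouped_median(lower_bounds, width, frequencies):
    """Interpolate the median from a frequency table with equal class widths."""
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("the total frequency must be positive")
    cumulative = 0  # cumulative frequency before the current class
    for lower, freq in zip(lower_bounds, frequencies):
        if cumulative + freq >= n / 2:
            # `lower` is the lower end and `freq` the frequency of the median class.
            return lower + width / freq * (n / 2 - cumulative)
        cumulative += freq
    raise ValueError("lower_bounds and frequencies do not match")


# Hypothetical age classes 20-30, 30-40, 40-50, 50-60 years.
lower_bounds = [20, 30, 40, 50]
frequencies = [4, 8, 12, 6]
print(round(grouped_median(lower_bounds, 10, frequencies), 6))  # 42.5
```

With n = 30, the median class is 40-50 (cumulative frequency 12 before it), so the median is 40 + 10/12 · (15 − 12) = 42.5.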
Mode
The most frequent value in the sample, denoted $\hat{x}$.
[Bar chart: "Number of outstanding packages of different drugs in one day"; vertical axis: number of packages; horizontal axis: drug (A to J).]
Figure 3: An example of a mode. The mode is represented by the item G.
Mode – Sorted Sample (can be omitted without loss of continuity)
In an ordered sample, the modal interval (the class with the highest frequency) is found, from which the mode can be interpolated in either of two ways:

$$\hat{x} = \hat{x}_L + \hat{h} \cdot \frac{\hat{n}_{(+1)}}{\hat{n}_{(-1)} + \hat{n}_{(+1)}} \quad \text{or} \quad \hat{x} = \hat{x}_L + \hat{h} \cdot \frac{\hat{n} - \hat{n}_{(-1)}}{2\hat{n} - \hat{n}_{(-1)} - \hat{n}_{(+1)}}$$

Used nomenclature:
  $\hat{x}$: mode
  $\hat{h}$: length of the modal interval
  $\hat{x}_L$: lower end of the modal interval
  $\hat{n}$: frequency of the modal interval
  $\hat{n}_{(-1)}$: frequency of the interval before the modal interval
  $\hat{n}_{(+1)}$: frequency of the interval after the modal interval
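Both interpolation formulas can be sketched for classes of equal width. The assumptions made here are that a missing neighbouring class counts as frequency zero and that the first class is taken if several share the highest frequency; the data are hypothetical.

```python
def grouped_mode(lower_bounds, width, frequencies):
    """Return both interpolated estimates of the mode as a tuple."""
    if not frequencies or len(lower_bounds) != len(frequencies):
        raise ValueError("one frequency per class is required")
    m = max(range(len(frequencies)), key=frequencies.__getitem__)  # modal class
    n_hat = frequencies[m]
    n_prev = frequencies[m - 1] if m > 0 else 0
    n_next = frequencies[m + 1] if m + 1 < len(frequencies) else 0
    if n_prev + n_next == 0 or 2 * n_hat - n_prev - n_next == 0:
        raise ValueError("the interpolation is undefined for these frequencies")
    x_low = lower_bounds[m]
    first = x_low + width * n_next / (n_prev + n_next)
    second = x_low + width * (n_hat - n_prev) / (2 * n_hat - n_prev - n_next)
    return first, second


# Hypothetical age classes 20-30, 30-40, 40-50, 50-60 years.
first, second = grouped_mode([20, 30, 40, 50], 10, [4, 8, 12, 6])
print(round(first, 2), second)  # 44.29 44.0
```

The two formulas generally give slightly different estimates, as this example shows.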
Symmetric Distribution
The mean, the median, and the mode have the same value:

$$\bar{x} = \tilde{x} = \hat{x}$$

[Histogram and polygon: "Age distribution of patients with influenza"; vertical axis: number of patients; horizontal axis: age [y] in 5-year classes from < 5 to > 80. $\bar{x} \cong 43$, $\tilde{x} \cong 43$, $\hat{x} \cong 43$.]
Figure 4: A histogram and a polygon of a symmetric distribution.

Asymmetric Distribution Left-Sided
On the numerical axis, the mode lies furthest to the left (it has the highest count but a low value), the mean lies furthest to the right (it has a high value), and the median lies between them:

$$\hat{x} < \tilde{x} < \bar{x}$$

[Histogram and polygon: "Age distribution of patients with allergy"; vertical axis: number of patients; horizontal axis: age [y] in 5-year classes from < 5 to > 80. $\hat{x} \cong 22.6$, $\tilde{x} \cong 32.9$, $\bar{x} \cong 35.4$.]
Figure 5: A histogram and a polygon of an asymmetric left-sided distribution.

Asymmetric Distribution Right-Sided
On the numerical axis, the mean lies furthest to the left (it has a low value), the mode lies furthest to the right (it has the highest count and a high value), and the median lies between them:

$$\bar{x} < \tilde{x} < \hat{x}$$

[Histogram and polygon: "Age distribution of patients with osteoporosis"; vertical axis: number of patients; horizontal axis: age [y] in 5-year classes from < 5 to > 80. $\bar{x} \cong 53.9$, $\tilde{x} \cong 56.8$, $\hat{x} \cong 62.3$.]
Figure 6: A histogram and a polygon of an asymmetric right-sided distribution.

Approximate Relation Between Mode, Mean, and Median
The distance between the mode and the mean is approximately three times the distance between the median and the mean:

$$\hat{x} - \bar{x} \cong 3 \cdot (\tilde{x} - \bar{x})$$

The relation can be used to estimate the mode or to check the correctness of its calculation, because the mean and the median can usually be calculated more easily.
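Solving the relation for the mode gives an estimate from the mean and the median. The sketch below applies it to the values shown for the osteoporosis example (Figure 6); the function name `estimate_mode` is an assumption for the illustration.

```python
def estimate_mode(mean, median):
    """Approximate the mode from the relation mode - mean ~ 3 * (median - mean)."""
    return mean + 3 * (median - mean)


# Mean and median read from Figure 6 (osteoporosis example).
print(round(estimate_mode(53.9, 56.8), 1))  # 62.6
```

The estimate of about 62.6 is close to the mode of about 62.3 given in the figure; the relation is only approximate, so some difference is expected.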
Quantile
The quantile $K_j^{(m)}$ divides the probability distribution of a random variable into m parts.
  m denotes the number of parts of the set.
  j is the order number of the quantile, j = 0, 1, 2, ..., m − 1, m.
  Each part has the probability 1/m.
The quantile value indicates how much of the set has a value less than or equal to the quantile value.
Quantiles with special names:
  Lower quantile, j = 1: $K_1^{(m)}$
  Upper quantile, j = m − 1: $K_{m-1}^{(m)}$
Calculation of the Quantile (can be omitted without loss of continuity)

$$K_j^{(m)} = \frac{x_k + x_l}{2}$$

where

$$k = \frac{j \cdot (n+1)}{m} \quad \text{and} \quad l = \frac{j \cdot n}{m}$$

if

$$\left\lfloor \frac{j \cdot (n+1)}{m} \right\rfloor = \frac{j \cdot (n+1)}{m} \quad \text{and} \quad \left\lfloor \frac{j \cdot n}{m} \right\rfloor = \frac{j \cdot n}{m}$$

else

$$k = \left\lfloor \frac{j \cdot (n+1)}{m} \right\rfloor + 1 \quad \text{and} \quad l = \left\lfloor \frac{j \cdot n}{m} \right\rfloor + 1$$

Used nomenclature:
  $n$: sample size
  $\lfloor \ \rfloor$: floor function (rounds a number down to the nearest integer)
  $x_k$: k-th ordered measurement of the quantity X
  $x_l$: l-th ordered measurement of the quantity X

The Most Used Quantiles
Quartile Qj Divides an ordered sample into m = 4 parts, all with probability 0.25
Sextile Sj Divides an ordered sample into m = 6 parts, all with probability 1/6
Decile Dj Divides an ordered sample into m = 10 parts, all with probability 0.10
Percentile Pj Divides an ordered sample into m = 100 parts, all with probability 0.01
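The calculation of a quantile $K_j^{(m)}$ as the mean of two ordered measurements $x_k$ and $x_l$ can be sketched as follows. The sketch rests on an assumed reading of the index rule: each index is checked separately, so that k and l are j·(n+1)/m and j·n/m when these are integers and are otherwise rounded up. The function name and the clamping of k to n for very small samples are additions for this illustration.

```python
def lecture_quantile(values, j, m):
    """j-th of the m-quantiles as the mean of two ordered measurements."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or not 0 < j < m:
        raise ValueError("need a non-empty sample and 0 < j < m")
    # Integer ceiling: the value itself if it is an integer, else floor + 1.
    k_rank = min(-(-j * (n + 1) // m), n)  # clamping to n is an added safeguard
    l_rank = -(-j * n // m)
    # Ranks start at 1, list indices at 0.
    return (ordered[k_rank - 1] + ordered[l_rank - 1]) / 2


sample = list(range(1, 9))  # hypothetical ordered sample 1, 2, ..., 8
print(lecture_quantile(sample, 1, 4))  # 2.5
print(lecture_quantile(sample, 2, 4))  # 4.5
print(lecture_quantile(sample, 3, 4))  # 6.5
```

For m = 2 and j = 1 this reading reproduces the median formulas for odd and even n. Statistical packages use several slightly different quantile definitions, so their results may differ from this rule.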
Examples:

$$Q_1 = P_{25} \qquad Q_2 = D_5 = P_{50} = \tilde{x} \qquad Q_3 = P_{75}$$

Measures of Variability

Measures of Variability – Introduction (1)
Statistical data as numeric variables always vary (they have different values).
A small degree of variability means a small mutual diversity of the values of a given variable (the values are very similar). In this case, the central tendencies, such as the mean, median, or mode, are good characteristics of the general size of the value of the variable in the sample.
High variability means large mutual differences among the values of a given variable. In this case, the calculated central tendencies are not good characteristics of the general size of the value of the variable in the sample.
Measures of Variability – Introduction (2)
Characteristics of the middle of the set (the central tendencies)
  only indicate the location of a statistical sample on the numerical axis.
  They do not inform
    how the values are scattered around the centre of the set,
    whether there are any outliers in the sample.
Measures of variability (variability characteristics)
  reflect the distribution of the values of the variable around the mean value of the set.
Measures of Variability Connected with Averages
They take into account the values of the measurements of the quantity X.
The most used measures:
  Standard deviation
  Mean deviation (average error)
  Standard error of the mean
  Coefficient of variation
  Dispersion coefficient
Variance (Sample Variance)
Characterizes the scatter of measurements around the sample mean.
Calculated as the sum of the squared deviations of the random variables in a random sample from their sample mean, divided by the number of terms in the sum minus one:

$$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Used nomenclature:
  $s_x^2$: sample variance
  $n$: sample size
  $\bar{x}$: average (arithmetic mean)
  $n - 1$: degrees of freedom
  $x_i$: i-th measurement of the quantity X
  $\Sigma$: summation sign
  $i$: measurement number (i = 1, 2, ..., n)
Do Not Confuse the Sample Variance with the Population Variance!
Population variance:

$$\sigma_x^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \quad \text{where} \quad \mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Used nomenclature:
  $\sigma_x^2$: population variance
  $N$: population size
  $\mu$: true value of the mean
  $\Sigma$: summation sign
  $x_i$: i-th measurement of the quantity X
  $i$: measurement number (i = 1, 2, ..., N)
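The difference between the two variances is only the divisor: n − 1 for a sample, N for a whole population. A minimal sketch with hypothetical data, treating the same eight values once as a sample and once as a complete population:

```python
def sample_variance(values):
    """Sum of squared deviations from the sample mean divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two measurements")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the population mean divided by N."""
    size = len(values)
    if size == 0:
        raise ValueError("at least one value is required")
    mu = sum(values) / size
    return sum((x - mu) ** 2 for x in values) / size


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements, mean 5
print(population_variance(data))        # 4.0
print(round(sample_variance(data), 3))  # 4.571
```

The sample variance is always the larger of the two, and the difference shrinks as the number of measurements grows.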
Weighted Variance (Sample Variance)
Characterizes the dispersion of all measurements around the weighted arithmetic mean of the sample.
It is calculated analogously to the "normal" variance, but takes into account the frequency of each class:

$$s_x^2 = \frac{\sum_{j=1}^{k} (x_j - \bar{x})^2 \cdot n_j}{n - 1} \quad \text{where} \quad n_1 + n_2 + \dots + n_k = n$$

Used nomenclature:
  $s_x^2$: sample variance
  $\bar{x}$: mean (weighted)
  $x_j$: mid point of the j-th class
  $n$: sample size
  $n_j$: frequency ("weight") of the j-th class
  $k$: number of classes
  $j$: class number (j = 1, 2, ..., k)
  $\Sigma$: summation sign
Standard Deviation (Sample Standard Deviation)
Characterizes the scatter of measurements around the sample mean.
Calculated as the square root of the sample variance:

$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \quad \text{or, in another expression,} \quad s_x = \sqrt{s_x^2}$$

Used nomenclature:
  $s_x$: sample standard deviation
  $s_x^2$: sample variance
  $\bar{x}$: average (arithmetic mean)
  $n$: sample size
  $x_i$: i-th measurement of the quantity X
  $n - 1$: degrees of freedom
  $i$: measurement number (i = 1, 2, ..., n)
  $\Sigma$: summation sign
Do Not Confuse the Sample Standard Deviation with the Population Standard Deviation!
The population standard deviation is the square root of the population variance:

$$\sigma_x = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \quad \text{where} \quad \mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Used nomenclature:
  $\sigma_x$: population standard deviation
  $N$: population size
  $\mu$: true value of the mean
  $\Sigma$: summation sign
  $x_i$: i-th measurement of the quantity X
  $i$: measurement number (i = 1, 2, ..., N)
Weighted Standard Deviation (Sample Standard Deviation)
Characterizes the dispersion of all measurements around the weighted arithmetic mean of the sample.
It is calculated analogously to the "normal" standard deviation, but takes into account the frequency of each class:

$$s_x = \sqrt{\frac{\sum_{j=1}^{k} (x_j - \bar{x})^2 \cdot n_j}{n - 1}} \quad \text{where} \quad n_1 + n_2 + \dots + n_k = n$$

Used nomenclature:
  $s_x$: sample standard deviation
  $\bar{x}$: mean (weighted)
  $x_j$: mid point of the j-th class
  $n$: sample size
  $n_j$: frequency ("weight") of the j-th class
  $k$: number of classes
  $j$: class number (j = 1, 2, ..., k)
  $\Sigma$: summation sign
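For data in classes, the weighted variance and standard deviation can be sketched together. The class mid points and frequencies are hypothetical and the function name is an assumption for the illustration.

```python
import math


def weighted_variance(midpoints, frequencies):
    """Sample variance of grouped data around the weighted arithmetic mean."""
    if len(midpoints) != len(frequencies):
        raise ValueError("one frequency per class is required")
    n = sum(frequencies)
    if n < 2:
        raise ValueError("at least two measurements are required")
    mean = sum(n_j * x_j for x_j, n_j in zip(midpoints, frequencies)) / n
    squares = sum((x_j - mean) ** 2 * n_j
                  for x_j, n_j in zip(midpoints, frequencies))
    return squares / (n - 1)


# Hypothetical age classes 20-30, 30-40, 40-50, 50-60 years.
midpoints = [25, 35, 45, 55]
frequencies = [5, 10, 10, 5]
variance = weighted_variance(midpoints, frequencies)
print(round(variance, 2))             # 94.83
print(round(math.sqrt(variance), 2))  # 9.74
```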
Mean Deviation (Average Error)
Mean absolute deviation:

$$s_{abs} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

It can also be calculated from the median or from the mode:

$$s_{abs} = \frac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n} \qquad s_{abs} = \frac{\sum_{i=1}^{n} |x_i - \hat{x}|}{n}$$

Used nomenclature:
  $s_{abs}$: mean deviation
  $n$: sample size
  $\bar{x}$: average (arithmetic mean)
  $\Sigma$: summation sign
  $\tilde{x}$: median
  $\hat{x}$: mode
  $x_i$: i-th measurement of the quantity X
  $i$: measurement number (i = 1, 2, ..., n)
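The three variants differ only in the centre from which the absolute deviations are taken. A minimal sketch using Python's standard `statistics` module for the centres; the data are hypothetical and `mean_deviation` is a name chosen for this example.

```python
import statistics


def mean_deviation(values, centre):
    """Mean of the absolute deviations of the values from `centre`."""
    if not values:
        raise ValueError("at least one measurement is required")
    return sum(abs(x - centre) for x in values) / len(values)


data = [1, 2, 2, 3, 12]  # hypothetical sample with one extreme value
print(mean_deviation(data, statistics.mean(data)))    # 3.2
print(mean_deviation(data, statistics.median(data)))  # 2.4
print(mean_deviation(data, statistics.mode(data)))    # 2.4
```

In this example the extreme value 12 pulls the mean to 4, so the deviation from the mean is larger than the deviation from the median or the mode (both 2).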
Standard Error of the Mean (Error in the Mean)
Used for the comparison of sample means (from the same population):

$$se_x = \frac{s_x}{\sqrt{n}} \quad \text{or} \quad se_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{(n - 1) \cdot n}}$$

Used nomenclature:
  $s_x$: sample standard deviation
  $se_x$: sample standard error
  $\bar{x}$: average (arithmetic mean)
  $n$: sample size
  $x_i$: i-th measurement of the quantity X
  $\Sigma$: summation sign
  $i$: measurement number (i = 1, 2, ..., n)
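A short sketch of the standard error of the mean, computed from the sample standard deviation; the data are hypothetical and the function name is an assumption for the illustration.

```python
import math


def standard_error(values):
    """Sample standard deviation divided by the square root of n."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")
    mean = sum(values) / n
    s_x = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return s_x / math.sqrt(n)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(round(standard_error(data), 3))  # 0.756
```

The same value follows from the second formula, since the sum of squared deviations is 32 and the square root of 32 / (7 · 8) is about 0.756.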
Weighted Error in the Mean
Used for the comparison of sample means (from the same population):

$$se_x = \sqrt{\frac{\sum_{j=1}^{k} (x_j - \bar{x})^2 \cdot n_j}{(n - 1) \cdot n}} \quad \text{where} \quad n = \sum_{j=1}^{k} n_j$$

Used nomenclature:
  $se_x$: standard error of the mean
  $\bar{x}$: mean (weighted)
  $x_j$: mid point of the j-th class
  $n$: sample size
  $n_j$: frequency ("weight") of the j-th class
  $k$: number of classes
  $j$: class number (j = 1, 2, ..., k)
  $\Sigma$: summation sign
Coefficient of Variation (1) (Sample Coefficient of Variation)
It is a characteristic of the variability (probability distribution) of a random variable.
In general, it is defined as the ratio of the standard deviation to the (absolute) value of the mean (average):

$$V_x = \frac{s_x}{\bar{x}} \cdot 100\,\%$$

Used nomenclature:
  $V_x$: variability coefficient
  $s_x$: sample standard deviation
  $\bar{x}$: average (arithmetic mean)
  $n$: sample size
Coefficient of Variation (2)
It is a relative measure of variability.
It is not affected by the absolute values of the character.
It is used
  for a mutual comparison of the variability of two or more sets with significantly different levels of values (for example, mass in grams and kilograms),
  for statistical control (laboratory results).
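The independence from the level of the values can be illustrated with the same masses expressed in kilograms and in grams. The sketch uses `statistics.mean` and `statistics.stdev` from the Python standard library; the data and the function name are assumptions for the example.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation relative to the absolute mean, in percent."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the coefficient is undefined for a zero mean")
    return statistics.stdev(values) / abs(mean) * 100


masses_kg = [60.0, 70.0, 80.0]             # hypothetical body masses
masses_g = [m * 1000 for m in masses_kg]   # the same masses in grams
print(round(coefficient_of_variation(masses_kg), 2))  # 14.29
print(round(coefficient_of_variation(masses_g), 2))   # 14.29
```

The standard deviations differ by a factor of 1000 (10 kg versus 10 000 g), but the coefficient of variation is the same in both units.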
Coefficient of Dispersion (can be omitted without loss of continuity)
It is a relative measure of the variability (probability distribution) of a random variable.
In general, it is defined as the ratio of the mean deviation calculated from the median to the (absolute) value of the median:

$$d_x = \frac{\frac{1}{n} \cdot \sum_{i=1}^{n} |x_i - \tilde{x}|}{\tilde{x}}$$

Used nomenclature:
  $d_x$: coefficient of dispersion
  $\tilde{x}$: median
  $x_i$: value of the i-th measurement
  $n$: sample size
Measures of Variability Connected with Structural Characteristics
They take into account the ranks of the measurements of the quantity X in an ordered sample.
The most used measures:
  Ranges
    Range (sample range)
    Inter-quantile range
    Inter-quartile range
  Inter-quartile deviation
Range (Sample Range)
The difference between the highest value (maximum) and the lowest value (minimum) of the sample, that is, the largest order statistic minus the smallest order statistic:

$$R = x_{max} - x_{min}$$

Used nomenclature:
  $R$: range
  $x_{max}$: the maximal value of the measured quantity X
  $x_{min}$: the minimal value of the measured quantity X
Inter-Quantile Range
The difference between the upper and the lower quantile:

$$R_K = K_{m-1}^{(m)} - K_1^{(m)}$$

A nonparametric measure of variability.
It does not take into account the size of all values in the set.
It excludes the impact of outliers.
In the case of deciles, 80% of the values lie in this interval (D9 − D1).
In the case of percentiles, 98% of the values lie in this interval (P99 − P1).
Inter-Quartile Range
The difference between the upper quartile and the lower quartile:

$$R_Q = Q_3 - Q_1$$

A nonparametric measure of variability.
A special case of the inter-quantile range.
The interval in which half of the sample lies.
It is often used with the median.
It is often used to find outliers in the sample.
Outliers are observations that fall below $Q_1 - 1.5 \cdot R_Q$ or above $Q_3 + 1.5 \cdot R_Q$.
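The outlier rule can be sketched as follows. To keep the example independent of any particular quantile method, the quartiles are passed in as already known values; the data, the quartile values, and the function names are assumptions for the illustration.

```python
def outlier_fences(q1, q3):
    """Lower and upper limits Q1 - 1.5 * R_Q and Q3 + 1.5 * R_Q."""
    r_q = q3 - q1  # inter-quartile range
    return q1 - 1.5 * r_q, q3 + 1.5 * r_q


def find_outliers(values, q1, q3):
    """Observations below the lower limit or above the upper limit."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]


values = [2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical sample
q1, q3 = 3.5, 7.5                   # quartiles assumed to be computed beforehand
print(outlier_fences(q1, q3))         # (-2.5, 13.5)
print(find_outliers(values, q1, q3))  # [30]
```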
Inter-Quartile Range – Example
Figure 7: Box plot (at the top) with an inter-quartile range ($R_Q$ = IQR = $Q_3 - Q_1$) and a probability density function of a normal N(0, σ²) population.
Available at: http://en.wikipedia.org/wiki/Interquartile_range [17. 9. 2013]
Quantile Deviation
Determined as an average of the positive deviations of neighbouring quantiles. After simplification, the formula is

$$R_K^{(m)} = \frac{K_{m-1}^{(m)} - K_1^{(m)}}{m - 2}$$

A measure of variability that is affected neither by extremely high nor by extremely low values.
It takes into account the variability of the intermediate values.
In the case of deciles, the variability of 80% of the values:

$$R_{D8} = \frac{D_9 - D_1}{8}$$

In the case of percentiles, the variability of 98% of the values:

$$R_{P98} = \frac{P_{99} - P_1}{98}$$
Quartile Deviation
Determined as an average of the positive deviations of neighbouring quartiles:

$$R_{Q2} = \frac{Q_3 - Q_1}{2}$$

It is a measure of the spread through the middle half of a distribution.
It is influenced neither by extremely high nor by extremely low scores.
The quartile deviation is an ordinal statistic and is most often used in conjunction with the median.
Measures of Shape

Skewness (1)
Skewness measures the direction and the degree of asymmetry of the distribution of a random variable.
There are several measures of skewness; the most used is the coefficient of skewness.
For a sample from a normal distribution, the skewness is calculated according to one of the equivalent formulas:

$$As = \frac{\frac{1}{n} \cdot \sum_{i=1}^{n} (x_i - \bar{x})^3}{s_x^3} \quad \text{or} \quad As = \frac{\sqrt{n} \cdot \sum_{i=1}^{n} (x_i - \bar{x})^3}{\sqrt{\left( \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^3}}$$

It is defined only for $s_x \neq 0$ and $n \geq 3$.
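The coefficient of skewness can be sketched from the sums of squared and cubed deviations alone. This sketch assumes the sums-only form, the square root of n times the sum of cubed deviations, divided by the sum of squared deviations raised to the power 3/2. That form agrees with the form using the standard deviation when the standard deviation is computed with the divisor n. The data and the function name are hypothetical.

```python
import math


def skewness(values):
    """Coefficient of skewness from the sums of squared and cubed deviations."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three measurements are required")
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    sum_cu = sum((x - mean) ** 3 for x in values)
    if sum_sq == 0:
        raise ValueError("the skewness is undefined when all values are equal")
    return math.sqrt(n) * sum_cu / sum_sq ** 1.5


print(skewness([1, 2, 3, 4, 5]))       # 0.0
print(skewness([1, 2, 2, 3, 12]) > 0)  # True
print(skewness([1, 10, 11, 11, 12]) < 0)  # True
```

A symmetric sample gives zero, a sample with a long tail towards high values gives a positive coefficient, and a sample with a long tail towards low values gives a negative one.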
Skewness (2)
If the calculated skewness
  equals zero (As = 0), it characterizes a symmetric distribution (Fig. 4 and 7);
  is positive (As > 0), it characterizes right-sided skewness, that is, a left-sided asymmetric distribution, in which the mean is greater than the median (Fig. 5);
  is negative (As < 0), it characterizes left-sided skewness, that is, a right-sided asymmetric distribution, in which the mean is smaller than the median (Fig. 6).
Excess Kurtosis (1)
Kurtosis characterizes the "peakedness" (flatness or sharpness) of a distribution.
There are several measures of kurtosis; the most used is the excess kurtosis.
For a sample from a normal distribution, the excess kurtosis is calculated according to one of the equivalent formulas:

$$Exc = \frac{n \cdot \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left( \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^2} - 3 \quad \text{or} \quad Exc = \frac{\frac{1}{n} \cdot \sum_{i=1}^{n} (x_i - \bar{x})^4}{s_x^4} - 3$$

It is defined only for $s_x \neq 0$ and $n \geq 4$.
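The excess kurtosis can be sketched from the sums of squared and fourth-power deviations. This sketch assumes the sums-only form, n times the sum of fourth-power deviations, divided by the square of the sum of squared deviations, minus 3. The data and the function name are hypothetical.

```python
def excess_kurtosis(values):
    """Excess kurtosis from the sums of 2nd and 4th powers of the deviations."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four measurements are required")
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    sum_4th = sum((x - mean) ** 4 for x in values)
    if sum_sq == 0:
        raise ValueError("the kurtosis is undefined when all values are equal")
    return n * sum_4th / sum_sq ** 2 - 3


# Evenly spread hypothetical values: flatter than a normal distribution.
print(round(excess_kurtosis([1, 2, 3, 4, 5]), 6))  # -1.3
```

For these evenly spread values, the sum of squared deviations is 10 and the sum of fourth powers is 34, so the result is 5 · 34 / 100 − 3 = −1.3, a negative excess kurtosis.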
Excess Kurtosis (2)
If the calculated excess kurtosis
  equals zero (Exc = 0), it characterizes a normal distribution (Fig. 7);
  is positive (Exc > 0), it characterizes a more peaked distribution (with a sharp peak) than the normal distribution; therefore, more values are concentrated around the central tendency and fewer values are at the tails of the distribution;
  is negative (Exc < 0), it characterizes a less peaked distribution (with a rounded peak) than the normal distribution; therefore, fewer values are concentrated around the central tendency and more values are at the tails of the distribution.
Literature
HAWKINS, D. Biomeasurement: Understanding, Analysing and Communicating Data in the Biosciences. Oxford University Press : New York, 2007. 300 p. ISBN 978 0199265152.
CHAJDIAK, J. Štatistika jednoducho. Statis : Bratislava, 2003. 88 p. ISBN 80 85659 28.
KOZLÍKOVÁ, K. Základy spracovania biomedicínskych meraní I. Asklepios : Bratislava, 2003. 88 p. ISBN 80 7167 064 2.
KOZLÍKOVÁ, K. Lekárska štatistika I. – Základné pojmy. [Cited 1. 10. 2012]. Available online: http://portal.fmed.uniba.sk/
KOZLÍKOVÁ, K., MARTINKA, J. Základy spracovania biomedicínskych meraní II. Asklepios : Bratislava, 2009. 204 p. ISBN 978 80 7176 137 4.
KOZLÍKOVÁ, K., MARTINKA, J. Theory And Tasks For Practicals On Medical Biophysics. Librix : Brno, 2010. 248 p. ISBN 978 80 7399 881 1.
KOZLÍKOVÁ, K., MARTINKA, J., KNEZOVIĆ, R. Otázky na overenie znalostí z vybraných kapitol lekárskej fyziky a biofyziky. EQUILIBRIA : Košice, 2013. 174 p. ISBN 978 80 8143 105 0.
STN ISO 3534 1. Štatistika. Slovník a značky. Časť 1: Všeobecné štatistické termíny a termíny používané v teórii pravdepodobnosti. Slovenský ústav technickej normalizácie : Bratislava, 2008. 90 p.

Comment: Unless stated otherwise, the author of the text is also the author of the figures and graphs.