Numerical Descriptive Measures
Jiří Neubauer
Department of Econometrics, FVL UO Brno, office 69a, tel. 973 442029, email: [email protected]
Numerical descriptive measures are divided into measures of location (center), measures of dispersion (variation), and measures of concentration.
Arithmetic mean
The most important aspect of studying the distribution of a sample of measurements is locating the position of a central value about which the measurements are distributed.
Definition
The arithmetic mean (average) of a set of n measurements $x_1, x_2, \dots, x_n$ is given by the formula
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
If the data are organized in a frequency distribution table, we can calculate the mean by the formula
$$\bar{x} = \frac{1}{n}\sum_{j=1}^{k} n_j x_j,$$
where $n_1, n_2, \dots, n_k$ are the frequencies of the distinct values $x_1, x_2, \dots, x_k$ of the variable and $n = n_1 + n_2 + \dots + n_k$.
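The frequency-table formula can be sketched in a few lines of Python. The function name `mean_from_frequencies` and the small data set are illustrative assumptions, not part of the original slides.

```python
def mean_from_frequencies(values, freqs):
    """Arithmetic mean from distinct values x_j and their frequencies n_j."""
    n = sum(freqs)  # total number of observations
    return sum(n_j * x_j for x_j, n_j in zip(values, freqs)) / n

# Illustrative data: the sample 1, 2, 5, 6, 7, 8, 8, 9 written as a frequency table
values = [1, 2, 5, 6, 7, 8, 9]
freqs = [1, 1, 1, 1, 1, 2, 1]

print(mean_from_frequencies(values, freqs))  # 5.75
```

The result agrees with the mean computed directly from the raw values, because each distinct value is simply counted as many times as it occurs.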
Elementary properties of the arithmetic mean:
the sum of deviations between the values and the mean is equal to zero
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0,$$
if the variable is constant then the mean is equal to this constant
$$\frac{1}{n}\sum_{i=1}^{n} c = c,$$
if we add a constant to the values of the variable, then
$$\frac{1}{n}\sum_{i=1}^{n} (x_i + c) = c + \bar{x},$$
if we multiply the values of the variable by a constant c, then
$$\frac{1}{n}\sum_{i=1}^{n} c \cdot x_i = c \cdot \bar{x}.$$
Harmonic mean
Definition
The harmonic mean of a set of n measurements $x_1, x_2, \dots, x_n$ is given by the formula
$$\bar{x}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}.$$
In situations involving rates and ratios (for example, the average speed over several stretches of equal length), the harmonic mean is often the appropriate average.
Geometric mean
Definition
The geometric mean of a set of n measurements $x_1, x_2, \dots, x_n$ is given by the formula
$$\bar{x}_G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}.$$
The geometric mean may be more appropriate than the arithmetic mean for describing percentage growth. Suppose an orange tree yields 100 oranges one year, then 180, 210 and 300 in the following years, so the growth is 80 %, 16.7 % and 42.9 % in the respective years. Using the arithmetic mean, we would calculate an average growth of 46.5 % (80 % + 16.7 % + 42.9 %, divided by 3). However, if we start with 100 oranges and let the yield grow by 46.5 % for three years, the result is 314 oranges, not 300. The geometric mean of the growth factors, $\sqrt[3]{1.80 \cdot 1.167 \cdot 1.429} \doteq 1.442$, gives an average growth of 44.2 %, which leads to 300 oranges after three years.
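The orange-tree comparison can be checked numerically. The sketch below is only an illustration; the variable names are assumptions, and `math.prod` requires Python 3.8 or later.

```python
import math

yields = [100, 180, 210, 300]  # oranges per year, from the example

# Year-over-year growth factors: 1.80, 1.1667, 1.4286 (rounded)
factors = [b / a for a, b in zip(yields, yields[1:])]

arith = sum(factors) / len(factors)               # arithmetic mean of the factors
geom = math.prod(factors) ** (1 / len(factors))   # geometric mean of the factors

# Compound the initial yield for three years with each average factor
print(round(yields[0] * arith ** 3))  # 314
print(round(yields[0] * geom ** 3))   # 300
```

Only the geometric mean reproduces the final yield, because growth factors combine by multiplication rather than by addition.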
Example
Calculate the arithmetic, harmonic and geometric mean of 1, 2, 5, 6, 7, 8, 8, 9.
The arithmetic mean is
$$\bar{x} = \frac{1+2+5+6+7+8+8+9}{8} = 5.75.$$
The harmonic mean is
$$\bar{x}_H = \frac{8}{\frac{1}{1}+\frac{1}{2}+\frac{1}{5}+\frac{1}{6}+\frac{1}{7}+\frac{1}{8}+\frac{1}{8}+\frac{1}{9}} \doteq 3.375.$$
The geometric mean is
$$\bar{x}_G = \sqrt[8]{1 \cdot 2 \cdot 5 \cdot 6 \cdot 7 \cdot 8 \cdot 8 \cdot 9} \doteq 4.709.$$
Notice that $\bar{x}_H \le \bar{x}_G \le \bar{x}$.
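The three means of this example can be reproduced with a short Python sketch. The function names are illustrative; the formulas follow the definitions given earlier.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def harmonic_mean(xs):
    # n divided by the sum of reciprocals; all values must be non-zero
    return len(xs) / sum(1 / x for x in xs)

def geometric_mean(xs):
    # n-th root of the product; all values are assumed positive
    return math.prod(xs) ** (1 / len(xs))

data = [1, 2, 5, 6, 7, 8, 8, 9]
print(arithmetic_mean(data))            # 5.75
print(round(harmonic_mean(data), 3))    # 3.375
print(round(geometric_mean(data), 3))   # 4.709
```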
Quantile
Definition
The quantile $x_p$ is a value of the variable such that 100p % of the values of the ordered sample (or population) are smaller than or equal to $x_p$ and 100(1 − p) % of the values are larger than or equal to $x_p$.
The quantile is not uniquely defined: several calculation methods are in use and they can give different values for the same data.
Let us have the data set 2, 5, 7, 10, 12, 13, 18, 21.
Possible methods of calculation
Sort the data in ascending order. Find the sequential index $i_p$ of the quantile $x_p$ which fulfils the inequality
$$np < i_p < np + 1.$$
The quantile $x_p$ is then equal to the value of the variable with the sequential index $i_p$, that is $x_p = x_{(i_p)}$. If $np$ and $np + 1$ are integers, we calculate the quantile as the arithmetic mean of $x_{(np)}$ and $x_{(np+1)}$,
$$x_p = \frac{x_{(np)} + x_{(np+1)}}{2}.$$
The statistical software STATISTICA uses this method.
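The index rule above can be sketched as follows. This is an illustrative implementation of the rule as stated here, not code taken from STATISTICA. The function name, the tolerance used to detect an integer np, and the clamping at the ends of the sample are assumptions.

```python
import math

def quantile_by_index(data, p, tol=1e-9):
    """Quantile via np < i_p < np + 1; mean of two neighbours if np is an integer."""
    xs = sorted(data)
    n = len(xs)
    np_ = n * p
    if abs(np_ - round(np_)) < tol:       # np is (numerically) an integer
        k = int(round(np_))
        lo = xs[max(k, 1) - 1]            # x_(np), clamped to the first value
        hi = xs[min(k + 1, n) - 1]        # x_(np+1), clamped to the last value
        return (lo + hi) / 2
    i_p = math.floor(np_) + 1             # the only integer strictly between np and np + 1
    return xs[i_p - 1]                    # convert the 1-based index to 0-based

data = [2, 5, 7, 10, 12, 13, 18, 21]
print([quantile_by_index(data, p) for p in (0.10, 0.25, 0.50, 0.75, 0.90)])
# [2, 6.0, 11.0, 15.5, 21]
```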
According to MATLAB
We calculate
$$\bar{i}_p = \frac{np + (np + 1)}{2} = \frac{2np + 1}{2},$$
which determines the location of the quantile. Using linear interpolation we get
$$x_p = x_{([\bar{i}_p])} + \left(x_{([\bar{i}_p]+1)} - x_{([\bar{i}_p])}\right)\left(\bar{i}_p - [\bar{i}_p]\right),$$
where $[\cdot]$ denotes the integer part of the number. If $\bar{i}_p < 1$ then $x_p = x_{(1)}$, if $\bar{i}_p > n$ then $x_p = x_{(n)}$.
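A minimal sketch of this interpolation rule, written from the formulas above rather than from MATLAB itself; the function name is an assumption. The case $\bar{i}_p = n$ is treated together with $\bar{i}_p > n$, which gives the same value $x_{(n)}$ and avoids reading past the end of the sample.

```python
import math

def quantile_midpoint_interp(data, p):
    """Quantile by linear interpolation at the position (2np + 1) / 2."""
    xs = sorted(data)
    n = len(xs)
    pos = n * p + 0.5            # 1-based position of the quantile
    if pos < 1:
        return xs[0]             # below the first value
    if pos >= n:
        return xs[-1]            # at or beyond the last value
    k = math.floor(pos)          # integer part of the position
    return xs[k - 1] + (xs[k] - xs[k - 1]) * (pos - k)

data = [2, 5, 7, 10, 12, 13, 18, 21]
print([round(quantile_midpoint_interp(data, p), 2) for p in (0.10, 0.25, 0.50, 0.75, 0.90)])
# [2.9, 6.0, 11.0, 15.5, 20.1]
```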
According to EXCEL
We assign the values $0, \frac{1}{n-1}, \frac{2}{n-1}, \dots, \frac{n-2}{n-1}, 1$ to the data sorted in ascending order. If p is equal to a multiple of $\frac{1}{n-1}$, the quantile $x_p$ is equal to the value corresponding to the given multiple. If p is not a multiple of $\frac{1}{n-1}$, we use linear interpolation.
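The rank-based rule can be sketched in the same way. The function below follows the description above and is not Excel's own code; the function name is an assumption, and p is assumed to lie between 0 and 1.

```python
import math

def quantile_rank_interp(data, p):
    """Quantile by linear interpolation between ranks 0, 1/(n-1), ..., 1."""
    xs = sorted(data)
    n = len(xs)
    pos = p * (n - 1)            # 0-based fractional position in the sorted data
    k = math.floor(pos)
    if k >= n - 1:
        return xs[-1]            # p = 1 maps to the largest value
    return xs[k] + (xs[k + 1] - xs[k]) * (pos - k)

data = [2, 5, 7, 10, 12, 13, 18, 21]
print([round(quantile_rank_interp(data, p), 2) for p in (0.10, 0.25, 0.50, 0.75, 0.90)])
# [4.1, 6.5, 11.0, 14.25, 18.9]
```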
Quantiles of the data set 2, 5, 7, 10, 12, 13, 18, 21 by method:
x_p          0.10   0.25   0.50   0.75    0.90
STATISTICA   2      6      11     15.5    21
MATLAB       2.9    6      11     15.5    20.1
EXCEL        4.1    6.5    11     14.25   18.9
Example
Calculate the median, the lower and upper quartile and the lower and upper decile of 1, 2, 5, 6, 7, 8, 8, 9. The size of the data set is n = 8. The median is the middle value of the data sorted in ascending order. Here there is not one middle value but two (6 and 7), so we calculate the median as
$$\tilde{x} = x_{0.50} = \frac{6 + 7}{2} = 6.5.$$
Interpretation: 50 % of the ordered values are smaller than or equal to 6.5, that is, they do not exceed 6.5.
Lower quartile $x_{0.25}$. Using the inequality
$$np < i_p < np + 1$$
we get
$$8 \cdot 0.25 < i_p < 8 \cdot 0.25 + 1 \iff 2 < i_p < 3.$$
Since $np = 2$ is an integer, no integer index lies strictly between the bounds, and we take the arithmetic mean of the two neighbouring values:
$$x_{0.25} = \frac{x_{(2)} + x_{(3)}}{2} = \frac{2 + 5}{2} = 3.5.$$
Analogously for the upper decile $x_{0.90}$: $8 \cdot 0.90 < i_p < 8 \cdot 0.90 + 1 \iff 7.2 < i_p < 8.2$, so we get $i_p = 8$ and
$$x_{0.90} = x_{(8)} = 9.$$
We say that 25 % of the ordered values are smaller than or equal to 3.5. Analogously, 90 % of the values do not exceed 9.
Mode
Definition
The mode $\hat{x}$ is the value of the variable with the highest frequency.
In the case of a continuous variable the mode is the value where the histogram reaches its peak.
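For discrete data the mode can be found by counting frequencies. The sketch below is an illustration only; the function name `modes` is an assumption. It returns every value that shares the highest frequency, since a sample may have more than one mode.

```python
from collections import Counter

def modes(data):
    """Return all values that share the highest frequency, in ascending order."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)

print(modes([1, 2, 5, 6, 7, 8, 8, 9]))  # [8]
```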
Figure: Non-homogeneous sample
Measures of Dispersion
Means, quantiles and the mode are measures of location: they describe one property of a frequency distribution, its location. Another important property is dispersion (variation), which we describe by several measures of variation.
Figure: Two samples with different variation
Range of Variation
Definition
The range of variation R is defined as the difference between the largest and the smallest value of the variable
$$R = x_{\max} - x_{\min}.$$
It is the simplest but also the crudest measure of variation. It indicates the width of the interval which contains all values.
Interquantile Ranges
Definition
the interquartile range
$$R_Q = x_{0.75} - x_{0.25},$$
the interdecile range
$$R_D = x_{0.90} - x_{0.10},$$
the interpercentile range
$$R_C = x_{0.99} - x_{0.01}.$$
The interquartile range indicates the width of the interval which contains the middle 50 % of the values of the ordered sample. By analogy, the interdecile and the interpercentile range indicate the width of the interval which contains the middle 80 % and 98 % of the values of the ordered sample, respectively.
Example
We have calculated quantiles of the data 2, 5, 7, 10, 12, 13, 18 and 21. According to STATISTICA: $x_{0.10} = 2$, $x_{0.25} = 6$, $x_{0.50} = 11$, $x_{0.75} = 15.5$, $x_{0.90} = 21$.
The range of variation is $R = x_{\max} - x_{\min} = 21 - 2 = 19$.
The interquartile range is $R_Q = x_{0.75} - x_{0.25} = 15.5 - 6 = 9.5$.
The interdecile range is $R_D = x_{0.90} - x_{0.10} = 21 - 2 = 19$.
Quantile Deviations
Definition
the quartile deviation
$$Q = R_Q / 2,$$
the decile deviation
$$D = R_D / 8,$$
the percentile deviation
$$C = R_C / 98.$$
Example
Calculate the quartile and the decile deviation of 2, 5, 7, 10, 12, 13, 18 and 21.
The quartile deviation is $Q = R_Q / 2 = 9.5 / 2 = 4.75$. The decile deviation is $D = R_D / 8 = 19 / 8 = 2.375$. This means that the average width of the two middle quartile intervals is 4.75 and the average width of the eight middle decile intervals is 2.375.
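The ranges and quantile deviations of this data set can be recomputed in a few lines. The quantile values are taken as given from the STATISTICA method; the variable names are illustrative assumptions.

```python
data = [2, 5, 7, 10, 12, 13, 18, 21]

# Quantiles of the data as reported for the STATISTICA method
q = {0.10: 2, 0.25: 6, 0.50: 11, 0.75: 15.5, 0.90: 21}

R = max(data) - min(data)   # range of variation
RQ = q[0.75] - q[0.25]      # interquartile range
RD = q[0.90] - q[0.10]      # interdecile range

Q = RQ / 2                  # quartile deviation
D = RD / 8                  # decile deviation

print(R, RQ, RD, Q, D)      # 19 9.5 19 4.75 2.375
```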
Average Deviation
Definition
The average deviation is defined as the arithmetic mean of the absolute deviations
$$d_{\bar{x}} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|.$$
Example
Find the average deviation of the data set 1, 2, 5, 6, 7, 8, 8 and 9. The arithmetic mean is $\bar{x} = 5.75$. We obtain
$$d_{\bar{x}} = \frac{|1 - 5.75| + |2 - 5.75| + |5 - 5.75| + |6 - 5.75| + |7 - 5.75| + |8 - 5.75| + |8 - 5.75| + |9 - 5.75|}{8} = 2.3125.$$
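The same calculation as a short Python sketch; the function name is an illustrative assumption.

```python
def average_deviation(xs):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)

print(average_deviation([1, 2, 5, 6, 7, 8, 8, 9]))  # 2.3125
```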
Variance
Definition
The variance $s_n^2$ is defined as the arithmetic mean of the squares of deviations
$$s_n^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2.$$
Expanding the square gives an equivalent formula:
$$s_n^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2\right) = \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2\right) = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \overline{x^2} - \bar{x}^2.$$
Elementary properties of the variance:
if the variable is constant and is equal to c, then the variance is zero
$$\frac{1}{n}\sum_{i=1}^{n} (c - c)^2 = 0,$$
if we add a constant to the values of the variable, then
$$\frac{1}{n}\sum_{i=1}^{n} \left[(x_i + c) - (\bar{x} + c)\right]^2 = s_n^2,$$
if we multiply the values of the variable by a constant c, then
$$\frac{1}{n}\sum_{i=1}^{n} (c \cdot x_i - c \cdot \bar{x})^2 = c^2 \cdot s_n^2.$$
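These three properties can be checked numerically on a small sample. The sketch below is only an illustration: the function name `variance_n`, the data set and the constant c = 3 are arbitrary assumptions.

```python
import math

def variance_n(xs):
    """Variance with divisor n (mean of squared deviations)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

data = [1, 2, 5, 6, 7, 8, 8, 9]
c = 3  # arbitrary constant chosen for the illustration

constant = [c] * len(data)        # a constant variable
shifted = [x + c for x in data]   # constant added to every value
scaled = [c * x for x in data]    # every value multiplied by the constant

print(variance_n(constant))                                        # 0.0
print(math.isclose(variance_n(shifted), variance_n(data)))         # True
print(math.isclose(variance_n(scaled), c ** 2 * variance_n(data))) # True
```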
Standard Deviation
Definition
The square root of the variance is called the standard deviation
$$s_n = \sqrt{s_n^2}.$$
Sample Variance and Standard Deviation
Definition
The sample variance $s^2$ is defined by the formula
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$
and the square root of the sample variance is called the sample standard deviation
$$s = \sqrt{s^2}.$$
It follows directly from the definitions that
$$s_n^2 = \frac{n-1}{n}\, s^2.$$
Example
Calculate the variance, the standard deviation, the sample variance and the sample standard deviation of the data set 1, 2, 5, 6, 7, 8, 8 and 9. The arithmetic mean is $\bar{x} = 5.75$.
$$s_n^2 = \frac{(1 - 5.75)^2 + (2 - 5.75)^2 + (5 - 5.75)^2 + (6 - 5.75)^2 + (7 - 5.75)^2 + (8 - 5.75)^2 + (8 - 5.75)^2 + (9 - 5.75)^2}{8} = 7.4375.$$
The variance can also be calculated by the formula $s_n^2 = \overline{x^2} - \bar{x}^2$:
$$\overline{x^2} = \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{1^2 + 2^2 + 5^2 + 6^2 + 7^2 + 8^2 + 8^2 + 9^2}{8} = 40.5,$$
$$s_n^2 = \overline{x^2} - \bar{x}^2 = 40.5 - 5.75^2 = 7.4375.$$
The standard deviation is
$$s_n = \sqrt{s_n^2} = \sqrt{7.4375} \doteq 2.72718.$$
To get the sample variance we apply the formula
$$s^2 = \frac{n}{n-1}\, s_n^2 = \frac{8}{7} \cdot 7.4375 = 8.5.$$
The sample standard deviation is
$$s = \sqrt{s^2} = \sqrt{8.5} \doteq 2.91548.$$
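The whole example can be reproduced with a short Python sketch. The function names are illustrative assumptions; Python's standard `statistics` module also offers `pvariance` and `variance` for the divisor-n and divisor-(n − 1) versions.

```python
import math

def variance_n(xs):
    """Variance with divisor n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sample variance with divisor n - 1, obtained from the divisor-n variance."""
    n = len(xs)
    return n / (n - 1) * variance_n(xs)

data = [1, 2, 5, 6, 7, 8, 8, 9]
n = len(data)

s2_n = variance_n(data)
# Alternative formula: mean of squares minus square of the mean
s2_n_alt = sum(x * x for x in data) / n - (sum(data) / n) ** 2
s2 = sample_variance(data)

print(s2_n, s2_n_alt)                # 7.4375 7.4375
print(round(math.sqrt(s2_n), 5))     # 2.72718
print(round(s2, 5))                  # 8.5
print(round(math.sqrt(s2), 5))       # 2.91548
```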
Moments
Definition
The r-th moment is defined by the formula
$$m_r' = \frac{1}{n}\sum_{i=1}^{n} x_i^r.$$
The r-th central moment is defined by the formula
$$m_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r.$$
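Both kinds of moments can be sketched directly from the definitions. The function names are illustrative assumptions. The first moment equals the arithmetic mean and the second central moment equals the variance $s_n^2$.

```python
def raw_moment(xs, r):
    """r-th moment: mean of the r-th powers of the values."""
    return sum(x ** r for x in xs) / len(xs)

def central_moment(xs, r):
    """r-th central moment: mean of the r-th powers of deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** r for x in xs) / len(xs)

data = [1, 2, 5, 6, 7, 8, 8, 9]
print(raw_moment(data, 1))      # 5.75, the arithmetic mean
print(raw_moment(data, 2))      # 40.5
print(central_moment(data, 1))  # 0.0
print(central_moment(data, 2))  # 7.4375, the variance
```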
Sample Skewness
Definition
The sample skewness is defined by the formula
$$a_3 = \frac{m_3}{m_2^{3/2}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s_n^3} = \frac{m_3}{s_n^3}.$$
Figure: Frequency distributions with different sample skewness
Sample Kurtosis
Definition
The sample kurtosis is defined by the formula
$$a_4 = \frac{m_4}{m_2^2} - 3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s_n^4} - 3.$$
Figure: Frequency distributions with different sample kurtosis
Note
The Excel functions SKEW and KURT calculate skewness and kurtosis by the formulas
$$a_3^* = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3,$$
$$a_4^* = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}.$$
It can be derived that
$$a_3 = \frac{n-2}{\sqrt{n(n-1)}} \cdot a_3^*,$$
$$a_4 = \frac{(n-2)(n-3)}{n^2 - 1} \cdot a_4^* - \frac{6}{n+1}.$$
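The two conversion formulas can be checked numerically. The sketch below implements $a_3$, $a_4$ and the adjusted versions $a_3^*$, $a_4^*$ directly from the formulas in this note; it does not call Excel, and all function names are illustrative assumptions. The sample must contain at least four values for $a_4^*$ to be defined.

```python
import math

def central_moment(xs, r):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** r for x in xs) / len(xs)

def skewness(xs):
    """a3 = m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    """a4 = m4 / m2^2 - 3."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

def standardized_power_sum(xs, r):
    """Sum of ((x_i - mean) / s)^r, with s the sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return sum(((x - mean) / s) ** r for x in xs)

def adjusted_skewness(xs):
    """a3* following the formula quoted for SKEW."""
    n = len(xs)
    return n / ((n - 1) * (n - 2)) * standardized_power_sum(xs, 3)

def adjusted_kurtosis(xs):
    """a4* following the formula quoted for KURT."""
    n = len(xs)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return lead * standardized_power_sum(xs, 4) - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

data = [1, 2, 5, 6, 7, 8, 8, 9]
n = len(data)

# Recover a3 and a4 from the adjusted values via the conversion formulas
a3_converted = (n - 2) / math.sqrt(n * (n - 1)) * adjusted_skewness(data)
a4_converted = (n - 2) * (n - 3) / (n ** 2 - 1) * adjusted_kurtosis(data) - 6 / (n + 1)

print(math.isclose(skewness(data), a3_converted, abs_tol=1e-9))  # True
print(math.isclose(kurtosis(data), a4_converted, abs_tol=1e-9))  # True
```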