Numerical Descriptive Measures


Jiří Neubauer

Department of Econometrics FVL UO Brno office 69a, tel. 973 442029 email:[email protected]

Numerical Descriptive Measures

measures of location (center)
measures of dispersion (variation)
measures of concentration

Arithmetic mean

The most important aspect of studying the distribution of a sample of measurements is locating the position of a central value about which the measurements are distributed.

Definition

The arithmetic mean (average) of a set of n measurements $x_1, x_2, \dots, x_n$ is given by the formula

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i. \]

If the data are organized in a frequency distribution table, we can calculate the mean by the formula

\[ \bar{x} = \frac{1}{n} \sum_{j=1}^{k} n_j \cdot x_j, \]

where $n_1, n_2, \dots, n_k$ are the frequencies of the distinct values $x_1, x_2, \dots, x_k$ of the variable and $n = n_1 + n_2 + \dots + n_k$.
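A minimal Python sketch of the frequency-table formula. The helper name `mean_from_frequencies` is hypothetical, and the data set is the one used in the later examples; the point is only that the raw-data mean and the frequency-weighted mean coincide.

```python
from collections import Counter


def mean_from_frequencies(freq):
    """Arithmetic mean from a mapping {value x_j: frequency n_j}."""
    n = sum(freq.values())  # n = n_1 + ... + n_k
    return sum(n_j * x_j for x_j, n_j in freq.items()) / n


data = [1, 2, 5, 6, 7, 8, 8, 9]
freq = Counter(data)                     # frequency table: value -> count
mean_raw = sum(data) / len(data)         # mean from the raw data
mean_freq = mean_from_frequencies(freq)  # mean from the frequency table
print(mean_raw, mean_freq)
```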

Elementary properties of the arithmetic mean:

the sum of the deviations of the values from the mean is equal to zero,

\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0, \]

if the variable is constant, then the mean is equal to this constant,

\[ \frac{1}{n} \sum_{i=1}^{n} c = c, \]

if we add a constant $c$ to the values of the variable, then

\[ \frac{1}{n} \sum_{i=1}^{n} (x_i + c) = c + \bar{x}, \]

if we multiply the values of the variable by a constant $c$, then

\[ \frac{1}{n} \sum_{i=1}^{n} c \cdot x_i = c \cdot \bar{x}. \]
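The four properties can be checked numerically. The sketch below uses an arbitrary constant `c = 3.0` (an assumption for illustration) and the data set from the later examples; it is a spot check, not a proof.

```python
from statistics import fmean

x = [1, 2, 5, 6, 7, 8, 8, 9]
c = 3.0                                       # arbitrary constant for illustration
x_bar = fmean(x)

dev_sum = sum(xi - x_bar for xi in x)         # property 1: expected 0
mean_const = fmean([c] * len(x))              # property 2: expected c
mean_shift = fmean([xi + c for xi in x])      # property 3: expected c + x_bar
mean_scale = fmean([c * xi for xi in x])      # property 4: expected c * x_bar

print(dev_sum, mean_const, mean_shift, mean_scale)
```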

Harmonic mean

Definition

The harmonic mean of a set of n measurements $x_1, x_2, \dots, x_n$ is given by the formula

\[ \bar{x}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}. \]

In certain situations, especially those involving rates and ratios, the harmonic mean is the appropriate average.
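As an illustration of the rates case, consider a trip with two legs of equal length driven at different speeds. The speeds and the leg length below are hypothetical values chosen for the sketch; the average speed over the whole trip equals the harmonic mean of the two speeds, not their arithmetic mean.

```python
from statistics import fmean, harmonic_mean

speeds = [40.0, 60.0]   # km/h on each leg (hypothetical values)
distance = 120.0        # km per leg (hypothetical value)

total_time = sum(distance / v for v in speeds)           # hours spent driving
true_average = len(speeds) * distance / total_time       # total distance / total time

h_mean = harmonic_mean(speeds)   # matches the true average speed
a_mean = fmean(speeds)           # overstates the average speed

print(true_average, h_mean, a_mean)
```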

Geometric mean

Definition

The geometric mean of a set of n measurements $x_1, x_2, \dots, x_n$ is given by the formula

\[ \bar{x}_G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}. \]

The geometric mean may be more appropriate than the arithmetic mean for describing percentage growth. Suppose an orange tree yields 100 oranges one year, then 180, 210 and 300 in the following years, so the growth is 80 %, 16.7 % and 42.9 % in the respective years. Using the arithmetic mean, we would calculate an average growth of 46.5 % (80 % + 16.7 % + 42.9 %, divided by 3). However, if we start with 100 oranges and let the yield grow by 46.5 % per year for three years, the result is 314 oranges, not 300.
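A short sketch of the orange-tree comparison. It works with yearly growth factors (1.80 for 80 % growth, and so on), which is an assumption about how the percentages are to be averaged; the geometric mean of the factors reproduces the final yield, while the arithmetic mean overshoots it.

```python
import math

yields = [100, 180, 210, 300]

# Yearly growth factors: 1.8, about 1.167, about 1.429.
factors = [later / earlier for earlier, later in zip(yields, yields[1:])]
years = len(factors)

arith_factor = sum(factors) / years
geom_factor = math.prod(factors) ** (1 / years)

final_arith = yields[0] * arith_factor ** years   # about 314 oranges
final_geom = yields[0] * geom_factor ** years     # 300 oranges up to rounding

print(round(arith_factor, 4), round(geom_factor, 4))
print(round(final_arith, 1), round(final_geom, 1))
```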

Example

Calculate the arithmetic, harmonic and geometric mean of 1, 2, 5, 6, 7, 8, 8, 9.

The arithmetic mean is

\[ \bar{x} = \frac{1 + 2 + 5 + 6 + 7 + 8 + 8 + 9}{8} = 5.75. \]

The harmonic mean is

\[ \bar{x}_H = \frac{8}{\frac{1}{1} + \frac{1}{2} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{1}{8} + \frac{1}{9}} \approx 3.375. \]

The geometric mean is

\[ \bar{x}_G = \sqrt[8]{1 \cdot 2 \cdot 5 \cdot 6 \cdot 7 \cdot 8 \cdot 8 \cdot 9} \approx 4.709. \]

Notice that $\bar{x}_H \le \bar{x}_G \le \bar{x}$.
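The three means of this example can be reproduced with Python's standard `statistics` module (Python 3.8 or later is assumed for `fmean` and `geometric_mean`).

```python
from statistics import fmean, geometric_mean, harmonic_mean

data = [1, 2, 5, 6, 7, 8, 8, 9]

a_mean = fmean(data)
h_mean = harmonic_mean(data)
g_mean = geometric_mean(data)

print(round(a_mean, 3), round(h_mean, 3), round(g_mean, 3))
print(h_mean <= g_mean <= a_mean)   # ordering of the three means
```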

Quantile

Definition

The quantile $x_p$ is a value of the variable such that at least $100p$ % of the values of the ordered sample (or population) are less than or equal to $x_p$ and at least $100(1 - p)$ % of the values are greater than or equal to $x_p$.

The quantile is not uniquely defined by this condition, and different software packages calculate it in different ways.

Let us have the data set 2, 5, 7, 10, 12, 13, 18, 21. Possible methods of calculation:

Sort the data in ascending order. Find the sequential index $i_p$ of the quantile $x_p$, that is, the integer which fulfils the inequality

\[ np < i_p < np + 1. \]

The quantile $x_p$ is then equal to the value of the variable with the sequential index $i_p$, $x_p = x_{(i_p)}$. If $np$ and $np + 1$ are integers, no integer lies strictly between them, and we calculate the quantile as the arithmetic mean of $x_{(np)}$ and $x_{(np+1)}$,

\[ x_p = \frac{x_{(np)} + x_{(np+1)}}{2}. \]

The statistical software STATISTICA uses this method.
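The index rule above can be sketched in Python as follows. The function name `quantile_index_rule` is hypothetical, and the handling of the edge cases np = 0 and np = n is an assumption not covered by the text; this is an illustration of the rule, not STATISTICA's actual implementation.

```python
import math


def quantile_index_rule(data, p):
    """Quantile x_p via np < i_p < np + 1, averaging when np is an integer."""
    xs = sorted(data)
    n = len(xs)
    np_ = n * p
    k = round(np_)
    if abs(np_ - k) < 1e-9:          # np is (numerically) an integer
        if k <= 0:                   # assumption: clamp to the smallest value
            return xs[0]
        if k >= n:                   # assumption: clamp to the largest value
            return xs[-1]
        return (xs[k - 1] + xs[k]) / 2   # mean of x_(np) and x_(np+1)
    i_p = math.ceil(np_)             # the only integer strictly between np and np + 1
    return xs[i_p - 1]               # x_(i_p), converted to 0-based indexing


data = [2, 5, 7, 10, 12, 13, 18, 21]
result = [quantile_index_rule(data, p) for p in (0.10, 0.25, 0.50, 0.75, 0.90)]
print(result)
```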

According to MATLAB

We calculate

\[ \bar{i}_p = \frac{np + np + 1}{2} = \frac{2np + 1}{2}, \]

which determines the location of the quantile. Using linear interpolation we get

\[ x_p = x_{([\bar{i}_p])} + \left( x_{([\bar{i}_p] + 1)} - x_{([\bar{i}_p])} \right) \left( \bar{i}_p - [\bar{i}_p] \right), \]

where $[\cdot]$ denotes the integer part of the number. If $\bar{i}_p < 1$ then $x_p = x_{(1)}$, and if $\bar{i}_p > n$ then $x_p = x_{(n)}$.
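A sketch of this interpolation rule, under the assumption that the boundary case $\bar{i}_p = n$ is treated like $\bar{i}_p > n$ (the text does not state it, and the interpolation weight is zero there anyway). The function name `quantile_midpoint_interp` is hypothetical; the code follows the formulas above rather than any MATLAB source.

```python
import math


def quantile_midpoint_interp(data, p):
    """Quantile x_p by linear interpolation at position (2np + 1) / 2."""
    xs = sorted(data)
    n = len(xs)
    pos = (2 * n * p + 1) / 2        # location \bar{i}_p, 1-based
    if pos < 1:
        return xs[0]                 # x_(1)
    if pos >= n:
        return xs[-1]                # x_(n)
    lo = math.floor(pos)             # integer part [\bar{i}_p]
    frac = pos - lo                  # fractional part used as the weight
    return xs[lo - 1] + (xs[lo] - xs[lo - 1]) * frac


data = [2, 5, 7, 10, 12, 13, 18, 21]
result = [quantile_midpoint_interp(data, p) for p in (0.10, 0.25, 0.50, 0.75, 0.90)]
print([round(v, 2) for v in result])
```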

According to EXCEL

We assign the values $0, \frac{1}{n-1}, \frac{2}{n-1}, \dots, \frac{n-2}{n-1}, 1$ to the data sorted in ascending order. If $p$ is equal to a multiple of $\frac{1}{n-1}$, the quantile $x_p$ is equal to the value corresponding to the given multiple. If $p$ is not a multiple of $\frac{1}{n-1}$, we use linear interpolation.
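The rule just described can be sketched as follows, assuming $0 \le p \le 1$ and at least two data values. The function name `quantile_excel_style` is hypothetical, and the code implements the description in the text rather than calling Excel.

```python
def quantile_excel_style(data, p):
    """Quantile x_p where the k-th sorted value (k = 0..n-1) sits at k / (n - 1)."""
    xs = sorted(data)
    n = len(xs)
    h = p * (n - 1)                  # fractional 0-based position of the quantile
    lo = int(h)                      # floor, since h >= 0
    if lo >= n - 1:
        return xs[-1]                # p = 1 maps to the largest value
    frac = h - lo                    # zero when p is a multiple of 1 / (n - 1)
    return xs[lo] + (xs[lo + 1] - xs[lo]) * frac


data = [2, 5, 7, 10, 12, 13, 18, 21]
result = [quantile_excel_style(data, p) for p in (0.10, 0.25, 0.50, 0.75, 0.90)]
print([round(v, 2) for v in result])
```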

Quantiles $x_p$ of the data set 2, 5, 7, 10, 12, 13, 18, 21 by method:

p            0.10   0.25   0.50   0.75    0.90
STATISTICA   2      6      11     15.5    21
MATLAB       2.9    6      11     15.5    20.1
EXCEL        4.1    6.5    11     14.25   18.9

Example

Calculate the median, the lower and upper quartile, and the lower and upper decile of 1, 2, 5, 6, 7, 8, 8, 9.

The size of the data set is n = 8. The median is the middle value of the data sorted in ascending order. Here there is not one middle value but two (6 and 7), so we calculate the median as

\[ \tilde{x} = x_{0.50} = \frac{6 + 7}{2} = 6.5. \]

Interpretation: 50 % of the ordered values are less than or equal to 6.5, that is, they do not exceed 6.5.

Lower quartile $x_{0.25}$: using the formula

\[ np < i_p < np + 1 \]

we get $8 \cdot 0.25 < i_p < 8 \cdot 0.25 + 1$, that is, $2 < i_p < 3$. Both bounds are integers, so we take the arithmetic mean of the two neighbouring values,

\[ x_{0.25} = \frac{x_{(2)} + x_{(3)}}{2} = \frac{2 + 5}{2} = 3.5. \]

Analogously for the upper decile $x_{0.90}$: $8 \cdot 0.90 < i_p < 8 \cdot 0.90 + 1$, that is, $7.2 < i_p < 8.2$, so we get $i_p = 8$ and

\[ x_{0.90} = x_{(8)} = 9. \]

We say that 25 % of the ordered values are less than or equal to 3.5. Analogously, 90 % of the values do not exceed 9.

Mode

Definition

The mode $\hat{x}$ is the value of the variable with the highest frequency.

In the case of a continuous variable (data), the mode is the value where the histogram reaches its peak.


Figure: Non-homogeneous sample

Measures of Dispersion

Means, quantiles and the mode are measures of location: they describe one property of a frequency distribution, its location. Another important property is dispersion (variation), which we describe by several measures of variation.


Figure: Two samples with different variation

Range of Variation

Definition

The range of variation R is defined as the difference between the largest and the smallest value of the variable,

\[ R = x_{\max} - x_{\min}. \]

It is the simplest but also the crudest measure of variation. It indicates the width of the interval that contains all values.

Interquantile Ranges

Definition

the interquartile range

\[ R_Q = x_{0.75} - x_{0.25} \]

the interdecile range

\[ R_D = x_{0.90} - x_{0.10} \]

the interpercentile range

\[ R_C = x_{0.99} - x_{0.01} \]

The interquartile range indicates the width of the interval that contains the middle 50 % of the values of the ordered sample. By analogy, the interdecile and the interpercentile range indicate the width of the interval that contains the middle 80 % and 98 % of the values of the ordered sample, respectively.

Example

We have calculated the quantiles of the data 2, 5, 7, 10, 12, 13, 18 and 21. According to STATISTICA: $x_{0.10} = 2$, $x_{0.25} = 6$, $x_{0.50} = 11$, $x_{0.75} = 15.5$, $x_{0.90} = 21$.

The range of variation is $R = x_{\max} - x_{\min} = 21 - 2 = 19$.
The interquartile range is $R_Q = x_{0.75} - x_{0.25} = 15.5 - 6 = 9.5$.
The interdecile range is $R_D = x_{0.90} - x_{0.10} = 21 - 2 = 19$.

Quantile Deviations

Definition

the quartile deviation

\[ Q = R_Q / 2 \]

the decile deviation

\[ D = R_D / 8 \]

the percentile deviation

\[ C = R_C / 98 \]

Example

Calculate the quartile and the decile deviation of 2, 5, 7, 10, 12, 13, 18 and 21.

The quartile deviation is $Q = R_Q / 2 = 9.5 / 2 = 4.75$. The decile deviation is $D = R_D / 8 = 19 / 8 = 2.375$. This means that the average width of the two middle quartile intervals is 4.75 and the average width of the eight middle decile intervals is 2.375.
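The arithmetic of the range and deviation examples can be retraced in a few lines. The quantile values are copied from the STATISTICA results quoted in the text rather than recomputed, and the variable names are chosen for the sketch.

```python
data = [2, 5, 7, 10, 12, 13, 18, 21]

# Quantiles as reported in the text (STATISTICA method).
q = {0.10: 2, 0.25: 6, 0.50: 11, 0.75: 15.5, 0.90: 21}

R = max(data) - min(data)      # range of variation
R_Q = q[0.75] - q[0.25]        # interquartile range
R_D = q[0.90] - q[0.10]        # interdecile range

Q = R_Q / 2                    # quartile deviation: two middle quartile intervals
D = R_D / 8                    # decile deviation: eight middle decile intervals

print(R, R_Q, R_D, Q, D)
```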

Average Deviation

Definition

The average deviation is defined as the arithmetic mean of the absolute deviations,

\[ d_{\bar{x}} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|. \]

Example

Find the average deviation of the data set 1, 2, 5, 6, 7, 8, 8 and 9.

The arithmetic mean is $\bar{x} = 5.75$. We obtain

\[
\begin{aligned}
d_{\bar{x}} &= \frac{|1 - 5.75| + |2 - 5.75| + |5 - 5.75| + |6 - 5.75|}{8} \\
&\quad + \frac{|7 - 5.75| + |8 - 5.75| + |8 - 5.75| + |9 - 5.75|}{8} = 2.3125.
\end{aligned}
\]
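The same computation as a small Python sketch; the function name `average_deviation` is chosen for illustration.

```python
from statistics import fmean


def average_deviation(data):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    x_bar = fmean(data)
    return fmean([abs(x - x_bar) for x in data])


d = average_deviation([1, 2, 5, 6, 7, 8, 8, 9])
print(d)
```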

Variance

Definition

The variance $s_n^2$ is defined as the arithmetic mean of the squared deviations,

\[ s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2. \]

Expanding the square gives a computational formula:

\[
\begin{aligned}
s_n^2 &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
= \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2 \right) \\
&= \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \right)
= \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \overline{x^2} - \bar{x}^2.
\end{aligned}
\]

Elementary properties of the variance:

if the variable is constant and is equal to $c$, then the variance is zero,

\[ \frac{1}{n} \sum_{i=1}^{n} (c - c)^2 = 0, \]

if we add a constant $c$ to the values of the variable, then

\[ \frac{1}{n} \sum_{i=1}^{n} [(x_i + c) - (\bar{x} + c)]^2 = s_n^2, \]

if we multiply the values of the variable by a constant $c$, then

\[ \frac{1}{n} \sum_{i=1}^{n} (c \cdot x_i - c \cdot \bar{x})^2 = c^2 \cdot s_n^2. \]
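A numerical spot check of these properties, together with the computational formula $s_n^2 = \overline{x^2} - \bar{x}^2$. The constant `c = 3.0` is an arbitrary choice for the sketch, and `statistics.pvariance` is used because it divides by n, as $s_n^2$ does.

```python
from statistics import fmean, pvariance

x = [1, 2, 5, 6, 7, 8, 8, 9]
c = 3.0                                        # arbitrary constant for illustration

s2_n = pvariance(x)                            # variance with divisor n
var_const = pvariance([c] * len(x))            # constant variable: expected 0
var_shift = pvariance([xi + c for xi in x])    # shift: expected s2_n
var_scale = pvariance([c * xi for xi in x])    # scaling: expected c**2 * s2_n

# Computational formula: mean of squares minus square of the mean.
shortcut = fmean([xi ** 2 for xi in x]) - fmean(x) ** 2

print(s2_n, var_const, var_shift, var_scale, shortcut)
```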

Standard Deviation

Definition

The square root of the variance is called the standard deviation,

\[ s_n = \sqrt{s_n^2}. \]

Sample Variance and Standard Deviation

Definition

The sample variance $s^2$ is defined by the formula

\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \]

and the square root of the sample variance is called the sample standard deviation,

\[ s = \sqrt{s^2}. \]

Comparing the two definitions shows that

\[ s_n^2 = \frac{n - 1}{n} s^2. \]

Example

Calculate the variance, the standard deviation, the sample variance and the sample standard deviation of the data set 1, 2, 5, 6, 7, 8, 8 and 9.

The arithmetic mean is $\bar{x} = 5.75$.

\[
\begin{aligned}
s_n^2 &= \frac{(1 - 5.75)^2 + (2 - 5.75)^2 + (5 - 5.75)^2 + (6 - 5.75)^2}{8} \\
&\quad + \frac{(7 - 5.75)^2 + (8 - 5.75)^2 + (8 - 5.75)^2 + (9 - 5.75)^2}{8} = 7.4375.
\end{aligned}
\]

The variance can also be calculated by the formula $s_n^2 = \overline{x^2} - \bar{x}^2$:

\[ \overline{x^2} = \frac{1}{n} \sum_{i=1}^{n} x_i^2 = \frac{1^2 + 2^2 + 5^2 + 6^2 + 7^2 + 8^2 + 8^2 + 9^2}{8} = 40.5, \]

\[ s_n^2 = \overline{x^2} - \bar{x}^2 = 40.5 - 5.75^2 = 7.4375. \]

The standard deviation is

\[ s_n = \sqrt{s_n^2} = \sqrt{7.4375} \approx 2.72718. \]

To get the sample variance we apply the formula

\[ s^2 = \frac{n}{n - 1} s_n^2 = \frac{8}{7} \cdot 7.4375 = 8.5. \]

The sample standard deviation is

\[ s = \sqrt{s^2} = \sqrt{8.5} \approx 2.91548. \]
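The four quantities of this example map onto functions of Python's standard `statistics` module: `pvariance` and `pstdev` divide by n, while `variance` and `stdev` divide by n - 1. A short check:

```python
from statistics import pstdev, pvariance, stdev, variance

data = [1, 2, 5, 6, 7, 8, 8, 9]
n = len(data)

s2_n = pvariance(data)   # variance s_n^2 (divisor n)
s_n = pstdev(data)       # standard deviation s_n
s2 = variance(data)      # sample variance s^2 (divisor n - 1)
s = stdev(data)          # sample standard deviation s

print(s2_n, round(s_n, 5), s2, round(s, 5))
print(abs(s2 - n / (n - 1) * s2_n) < 1e-12)   # relation between the two variances
```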

Moments

Definition

The r-th moment is defined by the formula

\[ m'_r = \frac{1}{n} \sum_{i=1}^{n} x_i^r. \]

The r-th central moment is defined by the formula

\[ m_r = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^r. \]

Sample Skewness

Definition

The sample skewness is defined by the formula

\[ a_3 = \frac{m_3}{m_2^{3/2}} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s_n^3} = \frac{m_3}{s_n^3}. \]


Figure: Frequency distribution with the different sample skewness

Sample Kurtosis

Definition

The sample kurtosis is defined by the formula

\[ a_4 = \frac{m_4}{m_2^2} - 3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s_n^4} - 3. \]


Figure: Frequency distribution with the different sample kurtosis

Note

The Excel functions SKEW and KURT calculate skewness and kurtosis by the formulas

\[ a_3^* = \frac{n}{(n - 1)(n - 2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3, \]

\[ a_4^* = \frac{n(n + 1)}{(n - 1)(n - 2)(n - 3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n - 1)^2}{(n - 2)(n - 3)}. \]

It can be derived that

\[ a_3 = \frac{n - 2}{\sqrt{n(n - 1)}} \cdot a_3^*, \]

\[ a_4 = \frac{(n - 2)(n - 3)}{n^2 - 1} \cdot a_4^* - \frac{6}{n + 1}. \]
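The two conversion formulas can be checked numerically. The sketch below computes $a_3$ and $a_4$ from the central moments and $a_3^*$ and $a_4^*$ from the formulas in this note (it does not call Excel), then converts the starred values back. The function names are hypothetical, and the data set is the one used in the earlier examples.

```python
import math


def central_moment(x, r):
    """r-th central moment m_r with divisor n."""
    n = len(x)
    x_bar = sum(x) / n
    return sum((xi - x_bar) ** r for xi in x) / n


def moment_skew_kurt(x):
    """a_3 = m_3 / m_2^(3/2) and a_4 = m_4 / m_2^2 - 3."""
    m2, m3, m4 = (central_moment(x, r) for r in (2, 3, 4))
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def adjusted_skew_kurt(x):
    """a_3* and a_4* from the formulas in the note, using the sample sd s."""
    n = len(x)
    x_bar = sum(x) / n
    s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
    z3 = sum(((xi - x_bar) / s) ** 3 for xi in x)
    z4 = sum(((xi - x_bar) / s) ** 4 for xi in x)
    a3_star = n / ((n - 1) * (n - 2)) * z3
    a4_star = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
               - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return a3_star, a4_star


x = [1, 2, 5, 6, 7, 8, 8, 9]
n = len(x)
a3, a4 = moment_skew_kurt(x)
a3_star, a4_star = adjusted_skew_kurt(x)

# Convert the starred values back with the relations stated in the note.
a3_back = (n - 2) / math.sqrt(n * (n - 1)) * a3_star
a4_back = (n - 2) * (n - 3) / (n ** 2 - 1) * a4_star - 6 / (n + 1)

print(round(a3, 4), round(a3_back, 4))
print(round(a4, 4), round(a4_back, 4))
```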
