
Paper No: 14 Statistical Applications in Environmental Sciences Module: 9 Measures of Dispersion II with Skewness and Kurtosis

Development Team

Prof. R.K. Kohli Principal Investigator & Prof. V.K. Garg & Prof. Ashok Dhawan Co-Principal Investigator Central University of Punjab, Bathinda

Paper Coordinator Dr. Harmanpreet Singh Kapoor, Central University of Punjab, Bathinda

Content Writer Dr. Harmanpreet Singh Kapoor, Central University of Punjab, Bathinda

Content Reviewer Prof. Kanchan Jain, Panjab University, Chandigarh

Anchor Institute Central University of Punjab

Statistical Applications in Environmental Sciences Environmental Sciences Measures of Dispersion II with Skewness and Kurtosis

Description of Module

Subject Name Environmental Sciences

Paper Name Statistical Applications in Environmental Sciences

Module Name/Title Measures of Dispersion II with Skewness and Kurtosis

Module Id EVS/SAES-XIV/9

Pre-requisites Module 1-8

Objectives To give an introduction to relative measures of dispersion, skewness and kurtosis

Keywords Skewness, symmetric, kurtosis, coefficient of variation


Module 9: Measures of Dispersion – II with Skewness and Kurtosis

• Learning Objectives
• Introduction
• Skewness
• Kurtosis
• Summary
• Suggested Readings

1. Learning Objectives

This module is a continuation of the module “Measures of Dispersion – I”. It introduces the relative measures of dispersion, such as the coefficient of range, the coefficient of mean deviation and the coefficient of variation, and works through each of them with examples. The properties of the different measures are discussed with their merits and demerits, so that one can judge which measure to use under which conditions. Two further measures, skewness and kurtosis, are also discussed, and moments are introduced for their derivation. Questions with detailed solutions are included to give an in-depth knowledge of the topic.

2. Introduction

Measures of dispersion describe the variation, or scatter, of the observations around a central value. The previous module covered one category of these measures, the absolute measures of dispersion. This module covers the second category, the relative measures of dispersion. A relative measure of dispersion is the ratio of an absolute measure of dispersion to an appropriate average: the quantity in the denominator must be in the same units as the absolute measure, and it is usually an average. The purpose of relative measures is to allow different datasets to be compared, which is not possible with absolute measures because they depend on the units of measurement. Relative measures of dispersion are therefore of great importance because they are independent of units.

The modules “Central Tendency Measures – I”, “Central Tendency Measures – II” and “Measures of Dispersion – I” discussed different measures of central tendency and measures of dispersion.
Together, the measures of central tendency and the measures of dispersion describe the main characteristics of a dataset, but they do not show how the observations are arranged around the central value, i.e. whether equal numbers of observations lie on either side of it, or whether the distribution is symmetrical about the mean. They do not answer how many observations lie below or above the mean. To study the concentration of the observations around a measure of central tendency, it is essential to study two more measures: skewness and kurtosis. These two measures support a better understanding of the characteristics of the data.

Skewness describes the shape of the data, i.e. whether the data are symmetric or skewed. Its value helps in determining the concentration of the observations below and above the average value. If the observations are distributed evenly on both sides of the centre, the distribution is called symmetric. If they are concentrated on one side, there are two possibilities: most of them lie either above or below the average value. Hence there are two types of skewness for asymmetric data, positive and negative, depending on where the observations are concentrated. These are discussed later.

Another important measure is kurtosis, which refers to the peakedness or flatness of the curve that can be drawn from the dataset. It is used to study whether the concentration of the observations in the central part is high or low. If the concentration of observations in the central part is very high, the curve is leptokurtic. If the concentration of observations at the centre is low, the curve is platykurtic. Measures of central tendency, measures of dispersion, skewness and kurtosis together form a complete package for understanding the data in depth; in other words, together they describe the distribution of the data.

Relative Measures

(a) Coefficient of Range: This measure of dispersion is evaluated from the range of the dataset. First the range of the dataset is calculated, then

Coefficient of Range = $\frac{H - L}{H + L}$

where H is the highest value in the dataset and L is the lowest value in the dataset.

(b) Coefficient of Quartile Deviation: This measure is defined as the ratio of the quartile deviation to the average of the two quartiles. It is defined as

Coefficient of Quartile Deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$

where $Q_1$ is the first quartile and $Q_3$ is the third quartile.

Ques 1. Find the quartile deviation, the coefficient of range and the coefficient of quartile deviation for the following dataset of the marks obtained by 15 students in statistics.

60, 67, 56, 78, 92, 55, 72, 54, 49, 59, 37, 84, 83, 69, 62

Ans: Arrange the observations in ascending order:

37, 49, 54, 55, 56, 59, 60, 62, 67, 69, 72, 78, 83, 84, 92


Compute the range: 92 − 37 = 55. The coefficient of range is 55/129 = 0.4264, i.e. 42.64%.

Now compute the first quartile as the $\frac{N+1}{4}$th term, i.e. the 4th term, which is 55. Similarly, the third quartile is the $\frac{3(N+1)}{4}$th term, i.e. the 12th term, which is 78. The quartile deviation is (78 − 55)/2 = 23/2 = 11.5. The coefficient of quartile deviation is (78 − 55)/(78 + 55) = 23/133 = 17.29%.

Hence the coefficient of range and the coefficient of quartile deviation are 42.64% and 17.29% respectively.

(c) Coefficient of Mean Deviation: This relative measure of dispersion is derived from the mean deviation. The mean deviation is the average of the absolute differences of the observations from a measure of central tendency, i.e. the mean, median or mode. Mostly, the mean deviation is calculated by taking the deviations of the observations from the mean or the median. To make the mean deviation independent of units, the coefficient of mean deviation is computed:

Coefficient of Mean Deviation = $\frac{\text{Mean deviation}}{\text{Central tendency measure}}$

The central tendency measure can be the mean or the median, but the divisor must be the same measure from which the mean deviation was computed.

Ques 2. Calculate the coefficient of mean deviation about mean for the following dataset.

Observations   Frequency
38             7
43             9
46             10
49             6
51             4
54             8
67             3
78             5

Table 1

Ans. First compute the A.M. of the data using the formula $\bar{x} = \frac{\sum f_i x_i}{N}$, where $N = \sum f_i$. The values are shown in Table 2. Column 4 shows the difference of each observation from the mean. Column 5 is the absolute value of column 4. Column 6 multiplies this absolute value by the corresponding frequency. The sum of column 6 divided by $N$ gives the mean deviation of the observations, which is 8.2278.


To compute the coefficient of mean deviation, divide the mean deviation by the mean:

Coefficient of Mean Deviation = $\frac{8.2278}{50.65} = 0.1624$, i.e. 16.24%.

Hence the mean deviation and the coefficient of mean deviation are 8.23 and 16.24% respectively.
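The steps of this answer can be cross-checked with a short Python sketch. It assumes the frequency table of Ques 2 is stored as a dictionary; the function names are illustrative and not part of the module.

```python
# Frequency table from Table 1: observation -> frequency.
freq_table = {38: 7, 43: 9, 46: 10, 49: 6, 51: 4, 54: 8, 67: 3, 78: 5}


def grouped_mean(freq):
    """Arithmetic mean of grouped data: sum(f * x) / N."""
    total = sum(freq.values())
    return sum(x * f for x, f in freq.items()) / total


def mean_deviation_about_mean(freq):
    """Mean deviation about the mean: sum(f * |x - mean|) / N."""
    total = sum(freq.values())
    mean = grouped_mean(freq)
    return sum(f * abs(x - mean) for x, f in freq.items()) / total


def coefficient_of_mean_deviation(freq):
    """Mean deviation divided by the mean it was measured from."""
    return mean_deviation_about_mean(freq) / grouped_mean(freq)


print(f"mean                = {grouped_mean(freq_table):.4f}")
print(f"mean deviation      = {mean_deviation_about_mean(freq_table):.4f}")
print(f"coefficient of M.D. = {100 * coefficient_of_mean_deviation(freq_table):.2f}%")
```

Dividing by the same average that the deviations were taken from keeps the ratio free of units, which is the point of the relative measure.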

Observations  Frequency  f_i x_i  Deviation from mean  Absolute deviation about mean  f_i |x_i − x̄|
(x_i)         (f_i)               (x_i − x̄)            |x_i − x̄|
38            7          266      −12.6538             12.6538                        88.5769
43            9          387      −7.6538              7.6538                         68.8846
46            10         460      −4.6538              4.6538                         46.5385
49            6          294      −1.6538              1.6538                         9.9231
51            4          204      0.3462               0.3462                         1.3846
54            8          432      3.3462               3.3462                         26.7692
67            3          201      16.3462              16.3462                        49.0385
78            5          390      27.3462              27.3462                        136.7308

N = 52, $\sum f_i x_i = 2634$, $\sum f_i |x_i - \bar{x}| = 427.8462$

Mean = $\frac{\sum f_i x_i}{N} = 50.6538$

Mean deviation = $\frac{\sum f_i |x_i - \bar{x}|}{N} = 8.2278$

Table 2

(d) Coefficient of Standard Deviation: The standard deviation is expressed in the units of the observations and is therefore an absolute measure of dispersion. For comparison purposes the measure must be independent of units. The relative measure based on the standard deviation that is independent of units is called the coefficient of standard deviation. It is defined as

Coefficient of Standard Deviation = $\frac{\sigma}{\bar{x}}$.

The coefficient of standard deviation is a fraction. Multiplying it by 100 expresses it as a percentage, and the resulting relative measure is called the coefficient of variation (C.V.). It is defined as

Coefficient of Variation = $\frac{\sigma}{\bar{x}} \times 100$.

The coefficient of variation is among the most popular relative measures of dispersion. It is used to compare the variability of two or more datasets: the dataset with the larger coefficient of variation is said to be more variable, and the dataset with the smaller coefficient of variation less variable.
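The claim that the coefficient of variation is independent of units can be checked numerically. The sketch below, in Python, computes the C.V. for the frequency table of Ques 2 and again after multiplying every observation by 10; the rescaling is a hypothetical change of units introduced only for illustration, and the function name is an assumption.

```python
import math

# Frequency table from Table 1: observation -> frequency.
freq_table = {38: 7, 43: 9, 46: 10, 49: 6, 51: 4, 54: 8, 67: 3, 78: 5}


def coefficient_of_variation(freq):
    """C.V. = 100 * sigma / mean, with sigma computed by dividing by N."""
    total = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / total
    variance = sum(f * (x - mean) ** 2 for x, f in freq.items()) / total
    return 100 * math.sqrt(variance) / mean


cv_original = coefficient_of_variation(freq_table)

# Hypothetical change of units: every observation is multiplied by 10.
rescaled_table = {x * 10: f for x, f in freq_table.items()}
cv_rescaled = coefficient_of_variation(rescaled_table)

print(f"C.V. in original units = {cv_original:.3f}")
print(f"C.V. after rescaling   = {cv_rescaled:.3f}")
```

Both the standard deviation and the mean are multiplied by the same factor, so the ratio, and hence the C.V., stays the same. An absolute measure such as the standard deviation would change by the factor of 10.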


Ques 3. Calculate the coefficient of standard deviation and coefficient of variation for the following dataset.

Observations   Frequency
38             7
43             9
46             10
49             6
51             4
54             8
67             3
78             5

Table 3

Ans. Calculate the mean of the observations as shown in Table 4. Subtract the mean from each observation, as shown in the 4th column. Square these deviations, as shown in the 5th column of Table 4. Column 6 shows the product of the frequencies with the values from column 5. Take the sum of column 6 and divide it by $N$, the total frequency, to obtain the variance, 125.2263. The square root of this value gives the standard deviation, 11.190.

Observations  Frequency  f_i x_i  x_i − x̄     (x_i − x̄)²   f_i (x_i − x̄)²
(x_i)         (f_i)
38            7          266      −12.6538    160.1198     1120.8388
43            9          387      −7.6538     58.5814      527.2322
46            10         460      −4.6538     21.6583      216.5828
49            6          294      −1.6538     2.7352       16.4112
51            4          204      0.3462      0.1198       0.4793
54            8          432      3.3462      11.1967      89.5740
67            3          201      16.3462     267.1967     801.5902
78            5          390      27.3462     747.8121     3739.0607

N = 52, $\sum f_i x_i = 2634$, $\sum f_i (x_i - \bar{x})^2 = 6511.7692$

Mean = $\frac{\sum f_i x_i}{N} = 50.653$

$\frac{\sum f_i (x_i - \bar{x})^2}{N} = 125.2263$

$\sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{N}} = 11.1905$

Table 4

Now for the coefficient of standard deviation, the formula is


Coefficient of Standard Deviation = $\frac{11.1905}{50.653} = 0.2209$.

For the coefficient of variation, the formula is

Coefficient of Variation = $\frac{11.1905}{50.653} \times 100 = 22.09$.

This completes the discussion of the relative measures of dispersion. Because these measures are derived from the absolute measures in order to make them independent of units, one should know the absolute measures of dispersion and their properties in depth before applying the relative measures to a dataset. The next section discusses skewness and kurtosis.

3. Skewness

Skewness describes the tendency of the shape of the distribution. If the data are not distributed equally about the mean, i.e. the frequency distribution is not symmetric, the distribution is said to be skewed. Skewness is also referred to as asymmetry or lack of symmetry. Some authors define skewness as follows:

“When a series is not symmetrical, it is said to be asymmetrical or skewed” by Croxton and Cowden.

“Measure of skewness tell us the direction and the extent of skewness. In symmetrical distribution, the mean, median and mode are identical. The more the mean moves away from the mode, the larger the asymmetry or skewness” by Simpson and Kafka.

Hence skewness means that the data are not symmetrical about the mean. It can also be defined in terms of the normal distribution. The normal distribution is a distribution in which the mean, median and mode are all equal, so the shape of its frequency curve is like a bell.


Figure 1
Link for the image: https://www.kullabs.com/classes/subjects/units/lessons/notes/note-detail/9958

The shape of the frequency distribution can be observed from Figure 1.

A frequency distribution is said to be positively skewed when Mean (μ) > Median > Mode. In this case the mean is greater than both the median and the mode, and the median is greater than the mode.

A frequency distribution is said to be symmetric when Mean (μ) = Median = Mode. In this case the values of the mean, median and mode are all the same.

A frequency distribution is said to be negatively skewed when Mean (μ) < Median < Mode. In this case the mode is greater than both the median and the mean, and the median is greater than the mean.

Difference between skewness and measures of dispersion

Measures of dispersion are used to quantify the variation in the dataset, while skewness is concerned with how the observations are concentrated around the central part of the data. The important differences between measures of dispersion and skewness are:

(i) Skewness is concerned with the shape of the frequency distribution, while measures of dispersion are concerned with the amount of variation.
(ii) Skewness shows the nature of the data about its central value, while dispersion measures the extent to which the central tendency value represents the whole dataset.
(iii) Data can be widely dispersed and still have a symmetric frequency distribution. Symmetry therefore does not imply that the variation is small.
(iv) Measures of dispersion are based on first and second order moments, while skewness is based on first, second and third order moments.

For this reason skewness and measures of dispersion are studied together in the literature, as both help in understanding the features of the frequency distribution in depth.

Many methods are available in the literature for measuring skewness; some of them are discussed here. Measures of skewness are used to detect whether the frequency distribution is symmetric or skewed. Because the values of some of these measures depend on the units of the observations, there are two categories of measures of skewness:

(a) Absolute measure of skewness
(b) Relative measure of skewness

Absolute measures of skewness

These measures of skewness are used to check the asymmetry of the data; for symmetric data their value is zero. Some of them are given below:

(a) Mean (μ) − Mode (Md)
(b) Mean (μ) − Median (M)
(c) $Q_3 + Q_1 - 2M$

Any of these three measures can be used to check the skewness of the distribution. As already discussed, in a skewed distribution either the mean is greater than the median and the mode, or the mode is greater than the median and the mean. These measures therefore indicate the presence and direction of skewness: a positive value points to positive skewness, a negative value to negative skewness, and a value of zero to symmetric data. In practice, however, these measures are of limited use for two reasons:

(i) They are expressed in the units of the observations, so their values cannot be used for comparison purposes.
(ii) Two datasets with the same absolute measure of skewness are not necessarily alike, because their distributions may still differ in mean and dispersion.

To overcome these limitations, a measure that is independent of units is used. It is called a relative measure of skewness or coefficient of skewness.


Relative Measure of Skewness

In these measures, the limitations of the absolute measures are removed by dividing the absolute measure by a suitable quantity expressed in the same units. The following coefficients of skewness are commonly used.

(a) Karl Pearson Coefficient of Skewness

The formula for the Karl Pearson coefficient of skewness is

Coefficient of Skewness = $\frac{\text{Mean} - \text{Mode}}{\text{standard deviation}} \approx \frac{3(\text{Mean} - \text{Median})}{\text{standard deviation}}$

The second form is an approximation and is used when the mode is not well defined.

(b) Bowley Coefficient of Skewness

The formula for the Bowley coefficient of skewness is

Coefficient of Skewness = $\frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$

where $Q_3$, $Q_1$ are the third and first quartiles of the distribution.

(c) Kelly’s Coefficient of Skewness

The formula for the Kelly coefficient of skewness is

Coefficient of Skewness = $\frac{D_9 + D_1 - 2\,\text{Median}}{D_9 - D_1}$

where $D_9$, $D_1$ are the 9th and 1st deciles of the distribution.

(d) Measure of Skewness Based on Moments

This is the most widely used measure of skewness. It uses the second and third order moments about the mean and is defined as

Coefficient of Skewness = $\frac{\mu_3}{\sigma^3}$

where $\mu_3$ is the third order moment about the mean and $\sigma$ is the standard deviation of the distribution.
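The two forms of the Karl Pearson coefficient in (a) are linked by the empirical relation between mean, median and mode, which holds only approximately and only for moderately skewed unimodal distributions. A short derivation under that assumption (the symbol $S_k$ for the coefficient is introduced here for convenience and is not used elsewhere in the module):

```latex
\begin{align*}
\text{Mode} &\approx 3\,\text{Median} - 2\,\text{Mean} \\
\text{Mean} - \text{Mode} &\approx \text{Mean} - \left(3\,\text{Median} - 2\,\text{Mean}\right)
  = 3\left(\text{Mean} - \text{Median}\right) \\
S_k = \frac{\text{Mean} - \text{Mode}}{\sigma} &\approx \frac{3\left(\text{Mean} - \text{Median}\right)}{\sigma}
\end{align*}
```

For small samples the two forms can differ noticeably, so it is worth stating which one has been used.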

Ques 4 Calculate the absolute measure of skewness and coefficient of skewness (a) Karl Pearson (b) Bowley (c) Kelly (d) Based on moments of the marks of 15 students in statistics given below: 54, 63, 78, 59, 69, 74, 85, 46, 63, 51, 58, 73, 86, 88, 93 Ans Arrange the series in ascending order


46, 51, 54, 58, 59, 63, 63, 69, 73, 74, 78, 85, 86, 88, 93

Now compute the arithmetic mean, median and mode of the series: A.M. is 69.33, Median is 69, Mode is 63.

Now $Q_1$ is 58, $Q_3$ is 85, $D_1$ is 46 + 0.6 (51 − 46) = 49 and $D_9$ is 88 + 0.4 (93 − 88) = 90.

The values of the absolute measures are

Mean (μ) − Mode (Md) = 69.33 − 63 = 6.33
Mean (μ) − Median (M) = 69.33 − 69 = 0.33
$Q_3 + Q_1 - 2M = 85 + 58 - 2 \times 69 = 5$

All the absolute measures are positive, which shows that the data are positively skewed. Now calculate the relative measures of skewness. These require the standard deviation of the data.

Let $x_1, x_2, \ldots, x_n$ be the observations, where $n$ is the number of observations. The S.D. is evaluated using the formula

$\sigma = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} = 14.079$

Karl Pearson coefficient of skewness:

Coefficient of Skewness = $\frac{69.33 - 63}{14.079} = 0.4496$

Bowley coefficient of skewness:

Coefficient of Skewness = $\frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{85 + 58 - 2 \times 69}{85 - 58} = \frac{5}{27} = 0.1852$

Kelly coefficient of skewness:

Coefficient of Skewness = $\frac{D_9 + D_1 - 2\,\text{Median}}{D_9 - D_1} = \frac{90 + 49 - 2 \times 69}{90 - 49} = \frac{1}{41} = 0.0244$

Coefficient of skewness based on moments:

$\mu_3 = \frac{\sum (x_i - \bar{x})^3}{n} = 233.2740$

Coefficient of Skewness = $\frac{\mu_3}{\sigma^3} = \frac{233.2740}{14.079^3} = 0.0836$

Hence all the absolute and relative measures of skewness indicate that the data are positively skewed. Different measures give different values for the same dataset, which is considered a limitation of the measures of skewness.
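The whole of this answer can be reproduced with a short Python sketch. It assumes the positional rule used above, in which the k-th of p quantiles sits at position (n + 1)k/p with linear interpolation between neighbouring observations; the helper names `quantile` and `central_moment` are illustrative. Because the code carries the unrounded mean (69.333...), the last digit of Karl Pearson's coefficient may differ slightly from the hand calculation.

```python
import math
import statistics

marks = [54, 63, 78, 59, 69, 74, 85, 46, 63, 51, 58, 73, 86, 88, 93]


def quantile(values, k, parts):
    """k-th of `parts` quantiles at position (n + 1) * k / parts (1-based),
    interpolating linearly between the two neighbouring observations."""
    data = sorted(values)
    n = len(data)
    position = min(max((n + 1) * k / parts, 1), n)  # keep inside the data
    lower = int(position)
    fraction = position - lower
    if lower == n:
        return data[-1]
    return data[lower - 1] + fraction * (data[lower] - data[lower - 1])


def central_moment(values, order):
    """Moment of the given order about the mean, dividing by n."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


mean = statistics.mean(marks)
median = statistics.median(marks)
mode = statistics.mode(marks)
sigma = math.sqrt(central_moment(marks, 2))
q1, q3 = quantile(marks, 1, 4), quantile(marks, 3, 4)
d1, d9 = quantile(marks, 1, 10), quantile(marks, 9, 10)

pearson = (mean - mode) / sigma
bowley = (q3 + q1 - 2 * median) / (q3 - q1)
kelly = (d9 + d1 - 2 * median) / (d9 - d1)
moment_based = central_moment(marks, 3) / sigma ** 3

for name, value in [("Karl Pearson", pearson), ("Bowley", bowley),
                    ("Kelly", kelly), ("Moments", moment_based)]:
    print(f"{name:12s} {value:.4f}")
```

All four coefficients come out positive, in agreement with the conclusion above, while their magnitudes differ because each is built from a different pair of location and spread measures.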


4. Kurtosis

The word kurtosis comes from the Greek language, with the meaning curved or arching. Kurtosis is used to measure the peakedness of the frequency distribution. Two datasets may have the same arithmetic mean, standard deviation and coefficient of skewness and still differ in the concentration of values near the mode. A distribution can be more peaked than the normal curve, less peaked than the normal curve, or as peaked as the normal curve. Kurtosis is therefore a measure that compares the peakedness of a curve with the peakedness of the normal curve, i.e. it measures the extent to which the distribution is more or less peaked than the normal distribution. Many authors define kurtosis as follows:

“A measure of kurtosis indicated the degree to which a curve of a frequency distribution is peaked or flat topped” by Croxton and Cowden.

“Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution” by Spiegel.

Figure 2
Courtesy for image: whatilearned.wikia.com

If the distribution curve is more peaked than the normal curve, as shown by the blue curve in Figure 2, the distribution is called Leptokurtic. If the distribution curve is flatter than the normal curve, as shown by the red curve, the distribution is called Platykurtic. The black curve represents the normal curve, which is also known as Mesokurtic.


Measure of Kurtosis

Kurtosis is defined as

$\beta_2 = \frac{\mu_4}{\sigma^4}$

where $\mu_4$ is the 4th order moment about the mean and $\sigma$ is the standard deviation.

If $\beta_2 > 3$ then the distribution is Leptokurtic;

If $\beta_2 = 3$ then the distribution is Mesokurtic;

If $\beta_2 < 3$ then the distribution is Platykurtic.

Another measure that is used in the literature is $\gamma_2 = \beta_2 - 3$. In this case

If $\gamma_2 > 0$ then the distribution is Leptokurtic;

If $\gamma_2 = 0$ then the distribution is Mesokurtic;

If $\gamma_2 < 0$ then the distribution is Platykurtic.

Ques 5. Calculate the kurtosis of the marks of 15 students in statistics given below:

54, 63, 78, 59, 69, 74, 85, 46, 63, 51, 58, 73, 86, 88, 93.

Ans. The S.D. of the data is evaluated as 14.079.

$\mu_4 = \frac{\sum (x_i - \bar{x})^4}{n} = 71641.5407$

Kurtosis is

$\beta_2 = \frac{\mu_4}{\sigma^4} = \frac{71641.5407}{14.079^4} = 1.8233 < 3.$

Also $\gamma_2 = \beta_2 - 3 = 1.8233 - 3 = -1.177 < 0$. Hence the distribution is Platykurtic.
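The calculation in Ques 5 can be sketched in Python as follows. The function names and the small tolerance used to decide whether $\beta_2$ equals 3 are assumptions made for this illustration; $\sigma^4$ is obtained as the square of the second central moment.

```python
marks = [54, 63, 78, 59, 69, 74, 85, 46, 63, 51, 58, 73, 86, 88, 93]


def central_moment(values, order):
    """Moment of the given order about the mean, dividing by n."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def kurtosis_beta2(values):
    """beta_2 = mu_4 / sigma^4, where sigma^2 is the second central moment."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def classify(beta2, tolerance=1e-9):
    """Label the distribution by comparing beta_2 with 3 (the normal curve)."""
    if beta2 > 3 + tolerance:
        return "Leptokurtic"
    if beta2 < 3 - tolerance:
        return "Platykurtic"
    return "Mesokurtic"


beta2 = kurtosis_beta2(marks)
gamma2 = beta2 - 3

print(f"mu_4    = {central_moment(marks, 4):.4f}")
print(f"beta_2  = {beta2:.4f}")
print(f"gamma_2 = {gamma2:.4f}")
print(classify(beta2))
```

With real data $\beta_2$ is almost never exactly 3, so in practice the Mesokurtic label is a matter of judgement about how close to 3 is close enough.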


5. Summary

This module first introduced the relative measures of dispersion, such as the coefficient of range, the coefficient of mean deviation and the coefficient of variation. The properties of the different measures were discussed with their merits and demerits, so that one can judge which measure to use under which conditions. The difference between dispersion and skewness was also discussed. Finally, two further measures, skewness and kurtosis, were discussed.

6. Suggested Readings

Agresti, A. and B. Finlay, Statistical Methods for the Social Sciences, 3rd Edition, Prentice Hall, 1997.

Daniel, W. W. and C. L. Cross, Biostatistics: A Foundation for Analysis in the Health Sciences, 10th Edition, John Wiley & Sons, 2013.

Hogg, R. V., J. McKean and A. Craig, Introduction to Mathematical Statistics, Macmillan Pub. Co. Inc., 1978.

Meyer, P. L., Introductory Probability and Statistical Applications, Oxford & IBH Pub, 1975.

Triola, M. F., Elementary Statistics, 13th Edition, Pearson, 2017.

Weiss, N. A., Introductory Statistics, 10th Edition, Pearson, 2017.
