<<

Unit-III Measures of Dispersion &

CONTENTS Measures of Dispersion 2.1 2.2 Quartile Deviation (Q.D) 2.3 2.3.1 Combined Standard Deviation: 2.4 Deviation OR Average Deviation 2.4.1 Difference between the Mean Deviation and standard Deviation 2.5 Coefficient of Variation 2.6 Variance 2.7 Skewness 2.7.1 Karl- Pearson’s Coefficient of Skewness 2.7.2 Bowley’s Coefficient of Skewness

Measures of Dispersion Definition: It is hardly fully representative of a mass, unless we know the manner in which the individual items scatter around it. A further description of the series is necessary if we are to gauge how representative the average is. .

Importance or Significance of Measures of Dispersion:

The reliability of a measure of is known. 1. Measures of Dispersion provide a basis for the control of variability. 2. They help to compare two or more sets of data with regard to their variability. 3. They enhance the utility and scope of statistical techniques.

The Usual Measures of Dispersion: The usual measures of dispersion, very often suggested by the statisticians, are exhibited with the aid of the following chart:

Difference between the Absolute measure and Relative Measure:

Absolute measure Relative measure Range Co-efficient of Range Quartile deviation Co-efficient of Quartile Deviation Mean Deviation (about Mean) Co-efficient of Mean Deviation (about Mean) Deviation (about Median) Co-efficient of Median Deviation (about Mode Deviation (about Mode Median) Co-efficient of Mode Deviation (about Mode) Standard deviation and Variance Co-efficient of Variation

Absolute Measures of Dispersion: Range

Definition: Range is the difference between the greatest (largest) and the smallest of the values. In Symbols, Range = L-S. L- Largest Value S- Smallest Value In individual observations and discrete series, L and S are easily identified. In Continuous series, the following two method are followed. Method-I L- Upper boundary of the highest class S - Lower boundary of the lowest class Method-2 L - Mid value of the highest class S - Mid value of the lowest class 푳−푺 Co-efficient of Range = 푳+푺 Uses of Range:

1. Range is used in finding the control limits of mean chart and Range chart in S.Q.C. 2. While Quoting the prices of shares, bonds, gold, etc. on daily basis or yearly basis, the minimum and the maximum prices are mentioned. 3. The minimum and the maximum temperature likely to prevail on each day are forecasted. Merits: 1. It is simple to understand and easy to calculate. 2. It can be calculated in no time. Demerits: 1. Its definition does not seem to suit continuous series. 2. It is based on the two extreme items. It does not consider the other items. 3. It is usually affected by the extreme items. 4. It cannot be manipulated algebraically. The Range of combined set cannot be found from the range of the individual sets. 5. It does not have sampling stability. 6. It cannot be calculated from open-end class intervals. It is a very rarely used measure. Its Scope is limited. Quartile Deviation (Q.D)

Definition : Quartile Deviation is half of the difference between the first and the third quartiles. Hence it is called Semi Inter Quartile Range.

Q − Q In Symbols, Q.D = 3 1 . 2

Q − Q Coefficient of Quartile Deviation = 3 1 Q + Q 3 1 Merits: 1. It is simple to understand and easy to calculate. 2. It is not affected by extreme items. 3. It can be calculated for data with open and classes also. Demerits:

1. It is not based on all items. It is based on two positional values Q1 and Q3 and ignores the extreme 50% of the items. 2. It cannot be manipulated algebraically. 3. It is affected by sampling fluctuations. 4. Like range, it does not measure the deviation about any measure of central tendency. Mean Deviation or Average Deviation:

Definition: Mean deviation is the arithmetic mean of the absolute deviations of the values about their arithmetic mean or median or mode. M.D. is the abbreviation for Mean Deviation. There are three kinds of mean deviations, Viz., 1. mean deviation or mean deviation about mean 2. mean deviation about median 3. mean deviation about mode. Mean deviation about median is the least. it could be easily verified in individual observations and discrete series where the actual values are considered. The relative measures are the following:

Coefficient of Mean deviation (about Mean) 푀푒푎푛 퐷푒푣𝑖푎푡𝑖표푛 푎푏표푢푡 푀푒푎푛 = 푀푒푎푛 Coefficient of Mean deviation about Median

푀푒푎푛 퐷푒푣𝑖푎푡𝑖표푛 푎푏표푢푡 푀푒푑𝑖푎푛 = 푀푒푑𝑖푎푛 Coefficient of Mean deviation about Mode

푀푒푎푛 퐷푒푣𝑖푎푡𝑖표푛 푎푏표푢푡 푀표푑푒 = 푀표푑푒 Individual Observations:  X − X Mean Deviation (about Mean) = N X − M Mean Deviation about Median =  N X − Z Mean Deviation about Mode =  N

Discrete Series:  f X − X Mean Deviation (about Mean) = N f X − M Mean Deviation about Median =  N f X − Z Mean Deviation about Mode =  N Continuous Series:  f m − X Mean Deviation (about Mean) = N f m − M Mean Deviation about Median =  N f m − Z Mean Deviation about Mode =  N 푀.퐷.푎푏표푢푡 푀푒푎푛 Coefficient of M.D about Mean = 푀푒푎푛 푀.퐷.푎푏표푢푡 푀푒푑𝑖푎푛 Coefficient of M.D about Median = 푀푒푑𝑖푎푛 푀.퐷.푎푏표푢푡 푀표푑푒 Coefficient of M.D about Mode = 푀표푑푒 Uses of Mean Deviation: Mean deviation provides an opportunity to calculate deviation, absolute deviation, total deviation and average of the deviations. Standard deviation is the most important absolute measure of dispersion. knowledge of the principle of mean deviation facilities understanding the concept of standard deviation. Standard deviation is a part of almost all the theories of , Viz., skewness, kurtosis, correlation, regression, sampling, estimation, inference, S.Q.C., etc. Mean deviation is preferred when a particular discussion is not carried to other spheres. It is found to be much useful in forecasting business cycles and a few other statistical activities connected with business, economic and sociology. Merits: 1. Mean deviations are rigidly defined. 2. They are based on all the items. 3. They are affected less by extreme items than standard deviation. Among the three mean deviations, mean deviation about median is the least. 4. They are simple to understand and not difficult to calculate. 5. They do not vary much from sample to sample. 6. They provide choice. among the three mean deviations, the one that is suitable to a particular situation can be used. 7. Formation of different distributions can be compared on the basis of a mean deviation. Demerits: 1. Omission of negative sign of deviations makes them non-algebraic. It is pointed out as a great drawback. 2. They could not be manipulated. Combined mean deviation could not be found. 3. it is not widely used in business or economics. Standard Deviation:

Definition: Standard Deviation is the root mean square deviation of the values from their arithmetic mean. S.D denoted by σ (read , sigma). Variance is denoted by σ2. S.D. is the positive square root of variance. Karl Pearson introduced the concept of standard deviation in 1893. S.D is also called root mean square deviation. The corresponding relative measure is Coefficient of Variation. 푆푡푎푛푑푎푟푑 퐷푒푣𝑖푎푡𝑖표푛 Coefficient of variation = × 100 퐴푟𝑖푡ℎ푚푒푡𝑖푐 푀푒푎푛 Individual Observation: Method-1

x 2 Standard deviation σ =  N Method-2

2 X 2  X   −    Standard deviation σ =   N  N  Discrete Series: Method-1

 fx2 Standard deviation σ = where x=X- X and N = ∑f N Method – 2

2 fX 2  fX  Standard deviation σ =  −      N  N  Continuous Series: Method – 1

2 f (m − X ) Standard deviation σ =  N Method – 2

2 fm2  fm  −    Standard deviation σ =   N  N  Combined Standard Deviation: When two or three groups merge, the mean and standard deviation of the combined group are calculated as follows. Case 1. Merger of Two Groups

Size Mean S.D.

Group I N1 X 1 σ1

Group II N2 X 2 σ2 That is,

N1 – Number of items in the first group.

N2 – Number of items in the second group.

X 1 - Mean of items in the first group

X 2 - Mean of items in the second group.

σ1 – Standard Deviation of items in the first group.

σ2 - Standard Deviation of items in the second group. The Mean of the combined group,

N1 X 1 + N 2 X 2 X 12 = N1 + N 2 The Standard Deviation of the combined group,

2 2 2 2 N1 1 + N 2 2 + N1d1 + N 2 d 2  12 = N1 + N 2

where d1= X 1 − X 12 and d2= X 2 − X 12 Case 2. Merger of Three Groups Size Mean S.D.

Group I N1 X 1 σ1

Group II N2 X 2 σ2 Group III N3 X 3 σ3 That is

N1 – Number of items in the first group.

N2 – Number of items in the second group.

X 1 - Mean of items in the first group

X 2 - Mean of items in the second group.

σ1 – Standard Deviation of items in the first group.

σ2 - Standard Deviation of items in the second group. The Mean of the combined group,

N X 1 + N X 2 + N X 3 X 123 = 1 2 3 N1 + N 2 + N 3 The Standard Deviation of the combined group,

2 2 2 2 2 2 N1 1 + N 2 2 + N 3 3 + N1d1 + N 2 d 2 + N 3 d 3  12 = N1 + N 2 + N 3 where d1= X 1 − X 123 , d2= X 2 − X 123 and d3= X 3 − X 123 Difference between the Mean Deviation and standard Deviation: Mean Deviations Standard Deviations

Deviations are calculated from Mean or Median Deviations are always calculated from or Mode. Mean.

While finding the deviations, negative sign is Deviations are squared. Finally square root omitted. is taken.

M.D.’s are simple to understand and not difficult It is not simple to under standard and it is to calculate. not easy to calculate.

Omission of negative sign is considered to be It has many desirable mathematically deficient mathematically. properties.

Uses Standard deviation is the best absolute measure of dispersion. It is a part of many statistical concepts such as Skewness, Kurtosis, Correlation, Regression, Estimation, sampling, tests of Significance and Statistical Quality Control. Not only in statistics but also in Biology, education, Psychology and other disciplines standard deviation is of immense use. Merits: 1. Standard deviation is rigidly defined. 2. It is calculated on the basis of the magnitudes of all the items. 3. It could be manipulated further. The combined S.D. can be calculated. 4. Mistakes in its calculation can be corrected. The entire calculation need not be redone. 5. Coefficient of variation is based on S.D.. It is the best and most widely used relative measure of dispersion. 6. It is free from sampling fluctuations. This property of sampling stability has brought it an indispensable place in tests of significance. 7. It reduces the complexity in the approach of normal distribution by providing standard normal variable. 8. It is the most important absolute measure of dispersion. It is used in all the areas of statistics. It is widely used in other disciplines such as Psychology, Education and Biology as well. 9. Scientific calculators show the standard deviation of any series. 10. Different forms of the formula are available. Demerits: 1. Compared with other absolute measures of dispersion, it is difficult to calculate. 2. It is not simple to understand. 3. It gives more weight age to the items away from the mean than those near the mean as the deviations are squared. Coefficient of Variation

Definition: Coefficient of variation is the most widely used relative measure of dispersion. It is based on the best absolute measure of dispersion and the best measure of central tendency. It is a percentage. While comparing two or more groups, the group which has less coefficient of variation is less variable or more consistent or more stable or more uniform or more homogeneous. Coefficient of Variation is denoted by the C.V.

푆푡푎푛푑푎푟푑 퐷푒푣𝑖푎푡𝑖표푛 Coefficient of Variation = × 100 퐴푟𝑖푡ℎ푚푒푡𝑖푐 푀푒푎푛 푆.퐷.  C.V. = × 100 OR C.V. = 100 퐴.푀. X Variance Variance is the mean square deviation of the values from their arithmetic mean. It is denoted by σ2. Standard deviation is the positive square root of variance and is denoted by σ. The term of variance was introduced by R.A. fisher in the year 1913. It is used much in sampling, analysis of variance, etc., In analysis of variance, total variation is split into a few components. Each component is ascribable to one factor of variation. The significance of the variation is then tested. Individual Observation:

2 (X − X ) Variance, σ2 =  N Discrete Series:

2 fX 2  fX  Variance, σ2 =  −      N  N  Continuous Series:

2  d'2  fd'2   Variance, σ2 = C 2  −      N  N       Combined Variance: Based on the notations used in combined mean and combined variance, Combined variance of two groups,

2 2 2 2 2 N1 1 + N 2 2 + +N1d1 + N 2d 2  12 = N1 + N 2

Combined variance of three groups,

2 2 2 2 2 2 2 N1 1 + N 2 2 + N3 3 + N1d1 + N 2d 2 + N3d3  123 = N1 + N 2 + N3 Skewness:

Definition: Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. There are two types of Skewness: Positive and Negative

Positive Skewness when the tail on the right side of the distribution is longer or fatter. The mean and median will be greater than the mode.

Negative Skewness is when the tail of the left side of the distribution is longer or fatter than the tail on the right side. The mean and median will be less than the mode. Karl – Pearson (1867- 1936) was great British Biometric and Statistician. He introduced the formula given below. 1. Karl- Pearson’s Coefficient of Skewness,

Mean − Mode X − Z SK = OR Sk = p S tan dardDeviation P  Theoretically, no limit can be found for this measure. This is found mostly to vary between -1 and +1. Based on the interrelation between mean, median and mode in a moderately skewed distribution, his second formula: Karl- Pearson’s Coefficient of Skewness,

3(Mean − Mode) 3(X − M ) Sk = OR Sk = P S tan dardDeviation P  2. The following formula is given by Prof. Bowley, Bowley’s Coefficient of Skewness,

Q3 + Q1 − 2M Sk B = Q3 − Q1

This is quartile measure of skewness and the value of this between -1 and +1. This method is useful where there is an open-end class interval or extreme values are present.

MEASURES OF DISPERSION 271

METHODS OF STUDYING VARIATION

The following are the important methods of studying variation: 1. The Range. The 2. and the Quartile Deviatiorn. 3. The Mean Deviation or Average Deviation. 4. The Standard Deviation, and 5. The Lorenz Curve. Of these the first two, namely, the range and quartile deviations. are positional measures because they depend on the values at a particular posi- tion in the distribution. The other two, the average deviation and the standard deviation, are called calculation measures of deviation because all of the values are employed in their calculation and the last one is a graphic method.

Absolute and Relative Measures of Variation Measures of dispersion may be either absolute or relative. Absolute measures of dispersion are expressed in the same statistical unit in which the original data are given such as rupees, kilograms, tonnes. etc. These values may be used to the variations in two distributions provided the variables are Compare size. In case the twvo expressed in the same units and of the same average however, such as quintals of sets of data are expressed in different units. or if the size is very different versus tonnes of sugarcane, average sugar the absolute measures Such aS manager's salary versus workers salary, cases measures of relative of dispersion are not comparable. In such dispersion should be used. is the ratio of a measure of absolute A measure of relative dispersion It is sometimes called a coefficient of to an appropriate average. dispersion a number that is independent of because "coefficient" means pure dispersion, that while the It should be remembered computing the unit of measurement. from used as base should be the same one relative dispersion the average measured. This means that the arithmetic deviations were which the absolute and either the arithmetic the standard deviation, mean should be used with the mean deviatlon. nean or median with

1. Range

lt is detined as the method of sludying dispersion. is the simplest ilem the value of the ange oI the smallest and between the value ifference distribution. Syinbolically, in the largest item included Range L- S

L = Largest ite, and where S Sallest item.

to caled the coetlicien ot range corresponing rallge, The relative measure forula the following ODtained by applying -S Coefficient of Range LS coparison of ale boul the saune, a two distriDutols has less the averages of the With the smaller range that the distribuo11 the group. indicates distridutton is nore typical of range of thal spersion, and the average 272 STATISTCAL METHODS

llustration 1. The following are the prices of shares of AB Co. Ltd. from Mondav ay to Satur lay Day Price (Rs.) Day Price (Rs.) Monday 200 Thursday 160 Tuesday 210 Friday 220 Wednesday 208 Saturday 250 Calculate range and its coefficient. Solution. Range L- S Here L= 250 and S = 160 Range 250 160 = Rs. 90 160 Coefficient of Range = S 250- L+ S 250+ 160 90-0.22.410 Continuous Series. There are two methods of determining the range from data grouped into a frequency distribution. The first method is to find the difference between the upper limit of the highest wage class and the lower limit of the lowest wage class. The other method is to subtract the midpoint of the lowest wage class from the mid-point of the highest wage class. In practice, both the methods are used. Illustration 2. Calculate Coefficient of Range from the following data

Marks No. of students Marks No. of students 10-20 8 40-50 8 20-30 10 50-60 4 30-40 12

Solution. Coefficient of Range 10700.71450

Merits and Limitations The merits and limitations of Range can be enumerated here. Merits. . Amongst all the methods of studying dispersion range is the simplest to understand and the easiest to compute. . It takes minimum time to calculate the value of range. Hence. it on is interested in getting a quick rather than a very accurate picture o variability one may compute range. Limitations. Range is not based on each and every item of the distribution. It is subject to fluctuations of considerable magnitude from sample sample. Range cannot tell us anything about the character of the distribuo within the two extreme observations. For example. Observe the tollow ingthree series 46 46 Series A 46 6 46 46 46 46 46 46 6 10 6 6 46 46 Series B 46 40 Series C 6 6 15 25 30 32

= But it is the same, i.e., (46 6) 40. does the three series range the fol" In all takes no account of distributions are allke. The range mean that the STATISTICAL METHODS 274 interquartile range reprepresents the In. other words, interquartile range. and the first quartile. between the third quartile difference Symbolically. = Interquartile range 93- 91 reduced to the form of the Semi.: interquartile range is often the it 2. Semi-inter- Very deviation by dividing by quartile range or quartile Symbolically.

= 391 Deviation Q.D. Quartile or 2 amount by which the two quartil deviation gives the average Quartile distribution the two quartil median. In asymmetrical differ from the - = from the median, Le., Med 91 93 Med. and and 93) are equidistant (O1 measure of dispersion. The Median can be taken as a as such the difference cent observations. covers 50 per of the t9.D. exactly business and seldom finds a series in economic In reality, however, one all distributions of social series are data that is perfectly symmetrial. Nearly distribution, O1 and 93 are not equidistant asymmetrical. In an asymmetrical distribution includes only from the median. As a result an asymmetrical approximately 50 per cent of observationS. describes or small When quartile deviation is very small, it high uniformity deviation means that variation of the central 50% items and a high quartile the variation among the central items is large. The relative Quartile deviation is an absolute measure of dispersion. measure corresponding to this measure, called the coefficient of quartile deviation, is calculated as follows. (93-O1)/2 of D. = O3-91 Coefficient Q.D. O3+91 93+91)/2 ot Coefficient of quartile deviation can be used to compare the degree variation in different distributions. Computation of guartile Deviation The process of computing quar deviation is very simple, we have just to compute the values of the upre and lower quartiles. The following illustrations would clarify calculauon

Indiwidual Observations Ilustration 3. Find out the value of quartile deviation and its coefficient from the follow data: 7 Roll No 1 2 3 4 5 6 Marks 20 28 40 12 30 15 50 Solution. CALCULATION OF QUARTILE DEVIATION Marks arranged in ascending order: 12 15 20 28 30 40 50 O Size of th item Size of =2nd item 4 4 Size of 2nd item is 15. Thus Q = 15 Gs Size of 3 th item Size of th item= 6th item

Size of 6th item is 40. Thus Q = 40 Q.D. = 40-15 2 2 12.5. MEASURES OF DISPERSION 275

Q.D. : Coefficient of Qa + 40+40-15_23-0.455. 15 55 0.455.

Discrete Series ukstration 4. Compute coefficient of quartile deviation from the following data 60 Marks 10 20 30 40 50 No. of Students 4 7 15 8 2 (B.Com., Madras Untv., 1998)

Solution. CALCULATION OF COEFFICIENT OF QUARTILE DEVIATION Marks FrequencyY C.f. Marks Frequency C.f. 10 40 8 34 20 7 11 50 41 30 15 26 60 2 43 a= Size of th item =4 = 11th item. 4 Size of 11th item is 20. Thus, Q1 20 Os Size of 3 th item =x4= 33rd item

Size of 33rd item is 40. Thus, Qa= 40 QD 3-_40-20 10 2 2 Coefficient oft Q.D. - = 0.333 O3+O 40 +20 llustration 5. Calculate quartile deviation and the coefficient of quartile deviation from the following data Wages in Rupees per week less then 35 35-37 38-40 41-43 over 43 Number of wage earners 14 62 99 18 7 (B. Con., Madras Uniw., 1998) Solution. CALCULATION OF Q.D. AND ITS COEFFICIENT Wages C.f. Rs. per week) Less than 35 14 14 35-37 62 76 38-40 99 175 41-43 18 193 Over 43 200 Q.D. = -O 200 item- 50 th iten O Size ofth 4 lies in the cla_s 35-37 Q1 = L+ N/4-C.xi L 35, N/4 = 50, c.f, = 14, f= 62, /= 2 35+b0 14 x2 -35 + 1.16 36.16 Q 62 276 STATISTICAL METH ODS

Size item = 150 th item Q3 ofth =* 4 Q3 lies in the class 38-40. Q3 L+ 3N/4-C.1:xi L 38, 3N/4 150, c.f. = 76, f=99, i=2 Q= 38+ 150-76x2 =38 +1.49 39.49 99 Q.D. = 39.49 36.16= 1.67 2

Coefficient of Q.D. = B +Q

= 39.4939.49-36.163.35 +36.16 75.650.044 - 0.044

Merits and Limitations

Merits. In certain respects it is superior to range as a measure of dispersion. It has a special utility in measuring variation in case of open end distributions or one in which the data may be ranked but measured quantitatively. It is also useful in erratic or badly skewed distributions. where the other measures of dispersion would be warped by extreme values The quartile deviation is not affected by the presence of ertreme values.

Limitations. Quartile deviation ignores 50% items. Le.. the first 2 and the last 25%. As the value of quartile deviation does not depend a upon every item of the series, it cannot be regarded as a good methoc measuring dispersion. .It is not capable of mathematical manipulation. Its value is very much affected by sampling fluctuations. It is in fact not a measure of dispersion as it really does not Show the scatter around an average but rather a distance on a scai quartile deviation is not itself measured from an average. budke positional average. Consequently, some statisticians speak O e deviation as a measure of partitlon rather than a measur dispersion. If we really desire to measure variation in tnei showing the scatter round an average, we must include ne devia of each and every item from an average in the measurenet selul Because of the above limitations quartile devtation is not otet statistical inference.

Percentile Range. LAke semi-interquartile range, the percentdaof ot a set also used as a measure of disperslon. range defined as Percentle Range - Po0- P1o buse

as ey are positional measures of dispersion The Range and 9.D. f certaln items. on the positlon MEASUREs OF DISPERSION 277 where P90 and Pi0 are the 90th and 10th 1ne P respectively. semi-percentile range. Le.. Pio | 2 can also be used, but is not com- monly employed.

The Mean Deviation

The two methods of dispersion discussed above, and deviation, are measures of namely, range quarule not dispersion in the strict sense of the term because do not show the they scatterness around an to study the formation of a average. However, distribution we should take the deviations an average. two other from The measures the and the standard namely, average deviation deviation, help us in achieving this The mean deviation is also goal. known as the average deviation. It is the difference between the items in a average distribution and the median or mean of that series. there is Theoretically an advantage in taking the deviations from median because the sum of deviations of items median is minimum are from when signs ignored.However, in practice, the arithmetic mean is more used in the frequently calculating value of average deviation and this is the reason is more called why it commonly mean deviation. In any case, the average used be stated in a must clearly given problem so that any confusion in is avoided. possible meaning

Computation of Mean Deviation-Individual Observations. If X2. X1. X3. Xy are N given observations then the deviation about an average A is given by M.D. X-A|

or DI EDI N where D =X- A |. Read as mod (X-A) is the modulus value or absolute value of the deviation ignoring plus and minus signs. Steps. Compute the median of the series. The deviations of items from median 1gnoring t signs and denote these deviations by | D. Obtain the total of these deviations, ie., 2 |DI. Divide the total obtained in step (it) by the number of observations. a distribution is normal. the mean t mean deviation is the range that will Cude 57.7 per cent of the items in the serles. If it is moderately skewed. thenwe may expect approximately 57.5 per cent of the items to fall within this ge. Hence, if average deviation 1s small, he distribution is highly compact or niform, since more than half of the cases are concentrated within a small Ange around the mean.

the mean deviation is computed Irom mean then in that case | DI shall note deviations of the items from mean, 1gnoring signs: 278 STATISTICAL METHODS

The relative measure corresponding to the mean deviation. called coefficient of mean deviation. is obtained by dividing mean deviation hy particular average used in computing mean deviation. Thus, if mean viation has been computed from median. the coefficient of mean deviation shal nall obtained by dividing mean deviation by median. be M.D. Coefficient of M.D. Median If mean has been used while calculating the value of mean deviation. in such a case coefticient of mean deviation shall be obtained by dividing mean deviation by the mean. llustration 6. Calculate the mean deviation and its coefficient of the two incorne grouos of five and seven members given below (Rs.): 4,000 4,200 4,400 4,600 4,800 II (Rs.) 3,000 4,000 ,200 4,400 4,600 4,800 5,800 Solution. CALCULATION OF MEAN DEVIATION Group Group Deviation from Deviation from median 4400 | D| median 4400 | D| 4,000 400 3,000 1,400 4,200 200 4,000 400 4,400 4,200 200 600 200 4,400 0 200 ,800 400 4,600 4,800 400 5,800 1,400 E|D=4,000 5 2|D = 1200 N=7

I: M.D. Mean Deviation: Group = N D=Deviation from median ignoring signs, 5 = 3rd item Median Size of N+ th item = 2 2 Size of 3rd item is 4,400 M.D. = 1,200 240 5 from the individual incomes meua This means that the average deviation of the Rs. 240. Mean Deviation: Group 4th item Mean th item Size of 2 2 Size of 4th item is 4,400

2|D|=4,000, N= 7. 10an devialionDy 4,000 M.D. 571.43. 7 deviation we shall vi coefficient of mean Note. If we were to compute first median. Thus for the group MEASURES OF DISPERSION 279

240 Coefficient of M.D. = = 0.054 4,400 and for the second group Coefficient of M.D. = 4_0.130. 4,400 Calculation of Mean Deviation

Discrete Series In discrete series the formula for calculating mean deviation is DI. M.D. (by= I the same logic as given before) N D | denotes deviation from median ignoring signs.

Steps. Calculate the median of the seriesS. Take the deviations of. the items from median ignoring signs and denote them by | D|. .Multiply these deviations by the respective frequencies and obtain the total 2f| DI. Divide the total obtained in Step (i) by the number of observations. This gives us the value of mean deviation. lustration 7. (a) Calculate mean deviation from the following series X 10 11 12 13 14 3 12 18 12 3

Solution. CALCULATION OF MEAN DEVIATION X D c.f. 10 3 2 6 3 11 12 1 12 15 12 18 0 0 33 13 12 12 45 14 3 2 6 48 N 48 Ef| D= 36

M.D. = 2IDI N Median Size of th item =24.5th item 2 2 SIZe of 24.5th item is 12, hence Median = 12

M.D. 48 0.75. (6) Calculate the mean deviation from the mean for the following data

Size 2 4 6 10 12 14 16 Frequency 2 2 A 5 3 2 1 STATIS METH 280

DEVIATION FROM MEAN CALCULATION OF MEAN Solution. f fX X-8 fD X D 12 2 2 8 4 2 A 24 40 8 5 30 2 10 3 24 4 12 2 14 14 1 16 16 1 N 20 E fX= 160 ZfD=56 IX 160 X- N 20

M.O.D1-=28.N 20

Calculation of Mean Deviation-Continuous Series

the remains For calculating mean deviation in continuous series procedure difference is that here we have to the same as discussed above. The only of these obtain the mid-point of the various classes and take deviations points from median. The formula is same, ie., M.D.=2I| DI N lustration 8. (a) Find the median and meân deviation of the following data

Size Frequency Size Frequency 16 0-10 7 40-50 14 10-20 12 50-60 8 20-30 18 60-70

30-40 25 1998 Unu. (B. Com. Mysore Solution. CALCULATION OF MEDIAN AND MEAN DEVIATION f| D Size c.f. m.p. m-35.2 m DI 211.4 0-10 7 7 30.2 242.4 10-20 12 19 15 20.2 183.6 20-30 18 37 10.2 25 5.0 30-40 25 62 35 0.2 156.8 40-50 16 78 45 9.8 277.2 50-60 14 92 19.8 55 238.4 60-70 8 100 65 29.8 E|D =1314.8 N 100

Med. Size ofth item - 50th item MEASURES OF DISPERSION 281

30-40 lies in the class Median Med. =L+w2 xi

N/2 -50, c.f. = 37, f= 25, i= 10 L30, 20503/ 10 =30+5.2 35.2 25 M.D DI.1814.8. 13.148 N 100 the mean deviation and its coefficient from the following data: (h Calculate FrequencCy Class Frequency Class 20 0-10 5 40-50 14 10-20 8 50-60 12 20-30 12 60-70 70-80 6 30-40 15 1996 (B.Com.. Andhra Univ..

from median we will calculate mean deviation Solution. Since nothing is special CALCULATION OF MEAN DEVIATION m-43 f DI Class Frequency c.f. m.p. m DL 190 5 38 0-10 5 15 28 224 10-20 8 13 216 25 18 20-30 12 25 35 8 120 30-4 15 40 45 2 40 40-50 20 60 55 12 168 50-60 14 74 65 22 264 60-70 12 86 70-80 92 75 32 192 N= 92 Ef| D= 1414 item Med. Size ofth item =46th Median lies in the class 40-50

i = L+ 2 Med. f = f= 20, i= 10 L= 40, N/2 =46, c.f. 40, Med 40+ 46-40x 10 = 40+ 3 =43 20 M.D. D 1414 15.37 N 92 M.D 15.37 0,357. Coeff. of M.D. Median 43 Merits and Limitations

sS,Mert The outstanding advantage of the average deviation is its relative and to It is simple to understand easy conipute. Any one am"dr y.with the concept ol the averag Can readily appreciate the

stinction is often made betwer populilion stindard deviation and sannple ndard deviation. Population slandrd devialioN 1s denoted by o whereas anple standard deviallon Dy 283 MEASURES OF DISPERSION

or standard deviation measures the absolute dispersion (or variability The the greater greater the amount of dispersion or variability). distribution; the of the deviations standard for the greater wil1 be the magnitude the deviation, means a high from their mean. A small standard deviation of the values of a Series; a of of the observation as well as homogeneity degree uniformity we have two means the opposite. Thus. if or standard deviation just the large identical means, it is comparable series with identical or nearly more has the most with the smallest standard deviation that repre distribution in the deviation is extremely useful judging sentative mean. Hence standard representativeness of the mean. Deviation Deviation and Standard Difference between Mean item of based on each and every these measures of dispersion are Both respects distribution. But they differ in the following the deviation whereas while calculating mean Algebraic signs are ignored taken into account. standard deviation signs are in the calculation of mean. The either from median or can be computed Mean deviation computed fromn the on the other hand. is always standard deviation, of the deviation of the sum of the squares arithmetic mean because mean is the least. items from arithmetic

Deviation Calculation of Standard standard individual observations In case of Individual Observations two methods applying any of the following be computed by mnean. deviation may from the actual deviation of the items 1. taking assumed mean. By of the items from an 2. By taking deviations actual deviations are taken from Actual Mean. When Deviations taken from formula is applied: mean the following

-VN x= (X - X). where. Steps the series. ie., X. actual mean of Calculate the the mean, ie.. find (X-) of the items from the deviations Take x. deviations by Denote these total Xx. and obtain the Square these deviations ie., N and extract the total number of observations, Divide Exby the value of standard deviation. This gives us the square-root. When the actual mean is in Assumed Mean. taken from cumbersome to take deviations eviations 123.674 it woula be too it is case either Iractions, sav, of these deviations. In such a obtain squares taken from an Irom it and then or the deviations be be apprOximated else value of the he mean may adjustment made in the the necessary assumed mean and 284 STATISTICAL METHODS standard deviation. The former method of approximation is less in such a case accura and. therefore, invariably deviations are taken from acurateassumed mean. When are taken from deviations assumed mean the following formila. applied: is - 2d2 Steps.

. Take the deviations of the items from an assumed mean, i.e., obtain (X A). Denote these deviations by d. Take the total of these ie., obtain 2d. deviations Square these deviations and obtain the total 2 d2. Substitute the values of 2d, 2d and N in the above formula. Illustration 9. Blood serum cholesterol levels of 10 persons are as under: 240, 260, 290, 245, 255, 288, 272, 263, 277,251. Calculate standard deviation with the help of assurmed mean. Solution. CALCULATION OF STANDARD DEVIATION BY THE ASSUMED MEAN METHOD X (X-264)*

240 -24 576 260 4 16 290 +26 676 245 -19 361 255 9 81 288 +24 576 272 8 64 263 1 277 +13 169 251 -13 169 2X= 2641 d=+1 d 2689

N N 2d 2689, 2d=+ 1, N= 10 1/2689 10 = V268.9- 0.01 16.398. Illustration 10. Calculate the slandard deviation from the following observations 240.12 240.13 240.15 240.12 240.17 240.15 240.17 240.16 240.22 240.21

Madras Univ (B.Sc., e to The assumed mean should be as nearer to the actual mean poSSnve minimise calculations. In this case the actual mean is 264.1 and w taken 264 as assumed mean. MEASUR OF DISPERSION 285

CALCULATION OF solution. STANDARD DEVIATIONN X (X-240) 2 d 240.12 +0.12 .0144 240.13 +0.13 0169 240.15 +0.15 0225 240.12 +0.12 0144 240.17 +0.17 0289 240.15 +0.15 0225 240.17 +0.17 0289 240.16 +0.16 0256 240.22 +0.22 .0484 240.21 +0.21 .0441 N= 10 Ed=+ 1.60 Zd=0.2666

V2558 =o02666-0-0256 =0-033

Calculation of Standard Deviation--Discrete Series. For calculating methods be standard deviation in discrete series, any of the following may applied 1. Actual mean method. 2. Assumed mean method. 3. Step deviation method. is deviations are a Actual Mean Method. When this method applied, and deviationsS from the actual mean, ie., we find (X- X) denote these taken and multiplied by the respective by x. These deviations are then squared irequencies. The following formula is applied

where x= (X-X) - N . used because if the actual mean in practice this method is rarely However, of time. fractions the calculations take a lot in this method is used. the following for- bAssumed Mean Method. When mula is applied: where d = (X- A). N Steps. from an assumed mean and denote Take the deviations of the items these deviations by d the the respective frequencies and obtain Multiply these deviations by total, E fd deviations, Le.. calculate ds. Obtain the squares of the the respective frequencies,\ and Iultiply the squared deviations by obtain the total. fd". 286 STATISTICAL METHODOs

formula. .Substitute the values in the above Illustration11. Calculate the standard deviation from the data given below: Size of item Frequency Size of item Frequency 3.5 3 7.5 85 4.5 7 8.5 32 5.5 22 9.5 8 6.5 60 (B. Com.. Calicut Untv.. 1996) Solution. CALCULATION OF STANDARD DEVIATION X f (X-6.5) fd Size of item 3.5 3 -9 27 4.5 14 28 5.5 22 1 -22 22 6.5 60 0 0 7.5 85 85 85 8.5 32 +2 64 128 9.5 8 +3 +24 72 N=217 2fd=+128 fd=362 Efd2 (E fo -V N N f 362, 2fd 128, N=217 217362 (128217 = V1.668-348 1.149 (c) Step Deviation Method. When this method is used we take deviations o midpoints from an assumed mean and divide these deviations by the widtn class interval, 'T. i.e., In case class intervals are unequal, we tions of midpoints by the lowest common factor and use 'c' insteaddivide ottheu t formula for standard ard deviationcalculating is deviation. The formula for calculating sta

xi N where, d = and i= class interval. The use of the above formula simplifies calculations. llustration 12. The annual salaries of a in the table group of employees are given folow Salaries (in Rs. 000) 45 50 55 80 60 65 70 75 Number of persons 3 7 5 8 7 9 7 4 195 Calculate the standard Univ., deviation of the salaries. (B. Com. Kerela MEASURES OF DISPERSION 287

CALCULATION OF STANDARD DEVIATION

Solution. No. of persons Salaries (X-60)/5 f X fd fd 27 45 3 3 -9 20 50 5 2 -10 8 55 8 -1 -8 60 7 0 9 65 9 +1 +9 28 70 +2 +14 36 75 4 +3 12 112 80 +4 +28 Efd2 240 N= 50 fd 36

i= 5 g= V2/d(2td = Efd= 36, Here, Efd2 240, N= 50, V - N xi o-5050240 36 x5 V4.8 5184 x 5 10.35 In continuous Standard Deviation-Continuous Series. Calculation of distribution the methods discussed above for discrete frequency series any of method that is it is the deviation can be used. However, in practice step most used. The formula is

xi N N

i= class interval where d = . Steps. Find the mid-points of various classes. from an assumed mean and Take the deviations of these mid-points denote these deviations by d. factor and denote this column by Wherever possible take a common d. clasS with these deviations and Multiply the frequencies of each obtain 2d. them with the respective Square the deviations and multüply frequen- each class and obtain 2fd". cies of case of continuous series is to find .nus the only difference in procedure in d-points of the various classes. deviatlon of following frequency distribution of Ustration13. Calculate mean and standard marks Marks No. of Students Marks No. of Students 0-10 5 40-50 50 10-20 12 50-60 37 20-30 30 60-70 21 30-40 45 (B.A. &B.Sc., Guwahatt Untv., B.Com., Calicut Univ. 1994) STATISTICAL METHODs 288

STANDARD DEVIATION CALCULATION OF MEAN AND Solution. (m-35)/10 Marks m.p. d fd m fd2 5 -3 -15 45 0-10 5 12 -24 10-20 15 48 -30 20-30 25 30 30 30-40 35 45 +1 +50 40-50 45 50 50 +2 +74 50-60 55 37 148 +3 +63 60-70 65 21 189 N= 200 Z fd 118 fd 510 2 fd 35+5.9 = 40.9 =A+xi=N 35+x200 10 2 fd ( fd 510 (118 o- x 10 V 200 200 10 = V2.55 3481 x 10 1.4839x 10 =14.839. Wlustration 14. Find the standard deviation from the following data : Age under 10 20 30 40 50 60 70 80 No. of persons dying9 15 30 53 75 100 110 115 125 (B. Com.. Kerala Unw., B. Com.: Nagarjuna Univ.; B.Com, Bangalore Unw., I1996) Solution. CALCULATIiON OF STANDARD DEVIATION Age m.p. (m-35)/10 fd fd m 0-10 15 -45 135 10-20 15 15 -2 -30 60 20-30 23 25 -23 23 30-40 22 35 0 40-50 25 45 +1 +25 25 50-60 10 55 +2 +20 40 60-70 5 65 +3 +15 45 70-80 10 75 +4 +40 160 N=125 fd 2 Id 488 a-2- xi-V1510 = v3.904- .0003x 10 1.976 x 10 19.76. llustration 15. Find the standard deviation of the following distribution :

45-50 Age 20-25 25-30 30-35 35-40 40-45 No. of persons 35 170 110 80 45 40 Take assumed average = 32.5*. Delhi Unit BA. (H) Eco.

When we are asked in the question to take a specitied valuealue as assumeda* mean we should take deviatlons only from that value. MEASURES OF DISPERSION 289

Solution. CALCULATION OF STANDARD DEVIATION No. Age m.p of persons (m- m f 32.5)/5 d fd 20-25 22.5 170 -340 680 25-30 27.5 110 -110 110 30-35 32.5 80 0 35-40 37.5 45 +1 +45 45 40-45 42.5 40 +2 +80 160 45-50 35 47.5 +3 +105 315 N 480 2fd= - 220 ff 1310

a- x= 11310(-220 x5 480 480

= v2.729 21 x5 V2.519 x 5= 1.587 x 5 = 7.936. Mathematical Properties of Standard Deviation Standard deviation has some very important nmathematical properties which considerably enhance its utility in statistical work.

1. Combined Standard Deviation Just as it is possible to compute combined mean of two or more than two groups, similarly we can als0 compute combined standard deviation of two or more groups. Combined standard deviation is denoted by d12 and is computed as follows

O2-VMo1+ No2 + Nd2+ Nad,? Ni +N2 O12 combined standard deviation; O standard deviation of first group; 2 standard deviation of second group: d |X - X12: dh =| X2-X12l: The above formula can be extended to find out the standard deviation of three or more groups. For example. combined standard deviation of three groups would be Nao2 +Ngogí+ Nd+Nada + Ngd2 09=VNo+O123 V N1 + N2 + Na where d = IX, - X123 dh = |X2-X123 d = X3 - X lustration 16. The following are some of the particulars of the distribution of weight oft 0Oys and girls in a class Boys Girls Number 100 50 Mean weight 60 kg 45 kg Variance 4 a the standard deviation of the combined data is more variable ? (B.Com. MD. Unu. 1997) wnich of the two distributions Solution. N101 N202+ Nidh+N2d2 Ta) Combined S.D. o12=V Ni + N 290 STATISTICAL METHODs

For finding combined standard deviation, we have to calculate combined mean. X12 +2X2 N + N2 100 (60)+50 (45) 6000+2250cc= 100+50 150 N 100, o=9, N2 50, o2 = 4, dh = | Xi - X12|=60- 55 5 d=| X2-X12| = |45-55|=10.

Substituting the values o12 V 100 (9)+50 (4)+ 100 (5) +50 (102 150 900+200 +2500+ 5000 150 86007.57150 (b) For finding which distribution is more variable compare the coefficient of variation of w distributions C.V. (Boys)= x 3 100 5.00 50 C.V.(Girls) x 100-x 100-4.44 Since coefficient of variation is more for distribution of weight of boys hence this distribution shows greater variability. llustration 17. The number of workers employed, the mean wages (in Rs.) per month and standard deviation (in Rs.) in each section of a factory are given below. Calculate the mean wages and standard deviation of all the workers taken together. Section No. of workers Mean wages Standard deviation employedd in Rs. in Rs. 50 1113 60 B 60 1120 70 C 90 1115 80 Solution. NX +X2t NaXa Ni +N2+ Na (50x 1113) +(60x 1120) +(90x 1115) 50+60+ 90 55,650+67,200+1,00,350 200 2,23,200 = Rs. 200 1116. Combined standard deviation of three series O123=1io"+ Ng2+ Noas+ Nid+Nadh+ Ngcd M + Ne + Na d = X1 -X1231 or |1113- 1116 =3 d2 X2-X123 or |1120- 1116 = 4 d Xg- X123| or |1115-1116 1

2 0123= 50 (60+60 (70) +90 (80)+ 50 ( 3) + 60 (4 +90 (1) 50+60+ 90

F 1,80,000 +2,94,000 +5,76,000 + 450+960 +90 200 10,51,500 N 200 V5,257.5 72.51 294 STATISTICAL METHOD

Mean: X=A+x i=60 x 5 60 1.625 58.375

S.D. G-dExi=V352-26x580 = V4.4-.106 x 5 2.072 x 5 = 10.36 10.36 C.V.x 100 58.375 x 100 17.75%. llustration 20. From the prices of shares of Xand Y below find out which is more stable in value: X 35 54 52 53 56 58 52 50 51 49 Y 108 107 105 105 106 107 104 103 104 101 (B. Com., Bangalore Univ., 1998) Solution. In order to find out which shares are more stable, we have to compare coetf cient of variations. CALCULATION OF COEFFIIENT OF VARIATION (X-X) (Y-Y X Y 35 -16 256 108 +3 54 +3 107 +2 4 52 +1 105 0 0 53 +2 105 0 56 +5 25 106 58 +7 49 107 52 +1 104 50 103 51 0 0 104 49 2 101 -4 16 ZX= 510 Ex= 0 2=350 EY= 1050 y 0 Ey40 Coefficient of Variation X: C.V.x 100

2510 51 N 10 =51 350 N10 5.916 C.V.= x 100 11.6 51 Coefficient of Variation Y: C.V.x 100

Y-Y1050 1 105 N 10 - 140V10 2

C.V. = x 100 = 1.905 1ASURES OF DISPERSION 295

Since coefficientnt of variation is much less in case of shares Y, hence they are more stable in

value. etration 21. Goals scored by two teams in a Football session were as follows tlustras No. of Goals Scored in a No. of Football Matches Played Football Match Team 'A' Team B' 0 15 20 10 10 2 07 05 05 04 4 03 02 5 02 01 Total 42 42 consistent. Calculate coefficient of variation and state which team is more (B. Com, M.D. Uniu., 1996) the consistent we shall have to compare Solution. In order to find out which team is more coeficient of variation. X (X-7) (Y-7) 2 Y y2

20 +13 169 15 +8 64 10 +3 9 10 +3 -2 4 0 5 -3 9 2 4 25 16 2 5 -6 36 -5 25 Y= 42 Ey=0 y252 X= 42 2x= 0 Ex 118 Team B Team A C.V.x 100 C.V.x 100 X 427 . 7

g N E 118 -4.43 a 2526.A48 CV. x 100 92.57 C.V x 100 63.29 7

tested with the following results: Mustraon 22. Two brands of tyres are Life (in'000 miles) No. of tyres brand

20-25 22 24 25-30 64 76 30-35 10 35-40 3 40-45 life ? have greater average fleet of tyres Drand use on your state which of tyres would you mpare the variability and Untv., Comp (B.Com.. Bangalore 1997) Scanned with FlashScan