
BUSINESS STATISTICS BBA UNIT –II, 2ND SEM

UNIT-II: Measures of Central Tendency and Dispersion: Meaning and objectives of measures of central tendency; different measures, viz. arithmetic mean, median, mode, geometric mean and harmonic mean; characteristics, applications and limitations of these measures; measures of variation, viz. range, quartile deviation, mean deviation and standard deviation; co-efficient of variation and skewness.

Measures of Central Tendency:

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The three main measures are:

1. Mean
2. Median
3. Mode

The mean, median and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others. In the following sections, we will look at the mean, mode and median, and learn how to calculate them and under what conditions they are most appropriate to be used.

1. Mean: There are three types of mean: arithmetic, geometric and harmonic.

1A. Arithmetic Mean: The arithmetic mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although it is most often used with continuous data. The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if we have n values in a data set with values x1, x2, ..., xn, the sample mean, usually denoted by x̄ (pronounced "x bar"), is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek capital letter Σ, pronounced "sigma", which means "sum of...":

$$\bar{x} = \frac{\sum x}{n}$$

Dr. Kajarii Roy, Professor, MBA Department, Somany Institute of Technology and Management, Haryana.

Example: The marks of seven students in a test with a maximum possible mark of 20 are given below:

15 13 18 16 14 17 12

Find the mean of this set of data values.

Solution:

$$\bar{x} = \frac{15 + 13 + 18 + 16 + 14 + 17 + 12}{7} = \frac{105}{7} = 15$$

So, the mean mark is 15.
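The calculation above can be checked with a short Python sketch. The function name `arithmetic_mean` is our own illustration, not a library API; Python's standard `statistics.mean` would give the same result for this data.

```python
def arithmetic_mean(values):
    """Sum of all values divided by the number of values (the x-bar formula)."""
    if len(values) == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


marks = [15, 13, 18, 16, 14, 17, 12]
print(arithmetic_mean(marks))  # 15.0
```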

Advantages:
1. It is easy to calculate and easy to understand, which is why it is the most widely used measure of central tendency.
2. Every item is used in the calculation, so it is affected by every item.
3. It is rigidly defined by a mathematical formula, so the result is always the same for a given data set.
4. Sampling fluctuations are minimal for this measure when repeated samples are taken from one and the same population.
5. Unlike the mode and the median, it can be subjected to further algebraic treatment.
6. It is a calculated quantity and does not depend on the position of terms in a series.
7. As it is rigidly defined, it is widely used for comparing different data sets.

Disadvantages:
1. The arithmetic mean is highly affected by extreme values.
2. It cannot average ratios and percentages properly.
3. It is not an appropriate average for highly skewed distributions.
4. It cannot be computed accurately if any item is missing.
5. The mean sometimes does not coincide with any of the observed values.

1B: Geometric Mean

A geometric mean is a mean or average which shows the central tendency of a set of numbers by using the product of their values. For a set of n observations, the geometric mean is the nth root of their product. The geometric mean G.M. for a set of numbers x1, x2, … , xn is given as

$$\text{G.M.} = (x_1 \cdot x_2 \cdots x_n)^{1/n}$$

or,

$$\text{G.M.} = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$


The geometric mean of two numbers, say x and y, is the square root of their product, $\sqrt{xy}$. For three numbers x, y and z, it is the cube root of their product, i.e., $(xyz)^{1/3}$.
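As a minimal sketch (function names are our own, hypothetical), the geometric mean can be computed either directly as the nth root of the product or through logarithms, which is the property that the log of the G.M. is the mean of the logs. Base-10 logs are assumed to match the "antilog" used in these notes; `math.prod` needs Python 3.8 or later, which also offers `statistics.geometric_mean`.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("needs at least one value, and all values must be positive")
    return math.prod(values) ** (1 / len(values))


def geometric_mean_from_logs(values):
    """Same quantity via logarithms: log G.M. is the arithmetic mean of the logs.

    Base-10 logs and antilog (10 ** x) are used to match the notes; any base works.
    """
    if not values or any(v <= 0 for v in values):
        raise ValueError("needs at least one value, and all values must be positive")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


print(geometric_mean([4, 9]))            # 6.0, the square root of 36
print(geometric_mean_from_logs([4, 9]))  # about 6.0 (tiny rounding differences are possible)
```

The log form is the one used for frequency distributions below, and it avoids overflow when the raw product would be very large.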

Geometric Mean of a Frequency Distribution

For a grouped frequency distribution, the geometric mean G.M. is

$$\text{G.M.} = \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}, \quad \text{where } N = \sum_{i=1}^{n} f_i$$

Taking logarithms on both sides, we get

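$$\log \text{G.M.} = \frac{1}{N}\left(f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n\right) = \frac{1}{N}\sum_{i=1}^{n} f_i \log x_i$$

and hence

$$\text{G.M.} = \text{antilog}\left[\frac{1}{N}\sum_{i=1}^{n} f_i \log x_i\right]$$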

Properties of Geometric Mean

• The logarithm of the geometric mean is the arithmetic mean of the logarithms of the given values.
• If all the observations assumed by a variable are equal to a constant, say K > 0, then the G.M. of the observations is also K.
• The geometric mean of the ratio of two variables is the ratio of the geometric means of the two variables.
• The geometric mean of the product of two variables is the product of their geometric means.

Geometric Mean of a Combined Group

Suppose G1 and G2 are the geometric means of two series of sizes n1 and n2 respectively. The geometric mean G of the combined group is given by

$$\log G = \frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}$$

or,

$$G = \text{antilog}\left[\frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}\right]$$

In general, for k groups with sizes ni and geometric means Gi, i = 1 to k, we have

$$G = \text{antilog}\left[\frac{n_1 \log G_1 + n_2 \log G_2 + \cdots + n_k \log G_k}{n_1 + n_2 + \cdots + n_k}\right]$$
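A minimal sketch of the combined-group formula, using base-10 logarithms so that `10 ** x` plays the role of antilog. The function name and the two groups are hypothetical, chosen so the answer can be checked against the geometric mean of the pooled data.

```python
import math


def combined_geometric_mean(group_means, group_sizes):
    """G = antilog[ sum(n_i * log G_i) / sum(n_i) ], with base-10 logs."""
    total = sum(group_sizes)
    log_g = sum(n * math.log10(g) for g, n in zip(group_means, group_sizes)) / total
    return 10 ** log_g


# Hypothetical groups: A = [1, 4] has G = 2 (n = 2); B = [8] has G = 8 (n = 1).
g = combined_geometric_mean([2, 8], [2, 1])
print(round(g, 4))  # 3.1748, the cube root of 1 * 4 * 8 = 32
```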

Advantages:

• A geometric mean is based upon all the observations.
• It is rigidly defined.
• It is not much affected by fluctuations of sampling.
• It gives more weight to small items.


Disadvantages:

• A geometric mean is not easily understandable by a non-mathematical person.
• If any of the observations is zero, the geometric mean becomes zero.
• If any of the observations is negative, the geometric mean cannot be meaningfully computed (it may become imaginary).

1C: Harmonic Mean

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the observations. The most important condition for its use is that none of the observations should be zero. The harmonic mean is used for averaging ratios (rates); the most common examples are speed (distance and time), cost per unit of material, and work done per unit of time. The harmonic mean (H.M.) of n observations is

$$\text{H.M.} = \frac{1}{\frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

In the case of a frequency distribution, the harmonic mean is given by

$$\text{H.M.} = \frac{1}{\frac{1}{N}\sum_{i=1}^{n} \frac{f_i}{x_i}}, \quad \text{where } N = \sum_{i=1}^{n} f_i$$
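The ungrouped formula can be sketched in a few lines of Python. The function name is our own; we restrict it to positive values (an assumption, since the notes only require non-zero values) and compare it with the standard library's `statistics.harmonic_mean`. The data are hypothetical.

```python
import statistics


def harmonic_mean(values):
    """n divided by the sum of the reciprocals; restricted here to positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("needs at least one value, and all values must be positive")
    return len(values) / sum(1 / v for v in values)


data = [1, 2, 4]  # hypothetical observations
print(harmonic_mean(data))             # about 1.714, i.e. 12/7
print(statistics.harmonic_mean(data))  # the standard library agrees
```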

Properties of Harmonic Mean

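• If all the observations taken by a variable are equal to a constant, say k, then the harmonic mean of the observations is also k.
• For positive observations, the harmonic mean is never greater than the geometric mean, which in turn is never greater than the arithmetic mean (H.M. ≤ G.M. ≤ A.M.); the three are equal only when all the observations are equal.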

Advantages:

• A harmonic mean is rigidly defined.
• It is based upon all the observations.
• It is not much affected by fluctuations of sampling.
• More weight is given to smaller items.

Disadvantages:

• Not easily understandable.
• Difficult to compute.

Numericals:


1. Suppose a person covers a certain distance d at a speed x and returns to the starting point at a speed y. In this case, the average speed of the person is given by the harmonic mean of x and y:

$$\text{Average speed} = \frac{\text{Total distance covered}}{\text{Total time taken}} = \frac{2d}{\frac{d}{x} + \frac{d}{y}} = \frac{2xy}{x + y}$$

In other words, if an equal distance is covered with speeds S1, S2, … , Sn, then

$$\text{Average speed} = \frac{n}{\sum_{i=1}^{n} \frac{1}{S_i}}$$

If different distances D1, D2, … , Dn are covered with speeds S1, S2, … , Sn respectively, the average speed is

$$\text{Average speed} = \frac{\sum_{i=1}^{n} D_i}{\sum_{i=1}^{n} \frac{D_i}{S_i}}$$
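The distance-weighted formula can be sketched as follows. The function name and the trip figures are hypothetical, chosen so the result can be checked by hand: 120 km in 3 hours.

```python
def average_speed(distances, speeds):
    """Total distance divided by total time: sum(D_i) / sum(D_i / S_i)."""
    if len(distances) != len(speeds) or not distances:
        raise ValueError("need one speed for each distance")
    total_time = sum(d / s for d, s in zip(distances, speeds))
    return sum(distances) / total_time


# Hypothetical trip: 60 km out at 30 km/h, then 60 km back at 60 km/h.
print(average_speed([60, 60], [30, 60]))  # 40.0, not the arithmetic mean 45
```

With equal distances the function reduces to the plain harmonic mean of the speeds, matching 2xy / (x + y) for the two-leg trip.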

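2. Calculate the geometric mean and harmonic mean of the given data:

x: 2 4 5 8
f: 3 3 2 2

Solution:

$$\text{G.M.} = \left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$$

Here, N = 3 + 3 + 2 + 2 = 10

$$\text{G.M.} = (2^3 \times 4^3 \times 5^2 \times 8^2)^{1/10} = (8 \times 64 \times 25 \times 64)^{1/10} = (819200)^{1/10} \approx 3.90$$

$$\text{H.M.} = \frac{1}{\frac{1}{N}\sum_{i=1}^{n} \frac{f_i}{x_i}} = \frac{1}{\frac{1}{10}\left(\frac{3}{2} + \frac{3}{4} + \frac{2}{5} + \frac{2}{8}\right)} = \frac{1}{\frac{2.9}{10}} = \frac{100}{29} \approx 3.45$$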
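The worked numerical can be checked with a small sketch of the grouped formulas. The function names are our own; base-10 logs stand in for the "log and antilog" method.

```python
import math


def grouped_geometric_mean(xs, fs):
    """antilog of (1/N) * sum(f_i * log x_i), with N = sum(f_i); base-10 logs."""
    n = sum(fs)
    log_gm = sum(f * math.log10(x) for x, f in zip(xs, fs)) / n
    return 10 ** log_gm


def grouped_harmonic_mean(xs, fs):
    """N divided by sum(f_i / x_i)."""
    n = sum(fs)
    return n / sum(f / x for x, f in zip(xs, fs))


x = [2, 4, 5, 8]
f = [3, 3, 2, 2]
print(round(grouped_geometric_mean(x, f), 4))  # 3.9025
print(round(grouped_harmonic_mean(x, f), 4))   # 3.4483, i.e. 100/29
```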

If AM = arithmetic mean, GM = geometric mean and HM = harmonic mean, then for two positive numbers the relationship between the three is given by the formula:

$$\text{AM} \times \text{HM} = \text{GM}^2$$

Let there be two numbers a and b, with a, b > 0. Then

$$\text{AM} = \frac{a + b}{2}, \quad \text{GM} = \sqrt{ab}, \quad \text{HM} = \frac{2ab}{a + b}$$

$$\therefore \text{AM} \times \text{HM} = \frac{a + b}{2} \times \frac{2ab}{a + b} = ab = (\sqrt{ab})^2 = (\text{GM})^2$$

Note that these means are in G.P. (geometric progression), i.e., G.M. = √(A.M. × H.M.).

Now, let us look at the difference between AM and GM:

$$\text{AM} - \text{GM} = \frac{a + b}{2} - \sqrt{ab} = \frac{(\sqrt{a})^2 + (\sqrt{b})^2 - 2\sqrt{a}\sqrt{b}}{2} = \frac{(\sqrt{a} - \sqrt{b})^2}{2} \geq 0$$

i.e., AM ≥ GM. Similarly,

$$\text{GM} - \text{HM} = \sqrt{ab} - \frac{2ab}{a + b} = \frac{\sqrt{ab}}{a + b}(\sqrt{a} - \sqrt{b})^2 \geq 0$$

So, GM ≥ HM. In both cases, equality holds only when a = b.
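A quick numerical check of these relations, as a sketch with hypothetical data: for two numbers AM × HM equals GM², while for three numbers the ordering AM ≥ GM ≥ HM still holds but the product identity generally does not.

```python
import math


def means(values):
    """Return (AM, GM, HM) of a list of positive numbers."""
    n = len(values)
    am = sum(values) / n
    gm = math.prod(values) ** (1 / n)
    hm = n / sum(1 / v for v in values)
    return am, gm, hm


# Two numbers: AM x HM equals GM squared.
am, gm, hm = means([4, 9])
print(am, gm, round(hm, 4))               # 6.5 6.0 5.5385
print(math.isclose(am * hm, gm ** 2))     # True

# Three numbers: AM >= GM >= HM still holds, but AM x HM = GM^2 generally fails.
am3, gm3, hm3 = means([1, 2, 3])
print(am3 >= gm3 >= hm3)                  # True
print(math.isclose(am3 * hm3, gm3 ** 2))  # False
```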

2. Median

The median of a set of data values is the middle value of the data set when it has been arranged in ascending order, that is, from the smallest value to the highest value.

Example: The marks of nine students in a geography test that had a maximum possible mark of 50 are given below:

47 35 37 32 38 39 36 34 35

Find the median of this set of data values.

Solution: Arrange the data values in order from the lowest value to the highest value:

32 34 35 35 36 37 38 39 47

The fifth data value, 36, is the middle value in this arrangement, so the median is 36.

In general, when the number of values is odd, the median is the ((n + 1)/2)th value of the ordered data, where n is the number of data values in the sample.

If the number of values in the data set is even, then the median is the average of the two middle values.
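Both cases (odd and even n) can be sketched in a short function. The name `median` is our own here; Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]  # the ((n + 1) / 2)th value, counting from 1
    return (s[mid - 1] + s[mid]) / 2


marks = [47, 35, 37, 32, 38, 39, 36, 34, 35]
print(median(marks))         # 36
print(median(marks + [40]))  # 36.5, the average of 36 and 37
```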

3. Mode

The mode of a set of data values is the value (or values) that occurs most often. The mode has applications in printing. For example, it is important to print more copies of the most popular books, because printing different books in equal numbers would cause a shortage of some books and an oversupply of others.

Likewise, the mode has applications in manufacturing. For example, it is important to manufacture more of the most popular shoes, because manufacturing different shoes in equal numbers would cause a shortage of some shoes and an oversupply of others.

Example: Find the mode of the following data set: 48 44 48 45 42 49 48

Solution: The mode is 48 since it occurs most often.

Note:
• It is possible for a set of data values to have more than one mode.
• If there are two data values that occur most frequently, we say that the set of data values is bimodal.
• If no data value occurs more frequently than the others, we say that the set of data values has no mode.
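A sketch of these rules, using `collections.Counter`. The function name is hypothetical, and the "no mode" rule is an assumed convention: when every distinct value occurs equally often, it returns an empty list. (Python 3.8+ also has `statistics.multimode`, but that returns every value in the all-equal case, hence the small wrapper.)

```python
from collections import Counter


def modes(values):
    """All values that occur most often.

    Convention assumed here: if every distinct value occurs equally often
    (and there is more than one distinct value), return [] for "no mode".
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    winners = [v for v, c in counts.items() if c == top]
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners


print(modes([48, 44, 48, 45, 42, 49, 48]))  # [48]
print(modes([1, 2, 2, 3, 3]))               # [2, 3]: bimodal
print(modes([1, 2, 3]))                     # []: no mode
```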

4.6 Empirical Relation Between Mean, Median And Mode

A distribution in which the values of mean, median and mode coincide (i.e. mean = median = mode) is known as a symmetrical distribution. Conversely, when values of mean, median and mode are not equal the distribution is known as asymmetrical or skewed distribution. In moderately skewed or asymmetrical distribution a very important relationship exists among these three measures of central tendency. In such distributions the distance between the mean and median is about one-third of the distance between the mean and mode, as will be clear from the diagrams 1 and 2. Karl Pearson expressed this relationship as:

Mode = mean - 3 [mean - median]

$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean} \quad \text{and} \quad \text{Median} = \text{Mode} + \frac{2}{3}(\text{Mean} - \text{Mode})$$

Knowing any two values, the third can be computed.

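Example: Given median = 20.6 and mode = 26, find the mean.

Solution: From Mode = 3 Median − 2 Mean,

$$\text{Mean} = \frac{3\,\text{Median} - \text{Mode}}{2} = \frac{3 \times 20.6 - 26}{2} = \frac{61.8 - 26}{2} = \frac{35.8}{2} = 17.9$$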
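Since knowing any two of the three values gives the third, Karl Pearson's empirical relation can be sketched as a small helper. The function name is hypothetical, and the relation itself is only an approximation for moderately skewed distributions.

```python
def complete_empirical_relation(mean=None, median=None, mode=None):
    """Fill in the missing one of mean, median and mode using
    Mode = 3 Median - 2 Mean (approximate; moderately skewed data only)."""
    given = [v is not None for v in (mean, median, mode)]
    if sum(given) != 2:
        raise ValueError("supply exactly two of mean, median and mode")
    if mode is None:
        mode = 3 * median - 2 * mean
    elif mean is None:
        mean = (3 * median - mode) / 2
    else:
        median = (mode + 2 * mean) / 3
    return mean, median, mode


print(complete_empirical_relation(median=20.6, mode=26))  # mean comes out as about 17.9
```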


Measures of Dispersion

The measure of dispersion shows the homogeneity or the heterogeneity of the distribution of the observations.

Characteristics of Measures of Dispersion

• A measure of dispersion should be rigidly defined.
• It must be easy to calculate and understand.
• It should not be affected much by fluctuations of the observations.
• It should be based on all observations.

Classification of Measures of Dispersion

The measure of dispersion is categorized as:

(i) An absolute measure of dispersion:

• Measures that express the scatter of observations in terms of distances, i.e., range and quartile deviation.
• Measures that express the variation in terms of the average of the deviations of observations, like mean deviation and standard deviation.

(ii) A relative measure of dispersion:

We use a relative measure of dispersion for comparing the distributions of two or more data sets and for unit-free comparison. The relative measures are the coefficient of range, the coefficient of mean deviation, the coefficient of quartile deviation, the coefficient of variation and the coefficient of standard deviation.

Range

A range is the most common and easily understandable measure of dispersion. It is the difference between the two extreme observations (the largest and the smallest) of the data set. If X max and X min are the two extreme observations, then

Range = X max – X min

Merits of Range

• It is the simplest measure of dispersion.
• Easy to calculate.
• Easy to understand.
• Independent of change of origin.

Demerits of Range

• It is based only on the two extreme observations; hence, it is affected by extreme values and fluctuations.
• A range is not a reliable measure of dispersion.
• Dependent on change of scale.
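The origin and scale properties listed above can be seen in a short sketch. The function is named `value_range` (our own choice) to avoid shadowing Python's built-in `range`, and the data are hypothetical.

```python
def value_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)


data = [12, 15, 11, 20, 17]  # hypothetical observations
print(value_range(data))                     # 9
print(value_range([x + 100 for x in data]))  # 9: a change of origin leaves the range unchanged
print(value_range([3 * x for x in data]))    # 27: a change of scale multiplies the range by 3
```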

Quartile Deviation

The quartiles divide a data set into quarters. The first quartile (Q1) is the middle number between the smallest number and the median of the data. The second quartile (Q2) is the median of the data set. The third quartile (Q3) is the middle number between the median and the largest number.

Quartile deviation or semi-inter-quartile deviation is

Q = ½ × (Q3 – Q1)
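The notes describe Q1 and Q3 as the "middle numbers" of each half but do not fix a rule for odd or even n. The sketch below assumes the common median-of-halves rule (the overall median is left out when n is odd); other conventions, such as Python's `statistics.quantiles`, can give slightly different quartiles for the same data. Function names are our own.

```python
def _median_of_sorted(s):
    """Median of an already sorted, non-empty list."""
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def quartile_deviation(values):
    """Return (Q.D., Q1, Q3), with Q1 and Q3 taken as medians of the lower and
    upper halves of the sorted data (the overall median is left out when n is odd)."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    q1, q3 = _median_of_sorted(lower), _median_of_sorted(upper)
    return (q3 - q1) / 2, q1, q3


qd, q1, q3 = quartile_deviation([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(q1, q3, qd)  # 2.5 7.5 2.5
```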

Merits of Quartile Deviation

• It is not affected by extreme values, so it overcomes the main drawback of the range.
• It is based on the middle half of the data.
• Independent of change of origin.
• The best measure of dispersion for open-ended classifications.

Demerits of Quartile Deviation

• It ignores 50% of the data.
• Dependent on change of scale.
• Not a reliable measure of dispersion.

Mean Deviation

Mean deviation is the arithmetic mean of the absolute deviations of the observations from a measure of central tendency. If x1, x2, … , xn is the set of observations, then the mean deviation of x about the average A (mean, median or mode) is

$$\text{Mean deviation about } A = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$$

For a grouped frequency distribution, it is calculated as:

$$\text{Mean deviation about } A = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - A|, \quad N = \sum_{i=1}^{n} f_i$$

Here, xi and fi are respectively the mid-value and the frequency of the ith class interval.

Merits of Mean Deviation

• Based on all observations.
• It is minimum when the deviations are taken from the median.
• Independent of change of origin.

Demerits of Mean Deviation

• Not easily understandable.
• Its calculation is not easy and is time-consuming.
• Dependent on change of scale.
• Ignoring the signs of the deviations creates artificiality and makes it unsuitable for further mathematical treatment.
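A sketch of the ungrouped formula, using the standard library's `statistics.mean` and `statistics.median` for the averages. The data set is hypothetical, chosen with one large value so that the mean and median differ; it also illustrates the merit that the mean deviation is smallest about the median.

```python
from statistics import mean, median


def mean_deviation(values, about):
    """Arithmetic mean of the absolute deviations |x_i - A| from the chosen average A."""
    return sum(abs(x - about) for x in values) / len(values)


data = [1, 2, 3, 4, 10]  # hypothetical observations
print(mean_deviation(data, mean(data)))    # 2.4, about the mean (4)
print(mean_deviation(data, median(data)))  # 2.2, about the median (3), the smaller of the two
```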

Standard Deviation

A standard deviation is the positive square root of the arithmetic mean of the squares of the deviations of the given values from their arithmetic mean. It is denoted by the Greek letter sigma, σ. It is also referred to as the root mean square deviation. The standard deviation is given as

$$\sigma = \left[\frac{\sum_i (y_i - \bar{y})^2}{n}\right]^{1/2} = \left[\frac{\sum_i y_i^2}{n} - \bar{y}^2\right]^{1/2}$$

For a grouped frequency distribution, it is

$$\sigma = \left[\frac{\sum_i f_i (y_i - \bar{y})^2}{N}\right]^{1/2} = \left[\frac{\sum_i f_i y_i^2}{N} - \bar{y}^2\right]^{1/2}, \quad N = \sum_i f_i$$

The square of the standard deviation is the variance. It is also a measure of dispersion.

$$\sigma^2 = \frac{\sum_i (y_i - \bar{y})^2}{n} = \frac{\sum_i y_i^2}{n} - \bar{y}^2$$

For a grouped frequency distribution, it is

$$\sigma^2 = \frac{\sum_i f_i (y_i - \bar{y})^2}{N} = \frac{\sum_i f_i y_i^2}{N} - \bar{y}^2$$

If, instead of the mean, we choose any other arbitrary number, say A, the standard deviation becomes the root mean square deviation about A:

$$s = \left[\frac{\sum_i (y_i - A)^2}{n}\right]^{1/2}$$
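A sketch of the ungrouped formulas, using the divisor n as in these notes (this matches `statistics.pstdev`; `statistics.stdev` divides by n − 1 instead). Function names and data are hypothetical. It checks that the two forms of the variance agree and that the root mean square deviation about any A other than the mean exceeds σ.

```python
import math
import statistics


def variance(values):
    """Mean of squared deviations from the mean (divisor n, as in the notes)."""
    n = len(values)
    ybar = sum(values) / n
    return sum((y - ybar) ** 2 for y in values) / n


def variance_shortcut(values):
    """Equivalent form: mean of the squares minus the square of the mean."""
    n = len(values)
    ybar = sum(values) / n
    return sum(y * y for y in values) / n - ybar ** 2


def rms_deviation(values, a):
    """Root mean square deviation about an arbitrary value A."""
    return math.sqrt(sum((y - a) ** 2 for y in values) / len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5
print(variance(data), variance_shortcut(data))  # 4.0 4.0
print(math.sqrt(variance(data)))                # 2.0, the standard deviation
print(statistics.pstdev(data))                  # 2.0, standard-library check (divisor n)
print(rms_deviation(data, 3))                   # about 2.83, larger than the S.D.
```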

Variance of the Combined Series

If σ1 and σ2 are the standard deviations of two series of sizes n1 and n2 with means ȳ1 and ȳ2, the variance of the combined series of size n1 + n2 is:

$$\sigma^2 = \frac{1}{n_1 + n_2}\left[n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)\right]$$

where $d_1 = \bar{y}_1 - \bar{y}$, $d_2 = \bar{y}_2 - \bar{y}$, and $\bar{y} = \dfrac{n_1 \bar{y}_1 + n_2 \bar{y}_2}{n_1 + n_2}$.
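A sketch of the combined-variance formula (divisor n, as in these notes), checked against the variance computed directly on the pooled data. Function names and both series are hypothetical.

```python
def mean_and_variance(values):
    """Mean and variance (divisor n) of one series."""
    n = len(values)
    m = sum(values) / n
    return m, sum((y - m) ** 2 for y in values) / n


def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Variance of two pooled series from their sizes, means and variances."""
    n = n1 + n2
    ybar = (n1 * mean1 + n2 * mean2) / n  # combined mean
    d1, d2 = mean1 - ybar, mean2 - ybar
    return (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n


series1 = [2, 4, 4, 4]  # hypothetical: mean 3.5, variance 0.75
series2 = [5, 5, 7, 9]  # hypothetical: mean 6.5, variance 2.75
m1, v1 = mean_and_variance(series1)
m2, v2 = mean_and_variance(series2)
print(combined_variance(len(series1), m1, v1, len(series2), m2, v2))  # 4.0
print(mean_and_variance(series1 + series2)[1])                        # 4.0, computed directly
```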

Merits of Standard Deviation

• Squaring the deviations overcomes the drawback of ignoring signs in the mean deviation.
• Suitable for further mathematical treatment.
• Least affected by fluctuations of sampling.
• The standard deviation is zero if all the observations are equal.
• Independent of change of origin.

Demerits of Standard Deviation

• Not easy to calculate.
• Difficult to understand for a layman.
• Dependent on change of scale.

Coefficient of Dispersion


When we want to compare the variability of two series that differ widely in their averages, or that are measured in different units, we need to calculate the coefficients of dispersion along with the measures of dispersion. The coefficients of dispersion (C.D.) based on different measures of dispersion are:

• C.D. based on range = (X max – X min) ⁄ (X max + X min)
• C.D. based on quartile deviation = (Q3 – Q1) ⁄ (Q3 + Q1)
• C.D. based on mean deviation = Mean deviation ⁄ Average from which it is calculated
• C.D. based on standard deviation = S.D. ⁄ Mean

Coefficient of Variation

100 times the coefficient of dispersion based on standard deviation is the coefficient of variation (C.V.).

C.V. = 100 × (S.D. / Mean) = (σ/ȳ ) × 100.
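A sketch of two of these relative measures, using `statistics.pstdev` (divisor n, matching these notes) and `statistics.mean`. Function names and data are hypothetical. Multiplying every value by 10, i.e. changing the unit, leaves the C.V. unchanged, which is why it suits unit-free comparison.

```python
import statistics


def coefficient_of_range(values):
    """(X max - X min) / (X max + X min)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


def coefficient_of_variation(values):
    """C.V. = 100 x S.D. / mean, using the population S.D. (divisor n) as in the notes."""
    return 100 * statistics.pstdev(values) / statistics.mean(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations: mean 5, S.D. 2
print(round(coefficient_of_range(data), 4))  # 0.6364, i.e. 7/11
print(coefficient_of_variation(data))        # 40.0 (per cent)
```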

Skewness

Skewness is usually described as a measure of a data set's symmetry, or lack of symmetry. A perfectly symmetrical data set will have a skewness of 0. The normal distribution has a skewness of 0.

The skewness is defined as (Advanced Topics in Statistical Process Control, Dr. Donald Wheeler, www.spcpress.com):

where n is the sample size, Xi is the ith X value, X̄ is the average and s is the sample standard deviation. Note the exponent in the summation: it is 3. The skewness is referred to as the "third standardized central moment for the probability model."

Most software packages use a formula for the skewness that takes the sample size into account:

$$\text{Skewness} = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{X_i - \bar{X}}{s}\right)^3$$

This sample-size formula is used here. It is also what Microsoft Excel uses. The difference between the results of the two formulas becomes very small as the sample size increases.

Figure 1 is a symmetrical data set. It was created by generating data from 65 to 135 in steps of 5, with the number of each value as shown in Figure 1. For example, there are three 65s, six 70s, nine 75s, etc.
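A minimal sketch of the sample-size-adjusted skewness, assuming the formula n / ((n − 1)(n − 2)) × Σ((Xi − X̄)/s)³ with s the sample standard deviation (divisor n − 1), which we take to be the one behind Excel's SKEW function. The function name and data are hypothetical; the Figure 1 data are not reproduced here.

```python
import math


def sample_skewness(values):
    """n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3), with s using divisor n - 1."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    xbar = sum(values) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - xbar) / s) ** 3 for x in values)


print(sample_skewness([1, 2, 3, 4, 5]))    # approximately 0: symmetrical
print(sample_skewness([1, 2, 3, 10]))      # positive: long right-hand tail
print(sample_skewness([-1, -2, -3, -10]))  # negative: the mirror image
```

Because each deviation is cubed, deviations of equal size above and below the mean cancel, which is exactly why a symmetrical data set gives 0.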

Figure 1: Symmetrical Dataset with Skewness = 0

A truly symmetrical data set has a skewness equal to 0. It is easy to see why from the skewness formula. Look at the term in the numerator after the summation sign: the average is subtracted from each individual X value. So, if a set of data is truly symmetrical, for each point that is a distance d above the average there will be a point that is a distance d below the average, and their cubed deviations cancel. Consider the value 65 and the value 135. The average of the data in Figure 1 is 100. So, the following is true when X = 65:

For X = 135, the following is true:

So, the -4278 and +4278 cancel out to 0; hence, a truly symmetrical data set has a skewness of 0. To explore positive and negative values of skewness, let us define the following terms:

So, Sabove can be viewed as the “size” of the deviations from average when Xi is above the average. Likewise, Sbelow can be viewed as the “size” of the deviations from average when Xi is below the average. Then, skewness becomes the following:


If Sabove is larger than Sbelow, then the skewness will be positive. This typically means that the right-hand tail will be longer than the left-hand tail. Figure 2 is an example of this. The skewness for this data set is 0.514. A positive skewness indicates that the right-hand tail is larger than the left-hand tail.

Figure 2: Dataset with Positive Skewness

Figure 3 is an example of a data set with negative skewness. It is essentially the mirror image of Figure 2. The skewness is -0.514. In this case, Sbelow is larger than Sabove, and the left-hand tail will typically be longer than the right-hand tail.

Figure 3: Dataset with Negative Skewness

So, when is the skewness too much? The rule of thumb seems to be:

• If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
• If the skewness is between -1 and -0.5 or between 0.5 and 1, the data are moderately skewed.
• If the skewness is less than -1 or greater than 1, the data are highly skewed.
