
UNIT-II

SKEWNESS, CORRELATION, REGRESSION

(i) Skewness: In a perfectly symmetrical distribution, the mean, median and mode coincide. Skewness is a measure used to study the asymmetry of a statistical distribution. If a distribution is not symmetrical, we say that it is skewed.

(ii) Kurtosis: Kurtosis is a measure of the flatness or peakedness of a distribution.

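(iii) Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\text{S.D.}}$

(iv) When the mode is not well defined, Pearson's coefficient of skewness $= \dfrac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$

(v) Bowley's formula for measuring skewness:

Bowley's coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$, where $Q_1$ and $Q_3$ are the lower and upper quartiles and $M$ is the median.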
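The three coefficients defined above can be sketched as small Python functions. This is only an illustrative sketch: the function names are hypothetical, and the sample inputs are taken from the first two worked problems of this unit (mean 65, median 70, S.D. 25; quartiles 31.95 and 46.25 obtained from a sum of 78.2 and a difference of 14.3).

```python
def pearson_skewness_mode(mean, mode, sd):
    """Pearson's coefficient of skewness based on the mode."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Pearson's coefficient when the mode is not well defined."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, q3, median):
    """Bowley's quartile-based coefficient of skewness."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Inputs borrowed from the worked problems (see lead-in).
print(round(pearson_skewness_median(65, 70, 25), 4))
print(round(bowley_skewness(31.95, 46.25, 35.7), 4))
```

Bowley's coefficient always lies between -1 and 1, whereas Pearson's coefficients are not bounded in that way.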

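1. In a distribution, mean = 65, median = 70 and the coefficient of skewness is -0.6. Find the coefficient of variation.
Solution:

Coefficient of skewness $= \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$

$-0.6 = \dfrac{3(65 - 70)}{\sigma}$

$\sigma = \dfrac{-15}{-0.6} = 25$

Coefficient of variation $= \dfrac{\sigma}{\text{Mean}} \times 100 = \dfrac{25}{65} \times 100 = 38.46\%$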

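2. In a distribution, the sum of the two quartiles is 78.2, their difference is 14.3 and the median is 35.7. Find the coefficient of skewness.
Solution: Given $Q_3 + Q_1 = 78.2$, $Q_3 - Q_1 = 14.3$, median $M = 35.7$.

Coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{78.2 - 2(35.7)}{14.3} = \dfrac{6.8}{14.3} = 0.4755$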

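3. Pearson's coefficient of skewness is -0.7, and the values of the median and the standard deviation are 12.8 and 6 respectively. Estimate the value of the mean.
Solution: Pearson's coefficient of skewness = -0.7, Median = 12.8, S.D. = 6.

Coefficient of skewness $= \dfrac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$

$-0.7 = \dfrac{3(\text{Mean} - 12.8)}{6}$

$-1.4 = \text{Mean} - 12.8$

Mean = 12.8 - 1.4 = 11.4

4. In a distribution, the coefficient of skewness based upon quartiles is 0.6. If the sum of the upper and lower quartiles is 100 and the median is 38, estimate the value of the upper quartile.
Solution: $S_k = 0.6$, $Q_3 + Q_1 = 100$, $M = 38$.

$S_k = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$

$0.6 = \dfrac{100 - 2(38)}{Q_3 - Q_1} = \dfrac{24}{Q_3 - Q_1}$

$Q_3 - Q_1 = 40$ ---(1)
$Q_3 + Q_1 = 100$ ---(2)

Adding (1) and (2): $2Q_3 = 140$, so $Q_3 = 70$.

5. Find the coefficient of skewness if the difference between the two quartiles is 8, the sum of the two quartiles is 22 and the median is 10.5.
Solution: Given $Q_3 + Q_1 = 22$, $Q_3 - Q_1 = 8$, $M = 10.5$.

$S_k = \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{22 - 2(10.5)}{8} = \dfrac{1}{8} = 0.125$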

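6. Calculate the coefficient of variation if Karl Pearson's coefficient of skewness is 0.42, the mean is 86 and the median is 80.
Solution: Given Pearson's coefficient of skewness = 0.42, Mean = 86, Median = 80.

$S_k = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$

$0.42 = \dfrac{3(86 - 80)}{\sigma} \Rightarrow \sigma = \dfrac{18}{0.42} = 42.857$

Coefficient of variation $= \dfrac{\sigma}{\text{Mean}} \times 100 = \dfrac{42.857}{86} \times 100 = 49.83\%$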

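7. The first four central moments of a distribution are 0, 2.5, 0.7 and 18.75. Write the skewness and kurtosis of the distribution.
Solution: The coefficient of skewness is given by

$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3} = \dfrac{(0.7)^2}{(2.5)^3} = \dfrac{0.49}{15.625} = 0.0314$

Since $\mu_3$ is positive, the distribution is positively skewed.

The measure of kurtosis is given by

$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{18.75}{(2.5)^2} = \dfrac{18.75}{6.25} = 3$

Since $\beta_2 = 3$, the distribution is mesokurtic (normal).

8. The Karl Pearson's coefficient of skewness of a distribution is 0.32, its standard deviation is 6.5 and the mean is 29.6. Calculate the mode and the median. (L3)
Solution: $S_k = 0.32$, $\sigma = 6.5$, Mean = 29.6.

$S_k = \dfrac{3(\text{Mean} - \text{Median})}{\sigma} \Rightarrow 0.32 = \dfrac{3(29.6 - \text{Median})}{6.5}$

$0.32 \times 6.5 = 88.8 - 3\,\text{Median} \Rightarrow 3\,\text{Median} = 88.8 - 2.08 = 86.72$

Median $= \dfrac{86.72}{3} = 28.91$

Mean - Mode = 3(Mean - Median)
29.6 - Mode = 3(29.6 - 28.91) = 2.08
Mode = 29.6 - 2.08 = 27.52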

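9. Compute the first four central moments for the following data: 8, 10, 11, 12, 14. (L3)
Solution:

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{55}{5} = 11$

x       x - x̄   (x - x̄)^2   (x - x̄)^3   (x - x̄)^4
8       -3      9           -27         81
10      -1      1           -1          1
11      0       0           0           0
12      1       1           1           1
14      3       9           27          81
Total
55      0       20          0           164

The four central moments are

$\mu_1 = \dfrac{\sum (x - \bar{x})}{n} = \dfrac{0}{5} = 0$

$\mu_2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{20}{5} = 4$

$\mu_3 = \dfrac{\sum (x - \bar{x})^3}{n} = \dfrac{0}{5} = 0$

$\mu_4 = \dfrac{\sum (x - \bar{x})^4}{n} = \dfrac{164}{5} = 32.8$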
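The central moments of ungrouped data, as computed in the problem above, can be checked with a short Python sketch. The function name `central_moments` is hypothetical and chosen only for illustration; the data are the five values from the problem.

```python
def central_moments(values, order=4):
    """Return [mu_1, ..., mu_order], the moments about the arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return [sum((x - mean) ** r for x in values) / n
            for r in range(1, order + 1)]


data = [8, 10, 11, 12, 14]
for r, mu in enumerate(central_moments(data), start=1):
    print(f"mu_{r} = {mu}")
```

Because these data are symmetric about the mean 11, every odd central moment vanishes.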

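10. The first three moments of a distribution about the value 3 are 2, 10 and -30. Find the values of $\mu_2$ and $\mu_3$. (L1)
Solution:

About the value x = 3: $\mu_1' = 2$, $\mu_2' = 10$, $\mu_3' = -30$.

$\mu_2 = \mu_2' - (\mu_1')^2 = 10 - 4 = 6$

$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = -30 - 3(10)(2) + 2(2)^3 = -30 - 60 + 16 = -74$

Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\text{S.D.}}$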

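1. Calculate Karl Pearson's coefficient of skewness. (L3)

Marks               0-10   10-20   20-30   30-40   40-50   50-60   60-70
No. of candidates   10     15      24      25      10      10      6

Solution:

Marks    Mid value x   f     d    fd    fd^2
0-10     5             10    -3   -30   90
10-20    15            15    -2   -30   60
20-30    25            24    -1   -24   24
30-40    35            25    0    0     0
40-50    45            10    1    10    10
50-60    55            10    2    20    40
60-70    65            6     3    18    54
Total                  100        -36   278

Let A = 35, c = 10, $d = \dfrac{x - A}{c}$.

$\bar{x} = A + c\,\dfrac{\sum fd}{N} = 35 + 10 \times \dfrac{-36}{100} = 31.4$

The modal class is 30-40, with $f_1 = 25$, $f_0 = 24$, $f_2 = 10$.

Mode $= l + \dfrac{c(f_1 - f_0)}{2f_1 - f_0 - f_2} = 30 + \dfrac{10(25 - 24)}{50 - 24 - 10} = 30 + \dfrac{10}{16} = 30.625$

$\sigma = c\sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} = 10\sqrt{2.78 - (-0.36)^2} = 10\sqrt{2.6504} = 16.28$

Coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{31.4 - 30.625}{16.28} = 0.0476$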
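The grouped-data procedure used here (mean, standard deviation, interpolated mode, then Pearson's coefficient) can be sketched in Python as follows. The function name and argument layout are assumptions made for illustration; the sketch takes equal class widths and a single modal class, and it computes the variance directly from the mid values rather than by step deviations, which gives the same result.

```python
import math


def grouped_pearson_skewness(lower_bounds, width, freqs):
    """Mean, S.D., mode and Pearson skewness for equal-width classes."""
    n = sum(freqs)
    mids = [lb + width / 2 for lb in lower_bounds]
    mean = sum(f * m for f, m in zip(freqs, mids)) / n
    variance = sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / n
    sd = math.sqrt(variance)

    # Modal class: the first class with the largest frequency.
    k = freqs.index(max(freqs))
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    mode = lower_bounds[k] + width * (f1 - f0) / (2 * f1 - f0 - f2)

    return mean, sd, mode, (mean - mode) / sd


marks_lower = [0, 10, 20, 30, 40, 50, 60]
candidates = [10, 15, 24, 25, 10, 10, 6]
mean, sd, mode, sk = grouped_pearson_skewness(marks_lower, 10, candidates)
print(round(mean, 3), round(sd, 3), round(mode, 3), round(sk, 4))
```

For inclusive classes such as 10-19, 20-29, the lower bounds passed in should be the class boundaries (9.5, 19.5, ...), not the stated limits.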

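2. Calculate Pearson's coefficient of skewness for the following data. (L3)

Class       10-19   20-29   30-39   40-49   50-59   60-69   70-79   80-89
Frequency   5       9       14      20      25      15      8       4

Solution:

Class       Mid value x   f     d    fd    fd^2
9.5-19.5    14.5          5     -3   -15   45
19.5-29.5   24.5          9     -2   -18   36
29.5-39.5   34.5          14    -1   -14   14
39.5-49.5   44.5          20    0    0     0
49.5-59.5   54.5          25    1    25    25
59.5-69.5   64.5          15    2    30    60
69.5-79.5   74.5          8     3    24    72
79.5-89.5   84.5          4     4    16    64
Total                     100        48    316

Let A = 44.5, c = 10, $d = \dfrac{x - A}{c}$.

Mean $= A + c\,\dfrac{\sum fd}{N} = 44.5 + 10 \times \dfrac{48}{100} = 49.3$

$\sigma = c\sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} = 10\sqrt{3.16 - (0.48)^2} = 10\sqrt{2.9296} = 17.12$

The modal class is 49.5-59.5, with $f_1 = 25$, $f_0 = 20$, $f_2 = 15$.

Mode $= l + \dfrac{c(f_1 - f_0)}{2f_1 - f_0 - f_2} = 49.5 + \dfrac{10(25 - 20)}{50 - 20 - 15} = 49.5 + 3.33 = 52.83$

Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{49.3 - 52.83}{17.12} = -0.206$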

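3. Calculate Pearson's coefficient of skewness for the following data. (L3)

Class       3-7   8-12   13-17   18-22   23-27   28-32   33-37   38-42
Frequency   2     108    580     175     80      32      18      5

Solution:

Class       Mid value x   f      d    fd     fd^2
2.5-7.5     5             2      -3   -6     18
7.5-12.5    10            108    -2   -216   432
12.5-17.5   15            580    -1   -580   580
17.5-22.5   20            175    0    0      0
22.5-27.5   25            80     1    80     80
27.5-32.5   30            32     2    64     128
32.5-37.5   35            18     3    54     162
37.5-42.5   40            5      4    20     80
Total                     1000        -584   1480

Let A = 20, c = 5, $d = \dfrac{x - A}{c}$.

Mean $= A + c\,\dfrac{\sum fd}{N} = 20 + 5 \times \dfrac{-584}{1000} = 17.08$

The modal class is 12.5-17.5, with $f_1 = 580$, $f_0 = 108$, $f_2 = 175$.

Mode $= l + \dfrac{c(f_1 - f_0)}{2f_1 - f_0 - f_2} = 12.5 + \dfrac{5(580 - 108)}{1160 - 108 - 175} = 12.5 + 2.69 = 15.19$

$\sigma = c\sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} = 5\sqrt{1.48 - (-0.584)^2} = 5\sqrt{1.1389} = 5.34$

Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{17.08 - 15.19}{5.34} = 0.354$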

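4. Calculate Pearson's coefficient of skewness for the following data. (L3)

Size        7   8    9    10   11   12   13   14
Frequency   2   11   36   64   39   30   22   2

Solution: This is discrete data. The maximum frequency corresponds to x = 10, so Mode = 10. Let A = 10 and d = x - 10.

x       f     d    fd    fd^2
7       2     -3   -6    18
8       11    -2   -22   44
9       36    -1   -36   36
10      64    0    0     0
11      39    1    39    39
12      30    2    60    120
13      22    3    66    198
14      2     4    8     32
Total   206        109   487

Mean $= A + \dfrac{\sum fd}{N} = 10 + \dfrac{109}{206} = 10.53$

$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} = \sqrt{\dfrac{487}{206} - \left(\dfrac{109}{206}\right)^2} = \sqrt{2.0841} = 1.44$

Pearson's coefficient of skewness $= \dfrac{\text{Mean} - \text{Mode}}{\sigma} = \dfrac{10.53 - 10}{1.44} = 0.37$

Bowley's coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1}$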

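5. Calculate Bowley's coefficient of skewness for the following data. (L4)

Weight (in kg), more than   40    50    60    70   80   90
No. of persons              185   167   132   82   38   12

Solution:

More than   No. of persons   Class          f    cf
40          185              40-50          18   18
50          167              50-60          35   53
60          132              60-70          50   103
70          82               70-80          44   147
80          38               80-90          26   173
90          12               90 and above   12   185

Here N = 185, so N/2 = 92.5, N/4 = 46.25 and 3N/4 = 138.75.

Median $= l + \dfrac{c\,(N/2 - cf)}{f} = 60 + \dfrac{10(92.5 - 53)}{50} = 67.9$

$Q_1 = l + \dfrac{c\,(N/4 - cf)}{f} = 50 + \dfrac{10(46.25 - 18)}{35} = 58.07$

$Q_3 = l + \dfrac{c\,(3N/4 - cf)}{f} = 70 + \dfrac{10(138.75 - 103)}{44} = 78.125$

Bowley's coefficient of skewness $= \dfrac{Q_3 + Q_1 - 2M}{Q_3 - Q_1} = \dfrac{78.125 + 58.07 - 2(67.9)}{78.125 - 58.07} = \dfrac{0.395}{20.055} = 0.0197$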
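The quartile interpolation used in this problem can be sketched in Python. The helper name `grouped_quantile` is hypothetical; the sketch assumes equal class widths and treats the open-ended class "90 and above" as having width 10, which does not matter here because no quartile falls in it.

```python
def grouped_quantile(lower_bounds, width, freqs, p):
    """Interpolated p-th quantile (0 < p < 1) of a grouped distribution."""
    n = sum(freqs)
    target = p * n
    cumulative = 0  # cumulative frequency below the current class
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + width * (target - cumulative) / f
        cumulative += f
    raise ValueError("p must lie strictly between 0 and 1")


lower = [40, 50, 60, 70, 80, 90]
freqs = [18, 35, 50, 44, 26, 12]  # class frequencies derived from the "more than" table

q1 = grouped_quantile(lower, 10, freqs, 0.25)
median = grouped_quantile(lower, 10, freqs, 0.50)
q3 = grouped_quantile(lower, 10, freqs, 0.75)
bowley = (q3 + q1 - 2 * median) / (q3 - q1)
print(round(q1, 3), round(median, 3), round(q3, 3), round(bowley, 4))
```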

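MOMENTS

For a frequency distribution with $N = \sum f$, the first four central moments are

$\mu_1 = \dfrac{\sum f(x - \bar{x})}{N}$

$\mu_2 = \dfrac{\sum f(x - \bar{x})^2}{N}$

$\mu_3 = \dfrac{\sum f(x - \bar{x})^3}{N}$

$\mu_4 = \dfrac{\sum f(x - \bar{x})^4}{N}$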

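6. Calculate the first four central moments for the following data. (L3)

x   0   1   2    3    4    5    6    7   8
f   1   8   28   56   70   56   28   8   1

Solution: Here $d = x - \bar{x} = x - 4$.

x       f     d    fd    fd^2   fd^3   fd^4
0       1     -4   -4    16     -64    256
1       8     -3   -24   72     -216   648
2       28    -2   -56   112    -224   448
3       56    -1   -56   56     -56    56
4       70    0    0     0      0      0
5       56    1    56    56     56     56
6       28    2    56    112    224    448
7       8     3    24    72     216    648
8       1     4    4     16     64     256
Total   256        0     512    0      2816

$\bar{x} = \dfrac{\sum fx}{N} = \dfrac{1024}{256} = 4$

$\mu_1 = \dfrac{\sum f(x - \bar{x})}{N} = \dfrac{0}{256} = 0$

$\mu_2 = \dfrac{\sum f(x - \bar{x})^2}{N} = \dfrac{512}{256} = 2$

$\mu_3 = \dfrac{\sum f(x - \bar{x})^3}{N} = \dfrac{0}{256} = 0$, since the distribution is symmetrical

$\mu_4 = \dfrac{\sum f(x - \bar{x})^4}{N} = \dfrac{2816}{256} = 11$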
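For a frequency distribution, the same moments can be computed with a small Python sketch. The function name `frequency_central_moments` is an illustrative assumption; the data are the x and f values of the problem above.

```python
def frequency_central_moments(xs, fs, order=4):
    """Return (mean, [mu_1, ..., mu_order]) for values xs with frequencies fs."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    moments = [sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n
               for r in range(1, order + 1)]
    return mean, moments


xs = list(range(9))
fs = [1, 8, 28, 56, 70, 56, 28, 8, 1]
mean, moments = frequency_central_moments(xs, fs)
print(mean, moments)
```

The frequencies here are the binomial coefficients of order 8, so the result can be cross-checked against the binomial moments with n = 8 and p = 1/2.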

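7. Calculate the first four central moments for the following frequency distribution. (L4)

Marks less than   80    70   60   50   40   30   20   10
Frequency         100   90   80   60   32   20   13   5

Solution:

Marks   Mid value x   f     d    fd    fd^2   fd^3   fd^4
0-10    5             5     -4   -20   80     -320   1280
10-20   15            8     -3   -24   72     -216   648
20-30   25            7     -2   -14   28     -56    112
30-40   35            12    -1   -12   12     -12    12
40-50   45            28    0    0     0      0      0
50-60   55            20    1    20    20     20     20
60-70   65            10    2    20    40     80     160
70-80   75            10    3    30    90     270    810
Total                 100        0     342    -234   3042

Let $d = \dfrac{x - A}{c}$ with A = 45 and c = 10.

$\bar{x} = A + c\,\dfrac{\sum fd}{N} = 45 + 10 \times \dfrac{0}{100} = 45$

Since $\sum fd = 0$, the moments about A are the central moments:

$\mu_1 = 0$

$\mu_2 = c^2\,\dfrac{\sum fd^2}{N} = 100 \times \dfrac{342}{100} = 342$

$\mu_3 = c^3\,\dfrac{\sum fd^3}{N} = 1000 \times \dfrac{-234}{100} = -2340$

$\mu_4 = c^4\,\dfrac{\sum fd^4}{N} = 10000 \times \dfrac{3042}{100} = 304200$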

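8. Calculate the measure of kurtosis from the following data. (L4)

x   2   4    6    8    10   12   14
f   4   11   18   27   20   16   8

Solution: Let A = 8, c = 2 and $d = \dfrac{x - A}{c}$.

x       f     d    fd    fd^2   fd^3   fd^4
2       4     -3   -12   36     -108   324
4       11    -2   -22   44     -88    176
6       18    -1   -18   18     -18    18
8       27    0    0     0      0      0
10      20    1    20    20     20     20
12      16    2    32    64     128    256
14      8     3    24    72     216    648
Total   104        24    254    150    1442

The moments about A = 8 are

$\mu_1' = c\,\dfrac{\sum fd}{N} = 0.46$, $\mu_2' = c^2\,\dfrac{\sum fd^2}{N} = 9.77$, $\mu_3' = c^3\,\dfrac{\sum fd^3}{N} = 11.53$, $\mu_4' = c^4\,\dfrac{\sum fd^4}{N} = 221.84$

The central moments are

$\mu_2 = \mu_2' - (\mu_1')^2 = 9.77 - (0.46)^2 = 9.5584$

$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = 11.53 - 3(9.77)(0.46) + 2(0.46)^3 = 11.53 - 13.4826 + 0.1947 = -1.7579$

$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 221.84 - 21.2152 + 12.404 - 0.1343 = 212.89$

Measure of kurtosis based on moments:

$\beta_2 = \dfrac{\mu_4}{\mu_2^2} = \dfrac{212.89}{(9.5584)^2} = 2.33$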
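The conversion from moments about an arbitrary value A to central moments, followed by the kurtosis measure, can be sketched in Python. The function names are hypothetical, and the code works with unrounded raw moments, so intermediate figures may differ slightly in the last decimal places from a hand calculation that rounds at each step.

```python
def raw_to_central(m1, m2, m3, m4):
    """Convert moments about an arbitrary value A into central moments."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4


def raw_moments(xs, fs, about):
    """First four moments of a frequency distribution about the value `about`."""
    n = sum(fs)
    return [sum(f * (x - about) ** r for x, f in zip(xs, fs)) / n
            for r in range(1, 5)]


xs = [2, 4, 6, 8, 10, 12, 14]
fs = [4, 11, 18, 27, 20, 16, 8]
mu2, mu3, mu4 = raw_to_central(*raw_moments(xs, fs, about=8))
beta2 = mu4 / mu2 ** 2
print(round(mu2, 4), round(mu3, 4), round(mu4, 4), round(beta2, 3))
```

A value of $\beta_2$ below 3 indicates a platykurtic (flatter than normal) distribution.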

CORRELATION

Correlation: Let X and Y be two random variables. Correlation is a measure of the co-variability of X and Y that takes into account the individual variability of X and Y. The correlation coefficient, denoted by $\rho(X, Y)$ or $r$, is defined by

$\rho(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}}$

Types of correlation: (i) Positive and negative (ii) Simple, partial and multiple (iii) Linear and non-linear.

Regression: Regression is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.

Lines of regression: The line of regression of y on x is given by

$y - \bar{y} = r\,\dfrac{\sigma_y}{\sigma_x}\,(x - \bar{x})$

The line of regression of x on y is given by

$x - \bar{x} = r\,\dfrac{\sigma_x}{\sigma_y}\,(y - \bar{y})$

Regression coefficients: The slopes $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ of the two lines are called the regression coefficients, so that $r^2 = b_{yx}\,b_{xy}$.

Covariance: A measure of association between two random variables, obtained as the expected value of the product of the deviations of the two random variables from their means; that is, $\mathrm{Cov}(X, Y) = E(XY) - E(X)\,E(Y)$.

1. If the two regression coefficients are 0.8 and 0.6, find the coefficient of correlation. (L1)
Solution: Given $b_{yx} = 0.8$, $b_{xy} = 0.6$.

$r^2 = b_{yx}\,b_{xy} = (0.8)(0.6) = 0.48$, so $r = \sqrt{0.48} = 0.693$

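2. The two regression equations of the variables X and Y are given. Find the correlation coefficient between X and Y. (L1)
Solution: The regression equation of X on Y is X = 19.13 - 0.87Y, so the regression coefficient of X on Y is $b_{xy} = -0.87$. The regression coefficient of Y on X, $b_{yx}$, is read off the regression equation of Y on X in the same way. The correlation coefficient between X and Y is given by

$r = \pm\sqrt{b_{xy}\,b_{yx}}$

where $r$ takes the common sign of the two regression coefficients.

3. Calculate the coefficient of correlation between x and y from the following data. (L3)

x   1   3   5   8    9    10
y   3   4   8   10   12   11

Solution:

x       y    x - x̄   y - ȳ   (x - x̄)^2   (y - ȳ)^2   (x - x̄)(y - ȳ)
1       3    -5      -5      25          25          25
3       4    -3      -4      9           16          12
5       8    -1      0       1           0           0
8       10   2       2       4           4           4
9       12   3       4       9           16          12
10      11   4       3       16          9           12
Total
36      48   0       0       64          70          65

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36}{6} = 6$, $\bar{y} = \dfrac{\sum y}{n} = \dfrac{48}{6} = 8$

$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \dfrac{65}{\sqrt{64}\,\sqrt{70}} = 0.971$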

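4. Calculate the coefficient of correlation between x and y. (L3)

x   1    2    3    4    5    6    7    8    9
y   12   11   13   15   14   17   16   19   18

Solution:

x       y     x - x̄   y - ȳ   (x - x̄)^2   (y - ȳ)^2   (x - x̄)(y - ȳ)
1       12    -4      -3      16          9           12
2       11    -3      -4      9           16          12
3       13    -2      -2      4           4           4
4       15    -1      0       1           0           0
5       14    0       -1      0           1           0
6       17    1       2       1           4           2
7       16    2       1       4           1           2
8       19    3       4       9           16          12
9       18    4       3       16          9           12
Total
45      135   0       0       60          60          56

$\bar{x} = \dfrac{45}{9} = 5$, $\bar{y} = \dfrac{135}{9} = 15$

$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2}\,\sqrt{\sum (y - \bar{y})^2}} = \dfrac{56}{\sqrt{60}\,\sqrt{60}} = 0.933$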
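The deviation-from-mean formula for the correlation coefficient can be sketched in Python. The function name `pearson_r` is an illustrative choice, and the data are the nine (x, y) pairs of the problem above.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [12, 11, 13, 15, 14, 17, 16, 19, 18]
print(round(pearson_r(x, y), 4))
```

The function divides by zero if either variable is constant, in which case the correlation coefficient is undefined.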

5. Ten competitors in a musical test were ranked by 3 judges X, Y, Z in the following order. (L2)

Competitor   A   B   C   D    E   F    G   H    I   J
Rank by X    1   6   5   10   3   2    4   9    7   8
Rank by Y    3   5   8   4    7   10   2   1    6   9
Rank by Z    6   4   9   8    1   2    3   10   5   7

Using the rank correlation method, discuss which pair of judges has the nearest approach to common likings in music.

Solution: Let $d_1 = X - Y$, $d_2 = Y - Z$, $d_3 = X - Z$.

X       Y    Z    d1   d2   d3   d1^2   d2^2   d3^2
1       3    6    -2   -3   -5   4      9      25
6       5    4    1    1    2    1      1      4
5       8    9    -3   -1   -4   9      1      16
10      4    8    6    -4   2    36     16     4
3       7    1    -4   6    2    16     36     4
2       10   2    -8   8    0    64     64     0
4       2    3    2    -1   1    4      1      1
9       1    10   8    -9   -1   64     81     1
7       6    5    1    1    2    1      1      4
8       9    7    -1   2    1    1      4      1
Total                            200    214    60

The rank correlation between X and Y is

$\rho(X, Y) = 1 - \dfrac{6\sum d_1^2}{n(n^2 - 1)} = 1 - \dfrac{6(200)}{10(99)} = -0.2121$

The rank correlation between Y and Z is

$\rho(Y, Z) = 1 - \dfrac{6\sum d_2^2}{n(n^2 - 1)} = 1 - \dfrac{6(214)}{10(99)} = -0.2970$

The rank correlation between X and Z is

$\rho(X, Z) = 1 - \dfrac{6\sum d_3^2}{n(n^2 - 1)} = 1 - \dfrac{6(60)}{10(99)} = 0.6364$

Since $\rho(X, Z)$ is maximum and also positive, we conclude that the pair of judges X and Z has the nearest approach to common likings in music.
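The rank correlation comparison of the three judges can be sketched in Python. The function name `spearman_rho` is illustrative, and the formula used assumes there are no tied ranks, as is the case in this problem.

```python
from itertools import combinations


def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation, 1 - 6*sum(d^2) / (n(n^2 - 1)), no ties."""
    n = len(rank_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


ranks = {
    "X": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "Y": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "Z": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}

# Compare every pair of judges and report the closest pair.
results = {pair: spearman_rho(ranks[pair[0]], ranks[pair[1]])
           for pair in combinations(ranks, 2)}
for pair, rho in results.items():
    print(pair, round(rho, 4))
print("closest pair:", max(results, key=results.get))
```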

6. From the following data, calculate (L3)
(i) The two regression equations.
(ii) The coefficient of correlation between the marks in Economics and Statistics.
(iii) The most likely marks in Statistics when the marks in Economics are 30.

Marks in Economics    25   28   35   32   31   36   29   38   34   32
Marks in Statistics   43   46   49   41   36   32   31   30   33   39

Solution: Let x denote the marks in Economics and y the marks in Statistics.

x       y     x - 32   y - 38   (x - x̄)^2   (y - ȳ)^2   (x - x̄)(y - ȳ)
25      43    -7       5        49          25          -35
28      46    -4       8        16          64          -32
35      49    3        11       9           121         33
32      41    0        3        0           9           0
31      36    -1       -2       1           4           2
36      32    4        -6       16          36          -24
29      31    -3       -7       9           49          21
38      30    6        -8       36          64          -48
34      33    2        -5       4           25          -10
32      39    0        1        0           1           0
Total
320     380   0        0        140         398         -93

Here $\bar{x} = \dfrac{\sum x}{n} = \dfrac{320}{10} = 32$ and $\bar{y} = \dfrac{\sum y}{n} = \dfrac{380}{10} = 38$.

Coefficient of regression of y on x:

$b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{-93}{140} = -0.6643$

Coefficient of regression of x on y:

$b_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \dfrac{-93}{398} = -0.2337$

(i) The equation of the line of regression of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$,
(i.e.) x - 32 = -0.2337(y - 38) = -0.2337y + 0.2337 × 38
x = -0.2337y + 40.8806

The equation of the line of regression of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$,
(i.e.) y - 38 = -0.6643(x - 32) = -0.6643x + 0.6643 × 32
y = -0.6643x + 59.2576

(ii) Coefficient of correlation:

$r^2 = b_{yx}\,b_{xy} = (-0.6643)(-0.2337) = 0.1552$

$r = -\sqrt{0.1552} = -0.394$, the negative root being taken because both regression coefficients are negative.

(iii) When x = 30:
y = -0.6643x + 59.2576 = -0.6643 × 30 + 59.2576 = 39.33 ≈ 39
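The two regression coefficients, the correlation coefficient and the prediction at x = 30 can be reproduced with a short Python sketch. The function name `regression_coefficients` is hypothetical; the data are the Economics and Statistics marks from this problem.

```python
import math


def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) from paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sxy / sxx, sxy / syy


economics = [25, 28, 35, 32, 31, 36, 29, 38, 34, 32]
statistics_marks = [43, 46, 49, 41, 36, 32, 31, 30, 33, 39]

mean_x, mean_y, b_yx, b_xy = regression_coefficients(economics, statistics_marks)

# r carries the common sign of the two regression coefficients.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Line of y on x evaluated at x = 30.
y_at_30 = mean_y + b_yx * (30 - mean_x)
print(round(b_yx, 4), round(b_xy, 4), round(r, 4), round(y_at_30, 2))
```

Predictions of y from x use the line of y on x, while predictions of x from y use the line of x on y; the two lines coincide only when r = ±1.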

7. Find the regression equation of capacity utilization on production from the following data. (L2)

                                       Average   Standard deviation
Production (in lakh units)             35.6      10.5
Capacity utilization (in percentage)   84.8      8.5

r = 0.62. Estimate the production when the capacity utilization is 70 percent.

Solution: Let production be denoted by the variable x and capacity utilization by y. Then the regression equation of y on x is given by

$y - \bar{y} = b_{yx}(x - \bar{x})$ ------(1)

where $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = 0.62 \times \dfrac{8.5}{10.5} = 0.5019$

and $\bar{x} = 35.6$, $\bar{y} = 84.8$.

(1) gives y - 84.8 = 0.5019(x - 35.6)
y = 66.9324 + 0.5019x
which is the required regression equation of capacity utilization on production.

To estimate production, the regression equation of x on y is

$x - \bar{x} = b_{xy}(y - \bar{y})$ ------(2)

where $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = 0.62 \times \dfrac{10.5}{8.5} = 0.7659$

(2) gives x - 35.6 = 0.7659(y - 84.8)
x = 35.6 + 0.7659y - 64.9483 = 0.7659y - 29.3483

When y = 70, x = 0.7659(70) - 29.3483 = 24.2647.
Hence the estimated production is 24.2647 lakh units when the capacity utilization is 70 percent.
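When only the means, standard deviations and r are known, both regression lines follow directly from the summary figures. The sketch below uses a hypothetical helper `regression_lines_from_summary` and the values of this problem; each line is returned as an (intercept, slope) pair.

```python
def regression_lines_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Return ((a, b_yx), (c, b_xy)) for y = a + b_yx*x and x = c + b_xy*y."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    y_on_x = (mean_y - b_yx * mean_x, b_yx)
    x_on_y = (mean_x - b_xy * mean_y, b_xy)
    return y_on_x, x_on_y


y_on_x, x_on_y = regression_lines_from_summary(35.6, 10.5, 84.8, 8.5, 0.62)

# Estimated production (lakh units) at 70 percent capacity utilization.
intercept, slope = x_on_y
production_at_70 = intercept + slope * 70
print(round(y_on_x[1], 4), round(x_on_y[1], 4), round(production_at_70, 2))
```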

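8. The two lines of regression are 8x - 10y = -66 and 40x - 18y = 214. The variance of x is 9. Evaluate (L6)
(i) The mean values of X and Y.
(ii) The correlation coefficient between X and Y.

Solution: (i) Since both lines of regression pass through the mean values, the point $(\bar{x}, \bar{y})$ must satisfy the two given regression lines,
(i.e.) $8\bar{x} - 10\bar{y} = -66$ ------(1)
$40\bar{x} - 18\bar{y} = 214$ ------(2)

(1) × 5 gives $40\bar{x} - 50\bar{y} = -330$. Subtracting this from (2):
$32\bar{y} = 544$, so $\bar{y} = 17$.

From (1): $8\bar{x} - 10(17) = -66$, so $\bar{x} = 13$.

(ii) From (1), 10y = 8x + 66, so $y = 0.8x + 6.6$ and $b_{yx} = 0.8$.

From (2), 40x = 18y + 214, so $x = 0.45y + 5.35$ and $b_{xy} = 0.45$.

$r^2 = b_{yx}\,b_{xy} = (0.8)(0.45) = 0.36$, so $r = \pm 0.6$.

Since both the regression coefficients are positive, r must be positive: r = 0.6.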
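The steps of this problem (intersect the two lines to get the means, then take the square root of the product of the regression coefficients) can be sketched in Python. The function name and the (a, b, c) encoding of a line a·x + b·y = c are assumptions made for illustration. Which line is "y on x" is decided by the requirement that the product of the two regression coefficients cannot exceed 1.

```python
import math


def means_and_r(line_a, line_b):
    """Each line is (a, b, c) for a*x + b*y = c, with nonzero a and b."""
    (a1, b1, c1), (a2, b2, c2) = line_a, line_b

    # The regression lines intersect at (mean_x, mean_y): solve by Cramer's rule.
    det = a1 * b2 - a2 * b1
    mean_x = (c1 * b2 - c2 * b1) / det
    mean_y = (a1 * c2 - a2 * c1) / det

    # First try line_a as y on x (slope -a1/b1) and line_b as x on y (slope -b2/a2).
    b_yx, b_xy = -a1 / b1, -b2 / a2
    if b_yx * b_xy > 1:
        # Invalid assignment, so the roles of the two lines are swapped.
        b_yx, b_xy = -a2 / b2, -b1 / a1

    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    return mean_x, mean_y, r


mean_x, mean_y, r = means_and_r((8, -10, -66), (40, -18, 214))
print(mean_x, mean_y, round(r, 4))
```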