PROJECT REPORT

IC 102. Data Analysis and Interpretation

(Prof. Milind Sohoni)

STATISTICAL ANALYSIS OF CENSUS DATA OF THE TALUKAS OF ULHASNAGAR & MURBAD

Aamod Shailesh Kore (110050004)

HarshaVardhan Kode (110050067)

Kotha Vinod Reddy (110050060)

INTRODUCTION

In this report we analyse census data for Ulhasnagar and Murbad, two talukas in Thane district of the state of Maharashtra. Murbad is a mainly rural taluka located at 19.25°N 73.4°E, with an average elevation of 83 metres (272 feet). Ulhasnagar, on the other hand, is an urban taluka located at 19.22°N 73.15°E, with an average elevation of 19 metres (62 feet).

Problem and Focus

We have analysed general aspects such as population, sex ratio, unemployment, literacy and female empowerment. Our main focus, however, was on the following topics:

• Literacy, especially female literacy
• Employment and working patterns
• How these affect other fields

We have correlated these fields and compared different parameters and relations between rural and urban population and also among different classes of the same population. The data and inferences are largely supported by graphs and charts. Wherever required we have drawn the best fit line between different parameters and found the correlation between them.

Finally, we state the key observations and suggest some remedial measures to improve certain fields and relations.

Data Sets

We have analysed the above fields for Ulhasnagar and Murbad using the census data provided. In a few cases we have also compared the data with the national average to get a better idea. In certain cases the data has been converted to graph plots and histograms for better inference.


GENERAL ASPECTS

Total Population

Best Fit Line: y = 5.076 x + 18.86 R² = 0.974

The relation between total population and number of households is linear, as expected. The quality of fit is R² = 0.974 and the correlation between the two parameters is 0.987294, which shows that the data is very consistent. On average, each household in the taluka of Murbad has about 5 persons.
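The fit quoted above can be reproduced along the following lines. This is a minimal sketch, assuming an ordinary least-squares fit of population against households; the five data points and the function name `fit_line` are hypothetical illustrations, not values from the census file.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit y = m*x + c, plus R^2 and Pearson correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correl = sxy / math.sqrt(sxx * syy)
    r_squared = correl ** 2  # valid for a straight-line fit with one variable
    return slope, intercept, r_squared, correl

# Hypothetical villages: number of households and total population.
households = [80, 120, 150, 210, 300]
population = [430, 610, 790, 1080, 1540]

slope, intercept, r_squared, correl = fit_line(households, population)
print(f"y = {slope:.3f}x + {intercept:.2f}, R^2 = {r_squared:.3f}, correl = {correl:.6f}")
```

The slope is the quantity read as "persons per household"; the intercept absorbs any offset that the straight line needs to match the data.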

The national average for India is about 5.34 persons per household (5.4 for rural areas and 5.2 for urban areas). The population per household in Murbad is thus quite consistent with national statistics. This is encouraging: the villages should not fall short of food, water or other facilities, provided these are allocated on the basis of national average statistics.


y = 4.566x + 304.4 R² = 0.984 correl = 0.992086

In comparison, the urban taluka of Ulhasnagar has towns with larger populations, but on average each household still comprises 4 to 5 persons. The corresponding graph for Ulhasnagar has the best fit y = 4.566x + 304.4, with quality (goodness) of fit R² = 0.984. The correlation between these parameters is 0.992086, which shows that the data is consistent across all data points.

The figure for Ulhasnagar (about 4.57 persons per household) is lower than the national average (5.34), which suggests better allocation of facilities and resources to each individual in the area.


• In short, urban households can in general be expected to have fewer persons than rural households. This may be due to less widespread use of birth control measures in rural areas than in urban areas.

• Although the population per household is slightly lower in urban regions, the total population of urban towns and wards is much higher than that of rural villages. For example, the total population of the taluka of Murbad is 170267 while that of Ulhasnagar is 473731, which is almost 3 times that of Murbad. The histograms of population range for the villages and towns of Murbad and Ulhasnagar show that the average population of a village in Murbad is 753.39 with a standard deviation of 435.39. For Ulhasnagar the average is 6865.667 and the standard deviation is 2370.065.
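The comparison above rests on simple arithmetic, which can be checked as follows. This is a minimal sketch: the two taluka totals are the ones quoted in the text, while the list `village_populations` is a hypothetical sample used only to show how a mean and standard deviation would be obtained. The report does not state whether the population or the sample form of the standard deviation was used, so `pstdev` is an assumption.

```python
import statistics

murbad_total = 170267      # total population of Murbad, as quoted in the report
ulhasnagar_total = 473731  # total population of Ulhasnagar, as quoted in the report

ratio = ulhasnagar_total / murbad_total
print(f"Ulhasnagar / Murbad population ratio: {ratio:.2f}")

# Hypothetical village populations (not census values).
village_populations = [310, 520, 640, 780, 905, 1310]
mean_pop = statistics.mean(village_populations)
std_pop = statistics.pstdev(village_populations)  # population standard deviation (assumed form)
print(f"mean = {mean_pop:.2f}, standard deviation = {std_pop:.2f}")
```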

Population distribution in Ulhasnagar


Population Distribution in Murbad


• The number of villages in Murbad is much greater than the number of cities and towns in Ulhasnagar, yet the total population of Ulhasnagar is much higher. This supports the fact that the population density in urban regions is much greater than in rural areas.


Comparison of Population Distribution of Murbad and Ulhasnagar

This plot compares the population distribution in Murbad and Ulhasnagar. The crowded lines on the left represent the data for Murbad, which is (approximately) levelled to the corresponding data for Ulhasnagar, represented by the broader lines towards the right.

• It can thus be seen that although the population of an individual village in Murbad is smaller, the number of villages in a given population range is much higher in Murbad. This can be attributed to the fact that a large number of people migrate to urban areas in search of better opportunities and in the hope of better working conditions.


Sex Ratio

The sex ratio is the number of females per 1000 males. According to the 2011 census data, the average sex ratio of India is 940. The urban sex ratio was 926 while the rural sex ratio was 947.
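The definition above translates directly into a one-line calculation. A minimal sketch, in which the function name `sex_ratio` and the village counts are hypothetical and chosen only for illustration:

```python
def sex_ratio(females, males):
    """Number of females per 1000 males."""
    return 1000 * females / males

# Hypothetical village counts (not census values).
males = 412
females = 389
print(f"sex ratio = {sex_ratio(females, males):.0f} females per 1000 males")
```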

In Murbad the average sex ratio is about 917, which is much lower than the national figure. For Ulhasnagar the figure is about 870, far below the national average even for urban regions.

         Murbad   Ulhasnagar
TOTAL    953      881
P_06     979      890
SC       978      918
ST       974      913

The sex ratio is extremely low in the taluka of Ulhasnagar as compared to Murbad and even to the national average.

y = 0.917x + 18.64 R² = 0.963 CORREL = 0.981651


y = 0.431 x + 253.5 R² = 0.976 Correl: 0.962877

As we can see, the average value of the ratio of the number of females to the total population is 0.431. This shows that the average female-to-male ratio is very low (~880 per 1000) compared to 933 per 1000 for the country as a whole (2001 census).

• The sex ratio in both talukas is very low, which may indicate a higher incidence of female infanticide. Appropriate measures must be taken by the government and non-government organisations to save the girl child.

• Another reason for the extremely low sex ratio in the urban taluka of Ulhasnagar could be that breadwinners of families from nearby villages have moved to the cities and townships in search of better employment. Since most such migrants would be men, the sex ratio tends to fall here.

• Also, since the sex ratio for the population under six is higher than that for the total population, we can say that female infanticide has reduced considerably. However, the government must strive to abolish the practice completely.


WORKING POPULATION

The working population is the population that is eligible to work and is willingly working. It consists of main workers and marginal workers, each of which is further divided into cultivators, agricultural labourers, household industry workers and other workers.

Murbad:

             TOT_WORK_P   MAINWORK_P   MARGWORK_P
Count        87328        70885        16443
Percentage                81.17        18.83

Ulhasnagar:

             TOT_WORK_P   MAINWORK_P   MARGWORK_P
Count        155791       145770       10021
Percentage                93.57        6.43

• In urban regions a higher percentage of the working population belongs to the main working class (~93%) than in rural areas (~81%). This is expected, as towns and cities offer more jobs and opportunities that give permanent employment rather than work on a daily-wage basis.

• In villages, by contrast, more people work as agricultural or other labourers on a daily-wage basis and are thus counted in the marginal work force. A larger percentage of the working population in rural areas is therefore marginal.
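The percentages in the two tables above follow from dividing each category by the total working population. A minimal sketch using the counts quoted in the tables (the function name `worker_shares` is illustrative):

```python
def worker_shares(total, main, marginal):
    """Return main and marginal workers as percentages of total workers."""
    return 100 * main / total, 100 * marginal / total

# Counts from the tables above: (TOT_WORK_P, MAINWORK_P, MARGWORK_P).
talukas = {
    "Murbad": (87328, 70885, 16443),
    "Ulhasnagar": (155791, 145770, 10021),
}

for name, (total, main, marginal) in talukas.items():
    main_pct, marg_pct = worker_shares(total, main, marginal)
    print(f"{name}: main {main_pct:.2f}%, marginal {marg_pct:.2f}%")
```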


Murbad: (Graphs & Data Sets)

Main v/s Marginal Working Population

• A relatively small percentage of the working population in Murbad (on average ~30%) are marginal workers, but this is still higher than the figure for India (~22%).

• This suggests that employment opportunities are not adequate.


y = 0.783x R² = 0.667 Correl = 0.843758

The ratio of working females to working males is about 0.783, which is quite good compared to the urban region of Ulhasnagar, where it is only 0.128. The high value of correl suggests that the relation between these parameters is quite strong.

• This can be attributed to the fact that in villages both men and women work in the fields or elsewhere, whereas in cities the man is mostly the sole breadwinner while the woman mostly handles the household chores.


y = 0.643x + 40.62 R² = 0.499 Correl = 0.706845

• The ratio of female to male main workers is about 0.643, which is not a very bad figure. The low value of R² = 0.499 suggests that this relation is not very accurate or consistent. However, from the graph we can conclude that on average there are about 64 female main workers for every 100 male main workers.


Ulhasnagar: (Graphs & Data Sets)

Main v/s Marginal Working Population

• Here, the percentage of marginal workers in the total working population is very low (~10%). This is lower than both Murbad (~30%) and India (~22%).

• This shows that there are adequate employment opportunities in Ulhasnagar. We can also infer that there are more employment opportunities in urban areas than in rural areas.


y = 0.128x + 26.76 R² = 0.423

This graph shows the ratio of working females to working males. The ratio is 0.128, which is very low. Although the quality of fit is not very

• This can be attributed to the fact that in villages both men and women work in the fields or elsewhere, whereas in cities the man is mostly the sole breadwinner while the woman mostly handles the household chores.


y = 0.898x - 20.23 R² = 0.984 correl = 0.992190132

From these graphs we can infer that in Ulhasnagar there are almost as many female main workers as male main workers. The high values of the correlation and the quality of fit (R² = 0.984, correl = 0.992190132) signify that the relation is consistent.

• This may be because in urban areas employers tend to look at a person's talent rather than at gender, and because few people work as main cultivators or labourers, so the available jobs are done equally efficiently by men and women.


LITERACY

Census Data

          Murbad   Ulhasnagar   India 2011   India Rural 2011   India Urban 2011
TOTAL     58.90%   74.50%       74.04%       68.90%             85.00%
MALES     70.60%   79.00%       82.84%       78.60%             89.70%
FEMALES   46.70%   69.30%       65.46%       58.80%             79.90%

The table compares male and female literacy rates in the two talukas with national figures. The figures for India are from the 2011 census; the corresponding 2001 figures are given below.

          India 2001   India Rural 2001   India Urban 2001
TOTAL     64.80%       58.70%             79.90%
MALES     75.30%       70.70%             86.30%
FEMALES   53.70%       46.10%             72.90%

There is a large overall improvement in the literacy rate in all sections of the country between 2001 and 2011. A similar improvement can be expected in small talukas such as Murbad and Ulhasnagar.
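The size of the improvement can be read off the two tables by subtracting the 2001 rate from the 2011 rate. A minimal sketch using the national figures quoted above (the dictionary layout is only an illustrative choice):

```python
# National literacy rates (%) as quoted in the two tables above: (2001, 2011).
india_literacy = {
    "total": (64.80, 74.04),
    "males": (75.30, 82.84),
    "females": (53.70, 65.46),
}

# Change in percentage points between the two censuses.
change = {
    group: round(y2011 - y2001, 2)
    for group, (y2001, y2011) in india_literacy.items()
}

for group, points in change.items():
    print(f"{group}: +{points:.2f} percentage points")
```

On these figures the gain is largest for females, which narrows the gap between male and female literacy.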

We now analyse the literacy rate patterns in rural and urban talukas, and for males and females.


Literacy Rate in Murbad

• The average literacy rate in Murbad is around 60%, which is good for a rural area. The national average for rural areas was 58.7% in 2001.

• However, more measures should be taken to increase literacy in villages and among females, since villages constitute about two-thirds of the nation.


Literacy Rate in Ulhasnagar

• The average literacy rate in Ulhasnagar is around 75-80%, which is lower than the national average of 79.9% (in 2001) for urban areas.

• Although female literacy is considerably higher than the national average, male literacy is slightly lower.


Comparison of Literacy Rate in Murbad & Ulhasnagar

(Green- Ulhasnagar, Blue- Murbad)

• The graph clearly shows that literacy in Ulhasnagar is, on average, much higher than in Murbad.

• The literacy rate of an urban taluka is generally greater than that of a rural taluka.


Male and Female Literacy:

Murbad:

Male Literate Population

Female Literate Population


• As we can see, female literacy is much lower than male literacy in the rural taluka of Murbad.

Ulhasnagar:

Male Literate Population

Female Literate Population


• Again, the female literate population is much smaller than its male counterpart.

HOW LITERACY AFFECTS OTHER FIELDS

Murbad:

y = -0.845x + 1.156 R² = 0.301 Correl = -0.54928

• From the plot of female literacy rate against population under 6 per household, we can see that the population under 6 decreases as the female literacy rate increases. The absolute value of correl (about 0.55) is moderate, so the trend is present but not strong.

• A likely reason is that when women are educated about the problems a very large population poses for the country, they are more likely to adopt family planning and birth control measures.
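For a straight-line fit with one explanatory variable, R² is the square of the correlation coefficient, so the two figures printed with the trend line can be cross-checked against each other. A minimal sketch (the function name `correl_from_fit` is illustrative); small differences are expected because R² is printed to only three decimals.

```python
import math

def correl_from_fit(slope, r_squared):
    """Correlation implied by a one-variable linear fit: sign of slope times sqrt(R^2)."""
    return math.copysign(math.sqrt(r_squared), slope)

# Slope and R^2 as printed with the Murbad trend line above.
r = correl_from_fit(-0.845, 0.301)
print(f"implied correl = {r:.3f}")
```

The same check applied to the other trend lines in the report shows how strong each relation is: an R² below 0.1 corresponds to a correlation of magnitude below about 0.32.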


y = -0.228x + 0.666 R² = 0.098

Interpretation:

• A possible explanation is that as the literacy rate increases, more people are studying rather than working. They opt for higher studies and thus, even if eligible to work, do not form part of the working population. This would explain the decrease in working population with increasing literacy rate.


y = -0.228x + 0.560 R² = 0.043

• This graph can also be interpreted in the same way as above.


y = -0.214x + 0.372 R² = 0.031

y = -0.108x + 0.154 R² = 0.023


y = 0.142x - 0.007 R² = 0.065

Interpretation of the above three graphs:

As literacy increases, the percentage of the population engaged in cultivation and agricultural labour decreases and the percentage of other workers increases.

• Murbad consists almost entirely of rural villages, and most of the people are engaged in agriculture or cultivation, the main occupation in Indian villages. As the literacy rate increases, people prefer other work, such as jobs in companies, which pays more than cultivation or agricultural labour. Thus fewer people work in agriculture and more in higher-end occupations.

• However, this data does not have a high goodness of fit, so the relation is not very strong (R² < 0.1).


Interpretation:

• Except in one or two villages, the percentage of people working in household industry is very low, and in some villages it is zero.

• The exceptions are Vadgaon and Goregaon, where an unusually high number of people work in household industry.


y = 0.009x + 0.101 R² = 0.0001

• This graph has an almost zero slope, which could suggest that the percentage of people taking up marginal work is independent of the literacy rate in rural areas.

• However, the goodness of fit of the line is extremely low (R² = 0.0001), so we cannot conclude anything substantial from the data.


y = 0.046x + 0.028 R² = 0.006

• This graph has a slightly positive slope. However, it cannot be said that the percentage of people taking up marginal cultivation increases with literacy, because the value of R² is very low (0.006).


y = -0.024x + 0.053 R² = 0.002

• This graph can also be interpreted in the same way as above.


Interpretation of the above two graphs:

• From the above two graphs we can say that there are very few marginal household industry workers and marginal other workers in rural areas.

• The points in these graphs are scattered and do not depend on the literacy rate. No strong relation seems to exist between literacy and these parameters.


Ulhasnagar:

y = -1.628x + 1.142 R² = 0.794 correl = -0.89128

From the plot of female literacy rate against population under 6 per household, we can see that the population under 6 decreases as the female literacy rate increases. Since the absolute value of correl is very high (0.89), the relation is quite consistent. A likely reason is that when women are educated about the problems a very large population poses for the country, they are more likely to adopt family planning.

• It is therefore necessary to increase the literacy of women. This can be achieved by conducting seminars in villages and towns about its importance, and by opening a large number of primary schools in those areas.


• Doing this would help to control the population of our country, which in turn would reduce many of the problems the country faces, such as malnutrition and unemployment.

y = -0.086x + 0.392 R² = 0.098

Interpretation:

• A possible explanation is that as the literacy rate increases, more people are studying rather than working. They opt for higher studies and thus, even if eligible to work, do not form part of the working population. This would explain the decrease in working population with increasing literacy rate.


y = 0.015x + 0.297 R² = 0.002

• This graph can also be interpreted in the same way as above.


R² = 0.048

R² ~ 0

• As we can see, only a very small part of the urban population takes up cultivation or agricultural labour, unlike in the rural taluka, where these are the main occupations. Also, not much land is available for cultivation in the urban area.


y = 0.007x + 0.005 R² = 0.003

• The number of main household workers in urban areas is also small, and it is not much affected by the literacy rate.


y = 0.007x + 0.292 R² = 0.000

• Most workers in urban areas fall under other work. One reason is that the literacy rate is high, so most people hold jobs.

• Also, not much land is available for cultivation in the urban area.


y = -0.101x + 0.095 R² = 0.226

• This graph suggests that as the literacy rate increases, people tend to avoid marginal work, which is temporary.

**MARG_CL and MARG_AL are not plotted, as the urban wards have almost no population in those categories.


R² ~ 0

• More people do marginal household work in urban areas than cultivation or agricultural labour, but the total number is still small.

• Also, as we can see from the graph, it is independent of the literacy rate.


y = -0.100x + 0.092 R² = 0.261

• We can see that, of all the marginal workers, most do other work. Also, as the literacy rate increases, the number of marginal workers decreases. This might be because more literate people want a stable job rather than marginal work.


CONCLUSIONS

From the data analysed we can draw some key conclusions. We list some general trends common to both talukas, followed by conclusions for each taluka, and then suggest remedial measures for the areas that seem to be lagging behind.

General Trends

• As female literacy, and literacy in general, increases, the population under six years of age decreases. Educated people become aware of birth control and family planning measures, which helps to control the population.

• With an increase in literacy, the total working population decreases. This is a rather counter-intuitive result, as one would expect literate people to be more employable. A possible explanation is that more literate people tend to opt for higher studies, thus reducing the strength of the work force.

• With an increase in literacy, the marginal working population decreases (although the relation is not very strong). This is because when a larger percentage of the population is literate, more people are likely to secure permanent employment and form part of the main working class.


Murbad

Key Conclusions

Positive conclusions:

• The average number of persons per household (~5.076) shows a very good trend. This suggests that people have access to adequate facilities such as food, water and electricity.

• The sex ratio for the overall population (~953), and especially for the population under six (~979), is higher than the national average. This suggests that gender discrimination and female infanticide are on the decrease in these regions.

• The ratio of working females to males is quite high (0.783), which indicates a high rate of female employment in the taluka.

• The sex ratio among scheduled castes and tribes is much better than that of the total population, which is a major positive indication.

Negative conclusions:

• A large percentage of the working population in Murbad are marginal workers (18.83%), which suggests that they may lack permanent employment and may instead be working for daily wages in fields or farms.

• The literacy rate (58.90%) is alarmingly low, especially among females (46.70%). This largely reflects the lack of female education and female empowerment. Although practices like female infanticide seem to be declining, female education still lags far behind.

Remedial Measures

• The employment pattern must shift to include more persons in the main work force. This can be done by providing employment opportunities in villages, for example by increasing government jobs.

• Programmes must be conducted to educate the population, especially females, and measures must be taken to improve literacy. This is important because increasing literacy will also help to control the population of the region.


Ulhasnagar

Key Conclusions

Positive conclusions:

• The average number of persons per household (~4.566) shows a very good trend. The figure is lower than the national average. This suggests that people have access to adequate facilities such as food, water, electricity, education and employment.

• The percentage of the working population in the marginal work force is very low (6.43%). This shows that most workers form part of the main working population, so a large proportion of the employed have permanent and mostly secure employment.

• Although the literacy rate among females (69.30%) is quite high compared to national standards, the overall literacy rate (74.50%) is still behind the national average for urban areas.

Negative conclusions:

• The sex ratio is alarmingly low (~881), much lower than that of the rural taluka as well as the national average. The sex ratio among the scheduled castes and tribes is higher than that of the total population, but still lower than the national figures.

• The ratio of total working females to males is extremely low (0.128). This suggests a very low female employment rate and shows that female empowerment is not high.

Remedial Measures

• Measures must be taken to increase literacy. More schools could be set up and adult literacy programmes could be taken up. Literacy should be stepped up to create better citizens for tomorrow.

• Female empowerment must be given prime importance. Efforts must be made to reduce cases of female infanticide and female foeticide by creating awareness and strictly enforcing the law.

• The proportion of females in the working population must be raised. Jobs must be opened up especially for women, and women must be given equal opportunities and salaries.


Strengths and Weaknesses of the Analysis.

Strengths

• We have used the best fit line method, with linear regression and correlation values, to analyse the data. This gives a general idea of the trends in various parameters.

• The data has been compiled and converted into scatter plots, graphs and histograms, which help to analyse the data better.

• Scanning through a large section of data, such as that for the talukas of Murbad and Ulhasnagar, gives a rough idea of the parameters in nearby talukas and regions and of various general trends in the country.

Weaknesses

• There may be errors in the data. Also, the data set is not very large, so it is not possible to draw conclusions for some trends.

• In certain cases the values of various fields were zero (such as P_SC or P_ST in certain wards of Ulhasnagar), which made it difficult to analyse the data readily and reach a firm conclusion.

• In many cases the quality of the best fit was not good enough to draw any reasonable conclusions from the data.

REFERENCES

• Census data for the talukas of Ulhasnagar and Murbad, as provided.
• Census data of India (2001 and 2011), from various sources on the internet.
• Wikipedia: http://www.en.wikipedia.org
