
Module Detail and its Structure

Subject Name: Sociology

Paper Name: Methodology of Research in Sociology

Module Name/Title: Processing and Analyzing Quantitative Data

Module Id: RMS 20

Pre-requisites: Some knowledge of social research

Objectives: This module will deal with the issues involved in the process of handling, managing and interpreting quantitative data collected in the process of research. It will also discuss the basic statistical tools with the help of which we analyse social phenomena.

Keywords: Coding, editing, statistics, quantitative research, measures of central tendency, dispersion, coefficient of correlation and regression.

Role in Content Development:
Principal Investigator: Prof. Sujata Patel, Dept. of Sociology, University of Hyderabad
Paper Co-ordinator: Prof. Biswajit Ghosh, Professor, Department of Sociology, The University of Burdwan, Burdwan 713104. Email: [email protected]; Ph. (M) +91 9002769014
Content Writer: Dr. Udita Mitra, Assistant Professor, Department of Sociology, Shri Shikshayatan College, Kolkata 700095. Email: [email protected]; Ph. (M) +91 9433213816; Ph. (O) 033-24140594
Content Reviewer (CR) & Language Editor: Prof. Biswajit Ghosh, Professor, Department of Sociology, The University of Burdwan, Burdwan 713104



Contents

1. Objective
2. Introduction
3. Learning Outcome
4. Data Processing
   4.1 Editing
   4.2 Coding
   4.3 Classification
   4.4 Tabulation
   Self-Check Exercise – 1
5. Data Analysis
6. Statistics in Social Research
   Self-Check Exercise – 2
   6.1 Measures of Central Tendency
   6.2 Measures of Dispersion
   6.3 Chi-Square Test
   6.4 T-test
   6.5 Measures of Relationship
   Self-Check Exercise – 3
7. Limitations of Statistics in Sociology
8. Summary
9. References


1. Objective

This module will deal with the issues involved in the process of handling, managing and interpreting quantitative data collected in the process of research. It will also discuss the basic statistical tools with the help of which we analyse social phenomena.

2. Introduction

Quantitative research can be construed as a research strategy that emphasizes quantification in the collection and analysis of data. It entails a deductive approach to the relationship between theory and research, in which the accent is placed on testing theories. Quantitative research usually incorporates the practices and norms of the natural scientific model, and of positivism in particular, and it embodies a view of social reality as an external, objective reality (Bryman 2004: 19). It also has a preoccupation with measurement and involves collecting large amounts of data. These data may be collected in various ways, such as surveys and field research. After collection, the data have to be processed in order to ensure their proper analysis and interpretation. According to Kothari (2004), processing technically implies editing, coding, classification and tabulation of collected data so that they are amenable to analysis. These endeavours help us to search for patterns of relationship that exist among data-groups (Ibid.: 122).

3. Learning Outcome

This module will help you to understand different issues involved in processing and analysing quantitative data. It will also help you to grasp the essential steps of applying various statistical measures in order to interpret data collected through social research.

4. Data Processing

Data reduction or processing mainly involves the various steps necessary for preparing the data for analysis. These steps involve editing, categorising the open-ended questions, coding, computerization and preparation of tables (Ahuja 2007: 304). The processing of data is an essential step before analysis because it enables us to overcome the errors made at the stage of data collection.

4.1. Editing

According to Majumdar (2005), error can creep in at any stage of social research, especially at the stage of data collection. These errors have to be kept at a minimum level to avoid errors in the results of the research. Editing, or checking the completed questionnaires for errors, is a laborious exercise and needs to be done meticulously. Interviewers tend to commit mistakes: some questions are missed out, and some answers remain unrecorded or are recorded in the wrong places. The questionnaires therefore need to be checked for completeness, accuracy and uniformity (Ibid.: 310).

4.2. Coding

Coding involves the process of assigning numbers or other symbols to answers so that they can be categorized into specific classes. Such classes should be appropriate to the research problem under consideration (Kothari 2004: 123). Careful consideration should be given so as not to leave any response uncoded. According to Majumdar (2005: 313), a set of categories is referred to as a “coding frame” or “code book”. The code book explains how to assign numerical codes to the response categories received in the questionnaire/schedule. It also indicates the location of a variable on computer cards. Ahuja (2007: 306) provides an example to illustrate how variables can be coded. In a question regarding the religion of the respondent, the answer categories of Hindu, Muslim, Sikh, and Christian can be coded as 1, 2, 3, and 4 respectively. In such cases, the counting of frequencies will not be according to Hindus, Muslims etc., but as 1, 2 and so on. Coding can be done manually or with the help of computers.

4.3. Classification

Besides editing and coding of data, classification is another important method to process data. Classification has been defined as the process of arranging data into groups and classes on the basis of some common characteristics (Kothari 2004: 123). Classification can be of two types, namely

 Classification according to attributes or common characteristics like gender, literacy etc., and

 Classification according to class intervals, whereby the entire range of data is divided into a number of classes or class intervals.

4.4. Tabulation

Tabulation is the process of summarising raw data and displaying them in compact form for further analysis (Kothari 2004: 127). Tabulating raw data is necessary because:

 It conserves space and reduces explanatory and descriptive statement to a minimum, and

 It provides a basis for various statistical computations.

Tabulation can be done manually as well as with electronic and mechanical devices like computers. When the data are not large in number, tabulation can be done by hand with the help of tally marks.

Self-Check Exercise – 1

Question 1. Tabulate the following examination grades for 80 students.

72, 49, 81, 52, 31,38,81, 58,68, 73, 43, 56, 45, 54, 40, 81, 60, 52, 52, 38, 79, 83, 63, 58, 59, 71, 89, 73, 77, 60, 65, 60, 69, 88, 75, 59, 52, 75, 70, 93, 90, 62, 91, 61, 53, 83, 32, 49, 39, 57, 39, 28, 67, 74, 61, 42, 39, 76, 68, 65, 58, 49, 72, 29, 70, 56, 48, 60, 36, 79, 72, 65, 40, 49, 37, 63, 72, 58, 62, 46 (Levin and Fox 2006).

Procedures for Tabulation/Grouping of Data

The above is an array of scores which would otherwise not be very handy to use. In order to make the data meaningful and useful, they must be organized and classified into a frequency table. There are certain easy steps to follow in order to convert the raw scores into a frequency table.

i. We must first find the difference between the highest and the lowest score in the series. In the above case the difference is 65 (93 − 28). To it we must add 1 to bring in the entire range of scores, so it becomes 66.

ii. Next, we have to decide the number of class intervals that would best summarise the entire range of scores. In this case we take the number of intervals as 10.


iii. Now we divide the range of scores by the number of class intervals to obtain the width of the class interval (denoted i). Here it would be 6.6, which we round to 6.

iv. To the lowest score in the series we add (i − 1) to get the first class interval. In this case it would be 28 + (6 − 1) = 33, that is, the interval 28-33.

v. We start the next interval at the integer just above the upper limit of the previous class interval and repeat step iv. In this way we obtain all the class intervals and enter the frequencies in the respective class intervals (Elifson 1997).

Answer: The complete frequency distribution of examination grades for the 80 students is the following (frequencies tallied from the data as given above):

| Class Interval | Frequency |
|---|---|
| 28-33 | 4 |
| 34-39 | 7 |
| 40-45 | 5 |
| 46-51 | 6 |
| 52-57 | 9 |
| 58-63 | 16 |
| 64-69 | 7 |
| 70-75 | 12 |
| 76-81 | 7 |
| 82-87 | 2 |
| 88-93 | 5 |
| Total | N = 80 |
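To make the grouping procedure concrete, here is a minimal Python sketch (an illustration added for this module, not part of the original exercise) that reproduces the frequency table from the raw grades; the interval width of 6 follows the worked answer:

```python
# Raw examination grades from the exercise (N = 80).
scores = [72, 49, 81, 52, 31, 38, 81, 58, 68, 73, 43, 56, 45, 54, 40, 81,
          60, 52, 52, 38, 79, 83, 63, 58, 59, 71, 89, 73, 77, 60, 65, 60,
          69, 88, 75, 59, 52, 75, 70, 93, 90, 62, 91, 61, 53, 83, 32, 49,
          39, 57, 39, 28, 67, 74, 61, 42, 39, 76, 68, 65, 58, 49, 72, 29,
          70, 56, 48, 60, 36, 79, 72, 65, 40, 49, 37, 63, 72, 58, 62, 46]

width = 6                          # interval width i, as chosen in the worked answer
low = min(scores)                  # the first interval starts at the lowest score, 28
while low <= max(scores):
    high = low + width - 1         # each interval runs from low to low + (i - 1)
    freq = sum(low <= s <= high for s in scores)
    print(f"{low}-{high}: {freq}")
    low = high + 1
print("N =", len(scores))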

5. Data Analysis

The term ‘data analysis’ refers to the computation of certain indices or measures, along with searching for patterns of relationship that exist among the data groups. Analysis, particularly in the case of survey or experimental (quantitative) data, involves estimating the values of unknown parameters of the population and testing hypotheses for drawing inferences (Kothari 2004: 130). Quantitative data analysis typically occurs at a late stage in the research process. But this does not mean that researchers should not be considering how they will analyse their data at the beginning of the research. During the designing phase of the questionnaire or observation schedule, researchers should be fully aware of the techniques of data analysis. In other words, the kinds of data the researchers will collect and the size of the sample will have implications for the sorts of analysis that can be applied (Bryman 2004).

6. Statistics in Social Research

The task of analysing quantitative data in research is done by statistics. Social statistics has two major areas of function in research, namely descriptive and inferential. Descriptive statistics is concerned with organizing the raw data obtained in the process of research. Tabulation and classification of data are instances of descriptive statistics. Inferential statistics is concerned with making inferences or conclusions from the data collected from the sample and drawing generalisations about the entire population (Elifson 1997). Inferential statistics is also known as sampling statistics, and it is concerned with two major types of problems:


 the estimation of population parameters, and

 the testing of statistical hypothesis (Kothari 2004: 131)

Some of the most important and useful statistical measures that will be taken up for discussion in the present module are:

 measures of central tendency or statistical averages

 measures of dispersion

 chi-square test

 t-test

 measures of relationship

In the sections that follow, we take up each of these for discussion.

Self-Check Exercise – 2

1. How does descriptive statistics work?

Descriptive statistics tries to describe and summarize the mass of data obtained in the process of conducting research. It does so with the help of some specific measures. The very first step of organizing data is to arrange the raw scores into a number of categories known as frequency tables. After that is done, the next step is to represent the data through various graphs and figures, such as the bar graph, the histogram, the frequency polygon etc.

2. What is inferential statistics?

Inferential statistics deals with the task of drawing inferences about a population by studying a sample drawn from that population. The reasons why we infer from the findings of a sample can be many. Insufficient resources in terms of money and manpower can force a researcher to draw a sample from the population. The time available for a piece of research may also be too short to study an entire population. Statistics can be of great help in generalizing findings. It needs to be mentioned here that error inevitably appears in the process of sampling, but researchers may adopt various methods to minimize it. The prefix ‘social’ is attached to statistics due to its application to interpreting social phenomena.

6.1. Measures of Central Tendency

When the scores have been tabulated into a frequency distribution, the next task is to calculate a measure of central tendency or central position. A measure of central tendency defines a value around which items have a tendency to cluster. The importance of the measure of central tendency is twofold. First, it is an “average” which represents all the scores in a distribution and gives a precise picture of the entire distribution. Second, it enables us to compare two or more groups in terms of typical performance. Three “averages” or measures of central tendency are commonly used: the mean, the median and the mode (Garrett 1981: 27).


i) Arithmetic Mean: Mean is known as arithmetic average and is the most stable measure of central tendency. It is defined as the summation of all the values given in the series of numbers divided by the number of values. Mean can be calculated through different methods:

a) Calculation of the Mean from Ungrouped Scores: This can be computed by the following equation:

$\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n} = \dfrac{\sum X}{n}$

where the $X$'s are the individual scores and $n$ is the number of scores.

In the case of the following scores, the mean can be found by the above formula (Garrett 1981): 8, 5, 4, 7, 9, 10.

$\bar{X} = \dfrac{8+5+4+7+9+10}{6} = \dfrac{43}{6} \approx 7.17$

b) Calculation of the Mean from Grouped Scores: In the case of a grouped frequency distribution, the mean is calculated by a slightly different method from that given above. It can be computed by the following formula:

$\bar{X} = \dfrac{\sum fX}{n}$

where $X$ is the midpoint of the class interval, $f$ is the frequency assigned to each class interval, $\sum$ is the summation operator, and $n$ is the total frequency. The calculation is shown in the table below (see Garrett 1981 for details):

| Class Interval | Frequency (f) | Midpoint (X) | fX |
|---|---|---|---|
| 140-144 | 1 | 142 | 142 |
| 145-149 | 3 | 147 | 441 |
| 150-154 | 2 | 152 | 304 |
| 155-159 | 4 | 157 | 628 |
| 160-164 | 4 | 162 | 648 |
| 165-169 | 6 | 167 | 1002 |
| 170-174 | 10 | 172 | 1720 |
| 175-179 | 8 | 177 | 1416 |
| 180-184 | 5 | 182 | 910 |
| 185-189 | 4 | 187 | 748 |
| 190-194 | 2 | 192 | 384 |
| 195-199 | 1 | 197 | 197 |
| Total | N = 50 | | ∑fX = 8540 |

The mean will be $\bar{X} = \dfrac{\sum fX}{n} = \dfrac{8540}{50} = 170.8$.
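The grouped-mean formula is easy to check in code. Here is a minimal illustrative Python sketch (names are my own) that reproduces the result above from the (midpoint, frequency) pairs:

```python
# (midpoint X, frequency f) pairs taken from the table above.
grouped = [(142, 1), (147, 3), (152, 2), (157, 4), (162, 4), (167, 6),
           (172, 10), (177, 8), (182, 5), (187, 4), (192, 2), (197, 1)]

n = sum(f for _, f in grouped)                # total frequency: 50
mean = sum(x * f for x, f in grouped) / n     # sum(fX)/n = 8540/50
print(mean)                                   # 170.8
```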

ii) Median: The median is the middlemost value in the entire distribution of data. It divides the distribution into two equal parts: one half of the distribution falls below the median value and the other half falls above it. Before calculating the median we have to arrange the values in either ascending or descending order. It is a positional average, given by the following formula:

$M = \text{value of the } \left(\dfrac{n+1}{2}\right)\text{th item}$

It should be mentioned in this context that the median is usually used to describe qualitative phenomena like intelligence. It is not often used in sampling statistics (Kothari 2004: 133).


a) Computation of the Median when Data are Ungrouped: Two situations arise in the computation of the median from ungrouped data: a) when N is odd, and b) when N is even. To consider the first case, where N is odd, suppose we have the numbers 7, 10, 8, 12, 9, 11, 7. First we arrange these data in ascending order: 7, 7, 8, 9, 10, 11, 12. Then we apply the above equation to compute the median:

$M = \text{value of the } \left(\dfrac{n+1}{2}\right)\text{th item} = \dfrac{7+1}{2} = 4\text{th item}$

so M = 9. When the total number of scores is even, as in 7, 8, 9, 10, 11, 12, the median is the average of the two middlemost numbers. Here the two middlemost numbers are 9 and 10, and their average is 19/2 = 9.5.

b) Computation of the Median when Data are Grouped: When the scores are arranged into a frequency distribution, the median by definition is the 50% point in the distribution. We calculate the cumulative frequencies of the distribution and divide N by 2 to locate the class interval in which the median falls. The following equation computes the median from a grouped frequency distribution:

$Mdn = l + \left(\dfrac{N/2 - F}{f_m}\right) i$

where $l$ is the exact lower limit of the class interval upon which the median lies, $N/2$ is one half of the total number of scores, $F$ is the sum of the frequencies on all intervals below $l$, $f_m$ is the frequency within the interval upon which the median falls, and $i$ is the width of the class interval.

The computation of the median is shown in the following table:

| Class Interval | Frequency | Cumulative Frequency |
|---|---|---|
| 140-144 | 1 | 1 |
| 145-149 | 3 | 4 |
| 150-154 | 2 | 6 |
| 155-159 | 4 | 10 |
| 160-164 | 4 | 14 |
| 165-169 | 6 | 20 |
| 170-174 | 10 | 30 |
| 175-179 | 8 | 38 |
| 180-184 | 5 | 43 |
| 185-189 | 4 | 47 |
| 190-194 | 2 | 49 |
| 195-199 | 1 | 50 |
| Total | N = 50 | |

When we divide N = 50 by 2 we get 25. With its help we locate the interval upon which the median falls: 170-174 (since its cumulative frequency of 30 includes 25). Next we compute the median with the equation above. Here $l$ = (170 − 0.5) = 169.5:

$Mdn = 169.5 + \left(\dfrac{25 - 20}{10}\right)5 = 172$
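The interpolation above can be sketched in a few lines of Python (an illustration under my own naming, not part of the source text):

```python
# Class intervals as (lower limit, upper limit, frequency), from the table above.
intervals = [(140, 144, 1), (145, 149, 3), (150, 154, 2), (155, 159, 4),
             (160, 164, 4), (165, 169, 6), (170, 174, 10), (175, 179, 8),
             (180, 184, 5), (185, 189, 4), (190, 194, 2), (195, 199, 1)]

n = sum(f for *_, f in intervals)
half = n / 2                        # the 50% point: 25
cum = 0                             # cumulative frequency F below the current interval
for low, high, f in intervals:
    if cum + f >= half:             # the median falls in this interval
        l = low - 0.5               # exact lower limit of the median interval
        i = high - low + 1          # interval width
        print(l + ((half - cum) / f) * i)   # 169.5 + (5/10)*5 = 172.0
        break
    cum += f
```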


iii) Mode: When a rough and quick estimate of central tendency is wanted, mode is usually the most preferred measure. Mode is that value which has the greatest frequency in the given series of scores. Like median, mode is also a positional average and is therefore unaffected by extreme scores in the series of numbers. It is useful in all situations where we want to eliminate the effect of extreme variations (Kothari 2004: 133).

a) Calculating the Mode from Ungrouped Data: In simple ungrouped data, the mode is the single measure or score which occurs most frequently. For instance, in the series of numbers 10, 11, 11, 12, 12, 13, 13, 13, 14, 14, the crude mode is 13 (the most frequent value).

b) Calculating the Mode from Grouped Data: When the data are grouped into a frequency distribution, the crude mode is found as the midpoint of the interval which contains the highest frequency. In the case of the above table, the value of the mode would be 172, the midpoint of the class interval 170-174 (Garrett 1981). We can also calculate the true mode from a grouped frequency distribution. The formula for calculating the true mode in a normal or symmetrical distribution is:

Mode = 3 Mdn – 2 Mean (ibid).
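A brief illustrative Python sketch (mine, not the module's) of both mode computations; the second line applies the empirical relation above using the median (172) and mean (170.8) computed earlier:

```python
from collections import Counter

# Crude mode of ungrouped scores: the value occurring most often.
scores = [10, 11, 11, 12, 12, 13, 13, 13, 14, 14]
print(Counter(scores).most_common(1)[0][0])   # 13

# "True" mode of the grouped distribution via Mode = 3*Mdn - 2*Mean,
# which holds approximately for roughly symmetrical distributions.
print(3 * 172 - 2 * 170.8)                    # 174.4
```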

iv) When to Use the Various Measures of Central Tendency: The situations in which the three measures are used are stated below:

a) The Mean is used when

 The scores are distributed symmetrically around a central point  The central tendency having the greatest stability is wanted  Other statistics like the standard deviation and the correlation coefficient are to be computed later.

b) The Median is used when

 The exact midpoint of the distribution is all that is wanted  There are extreme scores which affect the mean but they do not affect the median.

c) The Mode is used when

 A rough and quick estimate of central tendency is all that is wanted  The measure of central tendency should be the most typical value (Garrett 1981).

The choice of average depends on the researcher and the objectives of the study. Only then will the statistical computation of averages be effective and useful in the interpretation of data.

6.2. Measures of Dispersion (Range, Mean Deviation or Average Deviation, and Standard Deviation)

Measures of central tendency like the mean, median and mode can only be representative of the entire series of scores. They cannot fully describe the nature of a frequency distribution. For instance, they cannot state how far a given score in a series deviates from the average, that is, how much lower or higher a score is than the average. Therefore, in order to measure this spread of scores from the central tendency, we calculate measures of dispersion or variability. The different measures of dispersion are the range, the mean deviation and the standard deviation.

i) Range: Range is the simplest and the easiest measure of variability. It is usually calculated by subtracting the lowest score from the highest score in the given series of data. The value of the range depends on only two values and this is its main limitation. It ignores the remaining values in the distribution and therefore it fails to provide an accurate and stable picture of the dispersed scores.

a) Range for Ungrouped Data: In a distribution of ungrouped scores, if the scores are arranged in an array, the range is defined as the largest score minus the smallest score plus one.

Range = (Highest value of an item in a series) ─ (Lowest value of an item in a series) +1

In a distribution that has 103 as the highest score and 30 as the lowest score, the range is computed as range = (103- 30)+1 = 74 (Leonard 1996).

b) Range for Grouped Data: In the case of grouped data, the range is the difference between the upper true limit of the highest class interval and the lower true limit of the lowest class interval. Let us look into the following data:

| Class Interval | Frequency |
|---|---|
| 31-33 | 3 |
| 34-36 | 0 |
| 37-39 | 1 |
| 40-42 | 5 |
| 43-45 | 7 |
| 46-48 | 6 |
| 49-51 | 24 |
| 52-54 | 18 |
| 55-57 | 14 |
| 58-60 | 15 |
| 61-63 | 16 |
| 64-66 | 7 |

In the case of the above data, the upper true limit of the highest class interval (64-66) is 66.5 and the lower true limit of the lowest class interval (31-33) is 30.5. Therefore, the range is 66.5 − 30.5 = 36. Here, 1 is not added because the difference is between the two true limits (Leonard 1996). Please note that the range does not represent the entire series of scores, as its computation requires only the two extreme values.
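Both range computations reduce to one line of arithmetic each; a tiny illustrative sketch:

```python
# Ungrouped scores: inclusive range = highest - lowest + 1.
print(103 - 30 + 1)      # 74

# Grouped data: difference between the true limits of the extreme intervals.
print(66.5 - 30.5)       # 36.0 (no 1 is added, since true limits are used)
```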

ii) Mean Deviation or Average Deviation: This is the average of the deviations of the values of items from some average of the series (Kothari 2004: 135). It is based on the absolute deviations of scores from the centre (Leonard 1996). This procedure is designed to avoid the algebraic sum of deviations from the mean equalling zero, in which case it would be impossible to compute indices of variability.

a) Average Deviation for Ungrouped Scores:


$\text{Mean Deviation} = \dfrac{\sum |X - \bar{X}|}{n}$

where $X$ denotes a particular score, $\bar{X}$ the mean of the scores, and $n$ the total number of scores. Let us look into the calculation for the following scores:

| Observation No. | X | $\bar{X}$ | $|X - \bar{X}|$ |
|---|---|---|---|
| 1 | 26 | 16 | 10 |
| 2 | 24 | 16 | 8 |
| 3 | 22 | 16 | 6 |
| 4 | 20 | 16 | 4 |
| 5 | 18 | 16 | 2 |
| 6 | 16 | 16 | 0 |
| 7 | 14 | 16 | 2 |
| 8 | 10 | 16 | 6 |
| 9 | 6 | 16 | 10 |
| 10 | 4 | 16 | 12 |
| N = 10 | ∑X = 160 | | ∑|X − X̄| = 60 |

For the above scores, we first calculated the mean, which is 16 (160/10). Then we subtracted the mean from each score in order to know its deviation, ignoring the sign. After this, the absolute deviations were summed up (60). To find the average deviation, we divided 60 by n = 10 and obtained 6. Here 6 is our mean deviation (Ibid.).

b) Average Deviation for Grouped Data: The formula for calculating the average deviation from grouped data is

$A.D. = \dfrac{\sum f|X - \bar{X}|}{N}$

The average deviation from grouped data is calculated below:

| Class Interval | Midpoint (X) | Frequency (f) | fX | $x = X - \bar{X}$ | $f|x|$ |
|---|---|---|---|---|---|
| 140-144 | 142 | 1 | 142 | −28.8 | 28.8 |
| 145-149 | 147 | 3 | 441 | −23.8 | 71.4 |
| 150-154 | 152 | 2 | 304 | −18.8 | 37.6 |
| 155-159 | 157 | 4 | 628 | −13.8 | 55.2 |
| 160-164 | 162 | 4 | 648 | −8.8 | 35.2 |
| 165-169 | 167 | 6 | 1002 | −3.8 | 22.8 |
| 170-174 | 172 | 10 | 1720 | 1.2 | 12.0 |
| 175-179 | 177 | 8 | 1416 | 6.2 | 49.6 |
| 180-184 | 182 | 5 | 910 | 11.2 | 56.0 |
| 185-189 | 187 | 4 | 748 | 16.2 | 64.8 |
| 190-194 | 192 | 2 | 384 | 21.2 | 42.4 |
| 195-199 | 197 | 1 | 197 | 26.2 | 26.2 |
| Total | | N = 50 | ∑fX = 8540 | | ∑f|x| = 502 |

The mean $\bar{X}$ of the above scores is 8540/50 = 170.8. The rest of the calculation is shown in the table. Therefore A.D. = 502/50 = 10.04.
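A short illustrative Python sketch (my own, reusing the table's midpoint-frequency pairs) that reproduces the A.D. of 10.04:

```python
# (midpoint X, frequency f) pairs from the height table above.
grouped = [(142, 1), (147, 3), (152, 2), (157, 4), (162, 4), (167, 6),
           (172, 10), (177, 8), (182, 5), (187, 4), (192, 2), (197, 1)]

n = sum(f for _, f in grouped)
mean = sum(x * f for x, f in grouped) / n              # 170.8
ad = sum(f * abs(x - mean) for x, f in grouped) / n    # sum(f|x|)/N = 502/50
print(round(ad, 2))                                    # 10.04
```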

iii) Standard Deviation: The standard deviation (S.D.) is the most stable measure of dispersion or variability. It is defined as the square root of the average of the squared deviations of the values of individual items from the arithmetic average of the series. In finding the S.D., we avoid the difficulty of signs by squaring the separate deviations (Garrett 1981).

a) Standard Deviation for Ungrouped Scores: The formula for computing the S.D. from ungrouped scores is

$\sigma = \sqrt{\dfrac{\sum x^2}{N}}$

where $x$ is the deviation of a score from the mean and $N$ is the total number of scores. We can calculate the standard deviation for the scores below in the following manner (Leonard 1996). The scores are 2, 2, 4, 6, 8, 14, 20, so N = 7, ∑X = 56, and the mean is 56/7 = 8.

| X | $x = X - \bar{X}$ | $x^2$ |
|---|---|---|
| 2 | −6 | 36 |
| 2 | −6 | 36 |
| 4 | −4 | 16 |
| 6 | −2 | 4 |
| 8 | 0 | 0 |
| 14 | 6 | 36 |
| 20 | 12 | 144 |
| ∑X = 56 | | ∑x² = 272 |

Thus $\sigma = \sqrt{272/7} = \sqrt{38.86} \approx 6.23$.

b) Standard Deviation for Grouped Data: The following is the formula for computing the standard deviation from grouped data:

$\sigma = \sqrt{\dfrac{\sum f x^2}{N}}$

where $f$ stands for the individual frequency, $x$ is the deviation of the class midpoint from the mean, and $N$ stands for the total frequency (Garrett 1981). The calculation is shown in the table below:

| Class Interval | Midpoint (X) | Frequency (f) | fX | $x = X - \bar{X}$ | fx | $fx^2$ |
|---|---|---|---|---|---|---|
| 140-144 | 142 | 1 | 142 | −28.80 | −28.80 | 829.44 |
| 145-149 | 147 | 3 | 441 | −23.80 | −71.40 | 1699.32 |
| 150-154 | 152 | 2 | 304 | −18.80 | −37.60 | 706.88 |
| 155-159 | 157 | 4 | 628 | −13.80 | −55.20 | 761.76 |
| 160-164 | 162 | 4 | 648 | −8.80 | −35.20 | 309.76 |
| 165-169 | 167 | 6 | 1002 | −3.80 | −22.80 | 86.64 |
| 170-174 | 172 | 10 | 1720 | 1.20 | 12.00 | 14.40 |
| 175-179 | 177 | 8 | 1416 | 6.20 | 49.60 | 307.52 |
| 180-184 | 182 | 5 | 910 | 11.20 | 56.00 | 627.20 |
| 185-189 | 187 | 4 | 748 | 16.20 | 64.80 | 1049.76 |
| 190-194 | 192 | 2 | 384 | 21.20 | 42.40 | 898.88 |
| 195-199 | 197 | 1 | 197 | 26.20 | 26.20 | 686.44 |
| Total | | N = 50 | 8540 | | | ∑fx² = 7978 |

The mean of the above distribution is 8540/50 = 170.80, and the computed value of σ is

$\sigma = \sqrt{\dfrac{7978}{50}} = \sqrt{159.56} \approx 12.63$
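As a cross-check, a minimal illustrative Python sketch (my own names) that reproduces σ = 12.63 from the grouped data:

```python
import math

# (midpoint X, frequency f) pairs from the table above.
grouped = [(142, 1), (147, 3), (152, 2), (157, 4), (162, 4), (167, 6),
           (172, 10), (177, 8), (182, 5), (187, 4), (192, 2), (197, 1)]

n = sum(f for _, f in grouped)
mean = sum(x * f for x, f in grouped) / n                      # 170.8
variance = sum(f * (x - mean) ** 2 for x, f in grouped) / n    # 7978/50 = 159.56
print(round(math.sqrt(variance), 2))                           # 12.63
```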


iv) When to Use the Various Measures of Variability: The rules for using the measures of dispersion are as follows:

a) The range can be used when  the scores are scanty in number or are too dispersed  a knowledge of the extreme scores or the total spread of scores is wanted.

b) Average Deviation can be computed when  it is desirable to weigh all deviations from the mean according to their size  extreme deviations would influence the S.D. unduly.

c) The S.D. is to be used when  the measure of dispersion having the greatest stability is wanted  the coefficient of correlation and other statistics are subsequently to be computed (Garrett 1981).

6.3. Chi-square Test

The chi-square test is an important one among the several tests of significance developed by statisticians. It is symbolically written as $\chi^2$ and can be used to determine whether categorical data show dependency or whether two classifications are independent. It can be used to make comparisons between theoretical and actual data when categories are used. The test is, in fact, a technique by the use of which it is possible for researchers to test a) the goodness of fit, and b) the significance of association between two attributes (Kothari 2008).

a) Test of Goodness of Fit: As a test of goodness of fit, chi-square enables us to see how well a theoretical distribution fits the observed data. If the calculated value of $\chi^2$ is less than its table value at a certain level of significance, the fit is considered to be a good one. When the calculated value of $\chi^2$ is greater than the table value, we do not consider the fit to be a good one (Kothari op. cit.).

Illustrative Problem

Given below is the data on the number of students entering the University from each school.

School 1 – 22, School 2 – 25, school 3 – 26, School 4 – 28, School 5 – 33.

Is there a difference in the quality of the schools? N = 134.

In the case of the above data the most suitable technique of statistical application would be chi-square goodness of fit test because the data are at the nominal level and the hypothesis is to be tested on one variable, that is, the quality of schools on the basis of the prospect of entering the University from each school.

The steps for calculating the chi-square are shown below.


1. Stating the Null and the Alternative Hypothesis: The null hypothesis assumes that there is no difference in the quality of the schools, whereas the alternative hypothesis states that there is a difference in the quality of the schools.

2. Choice of a Statistical Test: As stated above, the appropriate statistical test applicable here is the chi-square goodness of fit test.

3. Level of Significance and Sample Size: Here the level of significance is 0.05, that is, only 5 times in 100 would such a result arise by chance. The sample size is 134.

4. One- versus Two-tailed Test: It is a two-tailed test because no direction is indicated in the alternative hypothesis. It only suggests that the number of students entering the University differs from school to school.

5. The Sampling Distribution: The sampling distribution is a function of the degrees of freedom, which are quantities that are free to vary. Here they can be computed as (k − 1), where k is the number of categories into which observations are divided. There are 5 categories, which means degrees of freedom (df) = (5 − 1) = 4.

6. The Region of Rejection: The point of intersection of the df and the level of significance gives the critical value of $\chi^2$, which is 9.488. The computed value of chi-square has to be greater than the table value in order to reject the null hypothesis. It is computed by the formula:

$\chi^2 = \sum \dfrac{(O_f - E_f)^2}{E_f}$

where $O_f$ is the observed frequency of each category and $E_f$ is the expected frequency. If the schools did not differ in quality, each school would be expected to send the same number of students to the University, so the expected frequency in each case would be 134/5 = 26.8. The computation of $\chi^2$ is shown in the table below:

| School | $O_f$ | $E_f$ | $O_f - E_f$ | $(O_f - E_f)^2$ | $(O_f - E_f)^2 / E_f$ |
|---|---|---|---|---|---|
| 1 | 22 | 26.8 | −4.8 | 23.04 | 0.86 |
| 2 | 25 | 26.8 | −1.8 | 3.24 | 0.12 |
| 3 | 26 | 26.8 | −0.8 | 0.64 | 0.02 |
| 4 | 28 | 26.8 | 1.2 | 1.44 | 0.05 |
| 5 | 33 | 26.8 | 6.2 | 38.44 | 1.43 |
| | | | | | $\chi^2$ = 2.49 |

Since the computed value of chi-square (2.49) is less than its table value of 9.488, the null hypothesis is upheld: the differences in the number of students entering the University from each school are not large enough to indicate a real difference in the quality of the schools.
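The whole goodness-of-fit computation fits in a few lines; here is an illustrative Python sketch (scipy.stats.chisquare would give the same statistic, but the arithmetic is written out to mirror the table):

```python
observed = [22, 25, 26, 28, 33]          # students entering from the five schools
n = sum(observed)                        # 134
expected = n / len(observed)             # 26.8 under the null of equal quality

chi_sq = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi_sq, 2))                  # 2.49, below the critical value 9.488 (df = 4)
```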

b) Chi-square Test of Independence: As a test of independence, the chi-square test enables us to explain whether or not two attributes are associated. If the table value of chi-square is greater than its computed value, we conclude that there is no association between the attributes, that is, the null hypothesis is upheld. But if the computed value of chi-square is greater than its table value, we uphold that the two attributes are associated, and that the association is not due to chance factors but exists in reality (Kothari 2008). For the test of association, the formula for computing chi-square remains the same as above.


Illustrative Problem

Let us look into the following data on level of job satisfaction:

| Union Membership | Not Satisfied | Satisfied | Total |
|---|---|---|---|
| No | 75 (A) | 125 (B) | 200 |
| Yes | 65 (C) | 135 (D) | 200 |
| Total | 140 | 260 | 400 |

From the above data, we have to find out if a relation exists between the two variables.

Here, we will apply a chi-square test of independence because the data are at the nominal level and there are two variables in the data, namely job satisfaction and union membership. Steps 1 to 6 are to be written in the same manner as above; only the sample size is 400. The degrees of freedom are computed by (c − 1)(r − 1), where c is the number of columns and r the number of rows into which observations are divided. Here the degrees of freedom (df) = (2 − 1)(2 − 1) = 1. The point of intersection between the df and the level of significance (0.05) gives the critical or table value of $\chi^2$, which is 3.841. The computed value of chi-square has to exceed its table value in order to reject the null hypothesis.

Next, we calculate the expected frequency for each observed frequency by multiplying its row total by its column total and dividing by N:

Cell A = (A+B)(A+C)/N = (200 × 140)/400 = 70

Cell B = (A+B)(B+D)/N = (200 × 260)/400 = 130

Cell C = (C+D)(A+C)/N = (200 × 140)/400 = 70

Cell D = (C+D)(B+D)/N = (200 × 260)/400 = 130

Now we compute the value of chi-square in the following table:

| Cell | $O_f$ | $E_f$ | $O_f - E_f$ | $(O_f - E_f)^2$ | $(O_f - E_f)^2 / E_f$ |
|---|---|---|---|---|---|
| A | 75 | 70 | 5 | 25 | 0.36 |
| B | 125 | 130 | −5 | 25 | 0.19 |
| C | 65 | 70 | −5 | 25 | 0.36 |
| D | 135 | 130 | 5 | 25 | 0.19 |
| | | | | | $\chi^2$ = 1.10 |

Since the computed value of chi-square (1.10) is less than its table value (3.841), the null hypothesis is upheld. It may hence be argued that there is no significant association between job satisfaction and union membership. The chi-square test is one of the most frequently used tests, but it should be applied correctly, in situations where the individual observations in the sample are independent (Kothari 2008: 295).
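A minimal illustrative Python sketch of the independence test for the 2×2 table above (my own variable names; the row-by-column expected frequencies mirror the cell formulas):

```python
# Observed 2x2 table: rows = union membership (No, Yes),
# columns = job satisfaction (Not Satisfied, Satisfied).
table = [[75, 125],
         [65, 135]]

n = sum(sum(row) for row in table)                  # 400
row_totals = [sum(row) for row in table]            # 200, 200
col_totals = [sum(col) for col in zip(*table)]      # 140, 260

chi_sq = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n     # e.g. cell A: 200*140/400 = 70
        chi_sq += (obs - exp) ** 2 / exp
print(round(chi_sq, 2))                             # 1.10, below 3.841 (df = 1)
```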


6.4. T-test

The Central Limit Theorem states that, if the sample size N is large, the distribution of the sample statistic approaches the normal (Z) distribution. When a sample is taken from a normally distributed population with a known mean (µ) and standard deviation (σ), and a z-score is computed for each observation, the resulting scores have a z-distribution, that is, a normal distribution with mean = 0 and standard deviation = 1. The problem is that in most cases the population standard deviation is unknown, and as the Central Limit Theorem involves the use of the standard deviation, it cannot be ignored. One solution is to substitute the sample standard deviation for the population standard deviation (Vito and Latessa 1989). To test samples of small size, we have the “t” statistic. The t-test can be of two types, namely the two-sample t-test and the related-samples t-test. The type of test chosen depends upon whether the two samples are independent or related. Related samples occur when

 both samples have been matched according to some trait like race or gender, or

 repeated measurements of the same sample are taken (a before-after or time-series design) (Ibid.).

a) Two-Sample t-test: When two independent samples are to be compared on some trait or variable, we apply the two-sample t-test. The formula for computing the value of t is:

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left(\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)\left(\dfrac{n_1 + n_2}{n_1 n_2}\right)}}$

where $\bar{X}_1$ and $\bar{X}_2$ are the means of the first and second samples, $s_1$ and $s_2$ are their standard deviations, and $n_1$ and $n_2$ are their sample sizes.

Illustrative Problem

The data for the two types of schools are provided below:

State-funded school: $n_1$ = 20, $\bar{X}_1$ = 64, $s_1$ = 18.5

Private school: $n_2$ = 24, $\bar{X}_2$ = 46, $s_2$ = 18.5 (C.U. 2001)

The steps for computing the value of t would be summarized below.

1. Stating the Null and the Alternative Hypothesis: The null hypothesis assumes that there is no difference between the samples; the alternative hypothesis assumes a difference between the two samples.

2. Choice of Statistical Test: The statistical test chosen is the two-sample t-test.

3. Level of Significance and Sample Size: The level of significance is 0.05, which means that 5 times in 100 we could reject the null hypothesis incorrectly, that is, 5 times in 100 our result could be due to chance. The sample sizes are 20 and 24 respectively.

4. One- versus Two-tailed Test: It is a two-tailed test because no direction is implied in the alternative hypothesis. It only suggests a difference between the two sample means.

Name of Paper: Methodology of Research in Sociology Sociology Name of Module: Processing and Analyzing Quantitative Data

17

5. The Sampling Distribution: It is a function of the degrees of freedom, that is, the quantities which are free to vary. They can be calculated by the formula $(n_1 + n_2 - 2)$, which here gives (20 + 24 − 2) = 42.

6. The Region of Rejection: The point of intersection between the degrees of freedom and the level of significance gives the table value of t. For 42 degrees of freedom at the 0.05 level (two-tailed), the critical value of t is approximately 2.02. The computed value of t has to exceed this in order to reject the null hypothesis. It is found by substituting the values into the formula above:

$t = \dfrac{64 - 46}{\sqrt{\left(\dfrac{20(342.25) + 24(342.25)}{20 + 24 - 2}\right)\left(\dfrac{20 + 24}{20 \times 24}\right)}} \approx 3.14$

Since the computed value of t (3.14) is greater than its critical value (approximately 2.02), the alternative hypothesis is upheld. In other words, there is a significant difference between the two school systems.
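A small illustrative Python sketch of the pooled-variance formula above (function name and arguments are my own):

```python
import math

def two_sample_t(n1, mean1, s1, n2, mean2, s2):
    """Pooled-variance t statistic for two independent samples (formula above)."""
    pooled = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / se

# State-funded vs. private school data from the illustrative problem.
print(round(two_sample_t(20, 64, 18.5, 24, 46, 18.5), 2))   # 3.14, with df = 42
```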

b) T-test for Related Samples: This is applicable when there are repeated measurements on the same sample (a time-series design). The formula for computing the value of t for related samples is:

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_{\bar{D}}}$

where $\bar{X}_1$ and $\bar{X}_2$ are the means of the two sets of measurements and $S_{\bar{D}}$ is the estimated standard error of the mean difference scores. The standard error is calculated by the formula

$S_{\bar{D}} = \sqrt{\dfrac{s_D^2}{N}}$

where $s_D^2$ is the variance of the difference scores and $N$ is the total number of paired scores. The variance of the difference scores is computed by the formula

$s_D^2 = \dfrac{N \sum D^2 - (\sum D)^2}{N(N-1)}$

where $D$ is the difference between the two measurements for each case (Vito and Latessa 1989).

Illustrative Problem

The governor of Florida wants a report on the effects of the death penalty. Homicide rates (per 100,000 population) in Florida cities, two weeks before and two weeks after an execution, are noted below (Vito and Latessa 1989):

| City | Rate Before (Test 1) | Rate After (Test 2) | D = Test 2 − Test 1 | D² |
|---|---|---|---|---|
| Pompano Beach | 23 | 19 | −4 | 16 |
| Tallahassee | 15 | 16 | 1 | 1 |
| Tampa | 12 | 18 | 6 | 36 |
| Miami | 20 | 17 | −3 | 9 |
| Orlando | 13 | 11 | −2 | 4 |
| Total | 83 | 81 | −2 | 66 |

First we calculate the means of the two tests:

$\bar{X}_1 = \dfrac{83}{5} = 16.6 \qquad \bar{X}_2 = \dfrac{81}{5} = 16.2$

Now we calculate the variance of the difference scores:

$s_D^2 = \dfrac{N \sum D^2 - (\sum D)^2}{N(N-1)} = \dfrac{5(66) - (-2)^2}{5(5-1)} = \dfrac{326}{20} = 16.3$

Next, we calculate the standard error of the mean difference scores, the formula for which is:

$S_{\bar{D}} = \sqrt{\dfrac{s_D^2}{N}} = \sqrt{\dfrac{16.3}{5}} = \sqrt{3.26} = 1.80$

From the above values we calculate the value of t:

$t = \dfrac{\bar{X}_1 - \bar{X}_2}{S_{\bar{D}}} = \dfrac{16.6 - 16.2}{1.80} = 0.22$

The steps for computing the value of t are the same as above. Here the sample size is 5, so the degrees of freedom are (N − 1) = (5 − 1) = 4. The point of intersection between the degrees of freedom and the level of significance (0.05) gives the table value of t, which is 2.776. The computed value of t has to exceed this in order to reject the null hypothesis. Substituting the values into the formula above, we found t to be 0.22. Since the computed value of t is less than its table value, the null hypothesis is upheld: our findings may be due to chance factors.
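An illustrative Python sketch of the related-samples computation, working from the city rates above (variable names are mine):

```python
import math

before = [23, 15, 12, 20, 13]    # homicide rates before the execution
after = [19, 16, 18, 17, 11]     # homicide rates after

d = [a - b for a, b in zip(after, before)]                          # D scores
n = len(d)
var_d = (n * sum(x * x for x in d) - sum(d) ** 2) / (n * (n - 1))   # 16.3
se = math.sqrt(var_d / n)                                           # 1.80
t = (sum(before) / n - sum(after) / n) / se                         # (16.6-16.2)/1.80
print(round(t, 2))                                                  # 0.22
```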

6.5. Measures of Relationship (Correlation Coefficient, Simple Regression and Bivariate Contingency Tables)

The statistical measures discussed so far have dealt with univariate populations, that is, populations which have one variable as their characteristic feature. Observations based on two variables are known as bivariate relationships. If for every measurement of a variable X we have a corresponding value of a second variable Y, the resulting pairs of values are called a bivariate population. We have to answer two types of questions about a bivariate population:

 Does there exist an association or correlation between two variables? If yes, to what degree?

 Is there any cause and effect relation between the two variables? (Kothari 2004: 138)

i) Coefficient of Correlation or Simple Correlation: This is the most widely used method of measuring the degree of relationship between two variables. At times we want to know if there is a relation between the incidence of child labour and broken homes, or between drug addiction and involvement in criminal activities. In all such cases it would be appropriate to use the coefficient of correlation. The Pearson correlation coefficient or Pearson's r (also known as the Pearson product-moment correlation coefficient) is a measure of the straight-line relationship between two interval-level variables (Elifson 1997). To employ Pearson's correlation coefficient correctly as a measure of association between the X and Y variables, the following requirements must be taken into account:


 Interval data: Both X and Y variables must be measured at the interval level so that scores may be assigned to the respondents

 Normally distributed characteristics: Testing the significance of Pearson’s ‘r’ requires both X and Y variables to be normally distributed in the population (Levin and Fox 2006: 357).

Computation of Pearson's r by the Mean Deviation Method: The mean-deviation computational equation for r is:

$r = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$

where X and Y are the scores on the two variables and $\bar{X}$ and $\bar{Y}$ are their respective means. The calculation is shown in the following table, where an effort is made to find out the nature and strength of the relationship between mothers' education and daughters' education (Elifson 1997).

| Respondent | Mother's education (X) | $X - \bar{X}$ | $(X - \bar{X})^2$ | Daughter's education (Y) | $Y - \bar{Y}$ | $(Y - \bar{Y})^2$ | $(X - \bar{X})(Y - \bar{Y})$ |
|---|---|---|---|---|---|---|---|
| A | 1 | −6 | 36 | 7 | −6 | 36 | 36 |
| B | 3 | −4 | 16 | 4 | −9 | 81 | 36 |
| C | 5 | −2 | 4 | 13 | 0 | 0 | 0 |
| D | 7 | 0 | 0 | 16 | 3 | 9 | 0 |
| E | 9 | 2 | 4 | 10 | −3 | 9 | −6 |
| F | 11 | 4 | 16 | 22 | 9 | 81 | 36 |
| G | 13 | 6 | 36 | 19 | 6 | 36 | 36 |
| Total | | | 112 | | | 252 | 138 |

Now we substitute the values into the above equation and compute Pearson's r:

$r = \dfrac{138}{\sqrt{(112)(252)}} = \dfrac{138}{\sqrt{28224}} = \dfrac{138}{168} \approx 0.82$

The value of r lies between +1 and −1. The direction of a relationship is indicated by the sign of the correlation coefficient. A positive (or direct) relationship indicates that high scores on one variable tend to be associated with high scores on a second variable, and conversely that low scores on one tend to be associated with low scores on the other. A negative relationship (also referred to as an inverse or indirect relationship) indicates that low scores on one variable tend to be associated with high scores on a second variable, and conversely (Elifson 1997: 201). In the above example there is a strong positive correlation between mothers' education and their daughters' education.
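The mean-deviation computation translates directly into code; here is an illustrative Python sketch (my own names) reproducing r = 0.82:

```python
import math

x = [1, 3, 5, 7, 9, 11, 13]      # mothers' education
y = [7, 4, 13, 16, 10, 22, 19]   # daughters' education

n = len(x)
mx, my = sum(x) / n, sum(y) / n  # means: 7 and 13

num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))        # 138
den = math.sqrt(sum((xi - mx) ** 2 for xi in x) *
                sum((yi - my) ** 2 for yi in y))                # sqrt(112*252) = 168
print(round(num / den, 2))                                      # 0.82
```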

On a concluding note, although there is no established rule specifying what constitutes a weak, moderate or strong relationship, there are certain guidelines to follow. A weak relationship is one where the coefficient lies between ±0.01 and ±0.30, a moderate one between ±0.31 and ±0.70, and a strong one between ±0.71 and ±0.99. A perfect relationship is ±1.00, and no relationship is indicated when r = 0 (Elifson 1997: 208).

ii) Simple Regression: Regression analysis is very closely related to correlation. It is the statistical determination of a relationship between two or more variables (Kothari 2004). When we use regression analysis, we are essentially interested in describing a predictive relationship (Vito and Latessa 1989). The independent variable in the relationship is treated as the cause and the dependent variable as the effect. In regression analysis, we can state accurately the degree of change in one variable produced by the other, in other words, how much change in Y each unit change in X produces (Kothari 2004). The basic equation of simple regression is:

$\hat{Y} = a + bX$

where $\hat{Y}$ is the predicted score on the dependent variable, X is the score on the independent variable, a is the Y-intercept (the point at which the regression line crosses the Y axis, representing the predicted value of Y when X = 0), and b is the regression coefficient (the slope of the regression line, indicating the expected change in Y with a change of one unit in X) (Vito and Latessa 1989).

Vito and Latessa (1989) give the example of the theory of prisonization in corrections. According to the theory, the longer a person is incarcerated, the more ‘prisonized’ the person becomes, and the more their readjustment to society is hampered. The hypothesis was tested with a random sample of inmates using a scale designed to measure the degree of prisonization, where 0 indicates no prisonization and 10 a high degree of prisonization. Here prisonization is the dependent variable (Y), whereas the time served in prison, in years, is our independent variable (X). The computation is shown in the table below:

| Prisoner | X | $x = X - \bar{X}$ | $x^2$ | Y | $y = Y - \bar{Y}$ | $y^2$ | xy |
|---|---|---|---|---|---|---|---|
| A | 0 | −3.4 | 11.56 | 1 | −3.6 | 12.96 | 12.24 |
| B | 2 | −1.4 | 1.96 | 3 | −1.6 | 2.56 | 2.24 |
| C | 5 | 1.6 | 2.56 | 4 | −0.6 | 0.36 | −0.96 |
| D | 4 | 0.6 | 0.36 | 6 | 1.4 | 1.96 | 0.84 |
| E | 6 | 2.6 | 6.76 | 9 | 4.4 | 19.36 | 11.44 |
| N = 5 | ∑X = 17 | | ∑x² = 23.2 | ∑Y = 23 | | ∑y² = 37.2 | ∑xy = 25.8 |

The mean value of X is $\bar{X}$ = 17/5 = 3.4, and the mean value of Y is $\bar{Y}$ = 23/5 = 4.6. The value of the regression coefficient b can be found from the formula

$b = \dfrac{\sum xy}{\sum x^2} = \dfrac{25.8}{23.2} \approx 1.11$

Now we can find the value of a from the formula

$a = \bar{Y} - b\bar{X} = 4.6 - 1.11(3.4) \approx 0.83$

b is the slope of the regression line, the ratio of the change in Y corresponding to a change in X: when X changes by 1, Y changes by 1.11 units. a is the Y-intercept, the value of Y when X = 0 (Vito and Latessa 1989).

Name of Paper: Methodology of Research in Sociology Sociology Name of Module: Processing and Analyzing Quantitative Data

21

When the value of X (time spent in prison) is 2, the predicted degree of prisonization is

$\hat{Y} = a + bX = 0.83 + 1.11(2) = 0.83 + 2.22 = 3.05$

In this way we can calculate the value of the dependent variable from the existing regression equation and infer exactly what amount of change in X will lead to what amount of change in Y.
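The least-squares coefficients can be sketched in a few lines of illustrative Python (my own names; note that carrying full precision gives a ≈ 0.82 and a prediction of ≈ 3.04, while the module rounds b to 1.11 first, giving 0.83 and 3.05):

```python
x = [0, 2, 5, 4, 6]    # years served (independent variable X)
y = [1, 3, 4, 6, 9]    # prisonization score (dependent variable Y)

n = len(x)
mx, my = sum(x) / n, sum(y) / n                      # 3.4 and 4.6
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
    / sum((xi - mx) ** 2 for xi in x)                # slope: 25.8/23.2 = 1.112...
a = my - b * mx                                      # intercept: 0.819...

print(round(b, 2), round(a, 2))                      # 1.11 0.82
print(round(a + b * 2, 2))                           # prediction for X = 2: 3.04
```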

To conclude, we can state that regression analysis is a statistical method for formulating a mathematical model depicting the relationship amongst variables, a model which can be used to predict the values of the dependent variable, given the values of the independent variable (Kothari 2004: 142).

iii) Contingency Tables: Contingency Tables are another way of explaining and interpreting relationship between variables. In the present module, we would be concerned only with the bivariate contingency tables where the focus of discussion would be on two variables – one an independent variable or the predictor variable (symbolized by X) and the other a dependent variable (symbolized by Y). Here we would discuss a relationship between marital status (X) and employment status (Y) of women. The hypothesis is that marital status exerts an influence on the employment status of women. The study has been carried out on 200 respondents (Elifson 1997). The data have been presented in the table below:

| Employment Status (Y) | Never Married | Married | Divorced | Widowed | Total |
|---|---|---|---|---|---|
| Employed | 21 | 60 | 11 | 6 | 98 |
| Not-Employed | 14 | 65 | 4 | 19 | 102 |
| Total | 35 | 125 | 15 | 25 | N = 200 |

(The columns are the categories of marital status, X.)

Contingency tables can be interpreted by percentaging them in three ways, as follows.

Percentaging Down: This is one of the most common ways of calculating percentages. Here the column marginals (35, 125, 15 and 25) are taken as the base on which the percentages are calculated. Percentaging down is also referred to as percentaging on the independent variable when it is the column variable. Percentaging down allows us to determine the effect of the independent variable by comparing across the percentages within a row, that is, by comparing people in different categories of the independent variable (Elifson 1997: 172). The method is shown below:

| Employment Status (Y) | Never Married | Married | Divorced | Widowed |
|---|---|---|---|---|
| Employed | 60% | 48% | 73.3% | 24% |
| Not-Employed | 40% | 52% | 26.7% | 76% |
| Total | 100% | 100% | 100% | 100% |

While interpreting the above table, we say that 60% (21/35 × 100) of the never-married respondents are employed, 48% (60/125 × 100) of the married respondents are employed, 73.3% (11/15 × 100) of the divorced respondents are employed, and 24% (6/25 × 100) of the widowed respondents are employed. Interpreted in this way, we get a logical relationship between the marital status and the employment status of women.

Percentaging Across: When we percentage across, we take the row marginals as the base for calculating percentages. Here we are percentaging across and comparing up and down. An advantage of doing this is that a profile of the employed versus the not-employed can be established in terms of their marital status (Elifson 1997: 172). This is shown in the table below:

| Employment Status (Y) | Never Married | Married | Divorced | Widowed | Total |
|---|---|---|---|---|---|
| Employed | 21.4% | 61.2% | 11.2% | 6.1% | 99.9% |
| Not-Employed | 13.7% | 63.7% | 3.9% | 18.6% | 99.9% |

From the above table, we can say that 21.4% (21/98 × 100) of the respondents who are employed have never married, and 13.7% (14/102 × 100) of the respondents who are not employed have never married. Moreover, 61.2% of the employed respondents are married, whereas 63.7% of the not-employed respondents are married; 11.2% of the employed respondents are divorced, against 3.9% of the not-employed; and 6.1% of the employed respondents are widowed, against 18.6% of the not-employed. In the above table, the totals do not come to 100% due to rounding (Elifson 1997).

Percentaging on the Total Number of Cases: This is another method of interpreting bivariate contingency tables. Here the percentages are calculated on the total number of cases (N). The following table shows this:

| Employment Status (Y) | Never Married | Married | Divorced | Widowed |
|---|---|---|---|---|
| Employed | 10.5% | 30% | 5.5% | 3% |
| Not-Employed | 7% | 32.5% | 2% | 9.5% |

(The eight cell percentages together total 100%.)

From the above table we infer that 10.5% (21/200 × 100) of the respondents have never married and are employed, whereas 7% (14/200 × 100) have never married and are not employed. Like the second method (percentaging across), this way of percentaging does not allow us to see the influence of the independent variable on the dependent one, and it is rarely used, though it is useful in certain instances (Elifson 1997: 172).
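To make the most common of the three methods concrete, here is an illustrative Python sketch (my own names) that percentages the table above down its columns:

```python
cols = ["Never Married", "Married", "Divorced", "Widowed"]
table = {"Employed":     [21, 60, 11, 6],
         "Not-Employed": [14, 65, 4, 19]}

col_totals = [sum(v) for v in zip(*table.values())]   # 35, 125, 15, 25

# Percentaging down: each cell as a share of its column total.
for status, counts in table.items():
    pcts = [round(100 * c / t, 1) for c, t in zip(counts, col_totals)]
    print(status, pcts)
# Employed     [60.0, 48.0, 73.3, 24.0]
# Not-Employed [40.0, 52.0, 26.7, 76.0]
```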

Self-Check Exercise – 3

1. What is measurement? Measurement is the assignment of numbers to objects or events according to some predetermined (or arbitrary) rules. The different levels of measurement represent different levels of numerical information contained in a set of observations.


2. What are the levels of measurement used by social scientists? There are four levels of measurement, namely nominal, ordinal, interval and ratio. The characteristics of each decide the kind of statistical application we can use.

 The nominal level does not involve highly complex measurement but rather involves rules for placing individuals or objects into categories.

 The ordinal scales possess all the characteristics of the nominal and in addition the categories represent a rank-ordered series of relationships like poorer, healthier, greater than etc.

 The interval and ratio scales are the highest levels of measurement in science and employ numbers. The numerical values associated with these scales permit the use of mathematical operations such as adding, subtracting, multiplying and dividing. The only difference between the two is that the ratio level has a true zero point, which the interval level does not. With both these levels we can state the exact differences between categories (Elifson 1997).

7. Limitations of Statistics in Sociology

Statistics plays a role in sociology, especially in applied sociology. A debate has been going on since the middle of the twentieth century between researchers committed to the use of quantitative methods and computer applications and those who believe in a qualitative approach to sociology. The latter group argues that statistics, if its importance is overemphasized, will become a substitute for sociology. They argue that it is not always appropriate to conduct research with quantitative variables that can be handled by statistical analysis. The decision to apply statistics to research depends on factors like the nature of the problem, the subjects of study and the availability of previously collected data, to name a few (Weinstein 2011). Researchers nowadays increasingly depend on the use of mixed methods. In general, mixed methods combine qualitative and quantitative techniques to cancel out their weaknesses; triangulation is a particular application of mixed methods (Guthrie 2010). One way in which a qualitative approach is introduced into quantitative research is through ethnostatistics, which involves the study of the construction, interpretation and display of statistics in quantitative social research. The idea of ethnostatistics can be applied in many ways, but one predominant way is to treat statistics as rhetoric. More specifically, this implies examining the language used in persuading audiences about the validity of the research (Bryman 2004: 446). To conclude, we can say that statistics is a necessary tool for effective research but can never be a substitute for sociological reasoning. It can give the data some precision and make them manageable and presentable (Weinstein 2011).

8. Summary

The present module has tried to analyse the processes and methods used to examine quantitative data, that is, data that can be reduced to numbers. This stage comes when the researcher is through with the process of data collection. The data are first processed through the various methods of coding, tabulation and classification. These help to reduce the data to manageable proportions and make them ready for analysis. After the data are processed, different statistical methods like measures of central tendency, measures of dispersion, the chi-square test, the t-test, the coefficient of correlation, simple regression and contingency tables are used to interpret the data. The choice of statistical application depends on the nature of the research and the level of measurement of the available data. But it has to be remembered that statistical analysis is only a helping tool of research. It can never be a substitute for the efforts of the researcher and the quality of the data collected. A combination of quantitative and qualitative methods of analysis is essential for the interpretation of data in social research.


9. References

Ahuja, Ram. Research Methods. Jaipur: Rawat Publications, 2007.

Bryman, Alan. Social Research Methods. New York: Oxford University Press, 2004.

Elifson, Kirk W., Richard P. Runyon and Audrey Haber. Fundamentals of Social Statistics. United States: McGraw-Hill, 1997.

Garrett, Henry E. Statistics in Psychology and Education. New York: David McKay Company, Inc., 1981.

Guthrie, Gerard. Basic Research Methods: An Entry to Social Science Research. New Delhi: Sage Publications India Private Limited, 2010.

Kothari, C.R. Research Methodology: Methods and Techniques. New Delhi: New Age International (P) Limited, Publishers, 2008.

Leonard, Wilbert Marcellus. Basic Social Statistics. Illinois: Stipes Publishing L.L.C., 1996.

Levin, Jack and James Alan Fox. Elementary Statistics in Social Research. New Delhi: Dorling Kindersley (India) Pvt. Ltd., 2006.

Majumdar, P. K. Research Methods in Social Science. New Delhi: Vinod Vasishtha for Viva Books Private Limited, 2005.

Morrison, Ken. Marx, Durkheim, Weber. London: Sage Publications, 1995.

Vito, Gennaro and Edward Latessa. Statistical Applications in Criminal Justice. London: Sage Publications, 1989.

Weinstein, Jay Alan. Applying Social Statistics. United Kingdom: Rowman and Littlefield Publishers Inc., 2011.
