UNIT 2 DESCRIPTIVE STATISTICS Introduction to Statistics
Total Page:16
File Type:pdf, Size:1020Kb
UNIT 2 DESCRIPTIVE STATISTICS Introduction to Statistics Structure 2.0 Introduction 2.1 Objectives 2.2 Meaning of Descriptive Statistics 2.3 Organising Data 2.3.1 Classification 2.3.2 Tabulation 2.3.3 Graphical Presentation of Data 2.3.4 Diagrammatical Presentation of Data 2.4 Summarising Data 2.4.1 Measures of Central Tendency 2.4.2 Measures of Dispersion 2.5 Use of Descriptive Statistics 2.6 Let Us Sum Up 2.7 Unit End Questions 2.8 Glossary 2.9 Suggested Readings 2.0 INTRODUCTION We have learned in the previous unit that looking at the functions of statistics point of view, statistics may be descriptive, correlational and inferential. In this unit we shall discuss the various aspects of descriptive statistics, particularly how to organise and discribe the data. Most of the observations in this universe are subject to variability, especially observations related to human behaviour. It is a well known fact that Attitude, Intelligence, Personality, etc. differ from individual to individual. In order to make a sensible definition of the group or to identify the group with reference to their observations/ scores, it is necessary to express them in a precise manner. For this purpose observations need to be expressed as a single estimate which summarises the observations. Such single estimate of the series of data which summarises the distribution are known as parameters of the distribution. These parameters define the distribution completely. In this unit we will be focusing on descriptive statistics, the characteristic features and the various statistics used in this category. 2.1 OBJECTIVES After going through this unit, you will be able to: Define the nature and meaning of descriptive statistics; Describe the methods of organising and condensing raw data; Explain concept and meaning of different measures of central tendency; and Analyse the meaning of different measures of dispersion. 15 Introduction to Statistics 2.2 MEANING OF DESCRIPTIVE STATISTICS Let us take up a hypothetical example of two groups of students taking a problem solving test. One group is taken as experimental group in that the subjects in this group are given training in problem solving while the other group subjects do not get any training. Both were tested on problem solving and the scores they obtained were as given in the table below. Table 2.1: Hypothetical performance scores of 10 students of experimental and control groups based on Problem solving experiment. Experimental Condition Control Condition ( With Training ) (Without Training) 4 2 8 4 12 10 8 6 7 3 9 4 15 8 6 4 5 2 8 3 The scores obtained by children in the two groups are the actual scores and are considered as raw scores. A look at the table shows that the control group subjects have scored rather lower as compared to that of the experimental group which had undergone training. There are some procedures to be followed and statistical tests to be used to describe the raw data and get some meaningful interpretation of the same. This is what Descriptive statistics is all about. Description of data performs two operations: (i) Organising Data and (ii) Summarising Data. 2.3 ORGANISING DATA Univariate analysis involves the examination across cases of one variable at a time. There are four major statistical techniques for organising the data. These are: Classification Tabulation Graphical Presentation Diagrammatical Presentation 2.3.1 Classification The classification is a summary of the frequency of individual scores or ranges of scores for a variable. In the simplest form of a distribution, we will have such value of variable as well as the number of persons who have had each value. 16 Once data are collected, researchers have to arrange them in a format from which Descriptive Statistics they would be able to draw some conclusions. The arrangement of data in groups according to similarities is known as classification. Thus by classifying data, the investigators move a step ahead to the scores and proceed forward concrete decision. Classification is done with following objectives: Presenting data in a condensed form Explaining the affinities and diversities of the data Facilitating comparisons Classification may be qualitative and quantitative Frequency distribution. A much clear picture of the information of score emerges when the raw data are organised as a frequency distribution. Frequency distribution shows the number of cases following within a given class interval or range of scores. A frequency distribution is a table that shows each score as obtained by a group of individuals and how frequently each score occurred. Frequency distribution can be with ungrouped data and grouped data i) An ungrouped frequency distribution may be constructed by listing all score value either from highest to lowest or lowest to highest and placing a tally mark (/) besides each scores every times it occurs. The frequency of occurrence of each score is denoted by ‘f’ . ii) Grouped frequency distribution: If there is a wide range of score value in the data, then it is difficult to get a clear picture of such series of data. In this case grouped frequency distribution should be constructed to have clear picture of the data. A group frequency distribution is a table that organises data into classes, into groups of values describing one characteristic of the data. It shows the number of observations from the data set that fall into each of the class. Construction of Frequency Distribution Before proceeding we need to know a few terminologies used in further discussion as for instance, a variable. A variable refers to the phenomenon under study. It may be the performance of students on a problem solving issue or it can be a method of teaching students that could affect their performance. Here the performance is one variable which is being studied and the method of teaching is another variable that is being manipulated. Variables are of two kinds : i) Continuous variable ii) Discrete variable. Those variables which can take all the possible values in a given specified range is termed as Continuous variable. For example, age ( it can be measured in years, months, days, hours, minutes , seconds etc.) , weight (lbs), height(in cms), etc. On the other hand those variables which cannot take all the possible values within the given specified range are termed as discrete variables. For example, number of children, marks obtained in an examination ( out of 200), etc. 17 Introduction to Statistics Preparation of Frequency Distribution To prepare a frequency distribution, we, first decide the range of the given data, that is, the difference between the highest and lowest scores. This will tell about the range of the scores. Prior to the construction of any grouped frequency distribution, it is important to decide the following 1) The number of class intervals: There is no hard and fast rules regarding the number of classes into which data should be grouped . If there are very few scores, it is useless to have a large number of class-intervals. Ordinarily, the number of classes should be between 5 to 30 2) Limits of each class interval: Another factor used in determining the number of classes is the size/ width or range of the class which is known as ‘class interval’ and is denoted by ‘i’. Class interval should be of uniform width resulting in the same-size classes of frequency distribution. The width of the class should be a whole number and conveniently divisible by 2, 3, 5, 10 or 20. There are three methods for describing the class limits for distribution: i) Exclusive method ii) Inclusive method iii) True or actual class method i) Exclusive method: In this method of class formation, the classes are so formed that the upper limit of one class also becomes the lower limit of the next class. Exclusive method of classification ensures continuity between two successive classes. In this classification, it is presumed that score equal to the upper limit of the class is exclusive, i.e., a score of 40 will be included in the class of 40 to 50 and not in a class of 30 to 40 ii) Inclusive method: In this method classification includes scores, which are equal to the upper limit of the class. Inclusive method is preferred when measurements are given in whole numbers. iii) True or Actual class method: In inclusive method upper class limit is not equal to lower class limit of the next class. Therefore, there is no continuity between the classes. However, in many statistical measures continuous classes are required. To have continuous classes it is assumed that an observation or score does not just represent a point on a continuous scale but an internal unit length of which the given score is the middle point. Thus, mathematically, a score is internal when it extends from 0.5 units below to 0.5 units above the face value of the score on a continuum. These class limits are known as true or actual class limits. Types of frequency distributions: There are various ways to arrange frequencies of a data array based on the requirement of the statistical analysis or the study. A couple of them are discussed below: Relative frequency distribution: A relative frequency distribution is a distribution that indicates the proportion of the total number of cases observed at each score value or internal of score values. 18 Cumulative frequency distribution: Sometimes investigator is interested to know Descriptive Statistics the number of observations less than a particular value. This is possible by computing the cumulative frequency. A cumulative frequency corresponding to a class-interval is the sum of frequencies for that class and of all classes prior to that class.