Frequency Distribution Measures of Central Values
“Chapter-2(Final)” — 2010/9/18 — 16:03 — page 77 — #1

CHAPTER 2
FREQUENCY DISTRIBUTION: MEASURES OF CENTRAL VALUES AND DISPERSION

INTRODUCTION

Variability is the most common characteristic of data in agricultural, biological and physical science research. The need for statistical methods arises from this variability.

Examples where variation is observed:
1. Number of seedlings
2. Blood groups in man and their frequency
3. Plaque morphology of bacteriophages infecting different strains of E. coli
4. Plant height
5. Number of insects with a specific eye colour
6. Eye colour in Drosophila

Characteristics which show variation of this sort are called random variables or variates.

Continuous variable: A continuous variable is one that can take any value in a given range, which may be finite or infinite. For example, the yield of a crop, the height of a plant, animal height and human height are continuous variables.

Discrete variable: A discrete variable is one for which the possible values are not observed on a continuous scale. For example, number of children in a family, number of fingers, number of plants bearing red flowers, number of Drosophila with red eyes, number of wild-type and mutant colonies of a microorganism, number of lethals (dominant or recessive), and number of fragments in a cell observed during cell division.

POPULATION AND SAMPLE

A population is defined as the total set of actual or possible values of the variable. A population may be finite or infinite, and the variable continuous or discrete.

Fundamentals of Biostatistics

The idea of infinite populations distributed in a frequency distribution in respect of one or more characters is fundamental to all statistical work (Fisher, 1948). One of the principal objectives of statistics is to draw inferences about populations by studying groups of individuals forming part of those populations.
A sample is therefore any finite set of items drawn from a population. The purpose of drawing samples is to obtain information about the populations from which they are drawn. A random sample from a given population is a sample chosen in such a manner that each possible sample has an equal chance of being drawn.

Quantities which characterise populations are known as parameters. Quantities which characterise samples are called statistics. A parameter is a fixed quantity; a statistic is a variate. Generally, a statistic is sought which 'best' estimates the corresponding population parameter.

FREQUENCY DISTRIBUTIONS

The assemblage of the values $X_i$ with their associated frequencies $f_i$ is called a frequency distribution. A typical frequency distribution is presented in the table below.

Frequency distribution of variable X

Value of variable    Frequency
X1                   f1
X2                   f2
Xi                   fi
Xn                   fn
Total frequency      N

Example 2.1 Prepare a frequency distribution for the following data on the height (cm) of 20 children.

133 136 120 138 133 131 127 141 127 143
130 131 125 144 128 134 135 137 133 129

Frequency distribution

Height (cm)     Tally       Frequency
119.5–124.5     |           1
124.5–129.5     |||||       5
129.5–134.5     ||||| ||    7
134.5–139.5     ||||        4
139.5–144.5     |||         3
Total                       20

HISTOGRAM

Grouped data can be displayed in a histogram. In a histogram, rectangles are drawn so that the area of each rectangle is proportional to the frequency in the range covered by it.

Example 2.2 The lengths of 30 plant leaves of species A were measured and the information grouped as shown. Measurements were taken correct to the nearest cm. Draw a histogram to illustrate the data.

Length of leaf (cm)    9.5–14.5    14.5–19.5    19.5–24.5    24.5–29.5
Frequency              3           8            12           7

[Figure: histogram of the leaf data. Horizontal axis: Length of leaf, class boundaries 9.5, 14.5, 19.5, 24.5, 29.5. Vertical axis: Frequency, 0 to 12.]

FREQUENCY POLYGONS

A frequency distribution may be displayed as a frequency polygon.
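To draw a polygon for grouped data such as the heights in Example 2.1, each class is reduced to its midpoint and its frequency. The following is a minimal Python sketch, not taken from the text, that assumes the class boundaries used in that example (119.5 to 144.5 in steps of 5); the function name `class_frequencies` and its parameters are illustrative choices.

```python
def class_frequencies(values, lower, width, n_classes):
    """Return (class midpoint, frequency) pairs for equal-width classes.

    Assumes classes [lower, lower + width), [lower + width, ...), and so on.
    Values outside the covered range are ignored.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)   # which class the value falls in
        if 0 <= index < n_classes:
            counts[index] += 1
    # Pair each class midpoint with its frequency.
    return [(lower + width * (i + 0.5), counts[i]) for i in range(n_classes)]


# Heights (cm) of the 20 children in Example 2.1.
heights = [133, 136, 120, 138, 133, 131, 127, 141, 127, 143,
           130, 131, 125, 144, 128, 134, 135, 137, 133, 129]

points = class_frequencies(heights, lower=119.5, width=5, n_classes=5)
for midpoint, freq in points:
    print(f"{midpoint:6.1f}  {freq}")
```

The frequencies produced this way (1, 5, 7, 4, 3) agree with the tally in Example 2.1, and the midpoints (122, 127, 132, 137, 142) are the horizontal positions of the polygon's vertices.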
A frequency polygon may be superimposed on a histogram by joining the midpoints of the tops of the rectangles. This applies to grouped data.

(a) Ungrouped data: The fin length (in cm) of a particular type of fish is given. Draw a frequency polygon to illustrate this information.

Fin length (cm)    2    4    6    7    8    9
Frequency          3    8    5    4    2    1

[Figure: frequency polygon of the fin data. Horizontal axis: Fin length (cm), values 2, 4, 6, 7, 8, 9. Vertical axis: Frequency, 0 to 8.]

(b) Grouped data: The following table shows the age distribution of cases of a certain disease reported during a year in a particular state. Prepare a frequency polygon.

Age      Number of cases
5–14     5
15–24    10
25–34    22
35–44    22
45–54    13
55–64    5

[Figure: frequency polygon of the age data. Horizontal axis: Age, class boundaries from 4.5 in steps of 10. Vertical axis: No. of cases (Frequency), 0 to 25.]

Example 2.3 The lengths of pollen grains in microns are given below. Prepare a frequency table showing the cumulative frequency, and draw a histogram and a frequency polygon.

63, 64, 64, 67, 62, 60, 69, 68
19, 16, 10, 20, 22, 20, 20, 21
21, 23, 26, 25, 24, 27, 32, 30
31, 33, 34, 38, 39, 36, 43, 46
44, 47, 50, 55, 59, 58, 58, 41
42, 48, 52, 42, 42, 64, 30, 54

Class interval    Tally              Frequency    Cumulative frequency (CF)
10–19             |||                3            3
20–29             ||||| ||||| |      11           3 + 11 = 14
30–39             ||||| ||||         9            14 + 9 = 23
40–49             ||||| ||||         9            23 + 9 = 32
50–59             ||||| ||           7            32 + 7 = 39
60–69             ||||| ||||         9            39 + 9 = 48
Total                                48

[Figure: frequency polygon of the pollen data. Horizontal axis: class boundaries 9.5 to 59.5. Vertical axis: frequency, 0 to 12.]

[Figure: histogram of the pollen data. Horizontal axis: class boundaries 9.5 to 69.5. Vertical axis: frequency, 0 to 12.]

MEASURES OF CENTRAL TENDENCY

One of the most important objectives of statistical analysis is to obtain a single value that describes the characteristic of the entire mass of data. There are three main statistical measures which attempt to locate a 'typical' value. These are
1. Arithmetic mean (A.M.)
2. Median
3. Mode

Other measures of central tendency are
1. Geometric mean (G.M.)
2. Harmonic mean (H.M.)

ARITHMETIC MEAN

A numerical value which indicates the centre of the distribution is called the arithmetic mean.

For ungrouped data

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

For grouped data

$\bar{X} = \frac{\sum fX}{n}$

where $n = \sum f$; $\sum$ (sigma) = summation; $\bar{X}$ (X-bar) = mean.

Example 2.4 Find the mean of the set of numbers 63, 65, 67, 68, 69, 70, 71, 72, 74, 75.

$n = 10$

$\sum X = 63 + 65 + 67 + 68 + 69 + 70 + 71 + 72 + 74 + 75 = 694$

$\bar{X} = \frac{\sum X}{n} = \frac{694}{10} = 69.4$

Frequency Distribution

No. of flowers (X)    1     2     3    4    5
No. of plants (f)     11    10    5    3    1

X    f     fX
1    11    11
2    10    20
3    5     15
4    3     12
5    1     5
     $\sum f = 30$    $\sum fX = 63$

$\bar{X} = \frac{\sum fX}{n} = \frac{63}{30} = 2.1$, where $n = \sum f$.

Grouped Frequency Distribution

The lengths of 32 leaves were measured correct to the nearest mm. Find the mean length.

Length (mm)    20–22    23–25    26–28    29–31    32–34
Frequency      3        6        12       9        2

Length (mm)    Midpoint (X)    f     fX
20–22          21              3     63
23–25          24              6     144
26–28          27              12    324
29–31          30              9     270
32–34          33              2     66
                               $\sum f = 32$    $\sum fX = 867$

$\bar{X} = \frac{\sum fX}{n} = \frac{867}{32} \approx 27.1$ mm, where $n = \sum f$.

Merits and demerits of arithmetic mean

Merits
1. It is the simplest to understand and the easiest to compute.
2. It is affected by the value of every item in the series.

Demerit
1. The arithmetic mean is not always a good measure of central tendency, as for instance in an extremely asymmetrical distribution.

MEDIAN

The median refers to the middle value in a distribution. The median is called a positional average. The median is the middle value of a set of numbers arranged in order of magnitude. The median is the $\frac{1}{2}(n + 1)$th value. The median is that value of the variate for which 50% of the observations, when arranged in order of magnitude, lie on each side.
The median is the value of the variate which divides the total frequency in the whole range into two equal parts. The median is not affected by extreme values. If the total frequency is even, the median is the arithmetic mean of the two middle values. Compared with the arithmetic mean, the median places less emphasis on the minimum or maximum value in the sample or population.

Example 2.5 Find the median of each of the sets.
(a) 7, 7, 2, 3, 4, 2, 7, 9, 31
(b) 36, 41, 27, 32, 29, 38, 39, 43

Solution:
(a) Arranged in order of magnitude: 2, 2, 3, 4, 7, 7, 7, 9, 31.
$n = 9$, and the median is the $\frac{1}{2}(9 + 1)$th value, i.e., the 5th value.
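The positional rule used in this solution can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function name `median` is an illustrative choice. It orders the values and takes the $\frac{1}{2}(n + 1)$th value when n is odd, or the arithmetic mean of the two middle values when n is even, and applies this to both sets of Example 2.5.

```python
def median(values):
    """Median by position in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1)/2-th value (index n // 2 when counting from 0).
        return ordered[middle]
    # Even n: arithmetic mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


set_a = [7, 7, 2, 3, 4, 2, 7, 9, 31]
set_b = [36, 41, 27, 32, 29, 38, 39, 43]

print(median(set_a))   # 7
print(median(set_b))   # 37.0

# Replacing the extreme value 31 by 310 changes the mean but not the median.
set_a_extreme = [310 if v == 31 else v for v in set_a]
print(sum(set_a) / len(set_a), sum(set_a_extreme) / len(set_a_extreme))
print(median(set_a_extreme))   # 7
```

For set (a) the mean moves from 8.0 to 39.0 when the largest value is inflated, while the median stays at 7, which illustrates the statement that the median is not affected by extreme values.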