Exercise 1C Scientific Investigation: Statistical Analysis
Parts of this lab adapted from General Ecology Labs, Dr. Chris Brown, Tennessee Technological University and Ecology on Campus, Dr. Robert Kingsolver, Bellarmine University.

In part C of our Scientific Investigation labs, we will use the measurement data from part B to ask new questions and apply some basic statistics.

Ecology is the ambitious attempt to understand life on a grand scale. We know that the mechanics of the living world are too vast to see from a single vantage point, too gradual to observe in a single lifetime, and too complex to capture in a single narrative. This is why ecology has always been a quantitative discipline. Measurement empowers ecologists because our measuring instruments extend our senses, and numerical records extend our capacity for observation. With measurement data, we can compare the growth rates of trees across a continent, through a series of environmental conditions, or over a period of years. Imagine trying to compare from memory the water clarity of two streams located on different continents visited in separate years, and you can easily appreciate the value of measurement.

Numerical data extend our capacity for judgment too. Since a stream runs muddier after a rain and clearer in periods of drought, how could you possibly wade into two streams in different seasons and hope to develop a valid comparison? In a world characterized by change, data sets provide reliability unrealized by single observations. Quantitative concepts such as averages, ratios, variances, and probabilities reveal ecological patterns that would otherwise remain unseen and unknowable. Mathematics, more than any cleverly crafted lens or detector, has opened our window on the universe.
It is not the intention of this lab to showcase math for its own sake, but we will take measurements and make calculations because this is the simplest and most powerful way to examine populations, communities, and ecosystems.

Sampling

To demonstrate the power of quantitative description in ecology, you will use a series of measurements and calculations to characterize a population. In biology, a population is defined as a group of individuals of the same species living in the same place and time. Statisticians have a more general definition of a population, that is, all of the members of any group of people, organisms, or things under investigation. For the ecologist, the biological population is frequently the subject of investigation, so our biological population can be a statistical population as well.

Think about a population of red-ear sunfish in a freshwater lake. Since the population's members may vary in age, physical condition, or genetic characteristics, we must observe more than one representative before we can say much about the sunfish population as a group. When the population is too large to catch every fish in the lake, we must settle for a sample of individuals to represent the whole. This poses an interesting challenge for the ecologist: how many individuals must we observe to ensure that we have adequately addressed the variation that exists in the entire population? How can this sample be collected as a fair representation of the whole?

Ecologists try to avoid bias, or sampling flaws that over-represent individuals of one type and under-represent others. If we caught our sample of sunfish with baited hooks, for example, we might selectively capture individuals large enough to take the bait, while leaving out smaller fish. Any estimates of size or age we made from this biased sample would poorly represent the population we are trying to study.
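A small simulation can make the effect of a biased sample concrete. The sketch below is only an illustration: the population of 5000 lengths, the 12 cm hook threshold, and the sample size of 80 are all assumed values chosen for the example, not data from this lab.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: 5000 sunfish lengths in cm (illustrative values only).
population = [random.gauss(12.0, 1.5) for _ in range(5000)]

# Fair sample: every fish has the same chance of being caught.
random_sample = random.sample(population, 80)

# Biased sample: ASSUME only fish 12 cm or longer can take the baited hook.
hookable = [length for length in population if length >= 12.0]
biased_sample = random.sample(hookable, 80)

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(f"population mean:    {population_mean:.2f} cm")
print(f"random-sample mean: {random_mean:.2f} cm")
print(f"biased-sample mean: {biased_mean:.2f} cm")
```

Under these assumptions the random-sample mean should land close to the population mean, while the hook-caught sample overestimates it because the smaller fish never enter the sample.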
After collecting our sample, we can measure each individual and then use these measurements to develop an idea about the population. If we are interested in fish size, we could measure each of our captured sunfish from snout to tail. Reporting every single measurement in a data table would be truthful, but not very useful, because the human mind cannot easily take in long lists of numbers. A more fruitful approach is to take all the measurements of our fish and systematically construct composite numerical descriptions, or descriptive statistics, which convey information about the population in a more concise form.

Note that we did not measure each and every fish in the population, if for no other reason than we likely could never catch them all. Instead, we took a sample of fish from the population, and calculated our mean from this sample. Statistical values (the mean, variance, etc.) are called statistics because they represent one estimate of the true values of the mean and variance. So that’s what statistics really are: estimates of an unmeasured true value for a population. You can probably guess that, if we went and got a second sample of sunfish, the mean and variance would likely differ from what we got for the first sample.

The true values for the mean, etc., found by measuring all individuals in our population, are called parameters, or parametric values. We almost never know these, but we assume that our statistics come reasonably close. In general, as long as we randomly sample our populations, and have a large enough sample size, this assumption will hold. However, it’s always possible that we just happen to sample uncommonly long (or uncommonly short) sunfish; if so, then our statistics will likely differ from the parametric values.

Part A: Descriptive Statistics

As the name implies, descriptive statistics describe our data. One common descriptive statistic is the mean (or average, as it’s more popularly called).
The mean, or $\bar{x}$, represents the “middle” (or, in mathematical terms, the central tendency) of the data, and is given by the formula:

$$\bar{x} = \frac{\sum x_i}{n}$$

where the $x_i$’s are your individual data points (for example, the body length measurements), and $n$ equals sample size (the total number of sunfish measured). The symbol $\Sigma$ indicates summation, so for the mean we need to add together all the $x_i$’s and then divide this by $n$. We might find, for instance, that the mean length of sunfish in this lake is 12.07 centimeters, based on a sample of 80 netted fish.

Note: the symbol µ is used for the mean of all fish in the population, which we are trying to estimate in our study. The symbol $\bar{x}$ is used for the mean of our sample, which we hope to be close to µ.

Means are useful, but they can be misleading. If a population were made up of small one-year-old fish and much larger two-year-old fish, the mean we calculate may fall somewhere between the large and small size classes: above any of the small fish, but below any of the large ones. A mean evokes a concept of the "typical" fish, but the "typical" fish may not actually exist in the population (Figure 1.7). For this reason, it is often helpful to use more than one statistic in our description of the typical member of a population. One useful alternative is the median, which is the individual ranked at the 50th percentile when all data are arranged in numerical order. Another is the mode, which is the most commonly observed length of all fish in the sample.

Figure 1.7 The calculated mean describes a “typical” fish that does not actually exist in a population composed of two size classes.

Application

In our exercise, the data you collected in your groups last lab period are samples and the population is defined as the entire Biology 6C class. Use the data in Table 1.2 from the previous lab to calculate the mean height and mean arm length for your group.
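The three descriptive statistics described above (mean, median, and mode) can be sketched in Python. The nine lengths below are made up to mimic a population with two size classes; they are not measurements from this lab, and the variable names are only illustrative.

```python
import statistics

# Hypothetical lengths in cm: four small one-year-old fish, five larger two-year-olds.
small_fish = [8.0, 8.5, 8.5, 9.0]
large_fish = [14.0, 14.5, 14.5, 14.5, 15.0]
lengths = small_fish + large_fish

# Mean: add all the data points, then divide by the sample size n.
n = len(lengths)
mean_length = sum(lengths) / n

# Median: the middle value when the data are arranged in numerical order.
median_length = statistics.median(lengths)

# Mode: the most commonly observed value.
mode_length = statistics.mode(lengths)

print(f"n = {n}")
print(f"mean   = {mean_length:.2f} cm")
print(f"median = {median_length:.1f} cm")
print(f"mode   = {mode_length:.1f} cm")
```

With these made-up values the mean is about 11.83 cm, which is longer than every small fish and shorter than every large one, while the median (14.0 cm) and mode (14.5 cm) both fall inside the larger size class.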
Show your calculations below:

Mean height =

Mean arm length =

Complete Table 1.5 below using the class data spreadsheet.

Table 1.5 Group Mean Heights

Groups          Mean Height (cm)
Group 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Entire Class

Using the data table distributed in class, enter the sample (group) means in Table 1.5 and compare them to the population (class) mean.

Are there any sample means that do not seem to represent the population mean?

How could sample size affect how well the sample mean represents the population mean?

How could method of choosing a sample from the population affect how well the sample mean represents the population mean?

Picturing Variation

After calculating statistics to represent the typical individual, it is still necessary to consider variation among members of the population. A frequency histogram is a simple graphic representation of the way individuals in the population vary. To produce a frequency histogram:

1. Choose a measurement variable, such as length in our red-ear sunfish. Assume we have collected 80 sunfish and measured each fish to the nearest millimeter.

2. On a number line, mark the longest and shortest measurements taken from the population (Figure 1.8). The distance on the number line between these points, determined by subtracting the smallest from the largest, is called the range. In our example, the longest fish measures 16.3 cm, and the shortest fish measures 8.5 cm, so the range is 7.8 cm.

Figure 1.8

3. Next, divide the range into evenly spaced divisions (Figure 1.9). In our example, each division of the number line represents a size class. It is customary to use between 10 and 20 size classes in a histogram. For our example, we will divide the range of sunfish sizes into 16 units of 0.5 cm each.
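Steps 1 through 3 can be sketched in Python as a rough check on the arithmetic. The 18 lengths below are hypothetical (the lab's 80 measurements are not listed here); only the shortest and longest values, 8.5 cm and 16.3 cm, come from the example. The sketch also assumes that the first size class starts at the shortest measurement, which the text does not specify. Lengths are kept in whole millimeters to avoid rounding problems.

```python
import math

# Hypothetical sunfish lengths in mm; only the minimum (85) and maximum (163)
# match the example in step 2.
lengths_mm = [85, 97, 102, 104, 110, 113, 118, 118, 121,
              121, 123, 125, 127, 131, 136, 142, 149, 163]

# Step 2: the range is the longest measurement minus the shortest.
shortest = min(lengths_mm)
longest = max(lengths_mm)
data_range = longest - shortest  # 78 mm, i.e. 7.8 cm

# Step 3: divide the range into evenly spaced size classes of 0.5 cm (5 mm).
class_width = 5
n_classes = math.ceil(data_range / class_width)  # 78 / 5 = 15.6, rounded up

# Count how many fish fall into each size class.
# ASSUMPTION: the first class begins at the shortest measurement.
counts = [0] * n_classes
for length in lengths_mm:
    index = min((length - shortest) // class_width, n_classes - 1)
    counts[index] += 1

# Print a simple text histogram, one row per size class.
for i, count in enumerate(counts):
    lower = shortest + i * class_width
    upper = lower + class_width
    print(f"{lower / 10:4.1f}-{upper / 10:4.1f} cm | {'#' * count}")
```

Run as written, the range comes out to 78 mm and the number of size classes to 16, matching the 16 units of 0.5 cm in the example.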