The Scientific Method and Basic Statistics
Objectives:
• Understand the steps in the Scientific Method
• Be able to describe basic statistical parameters and how they relate to the Normal (Gaussian) Distribution Model
• Be able to explain how hypotheses are tested: supported or rejected

What Do Scientists Do?
• Scientists collect data and develop theories, models, and laws about how nature works.

Science searches for natural causes to explain natural phenomena
1. Purpose of science
   a. to determine cause and effect
   b. to gain insight into natural events
2. Science does not include “absolutes”
3. Science provides tentative explanations of natural phenomena
4. Fundamental basis of science: The Principle of Uncertainty

“Science cannot prove anything, nor is it a search for the ‘truth’.”
1. Science develops tentative answers, or guesses (hypotheses), based on evidence
2. Theory - when supporting evidence is very strong!

Science Is a Search for Order in Nature
• Identify a problem
• Find out what is known about the problem
• Ask a question to be investigated
• Gather data through experiments
• Propose a scientific hypothesis
• Make testable predictions
• Keep testing and making observations
• Accept or reject the hypothesis
Scientific theory: well-tested and widely accepted hypothesis

Characteristics of Science…and Scientists
• Curiosity
• Skepticism
• Reproducibility
• Peer review
• Openness to new ideas
• Critical thinking
• Creativity

Observation: Nothing happens when I try to turn on my flashlight.
Question: Why didn’t the light come on? Are the batteries dead?
Hypothesis: Maybe the batteries are dead.
Test hypothesis with an experiment: Put in new batteries and try to turn on the flashlight.
Result: Flashlight still does not work.
New hypothesis: Maybe the bulb is burned out.
Experiment: Put in a new bulb.
Result: Flashlight works.
Conclusion: New hypothesis is verified.
Fig. 2-3, p. 33

Concept 1.1 Connections in Nature
Observation of Pacific tree frogs suggested that a parasite can cause deformities.
Small glass beads implanted in tadpoles to mimic the effect of cysts of Ribeiroia ondatrae, a trematode flatworm, also produced deformities.
Further studies:
• Deformities of Pacific tree frogs occurred only in ponds that also had an aquatic snail, Helisoma tenuis, an intermediate host of the parasite.
• All frogs with deformed limbs had Ribeiroia cysts.
Figure 1.3 The Life Cycle of Ribeiroia

1. Observation
• The awareness of a natural event or natural phenomenon directly or indirectly by means of our senses.

Observation: North facing slopes have heavier tree growth than south facing slopes

Possible Questions:
• What causes trees to grow more abundantly on north facing slopes? The question is both relevant and testable, but very general.
• What causes the slope to be north facing? Probably not relevant.
• Did Martians plant these trees 10,000 years ago? Probably not testable.
• Is evaporation of water less on north facing slopes than south facing slopes? More relevant and to the point.

Question: Is evaporation of water less on north facing slopes than south facing slopes?

3. Hypothesis: A guess postulating an answer to the question
• Must be relevant and testable
• Bias: “My idea is so logical, so reasonable, and it sounds so right, it must be correct.” Where is the supporting evidence?

Hypothesis: Evaporation is greater on south facing slopes than north facing slopes.

4. Experiment
• Additional observations gathered to test the hypothesis.
Observation: North facing slopes have heavier tree growth than south facing slopes
Question: Is evaporation of water less on north facing slopes than south facing slopes?
Hypothesis: Evaporation is greater on south facing slopes than north facing slopes.
Experiment: Test evaporation using a sling psychrometer.

Experimental Difficulties
• Bias
• Experimental Errors
• Sample Size

What are the odds of flipping:
• 5 heads in a row? 2^-5 = 1/32
• 10 heads in a row? 2^-10 = 1/1024
• 100 heads in a row? 2^-100 = 1/(1.27 x 10^30), or 1 in 1,270,000,000,000,000,000,000,000,000,000

Charlie
• Charlie’s sick
• Diagnosis: fish ick
• Treatment: fish ick medicine

Controlled Experiment
• Run two side-by-side experiments
1. No change
2. Change one experimental variable only

Controlled Study
                               Experimental Group       Control Group
Conditions identical except    fish ick medicine        no medicine
How many of each?              ~50 experimental fish    ~50 control fish

5. Evaluation: Conclusions
• Analyze the results of the experiment
How many of each lived?
        50 Experimental Fish    50 Control Fish    Conclusion
Live    40 / 50                 10 / 50            Medication helps
Live    40 / 50                 32 / 50            Not clear if medication helps

5. Evaluation
• When results are close, the sample size is critical.
How many fish should be used?
                            Experimental Fish    Control Fish
Live (100 fish in total)    40 / 50              32 / 50         Inconclusive result (difference = 1/256 chance)
Live (1000 fish in total)   400 / 500            320 / 500       More conclusive result (difference = 1/1.21 x 10^30 chance)

Statistical Approach to Science
How does science develop theories?
A theory is a hypothesis which is solidly supported by evidence.
Support for hypotheses comes from statistics.
Using a sample, the mean of an experimental population can be estimated along with other statistical parameters.
The absolute “true mean” (denoted as µ) cannot be determined. Instead, we estimate a mean (x̄) for our sample population.
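The coin-flip odds and the fish-medicine comparison above can be checked numerically. The following is a minimal sketch, not the calculation used on the slides: it assumes a pooled two-proportion z-test with a normal approximation (a technique swapped in here for illustration), and both function names are hypothetical. Its p-values are not expected to reproduce the "difference = ... chance" figures quoted above, which appear to come from a different calculation.

```python
import math


def odds_of_heads_in_a_row(n_flips):
    """Return k such that the chance of n_flips straight heads is 1 in k."""
    return 2 ** n_flips


def two_proportion_p_value(survived_a, total_a, survived_b, total_b):
    """Two-sided p-value for a difference in survival rates.

    Assumption: pooled two-proportion z-test (normal approximation),
    chosen for illustration; the slides do not name a specific test.
    """
    p_a = survived_a / total_a
    p_b = survived_b / total_b
    pooled = (survived_a + survived_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


print(odds_of_heads_in_a_row(5))    # 1 in 32
print(odds_of_heads_in_a_row(10))   # 1 in 1024
print(f"{odds_of_heads_in_a_row(100):.3g}")

small = two_proportion_p_value(40, 50, 32, 50)      # 100 fish in total
large = two_proportion_p_value(400, 500, 320, 500)  # 1000 fish in total
print(f"100 fish:  p = {small:.3g}")
print(f"1000 fish: p = {large:.3g}")
```

With the same 80% versus 64% survival rates, the larger sample should give a far smaller p-value, which is the sense in which sample size is critical when results are close.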
We can estimate a confidence interval in which the true mean of the population lies at a given level of probability.
This honors the Uncertainty Principle in science.

Statistical Method
• There is a high degree of variability in living things: cells, organisms, populations
• Sample: a portion of a population; it must be sufficiently large, but obtained randomly
• Random selection reduces bias

“Normal” Distribution
The line of a bell-shaped curve reveals continuous variation in the population
[Figure: bell-shaped curves; y-axis: Number of individuals with some value of the trait; x-axis: Range of values for the trait] Fig. 8-14a, p.120; Fig. 8-14b, p.120

Statistics

Summation Notation and Symbols
• i is the index variable, or counter. The index variable is used to identify each observed value.
• n is the number of observations (written N in the formulas)
• X_i is the variable of interest for observation number i.
• ∑ is sigma (Greek capital S). This means to add, or sum, all observations of variable X.

• Mean
\[ \bar{x} = \frac{1}{N} \sum x_i \]
• Variance
\[ s_x^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1} \]
• Standard deviation
\[ s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}} = \sqrt{\frac{\sum x_i^2 - N\bar{x}^2}{N - 1}} \]

Arithmetic Mean
Mean is the average value of observations, determined by adding up all values and then dividing by the number of observations.
The mean represents an estimate of the absolute “true mean”, denoted with a Greek lower case m (µ).
\[ \bar{x} = \frac{1}{N} \sum x_i \]

Variance
Variance is a measure of how widely the values of our observations are spread around the mean.
It is obtained by summing the squares of the differences between individual values and the mean, then dividing by the number of observations minus one.
Again, this is an estimate of the “true variance” (σ²).
\[ s_x^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1} \]

Standard deviation
Standard deviation is another measure of the spread of values in relation to the mean.
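The mean, variance, and standard deviation formulas above can be sketched in a few lines. This is a minimal illustration, assuming the N - 1 (sample) form given in the formulas; the function names and the data are hypothetical, invented only to exercise the arithmetic.

```python
import math
import statistics


def sample_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


def sample_variance(values):
    """Sum of squared differences from the mean, divided by N - 1."""
    n = len(values)
    mean = sample_mean(values)
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def sample_std_dev_shortcut(values):
    """Equivalent form: sqrt((sum of x_i^2 - N * mean^2) / (N - 1))."""
    n = len(values)
    mean = sample_mean(values)
    return math.sqrt((sum(x * x for x in values) - n * mean ** 2) / (n - 1))


# Hypothetical measurements, invented purely for illustration.
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("mean     =", sample_mean(observations))
print("variance =", sample_variance(observations))
print("std dev  =", sample_std_dev(observations))
print("shortcut =", sample_std_dev_shortcut(observations))
# Cross-check against the standard library's sample standard deviation.
print("stdlib   =", statistics.stdev(observations))
```

The two standard deviation functions should agree up to floating-point rounding, since the second is only an algebraic rearrangement of the first.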
Again, this is an estimate of the “true deviation” (σ), represented by a lower case Greek s.
It is simply calculated as the square root of the variance.
\[ s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}} = \sqrt{\frac{\sum x_i^2 - N\bar{x}^2}{N - 1}} \]

Confidence Interval
A confidence interval (CI) gives a range of values, centered on our sample estimate, within which the true population value is expected to lie at a given level of probability.
It also provides our level of confidence for rejecting or failing to reject a null hypothesis.
\[ CI = (\bar{X}_1 - \bar{X}_2) \pm t \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]

Confidence Level
• In biology the level of confidence used is usually 95%.
• This means there is a 5% chance that our conclusion is in error!
95% Confidence interval: 95% of data will be contained within the non-shaded area of the curve
Fig. 8-15, p.121

T-test
The t-test determines the probability that two data sets are from a single population.
Hypotheses
Ho: µ1 = µ2
H1: µ1 ≠ µ2
In this example we can visually see a significant difference between the two means.
After conducting a t-test, we would reject the null hypothesis; the two means are not equal.
[Figure: back-to-back histograms of N by TAXON (Pelv, Porph); x-axis: Count]

Null vs Alternate Hypotheses
• Null Hypothesis Ho: µ1 = µ2
• By default, the null hypothesis is that there is no significant difference between our two sample means.
• Alternate Hypothesis H1: µ1 ≠ µ2

t Test Decision Rule
• If the p-value is less than alpha, reject the null hypothesis (the two means are not equal)
• If the p-value is greater than or equal to alpha, fail to reject the null hypothesis

Two-sample t-test on TEMP grouped by TREATMENT$ against Alternative = 'not equal'

Group    N      Mean        SD
None     116    16.55697    2.60453
Shade    287    14.57568    2.03032

Separate variance:
  Difference in means = 1.98130
  95.00% CI = 1.44862 to 2.51398
  t = 7.34105
  df = 174.2
  p-value = 0.00000

Pooled variance:
  Difference in means = 1.98130
  95.00% CI = 1.50322 to 2.45937
  t = 8.14733
  df = 401
  p-value = 0.00000

[Figure: back-to-back histograms of TEMP by TREATMENT (None, Shade); x-axis: Count]

Comparing more than two means
• T-tests work when we want to determine the equality of two means.
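The t and df values in the two-sample output above can be recomputed from the group summaries alone. This is a minimal sketch, assuming the standard separate-variance (Welch) and pooled-variance formulas; the function names are hypothetical, and the p-value and confidence limits are left out because they need a t-distribution table or a statistics library.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Separate-variance (Welch) t statistic and degrees of freedom."""
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Group summaries copied from the output above (None vs Shade).
none_group = (16.55697, 2.60453, 116)
shade_group = (14.57568, 2.03032, 287)

t_sep, df_sep = welch_t(*none_group, *shade_group)
t_pool, df_pool = pooled_t(*none_group, *shade_group)

print(f"Separate variance: t = {t_sep:.3f}, df = {df_sep:.1f}")
print(f"Pooled variance:   t = {t_pool:.3f}, df = {df_pool}")
```

Both t values are large, which is consistent with the very small p-values reported in the output and with rejecting the null hypothesis under the decision rule.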