Statistics

Interpreting Data

After psychologists develop a theory, form a hypothesis, make observations, and collect data, they end up with a lot of information, usually in the form of numerical data. The term statistics refers to the analysis and interpretation of these numerical data. Psychologists use statistics to organize, summarize, and interpret the information they collect.

Descriptive Statistics

To organize and summarize their data, researchers need numbers to describe what happened. These numbers are called descriptive statistics. Researchers may use histograms or bar graphs to show the way data are distributed. Presenting data this way makes it easy to compare results, see trends in data, and evaluate results quickly.

Example: Suppose a researcher wants to find out how many hours students study for three different courses. Each course has 100 students. The researcher does a survey of ten students in each of the courses. On the survey, he asks the students to write down the number of hours per week they spend studying for that course. The data look like this:

Hours of Study per Week

Course A                  Course B                  Course C
Student   Hours per week  Student   Hours per week  Student   Hours per week
Joe       9               Hannah    5               Meena     6
Peter     7               Ben       6               Sonia     6
Zoey      8               Iggy      6               Kim       7
Ana       8               Louis     6               Mike      5
Jose      7               Keesha    7               Jamie     6
Lee       9               Lisa      6               Ilana     6
Joshua    8               Mark      5               Lars      5
Ravi      9               Ahmed     5               Nick      20
Kristen   8               Jenny     6               Liz       5
Loren     1               Erin      6               Kevin     6

To get a better sense of what these data mean, the researcher can plot them on a bar graph. Histograms or bar graphs for the three courses might look like this:

[Figure: histograms of hours of study per week for Course A, Course B, and Course C]

Measuring Central Tendency

Researchers summarize their data by calculating measures of central tendency, such as the mean, the median, and the mode. The most commonly used measure of central tendency is the mean, which is the arithmetic average of the scores. The mean is calculated by adding up all the scores and dividing the sum by the number of scores.

However, the mean is not a good summary method to use when the data include a few extremely high or extremely low scores. A distribution with a few very high scores is called a positively skewed distribution. A distribution with a few very low scores is called a negatively skewed distribution. The mean of a positively skewed distribution will be deceptively high, and the mean of a negatively skewed distribution will be deceptively low. When working with a skewed distribution, the median is a better measure of central tendency. The median is the middle score when all the scores are arranged in order from lowest to highest.

Another measure of central tendency is the mode. The mode is the most frequently occurring score in a distribution.

Statistics is a branch of mathematics. Psychologists need a solid foundation in math to describe, analyze, and summarize the results of their research.
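The three measures of central tendency can be checked against the survey data. The following is a minimal sketch using Python's standard `statistics` module; the dictionary name `hours` and the helper `central_tendency` are illustrative names, not part of the original material.

```python
from statistics import mean, median, mode

# Hours of study per week, copied from the survey table above.
hours = {
    "Course A": [9, 7, 8, 8, 7, 9, 8, 9, 8, 1],
    "Course B": [5, 6, 6, 6, 7, 6, 5, 5, 6, 6],
    "Course C": [6, 6, 7, 5, 6, 6, 5, 20, 5, 6],
}


def central_tendency(scores):
    """Return (mean, median, mode) for a list of scores."""
    return mean(scores), median(scores), mode(scores)


# One (mean, median, mode) tuple per course.
summary = {course: central_tendency(scores) for course, scores in hours.items()}

for course, (m, med, mo) in summary.items():
    print(f"{course}: mean={m:.1f} median={med} mode={mo}")
```

In Course A, one very low score (1 hour) pulls the mean down to 7.4 while the median stays at 8. In Course C, one very high score (20 hours) pushes the mean up to 7.2 while the median stays at 6. This is the skew effect described above.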

Measuring Variation

Measures of variation tell researchers how much the scores in a distribution differ. Examples of measures of variation include the range and the standard deviation. The range is the difference between the highest and the lowest scores in the distribution. Researchers calculate the range by subtracting the lowest score from the highest score. The standard deviation provides more information about the amount of variation in scores. It tells a researcher the degree to which scores vary around the mean of the data.
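As a rough illustration, the range and standard deviation of the study-hours data can be computed as follows. This sketch assumes the ten surveyed students in each course are treated as a sample of the 100 enrolled, so it uses `statistics.stdev` (which divides by N-1). The names `hours`, `score_range`, and `variation` are illustrative.

```python
from statistics import stdev

# Hours of study per week from the survey example (ten students per course).
hours = {
    "Course A": [9, 7, 8, 8, 7, 9, 8, 9, 8, 1],
    "Course B": [5, 6, 6, 6, 7, 6, 5, 5, 6, 6],
    "Course C": [6, 6, 7, 5, 6, 6, 5, 20, 5, 6],
}


def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


# (range, sample standard deviation) for each course.
variation = {c: (score_range(s), stdev(s)) for c, s in hours.items()}

for course, (rng, sd) in variation.items():
    print(f"{course}: range={rng} standard deviation={sd:.2f}")
```

Course B, where every student reported between 5 and 7 hours, has the smallest range and the smallest standard deviation of the three courses.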

Inferential Statistics

After analyzing statistics, researchers make inferences about how reliable and significant their data are. Inferential statistics are used to interpret data and draw conclusions. They tell psychologists whether or not they can generalize from the chosen sample to the whole population, that is, whether the sample actually represents the population. Inferential statistics use rules to evaluate the probability that a correlation or a difference between groups reflects a real relationship and not just the operation of chance factors on the particular sample that was chosen for study.

Statistical significance (p) is a measure of how likely it is that a difference between groups would arise from chance alone rather than from a real difference between the two groups. Results are likely to be statistically significant when there is a large difference between the means of the two frequency distributions, when their standard deviations (SD) are small, and when the samples are large. Some psychologists consider that results are significantly different only if the results have less than a 1 in 20 probability of being caused by chance (p < .05). Others consider that results are significantly different only if the results have less than a 1 in 100 probability of being caused by chance (p < .01). The lower the p value, the less likely the results were due to chance.

Results of research that are statistically significant may be practically important or trivial. Statistical significance does not imply that findings are really important. Meta-analysis provides a way of statistically combining the results of individual research studies to reach an overall conclusion. Scientific conclusions are always tentative and open to change should better data come along. Good psychological research gives us an opportunity to learn the truth.

• Percentile Rank – A percentage that describes your rank among those also being evaluated. For example, if your percentile rank on a test is 90, then your score is higher than 90% of the class. It is impossible to get a percentile rank of 100 because you cannot score higher than everyone in the class, including yourself.
• Mean – The average score. Add up all the numbers and divide by the number of terms. The mean of {2,2,3,10,98} is 23.
• Median – The middle point of all the terms, such that half are above it and half are below it (the 50th percentile). Arrange the numbers from highest to lowest or vice versa and find the number in the middle. The median of {2,2,3,10,98} is 3.
• Mode – The number that occurs the most. Count to see which number appears the most. The mode of {2,2,3,10,98} is 2.
• Range – The difference between the highest number and the lowest number. For example, GPA scores run from 0.0 to 4.0, so their range is 4.0.
• Standard Deviation – A measurement of how far scores differ (deviate) from the mean. The standard deviation of {5,6,5,6,6,7,5,4} is very low because the terms hardly deviate from the mean of 5.5, whereas the standard deviation of {5,10,8,18,-6,5,-7,22} is high.
• Variance = s², the square of the standard deviation s.
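Written out as formulas, the mean, variance, and standard deviation look as follows. This is a sketch of the sample (N-1) form, which is the form the step-by-step method in this guide uses; the subscript i is notation introduced here, while X, M, and N follow the guide's own symbols.

```latex
M = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad
s^2 = \frac{\sum_{i=1}^{N} (X_i - M)^2}{N - 1}, \qquad
s = \sqrt{s^2}
```

Dividing by N instead of N-1 gives the population version of the variance and standard deviation.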

Standard Deviation Method Example: Find the standard deviation of 1, 2, 3, 4, 5.

Step 1: Calculate the mean (M) and each deviation from it.

X    M    (X-M)    (X-M)²
1    3    -2       4
2    3    -1       1
3    3     0       0
4    3     1       1
5    3     2       4

Step 2: Find the sum of (X-M)².
4+1+0+1+4 = 10

Step 3: N = 5, the total number of values. Find N-1.
5-1 = 4

Step 4: Now find the standard deviation using the formula.
√10/√4 = √(10/4) ≈ 1.5811

Another example:
1. Find the standard deviation of {2,3,3,4}.
2. Find the mean. (2+3+3+4)/4 = 3
3. Subtract the mean from each term and square the result. (2-3)²=1, (3-3)²=0, (3-3)²=0, (4-3)²=1
4. Find the average of the squared deviations from the mean. (1+0+0+1)/4 = 0.5
5. Take the square root of the average; that is the standard deviation. (0.5)^(1/2) = 0.7071
6. Normally this number should be rounded to the same decimal place as the data (0.7071 → 1), but 0.7071 is shown for better understanding.

Note that this second example divides by N (the population formula), whereas the first example divides by N-1 (the sample formula).

• Normal curve (the 68-95-99.7 rule) – More commonly known as the bell curve, this is a distribution in which about 68% of the scores fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations, and about 99.7% fall within 3 standard deviations.
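Both worked examples and the 68-95-99.7 rule can be reproduced in a few lines. This is only a sketch: the function names `standard_deviation` and `share_within` and the `sample` flag are illustrative choices, and the normal-curve check assumes an ideal normal distribution via `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist, pstdev, stdev


def standard_deviation(scores, sample=True):
    """Follow the worked steps: mean, squared deviations, sum, divide, root."""
    n = len(scores)
    m = sum(scores) / n                                   # the mean M
    squared_deviations = [(x - m) ** 2 for x in scores]   # each (X-M)²
    divisor = n - 1 if sample else n                      # N-1 (sample) or N (population)
    return sqrt(sum(squared_deviations) / divisor)


first = standard_deviation([1, 2, 3, 4, 5], sample=True)   # first example, divides by N-1
second = standard_deviation([2, 3, 3, 4], sample=False)    # second example, divides by N
print(f"first example: {first:.4f}, second example: {second:.4f}")

# Cross-check against the standard library's sample and population versions.
print(stdev([1, 2, 3, 4, 5]), pstdev([2, 3, 3, 4]))

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of an ideal normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.1f}%")
```

The `sample` flag makes the difference between the two worked examples explicit: the same steps apply, and only the divisor changes.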

• Scatterplot – A graphical representation of data using dots. How tightly the dots cluster and the direction in which they slope indicate the correlation between the two variables.
• Correlation – The relationship between two events. For example, traffic accidents increase with increasing temperatures; business drops after Christmas ends.
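To make "vary together" concrete, here is a sketch that puts a number on such a relationship using Pearson's correlation coefficient r, one common choice. The function name `pearson_r` and all four data lists are hypothetical, invented purely for illustration; they are not real measurements.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How the two variables move together, and how much each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable never changes")
    return co_variation / sqrt(ss_x * ss_y)


# Hypothetical data, made up for this example only.
study_hours = [1, 2, 3, 4, 5, 6]
grades = [52, 60, 61, 70, 74, 83]
sleep_hours = [8, 7, 6, 5, 4, 3]
accidents = [1, 1, 2, 3, 3, 5]

r_positive = pearson_r(study_hours, grades)     # both rise together
r_negative = pearson_r(sleep_hours, accidents)  # one falls as the other rises
print(f"study vs grades: r = {r_positive:.2f}")
print(f"sleep vs accidents: r = {r_negative:.2f}")
```

The result always lies between -1.00 and 1.00. Its sign gives the direction of the relationship and its size gives the strength.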

• Correlation Coefficient – A number that measures correlation, that is, how strongly two events vary together.
• Positive Correlation – The two events increase or decrease together. For example, increasing study time correlates positively with increasing grades, and decreased food consumption correlates positively with decreased excitability. Positive correlation coefficients are positive numbers ranging from 0.00 (no correlation) to 1.00 (perfect correlation). In a scatterplot, a positive correlation shows up as a positive slope.
• Negative Correlation – One event increases as the other decreases, or vice versa. For example, the number of hours of sleep correlates negatively with traffic accidents, and alcohol consumption correlates negatively with alertness. Negative correlation coefficients are negative numbers ranging from –1.00 (perfect correlation) to 0.00 (no correlation). In a scatterplot, a negative correlation shows up as a negative slope.

* Be sure to remember that CORRELATION DOES NOT NECESSARILY MEAN CAUSATION. If car accidents increase with increasing temperatures, it does not necessarily mean that hot temperatures cause more traffic accidents!

• Be aware of ILLUSORY CORRELATION: seeing a relationship between things when there is none. If you believe that black-colored dogs are more aggressive than white-colored dogs, then you will be more likely to notice and recall events where black-colored dogs show aggressiveness, which seems to confirm your belief (a form of confirmation bias).
• Regression toward the mean – The tendency for extreme values to go back (“regress”) toward the average value (mean). For example, if you normally get 80% on your tests and suddenly get an extreme (unusual) score of 50%, then on your next test you are likely to get around 80% again.
• Statistical Significance – A measure of how likely it is that a result is due to chance alone. For example, if the difference between the average marks of two classes is statistically significant, then the marks are likely to be truly different, not different merely because of random chance or sampling error. Statistical significance is usually determined by mathematical analysis of the samples.
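One simple way to see what "due to chance alone" means is a permutation test: shuffle the marks between the two classes many times and count how often a random split produces a difference in means at least as large as the one observed. This is a sketch of that one technique, not necessarily the analysis a given study would use (t-tests and related methods are common). The function name, the parameters `n_shuffles` and `seed`, and both lists of marks are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often chance alone gives a mean difference at least as large as observed."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    n_a = len(group_a)

    def mean_difference(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    pooled = list(group_a) + list(group_b)
    observed = mean_difference(pooled)

    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # randomly reassign marks to the two classes
        if mean_difference(pooled) >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


# Hypothetical marks for two classes, invented for illustration only.
class_1 = [78, 85, 82, 90, 88, 84, 79, 91]
class_2 = [70, 72, 68, 75, 74, 71, 69, 73]

p = permutation_p_value(class_1, class_2)
print(f"estimated p = {p:.4f}")
print("significant at p < .05" if p < 0.05 else "not significant at p < .05")
```

A small estimated p means that random reshuffling rarely reproduces the observed gap, so chance alone is an unlikely explanation for the difference between the classes.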