Statistics 110

Cedar Crest College Statistics 110 Lecture Notes Author: E-Mail Address: James Hammer [email protected] June 23, 2015 Preface This is meant as a teaching aid. It can be freely distributed and edited in any way. For a copy of the LATEX document, please email the author. ii Contents 1 Nature of Probability and Statistics1 1.1 Descriptive and Inferential Statistics......................1 1.2 Variables and Types of Data...........................3 1.3 Data Collection and Sampling Techniques...................5 1.4 Experimental Design...............................6 2 Frequency Distribution and Graphs7 2.1 Organizing Data.................................7 2.2 Histograms, Frequency Polygons, and Ogives.................. 10 2.3 Other Types of Graphs.............................. 12 3 Data Description 19 3.1 Measures of Central Tendencies......................... 19 3.2 Measures of Variation............................... 26 3.3 Measures of Position............................... 33 3.4 Exploratory Data Analysis............................ 38 4 Probability and Counting Rules 41 4.1 Sample Spaces and Probability......................... 41 4.2 The Addition Rules for Probability....................... 47 4.3 The Multiplication Rules & Conditional Probability.............. 49 4.4 Counting Rules.................................. 56 4.5 Probability and Counting Rules......................... 62 5 Discrete Probability Distributions 65 5.1 Probability Distributions............................. 65 5.2 Mean, Variance, Standard Deviation, and Expectation............ 67 5.3 The Binomial Distribution............................ 72 6 The Normal Distribution 75 6.1 Normal Distributions............................... 75 6.2 Applications of the Normal Distribution.................... 77 6.3 The Central Limit Theorem........................... 80 iii 7 Confidence Intervals and Sample Size 83 7.1 Confidence Intervals for the Mean Shen σ is Known.............. 83 7.2 Confidence Intervals for the Mean When σ is Unknown............ 87 7.3 Confidence Intervals and Sample Size for Populations............. 89 8 Hypothesis Testing 93 8.1 Traditional Method................................ 93 8.2 z Test for a Mean................................. 95 8.3 t Test for a Mean................................. 98 8.4 z Test for a Proportion.............................. 100 10 Correlation and Regression 103 10.1 Scatter Plots and Correlation.......................... 103 iv Chapter 1 Nature of Probability and Statistics 1.1 Descriptive and Inferential Statistics Definition 1 (Statistics). Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data. Definition 2 (Variable). A variable is a characteristic of attribute that can assume different values. Definition 3 (Data). The data are the values (measurements or observations) that the variables can assume. Variables whose values are determined by chance are called random variables. Definition 4 (Population). A population consists of all subjects (human or otherwise) that are being studied. Definition 5 (Sample). A sample is a group of subjects selected from a population. Definition 6 (Descriptive Statistics). Descriptive statistics consists of the collection, organization, summarization, and presentation of data. Definition 7 (Inferential Statistics). Inferential statistics consists of generalizing from sample to populations, performing estimations and hypothesis tests, determining relationship among variables and making predictions. Definition 8 (Probability). Probability is the measure of the likeliness that an event will occur. Probability is quantified as a number between 0 and 1. The higher the probability of an event, the more likely the event will occur. Definition 9 (Hypothesis Testing). The area of inferential statistics called hypothesis testing is a decision-making process for evaluating claims about a population based on information obtained from samples. 1 Hammer 2 Statistics 110 Lecture Notes Problem 10. Determine whether descriptive statistics or inferential statistics were used. Explain why. a. The average jackpot for the top five lottery winners was $367:6 million. b. A study done by the American Academy of Neurology suggested that older people who had a high caloric diet more than doubled their risk of memory loss. c. Based on a survey of 9317 consumers done by the National Retail Federation, the average amount that consumers spent on Valentines Day in 2011 was $116. d. Scientists at the University of Oxford in England found that a good laugh significantly raises a person's pain level tolerance. Suggested Problems: 1-19. Cedar Crest College June 23, 2015 1.2. Variables and Types of Data Hammer 3 1.2 Variables and Types of Data Definition 11 (Qualitative Variables). Qualitative variables are variables that have dis- tinct categories according to some characteristic or attribute. Definition 12 (Quantitative Variables). Quantitative variables are variables that can be counted or measured. Definition 13 (Discrete Variables). Discrete variables assume values that can be counted. Definition 14 (Continuous Variables). Continuous variables can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals. Problem 15. Classify each of the variables as a discrete variable or as a continuous variable. a. The highest wind speed of a hurricane. b. The weight of baggage on an airplane. c. The number of pages in these lecture notes. d. The amount of money a person spends per year for online purchases. June 23, 2015 Cedar Crest College Hammer 4 Statistics 110 Lecture Notes Definition 16 (Boundary). The boundary of a number is defined as a class in which a data value would be placed before the data values are rounded. Problem 17. Find the boundaries of each variable. a.8 :4 quarts b. 138 mmHg c. 137:63 mg/dL Definition 18 (Nominal Level of Measurement). The nominal level of measurement classifies data into mutually exclusive (nonoverlapping) categories in which no order or rank- ing can be imposed on the data. Definition 19 (Ordinal Level of Measurement). The ordinal level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist. Definition 20 (Interval Level of Measurement). The interval level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero. Definition 21 (Ratio Level of Measurement). The ratio level of measurement possesses all of the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population. Problem 22. What level of measurement would be used to measure each variable? a. The age of patients in a local hospital. b. The rating of movies released this month. c. Colors of athletic shirts sold by the Iron Pigs. d. Temperatures of hot tubs in local health clubs. Suggested Problems: 5 − 22 Cedar Crest College June 23, 2015 1.3. Data Collection and Sampling Techniques Hammer 5 1.3 Data Collection and Sampling Techniques Definition 23 (Random Sample). A random Sample is a sample in which all members of the population have an equal chance of being selected. Definition 24 (Systematic Sample). A systematic sample is a sample obtained by selecting every kth member of the population. Definition 25 (Stratified Sample). A stratified sample is a sample obtained by dividing the population into subgroups or strata according to some characteristic relevant to the study. The subjects are selected from each subgroup. Definition 26 (Cluster Sample). A cluster sample is obtained by dividing the population into sections or clusters and then selecting one or more clusters and using all members in the cluster(s) as the members of the sample. Definition 27 (Sampling Error). Sampling error is the difference between the results obtained from a sample and the results obtained from the population from which the sample was selected. Definition 28 (Nonsampling Error). A nonsampling error occurs when the data are obtained erroneously or the sample is biased (non-representative). Problem 29. State which sampling method was used in the following: a. Out of 10 hospitals in the municipality, a researcher selects one and collects records for a 24-hour period on the types of emergencies that were treated here. b. A researcher divides a group of students according to gender; major field; and low, average, and high grade point average. Then she randomly selects six students from each group to answer questions in the survey. c. The subscribers to a magazine are numbered. Then a sample of these people is selected using random numbers. d. Every 10th bottle of Super-Duper Cola is selected, and the amount of liquid in the bottle is measured. The purpose is to see if the machines that fill the bottles are working properly. Suggested Problems: 11 − 16 June 23, 2015 Cedar Crest College Hammer 6 Statistics 110 Lecture Notes 1.4 Experimental Design Definition 30 (Observational Study). In an observational study, the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations. Definition 31 (Experimental Study). In an experimental study, the researcher manip- ulates one of the variables and tries to determine how the manipulation influences other variables. Definition 32 (Variable Types). The independent variable in an

Statistics 110

Programming in Stata

The Cascade Bayesian Approach: Prior Transformation for a Controlled Integration of Internal Data, External Data and Scenarios

What Is Bayesian Inference? Bayesian Inference Is at the Core of the Bayesian Approach, Which Is an Approach That Allows Us to Represent Uncertainty As a Probability

Basic Econometrics / Statistics Statistical Distributions: Normal, T, Chi-Sq, & F

Weighted Means and Grouped Data on the Graphing Calculator

Practical Meta-Analysis -- Lipsey & Wilson Overview

Lecture 3: Measure of Central Tendency

Statistical Power in Meta-Analysis Jin Liu University of South Carolina - Columbia

Descriptive Statistics Frequency Distributions and Their Graphs

Flood Frequency Analysis Using Copula with Mixed Marginal Distributions

Formulas Used by the “Practical Meta-Analysis Effect Size Calculator”

Categorical Data Analysis