Lecture 1: Introduction to Data and Distributions
Chapter 1
Important Things
• www.stat.purdue.edu/~xuanyaoh/stat350
• Syllabus
– Textbook
– Classroom Locations
– Policy: Hw/Lab/Class Participation/Exams
– Exam Schedule
– SAS
The Required Textbook
What is Statistics?
• Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
• Suppose we want to know how well Purdue students have done in MA162 over the past 5 years. What can we do to find out?
– Find all MA162 records and check them: too many to look at, not realistic in most cases
– Draw a number of records and try to make a reasonable guess: statistics comes into play!
Population vs. Samples
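The second approach, drawing a number of records and making a reasonable guess, can be sketched in Python. Everything here is an assumption for illustration: the 5,000 scores are simulated rather than real MA162 records, and the names `population` and `sample` and the sample size of 100 are hypothetical.

```python
import random
import statistics

random.seed(350)  # fixed seed so the sketch is reproducible

# Hypothetical population: 5,000 simulated course scores (NOT real MA162 records),
# clipped to the 0-100 range.
population = [min(100.0, max(0.0, random.gauss(72, 12))) for _ in range(5000)]

# Draw a simple random sample of 100 records without replacement.
sample = random.sample(population, 100)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # the "reasonable guess"

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

In practice only the sample mean is available. The population mean is computed here only to show that the guess from 100 records tends to land close to it.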
• Population: all objects of interest
• Sample: a subset of the population
Examples of Data
• Results from making observations on one or more variables
– x = score on a STAT 350 midterm exam
– Univariate data or one variable
• (x, y) = height and weight of a STAT 350 student
– Bivariate data or two variables
• Etc.
Types of Variables
Two Terminologies
• Descriptive Statistics
– Summarize and describe important features of data
– Numerical summary measures: mean, median, standard deviation …
– Graphic, visual display: histogram, scatter plot …
• Inferential Statistics
– Formal “guesses” we make about the population by looking at the sample
– Common types of inferential statistics are confidence intervals and significance tests
1.2 Descriptive Statistics: Graphical
• The scores of 30 undergraduate students
• A graphical display, such as the histogram in the previous slide, gives us a rough idea of the data as a whole; it is very informative and clear
• Numerical measures, such as the mean and standard deviation in the previous slide, give us a quantitative measure of the center and spread of the data
Visual Displays of Data
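As a minimal Python sketch of both kinds of descriptive summary, the block below computes the mean and sample standard deviation and prints a text histogram built from the count of each value. The ten scores are made up for illustration; they are not the 30 scores from the lecture.

```python
import statistics
from collections import Counter

# Hypothetical quiz scores (illustrative only).
scores = [7, 8, 8, 9, 9, 9, 10, 10, 6, 9]

# Numerical summaries: center and spread.
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)
print(f"mean = {mean:.2f}, sd = {sd:.2f}")

# Graphical summary: count each distinct value and draw one '*' per occurrence.
counts = Counter(scores)
for value in sorted(counts):
    percent = 100 * counts[value] / len(scores)
    print(f"{value:>2} | {'*' * counts[value]} ({percent:.0f}%)")
```

The same counts can be reported as frequencies or as percents. Only the label on the vertical axis changes, not the shape of the display.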
• Histogram: see the previous example; will be discussed in detail
• Dot plot: self-reading (Sec. 1.2)
• Stem and leaf: see later example (Sec. 1.2)
• Bar graph or chart: self-reading (Sec. 1.2)
• Scatterplot: discussed later
• We won’t discuss them all, but you should cover them in your reading and be comfortable with them all.
Histogram for Discrete Data
• Based on the previous example,
– To get the histogram, just count the occurrences of each value of the variable and plot the counts (frequencies) on the vertical axis
– Can display as frequencies (counts) or percents
Continuous Data
• Subdivide the x-axis into a number of class intervals (or classes), then plot the frequency or relative frequency for each class
• Define the boundaries of the classes carefully to prevent observations from falling on boundaries (read Pg.14)
• The class size may greatly influence how the histogram looks
– Big class interval: a few big rectangles
– Small class interval: many small rectangles
Example: Ex. 8 from Text (Pg. 14)
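Separately from the textbook exercise, the effect of class width on a histogram's shape can be sketched in Python. The data values, the function name `class_counts`, and the starting point of 40 are all hypothetical choices for illustration.

```python
from collections import Counter

def class_counts(data, width, start=0.0):
    """Count observations in half-open classes [start + k*width, start + (k+1)*width).

    Half-open classes put a value that sits on a boundary into exactly one class.
    Classes with no observations are omitted from the result.
    """
    counts = Counter(int((x - start) // width) for x in data)
    return {start + k * width: counts[k] for k in sorted(counts)}

# Hypothetical exam scores (illustrative only).
data = [52, 55, 61, 63, 64, 68, 70, 71, 73, 75, 77, 78, 82, 85, 91]

for width in (20, 5):
    print(f"class width = {width}")
    for left, n in class_counts(data, width, start=40).items():
        print(f"  [{left:g}, {left + width:g}) {'#' * n}")
```

With a width of 20 the same 15 values collapse into three wide classes, while a width of 5 spreads them over nine narrow ones. The data are unchanged; only the visual impression differs.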
How to choose the class width?
• Although the class width doesn’t change the distribution, it can change your visual understanding of the distribution
• A rule of thumb for determining a reasonable number of classes is provided by your text
Relative Frequency vs. Density
Why “Densities”?
Interpreting Histograms
Dot Plot (self-reading in Sec. 1.2)
Stem Plot (self-reading)
Hank Aaron Example
1.3 Distributions
Continuous Distributions
Continuous Distribution: Density Function
Examples
About SAS …
• Read the “SAS” section in the syllabus, and also the instructions on the course website
When you go home…
• Read over the syllabus carefully before you make a decision!
• Get the textbook
• Read the “SAS” part of the syllabus and the instructions on the course website
• Read/review Sections 1.1, 1.2 and 1.3
• Start doing Hw#1 and Lab#1, posted on the website
• No lab this Wednesday, so go to the regular Wed classroom.
• To preview, read Sections 1.3 (discrete distribution, mass function), 1.4 and 1.5