# TC18

Basic Stats with Tableau

Tyler Martin, Senior Software Engineer, Tableau Software

[email protected]

Agenda

• Confidence intervals
• Hypothesis testing
• Trend lines
• Forecasting
• Q&A

Confidence Intervals

Confidence Interval: Definition

Definition: For 95% of samples, the confidence interval will contain the population average.
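The definition can be checked with a small simulation. The sketch below is illustrative only: the population (mile-run times with mean 10 minutes and standard deviation 2), the sample size, and the helper names `confidence_interval` and `coverage` are all assumptions, and the interval uses the normal approximation (z = 1.96) rather than whatever Tableau computes internally.

```python
import math
import random
import statistics


def confidence_interval(sample, z=1.96):
    """Approximate 95% confidence interval for the mean (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, based on the sample standard deviation.
    sem = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * sem, mean + z * sem


def coverage(population_mean, population_sd, n, trials, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(population_mean, population_sd) for _ in range(n)]
        low, high = confidence_interval(sample)
        if low <= population_mean <= high:
            hits += 1
    return hits / trials


# Hypothetical population of mile-run times, in minutes.
rate = coverage(population_mean=10.0, population_sd=2.0, n=50, trials=2000)
print(f"{rate:.1%} of intervals contained the population average")
```

The printed share should land close to 95%, which is the "for 95% of samples" part of the definition.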

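For a particular sample: Loosely speaking, there is a 95% chance that the confidence interval contains the population average. Strictly, the 95% describes how often the procedure succeeds across repeated samples.

Confidence Intervals Answer Questions Like…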

What does my sample of mile-run times tell me about the average 2nd grader in Seattle?

Is the average 2nd grader in Seattle likely to run a faster mile than me?

Hypothesis Testing

Hypothesis Testing: Test Statistic

A value calculated from your data

This value always follows the same distribution, regardless of the distribution of your data*

Hypothesis Testing: Procedure

1. State the hypothesis and the null hypothesis

2. Choose an appropriate test statistic. This usually follows a well-known distribution.

3. Choose a threshold probability. This is usually small; we will use 0.005 (0.5%).

Hypothesis Testing: Procedure (continued)

4. Calculate the p-value: the probability, under the null hypothesis, of a test statistic at least as extreme as what we observed.

5. Accept or reject the hypothesis. Accept (reject the null hypothesis): p < 0.005. Reject (fail to reject the null hypothesis): p ≥ 0.005.

Student’s t-test
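The five-step procedure can be sketched end to end in a few lines. This is a minimal illustration, not the method used in the talk: the lift numbers are made up, the helper names are hypothetical, and the p-value comes from a permutation test (shuffling group labels) swapped in for the t-distribution lookup so that only the Python standard library is needed. The test statistic is Welch's two-sample t statistic, which equals Student's when both groups have the same size.

```python
import math
import random
import statistics


def t_statistic(a, b):
    """Welch's two-sample t statistic for the difference in means."""
    var_a = statistics.variance(a) / len(a)
    var_b = statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(var_a + var_b)


def permutation_p_value(a, b, shuffles=2000, seed=0):
    """Two-sided p-value: share of label shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(shuffles):
        # Under the null hypothesis the group labels are interchangeable.
        rng.shuffle(pooled)
        if abs(t_statistic(pooled[:len(a)], pooled[len(a):])) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (shuffles + 1)


# Step 1: hypothesis = the yearly means differ; null hypothesis = they are equal.
# Hypothetical lift totals in kilograms; these are not real CrossFit Games data.
lifts_2007 = [150, 162, 158, 171, 165, 149, 160, 155]
lifts_2018 = [182, 175, 190, 168, 185, 179, 194, 172]

threshold = 0.005                                          # Step 3
p_value = permutation_p_value(lifts_2018, lifts_2007)      # Steps 2 and 4
decision = "accept" if p_value < threshold else "reject"   # Step 5
print(f"p = {p_value:.4f}: {decision} the hypothesis")
```

With real data, a library routine for the t-test would normally replace the shuffling loop; the steps stay the same.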

Test statistic follows Student’s t-distribution

We will use a two-sample location test

Tests the null hypothesis that the means of two populations are equal

Hypothesis Testing Can Answer Questions Like…

Are CrossFit Games athletes stronger on average in 2018 than they were in 2007? (t-test)

Are observations of two groups independent of one another? (Chi-squared test)

Is my sample drawn from a normally distributed population? (Shapiro-Wilk test)

Trend Lines

Trend Lines: Null Hypothesis

What if there is no relationship?

Trend Lines: Residuals

Trend Lines: OLS Questions

1. Do I suspect there is a relationship between two variables? What do I suspect that relationship is?

2. Do the residuals have mean = 0? Do they appear unrelated to the independent variable?

3. Are the residuals unlikely to be correlated with one another?

4. Does the spread of the residuals look roughly the same with changes in the independent variable?

Trend Lines Answer Questions Like…

What is the relationship between profit and CEO compensation?

When wind speed changes, how does windmill power output change?
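A question like this one can be explored with an ordinary least squares (OLS) line and a few numeric checks on its residuals, mirroring the OLS questions. The sketch below is an assumption-laden illustration: the wind and power readings are invented, and `fit_line` and `pearson` are hypothetical helpers, not Tableau functions.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_u, mean_v = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(ss_u * ss_v)


# Hypothetical readings: wind speed (m/s) and windmill power output (kW).
wind = [3, 4, 5, 6, 7, 8, 9, 10]
power = [21, 30, 38, 52, 59, 71, 78, 92]

slope, intercept = fit_line(wind, power)
residuals = [p - (slope * w + intercept) for w, p in zip(wind, power)]

# Question 2: with an intercept, OLS residuals average zero by construction,
# so in practice this check is about visible patterns such as curvature.
mean_residual = statistics.fmean(residuals)

# Question 3: correlation between each residual and the next one (lag 1).
lag1 = pearson(residuals[:-1], residuals[1:])

# Question 4: compare residual spread at low and high wind speeds.
half = len(residuals) // 2
spread_low = statistics.pstdev(residuals[:half])
spread_high = statistics.pstdev(residuals[half:])

print(f"power changes by about {slope:.1f} kW per m/s of wind")
print(f"mean residual {mean_residual:.2e}, lag-1 correlation {lag1:.2f}")
print(f"residual spread: low {spread_low:.2f}, high {spread_high:.2f}")
```

These numbers supplement a residual plot rather than replace it; eight points are far too few to draw firm conclusions.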

Does compensation change in a meaningful way when age changes?

Forecasting

Forecasting: Model Quality

We will consider only Mean Absolute Scaled Error (MASE)

MASE compares the error of your model with the error of the naïve forecast

MASE is typically between 0 (good) and 1 (bad); a value above 1 means the model's error is larger than the naïve forecast's.

Forecasting: Naïve Forecast

Forecast values copied from the last observed value.
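The naïve forecast and MASE can be sketched as follows. This uses one common definition of MASE (the model's mean absolute error divided by the mean absolute error of the in-sample naïve forecast); Tableau's exact calculation may differ, and the visit counts, fitted values, and function names are hypothetical. Setting the assumed `season_length` parameter above 1 copies the last observed season instead of the last value.

```python
import statistics


def naive_forecast(history, horizon, season_length=1):
    """Copy the last observed value (or last observed season) forward."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mase(actual, predicted, season_length=1):
    """Model mean absolute error divided by the in-sample naive forecast's."""
    model_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    # The naive forecast for each point is the value one season earlier.
    naive_errors = [
        abs(actual[i] - actual[i - season_length])
        for i in range(season_length, len(actual))
    ]
    return statistics.fmean(model_errors) / statistics.fmean(naive_errors)


# Hypothetical page visits and a model's fitted values for the same periods.
visits = [100, 110, 105, 120, 125, 130]
fitted = [102, 108, 109, 118, 124, 131]

score = mase(visits, fitted)
next_three = naive_forecast(visits, horizon=3)
print(f"MASE = {score:.2f}; naive forecast for the next 3 periods: {next_three}")
```

A score well below 1 says the model tracks the data more closely than simply repeating the previous value.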

For seasonal forecasts, values are copied from the last observed season.

Forecasting: Unexpected and Poor Forecast

1. Does it look like there is a in my data?

2. Is there a lot of short-scale variation at the current date level?

Forecasts Answer Questions Like…

How many visitors to my page can I expect in the future, given data on past visits?

Based on past data, what will my inventory be in the future?

How is the value of my collection likely to change in the future?

Questions

Please complete the session survey from the Session Details screen in your TC18 app