Soc500: Applied Social Statistics
Week 1: Introduction and Probability

Brandon Stewart
Princeton
September 14, 2016

These slides are heavily influenced by Matt Blackwell, Adam Glynn and Matt Salganik. The spam filter segment is adapted from Justin Grimmer and Dan Jurafsky. Illustrations by Shay O'Brien.

Where We've Been and Where We're Going...

Last Week
- methods camp
- pre-grad school life

This Week
- Wednesday
  - welcome
  - basics of probability

Next Week
- random variables
- joint distributions

Long Run
- probability → inference → regression

Questions?

Welcome and Introductions

Soc500: Applied Social Statistics

I ...
- am an Assistant Professor in Sociology
- am trained in political science and statistics
- do research in methods and statistical text analysis
- love doing collaborative research
- talk very quickly

Your Preceptors
- sage guides of all things
- Ian Lundberg
- Simone Zhang

1 Welcome
2 Goals
3 Ways to Learn
4 Structure of Course
5 Introduction to Probability
  - What is Probability?
  - Sample Spaces and Events
  - Probability Functions
  - Marginal, Joint and Conditional Probability
  - Bayes' Rule
  - Independence
6 Fun With History

The Core Strategy

- Goal: get you ready to do quantitative work
- First in a two-course sequence, plus a replication project (for graduate students, part of a longer arc)
- Difficult course, but with many resources to support you
- When we are done, you will be able to teach yourself many things

Specific Goals

For the semester:
- critically read, interpret, and replicate the quantitative content of many articles in the quantitative social sciences
- conduct, interpret, and communicate results from analysis using multiple regression
- explain the limitations of observational data for making causal claims
- write clean, reusable, and reliable R code
- feel empowered working with data

For the year:
- conduct, interpret, and communicate results from analysis using generalized linear models
- understand the fundamental ideas of missing data, modern causal inference, and hierarchical models
- build a solid, reproducible research pipeline to go from raw data to final paper
- gain the tools to produce your own research (e.g. the second-year empirical paper)

Why R?

- It's free
- It's extremely powerful, but relatively simple to do basic stats
- It's the de facto standard in many applied statistical fields
  - great community support
  - continuing development
  - massive number of supporting packages
- It will help you do research
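As a small illustration of the "relatively simple to do basic stats" point, here is a minimal R sketch; the data are simulated purely for this example and are not from the course.

```r
# Minimal sketch: descriptive statistics and a regression in base R.
# The data below are simulated for illustration only.
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)   # 100 draws from a normal distribution
y <- 3 + 0.5 * x + rnorm(100)       # a linear relationship plus noise

mean(x)            # sample mean
sd(x)              # sample standard deviation
fit <- lm(y ~ x)   # fit a linear regression of y on x
coef(fit)          # intercept and slope estimates
```

Each of these is a single built-in function call, which is much of what "simple to do basic stats" means in practice.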

Why RMarkdown?

[Figures: a traditional copy-and-paste workflow ("what you've done before") versus an RMarkdown workflow, from Baumer et al (2014)]


Mathematical Prerequisites

- No formal prerequisites
- Balancing rigor and intuition
  - no rigor for rigor's sake
  - we will tell you why you need the math, but also feel free to ask
- We will teach you any math you need as we go along
- Crucially, though: this class is not about statistical aptitude, it is about effort

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 12 / 62 Problem Sets reinforce understanding of material, practice Piazza ask questions of us and your classmates Office Hours ask even more questions.

Lecture learn broad topics Precept learn data analysis skills, get targeted help on assignments Readings support materials for lecture and precept

Ways to Learn

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 13 / 62 Problem Sets reinforce understanding of material, practice Piazza ask questions of us and your classmates Office Hours ask even more questions.

Precept learn data analysis skills, get targeted help on assignments Readings support materials for lecture and precept

Ways to Learn

Lecture learn broad topics

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 13 / 62 Problem Sets reinforce understanding of material, practice Piazza ask questions of us and your classmates Office Hours ask even more questions.

Readings support materials for lecture and precept

Ways to Learn

Lecture learn broad topics Precept learn data analysis skills, get targeted help on assignments

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 13 / 62 Problem Sets reinforce understanding of material, practice Piazza ask questions of us and your classmates Office Hours ask even more questions.

Ways to Learn

Lecture learn broad topics Precept learn data analysis skills, get targeted help on assignments Readings support materials for lecture and precept

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 13 / 62 I Fox (2016) Applied and Generalized Linear Models

I Angrist and Pischke (2008) Mostly Harmless

I Imai (2017) A First Course in Quantitative Social Science*

I Aronow and Miller (2017) Theory of Agnostic Statistics* Suggested reading When and how to do the reading

Reading

Required reading:

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 14 / 62 I Angrist and Pischke (2008) Mostly Harmless Econometrics

I Imai (2017) A First Course in Quantitative Social Science*

I Aronow and Miller (2017) Theory of Agnostic Statistics* Suggested reading When and how to do the reading

Reading

Required reading:

I Fox (2016) Applied Regression Analysis and Generalized Linear Models

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 14 / 62 I Imai (2017) A First Course in Quantitative Social Science*

I Aronow and Miller (2017) Theory of Agnostic Statistics* Suggested reading When and how to do the reading

Reading

Required reading:

I Fox (2016) Applied Regression Analysis and Generalized Linear Models

I Angrist and Pischke (2008) Mostly Harmless Econometrics

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 14 / 62 I Aronow and Miller (2017) Theory of Agnostic Statistics* Suggested reading When and how to do the reading

Reading

Required reading:

I Fox (2016) Applied Regression Analysis and Generalized Linear Models

I Angrist and Pischke (2008) Mostly Harmless Econometrics

I Imai (2017) A First Course in Quantitative Social Science*

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 14 / 62 Suggested reading When and how to do the reading

Reading

Required reading:

I Fox (2016) Applied Regression Analysis and Generalized Linear Models

I Angrist and Pischke (2008) Mostly Harmless Econometrics

I Imai (2017) A First Course in Quantitative Social Science*

I Aronow and Miller (2017) Theory of Agnostic Statistics*

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 14 / 62 When and how to do the reading

Reading

Required reading:

I Fox (2016) Applied Regression Analysis and Generalized Linear Models

I Angrist and Pischke (2008) Mostly Harmless Econometrics

I Imai (2017) A First Course in Quantitative Social Science*

I Aronow and Miller (2017) Theory of Agnostic Statistics* Suggested reading

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 14 / 62 Reading

Required reading:

I Fox (2016) Applied Regression Analysis and Generalized Linear Models

I Angrist and Pischke (2008) Mostly Harmless Econometrics

I Imai (2017) A First Course in Quantitative Social Science*

I Aronow and Miller (2017) Theory of Agnostic Statistics* Suggested reading When and how to do the reading

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 14 / 62 Piazza ask questions of us and your classmates Office Hours ask even more questions.

Problem Sets reinforce understanding of material, practice

Ways to Learn

Lecture learn broad topics Precept learn data analysis skills, get targeted help on assignments Readings support materials for lecture and precept

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 15 / 62 Piazza ask questions of us and your classmates Office Hours ask even more questions.

Ways to Learn

Lecture learn broad topics Precept learn data analysis skills, get targeted help on assignments Readings support materials for lecture and precept Problem Sets reinforce understanding of material, practice

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 15 / 62 Grading and solutions Code conventions Collaboration policy

Note: You may find these difficult. Start early and seek help!

Problem Sets

Schedule (available Wednesday, due 8 days later at precept)

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 16 / 62 Code conventions Collaboration policy

Note: You may find these difficult. Start early and seek help!

Problem Sets

Schedule (available Wednesday, due 8 days later at precept) Grading and solutions

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 16 / 62 Collaboration policy

Note: You may find these difficult. Start early and seek help!

Problem Sets

Schedule (available Wednesday, due 8 days later at precept) Grading and solutions Code conventions

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 16 / 62 Note: You may find these difficult. Start early and seek help!

Problem Sets

Schedule (available Wednesday, due 8 days later at precept) Grading and solutions Code conventions Collaboration policy

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 16 / 62 Problem Sets

Schedule (available Wednesday, due 8 days later at precept) Grading and solutions Code conventions Collaboration policy

Note: You may find these difficult. Start early and seek help!

Your Job: get help when you need it!

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 17 / 62 My philosophy on teaching: don’t reinvent the wheel- customize, refine, improve. Huge thanks to those who have provided slides particularly: Matt Blackwell, Adam Glynn, Justin Grimmer, Jens Hainmueller, Kevin Quinn Also thanks to those who have discussed with me at length including Dalton Conley, Chad Hazlett, Gary King, Kosuke Imai, Matt Salganik and Teppei Yamamoto. Shay O’Brien produced the hand-drawn illustrations used throughout.

Attribution and Thanks

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 18 / 62 Huge thanks to those who have provided slides particularly: Matt Blackwell, Adam Glynn, Justin Grimmer, Jens Hainmueller, Kevin Quinn Also thanks to those who have discussed with me at length including Dalton Conley, Chad Hazlett, Gary King, Kosuke Imai, Matt Salganik and Teppei Yamamoto. Shay O’Brien produced the hand-drawn illustrations used throughout.

Attribution and Thanks

My philosophy on teaching: don’t reinvent the wheel- customize, refine, improve.

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 18 / 62 Also thanks to those who have discussed with me at length including Dalton Conley, Chad Hazlett, Gary King, Kosuke Imai, Matt Salganik and Teppei Yamamoto. Shay O’Brien produced the hand-drawn illustrations used throughout.

Attribution and Thanks

My philosophy on teaching: don’t reinvent the wheel- customize, refine, improve. Huge thanks to those who have provided slides particularly: Matt Blackwell, Adam Glynn, Justin Grimmer, Jens Hainmueller, Kevin Quinn

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 18 / 62 Attribution and Thanks

My philosophy on teaching: don’t reinvent the wheel- customize, refine, improve. Huge thanks to those who have provided slides particularly: Matt Blackwell, Adam Glynn, Justin Grimmer, Jens Hainmueller, Kevin Quinn Also thanks to those who have discussed with me at length including Dalton Conley, Chad Hazlett, Gary King, Kosuke Imai, Matt Salganik and Teppei Yamamoto. Shay O’Brien produced the hand-drawn illustrations used throughout.

1 Welcome
2 Goals
3 Ways to Learn
4 Structure of Course
5 Introduction to Probability
  - What is Probability?
  - Sample Spaces and Events
  - Probability Functions
  - Marginal, Joint and Conditional Probability
  - Bayes’ Rule
  - Independence
6 Fun With History


Outline of Topics

Outline in reverse order:
- Regression: how to determine the relationship between variables
- Inference: how to learn about things we don’t know from the things we do know
- Probability: what data we would expect if we did know the truth

Probability → Inference → Regression

What is Statistics?

- a branch of mathematics studying the collection and analysis of data
- the name statistic comes from the word state
- the arc of developments in statistics:
  1. an applied scholar has a problem
  2. they solve the problem by inventing a specific method
  3. statisticians generalize and export the best of these methods

Quantitative Research in Theory

[figure: Inspiration: Hadley Wickham, Image: Matt Salganik]

Quantitative Research in Practice

[figure: Inspiration: Hadley Wickham, Image: Matt Salganik]

Traditional Statistics Class

[figure: Inspiration: Hadley Wickham, Image: Matt Salganik]

Time Actually Spent

[figure: Inspiration: Hadley Wickham, Image: Matt Salganik]

This Class

- Strike a balance between practice and theory
- Heavy emphasis on applied data analysis
  - problem sets with real data
  - replication project next semester
- Teaching select key principles from statistics

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 27 / 62 Deterministic vs. Stochastic “what is the relationship between hours spent studying and performance in Soc500?” One way to approach this:

I generate a deterministic account of performance performancei = f (hoursi ) I but studying isn’t the only indicator of performance!

I we could try to account for everything performancei = f (hoursi ) + g(otheri ). I but that’s impossible A better approach

I Instead treat other factors as stochastic

I Thus we often write it as performancei = f (hoursi ) + i I This allows us to have uncertainty over outcomes given our inputs Our way of talking about stochastic outcomes is probability.
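A minimal sketch of the stochastic formulation (the functional form of f and the noise scale are invented for illustration; they are not part of the course model):

```python
import random

random.seed(0)

def f(hours):
    # Hypothetical deterministic component: diminishing returns to study time.
    return 50 + 10 * hours ** 0.5

def performance(hours, noise_sd=5.0):
    # Stochastic component: epsilon drawn from Normal(0, noise_sd),
    # standing in for all the other factors we cannot account for.
    return f(hours) + random.gauss(0, noise_sd)

# The same input no longer determines a single outcome; it induces a
# distribution over outcomes.
draws = [performance(9) for _ in range(5)]
```

Running this repeatedly shows the spread of outcomes around f(9) = 80 that the ε term produces.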

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 27 / 62 egresses .÷÷÷÷¥.arrF • :÷ . HE § :

In Picture Form ••••••@ @ # € a @ rasa ÷÷.÷÷÷÷a**roao¥÷ . ⇒ ÷ I ± .

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 28 / 62 In Picture Form ••••••@ @ # € a egresses .÷÷÷÷¥.arrF • @ :÷ . rasa ÷÷.÷÷÷÷a**roao¥÷ . HE ⇒ ÷ I ± § : .

In Picture Form

Data generating process --(probability)--> Observed data
Data generating process <--(inference)-- Observed data
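The two arrows can be sketched in code (the coin-flip process and its true parameter are hypothetical, chosen only to make the loop concrete):

```python
import random

random.seed(1)

# Probability: a known data generating process produces observed data.
true_p = 0.6  # hypothetical true P(heads)
data = [random.random() < true_p for _ in range(10_000)]

# Inference: from the observed data alone, recover a guess about the
# unknown feature of the process that generated it.
estimate = sum(data) / len(data)
```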


Statistical Thought Experiments

- Start with probability
- Probability allows us to contemplate the world under hypothetical scenarios
  - hypotheticals let us ask: is the observed relationship happening by chance, or is it systematic?
  - they tell us what the world would look like under a certain assumption

We will review probability today, but feel free to ask questions as needed.

Example: Fisher’s Lady Tasting Tea

- The Story Setup (a lady discerning about tea)
- The Experiment (perform a taste test)
- The Hypothetical (count the possibilities)
- The Result (boom, she was right)

This became the Fisher Exact Test.
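The counting step can be reproduced directly, assuming the classic version of the story (8 cups, 4 with milk poured first; the slide itself gives no numbers):

```python
from math import comb

# Under the null hypothesis that the lady is guessing, every way of
# picking 4 "milk first" cups out of 8 is equally likely, and exactly
# one of those guesses is fully correct.
n_guesses = comb(8, 4)
p_all_correct = 1 / n_guesses

print(n_guesses)      # 70
print(p_all_correct)  # 0.014285714285714285
```

Getting every cup right by luck alone would happen about 1.4% of the time, which is why her success was convincing.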

1 Welcome
2 Goals
3 Ways to Learn
4 Structure of Course
5 Introduction to Probability
  - What is Probability?
  - Sample Spaces and Events
  - Probability Functions
  - Marginal, Joint and Conditional Probability
  - Bayes’ Rule
  - Independence
6 Fun With History


Why Probability?

- Probability helps us envision hypotheticals
- It describes uncertainty in how the data are generated
- Data analysis: estimate the probability that something will happen
- Thus: we need to know how probability gives rise to data

Intuitive Definition of Probability

While there are several interpretations of what probability is, most modern (post-1935 or so) researchers agree on an axiomatic definition of probability.

3 Axioms (Intuitive Version):
1. The probability of any particular event must be non-negative.
2. The probability of anything occurring among all possible events must be 1.
3. The probability of one of many mutually exclusive events happening is the sum of the individual probabilities.

All the rules of probability can be derived from these axioms.
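The axioms can be checked mechanically on a small example (a fair six-sided die, assumed here purely for illustration):

```python
from fractions import Fraction

# A finite sample space with a uniform probability function.
sample_space = [1, 2, 3, 4, 5, 6]
prob = {outcome: Fraction(1, 6) for outcome in sample_space}

# Axiom 1: every probability is non-negative.
assert all(p >= 0 for p in prob.values())

# Axiom 2: the probability of something among all possible events is 1.
assert sum(prob.values()) == 1

# Axiom 3: mutually exclusive events add, e.g. P(roll a 1 or a 2).
assert sum(prob[o] for o in {1, 2}) == prob[1] + prob[2] == Fraction(1, 3)
```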

Sample Spaces

To define probability we need to define the set of possible outcomes.

The sample space is the set of all possible outcomes, and is often written as S or Ω.

For example, if we flip a coin twice, there are four possible outcomes:

S = {(heads, heads), (heads, tails), (tails, heads), (tails, tails)}

Thus the table in Lady Tasting Tea was defining the sample space. (Note we defined illogical guesses to have probability 0.)

Sample Spaces

To define probability we need to define the set of possible outcomes.

The sample space is the set of all possible outcomes, and is often written as S or Ω.

For example, if we flip a coin twice, there are four possible outcomes:

S = {(heads, heads), (heads, tails), (tails, heads), (tails, tails)}

Thus the table in Lady Tasting Tea was defining the sample space. (Note: we defined illogical guesses to have probability 0.)
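The two-flip sample space is small enough to enumerate directly. A minimal Python sketch (not from the slides; the outcome labels are illustrative):

```python
from itertools import product

# Build the sample space for two coin flips: all ordered pairs of outcomes.
S = list(product(["heads", "tails"], repeat=2))

print(len(S))  # 4 possible outcomes
print(S[0])    # ('heads', 'heads')
```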

A Running Visual Metaphor

Imagine that we sample an apple from a bag. Looking in the bag we see: [illustration of the apples in the bag]

The sample space is: [the set of all apples shown]

Events

Events are subsets of the sample space. For example, {rain} is an event.

[The remaining hand-drawn examples on this slide did not survive extraction.]

Events Are a Kind of Set

Sets are collections of things, in this case collections of outcomes. One way to define an event is to describe the common property that all of its outcomes share. We write this as

{ω | ω satisfies P},

where P is the property that they all share.

Complement

The complement of an event A, written A^c, is the collection of all of the outcomes not in A. That is, it is "everything else" in the sample space:

A^c = {ω ∈ Ω | ω ∉ A}.

An important complement: Ω^c = ∅, where ∅ is the empty set, the event that nothing happens.

Operations on Events

The union of two events A and B is the event that A or B occurs:

A ∪ B = {ω | ω ∈ A or ω ∈ B}.

The intersection of two events A and B is the event that both A and B occur:

A ∩ B = {ω | ω ∈ A and ω ∈ B}.
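Python's built-in sets mirror these operations directly. A small sketch with an illustrative sample space (the weather outcomes are an assumption, not from the slides):

```python
# An illustrative finite sample space and two events within it.
omega = {"rain", "snow", "sun", "clouds"}
A = {"rain", "snow"}
B = {"snow", "sun"}

union = A | B             # A ∪ B: outcomes in A or B
intersection = A & B      # A ∩ B: outcomes in both A and B
complement_A = omega - A  # A^c: everything else in the sample space

print(sorted(union))         # ['rain', 'snow', 'sun']
print(sorted(intersection))  # ['snow']
print(sorted(complement_A))  # ['clouds', 'sun']
```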

Operations on Events

We say that two events A and B are disjoint or mutually exclusive if they share no elements, that is, if A ∩ B = ∅. An event and its complement, A and A^c, are disjoint.

Sample spaces can have infinitely many outcomes, and we can work with infinite sequences of events A1, A2, ....

Probability Function

A probability function P(·) is a function defined over all subsets of a sample space S that satisfies the following three axioms:

1. Nonnegativity: P(A) ≥ 0 for all A in the set of all events.
2. Normalization: P(S) = 1.
3. Additivity: if events A1, A2, ... are mutually exclusive, then P(A1 ∪ A2 ∪ ···) = P(A1) + P(A2) + ···.

All the rules of probability can be derived from these axioms. Intuition: probability as allocating chunks of a unit-long stick.
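For a finite sample space, a probability function can be represented as a table of outcome weights and the three axioms checked directly. A sketch under the assumption of a fair coin flipped twice:

```python
from itertools import product

# Uniform probability function on the two-flip sample space.
S = list(product("HT", repeat=2))
P = {outcome: 1 / len(S) for outcome in S}

def prob(event):
    """P(event), where an event is any collection of outcomes."""
    return sum(P[w] for w in event)

# Axiom 1, nonnegativity: every outcome weight is >= 0.
assert all(p >= 0 for p in P.values())

# Axiom 2, normalization: P(S) = 1.
assert abs(prob(S) - 1) < 1e-12

# Axiom 3, additivity for disjoint events (here: "both heads" and "both tails").
A = [("H", "H")]
B = [("T", "T")]
assert abs(prob(A + B) - (prob(A) + prob(B))) < 1e-12
```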

A Brief Word on Interpretation

There is a massive debate over interpretation.

Subjective Interpretation
- Example: The probability of drawing 5 red cards out of 10 drawn from a deck of cards is whatever you want it to be. But...
- If you don't follow the axioms, a bookie can beat you.
- There is a correct way to update your beliefs with data.

Frequency Interpretation
- The probability of an event is the relative frequency with which it would occur if the process were repeated a large number of times under similar conditions.
- Example: The probability of drawing 5 red cards out of 10 drawn from a deck of cards is the frequency with which this event occurs in repeated samples of 10 cards.

Marginal and Joint Probability

So far we have only considered situations where we are interested in the probability of a single event A occurring. We've denoted this P(A). P(A) is sometimes called a marginal probability.

Suppose we are now in a situation where we would like to express the probability that an event A and an event B occur. This quantity is written as P(A ∩ B), P(B ∩ A), P(A, B), or P(B, A) and is the joint probability of A and B.

Conditional Probability

The "soul of statistics": if P(A) > 0, then the probability of B conditional on A can be written as

P(B|A) = P(A, B) / P(A).

This implies that

P(A, B) = P(A) × P(B|A).
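The definition can be checked on the two-flip sample space from earlier. A hedged sketch assuming fair, uniform coin flips:

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=2))
P = {w: Fraction(1, len(S)) for w in S}  # uniform over the 4 outcomes

A = {w for w in S if w[0] == "H"}        # first flip heads
B = {w for w in S if w[1] == "H"}        # second flip heads

p_A = sum(P[w] for w in A)
p_AB = sum(P[w] for w in A & B)
p_B_given_A = p_AB / p_A                 # P(B|A) = P(A, B) / P(A)

print(p_B_given_A)  # 1/2: conditioning on the first flip leaves the second unchanged
```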

Conditional Probability: A Visual Example

[Hand-drawn illustration; not recoverable from the extraction.]

A Card Player's Example

If we randomly draw two cards from a standard 52-card deck and define the events A = {King on Draw 1} and B = {King on Draw 2}, then

P(A) = 4/52
P(B|A) = 3/51
P(A, B) = P(A) × P(B|A) = 4/52 × 3/51 ≈ .0045
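The multiplication rule here can be verified by brute force. A sketch in which cards are numbered 0–51 and 0–3 encode the kings (an illustrative encoding, not from the slides):

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule: P(A, B) = P(A) x P(B|A).
p_AB = Fraction(4, 52) * Fraction(3, 51)
print(float(p_AB))  # ~0.0045

# Brute-force check: enumerate every ordered two-card draw; cards 0-3 are kings.
draws = list(permutations(range(52), 2))
both_kings = sum(1 for first, second in draws if first < 4 and second < 4)
assert Fraction(both_kings, len(draws)) == p_AB
```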

Law of Total Probability (LTP)

With two events:

P(B) = P(B, A) + P(B, A^c) = P(B|A) × P(A) + P(B|A^c) × P(A^c)

Recall, if we randomly draw two cards from a standard 52-card deck and define the events A = {King on Draw 1} and B = {King on Draw 2}, then

P(A) = 4/52
P(B|A) = 3/51
P(A, B) = P(A) × P(B|A) = 4/52 × 3/51

Question: P(B) = ?

Confirming Intuition with the LTP

P(B) = P(B, A) + P(B, A^c) = P(B|A) × P(A) + P(B|A^c) × P(A^c)

P(B) = 3/51 × 1/13 + 4/51 × 12/13 = (3 + 48)/(51 × 13) = 1/13 = 4/52
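Exact rational arithmetic makes the cancellation explicit. A minimal sketch of the same LTP computation:

```python
from fractions import Fraction

# P(B) = P(B|A) P(A) + P(B|A^c) P(A^c), with A = {King on Draw 1}.
p_A = Fraction(4, 52)   # = 1/13
p_Ac = 1 - p_A          # = 12/13
p_B = Fraction(3, 51) * p_A + Fraction(4, 51) * p_Ac

print(p_B)  # 1/13, the same as the unconditional chance of drawing a king
```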

Example: Voter Mobilization Suppose that we have put together a voter mobilization campaign and we want to know what the probability of voting is after the campaign: Pr[vote]. We know the following: Pr(vote|mobilized) = 0.75 Pr(vote|not mobilized) = 0.15 Pr(mobilized) = 0.6 and so Pr( not mobilized) = 0.4 Note that mobilization partitions the data. Everyone is either mobilized or not. Thus, we can apply the LTP:

Pr(vote) = Pr(vote|mobilized) Pr(mobilized)+ Pr(vote|not mobilized) Pr(not mobilized)

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 52 / 62 =.51

Example: Voter Mobilization Suppose that we have put together a voter mobilization campaign and we want to know what the probability of voting is after the campaign: Pr[vote]. We know the following: Pr(vote|mobilized) = 0.75 Pr(vote|not mobilized) = 0.15 Pr(mobilized) = 0.6 and so Pr( not mobilized) = 0.4 Note that mobilization partitions the data. Everyone is either mobilized or not. Thus, we can apply the LTP:

Pr(vote) = Pr(vote|mobilized) Pr(mobilized)+ Pr(vote|not mobilized) Pr(not mobilized) =0.75 × 0.6 + 0.15 × 0.4

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 52 / 62 Example: Voter Mobilization Suppose that we have put together a voter mobilization campaign and we want to know what the probability of voting is after the campaign: Pr[vote]. We know the following: Pr(vote|mobilized) = 0.75 Pr(vote|not mobilized) = 0.15 Pr(mobilized) = 0.6 and so Pr( not mobilized) = 0.4 Note that mobilization partitions the data. Everyone is either mobilized or not. Thus, we can apply the LTP:

Pr(vote) = Pr(vote|mobilized) Pr(mobilized)+ Pr(vote|not mobilized) Pr(not mobilized) =0.75 × 0.6 + 0.15 × 0.4 =.51
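As a quick check, the LTP computation above takes only a few lines of Python (a minimal sketch; the probabilities are the ones given on the slide, the variable names are ours):

```python
# Law of total probability over the partition {mobilized, not mobilized}:
# Pr(vote) = Pr(vote|mobilized) Pr(mobilized) + Pr(vote|not mob.) Pr(not mob.)
p_vote_given_mob = 0.75
p_vote_given_not = 0.15
p_mob = 0.6
p_not = 1 - p_mob  # the two pieces of the partition sum to 1

p_vote = p_vote_given_mob * p_mob + p_vote_given_not * p_not
print(round(p_vote, 2))  # 0.51
```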

Bayes’ Rule

Often we have information about Pr(B|A) but require Pr(A|B) instead. When this happens, always think: Bayes’ rule.

Bayes’ rule: if Pr(B) > 0, then

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)

Proof: combine the multiplication rule, Pr(B|A) Pr(A) = Pr(A ∩ B), with the definition of conditional probability.

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 53 / 62
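Bayes’ rule translates directly into code. A minimal Python sketch (the function name and the toy numbers are ours, not from the slides):

```python
# Bayes' rule: Pr(A|B) = Pr(B|A) * Pr(A) / Pr(B), defined when Pr(B) > 0.
def bayes(p_b_given_a, p_a, p_b):
    if p_b <= 0:
        raise ValueError("Pr(B) must be positive")
    return p_b_given_a * p_a / p_b

# Toy check: if Pr(B|A) = 0.5, Pr(A) = 0.4, Pr(B) = 0.5, then
# Pr(A|B) = 0.5 * 0.4 / 0.5 = 0.4.
print(bayes(0.5, 0.4, 0.5))  # 0.4
```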

Bayes’ Rule Mechanics

[Hand-drawn worked illustration of Bayes’ rule; not recoverable from the extraction]

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 54 / 62

Bayes’ Rule Example

76.5% of female billionaires inherited their fortunes, compared to 24.5% of male billionaires. So is Pr(woman | inherited billions) greater than Pr(man | inherited billions)?

[Hand-drawn diagram: women vs. men among those who inherited billions]

Pr(W|I) = Pr(I|W) Pr(W) / Pr(I)

* Data source: Billionaires characteristics database

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 55 / 62

Example: Race and Names

Enos (2015): how do we identify a person’s race from their name? First, note that the Census collects information on the distribution of names by race. For example, Washington is the most common last name among African-Americans in America:

I Pr(AfAm) = 0.132
I Pr(not AfAm) = 1 − Pr(AfAm) = 0.868
I Pr(Washington|AfAm) = 0.00378
I Pr(Washington|not AfAm) = 0.000061

We can now use Bayes’ rule:

Pr(AfAm|Wash) = Pr(Wash|AfAm) Pr(AfAm) / Pr(Wash)

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 56 / 62

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 56 / 62 Note we don’t have the probability of the name Washington. Remember that we can calculate it from the LTP since the sets African-American and not African-American partition the sample space:

Pr(Wash|AfAm) Pr(AfAm) Pr(Wash|AfAm) Pr(AfAm) = Pr(Wash) Pr(Wash|AfAm) Pr(AfAm) + Pr(Wash|not AfAm) Pr(not AfAm) 0.132 × 0.00378 = 0.132 × 0.00378 + .868 × 0.000061 ≈ 0.9

Example: Race and Names

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 57 / 62 Remember that we can calculate it from the LTP since the sets African-American and not African-American partition the sample space:

Pr(Wash|AfAm) Pr(AfAm) Pr(Wash|AfAm) Pr(AfAm) = Pr(Wash) Pr(Wash|AfAm) Pr(AfAm) + Pr(Wash|not AfAm) Pr(not AfAm) 0.132 × 0.00378 = 0.132 × 0.00378 + .868 × 0.000061 ≈ 0.9

Example: Race and Names

Note we don’t have the probability of the name Washington.

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 57 / 62 Pr(Wash|AfAm) Pr(AfAm) Pr(Wash|AfAm) Pr(AfAm) + Pr(Wash|not AfAm) Pr(not AfAm) 0.132 × 0.00378 = 0.132 × 0.00378 + .868 × 0.000061 ≈ 0.9

Example: Race and Names

Note we don’t have the probability of the name Washington. Remember that we can calculate it from the LTP since the sets African-American and not African-American partition the sample space:

Pr(Wash|AfAm) Pr(AfAm) = Pr(Wash)

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 57 / 62 0.132 × 0.00378 = 0.132 × 0.00378 + .868 × 0.000061 ≈ 0.9

Example: Race and Names

Note we don’t have the probability of the name Washington. Remember that we can calculate it from the LTP since the sets African-American and not African-American partition the sample space:

Pr(Wash|AfAm) Pr(AfAm) Pr(Wash|AfAm) Pr(AfAm) = Pr(Wash) Pr(Wash|AfAm) Pr(AfAm) + Pr(Wash|not AfAm) Pr(not AfAm)

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 57 / 62 ≈ 0.9

Example: Race and Names

Note we don’t have the probability of the name Washington. Remember that we can calculate it from the LTP since the sets African-American and not African-American partition the sample space:

Pr(Wash|AfAm) Pr(AfAm) Pr(Wash|AfAm) Pr(AfAm) = Pr(Wash) Pr(Wash|AfAm) Pr(AfAm) + Pr(Wash|not AfAm) Pr(not AfAm) 0.132 × 0.00378 = 0.132 × 0.00378 + .868 × 0.000061

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 57 / 62 Example: Race and Names

Note we don’t have the probability of the name Washington. Remember that we can calculate it from the LTP since the sets African-American and not African-American partition the sample space:

Pr(Wash|AfAm) Pr(AfAm) Pr(Wash|AfAm) Pr(AfAm) = Pr(Wash) Pr(Wash|AfAm) Pr(AfAm) + Pr(Wash|not AfAm) Pr(not AfAm) 0.132 × 0.00378 = 0.132 × 0.00378 + .868 × 0.000061 ≈ 0.9
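Putting Bayes’ rule and the LTP together, the whole calculation is short in Python (a sketch using the Census-based numbers from the slide; variable names are ours):

```python
# Posterior Pr(AfAm | Washington) via Bayes' rule, with the LTP
# supplying the marginal Pr(Washington) in the denominator.
p_afam = 0.132
p_not_afam = 1 - p_afam                    # 0.868
p_wash_given_afam = 0.00378
p_wash_given_not = 0.000061

# LTP: {AfAm, not AfAm} partitions the population.
p_wash = p_wash_given_afam * p_afam + p_wash_given_not * p_not_afam

p_afam_given_wash = p_wash_given_afam * p_afam / p_wash
print(round(p_afam_given_wash, 2))  # 0.9
```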

Independence

Intuitive Definition
Events A and B are independent if knowing whether A occurred provides no information about whether B occurred.

Formal Definition

P(A, B) = P(A)P(B) =⇒ A⊥⊥B

With all the usual > 0 restrictions, this implies

P(A|B) = P(A)
P(B|A) = P(B)

Independence is a massively important concept in statistics.

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 58 / 62
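The formal definition can be checked by brute-force enumeration. A minimal Python sketch with two fair coin flips (the events A and B here are our own toy choices):

```python
# Verify P(A and B) == P(A) * P(B) by enumerating two fair coin flips.
from fractions import Fraction
from itertools import product

space = list(product("HT", repeat=2))   # 4 equally likely outcomes

def pr(event):
    return Fraction(sum(1 for w in space if event(w)), len(space))

A = lambda w: w[0] == "H"               # first flip is heads
B = lambda w: w[1] == "H"               # second flip is heads

assert pr(lambda w: A(w) and B(w)) == pr(A) * pr(B)  # 1/4 == 1/2 * 1/2
print(pr(A) * pr(B))  # 1/4
```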

I Aronow and Miller (1.1) on Random Events

I Aronow and Miller (1.2-2.3) on Probability Theory, Summarizing Distributions

I Optional: Blitzstein and Hwang Chapters 1-1.3 (probability), 2-2.5 (conditional probability), 3-3.2 (random variables), 4-4.2 (expectation), 4.4-4.6 (indicator rv, LOTUS, ), 7.0-7.3 (joint distributions)

I Optional: Imai Chapter 6 (probability) A word from your preceptors

Next Week

Random Variables

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 59 / 62 I Aronow and Miller (1.1) on Random Events

I Aronow and Miller (1.2-2.3) on Probability Theory, Summarizing Distributions

I Optional: Blitzstein and Hwang Chapters 1-1.3 (probability), 2-2.5 (conditional probability), 3-3.2 (random variables), 4-4.2 (expectation), 4.4-4.6 (indicator rv, LOTUS, variance), 7.0-7.3 (joint distributions)

I Optional: Imai Chapter 6 (probability) A word from your preceptors

Next Week

Random Variables Reading

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 59 / 62 A word from your preceptors

Next Week

Random Variables Reading

I Aronow and Miller (1.1) on Random Events

I Aronow and Miller (1.2-2.3) on Probability Theory, Summarizing Distributions

I Optional: Blitzstein and Hwang Chapters 1-1.3 (probability), 2-2.5 (conditional probability), 3-3.2 (random variables), 4-4.2 (expectation), 4.4-4.6 (indicator rv, LOTUS, variance), 7.0-7.3 (joint distributions)

I Optional: Imai Chapter 6 (probability)

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 59 / 62 Next Week

Random Variables Reading

I Aronow and Miller (1.1) on Random Events

I Aronow and Miller (1.2-2.3) on Probability Theory, Summarizing Distributions

I Optional: Blitzstein and Hwang Chapters 1-1.3 (probability), 2-2.5 (conditional probability), 3-3.2 (random variables), 4-4.2 (expectation), 4.4-4.6 (indicator rv, LOTUS, variance), 7.0-7.3 (joint distributions)

I Optional: Imai Chapter 6 (probability) A word from your preceptors

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 59 / 62 Fun With

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 60 / 62 History

Fun with

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 61 / 62 Fun with History

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 61 / 62 Fun with History

Legendre

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 61 / 62 Fun with History

Gauss

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 61 / 62 Fun with History

Quetelet

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 61 / 62 Fun with History

Galton

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 61 / 62 References

Enos, Ryan D. “What the Demolition of Public Housing Teaches Us about the Impact of Racial Threat on Political Behavior.” American Journal of Political Science (2015).
Harris, J. Andrew. “What’s in a Name? A Method for Extracting Information about Ethnicity from Names.” Political Analysis (2015).
Salsburg, David. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (2002).

Stewart (Princeton) Week 1: Introduction and Probability September 14, 2016 62 / 62