
Stat 111 - Lecture 2
Collecting Data
Design of Experiments

Course Notes: Website

• All handouts will be available on the website: http://stat.wharton.upenn.edu/~stjensen/stat111.html
• The website also contains contact information for myself and the teaching assistants
• Lecture notes and homeworks will be posted before each class

Jan 19, 2016 Stat 111 - Lecture 2 - Experiments

Course Notes: Homework

• Homeworks will be handed out every week or two (around 8 HWs in all)
• Homeworks will be submitted during Friday recitation
• No late homeworks will be accepted!
• Late homeworks will get a score of zero, without exception
• Your lowest homework grade is not included in the final grade

Course Notes: Midterm Exam

• The midterm is held on the following date: Monday, Feb 29th (6-8pm)
• No makeup midterm examination!
• A missed midterm exam counts as a zero score
• Do not take this course if you cannot attend the midterm!


Course Notes: Friday Recitations

• Recitations begin this Friday
• Friday recitations are mandatory: attendance will be taken by your TA
• No excuses for missing recitation are accepted: every missed recitation will reduce your recitation score
• However, it is worth noting that the recitation score is only 10% of the overall course grade

Outline for Lecture

• Introduction to Experiments
• Sources of Bias in Experiments
• Techniques for Avoiding Bias
  • Matching
  • Randomization
  • Block Designs
  • Blinding and Double-Blinding
• Experiments vs. Observational Studies
• Association vs. Causation


Experiments

• Used to address a specific question
• Often used to examine causal effects
• Eg. medical trials, education interventions

[Diagram: Population -> Experimental Units -> Treatment Group -> Treatment -> Result; Experimental Units -> Control Group -> No Treatment -> Result]

• Can we just look at the difference in results to get the causal effect of the treatment?
• Depends on whether the experiment was done well
  • There are many possible sources of bias in experiments

Sources of Bias

• An experiment or study is biased if it systematically favours a particular outcome
  1. Subjects are not representative of the population
  2. Treatment and control groups are inherently different on some lurking or confounding variable
  3. Subjects are influenced by knowing they are in the treatment or control group
  4. Evaluator of outcomes is influenced by knowing whether subjects are in the treatment or control group

[Diagram: the same flow from Population to Result, annotated with the numbers 1-4]


Bias 1: Non-representative units

• If your subjects are not representative of the population, you won't be able to generalize the results even if the experiment is well done
• Example from my research: effects of schizophrenia
  • Treatment group: schizophrenic individuals
  • Control group: normal individuals
  • Problem: there is no foolproof way of classifying individuals as schizophrenic, so we have misclassification in both our control and treatment groups
  • Observed differences between the groups cannot be generalized to true schizophrenics vs. normals

Bias 2: Confounding/Lurking Variables

• Treatment group and control group are different on some variable that also influences the outcome
• A confounding variable means that we can't attribute any difference in outcomes to the treatment alone
• Simple example: drug trial where the proportion of women is different in the treatment versus control group
  • Gender becomes a confounding variable
  • Are treatment vs. control outcomes different due to the treatment or due to gender differences between groups?
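The drug trial example above can be made concrete with a small calculation. This is a minimal sketch using invented counts (they are not data from the course): the drug has no effect within either gender, but because women improve more often and the treatment group contains more women, the overall comparison makes the drug look effective.

```python
# Hypothetical counts, invented purely for illustration (not course data).
# Each entry maps (group, gender) to (number of subjects, number who improved).
trial = {
    ("treatment", "women"): (80, 56),
    ("treatment", "men"): (20, 8),
    ("control", "women"): (20, 14),
    ("control", "men"): (80, 32),
}


def improvement_rate(group, gender=None):
    """Proportion improved in a group, optionally restricted to one gender."""
    cells = [
        counts
        for (g, s), counts in trial.items()
        if g == group and (gender is None or s == gender)
    ]
    subjects = sum(n for n, _ in cells)
    improved = sum(k for _, k in cells)
    return improved / subjects


# Comparing the groups as a whole mixes the treatment effect with gender.
naive_difference = improvement_rate("treatment") - improvement_rate("control")

# Comparing within each gender removes the confounding by gender.
difference_women = improvement_rate("treatment", "women") - improvement_rate("control", "women")
difference_men = improvement_rate("treatment", "men") - improvement_rate("control", "men")

print(f"Overall difference:      {naive_difference:.2f}")
print(f"Difference among women:  {difference_women:.2f}")
print(f"Difference among men:    {difference_men:.2f}")
```

With these made-up numbers the overall difference is 0.18 even though the difference within each gender is zero, so the apparent treatment effect comes entirely from the gender imbalance.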


Association vs. Causation

• In the presence of a confounding variable, we can only conclude there is an association between treatment and outcome, not causation

Examples

• Children who watch many hours of TV get lower grades in school on average than those who watch less TV
  • Does this mean that TV causes poor grades?
  • What are potential confounding variables?
• People who use artificial sweeteners in place of sugar tend to be heavier than people who use sugar
  • Does this mean that sweeteners cause weight gain?
  • What is probably happening here?


One Solution: Matching

• Make sure that treatment and control groups are very similar on observed variables like race, gender, age etc.
• Block designs: divide subjects into blocks with similar observed variables before dividing them into treatment vs. control groups
• Special case: Matched Pairs
  • Subjects are matched up into pairs, then one member of each pair gets treatment and the other gets control
  • Example: dandruff experiment
    • Treatment applied to one side and control to the other side of the head
    • No reason to expect a difference between sides except for the treatment

Another Solution: Randomization

• The problem with matching is that you usually cannot match on unobserved characteristics (eg. genetics)
  • Eg. cholesterol drug trial: can't match treatment and control groups on genetic predisposition for high cholesterol
• Randomly assign subjects to treatment or control
  • Should lead to groups that are similar or balanced on both observed and unobserved confounding variables
• Example: student survey from last year. Students filled out the same form as you, but there was a randomly assigned "1" vs. "2" group for each form
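Random assignment as described above can be sketched in a few lines. This is only an illustration, assuming 20 subjects split into two equal groups; the function name `randomize`, the subject labels, and the seed are hypothetical choices, not part of the course material.

```python
import random


def randomize(subjects, seed=None):
    """Randomly split subjects into equal-sized treatment and control groups."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical subject labels, for illustration only.
subjects = [f"subject_{i}" for i in range(1, 21)]
treatment, control = randomize(subjects, seed=111)

print("Treatment:", treatment)
print("Control:  ", control)
```

Because the split depends only on the shuffle, no observed or unobserved characteristic of a subject can influence which group they land in.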


Randomization of In-Class Survey

• Check to see if groups are balanced:

  Variable                      Treatment   Control
  Average Height                67.2        66.4
  Average Shoe Size             8.71        8.65
  Average Number of Siblings    1.39        1.47
  Proportion of Red Sox fans    0.14        0.09
  Proportion of Yankee fans     0.17        0.21

• There are differences, but are they "significant"?
  • Later on in the course, we will be able to answer questions like this
• Of course, we can't check the balance for unobserved variables; we just have to trust the randomization process

Even Better: Randomization + Matching

• Randomization generally leads to treatment and control groups that are evenly balanced, but you can still get unlucky and get unbalanced groups
• Example: randomly placing 20 people (10 males, 10 females) into treatment and control groups
  • How many males will end up in the treatment group?
  • Ideally, we would have 5 males in the treatment group and 5 males in the control group (balanced)
  • However, there is a chance of getting 9 males in the treatment group and 1 male in the control group (unbalanced)
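The chance of an unlucky split in the 20-person example can be computed directly. This sketch assumes the treatment and control groups each have exactly 10 people chosen completely at random, so the number of males in the treatment group follows a hypergeometric distribution; the function name is a hypothetical choice.

```python
from math import comb


def prob_males_in_treatment(k, males=10, females=10, group_size=10):
    """Probability that exactly k males land in a randomly chosen treatment group."""
    ways_with_k_males = comb(males, k) * comb(females, group_size - k)
    all_possible_groups = comb(males + females, group_size)
    return ways_with_k_males / all_possible_groups


p_balanced = prob_males_in_treatment(5)  # 5 males in treatment, 5 in control
p_nine = prob_males_in_treatment(9)      # 9 males in treatment, 1 in control

print(f"P(exactly 5 males in treatment) = {p_balanced:.4f}")
print(f"P(exactly 9 males in treatment) = {p_nine:.6f}")
```

Under these assumptions a perfectly balanced split happens only about 34% of the time, while the 9-to-1 split is possible but rare (roughly 0.05%).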


Even Better: Randomization + Matching

• Randomized Blocks: randomize within blocks of observed variables
• Example:
  • Divide subjects into males and females first, then randomly assign treatment or control to subjects in each group separately
  • Guarantees that an equal number of males end up in the treatment group and the control group (same with females)
• Randomized Matched Pairs: randomly decide which member of each pair gets treatment vs. control
• Example:
  • For each head in the dandruff experiment, randomly assign which side of the head gets dandruff shampoo vs. control

Bias 3: Subject knows treatment assignment

• A subject's outcome is influenced by knowing that they are in a treatment or control group
  • Eg. drug trials: patients improve just because they think they are receiving the drug
• Solution: blinding with a placebo
  • The placebo appears to be the treatment, so all subjects (treatment and control) don't know their true treatment assignment
  • Controls may improve slightly, which is often called the placebo effect
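The randomized block design described above can be sketched as follows. This is a minimal illustration, assuming each block has an even number of subjects; the function `randomized_blocks`, the subject labels, and the seed are hypothetical.

```python
import random


def randomized_blocks(subjects, block_of, seed=None):
    """Randomly assign treatment/control separately within each block."""
    rng = random.Random(seed)

    # Group subjects into blocks by the observed variable (here: gender).
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)

    # Shuffle each block on its own and split it in half.
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "control"
    return assignment


# Hypothetical subjects: 10 males and 10 females, labelled (gender, id).
subjects = [("M", i) for i in range(10)] + [("F", i) for i in range(10)]
assignment = randomized_blocks(subjects, block_of=lambda s: s[0], seed=111)

males_in_treatment = sum(
    1 for s, arm in assignment.items() if s[0] == "M" and arm == "treatment"
)
females_in_treatment = sum(
    1 for s, arm in assignment.items() if s[0] == "F" and arm == "treatment"
)
print("Males in treatment:  ", males_in_treatment)
print("Females in treatment:", females_in_treatment)
```

Unlike simple randomization, this design cannot produce a 9-to-1 gender split: which males get treatment is still random, but the number of males in each group is fixed by construction.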


Experiments vs. Observational Studies

• Often, we want the causal effect of some treatment, but our data is from an observational study
• Observational studies examine effects of some variable but without the advantages of a controlled experiment
  • No treatment is applied in observational studies
• Example: health effects of smoking
  • Unethical to randomly impose a treatment
  • Could there be some confounding variable that explains health differences between smokers and non-smokers?
• Very risky to make causal statements from observational data, since we cannot avoid bias due to confounding variables!

Bias 4: Evaluator knows treatment assignment

• The person evaluating the outcome (eg. doctor in drug trial) may also be influenced by knowing which subjects are getting treatment
• Not a problem if the outcome is something indisputable, such as death!
• This is a problem for more subjective measures like blood pressure or results from social programs
• Solution: double-blinded experiment where neither subjects nor evaluators know treatment assignments


Health Effects of Chocolate

• Report to European Society of Sexual Medicine:
  • 153 Italian women filled out a sexual function survey
  • "Intriguing correlation": sexual function/desire significantly greater among chocolate-eaters
  • Observational study: association does not imply causation!
  • Confounding: average age is 35 among frequent chocolate-eaters, compared with 40.4 in the non-chocolate group
• Experiment at Penn on the effect of chocolate on acne:
  • Acne patients either got treatment (bar with 10X usual amount of chocolate) or placebo (bar with vegetable fat)
  • Neither group showed a noticeable effect on acne
  • US Naval Academy experiment also showed no effect
• Clinical studies may indicate chocolate is good for the heart

Next Class - Lecture 3

• Collecting Data: Surveys and Sampling
• Moore and McCabe: Sections 3.3 - 3.4
