The Statistics of Causal Inference in the Social Sciences
Political Science 236A, Statistics 239A
Jasjeet S. Sekhon
UC Berkeley
October 29, 2014
© Copyright 2014

Background
• We are concerned with causal inference in the social sciences.
• We discuss the potential outcomes framework of causal inference in detail.
• This framework originates with Jerzy Neyman (1894-1981), the founder of the Berkeley Statistics Department.
• A key insight of the framework is that causal inference is a missing data problem.
• The framework applies regardless of the method used to estimate causal effects, whether it be quantitative or qualitative.

The Problem
• Many of the great minds of the 18th and 19th centuries contributed to the development of social statistics: De Moivre, several Bernoullis, Gauss, Laplace, Quetelet, Galton, Pearson, and Yule.
• They searched for a method of statistical calculus that would do for social studies what Leibniz’s and Newton’s calculus did for physics.
• It quickly became apparent that this would be most difficult.
• For example: deficits crowding out money versus chemotherapy

The Experimental Model
• In the early 20th century, Sir Ronald Fisher (1890-1962) helped to establish randomization as the “reasoned basis for inference” (Fisher, 1935)
• Randomization: method by which systematic sources of bias are made random
• Permutation (Fisherian) Inference

Historical Note: Charles Sanders Peirce
• Charles Sanders Peirce (1839–1914) independently, and before Fisher, developed permutation inference and randomized experiments.
• He introduced the terms “confidence” and “likelihood”.
• His work was not well known until later.
• Bertrand Russell wrote (1959): “Beyond doubt [...] he was one of the most original minds of the later nineteenth century, and certainly the greatest American thinker ever.”
• Karl Popper (1972): “[He is] one of the greatest philosophers of all times”

There are Models and then there are Models
• Inference based on observational data remains a difficult challenge, especially without rigorous mathematical theories such as Newtonian physics.
• Note the difference between the following two equations:
  • $F = m \times a$, usually measured as F (force) in newtons, N; m (mass) in kg; a (acceleration) in m/s²
  • $Y = X\beta$

Regression is Evil
• It was hoped that statistical inference through the use of multiple regression would be able to provide to the social scientist what experiments and rigorous mathematical theories provide, respectively, to the micro-biologist and astronomer.
• OLS has unfortunately become a black box which people think solves hard problems it actually doesn’t solve. It is in this sense Evil.
• Let’s consider a simple example to test intuition.

A Simple Example
• Let $Y_t$ be an IID standard normal random variable, indexed by $t$.
• Define $\Delta Y_t = Y_t - Y_{t-1}$.
• Let’s estimate via OLS:
$$\Delta Y_t = \alpha + \beta_1 \Delta Y_{t-1} + \beta_2 \Delta Y_{t-2} + \beta_3 \Delta Y_{t-3}$$
• Question: What are the values of the betas as $n \to \infty$?
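
One way to test intuition on the question above is a quick simulation. The sketch below is illustrative only: the function name `simulate_lag_regression`, the sample size, and the seed are assumptions, not part of the slides. It draws IID standard normal $Y_t$, differences the series, and regresses $\Delta Y_t$ on its first three lags by least squares.

```python
import numpy as np

def simulate_lag_regression(n=200_000, k=3, seed=0):
    """Regress dY_t on a constant and k of its own lags (illustrative helper)."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n)          # Y_t ~ IID N(0, 1)
    dy = np.diff(y)                     # dY_t = Y_t - Y_{t-1}
    m = len(dy)
    target = dy[k:]                     # left-hand side: dY_t for t >= k
    # Column j holds dY_{t-j}, aligned with the target vector.
    lags = [dy[k - j : m - j] for j in range(1, k + 1)]
    X = np.column_stack([np.ones_like(target)] + lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef                         # [alpha, beta_1, ..., beta_k]

coef = simulate_lag_regression()
print(np.round(coef, 3))
```

With these settings the slope estimates should come out close to -0.75, -0.5 and -0.25 rather than zero: although $Y_t$ is pure noise, $\Delta Y_t$ and $\Delta Y_{t-1}$ share the term $Y_{t-1}$ with opposite signs, so the differenced series is serially correlated.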

A Simple Example II
• What about for:
$$\Delta Y_t = \alpha + \beta_1 \Delta Y_{t-1} + \beta_2 \Delta Y_{t-2} + \cdots + \beta_k \Delta Y_{t-k}\;?$$
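
For general $k$, the large-sample coefficients can be checked without simulation by solving the population normal equations. The sketch below assumes only what the example states: $Y_t$ is IID with variance 1, so $\Delta Y_t$ has variance 2, first-order autocovariance $-1$, and zero autocovariance at longer lags. The function name `population_betas` is hypothetical.

```python
import numpy as np

def population_betas(k):
    """Solve Gamma @ beta = gamma for the projection of dY_t on k lags."""
    # Autocovariances of dY_t = Y_t - Y_{t-1} when Var(Y_t) = 1:
    # gamma_0 = 2, gamma_1 = -1, gamma_h = 0 for h >= 2.
    gamma = np.zeros(k + 1)
    gamma[0] = 2.0
    gamma[1] = -1.0
    # Toeplitz covariance matrix of (dY_{t-1}, ..., dY_{t-k}).
    big_gamma = np.array([[gamma[abs(i - j)] for j in range(k)]
                          for i in range(k)])
    return np.linalg.solve(big_gamma, gamma[1 : k + 1])

for k in (1, 3, 5):
    print(k, np.round(population_betas(k), 4))
```

Solving this tridiagonal system gives $\beta_j = -(k + 1 - j)/(k + 1)$, so the coefficients never vanish as lags are added; the first one approaches $-1$ as $k$ grows.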

OLS: Sometimes Mostly Harmless
Notwithstanding the foregoing, as we shall see:
• OLS, and some other estimators, have some nice properties under certain identification assumptions.
• These assumptions are distinct from the usual statistical assumptions (e.g., those required for Gauss-Markov).
• Relevant theorems do not assume that OLS is correct or that the errors are IID.
• E.g., what happens when we don’t assume $E(\epsilon \mid X) = 0$?

Early and Influential Methods
• John Stuart Mill (in his A System of Logic) devised a set of five methods (or canons) of inference.
• They were outlined in Book III, Chapter 8 of his book.
• Unfortunately, people rarely read the very next chapter, entitled “Of Plurality of Causes: and of the Intermixture of Effects.”

Mill’s Methods of Inductive Inference
• Of Agreement: “If two or more instances of the phenomenon under investigation have only one circumstance in common, the circumstance in which alone all the instances agree is the cause (or effect) of the given phenomenon.”
• Of Difference: “If an instance in which the phenomenon under investigation occurs, and an instance in which it does not occur, have every circumstance in common save one, that one occurring only in the former; the circumstance in which alone the two instances differ is the effect, or the cause, or an indispensable part of the cause, of the phenomenon.”

Three Other Methods
• Joint Method of Agreement and Difference: “If an antecedent circumstance is invariably present when, but only when, a phenomenon occurs, it may be inferred to be the cause of that phenomenon.”
• Method of Residues: “If portions of a complex phenomenon can be explained by reference to parts of a complex antecedent circumstance, whatever remains of that circumstance may be inferred to be the cause of the remainder of that phenomenon.”
• Method of Concomitant Variation: “If an antecedent circumstance is observed to change proportionally with the occurrence of a phenomenon, it is probably the cause of that phenomenon.”

Uses of Mill’s Methods
These methods have been used by a vast number of researchers, including such famous ones as Durkheim and Weber. They are known as the “most similar” and “most different” research designs in some fields (Przeworski and Teune, 1970). Here are some examples:
• The Protestant Ethic
• Deficits and interest rates
• Health care systems and life expectancy
• Gun control
• Three strikes
• The list goes on, and on....

Uses of Mill’s Methods
Mill himself thought they were inappropriate for the study of social questions (Sekhon, 2004). “Nothing can be more ludicrous than the sort of parodies on experimental reasoning which one is accustomed to meet with, not in popular discussion only, but in grave treatises, when the affairs of nations are the theme. “How,” it is asked, “can an institution be bad, when the country has prospered under it?” “How can such or such causes have contributed to the prosperity of one country, when another has prospered without them?” Whoever makes use of an argument of this kind, not intending to deceive, should be sent back to learn the elements of some one of the more easy physical sciences” (Mill, 1873, pp. 346–7).

Basic Statistical Inference
Let’s look at an example Mill himself brought up: “In England, westerly winds blow during about twice as great a portion of the year as easterly. If, therefore, it rains only twice as often with a westerly as with an easterly wind, we have no reason to infer that any law of nature is concerned in the coincidence. If it rains more than twice as often, we may be sure that some law is concerned; either there is some cause in nature which, in this climate, tends to produce both rain and a westerly wind, or a westerly wind has itself some tendency to produce rain.”

Conditional Probability
$$H: P(\text{rain} \mid \text{westerly wind}, \Omega) > P(\text{rain} \mid \text{not westerly wind}, \Omega),$$
where $\Omega$ is a set of background conditions we consider necessary for a valid comparison.
A lot is hidden in the $\Omega$. The issue becomes clearer with the potential outcomes framework.
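
Mill's arithmetic can be made concrete with a small worked example. The day counts below are hypothetical, chosen only so that westerly winds blow twice as often as easterly ones, and the helper `conditional_rain_probs` is an illustrative name. The sketch compares the two conditional probabilities in $H$ when rainy days are twice as frequent under a westerly wind, and when they are three times as frequent.

```python
from fractions import Fraction

def conditional_rain_probs(rain_west, dry_west, rain_east, dry_east):
    """Return (P(rain | westerly), P(rain | not westerly)) from day counts."""
    p_west = Fraction(rain_west, rain_west + dry_west)
    p_east = Fraction(rain_east, rain_east + dry_east)
    return p_west, p_east

# Hypothetical year: 240 westerly days and 120 easterly days.
# Case A: it rains twice as often with a westerly wind (80 vs 40 days).
case_a = conditional_rain_probs(80, 160, 40, 80)
# Case B: it rains three times as often with a westerly wind (120 vs 40 days).
case_b = conditional_rain_probs(120, 120, 40, 80)

print("Case A:", case_a)
print("Case B:", case_b)
```

In case A the two conditional probabilities are equal, so the raw 2:1 ratio of rainy days only reflects how often each wind blows. In case B the conditional probability is higher under a westerly wind, which is the pattern $H$ describes. Neither comparison says anything about what is held fixed in $\Omega$.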

Potential Outcomes
• Fundamental problem: not observing all of the potential outcomes or counterfactuals
• This is the Neyman-Rubin Causal Model (Neyman 1923, Rubin 1974, Holland 1986).
• Can be used to describe problems of causal inference for both experimental work and observational studies.

Observational Study
An observational study concerns
• cause-and-effect relationships
• treatments, interventions or policies and
• the effects they cause
The design stage of estimating the causal effect for treatment T is common for all Y.

A Thought Experiment
An observational study could in principle have been an experiment but for ethical concerns or logistical issues.
You are probably not estimating a causal effect if you can’t answer Dorn’s (1953) Question: “what experiment would you have run if you were dictator and had infinite resources?”
E.g.: Can we estimate the causal effect of race on SAT scores?
Descriptive and predictive work is something else and can be interesting.

[Figure: screenshots of Google search results pages for the queries “ford” and “ebay”, captured 10/4/13.]

A Modest Experiment
[Figure 2: Brand Keyword Click Substitution. Panels: (a) MSN Test, (b) Google Test. Series: MSN Paid, MSN Natural, Goog Paid, Goog Natural; horizontal axis: date, 2012. MSN and Google click traffic is shown for two events where paid search was suspended (Left) and suspended and resumed (Right).]
From: Blake, Nosko, and Tadelis, 2014

To quantify this substitution, Table 1 shows estimates from a simple pre-post comparison as well as a simple difference-in-differences across search platforms. In the pre-post analysis we regress the log of total daily clicks from MSN to eBay on an indicator for whether days were in the period with ads turned off. Column 1 shows the results which suggest that click volume was only 5.6 percent lower in the period after advertising was suspended. This approach lacks any reasonable control group. It is apparent from Figure 2a that traffic increases in daily volatility and begins to decline after the advertising is turned off. Both of these can be attributed to the seasonal nature of e-commerce. We look to another search platform which serves as a source for eBay traffic, Google, as a control group to account for seasonal factors. During the test period on MSN, eBay continued to purchase brand keyword advertising on Google which can serve as a control group. With this data, we calculate the impact of brand keyword advertising on total click traffic. In the difference-in-differences approach, we add observations of daily traffic from Google and Yahoo! and include in the specification search engine dummies and trends. [15] The variable of interest is thus the interaction between a dummy for the MSN platform and a dummy for treatment (ad off) period. Column 2 of Table 1 show a much smaller impact

[15] The estimates presented include date fixed effects and platform specific trends but the results are very similar without these controls.
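
The contrast between the pre-post comparison and the difference-in-differences described in the excerpt can be sketched on synthetic data. Everything numeric below is invented for illustration (the seasonal decline, the true effect, the noise level, the 60-day window) and does not reproduce the paper's data or estimates; the helper `diff_in_diff` is a hypothetical name, and the sketch uses simple cell means rather than the paper's regression with fixed effects and trends.

```python
import numpy as np

def diff_in_diff(y, treated, post):
    """Difference-in-differences from the four group-by-period cell means."""
    y, treated, post = map(np.asarray, (y, treated, post))

    def cell(g, p):
        return y[(treated == g) & (post == p)].mean()

    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

rng = np.random.default_rng(1)
days = 60
post = (np.arange(days) >= 30).astype(int)   # ads off in the second half
season = -0.05 * post                        # hypothetical common seasonal decline
true_effect = -0.005                         # hypothetical effect of turning ads off

# Log daily clicks: treated platform (ads suspended) and control platform.
log_treated = 8.0 + season + true_effect * post + rng.normal(0, 0.01, days)
log_control = 10.0 + season + rng.normal(0, 0.01, days)

# Naive pre-post comparison on the treated platform only.
pre_post = log_treated[post == 1].mean() - log_treated[post == 0].mean()

# Difference-in-differences using the control platform.
y = np.concatenate([log_treated, log_control])
treated = np.concatenate([np.ones(days, dtype=int), np.zeros(days, dtype=int)])
did = diff_in_diff(y, treated, np.concatenate([post, post]))

print(f"pre-post: {pre_post:.4f}, diff-in-diff: {did:.4f}")
```

In this setup the pre-post estimate absorbs the seasonal decline along with the ad effect, while the difference-in-differences removes whatever the two platforms share, provided their trends would have been parallel absent the intervention.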

General Problems with Observational Studies
• Prediction accuracy is uninformative
• No randomization, the “reasoned basis for inference”
• Selection problems abound
• Data/Model Mining (e.g., not like vision)
• Found data, usually retrospective
Note: it is far easier to estimate the effect of a cause than the cause of an effect. Why?

Difficult Example: Does Information Matter?
• We want to estimate the effect on voting behavior of paying attention to an election campaign.
• Survey research is strikingly uniform regarding the ignorance of the public (e.g., Berelson, Lazarsfeld, and McPhee, 1954; Campbell et al., 1960; J. R. Zaller, 1992).
• Although the fact of public ignorance has not been forcefully challenged, the meaning of this observation has been (Sniderman, 1993).
• Can voters use information such as polls, interest group endorsements and partisan labels to vote like their better informed compatriots (e.g., Lupia, 2004; McKelvey and Ordeshook, 1985a; McKelvey and Ordeshook, 1985b; McKelvey and Ordeshook, 1986)?

Fundamental Problem of Causal Inference
• Fundamental problem: not observing all of the potential outcomes or counterfactuals
• Let $Y_{i1}$ denote $i$’s vote intention when voter $i$ learns during the campaign (i.e., is in the treatment regime).
• Let $Y_{i0}$ denote $i$’s vote intention when voter $i$ does not learn during the campaign (i.e., is in the control regime).
• Let $T_i$ be a treatment indicator: 1 when $i$ is in the treatment regime and 0 otherwise.
• The observed outcome for observation $i$ is $Y_i = T_i Y_{i1} + (1 - T_i) Y_{i0}$.
• The treatment effect for $i$ is $\tau_i = Y_{i1} - Y_{i0}$.
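
The missing-data view of these definitions can be sketched with a toy table. The data are randomly generated and purely hypothetical (eight voters with binary vote intentions); the variable names are illustrative. The sketch builds both potential outcomes, applies $Y_i = T_i Y_{i1} + (1 - T_i) Y_{i0}$, and then masks the potential outcome that the analyst never sees.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Hypothetical potential outcomes (known here only because we simulate them).
y0 = rng.integers(0, 2, n)        # Y_i0: vote intention without learning
y1 = rng.integers(0, 2, n)        # Y_i1: vote intention with learning
t = rng.integers(0, 2, n)         # T_i: treatment indicator

# Observed outcome: Y_i = T_i * Y_i1 + (1 - T_i) * Y_i0
y_obs = t * y1 + (1 - t) * y0

# Unit-level effect tau_i = Y_i1 - Y_i0; never computable from real data.
tau = y1 - y0

# What the analyst actually sees: one potential outcome per unit, the other missing.
y1_seen = np.where(t == 1, y1.astype(float), np.nan)
y0_seen = np.where(t == 0, y0.astype(float), np.nan)

for i in range(n):
    print(i, t[i], y0_seen[i], y1_seen[i], y_obs[i])
```

Every row has exactly one missing entry, so $\tau_i$ cannot be formed for any unit; estimating causal effects means filling in, or averaging over, the missing column under some assumption about how $T_i$ was assigned.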

Probability Model I: Repeated Sampling
• Assume that $Y_i(t)$ is a simple random sample from an infinite population, where $Y_i(t)$ is the outcome of unit $i$ under treatment condition $t$.
• Assume the treatment $T$ is randomly assigned following either:
  • Bernoulli assignment to each $i$
  • Complete randomization

Probability Model II: Fixed Population
• Assume that $Y_i(t)$ is a fixed population
• Assume the treatment $T$ is randomly assigned following either:
  • Bernoulli assignment to each $i$
  • Complete randomization

Assignment Mechanism, Classical Experiment
• individualistic
• probabilistic
• unconfounded: given covariates, assignment does not depend on any potential outcomes

Notation
Let:
• $N$ denote the number of units [in the sample], indexed by $i$
• number of control units: $N_c = \sum_{i=1}^{N} (1 - T_i)$; treated units: $N_t = \sum_{i=1}^{N} T_i$; $N_c + N_t = N$
• $X_i$ is a vector of $K$ covariates, where $X$ is an $N \times K$ matrix
• $Y(0)$ and $Y(1)$ denote $N$-component vectors
Notation
• Let $T$ be the $N$-dimensional vector with elements $T_i \in \{0, 1\}$ with positive probability.
• Let all possible values be $\mathbb{T} = \{0, 1\}^N$, with cardinality $2^N$.
• $\{0, 1\}^N$ denotes the set of all $N$-vectors with all elements equal to 0 or 1.
• Let the subset of values for $T$ with positive probability be denoted by $\mathbb{T}^+$.
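Assignment: Bernoulli Trials
• With a fair coin for each unit: $\Pr(T \mid X, Y(0), Y(1)) = 0.5^N$, where $\mathbb{T}^+ = \{0, 1\}^N = \mathbb{T}$.
• More generally, with treatment probability $q$ for each unit: $\Pr(T \mid X, Y(0), Y(1)) = q^{N_t} \cdot (1 - q)^{N_c}$.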
Notation Note
• $\Pr(T \mid X, Y(0), Y(1))$ is not the probability of a particular unit receiving the treatment.
• It reflects a measure across the full population of $N$ units of a particular assignment vector occurring.
• The unit-level assignment probability for unit $i$ is:
$$p_i(X, Y(0), Y(1)) = \sum_{T : T_i = 1} \Pr(T \mid X, Y(0), Y(1)),$$
where we sum the probabilities across all possible assignment vectors $T$ for which $T_i = 1$.
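The distinction between the probability of an assignment vector and the unit-level assignment probability can be checked by brute force. The sketch below assumes Bernoulli trials with a common treatment probability; the values `N = 4` and `q = 0.3` and the function names are illustrative choices, not taken from the slides.

```python
from itertools import product


def bernoulli_prob(t_vec, q):
    """Pr(T | X, Y(0), Y(1)) = q^{N_t} (1 - q)^{N_c} under Bernoulli trials."""
    n_t = sum(t_vec)
    n_c = len(t_vec) - n_t
    return q ** n_t * (1 - q) ** n_c


def unit_assignment_prob(i, n, q):
    """p_i: sum of Pr(T) over all assignment vectors with T_i = 1."""
    return sum(
        bernoulli_prob(t_vec, q)
        for t_vec in product((0, 1), repeat=n)
        if t_vec[i] == 1
    )


N, q = 4, 0.3  # illustrative values
# The 2^N assignment vectors together carry probability one.
total = sum(bernoulli_prob(t_vec, q) for t_vec in product((0, 1), repeat=N))
# Summing over the vectors with T_0 = 1 recovers the unit-level probability.
p_0 = unit_assignment_prob(0, N, q)
```

Under these assumptions `total` equals 1 and `p_0` equals `q`, up to floating-point error.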
Assignment: Complete Randomization
• An assignment mechanism that satisfies:
$$\mathbb{T}^+ = \left\{ T \in \mathbb{T} \;\middle|\; \sum_{i=1}^{N} T_i = N_t \right\},$$
for some preset $N_t \in \{1, 2, \ldots, N - 1\}$.
• Number of assignment vectors in this design: $\binom{N}{N_t}$, with $q_i = N_t / N \;\; \forall i$.
Assignment: Complete Randomization
$$\Pr(T \mid X, Y(0), Y(1)) = \begin{cases} \dfrac{N_c! \cdot N_t!}{N!} = \dbinom{N}{N_t}^{-1} & \text{if } \sum_{i=1}^{N} T_i = N_t \\ 0 & \text{otherwise} \end{cases}$$
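A small enumeration can illustrate complete randomization. The sketch assumes the design places equal probability on every assignment vector with exactly $N_t$ treated units; `N = 5`, `N_t = 2`, and the helper name `complete_randomization` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def complete_randomization(n, n_t):
    """Yield every assignment vector with exactly n_t treated units."""
    for treated in combinations(range(n), n_t):
        yield tuple(1 if i in treated else 0 for i in range(n))


N, N_t = 5, 2  # illustrative values
vectors = list(complete_randomization(N, N_t))

# Assumption: probability is uniform over the admissible vectors.
prob_each = Fraction(1, comb(N, N_t))

# Unit-level probabilities: sum Pr(T) over the vectors with T_i = 1.
unit_probs = [sum(prob_each for t in vectors if t[i] == 1) for i in range(N)]
```

With these values there are `comb(5, 2) = 10` admissible vectors, and each entry of `unit_probs` equals $N_t / N = 2/5$, matching $q_i = N_t / N$ for all $i$.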
Experimental Data
• Under classic randomization, the inference problem is straightforward because: $T \perp\!\!\!\perp (Y(1), Y(0))$
• Observations in the treatment and control groups are not exactly alike, but they are comparable, i.e., they are exchangeable.
• With exchangeability plus noninterference between units, for $j = 0, 1$ we have:
$$E(Y(j) \mid T = 1) = E(Y(j) \mid T = 0)$$
Average Treatment Effect
• The Average Treatment Effect (ATE) can be estimated simply:
$$\bar{\tau} = \text{Mean outcome for the treated} - \text{Mean outcome for the control}$$
• In notation:
$$\bar{\tau} \equiv \bar{Y}_1 - \bar{Y}_0$$
$$\bar{\tau} = E(Y(1) - Y(0)) = E(Y \mid T = 1) - E(Y \mid T = 0)$$
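The difference in means can be written as a short function. This is a minimal sketch: the outcome and treatment vectors are made-up numbers, and `difference_in_means` is a hypothetical helper name rather than anything from the course.

```python
from statistics import mean


def difference_in_means(y, t):
    """Mean observed outcome of the treated minus that of the controls."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one control unit")
    return mean(treated) - mean(control)


# Illustrative observed data only (not from any real experiment).
y = [3.0, 1.0, 4.0, 2.0, 5.0, 1.0]
t = [1, 0, 1, 0, 1, 0]
tau_hat = difference_in_means(y, t)
```

Here the treated mean is 4 and the control mean is 4/3, so `tau_hat` is about 2.67. The estimate has a causal reading only when treatment is independent of the potential outcomes, as under randomization.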
Observational Data
• With observational data, the treatment and control groups are not drawn from the same population.
• Progress can be made if we assume that the two groups are comparable once we condition on observable covariates denoted by $X$.
• This is the conditional independence assumption: $\{Y(1), Y(0)\} \perp\!\!\!\perp T \mid X$. The reasonableness of this assumption depends on the selection process.
Average Treatment Effect for the Treated
• With observational data, the treatment and control groups are not drawn from the same population.
• Thus, we often want to estimate the average treatment effect for the treated (ATT):
$$\bar{\tau} \mid (T = 1) = E(Y(1) \mid T = 1) - E(Y(0) \mid T = 1)$$
• Progress can be made if we assume that the selection process is the result of only observable covariates denoted by $X$.
• We could alternatively estimate the Average Treatment Effect for the Controls (ATC).
ATE=ATT
• Under random assignment:
$$ATT = E[Y(1) - Y(0) \mid T = 1] = E[Y(1) - Y(0)] = ATE$$
• Note:
$$E[Y \mid T = 1] = E[Y(0) + T (Y(1) - Y(0)) \mid T = 1] = E[Y(1) \mid T = 1] = E[Y(1)], \text{ by } \perp\!\!\!\perp$$
• Same holds for $E[Y \mid T = 0]$.
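For completeness, the control-group case can be written out in the same way. This is a sketch of the analogous steps, using the same observed-outcome identity $Y = Y(0) + T(Y(1) - Y(0))$ and the independence of $T$ and the potential outcomes:

```latex
\begin{align*}
E[Y \mid T = 0] &= E[Y(0) + T\,(Y(1) - Y(0)) \mid T = 0] \\
                &= E[Y(0) \mid T = 0] \\
                &= E[Y(0)] \quad \text{by } T \perp\!\!\!\perp (Y(1), Y(0))
\end{align*}
```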
ATE=ATT
• Since
$$E[Y(1) - Y(0) \mid T = 1] = E[Y(1) - Y(0)] = E[Y(1)] - E[Y(0)] = E[Y \mid T = 1] - E[Y \mid T = 0]$$
• Then,
$$ATE = ATT = E[Y \mid T = 1] - E[Y \mid T = 0]$$
Review and Details
• More details on ATE, ATT, and potential outcomes: [LINK]
• Some review of probability: [LINK]
Selection on Observables
• Then, we obtain the usual exchangeability results:
$$E(Y(j) \mid X, T = 1) = E(Y(j) \mid X, T = 0)$$
• ATT can then be estimated by
$$\bar{\tau} \mid (T = 1) = E\{ E(Y \mid X, T = 1) - E(Y \mid X, T = 0) \mid T = 1 \}$$
where the outer expectation is taken over the distribution of $X \mid (T = 1)$.
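One way to read the ATT expression is as a weighted average of within-stratum differences, with weights given by the distribution of $X$ among the treated. The sketch below assumes a single discrete covariate and exact stratification on it; this is a simplification for illustration, not the matching method used in the course, and the data and the name `att_by_stratification` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def att_by_stratification(x, t, y):
    """Average over X | T = 1 of E(Y | X, T = 1) - E(Y | X, T = 0)."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for xi, ti, yi in zip(x, t, y):
        cells[xi][ti].append(yi)
    n_treated = sum(t)
    att = 0.0
    for xi, groups in cells.items():
        if not groups[1]:
            continue  # no treated units here, so zero weight under X | T = 1
        if not groups[0]:
            raise ValueError(f"no control units with X = {xi!r} (no overlap)")
        weight = len(groups[1]) / n_treated  # share of treated units in stratum
        att += weight * (mean(groups[1]) - mean(groups[0]))
    return att


# Made-up data: stratum "a" has 2 treated and 1 control, "b" has 1 and 3.
x = ["a", "a", "a", "b", "b", "b", "b"]
t = [1, 1, 0, 1, 0, 0, 0]
y = [5, 7, 4, 10, 6, 8, 7]
att = att_by_stratification(x, t, y)
```

In this toy example the within-stratum differences are 2 and 3 with treated weights 2/3 and 1/3, so `att` is 7/3. The unadjusted difference in means is about 1.08, which shows how conditioning on $X$ can change the estimate.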
Observational Data: Matching
[Figure: two density plots of the X variable (horizontal axis 0.0 to 0.8, vertical axis Density 0 to 30) comparing the "did learn" and "did not learn" groups. Left panel: Before Matching. Right panel: After Matching.]
Estimands
• Sample Average Treatment Effect (SATE):
$$\tau^S = \frac{1}{N} \sum_{i=1}^{N} [Y_i(1) - Y_i(0)]$$
• Sample Average Treatment Effect on the Treated (SATT):
$$\tau^S_t = \frac{1}{N_t} \sum_{i : T_i = 1} [Y_i(1) - Y_i(0)]$$
• Population Average Treatment Effect (PATE):
$$\tau^P = E[Y(1) - Y(0)]$$
• Population Average Treatment Effect on the Treated (PATT):
$$\tau^P_t = E[Y(1) - Y(0) \mid T = 1]$$
CATE
• Conditional ATE (CATE): conditional on the sample distribution of sample covariates:
$$\tau(X) = \frac{1}{N} \sum_{i=1}^{N} E[Y_i(1) - Y_i(0) \mid X_i]$$
• Conditional on the treated (CATT):
$$\tau(X)_t = \frac{1}{N_t} \sum_{i : T_i = 1} E[Y_i(1) - Y_i(0) \mid X_i]$$
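The sample estimands SATE and SATT defined above can be computed directly when both potential outcomes are known, which happens only in a simulation. The sketch below uses invented potential outcomes for four units; `sate` and `satt` are hypothetical helper names.

```python
def sate(y1, y0):
    """Sample average treatment effect: mean of Y_i(1) - Y_i(0) over all units."""
    return sum(a - b for a, b in zip(y1, y0)) / len(y1)


def satt(y1, y0, t):
    """Same average, restricted to the units with T_i = 1."""
    effects = [a - b for a, b, ti in zip(y1, y0, t) if ti == 1]
    return sum(effects) / len(effects)


# Illustrative potential outcomes; in real data only one column is observed.
y1 = [5, 6, 9, 4]
y0 = [3, 6, 5, 3]
t = [1, 0, 1, 0]

sate_value = sate(y1, y0)
satt_value = satt(y1, y0, t)
```

The unit effects are 2, 0, 4, and 1, so `sate_value` is 1.75, while the two treated units have effects 2 and 4, giving `satt_value` of 3.0. The two estimands differ whenever the treated units are not representative of the sample.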
Other Estimands
Conditional on whatever:
1 Potential outcomes
2 Potential response
3 Other latent attributes, e.g., probability of selection
4 (2) and (3) are probably related in practice
SATE, unbiased?
• Assume a completely randomized experiment. Follow Neyman (1923/1990).
• $\hat{\tau} = \bar{Y}_t - \bar{Y}_c$
• The estimator $\hat{\tau}$ is unbiased for $\tau$.
• Expectation taken only over the randomization distribution.
• Note: $E[T_i \mid Y(0), Y(1)] = \dfrac{N_t}{N}$
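SATE, unbiased?, proof
• Rewrite $\hat{\tau}$:
$$\hat{\tau} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{T_i \cdot Y_i(1)}{N_t / N} - \frac{(1 - T_i) \cdot Y_i(0)}{N_c / N} \right)$$
• Taking the expectation over the randomization distribution, with $E[T_i] = N_t / N$ and $E[1 - T_i] = N_c / N$:
$$E[\hat{\tau} \mid Y(0), Y(1)] = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{E[T_i] \cdot Y_i(1)}{N_t / N} - \frac{E[1 - T_i] \cdot Y_i(0)}{N_c / N} \right) = \frac{1}{N} \sum_{i=1}^{N} (Y_i(1) - Y_i(0)) = \tau$$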
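The unbiasedness result can be checked exactly on a small fixed population by enumerating every assignment of a completely randomized design. This is a sketch with invented potential outcomes; exact rational arithmetic via `fractions.Fraction` is a convenience choice, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def diff_in_means(y1, y0, treated):
    """tau_hat = mean of Y(1) over treated minus mean of Y(0) over controls."""
    n = len(y1)
    control = [i for i in range(n) if i not in treated]
    t_mean = Fraction(sum(y1[i] for i in treated), len(treated))
    c_mean = Fraction(sum(y0[i] for i in control), len(control))
    return t_mean - c_mean


def randomization_distribution(y1, y0, n_t):
    """tau_hat for every equally likely assignment with n_t treated units."""
    n = len(y1)
    return [diff_in_means(y1, y0, set(c)) for c in combinations(range(n), n_t)]


# Illustrative fixed population of N = 5 units.
y1 = [5, 6, 9, 4, 7]
y0 = [3, 6, 5, 3, 2]

estimates = randomization_distribution(y1, y0, n_t=2)
expected_tau_hat = sum(estimates) / len(estimates)
sate = Fraction(sum(a - b for a, b in zip(y1, y0)), len(y1))
```

Averaging `estimates` over the 10 possible assignments gives exactly `sate` (12/5 here), even though individual estimates vary from one assignment to the next.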
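Sample Variance
• $$V(\bar{Y}_t - \bar{Y}_c) = \frac{S_c^2}{N_c} + \frac{S_t^2}{N_t} - \frac{S_{tc}^2}{N}$$
• $S_c^2$ and $S_t^2$ are the sample variances of $Y(0)$ and $Y(1)$:
$$S_c^2 = \frac{1}{N - 1} \sum_{i=1}^{N} \left( Y_i(0) - \bar{Y}(0) \right)^2$$
$$S_t^2 = \frac{1}{N - 1} \sum_{i=1}^{N} \left( Y_i(1) - \bar{Y}(1) \right)^2$$
$$S_{tc}^2 = \frac{1}{N - 1} \sum_{i=1}^{N} \left( Y_i(1) - Y_i(0) - (\bar{Y}(1) - \bar{Y}(0)) \right)^2 = \frac{1}{N - 1} \sum_{i=1}^{N} \left( Y_i(1) - Y_i(0) - \tau \right)^2$$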
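The variance expression for $\bar{Y}_t - \bar{Y}_c$ can be compared with the variance obtained by enumerating the randomization distribution of a small fixed population. The sketch below uses invented potential outcomes and assumes a completely randomized design; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def sample_var(values):
    """Variance with the N - 1 divisor, as in S_c^2, S_t^2, and S_tc^2."""
    n = len(values)
    centre = Fraction(sum(values), n)
    return sum((v - centre) ** 2 for v in values) / (n - 1)


def neyman_variance(y1, y0, n_t):
    """S_c^2 / N_c + S_t^2 / N_t - S_tc^2 / N."""
    n = len(y1)
    n_c = n - n_t
    s_tc2 = sample_var([a - b for a, b in zip(y1, y0)])
    return sample_var(y0) / n_c + sample_var(y1) / n_t - s_tc2 / n


def enumerated_variance(y1, y0, n_t):
    """Variance of tau_hat across all equally likely assignments."""
    n = len(y1)
    estimates = []
    for treated in combinations(range(n), n_t):
        control = [i for i in range(n) if i not in treated]
        t_mean = Fraction(sum(y1[i] for i in treated), n_t)
        c_mean = Fraction(sum(y0[i] for i in control), n - n_t)
        estimates.append(t_mean - c_mean)
    centre = sum(estimates) / len(estimates)
    return sum((e - centre) ** 2 for e in estimates) / len(estimates)


# Illustrative fixed population of N = 5 units.
y1 = [5, 6, 9, 4, 7]
y0 = [3, 6, 5, 3, 2]
formula_value = neyman_variance(y1, y0, n_t=2)
brute_force_value = enumerated_variance(y1, y0, n_t=2)
```

With exact arithmetic the two values agree. Note that $S_{tc}^2$ requires both potential outcomes for every unit, so it can be computed here only because the population is simulated.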
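Alternative Form
• $$S_{tc}^2 = S_c^2 + S_t^2 - 2 \rho_{tc} \cdot S_c \cdot S_t$$
where
$$\rho_{tc} = \frac{1}{(N - 1) \cdot S_c \cdot S_t} \sum_{i=1}^{N} \left( Y_i(1) - \bar{Y}(1) \right) \cdot \left( Y_i(0) - \bar{Y}(0) \right)$$
• $$V(\bar{Y}_t - \bar{Y}_c) = \frac{N_t}{N \cdot N_c} \cdot S_c^2 + \frac{N_c}{N \cdot N_t} \cdot S_t^2 + \frac{2}{N} \cdot \rho_{tc} \cdot S_c \cdot S_t$$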
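The step from the first variance expression to this alternative form is short algebra. As a sketch, substitute the decomposition of $S_{tc}^2$ and use $N - N_c = N_t$ and $N - N_t = N_c$:

```latex
\begin{align*}
V(\bar{Y}_t - \bar{Y}_c)
  &= \frac{S_c^2}{N_c} + \frac{S_t^2}{N_t}
     - \frac{S_c^2 + S_t^2 - 2 \rho_{tc} S_c S_t}{N} \\
  &= \left( \frac{1}{N_c} - \frac{1}{N} \right) S_c^2
     + \left( \frac{1}{N_t} - \frac{1}{N} \right) S_t^2
     + \frac{2}{N} \rho_{tc} S_c S_t \\
  &= \frac{N_t}{N N_c} S_c^2 + \frac{N_c}{N N_t} S_t^2
     + \frac{2}{N} \rho_{tc} S_c S_t
\end{align*}
```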
Sample Variance