“Cramming” Before the Exam: Estimating the Causal Effect of Exam Preparatory Programs in a Non-randomized Study
Ming-sen Wang Department of Economics University of Arizona∗†
May 04, 2012
FIRST DRAFT: January 12, 2012
Abstract
In this empirical paper, I estimate the impact of attending exam preparatory programs, in particular “cram schools,” on students’ academic performance. I measure the outcome by admission to a public high school and to an “elite” high school. Focusing on the problem that students are not randomly assigned to “cram schools,” I approach the issue using propensity score matching and a Bayesian simultaneous-equations model. Using data from a survey of Taiwanese junior high school students in the Taiwan Youth Project, I find evidence of statistically insignificant negative sorting into exam preparatory programs, and that attending an exam preparatory program improves a student’s probability of being admitted to a public high school or an “elite” high school. Both approaches indicate similar positive treatment effects.

∗ I am indebted to Ronald Oaxaca for continuous guidance and to Katherine Barnes, Price Fishback, Keisuke Hirano, and Tiemen Woutersen for helpful comments and suggestions. I have benefited from discussions with Mario Samano-Sanchez, Sandeep Shetty, and Ju-Chun Yen. All remaining errors are my own. E-mail: [email protected]; the latest version of the paper can be found at: http://www.u.arizona.edu/~mswang.

† Data analyzed in this paper were collected by the research project Taiwan Youth Project, sponsored by the Academia Sinica (AS-93-TP-C01). This research project was carried out by the Institute of Sociology, Academia Sinica, and directed by Chin-Chun Yi. The Center for Survey Research of Academia Sinica is responsible for the data distribution. The authors appreciate the assistance in providing data by the institutes and individuals aforementioned. The views expressed herein are the authors’ own.
1 Introduction
In many East Asian countries, such as Taiwan and Japan, attending so-called “cram schools” is prevalent. A “cram school” is a type of shadow education aimed at improving a student’s exam-taking skills. Attending “cram school” imposes additional burdens on a student and her family. It puts additional stress on the student because it requires time and effort, and it puts a financial load on parents because sending a child to a program for a month can cost more than a semester of tuition fees at a public school.

Given the prevalence and important role of exam preparatory programs in the education system, it is surprising that there are few rigorous evaluations. One problem is that students often self-select into these prep-programs (Jackson(2012)[33]). As shown in Figure (1), the number of “cram schools” in Taiwan has grown steadily. However, there has been no rigorous evidence that attending an exam prep-program indeed improves students’ high school placement.

In a seminal paper, Stevenson and Baker (1992)[47] point out possible factors that foster “cram schools”: (1) the use of a centrally administered examination, (2) the use of “contest rules” instead of “sponsorship rules”, and (3) tight linkages between the outcomes of educational allocation in elementary and secondary schooling and future educational opportunities. Taiwanese society has all of these factors. Graduates of an “elite” university in Taiwan have significant advantages in the labor market (Lin (1983)[36]).¹ A student’s performance in the Joint High-school Entrance Exam and the Joint College Entrance Exam is strongly linked to future opportunities. This linkage drives the prevalence of “cram schools” in Taiwan and makes Taiwan an ideal setting to study.

The paper distinguishes itself from previous work in two ways (see Stevenson and Baker (1992)[47] and Lin et al.(2006)[37]).
First, while other studies define exam performance as the outcome, I focus on admission to a public high school and to an “elite” high school as the outcomes of interest, which avoids the selection issue related to taking the Joint Entrance Exam. Because Taiwan has recently undergone a significant education reform, as discussed in the next section, focusing on admission circumvents the modeling complications and the need for exclusion restrictions.
¹ Note that this result can hardly be interpreted as causal, since the research does not control for the selection that graduates of an “elite” university in Taiwan are productive to begin with.
Second, I estimate the effect of “cramming” using a dataset of junior high school students, whereas previous work uses samples of high school students. The difference is meaningful because entry into senior high school is an important stage of educational stratification in Taiwan. Whether attending prep-programs affects a teenager’s trajectory onto the academic track or the vocational track is an interesting question per se.

I compare estimates from propensity score matching and a Bayesian simultaneous-equations model. Identification in the two approaches comes from different untestable assumptions: propensity score matching relies on the conditional independence assumption (Rosenbaum and Rubin(1983)[43]), while the Bayesian model relies on the exogeneity of the exclusion restrictions. The two approaches differ slightly in the interpretation of the estimates, but both indicate positive effects of attending “cram school” on admission to a public high school or an “elite” high school.
[Figure 1: Growth in Number of Tutoring Schools in Taiwan (2002 - 2010). Bar chart; x-axis: year (2002 to 2010); y-axis: number of tutoring schools; series by county: Taipei City, Taipei County, Yilan County.]
† The data for this bar chart come from http://ap4.kh.edu.tw/. The database is maintained by the Education Bureau of Kaohsiung City Government and has county-level statistics for all cram schools and after-school tutoring in Taiwan. The figure shows that the number of tutoring schools in the three counties under study increased from 2002 to 2010.
1.1 Institutional Background
In 1987, Taiwan ended the martial law that had been in effect since 1949. Along with the freer and more open political atmosphere, many civil groups started to request reforms of the education system. One of the most significant changes was the replacement of the old Joint Exam System with the new Multi-Opportunities System. Under the old system, every junior high graduate had to take the Joint High-school Entrance Exam, held in the summer after graduation. Students were ranked by their exam grades, and the ranking determined their priority in choosing an academic high school or a vocational high school. The Exam thus determined both high school placement and educational stratification.

In 2001, the Ministry of Education officially implemented the new Multi-Opportunities System. The main idea of the new system is to separate admissions from exams. Two joint exams, the Basic Scholastic Ability Test and the Joint High-school Entrance Exam, are held in each school year to give students one more chance. Under the new system, students can be admitted to high schools through multiple channels, such as (1) the Joint Entrance Exam, (2) the Special Admission Quotas for Recommended Students, and (3) Other Channels without Entrance Exam Grades. Even though using grades on the Joint Entrance Exam as the outcome would provide a universal measure, it would require handling selection into taking the Exam. Defining admission as the outcome greatly simplifies the modeling.
1.2 Literature Review
Human capital investment has been a research focus ever since the first rigorous treatment of the topic by Becker(1962)[7]. A large literature is dedicated to estimating the returns to formal schooling (see Ashenfelter and Krueger (1994)[6]; Card (1995)[11]; Card(2001)[12]; Belzil(2007)[10]). Regan et al.(2007)[41], by contrast, focus on the optimal point at which to stop schooling rather than on estimating the rate of return. If a prep-program does not directly increase human capital and only affects a student’s exam performance, the program can be considered a way to reduce high school costs. It is therefore of particular interest to investigate whether “cram school” increases the likelihood of being admitted to a public high school. Admission to an “elite” high school increases the likelihood of being admitted to a better public university.² Again, tuition fees in a public university are significantly lower than in a private university. Lower tuition fees affect a student’s decision of when to stop schooling.

² Since the Taiwanese government subsidizes higher education heavily, public universities are in general ranked as better universities.

As pointed out in Jackson(2010)[32], we can motivate the question in the context of the Becker-Willis-Rosen life cycle model of human capital investment (see Becker(1993)[9] and Willis and Rosen (1979)[52]). Suppose the log of earnings $y$ is an increasing concave function $g(s)$ of the years of schooling $s$:

$$y = e^{g(s)}$$
Individuals pay a cost $c$ to attend school, and $\delta$ is the discount rate. Then, in the Becker-Rosen framework, a student who considers two levels of schooling chooses $T$ years over no schooling if:

$$V(T) \ge V(0) \iff \int_T^{\infty} e^{g(T)} e^{-\delta t}\,dt - \int_0^{T} c\,e^{-\delta t}\,dt \ge \int_0^{\infty} e^{g(0)} e^{-\delta t}\,dt$$
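To make the role of the cost $c$ explicit, the three integrals in the condition above can be evaluated in closed form. The following is a worked sketch under the assumptions already stated (a constant cost $c$, a constant discount rate $\delta > 0$, $T > 0$, and earnings $e^{g(T)}$ received from $T$ onward); the rearranged threshold for $c$ is added here for illustration.

```latex
\begin{align*}
\int_T^{\infty} e^{g(T)} e^{-\delta t}\,dt &= \frac{e^{g(T)-\delta T}}{\delta}, \\
\int_0^{T} c\,e^{-\delta t}\,dt &= \frac{c\,\bigl(1-e^{-\delta T}\bigr)}{\delta}, \\
\int_0^{\infty} e^{g(0)} e^{-\delta t}\,dt &= \frac{e^{g(0)}}{\delta}.
\end{align*}
% Substituting and multiplying through by \delta > 0:
\[
V(T) \ge V(0)
\iff e^{g(T)-\delta T} - c\,\bigl(1-e^{-\delta T}\bigr) \ge e^{g(0)}
\iff c \le \frac{e^{g(T)-\delta T} - e^{g(0)}}{1-e^{-\delta T}} .
\]
```

The left-hand side of the middle inequality is decreasing in $c$, so any reduction in $c$ relaxes the condition for choosing $T$ years of schooling.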
If $c$ is lowered by the decision to attend a “cram school”, then the student’s utility from acquiring more education increases. She is more likely to acquire more education and to postpone leaving school.

If prep-programs have no effect or a negative effect on high school placement, then attending the programs is fundamentally rent-seeking behavior (see Krueger(1974)[34]). In that case, the motivation to send a teenager to “cram school” is driven by behavioral factors, such as unrealistic concerns that one’s children will be left behind if all other children go to “cram school.”

Jackson(2010)[32] is the most similar study, using a U.S. high school dataset. He looks at the short-term outcomes of the Advanced Placement Incentive Program (APIP), which pays both teachers and students for passing grades on Advanced Placement (AP) examinations. Using propensity score matching methods, he finds that APIP adoption is associated with a 13 percent increase in the number of students scoring above 1100 on the SAT or 24 on the ACT and a 4.96 percent increase in the number of students matriculating in college. My study shows patterns in Taiwan similar to his findings.
2 Data: Taiwan Youth Project
The Taiwan Youth Project (TYP) was started in the spring of 2000, with junior high students from Taipei County, Taipei City, and Yilan County as the study population. In order to examine the effects of Taiwan’s educational reforms on students, TYP takes two cohorts as study subjects: first-year junior high students with an average age of 13 (who entered high school under the reformed entrance system) and third-year junior high students with an average age of 15 (who entered under the old entrance system). TYP samples 1000 students per cohort in each of Taipei City and Taipei County and 800 students per cohort in Yilan County, for a total sample size of 2 × (1000 + 1000 + 800) = 5600 students. I use the first-year cohort because I observe their program attendance history. After sample attrition, I am left with 2449 observations. Table (1) summarizes the key variables in the dataset.
Table 1: Summary Statistics

Variable                                  Mean      SD
“Cram School” in Senior Year              0.48    0.50
Male                                      0.51    0.50
Sound Family                              0.87    0.33
Number of Siblings                        3.56    0.87
Ever Fail a Class                         0.35    0.48
Admission to Public HS                    0.30    0.46
Admission to Elite HS                     0.10    0.31
Intent to Attend HS                       0.68    0.47
Minutes to “Cram School”                 18.99   11.60

Cram School History for First 2 Years     Mean      SD
00                                        0.35    0.48
01                                        0.09    0.28
10                                        0.12    0.32
11                                        0.45    0.50

Counties                                  Mean      SD
Taipei City                               0.39    0.49
Taipei County                             0.39    0.49
Yilan County                              0.22    0.41

                              Father’s Educ.    Mother’s Educ.
Education Level                Mean      SD      Mean      SD
Elementary School              0.13    0.34      0.17    0.37
Junior High School             0.26    0.44      0.26    0.44
High School Graduate           0.25    0.43      0.26    0.44
Vocational School              0.08    0.27      0.10    0.30
Vocational College             0.06    0.24      0.05    0.23
University                     0.11    0.31      0.08    0.27
Grad School                    0.04    0.18      0.01    0.11
Not Applicable                 0.00    0.05      0.01    0.09
No Education                   0.06    0.24      0.06    0.24

Family Income                             Mean      SD
less than NTD 30,000                      0.18    0.38
NTD 30,000 - NTD 49,999                   0.22    0.41
NTD 50,000 - NTD 59,999                   0.21    0.40
NTD 60,000 - NTD 69,999                   0.07    0.26
NTD 70,000 - NTD 79,999                   0.08    0.27
NTD 80,000 - NTD 89,999                   0.05    0.22
NTD 90,000 - NTD 99,999                   0.04    0.20
NTD 100,000 - NTD 109,999                 0.04    0.20
NTD 110,000 - NTD 119,999                 0.03    0.17
NTD 120,000 - NTD 129,999                 0.02    0.14
NTD 130,000 - NTD 139,999                 0.01    0.10
NTD 140,000 - NTD 149,999                 0.01    0.10
more than NTD 150,000                     0.04    0.20
3 Propensity Score Matching
I first approach the question by propensity score matching. I define the treatment as attending an exam prep-program in the senior year, because attending “cram school” in that year has the strongest linkage to high school placement. Because the data contain only the realized outcome of each treated unit, the propensity score matching approach constructs a counterfactual outcome for each treated unit based on the propensity score. Identification of propensity score matching relies on the conditional independence assumption (Rosenbaum and Rubin(1983)[43]):
$$T_i \perp \bigl(Y_i(1),\, Y_i(0)\bigr) \mid X_i$$
where $Y_i(1)$ and $Y_i(0)$ denote the potential outcomes with and without treatment.

Conditional on observable characteristics, potential outcomes are independent of treatment. In our context, I assume that attending “cram school” is independent of the potential admission outcomes, with or without “cram school” attendance, after controlling for observed family background and students’ performance in school. This is a strong and empirically untestable assumption on the assignment mechanism: there are no unobserved characteristics that affect both the outcome and exam prep-program attendance. Hence, it is important to select covariates so that the conditional independence assumption is likely to hold. In addition to the standard covariates in the education literature, I proxy for ability by whether a student has ever failed a class and for motivation by whether she intends to attend high school.

Given the richness of the covariates, I adopt the propensity score approach. Rosenbaum and Rubin(1983)[43] show that conditioning on the full set of covariates is equivalent to conditioning on the propensity score, which is the coarsest balancing score. I non-parametrically estimate the propensity score by series logit regression. By 10-fold cross-validation, the first-order series yields the smallest prediction error. I present the estimates of the propensity score model in Table (2).
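The series-logit step with cross-validation can be sketched as follows. This is a minimal illustration on simulated data, not the estimation code behind Table (2): the two covariates, the coefficients of the simulated assignment rule, the gradient-ascent fitting routine, and the use of squared prediction error as the cross-validation criterion are all assumptions made for the example.

```python
import math
import random

def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def series_terms(x, order):
    # Intercept plus powers 1..order of each covariate (a simple power series).
    return [1.0] + [xj ** p for xj in x for p in range(1, order + 1)]

def fit_logit(rows, treat, steps=100, lr=1.0):
    # Gradient ascent on the average logit log-likelihood (hypothetical fitter).
    k, n = len(rows[0]), len(rows)
    beta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for r, t in zip(rows, treat):
            resid = t - sigmoid(sum(b * v for b, v in zip(beta, r)))
            for j in range(k):
                grad[j] += resid * r[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def predict(beta, rows):
    return [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]

def cv_error(X, treat, order, folds=10):
    # Mean squared out-of-fold prediction error of the treatment indicator.
    n = len(X)
    rows = [series_terms(x, order) for x in X]
    sq_err = 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        beta = fit_logit([rows[i] for i in train], [treat[i] for i in train])
        fitted = predict(beta, [rows[i] for i in test])
        sq_err += sum((treat[i] - p) ** 2 for i, p in zip(test, fitted))
    return sq_err / n

# Simulated data: two covariates and a logit assignment rule (assumed values).
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
treat = [1 if rng.random() < sigmoid(0.5 + 1.5 * x[0] - x[1]) else 0 for x in X]

errors = {order: cv_error(X, treat, order) for order in (1, 2)}
best_order = min(errors, key=errors.get)
design = [series_terms(x, best_order) for x in X]
scores = predict(fit_logit(design, treat), design)
```

The series order with the smallest cross-validated error is then used to fit the model on the full sample, and the fitted probabilities serve as the estimated propensity scores.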
Table 2: Estimated Propensity Score

                                          Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)                                -4.1436      1.1167    -3.71    0.0002
Male                                       -0.0747      0.1129    -0.66    0.5081
Num of Siblings                            -0.1443      0.0694    -2.08    0.0376
Sound Family                                0.4229      0.1806     2.34    0.0192
Attendance Histories
  11                                        3.3385      0.1450    23.02    0.0000
  10                                        0.4082      0.1914     2.13    0.0329
  01                                        2.8512      0.1986    14.36    0.0000
Fail a Class (Proxy for Ability)            0.5430      0.1269     4.28    0.0000
Intention to HS (Proxy for Motivation)      0.3859      0.1249     3.09    0.0020
Father’s Educ.                                 Yes
Mother’s Educ.                                 Yes
Father’s Occ.                                  Yes
Mother’s Occ.                                  Yes
School FE                                      Yes
Family Income Level                            Yes

3.1 Overlap Condition

An important issue that often hampers the propensity score matching approach is lack of overlap in the covariate distributions. Figure (2) shows the histograms of the estimated propensity scores of the treatment and control groups. Even though the treatment group is concentrated at higher values of the propensity score and the control group at lower values, the two groups share a common support. An implication of the figure is that we should use a small number of matches to avoid too much smoothing and extrapolation.
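A simple numerical counterpart to inspecting the histograms is to compute the region where the two groups' estimated propensity scores overlap. The sketch below uses a min-max rule, which is one common diagnostic and is not taken from the paper; the scores and treatment indicators are made-up values.

```python
def common_support(scores, treat):
    # Overlap region: from the larger of the two group minima
    # to the smaller of the two group maxima.
    treated = [s for s, t in zip(scores, treat) if t == 1]
    control = [s for s, t in zip(scores, treat) if t == 0]
    lo = max(min(treated), min(control))
    hi = min(max(treated), max(control))
    return lo, hi

def treated_share_on_support(scores, treat, lo, hi):
    # Fraction of treated units whose score lies inside [lo, hi].
    treated = [s for s, t in zip(scores, treat) if t == 1]
    return sum(1 for s in treated if lo <= s <= hi) / len(treated)

# Hypothetical estimated propensity scores for four controls and four treated units.
scores = [0.10, 0.25, 0.40, 0.55, 0.35, 0.60, 0.80, 0.95]
treat = [0, 0, 0, 0, 1, 1, 1, 1]

lo, hi = common_support(scores, treat)
share = treated_share_on_support(scores, treat, lo, hi)
```

A small share of treated units inside the overlap region would signal that matching has to extrapolate for many treated units.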
3.2 Results
The benchmark results of propensity score matching are presented in Table (3). I compare different matching approaches. In 1-nearest-neighbor matching, the counterfactual outcome of a treated unit is constructed from the control unit closest to it in propensity score. 10-nearest-neighbor matching instead uses the closest 10 control units. Using more comparison units increases the precision of the estimate at the cost of larger bias. The trade-off between 1-nearest-neighbor and 10-nearest-neighbor matching is the well-known bias-variance trade-off in the non-parametric literature. Caliper matching, on the other hand, uses all the control units within a predefined caliper but drops the treated units that have no matches. The problem with caliper matching is that the choice of caliper is left to the researcher’s judgment and that dropping unmatched units alters the interpretation of the estimate. Instead of the average treatment effect on the treated (ATT), the estimate from caliper matching should be interpreted as a conditional treatment effect on the treated given the matched subset (CATT).

All the estimates of the ATT are positive, ranging from roughly 15 to 18 percentage points for admission to a public high school and from 3 to 5 percentage points for admission to an “elite” high school. The estimates for public high school admission are statistically significant under all three matching approaches; for “elite” high school admission, only the caliper estimate in Table (3) is significant. In other words, a student who attended “cram school” would have been 15 to 18 percentage points less likely to be admitted to a public high school and 3 to 5 percentage points less likely to be admitted to an “elite” high school had she not attended “cram school.”

[Figure 2: Histograms of Estimated Propensity Scores. Two panels, labeled 0 and 1 by treatment status; x-axis: propensity score (0.0 to 1.0); y-axis: density.]

Since I am interested in estimating the ATT, I can apply the covariate balancing strategy proposed by Rubin (2006)[45] when overlap in the covariate distributions is a concern. The idea is to select a more balanced subsample before estimating the ATT. The procedure works as follows:
1. Order the treated units by their estimated propensity scores.

2. Match without replacement, in decreasing order of the estimated propensity score, to select the corresponding control units. This leads to a balanced sample of size 2 × N_1, where N_1 is the number of treated units.

3. Redo the analysis, say propensity score matching, on the balanced sample.
An advantage of the approach is that the interpretation of the estimate is not affected by trimming control units, as long as we are interested in the ATT.

I report the results in Table (4). Consistent with the previous results, attending an exam prep-program significantly improves a student’s chance of being admitted to a public high school, by roughly 17 to 19 percentage points, and improves her chance of being admitted to an “elite” high school by 2 to 5 percentage points.
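The subsample-selection steps listed above can be sketched as follows. This is an illustration rather than the code used for Table (4): the greedy nearest-score rule for picking each control and the tie-breaking by index are assumptions, and the scores are made-up values.

```python
def balanced_subsample(scores, treat):
    # Step 1: order treated units by decreasing estimated propensity score.
    treated = sorted((i for i, t in enumerate(treat) if t == 1),
                     key=lambda i: scores[i], reverse=True)
    available = {i for i, t in enumerate(treat) if t == 0}
    pairs = []
    # Step 2: match each treated unit, without replacement, to the closest
    # remaining control unit (ties broken by the lower index).
    for i in treated:
        if not available:
            break
        j = min(available, key=lambda c: (abs(scores[c] - scores[i]), c))
        available.remove(j)
        pairs.append((i, j))
    return pairs

# Hypothetical scores: units 0-2 are treated, units 3-6 are controls.
scores = [0.90, 0.70, 0.40, 0.85, 0.60, 0.50, 0.20]
treat = [1, 1, 1, 0, 0, 0, 0]

pairs = balanced_subsample(scores, treat)
# Step 3 would rerun the matching analysis on this balanced sample.
sample = sorted(i for pair in pairs for i in pair)
```

With three treated units the balanced sample has 2 × 3 = 6 units; the control with the lowest score (index 6) is trimmed.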
Table 3: Propensity Score Matching: Full Sample

Outcome                     Est.       A-I S.E.†   Num. Matched
1-Nearest-Neighbor
  Public High School        0.157***   0.035       1199
  Elite High School         0.029      0.022       1199
10-Nearest-Neighbor
  Public High School        0.145***   0.030       1199
  Elite High School         0.030      0.020       1199
Caliper δ = 0.001
  Public High School        0.180***   0.011       457
  Elite High School         0.045***   0.007       457
† The standard errors are calculated based on Abadie and Imbens(2006)[1].
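The point estimates compared in Table (3) come from nearest-neighbor and caliper matching on the propensity score. The sketch below shows how such estimators can be computed on made-up data; it matches with replacement (an assumption) and does not compute the Abadie-Imbens standard errors reported in the table.

```python
def att_nearest_neighbor(scores, treat, outcome, k=1):
    # Average over treated units of (own outcome - mean outcome of the
    # k controls closest in propensity score), matching with replacement.
    controls = [i for i, t in enumerate(treat) if t == 0]
    effects = []
    for i, t in enumerate(treat):
        if t != 1:
            continue
        nearest = sorted(controls, key=lambda c: abs(scores[c] - scores[i]))[:k]
        counterfactual = sum(outcome[c] for c in nearest) / len(nearest)
        effects.append(outcome[i] - counterfactual)
    return sum(effects) / len(effects)

def att_caliper(scores, treat, outcome, caliper):
    # Uses every control within the caliper; treated units with no control
    # inside the caliper are dropped, which changes the estimand.
    controls = [i for i, t in enumerate(treat) if t == 0]
    effects = []
    for i, t in enumerate(treat):
        if t != 1:
            continue
        within = [c for c in controls if abs(scores[c] - scores[i]) <= caliper]
        if within:
            counterfactual = sum(outcome[c] for c in within) / len(within)
            effects.append(outcome[i] - counterfactual)
    if not effects:
        return None, 0
    return sum(effects) / len(effects), len(effects)

# Hypothetical data: units 0-2 are treated, units 3-6 are controls.
scores = [0.80, 0.60, 0.30, 0.78, 0.55, 0.35, 0.10]
treat = [1, 1, 1, 0, 0, 0, 0]
outcome = [1, 1, 0, 1, 0, 0, 0]

att_1nn = att_nearest_neighbor(scores, treat, outcome, k=1)
att_2nn = att_nearest_neighbor(scores, treat, outcome, k=2)
att_cal, n_matched = att_caliper(scores, treat, outcome, caliper=0.03)
```

In this toy example only one treated unit has a control within the caliper, so the caliper estimate is based on a single matched unit, mirroring the drop in the number of matched units in the caliper rows of the table.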
Table 4: Propensity Score Matching: Rubin Subsample

Outcome                     Est.       A-I S.E.    Num. Matched
1-Nearest-Neighbor
  Public High School        0.190***   0.049       1199
  Elite High School         0.050      0.036       1199
10-Nearest-Neighbor
  Public High School        0.187***   0.041       1199
  Elite High School         0.053*     0.031       1199
Caliper δ = 0.001
  Public High School        0.172***   0.013       376
  Elite High School         0.024***   0.009       376
Table (5) shows the estimates of the ATT using the subsample of students who intend to attend high school. The subsample excludes students who are interested in vocational training or in leaving school. This is a first attempt to deal with the ability sorting issue. If students who are better at academics want to attend high school and are therefore more likely to go to “cram school,” the estimated effect may be exaggerated. On the other hand, if the students who go to “cram school” are those who would like to attend high school but do not have a comparative advantage in academics, then we would expect the estimate to be biased downward. Again, the estimator relies on the conditional independence assumption holding within the subsample, even though some may doubt its validity in the full sample. Since the estimate exploits only a subsample, its interpretation again changes from ATT to CATT: the treatment effect on the treated among students who would like to go to high school. The estimates are generally somewhat larger than, but still consistent with, the previous estimates.
Table 5: Propensity Score Matching: Intention-to-HS Subsample

Outcome                     Est.       A-I S.E.    Num. Matched
1-Nearest-Neighbor
  Public High School        0.218**    0.061       931
  Elite High School         0.096**    0.046       931
10-Nearest-Neighbor
  Public High School        0.236***   0.050       931
  Elite High School         0.102**    0.040       931
Caliper δ = 0.001
  Public High School        0.176***   0.013       224
  Elite High School         0.068***   0.010       224
4 Bayesian Simultaneous Equations Model
As mentioned briefly in the last section, some may be concerned about the validity of the conditional independence assumption, since students may select into attending “cram school” based on their motivation and ability. In this section, I set up a Bayesian simultaneous equations model that attempts to take possible selection into account.

The model assumes that the latent potential outcomes $Y_i^*(0)$ and $Y_i^*(1)$ are linear functions of family characteristics, $X_i$, treatment (“cram school” attendance), $T_i$, and an unobserved random shock $\varepsilon_{1i}$. In addition, I assume that the treatment effect is constant over the population,

$$Y_i^*(1) - Y_i^*(0) = \tau, \quad \forall i$$

and that the unobservable characteristics of each individual are the same whether she gets treatment or not. The constant treatment effect assumption is somewhat unrealistic and restrictive, but it may still be a good approximation. As noted in Angrist(2001)[2], in practice, more general estimation strategies allowing heterogeneous treatment effects often lead to similar average treatment effects. The assumption allows me to extrapolate the treatment effect on those whose decision is affected by the exclusion restriction to the whole population. I, in turn, express the latent potential outcomes as:

$$Y_i^*(1) = \tau + X_i\beta_1 + \varepsilon_{1i}$$
$$Y_i^*(0) = X_i\beta_1 + \varepsilon_{1i}$$

The latent outcome under the realized treatment becomes:

$$Y_i^* = Y_i^*(0) + T_i\,[Y_i^*(1) - Y_i^*(0)] \tag{1}$$
$$\phantom{Y_i^*} = \tau T_i + X_i\beta_1 + \varepsilon_{1i} \tag{2}$$

I observe $Y_i = 1$ if $Y_i^* > 0$ and $Y_i = 0$ otherwise.

To accommodate the selection problem, I follow the standard strategy of Heckman (1979)[30] and assume that a household makes an optimal decision on whether to send its children to an exam preparatory program. A household sends its children to a “cram school” if the utility is greater than a certain threshold. Therefore, I can interpret $T_i^*$ as the latent normalized utility:

$$T_i^* = \gamma z_i + X_i\beta_2 + \varepsilon_{2i} \tag{3}$$

I observe $T_i = 1$ if $T_i^* > 0$ and $T_i = 0$ otherwise.

The argument implies that, given $Y_i^*$ and $T_i^*$, if I can solve the simultaneous equations model, the estimate of $\tau$ is an estimate of the average treatment effect. Identification of the model boils down to whether I can solve the simultaneous equations. I discuss the issue in Section (4.2).
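To see why selection matters in this two-equation setup, it can help to simulate data from equations (2) and (3). The sketch below is purely illustrative: the scalar covariate, the scalar excluded variable, all parameter values, and the unit-variance errors with correlation `rho` are assumptions chosen for the example, not estimates from the paper.

```python
import math
import random

def simulate(n, tau, beta1, beta2, gamma, rho, seed=0):
    # Draws (Y, T, X, z) from the latent-index model with correlated shocks.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)          # observed family characteristic
        z = rng.gauss(0.0, 1.0)          # excluded variable in the selection equation
        e2 = rng.gauss(0.0, 1.0)         # shock to the selection equation
        e1 = rho * e2 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        t = 1 if gamma * z + beta2 * x + e2 > 0 else 0   # equation (3)
        y = 1 if tau * t + beta1 * x + e1 > 0 else 0     # equation (2)
        data.append((y, t, x, z))
    return data

def naive_gap(data):
    # Raw difference in admission rates between treated and untreated units.
    treated = [y for y, t, _, _ in data if t == 1]
    control = [y for y, t, _, _ in data if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Same true effect tau in both samples; only the error correlation differs.
params = dict(n=5000, tau=0.5, beta1=0.5, beta2=0.5, gamma=1.0)
sample_no_sorting = simulate(rho=0.0, **params)
sample_sorting = simulate(rho=0.8, **params)

gap_no_sorting = naive_gap(sample_no_sorting)
gap_sorting = naive_gap(sample_sorting)
```

With a positive error correlation the raw gap in admission rates is larger than with uncorrelated errors, even though `tau` is identical, which is the selection problem the joint model is meant to address. A negative `rho` would push the raw gap in the opposite direction.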
4.1 Model Assumptions
In order to estimate the behavioral model specified above, I adopt a parametric approach for efficiency and simplicity.
Normality Assumption

$$\begin{bmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{bmatrix} \Bigm| X_i, Z_i \sim N(0, \Sigma)$$
The assumption specifies how the unobserved characteristics affect the outcome $Y_i^*$ and the selection rule $T_i^*$. Under the normality assumption, the data augmentation approach comes into play. From an initial guess of the latent variables $Y_i^*$ and $T_i^*$, we can sequentially estimate the parameters and update the latent variables based on the estimates and the normality assumption.
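The "update the latent variables" step amounts to drawing each latent index from a normal distribution truncated to the side of zero implied by the observed binary variable. The sketch below shows that draw for a single equation with unit error variance, which is a simplification: in the joint model the conditional mean and variance would also depend on the other equation's shock through $\Sigma$. The function name and the inverse-CDF sampling method are choices made for this example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def draw_latent(mean, observed, rng):
    # Draw from N(mean, 1) truncated to (0, inf) if observed == 1
    # and to (-inf, 0] if observed == 0, using the inverse-CDF method.
    p_nonpositive = STD_NORMAL.cdf(-mean)      # P(latent <= 0)
    if observed == 1:
        u = rng.uniform(p_nonpositive, 1.0)
    else:
        u = rng.uniform(0.0, p_nonpositive)
    # Keep u strictly inside (0, 1) so inv_cdf is defined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)

rng = random.Random(1)
# Hypothetical linear index and observed binary outcomes for three units.
index = [-1.5, 0.0, 2.0]
observed = [0, 1, 1]
latent = [draw_latent(m, y, rng) for m, y in zip(index, observed)]
```

In a full sampler this draw would alternate with updates of the coefficients and of $\Sigma$ given the current latent values; the clamp on `u` makes the sketch reliable only for moderate values of the linear index.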
Re-parametrization Assumption