The determinants of program effectiveness: evidence from a conditional cash transfer program

Jed Friedman, Eeshani Kandpal, Margaret Triyana and Aleksandra Posarac

Abstract

This study adapts a methodology originally developed to extrapolate results from one RCT site to another in order to study the determinants of heterogeneous treatment effects of a household conditional cash transfer program in the Philippines. Specifically, we extend the covariate balancing technique proposed in Hotz, Imbens, and Mortimer (2005) to explore the sources of residual heterogeneity, that is, variation in program impact that cannot be explained by differences in population and location characteristics. The paper also provides a conceptual framework that links systematic differences in implementation quality and community characteristics to intervention effectiveness. In the case of the Philippines, we first document substantial implementation unit-level variation in program impact on stunting. Next, we show that implementation quality, remoteness, and program saturation were correlated with program impact to varying degrees. However, residual heterogeneity was most strongly associated with variables that likely capture implementation quality, suggesting that there is scope to further strengthen the impact of the program.


Introduction

Household conditional cash transfer (CCT) programs have been widely implemented to provide additional resources to poor households, conditional on investing in children’s education and health as well as obtaining maternal and child health services. These programs have been successful in improving the targeted health seeking behavior, but the results on health outcomes are mixed (Fiszbein et al 2009). More importantly, treatment heterogeneity (the degree to which an intervention has differential causal effects on the relevant treated unit) is commonly found in CCT programs. By itself, this heterogeneity is not necessarily surprising or even cause for further inquiry: many types of programs have varying impacts that depend on the characteristics of the sub-populations. Some groups may experience greater gains than others because of, for example, differing baseline levels in the outcome of interest or differences in preferences for human capital investment across groups. As such, impact heterogeneity is not necessarily a result of factors within the control of the program, such as the quality of local implementation, although implementation issues can also be important for local program effectiveness. This study seeks to link program effectiveness and residual heterogeneity, meaning variation that cannot be explained by differences in population and location characteristics. Identifying the determinants of residual heterogeneity will allow us to understand the mechanisms that support program implementation.

This study uses data from the Pantawid Pamilyang Pilipino Program. The program is a household CCT program that was launched in 2008 in the Philippines. The program seeks to promote human capital investments to break the intergenerational transmission of poverty by keeping children in school, improving children’s health, and investing in children’s future.

Eligible poor households were identified using a proxy means test. Those with children under 14 years of age and/or a pregnant woman would then receive cash transfers every two months. The amount ranges from PhP 500 (USD 10) to PhP 1,400 (USD 29) per household per month, depending on the number of eligible children.

The program has been evaluated using two methods: Randomized Control Trial (RCT) and Regression Discontinuity Design (RDD). The impact evaluation found that the program reached most of its key objectives (Onishi et al 2016, Kandpal et al 2016). The Pantawid CCT program increased children’s school enrollment and attendance, and reduced severe stunting among 0-5 year old children in the RCT areas of study. While these impacts suggest that, on average, the program achieved significant gains, an important component of policy learning is to understand where the program was most effective and to explore possible determinants of this success. More importantly, the impact evaluation finds considerable treatment heterogeneity. One possible cause of such heterogeneity may simply be that impacts tend to be higher when baseline levels themselves are lower and there is more “room for improvement”, presumably due to the lower marginal cost of improving these low indicators. This is an example of a cause ascribed to characteristics of the population and not to the local implementation quality of the program. While such exploratory analysis highlights the importance of examining treatment heterogeneity, even for a successful intervention like Pantawid, it tells us little about the causes of such heterogeneity. Population characteristics can determine a large proportion of this heterogeneity, which, while informative, does not provide policy-actionable information relating implementation quality to program impact. Therefore, the rest of this study examines the correlates of residual heterogeneity in Pantawid impact after accounting for the effect of population characteristics.


This study uses recently developed empirical tools to explore spatial variations in program impact. In particular, we measure the residual heterogeneity on two outcomes: food consumption and stunting. Residual heterogeneity is identified through matched covariate balancing. Once identified, the residual heterogeneity can be related to program data on implementation quality as well as fixed features of localities, such as remoteness, to identify the correlates of heterogeneity in program impact. This associative analysis constitutes the first step towards identifying the drivers of such divergence in program effectiveness. The initial analysis shows that differences in implementation quality, remoteness, and program saturation were all correlated with program impact to varying degrees. However, residual heterogeneity was most strongly associated with variables that likely capture implementation quality, suggesting the scope to further strengthen the program impact on stunting.

Data and method

Data

Data from the Pantawid impact evaluation survey were used in the analysis. We use the RDD method across all provinces for comparability. Table 1 illustrates the variability in estimated impacts across municipalities. As an example of the findings, the impact of program participation on household food consumption ranges from 0.9 percent (0.009 log units) in Sibagat (Agusan del Sur) to almost 100 percent (0.990 log units) in San Luis (also in Agusan del Sur), with a median impact across all municipalities of 20 percent (0.202 log units). The variability across municipalities in another key targeted outcome, stunting, is similarly large. The ratio between the largest and smallest impact is largest for household food consumption (over 100). This site-specific variability is due to both differential impact across sites and sampling variability. For each outcome there are municipalities where the impact is statistically significant and where it is not (statistically precise impacts highlighted in yellow).
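The percentages quoted above read log units directly as percent changes, an approximation that is close for small values and loose for large ones. The following minimal sketch shows the exact conversion, exp(b) - 1, for the three food consumption estimates quoted in the text. The function name is ours and purely illustrative.

```python
import math

def log_units_to_percent(b):
    """Exact percent change implied by a difference of b log units."""
    return 100.0 * (math.exp(b) - 1.0)

# Food consumption impacts quoted in the text (log units).
smallest, median_impact, largest = 0.009, 0.202, 0.990

for b in (smallest, median_impact, largest):
    print(f"{b:.3f} log units = {log_units_to_percent(b):.1f} percent")

# Ratio between the largest and smallest estimated impacts ("over 100").
ratio = largest / smallest
print(f"largest / smallest = {ratio:.0f}")
```

For the median municipality the two readings are close (0.202 log units is about 22 percent), while for the largest estimate the exact change is roughly 169 percent rather than "almost 100 percent".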

Table 1: Cross-Municipality Heterogeneity in Program Impacts for Selected Outcomes [1]

Province              Municipality     Food cons. [2]   Stunting [3]
Overall               MEDIAN                0.202          -0.062
Overall               MAXIMUM               0.990          -0.989
Overall               MINIMUM               0.009           0.868
Agusan del Sur        SAN LUIS              0.990          -0.036
Agusan del Sur        ESPERANZA             0.070          -0.398
Agusan del Sur        LORETO                0.131          -0.617
Agusan del Sur        SIBAGAT               0.009           0.211
Agusan del Sur        SANTA JOSEFA          0.121          -0.083
                      CALANASAN             0.958          -0.810
                                            0.125          -0.770
                      CONNER                0.163           0.261
Lanao del Norte       SALVADOR              0.491           0.425
Lanao del Norte       LALA                  0.245          -0.069
Misamis Occidental    BONIFACIO             0.202          -0.019
Misamis Occidental    PLARIDEL              0.054          -0.964
Misamis Occidental    TUDELA                0.134           0.711
Misamis Occidental    SAPANG DALAGA         0.654           0.351
Misamis Occidental    LOPEZ JAENA           0.379           0.868
Mountain Province     SADANGA               0.451          -0.989
Mountain Province     PARACELIS             0.500          -0.295
Negros Oriental       BASAY                 0.772           0.183
Negros Oriental       JIMALALUD             0.780          -0.017
North Samar           LOPE DE VEGA          0.839          -0.582
North Samar           SILVINO LOBOS         0.726           0.728
North Samar           MONDRAGON             0.016           0.120
North Samar           SAN ROQUE             0.025           0.048
North Samar           CATUBIG               0.951          -0.301
Occidental Mindoro    SANTA CRUZ            0.077          -0.078
Occidental Mindoro    PALUAN                0.089           0.394
Zamboanga Del Norte   KALAWIT               0.061           0.785
Zamboanga Del Norte   KATIPUNAN             0.520          -0.673
Zamboanga Del Norte   GUTALAC               0.028          -0.188
Zamboanga Del Norte   MANUKAN               0.064          -0.351
Zamboanga Del Norte   BACUNGAN              0.833          -0.062

[1] Statistically significant impacts highlighted in yellow.
[2] Log units of consumption of all recorded food items.
[3] Defined as the height-for-age Z score of children 6-36 months old being at least two standard deviations below the international reference mean.

Note: For comparability, the table reports estimates using Regression Discontinuity Design methodology in both “RCT provinces” and “RDD provinces.” RCT provinces are Lanao del Norte, Mountain Province, Negros Oriental, Occidental Mindoro.
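As a consistency check, the summary rows of Table 1 can be reproduced from the 31 municipal estimates. The sketch below uses only the numbers printed in the table; the variable names are ours. For stunting, the table's "MAXIMUM" is the largest reduction (the most negative estimate) and its "MINIMUM" is the largest increase.

```python
from statistics import median

# (food consumption, stunting) estimates for the 31 municipal rows of Table 1.
estimates = [
    (0.990, -0.036), (0.070, -0.398), (0.131, -0.617), (0.009, 0.211),
    (0.121, -0.083), (0.958, -0.810), (0.125, -0.770), (0.163, 0.261),
    (0.491, 0.425), (0.245, -0.069), (0.202, -0.019), (0.054, -0.964),
    (0.134, 0.711), (0.654, 0.351), (0.379, 0.868), (0.451, -0.989),
    (0.500, -0.295), (0.772, 0.183), (0.780, -0.017), (0.839, -0.582),
    (0.726, 0.728), (0.016, 0.120), (0.025, 0.048), (0.951, -0.301),
    (0.077, -0.078), (0.089, 0.394), (0.061, 0.785), (0.520, -0.673),
    (0.028, -0.188), (0.064, -0.351), (0.833, -0.062),
]
food = [f for f, _ in estimates]
stunting = [s for _, s in estimates]

summary = {
    "food_median": median(food),
    "food_max": max(food),
    "food_min": min(food),
    "stunting_median": median(stunting),
    "stunting_largest_reduction": min(stunting),  # table row "MAXIMUM"
    "stunting_largest_increase": max(stunting),   # table row "MINIMUM"
}
# Number of municipalities whose point estimate is a reduction in stunting.
n_reductions = sum(1 for s in stunting if s < 0)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
print(f"stunting reductions: {n_reductions} of {len(stunting)}")
```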

The Proposed Methodology

In this study, we use a methodology inspired by Hotz, Imbens, and Mortimer (2005) to investigate correlates of the residual heterogeneity in Pantawid impact. Hotz et al. (2005) extrapolated the results of a job-training program estimated in one site to different sites by adjusting the expected treatment impact for differences in the observable characteristics of participants. The authors balance the observed characteristics of the control population in the program area and the target area through a matching estimator. They then estimate a treatment effect for the target area with the matching (propensity) weights that reflect the characteristics of the target population. The key identifying assumption of this method is that unobservable determinants of program performance are independent of the observable determinants. One limitation of this method is that when the characteristics of the sample and target populations are very different, it is very difficult to extrapolate with confidence. However, there are few alternative extrapolation methods, and the Hotz et al. framework is useful because it imputes program effects in new settings by taking into account observable differences in population characteristics.

We focus on the results for the RCT study only because the framework was developed to extrapolate impacts of randomized experiments to new settings. Further, we focus on two targeted outcomes, stunting and household food consumption, as representative outcomes whose impacts are important in the aggregate and also vary across space. The analysis proceeds as follows. First, we divide the study sample into program-relevant implementation units or sites, the municipalities. Then, we identify the observable basic population characteristics associated with greater program effectiveness. These observable factors also include a limited number of health clinic indicators, as well as indicators of the location in which the population resides. We document variation in population and location characteristics and establish the best performing sites. Next, we estimate residual heterogeneity in performance across sites after controlling for population and location characteristics through matched covariate balancing. This residual heterogeneity is then related to survey and program data to further understand the determinants of relative program effectiveness.

In this conceptual framework, the program’s goal to improve health can be attained through the following components: health system, community factors, and program implementation. Figure 1 below presents the hypothesized channels.

CCT goal: Health improvement

Health system: availability of services; availability of equipment and drugs; health workers (quality of care)

Community factors: population and location; nutrition; access to market

Program implementation: compliance; program saturation

Figure 1: Hypothesized determinants of treatment heterogeneity

The framework presented in the paper allows us to isolate differences in program impact on key outcomes due to variations in implementation quality from those due to population and location characteristics or even program saturation. If site-specific impact differences are observed even with the Hotz et al. propensity weighted estimator, we take this as a reason to further assess implementer characteristics and as evidence of the need to include (better) measures of such characteristics in treatment effect estimates in evaluative research.

Preliminary results

Figures 2 and 3 show that, for the municipalities in the RCT sample, there is considerable variation in impact for two key outcomes: reductions in stunting prevalence among 6-36 month old children and increases in household food consumption. Among the eight municipalities in the RCT study, the best performing municipality for stunting is Sadanga. The changes in stunting prevalence in the other seven municipalities are less favorable to varying degrees (Figure 2); the difference relative to Sadanga ranged from 2.6 percent higher in Paracelis, the second best performer in terms of stunting impact, to 62.8 percent higher in Santa Cruz, which was the worst performer. Note that these differences are positive because a lower prevalence of stunting among 6-36 month olds is better.

[Bar chart: deviations in stunting impact from the best performer for Basay, Jimalalud, Lala, Paluan, Paracelis, Salvador, and Santa Cruz; plotted values range from 0.026 to 0.628.]

Figure 2: Unadjusted Spatial Variation in Program Impact on Stunting Prevalences Across Pantawid Municipalities: Deviations from the Best Performer (Sadanga municipality)


Program impact on gains in household food consumption was highest in Basay. As shown in Figure 3, other municipalities had impacts that ranged from 4.3 percent lower (Santa Cruz) to 33.6 percent lower (Jimalalud). In addition to the considerable variation in program impact, the different rankings of the eight municipalities across these outcomes highlight the distinct nature of the outcomes and suggest that no one municipality was under-performing across the board.

[Bar chart: deviations in food consumption impact from the best performer for Lala, Salvador, Paracelis, Sadanga, Jimalalud, Paluan, and Santa Cruz; plotted values range from -0.336 to -0.043.]

Figure 3: Unadjusted Spatial Variation in Program Impact on Food Consumption Across Pantawid Municipalities: Deviations from the Best Performer (Basay municipality)

However, this variation in impact can be driven partially, or even mostly, by population and location characteristics, rather than by differences in Pantawid implementation. To account for the role of population characteristics behind the heterogeneity in program impact, we balance observable characteristics across the best performer and a low performer as follows: we estimate propensity scores by regressing a binary indicator, equal to 1 for the best performing site and 0 for the low performer, on a function of baseline population characteristics. These propensity scores are then used to weight outcomes in the low performing site to approximate what the outcome would have been if the population in the low performing site exhibited the same characteristics as the population in the best performing site. In order to apply this approach there must be a fair degree of overlap in the propensity scores for the best performing sites and all other sites. One measure of similarity between sample and population is simply the difference in the mean propensity score; we find that the sites have similar propensity scores [4].
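The reweighting step described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the estimator used in the study: it assumes a single discrete baseline characteristic, estimates the propensity score by cell frequencies rather than by a fitted model, and uses made-up household records. All function names and data are hypothetical.

```python
from collections import Counter

def odds_weights(best_x, low_x):
    """Weight each low-site household by p(x) / (1 - p(x)).

    p(x) is the share of households with characteristic x that belong to
    the best performing site, so the odds reduce to n_best(x) / n_low(x).
    """
    n_best, n_low = Counter(best_x), Counter(low_x)
    return [n_best[x] / n_low[x] for x in low_x]

def weighted_effect(treated, outcome, weights):
    """Weighted treated-minus-control difference in mean outcomes."""
    def wmean(flag):
        pairs = [(y, w) for t, y, w in zip(treated, outcome, weights) if t == flag]
        return sum(y * w for y, w in pairs) / sum(w for _, w in pairs)
    return wmean(1) - wmean(0)

# Hypothetical data: x = 1 marks households with a given baseline trait.
best_x = [1, 1, 1, 0]                      # best performing site (75% have x = 1)
low_x = [1, 1, 0, 0, 0, 0]                 # low performing site (33% have x = 1)
low_treated = [1, 0, 1, 0, 1, 0]           # program assignment in the low site
low_outcome = [0.2, 0.6, 0.3, 0.4, 0.3, 0.4]  # hypothetical outcome values

weights = odds_weights(best_x, low_x)
raw_effect = weighted_effect(low_treated, low_outcome, [1.0] * len(low_x))
reweighted_effect = weighted_effect(low_treated, low_outcome, weights)

# After weighting, the low site's covariate mix matches the best site's.
weighted_share_x1 = sum(w for x, w in zip(low_x, weights) if x == 1) / sum(weights)

print(f"raw effect in low site:        {raw_effect:.3f}")
print(f"reweighted effect in low site: {reweighted_effect:.3f}")
print(f"weighted share with x = 1:     {weighted_share_x1:.2f}")
```

In this toy example the effect is larger among households with x = 1, so giving them the best site's weight moves the low site's estimated effect from -0.200 to -0.325. The residual gap would then be the best site's effect minus this reweighted effect.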

Now, if the only reason for possible difference in program performance were any imbalances in the observable baseline population characteristics between the sites, then this method should predict differences in program performance fairly well. Nonetheless, as highlighted in Figures 4 and 5, there is considerable residual heterogeneity for both measures. Indeed, in some cases, controlling for baseline population characteristics actually increases the gap relative to the best performer.

[Bar chart with two series, Unadjusted and Residual, for Basay, Jimalalud, Lala, Paluan, Paracelis, Salvador, and Santa Cruz; plotted values range from -0.192 to 1.037.]

Figure 4: Absolute and Residual Spatial Variation in Program Impact on Child Stunting Across Pantawid Municipalities

Figure 4 shows that Paracelis would in fact have had a stronger Pantawid impact on child stunting than Sadanga if the population in Paracelis had been more like that in Sadanga. On the other hand, the gap between Sadanga and Santa Cruz is even higher after controlling for population characteristics, suggesting that the residual heterogeneity may be correlated with unobserved factors, perhaps including implementation characteristics.

[4] As a possible rule of thumb, differences greater than 0.25 standard deviations in mean propensity scores may suggest that a large amount of extrapolation would be necessary.
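The rule of thumb in footnote 4 can be sketched as a standardized difference in mean propensity scores between two sites. The footnote does not say which standard deviation is meant; the sketch below assumes the pooled standard deviation of the two sites' scores, and the scores themselves are hypothetical.

```python
import math
from statistics import mean, variance

def standardized_difference(scores_a, scores_b):
    """Absolute difference in mean propensity scores, in pooled-SD units.

    Assumption: the pooled SD is sqrt((var_a + var_b) / 2), using sample
    variances. The text does not specify this choice.
    """
    pooled_sd = math.sqrt((variance(scores_a) + variance(scores_b)) / 2.0)
    return abs(mean(scores_a) - mean(scores_b)) / pooled_sd

THRESHOLD = 0.25  # rule-of-thumb cutoff from footnote 4

# Hypothetical propensity scores for households in three sites.
best_site = [0.40, 0.50, 0.60]
similar_site = [0.41, 0.51, 0.61]
distant_site = [0.45, 0.55, 0.65]

for name, scores in (("similar", similar_site), ("distant", distant_site)):
    d = standardized_difference(best_site, scores)
    flag = "large extrapolation" if d > THRESHOLD else "comparable"
    print(f"{name} site: {d:.2f} SD ({flag})")
```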

Figure 5 shows the residual heterogeneity in impact on food consumption, and is consistent with Figure 4, with residual heterogeneity being smaller in four municipalities, and in fact zero residual heterogeneity in Santa Cruz, but greater in Lala and Salvador.

[Bar chart with two series, Unadjusted and Residual, for Lala, Salvador, Paracelis, Sadanga, Jimalalud, Paluan, and Santa Cruz; plotted values range from -0.336 to 0.000.]

Figure 5: Absolute and Residual Spatial Variation in Program Impact on Household Food Consumption Across Pantawid Municipalities
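One way to read Figures 4 and 5 together is to compare, for a single municipality, the gap to the best performer before and after covariate balancing. The sketch below does this for Santa Cruz. The unadjusted gaps (0.628 for stunting, -0.043 for food consumption) and the zero residual for food consumption are stated in the text; pairing the residual stunting value of 1.037 with Santa Cruz is our inference from the statement that its gap widens after adjustment. The function name is illustrative.

```python
def gap_change(unadjusted_gap, residual_gap):
    """Change in the absolute gap to the best performer after adjustment.

    Positive values mean the gap widens once population characteristics
    are balanced; negative values mean it narrows.
    """
    return abs(residual_gap) - abs(unadjusted_gap)

# (unadjusted gap, residual gap) for Santa Cruz, by outcome.
santa_cruz = {
    "stunting": (0.628, 1.037),          # residual value inferred from Figure 4
    "food consumption": (-0.043, 0.000),
}

for outcome, (unadjusted, residual) in santa_cruz.items():
    change = gap_change(unadjusted, residual)
    direction = "widens" if change > 0 else "narrows"
    print(f"{outcome}: gap {direction} by {abs(change):.3f}")
```

The same municipality can therefore look very different across outcomes: its food consumption gap is fully accounted for by population characteristics, while its stunting gap is not.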

Even after adjusting for differences in observable population characteristics, it is clear that other factors may contribute to differential program impacts across study sites. For example, location-specific features, such as the local capacity to implement the program, can affect program impacts, but the covariate rebalancing approach does not directly address this.

Next, we examine correlations between the residual heterogeneity and proxies of local implementation capacity as well as other possibly relevant local features, such as geographic remoteness, that can mediate program impact. Health system factors that can affect the implementation of Pantawid include: an index aggregating the number of basic health services provided at the rural health facilities in the municipality and the fraction of midwives that had heard of the Compliance Verification System. Community factors that can mediate program performance include: the average distance of each village from the municipal capital, the proportion of Indigenous Persons in the municipality, and the level of the corresponding outcome (stunting and household food consumption) in control villages in each municipality. Factors that reflect the capacity of the local public service system to implement Pantawid include: the average number of beneficiaries per village in the municipality and measures of local compliance.

In Figures 6 and 7, we present four representative correlations for each outcome variable—the first shows the correlation between residual heterogeneity and control levels of the outcome of interest and then three illustrative correlations from among the other correlates examined. The correlation between impact and location-specific features is represented by the slope of the fitted line.
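The slope of a fitted line and the associated correlation can be computed as in the sketch below. The data are hypothetical values for six municipalities, not the study's estimates, and the function names are ours.

```python
import math

def fitted_slope(x, y):
    """Ordinary least squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical municipal data: stunting rate in control villages (x) and
# residual variation in program impact on stunting (y).
control_stunting = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
residual_impact = [0.8, 0.75, 0.4, 0.35, 0.1, -0.05]

slope = fitted_slope(control_stunting, residual_impact)
r = pearson_r(control_stunting, residual_impact)
print(f"slope = {slope:.3f}, correlation = {r:.3f}")
```

With only a handful of municipalities, slopes of this kind are descriptive summaries rather than precise estimates.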

Figure 6 shows that the negative correlation with the residual heterogeneity in stunting is strongest for the level of stunting among control households in the municipality, suggesting that factors outside implementation control might be driving some of the heterogeneity in program effectiveness, and that the “room for improvement” explanation for spatial heterogeneity might be relevant. On the other hand, three other supply-side features are also related to local program effectiveness, albeit not as strongly as stunting among control households: the percent of midwives in the municipality who have heard of the Pantawid Compliance Verification System, an index of basic health services available in local clinics, and the proportion of the population that are program beneficiaries (an indicator that can relate to facility crowding).

[Four panels, each plotting residual variation in program impact on stunting rates with a fitted line against one correlate: stunting rates of 6 to 36 month olds in control villages; % midwives who know of Pantawid Compliance Verification System; average proportion of village households eligible for Pantawid; index of basic health service provision.]

Figure 6: Correlates of Residual Spatial Variation in Program Impact on Stunting

Figure 7 shows the correlations for program impact on household food consumption, and hints at a crowding out story, with the greatest program impact in municipalities with lower levels of exposure to the program. We also find that the proportion of Indigenous Persons in a municipality is negatively associated with program impact on food consumption. There also appear to be greater gains in municipalities with more remote villages. However, unlike in the case of stunting, there is not a strong correlation between levels of food consumption in control households and program impact.

[Four panels, each plotting residual variation in program impact on household food consumption with a fitted line against one correlate: log per capita food consumption in control households; distance to the municipal capital (in kilometers); proportion of Indigenous Persons in municipality; average proportion of village households eligible for Pantawid.]

Figure 7: Correlates of Residual Spatial Variation in Program Impact on Household Food Consumption

Discussion

Impact evaluation methods tell us the mean or expected impact of the program in the study population. However, we often want to know more. In particular, we want to know what features of the program, the target population, or the implementation result in the greatest impacts. If these features can be modified through specific changes to the program, then program performance can be improved for more beneficiaries. If these features can be addressed through wider policy engagements, then coordinated policies may improve outcomes. If these features are fixed (e.g., remoteness), then perhaps additional investments may be necessary to maximize program performance. In this study, we document the considerable residual heterogeneity (the heterogeneity in program impact that remains after controlling for population characteristics) in Pantawid’s impact on child stunting, secondary enrollment and household food consumption across the municipalities studied. These differences correlate with select measures of location-specific characteristics as well as possible measures of implementation quality. However, as the ultimate goal is to indicate policy levers or complementary investments to enhance program impact, it is important to keep in mind that these are correlations and not causal relationships.

References

Hotz, V.J., Imbens, G.W. and Mortimer, J.H., 2005. Predicting the efficacy of future training programs using past experiences at other locations. Journal of Econometrics, 125(1), pp.241-270.

Bryce, J., Victora, C.G., Habicht, J.P., Black, R.E. and Scherpbier, R.W., 2005. Programmatic pathways to child survival: results of a multi-country evaluation of Integrated Management of Childhood Illness. Health Policy and Planning, 20(suppl 1), pp.i5-i17.

Flores, C.A. and Mitnik, O.A., 2013. Comparing Treatments across Labor Markets: An Assessment of Nonexperimental Multiple-Treatment Strategies. The Review of Economics and Statistics, 95(5), pp.1691-1707.

Fiszbein, A., Schady, N.R. and Ferreira, F.H., 2009. Conditional cash transfers: reducing present and future poverty. World Bank Publications.

Murray, C.J. and Frenk, J., 2000. A framework for assessing the performance of health systems. Bulletin of the World Health Organization, 78(6), pp.717-731.

Onishi, J., Friedman, J., Filmer, D. and Chaudhury, N., 2016. Philippines Conditional Cash Transfer Program Randomized Control Trial Impact Evaluation 2012. World Bank Report Number 75533-PH.

Kandpal, E., Alderman, H., Friedman, J., Filmer, D., Onishi, J. and Avalos, J., 2016. The Child Nutrition Impacts of a Conditional Cash Transfer Program in the Philippines. Journal of Nutrition (forthcoming).