Peer

Yuyu Chen Peking University, Guanghua School of Management

Ginger Zhe Jin University of Maryland & NBER

Yang Yue Xiamen University

October 4, 2018

Abstract With 280 million rural laborers migrating to the city in 2017, China is experiencing the largest internal migration in human history. Using instrumental variables in the 2006 China Agricultural Census, we find that a 10-percentage-point increase in the migration rate of co-villagers raises one's migration probability by 9.18 percentage points. Both information exchange at the origin and cost reduction at the destination could explain why migration clusters by age, destination, and occupation. However, migration has little effect on the agricultural productivity of non-migrants, probably because labor redundancy is severe at the origin and migrants are more likely to be of high productivity.

Keywords: internal migration, social network, China. JEL: J6, O12, R23.

Contact information: Yuyu Chen, Guanghua School of Management, Peking University. Email: [email protected]. Ginger Zhe Jin, Department of Economics, University of Maryland, College Park, MD 20742. Email: [email protected]. Yang Yue, Wang Yanan Institute for Studies in Economics and School of Economics, Xiamen University. Email: [email protected]. This project is a collaborative effort with a local government of China. We would like to thank Hongbin Cai, Wei Li, Brian Viard, Roger Betancourt, Loren Brandt, Judy Hellerstein, John Ham, Matthew Chesnes, Seth Sanders, V. Joseph Hotz, Duncan Thomas, Francisca Antman and Hillel Rapoport for helpful comments. All errors are our own.

1. Introduction

The past 30 years have witnessed explosive growth of labor migration inside China. In 1990, 34.1 million workers had left their rural home for urban jobs (Cai 1996). This number increased to 67 million in 1999 (Huang and Pieke 2003), 131.81 million in 2006, 140.41 million in 2008, and 280 million in 2017.1 In aggregate, labor transfer out of agriculture accounts for nearly one-ninth of the annual GDP growth of China (Young 2003).2 However, due to Hukou3 and other institutional barriers, most migrating workers do not migrate permanently to the city (Zhao 1999a & 1999b). They leave families behind and travel between rural home and urban job every year. We aim to understand the factors that drive such migration, with special attention to the role of peers at the origin.

In theory, peer effects on job migration can be positive or negative. On the one hand, job information is easier to share within a village. Earlier migrants can also help new migrants reduce moving costs at the same destination, even if they do not work in the same occupation. If these positive peer effects are clustered by age and gender, they could lead to significant demographic “holes” in the originating village. On the other hand, migration makes more arable land available for non-migrants at the origin. If migration improves the agricultural productivity and social function of remaining villagers, peer effects can be negative.

Since different peer effects imply different policies in labor migration and local development, it is important to identify the sign and magnitude of the peer effects. The biggest challenge is separating peer effects from confounding factors that affect peers at the same time (Manski 1993). Survey evidence often cites informal contacts (such as family, friends and other acquaintances) as the most important channels that lead to employment (Loury 2006, Pellizzari 2010).
In addition, several papers have attempted to identify peer effects from observational data, using either instrumental variables or detailed network information (Munshi 2003, Mckenzie and Rapoport 2007, Woodruff and Zenteno 2007, Chen et al. 2008, Hiwatari 2016). Building on this literature, we derive novel instruments from China’s unique “one-child” policy. Starting in 1984, China has allowed rural households to have a second baby if the first child is a girl. Not only

1 The latter two numbers are based on National Bureau of Statistics reports accessed at http://www.stats.gov.cn/tjgb/nypcgb/qgnypcgb/t20080227_402464718.htm and http://www.stats.gov.cn/tjfx/fxbg/t20090325_402547406.htm on September 1, 2010. In these reports, migrating workers are defined as rural laborers that have migrated for jobs out of the residential township for at least one month during the calendar year.
2 Young (2003) calculates the annual 7.8% GDP growth based on published data from 1978 to 1998. He shows that the deflated annual growth is 6.1%, of which 0.9% can be attributed to the increase of labor participation. 0.9% is approximately one-ninth of 7.8%.
3 Hukou is a residential permit that is often determined by one’s birthplace. Until the late 1990s, it was extremely difficult for an individual with a rural Hukou to live or work in the city, unless he/she attends college, joins the military, and finds an urban unit to accept him/her after graduation. As of today, individuals with rural Hukou can work in the city but they face many restrictions in receiving education, worker protection, disability assistance, social security, health insurance, or other social benefits in the city for themselves or their family members.

does this policy minimize sex selection on the firstborns4, it also implies that rural households with a girl firstborn are more likely to have a second child and less likely to have any boy. Because of this effect on family size and on the children’s gender composition, having a girl firstborn tends to encourage adult males (e.g. fathers and grandfathers) to migrate but hinder adult females (mothers and grandmothers) from migrating. In contrast, having multiples in the firstborn is a random shock to a household, which demands more effort in child care and ends up discouraging labor migration. Based on these variations, we construct IVs for neighbors’ migration decisions using the gender and number of neighbors’ firstborns, where neighbors are defined as residents of the same village. The key assumption is that one household’s fertility outcome does not directly affect the migration decision of its neighbors. We validate this assumption through a variety of statistical tests and robustness checks.

Our IV results find an overall positive peer effect in migration: a 10-percentage-point increase in the share of 17-35 year old neighbors migrating out of a village increases one’s own migration probability by 9.18 percentage points. This effect is stronger from people in the same village than from people in other villages in the same township, confirming our choice of village as the unit of potential peers. A closer look finds the peer effect clustered by age, surname, destination and occupation at the destination. In the meantime, the agricultural income of non-migrants does not change significantly as a result of peer migration in the same village, although they have more land to work with.
These results suggest that both information exchange at the origin and cost reduction at the destination could explain the positive peer effects, but migration has little effect on the agricultural productivity of non-migrants, probably because labor redundancy is severe at the origin and laborers of high productivity are more likely to migrate.

The rest of the paper is organized as follows. Section 2 provides a brief literature review. Section 3 describes the background and data. Section 4 lays out a basic specification and reports the IV results. Section 5 zooms into peer effects in demographic groups, destinations, and types of jobs. Section 6 examines the effects of migration on the land use and agricultural income of non-migrants. Section 7 concludes.

2. Literature Review

The existing literature has stressed the importance of peer effects in both labor migration and job search, but empirical evidence lags behind theory. In job search, Calvo-Armengol and Jackson (2004) show that job information sharing within a social network can explain why the employment rate varies across networks, why unemployment persists in some networks, and why inequality across networks can be long lasting. Their model implies that a public policy that provides incentives to reduce initial labor market

4 Sex selection may take several forms, ranging from selective abortion, to abandonment of newborns, to infanticide.

dropout could have a positive and persistent effect on future employment.5 In a similar spirit, Carrington et al. (1996) establish a dynamic model of labor migration in which earlier migrants help later migrants reduce moving costs at the same destination. In their model, migration occurs gradually but develops momentum over time. It explains why migration tends to cluster by geography and why migratory flows may increase even as wage differentials narrow. To the extent that information sharing and moving cost reduction are easier among peers of similar age, gender and education, the positive peer effects on migration could generate demographic holes in the origin.

In the meantime, other theories suggest that peer effects on labor migration can be negative because the migrating origins are often closely-knit communities. These communities, often small and rural, count on internal members to help each other in local economic and social development (Liu 2010). If some work-age males migrate out of a village, they can lease out agricultural land to remaining work-age males, which increases the labor income of stayers and reduces their economic incentives to emigrate. Similarly, peers’ migration decisions may be negatively correlated if they are close relatives and coordinate in non-agricultural activities such as elderly care and child care.

Turning to the empirical literature, numerous facts of migration are consistent with the social network theory of peer migration, but causal links are difficult to establish. For example, many surveys find that having friends or relatives at the destination is positively correlated with one’s migration decision (Caces et al. 1985, Taylor 1986, Zhao 2003). On life after migration, US immigrants are shown to be more geographically concentrated than natives of the same age and ethnicity and are often employed together (Bartel 1989, LaLonde and Topel 1991).
All these findings suggest that peer migrants may help improve job information and reduce moving costs. But they are also consistent with the alternative explanation that kin, friends, and co-villagers share common preferences and therefore make similar migration decisions.

Researchers have used three approaches to separate peer effects from confounding factors. The first controls for a large number of group fixed effects (say census block group as in Bayer et al. 2008) and then explores employment clustering by a smaller unit (say census block) within the controlled group. The underlying assumption is that there is no unit-level correlation in unobserved individual attributes after taking into account the broader group.6 This method alone is unlikely to succeed here, as co-villagers may have similar unobserved attributes and these attributes are likely to differ across villages.

The second approach hinges on random assignment of neighbors. For example, the Moving to Opportunity (MTO) program provides housing vouchers to a randomly selected group of poor families in five US cities (Kling, Liebman and Katz 2007).

5 See Ioannides and Loury (2004) for a detailed review of the effects of social networks in job search.
6 A similar identification strategy has been used in Aizer and Currie (2004) and Bertrand et al. (2000).

The third approach uses observational data in closely-knit origin communities (often villages) and looks for instruments that only affect one’s migration decision through their direct impact on peer migration. For example, Munshi (2003) uses rainfall in the origin as an IV for the prevalence of Mexican migrants in the US, and finds that the more established migrants there are, the better the employment status is for a new migrant from the same village. As Munshi (2003) acknowledges, lagged rainfall may affect current employment outcomes at the origin, and hence the current migration decision. This is why he focuses on the network effect on employment conditional on a person having migrated to the US, not the migration decision itself. Mckenzie and Rapoport (2007) use the historic migration rate as an IV for the stock of migration in the sending village and study how migration prevalence affects an individual’s current migration decision and the income inequality within the village. To the extent that the historic migration rate (like historical rainfall) affects everyone at the origin, it is difficult to distinguish it from other unobserved village attributes. In comparison, our IVs vary across villages. After using the IVs, we are similar to Bayer et al. (2008) in that we examine whether the destination and industrial sector of migrants indicate any social network effects by controlling for the fixed effects of a larger area (county or township).

Hiwatari (2016) uses detailed field data of social connections to construct a spatial autoregressive model. This is equivalent to using the household characteristics of peers’ peers as IV for the peer’s migration decision. Chen et al. (2008) use a smaller dataset and a different IV to address the same research question as in this paper. Denoting the individual under study as A, their IV is the political identity of A’s father in the Mao era.
While this variable is likely correlated with A’s social ties within the same village (hence A’s migration decision if social ties matter), it is unclear why it is correlated with the neighbors’ migration tendency and why it should be excluded from the main regression.7

As detailed below, we follow the third approach by constructing novel IVs based on China’s unique one-child policy. We argue that, under careful sampling, one’s firstborn outcome is related to one’s own migration decision but does not affect neighbors directly. Similar identification strategies have been pursued in settings other than migration and social network effects (Rosenzweig and Wolpin 1980, Angrist and Evans 1998, Maurin and Moschion 2009, and Qian 2009)8.
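
The identification idea above can be illustrated with a small simulation. The following is a minimal sketch, not the specification estimated in this paper. It assumes a stylized triangular data-generating process (neighbors' migration propensities are generated first and one's own outcome second, which sidesteps simultaneity), a leave-one-out village mean as the peer measure, and the leave-one-out share of girl firstborns among co-villagers as the single instrument. All function names, variable names, and parameter values are hypothetical.

```python
import numpy as np


def leave_one_out_mean(values, groups):
    """Group mean of `values` excluding each unit's own value.

    Assumes every group has at least two members.
    """
    values = np.asarray(values, dtype=float)
    _, inv = np.unique(np.asarray(groups), return_inverse=True)
    inv = inv.ravel()
    sums = np.bincount(inv, weights=values)
    counts = np.bincount(inv)
    return (sums[inv] - values) / (counts[inv] - 1)


def two_stage_least_squares(y, endog, instruments, controls):
    """2SLS point estimates; coefficient order is [endog, controls...]."""
    Z = np.column_stack([instruments, controls])  # excluded IVs + controls
    X = np.column_stack([endog, controls])        # regressors of interest
    # First stage: project every regressor on the instrument set.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress the outcome on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


def simulate_and_estimate(seed=0, n_villages=3000, village_size=20, beta=0.9):
    """Return (OLS, 2SLS) estimates of the peer coefficient on simulated data."""
    rng = np.random.default_rng(seed)
    village = np.repeat(np.arange(n_villages), village_size)
    n = village.size

    # Hypothetical inputs: firstborn gender is random; the village shock is
    # an unobserved confounder shared by all co-villagers.
    girl_first = rng.integers(0, 2, size=n).astype(float)
    shock = rng.normal(0.0, 0.05, size=n_villages)[village]

    # Neighbors' migration propensity responds to their own firstborn gender.
    propensity = 0.3 + 0.5 * girl_first + shock + rng.normal(0.0, 0.1, size=n)
    peer_rate = leave_one_out_mean(propensity, village)
    peer_girl_share = leave_one_out_mean(girl_first, village)  # instrument

    # Own outcome: peer effect, own firstborn gender, and the same shock.
    y = beta * peer_rate + 0.1 * girl_first + shock + rng.normal(0.0, 0.1, size=n)

    controls = np.column_stack([np.ones(n), girl_first])
    ols = np.linalg.lstsq(np.column_stack([peer_rate, controls]), y, rcond=None)[0][0]
    iv = two_stage_least_squares(y, peer_rate, peer_girl_share, controls)[0]
    return ols, iv


beta_ols, beta_iv = simulate_and_estimate()
print(f"OLS: {beta_ols:.3f}  2SLS: {beta_iv:.3f}")
```

In this simulated setting, OLS overstates the peer coefficient because the village shock moves both the peer rate and one's own outcome, whereas the 2SLS estimate should land near the assumed value of 0.9. The sketch controls for own firstborn gender so that the instrument works only through neighbors' fertility outcomes, mirroring the exclusion restriction argued in the text.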

7 Since Chen et al. (2008) do not report the IV coefficients, we cannot compare our IV results with theirs. But our OLS results are similar to theirs, suggesting that the findings reported in our study are not specific to our sample area.
8 Rosenzweig and Wolpin (1980) use twins as an exogenous shock to study the quantity-quality tradeoff in family fertility; Angrist and Evans (1998) use the sex composition of the two eldest siblings as an instrument to identify the effect of family size on mother’s labor market participation; and Maurin and Moschion (2009) study a French mother’s labor market participation in association with neighbors’ participation, using the sex composition of neighbors’ eldest siblings as the IV. In a recent study that evaluates the effect of family size on school enrollment, Qian (2009) instruments family size by the interaction of an individual’s sex, date of birth and region of birth.

3. Background and Data

In this section, we first describe the general background of labor migration and fertility control and then zoom into the region covered by our dataset.

Rural-to-urban migration

Figure 1 describes the trend of rural-to-urban migration and major policy changes in China, based on reports from the National Bureau of Statistics (Han et al. 2010). Although the central government of China started to allow rural laborers to take urban jobs in 1984, rural migrants were strictly controlled until the early 1990s. Following the Hukou reform (1997) and China’s participation in the World Trade Organization (2001), the total number of rural laborers migrating for urban jobs has grown rapidly, from 49 million in 1998 to 130 million in 2006. By 2006, the scale of rural-to-urban migration was comparable to that of rural laborers engaging in non-agricultural activities within their rural hometown. Both types of labor shift contribute to the labor transfer out of agriculture, which is one of the most important engines of GDP growth in China (Young 2003).

Despite the large flow of migrants, rural and urban workers are not close substitutes in the city (Zhao 2005 and Cai, Park and Zhao forthcoming). Most rural-to-urban migrants are unskilled, do not have families in the city, and concentrate in dangerous, dirty or low-pay jobs. The labor market for rural-to-urban migrants is also plagued by the lack of information. Surveying 439 rural migrants in the city of Chang Sha in Spring 2004, Chen (2005) finds that most migrants found their jobs via informal channels: 57.2% relied on the introduction of relatives, friends, or migrants from the same origin; 13.2% contacted potential employers directly; 6.1% responded to employer recruitment (excluding mass media ads); 1.9% were self-employed; and only 1.4% found a job via advertisements on TV, newspapers or billboards. The fraction of government-organized migration is even smaller (0.5%).
When asked how easy it is to find a job in the city, 44.5% answered difficult or very difficult. For the biggest hurdle of job search, 38.3% mentioned the lack of a social network and 25.1% mentioned the lack of job information. Instead of surveying people who have migrated to the city, Du, Park and Wang (2005) asked 582 rural households in four western counties to list the most important factors that affected their migration decision in 2000. The lack of information and social networks is the third most mentioned factor determining men’s migration, lagging behind agricultural labor demand and low education.9 As indirect evidence for the help of social networks, Bao et al. (2007) find that the inter-province migration rate increases with the size of the same-origin migrant network in the destination.

Local governments play a limited role in matching rural migrants and urban employers. For example, most government-organized job markets are held in a conference center within the city. Rural

9 The four most important factors for men are agricultural labor demand (25.9 percent), lack of education or skills (25.3 percent), lack of information and social networks (18.3 percent), and inability to finance transportation and search costs (14.1 percent). The three most important factors for women are unwillingness to be separated from children (46.2 percent), agricultural labor demand (21.0 percent), and lack of education or skills (12.7 percent).

migrants must go to the city first before attending these job markets. At the other end, origin governments could contact far-away employers and organize rural residents to migrate conditional on job offers. But according to the 2003 rural survey conducted by the National Bureau of Statistics, only 3.3% of the 113.9 million rural migrants were employed via government-organized migration (Jian 2005). The rest relied on friends and relatives (41.3%) or self search (55.4%). These numbers suggest that social networks and other informal channels play a dominant role in determining whether, when and where to migrate.

One exception is government efforts in poverty alleviation. According to the 2001-2010 National Poverty Alleviation and Development Plan, the central government of China has designated 148,000 rural villages as “poverty villages” based on the village’s population, productive land, and household income. Under this definition, a poverty village may receive subsidies from central and local governments, and become qualified for certain poverty alleviation projects. The region in our dataset (2006) has 2,196 poverty villages with more than 100 adults of age 17-60.10 Some poverty reduction projects, such as the Sunshine Project and the Southwest Poverty Reduction Project, provided off-farm training and employment opportunities to the laborers of poverty villages at the regional or national level (Chen 2009). For example, the central government claimed that the Sunshine Project, started in 2004, helped 4.6 million rural laborers find off-farm jobs in the first two years. These projects can also promote laborers’ migration incentives and cause the clustering of migration decisions within a village. Unfortunately, we cannot observe whether a poverty village in our data has participated in these projects or not. To be conservative, we exclude all poverty villages from our study sample.

Fertility policy and caregiving within household

China implements the one-child policy differently in rural and urban areas. Since 1984, the central government has allowed rural households to have a second child if the firstborn is a girl, but leaves policy implementation to local governments. The second-child policy has several implications. First, there should be little gender selection in the firstborns if every family with a girl firstborn is allowed to have more children.11 Second, conditional on a girl firstborn, households with a strong boy preference will increase family size and try to get a boy in the second birth. Both implications are confirmed in Ebenstein (2011). He shows that the sex of the firstborn is balanced (51% being boys) and has changed little between 1982 and 2000, hence the imbalance between males and females observed in Sen (1990) is mostly driven by gender selection for the second and later births. As detailed in Section 4, the gender-specific fertility policy generates exogenous variations in family size and gender

10 Another 9 poverty villages have fewer than 100 adults within the age of 17 to 60. In total, poverty villages account for 55.2% of the total 3,986 villages in our data.
11 In theory, the policy may even encourage sex selection towards girls in the firstborn, but such selection is unlikely to happen in our study area given (1) the strong boy preference and (2) the large fraction having a second child after a boy firstborn (despite fines).

composition of children12, which are important factors to consider in one’s migration decision but not that of neighbors. This allows us to use neighbors’ firstborn fertility outcomes to construct IVs for neighbors’ migration. As documented in Li, Zhang, and Zhu (2005), China’s one-child policy is only applicable to the Han, who represent 92% of the Chinese population. We do not know whether an individual is a minority or not, but we do know whether a village is a gathering place for minorities. Later on we report robustness checks excluding minority villages.

Most Chinese rely heavily on household members for elderly and child care. According to the National Bureau of Statistics (NBS 2016), the number of publicly funded kindergartens dropped from about 150,000 (an 83% market share) in 1998 to about 43,000 (a 24% market share) by 2012. Only about 2% of Chinese people aged 65 and older live in residential care facilities (Feng, Liu, Guan & Mor 2012). In 2010, about 66% of people older than 60 provided care for their grandchildren (Melenberg & Zheng 2012). About 45% of people older than 60 live with their children and derive 22% of their income from their children (National Survey Research Center, 2014). These national statistics mask a significant difference between urban and rural areas, as most market-based child and elderly care institutions are located in urban areas. They are almost non-existent in rural areas, especially if the area is poor and the local government lacks funding.

Data

Our data is drawn from the 2006 China Agricultural Census (CAC). The National Bureau of Statistics of China has organized local governments to conduct three rounds of CAC in 1996, 2006 and 2016 respectively. The CAC is designed to cover every individual who resided or had registered residence in every village at the time of interview.
Thanks to the exhaustive nature of CAC, we are able to define a clear boundary of social networks by village, and test if a broader definition of social network yields different results. In comparison, most migration data used in the literature contains a limited sample of households from a small number of communities.13

Drawn from the 2006 CAC, our data cover all the rural residents registered in a poor area of China as of December 31, 2006. Because of our collaboration with the local government, we are not allowed to reveal the geographic location, but the study area belongs to an inland province whose per capita income is significantly lower than the national average. In total, we observe 5.9 million individuals in 1.4 million households and 3,986 villages. These villages belong to 250 townships and are spread across 8 counties. The whole census area is roughly 16,000 km2, an average of 4 km2 per village.

12 Researchers have relied on exogenous fertility outcomes to study other topics. For example, Wei and Zhang (2011) use the sex ratio to study household savings in China; Wu and Li (2011) use the gender of children to study intrahousehold bargaining in China.
13 For example, the Mexican Migration Project used in both Munshi (2003) and Mckenzie and Rapoport (2007) surveys 57 rural communities and 200 households per community. The data used in Chen et al. (2008), the 2002 Chinese Household Income Project Survey, covers more counties than our data (121) but their sample consists of 9200 households in 961 villages. This implies that on average they observe no more than 10 households per village.

Excluding 15 villages that have fewer than 100 adult laborers (age 17-60), Table 1 compares the poverty (2,196) and non-poverty (1,775) villages in our data. On average, poverty villages have a lower education level, a smaller population, more land, and worse public infrastructure than non-poverty villages, which implies lower economic incentives and higher costs of migration from poverty villages. However, the percent of labor migration is not significantly different between the two types of villages (26.61% in poverty villages versus 28.16% in non-poverty villages), suggesting that the poverty reduction projects that target poverty villages may have promoted labor migration in the poverty villages in our data. Since participating in these poverty reduction projects could generate an omitted-variable problem, our study sample focuses on non-poverty villages only.

The main part of the data was collected at the household level. The household head was asked to enter information for every family member. If a resident was away from home at the time of interview, his/her information was still collected from the household.14 By this design, we observe detailed household information including how many individuals reside in the household, their relationship to the household head, their age and gender composition, the amount of contract land, the amount of land in use, ownership of housing, the self-estimated value of house(s), ownership of durable goods, the availability of electricity, water and other amenities, the number of household members that receive government subsidies, and engagement in various agricultural activities. Individual level data are limited to age, sex, education, employment, industrial sector, and the number of months away from home for out-of-township employment in 2006.

Since a child in the study area may get married or leave home as early as 16 and daughters often leave their own home after marriage, we restrict the child definition to age 0-12.
Although for over 99% of households we can infer spouse or parent-child relationships via each member’s reported relationship to the household head, we may not observe all the adult children (age at or above 16) because some of them may have married away and established their own households. This is a common problem in household surveys (see for example Angrist and Evans 1998). In our data, 13.33% of the households report three or more generations, 80.81% report two generations, and 5.81% report one generation. Out of the 369,667 households that have at least two generations, 292,258 (or 79.06%) report children under age 16. To minimize the potential missing problem of adult children, our IV construction focuses on the 260,894 households that report the oldest child (living in the household) at or under age 12. For the other households, we code each adult’s own fertility information as missing.15
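
The coding rule described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure actually applied to the census files: the function name and record layout are hypothetical, multiples are identified by children sharing the oldest reported age, a set of multiples counts as a girl firstborn only if all are girls, and households reporting no child are coded as missing.

```python
from typing import Optional, Sequence, Tuple


def firstborn_record(
    children: Sequence[Tuple[int, str]], max_age: int = 12
) -> Optional[dict]:
    """Code firstborn outcomes from co-resident children given as (age, sex).

    `sex` is "M" or "F". Returns None (missing) when the oldest reported
    child is above `max_age`, because an older sibling may already have
    left the household, or when no child is reported (an assumption).
    """
    if not children:
        return None
    oldest_age = max(age for age, _ in children)
    if oldest_age > max_age:
        return None
    # Assumption: children sharing the oldest reported age are multiples.
    firstborns = [sex for age, sex in children if age == oldest_age]
    return {
        "girl_firstborn": all(sex == "F" for sex in firstborns),
        "multiple_firstborn": len(firstborns) > 1,
    }


# A household whose oldest co-resident child is 14 is coded as missing.
print(firstborn_record([(14, "M"), (9, "F")]))
# A household with a 9-year-old girl followed by a 5-year-old boy.
print(firstborn_record([(9, "F"), (5, "M")]))
```

Village-level instruments would then average these indicators over neighbors with non-missing records, which is why the text requires missingness to be unrelated to the true firstborn outcomes.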

14 If a whole household has migrated but still holds Hukou in the village, the village head will fill out the form for the household.
15 To be precise, out of the 645,523 adult laborers for which we code the gender of firstborn as missing, 645,501 (or 99.91%) are coded as missing because they report a child over age 13 and therefore we cannot determine whether the reported oldest child is the real oldest child.

The missing-adult-children problem does not prevent us from calculating the average percentage of multiple births or the average gender of firstborns at the village level, as long as the real values of these variables are uncorrelated with whether a household reports a child over age 12 or not. This is a reasonable assumption because the missing values are mostly driven by the age and cohort of the adults under study. To justify this assumption, our main analysis will focus on the households that have the household head’s age at or below 35. We also report results including only two-adult families, to address the less than 1% possibility that a household structure may be too complicated for us to infer who is whose child.

Supplemental data were collected at the village level including the size of the village in both arable land and registered population, whether the village is a place for minority gathering, the distance to the nearest bus station16, access to water, electricity and other amenities, and whether the village has a national poverty status (as designated by the central government).17

Above all, the 2006 CAC data is especially suitable for studying social network effects in migration because we observe one’s own migration as well as the migration decision of almost all the other adults in the same village. One shortcoming is that we only observe the migration decision at the time of data collection and cannot identify who migrated long before 2006 and who just started to migrate in 2006. For this reason, the social network effects identified in this study only reflect a cluster of self and peer migration, which could be driven by a group of adults migrating together or some migrants migrating early and then helping others to migrate afterwards.
Similarly, we only observe where a migrant migrates to as of December 31, 2006; we don’t know whether s/he has moved directly to the destination, or stepwise from the village to an intermediary location first and then from there to the destination.

Sample

Our study sample focuses on adults between age 17 and 35 who reside in a non-poverty village of the study area. Individuals who have non-rural Hukou and in-school students are dropped from the sample. After dropping 6 non-poverty villages that have fewer than 100 adult laborers, our final sample consists of 859,644 adults in 1,775 villages. These villages belong to 192 townships in 8 counties. On average, there are 9.25 non-poverty villages within each township and these villages are geographically adjacent to each other. By using township fixed effects, we can absorb many confounding factors at the township level.

The CAC asks how many months a respondent has been away from the residential village for out-of-township employment during 2006. One month away from home is defined as being away for more than 15 days in that month. Based on this question, we define an adult as a “migrant” if s/he has been away for

16 The exact question is to the nearest bus/rail/dock station, but there is no railway station or major river in the study area.
17 Due to potential measurement errors in the registered population, we calculate the number of adults per village from our study sample and use it to proxy village population.

at least one month in 2006. This definition yields 28.16% of adults in our sample being migrants in 2006. As shown in Figure 2, the majority of migrants report that they stay away from home for at least 10 months a year. This suggests that most migrants live and work in a far-away place and only come back home for short visits. Later on, we will report a robustness check using 6+ months away from home as an alternative definition of migration. Each migrant is also required to report the migration destination and industrial sector. Table 2 reports summary statistics for major migration destinations (by province). In addition to the percent of migrants going to each destination, we report the number of bus/railway hours needed to travel to each destination from the center of the study area18, as well as the relative income across destinations. Scaling the 2006 rural per capita income of the study area to one, Table 2 shows that almost all the destinations have significantly higher per capita income than the study area; some are even eight or ten times higher.19 Consistent with the literature, the most attractive destinations are either high income or short distance. However, income gap and distance do not explain everything. For example, destination F has the highest income per capita in the list. The next highest-income destination (A) is almost the same distance from the sampled area as F, but the percent of migrants to A (28.88%) is much higher than to F (1.98%). Apparently other forces are at work when people decide where to migrate. In terms of industrial sectors, migrating workers report whether they work in manufacturing, construction, services, or other sectors. Based on the study sample, Table 3 compares migrants and non-migrants in individual, household, village, and township-level variables.
Consistent with the literature, migrants are on average 2.2 years younger, more likely to be male, and have 0.83 more years of schooling, but are less likely to be the head of household. Figure 3 reports the percent of migration by age and gender. It is clear that young adults aged 20-25 are most likely to migrate. Migration tendency declines sharply after age 30. The percent of migration is similar for men and women before age 22, but men are significantly more likely to migrate after 22, probably because married women stay home for childbearing, child care and elderly care. Turning to household structure, migrants are more likely to come from a household that has fewer children under age 12 but more adults aged 17-23, 45-59 and over 60. This partly explains why the probability of having any boy under 12 is much lower for migrating households (37%) than for non-migrants (55%). Migration could also be related to fertility history and boy preference, which we will

18 Since there is no railway station in the study area, we first compute the bus hours from the area center to the province capital and add that to the number of railway hours from the province capital to other provinces. 19 The comparison is based on China National Statistical Book, so the income difference may reflect differences in observable attributes. For example, a rural migrant to A may not expect to earn the average income in A because he is less educated and does not have full access to all the job opportunities of his education level due to Hukou requirement in some city jobs. In this sense, Table 2 is only suggestive.

further explore in Section 4. Migrants and non-migrants have a similar percent of girl firstborns on average (49% for both), raising a concern that the gender of the firstborn may not have enough statistical power to be a good IV. As shown later, this impression is incorrect because a girl firstborn tends to encourage adult males to migrate but keep adult females at home. Our IV strategy will make use of this variation. As expected, both capital ownership and ease of transport differ between migrants and non-migrants. Migrants are more likely to have a lower house value and some outstanding loans, but their contracted land and land in use (at the household level) is no less than that of non-migrants. The latter is masked by the difference in the number of adults within a household. At the village level, migrants do have less land per adult. In terms of transportation, migrants are 17% closer20 to the nearest bus station, and they are slightly more likely to live in a village with access to a drivable road.
Migration clusters
Taking the village as the unit of observation, migration rate per village ranges widely from 0% to over 80% (Figure 4). At the individual level, Panel D of Table 3 reports, by one’s migration status, the percent of co-villagers that migrate in 2006 excluding adults in one’s own household. Clearly, migrants are more likely to come from high-migration villages. We further regress migration percentage per village on village population, whether the village is a minority gathering place, average house value, average people per household, average age, average gender, land per household, average education of adults, distance to the nearest bus/rail/dock station, and township fixed effects. This regression has an R-squared of 0.635, suggesting that more than 35% of the cross-village migration heterogeneity is driven by something other than fundamental socio-economic differences across villages.
The cluster pattern of migration is more striking if we examine the distribution of migrating destination and industrial sector within each village. The first row of Table 4 shows that, if we single out the most common industrial sector within each village, 69.56% of same-village migrants work in that sector. This number is higher than what we would get if we repeat the exercise by township (61.61%), county (58.85%) or the whole area (55.67%). Similarly, the percent of migrants to the most common destination is more concentrated by village (52.18%) than by township (47.65%), county (42.70%), or the whole area (38.92%). The rest of Table 4 shows that same-village migrants are more clustered by the combination of destination and occupation than migrants from the same township or the same county. All these statistics support the conjecture that each village is a closely-knit social network and people interact with each other much more within the village than across villages. Given the facts that the average area per village is only 4 km2 and villages in the same township are adjacent by definition, the migration clusters shown here is similar to the employment clusters documented in Bayer et al. (2008). In Bayer et al. (2008), workers residing in the same census block tend to work in the same census block, as compared to residents of nearby

20 This percentage is computed by 1- (average distance of migrants to the nearest bus station)/(avg distance of non- migrants) = 1-3.64/4.38.

census blocks. However, unlike Bayer et al. (2008), we use IV to further control for potential omitted village attributes.
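The concentration statistic reported in Table 4 can be illustrated with a minimal sketch. It assumes that the statistic is the pooled share of migrants who work in the most common sector of their own group (village, township, and so on); the paper does not spell out the aggregation, so this is an assumption, and the function name `modal_share` and the toy records are hypothetical.

```python
from collections import Counter, defaultdict


def modal_share(records, group_key, choice_key):
    """Share of all migrants whose choice equals the most common
    choice within their own group (pooled over groups).

    Assumption: pooling over migrants rather than averaging over groups.
    """
    choices_by_group = defaultdict(list)
    for rec in records:
        choices_by_group[rec[group_key]].append(rec[choice_key])
    # For each group, count the migrants in that group's most common choice.
    in_modal = sum(
        Counter(choices).most_common(1)[0][1]
        for choices in choices_by_group.values()
    )
    return in_modal / len(records)


# Hypothetical toy data: two villages in one township.
migrants = (
    [{"village": "v1", "township": "t1", "sector": "construction"}] * 3
    + [{"village": "v1", "township": "t1", "sector": "services"}]
    + [{"village": "v2", "township": "t1", "sector": "manufacturing"}] * 3
    + [{"village": "v2", "township": "t1", "sector": "construction"}]
)

by_village = modal_share(migrants, "village", "sector")
by_township = modal_share(migrants, "township", "sector")
print(by_village, by_township)
```

Because villages partition the township, the village-level share can never be below the township-level share under this definition; what matters in Table 4 is the size of the gap.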

4. Main results
In this section, we first describe the main specification for the peer effects of migration, and then present evidence for the validity of our instruments. Regression results and robustness checks are reported afterwards.
Econometric Specification
For an individual i in household h, village v, township t and county k, the basic specification is:

$$y_i = \alpha_t + \beta \cdot X_i + \gamma \cdot X_h + \delta \cdot X_v + \lambda \cdot \bar{y}_{-i|v} + \epsilon_i \qquad (1)$$

where $y_i$ is a binary variable indicating whether individual i is a migrant in 2006; $\alpha_t$ denotes township fixed effects; $X_i$ denotes i’s individual attributes such as age, gender, years of schooling, whether the firstborn singleton is a girl, whether the first birth is multiple, whether the second birth is multiple, as well as the minimum and maximum ages of own children. As discussed before, the variables on own children have missing values because some individuals do not have a first or second birth, some individuals report children over age 12, which by our definition makes the firstborn variables missing, or some family structures are too complicated to pin down who is whose child (less than 1% of the sample). Later we show robustness checks using the sample of two-adult families only.

We control for a long list of household attributes in $X_h$, including the number of family members by age group (0-7, 7-16, 17-23, 24-44, 45-59, 60+), whether there is at least one boy (aged 0-12) in the household, the amount of contract land, the debt status of the household, and the prevalence of the household head’s surname in the village. The last one captures the household’s political status and extent of social networks within the village. We do not control for the amount of land in use by household because this could be a result of migration. In Section 6, we will examine how land in use of non-migrants correlates with the degree of peer migration. The most important village-level variables ($X_v$) include the distance to the nearest bus/rail/dock station, access to drivable roads, the total adult population, and the total acreage of arable land. The latter two capture the degree of land-population pressure in the village.

Of central interest is the coefficient ($\lambda$) on peer migration ($\bar{y}_{-i|v}$), which is measured by the percent of same-village laborers (aged 17-35) that migrate in 2006, excluding all adults in household h. We cannot treat male and female peer migration rates separately because they are highly correlated. However, we control for the interaction between the gender and firstborn fertility outcome of the study individual, to capture the fact that fertility outcomes have opposite effects on one’s own migration decision depending on one’s gender. Equation (1) is estimated by a linear probability model, with errors clustered by village (v) and adjusted for heteroskedasticity. As omitted variables may capture similar socioeconomic status, similar preferences or a common environment, we expect $\lambda_{OLS} > \lambda_{2SLS}$.
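As an illustration of how the peer migration rate in Specification (1) is constructed, the following is a minimal sketch of a leave-own-household-out village average. The record layout (keys `village`, `household`, `migrant`) and the function name are hypothetical and are not taken from the CAC data files.

```python
from collections import defaultdict


def peer_migration_rate(adults):
    """For each adult, the share of migrants among same-village adults
    outside the adult's own household (None if there are no such peers)."""
    village_n = defaultdict(int)
    village_m = defaultdict(int)
    household_n = defaultdict(int)
    household_m = defaultdict(int)
    for a in adults:
        hh = (a["village"], a["household"])
        village_n[a["village"]] += 1
        village_m[a["village"]] += a["migrant"]
        household_n[hh] += 1
        household_m[hh] += a["migrant"]

    rates = []
    for a in adults:
        hh = (a["village"], a["household"])
        # Subtract the whole household, not just the individual.
        n_peers = village_n[a["village"]] - household_n[hh]
        m_peers = village_m[a["village"]] - household_m[hh]
        rates.append(m_peers / n_peers if n_peers > 0 else None)
    return rates


# Hypothetical toy village with three households.
adults = [
    {"village": "v1", "household": "h1", "migrant": 1},
    {"village": "v1", "household": "h1", "migrant": 0},
    {"village": "v1", "household": "h2", "migrant": 1},
    {"village": "v1", "household": "h3", "migrant": 0},
    {"village": "v1", "household": "h3", "migrant": 0},
]
rates = peer_migration_rate(adults)
print(rates)
```

Excluding the entire household rather than only the individual keeps within-household correlation in migration out of the peer measure, which is the reason the text gives for the exclusion.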

Validity of Instruments
We propose two IVs for peer migration ($\bar{y}_{-i|v}$): (1) the percent of same-village households that have a first birth involving two (or more) children; (2) the percent of same-village households whose first birth is a singleton girl. Both IVs are conditional on the households that have any adult laborer aged 17-35 and for which we can clearly define the oldest child, excluding household h. We focus on firstborns only, because births of higher order are more likely subject to sex selection (Ebenstein 2011). The validity of the IVs relies on two assumptions: first, the gender of -i’s firstborn and whether -i has multiples in the first birth are correlated with -i’s own migration decision; second, these variables are uncorrelated with the other households’ migration decision in the same village. In the absence of sex selection21, the occurrence of twins, triplets, or a girl in the firstborn should be out of the control of a household. However, this does not automatically imply the second assumption holds, because we face several measurement errors and because households related by blood may help each other in child care and elderly care. Below we discuss each threat in detail. One measurement error is in the definition of firstborn. Since our data capture a one-time snapshot, the oldest child in our sample may not be the first birth if some elder sibling(s) has grown past age 16 or died before the data collection time. Conditioning on households whose oldest reported child is no older than 12 alleviates the problem. Relatedly, there might be some sex selection in favor of boys in the observed oldest child, even if the actual firstborns are balanced in sex. While we cannot rule out such sex selection, it is comforting to note that the percent of singleton girls in the observed first births (48.81%, versus 50.28% for singleton boys) is close to the natural ratio (James 1987 and Cai and Lavely 2005).
Consistent with Ebenstein (2011), we also find significant gender differences in second and later children. As shown in Table 5, households with a girl firstborn are more likely to have a second child (63%) than those with a boy firstborn (59%).22 Moreover, the second and later births are more likely to have (at least one) boy if the firstborn is a girl (66%) than otherwise (42%). Put it another way, the probability of having (at least one) boy and having (at least one) girl after firstborn is very close if the firstborn is a boy (42% vs. 44%), but far away if the firstborn is a girl (66% vs. 37%). Some of these numbers are larger than 50% because they include children born after the second birth. In particular, 16.37% of all children with valid firstborn data are third-born, and 5.51% are fourth or above. Combined, households with a girl firstborn tend to have

21 In a rural area as poor as our sample, there is no fertility treatment service. 22 The high percentage of second-birth conditional on a boy firstborn suggests that either the fertility control is not strict in the study area or many families are willing to pay for the monetary fine in association with a second birth after a boy firstborn. This will not invalidate our study so long as there is a statistical difference of second birth conditional on the gender of the firstborn.

more children under age 12 (2.03) than those with a boy firstborn (1.88). This confirms the conjecture that the gender of the firstborn affects family size and the gender composition of later-borns. The second measurement error is that we may miscount two close-by births as twins because our data only report age in years instead of months or days. This data problem may lead to (1) an over-estimate of the percent of multiple births, and (2) a higher-than-natural rate of mixed gender in these multiples. The latter could occur if a girl first-born in January motivates the birth of a subsequent boy in November or December. To check these concerns, we find that among all the firstborns the likelihood of having two or more children at the same age is 0.91%, which is consistent with the natural probability of multiple births in both the international literature (James 1987) and the period of time in China before the implementation of the one-child policy (Cai and Lavely 2005). Even if the fertility outcome of the firstborn is exogenous, one may argue that two households’ fertility and migration decisions may be directly correlated if their household heads are close relatives.23 For example, consider two middle-aged brothers whose mother is 55 years old. If one brother has a girl firstborn, he may invite the mother to live in and take care of the baby so that he can migrate for work. This living arrangement may affect the other brother’s fertility or migration decision. Unfortunately, our data do not indicate blood relationship. In one robustness check, we will restrict IVs to same-village households that have different surnames from the study household. The other assumption for the IV validity is that individuals’ fertility outcome must be correlated with their own migration decision. As shown in the first column of Table 6, OLS results suggest that adult males are more likely but adult females are less likely to migrate if they have a girl firstborn.
This is understandable, because having a girl firstborn tends to increase the likelihood of having later births and more resources are needed to support the increased family size. In contrast, having multiple births in the firstborn tend to discourage adults from migration, and this effect is negative for both genders suggesting that the need for child care at home may dominate the need for resources to support the family. These patterns persist after we use IVs on the peer migration rate. The correlation between migration and girl firstborn may also relate to households’ boy preference, because households with a girl firstborn tend to have a larger family size but are less likely to have any boy. As shown in Appendix Table A1 (which reports the full set of coefficients from the same regressions as in Table 6), family size of every age range tends to increase the likelihood of migration, but having at least one boy in the household lowers the likelihood of migration. This suggests that a household with strong boy preference may be more willing to stay home to take close care of the boy(s) than migrate to accumulate more wealth for the boy(s) and the household. One may also argue that because parents rely on their sons

23 It is very rare to have a daycare center or elderly care center in rural China, especially if the area is poor and the local governments have few resources to provide such public goods.

for elderly care, they may have fewer incentives to work away from home and save for themselves if they have sons. Tan (2003) suggests somewhat the opposite: while parents may still have such a perception in their mind (which justifies the boy preference in fertility), parents’ actual economic return from sons is no higher than that from daughters. The main reasons are (1) adult sons tend to give a smaller share of their own income to the parents, (2) more and more adult sons do not live with parents after marriage, and (3) daughters also offer elderly care to the parents, especially if the parents have no sons. Conditional on the IV validity, one may suspect a weak IV problem as the percent of girl firstborns should be close to 50%, especially for large villages.24 As shown in Figure 5, the percent of girl firstborns ranges from 18.9% to 66.7% and the percent of multiple births in the firstborn ranges from 0% to 0.8%, at the village level. These variations occur because village size varies greatly within our data: 14.9% of villages have 100-1000 adult laborers (17-60), 45.6% have 1000-2000, 26.6% have 2000-3000 and 13.9% have 3000+. If we only count adult laborers aged between 17 and 35, the median number of adults per village is 439, and 60.27% have 100-500 adult laborers. By the law of large numbers, village-level shares settle close to the population rate only when the number of households is large, so these smaller villages show more variation in the percent of girl firstborns and the percent of multiples in the firstborns. To be sure, the IV results reported below are accompanied by a conditional likelihood ratio (LR) test for weak instruments, and F-statistics for the first stage.25
Key results
The key results of specification (1) are presented in Table 6.
In addition to the OLS results in Column (1), we present three columns of IV estimates: the first using the percent of same-village adults (17-35) having multiples in the firstborn as the only IV for the percent of peer migration in the same village (17-35, excluding the study household); the second using the percent of girl firstborn of co-villagers as the only IV; and the third using both IVs. As shown in Panel A, both IVs are highly significant in the first stage. Panel B of Table 6 shows the 2SLS estimates and the coefficients for basic demographics (age, gender, education) and firstborn outcomes, while other coefficients from the same specification are reported in Appendix Table A1. All four columns find that migrants are younger, more educated, and have more access to drivable roads. They are also more likely to be male, and have less house value and less contract land. These patterns are consistent with the existing literature on both international migration (Rosenzweig 1988, Lucas 1997) and internal migration within China (Zhao 1999a & 1999b). As discussed above, firstborn outcomes – having a girl firstborn or having multiples – could affect one’s own migration decision in multiple ways, via future fertility, family size, boy preference, etc. Because these effects apply to every individual, the first stage coefficients reflect all the mechanisms by which peers’ firstborn outcomes could affect peer migration rate. For example, at the individual level, having multiples

24 Similar argument applies to the percent of multiple birth. 25 We adopt conditional LR because it is more robust than Anderson-Rubin and score tests (Andrews and Stock 2007).

in the firstborn has a negative coefficient on self-migration, but it also increases family size, which has a positive impact on self-migration. In aggregate, when peers in a village have a higher rate of multiple firstborns, it is correlated with a higher rate of peer migration in the first stage. Similarly, having a girl firstborn has opposite effects on the migration of male and female adults in that household, and its subsequent effect on family size and boy presence affects self-migration as well. In aggregate, the first stage finds a negative correlation between the rate of girl firstborns in a village and peer migration rate. Regardless of sign and magnitude, the first-stage coefficients of both IVs are highly significant, and their strength is confirmed by the F-statistics and the conditional LR tests in Panel B. The key coefficient for peer migration rate, $\lambda$, is 1.384 in the OLS (Column 1), which is much higher than the 2SLS coefficient when we use peers’ girl firstborn as the only IV (0.867, Column 3) or use both IVs (0.918, Column 4). This confirms the concern that the OLS coefficient likely picks up confounding factors among peers. After correcting it by IVs, the magnitude of peer effects remains economically large, implying that a 10 percentage point increase in the proportion of peer migration has the same influence as an increase of education by 7.3 years. Taking Table 6 Column (4) as the preferred specification, the 2SLS estimate suggests that every one percentage point increase in the percent of same-village adults migrating away will increase one’s own migration probability by 0.918 percentage point. Two factors may explain this seemingly large effect of social interactions: First, most Chinese rural-to-urban migrants leave families at the origin and therefore have plenty of opportunities to communicate with people in the same village.
Second, due to the lack of job information via formal channels, potential migrants must rely on friends, relatives, and other social networks. Given the geographic sparseness of rural areas, current migrants in the same village are likely the most important source of job information in remote destinations.
Robustness Checks
The following robustness checks ensure that the reported effects of social interactions are not driven by sample selection, variable construction, or invalid instruments. To address the concern that working away from home for 1-2 months is not migration, we redefine migration as working away for at least 6 months. Column (1) of Table 7 shows that the 2SLS coefficient (using both IVs) is smaller but still significantly different from zero (0.576 vs. 0.918). The smaller magnitude can be explained by the fact that those who migrate away for a long time often migrate to a far-away province and have fewer opportunities to interact with peers. Because close relatives may coordinate in migration decisions but we do not know which households are close relatives, Column (2) restricts the calculation of peer migration rate and IVs to peers with different surnames (from the study household). Obtaining similar results (0.887) in this specification suggests that the observed correlation between self and peer migration is not driven by households being close relatives.

Additional robustness checks consider two extra groups of peers. The first group is the adults that live in the same township but not in the same village. Since a township covers a set of adjacent villages, same-township adults may communicate across villages. Column (3) of Table 7 reports the 2SLS estimates including both the percent of peer migration in the same village and the percent of peer migration in the same township but different villages. Both are instrumented by the two IVs constructed for the relevant peers. Results suggest that peers from the same township but different villages have a positive but insignificant effect on one’s own migration decision. In particular, its magnitude (0.897) is slightly smaller than that of same-village peers (0.918), but the standard error is huge. This confirms the assumption that people tend to interact more with co-villagers than with people in nearby villages. In Column (4), we add information about the second peer group, namely the percent of migration for the adults who live in the same household. In theory, migration within a household may be positively correlated due to social interactions or unobserved household factors. The correlation could also be negative if the household makes individual migration decisions jointly (for example, insurance concerns may motivate the household to diversify across agricultural and non-agricultural activities), or if there are unobserved individual factors that are different across family members. Unfortunately, the gender of firstborns and the occurrence of multiple births are applicable to both parents, hence we cannot use them for the percent of same-household adults that migrate. For this reason, the regression reported in Column (4) includes same-household migration, but does not use any IV for this variable. Although we still use IVs for peers outside the household, the coefficient on same-household peers does not necessarily identify the causal effect within a household.
Keeping this in mind, Column (4) suggests that there is a positive correlation in the migration decisions of self and other household members. The relatively large coefficient on the percent of same-household migration indicates that insurance concern is not a dominating factor in the migration decisions within a household. To address concerns about potential sample selection, Column (5) of Table 7 redefines the sample to include all adult laborers aged 17-60 instead of 17-35. This change produces a similar coefficient of peer migration (0.936 vs. 0.918), which is not surprising because migration tends to concentrate in young age (Figure 3). Column (6) conditions the analysis sample on the households that have only two adults. These households have a simple relationship among family members, which allows us to clearly define fertility history. The effect of peer migration is slightly lower for this sub-group of the population (0.708 vs. 0.918 for the full sample), probably because two-adult families have a hard time finding live-in help for child care, which is much needed if parents stay away from home for a long time. The last column of Table 7 excludes minority (non-Han) gathering villages because minorities are not subject to the one-child policy and minorities are much less likely to migrate than the Hans. The 2SLS effect of peer migration increases a little for this reason (1.110 vs. 0.918).
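The gap between the OLS and 2SLS estimates discussed in this section can be illustrated with a stylized simulation. This is only a sketch under strong assumptions: a continuous outcome rather than a linear probability model, one instrument, no controls, and made-up parameter values (a true effect of 0.5 and an unobserved common factor `u`). In the just-identified case the 2SLS estimate equals the ratio of covariances used below.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


def simulate(n=20000, true_effect=0.5, seed=0):
    """Hypothetical data: u is an unobserved factor that raises both the
    peer variable x and the outcome y; z shifts x but is unrelated to u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)          # instrument
        ui = rng.gauss(0, 1)          # common unobserved factor
        xi = 0.5 * zi + ui + rng.gauss(0, 0.5)
        yi = true_effect * xi + ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
ols = cov(x, y) / cov(x, x)   # biased upward by the common factor u
iv = cov(z, y) / cov(z, x)    # just-identified IV (equals 2SLS here)
print(round(ols, 3), round(iv, 3))
```

In this setup the OLS slope absorbs the common factor and overstates the effect, while the IV ratio stays close to the assumed value, mirroring the direction of the difference between Columns (1) and (4) of Table 6.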


5. Mechanisms of Positive Peer Effects
According to the IV results, co-villagers enjoy a net positive peer effect on migration, which could reflect information exchange at the origin or reduction of moving cost at the destination (Calvo-Armengol and Jackson 2004 and Carrington et al. 1996). It is difficult to distinguish the two mechanisms because our cross-sectional data were collected at the origin, with little information on social interactions at the destination. A third mechanism is peer pressure. If migration is a positive signal of ability, observing more peers migrating could leave a non-migrant feeling inferior. Such peer pressure may imply stronger peer effects within similar age, gender and education, but not necessarily within the same destination or the same sector.26 To detect peer effects by age, we separate the sampled adults into two groups (17-25 and 26-35), and compute the peer migration rate ($\bar{y}_{-i|v}$) for the two groups separately. The group-specific peer migration rates are instrumented by the two IVs described above, but each IV is constructed within the corresponding age group in the same village. To detect how peers of different age groups ($\{g_1, g_2\}$) affect individual $i$ in a specific group $g$, we revise Specification (1) as

$$y_{i|g} = \alpha_t + \beta \cdot X_i + \gamma \cdot X_h + \delta \cdot X_v + \lambda_1 \cdot \bar{y}_{-i,g_1|v} + \lambda_2 \cdot \bar{y}_{-i,g_2|v} + \epsilon_i \qquad (2)$$

and apply it to the sub-sample of each group separately. This way we can identify a matrix of 2SLS coefficients for the effect of 17-26 peers on 17-26, the effect of 26-35 peers on 17-26, the effect of 17-26 peers on 26-35, and the effect of 26-35 peers on 26-35. A similar specification can be applied to grouping by other demographics such as gender, surname or education. Table 8a reports the results by age groups. According to the 2SLS coefficients, peer effects are much larger within the same age group (0.653-0.723) than across different groups (0.0232-0.157). In terms of statistical significance, only the peer effects among those of 17-26 are significant, probably because 17-26 is the prime age of migration and people in this age range are more likely influenced by peers. Table 8b repeats the exercise by three surname groups: adults with the most dominant surname in the village, adults with the second most dominant surname in the village, and adults of other surnames. Results suggest large, positive peer effects within the same surname group, but much lower or even negative effects across different groups. In terms of significance, only the coefficient within the most dominant surname and the coefficient within other surnames are statistically significant, probably because we have a bigger sample in

26 The peer pressure could be related to destination or industrial sector if going to a specific destination, say Beijing, has a positive signaling value in the eyes of peers. In the data, destination- or industrial-specific peer pressure is not distinguishable from the social network effects, not only because they are observational equivalent, but also because such peer pressure is likely to rely on the help from earlier migrants to result in clustered migration.

these two groups than in the middle group. Unreported, we also run Specification (2) by gender and education, but find no clear clustering in either dimension. Specification (2) is applicable to sub-samples by demographics, but not by destination and industrial sector. This is because destination and sector choices are made by migrants conditional on migration. To detect peer effects by destination and occupation, we return to the full study sample. Suppose there are $n$ potential destinations and $y_i^d$ equals 1 if individual i migrates to destination d, and 0 otherwise. We then regress $y_i^d$ on the percent of same-village adults migrating to destinations 1, 2, …, n, according to the following specification:

$$y_i^d = \alpha_t + \beta \cdot X_i + \gamma \cdot X_h + \delta \cdot X_v + \lambda_1 \cdot \bar{y}^{d_1}_{-i|v} + \lambda_2 \cdot \bar{y}^{d_2}_{-i|v} + \cdots + \lambda_n \cdot \bar{y}^{d_n}_{-i|v} + \epsilon_i \qquad (3)$$

The coefficients, $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, capture the correlation between self-destination and peer destination. Unfortunately, our IVs are only relevant for whether peers migrate or not, not where to migrate or what to do after migration. To identify social interactions by destination or industrial sector, we must find additional IVs for each destination or each industrial sector.

We compute the travel distance from village $v$ to destination $d_j$ by summing up the distance of the village to the nearest bus/rail/dock station, the distance from the station to the township it belongs to, and the distance from the township to the destination. For destinations within the sampled province, we define the distance as the distance from village $v$ to the provincial capital. For destinations that fall in the residual category of “others”, we compute the distance from village $v$ to the biggest city of an adjacent province.

Based on the distance variables, we define the instruments for $Y^{d_j}_{-i|v}$ as the distance from $v$ to $d_j$ times the two instruments used in Specification (1). Because individual $i$'s decision to migrate to $d_j$ will consider the distance to all alternative destinations, the 2SLS regression of $Y^{d_j}_{i|v}$ on $\{Y^{d_1}_{-i|v}, Y^{d_2}_{-i|v}, \ldots, Y^{d_J}_{-i|v}\}$ also controls directly for the distances from $v$ to $\{d_1, d_2, \ldots, d_J\}$. Since we run a regression of $Y^{d_j}_{i|v}$ for each destination separately with township fixed effects, any unobserved correlation between the origin township and the destination is already accounted for.

It is more difficult to construct sector-specific instruments because we know nothing about the employers of migrants. However, there are natural demographic differences across sectors: most construction workers are male, most service industry workers are female, and manufacturing jobs usually require more skills than construction and service jobs. All these jobs prefer young workers to old. In light of these variations, we first compute the percent female, the percent in each age group (16-22, 23-29, 30-39, 40+), and the percent in each education group (6 years of schooling, 7-9, 10-12, 13+) for each village. We then interact them with the two IVs used before to form IVs for $Y^{s_1}_{-i|v}$ (manufacturing), $Y^{s_2}_{-i|v}$ (service), $Y^{s_3}_{-i|v}$ (construction), and $Y^{s_4}_{-i|v}$ (other). Like the destination regressions, the 2SLS regression specific to each sector controls for the percent female, age groups, and education groups at the village level.

Tables 8c and 8d report the 2SLS regressions of Specification (3) for the choice of destination and industrial sector. It is apparent that migrants from the same village are highly clustered by destination and industrial sector. Almost all the coefficients on the diagonal (indicating the same destination or the same sector) are positive and significant. In contrast, most off-diagonal coefficients (indicating peer effects across destinations and sectors) are smaller in magnitude and not significantly different from zero; some are even significantly negative because different destinations (or sectors) are potential substitutes. Overall, the within-village clustering by sector is most consistent with villagers sharing job information at or about the destination, while the clustering by destination could be consistent with both the reduction of moving cost and the sharing of job information.
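The construction of the destination-specific and sector-specific instruments described earlier in this section can be sketched as follows. This is an illustrative sketch only: the function names, dictionary keys, and input values are hypothetical, and the two baseline IVs are taken to be the shares of same-village peers with a girl firstborn and with a multiple-birth firstborn.

```python
def destination_instruments(distance_km, pct_girl_firstborn, pct_multiple_firstborn):
    """Interact the village-to-destination distance with the two baseline IVs."""
    return {
        "dist_x_girl": distance_km * pct_girl_firstborn,
        "dist_x_multiple": distance_km * pct_multiple_firstborn,
    }


def sector_instruments(village_shares, pct_girl_firstborn, pct_multiple_firstborn):
    """Interact each village demographic share with the two baseline IVs."""
    instruments = {}
    for name, share in village_shares.items():
        instruments[f"{name}_x_girl"] = share * pct_girl_firstborn
        instruments[f"{name}_x_multiple"] = share * pct_multiple_firstborn
    return instruments


# Made-up village values (not census data).
dest_iv = destination_instruments(distance_km=1500.0,
                                  pct_girl_firstborn=0.5,
                                  pct_multiple_firstborn=0.02)
sector_iv = sector_instruments({"pct_female": 0.5, "pct_age_16_22": 0.25},
                               pct_girl_firstborn=0.5,
                               pct_multiple_firstborn=0.02)
```

Each destination (or sector) thus gets its own pair of instruments per interacting variable, which is what allows a separate first stage for each peer migration rate.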

6. Effect of peer migration on non-migrants

The 2SLS results presented above suggest a net positive peer effect on the migration decision, but they do not rule out the existence of negative peer effects. Could a higher peer migration rate alleviate the population-land pressure in the village and thus help non-migrants improve their agricultural income? To answer this question, we regress a non-migrant family's land use and agricultural income ($Z_{i|v}$) on its peer migration rate in the same village, including the same controls as before. This amounts to Specification (4) on the sub-sample of non-migrant families:

$$Z_{i|v} = \alpha_t + \beta \cdot X_i + \gamma \cdot X_h + \delta \cdot X_v + \lambda \cdot Y_{-i|v} + \mu_i. \tag{4}$$

Table 9 presents the 2SLS results, using the same IVs as in the main results (Table 6) for the peer migration rate. According to the first two columns, a higher peer migration rate in the village leads to a higher likelihood of a non-migrant family working on rented land and having more land in use in total. This is as expected. However, the last two columns of Table 9 suggest that more land in use does not significantly improve the non-migrant family's agricultural productivity (measured by household agricultural income per unit of land in use) or agricultural income per laborer (measured by household agricultural income per adult aged 17-60).

One explanation is that the land that migrating peers rent out to non-migrants is small and of low quality. Another explanation is that non-migrants are less productive laborers. Though one's firstborn outcomes are unlikely to be correlated with one's agricultural productivity, the positive peer effects (via information sharing, reduction of moving cost, and peer pressure) could be heterogeneous. If they are more effective on those of high agricultural productivity, villages with a higher migration rate may lose more productive laborers to migration, generating a negative selection effect on the remaining non-migrants. Overall, Table 9 suggests that migration relieves the population-land pressure at the origin, but does not lead to improvement in agricultural productivity or agricultural income per adult for the non-migrant families.
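The two-stage logic behind estimates such as those in Table 9 can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration: the data are constructed by hand (not drawn from the census), there are no controls or fixed effects, the true coefficient is set to 0.9 by construction, and the helper names are hypothetical. The unobserved term `u` is built to be exactly orthogonal to the two instruments, so 2SLS recovers the true coefficient while OLS is biased upward.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[a] * row[c] for row in X) for c in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(xtx, xty)


def two_sls_slope(y, x, instruments):
    """2SLS slope for one endogenous regressor x, given rows of instruments."""
    Z = [[1.0] + list(z) for z in instruments]
    first_stage = ols(Z, x)                         # regress x on the instruments
    x_hat = [sum(c * v for c, v in zip(first_stage, row)) for row in Z]
    return ols([[1.0, xh] for xh in x_hat], y)[1]   # regress y on fitted x


# Hand-built data: two binary instruments and an unobserved term u that is
# exactly orthogonal to the constant and to both instruments.
z1 = [0, 0, 0, 0, 1, 1, 1, 1]
z2 = [0, 0, 1, 1, 0, 0, 1, 1]
u = [0.1, -0.1] * 4
x = [0.2 + 0.3 * a + 0.2 * b + e for a, b, e in zip(z1, z2, u)]  # endogenous regressor
y = [1.0 + 0.9 * xi + e for xi, e in zip(x, u)]                   # outcome

ols_slope = ols([[1.0, xi] for xi in x], y)[1]      # picks up the shared term u
iv_slope = two_sls_slope(y, x, list(zip(z1, z2)))   # uses only instrument variation
```

The sketch returns point estimates only. Standard errors taken from a hand-run second stage are not valid, so inference requires a dedicated 2SLS routine.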

7. Conclusion

The unprecedented labor migration in China provides an excellent opportunity to deepen our understanding of whether, why and where laborers migrate. Constructing instruments based on whether neighbors have a girl firstborn and whether neighbors have multiple birth(s), we find a large, positive peer effect within a village. Evidence suggests that the effect is clustered by age and surname, and partly driven by co-villagers sharing job information and/or reducing moving costs at the same destination. However, migration of co-villagers does not increase the agricultural productivity or agricultural income per adult in the non-migrant families, although they do use more land as a result of peer migration.

The peer effects found in our study imply that a policy that subsidizes a small fraction of the rural population to migrate could have a large and persistent effect on subsequent migration from the same village. Interestingly, not only does the snowball effect help transfer agricultural labor to non-agricultural activities, it could also create a significant demographic hole in high-migration villages, because migrants tend to be young, male, and probably more productive than non-migrants. Such demographic-specific migration has already generated many left-behind children, who grow up with only one or even no parent nearby (Gao et al. 2010). Although grandparents can offer care to these left-behind children, and their migrant parent(s) can send money back to the rural household, these children feel lonely and suffer problems such as mental instability, lack of confidence, and school bullying (Zhang 2013; Xiang 2016; Wu, Song and Huang 2016; Growing Home 2016). Evaluating the impact of migration on these socioeconomic issues is a promising direction for future research.

8. References

Aizer, Anna and Janet Currie. (2004) “Networks or Neighborhoods? Correlations in the Use of Publicly-Funded Maternity Care in California,” Journal of Public Economics 88(12): 2573-2585.

Andrews, Donald W.K. and James H. Stock. (2007) “Inference with Weak Instruments,” Advances in Economics and Econometrics, Theory and Applications: Ninth World Congress of the Econometric Society, Vol. III, ed. by R. Blundell, W. K. Newey, and T. Persson. Cambridge, UK: Cambridge University Press.

Angrist, Joshua D and William N. Evans. (1998) “Children and Their Parents' Labour Supply: Evidence from Exogenous Variation in Family Size” American Economic Review, vol. 88(3): 450-77.

Angrist, Joshua D. and Kevin Lang. (2004) “Does School Integration Generate Peer Effects? Evidence from Boston's Metco Program” American Economic Review, Vol. 94, No. 5 (Dec., 2004), pp. 1613-1634.

Bao, Shuming, Örn B. Bodvarsson, Jack W. Hou, and Yaohui Zhao. (2007) “Interprovincial Migration in China: the Effects of Investment and Migrant Networks.” IZA Discussion No. 2924.

Bartel, Ann P. (1989) “Where Do the New U.S. Immigrants Live?” Journal of Labor Economics 7(4): 371-391.

Bayer, Patrick, Stephen L. Ross and Giorgio Topa. (2008) “Place of Work and Place of Residence: Informal Hiring Networks and Labor Market Outcomes,” Journal of Political Economy, 116(6): 1150-1196, December.

Becker, Gary. (1975) Human Capital: A Theoretical and Empirical Analysis (2nd Edition), New York: National Bureau of Economic Research.

Borjas, George J. (1994) “The Economics of Immigration,” Journal of Economic Literature 32(4): 1667-1717.

Caces, F, F Arnold; J.T. Fawcett and R.W. Gardner. (1985) “Shadow Household and Competing Auspices: Migration Behavior in Philippines” Journal of Development Economics, 17: 5-25.

Cai, Fang. (1996) “An economic analysis for labor migration and mobility” Social Sciences in China (Spring): 120–35.

Cai, Fang, Albert Park, and Yaohui Zhao “The Chinese Labor Market in the Reform Era,” forthcoming in Loren Brandt and Thomas Rawski, eds., China’s Great Economic Transformation (New York: Cambridge University Press).

Cai, Fang and Wang Dewen. (2003) “Migration As Marketization: What Can We Learn from China’s 2000 Census Data?” The China Review, Vol. 3, No. 2: 73–93.

Cai, Yong and William Lavely. (2003) “China’s Missing Girls: Numerical Estimates and Effects on Population Growth” The China Review, Vol. 3, No. 2: 13–29.

Calvo-Armengol, Tony and Matthew Jackson. (2004) “The Effects of Social Networks on Employment and Inequality,” American Economic Review 94(3): 426-454.

Carrington, W.J, E. Detragiache and T. Vishwanath. (1996) “Migration with Endogenous Moving Costs” American Economic Review, 86(4): 909-30.

Chen, Jinyong. (2006) “Reform of Hukou Policy and Rural-urban Migration in China” in Fang Cai and Zansheng Bai edited Labor Migration in Transition China 2006, Social Science Academy Press (China).

Chen, Jun. (2005) “On the Problem of Information Shortage Encountered by Peasant-workers in Job Hunting,” Hunan Social Sciences, 2005(5): 83-85.

Chen, Yiu Por. (2009) “Cream-Skimmer or Underdog? Labor Type Selectivity, Pre-Program Wage, and Rural Labor Training Program Outcome,” IZA Discussion Papers 3979.

Chen, Zhao, Shiqing Jiang, Ming Lu, and Hiroshi Sato. (2008) “How do Heterogeneous Social Interactions Affect the Peer Effect in Rural–Urban Migration? Empirical Evidence from China.” LICOS Discussion Papers 22408, Centre for Institutions and Economic Performance, K.U.Leuven.

Du, Yang, Albert Park, and Sangui Wang. (2005) “Migration and Rural Poverty in China”, Journal of Comparative Economics, 33:4 (2005): 688-709.

Du, Yin. (2000) “Rural Labor Migration in Contemporary China: An Analysis of Its Features and the Macro Context,” in West, Loraine & Zhao, Yaohui edited Rural Labor Flows in China, Institute of East Asian Studies, University of California.

Duflo, Esther and Emmanuel Saez. (2002) “The Role of Information and Social Interactions in Retirement Plan Decisions: Evidence from a Randomized Experiment,” The Quarterly Journal of Economics, Vol. 118, No. 3 (Aug., 2003), pp. 815-842.

Ebenstein, Avraham. (2011) “Estimating a Dynamic Model of Sex Selection in China.” Demography 48, no. 2 (2011): 783.

Fei, John C. H and Ranis Gustav. (1964) Development of the Labor Surplus Economy: Theory and Policy, New Haven, CT: Yale University Press.

Feng, Zhanlian, Chang Liu, Xinping Guan, and Vincent Mor. (2012) “China's rapidly aging population creates policy challenges in shaping a viable long-term care system.” Health Affairs, 31(12), 2764-2773.

Gao, Yang, Li Ping Li, Jean Hee Kim, Nathan Congdon, Joseph Lau, and Sian Griffiths. (2010) “The impact of parental migration on health status and health behaviours among left behind adolescent school children in China,” BMC Public Health, 10(1), 56.

Growing Home. (2016) A Research Report on Boarding School Students in the Countryside of China, Retrieved from: http://www.growinghome.org.cn.

Han, Jun, Zhihong Wang, Chuanyi Cui, and Yupeng He. (2010) “The General Trend of Chinese Migrant Workers: Viewing the Twelfth Five-Year Plan.” Reform (8) 7-31.

Hare, Denise and Shukai Zhao. (2000) “Labor Migration as a Rural Development Strategy: A View from the Migration Origin,” in West, Loraine & Zhao, Yaohui edited Rural Labor Flows in China, Institute of East Asian Studies, University of California.

Huang, Ping and Frank N. Pieke. (2003) “China Migration Country Study,” Paper presented at the Regional Conference on Migration, Development and Pro-Poor Policy Choices in Asia, Dhaka, June 21–24, 2003.

Ioannides, Yannis and Linda Datcher Loury. (2004) “Job Information Networks, Neighborhood Effects and Inequality.” Journal of Economic Literature 42(4): 1056-1093.

James, William H. (1987) “The Human Sex Ratio, Part 1: A Review of the Literature,” Human Biology 59: 721–752.

Jian, Yulan. (2005) “China's rural labor force transfer of the status quo and countermeasures”, The Rural Economics, 2005(5): 110-112.

Kling, Jeffrey R, Jeffrey B. Liebman and Lawrence F. Katz. (2007) “Experimental Analysis of Neighborhood Effects,” Econometrica 75(1): 83–119.

Knight, John, Song Lina and Jia Huaibin. (1999) “Chinese Rural Migrants in Enterprises: Three Perspectives” Journal of Development Studies 35: 73-104.

LaLonde, R. J., and R. H. Topel. (1991) “Labor Market Adjustments to Increased Immigration,” in J. M. Abowd and R. B. Freeman, eds., Immigration, Trade, and the Labor Market, Chicago: University of Chicago Press.

Lee, Everett S. (1966) “A Theory of Migration,” Demography 3: 47-57.

Lewis, W. Arthur. (1954) “Economic Development with Unlimited Supplies of Labour,” Manchester School of Economic and Social Studies 22: 139-91.

Li, Hongbin, Junsen Zhang and Yi Zhu. (2006) “The Effect of the One-Child Policy on Fertility in China: Identification Based on Differences-in-Differences” Working Paper, the Chinese University of Hong Kong.

Liang, Qiusheng and Che-Fu Lee. (2006) “Fertility and Population Policy: An Overview” in Fertility, Family Planning, and Population Policy in China, edited by D.L. Poston, Jr; Che-Fu Lee; Chiung-Fang Chang; Sherry L. McKibben and Carol S. Walther, City: Routledge Taylor & Francis Group.

Lin, Justin Yifu; Gewei Wang and Yaohui Zhao. “Regional Inequality and Regional Transfers in China” in Cai Fang and Bai Nansheng edited Labor Migration in Transition China 2006, Social Science Academy Press (China).

Liu, Jialu. (2010) “Does Migration Income Help Hometown Business? Evidences from Rural Households Survey in China”, Economics Bulletin, Vol. 30 no. 4 pp. 2598-2611.

Lowry, Ira S. (1966) Migration and metropolitan growth: Two Analytical Models. San Francisco: Chandler.

Lucas, Robert E.B. (1987) “Emigration to South Africa’s Mines” American Economic Review 77: 313-330.

Lucas, Robert E.B. (1997) “Internal Migration in Developing Countries” Chapter 13, Handbook of Population and Family Economics, edited by M.R. Rosenzweig and O. Stark. City: Elsevier.

Manski, Charles F. (1993) “Identification of Endogenous Social Effects: The Reflection Problem,” Review of Economic Studies, 60: 531-542.

Mallee, Hein. (2000) “Agricultural Labor and Rural Population Mobility: Some Observations,” in West, Loraine & Zhao, Yaohui edited Rural Labor Flows in China, Institute of East Asian Studies, University of California.

Maurin, Eric, and Julie Moschion. (2009) “The Social Multiplier and Labor Market Participation of Mothers,” American Economic Journal: Applied Economics 2009 1(1): 251-272.

Mckenzie, David and Hillel Rapoport. (2007) “Network Effects and the Dynamics of Migration and Inequality: Theory and Evidence from Mexico,” Journal of Development Economics 84: 1-24.

Mckenzie, David and Nicole Hilderbrandt. (2005) “The Effects of Migration on Child Health in Mexico,” Economia 6(1): 257-289.

Melenberg, B., and Zheng, J. (2012) Health Expectancy of the Chinese Elderly: Current Trends and Future Projection. City: Tilburg University.

Meng, Xin. (2000) “Regional wage gap, information flow, and rural-urban migration,” in West, Loraine & Zhao, Yaohui edited Rural Labor Flows in China, Institute of East Asian Studies, University of California.

Men, Kepei and Wei Zeng. (2003) “Prediction of China Population over the Next 50 Years.” The Journal of Quantitative & Technical Economics (in Chinese): F224.

Munshi, Kaivan. (2003) “Networks in the Modern Economy: Mexican Migrants in the U.S. Labor Market.” Quarterly Journal of Economics, 118(2): 549-600.

National Survey Research Center. (2014) China Longitudinal Aging Social Survey. Beijing, China.

National Bureau of Statistics. (2016) National data. [Data file]. Retrieved from: http://data.stats.gov.cn/easyquery.htm?cn=C01

Qian, Nancy. (2009) “Quantity-Quality and the One Child Policy: the Only-Child Disadvantage in School Enrollment in Rural China,” NBER working paper #14973.

Rosenzweig, Mark R and Kenneth L Wolpin. (1980) “Testing the Quantity-Quality Fertility Model: The Use of Twins as a Natural Experiment,” Econometrica, January 1980a, 48(1): 227-40.

Rosenzweig, Mark. (1988) “Labor Markets in Low-income Countries” Chapter 15 in the Handbook of Development Economics, Volume 1, edited by H. Chenery and T. N. Srinivasan. City: Elsevier.

Sen, Amartya. (1990) “More than 100 million women are missing,” New York Review of Books 37(20): 61-66.

Sheng, Laiyun. (2008) Flows or Migration: the Economic Analysis China’s Rural Labor Flow. City: Shanghai Far East Publishers (China).

Sjaastad, Larry A. (1962) “The costs and returns of human migration,” The Journal of Political Economy, 1962, vol. 70, no. S5: 80-93.

Tan, Kejian. (2003) “Value Difference of Sons and Daughters to Their Families in Poverty-stricken Areas,” Population Research (in Chinese): 2003 27(2).

Taylor, J. Edward. (1986) “Differential migration, networks, information and risk,” in Migration, Human Capital and Development, edited by O. Stark. Greenwich, CT: JAI Press.

Wan, Chuan, Dezhu Wang and Bo Li. (2004) “Debates on Temporary Residence Permit System and Necessity for Adopting It.” Chinese Journal of Population Science (in Chinese): C29.

Wang, Dewen, Wu Yaowu, and Cai Fang. (2003) “Migration, unemployment, and urban labor market segregation in China’s economic transition.” working paper, Beijing: Institute of Population and Labor Economics, Chinese Academy of Social Sciences.

Wei, Shang-Jin and Xiaobo Zhang. (2011) “The Competitive Motive: Evidence from Rising Sex Ratios and Savings Rates in China,” Journal of Political Economy, 119(3): 511-564.

White, Tyrene. (1992) “Birth Planning between Plan and Market: The Impact of Reform on China’s One-Child Policy.” China’s Economic Dilemmas in the 1990’s: The Problems of Reforms, Modernization, and Interdependence. Studies in Contemporary China. Armonk, N.Y.U.S.C.J.E. Committee and London, Sharpe, 1992: 252-69.

Woodruff, Christopher and Rene Zenteno. (2007) “Migration Networks and Microenterprises in Mexico.” Journal of Development Economics 82: 509-528.

Wu, Fangwen, Yingquan Song and Xiaoting Huang. (2016) “School Bullying Generates More Harm to Board School Students in Countryside: An Empirical Analysis Based on 17,841 Students in Rural Boarding Schools,” Management of Primary and Secondary School (8): 8-11.

Wu, Xiaoyu and Lixing Li. (2011) “Gender of Children, Bargaining Power and Intrahousehold Resource Allocation in China,” Journal of Human Resources, 46(2): 295-316.

Xiang, Yanni. (2016) “Research on How to Cultivate Moral Emotion of Left-behind Children in Rural Areas – Taking the Baoping Village of Yongshun County, Xiangxi Autonomous Prefecture of Hunan Province as an Example.” East China Normal University, Master Thesis.

Young, Alwyn. (2003) “Gold into Base Metals: Productivity Growth in the People’s Republic of China during the Reform Period,” Journal of Political Economy, Vol 111, no. 6.

Zhang, Qian. (2013) “Comparative Research between Mental Health and Behavior Problems of Left-Behind Children and Those of non-Left-Behind Ones in Rural Shandong Province.” Shandong University, Master Thesis.

Zhang, Weiqing. (1998) Introduction to Family Planning in China. (in Chinese), China Population Publishing House.

Zhao, Yaohui. (1999a) “Leaving the Countryside: Rural-to-Urban Migration Decision in China,” American Economic Review Papers and Proceedings, May 1999, 89(2): 281-286.

Zhao, Yaohui. (1999b) “Labor Migration and Earnings Differences: the Case of Rural China,” Economic Development and Cultural Change 47(4): 767-782.

Zhao, Yaohui. (2002) “Cause and Consequences of Return Migration: Recent Evidence from China,” Journal of Comparative Economics, 30: 376-394.

Zhao, Yaohui. (2003) “The Role of Migrant Networks in Labor Migration: The Case of China,” Contemporary Economic Policy, 21: 500-511.

Zhao, Zhong. (2005) “Migration, labor market flexibility, and wage determination in China: a review,” The Developing Economies, 43: 285-312.

Figure 1 Transfer of rural labor to non-agriculture activities, 1985-2004, all China

Note: Data from Han, Wang, Cui and He (2010). The variable “labor in home town non-agricultural industries” is defined by the number of rural laborers working in local non-agricultural sectors. The variable “migrant workers” is defined by the number of rural laborers working outside the hometown.

Figure 2 Distribution of migrants by # of months away from home in 2006, study sample

[Figure omitted. Horizontal axis: months away from home (1 to 12); vertical axis: 0.0 to 0.5. Series: female migrants; male migrants; labor migrating to other areas of the same province; labor migrating to other provinces.]


Figure 3 Percent of adults that migrate in 2006, by age and gender, study sample

Figure 4 Histogram of migration percentage per village, raw data, study sample

[Figure omitted. Histogram; horizontal axis: migration rate per village (0 to .8); vertical axis: frequency (0 to 150).]

Figure 5a Percent of first born as female, village level

Figure 5b Percent of first born as multiple birth, village level


Figure 6a: Age distribution in villages where 0-10% adults migrate

Figure 6b: Age distribution in villages where 10-20% adults migrate

Figure 6c: Age distribution in villages where 20-30% adults migrate

Figure 6d: Age distribution in villages where 30+% adults migrate

[Figures omitted. In each panel, the horizontal axis is age (0 to 100), the vertical axis is head count, and the series are migrants and non-migrants.]

Table 1: Comparison of poverty villages and non-poverty villages

| Variables | Non-poverty villages | Poverty villages |
|---|---|---|
| % of first-born children being girls | 0.49 (0.07) | 0.49 (0.07) |
| % of first-born children being multiples | 0.02 (0.01) | 0.02 (0.01) |
| % of laborers migrate | 0.28 (0.45) | 0.27 (0.44) |
| Average year of education | 6.71 (0.90) | 6.10 (1.15) |
| Average land per laborer | 1.58 (0.73) | 1.87 (0.77) |
| Distance to the nearest bus station (kilometer) | 4.53 (6.71) | 8.45 (9.43) |
| # of population in the village | 1386.84 (867.34) | 1352.37 (817.67) |
| The village is a minority gathering | 0.21 (0.41) | 0.41 (0.49) |
| Regular water use is guaranteed | 0.43 (0.50) | 0.30 (0.46) |
| Have organized production and sale of agriculture products | 0.10 (0.31) | 0.06 (0.24) |
| % of natural gathering groups that have access to electricity | 0.98 (0.11) | 0.93 (0.21) |
| % of natural gathering groups that have access to telephone | 0.84 (0.30) | 0.71 (0.38) |
| % of natural gathering groups that have access to TV signal | 0.93 (0.23) | 0.87 (0.30) |
| % of natural gathering groups that have access to drivable road | 0.72 (0.31) | 0.59 (0.33) |
| # of villages | 1775 | 2196 |

Note: Exclude 6 non-poverty and 9 poverty villages that have fewer than 100 labor-age adults (17-60). Standard deviation in parentheses.

Table 2: Distribution of migrants by destination, study sample

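Per capita income of 2006 is expressed relative to rural income of the sampled area.

| Destination | % migrants | Per capita income of 2006, rural | Per capita income of 2006, urban | Railway hours to destination |
|---|---|---|---|---|
| Within province: within the sampled area | 8.21% | 1.00 | 3.57 | 0 |
| Within province: outside the sampled area | 15.43% | 0.97 | 4.56 | 3-4 |
| Across province: Province A | 28.88% | 3.59 | 8.40 | 26 |
| Across province: Province B | 21.53% | 2.49 | 7.57 | 20 |
| Across province: Province C | 9.92% | 2.37 | 6.71 | 38 |
| Across province: Province D | 4.72% | 1.10 | 5.07 | 10.5 |
| Across province: Province E | 3.61% | 2.85 | 6.83 | 36 |
| Across province: Province F | 1.98% | 4.47 | 10.98 | 26.5 |
| Across province: Province G | 0.20% | 1.41 | 5.57 | 6.5 |
| Across province: Province H | 0.63% | 1.47 | 4.51 | 13 |
| Across province: Other provinces | 4.89% | 1.91 | 6.27 | n.a. |
| Total | 100% | | | |

Note: This table is conditional on migrants in our study sample. Out of all adults (17-35) in our study sample, 27% are defined as migrants because they have been away from the village for the reason of work for more than 15 days in 2006.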

Table 3: Summary statistics by migration status, study sample

Cells report the mean, with the standard deviation in parentheses.

| | (1) All adults age 17-35 | (2) Migrants | (3) Non-migrants |
|---|---|---|---|
| Panel A: Individual attributes | | | |
| Age | 26.44 (5.42) | 24.85 (5.02) | 27.07 (5.44) |
| Years of schooling | 7.57 (2.26) | 8.16 (1.82) | 7.33 (2.37) |
| Female | 0.46 (0.50) | 0.41 (0.49) | 0.48 (0.50) |
| Being household head | 0.35 (0.48) | 0.17 (0.38) | 0.38 (0.49) |
| My first single birth is girl¹ | 0.49 (0.50) | 0.49 (0.50) | 0.49 (0.50) |
| My first birth is multiples² | 0.01 (0.11) | 0.01 (0.12) | 0.01 (0.11) |
| Minimum age of my own children under age 16³ | 6.21 (4.40) | 6.82 (4.95) | 6.03 (4.21) |
| Maximal age of my own children under age 16³ | 8.69 (4.56) | 8.74 (4.94) | 8.68 (4.44) |
| Panel B: Household attributes | | | |
| Household head age | 44.07 (12.67) | 49.21 (11.49) | 42.05 (12.55) |
| Household head being female | 0.07 (0.25) | 0.08 (0.27) | 0.07 (0.25) |
| Household head yr of schooling | 6.46 (2.81) | 6.24 (2.87) | 6.54 (2.78) |
| # of HH members age 0-6 | 0.55 (0.78) | 0.40 (0.71) | 0.60 (0.80) |
| # of HH members age 7-16 | 0.73 (1.00) | 0.51 (0.86) | 0.81 (1.04) |
| # of HH members age 17-23 | 0.84 (1.07) | 1.16 (1.14) | 0.71 (1.01) |
| # of HH members age 24-44 | 1.72 (0.87) | 1.69 (1.00) | 1.74 (0.81) |
| # of HH members age 45-59 | 0.66 (0.86) | 0.94 (0.89) | 0.55 (0.82) |
| # of HH members age 60+ | 0.27 (0.59) | 0.34 (0.64) | 0.25 (0.57) |
| # of girls in the HH age 0-12 | 0.59 (0.79) | 0.42 (0.71) | 0.65 (0.81) |
| # of boys in the HH age 0-12 | 0.69 (0.80) | 0.49 (0.72) | 0.76 (0.82) |
| Has any boy in the HH? (age 0-12) | 0.50 (0.50) | 0.37 (0.50) | 0.55 (0.49) |
| Estimated house value (10,000 yuan) | 2.74 (3.30) | 2.55 (2.74) | 2.81 (3.49) |
| Have any outstanding loans | 0.12 (0.32) | 0.14 (0.34) | 0.11 (0.31) |
| Contract land (mu)⁴ | 3.22 (2.46) | 3.37 (2.36) | 3.16 (2.50) |
| Land in use (mu)⁴ | 3.81 (2.95) | 3.90 (2.94) | 3.78 (2.96) |
| Prevalence of HH head's surname in the village | 0.10 (0.12) | 0.10 (0.11) | 0.10 (0.12) |
| HH head's surname is the largest surname in the village | 0.21 (0.41) | 0.21 (0.40) | 0.21 (0.41) |
| HH head's surname is the second largest in the village | 0.13 (0.33) | 0.12 (0.33) | 0.13 (0.33) |
| HH head's surname is the third largest in the village | 0.09 (0.29) | 0.09 (0.29) | 0.09 (0.29) |
| HH head's surname is below the third largest in the village | 0.57 (0.49) | 0.58 (0.49) | 0.57 (0.50) |
| Panel C: Village attributes | | | |
| Distance to the nearest bus station (kilometer) | 4.17 (6.40) | 3.64 (5.51) | 4.38 (6.70) |
| The village is a minority gathering | 0.20 (0.40) | 0.18 (0.38) | 0.21 (0.41) |
| # of adults 17-60 in the village | 1175.86 (568.07) | 1162.83 (582.76) | 1180.96 (562.12) |
| Average land per adult (mu)⁴ | 1.50 (0.69) | 1.49 (0.62) | 1.51 (0.71) |
| Regular water use is guaranteed | 0.44 (0.50) | 0.49 (0.50) | 0.41 (0.49) |
| Have organized production and sale of agriculture products | 0.10 (0.30) | 0.12 (0.32) | 0.10 (0.29) |
| % of natural gathering groups that have access to electricity | 0.98 (0.10) | 0.98 (0.09) | 0.98 (0.11) |
| % of natural gathering groups that have access to telephone | 0.85 (0.28) | 0.87 (0.26) | 0.84 (0.29) |
| % of natural gathering groups that have access to TV signal | 0.93 (0.22) | 0.95 (0.20) | 0.93 (0.23) |
| % of natural gathering groups that have access to drivable road | 0.72 (0.30) | 0.73 (0.29) | 0.72 (0.30) |
| Male village head | 0.97 (0.17) | 0.97 (0.17) | 0.96 (0.19) |
| Male village party secretary | 0.95 (0.22) | 0.95 (0.22) | 0.94 (0.24) |
| The education years of the village head | 9.61 (2.73) | 9.57 (2.75) | 9.78 (2.60) |
| The education years of the village party secretary | 8.90 (2.69) | 8.88 (2.69) | 9.01 (2.69) |
| Village head with military experience | 0.17 (0.38) | 0.17 (0.38) | 0.17 (0.38) |
| Village party secretary with military experience | 0.11 (0.31) | 0.11 (0.31) | 0.10 (0.31) |
| Panel D: Attributes of adults in the same village | | | |
| % of adults in the village that are migrants (exclude all adults in self HH) | 0.27 (0.22) | 0.45 (0.17) | 0.21 (0.20) |
| % of first-born children being girl (exclude self, single birth only) | 0.49 (0.06) | 0.48 (0.06) | 0.49 (0.07) |
| % of first-born children being multiples (exclude self) | 0.02 (0.01) | 0.02 (0.01) | 0.02 (0.01) |
| Observations | 859644 | 242040 | 617604 |

Notes: An individual is defined as a migrant if s/he has been away from the village for the reason of work for more than 15 days in 2006. For data reasons as described in Section 2, (1) "whether own first child is girl" has 1,834,855 missing observations (55.1%), (2) "own first birth is multiples" has 1,823,197 missing observations (54.78%), (3) minimum and maximum child age has 1,257,900 missing observations each (37.8%). (4) One Chinese mu is equal to 666.7 square meters or 0.1647 acres.
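The peer migration rate in Panel D excludes every adult in one's own household. A minimal sketch of that leave-own-household-out share follows; the tuple layout, function name, and toy village are hypothetical and are not taken from the census files.

```python
from collections import Counter


def peer_migration_rates(adults):
    """adults: list of (village_id, household_id, is_migrant) tuples.

    Returns, for each adult, the migrant share among same-village adults
    outside the adult's own household (None if there are no such peers).
    """
    village_n, village_m = Counter(), Counter()
    hh_n, hh_m = Counter(), Counter()
    for village, household, is_migrant in adults:
        village_n[village] += 1
        village_m[village] += int(is_migrant)
        hh_n[(village, household)] += 1
        hh_m[(village, household)] += int(is_migrant)

    rates = []
    for village, household, _ in adults:
        peers = village_n[village] - hh_n[(village, household)]
        migrants = village_m[village] - hh_m[(village, household)]
        rates.append(migrants / peers if peers else None)
    return rates


# Made-up village with three households (not census data).
adults = [
    ("v1", "h1", True), ("v1", "h1", False),
    ("v1", "h2", True), ("v1", "h2", True),
    ("v1", "h3", False),
]
rates = peer_migration_rates(adults)
```

In this toy village the two adults of household h1 share the same peer rate, because the rate depends only on adults outside their household.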


Table 4: Within-village cluster of migrants, by destination, occupation and surname (conditional on migrants only)

| | Average per village | Average per township | Average per county | Whole area |
|---|---|---|---|---|
| % of migrants in the most common occupation | 69.56% (0.50) | 61.61% (0.49) | 56.85% (0.50) | 55.67% |
| % of migrants in the most common destination | 52.18% (0.50) | 47.65% (0.50) | 42.70% (0.49) | 38.92% |
| % in the most common occupation, conditional on migrants in the most common destination | 74.99% (0.20) | 64.22% (0.17) | 58.23% (0.14) | 59.83% |
| % in the most common occupation of each destination | 74.51% (0.44) | 63.79% (0.48) | 57.80% (0.49) | 55.67% |
| # of observations | 1624 | 250 | 8 | 1 |

Notes: Standard error in parentheses. The percentages are computed as follows: suppose 11 migrants of a village went to two destinations (A and B) for two occupations (X1 and X2). If 5 went to destination A with 3 in X1 and 2 in X2, and the other 6 went to B with 1 in X1 and 5 in X2, the % in the most common occupation is 7/11, the % in the most common destination is 6/11, the % in the most common occupation conditional on the most common destination is 5/6, and the % in the most common occupation of each destination is (3+5)/(5+6).
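The worked example in the notes to Table 4 can be reproduced with a short sketch. The function and key names are hypothetical, and the code covers one village only; it is not the authors' aggregation code. Ties for "most common" are broken by first appearance, which is an assumption of this sketch.

```python
from collections import Counter
from fractions import Fraction


def cluster_shares(migrants):
    """migrants: list of (destination, occupation) pairs for one village."""
    n = len(migrants)
    by_dest = Counter(d for d, _ in migrants)
    by_occ = Counter(o for _, o in migrants)
    top_dest, top_dest_n = by_dest.most_common(1)[0]
    # Occupation counts within each destination.
    occ_within = {d: Counter(o for dd, o in migrants if dd == d) for d in by_dest}
    top_in_each = sum(c.most_common(1)[0][1] for c in occ_within.values())
    return {
        "most_common_occupation": Fraction(by_occ.most_common(1)[0][1], n),
        "most_common_destination": Fraction(top_dest_n, n),
        "occupation_given_top_destination":
            Fraction(occ_within[top_dest].most_common(1)[0][1], top_dest_n),
        "top_occupation_of_each_destination": Fraction(top_in_each, n),
    }


# The 11 migrants from the notes: 5 go to A (3 in X1, 2 in X2), 6 go to B (1 in X1, 5 in X2).
village = ([("A", "X1")] * 3 + [("A", "X2")] * 2
           + [("B", "X1")] * 1 + [("B", "X2")] * 5)
shares = cluster_shares(village)
```

Using `Fraction` keeps the results in the same form as the notes (7/11, 6/11, 5/6, and 8/11) rather than rounded decimals.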


Table 5: Summary of adults by the number and gender of child birth, study sample

Cells in columns (3) to (8) report the mean, with the standard deviation in parentheses.

| | (1) # of observations | (2) % of adults with clear firstborn definition | (3) Migrate or not | (4) Have second birth or not | (5) Have boy in second or later birth | (6) Have girl in second or later birth | (7) Number of girls under age 12 | (8) Number of boys under age 12 |
|---|---|---|---|---|---|---|---|---|
| Panel A: All adults aged 17-35 | | | | | | | | |
| All adults | 859644 | | 0.28 (0.45) | | | | 0.49 (0.76) | 0.58 (0.75) |
| Adults with kids | 658200 | | 0.29 (0.45) | | | | 0.59 (0.79) | 0.69 (0.81) |
| Adults with clear firstborn definition | 401636 | | 0.21 (0.40) | 0.61 (0.49) | 0.54 (0.67) | 0.41 (0.64) | 0.91 (0.79) | 1.05 (0.76) |
| Adults with multiple birth firstborn | 3656 | | 0.29 (0.46) | 0.45 (0.50) | 0.41 (0.69) | 0.30 (0.60) | 1.22 (0.99) | 1.52 (0.90) |
| Adults with single birth firstborn | 397980 | | 0.21 (0.40) | 0.61 (0.49) | 0.54 (0.67) | 0.41 (0.64) | 0.91 (0.79) | 1.05 (0.76) |
| Panel B: Conditional on the first-born child being single birth | | | | | | | | |
| Firstborn is boy | 201930 | 50.28% | 0.21 (0.40) | 0.59 (0.49) | 0.42 (0.61) | 0.44 (0.64) | 0.45 (0.64) | 1.43 (0.61) |
| Firstborn is girl | 196050 | 48.81% | 0.20 (0.40) | 0.63 (0.48) | 0.66 (0.70) | 0.37 (0.63) | 1.37 (0.63) | 0.66 (0.70) |
| Panel C: Conditional on the first-born children being multiple birth | | | | | | | | |
| Firstborns are all girls | 928 | 0.23% | 0.29 (0.45) | 0.57 (0.49) | 0.71 (0.85) | 0.29 (0.59) | 2.29 (0.59) | 0.72 (0.86) |
| Firstborns are all boys | 1272 | 0.32% | 0.26 (0.44) | 0.36 (0.48) | 0.23 (0.54) | 0.27 (0.56) | 0.28 (0.57) | 2.25 (0.55) |
| Firstborns have mixed gender | 1456 | 0.36% | 0.33 (0.47) | 0.46 (0.50) | 0.37 (0.63) | 0.34 (0.64) | 1.37 (0.67) | 1.39 (0.64) |

Notes: Unit of analysis is adult labor aged 17-35 as defined in the study sample. Columns (5) and (6) are conditional on the families that have second birth.

Table 6: Baseline results on the study sample (OLS and IV, Specification 1)

Cells report the coefficient, with the robust standard error in parentheses.

Panel A: First stage. Dependent variable: peer migration ($Y_{-i}$)

| | (1) OLS | (2) 2SLS | (3) 2SLS | (4) 2SLS |
|---|---|---|---|---|
| % of same-village with multiples first birth | | 0.367*** (0.00829) | | 0.365*** (0.00829) |
| % of same-village with a girl firstborn | | | -0.0243*** (0.00164) | -0.0228*** (0.00164) |
| # of observations | | 859644 | 859644 | 859644 |
| R square | | 0.654 | 0.653 | 0.654 |

Panel B: Second stage. Dependent variable: self migration ($Y_i$)

| | (1) OLS | (2) 2SLS | (3) 2SLS | (4) 2SLS |
|---|---|---|---|---|
| % of same-village adults migrating | 1.384*** (0.0156) | 1.296*** (0.4980) | 0.867** (0.3510) | 0.918*** (0.3070) |
| Conditional LR test for weak IV | | [0.208, 1.547] | [0.321, 2.271] | [0.317, 1.518] |
| F-test for IV | | F(1, 859644) = 1961.41, Prob > F = 0.0000 | F(1, 859644) = 218.75, Prob > F = 0.0000 | F(2, 859644) = 1077.87, Prob > F = 0.0000 |
| Female | -0.0176*** (0.00170) | -0.0178*** (0.00191) | -0.0185*** (0.00185) | -0.0184*** (0.00182) |
| Having a girl firstborn | 0.0167*** (0.00198) | 0.0167*** (0.00201) | 0.0164*** (0.00202) | 0.0164*** (0.00201) |
| Having a girl firstborn*female | -0.0286*** (0.0167) | -0.0286*** (0.0167) | -0.0287*** (0.0164) | -0.0287*** (0.0164) |
| First birth is multiples | -0.0222*** (0.00672) | -0.0221*** (0.00673) | -0.0217*** (0.00676) | -0.0217*** (0.00676) |
| First birth is multiples*female | -0.0206*** (0.00653) | -0.0206*** (0.00653) | -0.0203*** (0.00656) | -0.0204*** (0.00655) |
| Age | 0.0361*** (0.00151) | 0.0360*** (0.00158) | 0.0357*** (0.00155) | 0.0357*** (0.00154) |
| Age square | -0.000777*** (0.0000282) | -0.000775*** (0.0000291) | -0.000770*** (0.0000287) | -0.000770*** (0.0000286) |
| Years of schooling | 0.0126*** (0.000480) | 0.0126*** (0.000487) | 0.0126*** (0.000483) | 0.0126*** (0.000482) |
| Observations | 859644 | 859644 | 859644 | 859644 |
| R-squared | 0.294 | 0.294 | 0.285 | 0.287 |

Other controls are reported in the Appendix.

Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. All regressions control for township fixed effects. Errors are clustered by village.
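As a reading aid for column (4) of the second stage: because the specification is linear, the coefficient on the peer migration rate translates a change in that rate directly into a change in one's own migration probability. Taking the point estimate of 0.918 and a 10-percentage-point rise in the peer migration rate:

$$\Delta \Pr(\text{self migration}) \approx 0.918 \times 0.10 = 0.0918,$$

that is, about 9.18 percentage points, which is the magnitude quoted in the abstract.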


Table 7 Robust checks on Specification 1

Dependent variables: migration Sample Main Sample Alternative Samples Restricting peers to same- village Include Include migration Redefine households migration of of other adults Sample migration as 6+ with different other adults in in the same based on adults Family with Exclude minority months away surnames same township household aged 17-60 only two adults gathering villages Specification 2SLS 2SLS 2SLS 2SLS 2SLS 2SLS 2SLS 1 (2) (3) (4) 5 6 7 % of same-village adults that 0.576*** 0.887*** 0.965*** 0.573*** 0.936*** 0.708** 1.110*** migrate (exclude own household) (0.249) (0.309) (0.263) (0.187) (0.0446) (0.255) (0.197) % of same-township adults that 0.897 migrate (exclude own village) (1.357) % of same-household adults that 0.685*** migrate (exclude self) (0.0175) Control Other Variables Yes Yes Yes Yes Yes Yes Yes Numbers of Observations 859644 859644 859644 859644 2496455 480768 685086 R-squared 0.267 0.283 0.285 0.451 0.278 0.214 0.289 Notes: Significance at 10% (*), 5% (**), 1% (***). Robust standard errors in parentheses. All regressions control for township fixed effects and all the other variables used in Table 6. Errors are clustered by village.

Table 8a 2SLS regression by age cohort

Dependent variable: migration

                                                    (1)         (2)
Sample                                              17-26       27-35
% of same-village adults aged 17-26 that migrate    0.653**     0.0232
                                                    (0.0943)    (0.0272)
% of same-village adults aged 27-35 that migrate    0.157       0.723
                                                    (0.581)     (0.813)
Observations                                        426732      430923
R-squared                                           0.284       0.235

Notes: Significance at 10% (*), 5% (**), 1% (***). Robust standard errors in parentheses. All regressions include township fixed effects and all the control variables used in Table 6. Errors are clustered by village. Shaded cells refer to influence from peers of the same age group.

Table 8b 2SLS regression by surname groups

Dependent variable: migration

                                                (1)                     (2)                     (3)
Sample                                          Adults with most        Adults with second      Adults with
                                                dominant surname        most dominant surname   other surnames
% of migrants among adults that have the        1.502**                 1.103                   0.298
  most dominant surname in the village          (0.632)                 (1.273)                 (0.698)
% of migrants among adults that have the        -0.201                  2.607                   -0.398
  second most dominant surname in the village   (0.621)                 (3.443)                 (0.616)
% of migrants among adults that have            -0.303                  -2.039                  1.381***
  other surnames in the village                 (0.835)                 (2.706)                 (0.488)
Observations                                    180074                  107762                  575313
R-squared                                       0.296                   0.225                   0.283

Notes: Significance at 10% (*), 5% (**), 1% (***). Robust standard errors in parentheses. All regressions include township fixed effects and all the control variables used in Table 6. Errors are clustered by village. Shaded cells refer to influence from peers of the same surname group.

Table 8c 2SLS regression by destination, study sample

Dependent variable: migration to

                                                      Within
                                                      province    A           B           C           D
                                                      (1)         (2)         (3)         (4)         (5)
% of same village adults migrating to same province   0.0458**    0.0230      0.00764     -0.0262     -0.00574
                                                      (0.0232)    (0.0293)    (0.00897)   (0.0273)    (0.0160)
% of same village adults migrating to A               -0.0114     -0.0142     0.00848     -0.0143     -0.0293
                                                      (0.0309)    (0.0313)    (0.0113)    (0.0284)    (0.0182)
% of same village adults migrating to B               0.00634     0.00214     0.0201***   -0.00506    0.00527
                                                      (0.0145)    (0.0144)    (0.00488)   (0.0135)    (0.00852)
% of same village adults migrating to C               -0.0271     0.000385    -0.0101     -0.0945***  0.0468***
                                                      (0.0297)    (0.0322)    (0.0108)    (0.0248)    (0.0163)
% of same village adults migrating to D               0.0489***   0.0977***   -0.000286   0.0465**    0.0177**
                                                      (0.0169)    (0.0222)    (0.00583)   (0.0203)    (0.00877)
Observations                                          859644      859644      859644      859644      859644
R-squared                                             0.076       0.083       0.030       0.078       0.062

Notes: Significance at 10% (*), 5% (**), 1% (***). Robust standard errors in parentheses. All regressions include township fixed effects, all the control variables used in Table 6, and percent of same village adults migrating to other destinations. Errors are clustered by village. Shaded cells refer to influence from peers migrating to the same destination.

Table 8d 2SLS regression by occupation, study sample

Dependent variable: migration for

                                                            (1)            (2)            (3)            (4)
                                                            Manufacturing  Service        Construction   Other jobs
% of same-village adults migrating for manufacturing jobs   0.0570***      -0.0150**      -0.00541       -0.0322**
                                                            (0.0158)       (0.00759)      (0.00612)      (0.0153)
% of same-village adults migrating for service jobs         -0.00555       0.0318***      0.000226       0.00770
                                                            (0.0141)       (0.00547)      (0.00510)      (0.0107)
% of same-village adults migrating for construction jobs    0.0194         0.000356       0.0268***      0.00217
                                                            (0.0173)       (0.00748)      (0.00520)      (0.0152)
% of same-village adults migrating for other jobs           -0.0322        -0.00173       -0.000515      0.0688***
                                                            (0.0215)       (0.00951)      (0.00808)      (0.0172)
Observations                                                859644         859644         859644         859644
R-squared                                                   0.070          0.015          0.019          0.043

Notes: Significance at 10% (*), 5% (**), 1% (***). Robust standard errors in parentheses. All regressions include township fixed effects, and all the control variables used in Table 6. Errors are clustered by village. Shaded cells refer to influence from peers migrating for the same occupation.

Table 9 2SLS regression on land transfer and agricultural productivity.

Sample: non-migrant families

                                                    (1)                   (2)                   (3)                   (4)
VARIABLES                                           HH rented land        Land in use           HH agricultural       HH agricultural
                                                                                                income/land in use    income/adult #
% of same-village adults aged 17-35 that migrate    2.142*                1.014***              2,011.027             -340.821
                                                    (0.0341)              (0.153)               (3,179.358)           (243.0)
Other control                                       Y                     Y                     Y                     Y
Observations                                        316156                316156                316156                316156
R-squared                                           0.685                 0.742                 0.031                 0.101

Note: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. All regressions control for township fixed effects and all the household and village attributes in the main regressions.

Appendix

Table A1: Full table for the baseline results reported in Table 6 (OLS and IV)

Panel A: First stage

Dependent variable: peer migration (Y-i)

                                        (1) OLS         (2) 2SLS        (3) 2SLS        (4) 2SLS
% of same-village with multiples                        0.367***                        0.365***
  first birth                                           (0.00829)                       (0.00829)
% of same-village with a girl                                           -0.0243***      -0.0228***
  firstborn                                                             (0.00164)       (0.00164)
Number of observations                                  859644          859644          859644
R square                                                0.654           0.653           0.654

Panel B: Second stage

Dependent variable: self migration (Yi)

                                        (1) OLS         (2) 2SLS        (3) 2SLS        (4) 2SLS
% of same-village adults migrating      1.384***        1.296***        0.867**         0.918***
                                        (0.0156)        (0.4980)        (0.3510)        (0.3070)
Conditional LR test for weak IV                         [0.208, 1.547]  [0.321, 2.271]  [0.317, 1.518]
F-test for IV                                           F(1, 859644) =  F(1, 859644) =  F(2, 859644) =
                                                        1961.41         218.75          1077.87
                                                        Prob > F =      Prob > F =      Prob > F =
                                                        0.0000          0.0000          0.0000
Female                                  -0.0176***      -0.0178***      -0.0185***      -0.0184***
                                        (0.00170)       (0.00191)       (0.00185)       (0.00182)
Having a girl firstborn                 0.0167***       0.0167***       0.0164***       0.0164***
                                        (0.00198)       (0.00201)       (0.00202)       (0.00201)
Having a girl firstborn*female          -0.0286***      -0.0286***      -0.0287***      -0.0287***
                                        ()              ()              ()              ()
First birth is multiples                -0.0222***      -0.0221***      -0.0217***      -0.0217***
                                        (0.00672)       (0.00673)       (0.00676)       (0.00676)
First birth is multiples*female         -0.0206***      -0.0206***      -0.0203***      -0.0204***
                                        (0.00653)       (0.00653)       (0.00656)       (0.00655)
Age                                     0.0361***       0.0360***       0.0357***       0.0357***
                                        (0.00151)       (0.00158)       (0.00155)       (0.00154)
Age square                              -0.000777***    -0.000775***    -0.000770***    -0.000770***
                                        (0.0000282)     (0.0000291)     (0.0000287)     (0.0000286)
Years of schooling                      0.0126***       0.0126***       0.0126***       0.0126***
                                        (0.000480)      (0.000487)      (0.000483)      (0.000482)
# of HH members aged 0-6                0.0170***       0.0173***       0.0189***       0.0187***
                                        (0.00204)       (0.00272)       (0.00244)       (0.00233)
# of HH members aged 7-16               0.00847***      0.00875***      0.0101***       0.00997***
                                        (0.00137)       (0.00207)       (0.00178)       (0.00169)
# of HH members aged 17-23              0.0347***       0.0350***       0.0362***       0.0361***
                                        (0.00123)       (0.00189)       (0.00161)       (0.00153)
# of HH members aged 24-44              0.0380***       0.0383***       0.0396***       0.0395***
                                        (0.00131)       (0.00198)       (0.00169)       (0.00160)
# of HH members aged 45-59              0.0386***       0.0388***       0.0401***       0.0399***
                                        (0.00138)       (0.00197)       (0.00175)       (0.00166)
# of HH members aged 60+                0.0170***       0.0173***       0.0189***       0.0187***
                                        (0.00141)       (0.00228)       (0.00195)       (0.00183)
Having at least one boy in HH           -0.0148***      -0.0149***      -0.0150***      -0.0150***
                                        (0.00209)       (0.00210)       (0.00212)       (0.00212)
Minimum age of own child                0.00756***      0.00756***      0.00758***      0.00757***
                                        (0.00034)       (0.00034)       (0.00034)       (0.00034)
Second birth is multiples               -0.0246***      -0.0248***      -0.0255***      -0.0254***
                                        (0.00777)       (0.00782)       (0.00794)       (0.00791)
Is household head                       0.0103***       0.0102***       0.00960***      0.00967***
                                        (0.00225)       (0.00237)       (0.00232)       (0.00231)
Age of household head                   0.00356***      0.00363***      0.00362***      0.00362***
                                        (0.00012)       (0.00013)       (0.00013)       (0.00013)
Household head is female                0.00357***      0.00358***      0.00363***      0.00362***
                                        (0.00012)       (0.00013)       (0.00013)       (0.00013)
Years of schooling for household head   0.00288         0.00274         0.00203         0.00211
                                        (0.00264)       (0.00278)       (0.00276)       (0.00274)
Estimated house value                   -0.00195***     -0.00195***     -0.00197***     -0.00197***
                                        (0.00036)       (0.00036)       (0.00037)       (0.00037)
Contract land (mu)                      -0.00518***     -0.00528***     -0.00577***     -0.00572***
                                        (0.00029)       (0.00063)       (0.00051)       (0.00047)
Prevalence of HH head's                 0.00137         0.00273         0.00939         0.00861
  surname in village                    (0.0075)        (0.0109)        (0.0110)        (0.0105)
Distance to nearest bus station         0.000339**      0.000432        0.000886*       0.000832**
                                        (0.00015)       (0.00054)       (0.00047)       (0.00041)
Village is a minority gathering         0.000736        0.00112         0.00299         0.00278
                                        (0.00189)       (0.00308)       (0.00432)       (0.00401)
# of adults in the village              2.96e-06***     3.18e-06*       4.23e-06**      4.11e-06**
                                        (0.000001)      (0.000002)      (0.000002)      (0.000002)
Arable land in the village (mu)         8.51e-06***     7.36e-06***     1.36e-05*       1.02e-05***
                                        (0.000001)      (0.000003)      (0.000007)      (0.000002)
Regular water use is guaranteed         -0.00137        -0.00118        -0.000243       -0.000359
                                        (0.00176)       (0.00203)       (0.00351)       (0.00324)
Village has organized production        0.0059          0.00501         0.000652        0.00116
  and sale of agricultural prods        (0.00362)       (0.00674)       (0.00836)       (0.00778)
% of natural gathering groups in        -0.0117         -0.0131         -0.0203         -0.0194
  village w/ access to electricity      (0.0073)        (0.0114)        (0.0170)        (0.0156)
% of natural gathering groups in        0.00186         0.00444         0.017           0.0155
  village w/ access to telephone        (0.0032)        (0.0148)        (0.0120)        (0.0106)
% of natural gathering groups in        0.000231        -0.00115        -0.0079         -0.00712
  village w/ access to TV signals       (0.00453)       (0.00927)       (0.01100)       (0.01010)
% of natural gathering groups in        7.84E-05        -0.000787       -0.005          -0.00451
  village w/ access to drivable roads   (0.00282)       (0.00573)       (0.00645)       (0.00595)
Male village head                       -9.42E-05       -0.00123        -0.0068         -0.00615
                                        (0.00348)       (0.00742)       (0.00824)       (0.00753)
Male village party secretary            0.00459         0.00523         0.00839         0.00801
                                        (0.00381)       (0.00549)       (0.00755)       (0.00692)
The education years of the              -0.000686**     -0.000853       -0.00167*       -0.00157*
  village head                          (0.000289)      (0.000996)      (0.000889)      (0.000803)
The education years of the              -0.000254       -0.000356       -0.00085        -0.000791
  village party secretary               (0.00030)       (0.00067)       (0.00071)       (0.00065)
Village head with military              -0.0025         -0.0025         -0.0025         -0.00249
  experience                            (0.00212)       (0.00214)       (0.00347)       (0.00325)
Village party secretary with            0.00139         0.000356        -0.00471        -0.00412
  military experience                   (0.00226)       (0.00640)       (0.00602)       (0.00542)
Observations                            859644          859644          859644          859644
R-squared                               0.294           0.294           0.285           0.287

Notes: Robust standard errors in parentheses. *** p<0.01, ** p<0.05, * p<0.1. All regressions control for township fixed effects. Errors are clustered by village.
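
To illustrate the two-panel structure of Table A1 (a first stage for peer migration, then a second stage for own migration), the following is a minimal sketch on synthetic data. It is not the authors' estimation code. The data-generating process, the single instrument `z`, the function names, and the parameter `true_beta` are all assumptions made for illustration, and the sketch omits township fixed effects, the control variables, and village-clustered standard errors.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, x_endog, z_instr):
    """Manual 2SLS with one endogenous regressor and one instrument.

    Returns (first_stage_coefs, second_stage_coefs), each ordered as
    [intercept, slope]. Point estimates only: standard errors taken
    from the second-stage OLS of this manual procedure would be wrong.
    """
    const = np.ones(len(y))
    Z = np.column_stack([const, z_instr])
    first = ols(Z, x_endog)            # first stage: x on instrument
    x_hat = Z @ first                  # fitted (exogenous) part of x
    second = ols(np.column_stack([const, x_hat]), y)  # second stage
    return first, second

# Hypothetical synthetic data: u is an unobserved factor that moves both
# the endogenous regressor x and the outcome y, so OLS is biased.
rng = np.random.default_rng(0)
n = 50_000
true_beta = 0.9
u = rng.normal(size=n)                 # unobserved confounder
z = rng.normal(size=n)                 # instrument, independent of u
x = 0.5 * z + u + rng.normal(size=n)   # endogenous regressor
y = true_beta * x + u + rng.normal(size=n)

first, second = two_stage_least_squares(y, x, z)
beta_ols = ols(np.column_stack([np.ones(n), x]), y)[1]
beta_iv = second[1]

print(f"first-stage slope: {first[1]:.3f}")
print(f"OLS estimate:      {beta_ols:.3f}")
print(f"2SLS estimate:     {beta_iv:.3f}")
```

In this synthetic setup the OLS slope is pushed upward by the shared unobserved factor, while the 2SLS slope stays close to the assumed `true_beta`. The sketch only mirrors the mechanics of the two stages; it says nothing about the magnitudes reported in the table.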
