Internal Migration and Firm Growth: Evidence from China∗

Clement Imbert Marlon Seror Yifan Zhang Yanos Zylberberg

Preliminary and incomplete – do not circulate

Abstract

This paper provides some of the first empirical evidence on the role of internal migration in manufacturing growth, using Chinese data. We first identify shocks to rural livelihoods caused by variation in international agricultural prices and local climatic conditions. We then combine these shocks with a gravity model to predict yearly migrant inflows into each urban center. Finally, we use household survey data and a census of large firms to estimate the causal impact of migrant inflows on the urban economy. Preliminary results suggest that, by increasing labor supply, migration lowers labor costs and increases the profitability of manufacturing firms.

JEL codes: D24; J23; J61; O15.

1 Introduction

As countries develop, labour shifts from the traditional sector (agriculture) to the modern sector (manufacturing), which implies migration from rural to urban areas (Lewis, 1954; Kuznets, 1964; Harris and Todaro, 1970).1 Although the movement of labour is central to structural transformation, there is little empirical evidence on its short- and medium-run impact on the urban manufacturing sector.

The objective of this paper is to estimate the causal impact of migration inflows on urban labor markets and manufacturing firms in China. We use variation in rainfall and world prices for agricultural commodities, combined with information on cropping patterns and potential yields, to construct exogenous shocks to agricultural labour returns in each rural prefecture.2 We then combine these shocks with a gravity model, which includes distance between rural origin and urban destination and population at destination, to predict migration inflows into each urban area. Finally, we use these origin-based fluctuations to instrument immigrant inflows and estimate their effect on the urban economy through the observation of workers and firms.

China arguably offers the best context to study the role of migration in economic development. The Chinese economy has experienced a remarkably rapid structural transformation, with a sharp fall in the share of agriculture and a symmetric rise in manufacturing and services over the last three decades. China’s agricultural employment share was about 70% in 1980 and is predicted to taper off at 24% in 2020 (ADB, 2014).3 At the same time, China has seen massive migration flows from rural to urban areas. The stock of rural-to-urban migrants, i.e. the urban population with a rural household registration or residence permit (hukou), rose from 46.5 million in 1982 to 205.6 million, or 30.9% of the total urban population, in 2010 (Chan, 2012).4 This rapid evolution allows us to study migration and manufacturing growth with coherent data sources spanning a significant part of the structural transformation period.

∗ Imbert: Warwick University, [email protected]; Seror: PSE, [email protected]; Zhang: CUHK, [email protected]; Zylberberg: Bristol University, [email protected]. We are grateful to Gharad Bryan, Jon Temple, Christine Valente, Thomas Vendryes, Chris Woodruff for useful discussions and comments. We also thank participants in Bristol, CUHK and Warwick for helpful comments. The usual disclaimer applies.

1 In a recent study of the determinants of structural change in today’s developed economies, Alvarez-Cuadrado and Poschke (2011) propose an empirical exercise to distinguish “push” from “pull” models. In “labour push” models, rising productivity in agriculture releases labour, which in turn triggers industrialization (Gollin et al., 2002). In “labour pull” models, technological change increases productivity in manufacturing, which attracts workers out of agriculture (Herrendorf et al., 2013). In both narratives, migration, and thus structural change, is prompted by a productivity gap between sectors.

2 Prefectures are the second administrative division in China below the province (there were about 345 prefectures in 2005).

3 In comparison, Alvarez-Cuadrado and Poschke (2011) find that it took 108 years on average for the agricultural employment share to decline from 60% to 20% of the labour force in 12 of today’s developed economies.

4 This figure suggests that internal migration in China alone is of the same order of magnitude as international migration worldwide. In 2010 the stock of international migrants was an estimated 222 million (United Nations, 2015).

Our empirical strategy proceeds in three steps. In a first step, we construct shocks to agricultural incomes. For this we collect geocoded grids (1km×1km) provided by the Food and Agriculture Organization (FAO). We multiply the 1990 harvested area by a model-based measure of potential yield, which combines crop requirements and soil characteristics, to create a measure of expected output for each crop and prefecture. We then combine the expected output with two crop-specific shocks. First, we isolate short-term fluctuations in international crop prices and transform these price variations into variations in expected agricultural income (for a fixed agricultural portfolio). Second, we interact the crop water requirement during the growing season with monthly precipitation to create a yearly distance to ideal water requirement in each prefecture. These origin shocks exhibit large time-varying volatility coming from world demand and supply or rainfall cycles, but also large cross-sectional differences due to the wide variety of harvested crops across China.

In a second step, we combine rural income shocks with a gravity model which uses distance between rural origins and urban destinations and population at destination to predict migration inflows into urban areas. Fluctuations in agricultural income due to international prices and rainfall generate significant variations in outflows from rural areas. An origin-specific agricultural portfolio whose value is 10% above its long-term value (about 1 standard deviation) is associated with a 0.25 p.p. lower outmigration incidence. Similarly, a 1 standard deviation increase in our measure of distance to ideal water requirement is associated with a 0.18 p.p. lower outmigration incidence. Both effects are very robust and generate economically significant variations in migration outflows (the average outmigration incidence is around 1.4%). We next use a gravity model based on geographic distance and historical data on destination populations to transform these rural outflows into immigration inflows to urban destinations. Our approach is similar in that respect to Boustan et al. (2010).

In a third step, we identify the causal impact of migrant inflows on the urban economy. We first use an annual survey of urban households (Urban Household Survey) and estimate the effect of migration on wages and employment for urban “natives”.
We find that migration inflows exert downward pressure on urban wages and crowd urban residents out of wage employment. The implied wage elasticity with respect to migration is 0.15 to 0.28. As expected, the effects are stronger for less educated workers, who are close substitutes for migrant labour. We next use a yearly census of large firms from the National Bureau of Statistics (NBS) and estimate the effect of migration on labour costs and profitability. We show that migration inflows markedly reduce the wage rate (wage bill divided by employment) and increase profitability (value added minus labour costs divided by revenues) for urban firms. The effects are stronger for firms employing mostly unskilled labour.

Our (preliminary) findings contribute to different strands of the literature. First, this paper contributes to the literature on structural transformation by estimating the direct impact of rural-to-urban migration on the modern sector, using worker and firm data. Our findings that migration decreases wages and increases profitability in urban areas relate to “labour push” models, which generally imply that labour-saving increases in agricultural productivity, by releasing labour, may trigger industrialization (Gollin et al., 2002; Alvarez-Cuadrado and Poschke, 2011; Bustos et al., 2015). Our results may also complement Marden (2015), who finds that, for an earlier period of Chinese development (the 1990s), the increase in farm profits due to agricultural reforms provided credit to finance non-agricultural sector growth.

Second, this paper relates to the nascent literature which uses firm-level data to study how migrants’ labour supply is absorbed by the economy (Peri, 2012; Kerr et al., 2015; Dustmann and Glitz, 2015).5 The context of our study is however very different. Urban China has experienced massive flows of internal migrants and its economy has been expanding at a very high rate, with a constant reallocation of resources toward small, young and productive firms (Song et al., 2011).

Third, this paper relates to the literature on the effects of immigration on labour markets (Borjas, 2003), and more specifically to studies that focus on internal migration. Boustan et al. (2010) study the labour market effects of changes in internal migration in the US during the Great Depression. El Badaoui et al. (2014), Imbert and Papp (2014), Kleemans and Magruder (2014) and Feng et al. (2015a) among others study the labour market effects of migration in Thailand, India, Indonesia and the United States, respectively.

Fourth, this paper contributes to the literature on the role of migration in shaping economic development in China. Ge and Yang (2014) use wage decomposition methods and a simple calibration to show that migration depressed unskilled wages in urban areas by at least 20% throughout the 1990s and 2000s.
Based on aggregated data at the provincial level, De Sousa and Poncet (2011) find that migration helped alleviate upward pressures on Chinese wages in 1995-2007. In contrast, Meng and Zhang (2010) provide evidence of a modestly positive or zero effect of rural migrants on native urban workers’ labour market outcomes, and Combes et al. (2015) put forward a strong positive externality on local wages. Mayneris et al. (2014) consider another type of shock to the labour market, an increase in legislated minimum wages. As we do with migration, they assess the impact of the shock on firm outcomes in China.

To the best of our knowledge, our paper is the first microeconomic paper to investigate and provide evidence of the effect of migration on firm outcomes, and it thus helps bring empirical evidence to “push” models of rural outmigration fuelling modern sector growth. It complements Facchini et al. (2015), who show that trade shocks increase demand for labor in manufacturing and stimulate internal migration (which is consistent with a “pull” model).6

Much attention has been given to the mechanisms and patterns of Chinese growth, and a large body of literature is devoted to migration in China. However, while the role of rural-to-urban migration in fuelling economic growth features prominently in the policy debate in China,7 the economic literature has given it much less attention. For example, Song et al. (2011) focus on three main features of the Chinese economic take-off: high output growth with high and sustained returns to capital, reallocation within the manufacturing sector from large state-owned enterprises (SOEs) to smaller private firms, and large savings invested abroad. Their explanation relies on credit market imperfections, which force small productive firms to save before growing at the expense of larger, less productive firms. Interestingly, migration from rural areas may also help explain these stylized facts. Indeed, the constant increase in the labour supply of migrants, by moderating urban wage growth, may have allowed firms to sustain high profits, accumulate internal savings and finance profitable investments despite credit constraints.

The remainder of the paper is organized as follows. In Section 2, we describe our three main data sources, which allow us to measure migration flows, labour market outcomes and firm-specific outcomes. We also detail how we isolate exogenous variations at origin that impact migration flows. In Section 3, we describe our empirical strategy, in particular how we generate synthetic migration flows from our agricultural productivity shocks and how we estimate the effects of migration flows on urban labour markets and firm outcomes at destination. We present our main results in Section 4. Section 5 concludes.

5 Giesing and Laurentsyeva (2015) provide evidence on the effect of emigration on firm outcomes in Eastern European countries.

2 Data

This section presents the data we use and how we construct the main variables of our analysis.8 We first present our two main sources of exogenous variation in agricultural returns to labour, i.e. the price and yield shocks. We next present our measures of migration flows and urban outcomes.

6 Macroeconomic discussions of the link between migration and productivity can be found in Au and Henderson (2006), Au and Henderson (2007) and Tombe and Zhu (2015). These papers all focus on mobility restrictions, as do Bosker et al. (2012), an economic geography analysis, and Vendryes (2011), a theoretical paper.

7 See Meng and Zhang (2010) for a survey.

8 As we rely on many data sources, we describe them briefly below and provide a more detailed discussion in the appendix.

2.1 Rural income shocks

In order to construct shocks to the productivity of labor in agriculture, we combine three types of information: potential agricultural output, international prices and rainfall.

Potential Agricultural Output. We construct the potential output for each crop in each prefecture by combining a measure of harvested area and a measure of yield, both provided by the Food and Agriculture Organization (FAO). First, we extract from the 1990 World Census of Agriculture the geo-coded map of harvested area for each crop (in a 30 arc-second resolution, approximately 1km). We then overlay this map with a map of prefectures, and we construct total harvested area $h_{c,o}$ for a given crop c and a given prefecture o.9 Second, we use a measure of potential yield per hectare as computed in the Global Agro-Ecological Zones (GAEZ) Agricultural Suitability and Potential Yields dataset. The measure is model-based and uses information on crop requirements (e.g. the length of the yield formation period and the stage-specific crop water requirements) and soil characteristics (i.e. the ability of the soil to retain and supply nutrients) in order to generate a potential yield for a given crop and a given soil under 5 scenarios: rain-fed (high/intermediate/low water input) and irrigated (high/intermediate water input). For each crop c and prefecture o, we use information on whether it was rain-fed or irrigated in 1990 to construct potential yield $yi_{c,o}$.10

The product of harvested area and potential yield, $h_{c,o}\, yi_{c,o}$, is our measure of potential agricultural output for each crop in each prefecture in 1990. Figure 3 displays potential output $h_{c,o}\, yi_{c,o}$ for rice and cotton, and illustrates the large geographic variation in agricultural portfolios. By construction, $h_{c,o}\, yi_{c,o}$ is time-invariant. We next combine potential output at the prefecture level with two time-varying shocks: international price shocks and rainfall shocks.

International price shock. As a measure of exogenous changes in international demand for crops, we use the World Bank Commodities Price Data (“The Pink Sheet”).11 We consider prices in constant 2010 USD and per kg between 1980 and 2009 for the following commodities: banana, cassava, coffee, cotton, an index of fodder crops, groundnut, maize, millet, potato, pulses, rapeseed, rice, sorghum, soybean, sugar beet, sugar cane, sunflower, tea and wheat.12 These crops account for the lion’s share of China’s agricultural production over the period of interest (they represented 90% of total agricultural output in 1998 and 79% in 2007).13 We also collected producer prices, exports and production as reported by the FAO between 1991 and 2013 for China (and other countries) to check that international price variations translate into producer price variations.

9 We collapse our analysis at the prefecture level to match migration data but agricultural shocks can be constructed at a 30 arc-second resolution over the whole country.

10 The measure is given as a 30 arc-second resolution geo-coded map which we overlay with prefecture maps to generate the prefecture average.

11 The data is freely available online at http://data.worldbank.org/data-catalog/commodity-price-data

In order to identify shocks in international prices, we next compute a deviation from the long-term trend, $hp_{c,t}$, by applying a Hodrick-Prescott (HP henceforth) filter to the logarithm of nominal prices. Appendix Figure A3 presents the series for three crops, i.e. rice, bananas and groundnuts, and illustrates the magnitude of fluctuations: the market value of rice production decreases by 40% between 1998 and 2001 and increases by 70% between 2007 and 2008. As shown in Figure A3, fluctuations in prices are not purely transitory shocks but rather behave as an AR(1) process with rare and large jumps. Hence our price shocks capture the equivalent of business cycle fluctuations in international crop prices.

Finally, we transform fluctuations in world prices into an estimate of the value of crop production for each year in each prefecture. In order to do this, we construct for each prefecture o the value gap for the agricultural portfolio. We consider the crop-specific deviations from long-term trend, $\{hp_{c,t}\}_c$, and weight them by a constant weight equal to the expected share of agricultural revenue for crop c in prefecture o.

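These shares are built from $\{h_{c,o}\, yi_{c,o}\, \bar{p}_c\}_c$, where $h_{c,o}\, yi_{c,o}$ is potential output in 1990 described above and $\bar{p}_c$ is a snapshot of international crop prices in 1980:

$$p_{o,t} = \left( \sum_c h_{c,o}\, yi_{c,o}\, \bar{p}_c\, hp_{c,t} \right) \Big/ \left( \sum_c h_{c,o}\, yi_{c,o}\, \bar{p}_c \right) \qquad (1)$$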
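As an illustration of equation (1), the following minimal Python sketch computes the price shock for a single prefecture and year as a revenue-weighted average of crop-level deviations. The function name, the dictionary layout and all numbers are hypothetical and are not taken from the paper's data; the HP-filtered deviations are assumed to be given.

```python
def price_shock(harvested_area, potential_yield, base_price, hp_deviation):
    """Equation (1) for one prefecture o and one year t.

    All arguments are dicts keyed by crop name (a hypothetical layout):
      harvested_area[c]  -> h_{c,o}
      potential_yield[c] -> yi_{c,o}
      base_price[c]      -> p-bar_c (1980 price snapshot)
      hp_deviation[c]    -> hp_{c,t} (deviation from the HP trend)
    """
    # Fixed weights: expected revenue of each crop in the prefecture.
    weights = {
        c: harvested_area[c] * potential_yield[c] * base_price[c]
        for c in harvested_area
    }
    total = sum(weights.values())
    if total == 0:
        raise ValueError("prefecture has no potential agricultural revenue")
    # Weighted average of the crop-specific price deviations.
    return sum(weights[c] * hp_deviation[c] for c in weights) / total


# Made-up example: a prefecture growing rice and cotton.
h = {"rice": 100.0, "cotton": 50.0}
y = {"rice": 4.0, "cotton": 2.0}
p_bar = {"rice": 0.5, "cotton": 2.0}
hp = {"rice": -0.20, "cotton": 0.10}
shock = price_shock(h, y, p_bar, hp)
```

In this made-up example both crops carry the same revenue weight, so the shock is the simple average of the two deviations. Because the weights are fixed over time, all time variation in the shock comes from the price deviations.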

The price shocks $p_{o,t}$ exhibit some time-varying volatility coming from world demand and supply, but there are also large cross-sectional differences. A prefecture is only exposed to the variations in the prices of crops that it produces. The wide variety of harvested crops across China guarantees a large cross-sectional variance in prices $p_{o,t}$ that will be exploited in our main empirical strategies. Panel A of Figure A4 shows price shocks $p_{o,1999-2000}$ in 1999 and 2000, just before farmers experienced a crisis across China due to a strong decrease in the price of rice.

These shocks are likely to have a strong effect on outmigration. Indeed, fluctuations in prices exhibit some persistence: prices follow a process that looks like an AR(1). Accordingly, a negative shock affects returns to labour not only in the same year but also in the following ones. This persistence makes the shocks more likely to trigger migration outflows, but it will also introduce some auto-correlation in the resulting immigration inflows to urban centers.

12 We exclude from our analysis one crop, i.e. tobacco, for which (i) China has a dominant position and directly influences international prices and (ii) China National Tobacco Corporation, a state-owned enterprise, has a monopoly on cigarette production.

13 http://data.stats.gov.cn/english/easyquery.htm?cn=C01

As the fluctuations in $p_{o,t}$ come entirely from fluctuations in world commodity prices, we need to assume that these prices are driven by supply shocks in other exporting countries, demand fluctuations in importing countries or world agricultural market integration, and that these demand and supply fluctuations are orthogonal to Chinese urban labour demand.14

Rainfall shocks. In our analysis, we use a second type of shock to agricultural income, based on the rainfall deficit during the growing period of each crop. Our rainfall data is a monthly precipitation measure (0.5 degree latitude x 0.5 degree longitude precision) which covers the period 1901-2011 and mostly relies on the Global Historical Climatology Network.15 Once collapsed at the prefecture level, this provides us with a measure $ra_{o,m,t}$ of rainfall for prefecture o in month m and year t.

We refine this rainfall measure to account for the growing cycle of each crop, i.e. (i) the harvest season and (ii) rainfall requirements. For a given year, there are several sources of variation across Chinese prefectures in actual yields due to rainfall. First, different locations receive different levels of rainfall. Second, exposure to rainfall depends on the growing cycle of the different harvested crops (winter, spring or summer/autumn crops). In addition, some crops are resistant to large water deficits while others immediately perish with low rainfall. The large cross-sectional variations in each year may thus come from (i) a direct effect of local rainfall and (ii) an indirect effect coming from the interaction with the crop-specific growing cycle.

14 One potential issue is that agricultural prices could have a direct effect on firms which use agricultural products as inputs. We test the robustness of our results by excluding these firms from our analysis.

15 UDel AirT Precip data was provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.esrl.noaa.gov/psd/.

16 http://www.fao.org/nr/water/cropinfo.html

We rely on the measure $ra_{o,m,t}$ of rainfall for prefecture o in month m and year t and we construct for each crop a measure $wr_c$ of the minimum crop water requirement during the growing season $M_c$ as predicted by the yield response to water.16 We then generate

$$r_{o,t} = \left( \sum_c \left( \frac{\max\left\{ \sum_{m \in M_c} wr_c - ra_{o,m,t},\, 0 \right\}}{wr_c} \right)^{\alpha} h_{c,o}\, yi_{c,o}\, \bar{p}_c \right) \Big/ \left( \sum_c h_{c,o}\, yi_{c,o}\, \bar{p}_c \right). \qquad (2)$$

This measure has a very intuitive interpretation. The quantity $\max\{\sum_{m \in M_c} wr_c - ra_{o,m,t}, 0\}$ is the deficit between actual rainfall and the minimum crop water requirement $wr_c$ during the growing season. We then penalize this deficit with a factor $\alpha$ capturing potential non-linearities in the impact of rainfall deficit. In our baseline specification, this penalization parameter $\alpha$ will be set equal to 3.17 A high ratio $\max\{\sum_{m \in M_c} wr_c - ra_{o,m,t}, 0\} / wr_c$ would be associated with a bad harvest for the specific crop. We then weight these ratios by potential output for each crop in each prefecture.
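The rainfall shock of equation (2) can be sketched in Python as follows. This is a sketch under one reading of the formula, not the authors' implementation: it assumes that `wr` is the total water requirement over the growing season and that the deficit is `wr` minus total rainfall over the season's months. The function name, data layout and numbers are hypothetical.

```python
def rainfall_shock(crops, monthly_rain, alpha=3.0):
    """Equation (2) for one prefecture o and one year t.

    crops: dict crop -> dict with keys (hypothetical layout)
        'season': list of months in the growing season M_c
        'wr':     water requirement wr_c (ASSUMED to be a season total)
        'weight': h_{c,o} * yi_{c,o} * p-bar_c
    monthly_rain: dict month -> rainfall ra_{o,m,t}
    alpha: penalization parameter (3 in the baseline specification)
    """
    numerator = 0.0
    denominator = 0.0
    for spec in crops.values():
        # ASSUMPTION: deficit = season requirement minus season rainfall.
        season_rain = sum(monthly_rain[m] for m in spec["season"])
        deficit = max(spec["wr"] - season_rain, 0.0)
        # Relative deficit, penalized non-linearly by alpha.
        numerator += (deficit / spec["wr"]) ** alpha * spec["weight"]
        denominator += spec["weight"]
    return numerator / denominator


# Made-up example: rice suffers a 50% water deficit, wheat none.
crops = {
    "rice": {"season": [5, 6, 7], "wr": 600.0, "weight": 200.0},
    "wheat": {"season": [3, 4], "wr": 200.0, "weight": 200.0},
}
rain = {3: 100.0, 4: 150.0, 5: 100.0, 6: 100.0, 7: 100.0}
shock_baseline = rainfall_shock(crops, rain)           # alpha = 3
shock_linear = rainfall_shock(crops, rain, alpha=1.0)  # linear penalty
```

With the cubic penalty, a 50% deficit on half of the portfolio produces a much smaller shock than with the linear penalty, which illustrates how a larger $\alpha$ concentrates the measure on severe deficits.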

Panel B of Figure A4 displays rainfall shocks $r_{o,1999-2000}$ in 1999 and 2000. As expected, there is large year-to-year variation in rainfall availability. Also, for a given year, because of differences in cropping patterns across prefectures, the spatial auto-correlation of rainfall shocks is much lower than the correlation of rainfall itself. While the exogeneity of rainfall shocks is hardly in question, in order to use them as an instrument for rural-to-urban migration we need to assume that urban labor demand is not directly affected by rainfall.18

We view price and rainfall shocks as complementary, since they have different strengths. On the one hand, price shocks reflect business cycle fluctuations and will likely have a stronger effect on rural-to-urban migration than rainfall shocks, which are short-lived. On the other hand, rainfall shocks are idiosyncratic, which makes it more likely for us to identify their immediate effect on migration flows.

2.2 Migration and urban outcomes

We now describe our measures of migration flows and of worker and firm outcomes in urban areas.

Migration flows. In order to measure migration flows, we use a random 20% extract of the 1% Population Survey 2005, also called the “2005 mini-census”. These data are representative of the whole of China and contain information on occupation, industry, income, ethnicity, education level and housing characteristics. Most importantly for our purpose, the 2005 mini-census is the first to contain comprehensive data on migration status. This can be determined thanks to information on household registration type (agricultural or non-agricultural) and on the places of registration and of residence, which are available down to the prefecture level.19 Migrants are further asked the main reason for leaving their place of registration and when they did so.

Because of their quality and degree of detail, the census data collected by the National Bureau of Statistics are widely used in the literature (Combes et al., 2015; Facchini et al., 2015; Meng and Zhang, 2010; Tombe and Zhu, 2015, inter alia). Moreover, information on places of origin and residence can be combined with retrospective data on the year that the respondent first left her place of registration (censored above six years prior to the interview) in order to create a matrix of yearly net migration flows across all Chinese prefectures between 1999 and 2005, as well as to determine the migrant stock in 1999. There were about 345 prefectures (diji qu/shi) in China over this period, home to 3.7 million people on average. Prefectures are the third tier of government in China, below the central and provincial governments, and the lowest level of government with accessible data on bilateral migration flows.

Unlike in most studies relying on census data, migration flows are directly observed rather than computed as a difference of stocks. However, our measure of migration has two limitations that must be borne in mind in the subsequent analysis. First, since migration flows are reconstructed ex post, we expect some attrition, i.e. returnees are not counted as former migrants but as agricultural hukou holders living in their prefectures of registration. Second, the census does not record when the respondent arrived at her place of residence but only when she left her place of registration, hence we have to assume that the two happen in the same year. Some migrants may however have resided in other urban centers in between (step migration). Return and step migration may dilute the effect of the shocks at origin on migration flows.20

Figure 4 illustrates the rise in migrant flows between 2000 and 2005 as a share of the total locally registered urban population, i.e. locally registered (at the prefecture level) non-agricultural hukou holders. We consider only inter-prefectural migration flows. The rising trend and the magnitude of migration flows are striking: in 2005, the inflow of migrants from other prefectures was in excess of 6%, as against less than 2% in 2000.

Two interesting facts pertain to the composition of the incoming migrants. First, rural hukou holders account for between 78% (in 2000) and 83% (in 2005) of the yearly migrant inflow, the remainder being accounted for by urban dwellers originating from other prefectures. Second, on average more than 78% of inter-prefectural rural-urban migrations recorded over the period 2000-2005 involved the crossing of a provincial border. We provide in the Appendix (B.1) a more detailed description of migration flows and migrants.

17 The results are robust to more conservative values for α, e.g. α = 1 or α = 2.

18 In the analysis, we test the robustness of our results by controlling for local rainfall shocks.

19 Unfortunately, information on the place of residence does not distinguish between rural and urban settings.

20 The 2005 mini-census also contains information on the place of residence one and five years prior to the interview. In appendix B.1, we use these data to quantify return and step migration. The results suggest that return migration is substantial, but step migration negligible.

Wages and employment In our analysis, we first study labour market outcomes from the worker point of view, using household survey data. The household data used to assess the link between migration and destination-specific labour market outcomes come from the national Urban Household Survey (UHS) collected by the National Bureau of Statistics. The UHS is a nationally representative survey of Urban China that covers the period 2002-2008. It is based on a three-stage stratified random sampling, whose design is similar to that of the Current Population Survey in the United States (Ge and Yang, 2014; Feng et al., 2015b). Its sample includes 18 provinces and 207 pre- fectures.21 The data we use for our analysis are annual cross-sections, with a sample size that ranges from 68,376 to 94,428 individuals (in 2002 and 2008 respectively). Before 2002, the population covered by the UHS explicitly excluded the “floating population” of agricultural hukou holders living in urban areas. Since 2002, all households living in urban areas are eligible. However, sampling still ignores urban dwellers living in townships and in the suburban districts of Beijing, Chongqing, Shanghai, and Tianjin (Park, 2008). Rural-urban migrants, who are more likely to live in peripheral areas of cities, are therefore under-represented. Our analysis is thus restricted to the locally registered urban population.22 The UHS is a very rich dataset with detailed information on individual employ- ment, income —including monthly wages, bonuses, allowances, housing and medi- cal subsidies, overtime, and other income from the work unit—and household-level characteristics. It also includes detailed data on household expenditures collected using diaries—see Feng et al. (2015b) for more detail—. 
As our main income measure, we use monthly wages divided by a prefecture- and year-specific consumer price index, which we constructed ourselves using consumption data.23 We also construct three employment outcomes: wage employment, unemployment and self-employment (which also includes firm owners).24 Table A1 in the Appendix provides average summary statistics of key variables over the period 2002-2008.

21Although the 18 provinces capture much of China’s regional disparities, it must be noted that they may not constitute a faithful picture of China as a whole. The provinces are Beijing, Shanxi, Liaoning, Heilongjiang, Shanghai, Jiangsu, Zhejiang, Anhui, Jiangxi, Shandong, Henan, Hubei, Guangdong, Chongqing, Sichuan, Yunnan, Shaanxi and Gansu.
22The UHS data also tend to oversample employees from state and collective enterprises, where response rates are higher (Ge and Yang, 2014; Feng et al., 2015b).
23Statistical Yearbooks in China do not publish CPIs below the province level.
24Working hours in the month preceding the survey were also recorded in UHS 2002-2006. However, as pointed out by Ge and Yang (2014), they vary within a very narrow range, which means that the UHS measure might understate actual variations in working hours. For this reason, we do not use hours of work in our analysis.

Firms. Our second piece of information on the urban economy comes from firm-level data spanning 1998-2007 from the National Bureau of Statistics (NBS).25 Every year, the NBS implements a census of all state-owned enterprises and all non-state firms with sales exceeding 5 million RMB.26 The census covers the industrial sector, which is defined as mining, manufacturing and utilities. In our analysis, we focus on the manufacturing sector, in which large firms are responsible for most of the production: the NBS sample accounts for 90% of gross manufacturing output. The data are based on a standard firm survey and contain information on each firm’s location, industry, ownership type, number of employees and a wide range of accounting variables (e.g. output, input, value added, wage bill, fixed assets, financial assets, etc.). In our preliminary analysis we consider two firm outcomes: labor costs and profitability. As our measure of labor costs, we construct the wage rate as the wage bill divided by the total number of employees. As our measure of profitability, we use profits divided by sales.

25The following description borrows heavily from a detailed discussion in Brandt et al. (2014).
26The average exchange rate over the period of interest was 8.26 RMB to the USD, so 5 million RMB represent about $605,000.

There are a number of caveats to using the NBS census. First, the 5 million RMB threshold that defines whether or not a firm belongs to the NBS census was loosely implemented. In effect, it is impossible to know the exact level of sales before implementing the survey, and some firms only entered the database several years after having reached the sales cut-off.27 This truncation potentially introduces a selection bias. For that reason, we restrict ourselves to the balanced panel of firms over the period in most of our analysis, and we only resort to the unbalanced panel as a robustness check.

27Conversely, about 5% of private and collectively-owned firms, which are subject to the threshold, continue to participate in the survey even if their annual sales fall short of the threshold.

Second, matching firms over time in the NBS is difficult because of frequent changes in firm identifiers. In order to match “identifier-switchers,” we use the fuzzy algorithm developed by Brandt et al. (2014), which uses slowly-changing firm characteristics such as name, address or phone number. While total sample size ranges between 150,000 and 300,000 per year, we end up with 55,000 firms when we limit the sample to the fully balanced panel between 1999 and 2005.

Third, although we shall use the terms “firm” and “enterprise” interchangeably in the remainder of the paper, the NBS data cover “firms” in the narrow sense of “legal units” (faren danwei). Consequently, different subsidiaries of the same enterprise may be surveyed, provided they meet a number of criteria, including having their own names, being able to sign contracts, possessing and using assets independently, assuming their liabilities and being financially independent.
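As an illustration of the firm-level variables described above, the following minimal Python sketch computes a wage rate and a profitability ratio and restricts a sample to a balanced panel. The record layout and the field names (`wage_bill`, `employees`, `value_added`, `sales`) are illustrative assumptions, not actual NBS variable names. Profits are assumed to equal value added minus the wage bill, following the definition given in Section 4.2.

```python
from collections import defaultdict

# Hypothetical firm-year records (field names are illustrative, not NBS variable names).
records = [
    {"firm": "A", "year": 1999, "wage_bill": 500.0, "employees": 50,
     "value_added": 900.0, "sales": 4000.0},
    {"firm": "A", "year": 2000, "wage_bill": 600.0, "employees": 50,
     "value_added": 1100.0, "sales": 5000.0},
    {"firm": "B", "year": 2000, "wage_bill": 300.0, "employees": 20,
     "value_added": 700.0, "sales": 2000.0},
]


def firm_outcomes(record):
    """Return (wage rate, profitability) for one firm-year record."""
    wage_rate = record["wage_bill"] / record["employees"]
    # Assumed definition of profits: value added minus the wage bill.
    profits = record["value_added"] - record["wage_bill"]
    profitability = profits / record["sales"]
    return wage_rate, profitability


def balanced_panel(records, years):
    """Keep only firms that are observed in every year of `years`."""
    wanted = set(years)
    observed = defaultdict(set)
    for record in records:
        observed[record["firm"]].add(record["year"])
    complete = {firm for firm, seen in observed.items() if wanted <= seen}
    return [r for r in records if r["firm"] in complete and r["year"] in wanted]


panel = balanced_panel(records, years=[1999, 2000])
for record in panel:
    print(record["firm"], record["year"], firm_outcomes(record))
```

In this toy sample, firm B is dropped because it is not observed in 1999, which mirrors the restriction to firms present throughout the period.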

3 Empirical strategy

In this section, we first describe how we construct exogenous rural-to-urban migration flows from price and rainfall variations at origin. Our strategy closely follows Boustan et al. (2010), who evaluate the labour market effects of internal migration in the United States.28 We then present the empirical strategy we use to estimate the causal impact of migrant inflows on the urban sector.

3.1 Predicting rural-urban migration flows

Let $M_{o,d,t}$ denote the migration flow between origin $o$ (the rural areas of prefecture $o$) and destination $d$ (a “city,” i.e. the urban areas of prefecture $d$) in a given year $t = 2000, \ldots, 2005$, which we construct using retrospective questions of the 2005 mini-census.29 We construct the outmigration rate in year $t$, $m_{o,t}$, by dividing the sum of migrants who left $o$ in year $t$ by the number of adults who still reside in $o$, which we denote with $R_o$. Formally, we have:

$$m_{o,t} = \frac{\sum_{d} M_{o,d,t}}{R_o}.$$

We also construct the probability that a migrant from $o$ goes to $d$ at time $t$, which we denote with

$$p_{o,d,t} = \frac{M_{o,d,t}}{\sum_{d'} M_{o,d',t}}.$$

28A similar approach is adopted by El Badaoui et al. (2014), Feng et al. (2015a) and Kleemans and Magruder (2014).
29A “prefecture” comprises both urban and rural areas. There is some debate on how well urbanisation is captured in Chinese data; see Chan (2007). Note that our results are not sensitive to such a measurement issue since we assume, based on the literature and Census data on reasons for migration, that rural outmigrants settle in urban areas.

For the sake of exposition, we describe our strategy for a given shock $s_{o,t}$ to the rural origin $o$ in year $t$, which may be either a price shock or a rainfall shock. In order to estimate the causal effect of migration inflows on urban destinations, we need variations in migration flows that are unrelated to potential destination outcomes. Our empirical strategy follows Boustan et al. (2010) and interacts two sources of exogenous variation. First, we use price and rainfall variations as exogenous determinants of migration outflows in each rural prefecture. Second, we use a gravity model, which includes geographic distance between prefectures and urban population in 1990, to allocate rural migrants to urban destinations. This provides us with a prediction of migrant inflows into each urban area that is exogenous with respect to urban outcomes.
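The two descriptive quantities defined above, the outmigration rate and the destination shares, can be sketched in Python as follows. This is only an illustration on made-up numbers: the dictionary layout and the function names are assumptions, and the actual construction from the 2005 mini-census is not reproduced here.

```python
def outmigration_rates(flows, residents):
    """flows: {(origin, destination): migrants in year t}; residents: {origin: R_o}."""
    leavers = {}
    for (origin, _dest), migrants in flows.items():
        leavers[origin] = leavers.get(origin, 0) + migrants
    # m_{o,t}: migrants who left o in year t divided by adults still residing in o.
    return {o: leavers.get(o, 0) / r for o, r in residents.items()}


def destination_shares(flows):
    """p_{o,d,t}: share of each origin's migrants going to each destination."""
    leavers = {}
    for (origin, _dest), migrants in flows.items():
        leavers[origin] = leavers.get(origin, 0) + migrants
    return {(o, d): m / leavers[o] for (o, d), m in flows.items() if leavers[o] > 0}


# Made-up flows for a single year.
flows_t = {("o1", "d1"): 30, ("o1", "d2"): 10, ("o2", "d1"): 5, ("o2", "d2"): 15}
residents = {"o1": 2000, "o2": 1000}

rates = outmigration_rates(flows_t, residents)
shares = destination_shares(flows_t)
print(rates)
print(shares)
```

By construction, the shares of a given origin sum to one across destinations.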

Exogenous variations in migration outflows. We first regress migration outflows from each rural area on shocks to agricultural income. Formally, we estimate the following equation:

$$m_{o,t} = \beta_0 + \beta_s s_{o,t-1} + \delta_t + \nu_o + \varepsilon_{o,t}, \tag{3}$$

where $o$ indexes the origin and $t$ indexes time, $t = 2000, \ldots, 2005$. $m_{o,t}$ and $s_{o,t}$ denote the outmigration rate and the shock at origin $o$ in year $t$, respectively. $\delta_t$ denotes year fixed effects. $\nu_o$ denotes origin fixed effects and captures any time-invariant characteristics of origins, e.g. barriers to mobility. We use 1990 population at origin as weight to generate consistent outmigration predictions in the number of migrants.
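To make the structure of equation (3) concrete, here is a minimal Python sketch of a weighted two-way fixed-effects slope estimate. It assumes a balanced origin-by-year panel and weights that are constant within origin (as with 1990 population), in which case the fixed effects can be removed by subtracting origin means and weighted year means. The function name and the synthetic data are illustrative assumptions, not the paper's code or data.

```python
def weighted_twoway_fe_slope(rate, lagged_shock, weight):
    """Weighted within estimator of the slope on the lagged shock.

    rate, lagged_shock: {(origin, year): value} on a balanced panel.
    weight: {origin: weight}, constant over time (e.g. 1990 population).
    """
    origins = sorted({o for o, _ in rate})
    years = sorted({t for _, t in rate})
    w_total = sum(weight[o] for o in origins)

    def demean(x):
        # Origin means are plain time averages because weights do not vary over time.
        origin_mean = {o: sum(x[o, t] for t in years) / len(years) for o in origins}
        # Year means and the grand mean are weighted across origins.
        year_mean = {t: sum(weight[o] * x[o, t] for o in origins) / w_total for t in years}
        grand = sum(weight[o] * origin_mean[o] for o in origins) / w_total
        return {(o, t): x[o, t] - origin_mean[o] - year_mean[t] + grand
                for o in origins for t in years}

    y, s = demean(rate), demean(lagged_shock)
    numerator = sum(weight[o] * s[o, t] * y[o, t] for o in origins for t in years)
    denominator = sum(weight[o] * s[o, t] ** 2 for o in origins for t in years)
    return numerator / denominator


# Synthetic check: rates generated with a true slope of -0.02 and no noise.
origins = ["A", "B", "C"]
years = [2000, 2001, 2002]
lagged_shock = {
    ("A", 2000): 0.1, ("A", 2001): 0.4, ("A", 2002): -0.2,
    ("B", 2000): 0.3, ("B", 2001): -0.1, ("B", 2002): 0.2,
    ("C", 2000): -0.3, ("C", 2001): 0.0, ("C", 2002): 0.5,
}
origin_fe = {"A": 0.01, "B": 0.02, "C": 0.005}
year_fe = {2000: 0.0, 2001: 0.002, 2002: 0.004}
rate = {(o, t): 0.013 - 0.02 * lagged_shock[o, t] + origin_fe[o] + year_fe[t]
        for o in origins for t in years}
pop_1990 = {"A": 1.0, "B": 3.0, "C": 2.0}

beta_s = weighted_twoway_fe_slope(rate, lagged_shock, pop_1990)
print(round(beta_s, 6))
```

With unbalanced panels or time-varying weights this shortcut no longer applies, and a regression with explicit origin and year dummies would be needed instead.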

As our measure of the shock in equation (3), we use the average of rainfall or price shocks in $t-1$ and $t-2$. A migration spell at date $t = 2005$, for instance, corresponds to a migrant worker who moved between October 2004 and October 2005. Hence, given the timing of the growing cycle for most crops in our sample, migration spells in period $t$ are most likely to be impacted by rainfall and price variations in $t-1$ and before, especially if there are lags in the decision to migrate.30 Estimating equation (3) yields the predicted migration rate $\widetilde{m}_{o,t}$ from origin $o$ in year $t$:

$$\widetilde{m}_{o,t} = \widetilde{\beta}_0 + \widetilde{\beta}_s s_{o,t-1} + \widetilde{\nu}_o + \widetilde{\delta}_t,$$

where $\widetilde{\delta}_t$ is the average of the time effect.31 We then multiply the migration rate by rural population at origin $R_o$ to compute predicted migration flows from $o$:

$$\widetilde{M}_{o,t} = \widetilde{m}_{o,t} \times R_o.$$
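The prediction step can be sketched as below. The sketch takes estimated coefficients and fixed effects as given inputs, replaces the year effects by their average (so that predictions carry no common time variation), and scales the predicted rate by rural population at origin. All numbers and names are made up for illustration.

```python
def predicted_outflows(beta0, beta_s, origin_fe, year_fe, lagged_shock, residents):
    """Predicted outflows: predicted migration rate times rural population R_o.

    Year effects are replaced by their average so that predictions do not
    inherit common time variation.
    """
    avg_year_fe = sum(year_fe.values()) / len(year_fe)
    flows = {}
    for (origin, year), shock in lagged_shock.items():
        rate = beta0 + beta_s * shock + origin_fe[origin] + avg_year_fe
        flows[origin, year] = rate * residents[origin]
    return flows


# Made-up coefficient values and data for one origin over two years.
flows_hat = predicted_outflows(
    beta0=0.01,
    beta_s=-0.02,
    origin_fe={"o1": 0.004},
    year_fe={2001: 0.001, 2002: 0.003},
    lagged_shock={("o1", 2001): 0.5, ("o1", 2002): -0.5},
    residents={"o1": 100000},
)
print(flows_hat)
```

In this toy case the negative shock in 2002 raises predicted outflows relative to 2001, since the slope on the shock is negative.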

We present the estimation of equation (3) in Table 1. In the first two columns, we report the estimates for the price variations with lags (column 1) and with lags and forward values (column 2). The third and fourth columns display the estimates for the rainfall variations, and the last two columns include both price and rainfall shocks.

30Incorporating contemporaneous price/rainfall shocks in the analysis does not change the results. We also estimate the same specification using forward shocks, i.e. the average of prices in $t+1$ and $t+2$, to show that shocks are not anticipated.
31We remove time variation from our predictions, in order to avoid correlation between our migration flows and destination trends in outcomes.

In Table 1, columns 1 and 2, we see that migration flows are negatively correlated with (lagged) price deviations from their long-term values. A price 10% above its long-term value is associated with a 0.25 p.p. lower migration incidence, which is on average around 1.3% in our sample. In order to better understand the magnitude of this effect, let us normalize by the standard deviations of our variables. An additional standard deviation in the price shocks decreases migration incidence by 0.18 standard deviations. This effect is thus economically large and quite precisely estimated. Figure 1 plots the residuals of outmigration (y-axis) against the residual value of the prefecture-specific agricultural portfolio as predicted by international prices (x-axis), once prefecture and year fixed effects are partialled out. The relationship is approximately linear.

As shown in Table 1, columns 3 and 4, migration flows are positively correlated with rainfall deficits (see definition in section 2). A one standard deviation increase in rainfall deficits is associated with a 0.18 p.p. higher migration incidence. Figure 2 displays the relationship between outmigration and rainfall deficits, once prefecture and year fixed effects are partialled out. The relationship seems linear. As a robustness check, we test whether shocks are anticipated and find that forward variations in rainfall or prices do not predict migration outflows (columns 2, 4 and 6 of Table 1).

Finally, we include both types of shocks in the estimation. As columns 5 and 6 of Table 1 show, price and rainfall shocks have independent effects on migration outflows. The estimated coefficients on the lags and forward values of our constructed shocks in the joint regression are similar to those in the separate specifications.

Exogenous variations in origin-destination migration flows. We next estimate the following equation:

$$p_{o,d} = f(dist_{o,d}) + \gamma Pop_{d,1990} + \mu_o + \varepsilon_{o,d}, \tag{4}$$

where $p_{o,d}$ is the share of migrants from prefecture $o$ who went to prefecture $d$, $dist_{o,d}$ is the distance between $o$ and $d$, $f$ is a parametric function of distance, $Pop_{d,1990}$ is the total urban population of prefecture $d$ in 1990 and $\mu_o$ denotes origin fixed effects. Equation (4) yields $\widehat{p}_{o,d}$, the predicted probability for migrants from prefecture $o$ to go to prefecture $d$ based on distance, a fixed and exogenous characteristic of the pair $(o, d)$, and the attractiveness of $d$ captured by its lagged population. The specifications are weighted by $Pop_{d,1990}$.

We report the results of this estimation for three simple parametric specifications in Table 2. In the first column, we use a linear specification in distance.32 In column 2, we add a quadratic term, and in column 3 we use the inverse of distance. As is apparent in this table, (i) distance is a very strong predictor of migration flows and (ii) the specification in column 3 generates a much better fit of the data.33
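As a rough illustration of the inverse-distance specification in column 3, the sketch below fits a share equation of the form alpha / dist + gamma * Pop with origin fixed effects by weighted least squares. The parametrisation `alpha / dist` and all names are assumptions made for the example; the origin fixed effects are removed by weighted demeaning within origin, and the two remaining coefficients are obtained from the 2x2 normal equations.

```python
def weighted_center(values, weights):
    """Subtract the weighted mean from each value."""
    mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return [v - mean for v in values]


def fit_inverse_distance_gravity(shares, dist, pop):
    """Weighted within-origin fit of shares on 1/distance and destination population.

    shares, dist: {(origin, destination): value}; pop: {destination: 1990 population},
    used both as a regressor and as the regression weight.
    Returns (alpha, gamma).
    """
    by_origin = {}
    for origin, dest in shares:
        by_origin.setdefault(origin, []).append(dest)

    rows = []  # (weight, centred share, centred 1/distance, centred population)
    for origin, dests in by_origin.items():
        w = [pop[d] for d in dests]
        y = weighted_center([shares[origin, d] for d in dests], w)
        x1 = weighted_center([1.0 / dist[origin, d] for d in dests], w)
        x2 = weighted_center([pop[d] for d in dests], w)
        rows.extend(zip(w, y, x1, x2))

    s11 = sum(w * a * a for w, _, a, _ in rows)
    s22 = sum(w * b * b for w, _, _, b in rows)
    s12 = sum(w * a * b for w, _, a, b in rows)
    s1y = sum(w * a * y for w, y, a, _ in rows)
    s2y = sum(w * b * y for w, y, _, b in rows)
    det = s11 * s22 - s12 ** 2
    alpha = (s1y * s22 - s2y * s12) / det
    gamma = (s2y * s11 - s1y * s12) / det
    return alpha, gamma


# Synthetic values used only to check the algebra (they are not real shares).
pop = {"d1": 1.0, "d2": 2.0, "d3": 4.0}
dist = {("o1", "d1"): 1.0, ("o1", "d2"): 2.0, ("o1", "d3"): 4.0,
        ("o2", "d1"): 2.0, ("o2", "d2"): 1.0, ("o2", "d3"): 5.0}
mu = {"o1": 0.1, "o2": 0.05}
shares = {(o, d): mu[o] + 0.3 / dist[o, d] + 0.05 * pop[d] for (o, d) in dist}

alpha_hat, gamma_hat = fit_inverse_distance_gravity(shares, dist, pop)
print(round(alpha_hat, 6), round(gamma_hat, 6))
```

The linear and quadratic specifications would only change how the distance regressors are built; the quadratic case adds a third regressor and therefore a 3x3 system.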

32The distance between two prefectures $o$ and $d$, $dist_{o,d}$, is measured as the distance between the centroids of $o$ and $d$.
33This is confirmed by Figure A5 in the Appendix, which displays the average migration share to each destination by distance from the origin.

Predicted migration flows. Finally, we combine predicted migration outflows (Equation 3) and the predicted probabilities of moving from each origin to each destination (Equation 4) to predict migration inflows into each urban destination. Formally, we compute:

$$\widetilde{M}_{d,t} = \sum_{o \neq d} \widetilde{M}_{o,t} \times \widehat{p}_{o,d}, \tag{5}$$

where $\widetilde{M}_{d,t}$ is the predicted migration inflow into destination $d$ in year $t$, $\widetilde{M}_{o,t}$ is the predicted migration outflow from origin $o$ in year $t$ and $\widehat{p}_{o,d}$ is the predicted probability that a migrant from $o$ goes to $d$. In order to prevent migration inflows from being correlated with destination outcomes, we exclude from $\widetilde{M}_{d,t}$ the immigration flows attributable to rural areas of prefecture $d$.

This two-stage process yields synthetic migration inflows into prefectures of destination that are exogenous with respect to destination outcomes. We first provide some intuition about the nature of these exogenous variations in Figures A6 (measure $\widetilde{M}_{d,t}$ as predicted by price variations) and A7 (measure $\widetilde{M}_{d,t}$ as predicted by rainfall variations). We report these measures cleaned of cross-sectional time-invariant factors in 2001 (left panels) and 2004 (right panels). As shown in Figure A6, there is some spatial auto-correlation in these measures, arising from the spatial auto-correlation of crop composition across prefectures and from the transformation of outflows into inflows, which involves distance between prefectures. There is also some auto-correlation across periods, as international prices exhibit persistence in their fluctuations. However, there are also large cross-sectional and time-varying fluctuations that we can use for our analysis. Figure A7 illustrates cross-sectional and time-varying fluctuations for the immigrant inflows measure as predicted by rainfall variations.

In order to test whether our migration predictions are accurate, we regress the actual migrant inflows observed in the mini-census data on the predicted immigrant inflows. Table 3 reports the correlation between actual and predicted migration rates. As columns 1 and 3 show, the relationship is strong, positive and significant with destination fixed effects. It remains so after adding year fixed effects (columns 2 and 4). The coefficient in both specifications is close to one. This suggests that, as expected and by construction, our instruments successfully predict variation in migration inflows between years for a given prefecture and across prefectures for a given year, even if they do not explain most of the total variation in migration rates. This baseline relationship between actual and exogenous variations in immigration rates will serve as a first stage in our analysis to estimate the impact of migration on urban labour markets and firm outcomes.

In a robustness check, we only keep migrants from different provinces and run a similar exercise as in Table 3 (see Table A6). The predictive power of the synthetic migration flows is not affected by the restriction to migration spells between provinces. This feature is important because it allows us to separate the potential direct effects of price or rainfall shocks on a province (through demand for non-tradable goods for instance) from the indirect effects through the arrival of workers.

We now turn to the second stage of our analysis, which estimates how rural-to-urban migration is absorbed by the urban modern sector.
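The construction of predicted inflows in equation (5) amounts to allocating each origin's predicted outflow across destinations with the predicted shares, leaving out the own-prefecture term. A minimal Python sketch on made-up inputs (the dictionary layout and names are assumptions):

```python
def predicted_inflows(outflows, shares):
    """Allocate predicted outflows to destinations, excluding own-prefecture flows.

    outflows: {(origin, year): predicted number of migrants leaving origin}
    shares:   {(origin, destination): predicted probability of going to destination}
    """
    inflows = {}
    for (origin, year), flow in outflows.items():
        for (share_origin, dest), prob in shares.items():
            # Skip other origins' shares and the own-prefecture term (o = d).
            if share_origin != origin or dest == origin:
                continue
            inflows[dest, year] = inflows.get((dest, year), 0.0) + flow * prob
    return inflows


# Made-up example with two prefectures, each being both an origin and a destination.
outflows = {("A", 2001): 100.0, ("B", 2001): 200.0}
shares = {("A", "A"): 0.5, ("A", "B"): 0.5, ("B", "A"): 0.25, ("B", "B"): 0.75}

inflows = predicted_inflows(outflows, shares)
print(inflows)
```

Here prefecture A only receives the migrants predicted to come from B, and vice versa, because flows from a prefecture's own rural areas are excluded.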

3.2 Migration flows and labour market outcomes

In order to estimate the effect of migration on urban labour market outcomes, we use employment and wage data from the Urban Household Survey.34 We estimate the impact of migration on the labour market outcomes of individual $i$ in destination $d$ in year $t$ by regressing each outcome, which we denote with $y_{i,d,t}$, on predicted migration that year, $\widetilde{M}_{d,t}$, and a vector of individual characteristics $X_i$.

The vector $X_i$ includes dummy variables for individual $i$’s marital status, gender, education level (primary, lower secondary, upper secondary and tertiary), and age (24-35, 35-44, 45-54 and 54-64). We also include seven occupation dummies in order to better control for workers’ skills.35 In order to control for labour market conditions at destination and aggregate fluctuations in labour market outcomes, we also include destination and year fixed effects.

34Since UHS does not cover all prefectures, but only a representative sample of 18 provinces and 207 prefectures, we checked that our predictions and actual migration rates are indeed well correlated within the UHS sample (results available upon request).
35UHS occupation categories are “Head of organization,” “Professional skill worker,” “Staff,” “Commercial and service worker,” “Agriculture,” “Production operator,” “Soldier” and “Other occupations”. Since occupation itself may be an outcome of migration, we check that our results are robust to excluding it from the vector of controls.

The effect of $M_{d,t}$ on $y_{i,d,t}$ is estimated through Two-Stage Least Squares (2SLS) with $\widetilde{M}_{d,t}$ as an instrument:

$$\begin{cases} M_{d,t} = b_0 + b_m \widetilde{M}_{d,t} + b_x X_i + e_d + n_t + e_{d,t} \\ y_{i,d,t} = \beta_0 + \beta_m M_{d,t} + \beta_x X_i + \eta_d + \nu_t + \varepsilon_i \end{cases} \tag{6}$$

and standard errors are clustered at the level of the prefecture of destination × year.36
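The logic of instrumenting actual migration with predicted migration can be illustrated with a deliberately simplified Python sketch. It works on a balanced destination-by-year panel, omits the individual controls and the clustering of standard errors, and uses the fact that with a single instrument the 2SLS slope equals the ratio of the instrument-outcome to the instrument-regressor cross products after fixed effects are removed. The data are synthetic: an unobserved demand shock raises both migration and the outcome, which biases OLS upwards, while the instrument is constructed to be orthogonal to that shock.

```python
def twoway_demean(x, units, periods):
    """Remove unit and period means from a balanced panel {(unit, period): value}."""
    unit_mean = {u: sum(x[u, p] for p in periods) / len(periods) for u in units}
    period_mean = {p: sum(x[u, p] for u in units) / len(units) for p in periods}
    grand = sum(unit_mean.values()) / len(units)
    return {(u, p): x[u, p] - unit_mean[u] - period_mean[p] + grand
            for u in units for p in periods}


def slope(y, x, z):
    """sum(z*y) / sum(z*x): the OLS slope when z is x, the just-identified IV slope otherwise."""
    keys = list(y)
    return sum(z[k] * y[k] for k in keys) / sum(z[k] * x[k] for k in keys)


units, periods = ["d1", "d2", "d3"], [1, 2, 3]
true_beta = -0.2

# Synthetic instrument (predicted inflows) and unobserved demand shock,
# built so that they are orthogonal after two-way demeaning.
z_rows = [[1, -1, 0], [-1, 1, 0], [0, 0, 0]]
u_rows = [[1, 1, -2], [1, 1, -2], [-2, -2, 4]]
unit_fe = {"d1": 0.5, "d2": -0.3, "d3": 0.1}
period_fe = {1: 0.0, 2: 0.2, 3: -0.1}

z, x, y = {}, {}, {}
for i, d in enumerate(units):
    for j, t in enumerate(periods):
        shock = u_rows[i][j]
        z[d, t] = z_rows[i][j]
        x[d, t] = 2 * z[d, t] + shock  # actual migration responds to the demand shock
        y[d, t] = true_beta * x[d, t] + 3 * shock + unit_fe[d] + period_fe[t]

z_dm, x_dm, y_dm = (twoway_demean(v, units, periods) for v in (z, x, y))
ols = slope(y_dm, x_dm, x_dm)
iv = slope(y_dm, x_dm, z_dm)
print(round(ols, 3), round(iv, 3))
```

In this construction the OLS slope is pulled upwards by the demand shock, while the IV slope recovers the coefficient used to generate the data.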

3.3 Migration flows and firm outcomes

We next turn to the estimation of the effect of migrant inflows on firm outcomes. One challenge with firm data is that some variables, e.g. size, are not stationary, and these differential trends would not be captured by firm fixed effects. We describe below our strategy for stationary variables, namely the wage rate and profitability (profits normalized by sales), and we describe in the appendix the empirical strategy for non-stationary variables. We take advantage of the panel structure of the data and implement a 2SLS-FE specification in which we regress the outcome of firm $j$ in year $t$ in urban prefecture $d$ on migration inflow in $d$, which we denote $M_{d,t}$, using predicted migration $\widetilde{M}_{d,t}$ as an instrument and including firm fixed effects $\eta_j$:

$$\begin{cases} M_{d,t} = b_0 + b_m \widetilde{M}_{d,t} + e_j + n_t + e_{d,t} \\ y_{j,d,t} = \beta_0 + \beta_m M_{d,t} + \eta_j + \nu_t + \varepsilon_{j,t} \end{cases} \tag{7}$$

with standard errors clustered at the level of the prefecture of destination × year.37

4 Results and discussion

In this section, we discuss our preliminary findings on the absorption of labour supply in urban centers. We first analyze the impact on urban labour markets, which helps identify the nature of the shock induced by immigrant inflows at destination. We then discuss preliminary findings on firm outcomes.

36Because the regressor of interest, the migration rate, is itself predicted, correct inference requires bootstrapping the first stage. The standard errors in the second stage are, however, correctly estimated through 2SLS.
37Because the regressor of interest, the migration rate, is itself predicted, correct inference requires bootstrapping the first stage. The standard errors in the second stage are, however, correctly estimated through 2SLS.

4.1 Effects of migration inflow on urban workers

In order to identify the shift in urban labour supply, we use repeated cross sections from the National Urban Household Survey and consider the effect of migration inflows on the labour market outcomes of locally registered urban residents aged 15 to 64. In this exercise, we ignore the existence of heterogeneity between migrants and “natives,” i.e. we assume that they are perfectly substitutable. As Table A3 in the Appendix shows, however, migrants are significantly less skilled than urban workers.38 For this reason, we also estimate the change in labour market outcomes for “natives” with primary education and lower secondary education only.

Table 4 presents our estimates of the effect of migration inflows on four outcomes: wages, wage employment, unemployment and self-employment of urban residents. The first column presents results from a simple OLS regression of each outcome on the actual immigration rate. The second and third columns present 2SLS estimations, using rainfall and price shocks, respectively, as instruments for migrant inflows.

We first consider the impact on urban wages. The OLS estimate is negative but small: a 1 p.p. increase in the immigration rate is associated with a 0.09% decrease in wages. The IV estimates are negative and larger in magnitude. This is what we would expect if migrants are attracted to cities that offer higher wages, in which case OLS estimates are biased upwards. Using rainfall and price shocks to predict migration, we find that a 1 p.p. higher immigration rate is associated with 0.17%-0.22% lower wages. The effects become larger when we focus our attention on urban residents with lower secondary education or less, who are more likely to compete for jobs with migrants: a 1 p.p. higher immigration rate is associated with a 0.17%-0.31% decrease in wages. Overall, these estimates suggest that, once purged of potential demand-driven fluctuations, an influx of rural migrants depresses urban wages.
38See section B.1 in the Appendix for a systematic comparison between rural migrants and urban residents.

Following Borjas (2003), we can recover the elasticity of urban wages with respect to migration by multiplying the coefficient by $\frac{1}{(1+m)^2}$, where $m$ is the ratio of migrants to natives. In our context, the migration rate is about 5%, hence $\frac{1}{(1+m)^2} \approx 0.91$. The implied wage elasticities from our estimates are between 0.15 and 0.28, which is lower than Borjas’s (2003) own estimates (0.3-0.4).

We next consider the effect of rural-to-urban migration on the status of active urban residents (wage employment, self-employment or unemployment). The OLS estimates are close to zero and mostly insignificant, as are the IV estimates using price shocks as instruments. The IV estimates using rainfall shocks, however, display a significant decrease in wage employment: a one percentage point increase in migration decreases wage employment of urban residents by 9 percentage points (the average participation in wage employment is above 90%). Correspondingly, unemployment and self-employment seem to increase (the effect on self-employment is not significant). These results provide some evidence that employers replace urban workers with rural migrants, leaving urban residents unemployed or leading them to become self-employed. However, these effects are not consistent across instrumentation strategies.

Overall, our results confirm that the arrival of migrants shifts labour supply downward. The estimated effect of migration on wages is relatively small compared to estimates from the literature on international migration into developed countries (Borjas, 2003) and from other studies on internal migration in developing countries which use a similar strategy (Boustan et al., 2010; El Badaoui et al., 2014; Imbert and Papp, 2014; Kleemans and Magruder, 2014). One reason behind such a pattern could be that the labour market for urban residents is regulated, with the existence of minimum wages, while the labour market for migrants is unregulated. The marginal labour cost may thus respond drastically to the arrival of migrants while the average labour cost (mostly driven by residents) remains quite high.
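The conversion from regression coefficients to wage elasticities discussed above is simple arithmetic and can be checked with a few lines of Python. The sketch assumes, as the reported range suggests, that the bounds 0.15 and 0.28 correspond to the smallest and largest IV wage coefficients quoted in the text (0.17 and 0.31, in absolute value) and to a migrant-to-native ratio of 5%.

```python
def implied_elasticity(coefficient, migrant_ratio):
    """Scale a wage coefficient by 1 / (1 + m)^2, where m is the migrant-to-native ratio."""
    return coefficient / (1.0 + migrant_ratio) ** 2


m = 0.05
scaling = 1.0 / (1.0 + m) ** 2  # roughly 0.907

# Absolute values of the smallest and largest IV wage coefficients quoted in the text.
low = implied_elasticity(0.17, m)
high = implied_elasticity(0.31, m)
print(round(scaling, 3), round(low, 2), round(high, 2))
```

Because the migrant ratio is small, the scaling factor is close to one and the elasticities stay close to the raw coefficients.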

4.2 Effects of migration inflow on manufacturing firms

We now turn to the firm side and analyze the impact of exogenous changes in migration inflows on labor costs and profitability. In Table 5, we estimate specification (7) on the subsample of firms present from 1999 to 2005. We look at two “stationary” outcome variables: wage rates, which are defined as the total wage bill divided by the total labour force, and profitability, which is equal to total profits (value added minus wage bill) divided by total revenue.39 We take the logarithm of both variables. In column 1, we report the correlation between these variables (at the end of period $t$) and migrant inflows during period $t$. In column 2 (resp. 3), we use our migration flows as predicted by the rainfall (resp. price) shocks to instrument actual movements from rural to urban areas. Note that firms relying on migrants may be selected in terms of unobservable characteristics. All regressions in Table 5 and the following tables thus include firm fixed effects to control for fixed firm-specific determinants of their reliance on migrant workers.

39We ignore here the ownership structure of the firm.

As shown in the top panels of Table 5, column 1, the correlation between firm-level wage rates and total migrant inflows is negative and significant. As expected given the lower education level of migrants (see Table A3), the effect is slightly larger in absolute value when one restricts the analysis to firms that rely heavily on unskilled labour, i.e. food manufacturing, beverage manufacturing, footwear, wood processing, and textile. To interpret the size of these correlations, note that the within standard deviation of migration flows is around .2, which implies that current migration flows higher by 1 within standard deviation would be associated with a small 0.03% decrease in wage rates. These results are however likely to be biased upwards (towards zero), as migrants tend to settle in high-wage destinations.

Columns 2 and 3 address this concern thanks to the instrumental variable strategy delineated in Section 3 and show a very different picture. Coefficients become much larger in absolute value when migration flows are purged of the endogeneity in migration decisions. A 1% higher immigration rate translates to 1.2% lower wage rates when using rainfall as a source of exogenous variation and 0.6% lower when we rely on price shocks. In standardised terms, a 1 within standard deviation increase in the immigration rate yields a .25% (resp., .13%) drop in wage rates using the rainfall- (resp., price-) based instrument. The range can be explained by two factors. First, the estimation based on rainfall tends to be noisier. Second, rainfall and price shocks identify different local average treatment effects (LATEs). Whereas a shortage of rainfall is likely to trigger distress migration, price fluctuations exhibit some serial correlation and might thus lead rural dwellers relying on agriculture for a living to update their expectations on returns to farming and engage in more planned migration. We would therefore expect a stronger immediate impact of immigration on urban labour markets when rainfall is exploited as a source of identification. Finally, we can note that the effects are larger in magnitude, albeit imprecisely estimated, when we focus on low-skill industries.
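The standardised effects quoted above are obtained by scaling each coefficient by the within standard deviation of migration flows. The short Python sketch below redoes this arithmetic with the rounded figures given in the text (coefficients of 1.2 and 0.6, within standard deviation of about .2). With these rounded inputs the results are close to, but not exactly, the reported .25% and .13%, presumably because the paper uses unrounded values.

```python
def standardised_effect(coefficient, within_sd):
    """Effect (in %) of a one within-standard-deviation increase in the immigration rate."""
    return coefficient * within_sd


within_sd = 0.2  # rounded value quoted in the text
rainfall_effect = standardised_effect(1.2, within_sd)  # rainfall-based IV coefficient
price_effect = standardised_effect(0.6, within_sd)     # price-based IV coefficient
print(round(rainfall_effect, 2), round(price_effect, 2))
```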
The bottom panels of Table 5 explore the effect of migrant inflows on firms’ profitability, for the whole sample first and then focusing on low-skill sectors. Profitability is positively and consistently affected by migration flows. The effect becomes positive and significant when we implement our instrumental variable strategy.40 The effect of a 1 within standard deviation increase in the immigration rate identified thanks to international prices (resp., rainfall) on firms’ profitability is a .2% (resp., 3%) rise in profitability for the whole sample. Firms that hire mostly low-skilled workers enjoy a larger positive effect of immigration: +.3% (resp., +.4%).

40Note that the OLS effect is lower than the coefficients on the instrumented migration flows and statistically indistinguishable from zero. One explanation is that opposite effects are at work: first, an influx of migrants enables firms to enhance their profits and grow; second, destinations experiencing migration flows have already experienced some economic growth with larger and more established firms than in other regions, thereby attracting migrants through higher posted wages.

We run a number of robustness checks to verify that our estimates are indeed capturing labour supply shifts induced by origin-driven fluctuations. All corresponding tables are in the Appendix. One concern could be that firms in cities rely on the provision of important crops as intermediate inputs, or are directly affected by world crop prices as final good producers (rice vinegar exporters for instance). One possible solution is to exploit the differential flows between products and migrants, with the latter moving much farther than the former (migration costs are paid once while transportation costs are paid continuously). Instead, we use the precise indicators of industries and perform two robustness checks. First, in order to control for potential shortages in crop provisions for some cities close to the fields, we exclude all firms potentially using one of our crops as an input. These consist of, among others, food exporters and part of the textile industry. We report the results of this analysis in Table A7. The results are virtually unchanged compared to Table 5. We also control for the direct effect of price and rainfall shocks in destination prefectures. These controls are built based on equations 1 and 2, respectively, and lagged in order to match the way migration shocks were created. The results, reported in Table A8, are imprecisely estimated but confirm our findings that immigration depresses wages and boosts firm profitability at destination.

Second, there may be some delay between the arrival of migrants and the resulting increase in firm factor use. Although this does not jeopardise our identification strategy or the interpretation of the results, we provide in Table A9 the effects of lagged shocks. Results are consistent with those for contemporaneous shocks.

The results of this section give some credit to our constructed migration flows: migration shifts labour supply to the right, thereby decreasing wages and raising the quantity of labour demanded. In the next section, we look more precisely at this effect and better identify which firms gain from the newly-available resources.

4.3 Reallocation of resources across firms [Work in progress.]

5 Conclusion

A key link in the chain of events between rural-to-urban migration and manufacturing growth is the impact of migration on urban labor markets and firms. This paper provides some of the first causal empirical evidence of this impact using Chinese data. We predict migrant inflows into urban areas based on shocks to agricultural incomes in rural origins and distance between prefectures of origin and destination. These predictions are exogenous with respect to urban workers’ and firms’ environments, which allows us to tackle the issue of migrants self-selecting into buoyant labour markets and to provide causal estimates of the effect of migration on urban outcomes. Using a representative survey of urban households, we find that migrant inflows from rural areas have a negative effect on urban dwellers’ wages and, to a lesser extent, employment. We next use a census of large firms and show that migration decreases labor costs and improves the profitability of manufacturing firms.

This new piece of evidence brings together two main features of Chinese development: massive internal migration and manufacturing growth despite severe credit constraints (Song et al., 2011). By keeping labor costs low, rural-to-urban migration may have allowed firms to accumulate larger profits, which were then reinvested to finance future growth.

References

ADB, “The Declining Share of Agricultural Employment in the People’s Republic of China: How Fast?,” ADB Economics Working Paper Series 419, Asian Development Bank (ADB) November 2014.

Alvarez-Cuadrado, Francisco and Markus Poschke, “Structural Change Out of Agriculture: Labor Push versus Labor Pull,” American Economic Journal: Macroeconomics, July 2011, 3 (3), 127–58.

Au, Chun-Chung and J. Vernon Henderson, “How migration restrictions limit agglomeration and productivity in China,” Journal of Development Economics, 2006, 80, 350–88.

Au, Chun-Chung and J. Vernon Henderson, “A numerical simulation analysis of (Hukou) labour mobility restrictions in China,” Journal of Development Economics, 2007, 83, 392–410.

Borjas, George J., “The Labor Demand Curve Is Downward Sloping: Reexamining The Impact Of Immigration On The Labor Market,” The Quarterly Journal of Economics, November 2003, 118 (4), 1335–1374.

Bosker, Maarten, Steven Brakman, Harry Garretsen, and Marc Schramm, “Relaxing Hukou: Increased labor mobility and China’s economic geography,” Journal of Urban Economics, 2012, 72, 252–66.

Boustan, Leah Platt, Price V. Fishback, and Shawn Kantor, “The Effect of Internal Migration on Local Labor Markets: American Cities during the Great Depression,” Journal of Labor Economics, October 2010, 28 (4), 719–746.

Brandt, Loren, Johannes Van Biesebroeck, and Yifan Zhang, “Challenges of working with the Chinese NBS firm-level data,” China Economic Review, 2014, 30 (C), 339–352.

Bustos, Paula, Bruno Caprettini, and Jacopo Ponticelli, “Agricultural productivity and structural transformation: evidence from Brazil,” 2015. Forthcoming American Economic Review.

Chan, Kam Wing, “Misconceptions and Complexities in the Study of China’s Cities: Definitions, Statistics, and Implications,” Eurasian Geography and Economics, 2007, 48 (4), 383–412.

Chan, Kam Wing, “Migrant and development in China: trends, geography and current issues,” Migration and Development, 2012, 1 (2), 187–205.

Combes, Pierre-Philippe, Sylvie Demurger, and Shi Li, “Migration externalities in Chinese cities,” European Economic Review, 2015, 76 (C), 152–167.

Dustmann, Christian and Albrecht Glitz, “How Do Industries and Firms Respond to Changes in Local Labor Supply?,” Journal of Labor Economics, 2015, 33 (3), 711–750.

El Badaoui, Eliane, Eric Strobl, and Frank Walsh, “The Impact of Internal Migration on Local Labour Markets in Thailand,” EconomiX Working Papers 2014-12, University of Paris West - Nanterre la Défense, EconomiX 2014.

Evans, David S, “Tests of alternative theories of firm growth,” The Journal of Political Economy, 1987, pp. 657–674.

Facchini, Giovanni, Maggie Y. Liu, Anna Maria Mayda, and Minghai Zhou, “The impact of China’s WTO accession on internal migration,” December 2015.

Fan, Cindy C., China on the Move, Routledge, 2008.

Feng, Shuaizhang, Michael Oppenheimer, and Wolfram Schlenker, “Weather Anomalies, Crop Yields, and Migration in the US Corn Belt,” March 2015a.

Feng, Shuaizhang, Yingyao Hu, and Robert Moffit, “Long Run Trends in Unemployment and Labor Force Participation in China,” August 2015b.

Ge, Suqin and Dennis Tao Yang, “Changes In China’s Wage Structure,” Journal of the European Economic Association, 04 2014, 12 (2), 300–336.

Giesing, Yvonne and Nadzeya Laurentsyeva, “Brain Drain and Firm Productivity: Evidence from the Sequential Opening of EU Labour Markets,” 2015. Unpublished.

Gollin, Douglas, Stephen Parente, and Richard Rogerson, “The Role of Agriculture in Development,” American Economic Review, May 2002, 92 (2), 160–164.

Harris, John R. and Michael P. Todaro, “Migration, Unemployment and Development: A Two-Sector Analysis,” The American Economic Review, 1970, 60 (1), 126–142.

Herrendorf, Berthold, Christopher Herrington, and Akos Valentinyi, “Sectoral Technology and Structural Transformation,” Technical Report 9386, C.E.P.R. Discussion Papers March 2013.

Imbert, Clement and John Papp, “Short-term Migration and Rural Workfare Programs: Evidence from India,” 2014. Manuscript.

Kerr, Sari Pekkala, William R. Kerr, and William F. Lincoln, “Skilled Immigration and the Employment Structures of US Firms,” Journal of Labor Economics, 2015, 33 (S1), S147–S186.

Kleemans, Marieke and Jeremy Magruder, “Labor Market Changes in Response to Immigration: Evidence from Internal Migration Driven by Weather Shocks in Indonesia,” 2014. Manuscript.

Kuznets, Simon, Agriculture in Economic Development, McGraw-Hill Book Company, 1964.

Lewis, Arthur, “Economic Development with Unlimited Supplies of Labour,” The Manchester School, 1954, 22 (2), 139–191.

Marden, Sam, “The agricultural roots of industrial development: ‘forward linkages’ in reform era China,” 2015. Manuscript.

Mayneris, Florian, Sandra Poncet, and Tao Zhang, “The cleansing effect of minimum wages. Minimum wages, firm dynamics and aggregate productivity in China,” Core Discussion Papers 2014/44 October 2014.

Meng, Xin and Dandan Zhang, “Labour Market Impact of Large Scale Internal Migration on Chinese Urban ‘Native’ Workers,” IZA Discussion Papers 5288 October 2010.

Park, Albert, China Urbanizes: Consequences, Strategies, and Policies, The World Bank, 2008.

Peri, Giovanni, “The Effect Of Immigration On Productivity: Evidence From U.S. States,” The Review of Economics and Statistics, February 2012, 94 (1), 348–358.

Song, Zheng, Kjetil Storesletten, and Fabrizio Zilibotti, “Growing Like China,” American Economic Review, 2011, 101 (1), 196–233.

Sousa, Jose De and Sandra Poncet, “How are wages set in Beijing?,” Regional Science and Urban Economics, 2011, 41 (1), 9–19.

Tombe, Trevor and Xiaodong Zhu, “Trade, Migration and Productivity: A Quantitative Analysis of China,” June 2015.

United Nations, “Trends in International Migrant Stock: The 2015 revision,” Technical Report, United Nations, Department of Economics and Social Affairs 2015.

Vendryes, Thomas, “Migration constraints and development: Hukou and capital accumulation in China,” China Economic Review, 2011, 22 (4), 669–692.

A Figures and tables

Figure 1. Value of agricultural portfolio at origin and outmigration rates.

Notes: This figure illustrates the relationship between the standardized value of the prefecture-specific agricultural portfolio as predicted by international prices (x-axis) and outmigration (y-axis). We consider the residuals of all measures after removing prefecture and year fixed effects. For the sake of exposition, we group prefecture×year observations into 100 bins of observations with similar price shocks and plot the average outmigration rate within each bin. The lines are locally weighted regressions on all observations.
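The construction described in these notes can be sketched as follows. This is a minimal illustration on simulated data, not the code used for the figure: the function names, the panel dimensions and the slope of -0.02 are all hypothetical, and the locally weighted regression lines are omitted.

```python
import numpy as np

def residualize_two_way(values, unit_ids, year_ids, n_iter=50):
    """Sweep out unit and year means by alternating demeaning.

    unit_ids and year_ids are assumed to be non-negative integer codes.
    """
    resid = np.asarray(values, dtype=float).copy()
    for _ in range(n_iter):
        for ids in (np.asarray(unit_ids), np.asarray(year_ids)):
            sums = np.bincount(ids, weights=resid)
            counts = np.maximum(np.bincount(ids), 1)
            resid -= (sums / counts)[ids]
    return resid

def binned_means(x, y, n_bins=100):
    """Sort observations by x, cut them into n_bins groups, average x and y per group.

    Requires at least n_bins observations so that no bin is empty.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    x_bins = np.array_split(x[order], n_bins)
    y_bins = np.array_split(y[order], n_bins)
    return (np.array([b.mean() for b in x_bins]),
            np.array([b.mean() for b in y_bins]))

# Simulated prefecture x year panel (all numbers are hypothetical).
rng = np.random.default_rng(0)
n_pref, n_years = 50, 6
pref = np.repeat(np.arange(n_pref), n_years)
year = np.tile(np.arange(n_years), n_pref)
shock = rng.normal(size=pref.size)
outmig = (-0.02 * shock
          + 0.01 * rng.normal(size=n_pref)[pref]     # prefecture effect
          + 0.01 * rng.normal(size=n_years)[year]    # year effect
          + 0.001 * rng.normal(size=pref.size))      # idiosyncratic noise

shock_res = residualize_two_way(shock, pref, year)
outmig_res = residualize_two_way(outmig, pref, year)
bin_x, bin_y = binned_means(shock_res, outmig_res, n_bins=100)
slope = np.polyfit(shock_res, outmig_res, 1)[0]
```

Plotting `bin_y` against `bin_x` gives the binned scatter; the slope of the residualized relationship equals the coefficient from a regression with prefecture and year fixed effects.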

Figure 2. Rainfall deficits relative to water requirements at origin and outmigration rates.

Notes: This figure illustrates the relationship between the standardized rainfall deficit relative to water requirements for the origin-specific agricultural portfolio (x-axis) and outmigration (y-axis). We consider the residuals of all measures after removing prefecture and year fixed effects. For the sake of exposition, we group prefecture×year observations into 100 bins of observations with similar rainfall shocks and plot the average outmigration rate within each bin. The lines are locally weighted regressions on all observations.

Figure 3. Potential output in China for rice and cotton (1990).

(a) Paddy rice. (b) Cotton.

Notes: These two maps represent the potential output constructed with 1990 harvested areas and potential yield (GAEZ model) in 1990 for 2 common crops in China, i.e. paddy rice (left panel), and cotton (right panel).

Figure 4. Evolution of migration rates between 1999 and 2005.

Sources: 2005 Mini-Census.

Table 1. Migration flows and price/rainfall shocks (2000-2005).

Specification (3)

Migration outflows              (1)          (2)          (3)          (4)          (5)          (6)

Price Lags p[t−2,t−1]           -0.0249***   -0.0196***                             -0.0213***   -0.0143**
                                (0.0036)     (0.0059)                               (0.0037)     (0.0060)
Price Forwards p[t+1,t+2]                    0.00889                                             0.0130*
                                             (0.0079)                                            (0.0074)
Rainfall Lags r[t−2,t−1]                                  0.0617***    0.0623***    0.0518***    0.0552***
                                                          (0.0069)     (0.0068)     (0.0067)     (0.0064)
Rainfall Forwards r[t+1,t+2]                                           -0.00629                  -0.0177**
                                                                       (0.0086)                  (0.0088)

Observations                    2,022        2,022        2,022        2,022        2,022        2,022
R-squared                       0.807        0.808        0.807        0.807        0.811        0.812
Origin FE                       Yes          Yes          Yes          Yes          Yes          Yes
Year FE                         Yes          Yes          Yes          Yes          Yes          Yes

Robust standard errors are reported between parentheses. The unit of observation is an origin×year and the regression is weighted by origin rural population in 1990. Migration outflows are yearly outflows normalized by the prefecture’s rural population in 2005. Price (resp. Rainfall) Lags are defined as the average normalized price deviations (resp. rainfall deficits) in periods t − 1 and t − 2. Price (resp. Rainfall) Forwards are defined as the average normalized price deviations (resp. rainfall deficits) in periods t + 1 and t + 2. See section 2 for a complete description of the price and rainfall deficit construction.

Table 2. Distance and migration flows between origins and destinations (2000-2005).

Specification (4)

Migration flows (share)                               (1)           (2)           (3)

Distance $d_{o,d}$ (1,000 km)                         -0.0116***    -0.0449***
                                                      (0.000539)    (0.00286)
Squared distance $d_{o,d}^2$                                        1.04e-08***
                                                                    (8.50e-10)
Inverse distance $1/d_{o,d}$                                                      9.424***
                                                                                  (0.757)
Destination population (1,000), 1990 $Pop_{d,1990}$   0.943***      0.956***      0.949***
                                                      (0.0557)      (0.0552)      (0.0546)

Observations                                          116,622       116,622       116,622
R-squared                                             0.206         0.231         0.255
Origin FE                                             Yes           Yes           Yes

Robust standard errors are reported between parentheses. The unit of observation is an origin×destination×year. Migration flows (share) are the number of migrants going from origin o to destination d normalized by the total number of migrants from origin o. For the sake of exposition, we normalize distance $d_{o,d}$ and destination population $Pop_{d,1990}$ by 1,000.

Table 3. Comparison of actual and predicted immigration rate in urban areas (2000-2005).

                          (1)          (2)          (3)          (4)

Prediction - rainfall     1.328***     0.914***
                          (0.334)      (0.241)
Prediction - price                                  0.757***     0.911***
                                                    (0.246)      (0.224)

Observations              2,028        2,028        2,028        2,028
R-squared                 0.812        0.875        0.813        0.879
Year FE                   No           Yes          No           Yes
Destination FE            Yes          Yes          Yes          Yes

Standard errors are clustered at the destination level and are reported between parentheses. *** p<0.01, ** p<0.05, * p<0.1. An observation is a destination×year. The immigration rate is the number of agricultural hukou holders from all origin prefectures who went to a destination prefecture d in a given year divided by population at destination. The independent variables correspond to $\widehat{M}_{d,t}$ as defined in equation 5. Regressions are weighted by total urban adult population at destination.

Table 4. Effect of migration flows on wages earned by urban residents and unemployment probability.

Effect of migration inflows on ...    OLS          2SLS: rainfall    2SLS: price
                                      (1)          (2)               (3)

Real monthly wages                    -0.090***    -0.224*           -0.170*
                                      (0.023)      (0.131)           (0.0906)
                                      [191,394]    [190,989]         [191,394]

Real monthly wages (low skill)        -0.081***    -0.306*           -0.169**
                                      (0.016)      (0.166)           (0.0855)
                                      [48,375]     [48,375]          [48,375]

Wage Employment                       -0.0063      -0.0991*          0.000309
                                      (0.0066)     (0.0590)          (0.0114)
                                      [212,197]    [212,197]         [212,197]

Wage Employment (low skill)           -0.0072      -0.177*           0.0040
                                      (0.014)      (0.102)           (0.018)
                                      [58,045]     [58,045]          [58,045]

Unemployment                          0.0081*      0.0408**          -0.0081
                                      (0.0044)     (0.0169)          (0.0116)
                                      [212,197]    [212,197]         [212,197]

Unemployment (low skill)              0.0089***    0.0318**          -0.0032
                                      (0.0032)     (0.0137)          (0.0091)
                                      [58,045]     [58,045]          [58,045]

Self-Employment                       -0.0018      0.0583            0.0078
                                      (0.0027)     (0.0451)          (0.0119)
                                      [212,197]    [212,197]         [212,197]

Self-Employment (low skill)           -0.0017      0.146             -0.0008
                                      (0.0112)     (0.0923)          (0.0202)
                                      [58,045]     [58,045]          [58,045]

Prefecture and Year FE                Yes          Yes               Yes

Standard errors are clustered at the prefecture/year level. The unit of observation is an individual. In the first two panels, the dependent variable is the log of wages deflated using a consumer price index computed by the authors using the UHS data. In the next six panels, the dependent variables are dummies that take the value one if the individual works for a wage, is unemployed or is self-employed. See section 3 for a complete description of the price- and rainfall-related migration flows. All specifications include characteristics of the resident population (proportions by marital status, gender, age group, education level, rural registration, and firm ownership for the wage specifications) and log adult population, as well as year and prefecture fixed effects. The first stages are reported in Table A6.

Table 5. Effect of migration flows on wages and profitability using firm data.

Effect of migration inflows on ...    OLS          2SLS: rainfall    2SLS: price
                                      (1)          (2)               (3)

Wages                                 -0.168***    -1.221*           -0.614**
                                      (0.0518)     (0.706)           (0.255)
                                      [327,070]    [327,070]         [327,070]

Wages (low skill)                     -0.185***    -1.452            -0.707
                                      (0.0546)     (1.515)           (0.645)
                                      [179,984]    [179,984]         [179,984]

Profitability                         -0.134       1.541*            0.890***
                                      (0.0978)     (0.865)           (0.330)
                                      [303,957]    [303,957]         [303,957]

Profitability (low skill)             -0.0370      1.990*            1.408***
                                      (0.0813)     (1.160)           (0.470)
                                      [167,829]    [167,829]         [167,829]

Prefecture and Year FE                Yes          Yes               Yes

Standard errors are clustered at the prefecture/year level. The unit of observation is a firm×year. In the top two panels, the dependent variable is the log of total wage bill divided by the number of employees. In the bottom two panels, the dependent variable is the log of profits divided by revenues. See section 3 for a complete description of the price- and rainfall-related migration flows. The first stages are reported in Table A6. Low skill indicates firms in sectors employing mostly low-skill workers (i.e. food manufacturing, beverage manufacturing, footwear, wood processing, and textile).

A Additional tables and figures

Figure A1. Share of return migrants by age.

Sources: 2005 Mini-Census.

Figure A2. Share of step migrants as a function of age and time since departure.

Sources: 2005 Mini-Census.

Figure A3. Price deviations from trends on International Commodity Markets 1998-2010 (blue: banana, red: rice, teal: groundnut).

Note: These series represent the residual from a Hodrick-Prescott filter applied to the logarithm of international commodity prices for three commodities: banana, rice and groundnut. For instance, the price of rice can be interpreted as being 35% below its long-term value in 2001.
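For readers unfamiliar with the filter, the sketch below shows one standard way to compute a Hodrick-Prescott residual on a log price series. It is illustrative only: the price series is made up, and the smoothing parameter (here 100, a common choice for annual data) is an assumption, since the value used for the figure is not stated in these notes.

```python
import numpy as np

def hp_filter(series, lam):
    """Return (trend, cycle) from the Hodrick-Prescott filter.

    The trend minimizes sum((y - trend)^2) + lam * sum((second difference of trend)^2),
    whose closed-form solution is trend = (I + lam * D'D)^{-1} y.
    """
    y = np.asarray(series, dtype=float)
    n = y.size
    d = np.zeros((n - 2, n))            # second-difference operator
    for i in range(n - 2):
        d[i, i:i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lam * d.T @ d, y)
    return trend, y - trend

# Hypothetical annual price index for one commodity, 1998-2010 (13 observations).
prices = np.array([100.0, 96.0, 90.0, 70.0, 85.0, 95.0, 110.0,
                   120.0, 150.0, 180.0, 140.0, 160.0, 170.0])
trend, cycle = hp_filter(np.log(prices), lam=100.0)

# A purely linear series has no cyclical component.
_, linear_cycle = hp_filter(np.arange(10.0), lam=100.0)
```

Because the filter is applied to logarithms, the residual is measured in log points: a residual of -0.35 corresponds to a price that is exp(-0.35) - 1, or roughly 30%, below trend in exact terms, which is approximated by 35% for small deviations.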

Figure A4. Price and rainfall shocks across Chinese prefectures in 1999/2000.

(a) Price shock. (b) Rainfall shock.

Notes: These two maps represent the standardized price shock $p_{o,t}$ in 1999/2000 (left panel), and the standardized rainfall shock $r_{o,t}$ in 1999/2000 (right panel). Note that 1999/2000 corresponds to a pre-crisis period: in 2001, the price of rice decreases, which generates a large negative shock across China, concentrated in rice-producing prefectures.

Figure A5. Origin-destination migration predictions: the role of distance.

Notes: Migration flows constructed with census data (2000-2005).

Figure A6. Measure $\widehat{M}_{d,t}$ of immigrant inflows to cities as predicted by prices in 2001 and 2004.

(a) 2001 (b) 2004

Notes: These two maps represent the quantities $\widehat{M}_{d,2001}$ and $\widehat{M}_{d,2004}$, where $\widehat{M}_{d,t}$ is the measure of immigrant inflows as predicted by price variations and the weighting distance matrix between origins and destinations.

Figure A7. Measure $\widehat{M}_{d,t}$ of immigrant inflows to cities as predicted by rainfall in 2001 and 2004.

(a) 2001 (b) 2004

Notes: These two maps represent the quantities $\widehat{M}_{d,2001}$ and $\widehat{M}_{d,2004}$, where $\widehat{M}_{d,t}$ is the measure of immigrant inflows as predicted by rainfall variations and the weighting distance matrix between origins and destinations.

Figure A8. Evolution of the share of private firms in the industrial sector.

Sources: 1998-2007 NBS above-scale firm data.

Table A1. Descriptive statistics from the UHS data (2002-2008).

                                      Mean        St. Dev.

Age                                   43.17       11.00
Female                                0.50        0.50
Married                               0.88        0.33
Born in prefecture of residence       0.61        0.49

Education:
Primary education                     0.05        0.21
Lower secondary                       0.27        0.45
Higher secondary                      0.25        0.43
Tertiary education                    0.42        0.49

Unemployed                            0.02        0.14
Self-employed/Firm owner              0.05        0.23
Employee                              0.71        0.45
Public sector                         0.63        0.48
Private sector                        0.37        0.48
Total monthly income (RMB)            1537.52     1416.81
Monthly wage income (RMB)             1353.36     1264.84
Monthly transfer income (RMB)         56.71       287.76

Industry:
Agriculture                           0.01        0.10
Mining                                0.02        0.14
Manufacturing                         0.22        0.42
Utilities                             0.03        0.18
Construction                          0.03        0.17
Transportation                        0.06        0.24
Information transfer, etc.            0.04        0.18
Wholesale and retail trade            0.12        0.33
Accommodation and catering            0.03        0.16
Finance                               0.02        0.15
Real estate                           0.04        0.19
Leasing and commercial services       0.02        0.15
Scientific research                   0.03        0.18
Public facilities                     0.01        0.11
Resident services                     0.10        0.30
Education                             0.06        0.23
Health care                           0.03        0.18
Culture and entertainment             0.01        0.11
Public administration                 0.10        0.30

Obs.
2002                                  54,564
2003                                  62,194
2004                                  65,806
2005                                  77,976
2006                                  70,853
2007                                  75,539
2008                                  76,874

All variables except Age and Income are dummy-coded. The table displays averages over the period 2002-2008. The sample is restricted to locally registered urban hukou holders aged 15-64.

Table A2. Descriptive statistics from the 2005 mini-census (1/2).

                                          Count      Share of total resident    Std. Dev.
                                                     urban population

Rural migrants from another province      94,326     0.15                       0.36
Rural migrants from another prefecture    122,756    0.19                       0.40

                                          Count      Percent    Cumulative Percent

Reason for moving
Work or business                          100,670    82.01      82.01
Follow relatives                          6,474      5.27       87.28
Marriage                                  5,783      4.71       91.99
Support from relatives/friends            4,461      3.63       95.62
Education and training                    1,367      1.11       96.73
Expropriation and relocation              603        0.49       97.22
Job transfer                              522        0.43       97.65
Mission                                   498        0.41       98.06
Recruitment                               158        0.13       98.19
Deposit household registration demand     142        0.12       98.31
Other                                     1,956      1.59       99.90
Missing                                   122        0.10       100.00

                                          Count      Percent    Cumulative Percent

Starting year of last migration spell
2005                                      25,968     21.18      21.18
2004                                      24,917     20.32      41.50
2003                                      17,893     14.59      56.09
2002                                      11,110     9.06       65.15
2001                                      7,468      6.09       71.24
2000                                      7,325      5.97       77.21
1999 or before                            27,954     22.79      100.00

“Rural migrants” are defined as inter-prefectural migrants with an agricultural hukou aged 15-64. “Total resident urban population” refers to the population in the prefecture that is either locally registered and holds a non-agricultural hukou or resides in the prefecture but holds an agricultural hukou from another prefecture. The sample in the middle and bottom panels is restricted to inter-prefectural rural migrants.

Table A3. Descriptive statistics from the 2005 mini-census (2/2).

                                      Rural-urban    Local urban    Difference    p-value
                                      migrants       hukou

Age                                   30.22          38.54          -8.32*        0.000
Female                                0.49           0.49           -0.00*        0.009
Married                               0.64           0.76           -0.12*        0.000

Education:
Literate                              0.97           0.99           -0.02*        0.000
Primary education                     0.20           0.08           0.12*         0.000
Lower secondary                       0.60           0.33           0.27*         0.000
Higher secondary                      0.14           0.33           -0.19*        0.000
Tertiary education                    0.02           0.24           -0.22*        0.000

Unemployed                            0.02           0.09           -0.07*        0.000
Self-employed/Firm-owner              0.20           0.16           0.04*         0.000
Employee                              0.77           0.81           -0.04*        0.000
Employee w/o labour contract          0.48           0.29           0.18*         0.000
Public sector                         0.11           0.72           -0.61*        0.000
Private sector                        0.89           0.28           0.61*         0.000
Total monthly income (RMB)            961.84         1157.07        -195.24*      0.000

Industry:
Agriculture                           0.05           0.06           -0.01*        0.000
Mining                                0.01           0.03           -0.02*        0.000
Manufacturing                         0.51           0.20           0.31*         0.000
Utilities                             0.00           0.03           -0.03*        0.000
Construction                          0.09           0.04           0.05*         0.000
Transportation                        0.03           0.08           -0.05*        0.000
Information transfer, etc.            0.00           0.01           -0.01*        0.000
Wholesale and retail trade            0.15           0.14           0.00          0.078
Accommodation and catering            0.06           0.04           0.03*         0.000
Finance                               0.00           0.03           -0.03*        0.000
Real estate                           0.01           0.01           -0.01*        0.000
Leasing and commercial services       0.01           0.02           -0.01*        0.000
Scientific research                   0.00           0.01           -0.01*        0.000
Public facilities                     0.00           0.01           -0.01*        0.000
Resident services                     0.05           0.03           0.02*         0.000
Education                             0.00           0.10           -0.10*        0.000
Health care                           0.00           0.04           -0.04*        0.000
Culture and entertainment             0.01           0.01           -0.01*        0.000
Public administration                 0.00           0.11           -0.10*        0.000
International organisations           0.00           0.00           0.00          0.200

Obs.                                  122,756        509,817

All variables except Age and Income are dummy-coded. Only the income of individuals who reported having a job is considered. The sample is restricted to individuals aged 15-64. * p<0.01

Table A4. Descriptive statistics from the NBS firm-level data.

                              Public sector    Domestic          Foreign
                                               private sector    private sector

Real capital stock            37539.69         20346.01          47592.38
Sales revenue                 63149.08         71267.68          167520.80
Value added                   18470.79         17106.11          40216.00
Total wage bill               3695.91          2938.08           6613.63
Total number of employees     340.20           216.93            318.76

All variables except “Total number of employees” are in RMB 1,000. The table displays yearly averages over the period 1998-2007.

Table A5. Correlation between crop international prices and local Chinese prices/production.

VARIABLES                 Prices       Output

Price (International)     .402***      .201**
                          (.0861)      (0.0623)
Price (China)                          .0824*
                                       (.0432)

Observations              210          210
R-squared                 .579         .337
Trends                    Yes          Yes

Robust standard errors are reported between parentheses. The unit of observation is a crop×year. The two regressions include time trends and are weighted by the average crop production share over the period 1991-2010. The dependent and main explanatory variables are in logs.

Table A6. Comparison of actual and predicted immigration rate in urban areas (robustness check without intra-province migration spells, 2000-2005).

                          (1)          (2)          (3)          (4)

Prediction - rainfall     0.889***     0.735***
                          (0.324)      (0.269)
Prediction - price                                  0.547**      0.988***
                                                    (0.244)      (0.275)

Observations              2,028        2,028        2,028        2,028
R-squared                 0.807        0.861        0.807        0.863
Year FE                   No           Yes          No           Yes
Destination FE            Yes          Yes          Yes          Yes

Standard errors are clustered at the destination level and are reported between parentheses. *** p<0.01, ** p<0.05, * p<0.1. An observation is a destination×year. The immigration rate is the number of agricultural hukou holders from all origin prefectures who went to a destination prefecture d in a given year divided by population at destination. The independent variables correspond to $\widehat{M}_{d,t}$ as defined in equation 5. Regressions are weighted by total urban adult population at destination.

Table A7. Effect of migration flows on wages and profitability using firm data – robustness check: industries linked with agriculture.

Effect of migration inflows on ...    OLS          2SLS: rainfall    2SLS: price
                                      (1)          (2)               (3)

Wages                                 -0.162***    -1.348*           -0.663**
                                      (0.0512)     (0.744)           (0.262)
                                      [293,385]    [293,385]         [293,385]

Profitability                         -0.133       1.256             0.718**
                                      (0.0979)     (0.776)           (0.305)
                                      [272,361]    [272,361]         [272,361]

Prefecture and Year FE                Yes          Yes               Yes

Standard errors are reported between parentheses and clustered at the prefecture×year level. The unit of observation is a firm in a given year. In the top panel, the dependent variable is the log of total wage bill divided by the number of employees. In the bottom panel, the dependent variable is the log of profits divided by revenues. See section 3 for a complete description of the price- and rainfall-related migration flows.

Table A8. Effect of migration flows on wages and profitability using firm data – robustness check: controlling for shocks in the prefecture of destination.

Effect of migration inflows on ...    OLS          2SLS: rainfall    2SLS: price
                                      (1)          (2)               (3)

Wages                                 -0.139***    -1.264            -0.473
                                      (0.0465)     (0.979)           (0.296)
                                      [326,367]    [326,367]         [326,367]

Profitability                         -0.172*      1.890             0.959**
                                      (0.101)      (1.466)           (0.461)
                                      [303,436]    [303,436]         [303,436]

Prefecture and Year FE                Yes          Yes               Yes

Standard errors are reported between parentheses and clustered at the prefecture×year level. The unit of observation is a firm in a given year. In the top panel, the dependent variable is the log of total wage bill divided by the number of employees. In the bottom panel, the dependent variable is the log of profits divided by revenues. See section 3 for a complete description of the price- and rainfall-related migration flows.

Table A9. Effect of migration flows on wages and profitability using firm data – robustness check: lagged shocks.

Effect of lagged migration inflows on ...    OLS          2SLS: rainfall    2SLS: price
                                             (1)          (2)               (3)

Wages                                        -0.111***    0.335             -0.629***
                                             (0.0383)     (0.577)           (0.243)
                                             [323,730]    [273,390]         [273,390]

Profitability                                -0.234*      2.058             1.005**
                                             (0.120)      (1.769)           (0.409)
                                             [299,422]    [252,694]         [252,694]

Prefecture and Year FE                       Yes          Yes               Yes

Standard errors are reported between parentheses and clustered at the prefecture×year level. The unit of observation is a firm in a given year. In the top panel, the dependent variable is the log of total wage bill divided by the number of employees. In the bottom panel, the dependent variable is the log of profits divided by revenues. See section 3 for a complete description of the price- and rainfall-related migration flows.

B Data description

B.1 Migration flows and census

In this section, we provide some descriptive statistics about migrants and migration flows.

Patterns of migration in the mini-census. Table A2 displays the shares of rural-to-urban migrants in the total urban population of prefectures. We define rural-to-urban migrants as agricultural hukou holders who crossed a prefecture boundary and belong to working-age cohorts (15-64).41 The upper panel of Table A2 distinguishes between inter-prefectural migrants and those who left their provinces of origin. Inter-prefectural migrants represented 19% of a prefecture’s total number of urban residents on average in 2005, while inter-provincial migrants accounted for 15% of it, which reveals that a majority (77%) of inter-prefectural migrations involve crossing a provincial boundary. The middle panel presents the reasons put forward by inter-prefectural agricultural hukou migrants for leaving their places of registration. A vast majority (82%) moved away in order to seek work (“Work or business”), mostly as labourers, while all other rationales attracted much smaller shares.42 When we look at the starting year of the last migration spell for these migrants (lower panel), we see that most inter-prefectural migrants (56.09%) arrived in the three years up to and including the survey year (2003-2005), illustrating the acceleration of migration in the early 2000s and potentially the selection bias generated by return migration.43 We now investigate the extent to which return migration and step migration affect our description of migration flows.

41 Although data are not available, it is clear from the literature that rural-to-rural migration represents a small share of outmigration from rural areas, not least because most of it is explained by marriages, which usually confer the right to local registration (Fan, 2008; Chan, 2012). Only 4.7% of agricultural hukou inter-prefectural migrants in the 2005 mini-census reported having left their place of registration to live with their spouses after marriage.
42 The only other reasons that display shares in excess of 1% are “Education and training,” “Other,” “Live with/Seek refuge from relatives or friends,” which Fan (2008), based on metadata from the Population Census Office, dubs “Migration to seek the support of relatives or friends,” “Following relatives,” which should be understood as “Family members following the job transfer of cadres and workers” (ibid.), and “Marriage”.
43 Data on return migration are scarce. Chan (2012) highlights a “noticeable, though still small, but increasing amount of outmigration” from provinces that have been migration magnets since the early 2000s.

Return and step migration in the mini-census. In this paper, we construct annual migration flows between each prefecture of origin and destination by combining information on the current place of residence (the destination), the place of registration (the origin) and the year in which the migrant left her place of registration. We implicitly assume that all migrants who left the origin in year Y have reached the destination that same year and stayed there. As discussed in section 2, we may underestimate migration flows in year Y if some of the migrants who left in year Y have gone back to their place of origin before the census (return migration). We may also be mistakenly assigning the arrival of a migrant to year Y if, instead of going directly to the destination, she stopped on the way and only arrived some years later (step migration).

In order to measure return and step migration, we use the information from the 2005 census about the province of residence in 2004 and 2000. Unfortunately, the census does not report the prefecture of residence in 2004 and 2000. However, as shown in Figure 4, a majority of rural-to-urban migrants go beyond province borders.

We first consider the extent of return migration. Among all migrants from rural areas who lived in their province of registration in 2000 and who lived in another province in 2004, we compute the fraction that had returned to their province of registration by 2005. As Figure A1 shows, this share is not negligible: in a given year, between 4 and 6% of rural migrants who have left their province of registration in the last six years go back a year later. This fraction is higher for older migrants. Return migration is hence an important phenomenon, which leads us to underestimate true migration flows and the effect of shocks on out-migration.

We next study the importance of step-migration. Among all migrants who lived in their province of registration in 2000 and are living in another province in 2005, we compute the fraction that lived in yet another province in 2004. As Figure A2 shows, only a minority of migrants have changed provinces of destination in the last year.
Step-migration is concentrated in the first year of migration and virtually zero thereafter. One limitation of this approach is that we cannot measure step-migration if it occurs within a province. With this caveat in mind, these results do suggest that for most migrants we correctly assign the year of arrival at destination.
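The two shares defined above can be computed from individual records as in the following sketch. The record layout (keys `reg`, `prov2000`, `prov2004`, `prov2005`) and the five example individuals are hypothetical; the mini-census variables are named differently.

```python
def return_and_step_shares(records):
    """Compute the return-migration and step-migration shares.

    Each record is a dict with the province of registration ('reg') and the
    province of residence in 2000, 2004 and 2005 (hypothetical key names).
    """
    # Return migration: in the province of registration in 2000, elsewhere in 2004;
    # the share is the fraction back in the province of registration by 2005.
    away_2004 = [r for r in records
                 if r["prov2000"] == r["reg"] and r["prov2004"] != r["reg"]]
    returned = [r for r in away_2004 if r["prov2005"] == r["reg"]]

    # Step migration: in the province of registration in 2000, elsewhere in 2005;
    # the share is the fraction living in yet another province in 2004.
    away_2005 = [r for r in records
                 if r["prov2000"] == r["reg"] and r["prov2005"] != r["reg"]]
    stepped = [r for r in away_2005
               if r["prov2004"] not in (r["reg"], r["prov2005"])]

    return_share = len(returned) / len(away_2004) if away_2004 else float("nan")
    step_share = len(stepped) / len(away_2005) if away_2005 else float("nan")
    return return_share, step_share

# Five hypothetical individuals registered in provinces A and D.
sample = [
    {"reg": "A", "prov2000": "A", "prov2004": "B", "prov2005": "B"},  # direct migrant, stayed
    {"reg": "A", "prov2000": "A", "prov2004": "B", "prov2005": "A"},  # return migrant
    {"reg": "A", "prov2000": "A", "prov2004": "C", "prov2005": "B"},  # step migrant
    {"reg": "A", "prov2000": "A", "prov2004": "A", "prov2005": "B"},  # left during 2005
    {"reg": "D", "prov2000": "D", "prov2004": "D", "prov2005": "D"},  # never moved
]
return_share, step_share = return_and_step_shares(sample)
```

In this toy sample, one of the three individuals who were away in 2004 had returned by 2005, and one of the three individuals who were away in 2005 had lived in a third province in 2004. As noted in the text, moves within a province are invisible to this calculation.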

Comparison of urban dwellers by hukou status. The UHS data are representative of urban “natives,” not of the urban population as a whole, and urban workers differ significantly depending on their hukou status. As is usual with internal migration, we consider in the main specifications that migrants and “natives” are highly substitutable. However, Chinese rural-to-urban migrants tend to be younger (and thus less experienced) and less educated, which reduces their ability to compete with urbanites for the same jobs.

Table A3 provides summary statistics on key characteristics of inter-prefectural migrants and compares them with the locally registered urban population. Migrants and natives differ statistically significantly on most accounts, the former being on average younger, less educated, more likely to be illiterate, more often single, and more often employed without a labour contract. Important facts for the analysis that follows are that rural-to-urban migrants are overrepresented in privately owned enterprises and in the manufacturing and construction industries: 89% of them are employed in the private sector as against 28% of locally registered non-agricultural hukou holders; and the shares of rural-to-urban migrants working in manufacturing and construction are 51% and 9%, as against 20% and 4% for urban natives, respectively. Migrants also stand out as earning significantly less. The simple t test reported in Table A3 shows that migrants’ monthly income is 17% lower than urban natives’; the difference increases to about 40% when one takes into account the fact that migrants are attracted to prefectures where they can expect higher wages.44 As expected, migrants differ notably from urban natives in the 2005 mini-census data. This should be kept in mind when extrapolating results based on the UHS to the rest of China.

B.2 NBS data

We discuss here some issues with NBS data and how we tackle them, and provide some descriptive statistics.

B.2.1 Issues with the firm panel

There are a number of issues with using the NBS data to study the effect of migration on firm growth. We now discuss these issues and explain how we take them into account while constructing our variables of interest.

First, firms may have an incentive to under-report the number of workers as it serves as the basis for taxation by the local labour department. This should be a particular concern with migrants, who represent a large share of the workforce and may be easier to under-report. Along the same lines, workers hired through a “labour dispatching” (laodong paiqian) company are not included in the employment variable.45 This implies that migrant workers are likely to be severely under-counted in the firm data. We will estimate the impact of migration inflows on firm performance without being able to observe the firm-specific increase in employment.46

44 Results available upon request.
45 In manufacturing SOEs, there was also a practice of reclassifying and gradually excluding laid-off workers (euphemistically, on “furlough,” xiagang) from their accounts. Although much of this process had been completed by the start of our study period, it may still induce some decline in employment in the first couple of waves.
46 The wage bill may also be slightly under-estimated as some components of worker compensation are not recorded in all years, e.g. pension contributions and housing subsidies, which are reported only since 2003 and 2004, respectively, but accounted for only 3.5% of total worker compensation in 2007.

Second, some variables are not documented the same way as in standard firm-level datasets. In particular, fixed assets are reported in each data wave by summing nominal values for different years. We use the procedure developed in Brandt et al. (2014) using (i) the change in nominal capital stock as a proxy for nominal fixed investment, (ii) a fixed depreciation rate of 9% and (iii) the investment deflator developed by Loren Brandt and Thomas Rawski. Following Brandt et al. (2014), if the firm’s past investments and depreciation are not available in the data, we use information on the age of the firm and estimates of the average growth rate of nominal capital stock at the 2-digit industry level between 1993 and the firm’s year of entry in the database.

Descriptive statistics from the firm panel. Table A4 displays key descriptive statistics across public, domestic private and foreign private firm ownership over the period 1998-2007.47 Public enterprises, a broad category that encompasses state-owned and collective enterprises, have a larger capital stock, spend more on their wage bills and have more employees than domestic private firms. Conversely, the latter report significantly higher sales revenues and perform better in terms of value added. Table A4 yields a very different image of state-owned and collective enterprises when compared to the foreign private sector: real capital stock, sales revenues, value added and the total wage bill are all higher in foreign-owned firms; only the total number of employees is higher in the public sector.

Figure A8 shows the evolution of the share of private firms in the NBS sample along the same characteristics. Private firms still accounted for a relatively small share of total real capital stock, value added, sales revenues, wage bill and employment in 1998 but represented over 80% of the total under all five indicators by 2007. The evolution in terms of employment is particularly striking: whereas only 32% of total employment could be attributed to private firms in the NBS sample in 1998, they accounted for 89% of it in 2007.

47 Ownership type is defined based on official registration (qiye dengji zhuce leixing). Out of 23 exhaustive categories, Table A4 uses three categories: (i) state-owned, hybrid or collective, (ii) domestic private, and (iii) foreign private firms, including those from Hong Kong, Macau, and Taiwan.

B.2.2 Issues with non-stationary variables

In order to estimate the effect of migration on firm growth, we use a strategy which accounts for the non-stationarity of firm size and thus of most variables characterizing firm output or factor use. In order to illustrate our approach, consider a certain firm j located in city d and using a bundle of inputs $H_t^{j,d}$ in order to produce a numeraire good. Let $W_t^{d}$ denote the unit cost of input, and $A_t^{j,d}$ the firm-specific productivity. The firm maximization problem is:

$$\max_{H_t^{j,d}} \left\{ A_t^{j,d} \left(H_t^{j,d}\right)^{\alpha} - W_t^{d} H_t^{j,d} \right\},$$

which generates the following input demand schedule (in which lower case letters are the logarithm of variables):

$$h_t^{j,d} = \frac{\ln(\alpha)}{1-\alpha} + \underbrace{\frac{a_t^{j,d}}{1-\alpha}}_{\text{firm-specific growth process}} - \underbrace{\frac{w_t^{d}}{1-\alpha}}_{\text{factor shock}}.$$

As a consequence, the stationarity of firm demand (and the subsequent firm outcomes) depends on the stationarity of the firm-specific technological process (Evans, 1987). For instance, under Gibrat’s law, firm j would grow at a certain given growth rate $\nu_j$ and $a_{t+1}^{j,d} = a_t^{j,d} + \nu_j + \varepsilon_{t+1}^{j,d}$, where $\varepsilon_{t+1}^{j,d}$ is the innovation. In such a case, it is important to take the difference in the previous equation in order to have the firm-specific growth component as a “fixed effect”:

$$\Delta_{t,t-1} h_t^{j,d} = \frac{\nu_j}{1-\alpha} - \frac{\Delta_{t,t-1} w_t^{d}}{1-\alpha} + \frac{\varepsilon_t^{j,d}}{1-\alpha}.$$

We base our empirical strategy on this assumption of a constant firm-specific growth rate, and consider first-differences so as to keep stationary variables on both sides. We then estimate the impact of migration on firm growth for firm j in destination d at time t by regressing the first difference of each firm outcome, which we denote $y_{j,d,t}$, on predicted migration, and time and firm fixed effects:

$$\begin{cases} \Delta_{t,t-1} M_{d,t} = b_0 + b_m \Delta_{t,t-1} \widetilde{M}_{d,t} + b_z Z_i + e_d + n_t + e_{d,t} \\ \Delta_{t,t-1} y_{j,t} = \beta_0 + \beta_m \Delta_{t,t-1} \widehat{M}_{d,t} + \delta_t + \pi_j + \varepsilon_{j,t} \end{cases} \qquad (8)$$

where standard errors are clustered at the level of the prefecture of destination×year.
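The logic of a first-differenced specification with firm and year fixed effects, estimated by instrumenting actual inflows with predicted inflows, can be illustrated on simulated data. The sketch below is not the estimation code behind the tables: the data-generating process, the coefficient of -0.5 and all variable names are hypothetical, the panel is assumed balanced, controls are omitted, and clustered standard errors are not computed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cities, n_firms_per_city, n_years = 100, 5, 8
beta_true = -0.5                                   # hypothetical effect of inflows

# City-year variables (already in first differences).
z = rng.normal(size=(n_cities, n_years))           # predicted inflows (instrument)
u = rng.normal(size=(n_cities, n_years))           # unobserved local demand shock
m = z + u                                          # actual inflows respond to both

# Firm-year outcome in first differences: firm growth pi_j acts as a fixed effect.
city = np.repeat(np.arange(n_cities), n_firms_per_city)
n_firms = city.size
pi = rng.normal(size=(n_firms, 1))                 # firm-specific growth rate
delta = rng.normal(size=(1, n_years))              # year effects
eps = 0.5 * rng.normal(size=(n_firms, n_years))    # idiosyncratic innovation
dy = pi + delta + beta_true * m[city] + u[city] + eps

def within(x):
    """Two-way within transformation for a balanced firms x years array."""
    return x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + x.mean()

dy_w, m_w, z_w = within(dy), within(m[city]), within(z[city])

beta_ols = (m_w * dy_w).sum() / (m_w ** 2).sum()   # biased by the demand shock u
beta_iv = (z_w * dy_w).sum() / (z_w * m_w).sum()   # just-identified IV estimate
```

In this simulation the local demand shock raises both inflows and firm outcomes, so the OLS coefficient is biased towards zero, while the instrumented coefficient is close to the assumed value. The direction of the OLS bias depends entirely on the assumed data-generating process.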
