2017 PAA Annual Meeting Mengyu LIU

The Causal Effect of Dialect Skill on Migrants’ Residence: Evidence from Contemporary China

Mengyu LIU Division of Social Science, The Hong Kong University of Science and Technology M. Phil. 852-52227406 Division of Social Science, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong SAR. [email protected]


The Causal Effect of Dialect Skill on Migrants’ Residence: Evidence from Contemporary China¹

Mengyu LIU

Abstract: This paper examines the relation between dialect skill and migrants’ residence. Dialect serves as a proxy for culture, and migrants’ residence is related to their social integration. My focus is whether migrants with better dialect skill are more likely to live in the local community. The main data set analyzed is CLDS2012, and both probit and OLS estimation methods are used. Results show that migrants who speak the local dialect well tend to live in communities that mainly consist of local residents. Holding other factors constant, a one-level improvement in dialect skill is associated with a 2.7 percentage point increase in the probability of living in the local community and a 3.8 percentage point increase in the percentage of local residents in the community. To address endogeneity, I employ the dialect distance as an instrumental variable for dialect skill. IV estimates confirm the causal relation between dialect skill and migrants’ residence. The communication-barrier channel is also ruled out by considering migrants’ Putonghua skill. Based on this empirical evidence, the study reveals the effect of cultural adaption on migrants’ social integration.

1. Introduction

There is a large body of research on migrants’ residential issues. Scholars have analyzed migrants’ residence in terms of institutional, economic, cultural, and social factors (Abu-Lughod 1961; He & Yang 2013; Wang, Zhang & Wu 2016). Although the cultural factor is often mentioned, it lacks empirical evidence, let alone tests based on causal inference. Using dialect skill as a proxy for culture and the dialect distance as the instrumental variable, this paper aims to provide a causal analysis of the relation between migrants’ dialect skill and their residence. The question of interest is: do migrants with better dialect skill tend to live in the local community?

1 Thanks for the comments from Prof. Jane Zhang. Prof. Xiaogang Wu provided the CLDS2012 data set with county codes, and Prof. William Lavely shared the Chinese dialects data set with me. I really appreciate their kindness.

Based on the 2012 China Labor-force Dynamics Survey (CLDS2012), I analyze migrants who have left their hukou origins for more than half a year. The outcome I discuss is whether migrants live in the local community, for which there are two measures. One is whether the community mainly consists of local residents; the other is the percentage of local residents in the community. Probit estimates show a positive relation between migrants’ dialect skill and their probability of living in the local community. More specifically, controlling for other variables, the marginal effect of dialect skill on migrants’ residence is 0.027. Results from OLS are similar to those from probit: a one-level improvement in dialect skill is associated with a 3.8 percentage point increase in the percentage of local residents in the community.

Estimates from probit and OLS may suffer from endogeneity problems. Migrants who have a strong intention to integrate into their destinations tend both to learn the local dialect and to live in the local community. Reverse causality may also exist. To address these problems, I employ the dialect distance as an instrumental variable for dialect skill. The dialect distance measures the extent to which two dialects differ. It is correlated with migrants’ dialect skill but uncorrelated with other individual characteristics. IV estimates are consistent with those from probit and OLS: migrants with better dialect skill tend to live in the local community. The causal effect of dialect skill on migrants’ residence is confirmed.

I attribute the effect of dialect skill to cultural adaption. To rule out the alternative explanation of a communication barrier, I interact migrants’ dialect skill with their Putonghua skill. The coefficient of the interaction term is not statistically significant. I also restrict the sample to migrants with Putonghua skill and find that the positive relation between dialect skill and the probability of living in the local community still exists. These results suggest that the effect of dialect is not due to communication.

This paper empirically investigates the existence of a cultural effect. Previous studies usually discuss culture but lack empirical evidence, because culture is very difficult to measure. This article uses dialect as a proxy for culture. More importantly, by employing the dialect distance as an instrumental variable for dialect skill, it reveals the causal relation. This study sheds light on our understanding of cultural adaption.

The remainder of the article is organized as follows. Section 2 provides background, and Section 3 introduces the data. The estimation strategy is presented in Section 4, followed by empirical evidence in Section 5. Section 6 concludes.

2. Background

2.1 Migrants’ residence in China

One widely used concept in the study of migrants’ residence is residential segregation (Lieberson & Carter 1982). It measures the extent to which social groups with specific characteristics live together. Residential segregation is an important area in social stratification research (Duncan & Duncan 1955). It is measured along five dimensions: evenness, exposure, concentration, centralization, and clustering (Massey & Denton 1988).

Since the reform and opening up in the late 1970s, China has witnessed large-scale internal migration (Liang 2016; Liang & Ma 2004). At the end of 2014, there were more than 250 million migrants (NHFPC 2015). These migrants leave their hukou origins and float to cities, so they are also called the “floating population” (Ma, Duan, & Guo 2014). In destinations, many migrants live together and behave differently from local residents in social and cultural aspects. As a consequence, floating population communities, also called villages within cities (Yang & Zhu 2016), and the “dual community” (Zhou 2000; Zhou & Tian 2016) have appeared in urban areas.

Recent research finds that residential segregation based on migration status has emerged in cities (Yang & Zhu 2016), especially in metropolises (Chen & Hao 2014; Liang 2015). For example, in Shanghai, one of the biggest cities in China, residential segregation based on hukou has become severe (Chen & Hao 2014).

Migrants’ residence is closely related to their social integration (He & Yang 2013; Yang 2015). Living in the local community provides migrants with more opportunities for contact with local residents. If migrants tend to live together, there will be less interaction between migrants and local residents. In China’s context, migrants do not have local hukou, and they cannot enjoy most social welfare benefits. Living conditions in the “floating community” are worse than those in local communities.

Scholars have analyzed migrants’ residence in terms of institutional (Chen & Hao 2014; He & Yang 2013), economic (Yang & Zhu 2016), social (Wang, Zhang & Wu 2016; Wu & Logan 2015), and cultural (Zhou 2000; Zhou & Tian 2016) factors. The hukou system is the largest institutional barrier for migrants’ residence. Since migrants do not hold local hukou, they are excluded from much of the social welfare in their destinations. Housing prices in the local community are usually higher than in the “floating community”, so only migrants who can afford the rents choose to live in the local community. Social networks also affect migrants’ residence: the community where they live can provide them with social support. In terms of culture, migrants and local residents have different cultural norms, so there is a cultural distance between them.

Many empirical studies have analyzed the influence of institutional, economic, and social factors on migrants’ residence. Apart from some ethnographies, the cultural factor is rarely tested, because it is difficult to measure in empirical research.

2.2 Dialect

Dialect can be used as a proxy for culture. Differences in dialects are related to differences in culture (Bai & Kung 2014). Each dialect has specific regional characteristics. In other words, a dialect is linked with cultural norms shared by the people who speak it, so dialect is related to identity.

Some social scientists have used dialect to measure culture. Xu, Liu, & Xiao (2015) find that dialect diversity has a negative effect on economic growth; the mechanism is that diverse dialects block the diffusion of technology. Dialect also matters in migration (Liu, Xu, & Xiao 2015): there is an inverted U-shaped relation between the dialect distance and migration probability. Migrants with high dialect fluency earn higher income (Chen, Lu, & Xu 2014), because people expose their identity through dialect. In a study of the relation between migration and trust, Zheng (2013) uses the dialect distance as an instrumental variable for social contact between migrants and local residents.

Similar to previous research, this study regards dialect as a proxy for culture and aims to investigate the causal relation between dialect skill and migrants’ residence in contemporary China.


3. Data

The main data set analyzed in this paper is the 2012 China Labor-force Dynamics Survey (CLDS2012), which was collected by the Center for Social Survey of Sun Yat-sen University in

2012. CLDS is designed as a nationally representative longitudinal survey in both urban and rural China (excluding Hong Kong, Macau, Taiwan, , and Hainan), aiming to collect information about individuals, households and communities. CLDS utilizes multistage cluster, stratified, PPS sampling methods and interviews labor force aged 15 to 64 in the household. In

CLDS2012, 303 communities, 10,612 households, and 16,253 individuals were successfully interviewed.² CLDS defines migrants as those who have left their hukou origins for more than half a year. This definition yields 1,476 migrants.

Migrants’ residence is defined as whether the community where migrants live is local. The boundary of the community is the administrative region of the Neighborhood Committee (居委会) or the Village Committee (村委会). The first measure of local community is whether the community mainly consists of local people, as reported by the interviewee. Because this measure may be inaccurate, I employ a second measure as a check: the percentage of local residents living in the community. In the community questionnaire, the community leader reports the numbers of total residents and migrants within the community. I calculate the percentage of local residents by dividing the number of local residents by the number of total residents.
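The second measure can be sketched as a small calculation. The snippet below is illustrative only: the function name and the example counts are hypothetical, and it assumes that the number of local residents equals total residents minus migrants, which the text implies but does not state explicitly.

```python
def local_resident_share(total_residents: int, migrants: int) -> float:
    """Share of local (non-migrant) residents in a community.

    Assumption (not stated in the paper): local residents are computed
    as total residents minus migrants, both reported by the community leader.
    """
    if total_residents <= 0:
        raise ValueError("total_residents must be positive")
    if not 0 <= migrants <= total_residents:
        raise ValueError("migrants must lie between 0 and total_residents")
    return (total_residents - migrants) / total_residents


# Hypothetical community: 1,000 residents, of whom 250 are migrants.
share = local_resident_share(1000, 250)
print(f"{share:.1%}")
```

Validating the two counts first guards against communities where the reported number of migrants exceeds the reported total, which would otherwise produce a negative share.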

The key independent variable is dialect skill, that is, migrants’ proficiency in the local dialect of their destinations. It is measured in 5 levels: 1. do not understand at all; 2. can partly understand when listening to it in everyday life and work; 3. can barely understand when listening to it in everyday life and work; 4. can listen well and speak a little in everyday life and work; and 5. can communicate in the dialect fluently in everyday life and work. I treat it as continuous in this study.

2 For more information, see css.sysu.edu.cn.

Other control variables include age, gender, hukou type (rural or urban), years of education, Chinese Communist Party (CCP) membership, occupation (leaders, professionals, clerks, service workers, agricultural workers, industrial workers, informal labor, others, and missing), house type (self-owning, renting, other types), Putonghua skill (cannot, not fluently, fluently with accent, very fluently), and community type (urban or rural). After dropping observations with missing values, the sample contains 1,068 migrants. Descriptive statistics are summarized in Table 1.

[Table 1 about here]

The instrumental variable, the dialect distance, is generated from the Language Atlas of China, which was edited by the Chinese Academy of Social Sciences and the Australian Academy of the Humanities and published in 1987. This book is the most authoritative document mapping the distribution of more than one hundred Chinese dialects. According to the Language Atlas of China, Chinese dialects (Sinitic stock) are divided into ten super groups. Each super group consists of several groups, which are further divided into sub-groups. In other words, Chinese dialects are classified at three levels: super group, group, and sub-group. The detailed classification of Chinese dialects is reported in the appendix.

Based on the work of William Lavely (2000), I generate a data set containing detailed dialect information for 2,660 Chinese counties. After merging it with CLDS2012, I obtain the dialects in migrants’ hukou origins and their destinations. The dialect distance is the extent to which the dialect in the origin differs from that in the destination. Similar to Liu, Xu & Xiao (2015), the basic coding scheme of the dialect distance is as follows: if two dialects are the same, the distance is 0; if they belong to the same group but different sub-groups, the distance is 1; if they belong to the same super group but different groups, it is 2; and the distance between dialects in two different super groups is 3. So 0, 1, 2, and 3 are the possible dialect distances between migrants’ hukou origins and their destinations.
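The coding scheme above can be expressed as a short function. This is a minimal sketch, not the author's code: the `Dialect` type, its field names, and the placeholder labels ("A", "A1", "A1a") are hypothetical and stand in for the super group, group, and sub-group assigned to each county.

```python
from typing import NamedTuple


class Dialect(NamedTuple):
    """Three-level classification of a dialect (hypothetical container)."""
    super_group: str
    group: str
    sub_group: str


def dialect_distance(origin: Dialect, destination: Dialect) -> int:
    """Return the 0-3 dialect distance described in the text."""
    if origin.super_group != destination.super_group:
        return 3  # different super groups
    if origin.group != destination.group:
        return 2  # same super group, different groups
    if origin.sub_group != destination.sub_group:
        return 1  # same group, different sub-groups
    return 0      # same dialect


# Placeholder labels, not real dialect names.
home = Dialect("A", "A1", "A1a")
print(dialect_distance(home, Dialect("A", "A1", "A1b")))
print(dialect_distance(home, Dialect("B", "B1", "B1a")))
```

Comparing from the coarsest level downward means that two dialects in different super groups are always assigned distance 3, even if their group or sub-group labels happen to coincide.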

4. Estimation Strategy

This study aims to estimate the effect of dialect skill on migrants’ residence. One measure of migrants’ residence is whether or not the community where migrants live mainly consists of local people. Since the dependent variable is binary, I employ the probit model:

$$\Pr(\mathit{Local}_i = 1) = \Phi(\alpha + \beta\,\mathit{Dialectskill}_i + X_i\gamma) \qquad (1)$$

where $\mathit{Local}_i$ equals 1 if migrant i lives in a community that mainly consists of local residents. $\mathit{Dialectskill}_i$ is the key independent variable, the dialect skill of migrant i. $\beta$ is the coefficient of interest, and we expect it to be positive, meaning that migrants with better dialect skill are more likely to live in a community mainly consisting of local residents. $X_i$ is a set of control variables with coefficients $\gamma$, and $\alpha$ is the constant.

The other measure of migrants’ residence is the percentage of local residents in the community. In this setting, the dependent variable is continuous, so we can use OLS:

$$\mathit{LocalShare}_i = \alpha + \beta\,\mathit{Dialectskill}_i + X_i\gamma + \varepsilon_i \qquad (2)$$

where $\mathit{LocalShare}_i$ is the percentage of local residents in the community where migrant i lives. The independent variables are the same as those in equation (1), and $\varepsilon_i$ is the error term. In equation (2), we also expect $\beta$ to be positive, meaning that migrants with better dialect skill are more likely to live in a community with a higher percentage of local residents.

However, estimates from probit and OLS may be biased. One major issue is omitted variables. Migrants who expect to integrate into the local society are likely to learn its dialect, and they also tend to live in the local community. Likewise, more capable migrants can learn local dialects quickly, and they also tend to live in the local community for better social integration. Another concern is reverse causality. Living in the local community can help migrants acquire better local dialect skill. In this case, it is not that better dialect skill leads migrants to live in the local community, but that living in the local community improves their dialect skill. Measurement error in migrants’ dialect skill may also exist.

An instrumental variable can address these problems. We need a variable that is correlated with dialect skill but uncorrelated with the other, unobserved determinants of residence. The dialect distance can serve as such an instrument for dialect skill. Migrants tend to have better dialect skill if the dialect distance between their hukou origins and destinations is smaller, and the dialect distance is assumed to be unrelated to other factors affecting residence.
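For concreteness, the two-stage logic for the continuous outcome can be written out as below. This is the standard textbook 2SLS formulation rather than notation taken from the paper: the names $\mathit{Distance}_i$ (the dialect distance), $\mathit{LocalShare}_i$ (the percentage of local residents), $X_i$ (the controls), the coefficients $\pi_0, \pi_1, \pi_2, \alpha, \beta, \gamma$, and the error terms $v_i, \varepsilon_i$ are assumptions introduced for illustration.

```latex
% First stage: dialect skill regressed on the instrument and the controls
\mathit{Dialectskill}_i = \pi_0 + \pi_1\,\mathit{Distance}_i + X_i\pi_2 + v_i

% Second stage: outcome regressed on fitted dialect skill and the controls
\mathit{LocalShare}_i = \alpha + \beta\,\widehat{\mathit{Dialectskill}}_i + X_i\gamma + \varepsilon_i
```

In this notation, instrument relevance requires $\pi_1 \neq 0$ (expected to be negative here, since a larger distance should go with weaker dialect skill), and the exclusion restriction requires $\mathit{Distance}_i$ to be uncorrelated with $\varepsilon_i$ given the controls.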

5. Empirical Results

5.1. Baseline estimates

Table 2 reports the baseline estimates of the effect of dialect skill on migrants’ residence. In columns 1, 2, and 3, the dependent variable is whether the community mainly consists of local residents. By comparison, the dependent variable in columns 4, 5, and 6 is the percentage of local residents in the community. In columns 1 and 4, I do not include any control variables. Age, gender, hukou type, CCP membership, years of education, and occupation are controlled for in columns 2 and 5, and Putonghua skill, house type, and urban community are further added in columns 3 and 6.

[Table 2 about here]

Baseline estimates show that migrants with better dialect skill are more likely to live in the local community. Without any controls, the probit coefficient of dialect skill is 0.143, which is statistically significant at the 0.01 level, and its marginal effect is 0.053 (column 1). That is, a one-level increase in dialect skill is associated with a 5.3 percentage point increase in the probability of living in the local community. Likewise, the OLS coefficient of dialect skill is 0.062 (column 4), which means that a one-level increase in dialect skill is associated with a 6.2 percentage point increase in the percentage of local residents in the community.
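The link between a probit coefficient and its marginal effect can be illustrated with a short calculation. This is only a sketch: the paper does not say how its marginal effects were computed (at the means or averaged over the sample), so the snippet assumes evaluation at a single index value, here the one implied by the overall sample share of 62.8% living in the local community (Table 1). The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def index_at_probability(p: float) -> float:
    """Probit index z such that Phi(z) = p."""
    return _STD_NORMAL.inv_cdf(p)


def probit_marginal_effect(beta: float, index: float) -> float:
    """Marginal effect of a regressor in a probit model: beta * phi(index)."""
    return beta * _STD_NORMAL.pdf(index)


# Coefficient from column 1 of Table 2, evaluated where Pr(local) = 0.628.
z = index_at_probability(0.628)
print(round(probit_marginal_effect(0.143, z), 3))
```

Under this assumption the result is about 0.054, close to the reported 0.053; the remaining gap is expected because the evaluation point used here is only an approximation of whatever the original estimation used.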

After including control variables, the coefficients of dialect skill in probit and OLS remain positive and statistically significant. Holding all other factors constant, the marginal effect of dialect skill on migrants’ residence is 0.027 (column 3), and a one-level increase in dialect skill is associated with a 3.8 percentage point increase in the percentage of local residents in the community (column 6).

Although the baseline results suggest that migrants with better dialect skill are more likely to live in the local community, these estimates may be biased by endogeneity. Migrants’ intention to integrate and their ability can affect their dialect skill and residence simultaneously, and living in the local community may help migrants learn the local dialect. To address these problems, I employ an instrumental variable in the next step.

5.2. IV estimates


In this section, I introduce the dialect distance as the instrumental variable for dialect skill. The dialect distance measures to what extent the dialect in migrants’ hukou origins is different from that in their destinations. If the dialect distance is smaller, it means that two dialects are more similar, and it is more likely for migrants to acquire better dialect skill in their destinations. Since the dialect distance is a macro-level factor, it is not affected by individual intention or ability, and it has nothing to do with other personal characteristics.

Table 3 shows the IV estimates of effects of dialect skill on migrants’ residence. The first measure of local community is binary, so I utilize IV-probit estimation and the result is reported in column 1. Column 2 reports the estimate of normal IV regression from 2SLS. In the first stage, the dialect distance is negatively associated with the dialect skill, meaning that migrants with smaller dialect distance tend to have better dialect skill.

[Table 3 about here]

In the second stage, controlling for other factors, the estimated coefficient of dialect skill from IV-probit is 0.213, about 2.7 times the probit estimate, and it is statistically significant at the 0.01 level. The marginal effect of dialect skill on migrants’ residence is 0.069, which means that a one-level improvement in dialect skill increases the probability of living in the local community by 6.9 percentage points. Similarly, the 2SLS result shows a significantly positive relation between dialect skill and migrants’ residence. Holding other factors constant, the coefficient of dialect skill is 0.113 and statistically significant at the 0.01 level. It means that a one-level increase in dialect skill causes an 11.3 percentage point increase in the percentage of local residents in the community. These IV results confirm that there is a causal relation between dialect skill and migrants’ residence: migrants with better dialect skill are more likely to live in the local community.


A single measure of the dialect distance may be problematic, so I try different ways of measuring it. Results are reported in Table 4. Following Liu, Xu, & Xiao (2015), in columns 1 and 4, I change the coding scheme of the dialect distance from 1-2-3 to 1-10-100. In the first stage, the dialect distance is negatively associated with dialect skill. In the second stage, the coefficients of dialect skill from IV-probit and 2SLS are both positive, 0.257 and 0.132 respectively, and statistically significant at the 0.01 level. The marginal effect of dialect skill on migrants’ residence is 0.082, meaning that a one-level improvement in migrants’ dialect skill increases the probability of living in the local community by 8.2 percentage points. And if migrants’ dialect skill increases by one level, the percentage of local residents in the community increases by 13.2 percentage points. Better dialect skill causes migrants to live in the local community.

[Table 4 about here]

Another measure of the dialect distance uses the exponential form (Zheng 2013). The dialect distance is coded as 2^1, 2^2, and 2^3, in other words, 2, 4, and 8. Columns 2 and 5 in Table 4 show the results. The dialect distance still has a negative relation with dialect skill, and migrants with better dialect skill are more likely to live in the local community. The marginal effect of dialect skill on migrants’ residence is 0.072, and a one-level improvement in dialect skill leads to an 11.8 percentage point increase in the percentage of local residents living in the community.
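The two alternative codings can be sketched as a recoding of the basic 0-3 distance. The function and scheme names below are hypothetical. The text specifies only the values for distances 1, 2, and 3; how identical dialects (distance 0) are treated under the alternative schemes is not stated, so keeping them at 0 is an assumption made here.

```python
def recode_distance(base: int, scheme: str) -> int:
    """Recode the basic dialect distance (0, 1, 2, 3) under another scheme.

    scheme: "linear"  -> 1-2-3     (basic coding)
            "power10" -> 1-10-100
            "power2"  -> 2-4-8     (2**1, 2**2, 2**3)
    """
    if base not in (0, 1, 2, 3):
        raise ValueError("base distance must be 0, 1, 2, or 3")
    if base == 0:
        return 0  # assumption: identical dialects stay at zero distance
    if scheme == "linear":
        return base
    if scheme == "power10":
        return 10 ** (base - 1)
    if scheme == "power2":
        return 2 ** base
    raise ValueError(f"unknown scheme: {scheme}")


for name in ("linear", "power10", "power2"):
    print(name, [recode_distance(d, name) for d in (1, 2, 3)])
```

The schemes differ in how sharply they penalize crossing a higher classification level, which is why the first-stage coefficients in Table 4 differ in magnitude while keeping the same sign.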

One concern is that the dialect people originally speak may be that of their birth places rather than that of their hukou places, because people may have migrated from their birth places to their present hukou places. When calculating the dialect distance, we can therefore use the distance between migrants’ birth places and their destinations. Columns 3 and 6 in Table 4 report the estimates using this measure of dialect distance, which are very similar to the earlier results. If the dialect distance between migrants’ birth places and their destinations is smaller, migrants tend to have better dialect skill in their destinations, and migrants with better dialect skill are more likely to live in the local community.

5.3. Does Putonghua skill matter?

In the above parts, we have confirmed that there are causal relations between dialect skill and migrants’ residence using IV estimation. Migrants with better dialect skill are more likely to live in the local community. Since we use dialect as a proxy for culture, their relations suggest that the culture effect exists when migrants choose their living communities in destinations, and cultural adaption is important in the process of migrants’ social integration.

But there is a competing argument that dialect matters through communication. If migrants cannot speak the dialect of their destinations, it is hard for them to communicate with local people, which in turn discourages them from living in the local community. We can test this argument by interacting migrants’ dialect skill with their Putonghua skill. Putonghua is China’s national language and has been popularized throughout China. If migrants can speak Putonghua, we assume that there is no communication barrier between migrants and local residents.

Table 5 reports the results. Columns 1 and 4 are baseline results, and columns 2 and 5 add the interaction term between dialect skill and Putonghua skill. Neither coefficient of the interaction term is statistically significant. This means that the effect of dialect skill does not differ between migrants with and without Putonghua skill. In other words, the effect of dialect skill is not due to a communication barrier.
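The interaction test can be written as an augmented probit specification. The notation below is a sketch introduced for illustration and is not taken from the paper: $\mathit{Local}_i$ (an indicator for living in the local community), $\mathit{Putonghua}_i$, $X_i$ (the other controls), and the coefficients $\alpha, \beta_1, \beta_2, \beta_3, \gamma$ are assumed names.

```latex
\Pr(\mathit{Local}_i = 1) = \Phi\bigl(\alpha
    + \beta_1\,\mathit{Dialectskill}_i
    + \beta_2\,\mathit{Putonghua}_i
    + \beta_3\,(\mathit{Dialectskill}_i \times \mathit{Putonghua}_i)
    + X_i\gamma\bigr)
```

If dialect mattered mainly as a means of communication, its effect should weaken as Putonghua skill rises, which would show up as $\beta_3 < 0$. An estimate of $\beta_3$ that cannot be distinguished from zero is what the text reports for columns 2 and 5.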

[Table 5 about here]

To provide more rigorous evidence, in columns 3 and 6, I restrict the sample to migrants with Putonghua skill (fluently with accent or very fluently). These migrants do not face a communication barrier with local residents. Results are consistent. The coefficient of dialect skill in probit is 0.074, which is statistically significant at the 0.05 level, and the marginal effect is 0.024. The coefficient of dialect skill in OLS is 0.038, the same as that in the whole sample.

6. Conclusions

This paper analyzes the causal effect of dialect skill on migrants’ residence. The question of interest is: are migrants with better dialect skill more likely to live in the local community? Probit estimates show that, controlling for other factors, migrants with better dialect skill tend to live in communities that mainly consist of local residents, i.e., the local community. A one-level improvement in dialect skill is associated with a 2.7 percentage point increase in the probability of living in the local community. OLS estimates are consistent with the probit results: if migrants’ dialect skill increases by one level, the percentage of local residents in the community increases by 3.8 percentage points. These results suggest that migrants’ dialect skill is positively related to their probability of living in the local community.

To address endogeneity, I employ the dialect distance as the instrumental variable for dialect skill. IV estimates confirm that migrants with better dialect skill are more likely to live in the local community. I also try different methods of measuring the dialect distance, and the results are consistent.

I regard dialect as a proxy for culture. To rule out the communication-barrier explanation, I interact migrants’ dialect skill with their Putonghua skill and restrict the sample to migrants with Putonghua skill. The results do not support the communication-barrier explanation, which suggests that the effect of dialect skill is due to cultural adaption.


This study helps us better understand migrants’ residence from the perspective of culture. It provides evidence that cultural adaption does matter in migrants’ social integration.

References

Abu-Lughod, J. (1961). Migrant adjustment to city life: the Egyptian case. American Journal of Sociology, 67(1), 22-32.

Bai, Y., & Kung, J. (2014). Does genetic distance have a barrier effect on technology diffusion? Evidence from historical China. Working paper.

Chen, J., & Hao, Q. (2014). Residential segregation under rapid urbanization in China: evidence from Shanghai. Academic Monthly, 46(5), 17-28.

Chen, Z., Lu, M., & Xu, L. (2014). Returns to dialect: Identity exposure through language in the Chinese labor market. China Economic Review, 30, 27-43.

Duncan, O. D., & Duncan, B. (1955). Residential distribution and occupational stratification. American Journal of Sociology, 60(5), 493-503.

He, Z., & Yang, J. (2013). Settling down or lodging in cities? A comparative study of living conditions among internal migrants in China. Population Research, 37(6), 17-34.

Lavely, W. (2000). Coding scheme for the Language Atlas of China. https://csde.washington.edu/downloads/01-07.pdf.

Liang, H. (2015). Residential segregation under the dual labor market: an empirical research from Shanghai. Shandong Social Sciences, (8).


Liang, Z. (2016). China’s great migration and the prospects for an integrated society. Annual Review of Sociology, 42, 451-471.

Liang, Z., & Ma, Z. (2004). China’s floating population: new evidence from the 2000 census. Population and Development Review, 30(3), 467-488.

Lieberson, S., & Carter, D. K. (1982). Temporal changes and urban differences in residential segregation: a reconsideration. American Journal of Sociology, 88(2), 296-310.

Liu, Y., Xu, X., & Xiao, Z. (2015). The pattern of labor cross-dialects migration. Economic Research Journal, (10).

Ma, X., Duan, C., & Guo, J. (2014). A comparative study on four types of floating population. Chinese Journal of Population Science, (5).

Massey, D. S., & Denton, N. A. (1988). The dimensions of residential segregation. Social Forces, 67(2), 281-315.

National Health and Family Planning Commission. (2015). 2015 China floating population development report. www.nhfpc.gov.cn.

Wang, Z., Zhang, F., & Wu, F. (2016). Intergroup neighbouring in urban China: implications for the social integration of migrants. Urban Studies, 53(4), 651-668.

Wu, F., & Logan, J. (2015). Do rural migrants ‘float’ in urban China? Neighbouring and neighbourhood sentiment in Beijing. Urban Studies, 0042098015598745.

Xu, X., Liu, Y., & Xiao, Z. (2015). Dialect and economic growth. China Journal of Economics, 2(2), 1-32.


Yang, J. (2015). Research on the assimilation of the floating population in China. Chinese Social Sciences, (2), 61-79.

Yang, J., & Zhu, G. (2016). Together but separately: a study of residential segregation between floating population and local residents. Shandong Social Sciences, (1), 78-89.

Zheng, B. (2013). Three essays on migration in China. HKUST PhD Thesis.

Zhou, D. (2000). Migrant labor and the “dual community” in the Pearl River Delta. Journal of Sun Yat-sen University (Social Science Edition), 40(2), 107-112.

Zhou, D., & Tian, X. (2016). The “dual community” and urban residential space. Shandong Social Sciences, (1), 90-95.


Table 1. Descriptive Statistics

                              Local community   Percentage of local residents   Total
Dialect skill
  cannot                      53.1%             51.3%                           160
  listen a little             53.9%             47.3%                           128
  listen                      54.6%             55.3%                           108
  listen and speak a little   58.7%             59.9%                           259
  very fluently               74.1%             73.4%                           413
Age (mean)                    36.4              35.3
Gender
  female                      66.5%             66.1%                           517
  male                        59.4%             57.9%                           551
Hukou
  rural                       61.3%             61.1%                           858
  urban                       69.1%             65.3%                           210
Years of education (mean)     8.4               8.5
CCP member
  yes                         71.1%             66.9%                           45
  no                          62.5%             61.7%                           1,023
Occupation
  leaders                     57.1%             53.4%                           28
  professional                56.4%             56.1%                           195
  clerks                      51.2%             47.1%                           86
  service workers             68.4%             68.1%                           209
  agricultural workers        68.9%             79.0%                           61
  industrial workers          55.6%             50.7%                           171
  informal labor              59.4%             56.2%                           32
  others                      75.0%             71.9%                           68
  missing                     69.3%             69.7%                           218
House type
  self-owning                 76.1%             76.5%                           263
  renting                     58.7%             57.9%                           673
  others                      57.6%             53.2%                           132
Putonghua skill
  cannot                      62.5%             74.8%                           72
  not fluently                64.1%             70.3%                           92
  fluently with accent        59.7%             58.8%                           494
  very fluently               66.3%             61.4%                           410
Community type
  rural                       80.8%             72.4%                           390
  urban                       52.5%             55.8%                           678
Total                         62.8%             61.9%                           1,068


Table 2: Estimated Effects of Dialect Skill on Migrants’ Residence from Probit and OLS

Dependent variable             Community is local               Percentage of local residents
Regression model               Probit     Probit     Probit     OLS        OLS        OLS
                               (1)        (2)        (3)        (4)        (5)        (6)
Dialect skill                  0.143***   0.121***   0.080***   0.062***   0.050***   0.038***
                               (0.027)    (0.028)    (0.030)    (0.006)    (0.006)    (0.006)
                               [0.053]    [0.044]    [0.027]
Age                                       Yes        Yes                   Yes        Yes
Male                                      Yes        Yes                   Yes        Yes
Urban hukou                               Yes        Yes                   Yes        Yes
CCP member                                Yes        Yes                   Yes        Yes
Years of education                        Yes        Yes                   Yes        Yes
Occupation                                Yes        Yes                   Yes        Yes
Putonghua skill                                      Yes                              Yes
House type                                           Yes                              Yes
Urban community                                      Yes                              Yes
Constant                       -0.182*    0.478**    0.965***   0.605***   0.491***   0.296***
                               (0.102)    (0.241)    (0.359)    (0.025)    (0.053)    (0.073)
Pseudo R-squared / R-squared   0.021      0.041      0.120      0.089      0.169      0.244
Observations                   1,068      1,068      1,068      1,068      1,068      1,068

Note: Robust standard errors in round brackets; marginal effects from probit models in square brackets. *** p<0.01, ** p<0.05, * p<0.1.

Table 3: IV Estimates of Effects of Dialect Skill on Migrants’ Residence

Dependent variable            Community is local   Percentage of local residents
                              IV-Probit            IV Regression
Regression model              (1)                  (2)
Dialect skill                 0.213***             0.113***
                              (0.066)              (0.014)
                              [0.069]
First stage
  Dialect distance            -0.521***            -0.521***
                              (0.03)               (0.03)
Controls                      Yes                  Yes
Log pseudo-likelihood         -2339.071
R-squared                                          0.135
Observations                  1,068                1,068

Note: Robust standard errors in round brackets; marginal effects from probit models in square brackets. Controls include age, male, urban hukou, CCP member, years of education, occupation, Putonghua skill, house type, and urban community. *** p<0.01, ** p<0.05, * p<0.1.
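The first-stage row of Table 3 also gives a rough sense of instrument strength. With a single instrument, the first-stage F-statistic on the excluded instrument equals the square of its t-statistic, so it can be approximated from the reported coefficient and standard error. This is a sketch under stated assumptions, not a statistic reported in the paper: the standard error is rounded to two decimals in the table and is treated as a positive magnitude, and `first_stage_f` is a hypothetical helper name.

```python
def first_stage_f(coef, se):
    """Approximate first-stage F for one excluded instrument as the squared t-statistic."""
    t = coef / se
    return t * t


# Dialect distance coefficient and standard error from the first stage in Table 3.
f_stat = first_stage_f(-0.521, 0.03)
print(f"Approximate first-stage F: {f_stat:.1f}")

# The conventional rule of thumb treats F below 10 as a sign of a weak instrument.
print("Above rule-of-thumb threshold of 10:", f_stat > 10)
```

The value obtained this way depends on the rounding of the standard error, so it should be read as an order of magnitude rather than an exact test statistic.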

Table 4: Different IV Estimates of Effects of Dialect Skill on Migrants’ Residence

Dependent variable                      Community is local                           Percentage of local residents
                                        IV-Probit      IV-Probit      IV-Probit      IV Regression  IV Regression  IV Regression
Regression model                        (1)            (2)            (3)            (4)            (5)            (6)
Dialect skill                           0.257***       0.223***       0.223***       0.132***       0.118***       0.114***
                                        (0.059)        (0.063)        (0.063)        (0.014)        (0.014)        (0.014)
                                        [0.082]        [0.072]        [0.072]
First stage
  Dialect distance (1-10-100)           -0.014***                                    -0.014***
                                        (0.001)                                      (0.001)
  Dialect distance (2-4-8)                             -0.198***                                    -0.198***
                                                       (0.011)                                      (0.011)
  Dialect distance (using birth place)                                -0.543***                                    -0.543***
                                                                      (0.031)                                      (0.031)
Controls                                Yes            Yes            Yes            Yes            Yes            Yes
Log pseudo-likelihood                   -2313.045      -2324.098      -2322.301
R-squared                                                                            0.076          0.121          0.135
Observations                            1,068          1,068          1,068          1,068          1,068          1,068

Note: Robust standard errors in round brackets; marginal effects from probit models in square brackets. Controls include age, male, urban hukou, CCP member, years of education, occupation, Putonghua skill, house type, and urban community. *** p<0.01, ** p<0.05, * p<0.1.

Table 5: Estimates of Interactive Effects between Dialect Skill and Putonghua Skill on Migrants’ Residence

Dependent variable              Community is local               Percentage of local residents
                                Probit     Probit     Probit     OLS        OLS        OLS
Regression model                (1)        (2)        (3)        (4)        (5)        (6)
Dialect skill                   0.080***   -0.011     0.074**    0.038***   0.002      0.038***
                                (0.030)    (0.119)    (0.032)    (0.006)    (0.024)    (0.007)
                                [0.027]    [-0.004]   [0.024]
Putonghua skill                 0.148**    0.038                 0.008      -0.036
                                (0.058)    (0.148)               (0.012)    (0.031)
                                [0.049]    [0.013]
Dialect skill*Putonghua skill              0.028                            -0.011
                                           (0.035)                          (0.007)
                                           [0.009]
Other controls                  Yes        Yes        Yes        Yes        Yes        Yes
Pseudo R-squared / R-squared    0.12       0.12       0.13       0.244      0.246      0.248
Observations                    1,068      1,068      904        1,068      1,068      904

Note: Columns 1, 2, 4, and 5 use the whole sample; columns 3 and 6 use the restricted sample of migrants who speak Putonghua fluently (fluently with accent or very fluently). Robust standard errors in round brackets; marginal effects from probit models in square brackets. Other controls include age, male, urban hukou, CCP member, years of education, occupation, house type, and urban community. *** p<0.01, ** p<0.05, * p<0.1.

Appendix A1. Chinese Dialects

Super group    Group    Sub-group

Jishen Habu Heisong Jingshi Huaicheng Beijing Mandarin Chaofeng Shike Baotang Jilu Mandarin Shiji Canghui Qingzhou Denglian Gaihuan Zhengcao Cailu Luoxu Xinbeng Zhongyuan Mandarin Fenhe Guanzhong Mandarin Qinlong Longzhong Nanjiang Jincheng Yinwu Lanyin Mandarin Hexi Hami Beijiang Dianxi Qianbei Kungui Guanchi Ebei Xinan Mandarin Wutian Cinjiang Diannan Xiangnan Guiliu Changhe

Hongcho Jianghuai Mandarin Tairu Huangxiao Unclassified Mandarin Jingzhan Jishe Hui Hui Xiuyi Qide Yanzhou Taihu Taizhou Oujiang Wu Wu Wuzhou Chuqu Xuanzhou Bingzhou Lvliang Shangdang Wutai Jin Jin Dabao Zhanghu Hanxin Zhiyan Changyi Xiang Xiang Loushao Jixu Changjing Yiliu Jicha Fuguang Gan Gan Yingge Datong Leizi Dongsui Huaiyue Quanzhang Datian Minnan Chaoshan Leizhou Min Puxian Houguan Mindong Funing Minbei

Minzhong Fucheng Wenchang Qiongwen Wanning Yaxian Changgan Shaojiang Guangfu Siyi Gaoyang Cantonese Gelou Wuhua Yongxun Qinlian Yuetai Yuezhong Huizhou Yuebei Hakka Hakka Tingzhou Ninglong Yugui Tonggu Residual dialects
