
Low in : How Credible are Recent Census Data

Yang Fan
Renmin University of China
[email protected]

Introduction

According to the censuses conducted by China’s State Statistics Bureau in 2000 and 2010, the total fertility rate (TFR) in China was 1.22 in 2000 and fell to 1.18 in 2010. These rates are highly suspect because they are lower than the 2010 rates of the more developed countries in East Asia, such as Japan (1.4) and (1.2), both of which have much higher levels of economic development and urbanization than China (Population Reference Bureau, 2010). The birth and fertility results of recent censuses have revived the debate over the data quality of these censuses and the real fertility level in China, both of which have been controversial for decades. It is therefore necessary to establish how credible the recent census data are, especially the fertility data, which are a fundamental population indicator and an important input to public policy. This paper assesses the quality of the fertility data in China’s recent censuses and proposes a method of improving it.

Reviews

Great and persistent efforts have been made since the 1990s to evaluate the data quality of censuses in China. From the beginning of the discussion, the controversial issue has been whether the extremely low fertility in China is real or is caused by underreporting. On the one hand, some argued that there was not enough evidence of missed reporting in the 1990s (Zhang Guangyu, Yuan Xin, 2004). From this perspective, the low fertility rate in the census reflected the actual fertility level and trend of the 1990s. On the other hand, others argued that there must have been underreporting in the census and used many methods to demonstrate and estimate it (Cui Hongyan, Zhang Weiming, 2002; Yu Xuejun, 2002). But even those in favor of the latter view could not agree on the extent of the underreporting, with the estimated TFR in 2000 ranging from 1.2 to 2.3 across studies (Ren Qiang, 2005). Scholars repeatedly put their hopes on the “next” census, assuming that its data quality would be better, that it would verify their theories, and that it would end the discussion for good. To their disappointment, the TFR based on Census 2010 was still very low, which convinced some of them that fertility in China really is extremely low, while the others became more suspicious of the census data quality. In sum, the gap between researchers widened instead of narrowing. My own view is that consistent results (low TFRs) in recent censuses do not prove that the data quality of the census in China is good, since there may be systematic omissions and bias: the motivation for and possibility of underreporting show no sign of having declined compared with 20 years ago. Data from other sources, which are independent of the census system, should be used to assess the data quality of the census.

Data and Methods

Three different kinds of demographic data in China are used in this paper: 1. census data collected by the State Statistics Bureau in 2000 and 2010; 2. the latest household registration data collected by the Ministry of Public Security; and 3. the latest educational statistics on primary school students collected by the Ministry of Education. First, the original age- and sex-specific data from the three sources are used to calculate the number of births in each year from 1990 to 2010, using life tables and survival rates. Second, the sizes of the same birth cohort are compared across the three sources to study where they agree and where they contradict each other. The household registration data and educational statistics are used to examine whether there was underreporting in the census data. Third, we survey and interview the grassroots staff who are responsible for collecting the three kinds of data, in order to assess the quality of the data collection process and to identify the advantages and disadvantages of each source. Finally, we use both the household registration data and the educational statistics as corrections to improve the quality of the census data and to estimate the real TFR in China from 2000 to 2010.
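The first step, recovering births from age-specific counts, can be sketched as a reverse-survival calculation. The code below is a minimal illustration and not the author's implementation: the function name, the input numbers, and the survival probabilities are hypothetical, and the birth year is approximated as the survey year minus age.

```python
def births_by_reverse_survival(counts_by_age, survey_year, survival_to_age):
    """Back-project counts by age to births by year of birth.

    counts_by_age:   {age: persons counted at that age in survey_year}
    survival_to_age: {age: probability of surviving from birth to that age}
    """
    births = {}
    for age, count in counts_by_age.items():
        # Approximation: persons aged `age` in the survey year were born
        # `age` years earlier (ignores the exact survey reference date).
        birth_year = survey_year - age
        # Survivors divided by the survival probability gives the births
        # needed to produce that many survivors.
        births[birth_year] = count / survival_to_age[age]
    return births


# Hypothetical inputs for illustration only (counts in millions).
counts = {0: 14.0, 1: 13.5}
survival = {0: 0.98, 1: 0.97}

estimates = births_by_reverse_survival(counts, 2010, survival)
for year in sorted(estimates):
    print(year, round(estimates[year], 2))
```

Applying the same function to each source with the same survival probabilities makes the resulting birth series directly comparable, since any difference then comes from the counts alone.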

Elementary Findings

The Problem of Census Data

First, there was underreporting in the census data, especially in the youngest age groups. We compared the size of each birth cohort at the two census time points and found that the cohorts had grown, rather than shrunk, ten years later. As Table 1 shows, the number of persons aged 0 was 13.79 million in 2000, but the same cohort numbered 14.45 million at age 10 in 2010, and the other age groups show the same pattern. A closed birth cohort can only shrink through mortality, so this violates common sense. The only reasonable explanation is that children were missed in the 2000 census. The situation looks worse when the census is compared with data from other sources. We calculated the number of births in each year from 1996 to 2000 separately from the age-specific data of each source and compared the results. The births derived from both household registration and educational statistics are much higher than those derived from census data (Table 2).

Table 1 The Number of People in the Same Birth Cohort at Different Census Time Points

Age Group*    Census 2000 (million)    Census 2010 (million)
0 (10)        13.79                    14.45
1 (11)        11.50                    13.94
2 (12)        14.01                    15.40
3 (13)        14.45                    15.23
4 (14)        15.22                    15.89
5 (15)        16.93                    18.02
6 (16)        16.47                    18.79
7 (17)        17.91                    20.78
8 (18)        18.75                    20.76
9 (19)        20.08                    21.54

* The number outside the brackets is the age of the birth cohort in 2000; the number in brackets is the age of the same birth cohort in 2010.
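The cohort comparison in Table 1 can be checked with a few lines of Python. This is an illustrative sketch rather than the author's procedure; the names `table1` and `cohort_growth` are introduced here, and only the figures come from Table 1.

```python
# Table 1 values in millions:
# (age of cohort in 2000, Census 2000 count, Census 2010 count)
table1 = [
    (0, 13.79, 14.45),
    (1, 11.50, 13.94),
    (2, 14.01, 15.40),
    (3, 14.45, 15.23),
    (4, 15.22, 15.89),
    (5, 16.93, 18.02),
    (6, 16.47, 18.79),
    (7, 17.91, 20.78),
    (8, 18.75, 20.76),
    (9, 20.08, 21.54),
]


def cohort_growth(rows):
    """Relative change of each cohort between the two censuses."""
    return {age: (c2010 - c2000) / c2000 for age, c2000, c2010 in rows}


growth = cohort_growth(table1)
# Cohorts that grew, which mortality alone cannot produce.
growing = [age for age, rate in growth.items() if rate > 0]

for age, rate in growth.items():
    print(f"age {age} in 2000: {rate:+.1%}")
```

With the Table 1 figures, all ten cohorts show positive growth, and the largest relative increase (about 21 percent) is for the cohort aged 1 in 2000.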

Table 2 The Calculated Number of Births according to Different Data Sources: 1996-2000 (million)

Year    Census 2000    Census 2010    Household Registration    Education Statistics
1996    15.75          16.54          17.88                     18.16
1997    14.93          15.84          16.69                     17.60
1998    14.45          16.01          16.27                     17.24
1999    11.82          14.48          15.17                     16.61
2000    14.11          15.01          15.24                     16.93
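One way to read Table 2 is as an implied coverage ratio: the births derived from one source divided by those derived from a benchmark source. The sketch below uses the educational statistics as the benchmark, which is an assumption made here for illustration; the function and key names are likewise hypothetical, and only the figures come from Table 2.

```python
# Table 2 values in millions of births, keyed by birth year.
table2 = {
    1996: {"census_2000": 15.75, "census_2010": 16.54, "household": 17.88, "education": 18.16},
    1997: {"census_2000": 14.93, "census_2010": 15.84, "household": 16.69, "education": 17.60},
    1998: {"census_2000": 14.45, "census_2010": 16.01, "household": 16.27, "education": 17.24},
    1999: {"census_2000": 11.82, "census_2010": 14.48, "household": 15.17, "education": 16.61},
    2000: {"census_2000": 14.11, "census_2010": 15.01, "household": 15.24, "education": 16.93},
}


def implied_coverage(rows, source, benchmark="education"):
    """Ratio of births from `source` to births from `benchmark`, by year."""
    return {year: row[source] / row[benchmark] for year, row in rows.items()}


coverage_2000 = implied_coverage(table2, "census_2000")
for year, ratio in coverage_2000.items():
    print(year, f"{ratio:.3f}")
```

For Census 2000 the ratio is below 1 in every year and lowest for 1999, at about 0.71.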

In contrast to the underreporting in the youngest age groups, Census 2010 overreported adults aged 20-24. In Census 1990, the number of girls aged 0-4 was 55.39 million, yet the same cohort, women aged 20-24 in Census 2010, numbered 63.40 million. It is difficult to believe that such a large difference can be explained by underreporting in 1990 alone. Compared with the household registration data, in which the same age group numbered about 60 million in 2010, the census count of women aged 20-24 very likely suffers from severe overreporting. Some scholars attribute this problem to the failure of the new method of registering the floating population in Census 2010 (Cui Hongyan, Xu Lan, and Li Rui, 2013; Zhai Zhenwu, Zhang Huanjun, 2013). Because children were underreported and women were overreported in Census 2010, the TFR is substantially underestimated when the number of children is used as the numerator and the number of women as the denominator.

The Advantages and Disadvantages of Household Registration Data and Educational Statistics

Our surveys and interviews also indicate that the educational statistics have the best data quality of the three sources, because the number of children in the educational statistics is not closely tied to the interests of local governments or of those who report the data, which is quite different from the census. As Table 3 shows, the size of each birth cohort changed little across ages, indicating the high quality and consistency of the educational statistics. However, the educational statistics can only be used to calculate fertility levels 6 or 7 years before the statistical year, because they are age-specific statistics on school children, who are at least 6 years old.

Table 3 Number of 7-10 Year-Old Students of the 1996 to 2000 Birth Cohorts in Educational Statistics (persons)

Age / Birth Cohort    1996        1997        1998        1999        2000
7                     17022225    16482027    16123203    15781091    16197876
8                     17417748    16813490    16595346    15985110    16339099
9                     17637784    17174732    16812126    16123822    16372235
10                    17907822    17367095    16921433    16136898    16360598
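The stability of each cohort in Table 3 can be summarized by the relative spread of its counts across ages 7 to 10. The following is a small sketch under that choice of summary measure, which is an assumption made here and not necessarily the one the author used; the names are hypothetical and the figures come from Table 3.

```python
# Table 3: students counted at ages 7, 8, 9 and 10, keyed by birth cohort.
table3 = {
    1996: [17022225, 17417748, 17637784, 17907822],
    1997: [16482027, 16813490, 17174732, 17367095],
    1998: [16123203, 16595346, 16812126, 16921433],
    1999: [15781091, 15985110, 16123822, 16136898],
    2000: [16197876, 16339099, 16372235, 16360598],
}


def relative_range(counts):
    """(max - min) / min: how much a cohort's count varies across ages."""
    return (max(counts) - min(counts)) / min(counts)


spread = {cohort: relative_range(counts) for cohort, counts in table3.items()}
for cohort, value in spread.items():
    print(cohort, f"{value:.1%}")
```

For these figures the spread lies between roughly 1 and 5.4 percent for every cohort, compared with cohort growth of up to about 21 percent between the two censuses in Table 1.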

Furthermore, the registration information on adults in the household registration system was quite accurate, while the information on children under 18 was less complete, because some people do not register until they need the registration certificate, for example at age 6 for schooling and at age 18 for the ID card. The size of each birth cohort at different registration time points is shown in Figure 1. The young age groups grew sharply during 2007-2012, which demonstrates underreporting in these age groups. In contrast, the numbers of people aged 18 and above remained steady across time points, indicating that the data are consistent.

Figure 1 Population of Birth Cohorts from Household Registration Data in Different Years

The Estimated TFR Amended by Educational Statistics and Household Registration Data

First, the age- and sex-specific data for the youngest age groups in the census data and the educational statistics were divided by the survival rates to calculate the numbers of births. Second, given the high quality of the educational statistics, the numbers of births from 2000 to 2005 based on the educational statistics were used as numerators when calculating the TFRs. The births of 2006-2009 cannot be calculated from the educational statistics, because the youngest age group in Educational Statistics 2012 was the age-7 group, which was born in 2005. Third, to estimate education-based births for 2006-2009, we therefore combined the trend of the census data with the level of the educational statistics: we fitted a linear relationship between the births from census data from 1990 to 2005 and those from educational statistics by OLS (y = 0.7792x + 4822837, R² = 0.9933), and then applied the regression equation to the births from census data for 2006 to 2009. Fourth, the total number of births in each year was multiplied by the fertility distribution of women aged 15-49 (based on Census 2000 and Census 2010) to calculate the numbers of age-specific births. Finally, the numbers of women in each age group in the household registration data were used as denominators when computing the age-specific fertility rates, to correct for the overreporting of women of childbearing age in the census data. The estimated TFRs after correction are listed below:
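The last three steps can be sketched in Python as follows. This is a hedged illustration, not the author's code. It assumes that x in the regression is the census-based number of births and y the education-based number (the text does not state which is which), that the fertility distribution is expressed as shares of births by five-year age group, and that all input numbers below are hypothetical. Only the slope and intercept are taken from the text.

```python
SLOPE = 0.7792          # reported OLS slope
INTERCEPT = 4_822_837   # reported OLS intercept (births)


def education_equivalent_births(census_births):
    """Apply the reported equation; assumes x = census-based births."""
    return SLOPE * census_births + INTERCEPT


def tfr_from_births(total_births, birth_share_by_group, women_by_group, group_width=5):
    """TFR from total births, the age pattern of births, and women at risk.

    birth_share_by_group: {age group: share of all births}, summing to 1
    women_by_group:       {age group: number of women}
    """
    tfr = 0.0
    for group, share in birth_share_by_group.items():
        # Age-specific fertility rate: births to the group per woman in it.
        asfr = total_births * share / women_by_group[group]
        # Each rate applies to every single year of age in the group.
        tfr += group_width * asfr
    return tfr


# Hypothetical inputs for illustration only.
shares = {"15-19": 0.01, "20-24": 0.35, "25-29": 0.40, "30-34": 0.17,
          "35-39": 0.05, "40-44": 0.015, "45-49": 0.005}
women = {"15-19": 50e6, "20-24": 60e6, "25-29": 50e6, "30-34": 48e6,
         "35-39": 58e6, "40-44": 62e6, "45-49": 52e6}

births = education_equivalent_births(14_000_000)
print(round(births))
print(round(tfr_from_births(births, shares, women), 2))
```

Taking the women counts from household registration instead of the census changes only the `women_by_group` argument, which is how the denominator correction described above enters the calculation.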

Table 4 The Estimated TFRs Amended by Educational Statistics: 2000-2009

Year    2000    2001    2002    2003    2004    2005    2006    2007    2008    2009
TFR     1.64    1.65    1.60    1.61    1.62    1.70    1.68    1.67    1.66    1.60

Conclusion and Discussion

Comparing the census data with data from other sources that are independent of the census system, we conclude that the youngest age groups were underreported in the census data. Because children were underreported and women were overreported in Census 2010, the TFR is substantially underestimated when the number of children is used as the numerator and the number of women as the denominator. The estimated TFRs after correction are around 1.60-1.70 in China since 2000, much higher than the results of Census 2000 and Census 2010. In the future, more data sources, such as the newborn records and vaccination records collected by the hospital system, should be used to evaluate and improve the quality of census data. It should always be kept in mind, however, that no data source is perfect: each has its merits and demerits.