Educational Inequality and Intergenerational Mobility in Africa∗

Alberto Alesina (Harvard University, CEPR and NBER)
Sebastian Hohmann (London Business School)
Stelios Michalopoulos (Brown University, CEPR and NBER)
Elias Papaioannou (London Business School and CEPR)

March 14, 2018

Abstract

We investigate the evolution of inequality and intergenerational mobility in educational attainment in Africa. Using census data covering more than 50 million people in 23 countries, we document the following regularities. First, since independence, inequality has fallen across countries and intergenerational mobility has risen, reflecting the rise in education across the continent. Second, the overall drop in African inequality can be attributed mostly to declines in within-country, within-region and within-ethnicity components. Third, the initially moderate regional and ethnic differences in education persist, revealing strong inertia across these lines. Fourth, we describe the geography of educational mobility across regions and ethnic groups, uncovering strong “poverty-trap” dynamics. Educational mobility is higher in regions and ethnicities with above-country-average schooling at independence. Fifth, we explore the geographic, historical, and contemporary correlates of intergenerational mobility both across regions and ethnic lines. Colonial investments correlate strongly with educational mobility, while geography and pre-colonial features play a lesser role. The analysis further uncovers “Gatsby Curve” dynamics, with intergenerational mobility being low in regions with high inequality.

Keywords: Africa, Development, Education, Inequality, Intergenerational Mobility.

JEL Numbers. N00, N9, O10, O43, O55

∗Alberto Alesina: Harvard University and IGIER Bocconi. Sebastian Hohmann: London Business School. Stelios Michalopoulos: Brown University. Elias Papaioannou: London Business School. We thank Remi Jedwab and Adam Storeygard for sharing their data on colonial roads and railroads in Africa, Julia Cagé and Valeria Rueda for sharing their data on Protestant missions, and Nathan Nunn for sharing his data on Catholic and Protestant missions. For their comments, we thank conference participants at the University of Zurich and Brown University, and Oriana Bandiera for her insightful discussion.

1 Introduction

According to many observers, Africa is not only the poorest continent but also the most unequal (Dowden (2008); World Bank (2016)). However, data on African income and wealth inequality are scant and incomplete. In addition, we do not have much information on regional and ethnic differences in well-being, and previous works have relied on luminosity to proxy for ethnic and spatial inequality (Alesina, Michalopoulos, and Papaioannou (2016)). Since the 2000s the formerly “hopeless continent” has become the “hopeful one” (Economist, 2000, 2011) and there is rising euphoria about Africa’s future. However, we lack an understanding of the distribution of African growth post-independence across regions and ethnic groups. Likewise, while the Economist has recently dubbed Africa the continent of 1.2 billion opportunities (2016), there is not much research on mobility. Where is the land of educational opportunity in Africa? Are regional differences in education present at independence declining? Are there systematic ethnic differences in educational mobility? What is the association between education levels, inequality, and intergenerational mobility across countries, regions, and ethnicities? And which factors correlate with educational mobility? We begin to answer this set of questions using census data on education covering more than 52 million individuals in 23 countries since independence. We organize our analysis into two main parts.

1.1 Overview of Descriptive Patterns

We begin with a detailed description of the evolution of inequality in education across Africa, conducting the analysis across three domains: countries, administrative regions, and ethnic lines.1 We then study intergenerational mobility (IM) in education. Following Chetty et al. (2014), we measure IM in both relative and absolute terms. The relative IM index, which closely follows the literature, is based on regressing children’s years of schooling on parental education. The absolute IM index reflects the likelihood that offspring of parents who do not have any formal schooling (we term such individuals “illiterate”) manage to complete at least primary education (we call them “literate”). Our analysis uncovers the following regularities. First, since late colonial times, inequality has fallen, as African countries have experienced rising education. However, there are sizeable and, if anything, growing asymmetries across countries. The between-country component of pan-African inequality has risen over time, reflecting the relative success (and failures) of African countries in expanding education. IM has risen too, but there are large cross-country differences. Second, when we look within countries we find that the drop in educational inequality stems from large declines in within-region inequality rather than a systematic decline of regional disparities. At-independence regional differences in education persist. A similar picture applies to IM in education, as we observe large differences in IM across administrative regions. Third, a similar decomposition of country-wide inequality into a between and a within-ethnicity

1 For the regional analysis, we examine coarse and fine regional units. We label the coarse units as “provinces” and the fine regional units as “districts”. Overall we have 346 admin-1 units (provinces) and 2,444 admin-2 units (districts).

component reveals a sizeable drop of within-ethnicity inequality, but minuscule declines across ethnic lines. Likewise, there are non-negligible differences in IM across groups. The gains in education have disproportionately benefited some ethnic groups. This finding is consistent with the well-documented phenomenon of ethnic favoritism and ethnic-based discrimination in Africa (e.g., Wimmer, Cederman, and Min (2009)). Fourth, the persistent regional and ethnic gaps in education and in educational IM apply to both males and females. They are especially strong for rural households, much attenuated in urban centers, and less pronounced for migrants. Fifth, the analysis uncovers stark regional and ethnic-specific components in the transmission of educational attainment. Regions that have gained the most since the 1960s are regions that at independence had higher educational attainment. And Africans belonging to ethnic groups with a higher parental education appear, on average, more intergenerationally mobile. This finding suggests the presence of poverty traps and/or peer effects in educational attainment.
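The two IM measures described in this subsection can be illustrated with a minimal Python sketch. It is not the estimation code behind the results: the function names, the toy data, the use of a single parental schooling value per child, and the six-year threshold for completed primary schooling are all illustrative assumptions.

```python
def relative_im(child_years, parent_years):
    """OLS slope from regressing children's years of schooling on parental schooling."""
    n = len(child_years)
    mean_child = sum(child_years) / n
    mean_parent = sum(parent_years) / n
    cov = sum((p - mean_parent) * (c - mean_child)
              for p, c in zip(parent_years, child_years))
    var = sum((p - mean_parent) ** 2 for p in parent_years)
    return cov / var


def absolute_im(child_years, parent_years, primary_years=6):
    """Share of children of parents with zero schooling who complete at least primary.

    primary_years=6 is a hypothetical threshold; primary cycles differ by country.
    """
    children = [c for c, p in zip(child_years, parent_years) if p == 0]
    return sum(c >= primary_years for c in children) / len(children)


# Hypothetical parent-child pairs (years of schooling).
parents = [0, 0, 0, 0, 6, 12]
children = [0, 3, 6, 8, 9, 14]

print(relative_im(children, parents))
print(absolute_im(children, parents))
```

In this sketch a larger regression slope means children's schooling tracks parental schooling more closely, while a larger share means more upward mobility from the bottom of the distribution.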

1.2 Overview of Correlation Analysis

We then explore the correlates of inequality and IM in education across regions and ethnicities. We do not claim causality, but merely want to shed light on the role of geographic, historical (colonial and precolonial), and contemporary features for educational IM. We uncover the following: First, the strongest correlate of both spatial and ethnic IM is parental education. The likelihood that children of parents without formal education will complete at least primary school (absolute IM) is strongly and negatively correlated with the share of the “old” generation without formal education. This is true across countries, regions and ethnic groups. The share of parents without any schooling explains more than half of the observed variability in IM. Second, among various geographic variables, only distance to the capital and the ecological conditions favorable to malaria correlate negatively with IM. Natural resources, terrain features, and proximity to the coast do not seem to play a role. Third, proximity to colonial railroads/roads and to Christian, especially Protestant, missions correlate strongly with regional IM, even when one nets out the direct impact of these colonial investments on education at independence. Fourth, the only ethnic-specific (precolonial) trait which correlates with educational IM is the ethnicity’s mode of subsistence economy. Individuals belonging to groups that during the pre-colonial era were mainly dependent on agriculture have, on average, higher IM, as compared to those tracing their ancestry to pastoral groups. In contrast, other ethnic features that have been linked to contemporary development, such as political centralization, slavery, class stratification, and polygyny, do not systematically correlate with IM. Fifth, when we look at contemporary features, we find a strong negative association between (regional and ethnic) inequality and IM.
This “Gatsby curve” result echoes the evidence on income inequality and IM across US regions documented by Chetty et al. (2014), as well as the cross-country patterns in Corak (2013). The analysis further shows a link between industrial specialization and educational IM, with educational IM being considerably higher (lower) in districts with a large (low) share of agriculture.

1.3 Related Literature

Our paper lies at the intersection of three strands of literature. The first is the growing body of research documenting and understanding the evolution of inequality in income, wealth, consumption, education, and health, mostly across industrial countries (see for reviews, Alvaredo, Chancel, Piketty, Saez, and Zucman (2018), Alvaredo et al. (2013), Piketty (2014), and Atkinson, Piketty, and Saez (2011)). Especially related to our paper are studies mapping differences in IM across countries and regions (e.g., Corak (2013), Chetty, Hendren, Kline, and Saez (2014, 2015))2 and empirical papers on IM in education (Solon (1999), Black and Devereux (2011)).3 We contribute to this research agenda by (i) providing a descriptive overview of educational inequality across African countries, regions, and ethnic groups; (ii) mapping educational IM; and (iii) exploring its main correlates at both the regional and ethnic level. The second is the literature on the interplay between inequality and development. Due to measurement challenges, noisy data, opposing theoretical predictions, and hard-to-account-for features shaping both the level and distribution of development, this literature has failed to detect robust empirical regularities (see Benabou (2005), Galor (2011) and Atkinson (2016) for overviews).
Especially related to our paper are works focusing on regional inequalities and development (Williamson (1965), Kanbur and Venables (2008), Kim (2009), Young (2013), and Wough (2014)), as well as studies on ethnic inequality (e.g., Alesina, Michalopoulos and Papaioannou (2016), Baldwin and Huber (2010), Huber and Mayoral (2016), Huber and Suryanarayan (2013), Cederman, Weidmann and Gleditsch (2011), Stewart (2002), Chua (2003), and Robinson (2001)).4 Our main contribution to this strand is the use of education data (rather than consumption or survey-based income), which reduce noise and, given their fine spatial, temporal, and ethnic disaggregation, allow for a more comprehensive understanding of inequality across the African landscape. Third, there has been a growing interest in the Black Continent, both in properly measuring well-being, poverty, and output (Young (2012); Pinkovskiy and Sala-i-Martin (2014)) and in the correlates of African development (e.g., Henderson et al. (2018)). The literature has moved from mostly cross-country approaches and narratives (e.g., Collier and Gunning (1999), Bates (2006), Easterly and Levine (1997)) focusing on national features to within-country analyses that connect Africa’s contemporary development to its colonial (and even precolonial) past. This body of research has uncovered strong evidence of historical continuity as well as instances of rupture in the evolution of its economy and polity (see Michalopoulos

2 Hilger (2015) calculates IM in the US since WWII, while Olivetti and Paserman (2015) study IM in the United States from 1850 to 1940. Charles and Hurst (2003) use PSID data to estimate intergenerational persistence in wealth across US households. Alesina, Stantcheva, and Teso (2018) compare actual and perceived IM in several industrial countries.

3 Early studies on IM in education include Bowles (1972), Blake (1985), and Spady (1996). More recently, Hertz, Jayasundera, Piraino, Selcuk, Smith, and Verashchagina (2007) estimate country-level IM coefficients for various cohorts across 42 countries. Azam and Bhatt (2015) and Golley and Kong (2013) estimate IM in education in India and China, respectively.

4 Our paper’s results on ethnic inequality and IM are related to the large literature on ethnic favoritism, discrimination, and repression. See, among others, Posner (2005), DeLuca, Hodler, Raschky, and Valsecchi (2015), Burgess, Jedwab, Miguel, Morjaria, and Padro-i-Miguel (2015), Franck and Rainer (2012), Dickens (2017), and Wimmer, Cederman, and Min (2009). On ethnic identification see Logan (2011), Michalopoulos and Papaioannou (2015b), Eifert, Posner and Miguel (2010) and Habyarimana et al. (2009), among many others.

and Papaioannou (2018) for a review).5 By examining education, which correlates strongly with well-being and income, we shed light on the considerable within-country regional and ethnic disparities. The examination of IM and the uncovered strong within-family inertia in education provide an explanation of this research’s main result, that pre-independence features tend to correlate strongly with contemporary proxies of development.

Structure. The paper is organized as follows. In Section 2 we present the census data on educational attainment. In Section 3 we detail the construction of the inequality statistics (individual-level inequality, between and within-region inequality, between and within-ethnicity inequality). We then discuss the main patterns of inequality in education since African independence. In Section 4 we explore inertia in regional and ethnic inequality since the late colonial times. In Section 5 we turn to IM in education. We first present the geography of IM across countries, regions, and ethnicities, and then discuss the main cross-country patterns. In Section 6 we first present a mapping of the land of opportunity in Africa that portrays the likelihood of upward IM across regions. Second, we show the strong role of initial education in explaining variation in IM. Third, we examine the correlates of the sizeable spatial variability in IM. In Section 7 we provide an assessment of ethnic differences in IM and explore their correlates. In Section 8 we summarize.

2 Data

2.1 Why Education?

We focus on inequality and IM in education for many reasons. First, education is a major choice-investment that individuals undertake. And since the influential work of Becker and Tomes (1979), a large literature has examined the intergenerational transmission of human capital across families (see Solon (1999) and Black and Devereux (2011) for literature reviews, while Mogstad (2017) discusses the “human capital approach to IM”). Second, education is strongly correlated with income/wealth both across countries (e.g., Barro and Lee (2014)) and regions (Gennaioli et al. (2013, 2014)); a large body of research in labor economics shows that education causally affects lifetime income (Card (1999), Krueger and Lindahl (2001)). Individual (Mincerian) returns to schooling are sizable and if anything larger in low-income countries (Psacharopoulos (1994), Caselli (2014), Young (2013), Patrinos (2014)).6 Third, besides wages,

5 Nunn (2010, 2014), Cagé and Rueda (2016, 2017), Wantchekon, Klasnja, and Novta (2015), Okoye and Pongou (2014), Mantovanelli (2014), Jedwab, Meier, and Moradi (2017), and Huillery (2010), among many others, examine the role of Christian missions and colonial investments in human capital on Africa’s development. Building on the country case evidence of Kerby, Jedwab and Moradi (2017) in Kenya, Jedwab and Moradi (2016) on , and Okoye and Pongou (2017) in , Jedwab and Storeygard (2017) examine the role of colonial roads and railroads on spatial development across Africa. Acemoglu et al. (2016) focus on indirect colonial rule in , while Lowes and Montero (2017) focus on King Leopold’s concessionary agreements in Congo. Michalopoulos and Papaioannou (2016) connect ethnic partitioning during Africa’s Scramble to contemporary conflict. Starting with Nunn (2008), many works trace various aspects of Africa’s underdevelopment to the slave trade epoch. Other studies uncover the legacy of precolonial institutional, cultural, and economic features on contemporary development (e.g., Gennaioli and Rainer (2007), Michalopoulos and Papaioannou (2014), Michalopoulos, Putterman and Weil (2017), Moscona, Nunn, and Robinson (2017), Fenske (2015)).

6 Young (2013) reports Mincerian returns in the range of 11.3% (OLS) to 13.9% (2SLS) in a sample of 14 Sub-Saharan African countries with data on labor income from the Demographic and Health Surveys. These estimates are higher than in

education is related to the quality of life and people’s aspirations. As we demonstrate in Appendix A using data from the Demographic and Health Surveys (DHS) and the Afrobarometer Surveys, education correlates strongly with various proxies of well-being across Africa: living conditions, as reflected by the DHS composite wealth index and the Afrobarometer, child mortality and fertility, hopes and aspirations, attitudes toward domestic violence, and proxies of political and civic engagement. Fourth, consumption data for African countries are noisy, cover small samples, and are not spatially disaggregated. Likewise, income and wealth data are scant, available only for a tiny share of the African population and only for a handful of countries.7 The share of the underground economy in Africa is large (La Porta and Shleifer (2008, 2016)) and tax evasion is rampant, reflecting African states’ limited capacity to collect income taxes (Besley and Persson (2011)). In contrast, education data are available through various African censuses since the late 1960s. Not only is measurement error in educational attainment a lesser concern compared to that of reported income or wealth, but the education data are also available at a fine temporal, geographic, and even ethnic resolution. Moreover, education is useful in mapping and studying IM, as people tend to complete schooling by their early twenties (and in Africa usually earlier) and so, unlike with lifetime earnings, the analysis can start when adults are relatively early in the life-cycle.

2.2 Sample Characteristics

Our analysis is based upon 52.6 million individual records, retrieved from 63 national censuses from 23 African countries: Botswana, Burkina Faso, Cameroon, Egypt, Ethiopia, Ghana, Guinea, Kenya, Liberia, Malawi, Mali, Morocco, Mozambique, Nigeria, Rwanda, Senegal, Sierra Leone, South Africa, Sudan, South Sudan, Tanzania, Uganda, and Zambia. We retrieved the micro data from IPUMS (Integrated Public Use Microdata Series) International. This database, hosted at the University of Minnesota Population Center, takes representative samples from national censuses (typically 10%), harmonizes the data series and makes them available in the public domain.8 Appendix Table 26 gives information on our sample (countries, census years, coverage rates, and the number of individuals, provinces, districts, and ethnicities). Figure 1 shows the evolution of coverage. As of 2015, the sample countries were home to slightly more than 850 million people, representing around 72 percent of Africa’s population and 75 percent of its GDP. Our sample is smaller than the total number of observations in IPUMS (which is approximately 120 million) for three reasons. First, we drop individuals without schooling information; while this is not a serious issue for most countries, it is for Ethiopia in 2007, and Burkina Faso in 1985, where respectively

11 non-SSA low income countries [range between 8.7% (OLS) and 10.4% (2SLS)] and the “consensus” estimate of 6.5%–8.5% in high income countries. Caselli (2016) reports lower Mincerian returns in Sub-Saharan Africa (around 8.5%), though in line with the earlier work of Psacharopoulos he also estimates a negative relationship between Mincerian returns and years of schooling (which is steeper in 1995 as compared to 2005). Moreover, Patrinos (2014) estimates higher Mincerian returns in SSA (12.5%) compared to the rest of the world (9.7%).

7 Alvaredo et al. (2017) report that for countries like Ghana, Kenya, Tanzania, Nigeria, and Uganda, the income data encompass less than 1% of the adult population.

8 The data from Nigeria come from household surveys conducted in consecutive years over 2006–2010. To maximize coverage, we aggregate the yearly observations and count them as one “census-year”.

Figure 1: Total population, (African) GDP share covered by countries in the sample

[Figure omitted. Horizontal axis: year, 1960 to 2015. Vertical axes: population and GDP share; total population (millions). Series: total population (millions), population share, and GDP (current USD) share, each shown for the 23-country and the 14-country samples.]

around 85% and 45% of the observations have missing schooling data. Second, since we focus on adults, we drop individuals below the age of 18. Third, we include only individuals whose age exceeds their total years of schooling by at least 9 years. The reason for the latter is that individuals enter school at age 6 and hence a 9-year-old individual without any schooling can reasonably be assumed to have completed her schooling. To estimate IM, we match individuals to their parents, using Census information for individuals of various generations who cohabitate.9 As not all individuals live with their parents, the IM sample declines. Our IM estimates are based on approximately 11 million individuals who cohabitate with their parents or grandparents (or both). This raises “selection” concerns that we discuss below and in the supplementary appendix. IPUMS reports information on respondents’ current residence, allowing us to assign individuals to “coarse” and “fine” current administrative units. Districts are typically admin-2 divisions, though in some countries these are admin-3 areas (e.g., Sudan or Mali). Provinces are the coarser units, almost always admin-1 areas (e.g., provinces in South Africa or states in Nigeria).10 We have information from 346 provinces and 2,444 districts across the 23 countries.

9 Observations from three early censuses, Burkina Faso 1985, Kenya 1979, and Liberia 1974, drop out entirely from the IM analysis, because there are no household identifiers that would allow us to assign individuals to different generations.

10 For Botswana and Nigeria, IPUMS reports just one administrative unit, “Districts” in Botswana and “States” in Nigeria; we thus use this aggregation both for districts and for provinces. In a few instances (in Ghana after 1984, in Burkina Faso in 1985, in Ethiopia in 1984, in Malawi in 1987, and in South Africa after 1996) the number of districts and regions changes between censuses in a given country, as administrative boundaries are sometimes redrawn. For our analysis, we have harmonized administrative boundaries.

For 14 countries, the censuses also report respondents’ ethnicity. We assign individuals to an ethnicity using IPUMS information on either ethnicity or maternal language or primary language spoken at home. We focus on ethnicities that make up more than 1% of the country’s population; in some cases we aggregate smaller ethnic groups into their respective ethnic families (e.g., the Asante in Ghana are part of the Akan). Across the 14 countries, we have 178 ethnic/linguistic groups. In the Appendix we describe the procedure in more detail. Figure 1 also gives the evolution of coverage for the ethnic sample. The population of the 14 countries was about 400 million in 2015 (100 million in 1960), roughly 30%–35% of the total African population and GDP.
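The ethnic-group selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the toy labels "Group B" and "Group C", and the choice to merge groups into families before applying the 1% threshold are hypothetical; the actual procedure is the one described in the Appendix.

```python
from collections import Counter


def large_groups(labels, family_of=None, threshold=0.01):
    """Return population shares of groups whose share exceeds `threshold`.

    labels    : one ethnicity/language label per individual in a country
    family_of : optional mapping from a smaller group to its ethnic family
                (assumption: merging happens before the threshold is applied)
    """
    family_of = family_of or {}
    merged = [family_of.get(label, label) for label in labels]
    counts = Counter(merged)
    total = len(merged)
    return {group: count / total
            for group, count in counts.items()
            if count / total > threshold}


# Hypothetical country sample of 1,000 individuals.
labels = ["Asante"] * 300 + ["Akan"] * 200 + ["Group B"] * 495 + ["Group C"] * 5
shares = large_groups(labels, family_of={"Asante": "Akan"})
print(shares)
```

Here the Asante are folded into the Akan family, as in the example in the text, and the hypothetical "Group C" (0.5% of the sample) falls below the 1% cutoff.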

3 Inequality

3.1 Measuring Inequality

A large fraction of the respondents report no schooling; thus we need to account for the many zeros in the construction of the inequality statistics. We use the Gini index, which allows for zeros, and a Generalized Entropy (GE) index, which is additively decomposable, to measure inequality across individuals, countries, provinces, and districts.11

Ranking individuals $i \in \{1, \ldots, N\}$ from the lowest to the highest level of individual schooling, $s_i$, in a given unit (Africa, country, region, ethnicity), the Gini index ($G$) is defined as:

\[
G = 1 + \frac{1}{N} - \frac{2}{\bar{S} N^{2}} \sum_{i=1}^{N} (N - i + 1)\, s_i ,
\]

where $\bar{S}$ is the mean of years of schooling in the respective sample. For the country level tabulations, $\bar{S}$ denotes country-wide average schooling years from $N$ respondents. For the regional and ethnic Ginis we replace individual schooling and number of respondents with the regional and ethnic averages of schooling and the numbers of regions and ethnicities respectively. We calculate generalized entropy indices, which (in contrast to the Gini index) are additively decomposable into a within-group and a between-group component (Shorrocks 1980). The GE($\theta$) index is:

\[
GE(\theta)_{total} = \frac{1}{N\theta(\theta-1)} \sum_{i=1}^{N} \left[ \left( \frac{s_i}{\bar{S}} \right)^{\theta} - 1 \right],
\]

where $\theta$ takes on the value of 2. In this case $GE(2)_{total}$ reduces to one half of the squared coefficient of variation (relative standard deviation), i.e., $GE(2)_{total} = \frac{1}{2}\frac{\sigma^{2}}{\mu^{2}}$.12
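The Gini and GE(2) definitions above can be checked numerically with a short Python sketch. The function names and the six-person schooling vector are illustrative assumptions; the sketch uses only the standard library and is not the code used to compile the statistics in this paper.

```python
def gini(schooling):
    """Rank-based Gini index; values are sorted from lowest to highest."""
    s = sorted(schooling)
    n = len(s)
    mean = sum(s) / n
    # Weight (N - i + 1) on the i-th smallest observation, i = 1..N.
    weighted = sum((n - i + 1) * s_i for i, s_i in enumerate(s, start=1))
    return 1 + 1 / n - 2 * weighted / (mean * n ** 2)


def ge2(schooling):
    """GE(2) = 1/(2N) * sum((s_i / mean)^2 - 1); zeros are allowed."""
    n = len(schooling)
    mean = sum(schooling) / n
    return sum((s_i / mean) ** 2 - 1 for s_i in schooling) / (2 * n)


def half_squared_cv(schooling):
    """One half of the squared coefficient of variation (population variance)."""
    n = len(schooling)
    mean = sum(schooling) / n
    variance = sum((s_i - mean) ** 2 for s_i in schooling) / n
    return 0.5 * variance / mean ** 2


# Hypothetical years of schooling, with many zeros as in the census data.
years = [0, 0, 0, 4, 6, 12]
print(gini(years), ge2(years), half_squared_cv(years))
```

Both indices accept zero entries, and both are unchanged when every entry is multiplied by the same positive constant.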

11 The Theil index, which is the generalized entropy index with parameter 1, does not accommodate zeros. Hence, it is not useful in our setting.

12 When $\theta$ equals 0, the GE(0) index is the mean log deviation, while when $\theta$ equals 1, the GE(1) index is the Theil index. The GE index can take on values between 0 (perfect equality or maximum entropy) and $\infty$ (perfect inequality). $\theta$ controls how much weight the measure gives to distances between the average $\bar{S}$ and individual observations $s_{i,g}$ at different parts of the distribution. Values close to zero give higher weights to differences at the lower end of the distribution, $\theta = 1$ weights all distances equally, and $\theta > 1$ weights large values more heavily. The latter means that, in the context of income inequality, the measure can be sensitive to very high incomes. Since we apply it to schooling, this is not a concern. See the monograph by

The GE(2) index can be decomposed into the between-group (e.g., regions or ethnicities) and the within-group component.

\[
GE(\theta)_{total} = \frac{1}{\theta(\theta-1)} \sum_{g=1}^{G} \frac{N_g}{N} \left[ \left( \frac{\bar{S}_g}{\bar{S}} \right)^{\theta} - 1 \right]
+ \sum_{g=1}^{G} \left( \frac{S_g}{S} \right)^{\theta} \left( \frac{N_g}{N} \right)^{1-\theta} \frac{1}{N_g \theta(\theta-1)} \sum_{i=1}^{N_g} \left[ \left( \frac{s_{i,g}}{\bar{S}_g} \right)^{\theta} - 1 \right]
\]

\[
= GE(\theta)_{between} + GE(\theta)_{within}.
\]

$N$ denotes the number of individuals, $G$ the number of groups, and $N_g$ the number of individuals in each group. $g$ indicates population or sample groupings (e.g., regions, ethnicities). Hence, $s_{i,g}$ are years of schooling of individual $i$ belonging to group (region, ethnicity) $g$. $S_g$ is the sum of schooling of all individuals in a given group, $S_g = \sum_{i \in g} s_{i,g}$, and $\bar{S}_g$ denotes average schooling of individuals in a given group, $\bar{S}_g = S_g / N_g$. Likewise, $S = \sum_{i=1}^{N} s_{i,g}$ and $\bar{S} = S / N$. In the case of regions (ethnicities), the first term reflects regional (ethnic) inequality, weighted by groups’ sizes, while the second term reflects the weighted average inequality within regions (ethnicities), where weights are regions’ (ethnicities’) population shares.
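The between/within decomposition above can be verified on toy data with the following Python sketch. The function names and the two-group example are hypothetical, and skipping groups whose members all have zero schooling is an assumption of this sketch: such a group has weight $(S_g/S)^{\theta} = 0$, so its within-group term is treated as zero.

```python
from collections import defaultdict

THETA = 2  # the value used in the text; the code requires theta not in {0, 1}


def ge_total(schooling, theta=THETA):
    """Generalized entropy index over all individuals."""
    n = len(schooling)
    mean = sum(schooling) / n
    return sum((s / mean) ** theta - 1 for s in schooling) / (n * theta * (theta - 1))


def ge_decomposition(schooling, groups, theta=THETA):
    """Return (between, within) components of GE(theta) for the given grouping."""
    n = len(schooling)
    total = sum(schooling)
    mean = total / n
    members_of = defaultdict(list)
    for s, g in zip(schooling, groups):
        members_of[g].append(s)

    between = 0.0
    within = 0.0
    for members in members_of.values():
        n_g = len(members)
        sum_g = sum(members)
        mean_g = sum_g / n_g
        between += (n_g / n) * ((mean_g / mean) ** theta - 1) / (theta * (theta - 1))
        if sum_g == 0:
            continue  # assumption: an all-zero group carries zero within-group weight
        ge_g = sum((s / mean_g) ** theta - 1 for s in members) / (n_g * theta * (theta - 1))
        within += (sum_g / total) ** theta * (n_g / n) ** (1 - theta) * ge_g
    return between, within


# Hypothetical years of schooling for two regions, A and B.
years = [0, 0, 3, 4, 6, 11]
region = ["A", "A", "A", "B", "B", "B"]
between, within = ge_decomposition(years, region)
print(between, within, ge_total(years))
```

The two components sum to the overall index, which is the property that makes GE(2) suitable for attributing changes in inequality to between-group and within-group movements.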

3.2 New Inequality Statistics

Table 1 reports the newly compiled inequality statistics and some basic summary statistics: mean years of schooling (column (1)) and the share of individuals by educational attainment, namely zero schooling, less than primary completed, primary completed, and secondary schooling completed (columns (2)-(5)). Columns (6) and (7) report the Gini and the GE(2) indexes reflecting overall inequality.

Table 1: Schooling summary statistics at the census-level

                         (1)         (2)         (3)         (4)         (5)         (6)         (7)
                         schooling   share zero  share less  share       share       schooling   schooling
                         mean        schooling   than        primary     secondary   Gini        GE(2)
country       year                               primary     completed   completed
Botswana      1981       2.951       0.483       0.721       0.279       0.032       0.654       0.808
Botswana      1991       4.578       0.347       0.536       0.464       0.07        0.531       0.466
Botswana      2001       6.279       0.232       0.386       0.614       0.163       0.41        0.262
Botswana      2011       8.287       0.155       0.246       0.754       0.246       0.322       0.167
Burkina Faso  1985       0.486       0.931       0.95        0.05        0.006       0.949       8.531
Burkina Faso  1996       0.849       0.89        0.907       0.093       0.019       0.916       4.892
Burkina Faso  2006       1.41        0.805       0.86        0.14        0.031       0.858       2.707
Cameroon      1976       2           0.664       0.795       0.205       0.01        0.748       1.318
Cameroon      1987       3.409       0.505       0.64        0.36        0.026       0.624       0.713
Cameroon      2005       5.718       0.343       0.418       0.582       0.099       0.482       0.38
Egypt         1986       3.23        0.727       0.727       0.273       0.204       0.759       1.422
Egypt         1996       4.091       0.666       0.666       0.334       0.27        0.705       1.071
Egypt         2006       6.148       0.495       0.495       0.505       0.409       0.555       0.542
Ethiopia      1984       0.684       0.846       0.955       0.045       0.014       0.91        4.765
Ethiopia      1994       1.338       0.765       0.89        0.11        0.031       0.856       2.622
Ethiopia      2007       1.952       0.686       0.836       0.164       0.039       0.791       1.651
Ghana         1984       4.125       0.559       0.612       0.388       0.05        0.636       0.769
Ghana         2000       5.248       0.492       0.533       0.467       0.131       0.59        0.612
Ghana         2010       6.542       0.321       0.407       0.593       0.172       0.457       0.337
Guinea        1983       0.898       0.907       0.918       0.082       0.034       0.927       5.649
Guinea        1996       1.443       0.843       0.873       0.127       0.035       0.883       3.316
Kenya         1969       3.199       0.545       0.691       0.309       0.058       0.652       0.791
Kenya         1979       3           0.533       0.72        0.28        0.047       0.655       0.804
Kenya         1989       4.67        0.366       0.52        0.48        0.128       0.5         0.4
Kenya         1999       5.85        0.227       0.414       0.586       0.177       0.404       0.252
Kenya         2009       6.785       0.19        0.332       0.668       0.239       0.362       0.205
Liberia       1974       1.384       0.829       0.87        0.13        0.034       0.87        2.951
Liberia       2008       4.334       0.497       0.603       0.397       0.143       0.617       0.67
Malawi        1987       2.747       0.514       0.85        0.15        0.02        0.652       0.79
Malawi        1998       3.93        0.371       0.768       0.232       0.053       0.548       0.494
Malawi        2008       5.099       0.285       0.678       0.322       0.091       0.466       0.339
Mali          1987       0.92        0.888       0.891       0.109       0.02        0.909       4.587
Mali          1998       1.004       0.839       0.905       0.095       0.015       0.886       3.556
Mali          2009       1.655       0.751       0.847       0.153       0.034       0.828       2.182
Morocco       1982       1.668       0.772       0.879       0.121       0.034       0.83        2.162
Morocco       1994       2.731       0.656       0.8         0.2         0.077       0.747       1.281
Morocco       2004       3.594       0.551       0.72        0.28        0.105       0.674       0.879
Mozambique    1997       1.583       0.598       0.915       0.085       0.01        0.678       1.097
Mozambique    2007       2.421       0.45        0.834       0.166       0.023       0.586       0.714
Nigeria       2006-2010  5.174       0.447       0.485       0.515       0.267       0.552       0.514
Rwanda        1991       2.716       0.475       0.475       0.525       0.025       0.609       0.652
Rwanda        2002       3.334       0.357       0.723       0.277       0.026       0.531       0.483
Senegal       1988       1.641       0.793       0.83        0.17        0.026       0.844       2.406
Senegal       2002       2.299       0.711       0.767       0.233       0.047       0.785       1.613
Sierra Leone  2004       2.493       0.662       0.753       0.247       0.024       0.744       1.275
South Africa  1996       7.251       0.19        0.304       0.696       0.224       0.339       0.184
South Africa  2001       7.548       0.179       0.295       0.705       0.264       0.334       0.18
South Africa  2007       8.081       0.117       0.247       0.753       0.234       0.278       0.129
South Africa  2011       9.127       0.093       0.183       0.817       0.38        0.244       0.105
South Sudan   2008       1.17        0.825       0.885       0.115       0.022       0.876       3.218
Sudan         2008       1.886       0.729       0.827       0.173       0.036       0.813       1.977
Tanzania      1988       3.631       0.434       0.59        0.41        0.033       0.525       0.481
Tanzania      2002       4.851       0.299       0.423       0.577       0.066       0.4         0.282
Tanzania      2012       5.575       0.243       0.345       0.655       0.116       0.363       0.234
Uganda        1991       3.665       0.402       0.664       0.336       0.016       0.559       0.522
Uganda        2002       4.701       0.297       0.569       0.431       0.06        0.489       0.384
Zambia        1990       4.329       0.424       0.561       0.439       0.077       0.554       0.512
Zambia        2000       5.219       0.328       0.477       0.523       0.113       0.482       0.37
Zambia        2010       6.793       0.176       0.339       0.661       0.185       0.366       0.208

Cowell (2009) for further discussion of inequality measures.

We assign individuals to 10-year birth cohorts to exploit time variation. The cohorts are individuals born pre-1950, 1950-1959, 1960-1969, 1970-1979, 1980-1989, and post-1990. For each cohort we calculate the Gini and GE(2) indicators across individuals, provinces, districts, and ethnicities, using all available censuses in each country; for example, for the 1950s cohort in Cameroon we use information from the 1976, 1987, and 2005 censuses. Appendix Table 27 reports the overall (individual), regional, and ethnic Ginis, as well as the between and within region and ethnicity generalized entropy measures for all countries and cohorts.

Before proceeding, we examined the correlation structure of the newly compiled inequality statistics (see Appendix Table 28). Panel A gives the cross-sectional correlations. Because our analysis also studies the evolution of inequality and mean education over time, Panel B reports the “within”-country correlations, netting out country and general birth-cohort fixed effects. First, overall country-level inequality correlates strongly with regional inequality (correlation around 0.88). Second, the correlation between overall and ethnic inequality is also strong, but weaker. The correlation between the overall and the ethnic Gini index is 0.67, while the corresponding correlation between the overall GE(2) and the between-ethnicity component is 0.78. Third, the Ginis and the generalized entropy measures are strongly correlated. The correlation between the overall Gini and GE(2) is around 0.80. Gini coefficients reflecting regional, district, and ethnic inequality correlate strongly with the between-region, between-district, and between-ethnicity components of the generalized entropy measures (correlations around 0.85 for districts and regions and 0.80 for ethnicity). Fourth, once we net out birth-cohort fixed effects and country fixed effects, effectively examining the correlations over time, the correlations weaken (though they remain significant). The correlation between overall and regional inequality is now around 0.7, while the correlation between overall and ethnic inequality is 0.3.
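The 10-year birth-cohort assignment behind these statistics can be sketched in a few lines. This is only an illustration: the function name `birth_cohort` is hypothetical, and placing the birth year 1990 itself in the last bin is an assumption, since the text does not spell out that boundary.

```python
def birth_cohort(birth_year: int) -> str:
    """Map a birth year to one of the decade cohort labels used in the text."""
    if birth_year < 1950:
        return "pre-1950"
    if birth_year >= 1990:
        # Assumption: 1990 itself belongs to the last ("post-1990") bin.
        return "post-1990"
    decade_start = (birth_year // 10) * 10
    return f"{decade_start}-{decade_start + 9}"


# Hypothetical birth years, for illustration only.
for year in (1948, 1950, 1967, 1989, 1993):
    print(year, birth_cohort(year))
```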

3.3 Patterns in Overall (Individual-level) Inequality

3.3.1 Africa-wide Patterns

In Figure 2 we examine the evolution of educational inequality in Africa across cohorts (right axis) and contrast it with mean years of schooling (left axis). In Figure 2 (a) we use all 63 censuses, while in Figure 2 (b) we use only the latest census from each country to avoid over-weighting countries with more IPUMS censuses (so we use 23 censuses, mostly after 2000). There is a clear negative association between inequality and mean education, a pattern similar to the cross-country analysis of Castello-Climent and Domenech (2008).

Figure 2: Pan-African mean years of schooling and inequality by 5-year birth cohort

(a) all birth cohorts (b) latest birth cohort only

[Figure 2 plot, both panels: x-axis, birth cohort (00-04 to 90-94); left axis, mean years of schooling; right axis, Gini index. Series: mean years of schooling, individual Gini.]

Is the co-movement of mean and inequality “mechanical”? It is worth pausing for a moment to consider the different ways in which changes in the mean can relate to changes in inequality. First, the Gini and the GE(θ) indices are scale-invariant, so a doubling, say, of education across cohorts will not affect the inequality statistics. Second, if mean education increases by adding, say, one year of schooling to all individuals, inequality will fall.13 And if the one-year increase in the average stems from individuals in the bottom two quartiles gaining 2 years of schooling, then inequality will fall even more.14 By contrast, if mean years of schooling increase on average by one because individuals in the top quartiles gain 2 years each, then inequality will rise alongside the mean. The upshot is simple: if mean schooling expands because of constant gains everywhere across the distribution or because of gains concentrated in the lower end of the distribution, inequality will fall. As gains in average schooling in Africa are mainly about lowering illiteracy (younger generations attending primary schools), the strong negative association between mean and inequality is not surprising.

The CDFs of years of schooling for the various cohorts plotted in Figure 3 provide a clearer understanding of the evolution of education over time.

13 An easy way to see this is to make the constant added to everyone’s schooling very large; differences from the average then become minuscule relative to the mean for every individual, and so does overall inequality.
14 For this statement to be unambiguously true, we assume that adding this constant preserves the initial ranking.
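The four cases in the discussion of “mechanical” co-movement (proportional scaling, a uniform one-year gain, gains in the bottom half, gains in the top half) can be checked numerically. The sketch below uses a made-up cohort of ten people, not the census data, and the helper names `gini` and `ge2` are illustrative. It is an example for one starting distribution rather than a general proof; in particular, the sign of the top-half case can depend on how unequal the starting distribution is.

```python
import numpy as np


def gini(x):
    """Gini index of a 1-D array of non-negative values (sorted-rank formula)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    return float(2.0 * np.sum(ranks * x) / (n * np.sum(x)) - (n + 1.0) / n)


def ge2(x):
    """Generalized entropy index GE(2): half the squared coefficient of variation."""
    x = np.asarray(x, dtype=float)
    return float(0.5 * np.var(x) / np.mean(x) ** 2)


# Hypothetical years of schooling for ten people, sorted from low to high.
base = np.array([0, 1, 2, 3, 4, 6, 7, 8, 9, 12], dtype=float)
bottom_half = np.arange(base.size) < base.size // 2  # valid because base is sorted

scenarios = {
    "baseline": base,
    "doubled": 2 * base,                                   # pure rescaling
    "plus one year for all": base + 1,                     # mean rises by one
    "plus two years, bottom half": base + 2 * bottom_half,   # mean rises by one
    "plus two years, top half": base + 2 * ~bottom_half,     # mean rises by one
}

for name, s in scenarios.items():
    print(f"{name:30s} mean={s.mean():.2f}  Gini={gini(s):.3f}  GE(2)={ge2(s):.3f}")
```

In this example the bottom-half gain preserves the initial (weak) ranking, in line with the assumption in footnote 14.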

Figure 3: Pan-African CDF of years of schooling, birth decades 1950s-1980s

[Figure 3 plot: x-axis, years of schooling (0 to 18); y-axis, CDF (0 to 1). Series: 1950s, 1960s, 1970s, and 1980s birth cohorts.]

First, there is a clear downward shift of the distribution, reflecting rising schooling. Second, most of the average increase comes from rising primary school attainment. For example, 55% of Africans born in the 1950s had not attended any school, and 68% had not completed primary (less than 6 years). For the 1980s-born cohort, roughly 30% of the population has not attained any schooling and 42% has less than 6 years of formal schooling. “Illiteracy” has been almost halved, a significant achievement. Third, high-school attainment has also risen considerably. Around 10% of the 1950s-born cohort had completed secondary education; the corresponding share for the 1980s cohort is close to 40%. Fourth, increases at the top have been tiny over the period under examination. The increase in mean years of schooling does not come from rising tertiary attainment, which has started to expand only very recently and only in a handful of African countries.

In Figure 4 we plot the evolution of the generalized entropy index since the 1950s (thick blue line) and also report the decomposition of overall Africa-wide inequality into a between-country and a within-country component (dashed blue lines), using only the latest census for each country.15

15 As we show in Appendix Figure 36, the patterns are similar when we use all censuses.

Figure 4: Pan-African inequality across individuals, within and between countries, latest censuses only

[Figure 4 plot: x-axis, birth cohort (1950-1959 to 1980-1989); vertical axes, GE(2) index values and within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

This decomposition is a useful entry point for assessing the relative importance of national features in late colonial times and their post-independence evolution. In the 1950s and the 1960s most Africans were illiterate, mean years of schooling was low, and inequality was quite high. Most of the Africa-wide inequality reflected a “within-country” component; its share of total pan-African inequality was 90% for the 1950s-born cohort (red dashed line). The between-country share of educational inequality in the 1950s-1960s was small, around 15% (red solid line), as education differences across countries were small. The sizable drop of educational inequality on the continent reflects a drop of the within-country component, which moves in parallel. Between-country inequality in education remained flat in absolute terms; as a consequence, the share of the between-country component of African inequality rises to 30% for the 1980s-born cohort. This result should not come as a surprise, as there is wide cross-country heterogeneity in economic performance. Some countries have been quite successful in raising their citizens’ education (e.g., Botswana, South Africa, Tanzania, Kenya), while others have been less successful (e.g., Burkina Faso, Ethiopia, Mali, Senegal). Since differences across countries were small at independence (as most Africans had not even attended primary school), the cross-country component has become relatively more important over time.
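The split of GE(2) into a within-country and a between-country component can be illustrated with a small sketch. It relies on the standard additive decomposition of the GE(2) index, in which the between component is the GE(2) of a distribution where everyone is assigned their group mean. The function names and the two-country data are hypothetical, and the paper's own implementation (for example its weighting across censuses) may differ.

```python
import numpy as np


def ge2(x):
    """GE(2) index: half the squared coefficient of variation."""
    x = np.asarray(x, dtype=float)
    return float(0.5 * np.var(x) / np.mean(x) ** 2)


def ge2_decomposition(x, groups):
    """Return (total, within, between) with total = within + between."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    n, mu = x.size, x.mean()
    within = 0.0
    group_means = np.empty_like(x)
    for g in np.unique(groups):
        mask = groups == g
        xg = x[mask]
        # Same as (n_g / n) * (mean_g / mu)**2 * GE2(x_g), written so that a
        # group with zero mean schooling does not cause a division by zero.
        within += (xg.size / n) * np.var(xg) / (2.0 * mu ** 2)
        group_means[mask] = xg.mean()
    between = ge2(group_means)  # everyone replaced by their group mean
    return ge2(x), float(within), between


# Hypothetical years of schooling in two made-up countries "A" and "B".
schooling = [0, 0, 2, 6, 4, 8, 8, 12]
country = ["A"] * 4 + ["B"] * 4
total, within, between = ge2_decomposition(schooling, country)
print(f"total={total:.3f} within={within:.3f} between={between:.3f} "
      f"between share={between / total:.3f}")
```

Dividing each component by the total gives the within and between shares of the kind plotted in Figure 4.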

3.3.2 Cross-Country Patterns

Figures 5 (a) and 5 (b) plot mean years of schooling (on the y-axis) for each country birth-decade cohort against the Gini index (x-axis). Panel (a) gives the cross-sectional association pooling across all country-cohorts, while panel (b) shows the association net of country and birth-decade fixed effects.16 There is a strong, almost one-to-one negative correlation between mean years of schooling and the individual-level Gini index (R2 = 0.89), which is the analogue of Figure 4 at the country-birth-decade level. The association continues to be strong (R2 = 0.81) within country and within birth decade.17

16 The plots illustrate the association from the following regression equation: $Gini^{INDIV}_{c,b} = \alpha + (\alpha_c + \gamma_b) + \beta \times s_{c,b} + \epsilon_{c,b}$.
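To make “net of country and birth-decade fixed effects” concrete, here is a minimal sketch of a slope estimated with both sets of dummies. It follows the form printed on the Figure 5 (b) plot (mean schooling regressed on the Gini index). The function name, the synthetic panel, and the slope of -12 are assumptions for illustration only; none of the numbers come from the paper's data.

```python
import numpy as np


def two_way_fe_slope(y, x, country, decade):
    """OLS slope of y on x, controlling for country and birth-decade dummies."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    country = np.asarray(country)
    decade = np.asarray(decade)
    columns = [np.ones_like(x), x]
    # Drop one level of each factor to avoid collinearity with the intercept.
    columns += [(country == c).astype(float) for c in np.unique(country)[1:]]
    columns += [(decade == b).astype(float) for b in np.unique(decade)[1:]]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


# Synthetic, noise-free country-by-decade panel (made-up numbers).
rng = np.random.default_rng(0)
countries = [c for c in "ABCD" for _ in range(4)]
decades = [1950, 1960, 1970, 1980] * 4
gini_cb = rng.uniform(0.2, 0.9, size=16)
country_effect = {"A": 1.0, "B": -0.5, "C": 2.0, "D": 0.0}
decade_effect = {1950: 0.0, 1960: 0.5, 1970: 1.0, 1980: 1.5}
true_slope = -12.0
schooling_cb = np.array([
    10.0 + country_effect[c] + decade_effect[b] + true_slope * g
    for c, b, g in zip(countries, decades, gini_cb)
])

slope = two_way_fe_slope(schooling_cb, gini_cb, countries, decades)
print(f"estimated within slope: {slope:.4f}")
```

Because the synthetic data contain no noise, the estimated slope coincides with the assumed one up to floating-point error; with real data the dummies absorb level differences across countries and cohorts, and only the remaining variation identifies the slope.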

Figure 5: Individual schooling inequality and mean schooling at the country-birth-decade-level

(a) unconditional, birth-decade level (b) country and birth-decade fixed effects, birth-decade level

[Figure 5 plot: scatter of country-birth-decade observations labeled by country code. Panel (a): x-axis, Gini, individual years of schooling; y-axis, mean years of schooling. Regression fit: yrschool_cb = 10.3464 - 10.5025*Gini_cb + e_cb, R-squared = 0.89. Panel (b): x-axis, Gini, individual years of schooling net of country + decade FEs; y-axis, mean years of schooling net of country + decade FEs. Regression fit: yrschool_cb = a_c + g_b - 12.5873*Gini_cb + e_cb, R-squared = 0.81.]

3.4 Regional Inequality & Average Education

As countries accumulate human capital, overall country-level inequality declines, but what happens to inequality across regions? In Figure 6 we examine the cross-country and the within-country association between average schooling and spatial inequality, conducting the analysis at both the province level (346 admin-1 units) and the district level (2,444 admin-2 units).18 The cross-sectional correlation in Panels (a) and (c) between spatial inequality and mean years of schooling is negative and highly significant. This applies to both the broad and the fine regional disaggregation (R2 of 0.62 and 0.67). The correlation is evidently weaker than the one between schooling and overall inequality (where the R2 was 0.89). Panels (b) and (d) plot the association between the province and district Ginis and mean schooling, netting out birth-decade and country fixed effects; this allows us to examine whether spatial inequalities fall as countries’ educational attainment rises. The within-country correlation between province and district inequality and mean years of schooling becomes weak and noisy (with R2 falling to 0.03 and 0.19, respectively). This pattern stands in contrast to the strong and highly significant within-country and within-cohort correlation between overall (individual-level) inequality and mean years of schooling (in Figure 5 (b)).

17 The within-correlation suggests that a one-point increase in the Gini coefficient across decade-birth-cohorts is associated with a drop of 0.125 in mean years of schooling. Looking at changes from the 1950s to the 1980s birth cohort, a particular success story is Botswana, which has increased average schooling by 5.2 years more than the overall increase; this rise in education was associated with a fall in the schooling Gini by almost 40 points. In contrast, average schooling in Senegal has moved in tandem with the pan-African rise over this period; the education Gini declined only slightly.
18 The plots illustrate the association from the following regression equation: $GINI^{REGION}_{c,b} = \alpha + (\alpha_c + \gamma_b) + \beta \times s_{c,b} + \epsilon_{c,b}$.

3.5 Ethnic Inequality & Average Education

Figures 7 (a) and (b) plot the cross-sectional and the within-country association between ethnic Ginis and mean years of schooling at the country-birth-decade level.19 There is a negative cross-sectional association between ethnic inequality and country-level years of schooling. Yet the correlation is weaker and noisier than the corresponding correlations at the individual level (Figure 5 (a)) and even at the spatial level; the R2 is 0.36, as compared to 0.62 and 0.67 for regional and district-level inequality. When we condition on country and general cohort effects, so as to examine whether improvements in mean education across cohorts are matched by declines in ethnic inequality within countries (Figure 7 (b)), the correlation weakens further (R2 = 0.21).

Figure 6: Between regions-mean schooling inequality and mean schooling

(a) Provinces, levels (b) Provinces, country and decade fixed effects
(c) Districts, levels (d) Districts, country and decade fixed effects

[Figure 6 plot: scatter of country-birth-decade observations labeled by country code. Panel (a): x-axis, Gini, region mean years of schooling; y-axis, mean years of schooling. Regression fit: yrschool_cb = 7.3049 - 14.9414*GiniP_cb + e_cb, R-squared = 0.62. Panel (b): same variables net of country + decade FEs. Regression fit: yrschool_cb = a_c + g_b - 5.3156*GiniP_cb + e_cb, R-squared = 0.03. Panel (c): x-axis, Gini, district mean years of schooling; y-axis, mean years of schooling. Regression fit: yrschool_cb = 7.7733 - 14.2605*GiniD_cb + e_cb, R-squared = 0.67. Panel (d): same variables net of country + decade FEs. Regression fit: yrschool_cb = a_c + g_b - 10.7854*GiniD_cb + e_cb, R-squared = 0.19.]

19 The plots illustrate the association from the following regression equation: $GINI^{ETH}_{c,b} = \alpha + (\alpha_c + \gamma_b) + \beta \times s_{c,b} + \epsilon_{c,b}$.

Figure 7: Between ethnicities mean schooling inequality and mean schooling

(a) Ethnicities, levels (b) Ethnicities, country and decade fixed effects

[Figure 7 plot: scatter of country-birth-decade observations labeled by country code. Panel (a): x-axis, Gini, ethnicity mean years of schooling; y-axis, mean years of schooling. Regression fit: yrschool_cb = 5.9354 - 9.5923*GiniE_cb + e_cb, R-squared = 0.36. Panel (b): same variables net of country + decade FEs. Regression fit: yrschool_cb = a_c + g_b - 6.4104*GiniE_cb + e_cb, R-squared = 0.21.]

3.6 Summary and Country Cases

The descriptive analysis so far shows that pan-African educational inequality has declined since the end of colonization. This should not come as a surprise, as at independence few individuals had completed primary education and inequality was thus very high. As an increasing number of Africans acquire education and these gains are mostly at the low end of the educational distribution, inequality is falling. At the same time, cross-country differences are becoming relatively more important, as not all African nations have managed to expand education at the same rate.

The most interesting pattern of the cross-country analysis, however, is the contrast in how increases in average education have mapped into changes in individual-level inequality, on the one hand, and regional and ethnic inequality, on the other. Improvements in education (and the associated robust fall of individual-level inequality) are only weakly correlated with spatial inequalities and with within-country changes in ethnic inequality.

African countries are obviously heterogeneous across many aspects relevant for the distribution of education, and to economize on space we have so far mostly taken a pan-African viewpoint. In Appendix D.1.2.1 we decompose overall individual-level inequality, as measured by the GE(2) index, into a between- and a within-region component and, where information is available, into a between- and a within-ethnicity component for each of the 23 sample countries. Unlike the picture between and within countries across the entire continent (Figure 4), the pattern is less uniform, pointing to considerable country heterogeneity. For some countries, like Kenya, Nigeria, and Sudan, the between-region and between-ethnicity component of educational inequality has become relatively more important over time. For other countries, like Botswana, regional and ethnic inequality has declined considerably, alongside rising education. But for the majority of countries, the between-region and between-ethnicity components of inequality show no trends. As individual inequality declines overall, this result points to strong educational inertia across regions and ethnicities.

4 A Closer Look at Inertia

In an effort to better understand the dynamics of human capital accumulation at the regional and ethnic level, we examined whether the relative positions of regions and ethnicities in education have changed between the 1950s-born and the 1980s-born cohorts. While this may be a relatively short period of time, it coincides with large increases in primary and secondary educational attainment.

4.1 Regional Inertia

In Figures 8 (a)-(b) we examine inertia in spatial inequality at the coarse regional (province) level and at finer geographic units (districts).20 Figures 8 (a) and (b) plot average schooling for individuals born in the 1950s at the regional level, partialling out the countrywide average for all individuals born in the 1950s (horizontal axis), against the corresponding statistic for the 1980s-born cohorts (vertical axis). We add the 45-degree line (and mark provinces and districts by country). How far regions are from zero in the 1950s and the 1980s reflects spatial inequality, while how close observations are to the 45-degree line indicates inertia.

Figure 8 (a) illustrates considerable inertia in education across provinces. Most observations are clustered around the 45-degree line. A simple regression of the 1980s region-cohort mean net of the 1980s overall country mean on the 1950s region-cohort mean net of the 1950s overall country mean yields a coefficient (s.e.) of 0.985 (0.033) and an R2 of 0.74. Figure 8 (b) repeats the analysis across districts; the coefficient on 1950s demeaned schooling is 0.81 (s.e. 0.011, R2 of 0.68).

Persistence in regional differences in education, however, differs considerably across countries. In Table 2 we report the rank-rank correlation (and associated in-sample fit) for each country. At one extreme, in Ghana, Malawi, Sudan, South Sudan, and Nigeria there is a close to perfect rank correlation across regions in mean years of schooling between the 1980s and the 1950s cohorts (rank-ρ > 0.9 across provinces and districts). At the other extreme, South Africa is the country with the lowest persistence (rank-rank correlation of 0.40 and 0.65 at the province and the district level, respectively). In Ethiopia, Uganda, Tanzania, and Morocco inertia lies between the two extremes (rank correlations between 0.58 and 0.80).
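A rank-rank correlation of this kind can be sketched as follows. The regional means are made up, the function names are illustrative, and ties are ignored for simplicity. Because the R2 of a simple regression of one set of ranks on the other equals the squared rank correlation, the sketch returns both, which appears consistent with the R2 columns of Table 2 being close to the square of the reported correlations.

```python
import numpy as np


def ranks(values):
    """Ranks 1..n of the values (assumes no ties, for simplicity)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    r = np.empty(values.size, dtype=float)
    r[order] = np.arange(1, values.size + 1)
    return r


def rank_rank(old, new):
    """Rank correlation between two cohorts and the R2 of the rank-rank regression."""
    rho = float(np.corrcoef(ranks(old), ranks(new))[0, 1])
    return rho, rho ** 2


# Hypothetical regional mean years of schooling in one country.
mean_1950s = [0.5, 1.2, 2.0, 3.1, 4.4]
mean_1980s = [2.0, 3.5, 3.0, 6.0, 7.5]  # regions 2 and 3 swap places
rho, r2 = rank_rank(mean_1950s, mean_1980s)
print(f"rank correlation = {rho:.3f}, R2 = {r2:.3f}")
```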

4.2 Ethnic Inertia

In Figure 8 (c) we examine inertia in ethnic differences in education, again comparing the 1950s and the 1980s cohorts. As there are some outliers, relatively small ethnic groups that in the 1950s had substantially higher levels of education than their country average, Figure 8 (d) reproduces the association dropping these outliers. These are the Krio (Creole) in Sierra Leone (2.2% of the sample for the 1950s birth cohort), the Asian communities in Uganda (0.15%), a French-speaking minority in Burkina Faso (1.3%), and the English in Botswana (3.4%). In Chua’s (2003) terminology these groups are all “small market-dominant ethnic minorities” that have enjoyed considerable political and economic power since colonial times.

20 In the Appendix we report graphs that use literacy rather than mean years of schooling. The patterns are similar.

Figure 8: Inertia in mean schooling by group: regions, districts, and ethnicities, 1950s vs 1980s

(a) Regions (b) Districts
(c) Ethnicities (d) Ethnicities without outliers

[Figure 8 plot, all panels: x-axis, mean years of schooling, individuals born in 1950s - country mean; y-axis, mean years of schooling, individuals born in 1980s - country mean. Observations are marked by country and a 45-degree line is added. Labeled outliers in panel (c): BFA, French; SLE, Krio; BWA, English; UGA, Other non-African.]

There is an evident positive association that is, however, weaker than the association across administrative units. A regression of demeaned ethnic-level average years of schooling for the 1980s cohort on the corresponding 1950s value yields a coefficient (s.e.) of 0.586 (0.029) and an R2 of 0.69 [0.581 (0.043) and an R2 of 0.53 when dropping outliers].

As Table 2 shows, there is non-negligible country heterogeneity in the persistence of ethnic differences in education. The rank-rank correlation exceeds 0.90 in Malawi, Burkina Faso, Ghana, Botswana, Mozambique and Ethiopia. It is high in Mali and Senegal (exceeding 0.80). It is below 0.80 in Uganda, Liberia, Morocco, Sierra Leone, and Zambia, and it is the lowest in South Africa (0.45).

We further explored the role of regions and ethnicities in explaining static and dynamic differences in education, conducting variance decompositions. We report this analysis in Appendix E.1.

Table 2: Rank-Rank correlation 50s and 80s for ethnicities, provinces, and districts

                      province                district                ethnicity
                  (1)     (2)    (3)      (4)     (5)    (6)      (7)     (8)    (9)
country            ρ̂      R2      N        ρ̂      R2      N        ρ̂      R2      N
Sudan            0.989   0.979    15     0.961   0.923   129       –       –      –
Kenya            0.976   0.953     8     0.881   0.776   173       –       –      –
Malawi           0.974   0.949    24     0.908   0.825   223     0.982   0.964    11
South Sudan      0.939   0.882    10     0.798   0.637    72       –       –      –
Nigeria          0.930   0.864    38       –       –      –        –       –      –
Ghana            0.915   0.838    10     0.915   0.838   102     0.933   0.871     9
Burkina Faso     0.896   0.802    13     0.750   0.562    45     0.960   0.922    14
Botswana         0.895   0.801    21       –       –      –      0.983   0.967     9
Cameroon         0.893   0.797     7     0.880   0.775    39       –       –      –
Liberia          0.850   0.723     9     0.794   0.630    50     0.750   0.563    17
Sierra Leone     0.846   0.716    14     0.757   0.573    90     0.736   0.542    13
Egypt            0.846   0.716    24     0.877   0.768   235       –       –      –
Mozambique       0.818   0.669    11     0.776   0.603   143     0.911   0.830    18
Zambia           0.810   0.655     8     0.898   0.807    72     0.645   0.417    11
Senegal          0.783   0.614     9     0.905   0.819    28     0.818   0.669    10
Tanzania         0.684   0.468    23     0.782   0.612   113       –       –      –
Uganda           0.622   0.387    38     0.775   0.600   159     0.764   0.584    22
Morocco          0.621   0.385    16     0.762   0.580    54     0.800   0.640     5
South Africa     0.583   0.340     9     0.654   0.428   216     0.448   0.200    12
Ethiopia         0.559   0.313    12     0.650   0.423    63     0.868   0.754    14
Mali             0.517   0.267     9     0.775   0.601   242     0.846   0.716    13
Rwanda           0.476   0.226    12     0.732   0.536   104       –       –      –
Guinea             –       –      –        –       –      –        –       –      –

Note that Guinea does not have individuals born in the 1980s since the last census there was conducted in 1996. Nigeria and Botswana are omitted from the district analysis since we only have admin-1 information for these countries.

5 IM

The uncovered inertia in educational disparities across countries, regions, and ethnicities hints at “poverty trap” dynamics, where even small to modest initial differences matter for the subsequent development path. In this section, we construct measures of IM (persistence) in education.

5.1 Methodology

We measure IM as the transmission of education from an “old” to a “young” generation within a household. The education of an individual belonging to the “young” generation is simply her observed years of schooling or educational attainment. We construct the education of the “old” generation relevant for individual i as the average education of individuals one generation older than individual i living within the same household.21 Note that for households with three or more generations of adults, an individual’s education can appear both as the education of an “old” generation (say vis-à-vis one’s children) and as the education of a “young” generation (say vis-à-vis one’s parents). Overall we use information from 11,259,329 “young” individuals who cohabit with an “older” generation. Appendix Table 31 gives details on the IM sample, providing information on how many individuals are linked to one or both parents, or to other individuals in the previous generation.

5.1.1 “Relative” IM

Following the large literature on educational (and income) IM (e.g., Black and Devereux (2011), Solon (1999)), our first measure of IM is obtained by running variants of the following regression:

$$s_{1,itbc} = \alpha_c + \beta_c s_{0,itc} + [\gamma_{0,b} + \delta_{1,b} + \theta_t] + \epsilon_{1,itc} \qquad (1)$$

where $s_{1,itbc}$ denotes years of schooling of individual $i$, born in a specific decade (birth-cohort $b$), observed in census-year $t$ and residing in country $c$; $s_{0,itc}$ is the mean years of schooling of the generation immediately older than individual $i$, residing in the same household.

Initially we pool all individuals from all countries, including country-specific constants ($\alpha_c$) and country-specific slopes ($\beta_c$); this allows us to obtain an approximate estimate of cross-country differences in the intrafamily coefficient of educational persistence. We also estimate versions of the above regression equation adding census-year fixed effects ($\theta_t$) and birth-cohort fixed effects of the old ($\gamma_{0,b}$) and the young ($\delta_{1,b}$), so as to account for differences over time and other unobserved factors (though the results are almost the same).

In the same vein, we estimate region- and ethnicity-specific persistence coefficients pooling observations from all censuses in each country. For each country we separately estimate:

$$s_{1,itbcre} = [\alpha_r + \alpha_e] + [\beta_r s_{0,itcbre}] + [\beta_e s_{0,itcbre}] + [\gamma_{0,b} + \delta_{1,b} + \theta_t] + \epsilon_{1,itc} \quad \text{for each country } c \qquad (2)$$

$\alpha_r$ and $\alpha_e$ are region and ethnicity constants. $\beta_r$ and $\beta_e$ are region- and ethnicity-specific persistence parameters that quantify potential differences in the transmission of education across generations across regions and ethnic lines, respectively.

The IM in education index is simply one minus the persistence parameter, which we allow to differ across countries ($1 - \beta_c$), regions ($1 - \beta_r$), and ethnic lines ($1 - \beta_e$). Following Chetty et al. (2014) we refer to these measures as “relative” IM.

21 For years of schooling, we allow for fractional values of average years of schooling. For educational attainment (no schooling, primary, secondary, university), we round the previous generation’s average attainment to the nearest integer to be able to assign it to a category.
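As a rough illustration of the “relative” IM measure, the sketch below estimates a separate persistence slope for each group and reports one minus that slope. Without the common cohort and census-year fixed effects, a pooled regression with group-specific constants and slopes gives the same slopes as these separate regressions. The fixed effects in the bracketed terms are deliberately omitted here, and the function name and data are hypothetical.

```python
import numpy as np


def relative_im_by_group(child, parent, group):
    """Per-group OLS slope of child on parental schooling, and 1 - slope."""
    child = np.asarray(child, dtype=float)
    parent = np.asarray(parent, dtype=float)
    group = np.asarray(group)
    out = {}
    for g in np.unique(group):
        mask = group == g
        p, c = parent[mask], child[mask]
        # Simple-regression slope: covariance over variance of the regressor.
        beta = np.sum((p - p.mean()) * (c - c.mean())) / np.sum((p - p.mean()) ** 2)
        out[str(g)] = {"persistence": float(beta), "relative_im": float(1.0 - beta)}
    return out


# Hypothetical households in two made-up countries "X" and "Y".
parent = [0, 2, 4, 6, 0, 2, 4, 6]          # mean schooling of the older generation
country = ["X"] * 4 + ["Y"] * 4
child = [3.0, 3.8, 4.6, 5.4, 1.0, 2.6, 4.2, 5.8]
result = relative_im_by_group(child, parent, country)
for name, stats in result.items():
    print(name, stats)
```

In this made-up example country "Y" has the higher persistence and therefore the lower relative IM.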

5.1.2 “Absolute” IM

There are five main educational attainment categories (no schooling, less than primary, completed primary, completed secondary, and completed tertiary).22 After merging the first two categories (no schooling and less than completed primary schooling) we obtain 4 by 4 matrices of “absolute” IM in education (this terminology again follows Chetty et al. (2014)). Figure 9 (a) gives the Africa-wide transition matrix using all censuses, while Figures 9 (b) and (c) focus on Mozambique and Tanzania, two neighboring East African countries that differ in the degree of IM.

Figure 9: Visualization of transition likelihoods

(a) Africa (b) Mozambique (c) Tanzania

[Figure 9 plot, each panel: x-axis, parental attainment (with the share of parents in each category); y-axis, conditional likelihood of child attainment (0 to 1). Child attainment categories: < than primary, primary completed, secondary completed, university completed. Parental attainment shares: (a) Africa: < than primary 74.5%, primary completed 19.3%, secondary completed 5.1%, university completed 1.2%; (b) Mozambique: < than primary 88.8%, primary completed 9.7%, secondary completed 1.4%, university completed 0.2%; (c) Tanzania: < than primary 63.8%, primary completed 31.6%, secondary completed 4.1%, university completed 0.5%.]

22 Individuals with incomplete primary are assigned to less-than-primary, individuals with incomplete secondary to completed primary, and individuals with incomplete tertiary to completed secondary.
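A 4 by 4 transition matrix of this kind can be tabulated as in the sketch below, where each column conditions on a parental attainment category and each row is a child attainment category. The integer coding 0 to 3 for the four merged categories, the function name, and the parent-child pairs are assumptions made for illustration.

```python
import numpy as np

# Assumed coding of the four merged attainment categories.
CATEGORIES = ["< primary", "primary", "secondary", "university"]


def transition_matrix(parent_cat, child_cat, n_cat=4):
    """Entry [row, col] = share of children with attainment `row`
    among children whose parents have attainment `col`."""
    counts = np.zeros((n_cat, n_cat))
    for p, c in zip(parent_cat, child_cat):
        counts[c, p] += 1
    col_totals = counts.sum(axis=0, keepdims=True)
    # Columns with no observed parents stay at zero instead of dividing by zero.
    return np.divide(counts, col_totals, out=np.zeros_like(counts),
                     where=col_totals > 0)


# Hypothetical parent-child pairs (category codes 0..3).
parent_cat = [0, 0, 0, 0, 1, 1, 2]
child_cat = [0, 0, 1, 2, 1, 2, 3]
T = transition_matrix(parent_cat, child_cat)
print(np.round(T, 2))
```

Each column with at least one observed parent sums to one, so the first column can be read as the attainment distribution of children whose parents did not complete primary schooling.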

Each cell in the transition matrix reports the probability that the child has the row educational attainment, conditional on his/her parents having the corresponding column educational attainment. Across Africa roughly 75% of the old generation has not completed primary schooling, and only 1.2% of the “old” have tertiary education. However, as younger generations acquire more education, 26% of Africans whose parents have not completed primary schooling manage to complete primary education, 12% finish high school, and 2% even manage to get a college degree (see column (1)).

Since three-fourths of “old” Africans did not have any schooling, we focus on the likelihood that children of parents with no schooling or with less than completed primary (whom we label “illiterate”) manage to at least complete primary education (we label them “literate”). As with the relative IM index, we construct absolute IM measures at the country level and, for each country, also at the region and the ethnicity level. This allows us to examine whether African regions differ in upward educational IM (opportunity) and whether there are significant corresponding ethnic differences. First, we construct the following indicator variables:

• $I^{illit}_{0,ibct} = 1$ if the parent of individual $i$ born in birth-decade $b$ in country $c$ and observed in census-year $t$ is illiterate (either no schooling or less than completed primary) and zero otherwise.

• $I^{lit,illit}_{1,ibct} = 1$ if a child $i$ born to illiterate parents in birth-decade $b$ in country $c$ and observed in census-year $t$ is literate and zero otherwise. Again we define literacy as having completed at least primary education.

Pooling observations across all countries and censuses, we run the following regressions:

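$$I^{illit}_{0,ibct} = \alpha^o_c + [\gamma^o_b + \delta^y_b + \theta_t] + \epsilon_{ict} \qquad (3)$$
$$I^{lit,illit}_{1,ibct} = \alpha^y_c + [\gamma^o_b + \delta^y_b + \theta_t] + \epsilon_{ict}, \qquad (4)$$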

where $I^{illit}_{0,ibct}$ and $I^{lit,illit}_{1,ibct}$ are the indicators for parental and child education. We run the first specification in the full sample of individuals for which we observe previous-generation education, and we estimate the second specification for all children of “uneducated” parents (either with no schooling or without completed primary). This ensures that the country fixed effects ($\alpha^{y}_{c}$) can be interpreted as conditional proportions; that is, we want to know what proportion of children of uneducated parents become educated. To account for unobserved factors we also estimate specifications conditioning on birth-cohort fixed effects (for both the “young” ($\delta^{y}_{b}$) and the “old” ($\gamma^{o}_{b}$)), as well as census-year fixed effects ($\theta_{t}$). Then, we run similar specifications for each country and estimate corresponding measures of average parental education ($\alpha^{o}_{r}$ and $\alpha^{o}_{e}$) and absolute IM in education ($\alpha^{y}_{r}$ and $\alpha^{y}_{e}$) at the region and at the ethnicity level. We estimate country-by-country:

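\[ I^{illit}_{0,itbcre} = [\alpha^{o}_{r} + \alpha^{o}_{e}] + [\gamma^{o}_{b} + \delta^{y}_{b} + \theta_{t}] + \epsilon_{ict} \tag{5} \]
\[ I^{lit,illit}_{1,itbcre} = [\alpha^{y}_{r} + \alpha^{y}_{e}] + [\gamma^{o}_{b} + \delta^{y}_{b} + \theta_{t}] + \epsilon_{ict}, \tag{6} \]

once unconditionally, and once conditioning on birth-cohort effects for old and young, as well as census-year fixed effects.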
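The unconditional version of specifications (3) and (4) can be sketched in a few lines. Without cohort and census-year effects, a regression on a full set of country dummies simply returns group means, so the country effects reduce to shares. The record layout, field names (`country`, `parent_primary`, `child_primary`) and toy data below are hypothetical and only illustrate the construction; they are not the census extracts used in the paper.

```python
from collections import defaultdict

# Hypothetical toy records (one per child); field names are illustrative only.
records = [
    {"country": "A", "parent_primary": False, "child_primary": True},
    {"country": "A", "parent_primary": False, "child_primary": True},
    {"country": "A", "parent_primary": False, "child_primary": False},
    {"country": "A", "parent_primary": True, "child_primary": True},
    {"country": "B", "parent_primary": False, "child_primary": False},
    {"country": "B", "parent_primary": True, "child_primary": True},
]


def country_shares(rows):
    """Return {country: (share of illiterate parents, absolute IM)}."""
    n_all = defaultdict(int)    # children with observed parental education
    n_illit = defaultdict(int)  # children whose parent is illiterate (I0 = 1)
    n_up = defaultdict(int)     # of those, children who are literate (I1 = 1)
    for row in rows:
        c = row["country"]
        n_all[c] += 1
        if not row["parent_primary"]:
            n_illit[c] += 1
            if row["child_primary"]:
                n_up[c] += 1
    shares = {}
    for c in n_all:
        alpha_o = n_illit[c] / n_all[c]
        # Absolute IM is only defined where some parents are illiterate.
        alpha_y = n_up[c] / n_illit[c] if n_illit[c] else None
        shares[c] = (alpha_o, alpha_y)
    return shares


shares = country_shares(records)
print(shares)
```

The conditional estimates in the text additionally absorb birth-cohort and census-year effects, which this sketch deliberately omits.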

5.2 Country-level Patterns

5.2.1 Newly Compiled Statistics

Table 3 reports the two country-level estimates of IM.

Table 3: Relative and absolute intergenerational mobility by country

                 relative (persistence)              absolute (mobility)
                 (1)               (2)               (3)               (4)
country          unconditional     t, b FEs          unconditional     t, b FEs
                 \hat{\beta}_c     \hat{\beta}_c     \hat{\alpha}^y_c  \hat{\alpha}^y_c
South Africa     0.363             0.355             0.722             0.527
Nigeria          0.425             0.435             0.669             0.399
Tanzania         0.407             0.436             0.648             0.695
Botswana         0.531             0.446             0.608             0.513
Kenya            0.490             0.502             0.557             0.315
Zambia           0.479             0.448             0.479             0.357
Egypt            0.574             0.533             0.463             0.280
Ghana            0.448             0.428             0.448             0.300
Rwanda           0.427             0.443             0.440             0.439
Cameroon         0.578             0.533             0.433             0.269
Uganda           0.539             0.540             0.399             0.401
Liberia          0.384             0.405             0.363             0.067
Morocco          0.844             0.805             0.290             0.101
Malawi           0.585             0.563             0.285             0.045
Sudan            0.755             0.767             0.240             -0.058
Sierra Leone     0.557             0.579             0.239             0.002
Senegal          0.740             0.736             0.229             0.285
Mozambique       0.649             0.642             0.205             0.013
Guinea           0.597             0.606             0.175             0.011
Ethiopia         0.976             0.946             0.161             -0.016
Mali             0.740             0.731             0.157             -0.048
Burkina Faso     0.740             0.693             0.155             -0.062
South Sudan      0.458             0.481             0.100             -0.196

Column (1) shows the slope from a simple social mobility regression without cohort or year fixed effects, with higher $\hat{\beta}_c$ → more persistence (lower mobility). Column (2) shows the same but conditional on year- and birth-cohort fixed effects for both young and old. Column (3) shows the likelihood of attaining at least primary schooling conditional on having illiterate (without primary schooling) parents, without fixed effects, with higher $\hat{\alpha}^{y}_{c}$ → higher mobility. Column (4) shows the same as column (3) but conditional on year- and birth-cohort fixed effects for both young and old. Countries sorted by column (3).

Columns (1)-(2) report the typically estimated (relative) educational IM measures that associate offspring education to their parents’ (equation (1)). Column (1) gives the unconditional estimates, while column (2) reports estimates where we account for census and cohort fixed effects. Columns (3)-(4) report the absolute IM estimates without any controls and with time and cohort fixed effects, respectively (equations (3) and (4)). These estimates give the (unconditional and the conditional on time and birth effects) likelihood that kids whose parents are uneducated will complete primary education or higher (absolute IM). A couple of interesting patterns emerge. First, the (negative, recall that β measures 1-IM) correlation of the two persistence / IM proxies is strong, but not perfect (0.66 in absolute value, and 0.76 if we drop South Sudan). Second, there is wide variation in both proxies of educational IM. The relative IM index ranges from 0.36 in South Africa to 0.98 in Ethiopia, where children’s education matches their parents’ almost perfectly. Even setting aside these two extremes, the relative IM ranges from 0.38 in Liberia to 0.84 in Morocco. The absolute IM index also exhibits wide variation. The likelihood that children of parents without education will manage to complete at least primary education ranges from just 10% in South Sudan to 72% in South Africa and 67% in Nigeria. Third, IM varies across African regions. We find the highest persistence (lowest IM) in education in the Sahel (in Ethiopia, Morocco, Sudan, Burkina Faso and to a lesser extent in Mali and Senegal) and the lowest in South Africa, with countries in Western and Eastern Africa in the middle.
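The correlation between the two proxies can be checked directly from the unconditional columns (1) and (3) of Table 3. The sketch below is only an illustrative recomputation from the rounded table entries, not the authors' code; the results should land close to the figures quoted in the text, up to rounding.

```python
import math

# (country, column (1) persistence, column (3) absolute IM), copied from Table 3.
table3 = [
    ("South Africa", 0.363, 0.722), ("Nigeria", 0.425, 0.669),
    ("Tanzania", 0.407, 0.648), ("Botswana", 0.531, 0.608),
    ("Kenya", 0.490, 0.557), ("Zambia", 0.479, 0.479),
    ("Egypt", 0.574, 0.463), ("Ghana", 0.448, 0.448),
    ("Rwanda", 0.427, 0.440), ("Cameroon", 0.578, 0.433),
    ("Uganda", 0.539, 0.399), ("Liberia", 0.384, 0.363),
    ("Morocco", 0.844, 0.290), ("Malawi", 0.585, 0.285),
    ("Sudan", 0.755, 0.240), ("Sierra Leone", 0.557, 0.239),
    ("Senegal", 0.740, 0.229), ("Mozambique", 0.649, 0.205),
    ("Guinea", 0.597, 0.175), ("Ethiopia", 0.976, 0.161),
    ("Mali", 0.740, 0.157), ("Burkina Faso", 0.740, 0.155),
    ("South Sudan", 0.458, 0.100),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def corr(rows):
    return pearson([r[1] for r in rows], [r[2] for r in rows])


r_all = corr(table3)
r_no_ssd = corr([r for r in table3 if r[0] != "South Sudan"])
print(f"all countries: {r_all:.2f}, without South Sudan: {r_no_ssd:.2f}")
```

South Sudan combines low persistence with very low absolute IM, which is why dropping it strengthens the negative correlation.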

5.2.2 IM and Average Education

We then examine the association between IM and the average level of education. Figure 10 (a) plots the country-specific intercept capturing the mean education of the “young generation” ($\hat{a}_c$) against the country-specific slope ($\hat{\beta}_c$) in regression equation (1) that captures persistence in education across generations within households (relative IM). There is an evident negative association between average education (of the “old”) and persistence; in countries with higher average education, the persistence estimate is lower. Examples include South Africa, Nigeria, and to a lesser extent Botswana, Kenya, Ghana, and Tanzania. In contrast, in countries with low average schooling of the “old”, such as Ethiopia, Senegal, Sudan and Mali, persistence is high and educational IM is low. Figure 10 (b) looks at absolute IM; the figure plots the likelihood that kids of uneducated parents manage to complete at least primary education (on the vertical axis) against the share of “old” without completed primary education. A clear negative association emerges between the share of uneducated among the old generation and absolute IM. For example, in Ethiopia, Burkina Faso, North and South Sudan and Morocco, where the share of illiteracy among the “old” generation exceeds 90%, the likelihood that children from parents without schooling will complete primary is below or close to 20%. In contrast, the likelihood that children of parents without primary education will complete primary or higher education exceeds 40% in countries where the “old” generation is, on average, more educated, as, for example, in South Africa, Nigeria, Zambia and to a lesser extent Botswana, Tanzania and Kenya.23

23The plots show associations conditional on birth-cohort fixed effects for both the young and the old and when we also add census-specific constants to account for trends, differential reporting, and other features. As we show in Appendix Figure 60, the patterns are similar when we do not condition on fixed effects.

Figure 10: Conditional on cohort and year fixed effects country-level means and IM

(a) Relative IM: country-specific intercept and slopes. Horizontal axis: country-specific intercept, a_c; vertical axis: country-specific slope, b_c.

(b) Absolute IM: country-specific likelihood that children of uneducated parents become educated (uneducated = less than primary). Horizontal axis: share uneducated old; vertical axis: share educated children of uneducated parents.

5.2.3 IM and Initial Education. Heterogeneity

We exploit the richness of the census data to explore heterogeneity with respect to gender, the type of household residence (urban-rural), and the migration status (of parents). Figures 11 (a), (b), (c) show the results for absolute IM.24

Rural-Urban In Figure 11 (a) we distinguish between urban and rural households (using the IPUMS classification). Three noteworthy results emerge. First, as in the full sample, the likelihood that kids of illiterate parents will manage to complete at least primary education is inversely related to the mean education of the “old” generation for both urban and rural households. Second, there is an evident rural-urban gap with the level of educational IM being lower for rural households (see also Young (2014)). Third, the negative association between absolute IM and the share of “illiteracy” of the old generation is quite steep for rural households, while for urban households the association is flatter. This implies that “poverty trap” dynamics are especially strong in rural Africa.

Gender In Figure 11 (b) we distinguish by gender (of the “young” generation). The negative association between the likelihood of completing schooling for kids of parents without any education and the share of illiterate (less than primary schooling) of the “old” generation is strong for both genders. The likelihood of exiting family illiteracy (absolute IM) is higher for boys than for girls. The gender gap in IM is present in all countries, but there is some heterogeneity. The gap is relatively small in South Africa and Botswana and relatively large in Mali, Sudan, and Sierra Leone.

24The patterns are similar when using the relative IM measure and are reported in the Appendix.

Figure 11: Country-level means and absolute IM, conditional on year and birth-cohort FEs, by urban / rural residence, gender of the child, and parental migration status

(a) Urban/rural. (b) Male/female. (c) Non-migrant and migrant parents.

In all panels, uneducated = less than primary. Horizontal axis: share uneducated old; vertical axis: share educated children of uneducated parents.

Migrant Status In Figure 11 (c) we distinguish observations by the migration status of the “old”. Out of the roughly 8.5 million observations linked to an older generation and for whom we have census information on migration, 6.5 million have non-migrant parents.25 The observations with migrant parents are shifted to the north-west with respect to those with non-migrant parents. This suggests both that migrant parents are less likely to be uneducated and that, conditional on parents having no education, children of migrant parents are more likely to complete at least primary schooling.

25Appendix F.3 discusses the migration data in more detail.

5.2.4 Sample Selection

The IM sample is restricted to offspring who reside with their parents. Hence, there are concerns of sample selection, as we do not observe parents and kids who live apart. Presumably intergenerational influences are more salient for children living at home under the direct influence of parents. If this is the case, our measure of IM may be an underestimate.26 For most of our analyses, an overall upward or downward bias of the IM estimates would not be a major problem, unless this selection bias differentially affects countries, regions, or ethnicities. In our setting, where most of the “action” on IM comes from kids of parents without any education managing to complete primary education, sample selection can be addressed in a relatively straightforward manner. Since most primary schooling begins at the age of six and lasts at most six years, for most 15-year-olds we can safely assess whether they have completed primary schooling or not. And since by that age the overwhelming majority of children still live with their parents, sample selection is a lesser concern.27

Figure 12: Country-level absolute IM for children aged 15-17 and 23-25 net of year- and cohort fixed effects

Horizontal axis: share old without completed primary; vertical axis: share of kids of parents without primary completing at least primary. Series: age 15-17 and age 23-25.

To make inferences about the direction of the sample selection bias, ideally we would like to have two random samples of children: one in which we observe parental education for everyone (those who have moved out and those who remain) and another sample for which we observe parental education of only the stayers. Among 15-year-old children, we have a reasonable approximation of the first group. A large portion of them will eventually move out, but most of them have not yet done so. The 25-year-olds mirror the overall population in that most have moved out (in fact the median age for a person for whom we observe the previous generation’s education is 24). While it is true that 15-year-olds and 25-year-olds are not exactly comparable (they have grown up and received education at slightly different points in time), their main distinguishing feature is the severity of sample selection, which kicks in over this age range. We make use of this feature to construct estimates of less-than-primary-to-primary absolute IM for the age groups of 15-17-year-olds and 23-25-year-olds. Figure 12 shows country-level estimates of less-than-primary to primary absolute IM net of year and cohort fixed effects for children aged 15-17 and children aged 23-25, respectively. Three points are worth highlighting. First, the shares of uneducated parents for these groups are comparable (the dots are not systematically shifted in the x-direction). Second, IM is substantially higher for 15-17-year-olds, indicating that restricting the analysis to coresident households underestimates the true extent of IM. Third, the negative relationship between the share of uneducated parents and IM is not affected by sample selection.

26Sample selection is likely most severe for college education, as individuals often move away from their families to attend university. Since in Africa tertiary education is low, around 4% across birth cohorts, the fact that we miss people who move to attend college is a minor concern. See appendix F.4.1 for plots of the shares of individuals with completed higher education by birth-cohort for our sample and by year from the Barro-Lee data.

27Appendix Figure 62 shows the likelihood of observing parental education by age of individual, once unconditional and once conditional on country- and census-year fixed effects.

6 Regional Patterns

In this Section we first provide a mapping of the land of educational opportunity in Africa and then explore the correlates of IM across administrative units.

6.1 Where is the Land of Opportunity in Africa?

Figure 13 (a) portrays the distribution of absolute IM (the likelihood that children of uneducated parents manage to complete at least primary education)28 across 2,444 African (admin-2) districts.29 The average and the median are around 0.38, but the range is wide. The share of the variance explained by the country constants is 70% (for the estimates conditional on year and birth-cohort fixed effects, the R2 drops to 48%). We observe non-negligible remaining regional differences within countries. As an example, Figure 14 (a) portrays the variability in absolute IM across 102 admin-2 units in Ghana. While average IM in Ghana is 0.50, regional estimates range from 0.221 to 0.692. As Appendix Table 38 shows, this wide variability in IM applies to almost all countries.30 Figure 13 (b) portrays the share of old individuals without completed primary education across the continent, while Figure 14 (b) zooms in on Ghana. There is an evident negative association between the share of illiterate among the “old” generation and absolute IM, a pattern that echoes the cross-country association in Figure 10 (b). Figures 15 (a)-(b) show the association between absolute IM and the share of uneducated among the old generation at the coarse and fine level of regional disaggregation. Regional IM is strongly linked to the lack of education among the previous generation. The R2 of the simple unconditional relationship at the admin-1 level is 73%; when we add country fixed effects the within-R2 is 67.5%.

28Unconditional estimates, i.e., not conditioning on year and cohort fixed effects, from equations (5) and (6) estimated country-by-country.

29Appendix Table 36 gives summary statistics.

30For example, in Burkina Faso the average 0.155 IM estimate masks regional IM estimates ranging from 0.04 to 0.469. In Uganda the range in IM across regions is even wider [0.042 − 0.725]. The regional differences in IM are also present when we use the intergenerational persistence in schooling estimate (see Appendix Table 37).

Figure 13: Pan-Africa: District-level share of uneducated old and likelihood that children of uneducated parents complete at least primary

(a) Fraction of children of uneducated old completing at least primary; darker colors → higher IM. (b) Fraction uneducated (less than primary) parents; darker colors → lower education.

Figure 14: Ghana: District-level share of uneducated old and likelihood that children of uneducated parents complete at least primary

(a) Fraction of children of uneducated old completing at least primary; darker colors → higher IM. (b) Fraction uneducated (less than primary) parents; darker colors → lower education.

The highly significant estimate suggests that a one percentage point increase in literacy of the old generation is associated with a one percentage point higher likelihood that the children of parents without any education will manage to complete primary schooling or higher.31 This pattern is pervasive, as it applies to all countries. The negative association is especially strong for rural households, and it applies to both genders. In the Appendix we report various scatterplots illustrating the negative association between the share of uneducated among the “old” generation and educational IM, using either the intergenerational educational persistence estimate or the share of kids from uneducated parents who complete at least primary school.
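As an illustration of what the within-country slope and within-R2 measure, the sketch below removes country means from both variables and fits a line through the demeaned data, which is equivalent to including country fixed effects. The function name `within_fit` and the toy districts are hypothetical; the data are constructed so that every country shares the same slope of −0.8, which the estimator should recover exactly.

```python
from collections import defaultdict


def within_fit(rows):
    """rows: iterable of (country, share_uneducated_old, absolute_im).

    Returns the slope and within-R^2 of absolute IM on the share of
    uneducated old after removing country means (country fixed effects).
    """
    by_country = defaultdict(list)
    for country, x, y in rows:
        by_country[country].append((x, y))
    sxx = sxy = syy = 0.0
    for points in by_country.values():
        mean_x = sum(x for x, _ in points) / len(points)
        mean_y = sum(y for _, y in points) / len(points)
        for x, y in points:
            dx, dy = x - mean_x, y - mean_y
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    slope = sxy / sxx
    r2_within = sxy * sxy / (sxx * syy)
    return slope, r2_within


# Hypothetical districts: two countries, different intercepts, same slope.
rows = [("A", x, 0.9 - 0.8 * x) for x in (0.5, 0.7, 0.9)]
rows += [("B", x, 0.7 - 0.8 * x) for x in (0.6, 0.8)]
slope, r2_within = within_fit(rows)
print(round(slope, 3), round(r2_within, 3))
```

A pooled regression on these toy data would mix the intercept gap between the two countries into the slope; demeaning by country removes it.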

Figure 15: Likelihood of children of parents with less than primary completing at least primary school

(a) Admin-1 provinces. (b) Admin-2 districts.

In both panels, horizontal axis: share uneducated old; vertical axis: share educated children of uneducated parents. Countries shown: Burkina Faso, Botswana, Cameroon, Egypt, Ethiopia, Ghana, Guinea, Kenya, Liberia, Morocco, Mali, Mozambique, Malawi, Nigeria, Rwanda, Sudan, Senegal, Sierra Leone, South Sudan, Tanzania, Uganda, South Africa, Zambia.

6.2 Correlates of Regional IM

We now examine the correlates of regional IM within countries. We do not aim to identify causal effects, but simply to uncover its main correlates. We run specifications linking the (relative and absolute) IM estimates ($IM_{r,c}$) with location-geography, historical, and upon-independence population characteristics, accounting for country fixed effects ($\theta_c$). The specification reads:

\[ IM_{r,c} = \theta_c + G_{r,c}\Phi + H_{r,c}\Gamma + Z_{r,c}\Psi \; [+ \lambda E^{o}_{r,c}] + \zeta_{r,c}. \tag{7} \]

In equation (7), $G_{r,c}$ are geographic features of region r in country c; $H_{r,c}$ denotes historical (colonial and precolonial) characteristics; and $Z_{r,c}$ are variables computed as averages from census data for individuals born before 1960. We estimate correlations of these variables with IM one at a time; that is, each version of equation (7) has only one $G_{r,c}$ or $H_{r,c}$ or $Z_{r,c}$. Since the education level of the old generation is a strong correlate of IM, we also report specifications controlling for it, $E^{o}_{r,c}$. The appendix provides definitions and sources for all variables used in the regional analysis and gives summary statistics.

31For the admin-2 districts, the numbers are R2 of 0.71 for simple OLS, 0.88 for country FEs (66% within R2), coefficient of −1 for OLS, −0.78 for country FEs.

6.2.1 Geography

Table 4 presents the analysis on the geographic correlates of absolute IM; we report in the Appendix analogous estimates with the relative IM index that are quite similar. The table reports three specifications. In column (1) we examine the role of the various geographic variables in explaining variation of illiteracy among the “old” (specification A). Column (3) associates IM with various geographic/locational/ecological features simply conditioning on country fixed effects (specification B). Column (5) repeats estimation conditioning also on the share of the old generation without completed primary education (“illiterate old”) that correlates strongly with IM (specification C).

Natural Resources A large literature on the “natural resource curse” has linked conflict and other aspects of underdevelopment to the presence of oil, diamonds, and precious metals. [See, among others, Ross (2004), Berman et al. (2017), Guidolin and La Ferrara (2007).]32 We thus associate IM with proximity to the closest diamond mine and oil field. Rows (1) and (2) give the results. There is no significant association between proximity to natural resources and either relative or absolute IM. We also examined whether IM is related to proximity to other mineral sites (like silver or platinum mines), without detecting any significant correlation. Proximity to natural endowments is also uncorrelated with illiteracy (column (1)).

Distance to the Capital Much evidence documents the limited capacity of modern African states to exercise control far from the capitals. [See, among others, Herbst (2000), Michalopoulos and Papaioannou (2014).] The unconditional correlation in column (3) suggests a significant association between proximity to capitals and IM. Given the coefficient of −0.052 on log distance, a 10 percent increase in distance to the capital is related to a roughly 0.005 (half a percentage point) lower share of literacy among individuals whose parents had not completed primary education [standardized “beta” coefficient of −0.24]. However, when we control for the share of illiterate old in a district (in column (5)), the coefficient on log distance to the capital drops in absolute value (−0.015), since proximity to the capital is strongly related to the stock of literacy (column (1)).

Distance to the Border Border areas in many parts of the continent appear unruly, and a sizeable part of conflict takes place in areas close to national borders, which often also split ethnic groups between two or more modern states.33 There is no association between distance to the border and IM. There is some weak evidence that in areas proximate to borders literacy rates are, on average, lower, though the correlation does not pass standard significance levels (column (1)).

Distance to the Coast The distance to the coast is linked to the presence of Europeans during colonization (who mostly settled in coastal areas and towns) and the impact of the African slave trade, which was pervasive in coastal areas.34 As the level specification A in column (1) shows, distance to the coast is correlated with illiteracy, reflecting, among other things, the relatively higher levels of development in coastal areas (Henderson et al. (2018)). IM is somewhat higher in areas proximate to the coast (column (3)), but the coefficient becomes tiny and loses significance when we control for the share of illiteracy of the previous generation (column (5)). Similarly, distance from rivers and lakes does not matter for IM.

32In recent work Hohmann (2018) shows that across African regions natural resource shocks are associated with higher education and structural transformation.

33See Alesina et al. (2011) and Michalopoulos and Papaioannou (2016) for evidence linking border artificiality and ethnic partitioning to underdevelopment and conflict.

Table 4: District-level correlates of absolute mobility: Geography

Columns (1)-(2): LHS = share illiterate old. Columns (3)-(7): LHS = share literate children of illiterate old, with Specification A in columns (3)-(4) and Specification B in columns (5)-(7). Standard errors in parentheses next to each coefficient.

                                        (1)                  (2)   (3)                  (4)   (5)                  (6)                   (7)
variable                                additional variable  N     additional variable  N     additional variable  share illiterate old  N
(no additional variable)                                                                                           -0.785*** (.0822)     2441
oil field dummy                         -.0006 (.0297)       2440  .0174 (.0371)        2440  .017 (.0171)         -0.785*** (.0821)     2440
diamond mine dummy                      .0171 (.0218)        2440  -.0269 (.0187)       2440  -.0134 (.0086)       -0.784*** (.082)      2440
log(distance to capital)                0.051*** (.0097)     2440  -0.052*** (.0107)    2440  -0.015* (.0075)      -0.748*** (.0778)     2440
log(distance to national border)        -.0042 (.0081)       2440  .0001 (.009)         2440  -.0032 (.0054)       -0.785*** (.0813)     2440
log(distance to coast)                  0.021*** (.0077)     2440  -0.023** (.0091)     2440  -.0064 (.0043)       -0.778*** (.0803)     2440
log(distance to closest river)          -.0003 (.0057)       2440  -.0044 (.0078)       2440  -.0046 (.0039)       -0.785*** (.0815)     2440
log(distance to closest lake)           .0113 (.0104)        2440  -.0145 (.0101)       2440  -.0057 (.0042)       -0.782*** (.0815)     2440
log(agricultural suitability)           -.0017 (.0136)       2410  .0058 (.0188)        2410  .0044 (.01)          -0.784*** (.0838)     2410
log(stability of malaria transmission)  .0066 (.0043)        2178  -0.009* (.0046)      2178  -0.004** (.0016)     -0.811*** (.0659)     2178
log(elevation)                          -.0059 (.0126)       2431  .0024 (.0131)        2431  -.0022 (.0056)       -0.782*** (.082)      2431
log(terrain ruggedness)                 -0.017* (.0086)      2438  0.023** (.0105)      2438  .0101 (.0069)        -0.776*** (.0823)     2438

The dependent variable in columns (1)-(2) is the fraction of illiterate parents (incomplete primary). In columns (3)-(7) it is the fraction of children of parents without primary education who complete at least primary school. All specifications include country fixed effects (not reported). Specification A adds one additional variable at a time. Specification B adds the same set of variables and also controls for the fraction of parents without primary school in the district. Standard errors clustered at the country-level in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. Blue lines in the tables indicate that variables are significantly correlated with mobility even when we control for the stock of illiterate parents (column (5)).

Malaria Malaria has been invariably linked to Africa’s underdevelopment.35 Row 9 reports the results of the three specifications. In line with earlier works, the level specification in column (1) shows that illiteracy is weakly higher in places with ecological conditions favorable to malaria. Besides its “level” effect, malaria is also related to IM, as column (3) shows. When we condition on the share of illiteracy of the old generation in the region, the coefficient on the malaria suitability index falls in absolute value, but retains its statistical significance (column (5)). Conditional on the share of illiterate parents, a one standard deviation increase in malaria suitability is associated with a 0.04 standard deviation fall in the share of children of illiterate parents who manage to at least complete primary education (beta coefficient).
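For reference, the standardized (“beta”) coefficients quoted in this section presumably follow the usual textbook definition, restated here as a reminder rather than taken from the paper: the estimated coefficient is rescaled by the ratio of the sample standard deviations of the regressor and the dependent variable. Whether the authors compute these standard deviations on raw or on residualized variables is not stated, so the expression below is only a sketch of the convention.

```latex
\[
\hat{\beta}^{\,std} \;=\; \hat{\beta}\,\frac{\hat{\sigma}_{x}}{\hat{\sigma}_{y}},
\qquad
\text{so a one-standard-deviation change in } x \text{ is associated with a }
\hat{\beta}^{\,std}\text{-standard-deviation change in } y .
\]
```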

Other Geographic Aspects We also linked IM to various measures reflecting geographic/land endowments, namely soil quality (land suitability) for agriculture, elevation, and terrain ruggedness, as earlier works have linked these features to regional development (e.g., Michalopoulos (2012), Nunn and Puga (2012), Aslan (2016)). There are no evident systematic patterns, as none of these variables correlates with IM once the education level of the old cohorts is taken into account.

6.2.2 Historical Traits

Table 5 reports specifications associating the IM index with various historical variables (in the appendix we report analogous specifications with the relative IM measure). The structure is the same as in Table 4. In each row we report three specifications that relate the specific historical trait to IM (in columns (3) and (5)) and to the illiteracy rate of the “old cohort” (in column (1)).

Development at independence We find a positive correlation between IM and the (log of) population density in 1950, which for most countries in our sample corresponds to the period just before independence (column (3)). The correlation retains statistical significance once we control for the share of parents without any education (column (5)). So there is some evidence of inertia, as IM is related to the level of development at the end of colonization.

Colonial Infrastructure Investments Colonial investments in railroads and roads seem to have played a crucial role in shaping African countries’ post-independence development path (see Kerby, Jedwab and Moradi (2017), and Jedwab and Moradi (2016)).36 We find a strong positive association between proximity to railroads and literacy among the “old” (row (2), column (1)). Log distance to colonial railroads is also related to the share of literacy for children whose parents have not completed primary education (column (3)).37 The standardized beta coefficient that reflects the impact of a one standard deviation change in distance to railroads on IM is −0.26 (column (3)). The correlation is robust to adding province fixed effects (reported in the Appendix). There is also a strong positive association between IM and proximity to colonial roads, measured using the recently compiled data of Jedwab and Storeygard (2017) (column (3)). The correlation retains significance once we control for the share of literacy of the “old” (in column (5)). These results suggest that colonial infrastructure investments not only had an impact on development at independence (as reflected in the education level of the “old”), but also on the intergenerational transmission of education beyond any initial effect.

34See Nunn (2008) and Nunn and Wantchekon (2011).

35See, among others, Gallup and Sachs (2001), Sachs (2003), Cervellati and Sunde (2015), Weil (2017), Cervellati et al. (2016).

36Data on colonial roads come from Jedwab, Moradi, and Kerby (2016) and cover all Sub-Saharan African countries except South Africa. So, in these specifications we drop South African regions.

Colonial Missions We also examine the correlation between IM and proximity to colonial missions using digitized data from Nunn (2012) and Cagé and Rueda (2016). Overall there are 1,321 (361 Catholic, 933 Protestant, 27 British and Foreign Bible Society) and 723 (Protestant only) missions in these data sets. We find a strong within-country positive association between proximity to Christian missions and literacy rates of the “old” (column (1)). Log distance to Christian missions is a significant correlate of IM (column (3)), even when we control for the share of literacy among the old (column (5)). The same result holds for relative IM (see Appendix). Rows (5) and (6) show that IM is correlated more strongly with the presence of Protestant as compared to Catholic missions.

Precolonial Political Centralization and Early Statehood We then explored the correlation between IM and pre-colonial political centralization that recent works have linked to contemporary development.38 Row (7) reports specifications that correlate IM (and the share of literacy among the old cohort) with the distance to the centroid of the nearest large kingdom or empire using data from Brecke (1999), as geocoded by Besley and Reynal-Querol (2015). In row (8) we use log distance to precolonial states using Murdock’s data (1959, 1967), though data are missing for some parts of the continent. There is no systematic link between distance to precolonial states and either IM or the level of literacy (or mean years of schooling) of the old. The results are similar with the relative IM index, which is likewise uncorrelated with proximity to precolonial states.

37The estimate implies that in districts very close to colonial railroads (log distance of 0) the percentage of literacy for individuals born to illiterate parents is around 18.5 points higher than in districts 150 km away from colonial railroads (ln(150) × 0.037 ≈ 0.185).

38See, among others, Michalopoulos and Papaioannou (2013, 2015), Gennaioli and Rainer (2006, 2007), and Depetris-Chauvin (2017).

Table 5: District-level correlates of absolute mobility: History

Columns (1)-(2): LHS = share illiterate old. Columns (3)-(7): LHS = share literate children of illiterate old, with Specification A in columns (3)-(4) and Specification B in columns (5)-(7). Standard errors in parentheses.

| Variable | (1) additional variable | (2) N | (3) additional variable | (4) N | (5) additional variable | (6) share illiterate old | (7) N |
|---|---|---|---|---|---|---|---|
| | | | | | | -0.785*** (.0822) | 2441 |
| HYDE population density in 1950 | -0.028*** (.0099) | 2438 | 0.027*** (.0097) | 2438 | 0.006* (.0032) | -0.764*** (.0831) | 2438 |
| log(distance to closest colonial railroad) | 0.037*** (.0089) | 1935 | -0.039*** (.0096) | 1935 | -0.009*** (.0031) | -0.826*** (.0715) | 1935 |
| log(distance to closest improved or better road in 1960) | 0.037*** (.0078) | 2151 | -0.034*** (.0067) | 2151 | -0.006** (.0025) | -0.760*** (.0898) | 2151 |
| log(distance to closest mission) | 0.051*** (.0106) | 2440 | -0.052*** (.0116) | 2440 | -0.014*** (.0038) | -0.740*** (.0742) | 2440 |
| log(distance to closest Catholic mission (Nunn only)) | 0.052*** (.0142) | 2440 | -0.045*** (.0151) | 2440 | -.0039 (.0072) | -0.778*** (.0857) | 2440 |
| log(distance to closest Protestant mission) | 0.050*** (.0115) | 2440 | -0.050*** (.0123) | 2440 | -0.013*** (.0036) | -0.747*** (.0752) | 2440 |
| log(distance to closest precolonial empire (Besley and Reynal-Querol)) | -.0054 (.0209) | 2440 | -.0117 (.0184) | 2440 | -.0159 (.0149) | -0.786*** (.0776) | 2440 |
| log(distance to closest precolonial state (Murdock)) | .0026 (.0147) | 2440 | -.0111 (.0121) | 2440 | -.0091 (.0085) | -0.784*** (.0808) | 2440 |

The dependent variable in columns (1)-(2) is the fraction of illiterate parents (incomplete primary). In columns (3)-(7) it is the fraction of children of parents without primary education who complete at least primary school. All specifications include country fixed effects (not reported). Specification A adds one additional variable at a time. Specification B adds the same set of variables and also controls for the fraction of parents without primary school in the district. Standard errors clustered at the country level in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.
Blue lines in the tables indicate that variables are significantly correlated with mobility even when we control for the stock of illiterate parents (column (5)).

6.2.3 Contemporary Covariates for individuals born before 1960

Finally, we explore the correlation of IM with a set of variables that we derive from the census data itself, restricting the sample from which we construct each variable to individuals born before 1960. The idea is to correlate IM with some measure of initial conditions at independence. In selecting the covariates from the census, we follow Chetty et al. (2014), who explore the correlation of regional IM with a wide range of covariates across U.S. Commuting Zones. The results are reported in Table 6.

Inequality We first examined the association between IM and inequality across African regions. The Gini index correlates strongly with both the share of uneducated parents (positive association, column (1)) and with IM (negative association). The latter holds unconditionally (column (3)) and when we condition on the share of uneducated parents (column (5)). This result mirrors the findings of Chetty et al. (2014) and is known in the literature as the “Gatsby curve” (Krueger (2012), Corak (2013)).
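For readers unfamiliar with the inequality measure, the following minimal sketch computes a Gini index from a list of individual outcomes. It assumes the index is computed over years of schooling of individuals in a district; this section does not spell out the exact inputs or any weighting, so the function and the sample values are illustrative only.

```python
def gini(values):
    """Gini index of a list of non-negative values (0 = perfect equality).

    Uses the rank-based formula on the sorted values:
    G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for empty or all-zero input
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical districts: years of schooling of four individuals each.
equal_district = gini([6, 6, 6, 6])     # everyone has the same schooling
unequal_district = gini([0, 0, 0, 12])  # one person holds all the schooling

print(equal_district)    # 0.0
print(unequal_district)  # 0.75
```

With small samples the index is bounded by (n − 1)/n, which is why the fully concentrated four-person district reaches 0.75 rather than 1.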

Urbanization and migration Then we examined the association between IM and urbanization and migration. The share of individuals living in urban areas and the share of individuals classified as migrants are both strongly negatively related to the share of uneducated old. Unconditionally, IM is also positively associated with both variables; educational IM is higher in more urbanized regions and in districts with higher migration. But this correlation seems entirely driven by initial conditions; it becomes statistically indistinguishable from zero once we condition on the share of uneducated “old” in the district (column (5)).

Family structure We then examined the role of family structure. The share of adult individuals who are married correlates strongly positively with illiteracy and, unconditionally, strongly negatively with IM. As with urbanization, however, this correlation seems again entirely driven by initial conditions and is not significant conditional on the share of illiterate parents.

Labor force composition Finally, we examined the association between IM and industrial specialization, as reflected in employment in agriculture, manufacturing, and services. The share of individuals employed in the modern sector (manufacturing and services) correlates positively with IM and negatively with the share of uneducated parents. Conversely, the agricultural labour share correlates negatively with IM and positively with the share of uneducated old.39 Conditioning on the share of uneducated old in column (5), we find that the positive association of IM with the services labour share remains significant, as does the negative association with the agricultural labour share (though significance drops to the 10% level for the latter). The manufacturing labour share is still positively associated with IM but is no longer significant.

39These three categories account for most employment. We have left out employment in construction and public utilities.

Table 6: District-level correlates of absolute mobility: cohorts included in contemporary covariates = up to 1960

Columns (1)-(2): LHS = share illiterate old. Columns (3)-(7): LHS = share literate children of illiterate old, with Specification A in columns (3)-(4) and Specification B in columns (5)-(7). Standard errors in parentheses.

| Variable | (1) additional variable | (2) N | (3) additional variable | (4) N | (5) additional variable | (6) share illiterate old | (7) N |
|---|---|---|---|---|---|---|---|
| | | | | | | -0.785*** (.0822) | 2441 |
| Gini index | 1.050*** (.0525) | 2441 | -0.908*** (.1073) | 2441 | -0.562*** (.1479) | -0.330*** (.0745) | 2441 |
| urban share | -0.313*** (.049) | 1868 | 0.224*** (.0422) | 1868 | -.0269 (.0376) | -0.802*** (.0784) | 1868 |
| migrant share | -0.322*** (.0469) | 2320 | 0.247*** (.0341) | 2320 | -.002 (.0277) | -0.773*** (.0894) | 2320 |
| share married individuals | 0.617** (.302) | 2441 | -0.599** (.2601) | 2441 | -.1189 (.1194) | -0.778*** (.0813) | 2441 |
| agricultural labour share | 0.433*** (.0538) | 2267 | -0.392*** (.0306) | 2267 | -0.112* (.0591) | -0.644*** (.1211) | 2267 |
| manufacturing labour share | -0.816*** (.25) | 2267 | 0.700*** (.2274) | 2267 | .0822 (.0995) | -0.757*** (.1021) | 2267 |
| services labour share | -0.537*** (.0827) | 2267 | 0.510*** (.0585) | 2267 | 0.177** (.0751) | -0.619*** (.0975) | 2267 |

The dependent variable in columns (1)-(2) is the fraction of illiterate parents (incomplete primary). In columns (3)-(7) it is the fraction of children of parents without primary education who complete at least primary school. All specifications include country fixed effects (not reported). Specification A adds one additional variable at a time. Specification B adds the same set of variables and also controls for the fraction of parents without primary school in the district. Standard errors clustered at the country level in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01. Blue lines in the tables indicate that variables are significantly correlated with mobility even when we control for the stock of illiterate parents (column (5)).

6.3 Summary

Figure 16: Dot plot of correlates of district-level IM

[Figure: dot plot in three panels, each with a horizontal axis running from −0.50 to 0.25 (point estimate and 90% CI). Geography: oil field dummy, distance to capital, diamond mine dummy, distance to border, distance to coast, distance to river, distance to lake, agricultural suitability, malaria transmission, elevation, terrain ruggedness. History: population density, distance to railroad, distance to road, distance to mission, distance to Catholic mission, distance to Protestant mission, distance to precolonial empire, distance to precolonial state. Contemporary, up to 1960: Gini index, urban share, migrant share, agricultural labour share, manufacturing labour share, services labour share, share married individuals, share individuals in single households.]

This figure shows point estimates (standardized coefficients) and 90% confidence intervals for correlates of district-level IM. The estimates come from regressions with the likelihood that a child born to uneducated parents in a given district achieves at least primary schooling on the left-hand side and the different covariates on the right-hand side. All regressions condition on country fixed effects and the share of parents with less than primary attainment. Standard errors clustered at the country level. Red point estimates indicate significance at the 10% level.

Overall, the results suggest that educational IM in Africa is related to only a handful of geographical traits, namely the region’s malaria ecology as well as its proximity to the capital city. The transmission of human capital across generations correlates significantly with population density during the late colonial period, even when we condition on the share of initial literacy, which is itself strongly related to population density. This result suggests that younger cohorts residing in initially more urbanized and developed places did better, as they managed to complete primary (or in some cases secondary and tertiary) education and escape family illiteracy. IM is also linked to colonial-era railroad and road investments, as well as colonial missions, suggesting that, though limited, such investments may have influenced the pace of human capital accumulation and hence the post-independence development paths across regions. Finally, on contemporary covariates, IM is strongly negatively linked with initial inequality, a result that accords well with cross-country patterns. Moreover, the composition of the labour force is related to IM, with more agriculturally specialized districts showing less IM than more services-oriented districts. Note that this does not appear to be just about cities, as the initial urban share is not correlated with IM once we condition on initial literacy.

Figure 16 summarizes the correlations by displaying standardized (“beta”) coefficients for the different variables conditional on country fixed effects and the share of uneducated parents.
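As a rough illustration of how a standardized coefficient of this kind could be obtained, the sketch below absorbs country fixed effects by demeaning within country, regresses IM on one covariate and the share of uneducated parents, and rescales the covariate's coefficient by the ratio of standard deviations. Two choices are assumptions rather than statements about the paper's procedure: fixed effects are handled by the within transformation, and standardization uses within-country standard deviations. All names and data are synthetic.

```python
from collections import defaultdict
from statistics import pstdev


def demean_by_group(values, groups):
    """Subtract each group's mean (within transformation for fixed effects)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def ols_two_regressors(y, x1, x2):
    """OLS of y on x1 and x2 without intercept (inputs already demeaned)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def standardized_beta(y, x, control, country):
    """Coefficient on x, scaled by sd(x) / sd(y) after demeaning by country."""
    yd = demean_by_group(y, country)
    xd = demean_by_group(x, country)
    cd = demean_by_group(control, country)
    b_x, _ = ols_two_regressors(yd, xd, cd)
    return b_x * pstdev(xd) / pstdev(yd)


# Synthetic districts in two hypothetical countries.
country = ["A", "A", "A", "B", "B", "B"]
log_dist = [1.0, 2.0, 4.0, 0.0, 3.0, 5.0]
share_illit = [0.5, 0.1, 0.9, 0.2, 0.8, 0.3]
country_effect = {"A": 0.6, "B": 0.9}
im = [country_effect[c] - 0.04 * d - 0.5 * s
      for c, d, s in zip(country, log_dist, share_illit)]

beta = standardized_beta(im, log_dist, share_illit, country)
print(round(beta, 2))
```

Because the synthetic outcome is an exact linear function of the regressors, the unstandardized coefficients are recovered exactly; with real data one would also need clustered standard errors, which this sketch does not compute.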

7 Ethnic Patterns

7.1 Which Ethnicities Have High Social IM in Africa?

We compute within-family persistence in years of schooling at the ethnicity-level (relative IM), pooling all censuses for a given country and estimating:

$$ s_{1,itbce} = \alpha_e + \beta_e \, s_{0,itbce} + [\gamma_{0,b} + \delta_{1,b} + \theta_t] + \epsilon_{1,itce} \quad \text{for each country } c \tag{8} $$
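To make the ethnicity-specific persistence estimate concrete, here is a minimal sketch of the unconditional version of equation (8): a separate bivariate regression of child schooling on parental schooling for each ethnic group, returning the intercept and the slope. It deliberately omits the cohort and census-year fixed effects in square brackets, and the record layout and group labels are hypothetical.

```python
from collections import defaultdict


def intercept_and_slope(parent_years, child_years):
    """Bivariate OLS of child schooling on parental schooling.

    Requires some variation in parental schooling within the group.
    """
    n = len(parent_years)
    mean_p = sum(parent_years) / n
    mean_c = sum(child_years) / n
    sxx = sum((p - mean_p) ** 2 for p in parent_years)
    sxy = sum((p - mean_p) * (c - mean_c)
              for p, c in zip(parent_years, child_years))
    slope = sxy / sxx
    return mean_c - slope * mean_p, slope


def relative_im_by_ethnicity(records):
    """records: iterable of (ethnicity, parent_years, child_years) tuples.

    Returns {ethnicity: (intercept, slope)}; a higher slope means more
    persistence, i.e. lower relative mobility.
    """
    grouped = defaultdict(lambda: ([], []))
    for ethnicity, parent, child in records:
        grouped[ethnicity][0].append(parent)
        grouped[ethnicity][1].append(child)
    return {e: intercept_and_slope(ps, cs) for e, (ps, cs) in grouped.items()}


# Hypothetical parent-child pairs for two made-up groups.
sample = [
    ("group_x", 0, 3), ("group_x", 2, 4), ("group_x", 4, 5),
    ("group_y", 0, 1), ("group_y", 4, 5), ("group_y", 8, 9),
]
estimates = relative_im_by_ethnicity(sample)
print(estimates["group_x"])  # (3.0, 0.5)
print(estimates["group_y"])  # (1.0, 1.0)
```

In this toy example group_x has the higher intercept and the lower slope, so children's schooling depends less on their parents' schooling than in group_y.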

Likewise, we estimate ethnicity-specific conditional likelihoods that individuals whose parents had not completed primary education manage to complete at least primary schooling, escaping family “illiteracy” (absolute IM), running:

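$$ I^{illit}_{0,itbce} = \alpha^{o}_e + [\gamma^{o}_b + \delta^{o}_b + \theta_t] + \epsilon_{icte} \tag{9} $$

$$ I^{lit,illit}_{1,itbce} = \alpha^{y}_e + [\gamma^{o}_b + \delta^{o}_b + \theta_t] + \epsilon_{icte} \tag{10} $$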
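The unconditional counterparts of these two quantities are simple group shares. The sketch below computes, for each ethnic group, the share of parents without completed primary education and the share of their children who complete at least primary school (absolute IM). It ignores the cohort and census-year fixed effects, and the record layout is a hypothetical one chosen for illustration.

```python
from collections import defaultdict


def absolute_im_by_ethnicity(records):
    """records: iterable of (ethnicity, parent_primary, child_primary),
    where the last two entries are booleans for completed primary school.

    Returns {ethnicity: (share_illiterate_parents, absolute_im)}.
    absolute_im is None when a group has no parents without primary.
    """
    total = defaultdict(int)
    at_risk = defaultdict(int)   # children of parents without primary
    escaped = defaultdict(int)   # of those, children completing primary
    for ethnicity, parent_primary, child_primary in records:
        total[ethnicity] += 1
        if not parent_primary:
            at_risk[ethnicity] += 1
            if child_primary:
                escaped[ethnicity] += 1
    result = {}
    for ethnicity, n in total.items():
        share_illit = at_risk[ethnicity] / n
        im = escaped[ethnicity] / at_risk[ethnicity] if at_risk[ethnicity] else None
        result[ethnicity] = (share_illit, im)
    return result


# Hypothetical observations for two made-up groups.
sample_pairs = [
    ("group_x", False, True), ("group_x", False, False),
    ("group_x", True, True), ("group_x", False, True),
    ("group_y", False, False), ("group_y", False, False),
    ("group_y", False, True), ("group_y", False, False),
]
shares = absolute_im_by_ethnicity(sample_pairs)
print(shares["group_x"])  # (0.75, 0.6666666666666666)
print(shares["group_y"])  # (1.0, 0.25)
```

The toy data mirror the pattern described in the text: the group with the larger share of uneducated parents also has the lower probability of escaping illiteracy.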

Appendix table (39) reports the relative and absolute IM estimates for all 178 ethnic groups. As with regional IM, the table shows the unconditional estimates and also the estimates netting out cohort and census fixed effects. The relative and absolute estimates correlate strongly (ρ = −0.76 between ethnicity-level persistence and the ethnicity-level unconditional likelihood that a child of illiterate parents becomes literate). There is wide variation in social IM across ethnic groups in the 14 countries for which IPUMS reports ethnic information. Country features explain a sizable portion of the variance (R² = 68%), but there is variation within countries too.

Figures 17 (a)-(b) portray the relationship between the ethnicity-specific “stock” of the “old’s” education and ethnic-specific relative and absolute IM. Panel (a) plots the ethnic-specific slope of intergenerational persistence in education (on the vertical axis) against the ethnic-specific intercept (in equation (8)) that captures mean education of the parents (on the horizontal axis). Panel (b) plots the share of individuals whose parents have not completed primary schooling (“illiterate”) and who have managed to complete at least primary education (on the vertical axis) against the share of the “old” without completed primary at the ethnicity level (on the horizontal axis).
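The share of the variance in ethnicity-level IM explained by country features can be read as the R² from a regression of the IM estimates on country dummies, which equals one minus the ratio of within-country to total variation. The sketch below illustrates that decomposition under this interpretation; the values and country labels are invented.

```python
from collections import defaultdict


def r_squared_from_groups(values, groups):
    """R-squared of a regression of `values` on a full set of group dummies:
    1 - (within-group sum of squares) / (total sum of squares)."""
    overall_mean = sum(values) / len(values)
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    total_ss = sum((v - overall_mean) ** 2 for v in values)
    within_ss = sum((v - sums[g] / counts[g]) ** 2
                    for v, g in zip(values, groups))
    return 1.0 - within_ss / total_ss


# Hypothetical absolute-IM estimates for four ethnic groups in two countries.
im_estimates = [0.2, 0.4, 0.6, 0.8]
countries = ["country_a", "country_a", "country_b", "country_b"]

r2 = r_squared_from_groups(im_estimates, countries)
print(round(r2, 2))  # 0.8
```

Here 80% of the variation is between countries and the remaining 20% is the within-country variation across ethnic groups.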

Figure 17: Ethnic-specific IM and means, relative and absolute

[Figure: two scatter plots with one marker per ethnic group, grouped by country. (a) Relative IM: conditional means and persistence; vertical axis: ethnic-specific slope, horizontal axis: ethnic-specific intercept. (b) Absolute IM: probability of exiting illiteracy (less than primary); vertical axis: share of educated children of uneducated parents, horizontal axis: share of uneducated old. Countries: Burkina Faso, Botswana, Ethiopia, Ghana, Liberia, Mali, Morocco, Mozambique, Malawi, Senegal, Sierra Leone, Uganda, South Africa, Zambia.]

There is an evident negative association between the mean education of the old cohorts and the relative IM index capturing the sensitivity of individuals’ education to their parents’ education (Figure 17 (a)). Likewise, we observe a strong link between the ethnic share of uneducated old and the likelihood that children of illiterate parents will complete at least primary school (Figure 17 (b)). The association is significantly negative in most of the countries. The ethnic-specific patterns, therefore, echo the cross-country and the cross-region patterns (reported in Figures 10 and 15, respectively).40

7.2 Correlates of IM across Ethnicities

We now examine (Table 7) the correlates of social IM across ethnic lines. The relative size of the group (the percentage of the population accounted for by the ethnic group) is not significantly associated with either the share of illiterate parents or IM. We then use the Ethnic Power Relations database (Wimmer, Cederman, and Min (2011)), which records information on the status of ethnicities in African national politics since 1960, to explore whether social IM is related to ethnicities’ influence in the government. Only the dummy variable for powerless ethnic groups is positively associated with illiteracy and negatively with IM. However, the correlation with IM turns insignificant once we condition on the share of illiterate parents. We then examine the correlation between IM and ethnic social, political, and economic organization traits using information from Murdock’s Ethnographic Atlas (Murdock (1967)).

40In the Appendix we further explore the association between the “stock” of education (and literacy) of the old cohorts and social IM using both the relative and the absolute IM measures. To start with, the “poverty-trap” dynamics are present for both men and women and for both rural and urban households.

Table 7: District-level correlates of absolute mobility: Population density and infrastructure

Columns (1)-(2): LHS = share illiterate old. Columns (3)-(7): LHS = share literate children of illiterate old, with Specification A in columns (3)-(4) and Specification B in columns (5)-(7). Standard errors in parentheses. EG = ethnic group.

| Variable | (1) variable | (2) N | (3) variable | (4) N | (5) variable | (6) share illiterate old | (7) N |
|---|---|---|---|---|---|---|---|
| | | | | | | -.6646*** (.1222) | 178 |
| relative group size in country | -.0489 (.0747) | 178 | .0484 (.0679) | 178 | .016 (.0446) | -.664*** (.1225) | 178 |
| dummy = 1 if EG ever discriminated | .0013 (.0194) | 113 | -.0129 (.0222) | 113 | -.0117 (.0132) | -.8585*** (.0856) | 113 |
| number of years EG discriminated | -.0007 (.0008) | 113 | .0002 (.0009) | 113 | -.0003 (.0005) | -.8615*** (.0882) | 113 |
| dummy = 1 if EG ever part of ethnic war | .0097 (.0108) | 113 | -.0133 (.02) | 113 | -.005 (.0165) | -.8579*** (.0855) | 113 |
| dummy = 1 if EG ever powerless | .0512*** (.0194) | 113 | -.0605** (.0306) | 113 | -.0175 (.0274) | -.8389*** (.0772) | 113 |
| dummy = 1 if EG ever had regional autonomy | -.0012 (0) | 113 | .0334 (0) | 113 | .0323 (.0001) | -.8587 (.086) | 113 |
| jurisdictional hierarchy | .002 (.006) | 134 | -.0003 (.0087) | 134 | .0009 (.0078) | -.6129*** (.1505) | 134 |
| polygyny | .0683** (.0315) | 155 | -.0352* (.0182) | 155 | .0103 (.0169) | -.6655*** (.1292) | 155 |
| dummy = 1 if pastoralism contributes most | .0199 (.0648) | 140 | -.1331*** (.047) | 140 | -.1192*** (.0284) | -.6998*** (.1214) | 140 |
| dummy = 1 if agriculture contributes most | .0051 (.0489) | 140 | .0757* (.0457) | 140 | .0794*** (.0278) | -.7184*** (.1271) | 140 |
| Nunn slavery measure | -.0011 (.0025) | 134 | -.0013 (.0042) | 134 | -.0022 (.0033) | -.7714*** (.1005) | 134 |

The dependent variable in columns (1)-(2) is the fraction of illiterate parents (incomplete primary). In columns (3)-(7) it is the fraction of children of parents without primary education who complete at least primary school. All specifications include country fixed effects (not reported). Specification A adds one additional variable at a time. Specification B adds the same set of variables and also controls for the fraction of parents without primary school in the district. Standard errors clustered at the country level in parentheses. * p < 0.1, ** p < 0.05, *** p < 0.01.
Blue lines in the tables indicate that variables are significantly correlated with mobility even when we control for the stock of illiterate parents (column (5)).

We start with Murdock’s 0 to 4 index of jurisdictional hierarchy beyond the local community, which is often used to proxy for the degree of political complexity of African ethnicities at the time of colonization. This variable is uncorrelated with the absolute IM index (columns (3) and (5)) as well as with the share of illiteracy among the “old” generation (column (1)). The results are similar with the relative IM index (see Appendix) or when we transform Murdock’s variable into a binary category. We then examine the role of polygyny, which is still practiced in many parts of the continent. In line with earlier works, we find that ethnic groups that practiced polygyny at the onset of colonization have higher illiteracy rates, on average by 6.8 percentage points (column (1)). There is also some evidence that polygyny correlates with educational IM (column (3)), though the coefficient on the dummy variable that identifies ethnicities where polygyny is widely practiced loses significance once we condition on the share of illiteracy (column (5)).

We then explore the association between IM and variables that reflect ethnicities’ economic organization. As Michalopoulos, Putterman, and Weil (2017) show using DHS data, descendants of African groups that practiced agriculture (as opposed to pastoralism) during the pre-colonial era are observed to be more educated and wealthier today. A natural question is whether these ethnic-specific differences in education were already present at the end of the colonial era or emerged over the course of the last 50 years in Africa. We define two dummy variables identifying ethnicities where either pastoral activities or agriculture contributed the most in precolonial times41 and then associate them with the absolute IM index. The pastoralism indicator enters with a significantly negative coefficient (−0.13), while the agriculture specialization dummy enters with a significantly positive estimate (0.075) in column (3). These results remain robust when we control for the share of illiteracy among the old (column (5)), which is itself only weakly correlated with the pastoral and agricultural dummies. This is an interesting finding, as it suggests that precolonial traits (in this case the historical mode of subsistence) may manifest their influence on educational attainment not in the initial conditions but over time, as they may influence the rate at which initial levels of human capital are transmitted intergenerationally.

Using data from Nunn (2011), we correlate IM with a measure of how intensely an ethnic group was subject to the slave trade. The results indicate a negative but insignificant relationship.
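The construction of the two subsistence dummies, including the tie rule stated in footnote 41, can be sketched as follows. The input format (a mapping from subsistence source to its contribution) and the source names are hypothetical, and the sketch assumes the tie rule applies to ties for the largest contribution.

```python
def subsistence_dummies(contributions):
    """Return (pastoralism_dummy, agriculture_dummy) for one ethnic group.

    contributions: dict mapping a subsistence source to its contribution.
    A dummy equals 1 only if that source is the single largest contributor;
    if two or more sources tie for the top, both dummies are 0.
    """
    top = max(contributions.values())
    leaders = [source for source, value in contributions.items() if value == top]
    if len(leaders) != 1:
        return 0, 0
    return int(leaders[0] == "pastoralism"), int(leaders[0] == "agriculture")


# Hypothetical groups.
farmers = subsistence_dummies({"agriculture": 0.6, "pastoralism": 0.3, "fishing": 0.1})
herders = subsistence_dummies({"agriculture": 0.2, "pastoralism": 0.7, "fishing": 0.1})
tied = subsistence_dummies({"agriculture": 0.4, "pastoralism": 0.4, "fishing": 0.2})

print(farmers)  # (0, 1)
print(herders)  # (1, 0)
print(tied)     # (0, 0)
```

A group whose main source is neither pastoralism nor agriculture (for example fishing) also receives zeros on both dummies under this rule.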

8 Conclusion

We have conducted a systematic exploration of the available census data on education in Africa. Education is strongly correlated with income and, for reasons discussed above, it may actually be better measured than income in Africa.

Since independence, many individuals have acquired some schooling. As a result, inequality in education has declined. We have explored in detail how this exodus from illiteracy has taken place across countries, administrative regions and ethnic groups. In relative terms, regional and ethnic inequalities continue to persist. Also, cross-country inequality has not fallen much.

We also explored IM in education. Our key result here is that IM is higher in countries, regions and ethnicities where the initial level of education was higher. This finding is consistent with the idea of a poverty trap: poorer (less educated) places are those in which it is harder to escape educational poverty.

We have also investigated various correlates of IM. We find that IM correlates positively with colonial infrastructure (roads and railroads) and with the presence of Christian missions, especially Protestant ones. By contrast, among an array of geographical and ecological features, only distance from the capital and malaria suitability are systematic negative predictors of IM. Moreover, initial educational inequality (in 1960) is a strong negative predictor of subsequent IM. As for ethnicities, descendants of pastoral groups are less educationally mobile than those tracing their ancestry to agricultural groups.

41In cases where two or more sources contribute equally, both the pastoralism and the agricultural dummy are set to zero.

Educational Inequality and Intergenerational Mobility in Africa
Supplementary Online Appendix∗

Alberto Alesina Sebastian Hohmann Harvard University, CEPR and NBER London Business School

Stelios Michalopoulos Elias Papaioannou Brown University, CEPR and NBER London Business School and CEPR

March 14, 2018

∗Alberto Alesina, Harvard University and IGIER Bocconi; Sebastian Hohmann, London Business School; Stelios Michalopoulos, Brown University; Elias Papaioannou, London Business School.

A Correlation between schooling and household wealth with DHS and Afrobarometer

A.1 DHS

A.1.1 Household wealth

Table 8: Household wealth quintile and years of schooling

Dependent variable in all columns: wealth quintile. t-statistics in parentheses.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| years of schooling | 0.123*** (39.02) | 0.0815*** (19.38) | 0.0994*** (31.43) | 0.0857*** (33.74) | 0.0857*** (34.31) | 0.0791*** (33.42) |
| individual controls | no | yes | yes | yes | yes | yes |
| fixed effects | no | no | survey | survey, region | survey, admin-1 | survey, admin-2 |
| R-squared | 0.175 | 0.402 | 0.459 | 0.520 | 0.525 | 0.557 |
| marginal R-squared | 0.175 | 0.06 | 0.073 | 0.05 | 0.052 | 0.042 |
| within R-squared | | 0.399 | 0.441 | 0.325 | 0.339 | 0.274 |
| N | 3516848 | 3509051 | 3509051 | 3509051 | 2823745 | 2823745 |

This table shows regression results of household wealth on years of schooling for individuals aged 18+. The dependent variable in all columns is the DHS household wealth quintile (computed separately for each survey, i.e. country-year, based on the DHS-computed wealth index). Individual controls are age, age squared, dummies for male individuals, male household head, urban residence, the log of the number of household members, and individual birth decade dummies. Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects; column (4) adds region (defined by DHS) fixed effects. Columns (5) and (6) restrict attention to the sample for which GPS coordinates are available and replace the DHS region fixed effects with admin-1 (5) and admin-2 (6) region fixed effects. t-statistics based on standard errors clustered at the survey level in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Table 9: Household wealth index and years of schooling

Dependent variable in all columns: wealth index. t-statistics in parentheses.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| years of schooling | 11409.6*** (7.69) | 7100.9*** (8.73) | 8957.8*** (8.85) | 8140.5*** (8.01) | 8320.5*** (7.45) | 7940.2*** (7.10) |
| individual controls | no | yes | yes | yes | yes | yes |
| fixed effects | no | no | survey | survey, region | survey, admin-1 | survey, admin-2 |
| R-squared | 0.049 | 0.123 | 0.135 | 0.330 | 0.250 | 0.287 |
| marginal R-squared | 0.049 | 0.013 | 0.017 | 0.025 | 0.025 | 0.026 |
| within R-squared | | 0.121 | 0.132 | 0.101 | 0.097 | 0.079 |
| N | 3516854 | 3509057 | 3509057 | 3509057 | 2823751 | 2823751 |

This table shows regression results of household wealth on years of schooling for individuals aged 18+. The dependent variable in all columns is the DHS household wealth index (computed separately for each survey, i.e. country-year, as the principal component of a variety of variables capturing asset ownership, health etc.). Individual controls are age, age squared, dummies for male individuals, male household head, urban residence, the log of the number of household members, and individual birth decade dummies. Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects; column (4) adds region (defined by DHS) fixed effects. Columns (5) and (6) restrict attention to the sample for which GPS coordinates are available and replace the DHS region fixed effects with admin-1 (5) and admin-2 (6) region fixed effects. t-statistics based on standard errors clustered at the survey level in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 18: Binned scatter plots

[Figure: six binned scatter plots of household wealth against years of schooling. (a) wealth index, unconditional; (b) wealth quintile, unconditional; (c) wealth index, conditional on controls and region FE; (d) wealth quintile, conditional on controls and region FE; (e) wealth index, conditional on controls and admin-2 FE; (f) wealth quintile, conditional on controls and admin-2 FE. Panels (a)-(b) plot the household wealth index or quintile against years of schooling; panels (c)-(f) plot the corresponding residuals against the years-of-schooling residual.]
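A binned scatter plot summarizes a large sample by sorting observations on the horizontal variable, splitting them into equal-sized bins, and plotting the mean of each variable within every bin. The sketch below shows that computation; the number of bins and the equal-count binning rule are assumptions, since the appendix does not report them. For the conditional panels, both variables would first be residualized on the controls and fixed effects.

```python
def binned_means(x, y, n_bins=20):
    """Return a list of (mean_x, mean_y) pairs, one per equal-count bin.

    Observations are sorted by x and cut into `n_bins` consecutive groups of
    (nearly) equal size; empty bins are skipped when there are few points.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_x = sum(p[0] for p in chunk) / len(chunk)
        mean_y = sum(p[1] for p in chunk) / len(chunk)
        points.append((mean_x, mean_y))
    return points


# Synthetic example: an outcome that rises by 2 per year of schooling.
schooling = list(range(10))
outcome = [2 * s for s in schooling]

bins = binned_means(schooling, outcome, n_bins=5)
print(bins[0])   # (0.5, 1.0)
print(bins[-1])  # (8.5, 17.0)
```

Plotting the returned pairs gives one dot per bin, which is what makes the relationship readable with millions of underlying observations.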

A.1.2 Child mortality

Table 10: Probability that child survives and years of schooling

Dependent variable in all columns: I(child alive). t-statistics in parentheses.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| years of schooling | 0.00369*** (12.51) | 0.00313*** (12.82) | 0.00208*** (8.97) | 0.00170*** (12.08) | 0.00165*** (10.90) | 0.00154*** (10.71) |
| individual controls | no | yes | yes | yes | yes | yes |
| fixed effects | no | no | survey | survey, region | survey, admin-1 | survey, admin-2 |
| R-squared | 0.003 | 0.058 | 0.066 | 0.068 | 0.068 | 0.070 |
| marginal R-squared | 0.003 | 0.002 | 0.001 | 0 | 0 | 0 |
| within R-squared | | 0.055 | 0.052 | 0.052 | 0.052 | 0.051 |
| N | 1239858 | 1172339 | 1172339 | 1172339 | 923261 | 923260 |

This table shows regression results for child mortality on years of schooling for individuals aged 18+. The dependent variable in all columns is an indicator equal to 1 if a child is alive and zero otherwise. Individual controls are the mother’s age, age squared, dummies for children born as twins, child-birth-year dummies, dummies for the position a child occupies in the birth sequence of the mother, the number of births of the mother, dummies for male household head and urban residence, the log of the number of household members, and individual birth decade dummies. Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects; column (4) adds region (defined by DHS) fixed effects. Columns (5) and (6) restrict attention to the sample for which GPS coordinates are available and replace the DHS region fixed effects with admin-1 (5) and admin-2 (6) region fixed effects. t-statistics based on standard errors clustered at the survey level in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 19: Binned scatter plots

[Figure: three binned scatter plots of the likelihood that a child is alive against the mother’s years of schooling. (a) I(child alive), unconditional; (b) I(child alive), conditional on controls and region FE; (c) I(child alive), conditional on controls and admin-2 FE. Panels (b)-(c) plot the residual likelihood against the residual of the mother’s years of schooling.]

A.1.3 Bargaining power

Table 11: Bargaining power (sole and joint decider) on years of schooling

Dependent variable in all columns: bargaining power. t-statistics in parentheses.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| years of schooling | 0.0721*** (7.10) | 0.0698*** (7.52) | 0.0442*** (5.98) | 0.0296*** (7.89) | 0.0300*** (9.36) | 0.0275*** (8.87) |
| individual controls | no | yes | yes | yes | yes | yes |
| fixed effects | no | no | survey | survey, region | survey, admin-1 | survey, admin-2 |
| R-squared | 0.041 | 0.126 | 0.288 | 0.322 | 0.326 | 0.340 |
| marginal R-squared | 0.041 | 0.031 | 0.01 | 0.004 | 0.004 | 0.003 |
| within R-squared | | 0.1 | 0.057 | 0.043 | 0.041 | 0.039 |
| N | 615205 | 614634 | 614634 | 614634 | 534752 | 534751 |

This table shows regression results for individual bargaining power on years of schooling for individuals aged 18+. The dependent variable in all columns is a measure of individual bargaining power. This measure is constructed as the sum of six indicators, each equal to 1 if an individual takes part (either as sole or joint decision maker) in a particular decision: (a) decisions affecting the individual’s health, (b) large household purchases, (c) daily needs household purchases, (d) visits of family relatives, (e) what to cook each day, (f) what is to be done with money earned by the spouse. Individual controls are age, age squared, dummies for male individuals, male household head, and urban residence, as well as the log of the number of household members, and individual birth decade dummies. Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects; column (4) adds region (defined by DHS) fixed effects. Columns (5) and (6) restrict attention to the sample for which GPS coordinates are available and replace the DHS region fixed effects with admin-1 (5) and admin-2 (6) region fixed effects. t-statistics based on standard errors clustered at the survey level in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.
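The bargaining power measure described in the table notes is a simple count over six decision domains. A minimal sketch of that construction follows; the dictionary keys and response codes ("sole", "joint", "other") are hypothetical labels and not the DHS variable names.

```python
# Hypothetical labels for the six decision domains (a)-(f) in the table notes.
DECISIONS = (
    "own_health",
    "large_purchases",
    "daily_purchases",
    "family_visits",
    "daily_cooking",
    "spouse_earnings",
)
PARTICIPATES = {"sole", "joint"}  # sole or joint decision maker counts as 1


def bargaining_power(responses):
    """Sum of six indicators, one per decision the individual takes part in.

    responses: dict mapping a decision label to "sole", "joint" or any other
    code; missing or other answers contribute 0. The result ranges from 0 to 6.
    """
    return sum(1 for d in DECISIONS if responses.get(d) in PARTICIPATES)


example = {
    "own_health": "sole",
    "large_purchases": "joint",
    "daily_purchases": "joint",
    "family_visits": "other",
    "daily_cooking": "sole",
    # "spouse_earnings" not answered
}
score = bargaining_power(example)
print(score)  # 4
```

Treating missing answers as zero is a simplification made here; the appendix does not say how item non-response is handled.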

Figure 20: Binned scatter plots

[Figure: three binned scatter plots of bargaining power (sole and joint decider) against years of schooling. (a) Bargaining power, unconditional; (b) Bargaining power, conditional on controls and region FE; (c) Bargaining power, conditional on controls and admin-2 FE. Panels (b)-(c) plot the bargaining power residual against the years-of-schooling residual.]

A.1.4 Attitudes towards domestic violence

Table 12: Attitudes towards domestic violence on years of schooling

Dependent variable in all columns: I(beating justified). t-statistics in parentheses.

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| years of schooling | -0.0248*** (-11.01) | -0.0196*** (-10.34) | -0.0178*** (-14.02) | -0.0170*** (-14.59) | -0.0172*** (-12.84) | -0.0168*** (-12.22) |
| individual controls | no | yes | yes | yes | yes | yes |
| fixed effects | no | no | survey | survey, region | survey, admin-1 | survey, admin-2 |
| R-squared | 0.057 | 0.093 | 0.193 | 0.228 | 0.241 | 0.257 |
| marginal R-squared | .057 | .028 | .019 | .016 | .016 | .014 |
| within R-squared | | .09 | .045 | .029 | .03 | .025 |
| N | 766631 | 765884 | 765884 | 765884 | 666739 | 666739 |

This table shows regression results for attitudes towards domestic violence on years of schooling for individuals aged 18+. The dependent variable in all columns is an indicator equal to one if the respondent answers “yes” to any of the questions of whether beating the wife is justified if she (a) goes out without telling the husband, (b) neglects the children, (c) argues with the husband, (d) refuses to have sex with the husband, (e) burns the food. Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects; column (4) adds region (defined by DHS) fixed effects. Columns (5) and (6) restrict attention to the sample for which GPS coordinates are available and replace the DHS region fixed effects with admin-1 (5) and admin-2 (6) region fixed effects. t-statistics based on standard errors clustered at the survey level in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.
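The dependent variable here is an "any yes" indicator over five scenarios. The sketch below encodes that rule; the scenario labels and the "yes"/"no" coding are hypothetical and do not correspond to DHS variable names.

```python
# Hypothetical labels for scenarios (a)-(e) in the table notes.
SCENARIOS = (
    "goes_out_without_telling",
    "neglects_children",
    "argues_with_husband",
    "refuses_sex",
    "burns_food",
)


def beating_justified(answers):
    """Return 1 if the respondent answers "yes" to at least one scenario,
    and 0 otherwise (missing answers are treated as not "yes")."""
    return int(any(answers.get(s) == "yes" for s in SCENARIOS))


never = beating_justified({s: "no" for s in SCENARIOS})
once = beating_justified({"neglects_children": "yes", "burns_food": "no"})

print(never)  # 0
print(once)   # 1
```

Because a single "yes" switches the indicator on, the measure captures whether beating is ever seen as justified rather than how many scenarios are endorsed.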

Figure 21: Binned scatter plots

(a) Attitudes towards domestic violence, unconditional. Axes: years of schooling (x), beating ever justified (y).
(b) Attitudes towards domestic violence, conditional on controls and region FE. Axes: years of schooling residual (x), beating ever justified residual (y).
(c) Attitudes towards domestic violence, conditional on controls and admin-2 FE. Axes: years of schooling residual (x), beating ever justified residual (y).
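A binned scatter plot of this kind can be sketched as follows. The sketch assumes equal-count bins formed after sorting on the x variable; the paper does not state its binning rule, so this choice, the function name `binned_means`, and the toy data are assumptions. For the conditional panels the same binning would be applied to residuals rather than raw values.

```python
def binned_means(x, y, n_bins):
    """Sort observations by x, split them into n_bins near-equal-sized bins,
    and return the (mean x, mean y) pair of every non-empty bin."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_x = sum(p[0] for p in chunk) / len(chunk)
            mean_y = sum(p[1] for p in chunk) / len(chunk)
            points.append((mean_x, mean_y))
    return points


# Toy example (illustrative only): eight observations, four bins.
schooling = [0, 1, 2, 3, 4, 5, 6, 7]
outcome = [0.6, 0.6, 0.5, 0.5, 0.4, 0.4, 0.3, 0.3]
for point in binned_means(schooling, outcome, 4):
    print(point)
```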

A.1.5 Fertility

Table 13: Fertility on years of schooling

Dependent variable in all columns: # children

                     (1)         (2)         (3)         (4)             (5)              (6)
years of schooling   -0.202***   -0.0893***  -0.0970***  -0.0894***      -0.0880***       -0.0852***
                     (-41.97)    (-25.56)    (-31.56)    (-30.05)        (-26.84)         (-26.14)
individual controls  no          yes         yes         yes             yes              yes
fixed effects        no          no          survey      survey, region  survey, admin-1  survey, admin-2
R-squared            0.096       0.578       0.597       0.603           0.603            0.606
marginal R-squared   .096        .015        .015        .011            .012             .01
within R-squared                 .386        .264        .237            .24              .231
N                    1923074     1856989     1856989     1856989         1491708          1491708

This table shows regression results for total number of children ever born on years of schooling for individuals aged 18+. The dependent variable in all columns is the total number of children ever born. Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by DHS) fixed effects. Columns (5) and (6) restrict attention to the sample for which GPS coordinates are available and replace the DHS region fixed effects with admin-1 (5) and admin-2 (6) region fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 22: Binned scatter plots

(a) Fertility, unconditional. Axes: years of schooling (x), total children ever born (y).
(b) Fertility, conditional on controls and region FE. Axes: years of schooling residual (x), total children ever born residual (y).
(c) Fertility, conditional on controls and admin-2 FE. Axes: years of schooling residual (x), total children ever born residual (y).

A.1.6 Desired number of children

Table 14: Desired number of children on years of schooling

Dependent variable in all columns: desired # children

                     (1)         (2)         (3)         (4)             (5)              (6)
years of schooling   -0.209***   -0.183***   -0.142***   -0.109***       -0.101***        -0.0926***
                     (-17.46)    (-14.49)    (-11.38)    (-16.03)        (-14.07)         (-14.27)
individual controls  no          yes         yes         yes             yes              yes
fixed effects        no          no          survey      survey, region  survey, admin-1  survey, admin-2
R-squared            0.083       0.162       0.291       0.341           0.328            0.342
marginal R-squared   .083        .051        .025        .014            .012             .01
within R-squared                 .138        .097        .064            .062             .054
N                    1549614     1495878     1495878     1495878         1192596          1192594

This table shows regression results for desired number of children on years of schooling for individuals aged 18+. The dependent variable in all columns is the individual's ideal desired number of children. Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by DHS) fixed effects. Columns (5) and (6) restrict attention to the sample for which GPS coordinates are available and replace the DHS region fixed effects with admin-1 (5) and admin-2 (6) region fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 23: Binned scatter plots

(a) Desired number of children, unconditional. Axes: years of schooling (x), ideal number of children (y).
(b) Desired number of children, conditional on controls and region FE. Axes: years of schooling residual (x), ideal number of children residual (y).
(c) Desired number of children, conditional on controls and admin-2 FE. Axes: years of schooling residual (x), ideal number of children residual (y).

A.1.7 Age at first marriage

Table 15: Age of first union on years of schooling

Dependent variable in all columns: age first union

                     (1)         (2)         (3)         (4)             (5)              (6)
years of schooling   0.337***    0.242***    0.259***    0.243***        0.243***         0.240***
                     (32.35)     (24.96)     (30.91)     (34.71)         (30.13)          (30.25)
individual controls  no          yes         yes         yes             yes              yes
fixed effects        no          no          survey      survey, region  survey, admin-1  survey, admin-2
R-squared            0.094       0.328       0.357       0.369           0.371            0.375
marginal R-squared   .094        .04         .036        .029            .03              .028
within R-squared                 .306        .262        .25             .251             .248
N                    1449207     1389458     1389458     1389458         1106824          1106824

This table shows regression results for age at first union on years of schooling for individuals aged 18+. The dependent variable in all columns is the individual's age at first union / marriage. Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by DHS) fixed effects. Columns (5) and (6) restrict attention to the sample for which GPS coordinates are available and replace the DHS region fixed effects with admin-1 (5) and admin-2 (6) region fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 24: Binned scatter plots

(a) Age at first marriage, unconditional. Axes: years of schooling (x), age at first union (y).
(b) Age at first marriage, conditional on controls and region FE. Axes: years of schooling residual (x), age at first union residual (y).
(c) Age at first marriage, conditional on controls and admin-2 FE. Axes: years of schooling residual (x), age at first union residual (y).

A.1.8 Age at first sexual intercourse

Table 16: Age of first sexual intercourse on years of schooling

Dependent variable in all columns: age first sex

                     (1)         (2)         (3)         (4)             (5)              (6)
years of schooling   0.134***    0.113***    0.141***    0.142***        0.144***         0.143***
                     (9.95)      (8.11)      (18.13)     (17.71)         (15.05)          (15.18)
individual controls  no          yes         yes         yes             yes              yes
fixed effects        no          no          survey      survey, region  survey, admin-1  survey, admin-2
R-squared            0.029       0.115       0.189       0.211           0.209            0.216
marginal R-squared   .029        .016        .02         .019            .019             .018
within R-squared                 .101        .08         .075            .077             .074
N                    1513798     1483235     1483235     1483235         1171074          1171074

This table shows regression results for age at first sexual intercourse on years of schooling for individuals aged 18+. The dependent variable in all columns is the individual's age at first sexual intercourse. Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by DHS) fixed effects. Columns (5) and (6) restrict attention to the sample for which GPS coordinates are available and replace the DHS region fixed effects with admin-1 (5) and admin-2 (6) region fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 25: Binned scatter plots

(a) Age at first sexual intercourse, unconditional. Axes: years of schooling (x), age at first sexual intercourse (y).
(b) Age at first sexual intercourse, conditional on controls and region FE. Axes: years of schooling residual (x), age at first sexual intercourse residual (y).
(c) Age at first sexual intercourse, conditional on controls and admin-2 FE. Axes: years of schooling residual (x), age at first sexual intercourse residual (y).

A.2 Afrobarometer

A.3 Living conditions

Table 17: Present living conditions (higher → better) on years of schooling

Dependent variable in all columns: living conds.

                     (1)         (2)         (3)         (4)
years of schooling   0.0385***   0.0320***   0.0325***   0.0334***
                     (9.35)      (7.25)      (15.43)     (15.91)
individual controls  no          yes         yes         yes
fixed effects        no          no          survey      survey, region
R-squared            0.025       0.034       0.117       0.151
marginal R-squared   .025        .014        .012        .012
within R-squared                 .024        .019        .019
N                    104004      102977      102977      102977

This table shows regression results for living conditions on years of schooling for individuals aged 18+. The dependent variable in all columns is the respondent's present living conditions (higher → better). Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by Afrobarometer) fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 26: Binned scatter plots

(a) Living conditions, unconditional. Axes: years of schooling (x), present living conditions (higher -> better) (y).
(b) Living conditions, conditional on controls and region FE. Axes: years of schooling residual (x), present living conditions (higher -> better) residual (y).

A.4 Own living conditions vs. those of others

Table 18: Living conditions vs others (higher → better) on years of schooling

Dependent variable in all columns: rel living conds.

                     (1)         (2)         (3)         (4)
years of schooling   0.0397***   0.0337***   0.0368***   0.0366***
                     (13.87)     (11.56)     (18.63)     (18.67)
individual controls  no          yes         yes         yes
fixed effects        no          no          survey      survey, region
R-squared            0.037       0.045       0.102       0.128
marginal R-squared   .037        .022        .022        .02
within R-squared                 .04         .034        .029
N                    100826      99854       99854       99854

This table shows regression results for relative living conditions on years of schooling for individuals aged 18+. The dependent variable in all columns is the respondent's living conditions relative to how she perceives those of others (higher → better). Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by Afrobarometer) fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 27: Binned scatter plots

(a) Living conditions vs. others, unconditional. Axes: years of schooling (x), living conditions vs others (higher -> better) (y).
(b) Living conditions vs. others, conditional on controls and region FE. Axes: years of schooling residual (x), living conditions vs others (higher -> better) residual (y).

A.5 Own living conditions in 12 months

Table 19: Living conditions in 12 months (higher → better) on years of schooling

Dependent variable in all columns: living conds in 1 yr

                     (1)         (2)         (3)         (4)
years of schooling   0.0123**    0.00194     0.0126***   0.0138***
                     (2.29)      (0.35)      (5.89)      (7.38)
individual controls  no          yes         yes         yes
fixed effects        no          no          survey      survey, region
R-squared            0.003       0.015       0.205       0.236
marginal R-squared   .003        0           .002        .002
within R-squared                 .005        .004        .004
N                    92145       91398       91398       91398

This table shows regression results for living conditions in 12 months on years of schooling for individuals aged 18+. The dependent variable in all columns is the respondent's expected living conditions in 12 months (higher → better). Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by Afrobarometer) fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 28: Binned scatter plots

(a) Living conditions in 12 months, unconditional. Axes: years of schooling (x), living conditions in 12 months (higher -> better) (y).
(b) Living conditions in 12 months, conditional on controls and region FE. Axes: years of schooling residual (x), living conditions in 12 months (higher -> better) residual (y).

A.6 How often go without food

Table 20: How often go without food (higher → more often) on years of schooling

Dependent variable in all columns: freq. no food

                     (1)         (2)         (3)         (4)
years of schooling   -0.0561***  -0.0462***  -0.0476***  -0.0474***
                     (-12.96)    (-10.27)    (-15.68)    (-16.94)
individual controls  no          yes         yes         yes
fixed effects        no          no          survey      survey, region
R-squared            0.049       0.061       0.149       0.185
marginal R-squared   .049        .027        .024        .023
within R-squared                 .057        .045        .037
N                    104233      103187      103187      103187

This table shows regression results for frequency of going without food on years of schooling for individuals aged 18+. The dependent variable in all columns is how often the respondent goes without food (higher → more often). Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by Afrobarometer) fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 29: Binned scatter plots

(a) How often go without food, unconditional. Axes: years of schooling (x), how often go without food (higher -> more often) (y).
(b) How often go without food, conditional on controls and region FE. Axes: years of schooling residual (x), how often go without food (higher -> more often) residual (y).

A.7 How often go without water

Table 21: How often go without food (higher → more often) on years of schooling

Dependent variable in all columns: freq. no food

                     (1)         (2)         (3)         (4)
years of schooling   -0.0561***  -0.0462***  -0.0476***  -0.0474***
                     (-12.96)    (-10.27)    (-15.68)    (-16.94)
individual controls  no          yes         yes         yes
fixed effects        no          no          survey      survey, region
R-squared            0.049       0.061       0.149       0.185
marginal R-squared   .049        .027        .024        .023
within R-squared                 .057        .045        .037
N                    104233      103187      103187      103187

This table shows regression results for frequency of going without food on years of schooling for individuals aged 18+. The dependent variable in all columns is how often the respondent goes without food (higher → more often). Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by Afrobarometer) fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 30: Binned scatter plots

(a) How often go without water, unconditional. Axes: years of schooling (x), how often go without water (higher -> more often) (y).
(b) How often go without water, conditional on controls and region FE. Axes: years of schooling residual (x), how often go without water (higher -> more often) residual (y).

A.8 Interest in public affairs

Table 22: Interest in public affairs (higher → more) on years of schooling

Dependent variable in all columns: int. public aff.

                     (1)         (2)         (3)         (4)
years of schooling   0.0210***   0.0247***   0.0329***   0.0340***
                     (7.26)      (9.30)      (15.54)     (17.00)
individual controls  no          yes         yes         yes
fixed effects        no          no          survey      survey, region
R-squared            0.009       0.038       0.086       0.109
marginal R-squared   .009        .01         .015        .015
within R-squared                 .033        .038        .04
N                    103355      102364      102364      102364

This table shows regression results for interest in public affairs on years of schooling for individuals aged 18+. The dependent variable in all columns is the respondent's interest in public affairs (higher → more). Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by Afrobarometer) fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 31: Binned scatter plots

(a) Interest in public affairs, unconditional. Axes: years of schooling (x), interest in public affairs (higher -> more) (y).
(b) Interest in public affairs, conditional on controls and region FE. Axes: years of schooling residual (x), interest in public affairs (higher -> more) residual (y).

A.9 Frequency of discussing politics

Table 23: Discuss politics (higher → more frequently) on years of schooling

Dependent variable in all columns: discuss pol

                     (1)         (2)         (3)         (4)
years of schooling   0.0261***   0.0247***   0.0290***   0.0294***
                     (13.94)     (13.69)     (19.66)     (21.52)
individual controls  no          yes         yes         yes
fixed effects        no          no          survey      survey, region
R-squared            0.031       0.060       0.101       0.119
marginal R-squared   .031        .023        .026        .025
within R-squared                 .057        .063        .063
N                    103467      102461      102461      102461

This table shows regression results for frequency of discussing politics on years of schooling for individuals aged 18+. The dependent variable in all columns is the frequency with which the respondent discusses politics (higher → more frequently). Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by Afrobarometer) fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 32: Binned scatter plots

(a) Frequency of discussing politics, unconditional. Axes: years of schooling (x), discuss politics (higher -> more frequently) (y).
(b) Frequency of discussing politics, conditional on controls and region FE. Axes: years of schooling residual (x), discuss politics (higher -> more frequently) residual (y).

A.10 Politics too complicated?

Table 24: Politics too complicated (higher → disagree more with statement) on years of schooling

Dependent variable in all columns: pol too compl

                     (1)         (2)         (3)         (4)
years of schooling   0.0256***   0.0259***   0.0232***   0.0242***
                     (6.80)      (6.64)      (7.68)      (9.06)
individual controls  no          yes         yes         yes
fixed effects        no          no          survey      survey, region
R-squared            0.010       0.013       0.038       0.069
marginal R-squared   .01         .008        .006        .006
within R-squared                 .013        .01         .01
N                    72403       71808       71808       71808

This table shows regression results for whether the respondent finds politics too complicated on years of schooling for individuals aged 18+. The dependent variable in all columns is whether the respondent disagrees with the statement that politics is too complicated (higher → disagree more with statement). Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by Afrobarometer) fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 33: Binned scatter plots

(a) Politics too complicated?, unconditional. Axes: years of schooling (x), politics too complicated (higher -> disagree more with statement) (y).
(b) Politics too complicated?, conditional on controls and region FE. Axes: years of schooling residual (x), politics too complicated (higher -> disagree more with statement) residual (y).

A.11 Support for democracy

Table 25: Support for democracy on years of schooling

Dependent variable in all columns: support democ

                     (1)         (2)         (3)         (4)
years of schooling   0.0121***   0.0109***   0.0133***   0.0137***
                     (8.13)      (7.53)      (10.88)     (11.08)
individual controls  no          yes         yes         yes
fixed effects        no          no          survey      survey, region
R-squared            0.016       0.026       0.089       0.109
marginal R-squared   .016        .011        .013        .013
within R-squared                 .025        .026        .026
N                    104435      103383      103383      103383

This table shows regression results for support for democracy on years of schooling for individuals aged 18+. The dependent variable in all columns is the respondent's support for democracy. Column (1) shows the simple bivariate relationship without controls or fixed effects. Column (2) shows the relationship conditional on individual controls without fixed effects. Column (3) adds survey fixed effects, and column (4) adds region (defined by Afrobarometer) fixed effects. t-statistics based on standard errors clustered at the survey level are in parentheses. * p < 0.05, ** p < 0.01, *** p < 0.001.

Figure 34: Binned scatter plots

(a) Support for democracy, unconditional. Axes: years of schooling (x), support for democracy (y).
(b) Support for democracy, conditional on controls and region FE. Axes: years of schooling residual (x), support for democracy residual (y).

B Sample coverage and construction

Table 26: Sample Coverage

(1)           (2)   (3)       (4)        (5)       (6)        (7)      (8)        (9)      (10)         (11) (12) (13)
country       year  fraction  raw IPUMS  age data  schooling  18+      schooling  observe  urban/rural  nE   nR   nD
                                                   data                completed  “old”
Botswana      1981  10        97238      96187     72951      42805    42390      8870     no                21   21
Botswana      1991  10        132623     132623    113172     65752    64369      13129    yes               21   21
Botswana      2001  10        168676     168134    159257     93314    87804      20481    no           9    21   21
Botswana      2011  10        201752     201235    190212     121737   112023     24268    no           9    21   21
Burkina Faso  1985  10        884797     883447    484384     347817   346214     0        no                10   30
Burkina Faso  1996  10        1081046    1075824   803264     457542   455395     109614   no                13   45
Burkina Faso  2006  10        1417824    1410123   1244291    647393   645608     80078    yes          14   13   45
Cameroon      1976  10        736514     736320    605749     356676   352984     47085    no                7    39
Cameroon      1987  10        897211     896649    763652     409859   401456     53186    yes               7    39
Cameroon      2005  10        1772359    1772359   1542200    861833   801773     159924   yes               7    39
Egypt         1986  14.1      6799093    6794386   5418332    3701656  3541903    983919   yes               24   236
Egypt         1996  10        5902243    5901839   4453382    3234344  3016642    738357   yes               24   236
Egypt         2006  10        7282434    7282434   5739722    4483663  3986862    885909   yes               24   236
Ethiopia      1984  10        3404306    3398027   2733575    1573342  1564979    203011   yes          14   15   95
Ethiopia      1994  10        5044598    5044597   4201616    2377601  2353706    428425   yes          14   12   63
Ethiopia      2007  10        7434086    7434086   1097614    627531   622028     112596   yes          14   12   63
Ghana         1984  10        1309352    1309351   1050813    637087   608906     161444   no                10   141
Ghana         2000  10        1894133    1894133   1730902    994743   961594     179060   yes          9    10   102
Ghana         2010  10        2466289    2466289   2262894    1365260  1306782    297830   yes          9    10   102
Guinea        1983  10        457837     457778    364805     241646   239472     23975    yes               6    34
Guinea        1996  10        729071     727246    551619     344354   341318     76594    yes               6    34
Kenya         1969  6         659310     659310    659310     344799   339788     34351    no                8    190
Kenya         1979  6.7       1033769    1031996   853843     488299   478828     0        yes               8    190
Kenya         1989  5         1074098    1072777   828512     477214   467069     80926    yes               8    190
Kenya         1999  5         1407547    1407547   1191268    690230   674090     111842   yes               8    190
Kenya         2009  10        3841935    3841935   3402695    1901329  1835658    343781   yes               8    190
Liberia       1974  10        150256     150256    127442     79596    79079      0        yes          17   9    65
Liberia       2008  10        348057     348057    294517     180235   178344     37772    yes          17   9    65
Malawi        1987  10        798669     798193    657998     382471   381528     36372    yes               26   181
Malawi        1998  10        991393     991393    826197     495522   489337     52579    yes               24   223
Malawi        2008  10        1341977    1341046   1161773    626879   612845     68636    yes          11   24   223
Mali          1987  10        785384     773407    582678     362154   360795     69204    no           13   8    258
Mali          1998  10        991330     986822    734156     438797   437349     95513    yes          13   8    258
Mali          2009  10        1451856    1424140   1262277    650993   639677     159516   yes          13   8    258
Morocco       1982  5         1012873    1012873   948008     482752   475647     161296   no                16   54
Morocco       1994  5         1294026    1293171   1293171    722560   706119     279696   no           6    16   54
Morocco       2004  5         1482720    1481076   1481076    924603   897904     365761   no           6    16   54
Mozambique    1997  10        1551517    1550505   1248483    747449   747096     106692   yes               11   143
Mozambique    2007  10        2047048    2047048   1616853    951280   949396     146621   yes          18   11   143
Nigeria       2006  0.06      83700      83700     82740      42578    40099      5820     yes               38   38
Nigeria       2007  0.06      85183      85182     84122      42295    39902      6241     yes               38   38
Nigeria       2008  0.07      107425     107425    105944     53407    50249      9001     yes               38   38
Nigeria       2009  0.05      77896      77880     77650      39865    37415      5701     yes               38   38
Nigeria       2010  0.05      72191      71991     58973      36170    34011      7060     yes               38   38
Rwanda        1991  10        742918     742918    535602     313627   312509     61480    no                10   104
Rwanda        2002  10        843392     843392    645489     382621   380744     74278    yes               10   104
Senegal       1988  10        700199     699981    527462     320608   316778     63788    no           10   9    28
Senegal       2002  10        994562     994562    911891     497609   487074     158489   yes          10   9    28
Sierra Leone  2004  10        494298     492922    395788     249781   248343     58680    yes          13   14   101
South Africa  1996  10        3621164    3578019   3055995    2022123  1946032    442478   yes          12   9    284
South Africa  2001  10        3725655    3725655   3353684    2260958  2160504    537755   yes          12   4    235
South Africa  2007  2         1047657    1047657   842103     580360   549832     145120   yes               4    235
South Africa  2011  8.6       4418594    4418594   3845633    2765370  2613682    567800   yes          12   4    235
South Sudan   2008  7         542765     542765    542333     251469   250945     57057    yes               10   73
Sudan         2008  16.6      5066530    5066530   3902071    2364220  2344865    621231   yes               15   129
Tanzania      1988  10        2310424    2304474   1911308    1107406  1104142    133531   no                23   113
Tanzania      2002  10        3732735    3732735   3123724    1863092  1851658    280444   yes               23   113
Tanzania      2012  10        4498022    4498022   3918823    2217104  2156002    356696   yes               23   113
Uganda        1991  10        1548460    1547604   1242885    716537   709158     98577    yes          22   38   164
Uganda        2002  10        2497449    2497449   2042838    1124940  1099596    133880   yes          22   38   164
Zambia        1990  10        787461     787461    664239     375207   366035     74310    yes          11   8    75
Zambia        2000  10        996117     996117    825110     477796   466409     106056   yes          11   8    75
Zambia        2010  10        1321973    1321973   1028628    587569   568878     114547   no           11   8    75

The first two columns give country and census year. Column (3) shows the fraction of the census sampled by IPUMS. Column (4) is the number of observations in the original IPUMS data without restrictions. Columns (5)-(9) give the number of individuals with observations for successively tighter sample restrictions: (5) requires that age be observable, (6) in addition requires data on education, (7) in addition requires that the individual be at least 18 years old, (8) requires that the individual have completed her schooling according to our definition, (9) requires that the individual be co-habiting with at least one individual of an older generation. Column (10) indicates whether the census has data on urban vs. rural residence. Columns (11)-(13) show the number of sub-groups available for each census: (11) gives the number of ethnicities, (12) the number of (admin-1) regions, and (13) the number of (admin-2) districts.
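The successively tighter restrictions behind columns (4)-(9) can be sketched as a simple filtering funnel. The record fields (`age`, `years_schooling`, `completed`, `with_older_generation`) and the function name `restriction_funnel` are hypothetical; in particular, `completed` is a placeholder flag standing in for the paper's own definition of completed schooling, which is not reproduced here.

```python
def restriction_funnel(records):
    """Apply the sample restrictions in order and count survivors after each step."""
    steps = [
        ("raw", lambda r: True),
        ("age observed", lambda r: r.get("age") is not None),
        ("schooling observed", lambda r: r.get("years_schooling") is not None),
        # Safe to index "age" here: the earlier step removed missing ages.
        ("aged 18+", lambda r: r["age"] >= 18),
        # Assumption: a precomputed boolean flag for completed schooling.
        ("schooling completed", lambda r: bool(r.get("completed"))),
        ("older generation observed", lambda r: bool(r.get("with_older_generation"))),
    ]
    remaining = list(records)
    counts = []
    for name, keep in steps:
        remaining = [r for r in remaining if keep(r)]
        counts.append((name, len(remaining)))
    return counts


# Toy records (illustrative only).
sample = [
    {"age": 30, "years_schooling": 8, "completed": True, "with_older_generation": True},
    {"age": 30, "years_schooling": 8, "completed": True, "with_older_generation": False},
    {"age": 20, "years_schooling": 10, "completed": False, "with_older_generation": True},
    {"age": 12, "years_schooling": 5, "completed": False, "with_older_generation": True},
    {"age": 40, "years_schooling": None, "completed": False, "with_older_generation": False},
    {"age": None, "years_schooling": 3, "completed": True, "with_older_generation": True},
]
for step, count in restriction_funnel(sample):
    print(step, count)
```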

C Barro-Lee crosscheck

We correlate our estimates of mean years of schooling for individuals aged 25-99 with the data from Barro and Lee (2011), who also report figures for years of schooling for this age range [1]. Barro and Lee provide two separate estimates of years of schooling, one based on an age range of 15-99, the other on 25-99. Our analysis focuses on ages 18-99; for this comparison only, we compute measures for the 25-99 age range. Since we have several countries with more than one census, we can also explore the panel correlation with Barro and Lee [2].

Figure 35: Barro-Lee crosscheck

(a) Years of schooling in our sample compared to Barro and Lee (2011), levels, full sample of countries. Axes: IPUMS mean years of schooling, individuals aged 25-99 (x), Barro-Lee years of schooling (y). Regression fit: yrschool_BarroLee_i = -0.2011 + 1.0750*yrschool_IPUMS_i + e_i, R-squared = 0.93.
(b) Years of schooling in our sample compared to Barro and Lee (2011), controlling for country fixed effects, full sample of countries. Axes: IPUMS mean years of schooling, individuals aged 25-99, net of country FEs (x), Barro-Lee years of schooling, net of country FEs (y). Regression fit: yrschool_BarroLee_it = a_i + 0.9727*yrschool_IPUMS_it + e_it, R-squared = 0.90.

[1] As Barro and Lee only report their estimates of years of schooling at 5-year intervals, we correlate our estimates with the closest years they report.
[2] There are five countries for which we only have one census: Sierra-Leone, Egypt, Rwanda, South Sudan, and Sudan.
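The matching and the levels regression described in this section can be sketched as follows. The sketch assumes that the Barro-Lee series is reported for years ending in 0 or 5, which is an assumption about the grid rather than something stated above; the function names and the numbers in the usage lines are illustrative and are not data from either source.

```python
def closest_reported_year(census_year, reported_years):
    """Return the reported year nearest to the census year."""
    return min(reported_years, key=lambda yr: abs(yr - census_year))


def ols_fit(x, y):
    """Bivariate OLS: return (intercept, slope, R-squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Assumed 5-year reporting grid.
grid = range(1950, 2015, 5)
print(closest_reported_year(1987, grid), closest_reported_year(1988, grid))

# Made-up schooling means (illustrative only).
ours = [1.0, 2.0, 3.0, 4.0]
theirs = [1.4, 2.7, 3.4, 4.5]
print(ols_fit(ours, theirs))
```

With integer census years and a 5-year grid, ties in distance cannot occur, so the nearest year is always unique. The panel version with country fixed effects would demean both series within country before running the same fit.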

D Further evidence on education inequality across countries

D.1 Inequality by cohorts

D.1.1 Pan-Africa

Figure 36: Pan-African inequality across individuals, within and between countries, all census years

Axes: birth cohort (1950-1959, 1960-1969, 1970-1979, 1980-1989) on the x-axis; GE(2) index values on one y-axis and within and between share on the other. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.

D.1.2 Country-by-country

Table 27: Inequality across ethnicities, districts, and provinces at the country-birth-decade level

                      Gini                                        GE(2)    GE(2) district    GE(2) province    GE(2) ethnicity
country       birth   individual district   province   ethnicity  overall  within   between  within   between  within   between
              decade
Botswana      1950    0.538      0.168      0.168      0.398      0.480    0.427    0.053    0.427    0.053    0.406    0.067
Botswana      1960    0.398      0.110      0.110      0.292      0.250    0.230    0.020    0.230    0.020    0.206    0.026
Botswana      1970    0.249      0.058      0.058      0.180      0.105    0.099    0.006    0.099    0.006    0.089    0.008
Botswana      1980    0.182      0.046      0.046      0.115      0.062    0.058    0.004    0.058    0.004    0.059    0.003
Burkina Faso  1950    0.929      0.401      0.495      0.688      5.965    5.186    0.779    5.213    0.752    4.247    1.522
Burkina Faso  1960    0.909      0.349      0.429      0.636      4.460    3.853    0.607    3.876    0.583    3.133    1.043
Burkina Faso  1970    0.849      0.305      0.346      0.544      2.501    2.125    0.376    2.149    0.352    1.841    0.518
Burkina Faso  1980    0.776      0.277      0.292      0.452      1.535    1.262    0.273    1.285    0.250    1.322    0.215
Cameroon      1950    0.525      0.242      0.217                 0.466    0.310    0.157    0.331    0.135
Cameroon      1960    0.459      0.204      0.186                 0.341    0.219    0.122    0.233    0.108
Cameroon      1970    0.436      0.180      0.171                 0.304    0.205    0.099    0.216    0.088
Cameroon      1980    0.444      0.172      0.161                 0.322    0.215    0.106    0.224    0.097
Egypt         1950    0.715      0.328      0.194                 1.123    0.974    0.149    1.065    0.058
Egypt         1960    0.621      0.243      0.167                 0.726    0.647    0.079    0.691    0.034
Egypt         1970    0.508      0.188      0.144                 0.447    0.398    0.048    0.427    0.020
Egypt         1980    0.425      0.143      0.115                 0.315    0.284    0.031    0.303    0.012
Ethiopia      1950    0.878      0.406      0.460      0.226      3.317    2.721    0.596    2.753    0.564    3.008    0.049
Ethiopia      1960    0.807      0.383      0.444      0.234      1.837    1.481    0.357    1.515    0.323    1.680    0.047
Ethiopia      1970    0.761      0.414      0.410      0.239      1.375    1.093    0.282    1.128    0.248    1.328    0.048
Ethiopia      1980    0.696      0.338      0.298      0.227      0.969    0.847    0.122    0.870    0.099    0.920    0.049
Ghana         1950    0.540      0.256      0.259      0.284      0.496    0.404    0.092    0.425    0.072    0.449    0.057
Ghana         1960    0.511      0.259      0.262      0.263      0.439    0.351    0.088    0.371    0.068    0.387    0.058
Ghana         1970    0.483      0.229      0.219      0.217      0.379    0.309    0.071    0.328    0.051    0.328    0.048
Ghana         1980    0.399      0.183      0.162      0.167      0.259    0.211    0.048    0.227    0.032    0.221    0.036
Guinea        1950    0.873      0.338      0.327                 3.033    2.597    0.436    2.720    0.313
Guinea        1960    0.858      0.385      0.355                 2.657    2.225    0.432    2.332    0.325
Guinea        1970    0.839      0.353      0.289                 2.288    2.009    0.279    2.083    0.205
Guinea        1980
Kenya         1950    0.496      0.295      0.229                 0.388    0.318    0.070    0.349    0.040
Kenya         1960    0.361      0.246      0.191                 0.202    0.152    0.051    0.174    0.028
Kenya         1970    0.293      0.210      0.165                 0.138    0.097    0.040    0.115    0.022
Kenya         1980    0.283      0.197      0.150                 0.130    0.090    0.040    0.110    0.020
Liberia       1950    0.739      0.249      0.195      0.181      1.214    1.067    0.147    1.121    0.093    1.170    0.044
Liberia       1960    0.629      0.214      0.165      0.163      0.718    0.620    0.099    0.649    0.070    0.698    0.020
Liberia       1970    0.608      0.222      0.180      0.137      0.645    0.539    0.107    0.569    0.077    0.627    0.018
Liberia       1980    0.531      0.209      0.155      0.097      0.455    0.385    0.070    0.404    0.051    0.446    0.009
Malawi        1950    0.604      0.281      0.192      0.141      0.637    0.525    0.112    0.574    0.062    0.547    0.028
Malawi        1960    0.535      0.245      0.177      0.129      0.468    0.385    0.083    0.420    0.048    0.395    0.022
Malawi        1970    0.460      0.201      0.141      0.105      0.330    0.269    0.061    0.299    0.031    0.300    0.015
Malawi        1980    0.372      0.154      0.106      0.082      0.213    0.178    0.035    0.195    0.018    0.200    0.009
Mali          1950    0.889      0.376      0.368      0.279      3.560    3.010    0.551    3.133    0.427    3.494    0.068
Mali          1960    0.856      0.364      0.342      0.284      2.665    2.216    0.448    2.322    0.343    2.589    0.077
Mali          1970    0.852      0.383      0.341      0.259      2.579    2.123    0.455    2.236    0.343    2.507    0.071
Mali          1980    0.799      0.373      0.310      0.253      1.734    1.389    0.345    1.493    0.241    1.661    0.072
Morocco       1950    0.740      0.289      0.198      0.184      1.240    1.088    0.152    1.145    0.094    1.210    0.033
Morocco       1960    0.693      0.244      0.169      0.162      0.962    0.852    0.110    0.893    0.069    0.886    0.021
Morocco       1970    0.589      0.189      0.129      0.143      0.593    0.530    0.063    0.554    0.039    0.577    0.016
Morocco       1980    0.535      0.163      0.126      0.146      0.463    0.416    0.047    0.434    0.029    0.454    0.009
Mozambique    1950    0.702      0.294      0.239      0.237      1.295    1.025    0.270    1.154    0.141    1.011    0.138
Mozambique    1960    0.616      0.275      0.207      0.207      0.852    0.662    0.190    0.757    0.095    0.683    0.095
Mozambique    1970    0.592      0.279      0.212      0.219      0.750    0.573    0.177    0.656    0.094    0.577    0.120
Mozambique    1980    0.503      0.230      0.175      0.183      0.472    0.361    0.111    0.413    0.059    0.372    0.088
Nigeria       1950    0.638      0.293      0.293                 0.757    0.629    0.128    0.629    0.128
Nigeria       1960    0.565      0.292      0.292                 0.546    0.419    0.127    0.419    0.127
Nigeria       1970    0.517      0.289      0.289                 0.445    0.312    0.133    0.312    0.133
Nigeria       1980    0.437      0.250      0.250                 0.315    0.204    0.111    0.204    0.111
Rwanda        1950    0.604      0.183      0.137                 0.666    0.612    0.054    0.630    0.036
Rwanda        1960    0.541      0.147      0.103                 0.491    0.452    0.039    0.464    0.027
Rwanda        1970    0.436      0.099      0.065                 0.305    0.283    0.022    0.289    0.015
Rwanda        1980    0.401      0.088      0.052                 0.259    0.245    0.014    0.251    0.008
Senegal       1950    0.807      0.414      0.357      0.237      1.845    1.485    0.360    1.570    0.275    1.782    0.063
Senegal       1960    0.783      0.408      0.349      0.218      1.587    1.294    0.294    1.358    0.229    1.525    0.062
Senegal       1970    0.735      0.359      0.322      0.203      1.215    0.996    0.220    1.036    0.179    1.145    0.070
Senegal       1980    0.730      0.352      0.311      0.195      1.183    0.998    0.185    1.035    0.148    1.103    0.079
Sierra Leone  1950    0.792      0.345      0.308      0.356      1.671    1.398    0.274    1.448    0.224    1.515    0.158
Sierra Leone  1960    0.754      0.299      0.281      0.295      1.330    1.096    0.233    1.137    0.193    1.251    0.080
Sierra Leone  1970    0.715      0.288      0.264      0.246      1.078    0.850    0.228    0.889    0.189    1.029    0.049
Sierra Leone  1980    0.627      0.256      0.201      0.180      0.711    0.564    0.147    0.605    0.106    0.686    0.024
South Africa  1950    0.367      0.154      0.094      0.153      0.209    0.176    0.033    0.193    0.016    0.170    0.039
South Africa  1960    0.276      0.102      0.048      0.079      0.126    0.112    0.014    0.121    0.005    0.114    0.012
South Africa  1970    0.196      0.064      0.035      0.045      0.073    0.067    0.006    0.071    0.002    0.070    0.004
South Africa  1980    0.147      0.041      0.026      0.027      0.045    0.041    0.003    0.043    0.001    0.044    0.001
South Sudan   1950    0.935      0.518      0.378                 6.470    5.879    0.591    6.233    0.237
South Sudan   1960    0.902      0.490      0.364                 4.106    3.569    0.537    3.876    0.231
South Sudan   1970    0.884      0.462      0.344                 3.437    2.963    0.474    3.207    0.230
South Sudan   1980    0.831      0.440      0.321                 2.236    1.808    0.428    2.004    0.232
Sudan         1950    0.869      0.556      0.458                 3.088    2.296    0.792    2.485    0.602
Sudan         1960    0.830      0.527      0.428                 2.213    1.530    0.683    1.699    0.514
Sudan         1970    0.812      0.512      0.420                 1.939    1.263    0.677    1.418    0.521
Sudan         1980    0.765      0.470      0.382                 1.469    0.949    0.520    1.070    0.398
Tanzania      1950    0.525      0.152      0.113                 0.470    0.437    0.033    0.454    0.016
Tanzania      1960    0.327      0.098      0.077                 0.206    0.194    0.013    0.201    0.006
Tanzania      1970    0.282      0.090      0.072                 0.163    0.152    0.011    0.157    0.005
Tanzania      1980    0.308      0.105      0.087                 0.179    0.163    0.016    0.171    0.008
Uganda        1950    0.553      0.193      0.174      0.212      0.512    0.430    0.083    0.447    0.065    0.458    0.053
Uganda        1960    0.488      0.172      0.156      0.181      0.381    0.316    0.067    0.329    0.052    0.342    0.039
Uganda        1970    0.436      0.147      0.132      0.147      0.301    0.250    0.052    0.260    0.041    0.274    0.028
Uganda        1980    0.370      0.120      0.120      0.114      0.213    0.179    0.034    0.183    0.030    0.187    0.026
Zambia        1950    0.475      0.153      0.138      0.048      0.355    0.311    0.044    0.322    0.033    0.351    0.004
Zambia        1960    0.424      0.144      0.124      0.037      0.282    0.245    0.038    0.254    0.028    0.280    0.003
Zambia        1970    0.392      0.143      0.117      0.037      0.241    0.203    0.037    0.213    0.028    0.238    0.003
Zambia        1980    0.345      0.141      0.110      0.034      0.186    0.151    0.035    0.161    0.025    0.185    0.002
mean                  0.590      0.258      0.222      0.211      1.126    0.936    0.190    0.985    0.141    0.942    0.099
median                0.565      0.246      0.194      0.190      0.546    0.452    0.110    0.464    0.072    0.577    0.045
stdev                 0.203      0.121      0.115      0.133      1.267    1.091    0.200    1.138    0.160    0.938    0.246
min                   0.147      0.041      0.026      0.027      0.045    0.041    0.003    0.043    0.001    0.044    0.001
max                   0.935      0.556      0.495      0.688      6.470    5.879    0.792    6.233    0.752    4.247    1.522

Table 28: Correlation table for inequality measures

Column key: (1) Gini individual; (2) GE(2) overall; (3) Gini district; (4) GE(2) district, within; (5) GE(2) district, between; (6) Gini province; (7) GE(2) province, within; (8) GE(2) province, between; (9) Gini ethnicity; (10) GE(2) ethnicity, within; (11) GE(2) ethnicity, between.

Panel A: 1950-1980 birth decades, unconditional

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) Gini individual | 1 | | | | | | | | | | |
| (2) GE(2) overall | 0.821 | 1 | | | | | | | | | |
| (3) Gini district | 0.889 | 0.765 | 1 | | | | | | | | |
| (4) GE(2) district, within | 0.799 | 0.997 | 0.726 | 1 | | | | | | | |
| (5) GE(2) district, between | 0.841 | 0.895 | 0.883 | 0.855 | 1 | | | | | | |
| (6) Gini province | 0.878 | 0.789 | 0.939 | 0.751 | 0.898 | 1 | | | | | |
| (7) GE(2) province, within | 0.803 | 0.997 | 0.739 | 0.999 | 0.860 | 0.752 | 1 | | | | |
| (8) GE(2) province, between | 0.788 | 0.824 | 0.799 | 0.781 | 0.959 | 0.894 | 0.776 | 1 | | | |
| (9) Gini ethnicity | 0.668 | 0.776 | 0.560 | 0.780 | 0.732 | 0.707 | 0.776 | 0.765 | 1 | | |
| (10) GE(2) ethnicity, within | 0.845 | 0.982 | 0.788 | 0.977 | 0.981 | 0.866 | 0.981 | 0.970 | 0.708 | 1 | |
| (11) GE(2) ethnicity, between | 0.432 | 0.777 | 0.347 | 0.792 | 0.665 | 0.517 | 0.782 | 0.726 | 0.800 | 0.648 | 1 |

Panel B: 1950-1980 birth decades, net of country and birth decade fixed effects

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) Gini individual | 1 | | | | | | | | | | |
| (2) GE(2) overall | -0.215 | 1 | | | | | | | | | |
| (3) Gini district | 0.615 | 0.074 | 1 | | | | | | | | |
| (4) GE(2) district, within | -0.214 | 0.997 | 0.065 | 1 | | | | | | | |
| (5) GE(2) district, between | -0.172 | 0.786 | 0.123 | 0.736 | 1 | | | | | | |
| (6) Gini province | 0.333 | 0.450 | 0.577 | 0.411 | 0.648 | 1 | | | | | |
| (7) GE(2) province, within | -0.216 | 0.996 | 0.070 | 1 | 0.730 | 0.401 | 1 | | | | |
| (8) GE(2) province, between | -0.131 | 0.680 | 0.077 | 0.624 | 0.962 | 0.683 | 0.610 | 1 | | | |
| (9) Gini ethnicity | 0.531 | 0.223 | 0.531 | 0.244 | 0.065 | 0.526 | 0.237 | 0.115 | 1 | | |
| (10) GE(2) ethnicity, within | -0.203 | 0.977 | 0.022 | 0.970 | 0.936 | 0.573 | 0.971 | 0.930 | 0.100 | 1 | |
| (11) GE(2) ethnicity, between | -0.052 | 0.836 | 0.315 | 0.851 | 0.663 | 0.624 | 0.849 | 0.678 | 0.521 | 0.711 | 1 |

All ethnicity measures are computed in the smaller, 14-country, sample. All correlations not involving ethnicity measures are based on the full, 23-country, sample (but excluding Guinea for panel B). Correlations are computed for the country-birth-decade-level data displayed in table 27.

D.1.2.1 GE(2) decompositions country-by-country

Figure 37: Burkina Faso: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 38: Botswana: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 39: Ethiopia: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 40: Ghana: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 41: Liberia: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 42: Morocco: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 43: Mali: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 44: Mozambique: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 45: Malawi: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 46: Senegal: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 47: Sierra Leone: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 48: Uganda: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 49: South Africa: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 50: Zambia: inequality (GE(2) index), individual and group components
[Figure omitted. Panels: (a) regions, (b) ethnicities. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 51: Cameroon: inequality (GE(2) index), individual and group components
[Figure omitted. Panel: (a) regions. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 52: Egypt: inequality (GE(2) index), individual and group components
[Figure omitted. Panel: (a) regions. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 53: Guinea: inequality (GE(2) index), individual and group components
[Figure omitted. Panel: (a) regions. Horizontal axis: birth cohort (1950-1959 to 1970-1979). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 54: Kenya: inequality (GE(2) index), individual and group components
[Figure omitted. Panel: (a) regions. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 55: Nigeria: inequality (GE(2) index), individual and group components
[Figure omitted. Panel: (a) regions. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 56: Rwanda: inequality (GE(2) index), individual and group components
[Figure omitted. Panel: (a) regions. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 57: Sudan: inequality (GE(2) index), individual and group components
[Figure omitted. Panel: (a) regions. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 58: South Sudan: inequality (GE(2) index), individual and group components
[Figure omitted. Panel: (a) regions. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

Figure 59: Tanzania: inequality (GE(2) index), individual and group components
[Figure omitted. Panel: (a) regions. Horizontal axis: birth cohort (1950-1959 to 1980-1989). Vertical axes: GE(2) index values; within and between share. Series: GE(2) overall, GE(2) within, GE(2) between, within share, between share.]

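D.1.3 Methodological note: further decomposition of the GE(θ) index
Recall the expression for the GE(θ) index

$$\mathrm{GE}(\theta)_{total} = \frac{1}{N\theta(\theta-1)} \sum_{i=1}^{N} \left[ \left( \frac{s_{i,g}}{\bar{S}} \right)^{\theta} - 1 \right]$$

as well as its decomposition into the between-group and the within-group components

$$\mathrm{GE}(\theta)_{total} = \frac{1}{\theta(\theta-1)} \sum_{g=1}^{G} \frac{N_g}{N} \left[ \left( \frac{\bar{S}_g}{\bar{S}} \right)^{\theta} - 1 \right] + \sum_{g=1}^{G} \left( \frac{S_g}{S} \right)^{\theta} \left( \frac{N_g}{N} \right)^{1-\theta} \frac{1}{N_g\theta(\theta-1)} \sum_{i=1}^{N_g} \left[ \left( \frac{s_{i,g}}{\bar{S}_g} \right)^{\theta} - 1 \right]$$

$$= \mathrm{GE}(\theta)_{between} + \mathrm{GE}(\theta)_{within}.$$

Here $\bar{S}$ and $\bar{S}_g$ denote mean schooling overall and in group $g$, while $S$ and $S_g$ denote the corresponding totals.

Now suppose a “nested” structure for groups, in the sense that there are a number of super-groups, which in turn contain smaller sub-groups, which are made up of the individuals in the data. The super-groups will be referred to as “countries” (c ∈ {1,...,C}) and the within-country sub-groups will still be called “groups” (g ∈ {1,...,Gc}). “Continental” inequality can be written as

$$\mathrm{GE}(\theta)_{continent} = \frac{1}{\theta(\theta-1)} \sum_{c=1}^{C} \frac{N_c}{N} \left[ \left( \frac{\bar{S}_c}{\bar{S}} \right)^{\theta} - 1 \right] + \sum_{c=1}^{C} \left( \frac{S_c}{S} \right)^{\theta} \left( \frac{N_c}{N} \right)^{1-\theta} \mathrm{GE}(\theta)_c .$$

Country-inequality GE(θ)c can in turn be decomposed as

$$\mathrm{GE}(\theta)_c = \frac{1}{\theta(\theta-1)} \sum_{g=1}^{G_c} \frac{N_{gc}}{N_c} \left[ \left( \frac{\bar{S}_{gc}}{\bar{S}_c} \right)^{\theta} - 1 \right] + \sum_{g=1}^{G_c} \left( \frac{S_{gc}}{S_c} \right)^{\theta} \left( \frac{N_{gc}}{N_c} \right)^{1-\theta} \frac{1}{N_{gc}\theta(\theta-1)} \sum_{i=1}^{N_{gc}} \left[ \left( \frac{s_{i,gc}}{\bar{S}_{gc}} \right)^{\theta} - 1 \right] .$$

We can therefore write continental inequality as

$$\begin{aligned}
\mathrm{GE}(\theta)_{continent} ={}& \underbrace{\frac{1}{\theta(\theta-1)} \sum_{c=1}^{C} \frac{N_c}{N} \left[ \left( \frac{\bar{S}_c}{\bar{S}} \right)^{\theta} - 1 \right]}_{\text{between-countries}} \\
&+ \underbrace{\sum_{c=1}^{C} \left( \frac{S_c}{S} \right)^{\theta} \left( \frac{N_c}{N} \right)^{1-\theta} \frac{1}{\theta(\theta-1)} \sum_{g=1}^{G_c} \frac{N_{gc}}{N_c} \left[ \left( \frac{\bar{S}_{gc}}{\bar{S}_c} \right)^{\theta} - 1 \right]}_{\text{within-countries-between-groups}} \\
&+ \underbrace{\sum_{c=1}^{C} \left( \frac{S_c}{S} \right)^{\theta} \left( \frac{N_c}{N} \right)^{1-\theta} \left( \sum_{g=1}^{G_c} \left( \frac{S_{gc}}{S_c} \right)^{\theta} \left( \frac{N_{gc}}{N_c} \right)^{1-\theta} \frac{1}{N_{gc}\theta(\theta-1)} \sum_{i=1}^{N_{gc}} \left[ \left( \frac{s_{i,gc}}{\bar{S}_{gc}} \right)^{\theta} - 1 \right] \right)}_{\text{within-countries-within-groups}} . \qquad (11)
\end{aligned}$$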
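The three-way split in equation (11) can be checked numerically. The following is a minimal sketch on synthetic data, not the authors' code: the function names (`ge`, `split`, `nested_decomposition`), the simulated schooling values and the group sizes are all hypothetical. It assumes that the ratios inside the square brackets are ratios of means, and that the weights combine each unit's share of total schooling (raised to θ) with its population share (raised to 1−θ).

```python
import numpy as np

def ge(s, theta=2.0):
    """GE(theta) index of a 1-D array of schooling values (theta not in {0, 1})."""
    s = np.asarray(s, dtype=float)
    return np.mean((s / s.mean()) ** theta - 1.0) / (theta * (theta - 1.0))

def split(groups, theta=2.0):
    """Return the between-group component and the within-group weights."""
    pooled = np.concatenate(groups)
    n, total, mean = pooled.size, pooled.sum(), pooled.mean()
    between = sum(
        (g.size / n) * ((g.mean() / mean) ** theta - 1.0) for g in groups
    ) / (theta * (theta - 1.0))
    weights = [
        (g.sum() / total) ** theta * (g.size / n) ** (1.0 - theta) for g in groups
    ]
    return between, weights

def nested_decomposition(countries, theta=2.0):
    """countries: list of countries, each a list of group-level arrays."""
    pooled_by_country = [np.concatenate(c) for c in countries]
    between_countries, w = split(pooled_by_country, theta)
    within_between = 0.0
    within_within = 0.0
    for w_c, groups in zip(w, countries):
        b_c, w_g = split(groups, theta)
        within_between += w_c * b_c
        within_within += w_c * sum(wg * ge(g, theta) for wg, g in zip(w_g, groups))
    return between_countries, within_between, within_within

# Hypothetical data: three "countries" with 2, 3 and 1 groups respectively.
rng = np.random.default_rng(0)
countries = [
    [rng.integers(0, 13, size=n).astype(float) for n in sizes]
    for sizes in ([200, 300], [150, 250, 100], [400])
]
parts = nested_decomposition(countries, theta=2.0)
total = ge(np.concatenate([g for c in countries for g in c]), theta=2.0)
print(parts, sum(parts), total)
```

Under these assumptions the three components sum to the index computed on the pooled sample, up to floating-point error.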

E Further evidence on regional and ethnic inertia

E.1 Variance decompositions
We further explored the role of regions and ethnicities in education by conducting simple variance decompositions, pooling all observations within each country and distinguishing between cohorts so as to study dynamics. Appendix Table 29 reports the R² of specifications in which we pool all observations in each country and examine the percentage of variance explained by birth-cohort fixed effects, then by adding region constants or ethnicity constants, and finally by adding both region and ethnicity fixed effects. The table also gives the variance decomposition separately for urban and rural households. Panel (a) focuses on spatial inequalities in the full sample of countries; Panel (b) examines the role of regional and ethnic features in the 14-country sample with ethnic identification information.

Let us start with the birth-cohort fixed-effects specifications reported in Panel (a), column (1). As all countries experienced increases in education, a higher R² in the birth-cohort specifications identifies countries that experienced faster education improvements in relative terms: Botswana (0.335), Kenya (0.218), Tanzania (0.182), Cameroon (0.173) and, to a lesser extent, South Africa (0.139) and Rwanda (0.129). In contrast, the variance explained by the birth-cohort constants is quite low in countries with small improvements in education, such as Mali (0.032), Senegal (0.031) and Burkina Faso (0.047). Overall, the R² is higher when we focus on urban households (column (4)) than when we focus on rural households (column (7)), indicating that the expansion of education was, for most countries, relatively stronger in cities than in the countryside. However, there are exceptions, most notably South Africa, where the expansion of schooling has been relatively stronger in rural areas.

In Panel (a), column (2), we add region constants so as to assess the role of location in explaining spatial differences in education in each country. Compared to the specifications with just birth-cohort constants, regions explain a large portion of the variance in Sudan, Sierra Leone, and Senegal and, to a lesser extent, in Burkina Faso, Mali, Guinea, and Nigeria. For example, in Nigeria the R² of the specification with both birth-cohort constants and province (state) fixed effects is close to 0.30, whereas the R² is 0.06 when we only add birth-cohort constants to account for the general trend of rising education. In contrast, in Botswana the in-sample fit of the specification with both region and birth-cohort constants is 0.387, but with just birth-cohort constants the R² is 0.335, showing that regions explain a relatively smaller portion of the variance.

In Panel (b) we examine the role of ethnicity. In column (2) we add ethnicity fixed effects to the simple specifications with just birth-cohort constants (reported in column (1)). The increase in R² is considerable in Burkina Faso (from 0.03 to 0.21), Mozambique (0.05 to 0.20), Ghana (0.03 to 0.14) and Sierra Leone (0.026 to 0.084). Individual ethnic identification is relatively less important in Morocco (R² goes from 0.089 to 0.107) and Zambia (0.086 to 0.095). In column (3) we report the in-sample fit of specifications with regional fixed effects, so as to allow comparison with the ethnicity fixed-effects specifications. Due to high segregation levels in Africa (Alesina and Zhuravskaya (2012)), the region fixed-effects specifications also reflect, at least partially, ethnic features. The variance decomposition shows that ethnicity rather than regional features matters for education in Burkina Faso, South Africa, and Mozambique, while the reverse seems to be the case in Ethiopia, Zambia, Senegal, Morocco, and Mali.

Column (4) gives the R² of specifications that include both regional and ethnic fixed effects on top of the birth-cohort constants that capture general trends. A high (low) R², compared to the simple models with just birth-cohort constants, indicates relatively large (small) differences in education across regions and ethnic lines. In line with earlier evidence, spatial and ethnic disparities in education are especially large in Burkina Faso, Mozambique, Sierra Leone, Ghana, Senegal, Mali and Ethiopia and relatively less important in South Africa, Morocco, Malawi, and Liberia.

Table 29: Variance decomposition

(a) full sample, all birth-decades

| Country | Full: b | Full: br | Full: %∆b,br | Urban: b | Urban: br | Urban: %∆b,br | Rural: b | Rural: br | Rural: %∆b,br |
|---|---|---|---|---|---|---|---|---|---|
| Botswana | 0.335 | 0.387 | 13.4 | | | | | | |
| Cameroon | 0.173 | 0.362 | 52.2 | 0.113 | 0.23 | 50.9 | 0.074 | 0.273 | 72.9 |
| Kenya | 0.218 | 0.313 | 30.4 | 0.15 | 0.215 | 30.2 | 0.202 | 0.29 | 30.3 |
| Nigeria | 0.064 | 0.296 | 78.4 | 0.111 | 0.269 | 58.7 | 0.059 | 0.303 | 80.5 |
| Sudan | 0.016 | 0.249 | 93.6 | 0.059 | 0.114 | 48.2 | 0.014 | 0.166 | 91.6 |
| Ethiopia | 0.066 | 0.212 | 68.9 | 0.162 | 0.219 | 26 | 0.059 | 0.073 | 19.2 |
| Tanzania | 0.182 | 0.206 | 11.7 | 0.134 | 0.157 | 14.6 | 0.137 | 0.165 | 17 |
| Malawi | 0.115 | 0.194 | 40.7 | 0.067 | 0.099 | 32.3 | 0.114 | 0.171 | 33.3 |
| Uganda | 0.078 | 0.187 | 58.3 | 0.052 | 0.112 | 53.6 | 0.076 | 0.141 | 46.1 |
| Zambia | 0.086 | 0.177 | 51.4 | 0.069 | 0.075 | 8 | 0.068 | 0.08 | 15 |
| Liberia | 0.102 | 0.176 | 42 | 0.098 | 0.132 | 25.8 | 0.09 | 0.108 | 16.7 |
| Ghana | 0.055 | 0.174 | 68.4 | 0.033 | 0.083 | 60.2 | 0.027 | 0.128 | 78.9 |
| South Africa | 0.139 | 0.168 | 17.3 | 0.085 | 0.095 | 10.5 | 0.253 | 0.262 | 3.4 |
| Sierra Leone | 0.026 | 0.168 | 84.5 | 0.021 | 0.081 | 74.1 | 0.02 | 0.048 | 58.3 |
| Rwanda | 0.129 | 0.168 | 23.2 | 0.064 | 0.107 | 40.2 | 0.108 | 0.111 | 2.7 |
| Mozambique | 0.063 | 0.167 | 62.3 | 0.067 | 0.118 | 43.2 | 0.058 | 0.076 | 23.7 |
| Morocco | 0.103 | 0.157 | 34.4 | | | | | | |
| Burkina Faso | 0.047 | 0.157 | 70.1 | 0.05 | 0.081 | 38.3 | 0.023 | 0.033 | 30.3 |
| Senegal | 0.031 | 0.155 | 80 | 0.029 | 0.05 | 42 | 0.017 | 0.056 | 69.6 |
| Mali | 0.032 | 0.145 | 77.9 | 0.038 | 0.072 | 47.2 | 0.025 | 0.035 | 28.6 |
| Guinea | 0.035 | 0.13 | 73.1 | 0.051 | 0.071 | 28.2 | 0.016 | 0.024 | 33.3 |
| Egypt | 0.085 | 0.128 | 33.6 | 0.136 | 0.181 | 24.9 | 0.057 | 0.085 | 32.9 |
| South Sudan | 0.01 | 0.081 | 87.7 | 0.011 | 0.104 | 89.4 | 0.009 | 0.058 | 84.5 |

“b” = birth-cohort fixed effects only, “r” = region fixed effects. Columns %∆b,br show the increase in R² (in percentages) from adding region fixed effects on top of birth cohort fixed effects.

(b) sample with ethnicity, all birth-decades

| Country | Full: b | Full: be | Full: br | Full: ber | Full: %∆b,be | Full: %∆b,br | Urban: b | Urban: be | Urban: br | Urban: ber | Rural: b | Rural: be | Rural: br | Rural: ber |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Botswana | 0.299 | 0.36 | 0.355 | 0.393 | 16.9 | 15.8 | | | | | | | | |
| Burkina Faso | 0.034 | 0.214 | 0.172 | 0.291 | 84.1 | 80.2 | 0.05 | 0.193 | 0.081 | 0.216 | 0.023 | 0.112 | 0.033 | 0.115 |
| South Africa | 0.137 | 0.25 | 0.167 | 0.266 | 45.2 | 18 | 0.083 | 0.18 | 0.094 | 0.19 | 0.252 | 0.278 | 0.262 | 0.286 |
| Ethiopia | 0.066 | 0.086 | 0.212 | 0.243 | 23.3 | 68.9 | 0.162 | 0.192 | 0.219 | 0.249 | 0.059 | 0.068 | 0.073 | 0.081 |
| Mozambique | 0.048 | 0.196 | 0.156 | 0.237 | 75.5 | 69.2 | 0.051 | 0.162 | 0.105 | 0.185 | 0.044 | 0.101 | 0.065 | 0.109 |
| Uganda | 0.078 | 0.161 | 0.187 | 0.207 | 51.6 | 58.3 | 0.052 | 0.089 | 0.112 | 0.126 | 0.076 | 0.149 | 0.141 | 0.166 |
| Liberia | 0.102 | 0.134 | 0.176 | 0.204 | 23.9 | 42 | 0.098 | 0.134 | 0.132 | 0.162 | 0.09 | 0.112 | 0.108 | 0.117 |
| Sierra Leone | 0.026 | 0.084 | 0.168 | 0.198 | 69 | 84.5 | 0.021 | 0.086 | 0.081 | 0.147 | 0.02 | 0.037 | 0.048 | 0.054 |
| Zambia | 0.086 | 0.095 | 0.177 | 0.189 | 9.5 | 51.4 | 0.069 | 0.077 | 0.075 | 0.083 | 0.068 | 0.079 | 0.08 | 0.088 |
| Ghana | 0.032 | 0.142 | 0.149 | 0.185 | 77.5 | 78.5 | 0.033 | 0.099 | 0.083 | 0.119 | 0.027 | 0.148 | 0.128 | 0.166 |
| Malawi | 0.084 | 0.125 | 0.161 | 0.177 | 32.8 | 47.8 | 0.042 | 0.075 | 0.06 | 0.091 | 0.084 | 0.122 | 0.136 | 0.144 |
| Senegal | 0.031 | 0.067 | 0.155 | 0.165 | 53.7 | 80 | 0.029 | 0.053 | 0.05 | 0.067 | 0.017 | 0.057 | 0.056 | 0.069 |
| Morocco | 0.089 | 0.107 | 0.142 | 0.154 | 16.8 | 37.3 | | | | | | | | |
| Mali | 0.032 | 0.054 | 0.145 | 0.152 | 40.7 | 77.9 | 0.038 | 0.064 | 0.072 | 0.087 | 0.025 | 0.035 | 0.035 | 0.04 |

Fixed Effects: b = birth decade, be = birth-decade and ethnicity, br = birth-decade and region, ber = birth-decade, ethnicity, and region. Columns %∆b,be and %∆b,br show the increase in R² (in percentages) from adding ethnicity and region fixed effects, respectively, on top of birth cohort fixed effects.
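The R² comparison behind Table 29 can be sketched as follows. This is an illustration on synthetic data, not the authors' code: the helper names `r_squared` and `pct_gain`, the simulated cohort and region effects, and the sample size are hypothetical. The formula used for %∆ is an assumption inferred from the reported values: for Botswana, (0.387 − 0.335)/0.387 ≈ 13.4%, so the gain appears to be expressed relative to the R² of the richer specification.

```python
import numpy as np

def r_squared(y, *factors):
    """R^2 from OLS of y on an intercept plus dummies for each categorical factor."""
    y = np.asarray(y, dtype=float)
    cols = [np.ones(y.size)]
    for f in factors:
        f = np.asarray(f)
        # Drop the first level of each factor to avoid collinearity with the intercept.
        cols += [(f == level).astype(float) for level in np.unique(f)[1:]]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dev = y - y.mean()
    return 1.0 - (resid @ resid) / (dev @ dev)

def pct_gain(r2_small, r2_large):
    """Share of the larger model's R^2 attributable to the added fixed effects, in percent."""
    return 100.0 * (r2_large - r2_small) / r2_large

# Hypothetical individual-level data: 4 birth cohorts, 6 regions.
rng = np.random.default_rng(1)
n = 5000
cohort = rng.integers(0, 4, size=n)
region = rng.integers(0, 6, size=n)
region_effect = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
years = 2.0 + 0.8 * cohort + region_effect[region] + rng.normal(0.0, 3.0, size=n)

r2_b = r_squared(years, cohort)           # "b": birth-cohort fixed effects only
r2_br = r_squared(years, cohort, region)  # "br": birth-cohort and region fixed effects
print(round(r2_b, 3), round(r2_br, 3), round(pct_gain(r2_b, r2_br), 1))
```

Because the "b" specification is nested in the "br" specification, the second R² can never be lower than the first, so the gain is non-negative by construction.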

F Further evidence on IM at the country level

F.1 Variable construction for IM
IPUMS provides a variable for the line number of the father and mother in the household, but this variable exists for only one third of all observations in the sample, and for far fewer adults with completed schooling. To maximize the number of observations for which we observe education in a previous generation, we therefore use the variable “relationship to household head” to identify the educational attainment of the generation previous to that of any given individual. This variable takes on 32 different values. We use these categories to assign each individual to a “generation” within the household. Based on the generation assignment, each individual is assigned a value for s1 (her own education) and a value for s0 (the mean education level of individuals within the household of the generation immediately above). For example, an individual of generation “1” would be assigned the mean of the education of the head, the spouse, the siblings of the head, and the cousins of the head.

Table 30: Relationship to household head and generation assignment

| relationship to head | meaning | generation |
|---|---|---|
| 1000 | Head | 0 |
| 2000 | Spouse/partner | 0 |
| 3000 | Child | 1 |
| 3100 | Biological child | 1 |
| 3200 | Adopted child | 1 |
| 3300 | Stepchild | 1 |
| 4000 | Other relative | |
| 4100 | Grandchild | 2 |
| 4110 | Grandchild or great grandchild | 2 |
| 4200 | Parent/parent-in-law | -1 |
| 4210 | Parent | -1 |
| 4220 | Parent-in-law | -1 |
| 4300 | Child-in-law | 1 |
| 4400 | Sibling/sibling-in-law | 0 |
| 4410 | Sibling | 0 |
| 4430 | Sibling-in-law | 0 |
| 4500 | Grandparent | -2 |
| 4600 | Parent/grandparent/ascendant | -1 |
| 4700 | Aunt/uncle | -1 |
| 4810 | Nephew/niece | 1 |
| 4820 | Cousin | 0 |
| 4900 | Other relative, not elsewhere classified | |
| 5000 | Non-relative | |
| 5100 | Friend/guest/visitor/partner | |
| 5120 | Visitor | |
| 5200 | Employee | |
| 5210 | Domestic employee | |
| 5330 | Foster child | 1 |
| 5600 | Group quarters | |
| 5900 | Non-relative, n.e.c. | |
| 6000 | Other relative or non-relative | |
| 9999 | Unknown | |
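The generation assignment of Table 30 and the construction of s0 can be sketched as below. This is a minimal illustration rather than the authors' implementation: the function name `previous_generation_education`, the tuple layout of a household and the example household are hypothetical, and the treatment of missing values (codes without a generation are simply ignored) is an assumption.

```python
from statistics import mean

# Relationship-to-head codes with an assigned generation, taken from Table 30.
GENERATION = {
    1000: 0, 2000: 0, 3000: 1, 3100: 1, 3200: 1, 3300: 1,
    4100: 2, 4110: 2, 4200: -1, 4210: -1, 4220: -1, 4300: 1,
    4400: 0, 4410: 0, 4430: 0, 4500: -2, 4600: -1, 4700: -1,
    4810: 1, 4820: 0, 5330: 1,
}

def previous_generation_education(household):
    """household: list of (relationship_code, years_of_schooling) tuples.

    Returns, for each member, the mean schooling of household members in the
    generation immediately above (s0), or None when no such member is observed.
    """
    by_generation = {}
    for code, schooling in household:
        gen = GENERATION.get(code)
        if gen is not None and schooling is not None:
            by_generation.setdefault(gen, []).append(schooling)
    s0 = []
    for code, _ in household:
        gen = GENERATION.get(code)
        older = by_generation.get(gen - 1) if gen is not None else None
        s0.append(mean(older) if older else None)
    return s0

# Hypothetical household: head, spouse, child, grandchild, parent of the head.
household = [(1000, 4.0), (2000, 2.0), (3000, 8.0), (4100, 6.0), (4210, 0.0)]
s0 = previous_generation_education(household)
```

In this example the child (generation 1) is assigned the mean schooling of the head and spouse, and the head's parent has no observed previous generation.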

Table 31: Intergenerational links

| | previous generation observed: frequency | percent | cumulative | previous generation’s education observed: frequency | percent | cumulative |
|---|---|---|---|---|---|---|
| 2 parents | 5,621,042 | 49.92 | 49.92 | 5,260,792 | 49.59 | 49.59 |
| 1 parent | 4,514,207 | 40.09 | 90.02 | 4,263,777 | 40.19 | 89.78 |
| 2 parents, others | 150,227 | 1.33 | 91.35 | 143,800 | 1.36 | 91.13 |
| 1 parent, others | 112,033 | 1.00 | 92.35 | 108,248 | 1.02 | 92.15 |
| 1 other | 346,289 | 3.08 | 95.42 | 332,538 | 3.13 | 95.29 |
| 2+ others | 515,531 | 4.58 | 100.00 | 500,141 | 4.71 | 100.00 |
| Total | 11,259,329 | | | 10,609,296 | | |

Frequency table for intergenerational links. “2 parents” means that the two individuals observed in the previous generation are an individual’s parents. “Others” means that the individuals are either not parents (aunts, uncles, parents-in-law etc.) or that they could be but cannot be clearly identified as an individual’s parents, for example if the individual is a niece of the head and the individuals in the older generation include siblings of the head.

Appendix Table 32 shows that our measure of the education of the previous generation is highly correlated with an analogous measure that uses the IPUMS variable to link children to their (likely) parents and uses only these linked parents of individual i to compute the education of the previous generation. Our way of constructing previous-generation education yields this measure for 19.5% of observations in the sample, whereas going the “parents-only” route results in only 16.7% of observations with parental education.

Table 32: Correlation of previous generation average education with mothers’, fathers’, and parental average education

| estimation | variable | (1) mothers | (2) fathers | (3) parental average |
|---|---|---|---|---|
| unconditional | years of schooling | 0.933 | 0.932 | 0.993 |
| unconditional | educational attainment, fine | 0.931 | 0.928 | 0.972 |
| unconditional | educational attainment, coarse | 0.872 | 0.919 | 0.985 |
| census FEs | years of schooling | 0.925 | 0.889 | 0.989 |
| census FEs | educational attainment, fine | 0.906 | 0.927 | 1.025 |
| census FEs | educational attainment, coarse | 0.860 | 0.885 | 0.981 |

This table shows standardized (“beta”) coefficients from regressing our measures of previous generation education for individual i on the education of only the parents of individual i. In column (1), we regress our measure on the education of only the mother of individual i, in column (2) on that of the father, and in column (3) on the average of both parents. As with our measure, we allow fractional values for average years of schooling and round educational attainment (coarse and fine) to the nearest integer. The first three rows show unconditional estimates (hence the figures in the table are simply unconditional correlations), whereas rows four to six show estimates conditional on census (country-year) fixed effects.

F.2 Country-level means and IM without fixed effects

Figure 60: Unconditional (no cohort and year fixed effects) country-level means and social IM
[Figure omitted. Panel (a), Relative IM: country-specific intercept and slopes; horizontal axis: country-specific intercept, a_c; vertical axis: country-specific slope, b_c. Panel (b), Absolute IM: country-specific likelihood that children of uneducated parents become educated (uneducated = less than primary); horizontal axis: share uneducated old; vertical axis: share educated children of uneducated parents. Points are labelled with country codes.]
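The country-specific intercept and slope in panel (a) of Figure 60 can be illustrated with a simple regression of own education (s1) on previous-generation education (s0), run separately by country without fixed effects. The sketch below is an assumption about the form of that regression, not a statement of the authors' exact specification; the function name `intercept_and_slope` and the toy data are hypothetical.

```python
import numpy as np

def intercept_and_slope(s0, s1):
    """OLS intercept a and slope b from regressing s1 on s0 with a constant."""
    s0 = np.asarray(s0, dtype=float)
    s1 = np.asarray(s1, dtype=float)
    x = s0 - s0.mean()
    b = float(x @ (s1 - s1.mean()) / (x @ x))
    a = float(s1.mean() - b * s0.mean())
    return a, b

# Hypothetical country: children's schooling is exactly 2 + 0.5 * parental schooling.
s0 = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
s1 = 2.0 + 0.5 * s0
a_c, b_c = intercept_and_slope(s0, s1)
```

A lower slope corresponds to weaker intergenerational persistence, that is, higher relative mobility; the intercept gives the expected schooling of children whose previous generation has no schooling.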

F.3 Migration
How does IM in education relate to physical IM, i.e. migration? Most censuses record an individual’s administrative region of birth within the country. We use this information to define as a migrant an individual who, at the time of the census, resides outside her birth region. Table 33 shows the frequencies of parent-child combinations in terms of the migrant status of both.

Table 33: Migrant-status parent-child combinations, irrespective of parental educational attainment

parents children non-migrants migrants non-migrants 6,542,340 1,118,193 migrants 174,068 917,401

Table 34 shows unconditional estimates of absolute IM (restricting attention to parent-child pairs in which the parents have less than primary schooling) by migration status of the pair.

Table 34: Unconditional likelihood that children of parents without primary complete at least primary

parents children non-migrants migrants non-migrants 0.416 0.552 migrants 0.471 0.456
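The cell-by-cell shares of the kind reported in Table 34 can be sketched as follows. This is an illustrative sketch on made-up records, not the authors' code: the field names (`parent_regions`, `child_regions`, `parent_primary`, `child_primary`), the function names and the four example pairs are all hypothetical, and each pair is assumed to carry a single birth and residence region for the parental generation.

```python
from collections import defaultdict

def is_migrant(birth_region, residence_region):
    """A migrant resides outside her region of birth at the time of the census."""
    return birth_region != residence_region

def absolute_im_by_cell(pairs):
    """Share of children completing at least primary, among pairs whose parents
    have less than primary, by (parent migrant, child migrant) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [children with primary, pairs]
    for p in pairs:
        if p["parent_primary"]:
            continue  # keep only parents with less than primary schooling
        cell = (is_migrant(*p["parent_regions"]), is_migrant(*p["child_regions"]))
        counts[cell][0] += int(p["child_primary"])
        counts[cell][1] += 1
    return {cell: up / n for cell, (up, n) in counts.items()}

# Hypothetical pairs; regions are (birth region, region of residence).
pairs = [
    {"parent_regions": ("A", "A"), "child_regions": ("A", "A"),
     "parent_primary": False, "child_primary": True},
    {"parent_regions": ("A", "A"), "child_regions": ("A", "A"),
     "parent_primary": False, "child_primary": False},
    {"parent_regions": ("A", "B"), "child_regions": ("B", "B"),
     "parent_primary": False, "child_primary": True},
    {"parent_regions": ("A", "B"), "child_regions": ("A", "B"),
     "parent_primary": True, "child_primary": True},
]
shares = absolute_im_by_cell(pairs)
```

The last pair is dropped because the parent has completed primary, which mirrors the restriction to parents with less than primary schooling.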

Table 35 repeats the exercise conditional on birth cohort fixed effects.

Table 35: Likelihood that children of parents without primary complete at least primary, conditional on birth decade fixed effects

parents kids non-migrants migrants non-migrants 0.178 0.338 migrants 0.243 0.254

F.4 More details on sample selection

F.4.1 Share of individuals with higher education

Figure 61: Share of individuals with higher education
[Figure omitted. Panel (a): by birth cohort in our sample; horizontal axis: birth decade (1950-1959 to 1980-1989); vertical axis: university completed, in percent. Panel (b): by year from Barro and Lee (2011); horizontal axis: year (1950 to 2010); vertical axis: university completed, in percent.]

F.4.2 Likelihood of co-residence by age

Figure 62: Likelihood of co-residence (observing parental education) by age
[Figure omitted. Horizontal axis: age (15 to 100). Vertical axis: likelihood of observing parental education. Series: unconditional; conditional on country and census-year FEs.]

G Further evidence on IM at the region level

G.1 Summary statistics for IM at the admin-2 district level

Table 36: Summary statistics for simple IM at the district level

| Variable | Estimation | N | Mean | Median | Std. Dev. | Min | Max |
|---|---|---|---|---|---|---|---|
| P(parents less than primary) | unconditional | 2444 | 0.770 | 0.807 | 0.198 | 0.063 | 1.000 |
| P(at least primary \| parents less than primary) | unconditional | 2444 | 0.380 | 0.382 | 0.242 | -0.000 | 0.934 |
| P(parents less than primary) | year and birth-cohort FEs | 2444 | 0.834 | 0.870 | 0.178 | 0.111 | 1.236 |
| P(at least primary \| parents less than primary) | year and birth-cohort FEs | 2444 | 0.187 | 0.173 | 0.188 | -0.274 | 0.716 |

45 G.2 Regional heterogeneity in relative IM, country-level

Table 37: Intergenerational persistence by country, regional heterogeneity

Columns (1)-(3) report $\hat{\beta}_c$; columns (4) and (5) report $\min[\hat{\beta}_p]$ and $\max[\hat{\beta}_p]$ (provincial persistence estimates); columns (6)-(8) report the $R^2$ of the regressions in columns (2), (3), and (4)-(5), respectively (provincial persistence $R^2$).

country         (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)
Botswana        0.446    0.417    0.372    0.239    0.505    0.385    0.397    0.416
Burkina Faso    0.693    0.691    0.530    0.424    0.886    0.285    0.363    0.369
Cameroon        0.533    0.548    0.324    0.278    0.420    0.260    0.387    0.389
Egypt           0.533    0.535    0.478    0.337    0.726    0.153    0.177    0.179
Ethiopia        0.946    0.962    0.621    0.263    0.919    0.215    0.329    0.343
Ghana           0.428    0.435    0.321    0.259    0.552    0.197    0.275    0.288
Guinea          0.606    0.610    0.435    0.354    0.714    0.138    0.234    0.239
Kenya           0.502    0.513    0.421    0.294    0.642    0.279    0.324    0.336
Liberia         0.405    0.391    0.321    0.252    0.489    0.161    0.204    0.207
Malawi          0.563    0.551    0.499    0.235    0.611    0.282    0.304    0.313
Mali            0.731    0.726    0.558    0.412    0.727    0.224    0.306    0.315
Morocco         0.805    0.804    0.692    0.394    0.954    0.130    0.194    0.200
Mozambique      0.642    0.631    0.538    0.441    0.649    0.274    0.314    0.317
Nigeria         0.435    0.429    0.309    0.124    0.890    0.255    0.351    0.376
Rwanda          0.443    0.458    0.437    0.396    0.497    0.116    0.123    0.124
Senegal         0.736    0.755    0.591    0.477    0.873    0.216    0.295    0.302
Sierra Leone    0.579    0.554    0.407    0.313    0.602    0.217    0.283    0.287
South Africa    0.355    0.356    0.350    0.264    0.418    0.233    0.244    0.261
South Sudan     0.481    0.456    0.392    0.265    0.499    0.137    0.190    0.193
Sudan           0.767    0.768    0.526    0.355    0.958    0.248    0.364    0.375
Tanzania        0.436    0.409    0.386    0.189    0.493    0.171    0.193    0.198
Uganda          0.540    0.541    0.451    0.280    0.724    0.218    0.278    0.281
Zambia          0.448    0.470    0.404    0.339    0.462    0.213    0.242    0.268

Note that columns (1) and (2) differ because cohort and year effects are country-specific in the latter. Note also that the proper comparison for columns (4) and (5) is column (3), not column (1). This is because the specifications that allow the slopes to vary by province also include province fixed effects. Hence, to make the slope ranges comparable, these fixed effects must also be included in the specifications where we do not let the slopes vary and estimate only one coefficient per country. Finally, note that columns (6)-(8) record the $R^2$ from the regressions in the columns with which they are labelled.

G.3 Regional heterogeneity in absolute IM, country-level

Table 38: Absolute mobility by country, provincial heterogeneity, “less-than-primary”-definition of illiteracy

Columns (1)-(3): fraction of children of parents without primary who complete primary, reported as $\hat{\alpha}^y_c$, $\min[\hat{\alpha}^y_p]$, and $\max[\hat{\alpha}^y_p]$. Columns (4)-(6): fraction of parents without primary, reported as $\hat{\alpha}^o_c$, $\min[\hat{\alpha}^o_p]$, and $\max[\hat{\alpha}^o_p]$.

country         (1)      (2)      (3)      (4)      (5)      (6)
Botswana        0.608    0.476    0.781    0.677    0.336    0.852
Burkina Faso    0.155    0.040    0.469    0.909    0.698    0.986
Cameroon        0.433    0.159    0.743    0.678    0.435    0.873
Egypt           0.463    0.309    0.642    0.837    0.709    0.931
Ethiopia        0.161    0.050    0.759    0.960    0.721    0.992
Ghana           0.448    0.194    0.628    0.655    0.374    0.936
Guinea          0.175    0.078    0.447    0.920    0.763    0.976
Kenya           0.557    0.189    0.735    0.614    0.296    0.966
Liberia         0.363    0.196    0.505    0.648    0.465    0.834
Malawi          0.285    0.166    0.538    0.758    0.502    0.881
Mali            0.157    0.000    0.497    0.912    0.686    1.000
Morocco         0.290    0.187    0.521    0.946    0.891    0.978
Mozambique      0.205    0.113    0.464    0.888    0.727    0.945
Nigeria         0.669    0.239    0.934    0.540    0.134    0.869
Rwanda          0.440    0.383    0.566    0.718    0.520    0.794
Senegal         0.229    0.083    0.484    0.872    0.687    0.971
Sierra Leone    0.239    0.108    0.586    0.788    0.466    0.958
South Africa    0.722    0.639    0.807    0.449    0.225    0.655
South Sudan     0.100    0.029    0.313    0.905    0.753    0.973
Sudan           0.240    0.040    0.623    0.889    0.637    0.983
Tanzania        0.648    0.493    0.827    0.638    0.398    0.723
Uganda          0.399    0.042    0.725    0.715    0.245    0.955
Zambia          0.479    0.382    0.652    0.507    0.326    0.644

This table displays only unconditional probabilities (not conditioning on cohort / census-year fixed effects). It is based on the “less-than-primary” definition of illiteracy (as opposed to the “no-schooling” definition).

H Further evidence on IM at the ethnicity level

Table 39 shows ethnicity-level measures of average schooling and IM in education. In each table, the simple social IM coefficient is $\hat{\beta}_e$ from equation 2. Note that the country-level figures for the percentage of old with no schooling and the percentage of literate children of illiterate old may differ from those in Table 38, since they are based only on those observations for which ethnicity is available. There are several types of “Other” groups. (1) “Other IPUMS” denotes observations that did not have an ethnic or language identifier but were labelled by the census itself as “other”. (2) “Other small” denotes groups that individually did not account for 1% of the population and were therefore classified by us as “Other”. (3) “Other African” denotes nationals of other African countries without an ethnic identifier, such as “Liberian”. (4) “Other non-African” denotes nationals of non-African countries, such as “Indian”.

Table 39: Summary statistics and IM at the ethnicity level

ethnicity                 population  mean old   percentage   percentage literate  simple social
                          share       years of   old with no  children of          IM coefficient
                                      schooling  schooling    illiterate old

Botswana                              3.843      34.7         79.7                 0.3
English                   2.9         10.422     5.2          96.4                 0.198
Kalanga / Sekalaka        7.4         4.156      27.2         82.3                 0.227
Other African             3.2         3.62       44.7         78.5                 0.285
Sekgalagadi / Sengologa   3.1         2.533      55           77.6                 0.288
Setswana                  77.4        3.866      32.9         81.8                 0.292
Sembukushu                1.5         0.901      78.5         73.2                 0.348
Other non-African         1.2         6.762      20.8         85                   0.422
Sesarwa                   1.7         0.686      83.8         55.8                 0.508
Zezuru / Shona            1.7         5.239      39.5         36.7                 0.635

Burkina Faso                          1.225      79.1         31.7                 0.548
French                    1.8         9.032      15.3         94.4                 0.132
Dioula                    7.5         2.585      57.6         50.4                 0.435
Samo                      1.9         1.45       71           41.2                 0.509
Bwa                       2.2         1.202      71.6         35.6                 0.532
Tuareg                    1           0.178      97.9         3.5                  0.562
Other small               1.2         0.899      84.1         29.8                 0.606
Mossi                     51.3        1.112      79.9         36.1                 0.62
Bissa                     3.1         0.634      87           29.6                 0.654
Gurunsi                   6.1         0.792      84.7         29.3                 0.659
Bobo                      2.6         0.66       82.8         30.4                 0.667
Other IPUMS               5.1         0.439      89.8         22.8                 0.692
Peul                      9.3         0.152      96.1         6.8                  0.708
Senoufo                   1.3         0.367      89.9         18.2                 0.898
Gurma                     5.7         0.388      91.5         14.6                 0.959

Ethiopia                              0.416      87.4         29.4                 0.864
Hadiya                    1.6         0.343      86.9         48.7                 0.447
Gedeo                     1.3         0.392      83.1         33.8                 0.511
Welayta                   2.6         0.349      89.1         40.8                 0.573
Sidama                    3.5         0.329      87.6         35.2                 0.611
Gurage                    2.1         0.31       88.7         37.8                 0.644
Kefficho                  1.3         0.147      94.6         25.6                 0.65
Somali                    1.5         0.168      96           8.1                  0.671
Other small               8.6         0.225      92.1         25.8                 0.703
Oromo                     33.3        0.198      91.7         26.9                 0.741
Affar                     0.5         0.058      97.8         5.2                  0.749
Silte                     1.6         0.197      91.5         26.4                 0.769
Gamo                      1.4         0.156      94.5         21.3                 0.77
Amhara                    36          0.736      80.5         31.9                 0.919
Tigray                    4.7         0.342      90           34.5                 0.938

Ghana                                 4.094      53.3         52.2                 0.335
Akan                      48.3        5.472      39.9         67.3                 0.287
Ewe                       13.8        4.98       42.5         65.6                 0.319
Ga-Dangme                 8           5.511      39.8         61.1                 0.375
Guan                      3.9         3.667      57.8         49                   0.412
Other                     2.2         1.878      73.6         47.4                 0.413
Mande                     1.1         1.415      78.5         46.6                 0.43
Grusi                     2.7         1.679      76.6         43.7                 0.437
Mole-Dagbani              15.6        1.258      82.3         35.4                 0.456
Gurma                     4.3         0.85       86.3         33.3                 0.485

Liberia                               3.159      59           50.1                 0.384
Other non-African         1           7.85       25           60                   0.75
Kru                       6.7         4.448      44.8         54.6                 0.349
Grebo                     9           4.027      47.6         56                   0.369
Other African             3           4.014      54.8         52.7                 0.368
Vai                       3.8         3.647      55.5         47.9                 0.382
Sapo                      0.9         3.271      56.1         61.5                 0.33
Mano                      7.6         3.128      56.5         57.1                 0.284
Mende                     1.1         3.741      56.8         44.9                 0.348
Bassa                     14          3.209      57.7         42.5                 0.469
Lorma                     5.5         3.44       57.9         55                   0.332
Kpelle                    20.1        2.694      64.1         39.5                 0.476
Gola                      4.5         2.89       64.7         45.8                 0.389
Gio                       8           2.574      65.3         62.8                 0.284
Krahn                     4.3         2.581      65.8         60.1                 0.317
Gbandi                    2.9         2.761      67.2         49.6                 0.407
Mandingo                  3.5         2.158      67.7         46.3                 0.354
Kissi                     4.4         2.402      68.5         45.8                 0.454

Morocco                               0.685      84.6         49.6                 0.747
Arabic                    74.8        0.808      82.3         53.6                 0.73
Other                     0.8         0.631      87.2         55.3                 0.719
Hassania                  0.3         0.475      90.5         70.7                 0.496
Tarifite                  4.3         0.34       90.5         40.9                 0.919
Tchalhit                  12.3        0.278      92           35.3                 0.976
Tmazight                  7.6         0.282      92.1         38.5                 0.881

Mali                                  0.82       84.4         22                   0.685
Bambara, Malinke, Dioula  49.8        1.11       79.7         28.7                 0.685
Bobo, Dafing              2.6         0.864      81.4         25.7                 0.611
Kassonge                  1.3         0.755      83.4         25.9                 0.795
Songhai, Djerma           6.3         0.799      85           20.6                 0.635
Other small               1.9         0.65       88.6         16.1                 0.735
Senoufo                   2.8         0.471      90           23.1                 0.625
Marka, Soninke            6.3         0.408      90.4         17.4                 0.643
Minianka                  4.2         0.394      90.5         20.2                 0.579
Dagon, Kado               7.3         0.406      91.5         13.9                 0.68
Pulaar, Peulh, Fulbe      10.4        0.403      91.9         9.2                  0.747
Bozo, Somono              1.9         0.287      92.8         8.1                  0.632
Tamasheq, Bellah          3.9         0.3        94.3         7.8                  0.638
Maure                     1.3         0.229      95.1         5.5                  0.818

Mozambique                            1.99       43.9         59.7                 0.436
Portuguese                9.2         4.229      14.9         92.2                 0.389
Xirhonga                  1.8         2.303      29.6         82.8                 0.382
Bitonga                   1.7         1.878      39.8         79.2                 0.353
Cishona                   1.1         1.886      44.3         73.8                 0.422
Xitsonga                  11.1        1.743      44.3         69.9                 0.461
Cichopi                   2.2         1.638      45.5         72.3                 0.428
Elomwe                    7.2         1.441      46.5         49.8                 0.449
Chitewe                   1.5         1.563      47.2         77.2                 0.352
Cinyungwe                 2.8         1.667      48.1         70.1                 0.408
Emakhuwa                  26.7        1.576      48.2         43.9                 0.484
Other small               2.3         1.679      51           55.6                 0.536
Echuwabo                  5.8         1.501      51.1         54.2                 0.434
Cinyanja                  5.6         1.217      56.1         43                   0.497
Sena                      7.7         1.224      56.8         56.2                 0.537
Xitswa                    4.8         1.22       57.5         62.8                 0.462
Shimakonde                1.9         1.083      64.1         46.6                 0.434
Ciyao                     2.1         1.104      65           34.9                 0.514
Cindau                    4.7         0.905      65.8         46.9                 0.591

Malawi                                4          29.3         65.1                 0.478
Tumbuka                   8.8         5.815      13.3         80.4                 0.362
Other small               0.8         5.317      17.9         75                   0.458
Ngonde                    1           5.368      19.8         76                   0.407
Tonga                     2           5.035      22           75                   0.37
Other IPUMS               2.7         4.607      23.4         65.6                 0.497
Ngoni                     11.6        4.564      23.8         69.5                 0.481
Lomwe                     17.8        3.919      29           68.6                 0.464
Chewa                     32.5        3.649      30.3         61.4                 0.509
Nyanja                    5.8         3.516      32.7         67.9                 0.469
Sena                      3.7         2.836      42.6         61.8                 0.517
Yao                       13.4        2.96       45.4         61.5                 0.533

Senegal                               0.993      82.3         27.3                 0.702
Lebou                     1.6         2.851      51.4         60                   0.453
Other IPUMS               2.2         2.882      62.6         41                   0.664
Jola                      6.9         1.613      69.6         54.5                 0.523
Toucouleur                6.4         1.374      76.6         34                   0.694
Soninke                   1.6         1.251      77.5         36.6                 0.609
Mandinka                  5.3         1.083      80.7         31.9                 0.67
Wolof                     40.5        0.984      82.8         24.9                 0.745
Serer                     15.3        0.839      84           30.3                 0.738
Other small               1.5         0.791      85.3         29.9                 0.641
Fula                      18.7        0.462      91.1         16.6                 0.783

Sierra Leone                          1.784      72.3         35.4                 0.487
Krio                      1.6         8.414      13.9         69.6                 0.386
Other small               0.7         3.543      51.8         34.5                 0.565
Mandingo                  2.3         2.41       63.2         48.5                 0.458
Mende                     33          2.122      66.3         37.1                 0.466
Sherbro                   2.3         2.213      69.2         26.4                 0.616
Fullah                    3.5         1.827      70.4         49.3                 0.421
Kono                      4.5         1.597      73.4         36.6                 0.517
Loko                      2.7         1.56       76.6         32.7                 0.562
Temne                     30.9        1.394      77.2         35.4                 0.498
Kissi                     2.5         1.198      78.3         41                   0.431
Susu/Yalunka              3.5         1.301      78.3         32.4                 0.513
Limba                     8.5         1.305      79           33.6                 0.529
Koranko                   3.9         0.499      91           16.6                 0.58

Uganda                                2.571      47.4         62.1                 0.458
Baganda                   15.9        5.093      20.3         80.5                 0.42
Other non-African         0.1         5.516      32.4         78.8                 0.383
Banyoro                   3.3         3.336      35.3         72.4                 0.442
Bagisu                    4.2         3.357      35.3         72.4                 0.446
Langi                     5.4         2.799      40           75                   0.371
Iteso                     6.2         2.848      41.2         70.4                 0.423
Batoro                    2.6         2.806      41.6         65                   0.515
Basoga                    7.8         2.998      44           66                   0.466
Bagwere                   1.6         2.624      45.3         59.8                 0.528
Banyole                   1.2         2.491      45.3         63.4                 0.506
Jopadhola                 1.4         2.723      46.1         70.8                 0.433
Alur                      2.5         2.165      46.5         62.1                 0.455
Acholi                    4.7         2.608      48           69.2                 0.415
Madi                      1.7         2.399      49.8         74.7                 0.414
Other small               10.5        2.253      50.5         63.8                 0.492
Banyakole                 10.2        1.873      52.4         70.7                 0.505
Lugbara                   4           2.056      53.3         69.9                 0.446
Bakhonzo                  2.7         1.779      54.4         63.5                 0.518
Bakiga                    7.7         1.694      56.1         69.6                 0.497
Bafumbir                  1.6         1.656      58.1         57.3                 0.597
Banyarwanda               1.5         1.512      61.8         59.6                 0.503
Karamojong                3.1         0.326      93           7                    0.679

South Africa                          5.289      29           77.5                 0.33
English                   9.9         8.889      7            89.1                 0.179
Afrikaans                 14.8        7.523      9.4          83.7                 0.38
Other                     1.3         6.962      19.2         79.5                 0.33
Sesotho                   8.2         5.219      23.7         82                   0.278
Setswana                  8.3         5.081      27.5         74.5                 0.391
Xhosa                     16.7        5.008      28.5         74                   0.364
Zulu                      21.3        4.539      34.1         74.6                 0.344
Sepedi                    8.8         4.069      44           79.3                 0.272
Siswati                   2.4         3.669      47.1         81                   0.247
Ndebele                   1.8         3.494      47.7         83.2                 0.232
Tshivenda                 2.2         3.747      48.5         84.5                 0.186
Xitsonga                  4.3         3.448      49.6         78.2                 0.284

Zambia                                4.451      30.5         53.1                 0.441
Bemba                     25.9        5.214      22.6         56                   0.397
Tumbuka                   5.2         4.702      28           58.1                 0.416
Tonga                     15.9        4.594      28.3         55.9                 0.377
Lamba                     2.2         4.317      28.9         52.7                 0.391
Mambwe                    6           4.508      29.3         54.3                 0.427
Kaonde                    3           4.106      32.2         50.9                 0.429
Lala                      5.5         3.909      32.5         50.4                 0.433
Lozi                      8           4.404      33.1         53.5                 0.416
Other IPUMS               2.2         4.104      35.1         48.2                 0.427
Nyanja                    18.5        4.045      35.7         51.4                 0.429
Lunda                     7.6         3.356      42.8         48.6                 0.439
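The notes preceding Table 39 describe “Other small” as groups that individually did not account for 1% of the population. A minimal sketch of such a grouping rule follows. It assumes that the share is computed over whatever list of labels is passed in (for instance, one country's sample), and the function name, argument names, and strict less-than comparison are illustrative choices rather than the authors' actual procedure.

```python
from collections import Counter


def lump_small_groups(labels, threshold=0.01, other_label="Other small"):
    """Relabel every group whose share of `labels` is below `threshold`.

    Assumption: the share is the group's count divided by len(labels);
    the source does not state how ties at exactly 1% are handled.
    """
    counts = Counter(labels)
    total = len(labels)
    small = {group for group, n in counts.items() if n / total < threshold}
    return [other_label if group in small else group for group in labels]
```

Applied to a list in which one group holds 0.5% of observations and another holds 2%, only the first is relabelled.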
