EMPIRICAL PROJECT 5 MEASURING INEQUALITY: LORENZ CURVES AND GINI COEFFICIENTS

LEARNING OBJECTIVES In this project you will:

• draw Lorenz curves and interpret the Gini coefficient
• calculate and interpret alternative measures of income inequality
• research other dimensions of inequality and how they are measured.

Key concepts

• Concepts needed for this project: ratio and decile.
• Concepts introduced in this project: Gini coefficient and Lorenz curve.

INTRODUCTION
There are many criteria that policymakers can use to assess outcomes or allocations of economic interactions, in order for them to evaluate which outcome is ‘better’ than the others. One important criterion for assessing an allocation is efficiency, and another is fairness. Outcomes that economists would define as ‘efficient’ (those that cannot make one person better off without making someone else worse off) may be undesirable because they are unfair. To read more about how economists use the word ‘efficiency’, see Section 3.3 (https://tinyco.re/2876321) in Economy, Society, and Public Policy.

CORE PROJECTS
This empirical project is related to material in:
• Unit 5 (https://tinyco.re/5600166) of Economy, Society, and Public Policy
• Unit 5 (https://tinyco.re/5986623) and Unit 19 (https://tinyco.re/1408798) of The Economy.


For example, a situation where a small fraction of the population lives in luxury and everybody else struggles to survive may be efficient, but few people would say it is desirable due to the vast inequality between the rich and poor. In this case, policymakers might intervene by implementing a tax system where richer people pay a greater proportion of their income than poorer people (a progressive tax), and some revenue collected in taxes is transferred to the poor. Empirical evidence on people’s views about the fairness of the income distribution and further discussion of the concept of fairness can be found in Sections 3.4 (https://tinyco.re/7883386) and 3.5 (https://tinyco.re/7126396) of Economy, Society, and Public Policy.

To assess inequality economists often use a measure called the Gini coefficient, which is based on the differences in incomes, wealth, or some other measure between people. We will first look at how the Gini coefficient is calculated and compare it with other measures of inequality between the rich and poor, such as the 90/10 ratio. We will also use Lorenz curves to show the entire distribution of income in a country. Then, we will look at gender inequality to see how this dimension can be measured. Finally, we will look at how inequality can be accounted for in indices of wellbeing, such as the Human Development Index (HDI).

To learn more about how the Gini coefficient is calculated from differences in people’s endowments, see Section 5.8 (https://tinyco.re/5748024) of Economy, Society, and Public Policy.

Lorenz curve: A graphical representation of inequality of some quantity such as wealth or income. Individuals are arranged in ascending order by how much of this quantity they have, and the cumulative share of the total is then plotted against the cumulative share of the population. For complete equality of income, for example, it would be a straight line with a slope of one. The extent to which the curve falls below this perfect equality line is a measure of inequality. See also: Gini coefficient.

Gini coefficient: A measure of inequality of any quantity such as income or wealth, varying from a value of zero (if there is no inequality) to one (if a single individual receives all of it).

EMPIRICAL PROJECT 5 WORKING IN R

R-SPECIFIC LEARNING OBJECTIVES
In addition to the learning objectives for this project, in this section you will learn how to use loops to repeat specified tasks for a list of values. (Note: this is an extension task, so it may not apply to all users.)

GETTING STARTED IN R
For this project you will need the following packages:

• tidyverse, to help with data manipulation
• readxl, to import an Excel spreadsheet
• ineq, to calculate inequality measures
• reshape2, to rearrange a dataframe.

If you need to install any of these packages, run the following code:

install.packages(c("readxl","tidyverse", "ineq","reshape2"))

You can import these libraries now, or when they are used in the R walk-throughs below.

library(readxl)
library(tidyverse)
library(ineq)
library(reshape2)

PART 5.1 MEASURING INCOME INEQUALITY
One way to visualize the income distribution in a population is to draw a Lorenz curve. This curve shows the entire population lined up along the horizontal axis from the poorest to the richest. The height of the curve at any point on the vertical axis indicates the fraction of total income received by the fraction of the population given by that point on the horizontal axis.


We will start by using income decile data from the Global Consumption and Income Project to draw Lorenz curves and compare changes in the income distribution of a country over time. Note that income here refers to market income, which does not take into account taxes or government transfers (see Section 5.9 (https://tinyco.re/1276323) of Economy, Society, and Public Policy for further details). To answer the question below:

• Go to the Globalinc website (http://tinyco.re/9553483) and download the Excel file containing the data by clicking ‘xlsx’.
• Save it in an easily accessible location, such as a folder on your Desktop or in your personal folder.
• Import the data into R as explained in R walk-through 5.1.

R WALK-THROUGH 5.1 Importing an Excel file (either .xlsx or .xls format) into R
As we are dealing with an Excel file, we use the read_excel function from the readxl package. The file is called GCIPrawdata.xlsx, but before you import the file into R, open the datafile in Excel to understand its structure. You will see that the data is on one worksheet (which is convenient), and that the headings for the variables are in the third row. Hence we will use the skip = 2 option in the read_excel function to skip the first two rows.

library(tidyverse)
library(readxl)
decile_data <- read_excel("GCIPrawdata.xlsx", skip = 2)

The data is now in a tibble (like a spreadsheet for R). Let’s look at the first few rows:

head(decile_data)

## # A tibble: 6 x 14
##   Country      Year `Decile 1 Income` `Decile 2 Income` `Decile 3 Income`
##   <chr>       <dbl>             <dbl>             <dbl>             <dbl>
## 1 Afghanistan  1980               206               350               455
## 2 Afghanistan  1981               212               361               469
## 3 Afghanistan  1982               221               377               490
## 4 Afghanistan  1983               238               405               527
## 5 Afghanistan  1984               249               424               551
## 6 Afghanistan  1985               256               435               566
## # ... with 9 more variables: `Decile 4 Income` <dbl>, `Decile 5
## #   Income` <dbl>, `Decile 6 Income` <dbl>, `Decile 7 Income` <dbl>,
## #   `Decile 8 Income` <dbl>, `Decile 9 Income` <dbl>, `Decile 10
## #   Income` <dbl>, `Mean Income` <dbl>, Population <dbl>

As you can see, we have an entry (row) for every country and every year. The first entry (for Afghanistan in the Year 1980) is 206, and it is the value for the variable Decile 1 Income . This value indicates that the mean annual income of the poorest 10% in Afghanistan was the equivalent of 206 US Dollars (in 1980, adjusted using purchasing power parity). The mean income of the next richest 10% (those in the 11th to 20th percentiles for income) was 350. To see the list of variables, we examine the structure of decile_data .

str(decile_data)

## Classes 'tbl_df', 'tbl' and 'data.frame': 4799 obs. of 14 variables:
##  $ Country         : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Year            : num 1980 1981 1982 1983 1984 ...
##  $ Decile 1 Income : num 206 212 221 238 249 256 268 243 223 202 ...
##  $ Decile 2 Income : num 350 361 377 405 424 435 457 414 380 344 ...
##  $ Decile 3 Income : num 455 469 490 527 551 566 594 539 493 447 ...
##  $ Decile 4 Income : num 556 574 599 644 674 692 726 659 603 547 ...
##  $ Decile 5 Income : num 665 686 716 771 806 828 869 788 722 654 ...
##  $ Decile 6 Income : num 793 818 854 919 961 ...
##  $ Decile 7 Income : num 955 986 1029 1107 1157 ...
##  $ Decile 8 Income : num 1187 1225 1278 1376 1438 ...
##  $ Decile 9 Income : num 1594 1645 1717 1848 1932 ...
##  $ Decile 10 Income: num 3542 3655 3814 4105 4291 ...
##  $ Mean Income     : num 1030 1063 1109 1194 1248 ...
##  $ Population      : num 13211412 12996923 12667001 12279095 11912510 ...

In addition to the country, year, and the ten income deciles we have mean income and the population.

1 Choose two countries. You will be using their data, for 1980 and 2014, as the basis for your Lorenz curves. Use the country data you have selected to calculate cumulative income shares. (Remember that each decile represents 10% of the population.)

R WALK-THROUGH 5.2 Calculating cumulative shares using the cumsum function
Here we have chosen China (a country that recently underwent enormous economic changes) and the US (a developed country).

sel_Year <- c(1980, 2014)
sel_Country <- c("United States", "China")
# Select the data for the chosen countries and years
temp <- subset(decile_data, (decile_data$Country %in% sel_Country) & (decile_data$Year %in% sel_Year))
temp

## # A tibble: 4 x 14
##   Country        Year `Decile 1 Income` `Decile 2 Income` `Decile 3 Incom~
##   <chr>         <dbl>             <dbl>             <dbl>            <dbl>
## 1 China          1980                79               113              146
## 2 China          2014               448               927             1440
## 3 United States  1980              3392              5820             7855
## 4 United States  2014              3778              6534             9069
## # ... with 9 more variables: `Decile 4 Income` <dbl>, `Decile 5
## #   Income` <dbl>, `Decile 6 Income` <dbl>, `Decile 7 Income` <dbl>,
## #   `Decile 8 Income` <dbl>, `Decile 9 Income` <dbl>, `Decile 10
## #   Income` <dbl>, `Mean Income` <dbl>, Population <dbl>

Before we calculate cumulative income shares, we need to calculate the total income using the mean income and the population size.

print("Total incomes are:")

## [1] "Total incomes are:"

total_income <- temp[, "Mean Income"] * temp[, "Population"]
total_income

##    Mean Income
## 1 2.472624e+11
## 2 6.609944e+12
## 3 3.366422e+12
## 4 6.401280e+12

These numbers are very large, so for our purpose it is easier to assume that there is only one person in each decile, in other words the total income is 10 times the mean income. This simplification works because, by definition, each decile has exactly the same number of people (10% of the population). We will be using the very useful cumsum function to calculate the cumulative income. To see what this function does, look at this simple example.

test <- c(2, 4, 10, 22)
cumsum(test)

## [1] 2 6 16 38
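To connect cumsum to income shares, here is a minimal sketch using the ten Afghanistan 1980 decile incomes printed in the str output of R walk-through 5.1. The variable names (afg_deciles, afg_mean, cum_share) are our own and not part of the dataset, and the sketch assumes, as described above, one person per decile.

```r
# Decile mean incomes for Afghanistan, 1980 (copied from the str() output above)
afg_deciles <- c(206, 350, 455, 556, 665, 793, 955, 1187, 1594, 3542)
afg_mean <- 1030  # Mean Income for the same row

# With one person per decile, total income is 10 times the mean income
total_inc <- 10 * afg_mean

# Running total of income, divided by total income, gives cumulative shares
cum_share <- cumsum(afg_deciles) / total_inc
round(cum_share, 3)
```

The final cumulative share can differ slightly from 1, presumably because the mean income stored in the dataset is rounded.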


With this functionality in mind, we now calculate the cumulative income shares for China (1980).

# Pick the 10 deciles (Columns 3 to 12) in Row 1 (China, 1980).
# The unlist function transforms temp[1,3:12] from a tibble to a simple vector, which simplifies the calculations.
decs_c80 <- unlist(temp[1, 3:12])
# Give the total income, assuming a population of 10
total_inc <- 10 * unlist(temp[1, "Mean Income"])
cum_inc_share_c80 <- cumsum(decs_c80) / total_inc
cum_inc_share_c80

##  Decile 1 Income  Decile 2 Income  Decile 3 Income  Decile 4 Income
##       0.03134921       0.07619048       0.13412698       0.20436508
##  Decile 5 Income  Decile 6 Income  Decile 7 Income  Decile 8 Income
##       0.28769841       0.38492063       0.49841270       0.63174603
##  Decile 9 Income Decile 10 Income
##       0.79206349       0.99841270

We repeat the same process for China in 2014 and for the US in 1980 and 2014.

# For China, 2014
decs_c14 <- unlist(temp[2, 3:12])  # Go to Row 2 (China, 2014)
total_inc <- 10 * unlist(temp[2, "Mean Income"])  # Give the total income, assuming a population of 10
cum_inc_share_c14 <- cumsum(decs_c14) / total_inc

# For the US, 1980
decs_us80 <- unlist(temp[3, 3:12])  # Select Row 3 (USA, 1980)
total_inc <- 10 * unlist(temp[3, "Mean Income"])  # Give the total income, assuming a population of 10
cum_inc_share_us80 <- cumsum(decs_us80) / total_inc

# For the US, 2014
decs_us14 <- unlist(temp[4, 3:12])  # Select Row 4 (USA, 2014)
total_inc <- 10 * unlist(temp[4, "Mean Income"])  # Give the total income, assuming a population of 10
cum_inc_share_us14 <- cumsum(decs_us14) / total_inc
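The four calculations above differ only in the row number, so the same steps could be wrapped in a small helper function. The sketch below is one possible way to do that: cum_income_share is a hypothetical name of our own, and example_row is a one-row data frame that mimics the column layout of temp using the Afghanistan 1980 values shown in R walk-through 5.1.

```r
# Hypothetical helper: cumulative income shares for one row of a data frame
# laid out like decile_data (deciles in columns 3 to 12, plus "Mean Income")
cum_income_share <- function(data, row) {
  decs <- unlist(data[row, 3:12])                     # the ten decile incomes
  total_inc <- 10 * unlist(data[row, "Mean Income"])  # one person per decile
  unname(cumsum(decs) / total_inc)                    # drop names for a plain vector
}

# One-row example with the same 14-column layout as decile_data
deciles <- c(206, 350, 455, 556, 665, 793, 955, 1187, 1594, 3542)
example_row <- cbind(
  data.frame(Country = "Afghanistan", Year = 1980),
  as.data.frame(t(c(deciles, 1030, 13211412)))
)
names(example_row) <- c("Country", "Year", paste("Decile", 1:10, "Income"),
                        "Mean Income", "Population")

cum_income_share(example_row, 1)
```

With the data from this walk-through, a call such as cum_income_share(temp, 1) would repeat the China 1980 calculation, without the decile names attached.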

2 Use the cumulative income shares to draw Lorenz curves for each country in order to visually compare the income distributions over time.

(a) Draw a line chart with cumulative share of population on the horizontal axis and cumulative share of income on the vertical axis. Make sure to include a chart legend, and label your axes and chart appropriately.

(b) Follow the steps in R walk-through 5.3 to add a straight line representing perfect equality to each chart. (Hint: If income was shared equally across the population, the bottom 10% of people would have 10% of the total income, the bottom 20% would have 20% of the total income, and so on.)

R WALK-THROUGH 5.3 Drawing Lorenz curves
Let us plot the cumulative income shares for China (1980).

plot(cum_inc_share_c80, type = "l", col = "blue", lwd = 2, ylab = "Cumulative income share")
abline(a = 0, b = 0.1, col = "black", lwd = 2)  # Add the perfect equality line
title("Lorenz curve, China, 1980")


Figure 5.1 Lorenz curve, China, 1980.

The blue line is the Lorenz curve. The Gini coefficient is the ratio of the area between the two lines and the total area under the black line. We will calculate that in the R walk-through 5.4. Now we add the other Lorenz curves to the chart.

plot(cum_inc_share_c80, type = "l", col = "blue", lty = 2, lwd = 2, xlab = "Deciles", ylab = "Cumulative income share")
abline(a = 0, b = 0.1, col = "black", lwd = 2)  # Add the perfect equality line
lines(cum_inc_share_c14, col = "green", lty = 1, lwd = 2)  # lty = 1 gives a solid line
lines(cum_inc_share_us80, col = "red", lty = 2, lwd = 2)  # lty = 2 gives a dashed line
lines(cum_inc_share_us14, col = "orange", lty = 1, lwd = 2)
title("Lorenz curves, China and the US (1980 and 2014)")
legend("topleft", legend = c("China, 1980", "China, 2014", "US, 1980", "US, 2014"), col = c("blue", "green", "red", "orange"), lty = 2:1, lwd = 2, cex = 1.2)


Figure 5.2 Lorenz curves, China and the US (1980 and 2014).

As the chart shows, the income distribution has changed more clearly for China than for the US.
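The walk-through plots the cumulative shares against the decile number (1 to 10), which is why the perfect equality line has a slope of 0.1. As a variation, a sketch with the cumulative share of the population on the horizontal axis and the curve starting at the origin might look like the following. It uses the Afghanistan 1980 deciles from R walk-through 5.1; the names pop_share and inc_share are illustrative, and total income is taken here as the sum of the deciles (rather than 10 times the rounded mean) so that the curve ends exactly at 1.

```r
# Afghanistan 1980 decile incomes (from the str() output in walk-through 5.1)
deciles <- c(206, 350, 455, 556, 665, 793, 955, 1187, 1594, 3542)

# Cumulative shares, with a leading 0 so the curve starts at the origin
pop_share <- seq(0, 1, by = 0.1)
inc_share <- c(0, cumsum(deciles) / sum(deciles))

plot(pop_share, inc_share, type = "l", col = "blue", lwd = 2,
     xlab = "Cumulative share of population",
     ylab = "Cumulative share of income",
     main = "Lorenz curve, Afghanistan, 1980")
abline(a = 0, b = 1, col = "black", lwd = 2)  # Perfect equality: slope of one
legend("topleft", legend = c("Lorenz curve", "Perfect equality"),
       col = c("blue", "black"), lwd = 2)
```

On these axes the perfect equality line has a slope of one, matching the definition of the Lorenz curve given in the introduction.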

3 Using your Lorenz curves:

(a) Compare the distribution of income across time for each country.

(b) Compare the distribution of income across countries for each year.

(c) Suggest some explanations for any similarities and differences you observe. (You may want to research your chosen countries to see if there were any changes in government policy, political events, or other factors that may affect the income distribution.)

A rough way to compare income distributions is to use a summary measure such as the Gini coefficient. The Gini coefficient ranges from 0 (complete equality) to 1 (complete inequality). It is calculated by dividing the area between the Lorenz curve and the perfect equality line, by the total area underneath the perfect equality line. Intuitively, the further away the Lorenz curve is from the perfect equality line, the more unequal the income distribution is, and the higher the Gini coefficient will be. To calculate the Gini coefficient you can either use a Gini coefficient calculator (http://tinyco.re/8392848), or calculate it directly in R as shown in R walk-through 5.4.
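The area-ratio definition can be turned into a few lines of base R. The sketch below is our own illustration rather than the method used by any particular package: gini_from_deciles is a hypothetical helper that joins the Lorenz curve points with straight lines, adds up the resulting trapezoids, and uses the fact that the area under the perfect equality line is 1/2. It assumes equal-sized groups (such as deciles).

```r
# Hypothetical helper: Gini coefficient from incomes of equal-sized groups
# (e.g. deciles), using trapezoids under the Lorenz curve
gini_from_deciles <- function(x) {
  x <- sort(x)                         # poorest group first
  n <- length(x)
  lorenz <- c(0, cumsum(x) / sum(x))   # Lorenz curve heights, from 0 to 1
  # Each trapezoid has width 1/n and heights lorenz[i] and lorenz[i + 1]
  area_under_lorenz <- sum((lorenz[-1] + lorenz[-(n + 1)]) / 2) / n
  # (area between the two lines) / (area under the equality line)
  (0.5 - area_under_lorenz) / 0.5
}

gini_from_deciles(rep(100, 10))     # every group has the same income
gini_from_deciles(c(0, 0, 0, 10))   # one of four groups has everything

# Afghanistan 1980 deciles from walk-through 5.1
gini_from_deciles(c(206, 350, 455, 556, 665, 793, 955, 1187, 1594, 3542))
```

With n equal-sized groups the largest value this calculation can return is 1 - 1/n, because inequality within each group is not observed. For a vector of decile incomes the result should agree with Gini() from the ineq package under its default settings, although that comparison is not run here.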

4 Calculate the Gini coefficient for each of your Lorenz curves. You should have four coefficients in total. Label each Lorenz curve with its corresponding Gini coefficient, and check that the coefficients are consistent with what you see in your charts.


R WALK-THROUGH 5.4 Calculating Gini coefficients
In Section 5.8 (https://tinyco.re/5987207) of Economy, Society, and Public Policy you can learn that the Gini coefficient is graphically represented by dividing the area between the perfect equality line and the Lorenz curve by the total area under the perfect equality line. You could calculate this area by hand, by decomposing the area under the Lorenz curve into rectangles and triangles, but as with so many problems, someone else has already figured out how to do that and has provided R users with a package (ineq) to do this task very easily.

library(ineq) # Load the ineq library

# The decile mean incomes from R walk-through 5.2 are used.
g_c80 <- Gini(decs_c80)
g_c14 <- Gini(decs_c14)
g_us80 <- Gini(decs_us80)
g_us14 <- Gini(decs_us14)
paste("Gini coefficients")

## [1] "Gini coefficients"

paste("China - 1980: ", round(g_c80,2), ", 2014: ", round(g_c14,2))

## [1] "China - 1980: 0.29 , 2014: 0.51"

paste("United States - 1980: ", round(g_us80,2), ", 2014: ",round( g_us14,2))

## [1] "United States - 1980: 0.34 , 2014: 0.4"

Now we make the same line chart (copy and paste the code from R walk-through 5.3), but use the text function to label curves with their respective Gini coefficients.

plot(cum_inc_share_c80, type = "l", col = "blue", lty = 2, lwd = 2, xlab = "Deciles", ylab = "Cumulative income share")
abline(a = 0, b = 0.1, col = "black", lwd = 2)  # Add the perfect equality line
lines(cum_inc_share_c14, col = "green", lty = 1, lwd = 2)  # lty = 1 gives a solid line
lines(cum_inc_share_us80, col = "red", lty = 2, lwd = 2)  # lty = 2 gives a dashed line
lines(cum_inc_share_us14, col = "orange", lty = 1, lwd = 2)
title("Lorenz curves, China and the US (1980 and 2014)")
legend("topleft", legend = c("China, 1980", "China, 2014", "US, 1980", "US, 2014"), col = c("blue", "green", "red", "orange"), lty = 2:1, lwd = 2, cex = 1.2)
# Label each curve with its Gini coefficient
text(8.5, 0.78, round(g_c80, digits = 3))
text(9.4, 0.6, round(g_c14, digits = 3))
text(5.7, 0.38, round(g_us80, digits = 3))
text(6.4, 0.3, round(g_us14, digits = 3))

Figure 5.3 Lorenz curves, China and the US (1980 and 2014).

The Gini coefficients have increased, confirming what we already saw from the Lorenz curves that in both countries the income distribution has become more unequal.


EXTENSION R WALK-THROUGH 5.5 Calculating Gini coefficients for all countries and all years using a loop
In this extension walk-through, we show you how to calculate the Gini coefficient for all countries and years in your dataset. This sounds like a tedious task, and indeed if we were to use the same method as before it would be mind-numbing. However, we have a powerful programming language at hand, and this is the time to use it.

Here we use a very useful programming tool you may not have come across yet: loops. Let’s start with a very simple case: printing the values of i^2 for i = 1, ..., 10.

for (i in seq(1, 10)) {
  print(i^2)
}

## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
## [1] 36
## [1] 49
## [1] 64
## [1] 81
## [1] 100

In the above command, seq(1,10) creates a vector of data (1,2,3,…,10). The command for (i in seq(1,10)) defines the variable i initially as 1, then performs all the commands that are between the curly brackets for each value of i (typically these commands will involve the variable i). Here our command prints the value of i^2 for each value of i.

Now we use loops to complete our task. We begin by creating a new variable in our dataset, gini, which we initially set to 0 for all country-year combinations.

decile_data$gini <- 0

Now we use a loop to run through all the rows in our dataset, and for each row we will repeat the Gini coefficient calculation from R walk-through 5.4, then save the resulting value in the gini variable we created.

noc <- nrow(decile_data)  # Give us the number of rows in decile_data

for (i in seq(1, noc)) {
  decs_i <- unlist(decile_data[i, 3:12])  # Go to row i to get the decile data
  decile_data$gini[i] <- Gini(decs_i)
}

With this code, we calculated 4,799 Gini coefficients. We now look at the summary stats for the gini variable.

summary(decile_data$gini)

##   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.1791  0.3470  0.4814  0.4617  0.5700  0.7386

The average Gini coefficient is 0.46, the maximum is 0.74, and the minimum 0.18. Let’s look at these extreme cases. First we will look at the extremely equal income distributions:

temp <- subset(decile_data, decile_data$gini < 0.20, select = c("Country", "Year", "gini"))
temp

## # A tibble: 17 x 3
##    Country          Year  gini
##    <chr>           <dbl> <dbl>
##  1 Bulgaria         1987 0.191
##  2 Czech Republic   1985 0.195
##  3 Czech Republic   1986 0.194
##  4 Czech Republic   1987 0.192
##  5 Czech Republic   1988 0.191
##  6 Czech Republic   1989 0.194
##  7 Czech Republic   1990 0.196
##  8 Czech Republic   1991 0.199
##  9 Slovak Republic  1985 0.195
## 10 Slovak Republic  1986 0.194
## 11 Slovak Republic  1987 0.193
## 12 Slovak Republic  1988 0.192
## 13 Slovak Republic  1989 0.193
## 14 Slovak Republic  1990 0.194
## 15 Slovak Republic  1991 0.195
## 16 Slovak Republic  1992 0.196
## 17 Slovak Republic  1993 0.179

These correspond to eastern European countries in the years just before and just after the fall of communism. Now the most unequal countries:

temp <- subset(decile_data, decile_data$gini > 0.73, select = c("Country", "Year", "gini"))
temp

## # A tibble: 27 x 3
##    Country       Year  gini
##    <chr>        <dbl> <dbl>
##  1 Burkina Faso  1980 0.738
##  2 Burkina Faso  1981 0.738
##  3 Burkina Faso  1982 0.738
##  4 Burkina Faso  1983 0.738
##  5 Burkina Faso  1984 0.738
##  6 Burkina Faso  1985 0.738
##  7 Burkina Faso  1986 0.738
##  8 Burkina Faso  1987 0.738
##  9 Burkina Faso  1988 0.738
## 10 Burkina Faso  1989 0.739
## # ... with 17 more rows
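Loops are one way to repeat a calculation for every row; R's apply function is another. The sketch below shows the idea on a small made-up data frame rather than on decile_data. The column names and values in toy are invented for illustration, and gini_simple is our own stand-in for Gini() from ineq so that the example runs without extra packages.

```r
# Stand-in for Gini() from ineq: Gini coefficient of a numeric vector
gini_simple <- function(x) {
  x <- sort(x)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Made-up data: three country-years, four income groups each
toy <- data.frame(Country = c("A", "A", "B"),
                  Year    = c(1980, 2014, 1980),
                  g1 = c(10, 5, 25), g2 = c(20, 10, 25),
                  g3 = c(30, 25, 25), g4 = c(40, 60, 25))

# Loop version, following the pattern used in this walk-through
toy$gini_loop <- 0
for (i in seq_len(nrow(toy))) {
  toy$gini_loop[i] <- gini_simple(unlist(toy[i, 3:6]))
}

# apply version: the second argument, 1, means "apply the function to each row"
toy$gini_apply <- apply(toy[, 3:6], 1, gini_simple)

toy
```

On the real data, the equivalent one-liner would be along the lines of decile_data$gini <- apply(decile_data[, 3:12], 1, Gini), assuming the ineq package is loaded. Writing seq_len(nrow(toy)) instead of seq(1, nrow(toy)) is a small safety habit: it yields an empty sequence, not c(1, 0), if the data frame has no rows.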

EXTENSION R WALK-THROUGH 5.6 Plotting time-series of Gini coefficients, using ggplot
In this extension walk-through, we show you how to make time series plots (time on the horizontal axis, the variable of interest on the vertical axis) with Gini coefficients for a list of countries of your choice.

There are many ways to plot data in R, one being the standard plotting function we used in previous walk-throughs. Another (and perhaps more beautiful) way is to use the ggplot function, which is part of the tidyverse package we loaded earlier. Our dataset is already in a format which the ggplot function can easily use.

First we select a small list of countries. As an example, we have chosen four anglophone countries: the UK, the US, Ireland, and Australia.


temp_data <- subset(decile_data, Country %in% c("United Kingdom","United States","Ireland","Australia"))

Now we plot the data using ggplot.

ggplot(temp_data, aes(x = Year, y = gini, color = Country)) +
  geom_line(size = 1) +
  theme_bw() +
  ggtitle("Gini coefficients for anglophone countries")  # Add a title

Figure 5.4 Time series plots of Gini coefficients for anglophone countries.

We asked the ggplot function to use the temp_data dataframe/tibble, with Year on the horizontal axis and gini on the vertical axis. The color option indicates which variable we use to separate the data (Country). The first line of code sets up the chart, and the + geom_line(size=1) then instructs R to draw lines. (See what happens if you replace + geom_line(size=1) with + geom_point(size=1).)

ggplot assumes that the different lines you want to show are identified through the different values in one variable (here, the Country variable). If your data is formatted differently, for example, if you have one variable for the Gini of each country, then in order to use ggplot you will first have to transform the dataset. Doing so is beyond the scope of this task; however, you can find a worked example online, such as ‘R TSplots’ (https://tinyco.re/5093147).

University of Manchester’s Econometric Computing Learning Resource (ECLR). 2018. ‘R TSplots’. Updated 26 July 2016.

The ggplot set of commands is extremely powerful, and if you want to produce a variety of different charts, you may want to read more about that package. For example, see a Harvard R tutorial (https://tinyco.re/8185055) or an R statistics tutorial (https://tinyco.re/9652072) for great examples including code.
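As a small illustration of the reshaping step mentioned in this walk-through, the sketch below converts a ‘wide’ table (one Gini column per country) into the ‘long’ layout that ggplot expects. The numbers are made up and the names wide, stacked, and long are our own. It uses base R's stack function; melt from reshape2 and pivot_longer from tidyr are common alternatives.

```r
# Made-up 'wide' data: one column of Gini coefficients per country
wide <- data.frame(Year      = c(2000, 2001, 2002),
                   Ireland   = c(0.31, 0.30, 0.32),
                   Australia = c(0.33, 0.34, 0.33))

# stack() puts all the country columns into one 'values' column and
# records the original column name in 'ind'
stacked <- stack(wide[, c("Ireland", "Australia")])

# Rebuild a long data frame: one row per country-year
long <- data.frame(Year    = rep(wide$Year, times = 2),
                   Country = as.character(stacked$ind),
                   gini    = stacked$values)
long
```

The resulting long data frame has one row per country-year and could be passed to ggplot(long, aes(x = Year, y = gini, color = Country)) in the same way as temp_data.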

Now we will look at other measures of income inequality and see how they can be used along with the Gini coefficient to summarize a country’s income distribution. Instead of summarizing the entire income distribution like the Gini coefficient does, we can take the ratio of incomes at two points in the distribution. For example, the 90/10 ratio takes the ratio of the top 10% of incomes (Decile 10) to the lowest 10% of incomes (Decile 1). A 90/10 ratio of 5 means that the richest 10% earn 5 times as much as the poorest 10%. The higher the ratio, the higher the inequality between these two points in the distribution.

5 Look at the following ratios:

• 90/10 ratio = ratio of the Decile 10 income to the Decile 1 income
• 90/50 ratio = ratio of the Decile 10 income to the Decile 5 income (the median)
• 50/10 ratio = ratio of the Decile 5 income (the median) to the Decile 1 income.

(a) For each of these ratios, explain why policymakers might want to compare these two deciles in the income distribution.

(b) What kinds of policies or events could affect these ratios?
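For orientation, the three ratios can be computed directly from decile incomes. The sketch below uses the Afghanistan 1980 row from R walk-through 5.1 and follows the definitions in Question 5 (Decile 10, Decile 5, and Decile 1 incomes); the variable names are our own.

```r
# Afghanistan 1980 decile incomes (from the str() output in walk-through 5.1)
deciles <- c(206, 350, 455, 556, 665, 793, 955, 1187, 1594, 3542)

ratio_90_10 <- deciles[10] / deciles[1]  # top decile relative to bottom decile
ratio_90_50 <- deciles[10] / deciles[5]  # top decile relative to Decile 5
ratio_50_10 <- deciles[5] / deciles[1]   # Decile 5 relative to bottom decile

round(c("90/10" = ratio_90_10, "90/50" = ratio_90_50, "50/10" = ratio_50_10), 2)
```

Note that the 90/10 ratio is the product of the other two ratios, so looking at the 90/50 and 50/10 ratios separately shows how much of the overall gap lies in the upper and lower halves of the distribution.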

We will now compare these summary measures (ratios and the Gini coefficient) for a larger group of countries, using OECD data. The OECD has annual data for different ratio measures of income inequality for 42 countries around the world, and has an interactive chart function that plots this data for you.

Go to the OECD website (http://tinyco.re/5057087) to access the data. You will see a chart similar to Figure 5.5, which shows data for 2015. The countries are ranked from smallest to largest Gini coefficient on the horizontal axis, and the vertical axis gives the Gini coefficient.

6 Compare summary measures of inequality:

(a) Plot the data for the ratio measures by changing the variable selected in the drop-down menu ‘Gini coefficient’. The three ratio measures we looked at previously are called ‘Interdecile P90/P10’, ‘Interdecile P90/P50’, and ‘Interdecile P50/P10’, respectively. (If you click the ‘Compare variables’ option, you can plot more than one variable on the same chart.)

(b) For each measure, give an intuitive explanation of how it is measured and what it tells us about income inequality. (For example: What do the larger and smaller values of this measure mean? Which parts of the income distribution does this measure use?)

(c) Do countries that rank highly on the Gini coefficient also rank highly on the ratio measures, or do the rankings change depending on the measure used? Based on your answers, explain why it is important to look at more than one summary measure of a distribution.

The Gini coefficient and the ratios we have used are common measures of inequality, but there are other ways to measure income inequality.

7 Go to the ‘Income Inequality’ section (http://tinyco.re/4140440) of the Our World in Data website, and choose two other measures of income inequality that you find interesting.

(a) For each measure, give an intuitive explanation of how it is measured and what we can learn about income inequality from it. (For example: What do the larger and smaller values of this measure mean? Which parts of the income distribution does this measure use?)

(b) If possible, find data or a chart for your chosen measures for the two countries you used in Questions 1 to 6, and explain what these measures tell us about inequality in those countries.

PART 5.2 MEASURING OTHER KINDS OF INEQUALITY
There are many ways to measure income inequality, but income inequality is only one dimension of inequality within a country. To get a more complete picture of inequality within a country, we need to look at other areas in which there may be inequality in outcomes. We will explore two particular areas:

• health inequality
• gender inequality in education.

Figure 5.5 OECD countries ranked according to their Gini coefficient.


First, we will look at how researchers have measured inequality in health-related outcomes. Besides income, health is an important aspect of wellbeing because it determines how long an individual will be alive to enjoy his or her income. If two people had the same annual income throughout their lives, but one person had a much shorter life than the other, we might say that the distribution of wellbeing is unequal, despite annual incomes being equal.

As with income, inequality in life expectancy can be measured using a Gini coefficient. In the study ‘Mortality inequality’ (http://tinyco.re/8593466), researcher Sam Peltzman (2009) estimated Gini coefficients for life expectancy based on the distribution of total years lived (life-years) across people born in a given year (birth cohort). If everybody born in a given year lived the same number of years, then the total years lived would be divided equally among these people (perfect equality). If a few people lived very long lives but everybody else lived very short lives, then there would be a high degree of inequality (Gini coefficient close to 1).

We will now look at mortality inequality Gini coefficients for 10 countries around the world. First, download the data:

• Go to the ‘health inequality’ section (http://tinyco.re/2668264) of the Our World in Data website. In Section 1.1 (Mortality inequality), click the ‘Data’ button at the bottom of the chart shown.
• Click the blue button that appears to download the data in csv format.

Import the data into R and investigate the structure of the data as explained in R walk-through 5.7.

R WALK-THROUGH 5.7 Importing .csv files into R
Before importing, save the csv file in your working directory.

# Open the csv file from the working directory.
# stringsAsFactors = TRUE reads text columns as factors (the default before R 4.0), as in the output below.
health_in <- read.csv("inequality-of-life-as-measured-by-mortality-gini-coefficient-1742-2002.csv", stringsAsFactors = TRUE)
str(health_in)

## 'data.frame': 320 obs. of 4 variables:
##  $ Entity    : Factor w/ 10 levels "Brazil","England and Wales",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Code      : Factor w/ 10 levels "","BRA","DEU",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ Year      : int 1892 1897 1902 1907 1912 1917 1922 1927 1932 1937 ...
##  $ X.percent.: num 0.566 0.557 0.547 0.482 0.494 ...


The variable Entity is the country and the variable X.percent is the health Gini. Let’s change these variable names to make them more intuitive for our analysis.

names(health_in)[1] <- "Country"  # Country is the first variable.
names(health_in)[4] <- "HGini"    # Health Gini is the fourth variable.

There is another quirk in the data that you may not have noticed in this initial data inspection: All countries have a short code ( Code ), except for England and Wales for which that field is empty (or formally "" ). Let’s change the blanks to “ENW” .

levels(health_in$Code)[levels(health_in$Code)==""] <- "ENW"

Tip The way this code works may seem a little mysterious, and you may find it difficult to remember the code for this step. However, an Internet search for ‘R renaming one factor level’ (recall that Code is a factor variable) will show you many ways to achieve this (including that shown above). Often you will find answers on stackoverflow.com, where experienced coders provide useful help.
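To see what the levels assignment does, it can help to try it on a tiny factor first. The vector codes below is made up for illustration.

```r
# A small made-up factor with one empty-string level
codes <- factor(c("BRA", "", "DEU", "", "BRA"))
levels(codes)                        # "" is one of the levels

# Rename the level "" to "ENW"; every element that had "" changes with it
levels(codes)[levels(codes) == ""] <- "ENW"

levels(codes)
as.character(codes)
```

This approach relies on the column being a factor. For a plain character column, assigning to the matching elements, for example x[x == ""] <- "ENW", would have the same effect.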

1 Using the mortality inequality data:

(a) Plot all the countries on the same line chart, with Gini coefficient on the vertical axis and year (1952–2002 only) on the horizontal axis. Make sure to include a legend showing country names, and label the axes appropriately.

(b) Describe any general patterns in mortality inequality over time, as well as any similarities and differences between countries.

R WALK-THROUGH 5.8 Creating line graphs with ggplot
As shown in R walk-through 5.7, the data is already formatted so that we can use ggplot directly: we have only one variable for the mortality Gini (HGini), and we can separate the data by country using one variable (Country).

temp_data <- subset(health_in, Year > 1951)  # Select all data after 1951

ggplot(temp_data, aes(x = Year, y = HGini, color = Country)) +
  geom_line(size = 1) +
  labs(y = "Mortality inequality Gini coefficient") +
  scale_color_brewer(palette = "Paired") +  # Change the colour palette
  theme_bw() +
  ggtitle("Mortality inequalities")  # Add a title

Figure 5.6 Mortality inequality Gini coefficients (1952–2002).

2 Now compare the Gini coefficients in the first year of your line chart (1952) with the last year (2002).

(a) For the year 1952, sort the countries according to their mortality inequality Gini coefficient from smallest to largest. Plot a column chart showing these Gini coefficients on the vertical axis, and country on the horizontal axis.

(b) Repeat Question 2(a) for the year 2002.

(c) Comparing your charts for 1952 and 2002, have the rankings between countries changed? Suggest some explanations for any observed changes. (You may want to do some additional research, for example, look at the healthcare systems of these countries.)


R WALK-THROUGH 5.9 Drawing a column chart with sorted values

Plot a column chart for 1952

First we use subset to extract the data for 1952 only.

temp_data <- subset(health_in, Year == 1952) # Select all data for 1952
temp_data <- temp_data[order(temp_data$HGini),] # Reorder the rows in temp_data according to the values of HGini, from smallest to largest

temp_data

##               Country Code Year     HGini
## 279            Sweden  SWE 1952 0.1194045
## 46  England and Wales  ENW 1952 0.1319542
## 310     United States  USA 1952 0.1471329
## 138           Germany  DEU 1952 0.1572112
## 86             France  FRA 1952 0.1605238
## 228             Spain  ESP 1952 0.1985371
## 184             Japan  JPN 1952 0.2021728
## 206            Russia  RUS 1952 0.2237161
## 161             India  IND 1952 0.3978703
## 13             Brazil  BRA 1952 0.4103805

The rows are now ordered according to HGini, in ascending order. Let’s use ggplot again.

ggplot(temp_data, aes(x = Code, y = HGini)) +
  geom_bar(stat = "identity", width = .5, fill = "tomato3") +
  theme_bw() +
  labs(title = "Mortality Gini coefficients (1952)",
       caption = "source: https://ourworldindata.org/health-inequality",
       y = "Mortality inequality Gini coefficient")


Figure 5.7 Mortality Gini coefficients (1952).

Unfortunately, the columns are not ordered correctly, because when the horizontal axis variable (here, Code) is a factor, ggplot uses the ordering of the factor levels, which is:

levels(temp_data$Code)

## [1] "ENW" "BRA" "DEU" "ESP" "FRA" "IND" "JPN" "RUS" "SWE" "USA"

An Internet search for ‘R geom_bar change order’ leads to a blog post (http://tinyco.re/1992106) from Data Se, which uses the reorder function to reorder the horizontal axis variable (Code) according to the HGini value. Applied to our data, the code is:

ggplot(temp_data, aes(x = reorder(Code, HGini), y = HGini)) +
  geom_bar(stat = "identity", width = .5, fill = "tomato3") +
  coord_cartesian(ylim = c(0, 0.45)) +
  theme_bw() +
  labs(title = "Mortality Gini coefficients (1952)", x = "Country",
       caption = "source: https://ourworldindata.org/health-inequality",
       y = "Mortality inequality Gini coefficient")


Figure 5.8 Mortality Gini coefficients (1952).

Plot a column chart for 2002

We want to compare this ranking with the ranking of 2002. First we extract the relevant data again.

temp_data <- subset(health_in, Year == 2002) # Select all data for 2002

ggplot(temp_data, aes(x = reorder(Code, HGini), y = HGini)) +
  geom_bar(stat = "identity", width = .5, fill = "tomato3") +
  coord_cartesian(ylim = c(0, 0.45)) + # Adjust the ylim (vertical axis scale) to ensure comparability with 1952
  theme_bw() +
  labs(title = "Mortality Gini coefficients (2002)", x = "Country",
       caption = "source: https://ourworldindata.org/health-inequality",
       y = "Mortality inequality Gini coefficient")


Figure 5.9 Mortality Gini coefficients (2002).

It is fairly easy to plot the data for both years in the same chart.

temp_data <- subset(health_in, Year %in% c("1952","2002")) # Select all data for 1952 and 2002
temp_data$Year <- factor(temp_data$Year)

ggplot(temp_data, aes(x = reorder(Code, HGini), y = HGini, fill = Year)) +
  geom_bar(position = "dodge", stat = "identity") +
  theme_bw() +
  labs(title = "Mortality Gini coefficients (1952 and 2002)", x = "Country",
       caption = "source: https://ourworldindata.org/health-inequality",
       y = "Mortality inequality Gini coefficient")


Figure 5.10 Mortality Gini coefficients (1952 and 2002).

Now the country ordering is in terms of the average HGini, rather than HGini in 1952 (which would have made comparisons easier).
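One way to order the countries by their 1952 value instead is to set the factor levels explicitly before plotting. The following is a minimal sketch using a made-up data frame (toy and its values are hypothetical, not taken from the dataset); it is one possible approach rather than the only one.

```r
# Hypothetical data in the same shape as temp_data (values are made up)
toy <- data.frame(
  Code  = factor(rep(c("AAA", "BBB", "CCC"), times = 2)),
  Year  = factor(rep(c(1952, 2002), each = 3)),
  HGini = c(0.30, 0.10, 0.20, 0.05, 0.12, 0.08)
)

# Take the 1952 rows only and sort the country codes by their 1952 value
toy_52 <- toy[toy$Year == "1952", ]
order_52 <- as.character(toy_52$Code[order(toy_52$HGini)])

# Fix the factor levels in that order, so that ggplot follows the 1952 ranking
toy$Code <- factor(toy$Code, levels = order_52)
levels(toy$Code)

# Draw the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = Code, y = HGini, fill = Year)) +
    geom_bar(position = "dodge", stat = "identity") +
    theme_bw()
  print(p)
}
```

Because the levels are set directly, reorder is no longer needed inside aes.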

Other measures of health inequality, such as those used by the World Health Organization (WHO), are based on access to healthcare, affordability of healthcare, and quality of living conditions. Choose one of the following measures of health inequality to answer Question 3:

• access to essential medicines
• basic hospital access
• composite coverage index.

To download the data for your chosen measure:

• If you choose to look at either the access to essential medicines or the basic hospital access measure, go to the WHO’s Universal Health Coverage Data Portal (http://tinyco.re/9304620), click on the tab ‘Explore UHC Indicators’, and select your chosen measure.
• A drop-down menu with three buttons will appear: ‘Map’ (or ‘Graph’) shows a visual description of the data, ‘Data’ contains the data files, and ‘Metadata’ contains information about your chosen measure.
• Click on the ‘Data’ button, then select ‘CSV table’ from the ‘Download complete data set as’ list.
• If you choose to look at the composite coverage index measure, go to WHO’s Global Health Observatory data repository (http://tinyco.re/3968368), and select one category to compare (economic status, education, or place of residence). To download the data for that category, click ‘CSV table’ from the ‘Download complete data set as’ list. You can read further information about this index in the WHO’s technical notes (http://tinyco.re/5693881).


3 For your chosen measure:

(a) Explain how it is constructed and what outcomes it assesses.

(b) Create an appropriate chart to summarize the data. (You can replicate a chart shown on the website or draw a similar chart.)

(c) Explain what your chart shows about health inequality within and between countries, and discuss the limitations of using this measure (for example, measurement issues or other aspects of inequality that this measure ignores).

R WALK-THROUGH 5.10 Drawing a column chart with sorted values

For this walk-through, we downloaded the ‘access to essential medicines’ data, as explained above. Here we saved it as WHO access to essential medicines.csv. Looking at the spreadsheet in Excel, you can see that the actual data starts in row three, meaning there are two header rows. So let’s skip the first row when importing it, so that the second header row supplies the variable names.

med_access <- read.csv("WHO access to essential medicines.csv", skip = 1)
str(med_access)

## 'data.frame':    38 obs. of  3 variables:
##  $ Country     : Factor w/ 38 levels "Afghanistan",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ X2007.2013  : num  94 42.9 86.7 76.7 72.1 58.3 13.3 90.7 31.3 33.3 ...
##  $ X2007.2013.1: num  81.1 43.2 31.9 0 87.1 46.7 15.5 86.7 21.2 100 ...

The second and third variables have lost their labels. From the spreadsheet you know that they are:

• median availability of selected generic medicines (%) – Private
• median availability of selected generic medicines (%) – Public.

Let’s change the names of these variables to make working with them easier:

names(med_access)[2] <- "Private_Access"
names(med_access)[3] <- "Public_Access"


To find details about these variables, click the ‘Metadata’ button on the website to find the following explanation:

A standard methodology has been developed by WHO and Health Action International (HAI). Data on the availability of a specific list of medicines are collected in at least four geographic or administrative areas in a sample of medicine dispensing points. Availability is reported as the percentage of medicine outlets where a medicine was found on the day of the survey.

Before we produce charts of the data, we shall look at summary statistics of the two access variables.

summary(med_access)

##                              Country   Private_Access   Public_Access
##  Afghanistan                     : 1   Min.   :  2.80   Min.   :  0.00
##  Bahamas                         : 1   1st Qu.: 54.62   1st Qu.: 39.67
##  Bolivia (Plurinational State of): 1   Median : 70.15   Median : 55.95
##  Brazil                          : 1   Mean   : 65.97   Mean   : 58.25
##  Burkina Faso                    : 1   3rd Qu.: 86.70   3rd Qu.: 82.50
##  Burundi                         : 1   Max.   :100.00   Max.   :100.00
##  (Other)                         :32                    NA's   :2

On average, private sector patients have better access to essential medication. From the summary statistics for the Public_Access variable, you can see that there are two missing observations. Here, we will remove these observations, using the complete.cases function, so that the missing values do not interfere with the comparisons that follow.

med_access <- med_access[complete.cases(med_access),]
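The effect of complete.cases, and the reason missing values matter when selecting rows with a logical condition, can be sketched with a small made-up data frame (toy and its values are hypothetical).

```r
# Hypothetical data frame with one missing value (values are made up)
toy <- data.frame(
  Country        = c("A", "B", "C"),
  Private_Access = c(80, 60, 100),
  Public_Access  = c(100, NA, 40)
)

# With the NA left in, the comparison returns NA for country B,
# and the selection then contains an extra row filled with NAs
with_na <- toy[toy$Public_Access == 100, ]
nrow(with_na)

# complete.cases() is TRUE only for rows without any missing value
toy_complete <- toy[complete.cases(toy), ]
only_100 <- toy_complete[toy_complete$Public_Access == 100, ]
nrow(only_100)
```

An alternative that keeps all rows would be to wrap the condition in which(), which drops the NA results of the comparison.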


There are a number of interesting aspects to look at. We shall produce a bar chart comparing the private and public access in countries.

med_access$Country <- reorder(med_access$Country, med_access$Private_Access) # Reorders the country levels according to values of private access (smallest to largest; after coord_flip the largest appears at the top)

library(reshape2) # This is required for the melt function.
med_access_melt <- melt(med_access) # Rearrange the data for ggplot
# This creates a dataframe with three columns
# Country = Country name
# value = access percentage (either Private_Access or Public_Access).
# variable = indicates whether a row refers to Public_Access or Private_Access.

ggplot(med_access_melt, aes(x = Country, y = value, fill = variable)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_fill_discrete(name = "Access", labels = c("Private sector", "Public sector")) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  theme_bw() + # Note: a complete theme added after theme() resets the setting on the previous line
  labs(title = "Access to essential medication", x = "Country",
       y = "Percent of patients with access to essential medication") +
  coord_flip() # Flip axis to make country labels readable


Figure 5.11 Access to essential medication.

Let’s find the extreme values. There are two countries where public sector patients have access to all (100%) essential medications.

med_access[med_access$Public_Access == 100,]

##               Country Private_Access Public_Access
## 10       Cook Islands           33.3           100
## 30 Russian Federation          100.0           100

Let’s see which countries provide 0% access to essential medication for people in the public sector.

med_access[med_access$Public_Access == 0,]

##   Country Private_Access Public_Access
## 4  Brazil           76.7             0


Since an individual’s income and available options in later life partly depend on their level of education, inequality in educational access or attainment can lead to inequality in income and other outcomes. We will focus on gender inequality in educational attainment, using data from the Our World in Data website, to make our own comparisons between countries and over time. Choose one of the following measures to answer Question 4:

• gender gap in primary education (share of enrolled female primary education students)
• share of women, between 15 and 19 years old, with no education
• share of women, 15 years and older, with no education.

To download the data for your chosen measure:

• Go to the ‘Educational Mobility and Inequality’ section (http://tinyco.re/8784776) of the Our World in Data website, and find the chart for your chosen measure.
• Click the ‘Data’ button at the bottom of the chart, then click the blue button that appears to download the data in csv format.

4 For your chosen measure:

(a) Choose ten countries that have data from 1980 to 2010. Plot your chosen countries on the same line chart, with year on the horizontal axis and share on the vertical axis. Make sure to include a legend showing country names and label the axes appropriately.

(b) Describe any general patterns in gender inequality in education over time, as well as any similarities and differences between countries.

(c) Calculate the change in the value of this measure between 1980 and 2010 for each country chosen. Sort these countries according to this value, from the smallest change to largest change. Now plot a column chart showing the change (1980 to 2010) on the vertical axis, and country on the horizontal axis. Add data labels to display the value for each country.

(d) Which country had the largest change? Which country had the smallest change?

(e) Suggest some explanations for your observations in Questions 4(b) and 4(d). (You may want to do some background research on your chosen countries.)

(f) Discuss the limitations of using this measure to assess the degree of gender inequality in educational attainment and propose some alternative measures.


R WALK-THROUGH 5.11 Using line and bar charts to illustrate changes in time

Import data and plot a line chart

First we import the data into R.

data_prim <- read.csv("OWID-gender-gap-in-primary-education.csv") # Open the csv file from the working directory
str(data_prim)

## 'data.frame':    8780 obs. of  4 variables:
##  $ Entity                                         : Factor w/ 250 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Code                                           : Factor w/ 207 levels "","ABW","AFG",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Year                                           : int  1970 1971 1972 1973 1974 1975 1976 1977 1978 1980 ...
##  $ Primary.education..pupils....female.....female.: num  14.1 13.7 14 14.5 14.4 ...

The data is now in the dataframe data_prim. The variable of interest (percentage of female enrolment) has a very long name, so we will shorten it to PFE.

names(data_prim)[4] <- "PFE"

As usual, ensure that you understand the definition of the variables you are using. On the Our World in Data website, look at the ‘Sources’ tab underneath the graph for a definition:

Percentage of female enrollment is calculated by dividing the total number of female students at a given level of education by the total enrolment at the same level, and multiplying by 100.

This definition implies that if the primary-school-age population was 50% male and 50% female and all children were enrolled in school, the percentage of female enrolment would be 50.
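The definition can be written out as a one-line calculation. The function name pfe and the enrolment figures below are hypothetical and serve only to illustrate the arithmetic.

```r
# Percentage of female enrolment, following the definition quoted above.
# The function name and the enrolment figures are hypothetical.
pfe <- function(female_enrolled, total_enrolled) {
  100 * female_enrolled / total_enrolled
}

pfe(500, 1000) # Equal numbers of girls and boys enrolled
pfe(400, 1000) # Fewer girls than boys enrolled
```

The first call corresponds to the balanced case described above, and the second to a case where girls make up less than half of all pupils.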


Before choosing ten countries, we check which countries are in the dataset using the unique(data_prim$Entity) command. Here we only show the first few countries using the head() command.

head(unique(data_prim$Entity))

## [1] Afghanistan         Albania             Algeria
## [4] Andorra             Angola              Antigua and Barbuda
## 250 Levels: Afghanistan Albania Algeria Andorra ... Zimbabwe

You can find nearly all the countries in the world in this list (plus some sub- and supra-country entities, like OECD countries, which explains why the variable wasn’t initially called ‘country’).

Plot a line chart for a selection of countries

We now make a selection of ten countries. (You can of course make a different selection, but ensure that you get the spelling right, as R is unforgiving!)

temp_data<- subset(data_prim, Entity %in% c("Albania","China","France", "India","Japan", "Switzerland", "United Arab Emirates", "United Kingdom","Zambia","United States"))

Now we plot the data, similar to what we did earlier.

ggplot(temp_data, aes(x = Year, y = PFE, color = Entity)) +
  geom_line(size = 1) + # size = 1 sets the line thickness.
  theme_bw() + # Remove grey background
  scale_colour_brewer(palette = "Paired", name = "Country") + # Change the set of colours used and set the legend title (a second colour scale would replace the first)
  ylab("Percentage (%)") + # Set the vertical axis label
  ggtitle("Female pupils as a percentage of total enrolment in primary education") # Add a title


Figure 5.12 Female pupils as a percentage of total enrolment in primary education.

Plot a column chart with sorted values

To calculate the change in the value of this measure between 1980 and 2010 for each country chosen, we have to manipulate the data so that we have one entry (row) for each entity (or country), but two different variables for the percentage of female enrolment PFE (one for each year).

temp_data_80 <- subset(temp_data, Year == "1980") # Select all data for 1980
names(temp_data_80)[4] <- "PFE_80" # Rename variable to include year
temp_data_10 <- subset(temp_data, Year == "2010") # Select all data for 2010
names(temp_data_10)[4] <- "PFE_10" # Rename variable to include year
temp_data2 <- merge(temp_data_80, temp_data_10, by = c("Entity"))

Have a look at temp_data2, which now contains two variables for every country, PFE_80 and PFE_10. It also has multiple variables for Year (Year.x and Year.y) and Code (Code.x and Code.y), but that is a minor issue and you could delete one of each pair. Now we can calculate the difference.

temp_data2$dPFE <- temp_data2$PFE_10 - temp_data2$PFE_80
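The reshaping and differencing steps can be traced on a small example. The sketch below uses a made-up data frame (toy, with hypothetical entities and values) to show what merge returns and how the difference is formed.

```r
# Hypothetical long-format data: one row per entity and year
toy <- data.frame(
  Entity = c("A", "A", "B", "B"),
  Year   = c(1980, 2010, 1980, 2010),
  PFE    = c(40, 48, 49, 49.5)
)

toy_80 <- subset(toy, Year == 1980)
names(toy_80)[3] <- "PFE_80"
toy_10 <- subset(toy, Year == 2010)
names(toy_10)[3] <- "PFE_10"

# merge() matches rows on Entity; Year appears in both inputs,
# so it is returned twice, as Year.x and Year.y
toy_wide <- merge(toy_80, toy_10, by = "Entity")
names(toy_wide)

# Change between the two years, in percentage points
toy_wide$dPFE <- toy_wide$PFE_10 - toy_wide$PFE_80
toy_wide[, c("Entity", "PFE_80", "PFE_10", "dPFE")]
```

By default merge keeps only entities that appear in both inputs, so a country with no observation in either 1980 or 2010 would drop out of the merged data.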


You could plot a separate chart for each year and check the order, but here we show how to create one chart with the data from both years.

ggplot(temp_data2, aes(x = reorder(Code.x, dPFE), y = dPFE)) +
  geom_bar(stat = "identity", fill = "tomato3") +
  labs(title = "Change (%) in female pupils’ share of total enrolment in primary education",
       x = "Country", y = "Percentage change (%)",
       caption = "source: https://ourworldindata.org/educational-mobility-inequality") +
  theme_bw()

Figure 5.13 Change in percentage of female enrolment in primary school from 1980 to 2010.

It is apparent that some countries saw very little or no change (the countries that already had very high PFE). The countries with initially low female participation have significantly improved.

EMPIRICAL PROJECT 5 SOLUTIONS

These are not model answers. They are provided to help students, including those doing the project outside a formal class, to check their progress while working through the questions using the Excel or R walk-throughs. There are also brief notes for the more interpretive questions. Students taking courses using Doing Economics should follow the guidance of their instructors.

PART 5.1 MEASURING INCOME INEQUALITY

1 China and the US are used as examples.

China, 1980

Cumulative share of the population (%)    Cumulative share of income (%)
  0                                         0.00
 10                                         3.14
 20                                         7.63
 30                                        13.43
 40                                        20.47
 50                                        28.82
 60                                        38.55
 70                                        49.92
 80                                        63.28
 90                                        79.33
100                                       100.00

Solution figure 5.1 Table showing cumulative income shares for China (1980).


China, 2014

Cumulative share of the population (%)    Cumulative share of income (%)
  0                                         0.00
 10                                         0.92
 20                                         2.84
 30                                         5.81
 40                                         9.95
 50                                        15.44
 60                                        22.55
 70                                        31.75
 80                                        43.95
 90                                        61.43
100                                       100.00

Solution figure 5.2 Table showing cumulative income shares for China (2014).

United States, 1980

Cumulative share of the population (%)    Cumulative share of income (%)
  0                                         0.00
 10                                         2.29
 20                                         6.22
 30                                        11.52
 40                                        18.08
 50                                        25.89
 60                                        35.04
 70                                        45.73
 80                                        58.44
 90                                        74.39
100                                       100.00

Solution figure 5.3 Table showing cumulative income shares for the US (1980).


United States, 2014

Cumulative share of the population (%)    Cumulative share of income (%)
  0                                         0.00
 10                                         1.88
 20                                         5.14
 30                                         9.66
 40                                        15.41
 50                                        22.45
 60                                        30.92
 70                                        41.09
 80                                        53.58
 90                                        69.90
100                                       100.00

Solution figure 5.4 Table showing cumulative income shares for the US (2014).

2 (a) Solution figures 5.5 and 5.6 show the Lorenz curves for China and the US. The perfect equality line drawn in these figures belongs to the solution to Question 2(b).

(b) Solution figures 5.5 and 5.6 show the Lorenz curves for China and the US, with the perfect equality line.

Solution figure 5.5 Lorenz curves for China.


Solution figure 5.6 Lorenz curves for the US.

3 (a) The area between the perfect equality line and the Lorenz curve reflects inequality. Inequality in both countries widened between 1980 and 2014. The change in China is far larger than that in the US.

(b) Although the income distribution was more equal in China than in the US in 1980, it was less equal in China than in the US in 2014.

(c) China had a mostly planned economy in 1980, which prioritized equality. Since 1978, China has undertaken waves of reforms to marketize the economy and improve efficiency. These market reforms created opportunities for private gain through entrepreneurial activities. Although rapid growth and high inequality are negatively correlated both in high income countries and in a group of ‘catching up’ countries, as discussed in Section 19.11 (https://tinyco.re/1686411) of The Economy, rapid growth in China has come at the cost of rising inequality. Inequality in the US is higher than in most developed countries. Many people attribute the higher inequality to policies favouring the rich. Worsening inequality in the US can be explained by a range of factors, including tax policies that favour the rich, education policies that dampen the opportunities for intergenerational mobility (see Section 19.2 (https://tinyco.re/3301931) of The Economy), skill-biased technological change that raises the incomes of workers with skills complementary to ICT and reduces those of workers with skills substitutable by ICT, and the decline of labour unions (http://tinyco.re/434258).

4 Solution figures 5.7 and 5.8 show the Lorenz curves for China and the US with Gini coefficients labelled.
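As a rough check on the size of these coefficients, the Gini coefficient can be approximated directly from the cumulative shares in the solution tables, as one minus twice the area under the Lorenz curve. The sketch below uses the figures for China and a hypothetical helper function, gini_from_lorenz, which approximates the area with straight lines between the decile points. Because this ignores inequality within each decile, the results may differ slightly from the coefficients labelled in the figures, depending on how those were calculated.

```r
# Cumulative income shares for China, taken from the solution tables above
pop_share <- seq(0, 1, by = 0.1)
lorenz_80 <- c(0, 3.14, 7.63, 13.43, 20.47, 28.82, 38.55, 49.92, 63.28, 79.33, 100) / 100
lorenz_14 <- c(0, 0.92, 2.84, 5.81, 9.95, 15.44, 22.55, 31.75, 43.95, 61.43, 100) / 100

# Hypothetical helper: Gini = 1 - 2 * (area under the Lorenz curve),
# with the area approximated by the trapezoid rule
gini_from_lorenz <- function(p, l) {
  area <- sum(diff(p) * (head(l, -1) + tail(l, -1)) / 2)
  1 - 2 * area
}

gini_80 <- gini_from_lorenz(pop_share, lorenz_80)
gini_14 <- gini_from_lorenz(pop_share, lorenz_14)
round(c(China_1980 = gini_80, China_2014 = gini_14), 2)
```

The same function can be applied to the two US tables by replacing the vectors of cumulative income shares.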


Solution figure 5.7 Lorenz curves for China, with labelled Gini coefficients.

Solution figure 5.8 Lorenz curves for the US, with labelled Gini coefficients.

5 (a) These ratios all help give policymakers an idea of the distribution of income in the economy and where income is concentrated. Policymakers may use the information to decide on policies favouring certain income deciles of the population.

• The 90/10 ratio compares the two extremes of the income distribution and tells policymakers about the difference between the richest and the poorest. Policymakers can use the information to decide how much income to redistribute to the poorest.
• The 90/50 ratio tells policymakers about how the middle class is doing relative to the richest. The ratio can also be used to determine the distribution of tax burden among the relatively rich population.
• The 50/10 ratio reveals the distribution of income among the relatively poor population. Policymakers can use the information to determine the amount of income to be redistributed to each group, and to determine who is in relative poverty (many governments define the poverty line relative to the median income).

(b) See Section 19.8 (https://tinyco.re/2299150) of The Economy to see how governments can affect income inequality.

6 (a) Students will plot the data for the ratio measures by changing the variable selected for the Gini coefficient.

(b) The inter-decile ratios are calculated as the ratios between incomes of various deciles of income distribution. The 90/10 ratio, for example, is the ratio of the income of the 9th decile to the income of the 1st decile. Larger values mean the income from one decile of the distribution is higher relative to the income from another decile.
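The calculation can be illustrated with a short sketch. The decile incomes below are made up purely for illustration and do not come from the project data.

```r
# Hypothetical incomes for deciles 1 to 10 (made-up numbers)
decile_income <- c(10, 15, 20, 25, 30, 36, 44, 55, 72, 120)

ratio_90_10 <- decile_income[9] / decile_income[1] # 9th decile relative to 1st
ratio_90_50 <- decile_income[9] / decile_income[5] # 9th decile relative to 5th
ratio_50_10 <- decile_income[5] / decile_income[1] # 5th decile relative to 1st

c("90/10" = ratio_90_10, "90/50" = ratio_90_50, "50/10" = ratio_50_10)
```

Note that the 90/10 ratio is the product of the 90/50 and 50/10 ratios, so the two narrower ratios show how much of the overall gap arises in the upper and lower halves of the distribution.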

(c) Countries that rank highly on the Gini coefficient also generally rank highly on ratio measures. There are, however, some exceptions. Slovenia, for example, while being the most equal country in terms of the Gini coefficient in 2015, was only the 5th most equal country in terms of the 90/10 ratio. The potential differences in rankings of different measures mean it is important to look at more than one measure. The Gini coefficient is an overall measure of a distribution that may mask extreme inequalities between certain groups of the population.

7 Measures chosen here are the share of income going to the top 1%, and the share of children living in relative poverty.

• Share of income going to the top 1%: This measure looks at the high end of the income distribution (the right tail). Larger values indicate that the very rich have a larger share of the income, and that there is therefore more inequality between the very rich and the rest of society. However, this is a narrower measure of inequality than the Gini coefficient because it only tells us about how the very rich are doing.
• Share of children living in relative poverty: This measure is defined as the share of children living in a household with less than half of the disposable income of the median household. A larger value indicates that a larger proportion of children are living in relative poverty.


PART 5.2 MEASURING OTHER KINDS OF INEQUALITY

1 (a) Solution figure 5.9 shows the mortality inequality Gini coefficients for the ten countries.

(b) Mortality inequality has been falling over time in all countries except Russia. Developing countries tend to have greater mortality inequality than developed countries. Industrialized, richer countries seem to have realized most of the available improvement (reaching a mortality Gini of around 0.1) since the 1960s. Exceptions to this are India and Brazil, which are both still on a significant downward trend and still not close to a mortality Gini value of 0.1. The only country in this set of countries where some of the gains are being reversed is Russia, although the latest upward movement is fairly modest, and one may interpret this as Russia having settled on a higher mortality Gini of about 0.15.

2 (a) Solution figure 5.10 shows Gini coefficients by country for 1952.

(b) Solution figure 5.11 shows Gini coefficients by country for 2002.

(c) The rankings are different in 1952 and 2002. Japan, for example, moved up five places in the ranking to become the second most equal country in 2002. The rapid economic development in Japan has led to rising life expectancy. Living to old age is now the norm in Japan rather than a privilege enjoyed only by the rich. The rising proportion of elderly voters has contributed to policies aimed at improving elderly care, which have reduced the variation in life expectancy. The United States, on the other hand, dropped four places to become a relatively less equal nation in the group. The high costs of healthcare may prevent poor people from accessing treatment, especially if uninsured. It is more likely for disadvantaged groups in society such as minorities or part-time workers to lack insurance coverage.

3 This example looks at access to essential medicines.

Solution figure 5.9 Mortality inequality Gini coefficients (1952–2002).


Solution figure 5.10 Countries ranked according to mortality inequality Gini coefficients in 1952.

Solution figure 5.11 Countries ranked according to mortality inequality Gini coefficients in 2002.

(a) The median availability of selected generic medicines (in percentage terms) is a measure of the access to treatment. Data on availability, defined as the percentage of medicine outlets where a medicine was found on a given day, are collected through surveys in multiple regions for each country.

(b) Solution figures 5.12 and 5.13 provide two charts summarizing the data.

Solution figure 5.12 Median availability of selected generic medicines in the private sector.

Solution figure 5.13 Median availability of selected generic medicines in the public sector.

(c) There are large disparities in the availability of medicines across countries. For example, availability in the Russian Federation is 100%, whereas in China it is about 15%. The availability of medicines within a country can differ depending on whether an outlet belongs to the public or the private sector. In some countries, such as Brazil, private sector availability of medicines is far higher than that in the public sector. The reverse is true for other countries such as Sao Tome and Principe. Note that a higher availability of medicines in the private sector does not necessarily mean greater access for the entire population, since the private sector is only open to individuals with the ability to pay. This disparity means that richer individuals can access a wider range of medical treatments.


The data has some limitations. The basket of medicines differs across countries. The data reflects availability on the day of data collection, which may not be a representative day. Outlets could stockpile medicines in expectation of the arrival of the data collection team. Availability does not account for the dosage and strengths of the products.

4 Solution figure 5.14 looks at the gender gap in primary education.

(a) Note: It is difficult to find ten countries without any missing data point between 1980 and 2010. Countries with full data may not be as interesting as others. The lines below connect all available data points.

(b) For most countries in the selected group, the share of female pupils in primary education fluctuated around levels just below 50% throughout the period. China and India were the most unequal countries in 1980. India had the greatest improvement in equality over the period, and by 2010 the female share reached nearly 48%. Note the inverse U-shape for China, which could be due to the increasing gender imbalance in the school-age population (around 112 males per 100 females in 2010 (http://tinyco.re/7113498)).

(c) Solution figure 5.15 shows the percentage change in the measure between 1980 and 2010.

(d) India had the largest change, whereas France had the smallest change.

Solution figure 5.14 Female pupils as a percentage of total enrolment in primary education.

(e) India had the lowest share of enrolled female primary education students in the group in 1980. Rapid development and changing beliefs have contributed to the efforts to reduce gender education inequalities. Universal primary education and the promotion of gender equality are among the eight Millennium Development Goals (MDGs), which India committed in 2000 to achieving by 2015. France, as a developed country, had relatively high equality from the beginning of the period and hence experienced relatively little change over the period (there was less scope for improvement). From Question 4(c), it is apparent that countries which already had a very high percentage of female enrolment (PFE) saw little or no change. Those countries with initially low female participation have significantly improved. The data demonstrates that the past few decades have seen a significant improvement in access to education for girls. If you repeated the above analysis for all countries, you would see similar results.

(f) The measure depends on the gender composition of the population. If there are more male than female children of primary schooling age in a country, then the female share of enrolment will be less than 50% even if all children are enrolled. The ratio of the female enrolment rate to the male enrolment rate, which provides a population-adjusted measure of gender parity, can be used instead. Remember that all we can see here is enrolment in primary education. It is possible that males could receive more education overall (secondary and higher levels). In fact, if you go back to the ‘Educational Mobility and Inequality’ section (http://tinyco.re/8784776) of the Our World in Data website, you will see that in many regions females still receive a significantly smaller amount of education overall.
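The difference between the two measures can be seen in a small numerical sketch. All counts below are made up for illustration: the population has more boys than girls, and 90% of each group is enrolled.

```r
# Hypothetical counts for one country (made-up numbers)
girls_pop <- 450      # Girls of primary school age
boys_pop <- 550       # Boys of primary school age
girls_enrolled <- 405 # 90% of girls
boys_enrolled <- 495  # 90% of boys

# Share of female pupils in total enrolment (the measure used in this question)
pfe <- 100 * girls_enrolled / (girls_enrolled + boys_enrolled)

# Ratio of the female enrolment rate to the male enrolment rate
parity <- (girls_enrolled / girls_pop) / (boys_enrolled / boys_pop)

c(PFE = pfe, parity_ratio = parity)
```

In this example the female share of enrolment is below 50% even though girls and boys are equally likely to be enrolled, which is the situation the ratio of enrolment rates is designed to detect.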

Solution figure 5.15 Change (%) in female pupils’ share of total enrolment in primary education.
