Measuring Inequality: Lorenz Curves and Gini Coefficients
EMPIRICAL PROJECT 5
MEASURING INEQUALITY: LORENZ CURVES AND GINI COEFFICIENTS

LEARNING OBJECTIVES
In this project you will:
• draw Lorenz curves and interpret the Gini coefficient
• calculate and interpret alternative measures of income inequality
• research other dimensions of inequality and how they are measured.

Key concepts
• Concepts needed for this project: ratio and decile.
• Concepts introduced in this project: Gini coefficient and Lorenz curve.

CORE PROJECTS
This empirical project is related to material in:
• Unit 5 (https://tinyco.re/5600166) of Economy, Society, and Public Policy
• Unit 5 (https://tinyco.re/5986623) and Unit 19 (https://tinyco.re/1408798) of The Economy.

INTRODUCTION
There are many criteria that policymakers can use to assess outcomes or allocations of economic interactions, in order to evaluate which outcome is ‘better’ than the others. One important criterion for assessing an allocation is efficiency, and another is fairness. Outcomes that economists would define as ‘efficient’ (those in which no one can be made better off without making someone else worse off) may be undesirable because they are unfair. To read more about how economists use the word ‘efficiency’, see Section 3.3 (https://tinyco.re/2876321) in Economy, Society, and Public Policy.

For example, a situation where a small fraction of the population lives in luxury and everybody else struggles to survive may be efficient, but few people would say it is desirable due to the vast inequality between the rich and poor. In this case, policymakers might intervene by implementing a tax system where richer people pay a greater proportion of their income than poorer people (a progressive tax), and some revenue collected in taxes is transferred to the poor. Empirical evidence on people’s views about the fairness of the income distribution and further discussion of the concept of fairness can be found in Sections 3.4 (https://tinyco.re/7883386) and 3.5 (https://tinyco.re/7126396) of Economy, Society, and Public Policy.

To assess inequality economists often use a measure called the Gini coefficient, which is based on the differences in incomes, wealth, or some other measure between people. We will first look at how the Gini coefficient is calculated and compare it with other measures of inequality between the rich and poor, such as the 90/10 ratio. We will also use Lorenz curves to show the entire distribution of income in a country. Then, we will look at gender inequality to see how this dimension can be measured. Finally, we will look at how inequality can be accounted for in indices of wellbeing, such as the Human Development Index (HDI).

To learn more about how the Gini coefficient is calculated from differences in people’s endowments, see Section 5.8 (https://tinyco.re/5748024) of Economy, Society, and Public Policy.

Lorenz curve: A graphical representation of inequality of some quantity such as wealth or income. Individuals are arranged in ascending order by how much of this quantity they have, and the cumulative share of the total is then plotted against the cumulative share of the population. For complete equality of income, for example, it would be a straight line with a slope of one. The extent to which the curve falls below this perfect equality line is a measure of inequality. See also: Gini coefficient.

Gini coefficient: A measure of inequality of any quantity such as income or wealth, varying from a value of zero (if there is no inequality) to one (if a single individual receives all of it).
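To make the idea of a measure ‘based on the differences in incomes between people’ concrete, here is a minimal sketch in R. It uses one common formula (half the mean absolute difference between all pairs of people, divided by mean income), which may differ in its small-sample details from the calculation in Section 5.8. The function name gini_sketch and both income vectors are made up for illustration.

```r
# gini_sketch is a hypothetical helper written for this illustration only.
# It computes half the relative mean absolute difference between all pairs.
gini_sketch <- function(x) {
  n <- length(x)
  # Sum of |x_i - x_j| over all ordered pairs (i, j), including i = j
  total_diff <- sum(abs(outer(x, x, "-")))
  total_diff / (2 * n^2 * mean(x))
}

equal_incomes  <- c(10, 10, 10, 10)  # made-up data: everyone has the same income
one_has_it_all <- c(0, 0, 0, 40)     # made-up data: one person receives everything

gini_equal   <- gini_sketch(equal_incomes)
gini_extreme <- gini_sketch(one_has_it_all)

print(c(equal = gini_equal, extreme = gini_extreme))
```

With this particular formula, the value when one person receives everything is (n - 1)/n for a population of n people, so it approaches one only as the population becomes large.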
EMPIRICAL PROJECT 5
WORKING IN R

R-SPECIFIC LEARNING OBJECTIVES
In addition to the learning objectives for this project, in this section you will learn how to use loops to repeat specified tasks for a list of values (Note: this is an extension task so may not apply to all users).

GETTING STARTED IN R
For this project you will need the following packages:
• tidyverse, to help with data manipulation
• readxl, to import an Excel spreadsheet
• ineq, to calculate inequality measures
• reshape2, to rearrange a dataframe.

If you need to install any of these packages, run the following code:

install.packages(c("readxl", "tidyverse", "ineq", "reshape2"))

You can import these libraries now, or when they are used in the R walk-throughs below.

library(readxl)
library(tidyverse)
library(ineq)
library(reshape2)

PART 5.1 MEASURING INCOME INEQUALITY
One way to visualize the income distribution in a population is to draw a Lorenz curve. This curve shows the entire population lined up along the horizontal axis from the poorest to the richest. The height of the curve at any point on the vertical axis indicates the fraction of total income received by the fraction of the population given by that point on the horizontal axis.

We will start by using income decile data from the Global Consumption and Income Project to draw Lorenz curves and compare changes in the income distribution of a country over time. Note that income here refers to market income, which does not take into account taxes or government transfers (see Section 5.9 (https://tinyco.re/1276323) of Economy, Society, and Public Policy for further details).

To answer the question below:
• Go to the Globalinc website (http://tinyco.re/9553483) and download the Excel file containing the data by clicking ‘xlsx’.
• Save it in an easily accessible location, such as a folder on your Desktop or in your personal folder.
• Import the data into R as explained in R walk-through 5.1.
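Before working with the real data, it may help to see how the heights of a Lorenz curve follow from decile incomes. The sketch below uses made-up decile values (not taken from the dataset). Because each decile contains the same number of people, the share of total income received by the poorest k deciles is the sum of their mean incomes divided by the sum of all ten decile means. The variable names are chosen for this illustration only.

```r
# Hypothetical mean incomes for ten deciles, poorest to richest (made-up values)
decile_income <- c(2, 4, 5, 6, 7, 8, 10, 12, 16, 30)

# Horizontal axis: cumulative share of the population (10%, 20%, ..., 100%)
pop_share <- seq(0.1, 1, by = 0.1)

# Vertical axis: cumulative share of total income received by that population share
lorenz_height <- cumsum(decile_income) / sum(decile_income)

# Under perfect equality every decile has the same income,
# so the curve is a straight line with a slope of one
equality_height <- cumsum(rep(1, 10)) / 10

lorenz_points <- data.frame(pop_share, lorenz_height, equality_height)
print(lorenz_points)
```

In each row, the gap between equality_height and lorenz_height shows how far the curve falls below the perfect equality line at that point.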
R WALK-THROUGH 5.1
Importing an Excel file (either .xlsx or .xls format) into R

As we are dealing with an Excel file, we use the read_excel function from the readxl package. The file is called GCIPrawdata.xlsx, but before you import the file into R, open the datafile in Excel to understand its structure. You will see that the data is on one worksheet (which is convenient), and that the headings for the variables are in the third row. Hence we will use the skip = 2 option in the read_excel function to skip the first two rows.

library(tidyverse)
library(readxl)
decile_data <- read_excel("GCIPrawdata.xlsx", skip = 2)

The data is now in a tibble (like a spreadsheet for R). Let’s look at the first few rows:

head(decile_data)
## # A tibble: 6 x 14
##   Country      Year `Decile 1 Income` `Decile 2 Income` `Decile 3 Income`
##   <chr>       <dbl>             <dbl>             <dbl>             <dbl>
## 1 Afghanistan  1980               206               350               455
## 2 Afghanistan  1981               212               361               469
## 3 Afghanistan  1982               221               377               490
## 4 Afghanistan  1983               238               405               527
## 5 Afghanistan  1984               249               424               551
## 6 Afghanistan  1985               256               435               566
## # ... with 9 more variables: `Decile 4 Income` <dbl>, `Decile 5
## #   Income` <dbl>, `Decile 6 Income` <dbl>, `Decile 7 Income` <dbl>,
## #   `Decile 8 Income` <dbl>, `Decile 9 Income` <dbl>, `Decile 10
## #   Income` <dbl>, `Mean Income` <dbl>, Population <dbl>

As you can see, we have an entry (row) for every country and every year. The first entry (for Afghanistan in the year 1980) is 206, and it is the value for the variable Decile 1 Income. This value indicates that the mean annual income of the poorest 10% in Afghanistan was the equivalent of 206 US Dollars (in 1980, adjusted using purchasing power parity). The mean income of the next richest 10% (those in the 11th to 20th percentiles for income) was 350.

To see the list of variables, we examine the structure of decile_data.

str(decile_data)
## Classes 'tbl_df', 'tbl' and 'data.frame': 4799 obs. of 14 variables:
## $ Country         : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ Year            : num 1980 1981 1982 1983 1984 ...
## $ Decile 1 Income : num 206 212 221 238 249 256 268 243 223 202 ...
## $ Decile 2 Income : num 350 361 377 405 424 435 457 414 380 344 ...
## $ Decile 3 Income : num 455 469 490 527 551 566 594 539 493 447 ...
## $ Decile 4 Income : num 556 574 599 644 674 692 726 659 603 547 ...
## $ Decile 5 Income : num 665 686 716 771 806 828 869 788 722 654 ...
## $ Decile 6 Income : num 793 818 854 919 961 ...
## $ Decile 7 Income : num 955 986 1029 1107 1157 ...
## $ Decile 8 Income : num 1187 1225 1278 1376 1438 ...
## $ Decile 9 Income : num 1594 1645 1717 1848 1932 ...
## $ Decile 10 Income: num 3542 3655 3814 4105 4291 ...
## $ Mean Income     : num 1030 1063 1109 1194 1248 ...
## $ Population      : num 13211412 12996923 12667001 12279095 11912510 ...

In addition to the country, year, and the ten income deciles we have mean income and the population.

1 Choose two countries. You will be using their data, for 1980 and 2014, as the basis for your Lorenz curves. Use the country data you have selected to calculate cumulative income shares. (Remember that each decile represents 10% of the population.)

R WALK-THROUGH 5.2
Calculating cumulative shares using the cumsum function

Here we have chosen China (a country that recently underwent enormous economic changes) and the US (a developed country).
sel_Year <- c(1980, 2014)
sel_Country <- c("United States", "China")
# Select the data for the chosen countries and years
temp <- subset(decile_data,
  (decile_data$Country %in% sel_Country) &
  (decile_data$Year %in% sel_Year))
temp
## # A tibble: 4 x 14
##   Country        Year `Decile 1 Income` `Decile 2 Income` `Decile 3 Incom~
##   <chr>         <dbl>             <dbl>             <dbl>            <dbl>
## 1 China          1980                79               113              146
## 2 China          2014               448               927             1440
## 3 United States  1980              3392              5820             7855
## 4 United States  2014              3778              6534             9069
## # ... with 9 more variables: `Decile 4 Income` <dbl>, `Decile 5
## #   Income` <dbl>, `Decile 6 Income` <dbl>, `Decile 7 Income` <dbl>,
## #   `Decile 8 Income` <dbl>, `Decile 9 Income` <dbl>, `Decile 10
## #   Income` <dbl>, `Mean Income` <dbl>, Population <dbl>

Before we calculate cumulative income shares, we need to calculate the total income using the mean income and the population size.
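As a check on the arithmetic behind this step, the sketch below uses the Afghanistan 1980 values shown in the output of R walk-through 5.1 (it is not part of the walk-through itself, and the variable names are chosen for illustration). Total income is mean income times population. Each decile holds one-tenth of the population, so a decile’s total income is its mean income times one-tenth of the population. The two ways of reaching total income should agree up to the rounding in the published figures.

```r
# Afghanistan, 1980: values copied from the output in R walk-through 5.1
decile_means <- c(206, 350, 455, 556, 665, 793, 955, 1187, 1594, 3542)
mean_income  <- 1030
population   <- 13211412

# Total income from the overall mean
total_income <- mean_income * population

# Total income of each decile: decile mean times one-tenth of the population
decile_totals <- decile_means * population / 10

# Relative gap between the two totals (small, due to rounding in the data)
rel_gap <- abs(sum(decile_totals) - total_income) / total_income

print(total_income)
print(rel_gap)
```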