Faculty of Business, Economics and Social Sciences
Department of Social Sciences
University of Bern Social Sciences Working Paper No. 15
Estimating Lorenz and concentration curves in Stata
Ben Jann
This paper is forthcoming in the Stata Journal.
Current version: October 27, 2016 First version: January 12, 2016
http://ideas.repec.org/p/bss/wpaper/15.html http://econpapers.repec.org/paper/bsswpaper/15.htm
University of Bern Tel. +41 (0)31 631 48 11 Department of Social Sciences Fax +41 (0)31 631 48 17 Fabrikstrasse 8 [email protected] CH-3012 Bern www.sowi.unibe.ch
Estimating Lorenz and concentration curves in Stata
Ben Jann Institute of Sociology University of Bern [email protected]
October 27, 2016
Abstract
Lorenz and concentration curves are widely used tools in inequality research. In this paper I present a new Stata command called lorenz that estimates Lorenz and concentration curves from individual-level data and, optionally, displays the results in a graph. The lorenz command supports relative as well as generalized, absolute, unnor- malized, or custom-normalized Lorenz or concentration curves, and provides tools for computing contrasts between different subpopulations or outcome variables. Variance estimation for complex samples is fully supported. Keywords: Stata, lorenz, Lorenz curve, concentration curve, inequality, income distribution, wealth distribution, graphics Contents
1Introduction 3
2Methodsandformulas 4 2.1 Lorenzcurve...... 4 2.2 Equalitygapcurve ...... 4 2.3 Total (unnormalized) Lorenz curve ...... 5 2.4 Generalized Lorenz curve ...... 5 2.5 Absolute Lorenz curve ...... 5 2.6 Concentrationcurve ...... 6 2.7 Renormalization ...... 6 2.8 Contrasts ...... 8 2.9 Point estimation ...... 8 2.10 Variance estimation ...... 9
3Thelorenzcommand 11 3.1 Syntax of lorenz estimate ...... 11 3.2 Syntax of lorenz contrast ...... 14 3.3 Syntax of lorenz graph ...... 15
4Examples 17 4.1 Basic application ...... 17 4.2 Subpopulation estimation ...... 19 4.3 ContrastsandLorenzdominance ...... 21 4.4 Concentration curves and renormalization ...... 24
5Acknowledgments 25
2 1Introduction
Lorenz curves and concentration curves are widely used tools for the analysis of economic in- equality and redistribution (see, e.g., Cowell, 2011; Lambert, 2001). Yet, no official command for the estimation of Lorenz and concentration curves is offered in Stata. In this paper I present an implementation of such a command, called lorenz.Although other user commands with related functionality do exist,1 Ibelievethatlorenz is a worth- while contribution that will prove beneficial to inequality researchers. The command com- putes and, optionally, graphs relative, total (unnormalized), generalized, or absolute Lorenz and concentration curves from individual-level data. Standard errors and confidence intervals are provided and estimation from complex samples is fully supported. Furthermore, lorenz is well suited for subpopulation analysis and offers options to compute contrasts between subpopulations or between outcome variables (including standard errors). The command also offers custom normalization of results, which can be useful for comparing subpopula- tions or outcome variables. Finally, lorenz saves its results in the e() returns for easy processing by post-estimation commands. In the remainder of the paper I will first discuss the relevant methods and formulas and then present the syntax and options of the lorenz command. After that, the usage of the command will be illustrated by a number of examples.
1Examples are glcurve (Jenkins and Van Kerm, 1999; Van Kerm and Jenkins, 2001), svylorenz (Jenkins, 2006), clorenz (Abdelkrim, 2005), or alorenz (Azevedo and Franco, 2006).
3 2Methodsandformulas
2.1 Lorenz curve
Let X be the outcome variable of interest (e.g. income). The cumulative distribution function of X is given as FX (x)=Pr X x and the quantile function (the inverse of the distribution { 1} function) is given as QX (p)=F (p)=inf x FX (x) p with p [0, 1].Forcontinuous X { | } 2 X,theordinatesoftherelative Lorenz curve are given as
p QX ydFX (x) LX (p)= 1 R 1 xdFX (x) 1 (see, e.g., Cowell, 2000; Lambert, 2001; HaoR and Naiman, 2010). Intuitively, a point on the Lorenz curve quantifies the proportion of total outcome of the poorest p 100 percent of the · population. This can easily be seen in the finite population form of LX (p), which is given as
N p i=1 XiI Xi QX LX (p)= N{ } P i=1 Xi with I A as an indicator function being equalP to 1 if A is true and 0 else. { } Furthermore, let Ji be an indicator for whether observation i belongs to subpopulation j or not (i.e., Ji =1if observation i belongs to subpopulation j and Ji =0else). The finite population form of the Lorenz curve of X in subpopulation j can then be written as
N p,j j i=1 XiI Xi QX Ji LX (p)= N{ } P i=1 XiJi p,j P where QX is the p-quantile of X in subpopulation j. The population-wide Lorenz curve is obtained by setting Ji =1for all observations.
Lorenz curves are typically displayed graphically with p on the horizontal axis and LX (p) on the vertical axis, although Lorenz (1905) originally proposed an opposite layout.
2.2 Equality gap curve
The equality gap curve quantifies the degree to which the proportion of total outcome of the poorest p 100 percent of the population deviates from the proportion of total outcome · these population members would get under an equal distribution. That is, the equality gap curve is equal to the difference between the equal distribution diagonal and the Lorenz curve. Formally, the (finite population form of the) equality gap curve of Y in subpopulation j is given as N X I X Qp,j J EG j (p)=p i=1 i { i X } i = p Lj (p) X N X P i=1 XiJi P 4 EG X (p) is equal to the proportion of total outcome that would have to be relocated to the poorest p 100 percent in order to provide them an average outcome equal to the population · average.
2.3 Total (unnormalized) Lorenz curve
In the finite population, define the (subpopulation specific) total Lorenz curve as
N TLj (p)= X I X Qp,j J X i { i X } i i=1 X The total Lorenz curve quantifies the cumulative sum of outcomes among the poorest p 100 · percent of the (sub-)population.
2.4 Generalized Lorenz curve
The ordinates of the (relative) Lorenz curve refer to cumulative outcome proportions. Hence, LX (1) = 1.Incontrast,theordinatesofthegeneralized Lorenz curve, GLX (p),refertothe cumulative outcome average.HenceGLX (1) = X, where X is the mean of X.Formally,the generalized Lorenz curve can be defined as
p QX GLX (p)= xdFX (x) Z 1 the finite population form of which is
1 N GL (p)= X I X Qp X N i { i X } i=1 X (see, e.g., Shorrocks, 1983; Cowell, 2000; Lambert, 2001). Furthermore, for subpopulation j, the generalized Lorenz curve can be written as
1 N GLj (p)= X I X Qp,j J X N i { i X } i i=1 Ji i=1 X N P where i=1 Ji is equal to the subpopulation size. P 2.5 Absolute Lorenz curve
The absolute Lorenz curve quantifies the degree to which the generalized Lorenz curve devi- ates from the equal distribution line in terms of the cumulative outcome average (see, e.g.,
5 Moyes, 1987). Formally, the (finite population form of the) absolute Lorenz curve of Y in subpopulation j is given as
1 N N N X J ALj (p)= X I X Qp,j J p X J = GLj (p) p i=1 i i X N i { i X } i i i X N i=1 Ji i=1 i=1 ! i=1 Ji X X P P P 2.6 Concentration curve
The Lorenz curve of outcome variable X refers cumulative outcome proportions of population members ranked by the values of X.UsinganalternativerankingvariableY , while still measuring outcome in terms of X,leadstotheso-calledconcentration curve.Formally,the (relative) concentration curve of X with respect to Y can be defined as
p QY 1 xfXY (x, y) dx dy LXY (p)= 1 1 R R 1 xdFX (x) 1 p where QY is the p-quantile of the distributionR of Y and fXY (x, y) is the density of the joint distribution of X and Y (see, e.g., Bishop et al., 1994). In the finite population the concentration curve simplifies to
N p i=1 XiI Yi QY LXY (p)= N{ } P i=1 Xi Furthermore, for subpopulation j the concentrationP curve can be written as
N p,j j i=1 XiI Yi QY Ji LXY (p)= N{ } P i=1 XiJi Total, generalized, or absolute concentration curvesP can be defined analogously.
2.7 Renormalization
Relative Lorenz curves are normalized with respect to the total of the analyzed outcome variable in the given population or subpopulation. Depending on context, it may be useful to apply a different type of normalization. For example, when analyzing labor income, we may want to express results with respect total income (labor income plus capital income). Likewise, when analyzing a subpopulation, we may be interested in results relative to another subpopulation or relative to the overall population. To normalize the relative Lorenz curve or the equality gap curve of X with respect to the total of Z (where Z may be the sum of several variables, possibly including X), let
N p,j XiI Xi Q Ji Lj,Z (p)= i=1 { X } and EG j,Z (p)=p Lj,Z (p) X N X X P i=1 ZiJi P 6 Likewise, to normalize with respect to a fixed (subpopulation) total ⌧,let
N p,j XiI Xi Q Ji Lj,⌧ (p)= i=1 { X } and EG j,⌧ (p)=p Lj,⌧ (p) X ⌧ X X P To normalize the Lorenz curve of subpopulation j with respect to the total in subpopu- lation r (where subpopulation r may include subpopulation j), define
N p,j jr i=1 XiI Xi QX Ji LX (p)= N{ } P i=1 XiRi where Ri is an indicator for whether observationP i belongs to subpopulation r or not. For j,r example, if r is the entire population (including subpopulation j), then LX (p) is the propor- tion of the population-wide outcome that goes to the poorest p 100 percent of subpopulation · j.Incontrast,sincetheequalitygapcurveissupposedtoquantifythedeviationfromthe equal distribution line, the renormalized equality gap curve of subpopulation j with respect to the total in subpopulation r should be defined as N J EG jr(p)=p i=1 i Ljr(p) X N X Pi=1 Ri N N where p Ji/ Ri is the outcome shareP of the poorest p 100 percent of subpopulation i=1 i=1 · j if all population members would receive the same outcome. P P jr The normalized Lorenz curve ordinates LX (p) express outcome shares in subpopulation j relative to the total outcome of subpopulation r.AnalternativeistorenormalizeLorenz curve ordinates in a way such that they are relative to the total that would be observed in subpopulation j,ifallmembersofsubpopulationj would receive the average outcome of subpopulation r. This can be achieved by rescaling the total by relative group sizes, that is,
N p,j XiI Xi Q Ji Ljr(p)= i=1 { X } and EG jr(p)=p Ljr(p) X N J X X i=1 i N X R P N R i=1 i i Pi=1 i P P jr Note that there is a close relation between LX (p) and generalized Lorenz curves: The ratio jr of LX (p) from two subpopulations is equal to the ratio of the generalized Lorenz curves from these subpopulations. Combining normalization with respect to a different subpopulation and normalization with respect to the total of a different outcome variable or a fixed total leads to N X I X Qp,j J N J Ljr,Z(p)= i=1 i { i X } i EG jr,Z(p)=p i=1 i Ljr,Z(p) X N X N X P i=1 ZiRi Pi=1 Ri N p,j jr,Z i=1 XPiI Xi QX Ji jr,Z P jr,Z LX (p)= N { } EG X (p)=p LX (p) i=1 Ji N Z R P N R i=1 i i Pi=1 i P P 7 and N X I X Qp,j J N J Ljr,⌧ (p)= i=1 i { i X } i EG jr,⌧ (p)=p i=1 i Ljr,⌧ (p) X ⌧ X N X P Pi=1 Ri N X I X Qp,j J Ljr,⌧ (p)= i=1 i { i X } i EG jr,⌧ (p)=pPLjr,⌧ (p) X N J X X i=1 i ⌧ P N R Pi=1 i P Analogous renormalizations can be applied to concentration curves. Simply replace p,j p,j I Xi Q in the above formulas by I Yi Q . { X } { Y }
2.8 Contrasts
To analyze distributional differences is helpful to compute contrasts between Lorenz curves. For example, the difference L (p) L (p) X Y may be used to evaluate whether distribution X Lorenz dominates distribution Y . Likewise, the difference GL (p) GL (p) X Y may be used to evaluate whether distribution X generalized Lorenz dominates distribution Y .Dominanceisgivenifthedifference is positive for all p. As shown by Atkinson (1970), if distribution X Lorenz dominates distribution Y then distribution X can be seen as less unequal than distribution Y under weak conditions. Likewise, if distribution X general- ized Lorenz dominates distribution Y then distribution X can be seen as preferable over distribution Y in terms of welfare under weak conditions (see, e.g., Lambert, 2001). Depending on context, it may also be practical to define contrasts as ratios, that is, LX (p)/LY (p),oraslogarithmsofratios,thatis,ln(LX (p)/LY (p)).
2.9 Point estimation
Given is a sample Xi, i =1,...,n, with sampling weights wi.Furthermore,letsubscripts in parentheses refer to observations sorted in ascending order of X. LX (p) can then be estimated as
LX (p)=(1 )Xip 1 + Xip , where b ip e e ip p pip 1 i=1 w(i)X(i) i=1 w(i) = , Xip = n , and pip = n pip pip 1 wiXi wi P i=1 P i=1 b and where ip is set such that pipe1