Estimating Lorenz and Concentration Curves in Stata

Faculty of Business, Economics and Social Sciences
Department of Social Sciences
University of Bern

Social Sciences Working Paper No. 15

Ben Jann
Institute of Sociology, University of Bern

This paper is forthcoming in the Stata Journal.
Current version: October 27, 2016
First version: January 12, 2016

http://ideas.repec.org/p/bss/wpaper/15.html
http://econpapers.repec.org/paper/bsswpaper/15.htm

University of Bern, Department of Social Sciences, Fabrikstrasse 8, CH-3012 Bern
Tel. +41 (0)31 631 48 11, Fax +41 (0)31 631 48 17, www.sowi.unibe.ch

Abstract

Lorenz and concentration curves are widely used tools in inequality research. In this paper I present a new Stata command called lorenz that estimates Lorenz and concentration curves from individual-level data and, optionally, displays the results in a graph. The lorenz command supports relative as well as generalized, absolute, unnormalized, or custom-normalized Lorenz or concentration curves, and provides tools for computing contrasts between different subpopulations or outcome variables. Variance estimation for complex samples is fully supported.

Keywords: Stata, lorenz, Lorenz curve, concentration curve, inequality, income distribution, wealth distribution, graphics

Contents

1 Introduction
2 Methods and formulas
2.1 Lorenz curve
2.2 Equality gap curve
2.3 Total (unnormalized) Lorenz curve
2.4 Generalized Lorenz curve
2.5 Absolute Lorenz curve
2.6 Concentration curve
2.7 Renormalization
2.8 Contrasts
2.9 Point estimation
2.10 Variance estimation
3 The lorenz command
3.1 Syntax of lorenz estimate
3.2 Syntax of lorenz contrast
3.3 Syntax of lorenz graph
4 Examples
4.1 Basic application
4.2 Subpopulation estimation
4.3 Contrasts and Lorenz dominance
4.4 Concentration curves and renormalization
5 Acknowledgments

1 Introduction

Lorenz curves and concentration curves are widely used tools for the analysis of economic inequality and redistribution (see, e.g., Cowell, 2011; Lambert, 2001). Yet, no official command for the estimation of Lorenz and concentration curves is offered in Stata. In this paper I present an implementation of such a command, called lorenz. Although other user commands with related functionality do exist,¹ I believe that lorenz is a worthwhile contribution that will prove beneficial to inequality researchers. The command computes and, optionally, graphs relative, total (unnormalized), generalized, or absolute Lorenz and concentration curves from individual-level data. Standard errors and confidence intervals are provided and estimation from complex samples is fully supported. Furthermore, lorenz is well suited for subpopulation analysis and offers options to compute contrasts between subpopulations or between outcome variables (including standard errors). The command also offers custom normalization of results, which can be useful for comparing subpopulations or outcome variables. Finally, lorenz saves its results in the e() returns for easy processing by post-estimation commands.

In the remainder of the paper I will first discuss the relevant methods and formulas and then present the syntax and options of the lorenz command. After that, the usage of the command will be illustrated by a number of examples.

¹ Examples are glcurve (Jenkins and Van Kerm, 1999; Van Kerm and Jenkins, 2001), svylorenz (Jenkins, 2006), clorenz (Abdelkrim, 2005), or alorenz (Azevedo and Franco, 2006).

2 Methods and formulas

2.1 Lorenz curve

Let X be the outcome variable of interest (e.g., income).
The cumulative distribution function of X is given as $F_X(x) = \Pr\{X \le x\}$ and the quantile function (the inverse of the distribution function) is given as $Q_X^p = F_X^{-1}(p) = \inf\{x \mid F_X(x) \ge p\}$ with $p \in [0, 1]$. For continuous X, the ordinates of the relative Lorenz curve are given as

$$L_X(p) = \frac{\int_{-\infty}^{Q_X^p} x \, dF_X(x)}{\int_{-\infty}^{\infty} x \, dF_X(x)}$$

(see, e.g., Cowell, 2000; Lambert, 2001; Hao and Naiman, 2010). Intuitively, a point on the Lorenz curve quantifies the proportion of total outcome of the poorest $p \cdot 100$ percent of the population. This can easily be seen in the finite population form of $L_X(p)$, which is given as

$$L_X(p) = \frac{\sum_{i=1}^{N} X_i \, I\{X_i \le Q_X^p\}}{\sum_{i=1}^{N} X_i}$$

with $I\{A\}$ as an indicator function being equal to 1 if A is true and 0 else.

Furthermore, let $J_i$ be an indicator for whether observation i belongs to subpopulation j or not (i.e., $J_i = 1$ if observation i belongs to subpopulation j and $J_i = 0$ else). The finite population form of the Lorenz curve of X in subpopulation j can then be written as

$$L_X^j(p) = \frac{\sum_{i=1}^{N} X_i \, I\{X_i \le Q_X^{p,j}\} \, J_i}{\sum_{i=1}^{N} X_i J_i}$$

where $Q_X^{p,j}$ is the p-quantile of X in subpopulation j. The population-wide Lorenz curve is obtained by setting $J_i = 1$ for all observations.

Lorenz curves are typically displayed graphically with p on the horizontal axis and $L_X(p)$ on the vertical axis, although Lorenz (1905) originally proposed an opposite layout.

2.2 Equality gap curve

The equality gap curve quantifies the degree to which the proportion of total outcome of the poorest $p \cdot 100$ percent of the population deviates from the proportion of total outcome these population members would get under an equal distribution. That is, the equality gap curve is equal to the difference between the equal distribution diagonal and the Lorenz curve.
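As a numerical illustration, the finite population form of the Lorenz curve from Section 2.1 and the equality gap just described can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names and the income vector are hypothetical, all units carry equal weight, and the code follows the indicator formula literally. It is not the estimator implemented by the lorenz command, whose point estimation is discussed in Section 2.9.

```python
import math


def quantile(values, p):
    """Return Q^p = inf{x : F(x) >= p} for an equally weighted finite population."""
    ordered = sorted(values)
    n = len(ordered)
    if p <= 0:
        return -math.inf  # no unit lies at or below the 0-quantile
    # The small tolerance guards against floating-point round-off in p * n.
    k = math.ceil(p * n - 1e-12)
    k = max(1, min(n, k))
    return ordered[k - 1]


def lorenz(values, p):
    """Share of the total outcome held by units with X_i <= Q^p."""
    q = quantile(values, p)
    return sum(x for x in values if x <= q) / sum(values)


def equality_gap(values, p):
    """Difference between the equal distribution diagonal and the Lorenz curve."""
    return p - lorenz(values, p)


incomes = [10, 20, 30, 40, 100]  # hypothetical data, not from the paper
for p in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(p, lorenz(incomes, p), equality_gap(incomes, p))
```

In this hypothetical population the poorest 60 percent hold 60 of 200 units of income, so the Lorenz ordinate at p = 0.6 is 0.3 and the equality gap is 0.6 - 0.3 = 0.3. Note that with tied values the indicator includes every unit at the quantile, so the cumulated population share can exceed p.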
Formally, the (finite population form of the) equality gap curve of X in subpopulation j is given as

$$EG_X^j(p) = p - \frac{\sum_{i=1}^{N} X_i \, I\{X_i \le Q_X^{p,j}\} \, J_i}{\sum_{i=1}^{N} X_i J_i} = p - L_X^j(p)$$

$EG_X(p)$ is equal to the proportion of total outcome that would have to be relocated to the poorest $p \cdot 100$ percent in order to provide them an average outcome equal to the population average.

2.3 Total (unnormalized) Lorenz curve

In the finite population, define the (subpopulation specific) total Lorenz curve as

$$TL_X^j(p) = \sum_{i=1}^{N} X_i \, I\{X_i \le Q_X^{p,j}\} \, J_i$$

The total Lorenz curve quantifies the cumulative sum of outcomes among the poorest $p \cdot 100$ percent of the (sub-)population.

2.4 Generalized Lorenz curve

The ordinates of the (relative) Lorenz curve refer to cumulative outcome proportions. Hence, $L_X(1) = 1$. In contrast, the ordinates of the generalized Lorenz curve, $GL_X(p)$, refer to the cumulative outcome average. Hence $GL_X(1) = \bar{X}$, where $\bar{X}$ is the mean of X. Formally, the generalized Lorenz curve can be defined as

$$GL_X(p) = \int_{-\infty}^{Q_X^p} x \, dF_X(x)$$

the finite population form of which is

$$GL_X(p) = \frac{1}{N} \sum_{i=1}^{N} X_i \, I\{X_i \le Q_X^p\}$$

(see, e.g., Shorrocks, 1983; Cowell, 2000; Lambert, 2001). Furthermore, for subpopulation j, the generalized Lorenz curve can be written as

$$GL_X^j(p) = \frac{1}{\sum_{i=1}^{N} J_i} \sum_{i=1}^{N} X_i \, I\{X_i \le Q_X^{p,j}\} \, J_i$$

where $\sum_{i=1}^{N} J_i$ is equal to the subpopulation size.

2.5 Absolute Lorenz curve

The absolute Lorenz curve quantifies the degree to which the generalized Lorenz curve deviates from the equal distribution line in terms of the cumulative outcome average (see, e.g., Moyes, 1987). Formally, the (finite population form of the) absolute Lorenz curve of X in subpopulation j is given as

$$AL_X^j(p) = \frac{1}{\sum_{i=1}^{N} J_i} \left( \sum_{i=1}^{N} X_i \, I\{X_i \le Q_X^{p,j}\} \, J_i - p \sum_{i=1}^{N} X_i J_i \right) = GL_X^j(p) - p \, \frac{\sum_{i=1}^{N} X_i J_i}{\sum_{i=1}^{N} J_i}$$

2.6 Concentration curve

The Lorenz curve of outcome variable X refers to cumulative outcome proportions of population members ranked by the values of X. Using an alternative ranking variable Y, while still measuring outcome in terms of X, leads to the so-called concentration curve. Formally, the (relative) concentration curve of X with respect to Y can be defined as

$$L_{XY}(p) = \frac{\int_{-\infty}^{Q_Y^p} \int_{-\infty}^{\infty} x \, f_{XY}(x, y) \, dx \, dy}{\int_{-\infty}^{\infty} x \, dF_X(x)}$$

where $Q_Y^p$ is the p-quantile of the distribution of Y and $f_{XY}(x, y)$ is the density of the joint distribution of X and Y (see, e.g., Bishop et al., 1994). In the finite population the concentration curve simplifies to

$$L_{XY}(p) = \frac{\sum_{i=1}^{N} X_i \, I\{Y_i \le Q_Y^p\}}{\sum_{i=1}^{N} X_i}$$

Furthermore, for subpopulation j the concentration curve can be written as

$$L_{XY}^j(p) = \frac{\sum_{i=1}^{N} X_i \, I\{Y_i \le Q_Y^{p,j}\} \, J_i}{\sum_{i=1}^{N} X_i J_i}$$

Total, generalized, or absolute concentration curves can be defined analogously.

2.7 Renormalization

Relative Lorenz curves are normalized with respect to the total of the analyzed outcome variable in the given population or subpopulation. Depending on context, it may be useful to apply a different type of normalization. For example, when analyzing labor income, we may want to express results with respect to total income (labor income plus capital income). Likewise, when analyzing a subpopulation, we may be interested in results relative to another subpopulation or relative to the overall population.
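As a numerical aside before the normalization formulas, the finite population forms of the total, generalized, and absolute Lorenz curves (Sections 2.3 to 2.5) and of the concentration curve (Section 2.6) can be sketched in Python. The sketch rests on assumptions that are not part of the paper: all function names and both data vectors are hypothetical, units are equally weighted, and the whole population is treated as the subpopulation ($J_i = 1$ for all i).

```python
import math


def quantile(values, p):
    """Return Q^p = inf{x : F(x) >= p} for an equally weighted finite population."""
    ordered = sorted(values)
    n = len(ordered)
    if p <= 0:
        return -math.inf
    # The small tolerance guards against floating-point round-off in p * n.
    k = max(1, min(n, math.ceil(p * n - 1e-12)))
    return ordered[k - 1]


def cumulative_total(x, rank_by, p):
    """Sum of X_i over units whose ranking variable is at or below its p-quantile."""
    q = quantile(rank_by, p)
    return sum(xi for xi, ri in zip(x, rank_by) if ri <= q)


def total_lorenz(x, p):
    """TL_X(p): cumulative sum of outcomes, units ranked by X itself."""
    return cumulative_total(x, x, p)


def generalized_lorenz(x, p):
    """GL_X(p): cumulative sum divided by the population size."""
    return total_lorenz(x, p) / len(x)


def absolute_lorenz(x, p):
    """AL_X(p): generalized Lorenz ordinate minus p times the mean of X."""
    return generalized_lorenz(x, p) - p * sum(x) / len(x)


def concentration(x, y, p):
    """L_XY(p): outcome X cumulated over units ranked by Y, relative to the total of X."""
    return cumulative_total(x, y, p) / sum(x)


x = [10, 20, 30, 40, 100]  # hypothetical outcome variable
y = [5, 1, 4, 2, 3]        # hypothetical ranking variable
print(total_lorenz(x, 0.4), generalized_lorenz(x, 0.4), absolute_lorenz(x, 0.4))
print(concentration(x, y, 0.4))
```

With these hypothetical data the two units with the lowest values of Y hold 20 + 40 = 60 of 200 units of X, so the concentration ordinate at p = 0.4 is 0.3, whereas the Lorenz ordinate at the same p is 30/200 = 0.15. The generalized Lorenz ordinate at p = 1 equals the mean of X, here 40.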
To normalize the relative Lorenz curve or the equality gap curve of X with respect to the total of Z (where Z may be the sum of several variables, possibly including X), let

$$L_X^{j,Z}(p) = \frac{\sum_{i=1}^{N} X_i \, I\{X_i \le Q_X^{p,j}\} \, J_i}{\sum_{i=1}^{N} Z_i J_i} \quad \text{and} \quad EG_X^{j,Z}(p) = p - L_X^{j,Z}(p)$$

Likewise, to normalize with respect to a fixed (subpopulation) total $\tau$, let

$$L_X^{j,\tau}(p) = \frac{\sum_{i=1}^{N} X_i \, I\{X_i \le Q_X^{p,j}\} \, J_i}{\tau} \quad \text{and} \quad EG_X^{j,\tau}(p) = p - L_X^{j,\tau}(p)$$

To normalize the Lorenz curve of subpopulation j with respect to the total in subpopulation r (where subpopulation r may include subpopulation j), define

$$L_X^{jr}(p) = \frac{\sum_{i=1}^{N} X_i \, I\{X_i \le Q_X^{p,j}\} \, J_i}{\sum_{i=1}^{N} X_i R_i}$$

where $R_i$ is an indicator for whether observation i belongs to subpopulation r or not.
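The three normalizations share the same numerator and differ only in the denominator, which the following Python sketch makes explicit. It is an illustration under assumptions, not the implementation used by the lorenz command: the function names, the labor and capital income vectors, the subpopulation indicators, and the value chosen for the fixed total are all hypothetical, and units are equally weighted.

```python
import math


def quantile(values, p):
    """Return Q^p = inf{x : F(x) >= p} for an equally weighted finite population."""
    ordered = sorted(values)
    n = len(ordered)
    if p <= 0:
        return -math.inf
    # The small tolerance guards against floating-point round-off in p * n.
    k = max(1, min(n, math.ceil(p * n - 1e-12)))
    return ordered[k - 1]


def cumulative_sum_in_group(x, member, p):
    """Numerator shared by all variants: sum of X_i * I{X_i <= Q^{p,j}} * J_i."""
    group = [xi for xi, m in zip(x, member) if m]
    q = quantile(group, p)  # p-quantile of X within subpopulation j
    return sum(xi for xi, m in zip(x, member) if m and xi <= q)


def renormalized_lorenz(x, member, p, total):
    """Lorenz ordinate of subpopulation j, normalized by a caller-supplied total."""
    return cumulative_sum_in_group(x, member, p) / total


labor = [10, 20, 30, 40, 100]   # hypothetical labor income (X)
capital = [0, 0, 10, 20, 170]   # hypothetical capital income
everyone = [1, 1, 1, 1, 1]      # J_i = 1 for all units
group_j = [1, 1, 0, 1, 0]       # hypothetical subpopulation j

# Normalize by the total of Z = labor + capital within the subpopulation.
total_z = sum((l + c) * j for l, c, j in zip(labor, capital, everyone))
share_of_total_income = renormalized_lorenz(labor, everyone, 1.0, total_z)

# Normalize subpopulation j by the total of X in subpopulation r (here: everyone).
total_r = sum(l * r for l, r in zip(labor, everyone))
share_of_population_total = renormalized_lorenz(labor, group_j, 1.0, total_r)

# Normalize by a fixed total tau (an arbitrary illustrative value).
tau = 1000
share_of_fixed_total = renormalized_lorenz(labor, group_j, 1.0, tau)

print(share_of_total_income, share_of_population_total, share_of_fixed_total)
```

A useful consequence is that a renormalized curve no longer has to end at 1. In this hypothetical example the curve of labor income normalized by total income ends at 200/400 = 0.5, the labor share of total income, and the curve of subpopulation j normalized by the population total ends at 70/200 = 0.35, the share of that subpopulation in the overall total.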
