Essays in Labour and Urban Economics

by

Nicolas Gendron-Carrier

A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Department of Economics
University of Toronto

© Copyright 2018 by Nicolas Gendron-Carrier

Abstract

Essays in Labour and Urban Economics

Nicolas Gendron-Carrier
Doctor of Philosophy
Department of Economics
University of Toronto
2018

This thesis contains three essays that focus on topics in labour and urban economics.

In Chapter 1, I use new administrative Canadian matched owner-employer-employee data to investigate the mechanisms that drive entry into entrepreneurial careers and entrepreneurial success among young individuals. I pay particular attention to the value of prior work experience in entrepreneurship. I use information on the career choices and earnings of individuals each year to structurally estimate a dynamic Roy model of career choice. I recover parameters governing: (a) the returns to various types of experience in the labour market and in entrepreneurship, (b) the non-pecuniary benefits associated with being a worker and an entrepreneur, and (c) career-specific entry costs. I use the estimated model to evaluate the impact of policies designed to promote successful entrepreneurship.

Chapter 2 (joint with Leah Brooks and Gisela Rua) investigates how containerization impacts local economic activity. Containerization is premised on a simple insight: packaging goods for waterborne trade into a standardized container makes them dramatically cheaper to move. We use a novel cost-shifter instrument, port depth pre-containerization, to contend with the non-random adoption of containerization by ports. Container ships sit much deeper in the water than their predecessors, making initially deep ports cheaper to containerize. Consistent with New Economic Geography models, we find that cities near container ports grow an additional 70 percent from 1950 to 2010. Gains predominate in cities with initially low population density and manufacturing.

Chapter 3 (joint with Marco Gonzalez-Navarro, Stefano Polloni, and Matthew Turner) investigates the relationship between the opening of a city’s subway network and its air quality. We find that particulate concentrations drop by 4% in a 10km radius disk surrounding a city center following a subway system opening. The effect is larger near the city center and persists over the longest time horizon that we can measure with our data, about eight years. We estimate that a new subway system provides an external mortality benefit of about $594m per year. Although available subway capital cost estimates are crude, the estimated external mortality effects represent a significant fraction of construction costs.

Acknowledgements

I am most grateful to Nathaniel Baum-Snow for his invaluable guidance and support. His mentorship, in the form of feedback, encouragement, and enthusiasm, raised the quality of my research and ensured the timely completion of my project.

I am also grateful to Marco Gonzalez-Navarro, Daniel Trefler, and Matthew Turner for serving on my PhD supervisory committee and for their insightful suggestions and challenging questions. Special thanks to Matthew Turner for his generosity and mentorship early on. I owe my interest in the field of urban economics to him.

I thank Victor Aguirregabiria, Mitch Hoffman, Kory Kroft, Jean-William Laliberté, Mathieu Marcoux, Juan Morales, Peter Morrow, Michel Serafinelli, and Aloysius Siow for many helpful comments and discussions.

I also thank my mother Lucie Gendron and my stepfather Gaétan Chouinard for their continuing encouragement. Last but not least, thanks to Prachi Khandekar, my partner in life, for her unwavering support and love.

In memory of my father, Bernard Carrier.

Contents

Acknowledgements
Dedication
Table of Contents
List of Tables
List of Figures

1 Understanding the Careers of Young Entrepreneurs
1.1 Introduction
1.2 Data
1.2.1 Data Sources and Measurement
1.2.2 Definition of the Firm Productivity Ladder
1.2.3 Sample Restrictions
1.2.4 Descriptive Statistics
1.3 The Model
1.3.1 Timing and Flow Utility
1.3.2 The Earnings Process
1.3.3 Mobility Costs
1.3.4 Optimal Career Choices
1.4 Estimation
1.4.1 The First Stage
1.4.2 The Second Stage
1.4.3 Identification
1.5 Results
1.5.1 Earnings Profiles
1.5.2 Career Choices by Type
1.5.3 Non-Pecuniary Benefits and Entry Costs
1.6 Policy Simulations
1.7 Conclusion
1.8 Appendix A: AKM Estimation
1.9 Appendix B: CCP Smoothing
1.10 Appendix C: Parameter Estimates
1.11 Appendix D: Model Fit

2 The Local Impact of Containerization
2.1 Introduction
2.2 Containerization
2.3 Theoretical Motivation
2.4 Data
2.5 Empirical Methods
2.5.1 Difference-in-Differences
2.5.2 Instrumental Variables
2.6 Results
2.6.1 Difference-in-Differences
2.6.2 Instrumental Variables
2.6.3 Where Gains to Containerization Are Largest
2.7 Conclusion
2.8 Appendix A: Data Sources
2.9 Appendix B: Data Choices

3 Subways and Urban Air Pollution
3.1 Introduction
3.2 Literature
3.3 Data
3.3.1 Subways
3.3.2 Aerosol Optical Depth measurements from the Terra and Aqua earth observing satellites
3.3.3 Other control variables
3.4 Aerosol Optical Depth versus ground based measurements
3.5 The relationship between subway system openings and AOD
3.5.1 Longer time horizons
3.5.2 Spatial scale of effect
3.5.3 Further results
3.6 Subways, AOD and urban travel behavior
3.7 Value of AOD reductions following subway openings
3.7.1 Value of health benefits from estimates in the economics literature
3.7.2 Value of health benefits from the Global Burden of Disease methodology
3.7.3 Discussion
3.8 Conclusion
3.9 Appendix A: Ridership data
3.10 Appendix B: AOD data
3.11 Appendix C: Global Burden of Disease based mortality estimates

Bibliography

List of Tables

1.1 Summary Statistics
1.2 Career Transitions
1.3 Earnings in Each Career at Age 25 by Type
1.4 Career Choices by Type
1.5 The Short Run and Long Run Impact of Policies
1.6 Estimation Results for AKM Model
1.7 AKM Firm Effects Are Correlated with Alternative Measures of Firm Productivity
1.8 Earnings Process: High Productivity Firms
1.9 Earnings Process: Medium-High Productivity Firms
1.10 Earnings Process: Medium-Low Productivity Firms
1.11 Earnings Process: Low Productivity Firms
1.12 Earnings Process: Incorporated
1.13 Earnings Process: Unincorporated
1.14 Non-Pecuniary Benefits and Scale Parameters
1.15 Mobility Costs
1.16 Mobility Costs (continued)
1.17 Model Fit: Career Choices Over the Life Cycle
1.18 Model Fit: Career Transitions
1.19 Model Fit: Career Choices by Type

2.1 County Characteristics by Distance to Nearest Containerized Port
2.2 Containerization Associated with Increased Population, Particularly Near the Port
2.3 Impact of Containerization Robust to Alternative Specifications
2.4 Containerization Impacts Growth in World Cities
2.5 More Employment and Higher Earnings Near Containerized Ports
2.6 Greater Containerization-Induced Growth in Initially Lagging Places
2.7 Complete First Stage Specification
2.8 Midwest Counties Have No First Stage and Reduced Form Impacts Are Zero
2.9 World City Characteristics by Distance to Nearest Containerized Port
2.10 Complete First Stage Estimates for World Sample

3.1 AOD in 43 new subway cities
3.2 The relationship between AOD and ground-based particulate measures
3.3 Subway opening and AOD for the 18 month period post system opening
3.4 Longer term effects
3.5 Even longer horizon
3.6 Spatial decay
3.7 Ridership data sources
3.8 City level descriptive statistics and health estimates
3.9 Subway opening and AOD by 6 month period, pre- and post-system opening
3.10 Heterogenous effects
3.11 Placebo city AOD for 18 month period post system opening
3.12 Robustness check using an expanded sample of cities and country by month fixed effects
3.13 Robustness check excluding observations with low pixel count
3.14 Expansions
3.15 Results on ridership per capita

List of Figures

1.1 Distribution of Annual Earnings
1.2 Career Choices Over the Life Cycle
1.3 Understanding the Importance of Learning-By-Doing in Entrepreneurship
1.4 Understanding the Importance of Labour Market Experience in Entrepreneurship
1.5 Labour Market Returns To Experience in Entrepreneurship
1.6 Utility Costs Associated With Entrepreneurship
1.7 AKM Event Study: Symmetric Job Changes
1.8 AKM Event Study: Asymmetric Job Changes
1.9 AKM Residuals
1.10 Empirical CCPs: Raw and Adjusted

2.1 Adoption of Containerization: 1956–2008
2.2 Graphical Intuition
2.3 Geographic Variation in Treatment and Instrument
2.4 Port Depth Unrelated to Pre-Containerization Growth
2.5 Evolution of Ship Sizes
2.6 Instrument Variation vs. Pre-Treatment Covariates: All Instruments
2.7 IV Estimates Indistinguishable From Zero at 300 km
2.8 Containerization’s Impact Increases Over Time
2.9 Depth and Likelihood of Containerization, World Cities

3.1 Daily ridership per capita
3.2 Two maps showing AOD. Red indicates higher levels of AOD
3.3 AOD for Bangalore in June and December 2014
3.4 AOD versus PM
3.5 Break-tests and event studies the 18 months before and after subway openings and the start of construction
3.6 Heterogeneity of the effect of subway opening on AOD
3.7 MODIS Terra and Aqua AOD data
3.8 Plots of ground-based PM10 and PM2.5 vs. MODIS AOD
3.9 AOD during the 48 months before and after subway openings

Chapter 1

Understanding the Careers of Young Entrepreneurs

1.1 Introduction

Many of today’s most successful businesses, including Microsoft, Apple, Google, and Facebook, were founded by individuals under the age of 30. Perhaps as a result, policymakers around the world have embarked on initiatives designed to help young individuals become entrepreneurs. Some policies focus on specific issues known to deter young individuals from a career in entrepreneurship (e.g. lack of access to capital, lack of avenues to acquire entrepreneurial skills, etc.), while others provide direct financial incentives to attract more young entrepreneurs.¹ The effectiveness of such policies, however, is poorly understood.

¹ Many governments offer subsidized loan programs, training programs, and grants to help young individuals start businesses.

Evaluating the effectiveness of policies designed to promote successful entrepreneurship is challenging for two main reasons. First, there is tremendous heterogeneity across entrepreneurs. Although start-ups are known to contribute substantially to both job creation and productivity growth, these results are driven by a small number of high-growth entrepreneurs (Haltiwanger et al., 2013; Decker et al., 2014; Haltiwanger et al., 2016). The vast majority of entrepreneurs start small businesses, earn less than the average worker, and have no desire to grow over time (Hamilton, 2000; Hurst and Pugsley, 2011).² Policies may therefore increase the supply of entrepreneurs in the economy without attracting individuals with high-growth potential in entrepreneurship (Schoar, 2010). The challenge is to identify which type of individual is most likely to respond to a given policy.

Second, datasets that link business owners to their previous career histories are exceedingly rare. This has hampered our understanding of the returns to various types of experience in entrepreneurship (Goetz et al., 2016). A commonly held belief is that there is no better way to acquire the skills necessary to run a successful business than to be an entrepreneur and learn from experience. If this is true, promoting successful entrepreneurship means helping individuals become entrepreneurs early in the life cycle. On the other hand, many academics and practitioners have emphasized the importance of prior work experience for entrepreneurial success (e.g. Lazear, 2005).³ If work experience is valuable in entrepreneurship, incentivizing individuals to become entrepreneurs early in their careers may be counterproductive. Perhaps it would be more effective to help young individuals acquire skills in the labour market instead. Without understanding how individuals acquire skills that are valuable in entrepreneurship, we can only speculate about the effectiveness of various policies.

In this paper, I investigate the mechanisms that drive sorting into entrepreneurship and entrepreneurial success among young individuals. My analysis uses new administrative Canadian matched owner-employer-employee data.
This dataset allows me to precisely characterize the career histories of individuals before they become entrepreneurs, and then track their business income. I have access to 12 years of data, 2001-2012, and I focus on individuals who start their careers during this time period.

² Non-pecuniary benefits are thought to play an important role in the decision to become an entrepreneur (Hurst and Pugsley, 2011, 2015).

³ See also the management literature on entrepreneurial spawning (e.g. Agarwal et al., 2004; Gompers et al., 2005; Franco and Filson, 2006; Chatterji, 2009; Chatterji et al., 2016).

I use information on the career choices and earnings of individuals each year to structurally estimate a dynamic Roy model of career choice. I recover parameters governing: (a) the returns to various types of experience in the labour market and in entrepreneurship, (b) the non-pecuniary benefits associated with being a worker and an entrepreneur, and (c) career-specific entry costs. As in Keane and Wolpin (1997), I specify a finite mixture model to capture unobserved heterogeneity across individuals. This means that I separate individuals into a finite number of unobservable types, and that I allow key parameters of the model to vary by type. For example, to capture unobserved absolute and comparative advantages between individuals, I allow the earnings process in each career to flexibly depend on type. My use of a finite mixture model allows for the career choices and earnings of individuals to be correlated with each other and over time. As such, it allows the model to handle sorting on unobservables at labour force entry and across careers over the life cycle.

I pay particular attention to the returns to prior work experience in entrepreneurship. A distinguishing feature of my model is that it accounts for the presence of heterogeneous employers in the labour market.⁴ I classify firms based on quartiles of the productivity distribution. Empirically, I use firm-specific components of the wage to measure the productivity levels of firms based on the fact that productive firms offer higher wages.⁵ In the model, employers with different productivity levels offer different pay schedules and, importantly, different learning opportunities.
The hypothesis is that firm productivity is driven by knowledge, and that knowledge is fundamentally non-appropriable (Arrow, 1962). Workers internalize some of their employer’s knowledge because they are intimately involved in the production process. By working in productive firms (i.e. firms that possess high-quality knowledge), future entrepreneurs have the potential to acquire skills that are necessary to run a successful business. In the model, employers with different productivity levels also offer different non-pecuniary benefits and have different costs of entry.⁶

⁴ This is a pervasive feature of the labour market (e.g. Abowd et al., 1999; Syverson, 2011).

⁵ This is a robust finding in the empirical literature in labour economics (e.g. Abowd et al., 1999; Serafinelli, 2015; Card et al., 2016b). It can be explained theoretically by the presence of labour market frictions: when facing an upward sloping labour supply curve, productive firms offer higher wages to attract more workers (Mortensen, 2005).

⁶ If knowledge spillovers exist, we would expect the costs associated with entering a firm to depend on its productivity level. See, for example, the market for jobs model of Rosen (1972).

Throughout the paper, I make a distinction between two types of entrepreneurs: unincorporated business owners and incorporated business owners. Recent research on entrepreneurship has documented important differences between the two, most notably in terms of earnings (Levine and Rubinstein, forthcoming).⁷ The model allows each organizational form to have its own earnings process, non-pecuniary benefits, and entry costs.

⁷ Unincorporated business owners earn less, on average, than workers. In contrast, the distribution of annual earnings for incorporated business owners first-order stochastically dominates that for workers. Figure 1.1 illustrates these facts by showing the distribution of annual earnings for workers, incorporated business owners, and unincorporated business owners in Canada.

As in the canonical occupational choice model of Roy (1951), the potential earnings of individuals in each career depend on aggregate prices and on their career-specific abilities. The model recognizes the fact that career decisions are made in a macroeconomic environment that fluctuates over time. There is now mounting evidence that young firms are particularly vulnerable to aggregate shocks (e.g. Gertler and Gilchrist, 1994; Fort et al., 2013; Zarutskie and Yang, 2016). As such, I allow earnings in the labour market and in entrepreneurship to be differentially affected by aggregate price fluctuations.

I use a computationally light two-stage procedure developed by Arcidiacono and Miller (2011) to estimate the parameters of the model. This allows me to specify a model that is both flexible and empirically tractable. The key idea behind the two-stage estimation procedure is to exploit a mapping between the conditional choice probabilities and the parameters of the model. Because conditional choice probabilities can be recovered directly from the data, it is possible to estimate the parameters without having to solve the full model. In the first stage, I recover empirical estimates of the conditional choice probabilities that vary non-parametrically by unobservable type using the EM algorithm. In the second stage, I project the type-specific empirical estimates of the conditional choice probabilities onto parameter space to recover the parameters of the model. Intuitively, the estimated second stage parameters are the ones that minimize the distance between the observed behavior of individuals in the data and the behavior of individuals that is predicted by the model.
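The mapping between conditional choice probabilities and model parameters can be illustrated in its simplest form. The sketch below is not the thesis's estimator: it assumes a static choice among four careers with i.i.d. Type 1 extreme value shocks of unit scale (an assumption made here for illustration), in which case differences in choice-specific values equal differences in log choice probabilities. The value vector `true_v` and the sample size are hypothetical.

```python
import numpy as np


def choice_probabilities(v):
    """Logit choice probabilities implied by choice-specific values v."""
    z = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return z / z.sum()


def invert_ccps(p, base=0):
    """Recover value differences v_j - v_base from choice probabilities."""
    p = np.asarray(p, dtype=float)
    return np.log(p) - np.log(p[base])


# Hypothetical values of four careers (the first is the reference career).
true_v = np.array([0.0, 0.5, -1.0, 1.2])
p = choice_probabilities(true_v)

# "First stage": estimate choice probabilities from observed choices.
rng = np.random.default_rng(0)
choices = rng.choice(len(true_v), size=200_000, p=p)
p_hat = np.bincount(choices, minlength=len(true_v)) / choices.size

# "Second stage": map the estimated probabilities back to value differences,
# without ever solving the choice problem itself.
v_hat = invert_ccps(p_hat)
print(np.round(v_hat, 2))
```

In a dynamic model with unobserved types the same inversion is applied state by state and type by type, and the recovered value differences also include continuation values; the static case only shows why probabilities observed in the data are informative about parameters.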

I build on Scott (2013) to account for aggregate price fluctuations. To be able to represent the parameters of the model as a function of the conditional choice probabilities, I would normally have to make assumptions about how individuals form expectations over the evolution of aggregate prices in the economy. Typically, researchers either assume that individuals have perfect foresight or that prices follow a predetermined pattern that is known to individuals. Instead, I use the realization of aggregate prices in the economy as a noisy measure of what individuals expect them to be, and I treat the forecasting error of individuals as an error term in the second stage. I show that, assuming rational expectations, it is possible to estimate the parameters of the model without introducing any bias due to the forecasting error.⁸

My results indicate that only a small fraction of the population has a comparative advantage in entrepreneurship at labour force entry. This subpopulation can be divided into two groups: (1) individuals who earn more as entrepreneurs because they have low earnings potential in the labour market and (2) individuals who have high earnings potential in all careers. They represent roughly 8% and 6% of the population, respectively. Interestingly, these two groups, which have emerged from the data, fit the profiles of subsistence and transformational entrepreneurs, as described by Schoar (2010). For simplicity, I use the labels subsistence and transformational to refer to them from here on.

In the model, there are two channels through which individuals acquire skills that are valuable in entrepreneurship: experience in entrepreneurship and work experience. I confirm recent empirical evidence that experience in entrepreneurship is an important channel through which individuals acquire skills that are valuable in entrepreneurship (e.g. Gompers et al., 2010; Lafontaine and Shaw, 2016).
I find, however, that this channel is mostly relevant for the two types of individuals mentioned above: subsistence and transformational. For the vast majority of the population, the returns to experience in entrepreneurship are small.

⁸ Traiberman (2016) also uses the techniques developed by Scott (2013) to estimate a dynamic model of occupational choice that accounts for aggregate price fluctuations. A contribution of my paper is to show how the techniques developed by Scott (2013) can be used to estimate a model that flexibly incorporates human capital accumulation.

I find that prior work experience is of limited value in entrepreneurship, unless it has been accumulated in high productivity firms. Experience in high productivity firms is particularly valuable for the subsistence type. For these individuals, five years of work experience in high productivity firms increases their baseline earnings in entrepreneurship by about 75%. For the transformational type, work experience is of limited value in entrepreneurship, regardless of whether it has been accumulated in high productivity firms or in low productivity firms. Broadly, these results are consistent with Lazear’s (2005) theory that individuals must have a general set of skills to be successful in entrepreneurship.

My main result is that a very small fraction of the population has the potential to be successful in entrepreneurship: the transformational type. Despite large earnings premiums in entrepreneurship, only 4% of transformational individuals become entrepreneurs between age 25 and 35. Parameter estimates indicate that the main deterrents for them are (a) large non-pecuniary costs associated with being an entrepreneur and (b) low returns to work experience in entrepreneurship. I interpret these large non-pecuniary costs as evidence of risk aversion: entrepreneurship can be unattractive because it is inherently more risky than other careers.⁹ Interestingly, I find that entry costs are a second-order consideration for the transformational type.

My results have important policy implications. First, policies that are aimed at reducing entry costs (e.g. loan programs) are unlikely to be effective.
Second, policies that help young individuals acquire skills in the labour market (e.g. internship programs) are also unlikely to create impact since work experience is of limited value in entrepreneurship for the vast majority of the population. Third, and most important, policies that incentivize individuals to become entrepreneurs early in the life cycle are the most likely to attract the transformational type. This is because the transformational type has (a) high returns to entrepreneurial experience and (b) low returns to prior work experience in entrepreneurship. The value of entrepreneurship is therefore highest for them at labour force entry.

⁹ Hincapié (2017) estimates a dynamic Roy model of career choice that formally incorporates risk aversion and he finds that risk aversion reduces the fraction of entrepreneurs in the economy by about 40%. See also Kihlstrom and Laffont (1979), Iyigun and Owen (1998), and Hall and Woodward (2010) on the importance of risk aversion.

I use the estimated model to evaluate the impact of policies designed to promote successful entrepreneurship. I find that providing direct financial incentives to become an entrepreneur early in the life cycle induces transformational individuals to sort into entrepreneurship. Specifically, a $10,000 subsidy that is only available to individuals who become entrepreneurs at labour force entry increases the fraction of transformational individuals who become entrepreneurs by about 10%. However, unless targeted appropriately, such a policy is costly to implement because it also induces a large fraction of individuals with relatively low entrepreneurial skills to become entrepreneurs.

This paper contributes to our understanding of the relative importance of various factors affecting the decision to become an entrepreneur, as well as to our understanding of the determinants of entrepreneurial success. A recent set of papers has documented substantial heterogeneity in the ambition, ability, and preferences of entrepreneurs (e.g. Schoar, 2010; Hurst and Pugsley, 2011, 2015; Levine and Rubinstein, forthcoming). My paper is part of a nascent literature in entrepreneurship that explicitly takes into account such heterogeneity in modelling the dynamic career decisions of individuals (Dillon and Stanton, 2017; Hincapié, 2017; Humphries, 2017). In particular, Humphries (2017) also uses administrative data to understand how entrepreneurship fits into the broader labour market. He finds that cognitive and non-cognitive skills, education, and past experience are important determinants of entrepreneurial success.

I build on Humphries (2017) in three important ways. First, I add to our understanding of the importance of labour market experience for entrepreneurial success.
My paper is the first to develop and estimate a model in which future entrepreneurs can climb a productivity ladder as workers in order to accumulate entrepreneurial skills before starting their businesses. Second, I provide the first evidence that the returns to various types of experience in entrepreneurship are heterogeneous and correlated with unobserved ability. In particular, my paper is the first to show that prior work experience is of limited value in entrepreneurship for high ability individuals. As I show, this heterogeneity is key to understanding how policies can effect change in entrepreneurship. Finally, I account for aggregate price fluctuations. This allows my model to handle sorting on the basis of time-varying economic shocks that are career-specific.

The remainder of the paper proceeds as follows. In Section 1.2, I present the data and provide new descriptive statistics about entrepreneurship. In Section 1.3 and Section 1.4, I present the model and discuss the estimation procedure. I present the results in Section 1.5 and discuss policy simulations in Section 1.6. I conclude in Section 1.7.

1.2 Data

1.2.1 Data Sources and Measurement

My investigation uses new administrative Canadian matched owner-employer-employee data. The dataset contains information on the universe of workers, firms, and business owners in Canada between 2001 and 2012. It is created by merging various administrative tax files. Information on workers comes from individual tax returns (Form T1, similar to Form 1040 in the U.S.). These individual tax returns are merged to firm records of employment remuneration (Form T4, similar to Form W-2 in the U.S.) to create a matched employer-employee dataset. Additional information on firms comes from corporate income tax returns (Form T2, similar to Form 1120 in the U.S.) and firm book values. Information on business owners comes from unincorporated business declaration files and shareholder information for private corporations (Form T2 - Schedule 50). The business owner files, which are linked to both the worker file and the firm file, allow me to identify entrepreneurs in the data and to follow them over time. I now describe the procedure used to construct a panel with information on the career choices and earnings of individuals each year.

Individuals derive annual earnings from three main sources: employment income, unincorporated business income, and incorporated business income. I define incorporated business income as the sum of all employment income received from incorporated businesses owned by the individual plus the sum of all dividends and retained earnings weighted by ownership share.¹⁰ I assign individuals each year to their main career on the basis of their main source of income. Throughout the paper, I refer to individuals who derive most annual earnings from employment income as workers. To characterize work experience, I assign workers who hold multiple jobs in a given year to the employer from which they derive most employment income during the year.
Unincorporated business owners and incorporated business owners are individuals who derive most annual earnings from unincorporated business income and incorporated business income, respectively. I assign unemployed individuals and individuals that are out of the labour force to non-employment.11 I also assign individuals who make less than $8,000 at their main career to non-employment. I do this to minimize the effect of part time work and, more generally, to make sure that individuals have a non-negligible attachment to the labour force.12 All dollar amounts are converted into constant 2012 dollars using Bank of Canada’s core CPI index. Figure 1.1 shows the distribution of the logarithm of annual earnings for workers, unin- corporated business owners, and incorporated business owners. This figure illustrates why it is important to make a distinction between the two types of business owners in the data. Unincorporated business owners earn significantly less, on average, than workers. This is a robust finding in the empirical literature in entrepreneurship (e.g. Evans and Leighton, 1989; Hamilton, 2000; Hurst and Pugsley, 2011). In contrast, incorporated business owners tend to earn more than workers. As we can see, the higher mean earnings among incorporated business owners is accounted for by changes over the entire distribution. As Levine and Rubinstein (forthcoming) argue, entrepreneurs with high-growth potential tend to start in- corporated businesses. Two key features of incorporation underlie this fact: (1) incorporation encourages risk-taking because of limited liability and (2) it facilitates financing through the

10This measure of business income is similar to the "equity-adjusted draw" measure used by Hamilton (2000). 11I observe most individuals that are unemployed or out of the labour force because individuals need to file a T1 to claim any benefits in Canada. 12$8,000 is roughly equal to 26 weeks of work full-time at minimum wage. Song et al. (2015) use a similar minimum threshold for analogous reasons. Chapter 1. Understanding the Careers of Young Entrepreneurs 10 issuance of bonds to investors.
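As an illustration only, the career-assignment rule described above can be sketched in Python. The function name, the dictionary keys, and the tie-breaking rule (the first listed source wins a tie) are assumptions made for this sketch; they do not describe the code actually used to build the panel. Incomes are assumed to be already expressed in constant 2012 dollars, with employment income paid by an individual's own corporation counted under incorporated business income.

```python
from typing import Dict

MIN_EARNINGS = 8_000  # threshold from the text, in constant 2012 dollars


def assign_main_career(income: Dict[str, float]) -> str:
    """Return the main career implied by one person-year of income.

    `income` maps hypothetical source labels ("employment",
    "unincorporated", "incorporated") to annual income from that source.
    """
    sources = {
        "worker": income.get("employment", 0.0),
        "unincorporated": income.get("unincorporated", 0.0),
        "incorporated": income.get("incorporated", 0.0),
    }
    # Main career = largest income source (ties go to the first listed source).
    career, earnings = max(sources.items(), key=lambda kv: kv[1])
    # Earnings below the threshold at the main career count as non-employment.
    if earnings < MIN_EARNINGS:
        return "non-employment"
    return career
```

For example, `assign_main_career({"employment": 5000, "unincorporated": 6000})` returns `"non-employment"`, because the largest source falls below the $8,000 threshold.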

1.2.2 Definition of the Firm Productivity Ladder

I adopt the framework developed by Abowd et al. (1999) to identify the productivity level of firms in the data. For the remainder of this paper, I refer to this framework as the AKM model. This model decomposes the logarithm of annual earnings of workers into worker-specific and firm-specific pay components. Index individuals by $i \in I$ and firms by $m \in M$. Let $d^m_{i,t}$ denote an indicator variable that is equal to 1 if individual $i$ is employed by firm $m$ at time $t$ and is equal to 0 otherwise. The AKM model assumes that the logarithm of annual earnings for worker $i$ at time $t$ is additively separable in a worker fixed effect, $\Theta_i$, a firm fixed effect, $\Lambda_m$, a time-varying bundle of individual characteristics, $X'_{i,t}\beta$, and a residual component, $\mu_{i,t}$:

\[
\ln(w_{i,t}) = \Theta_i + \sum_{m=1}^{M} d^m_{i,t}\,\Lambda_m + X'_{i,t}\beta + \mu_{i,t} \tag{1.1}
\]

I use estimates of $\{\Lambda_m\}_{m=1}^{M}$ to define the firm productivity ladder empirically. In the presence of labour market frictions, productive firms pay a premium to attract more workers (Mortensen, 2005). This provides a theoretical rationale for the relationship between the firm effects in the AKM model and firm productivity.13 In principle, I could have used alternative measures of firm productivity such as those derived from the estimation of production functions to classify firms. I chose not to do this for three reasons. First, production functions are typically well defined for manufacturing firms but less so for firms in other industries. Second, the estimation of production functions requires additional information on firm inputs and output which results in a less complete classification of firms because of missing data. Third, the AKM model allows me to have a definition of the firm productivity ladder that controls for the quality of the workforce. As I show in Table 1.7, the estimated firm effects in the AKM model are closely related to alternative measures of firm productivity such as total factor productivity (TFP), sales per employee, value added per employee, profits per employee, and payroll per employee.14

I estimate equation (1.1) using all non-immigrant male workers age 20-60. I include year fixed effects and quadratic and cubic terms in age as time-varying covariates.15 The AKM model is identified using workers who change employers over the course of the sample period. Worker effects and firm effects are estimated separately for each group of workers and firms that are connected by labour mobility.16 I only use the estimated firm effects from the largest connected group to classify firms. 98% of all observations are included in the largest connected group.

Table 1.6 provides basic summary statistics from the estimation of the AKM model. The standard deviation of the worker effects is 0.49 and the standard deviation of the firm effects is 0.26. The correlation between the two effects is 0.20. To interpret the magnitude of these numbers, I note that Song et al. (2015) find a standard deviation of the worker effects of 0.69 and a standard deviation of the firm effects of 0.33 in the United States over a comparable time period. The larger standard deviations in the components of pay obtained by Song et al. (2015) reflect greater wage inequality in the US. Song et al. (2015) find a correlation between the worker effects and the firm effects of 0.14. The higher correlation between the two effects in Canada suggests greater positive assortative matching of workers and firms. I provide a more detailed discussion of the basic results from the AKM estimation in Appendix 1.8.

I assign firms to one of four classes: high, medium-high, medium-low, and low productivity firms. These classes are defined using quartiles of the distribution of AKM firm effects in each industry.17 I use a definition of the firm productivity ladder that is industry specific because firms in different industries may offer different financial incentives for continued employment. For example, firms in natural resource industries may offer a base pay that is higher than that of other industries to compensate for remoteness of employment. Having a definition of the firm productivity ladder that is industry specific accounts for such factors. To maximize statistical power, I define quartiles using the number of worker-year observations in each firm as weights so that the number of observations is roughly equal across firm classes.

I consider four classes of firms for three reasons. First, the presence of a very large number of firm effects in the AKM model can lead to imprecise estimates (Andrews et al., 2008). A coarse definition of the firm productivity ladder ensures that measurement error is not exacerbated. Second, reducing the dimensionality of firm heterogeneity increases statistical power. Finally, using only a small number of classes reduces the dimension of the state space in the structural model and facilitates the estimation.

13 See Card et al. (2016a) for a good discussion.
14 The positive correlation between the firm effects in the AKM model and alternative measures of firm productivity has been documented in many other contexts. For example, Abowd et al. (1999) show that the firm effects are correlated with average sales per employee in France, Serafinelli (2015) shows that they are correlated with TFP for manufacturing firms in Veneto (Italy), and Card et al. (2016b) show that they are correlated with average value added per employee and average sales per employee in Portugal. Interestingly, Bender et al. (2016) show that firms that use advanced management practices also have higher estimated AKM firm effects.
15 Following Card et al. (2013) and Card et al. (2016b), I normalize the effect of age to be equal to zero at age 40 to estimate the AKM model.
16 See Abowd et al. (2002) for more details on the identification of connected groups.
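The weighted-quartile classification described above can be illustrated with a short Python sketch, applied to the firms of a single industry. The function name and the convention for firms that straddle a quartile boundary (a firm is classified by the midpoint of the interval it occupies in the weighted cumulative distribution) are assumptions made here for illustration; the text does not specify how such firms are handled.

```python
import numpy as np


def weighted_quartile_class(effects, weights):
    """Assign firms to classes 1 (low) to 4 (high) using weighted quartiles.

    effects : estimated firm effects for the firms of ONE industry
    weights : number of worker-year observations in each firm
    """
    effects = np.asarray(effects, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(effects, kind="stable")
    share = weights[order] / weights.sum()
    # Share of worker-years in firms with a weakly lower firm effect.
    cum_share = np.cumsum(share)
    # Assumption: classify each firm by the midpoint of its interval.
    mid = cum_share - 0.5 * share
    classes_sorted = np.minimum((mid * 4).astype(int), 3) + 1
    classes = np.empty(len(effects), dtype=int)
    classes[order] = classes_sorted
    return classes
```

With equal weights this reduces to ordinary quartiles of the firm effects. With unequal weights, large firms occupy a larger share of the distribution, so that each class contains roughly the same number of worker-year observations.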

1.2.3 Sample Restrictions

My analysis hinges on the ability to observe the complete career histories of entrepreneurs before they start their businesses. Because I only observe career histories between 2001 and 2012, I need to restrict my attention to individuals that start their careers during this time period. Throughout the analysis, I restrict my attention to individuals that are born between 1976 and 1987. These are the individuals for which I observe right-truncated career histories starting at age 25.18 I further restrict my attention to men only. Very few women are entrepreneurs in the data and their career decisions early in the life cycle would require a different model that takes into account fertility decisions. I exclude immigrants because their background prior to arrival is unobserved and it is likely to be different from that of natives. I exclude individuals who work in firms with missing information on firm productivity because I cannot characterize the type of work experience they acquire. I exclude individuals that are non-employed at age 25 because these individuals have a very weak attachment to the labour force.19 I exclude individuals who enter the agricultural sector in any given year because firms in this sector use a different set of tax forms which makes it difficult to compare with the rest of the economy. I also exclude individuals who enter the public sector in any given year because earnings in this sector are heavily regulated and organizations tend to be governmental agencies rather than firms.20 Finally, I exclude individuals who do not file a T1 in any given year from the analysis. This leaves me with a sample of 1,227,307 individuals and 7,430,419 individual-year observations.

17 Industries are defined using three-digit NAICS codes.
18 I follow individuals starting at age 25 to abstract from schooling decisions. A limitation of the Canadian matched owner-employer-employee data is that it does not include any information on education. I explain how I account for ex-ante heterogeneity in Section 1.3.

1.2.4 Descriptive Statistics

Figure 1.2 shows the fraction of individuals in various careers between age 25 and 36. Panel (a) focuses on workers. Unsurprisingly, the fraction of individuals that are workers in the sample is large. However, this fraction decreases slightly between age 25 and 36. At age 26, about 89% of individuals in the sample are workers whereas, at age 36, 84% of individuals are workers. Interestingly, there is a clear shift in the distribution of workers across firm classes over the life cycle. At age 25, workers tend to be saddled in the lower rungs of the firm productivity ladder. This tendency is reversed at age 36. As can be seen, the fraction of individuals that work in high productivity firms increases steadily between age 25 and 36, from about 21% to 23%. In contrast, the fraction of individuals that work in medium-high, medium-low, and low productivity firms decreases over the same age range. The largest decrease is seen in low productivity firms (from 26% at age 25 to 20% at age 36). Taken together, these patterns suggest that individuals climb the firm productivity ladder over the life cycle. Panel (b) plots the fraction of individuals that are entrepreneurs between age 25 and 36.

19 About 80% of all individual-year observations for this group of individuals are in non-employment.
20 The public sector is defined as educational services (NAICS code 61), health care and social assistance (NAICS code 62), and public administration (NAICS code 91).

The triangle line represents unincorporated business owners and the square line represents incorporated business owners. This figure highlights the fact that very few individuals become entrepreneurs over the course of their career. Only 1% of individuals are incorporated business owners at age 25. This fraction increases steadily over the life cycle, reaching 5% at age 36.21 Interestingly, almost all of the decrease in the fraction of workers over the life cycle is accounted for by the increase in the fraction of individuals that are incorporated business owners.

Table 1.2 provides more information about the types of career transitions that are observed in the data. It reports the fraction of transitions from origin (row) to destination (column). Focusing on the career transitions of workers, the results provide additional evidence that individuals climb the firm productivity ladder over the course of their careers. Although workers tend to stay in the same firm class from one year to the next, those who move up the firm productivity ladder are more likely to move up one rung at a time. Put differently, workers are more likely to transition into high productivity firms coming from medium-high productivity firms than from low productivity firms. Consider individuals that are in low productivity firms at time t − 1. The results in Table 1.2 show that they are about 1.7 (5.5/3.3 ≈ 1.67) times more likely to move into medium-low productivity firms than into high productivity firms. The reverse is also true: workers who move down the firm productivity ladder tend to move down one rung at a time. Workers are more likely to transition into low productivity firms coming from medium-low productivity firms than from high productivity firms.

Turning to transitions into entrepreneurship, the results show that individuals are more likely to become unincorporated business owners than incorporated business owners, regardless of their origin.
With this said, we can see that individuals are relatively more likely to become incorporated business owners if they come from productive firms. Of all the individuals that transition into entrepreneurship from high productivity firms, 40% (0.004/(0.004 + 0.006)) of them become incorporated business owners. In contrast, only 28% of individuals that transition into entrepreneurship from low productivity firms become incorporated business owners. Coming from non-employment, almost all individuals that transition into entrepreneurship choose to be unincorporated business owners (only 11% become incorporated business owners).

Looking at transitions out of entrepreneurship, we can also see important differences between incorporated and unincorporated business owners in terms of destination career. Incorporated business owners are relatively more likely to transition into high productivity firms than unincorporated business owners. They are also less likely to transition into non-employment.

Table 1.2 shows that career choices are highly persistent from one year to the next. It also reveals systematic differences in the degree of persistence across careers. Surprisingly, I find that incorporated business owners are the most likely to remain in their career from one year to the next. This finding is important because it challenges the idea that most entrepreneurship spells are short lived. I find that 89% of all incorporated business owners remain in that career from one year to the next. This degree of persistence is larger than the one observed among workers in high productivity firms (85% of them remain in high productivity firms from one year to the next). In contrast, 76% of all unincorporated business owners choose to remain in that career. In all, there seems to be a clear hierarchy of careers in terms of persistence. The most absorbing careers are (in decreasing order): (1) incorporated business owners, (2) workers in high productivity firms, (3) workers in medium-high productivity firms, (4) workers in medium-low productivity firms, (5) workers in low productivity firms, (6) unincorporated business owners, and (7) non-employment. This hierarchy is also reflected in the average logarithm of annual earnings across careers, as reported in Table 1.1.

Table 1.1 provides more information about the estimating sample. The average logarithm of annual earnings in the sample is 10.63. In the labour market, average log earnings increase monotonically with the productivity level of employers. In entrepreneurship, we can see that incorporated business owners earn more than workers in high productivity firms and that unincorporated business owners earn less than workers in low productivity firms. Incorporated business owners are older, on average, than unincorporated business owners and they accumulate more work experience. They also tend to accumulate work experience in more productive firms than unincorporated business owners.

21 The fraction of individuals that are incorporated business owners in the data converges to about 7.5% at age 55.
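To make the construction of an origin-to-destination table such as Table 1.2 concrete, the following Python sketch computes row-normalised transition shares from a set of career histories. The function name and the career labels are hypothetical, and the row normalisation is an assumption that is consistent with the persistence rates quoted above; it is not a description of the code used to produce the table.

```python
from collections import Counter, defaultdict


def transition_shares(histories):
    """Row-normalised shares of transitions from origin (t-1) to destination (t).

    histories : iterable of career sequences, one per individual, ordered by year
    """
    counts = defaultdict(Counter)
    for path in histories:
        # Pair each year's career with the career chosen in the following year.
        for origin, dest in zip(path, path[1:]):
            counts[origin][dest] += 1
    shares = {}
    for origin, row in counts.items():
        total = sum(row.values())
        shares[origin] = {dest: n / total for dest, n in row.items()}
    return shares
```

Given such shares, the conditional probabilities discussed in the text follow directly. For instance, the share of entrants into entrepreneurship from a given origin who incorporate is the incorporated share divided by the sum of the incorporated and unincorporated shares, as in 0.004/(0.004 + 0.006) = 0.4.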

1.3 The Model

In this section, I develop a dynamic Roy model of career choice that flexibly incorporates various mechanisms driving entry into entrepreneurial careers and entrepreneurial success. I specify a finite mixture model, which means that there is a finite number of unobservable individual types in the population and that key parameters of the model are allowed to vary by type.22 I index types by $z \in Z$ and I allow the probability that an individual belongs to each type to depend on his first observed career choice at age 25. As such, an individual's unobservable type can be interpreted as unobserved heterogeneity that is either innate or acquired before age 25. Because finite mixture models are very demanding in terms of identification and estimation, I limit the number of unobservable types in the population to six.

In the model, individuals decide each year whether they want to pursue a career in the labour market or in entrepreneurship. They derive flow utility from the sum of their expected log earnings, career-specific non-pecuniary benefits (net of mobility costs if they change career), and an idiosyncratic preference shock. Individuals are assumed to be forward-looking and they make their career decisions to maximize the expected discounted value of lifetime utility.

There are seven career options in the model, indexed by $j \in J$. In the labour market, individuals can choose to work in low productivity firms ($j = 1$), in medium-low productivity firms ($j = 2$), in medium-high productivity firms ($j = 3$), or in high productivity firms ($j = 4$). In entrepreneurship, individuals can choose to be unincorporated business owners ($j = 5$) or incorporated business owners ($j = 6$). Individuals can also choose to be non-employed ($j = 0$). Let $a_{i,j,t} \in a_{i,t}$ denote an action variable that is equal to 1 if individual $i$ chooses career $j$ at time $t$ and is equal to 0 otherwise.

22 This way of accounting for unobserved heterogeneity is based on the seminal work of Heckman and Singer (1984).

1.3.1 Timing and Flow Utility

The timing of the model is as follows. Individuals enter each period with their time-invariant unobservable type, $z_i$, and a vector of observable individual characteristics, $x_{i,t}$, which includes the number of years of experience they have in each career and a set of indicator variables that identify their previous career choice. Upon entering the period, they observe aggregate prices in the economy, $\omega_t$, and they receive a vector of career-specific idiosyncratic preference shocks, $\epsilon_{i,t}$. Given $z_i$, $x_{i,t}$, $\omega_t$, and $\epsilon_{i,t}$, they optimally choose a career $j \in J$. After making their decisions, individuals receive an ex-post productivity shock, $\mu_{i,j,t}$, which affects realized earnings during the period. I assume the career decisions of individuals are not affected by the uncertainty associated with the ex-post productivity shock. As I explain below, I allow each career to offer different non-pecuniary benefits/costs. This allows the model to capture the non-pecuniary costs associated with choosing a career that is inherently more risky, such as entrepreneurship. Individual $i$ derives the following flow utility from choosing career $j$ at time $t$:

\[
u_j(z_i, x_{i,t}, \omega_t, \epsilon_{i,j,t}) = \alpha(z_i)\, E[\ln(y_j(z_i, x_{i,t}, \omega_t, \mu_{i,j,t}))] + \phi_j(z_i) - \psi_j(z_i, x_{i,t}) + \epsilon_{i,j,t} \tag{1.2}
\]

where $E[\ln(y_j(z_i, x_{i,t}, \omega_t, \mu_{i,j,t}))]$ denotes the expected log earnings of individual $i$ in career $j$ at time $t$, $\phi_j(z_i)$ are the non-pecuniary benefits associated with career $j$, and $\psi_j(z_i, x_{i,t})$ are the mobility costs incurred by individual $i$ upon entering career $j$ at time $t$. As is standard in the literature, I assume the career-specific idiosyncratic preference shock, $\epsilon_{i,j,t}$, is independent and identically distributed across individuals and over time and drawn from the Type I extreme value distribution with variance $\pi^2/6$. The scale parameter $\alpha(z_i)$ determines the relative importance of expected log earnings in choosing a career. I describe the earnings process and mobility costs in Subsections 1.3.2 and 1.3.3.

To capture unobserved absolute and comparative advantages between individuals, I allow the earnings process in each career to flexibly depend on unobservable type. To capture heterogeneity in preferences, I allow all the parameters of the utility function to vary by unobservable type. This is important in the context of entrepreneurship for two main reasons. First, a commonly cited reason for selection into entrepreneurship has to do with the value of being your own boss (Hurst and Pugsley, 2011, 2015). Such intrinsic motivation for entrepreneurship is captured by the type-specific non-pecuniary benefits. Second, entrepreneurs are characterized by a higher tolerance to discomfort and disrupting activities (Levine and Rubinstein, forthcoming). This higher tolerance to discomfort is captured by the type-specific mobility costs and the type-specific scale parameter.

As mentioned above, $\omega_t$ represents aggregate prices in the economy at time $t$. This aggregate state variable affects earnings differentially in each career and fluctuates over time. Accounting for aggregate price fluctuations is important to allow the model to handle business cycle effects. As in Scott (2013), I assume aggregate prices follow a Markov process which is not affected by the actions of any single individual.23 Aggregate prices are allowed to be endogenous in the economy, but they are taken as given by individuals at the time of making their career decisions.

23 Formally, I assume $F_\omega(\omega_{t+1} \mid \omega_t, a_{i,t}) = F_\omega(\omega_{t+1} \mid \omega_t)$ where $F_\omega$ is the transition probability function of the state of the economy.

1.3.2 The Earnings Process

Earnings in career j are modeled as follows:

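\[
\begin{aligned}
\ln(y_j(z_i, x_{i,t}, \omega_t, \mu_{i,j,t})) ={}& r_j(z_i, \omega_t) + \sum_{j'=1}^{6} \beta^j_{j'}(z_i)\, \mathit{exper}_{i,j',t} + \beta^j_7(z_i)\, \mathit{age}_{i,t} \\
&+ \beta^j_8 \Big( \sum_{j'=1}^{4} \mathit{exper}_{i,j',t} \Big)^2 + \beta^j_9\, \mathit{exper}^2_{i,5,t} + \beta^j_{10}\, \mathit{exper}^2_{i,6,t} + \beta^j_{11}\, \mathit{age}^2_{i,t} + \mu_{i,j,t}
\end{aligned}
\tag{1.3}
\]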

where $r_j(z_i, \omega_t)$ represents the effect of aggregate prices on earnings in career $j$ at time $t$, $\{\mathit{exper}_{i,j',t}\}_{j'=1}^{4}$ denotes work experience accumulated by individual $i$ in firms with productivity level $j'$ up until time $t$, $\mathit{exper}_{i,5,t}$ denotes experience accumulated by individual $i$ as an unincorporated business owner up until time $t$, $\mathit{exper}_{i,6,t}$ denotes experience accumulated by individual $i$ as an incorporated business owner up until time $t$, $\mathit{age}_{i,t}$ denotes age of individual $i$ at time $t$, and $\mu_{i,j,t}$ is the ex-post productivity shock. I assume individuals have no experience in any career upon entering the model and I normalize the effect of age to be equal to zero at age 25.24 $\mu_{i,j,t}$ is assumed to be normally distributed and i.i.d. across individuals and over time with variance $\sigma_j^2$.

Equation (1.3) is very flexible. It allows each career to have its own earnings process, characterized by different time-varying intercepts and different returns to various types of experience. It also allows unobservable type to affect the earnings process in each career in two important ways. First, the time and career-specific intercepts vary by unobservable type. The introduction of such fixed effects generally captures the effect of aggregate prices on earnings in each career. The fact that they are type-specific captures level effects associated with unobserved productivity differences across individuals. Second, the specification allows for the returns to various types of experience to depend on unobservable type. Allowing key parameters of the earnings process in each career to depend on unobservable type is crucial to allow the model to handle sorting on unobserved ability. I allow the curvature of the earnings process in each career to depend on the total number of years of work experience, the number of years of experience as an unincorporated business owner, and the number of years of experience as an incorporated business owner. I restrict the quadratic terms of experience to be the same across unobservable types to reduce the number of parameters that need to be estimated.

This specification captures the key determinants of entrepreneurial success discussed in the introduction. I focus the discussion on the earnings process of incorporated business owners ($j = 6$). First, the specification allows for differences in innate entrepreneurial ability through the time and type-specific intercept $r_6(z_i, \omega_t)$. Second, the specification allows individuals to acquire skills in the labour market that are valuable in entrepreneurship. To capture the idea that experience in more productive firms might be more valuable in entrepreneurship, I allow the value of work experience to depend on the productivity level of the firm in which it has been acquired. For example, the value of one year of work experience in high productivity firms (relative to one year in non-employment) is given by the coefficient $\beta^6_4(z_i)$. I allow the rate at which individuals internalize knowledge on the job to depend on their type to capture heterogeneous learning. Finally, the specification allows for learning-by-doing in entrepreneurship through the coefficient $\beta^6_6(z_i)$. I allow for the slope coefficient on entrepreneurial experience to vary by type.

There are two sources of mobility frictions operating through the earnings equations. First, career-specific human capital is not fully transferable across careers. Second, unobservable type differentially affects the earnings process in each career so individuals have persistent unobserved comparative advantages across careers. I now describe the mobility frictions that operate outside the earnings equations.

24 From this point on, the process governing the evolution of the experience variables is a deterministic function of past actions: $\mathit{exper}_{i,j,t+1} = \mathit{exper}_{i,j,t} + a_{i,j,t} \;\; \forall j \in J$.
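As a purely illustrative sketch, the deterministic part of equation (1.3) for a given career and unobservable type can be written as a small Python function. The function and argument names are hypothetical, the coefficient values in any call are made up, and age is assumed to enter as years since age 25 (one way of implementing the normalisation described in the text). Because the ex-post shock has mean zero, this deterministic part is also the expected log earnings that enters the flow utility.

```python
import numpy as np


def expected_log_earnings(r, beta_lin, beta_age, beta_quad, exper, age):
    """Deterministic part of equation (1.3) for one career j and one type z.

    r         : intercept r_j(z, omega_t)
    beta_lin  : length-6 returns to experience in careers 1..6 (type specific)
    beta_age  : coefficient on age, beta_7 (type specific)
    beta_quad : (beta_8, beta_9, beta_10, beta_11), common across types
    exper     : length-6 years of experience in careers 1..6
    age       : years since age 25 (assumed normalisation)
    """
    exper = np.asarray(exper, dtype=float)
    beta_lin = np.asarray(beta_lin, dtype=float)
    b8, b9, b10, b11 = beta_quad
    work = exper[:4].sum()  # total experience as a worker (careers 1..4)
    return (
        r
        + beta_lin @ exper
        + beta_age * age
        + b8 * work**2
        + b9 * exper[4] ** 2
        + b10 * exper[5] ** 2
        + b11 * age**2
    )
```

The quadratic term in work experience is applied to the sum of experience across the four firm classes, whereas the linear returns differ by firm class, which mirrors the structure of equation (1.3).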

1.3.3 Mobility Costs

The mobility costs incurred by individual $i$ upon entering career $j$ at time $t$ are modeled as follows:

\[
\psi_j(z_i, x_{i,t}) = \mathbf{1}\{a_{i,j,t-1} \neq 1\} \Big( \psi^j_0(z_i) + \sum_{j'=1}^{6} \psi^j_{j'}(z_i)\, \mathit{exper}_{i,j',t} + \psi^j_7(z_i)\, \mathit{age}_{i,t} \Big) \tag{1.4}
\]

The mobility costs are specified in a reduced-form way to flexibly fit patterns in the data. They are meant to capture mobility frictions that operate outside the earnings equations such as startup costs, search frictions, psychological costs, etc. I assume it is costless to remain in the same career. I also assume the previous career choice of individuals upon entering the model is the same as their first observed career choice at age 25. As explained above, I use information on the first observed career choice of individuals to identify their unobservable type as opposed to mobility costs.

The mobility costs associated with entering each career consist of an intercept term, $\psi^j_0(z_i)$, and of slope terms, $\{\psi^j_{j'}(z_i)\}_{j'=1}^{7}$. The intercept term captures the fixed cost of entering that career. The slope terms capture the idea that mobility costs evolve over the course of an individual's career. For example, it seems likely that it is easier to transition into entrepreneurship after having worked for a couple of years because of wealth accumulation. The coefficients $\psi^6_1(z_i)$, $\psi^6_2(z_i)$, $\psi^6_3(z_i)$, and $\psi^6_4(z_i)$ allow for this possibility. I allow all the parameters of the mobility costs to be career and type specific. I now describe the optimization problem of individuals.
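The mobility cost in equation (1.4) and its role in the flow utility of equation (1.2) can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up parameter values in any call; it is not the estimation code.

```python
import numpy as np


def mobility_cost(stayed, psi0, psi_exper, psi_age, exper, age):
    """Equation (1.4) for one destination career j and one type z.

    stayed    : True if the individual was already in career j at t-1
    psi0      : fixed entry cost, psi_0^j(z)
    psi_exper : length-6 slopes on experience, psi_1^j(z)..psi_6^j(z)
    psi_age   : slope on age, psi_7^j(z)
    """
    if stayed:
        return 0.0  # remaining in the same career is costless
    return psi0 + float(np.dot(psi_exper, exper)) + psi_age * age


def flow_utility(alpha, exp_log_earn, phi, cost):
    """Equation (1.2) without the idiosyncratic preference shock."""
    return alpha * exp_log_earn + phi - cost
```

A negative slope on work experience in `psi_exper` lowers the cost of entering the career as experience accumulates, which is how the specification can capture, for example, easier entry into entrepreneurship after a few years of work.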

1.3.4 Optimal Career Choices

Letting $\beta$ denote the common discount factor, the optimization problem of individual $i$ at time $t$ is given by:

\[
\max_{\{a_{i,t}, \dots, a_{i,T}\}} E_t\left[ \sum_{b=0}^{T-t} \sum_{j=0}^{6} \beta^b a_{i,j,t+b} \big( u_j(z_i, x_{i,t+b}, \omega_{t+b}) + \epsilon_{i,j,t+b} \big) \,\Big|\, z_i, x_{i,t}, \omega_t, \epsilon_{i,t} \right]
\]

where the expectation is taken over future values of aggregate prices in the economy and future values of the idiosyncratic preference shocks conditional on all the information available at time $t$. Let $\{a^*_{i,t}, \dots, a^*_{i,T}\}$ denote the optimal decision rule.

For the remainder of this paper, I index by $t$ any function or variable that depends on aggregate prices, $\omega_t$, and I index by $z$ any function or variable that depends on unobservable type, $z_i$. Let $\bar{V}_{z,t}(x_{i,t})$ denote the ex-ante value function at the beginning of period $t$. This function is defined as the expected discounted value of lifetime utility, before $\epsilon_{i,t}$ is realized, conditional on behaving according to the optimal decision rule:

\[
\bar{V}_{z,t}(x_{i,t}) = E_t\left[ \sum_{b=0}^{T-t} \sum_{j=0}^{6} \beta^b a^*_{i,j,t+b} \big( u_{j,z,t+b}(x_{i,t+b}) + \epsilon_{i,j,t+b} \big) \,\Big|\, z_i, x_{i,t}, \omega_t \right]
\]

To discuss the solution to the optimization problem of individuals, it is convenient to define the conditional ex-ante value function associated with career choice $j$ at time $t$:

\[
v_{j,z,t}(x_{i,t}) = u_{j,z,t}(x_{i,t}) + \beta E_t\big[\bar{V}_{z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j,t}))\big] \tag{1.5}
\]

where $x_{i,t+1}(x_{i,t}, a_{i,j,t})$ denotes individual $i$'s vector of observable individual characteristics at time $t + 1$ conditional on choosing career $j$ at time $t$. The conditional ex-ante value function captures the two channels through which career choices affect lifetime utility: through today's flow utility, $u_{j,z,t}(x_{i,t})$, and through the resulting expected value of lifetime utility tomorrow, $\beta E_t[\bar{V}_{z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j,t}))]$.

Since the career-specific idiosyncratic preference shocks are drawn from the Type I extreme value distribution, there is a closed form solution for the conditional choice probability of optimally choosing career $j$ at time $t$:

\[
p_{j,z,t}(x_{i,t}) = \frac{\exp(v_{j,z,t}(x_{i,t}))}{\sum_{j=0}^{6} \exp(v_{j,z,t}(x_{i,t}))} \tag{1.6}
\]

Standard logit derivations also imply a key relationship between the ex-ante value function, $\bar{V}_{z,t}(x_{i,t})$, the conditional ex-ante value function, $v_{j,z,t}(x_{i,t})$, and the conditional choice probability, $p_{j,z,t}(x_{i,t})$:

\[
\bar{V}_{z,t}(x_{i,t}) = v_{j,z,t}(x_{i,t}) - \ln(p_{j,z,t}(x_{i,t})) + \gamma \tag{1.7}
\]

where $\gamma$ denotes Euler's constant.25 This relationship holds for any $j \in J$. As I show in the next section, this relationship is key to derive a mapping between the conditional choice probabilities and the parameters of the utility function.
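Equations (1.6) and (1.7) can be checked numerically. The Python sketch below computes logit choice probabilities and the ex-ante value (the log-sum-exp of the conditional values plus Euler's constant) for an arbitrary vector of conditional values, and verifies that the identity in (1.7) holds for every career. The function names and the numerical values are illustrative assumptions.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler's constant


def choice_probabilities(v):
    """Logit conditional choice probabilities, equation (1.6)."""
    v = np.asarray(v, dtype=float)
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()


def ex_ante_value(v):
    """Ex-ante value: log-sum-exp of the conditional values plus gamma."""
    v = np.asarray(v, dtype=float)
    return v.max() + np.log(np.exp(v - v.max()).sum()) + EULER_GAMMA


# Arbitrary conditional values for the seven careers j = 0..6.
v = np.array([0.3, 1.2, -0.5, 2.0, 0.0, 0.7, 1.1])
p = choice_probabilities(v)
V = ex_ante_value(v)
# Equation (1.7): the same scalar V is recovered from any single career j.
assert np.allclose(V, v - np.log(p) + EULER_GAMMA)
```

The practical implication is that the ex-ante value can be expressed using the conditional value and choice probability of one conveniently chosen career, without summing over all careers.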

1.4 Estimation

I use a computationally light two-stage procedure developed by Arcidiacono and Miller (2011) to estimate the parameters of the model. The key idea is to exploit the finite dependence property of the model to derive a linear mapping between the conditional choice probabilities and the parameters of the utility function. Because conditional choice probabilities can be recovered directly from the data, it is possible to estimate the parameters of the utility function without solving the full model. In the first stage, I use the EM algorithm to obtain (1) estimates of the parameters governing the distribution of unobservable types, $\tau$, (2) estimates of the parameters of the earnings equations, $\theta_Y$, and (3) empirical estimates of the conditional choice probabilities that vary non-parametrically by unobservable type. In the second stage, I project the type-specific empirical estimates of the conditional choice probabilities onto parameter space in a least-squares way to recover the parameters of the utility function, $\theta_U$. Intuitively, these parameters are the ones that minimize the distance between the observed behavior of individuals in the data and the behavior of individuals that is predicted by the model. This approach has two advantages over full likelihood methods. First, it is computationally light. Second, it can accommodate models that have a very large state space.26 I build on Scott (2013) to account for aggregate price fluctuations without having to make any assumptions about how individuals form expectations over the evolution of prices over time.

25 To derive equation (1.7), take the log of equation (1.6) and use the following expression for the ex-ante value function:

\[
\bar{V}_{z,t}(x_{i,t}) = \ln\Big( \sum_{j=0}^{6} \exp(v_{j,z,t}(x_{i,t})) \Big) + \gamma
\]

1.4.1 The First Stage

The likelihood of the observed data for individual $i$ can be written as a finite mixture of likelihoods:

\[
L_i(\tau, \theta_Y, \theta_U) = \sum_{z=1}^{Z} \tau_{z|x_{i,0}} \Big( \prod_{t=1}^{T} \prod_{j=0}^{6} \big[ p_j(z_i, x_{i,t}, \omega_t; \theta_Y, \theta_U)\, f_j(y_{i,t} \mid z_i, x_{i,t}, \omega_t; \theta_Y) \big]^{a_{i,j,t}} \Big)
\]

where $\tau_{z|x_{i,0}}$ is the probability individual $i$ belongs to type $z$ given his first observed career choice at age 25, $p_j(z_i, x_{i,t}, \omega_t; \theta_Y, \theta_U)$ denotes the conditional choice probability of optimally choosing career $j$ given the parameters of the model, $f_j(y_{i,t} \mid z_i, x_{i,t}, \omega_t; \theta_Y)$ is the conditional density of earnings in career $j$, and $a_{i,j,t}$ is an action variable that is equal to 1 if individual $i$ chooses career $j$ at time $t$ and is equal to 0 otherwise.

26 There are more than 9 billion possible states in my model.

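Arcidiacono and Jones (2003) show that the log-likelihood of the observed data can be written as:

$$\begin{aligned}
l(\tau, \theta_Y, \theta_U) = {} & \sum_{i=1}^{N} \sum_{z=1}^{Z} \sum_{t=1}^{T} \sum_{j=0}^{6} q_{i,z} \, a_{i,j,t} \ln\big( p_j(z_i, x_{i,t}, \omega_t; \theta_Y, \theta_U) \big) \\
& + \sum_{i=1}^{N} \sum_{z=1}^{Z} \sum_{t=1}^{T} \sum_{j=0}^{6} q_{i,z} \, a_{i,j,t} \ln\big( f_j(y_{i,t} \mid z_i, x_{i,t}, \omega_t; \theta_Y) \big) \\
& + \sum_{i=1}^{N} \sum_{z=1}^{Z} q_{i,z} \ln\big( \tau_{z|x_{i,0}} \big) - \sum_{i=1}^{N} \sum_{z=1}^{Z} q_{i,z} \ln( q_{i,z} )
\end{aligned} \tag{1.8}$$

where $q_{i,z}$ denotes the posterior probability individual i belongs to type z:

$$q_{i,z} = \frac{ \tau_{z|x_{i,0}} \left( \prod_{t=1}^{T} \prod_{j=0}^{6} \big[ p_j(z_i, x_{i,t}, \omega_t; \theta_Y, \theta_U) \, f_j(y_{i,t} \mid z_i, x_{i,t}, \omega_t; \theta_Y) \big]^{a_{i,j,t}} \right) }{ \sum_{z=1}^{Z} \tau_{z|x_{i,0}} \left( \prod_{t=1}^{T} \prod_{j=0}^{6} \big[ p_j(z_i, x_{i,t}, \omega_t; \theta_Y, \theta_U) \, f_j(y_{i,t} \mid z_i, x_{i,t}, \omega_t; \theta_Y) \big]^{a_{i,j,t}} \right) } \tag{1.9}$$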

Intuitively, the posterior probability individual i belongs to type z is equal to the fraction of the likelihood for individual i that comes from type z. Instead of using the structural conditional choice probabilities given by (1.6) to obtain estimates of θU directly, I pursue a two-stage approach proposed by Arcidiacono and Miller (2011) in which I find empirical estimates of the conditional choice probabilities in the first stage and use them to recover estimates of θU in the second stage. The maximization problem of the first stage reduces to:

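$$\begin{aligned}
\{\hat{\tau}, \hat{\theta}_Y, \hat{p}\} = \arg\max_{\tau, \theta_Y, p} \; & \sum_{i=1}^{N} \sum_{z=1}^{Z} \sum_{t=1}^{T} \sum_{j=0}^{6} q_{i,z} \, a_{i,j,t} \ln\big( p_{j,z,t}(x_{i,t}) \big) \\
& + \sum_{i=1}^{N} \sum_{z=1}^{Z} \sum_{t=1}^{T} \sum_{j=0}^{6} q_{i,z} \, a_{i,j,t} \ln\big( f_j(y_{i,t} \mid z_i, x_{i,t}, \omega_t; \theta_Y) \big) \\
& + \sum_{i=1}^{N} \sum_{z=1}^{Z} q_{i,z} \ln\big( \tau_{z|x_{i,0}} \big) - \sum_{i=1}^{N} \sum_{z=1}^{Z} q_{i,z} \ln( q_{i,z} )
\end{aligned} \tag{1.10}$$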

As suggested by Arcidiacono and Jones (2003) and Arcidiacono and Miller (2011), I carry out this maximization problem in stages using the EM algorithm. The EM algorithm proceeds in two steps. At the expectation step, the posterior probability individual i belongs to type z, $q_{i,z}$, is calculated given the data and the structure of the model using equation (1.9). At the maximization step, the posterior probabilities that each individual belongs to each unobservable type are taken as given and used as weights to obtain maximum likelihood estimates of $\{\tau, \theta_Y, p\}$. Note that, taking $q_{i,z}$ as given, the log-likelihood function in equation (1.10) is the sum of three components: a component associated with choices, a component associated with earnings, and a component associated with the probability that an individual belongs to each unobservable type given his first observed career choice at age 25. Because the log-likelihood is additively separable, consistent estimates of $\tau$, $\theta_Y$, and $p$ can be obtained separately. The algorithm is easy to implement in practice because it amounts to iterating on a set of weighted OLS regressions. I now describe the EM algorithm in more detail.

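The expectation step consists of updating the estimates of $q_{i,z}$ using equation (1.9). Specifically, at the $(m+1)$th iteration of the algorithm, I obtain a new estimate of $q_{i,z}$ as follows:

$$\hat{q}^{\,m+1}_{i,z} = \frac{ \hat{\tau}^{m}_{z|x_{i,0}} \left( \prod_{t=1}^{T} \prod_{j=0}^{6} \big[ \hat{p}^{m}_{j,z,t}(x_{i,t}) \, f_j(y_{i,t} \mid z_i, x_{i,t}, \omega_t; \hat{\theta}^{m}_Y) \big]^{a_{i,j,t}} \right) }{ \sum_{z=1}^{Z} \hat{\tau}^{m}_{z|x_{i,0}} \left( \prod_{t=1}^{T} \prod_{j=0}^{6} \big[ \hat{p}^{m}_{j,z,t}(x_{i,t}) \, f_j(y_{i,t} \mid z_i, x_{i,t}, \omega_t; \hat{\theta}^{m}_Y) \big]^{a_{i,j,t}} \right) } \tag{1.11}$$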

The maximization step consists of finding estimates of $\tau$, $\theta_Y$, and $p$ that solve equation (1.10), taking $\hat{q}_{i,z}$ as given. Maximizing (1.10) with respect to $\tau$ gives:

$$\hat{\tau}^{m+1}_{z|x_{i,0}} = \frac{ \sum_{i=1}^{N} \hat{q}^{\,m+1}_{i,z} \, \mathbb{1}\{x_{i,0} = x\} }{ \sum_{i=1}^{N} \mathbb{1}\{x_{i,0} = x\} } \tag{1.12}$$

which implies that the probability individual i belongs to type z given his first observed career choice at age 25 is equal to the average of the posterior probabilities among all individuals with the same first observed career choice at age 25.

Maximizing (1.10) with respect to $\theta_Y$ gives:

$$\hat{\theta}^{m+1}_Y = \arg\max_{\theta_Y} \sum_{i=1}^{N} \sum_{z=1}^{Z} \sum_{t=1}^{T} \sum_{j=0}^{6} \hat{q}^{\,m+1}_{i,z} \, a_{i,j,t} \ln\big( f_{j,z,t}(y_{i,t} \mid x_{i,t}, \theta_Y) \big) \tag{1.13}$$

which amounts to obtaining OLS estimates of the parameters of the earnings equations taking the unobserved heterogeneity as given and using $\{\hat{q}^{\,m+1}_{i,z}\}_{i \in I}$ as population weights.

Finally, maximizing (1.10) with respect to $p_{j,z,t}$ gives:

$$\hat{p}^{\,m+1}_{j,z,t}(x_{i,t}) = \frac{ \sum_{i=1}^{N} a_{i,j,t} \, \hat{q}^{\,m+1}_{i,z} \, \mathbb{1}\{x_{i,t} = x\} }{ \sum_{i=1}^{N} \hat{q}^{\,m+1}_{i,z} \, \mathbb{1}\{x_{i,t} = x\} } \tag{1.14}$$

which is equivalent to the non-parametric empirical likelihood where $\{\hat{q}^{\,m+1}_{i,z}\}_{i \in I}$ are used as population weights. In practice, to avoid empty cells and small bin problems in calculating (1.14), I use flexible linear probability models to smooth the empirical estimates of the conditional choice probabilities across the state space, as in Traiberman (2016). For each career option $j \in J$, I estimate a separate linear probability model for every possible combination of last year's career choice and unobservable type. In total, I run 294 regressions to obtain the empirical estimates of the conditional choice probabilities.27 In each regression, I include a constant term, linear and quadratic terms of the number of years of experience an individual has in each career, and year fixed effects. I provide additional details about the smoothing procedure in Appendix 1.9.
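The first-stage iteration can be sketched in a few lines of code. The block below is a deliberately simplified illustration, not the estimation code used in this chapter: it assumes a single state cell (so the conditional choice probabilities depend only on type), Gaussian earnings errors with a common variance, and a single initial-condition cell for the type shares. The function names `m_step`, `e_step`, and `run_em`, and all array layouts, are hypothetical.

```python
import numpy as np

def m_step(q, y, X, a, n_choices):
    """M-step: update type shares, earnings parameters and CCPs given posteriors q (N, Z)."""
    N, T, K = X.shape
    Z = q.shape[1]
    tau = q.mean(axis=0)                       # analogue of (1.12), one initial-condition cell
    Xf, yf = X.reshape(N * T, K), y.reshape(N * T)
    theta, ssr = np.empty((Z, K)), 0.0
    for z in range(Z):
        w = np.repeat(q[:, z], T)              # posterior weight for each person-year
        XtW = Xf.T * w
        theta[z] = np.linalg.solve(XtW @ Xf, XtW @ yf)   # weighted OLS, analogue of (1.13)
        ssr += np.sum(w * (yf - Xf @ theta[z]) ** 2)
    sigma2 = ssr / (N * T)                     # weights sum to one across types
    counts = np.stack([(a == j).sum(axis=1) for j in range(n_choices)], axis=1)
    p = q.T @ counts                           # weighted choice counts, analogue of (1.14)
    p = p / p.sum(axis=1, keepdims=True)
    return tau, theta, sigma2, p

def e_step(tau, theta, sigma2, p, y, X, a):
    """E-step: posterior type probabilities, analogue of (1.11), computed in logs."""
    N, Z = y.shape[0], tau.shape[0]
    loglik = np.zeros((N, Z))
    for z in range(Z):
        resid = y - X @ theta[z]                               # (N, T) residuals
        log_f = -0.5 * np.log(2 * np.pi * sigma2) - resid ** 2 / (2 * sigma2)
        log_p = np.log(np.clip(p[z][a], 1e-12, None))          # CCP of the observed choice
        loglik[:, z] = (log_f + log_p).sum(axis=1)
    log_post = np.log(np.clip(tau, 1e-12, None)) + loglik
    log_post -= log_post.max(axis=1, keepdims=True)            # guard against underflow
    q = np.exp(log_post)
    return q / q.sum(axis=1, keepdims=True)

def run_em(y, X, a, n_types, n_choices, n_iter=100, seed=0):
    """Alternate M- and E-steps from a random initial guess of the posteriors."""
    rng = np.random.default_rng(seed)
    q = rng.dirichlet(np.ones(n_types), size=y.shape[0])
    for _ in range(n_iter):
        params = m_step(q, y, X, a, n_choices)
        q = e_step(*params, y, X, a)
    return q, params
```

Here `y` is an N-by-T array of log earnings, `X` an N-by-T-by-K array of regressors, and `a` an N-by-T integer array of career choices. A fuller version would replace the single-cell frequency estimator for `p` with state-specific cells or the smoothed linear probability models described above.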

1.4.2 The Second Stage

I now describe how I recover estimates of the parameters of the utility function using only empirical estimates of the conditional choice probabilities and the parameters of the earnings equations. I start by taking differences in conditional ex-ante value functions. As shown by Hotz and Miller (1993), there exists a simple one-to-one mapping between the conditional choice probabilities and the conditional ex-ante value functions:

$$\ln\left( \frac{ p_{j,z,t}(x_{i,t}) }{ p_{j',z,t}(x_{i,t}) } \right) = v_{j,z,t}(x_{i,t}) - v_{j',z,t}(x_{i,t}) \tag{1.15}$$

27 There are seven career options, seven possible career choices last period, and six unobservable types.
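The inversion in equation (1.15) can be checked numerically. The sketch below assumes type 1 extreme value preference shocks, which is what the log-sum-exp form of the ex-ante value function in footnote 25 suggests; the values in `v` are made up for illustration and `choice_probabilities` is a hypothetical helper.

```python
import numpy as np

def choice_probabilities(v):
    """Logit choice probabilities implied by a vector of conditional value functions."""
    e = np.exp(v - v.max())          # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical conditional value functions for four careers (career 0 is the reference)
v = np.array([0.0, 0.4, -1.2, 0.9])
p = choice_probabilities(v)

# Log odds relative to the reference career recover the value differences v_j - v_0
log_odds = np.log(p / p[0])
```

Because only differences in value functions are recovered, adding a constant to every element of `v` leaves `log_odds` unchanged, which is why one career has to be normalized.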

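Using equation (1.5) to substitute for the conditional ex-ante value functions in equation (1.15), I obtain the following equation:

$$\begin{aligned}
\ln\left( \frac{ p_{j,z,t}(x_{i,t}) }{ p_{j',z,t}(x_{i,t}) } \right) = {} & u_{j,z,t}(x_{i,t}) - u_{j',z,t}(x_{i,t}) \\
& + \beta \Big( E_t\big[ \bar{V}_{z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j,t})) \big] - E_t\big[ \bar{V}_{z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j',t})) \big] \Big)
\end{aligned} \tag{1.16}$$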

At this point, it would be possible to estimate all the parameters of the utility function if I could calculate the difference in expected ex-ante value functions at time t + 1. From the first stage of the estimation procedure, I have empirical estimates of the conditional choice probabilities which allow me to construct the left hand side of equation (1.16). I can also calculate the difference in expected log earnings between career $j$ and $j'$, which enters linearly on the right hand side of equation (1.16), using the estimated parameters of the earnings equations. I face two major obstacles, however, in calculating the continuation values. First, I do not model how individuals forecast the evolution of aggregate prices in the economy. Second, the richness of the state space makes it prohibitively costly to calculate the ex-ante value functions using backward induction. I employ recent advances in the estimation of dynamic discrete choice models to address these issues. To deal with the first problem, that I make no assumptions on how aggregate prices evolve over time, I replace the difference in expected ex-ante value functions at time t + 1 with the sum of its realization and forecasting error, as in Scott (2013):

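$$\begin{aligned}
\ln\left( \frac{ p_{j,z,t}(x_{i,t}) }{ p_{j',z,t}(x_{i,t}) } \right) = {} & u_{j,z,t}(x_{i,t}) - u_{j',z,t}(x_{i,t}) \\
& + \beta \Big( \bar{V}_{z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j,t})) - \bar{V}_{z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j',t})) \Big) \\
& + \beta \big( \eta_{i,j,z,t+1} - \eta_{i,j',z,t+1} \big)
\end{aligned} \tag{1.17}$$

where $\eta_{i,j,z,t+1}$ is the forecasting error:

$$\eta_{i,j,z,t+1} \equiv E_t\big[ \bar{V}_{z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j,t})) \big] - \bar{V}_{z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j,t}))$$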

The key idea behind this approach is that the forecasting error of individuals is mean uncorrelated with any information available at time t because individuals have rational expectations. From an econometrician's perspective, the forecasting error of individuals can be seen as an error term in equation (1.17) that is orthogonal to the difference in flow utilities at time t. To deal with the second problem, that I cannot calculate the ex-ante value functions using backward induction, I exploit the finite dependence property of the model. Arcidiacono and Miller (2011) show that it is possible to calculate the difference in ex-ante value functions at time t + 1 using only first stage estimates if the model exhibits finite dependence. They say that a model exhibits ρ-period finite dependence if it is possible to find two sequences of choices that lead to the same continuation values after ρ periods. In the context of my model, the effect on the future of a choice today occurs through two channels: human capital accumulation and mobility costs. It is possible to find two career paths that lead to the same continuation values at some point in the future because (a) there is no depreciation of career-specific human capital over time and (b) mobility costs only depend on last year's career choice, $a_{i,t-1}$. Consider the following career paths: (1) career $j$ at time t, career $j'$ at time t + 1, and career $j''$ at time t + 2 and (2) career $j'$ at time t, career $j$ at time t + 1, and career $j''$ at time t + 2. Both of them lead to the same state at the beginning of period t + 3. To see this, note that both career paths increase $\text{exper}_{i,j,t}$, $\text{exper}_{i,j',t}$, and $\text{exper}_{i,j'',t}$ by one unit and present individuals with the same menu of mobility costs at the beginning of time t + 3 because the last career choice is $j''$ in both cases. Telescoping equation (1.17) two periods in the future along these two career paths using equation (1.7) gets rid of the continuation values:

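$$\begin{aligned}
& \ln\left( \frac{ p_{j,z,t}(x_{i,t}) }{ p_{j',z,t}(x_{i,t}) } \right) + \beta \ln\left( \frac{ p_{j',z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j,t})) }{ p_{j,z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j',t})) } \right) + \beta^2 \ln\left( \frac{ p_{j'',z,t+2}(x_{i,t+2}(x_{i,t}, a_{i,j,t}, a_{i,j',t+1})) }{ p_{j'',z,t+2}(x_{i,t+2}(x_{i,t}, a_{i,j',t}, a_{i,j,t+1})) } \right) \\
& \quad = u_{j,z,t}(x_{i,t}) - u_{j',z,t}(x_{i,t}) \\
& \qquad + \beta \big[ u_{j',z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j,t})) - u_{j,z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j',t})) \big] \\
& \qquad + \beta^2 \big[ u_{j'',z,t+2}(x_{i,t+2}(x_{i,t}, a_{i,j,t}, a_{i,j',t+1})) - u_{j'',z,t+2}(x_{i,t+2}(x_{i,t}, a_{i,j',t}, a_{i,j,t+1})) \big] \\
& \qquad + \beta \big( \eta_{i,j,z,t+1} - \eta_{i,j',z,t+1} \big) + \beta^2 \big( \eta_{i,j,j',z,t+2} - \eta_{i,j',j,z,t+2} \big)
\end{aligned} \tag{1.18}$$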

where $\eta_{i,j,j',z,t+2}$ is the two-period-ahead forecasting error.28 Equation (1.18) has an intuitive interpretation: the left hand side is equal to the minimum compensating differential an individual must receive at time t to be willing to choose career path $\{j', j, j''\}$ instead of $\{j, j', j''\}$. Letting $j'$ and $j''$ refer to non-employment and normalizing the value of non-employment to zero yields:29

$$\begin{aligned}
& \ln\left( \frac{ p_{j,z,t}(x_{i,t}) }{ p_{j',z,t}(x_{i,t}) } \right) + \beta \ln\left( \frac{ p_{j',z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j,t})) }{ p_{j,z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j',t})) } \right) + \beta^2 \ln\left( \frac{ p_{j'',z,t+2}(x_{i,t+2}(x_{i,t}, a_{i,j,t}, a_{i,j',t+1})) }{ p_{j'',z,t+2}(x_{i,t+2}(x_{i,t}, a_{i,j',t}, a_{i,j,t+1})) } \right) \\
& \quad = \alpha_z \Big( E\big[ \ln( y_{j,z,t}(x_{i,t}) ) \big] - \beta E\big[ \ln( y_{j,z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j',t})) ) \big] \Big) \\
& \qquad + \phi_{j,z} - \psi_{j,z}(x_{i,t}) - \beta \big( \phi_{j,z} - \psi_{j,z}(x_{i,t+1}(x_{i,t}, a_{i,j',t})) \big) \\
& \qquad + \beta \big( \eta_{i,j,z,t+1} - \eta_{i,j',z,t+1} \big) + \beta^2 \big( \eta_{i,j,j',z,t+2} - \eta_{i,j',j,z,t+2} \big)
\end{aligned} \tag{1.19}$$

28 The two-period-ahead forecasting error is defined as:

$$\eta_{i,j,j',z,t+2} \equiv E_{t+1}\big[ \bar{V}_{z,t+2}(x_{i,t+2}(x_{i,t}, a_{i,j,t}, a_{i,j',t+1})) \big] - \bar{V}_{z,t+2}(x_{i,t+2}(x_{i,t}, a_{i,j,t}, a_{i,j',t+1}))$$
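The finite dependence argument behind equations (1.18) and (1.19) can be illustrated with a toy state transition. The sketch below assumes a state made up of a vector of career-specific experience counts and last period's career choice, with no depreciation of experience; the `State` layout and the helpers `transition` and `follow` are hypothetical and much coarser than the state space of the model.

```python
from typing import Sequence, Tuple

State = Tuple[Tuple[int, ...], int]   # (years of experience in each career, last career choice)

def transition(state: State, choice: int) -> State:
    """Add one year of experience in the chosen career and record it as the last choice."""
    exper, _ = state
    new_exper = tuple(e + (1 if k == choice else 0) for k, e in enumerate(exper))
    return (new_exper, choice)

def follow(state: State, path: Sequence[int]) -> State:
    """Apply a sequence of career choices to a starting state."""
    for choice in path:
        state = transition(state, choice)
    return state

start: State = ((2, 0, 1), 0)          # hypothetical starting state with three careers
j, j_prime, j_double_prime = 1, 2, 0   # hypothetical career labels

path_1 = [j, j_prime, j_double_prime]  # {j, j', j''}
path_2 = [j_prime, j, j_double_prime]  # {j', j, j''}

same_after_three = follow(start, path_1) == follow(start, path_2)
same_after_one = follow(start, path_1[:1]) == follow(start, path_2[:1])
```

The two paths differ after one period but coincide at the start of period t + 3, which is why continuation values beyond that point cancel when the two paths are differenced.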

Given empirical estimates of the conditional choice probabilities and estimates of the parameters of the earnings equations, equation (1.19) can be constructed and estimated easily because it is linear in the parameters of the utility function. As I mentioned, rational expectations imply that expected log earnings at time t are orthogonal to the forecasting errors in equation (1.19). However, expected log earnings at time t + 1 are not independent of the error term. To see this, note that both the expected log earnings at time t + 1 and the forecasting error at time t + 1 depend on the realization of aggregate prices at time t + 1. To deal with this, I instrument for expected log earnings at time t + 1, $E[\ln(y_{j,z,t+1}(x_{i,t+1}(x_{i,t}, a_{i,j',t})))]$, with its lagged value, $E[\ln(y_{j,z,t}(x_{i,t+1}(x_{i,t}, a_{i,j',t})))]$. Because individuals are assumed to have rational expectations, this instrument satisfies the exclusion restriction. I estimate equation (1.19) separately for each unobservable type via two-stage least squares. I set the discount factor β to 0.9 throughout the estimation procedure.30
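The mechanics of the two-stage least squares step can be sketched on simulated data. This is a generic illustration rather than the estimation code for equation (1.19): the regressor `x` stands in for an endogenous term such as expected log earnings at t + 1, the instrument `z` stands in for its lagged value, and the function name and all simulated magnitudes are assumptions.

```python
import numpy as np

def two_stage_least_squares(y, x_endog, x_exog, z_excl):
    """2SLS: project the regressors on the instruments, then regress y on the projection."""
    W = np.column_stack([x_exog, z_excl])               # exogenous regressors plus excluded instruments
    X = np.column_stack([x_exog, x_endog])              # structural regressors
    X_hat = W @ np.linalg.lstsq(W, X, rcond=None)[0]    # first stage fitted values
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]     # second stage coefficients

# Simulated stand-in data (purely illustrative)
rng = np.random.default_rng(0)
n = 20_000
const = np.ones(n)
z = rng.normal(size=n)                         # instrument, uncorrelated with the error
u = rng.normal(size=n)                         # structural error
x = z + 0.8 * u + 0.5 * rng.normal(size=n)     # regressor correlated with the error
y = 1.0 + 2.0 * x + u                          # true intercept 1, true slope 2

beta_ols = np.linalg.lstsq(np.column_stack([const, x]), y, rcond=None)[0]
beta_iv = two_stage_least_squares(y, x, const, z)
```

In this simulation the OLS slope is biased upward because `x` is correlated with `u`, while the instrumented slope is close to the true value of 2. Standard errors are omitted; in practice they need to be computed from the structural residuals rather than the second-stage residuals.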

1.4.3 Identification

In this section, I discuss the variation in the data that identifies the parameters of the model. Kasahara and Shimotsu (2009) discuss the non-parametric identification of finite mixtures in dynamic discrete choice models. Proposition 4 in their paper implies that it is possible to recover type-specific conditional choice probabilities in my model.

29 I need to normalize the value of one career to zero to be able to identify the parameters of the utility function (Rust, 1994; Magnac and Thesmar, 2002). I choose non-employment as the reference.

30 I also estimated the parameters of the utility function using a discount factor of 0.95 and the results were similar.

First, consider the identification of the parameters of the earnings equations. These parameters are identified using variation in earnings. For the sake of exposition, I discuss the identification of the returns to various types of experience in entrepreneurship. The returns to experience in entrepreneurship, which capture the importance of learning-by-doing in entrepreneurship, are identified by following individuals over time in entrepreneurship. The returns to various types of labour market experience in entrepreneurship are identified by comparing the baseline earnings of individuals that switch into entrepreneurship in the same year, but have different career histories up until that year. Because these parameters are type-specific, they are only identified by comparing individuals of the same type with one another. The key identifying assumption necessary to obtain unbiased estimates of the parameters of the earnings equations is that career choices are exogenous conditional on unobservable type, a vector of observable individual characteristics, and calendar year. In the model, two individuals with the same unobservable type who enter the model in the same calendar year have the same ex-ante probability of choosing each career path.

Second, consider the identification of the parameters of the utility function. These parameters are identified using variation in the empirical estimates of the conditional choice probabilities.
Again, the left hand side of equation (1.19) can be interpreted as the minimum compensating differential an individual must receive at time t to be willing to choose career path $\{j', j, j''\}$ instead of $\{j, j', j''\}$. The scale parameter is identified using variation in expected log earnings along these two career paths. Specifically, the correlation between the minimum compensating differential and the difference in expected log earnings along the two career paths pins down the scale parameter. Intuitively, I will estimate a high value of the scale parameter if expected log earnings differentials are a good predictor of the compensating differential implied by the empirical estimates of the conditional choice probabilities. Once we know how sensitive career choices are to expected log earnings, it is possible to pin down the career-specific non-pecuniary benefits and the mobility costs parameters. Accounting for expected log earnings differentials, the career-specific non-pecuniary benefits and the mobility costs parameters ensure that the minimum compensating differential that is predicted by the model matches the one implied by the empirical estimates of the conditional choice probabilities. As is standard with the estimation of dynamic discrete choice models, the identification of the parameters of the utility function relies on (a) the distributional assumption imposed on the career-specific idiosyncratic preference shocks, (b) the normalization of the value of non-employment to zero, and (c) the assumption that the discount factor β is equal to 0.9.

1.5 Results

In this section, I present and discuss the results from the estimation. I focus on the parameters that are most relevant to understanding incorporated entrepreneurship. For the remainder of this section, I simply refer to incorporated entrepreneurship as entrepreneurship. All parameter estimates can be found in Appendix 1.10.

1.5.1 Earnings Profiles

Table 1.3 shows the potential earnings of individuals in each career at age 25 as a function of their unobservable type. This corresponds to the average type-specific intercept in the earnings equation of each career.31 As can be seen, there is substantial heterogeneity across individuals in terms of potential earnings at age 25. Although certain types earn more than others in each career (e.g. Type 6), others have a comparative advantage in one career over others. For example, Type 1 and Type 6 individuals have a clear comparative advantage in entrepreneurship at labour force entry. They earn 26% and 91% more as entrepreneurs than as workers in a high productivity firm at age 25, respectively. Type 6 individuals are, by far, the most successful entrepreneurs in the population at age 25.

31 Using the notation of equation (1.3), the average intercept for type z in career j corresponds to $\bar{r}_{z,j} = \frac{1}{12} \sum_{t=2001}^{2012} r^{j}_{t}(z)$.

In the model, there are two channels through which individuals acquire skills that are valuable in entrepreneurship: experience in entrepreneurship and work experience. I discuss their importance in turn. Figure 1.3 describes what the parameters of the model pertaining to learning-by-doing imply for patterns of log earnings in each career. I plot the potential earnings of individuals in entrepreneurship and in the labour market as a function of their unobservable type, assuming no career changes. These earnings profiles are calculated using the parameters reported in Tables 1.8, 1.11, and 1.12 in the appendix. The results confirm that Type 1 and Type 6 individuals have a clear comparative advantage in entrepreneurship over others. This comparative advantage persists and reinforces itself over the life cycle. As we’ve seen, both types have the potential to earn more as entrepreneurs than as workers at labour force entry (see Table 1.3). Figure 1.3 shows that Type 1 and Type 6 individuals also have higher earnings growth potential in entrepreneurship than in the labour market. On average, the earnings of Type 1 individuals grow by 8.8% annually in entrepreneurship. To interpret orders of magnitude, I note that their earnings in entrepreneurship grow about 6 times faster than their earnings as workers in high productivity firms (which grow at an average annual rate of about 1.5%). The high ability Type 6 individuals experience even higher earnings growth in entrepreneurship: their earnings grow at an average rate of 11% per year. In comparison, their earnings as workers in high productivity firms increase by about 7% per year, on average. These results tell us that learning-by-doing is an important channel through which individuals acquire skills that are valuable in entrepreneurship. However, this channel seems to be only relevant for those who have the potential to earn more as entrepreneurs than as workers at labour market entry. 
This can be seen clearly from the earnings profiles of Type 2, Type 3, and Type 4 individuals, which are essentially flat.

Figure 1.4 plots the earnings profiles of individuals assuming that they switch into entrepreneurship for the first time at age 30. These earnings profiles tell us how valuable different types of labour market experience are in entrepreneurship. Before age 30, I assume individuals are either workers in high productivity firms (diamond line) or workers in low productivity firms (circle line). The dashed lines correspond to the earnings profiles of individuals who enter entrepreneurship at age 25 and remain in that career throughout (these correspond to the square lines in Figure 1.3). The solid black line represents potential earnings in entrepreneurship at age 25. The most striking result from this figure is that labour market experience is of limited value in entrepreneurship for most types. To see this, compare the intercept of log earnings in entrepreneurship after 5 years of experience to the baseline intercept at age 25 (without any experience). Only two types of individuals greatly benefit from prior work experience in entrepreneurship: Type 1 and Type 3. For both, five years of experience in high productivity firms increases potential earnings in entrepreneurship in an economically meaningful way. For example, five years of work experience in high productivity firms increases baseline earnings in entrepreneurship by about 75% for Type 1 individuals. At age 36, Type 1 and Type 3 individuals have the potential to be much more successful in entrepreneurship if they accumulate work experience in high productivity firms prior to entry (compare the diamond line with the dashed line at age 36).
For instance, Type 1 individuals have the potential to earn 50% more as entrepreneurs at age 36 if they accumulate five years of work experience in high productivity firms prior to entry. This stands in sharp contrast with the high ability Type 6 individuals who are not estimated to benefit much from labour market experience in entrepreneurship.

Figure 1.5 plots the earnings profiles of individuals assuming that they start their careers in entrepreneurship and enter the labour market for the first time at age 30. These earnings profiles are useful to understand how valuable experience in entrepreneurship is in the labour market. The black diamond lines correspond to the earnings profiles of individuals who start their careers in entrepreneurship and enter high productivity firms at age 30. The black circle lines correspond to the earnings profiles of individuals who start their careers in entrepreneurship and enter low productivity firms at age 30. The dashed lines correspond to the earnings profiles of individuals who enter the labour market at age 25 and remain in the same career throughout (these lines correspond to the diamond and circle lines in Figure 1.3). As can be seen, experience in entrepreneurship is valuable in the labour market. This is especially true for Type 1 and Type 6 individuals. For both types, five years of experience in entrepreneurship is at least as valuable in the labour market as five years of labour market experience.

1.5.2 Career Choices by Type

As we've seen, only two types of individuals have a clear comparative advantage in entrepreneurship: Type 1 and Type 6. Both types have high growth in earnings in entrepreneurship. To get a sense of how prevalent these types are in the economy, I assign individuals to the type to which they have the highest posterior probability of belonging. I refer to this as their dominant type. I find that about 8% of individuals are dominant Type 1 and 6% of individuals are dominant Type 6.

Table 1.4 describes the career choices of individuals as a function of their dominant type. The most striking result from this table is that very few individuals choose a career in entrepreneurship, regardless of their dominant type. In particular, only 1% of all individual-year observations for dominant Type 1 individuals and 4% of all individual-year observations for dominant Type 6 individuals are in entrepreneurship. This is despite the fact that both types have the potential to earn a lot more in entrepreneurship than in the labour market. To understand why individuals who have a clear comparative advantage in entrepreneurship seldom choose to become entrepreneurs in practice, I turn to the parameters of the utility function.

1.5.3 Non-Pecuniary Benefits and Entry Costs

There are two additional forces in the model that affect the decision process of individuals: non-pecuniary benefits and mobility costs. I estimate all non-pecuniary values to be negative relative to the value of non-employment. For the remainder of this paper, I refer to them as non-pecuniary costs instead of benefits and I express them as -1 times the estimated parameter.

Panel (a) in Figure 1.6 shows the difference between the non-pecuniary costs associated with being an entrepreneur and the non-pecuniary costs associated with being a worker in high productivity firms. I report this statistic for the two types of individuals that have a clear comparative advantage in entrepreneurship: Type 1 and Type 6. To interpret orders of magnitude, I express the difference in non-pecuniary costs in dollars. To do this, I use the type-specific scale parameters (reported in Table 1.14) and convert all estimated non-pecuniary costs into log dollars. I then obtain a dollar amount by evaluating the difference in log dollars at the average earnings in the population.32 The estimated non-pecuniary costs associated with entrepreneurship are large. For Type 1 individuals, the additional non-pecuniary costs associated with being an entrepreneur are worth about $30,000 annually. Relative to the non-pecuniary costs associated with being a worker in high productivity firms, this represents a 5% decrease in flow utility. For Type 6 individuals, the additional non-pecuniary costs associated with being an entrepreneur are worth about $150,000 annually. This represents an 11% decrease in flow utility. I interpret these large non-pecuniary costs as evidence of risk aversion: entrepreneurship can be unattractive because it is inherently more risky than other careers.
Panel (b) in Figure 1.6 shows the difference between the entry costs associated with becoming an entrepreneur at age 25 and the entry costs associated with entering high productivity firms at age 25. Again, I report this statistic for Type 1 and Type 6 individuals and I express the difference in dollars.33 I estimate entry costs of $30,000 for Type 1 individuals and $90,000 for Type 6 individuals. Given that mobility costs are only paid once upon changing careers and that non-pecuniary costs are incurred every period, I conclude that the main deterrent for both types is large non-pecuniary costs associated with being an entrepreneur. These results imply that policies that are aimed at reducing costs of entry are unlikely to be effective. The key to fostering entrepreneurship lies in either (a) improving the skills of individuals so that it is worth it for them to incur such large non-pecuniary costs in entrepreneurship or (b) reducing the non-pecuniary costs associated with entrepreneurship.

32 Specifically, using the notation in equation (1.2), I calculate $\exp\left( \frac{\phi_{j',z} - \phi_{j,z}}{\alpha_z} + 10.63 \right) - \exp(10.63)$, where $\phi_j$ represents the non-pecuniary benefits associated with being an incorporated business owner and $\phi_{j'}$ represents the non-pecuniary benefits associated with being a worker in high productivity firms. The average logarithm of annual earnings in the sample is 10.63.

33 Specifically, I calculate $\exp\left( \frac{\psi_{j,z} - \psi_{j',z}}{\alpha_z} + 10.63 \right) - \exp(10.63)$, where $\psi_j$ represents the baseline entry costs associated with becoming an incorporated business owner and $\psi_{j'}$ represents the baseline entry costs associated with entering high productivity firms.
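The dollar conversion in footnotes 32 and 33 is a one-line calculation. The sketch below implements the stated formula; the function name `dollar_equivalent` and its argument names are hypothetical, and the inputs in the example are back-solved for illustration rather than taken from the estimates.

```python
import math

MEAN_LOG_EARNINGS = 10.63   # average log annual earnings in the sample, as reported in footnote 32

def dollar_equivalent(utility_gap, alpha_z, mean_log_earnings=MEAN_LOG_EARNINGS):
    """Convert a utility gap into dollars, evaluated at mean earnings.

    utility_gap: difference in utility parameters, e.g. phi_{j',z} - phi_{j,z}
    alpha_z:     type-specific scale parameter on expected log earnings
    """
    return math.exp(utility_gap / alpha_z + mean_log_earnings) - math.exp(mean_log_earnings)

# Illustration: the gap in log dollars that corresponds to $30,000 at mean earnings
gap_in_log_dollars = math.log(1 + 30_000 / math.exp(MEAN_LOG_EARNINGS))
dollars = dollar_equivalent(gap_in_log_dollars, alpha_z=1.0)
```

Dividing by `alpha_z` first expresses the gap in log dollars, so a type with a larger scale parameter needs a proportionally larger utility gap to generate the same dollar figure.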

1.6 Policy Simulations

In this section, I evaluate the effectiveness of two policies designed to promote successful entrepreneurship among young individuals. First, I look at the effects of a one-time subsidy to enter high productivity firms at age 25. The goal of this policy is to help young individuals acquire skills in the labour market so they can start more successful businesses later in life than without any intervention. I interpret this policy as an internship program in which the government gives money to successful companies to hire and train young talent. From an individual's point of view, this policy provides a window of opportunity to enter productive firms early in the life cycle. Based on the results presented in the previous section, this policy could be effective for Type 1 individuals. Absent any governmental intervention, these individuals seldom acquire this kind of work experience because it is too costly for them to do so. Subsidizing entry into high productivity firms helps them acquire skills that could justify pursuing a career in entrepreneurship later in life.

Second, I look at the effects of a one-time subsidy to become an entrepreneur at age 25. The goal of this policy is to attract young entrepreneurs by providing them with direct financial incentives. Parameter estimates indicate that this policy could be particularly effective in inducing high ability Type 6 individuals to become entrepreneurs. As we've seen, Type 6 individuals have high earnings growth in entrepreneurship and low returns to work experience in entrepreneurship. Therefore, the value of entrepreneurship is highest for them early in the life cycle.

Both subsidies are equivalent, in utility terms, to receiving an additional $10,000 in earnings at the mean. For each policy, I evaluate its aggregate impact on the state of entrepreneurship in the economy as well as the effect it has on each unobservable individual type.
I also evaluate the capacity of each policy to pay for itself through taxation of future streams of income generated by the subset of individuals who take the subsidy.

To simulate counterfactuals, I start with a baseline sample of individuals that is representative of the population at age 25. For each individual, I calculate the difference in ex-ante value functions between career j and non-employment for all $j \in J$.34 I then draw a vector of career-specific idiosyncratic shocks and calculate the optimal career choice of individuals at age 25. I proceed in a similar fashion to forward simulate the career choices of individuals for 10 years. This gives me a simulated panel dataset with information on the career choices and earnings of individuals at each age between 25 and 35.

The results are presented in Table 1.5. For each policy, I report the change in the share of incorporated business owners in the economy as well as the change in the share of incorporated business owners for each unobservable type. I refer to the effect of the policy on individuals at age 25 as the short run impact. The long run impact refers to the effect 10 years after the policy intervention, when individuals are 35 years old. In the last column of each panel, I report the percentage change in average annual earnings among all incorporated business owners. Below each panel, I report the change in the discounted present value of income for the subset of individuals who take the subsidy. I calculate the discounted present value of income over 10 years.

Panel (a) in Table 1.5 shows the effects of a one-time subsidy to enter high productivity firms at age 25. This policy has a small and negative impact on the state of entrepreneurship in the economy. There is a small decrease in the share of individuals that choose to be incorporated business owners both in the short run and in the long run. As expected, this policy increases the fraction of incorporated business owners among Type 1 individuals in the long run, but this effect is small (0.3 percentage points). For the other types, this policy decreases the propensity to become incorporated business owners because labour market experience is of limited value in incorporated entrepreneurship and the value of their outside option increases.

34 To do this, I combine the estimated parameters of the model with empirical estimates of the conditional choice probabilities to obtain a counterfactual estimate of $\ln\big( p_{j,z,t}(x_{i,t}) / p_{j',z,t}(x_{i,t}) \big)$. This is equal to $v_{j,z,t}(x_{i,t}) - v_{j',z,t}(x_{i,t})$ (see equation (1.15)). The exact formula used to obtain counterfactual estimates of the conditional choice probabilities is given by equation (1.19).

Panel (b) shows the effects of a one-time subsidy to become an incorporated business owner at age 25. This policy has a sizeable effect on the fraction of individuals who choose to become incorporated business owners in the population, both in the short run and in the long run. As expected, this policy is successful in inducing a switch into incorporated entrepreneurship for the high ability Type 6 individuals. Ten years after the policy intervention, there is a 1 percentage point increase in the fraction of Type 6 individuals in incorporated entrepreneurship relative to the baseline. This policy also induces a large fraction of Type 4 individuals to switch into incorporated entrepreneurship. This explains why there is an overall decrease in the average quality of entrepreneurs, as reflected by the decrease in average earnings among incorporated business owners. Broadly, this is a reasonable policy given its ability to induce high ability individuals to become entrepreneurs. To evaluate the capacity of the policy to pay for itself, I look at the change in the discounted present value of income for the subset of individuals who take the subsidy. I estimate an average increase of $89,173 over 10 years.
Assuming that this additional stream of income will be taxed at a rate of 10%, we can expect the government to barely recover its initial investment. Still, it is of importance that this subsidy is successful in inducing a switch of high ability individuals into incorporated entrepreneurship. Unfortunately, this policy also encourages individuals unfit for entrepreneurship to switch as well. This drives the average quality of entrepreneurs down and increases the total cost of the policy. Ideally, we ought to tweak the policy to target only those individuals who have strong absolute and comparative advantages in incorporated entrepreneurship (i.e. Type 6 individuals). Panel (c) shows the effects of a one-time subsidy to become an incorporated business owner at age 25 that is only available to individuals who start their careers in high productivity Chapter 1. Understanding the Careers of Young Entrepreneurs 41

firms. The additional condition narrows in on predominantly high ability individuals. As a consequence, this policy impacts mainly Type 5 and Type 6 individuals. In the long run, we see a .1 percentage point (2%) increase in the share of incorporated entrepreneurs in the population. The overall effect of the policy on the fraction of individuals that choose to become incorporated business owners is small because high ability individuals represent a small fraction of the population. Even so, it does seem to be effective because (a) it increases the average quality of incorporated entrepreneurs in the population and (b) it has the potential to pay for itself. I estimate an average increase in the discounted present value of income of $466,856 over 10 years for the subset of individuals who take the subsidy. Assuming that this additional stream of income will be taxed at a rate of 10%, we can expect the government to benefit from this policy intervention.
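The pay-for-itself reasoning above comes down to simple arithmetic, sketched below. The size of the subsidy is not restated in this section, so the sketch only computes the tax revenue recovered per recipient, which is the largest subsidy that would break even under the 10% tax rate assumed in the text. The function name and the dictionary labels are illustrative, and the calculation ignores any other fiscal effects.

```python
# Back-of-the-envelope fiscal check (illustrative sketch, not the thesis's own code).
TAX_RATE = 0.10  # flat tax rate assumed in the text

# Average 10-year increase in the discounted present value of income among
# individuals who take the subsidy (Table 1.5, panels (b) and (c)).
DELTA_PVI = {
    "(b) untargeted subsidy": 89_173,
    "(c) targeted subsidy": 466_856,
}


def recovered_revenue(delta_income: float, tax_rate: float = TAX_RATE) -> float:
    """Tax revenue recovered per subsidy recipient."""
    return tax_rate * delta_income


for policy, gain in DELTA_PVI.items():
    # The break-even subsidy equals the recovered revenue per recipient.
    print(f"{policy}: break-even subsidy = ${recovered_revenue(gain):,.0f}")
```

Under these assumptions, a subsidy smaller than the printed amount would be recovered through taxes on the additional income, and a larger one would not.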

1.7 Conclusion

In this paper, I investigate the mechanisms that drive sorting into entrepreneurship and entrepreneurial success among young individuals. I use a new administrative Canadian matched owner-employer-employee dataset to structurally estimate a flexible dynamic Roy model of career choice. Using a computationally light two-stage estimation procedure, I recover parameters governing returns to various types of experience, non-pecuniary benefits, and entry costs that vary by career and unobservable type. I find that 15% of the population has a comparative advantage in entrepreneurship at labour force entry. This subpopulation can be divided into two economically distinct groups: (1) individuals who earn more as entrepreneurs because they have low earnings potential in the labour market (Type 1) and (2) individuals who have high earnings potential in all careers (Type 6). Parameter estimates indicate that both types have large returns to entrepreneurial experience in entrepreneurship. Moreover, while the relatively low ability Type 1 individuals benefit from prior work experience in entrepreneurship, the high ability Type 6 individuals do not.

My main result is that only 6% of the population has the potential to be successful in entrepreneurship: Type 6 individuals. Despite large earnings premiums in entrepreneurship, only 4% of Type 6 individuals become entrepreneurs between age 25 and 35. Parameter estimates indicate that the main deterrents for them are (a) large non-pecuniary costs associated with being an entrepreneur and (b) low returns to work experience in entrepreneurship. These results have important implications for policy design: they suggest that policies that incentivize individuals to become entrepreneurs early in the life cycle are the most likely to have an impact. This is because the value of entrepreneurship is highest for Type 6 individuals at labour force entry.

I use the estimated model to evaluate the impact of policies designed to promote successful entrepreneurship. I find that providing direct financial incentives to become an entrepreneur early in the life cycle is effective in inducing Type 6 individuals to sort into entrepreneurship. However, unless targeted appropriately, such a policy is costly to implement because it also induces a large fraction of individuals with relatively low entrepreneurial skills to become entrepreneurs.

My finding that the main driver of entrepreneurial success (belonging to Type 6) seems to be either innate or acquired before age 25 warrants further investigation. A limitation of the Canadian matched owner-employer-employee dataset used in this paper is that it does not include any information on education. It would be interesting to know whether the type of education an individual receives (e.g. MBA or STEM) correlates with the probability of belonging to Type 6. More generally, an important avenue for future research is investigating what early life characteristics can help identify Type 6 individuals so that policymakers can design policies that target them more efficiently.

Figure 1.1 – Distribution of Annual Earnings

[Figure omitted. Vertical axis: Fraction of individuals.]

Notes: This figure shows the distribution of the logarithm of annual earnings for unincorporated business owners (light grey), workers (grey), and incorporated business owners (dark grey). The sample used to create this figure includes all non-immigrant men age 25-55. Additional sample restrictions are as described in section 1.2.3. Earnings are in CAD$2012.

Figure 1.2 – Career Choices Over the Life Cycle

[Figure omitted. Panel (a) Workers: series Worker (High) and Worker (Low). Panel (b) Entrepreneurs: series Incorporated and Unincorporated. Horizontal axis: Age (25 to 36). Vertical axis: Fraction of individuals.]

Notes: This figure shows the career choices of individuals over the life cycle. I plot the fraction of individuals in various careers at each age between 25 and 36. The fraction of individuals that are non-employed is roughly constant at about 7% throughout (see Table 1.17). The sample used to create this figure is as described in section 1.2.3.

Figure 1.3 – Understanding the Importance of Learning-By-Doing in Entrepreneurship

[Figure omitted. Panels: (a) Type 1, (b) Type 2, (c) Type 3, (d) Type 4, (e) Type 5, (f) Type 6. Series in each panel: Worker (High), Worker (Low), Incorporated. Horizontal axis: Age (25 to 36). Vertical axis: Log earnings.]

Notes: Each panel graphs the expected logarithm of annual earnings as a function of years of experience in each career by unobservable type (assuming no career changes). Dashed diamond lines correspond to workers in high productivity firms, dashed circle lines correspond to workers in low productivity firms, and solid square lines correspond to incorporated business owners. These earnings profiles are calculated using the parameters reported in Tables 1.8, 1.11, and 1.12.

Figure 1.4 – Understanding the Importance of Labour Market Experience in Entrepreneurship

[Figure omitted. Panels: (a) Type 1, (b) Type 2, (c) Type 3, (d) Type 4, (e) Type 5, (f) Type 6. Series in each panel: Worker (High) to Incorp., Worker (Low) to Incorp., Always Incorp.; each panel also marks a level labelled "Baseline intercept". Horizontal axis: Age (25 to 36). Vertical axis: Log earnings.]

Notes: Each panel graphs the expected logarithm of annual earnings of individuals assuming they become incorporated business owners for the first time at age 30. Diamond lines correspond to individuals who start their careers as workers in high productivity firms and circle lines correspond to individuals who start their careers as workers in low productivity firms. Dashed lines correspond to individuals who become incorporated business owners at age 25 and remain in that career throughout. These earnings profiles are calculated using the parameters reported in Tables 1.8, 1.11, and 1.12.

Figure 1.5 – Labour Market Returns To Experience in Entrepreneurship

[Figure omitted. Panels: (a) Type 1, (b) Type 2, (c) Type 3, (d) Type 4, (e) Type 5, (f) Type 6. Series in each panel: Worker (High), Worker (Low), Incorp. to Worker (High), Incorp. to Worker (Low). Horizontal axis: Age (25 to 36). Vertical axis: Log earnings.]

Notes: Each panel graphs the expected logarithm of annual earnings of individuals assuming they enter the labour market for the first time at age 30. Black diamond lines correspond to individuals who start their careers as incorporated business owners and then become workers in high productivity firms at age 30. Black circle lines correspond to individuals who start their careers as incorporated business owners and then become workers in low productivity firms at age 30. Dashed lines correspond to individuals who become workers at age 25 and remain in the same career throughout. These earnings profiles are calculated using the parameters reported in Tables 1.8, 1.11, and 1.12.

Figure 1.6 – Utility Costs Associated With Entrepreneurship

[Figure omitted. Panels: (a) Non-Pecuniary Costs, (b) Entry Costs. Series in each panel: Low prod. firms, Incorporated. Horizontal axis: Unobservable Type (1 and 6). Vertical axis: Costs (in $1,000).]

Notes: Panel (a) shows the difference between the non-pecuniary costs associated with being an incorporated business owner and the non-pecuniary costs associated with being a worker in high productivity firms. Panel (b) shows the difference between the entry costs associated with becoming an entrepreneur at age 25 and the entry costs associated with entering high productivity firms at age 25. I report these statistics for the two types of individuals that have a clear comparative advantage in incorporated entrepreneurship: Type 1 and Type 6. As described in the main text, I express all values in dollars.

Table 1.1 – Summary Statistics

                                         Workers                         Entrepreneurs
                          All       High   Med-high    Med-low        Low    Incorp.  Unincorp.
Log Earnings            10.63      10.95      10.72      10.58      10.34      11.12      10.15
                       (0.64)     (0.59)     (0.57)     (0.56)     (0.57)     (0.89)     (0.70)
Age                     28.51      28.58      28.40      28.36      28.30      29.90      28.78
                       (2.88)     (2.89)     (2.85)     (2.86)     (2.86)     (3.00)     (2.98)
Work Exper.              3.14       3.46       3.28       3.22       3.11       2.01       1.46
                       (2.75)     (2.82)     (2.78)     (2.77)     (2.74)     (2.39)     (2.13)
  - High                 0.74       2.34       0.34       0.26       0.21       0.50       0.29
                       (1.63)     (2.37)     (1.04)     (0.91)     (0.84)     (1.30)     (0.97)
  - Med-high             0.80       0.44       2.20       0.42       0.31       0.45       0.33
                       (1.64)     (1.14)     (2.32)     (1.15)     (0.97)     (1.22)     (1.01)
  - Med-low              0.82       0.38       0.41       2.15       0.41       0.49       0.39
                       (1.63)     (1.04)     (1.10)     (2.29)     (1.09)     (1.28)     (1.07)
  - Low                  0.78       0.31       0.32       0.39       2.18       0.58       0.46
                       (1.61)     (0.94)     (0.93)     (1.03)     (2.33)     (1.41)     (1.17)
Incorp. Exper.           0.07       0.01       0.01       0.01       0.01       2.35       0.04
                       (0.54)     (0.21)     (0.17)     (0.17)     (0.21)     (2.29)     (0.35)
Unincorp. Exper.         0.14       0.04       0.03       0.03       0.04       0.41       1.96
                       (0.71)     (0.31)     (0.29)     (0.30)     (0.34)     (1.12)     (2.21)

Number of ind.      1,227,307    436,541    472,342    501,440    492,704     43,163    102,520
Number of obs.      7,430,419  1,608,558  1,618,393  1,683,075  1,636,477    168,021    305,870

Notes: This table summarizes the logarithm of annual earnings, age, and experience profile of individuals in each career. I report averages and standard deviations (in parentheses) for each variable in each career. Statistics are calculated using all individual-year observations in the career described by the column header. The column header "All" pools all careers together. There are 200,772 unique individuals in non-employment in the sample and 410,025 individual-year observations. The sample used is as described in section 1.2.3.

Table 1.2 – Career Transitions

                                            Career at time t
                                     Workers                     Entrepreneurs
Career at time t − 1     High  Med-high   Med-low       Low   Incorp. Unincorp.  Non-emp.
High                    0.853     0.045     0.035     0.027     0.004     0.006     0.029
Med-high                0.052     0.811     0.052     0.039     0.003     0.007     0.036
Med-low                 0.040     0.049     0.802     0.054     0.003     0.008     0.043
Low                     0.033     0.040     0.055     0.799     0.004     0.010     0.058
Incorp.                 0.018     0.013     0.014     0.021     0.889     0.014     0.031
Unincorp.               0.025     0.023     0.028     0.038     0.024     0.764     0.097
Non-emp.                0.039     0.050     0.071     0.118     0.007     0.055     0.659

Notes: This table describes the career choices of individuals as a function of their career at time t − 1. It reports the fraction of transitions from origin (row) to destination (column). "High" refers to workers in high productivity firms, "Med-high" refers to workers in medium-high productivity firms, "Med-low" refers to workers in medium-low productivity firms, "Low" refers to workers in low productivity firms, "Incorp." refers to incorporated business owners, "Unincorp." refers to unincorporated business owners, and "Non-emp." refers to the non-employed. The sample used to create this table is as described in section 1.2.3.

Table 1.3 – Earnings in Each Career at Age 25 by Type

                                   Career
                           Workers                     Entrepreneurs
Type           High  Med-high   Med-low       Low   Incorp. Unincorp.
1              9.88      9.80      9.75      9.64     10.11      9.56
             (0.07)    (0.05)    (0.04)    (0.03)    (0.14)    (0.03)
2             10.18     10.03      9.93      9.86     10.19      9.73
             (0.08)    (0.04)    (0.03)    (0.02)    (0.05)    (0.04)
3             10.41     10.23     10.14      9.91     10.27      9.81
             (0.07)    (0.08)    (0.08)    (0.05)    (0.02)    (0.06)
4             10.69     10.54     10.41     10.40     10.53      9.90
             (0.08)    (0.07)    (0.04)    (0.05)    (0.09)    (0.04)
5             10.92     10.75     10.68     10.48     11.00     10.33
             (0.05)    (0.02)    (0.03)    (0.04)    (0.07)    (0.05)
6             11.25     11.15     11.05     10.87     11.90     10.89
             (0.12)    (0.11)    (0.09)    (0.08)    (0.14)    (0.19)
Average       10.67     10.47     10.34     10.14     10.82      9.95

Notes: In this table, I report the potential earnings of individuals in each career at age 25 as a function of their unobservable type. This corresponds to the average type-specific intercept in the earnings equation of each career ($r_{z,j} = \frac{1}{12} \sum_{t=2001}^{2012} r^{j}_{t}(z)$). I report in brackets the standard deviation of the type-specific intercepts in each career. The last row of the table reports the average intercept in the population.

Table 1.4 – Career Choices by Type

                                              Career
                                      Workers                     Entrepreneurs
Type      Share of pop.      High  Med-high   Med-low       Low   Incorp. Unincorp.  Non-emp.
1                  0.08      0.08      0.09      0.13      0.23      0.01      0.09      0.36
2                  0.20      0.13      0.18      0.23      0.32      0.01      0.07      0.07
3                  0.12      0.31      0.24      0.22      0.15      0.01      0.03      0.04
4                  0.21      0.13      0.22      0.28      0.32      0.02      0.02      0.02
5                  0.34      0.30      0.26      0.23      0.13      0.04      0.03      0.01
6                  0.06      0.35      0.25      0.15      0.18      0.04      0.03      0.01
Average                      0.22      0.22      0.23      0.22      0.02      0.04      0.06

Notes: This table describes the career choices of individuals as a function of their dominant unobservable type. The dominant unobservable type of an individual is the type to which he has the highest posterior probability of belonging. I report the fraction of individual-year observations in each career by dominant type. "Share of pop." reports the fraction of individuals in the population whose dominant type is the type listed at left. The last row of the table reports average career choices in the population.
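The dominant-type assignment described in the notes to Table 1.4 can be sketched as follows. This is a minimal illustration, assuming the posterior type probabilities are stored in an individuals-by-types array; the toy posterior values (with three types rather than six) and the function names are hypothetical and are not taken from the estimation.

```python
import numpy as np


def dominant_types(posteriors) -> np.ndarray:
    """Return the 1-based index of the highest-posterior type for each individual."""
    posteriors = np.asarray(posteriors, dtype=float)
    return posteriors.argmax(axis=1) + 1


def population_shares(types: np.ndarray, n_types: int) -> np.ndarray:
    """Fraction of individuals whose dominant type is 1, ..., n_types."""
    counts = np.bincount(types, minlength=n_types + 1)[1:]
    return counts / len(types)


# Toy example: four individuals, three types (hypothetical numbers).
toy_posteriors = [
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.20, 0.30, 0.50],
    [0.05, 0.90, 0.05],
]
toy_types = dominant_types(toy_posteriors)
toy_shares = population_shares(toy_types, n_types=3)
```

The "Share of pop." column corresponds to `population_shares` applied to the estimated posteriors, and the career fractions are then tabulated within each dominant type.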

Table 1.5 – The Short Run and Long Run Impact of Policies

                         ∆ share incorporated entrepreneurs (p.p.)                  ∆ avg. earnings (%)
              Pop.   Type 1   Type 2   Type 3   Type 4   Type 5   Type 6               Incorporated
Short run    -0.10    -0.10    -0.20    -0.10     0.00    -0.10    -0.30                       0.03
Long run     -0.10     0.30    -0.10    -0.30     0.00     0.00    -1.00                       0.20

Impact on the subset of individuals who take the subsidy: ∆ PVI (%): 0.08, ∆ PVI ($): 27,260.

(a) One-Time Subsidy to Enter High Productivity Firms at Age 25

                         ∆ share incorporated entrepreneurs (p.p.)                  ∆ avg. earnings (%)
              Pop.   Type 1   Type 2   Type 3   Type 4   Type 5   Type 6               Incorporated
Short run     3.20     0.40     0.40     0.40     8.40     3.10     1.70                     -10.89
Long run      2.20     0.00     0.10     0.10     6.20     2.30     1.10                      -7.70

Impact on the subset of individuals who take the subsidy: ∆ PVI (%): 0.22, ∆ PVI ($): 89,173.

(b) One-Time Subsidy to Become an Incorporated Business Owner at Age 25

                         ∆ share incorporated entrepreneurs (p.p.)                  ∆ avg. earnings (%)
              Pop.   Type 1   Type 2   Type 3   Type 4   Type 5   Type 6               Incorporated
Short run     0.20     0.00     0.10     0.00     0.00     0.30     0.80                      11.53
Long run      0.10    -0.10     0.00     0.10     0.10     0.20     0.70                       2.50

Impact on the subset of individuals who take the subsidy: ∆ PVI (%): 0.66, ∆ PVI ($): 466,856.

(c) One-Time Subsidy to Become an Incorporated Business Owner at Age 25 (Only Available to Individuals Who Start Their Careers in High Productivity Firms)

1.8 Appendix A: AKM Estimation

I estimate the AKM model using all observations of non-immigrant male workers between 20 and 60 years old. As described in the main text, I include year fixed effects, age squared, and age cubed as time-varying covariates. Following Card et al. (2013) and Card et al. (2016b), I normalize the effect of age to be zero at age 40. Table 1.6 provides basic summary statistics from the estimation. The AKM sample consists of 10,177,750 workers, 1,189,137 firms, and 76,232,621 worker-year observations. The model is identified using workers who change employers over the course of the sample period. Specifically, worker fixed effects and firm fixed effects are estimated separately for each group of workers and firms that are connected by labour mobility (see Abowd et al. (2002) for more details on the identification of connected groups). I use only the fixed effects from the largest connected group in the empirical analysis. As indicated in Table 1.6, the largest connected group consists of 9,161,419 workers, 876,309 firms, and 74,616,421 observations. 98% of all observations are included in the largest connected group. Overall, the AKM model fits the data well with an adjusted R-squared of 0.79 and a root mean squared error (RMSE) of 0.32. To put these numbers in perspective, note that Song et al. (2015) find an adjusted R-squared of 0.79 and an RMSE of 0.44 in the US for 2001-2007 and an adjusted R-squared of 0.81 and an RMSE of 0.42 for 2008-2013.

To have unbiased firm effects in the AKM model, the key identifying assumption that needs to be satisfied is $E[\mu_{i,t} \mid i, \{d^{m}_{i,t}\}_{m=1}^{M}, X_{i,t}] = 0$. I follow the literature in referring to this assumption as the exogenous mobility assumption. The exogenous mobility assumption says that, once we condition on the identity and characteristics of the employee and on the identity of the employer, there is no additional information in the residual that predicts annual earnings. An important threat to identification comes from the omission of a match quality component in the AKM model. Estimates of $\{\Lambda_m\}_{m=1}^{M}$ will be biased if workers sort into firms on the basis of their comparative advantage. With this said, recent empirical evidence suggests that match quality plays a small role in the assignment of workers to firms (e.g. Card et al., 2013; Bonhomme et al., 2015).

I replicate the specification tests developed by Card et al. (2013) to show that the AKM model is a reasonable approximation of the earnings process of workers in Canada between 2001 and 2012.35 First, I look at the evolution of annual earnings over time for job changers. I consider all workers who change jobs between 2001 and 2012 and restrict my attention to job changers with at least two years of continuous employment at their previous and new employer. I divide firms into quartiles according to their estimated firm effects. Figure 1.7 plots the average log earnings for workers who move into a firm that offers a pay premium that is comparable to the one offered by their previous employer. If match quality is an important determinant of earnings, job changers should experience an increase in earnings on average. The fact that the figure shows no jump in average log earnings following a lateral job move suggests that the match quality component of earnings is small.

Figure 1.8 plots the evolution of average log earnings over time for workers who move into a firm that offers a pay premium that is different from the one offered by their previous employer. If the match quality component of earnings is small, workers who move to a firm that offers a larger pay premium should see an increase in annual earnings that is roughly equal to the difference in the pay premiums offered by the two employers. The reverse is also true. Workers who move to a firm that offers a smaller pay premium should see a decrease in annual earnings that is roughly equal to the difference in the pay premiums offered by the two employers. As can be seen, the decline in earnings experienced by workers moving down the pay premium ladder is approximately symmetric to the increase in earnings experienced by workers moving in the opposite direction. Taken together, Figures 1.7 and 1.8 suggest that match quality is not an important determinant of earnings in the Canadian context.

I also look at the residuals of the AKM model. The assumption that the logarithm of annual earnings of workers is additively separable in a worker fixed effect and a firm fixed effect implies that the residuals should be close to zero irrespective of the match. Following Card et al. (2013), I divide firms into deciles of firm effects and workers into deciles of worker effects. For each combination of decile of firm effect and decile of worker effect, I calculate the average residuals from the AKM model. Figure 1.9 shows the average residuals on a 10x10 grid. The residuals are all very small in magnitude. The largest absolute deviation is less than 0.025. The only discernible pattern in this figure is for the lowest worker effect group. I find that low wage individuals in low wage firms earn systematically more than predicted by the AKM model. Low wage individuals in high wage firms earn systematically less than predicted. I suspect that this deviation at the low end of the labour market is a consequence of the minimum wage in Canada. In all, the AKM model seems to provide a good approximation of the earnings process of workers in Canada between 2001 and 2012.

35 As such, I add to a growing list of empirical papers that arrive at similar conclusions using data from various countries. To name just a few, Card et al. (2013) use data from Germany, Serafinelli (2015) looks at Veneto in Italy, Card et al. (2016b) look at Portugal, and Song et al. (2015) look at the United States.

Figure 1.7 – AKM Event Study: Symmetric Job Changes

[Figure omitted. Series (origin quartile to destination quartile of firm effects): 4 to 4, 3 to 3, 2 to 2, 1 to 1. Horizontal axis: Event time (-2 to 1). Vertical axis: Average logarithm of annual earnings.]

Notes: This figure shows the average logarithm of annual earnings for workers who move into a firm that offers a pay premium that is comparable to the one offered by their previous employer. Time 0 refers to the first year at the new firm.

Figure 1.8 – AKM Event Study: Asymmetric Job Changes

[Figure omitted. Series (origin quartile to destination quartile of firm effects): 4 to 4, 4 to 3, 4 to 2, 4 to 1, 1 to 4, 1 to 3, 1 to 2, 1 to 1. Horizontal axis: Event time (-2 to 1). Vertical axis: Average logarithm of annual earnings.]

Notes: This figure shows the average logarithm of annual earnings for job changers who move from a firm that is in either the top or bottom quartile of the distribution of estimated firm effects. Time 0 refers to the first year at the new firm.

Figure 1.9 – AKM Residuals

[Figure omitted. Plotted quantity: Mean residuals.]

Notes: This figure shows the average residuals from the AKM model by cells defined by decile of the estimated worker effect × decile of the estimated firm effect.
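The cell averages plotted in Figure 1.9 can be computed along the following lines. This is a minimal sketch on simulated data, not the code used for the figure: the function names are illustrative, the simulated effects and residuals are hypothetical, and deciles are assigned over observations by rank, which is an assumption about how the bins are formed.

```python
import numpy as np


def decile_bins(values: np.ndarray) -> np.ndarray:
    """Assign each observation to a decile (0 to 9) based on its rank."""
    ranks = np.argsort(np.argsort(values))
    return (ranks * 10) // len(values)


def mean_residual_grid(worker_effect, firm_effect, residual) -> np.ndarray:
    """Average residual in each worker-effect decile x firm-effect decile cell."""
    w = decile_bins(np.asarray(worker_effect))
    f = decile_bins(np.asarray(firm_effect))
    sums = np.zeros((10, 10))
    counts = np.zeros((10, 10))
    np.add.at(sums, (w, f), np.asarray(residual, dtype=float))
    np.add.at(counts, (w, f), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # empty cells, if any, come out as NaN


# Simulated example with no match effects, so every cell mean should be near zero.
rng = np.random.default_rng(0)
n_obs = 50_000
grid = mean_residual_grid(
    worker_effect=rng.normal(0.0, 0.49, n_obs),
    firm_effect=rng.normal(0.0, 0.26, n_obs),
    residual=rng.normal(0.0, 0.32, n_obs),
)
```

Systematic sorting on match quality would show up as cells whose mean residual is far from zero, which is the pattern the specification test looks for.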

Table 1.6 – Estimation Results for AKM Model

Description of AKM sample and largest connected group
  Number of workers in largest connected group               9,161,419
  Number of workers in full sample                          10,177,750
  Ratio: workers in largest/workers in full                       0.90

  Number of firms in largest connected group                   876,309
  Number of firms in full sample                             1,189,137
  Ratio: firms in largest/firms in full                           0.74

  Number of observations in largest connected group         74,616,421
  Number of observations in full sample                     76,232,621
  Ratio: observations in largest/observations in full             0.98

Summary of parameter estimates
  Standard deviation of worker effects                            0.49
  Standard deviation of firm effects                              0.26
  Correlation between worker and firm effects                     0.20
  Root mean squared error of AKM residuals                        0.32
  Adjusted R-squared                                              0.79
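The largest connected group summarised in Table 1.6 is a connected component of the graph that links workers to the firms that employ them. The sketch below shows one way to find it with a union-find structure. It is an illustration only: the function name and the toy worker and firm identifiers are hypothetical, and "largest" is measured here by the number of workers plus firms, which is an assumption (a ranking by observations is equally plausible).

```python
from collections import Counter


def largest_connected_group(matches):
    """Return (workers, firms) in the largest connected group.

    `matches` is an iterable of (worker_id, firm_id) pairs observed in the data.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        root = node
        while parent[root] != root:
            root = parent[root]
        while parent[node] != root:  # path compression
            parent[node], node = root, parent[node]
        return root

    for worker, firm in matches:
        # Tag identifiers so that a worker and a firm never share a node.
        root_w, root_f = find(("w", worker)), find(("f", firm))
        if root_w != root_f:
            parent[root_w] = root_f

    sizes = Counter(find(node) for node in list(parent))
    top_root = sizes.most_common(1)[0][0]
    workers = {n[1] for n in parent if n[0] == "w" and find(n) == top_root}
    firms = {n[1] for n in parent if n[0] == "f" and find(n) == top_root}
    return workers, firms


# Toy example: worker 1 moves between firms A and B, so A and B are connected.
toy_matches = [(1, "A"), (1, "B"), (2, "B"), (3, "C")]
toy_workers, toy_firms = largest_connected_group(toy_matches)
```

In the toy data, firm C is linked to no other firm by worker mobility, so its firm effect could not be compared with those of A and B.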

Table 1.7 – AKM Firm Effects Are Correlated with Alternative Measures of Firm Productivity

                                   (1)          (2)          (3)          (4)          (5)
                                   TFP        Sales  Value added      Profits      Payroll
High prod. firms              0.282***     0.387***     0.409***     0.370***     0.458***
                               (0.003)      (0.004)      (0.004)      (0.004)      (0.002)
Medium-high prod. firms       0.237***     0.313***     0.323***     0.221***     0.401***
                               (0.003)      (0.004)      (0.004)      (0.004)      (0.002)
Medium-low prod. firms        0.155***     0.217***     0.216***     0.130***     0.299***
                               (0.003)      (0.004)      (0.003)      (0.004)      (0.002)

Mean, dependent variable          0.01        11.68        11.12        10.85        10.16
Adjusted R-squared                0.15         0.21         0.26         0.15         0.34
Number of firms                436,536      557,674      442,988      549,961      743,074

Notes: All regressions include a constant term and industry-specific fixed effects (3-digit NAICS code). Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

1.9 Appendix B: CCP Smoothing

Although the empirical conditional choice probabilities could, in principle, be estimated nonparametrically using equation (1.14), I estimate them using flexible linear probability models to avoid the curse of dimensionality. For each career choice j ∈ J, I estimate a separate linear probability model for every possible combination of last year's career choice and unobservable type. In total, I run 294 regressions to obtain empirical estimates of the conditional choice probabilities.36 In each regression, I include a constant term, linear and quadratic terms of the number of years of experience an individual has in each career, and year fixed effects. The upside of this approach is that it allows me to obtain estimates of the conditional choice probabilities that are very flexible in a reasonable amount of computation time. The downside of this approach is that it does not impose the restriction that the predicted conditional choice probabilities lie between 0 and 1. To deal with fitted values that lie outside this range, I replace fitted values below 0.000001 with 0.000001 and fitted values above 0.999999 with 0.999999. This ad hoc adjustment is necessary to have a log-likelihood function that is well behaved. In practice, fewer than 5% of all fitted values are affected by this procedure and the ones that are affected are all very close to 0 or 1 (see Figure 1.10).

36 There are seven career choices, seven possible career choices last period, and six unobservable types.
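The smoothing step above can be illustrated with a minimal sketch of a single linear probability model followed by the adjustment of fitted values. The simulated data, the single experience variable, and the function names are hypothetical. The sketch omits year fixed effects and the other experience terms for brevity, so it is not the specification used in the estimation.

```python
import numpy as np

CCP_FLOOR, CCP_CEIL = 0.000001, 0.999999  # bounds stated in the text

# Seven career choices x seven lagged career choices x six unobservable types.
N_REGRESSIONS = 7 * 7 * 6


def fit_lpm(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS coefficients of a linear probability model (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def adjusted_ccp(X: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Fitted choice probabilities, moved inside [CCP_FLOOR, CCP_CEIL]."""
    return np.clip(X @ beta, CCP_FLOOR, CCP_CEIL)


# Simulated example: one experience variable with linear and quadratic terms.
rng = np.random.default_rng(0)
exper = rng.integers(0, 11, size=500).astype(float)
X = np.column_stack([np.ones_like(exper), exper, exper**2])
y = (rng.random(500) < 0.02 + 0.01 * exper).astype(float)

ccp = adjusted_ccp(X, fit_lpm(X, y))
```

Because the adjusted values are strictly inside the unit interval, their logarithms are always finite, which is what the log-likelihood requires.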

Figure 1.10 – Empirical CCPs: Raw and Adjusted

[Figure omitted. Panel (a): Distribution of empirical CCPs of choosing chosen career ($p_{j,z,t}(x_{i,t})$). Panel (b): Distribution of empirical CCPs of choosing non-employment ($p_{j_0,z,t}(x_{i,t})$). Series in each panel: Raw, Adjusted. Horizontal axis: Conditional choice probability. Vertical axis: Density.]

Notes: These figures show the distribution of the estimated empirical CCPs in the data. Panel (a) shows the distribution of the predicted probability of choosing career j, where j is the observed career choice of individual i at time t. Panel (b) shows the distribution of the predicted probability of choosing non-employment. The dashed lines correspond to the raw fitted values and the solid lines correspond to the adjusted fitted values.

1.10 Appendix C: Parameter Estimates

Table 1.8 – Earnings Process: High Productivity Firms

                             Type 1     Type 2     Type 3     Type 4     Type 5     Type 6  Type Invariant
Exper. high                0.139***   0.287***   0.179***   0.151***   0.659***   0.357***
                            (0.003)    (0.004)    (0.003)    (0.004)    (0.012)    (0.008)
Exper. med-high            0.183***   0.270***   0.173***   0.137***   0.646***   0.326***
                            (0.004)    (0.004)    (0.003)    (0.004)    (0.012)    (0.008)
Exper. med-low             0.170***   0.281***   0.185***   0.119***   0.645***   0.346***
                            (0.004)    (0.004)    (0.003)    (0.004)    (0.012)    (0.008)
Exper. low                 0.153***   0.291***   0.176***   0.128***   0.621***   0.322***
                            (0.004)    (0.004)    (0.003)    (0.004)    (0.012)    (0.008)
Exper. unincorp.           0.168***   0.246***   0.153***   0.061***   0.586***   0.259***
                            (0.006)    (0.006)    (0.005)    (0.005)    (0.012)    (0.011)
Exper. incorp.             0.192***   0.228***   0.109***   0.114***   0.629***   0.332***
                            (0.010)    (0.010)    (0.008)    (0.007)    (0.013)    (0.014)
Age                       -0.081***  -0.215***  -0.089***  -0.083***  -0.568***  -0.242***
                            (0.003)    (0.004)    (0.003)    (0.004)    (0.012)    (0.008)
Work exper. (sq)                                                                                  -0.012***
                                                                                                    (0.000)
Exper. unincorp. (sq)                                                                             -0.014***
                                                                                                    (0.001)
Exper. incorp. (sq)                                                                               -0.012***
                                                                                                    (0.001)
Age (sq)                                                                                           0.008***
                                                                                                    (0.000)

Average intercept              9.88      10.18      10.41      10.69      10.92      11.25

Mean, dep. var.                                                                                       10.95
Adjusted R-squared                                                                                     0.56
Number of ind.                                                                                      436,541
Number of obs.                                                                                    1,608,558

Notes: The dependent variable is the logarithm of annual earnings. Standard errors clustered at the individual level are in parentheses. The regression includes a constant term and type-year fixed effects. I report the time averaged intercept for each type. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.
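To show how the coefficients in Table 1.8 translate into an earnings profile of the kind plotted in Figure 1.3, the sketch below evaluates expected log earnings for a Type 6 individual who works continuously in high productivity firms from age 25. It rests on assumptions that this appendix does not confirm: age is measured in years since 25 (suggested by the notes to Table 1.3, which describe the intercept as earnings at age 25), the squared work experience term applies to total work experience, and the type-year fixed effects are replaced by the time-averaged intercept. The function and variable names are illustrative.

```python
def log_earnings_high(years, intercept, b_exper, b_age, b_exper_sq, b_age_sq):
    """Expected log earnings after `years` spent continuously in high productivity firms.

    Assumes age and experience both equal `years` (career starts at age 25).
    """
    linear = (b_exper + b_age) * years
    quadratic = (b_exper_sq + b_age_sq) * years**2
    return intercept + linear + quadratic


# Type 6 coefficients from Table 1.8 (high productivity firms).
TYPE6_HIGH = dict(
    intercept=11.25,    # average intercept
    b_exper=0.357,      # Exper. high
    b_age=-0.242,       # Age
    b_exper_sq=-0.012,  # Work exper. (sq), type invariant
    b_age_sq=0.008,     # Age (sq), type invariant
)

# Profile by age, from 25 to 36.
profile = {age: log_earnings_high(age - 25, **TYPE6_HIGH) for age in range(25, 37)}
```

Under these assumptions the experience and age terms partly offset each other, so the net slope at labour force entry is the sum of the two linear coefficients.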

Table 1.9 – Earnings Process: Medium-High Productivity Firms

                         Type 1      Type 2      Type 3      Type 4      Type 5      Type 6
Exper. high              0.164∗∗∗    0.246∗∗∗    0.213∗∗∗    0.304∗∗∗    0.715∗∗∗    0.394∗∗∗
                         (0.004)     (0.003)     (0.003)     (0.005)     (0.012)     (0.007)
Exper. med-high          0.172∗∗∗    0.235∗∗∗    0.211∗∗∗    0.281∗∗∗    0.698∗∗∗    0.375∗∗∗
                         (0.004)     (0.003)     (0.003)     (0.005)     (0.012)     (0.007)
Exper. med-low           0.172∗∗∗    0.233∗∗∗    0.208∗∗∗    0.277∗∗∗    0.698∗∗∗    0.411∗∗∗
                         (0.004)     (0.003)     (0.003)     (0.005)     (0.012)     (0.007)
Exper. low               0.172∗∗∗    0.238∗∗∗    0.186∗∗∗    0.278∗∗∗    0.678∗∗∗    0.350∗∗∗
                         (0.004)     (0.003)     (0.003)     (0.005)     (0.012)     (0.007)
Exper. unincorp.         0.183∗∗∗    0.245∗∗∗    0.218∗∗∗    0.187∗∗∗    0.635∗∗∗    0.309∗∗∗
                         (0.006)     (0.005)     (0.005)     (0.006)     (0.012)     (0.011)
Exper. incorp.           0.265∗∗∗    0.182∗∗∗    0.209∗∗∗    0.184∗∗∗    0.661∗∗∗    0.347∗∗∗
                         (0.019)     (0.009)     (0.007)     (0.009)     (0.013)     (0.014)
Age                     -0.110∗∗∗   -0.167∗∗∗   -0.107∗∗∗   -0.221∗∗∗   -0.605∗∗∗   -0.279∗∗∗
                         (0.003)     (0.003)     (0.003)     (0.005)     (0.012)     (0.007)

                         Type Invariant
Work exper. (sq)        -0.012∗∗∗
                         (0.000)
Exper. unincorp. (sq)   -0.016∗∗∗
                         (0.001)
Exper. incorp. (sq)     -0.011∗∗∗
                         (0.001)
Age (sq)                 0.008∗∗∗
                         (0.000)

                         Type 1      Type 2      Type 3      Type 4      Type 5      Type 6
Average intercept        9.80        10.03       10.23       10.54       10.75       11.15

Mean, dep. var.          10.72
Adjusted R-squared       0.57
Number of ind.           472,342
Number of obs.           1,618,393

Notes: The dependent variable is the logarithm of annual earnings. Standard errors clustered at the individual level are in parentheses. The regression includes a constant term and type-year fixed effects. I report the time averaged intercept for each type. Stars denote significance levels: ∗ 0.10, ∗∗ 0.05, ∗∗∗ 0.01.

Table 1.10 – Earnings Process: Medium-Low Productivity Firms

                         Type 1      Type 2      Type 3      Type 4      Type 5      Type 6
Exper. high              0.149∗∗∗    0.231∗∗∗    0.220∗∗∗    0.392∗∗∗    0.645∗∗∗    0.389∗∗∗
                         (0.003)     (0.003)     (0.003)     (0.005)     (0.008)     (0.009)
Exper. med-high          0.155∗∗∗    0.220∗∗∗    0.208∗∗∗    0.379∗∗∗    0.629∗∗∗    0.364∗∗∗
                         (0.003)     (0.003)     (0.003)     (0.005)     (0.008)     (0.009)
Exper. med-low           0.152∗∗∗    0.227∗∗∗    0.226∗∗∗    0.375∗∗∗    0.635∗∗∗    0.396∗∗∗
                         (0.003)     (0.002)     (0.003)     (0.005)     (0.008)     (0.009)
Exper. low               0.157∗∗∗    0.229∗∗∗    0.175∗∗∗    0.370∗∗∗    0.599∗∗∗    0.345∗∗∗
                         (0.003)     (0.003)     (0.003)     (0.005)     (0.008)     (0.009)
Exper. unincorp.         0.145∗∗∗    0.241∗∗∗    0.230∗∗∗    0.292∗∗∗    0.571∗∗∗    0.317∗∗∗
                         (0.005)     (0.004)     (0.004)     (0.006)     (0.009)     (0.014)
Exper. incorp.           0.220∗∗∗    0.222∗∗∗    0.231∗∗∗    0.310∗∗∗    0.610∗∗∗    0.417∗∗∗
                         (0.011)     (0.008)     (0.008)     (0.008)     (0.010)     (0.020)
Age                     -0.104∗∗∗   -0.152∗∗∗   -0.110∗∗∗   -0.303∗∗∗   -0.532∗∗∗   -0.265∗∗∗
                         (0.002)     (0.002)     (0.003)     (0.005)     (0.008)     (0.009)

                         Type Invariant
Work exper. (sq)        -0.011∗∗∗
                         (0.000)
Exper. unincorp. (sq)   -0.016∗∗∗
                         (0.001)
Exper. incorp. (sq)     -0.016∗∗∗
                         (0.001)
Age (sq)                 0.007∗∗∗
                         (0.000)

                         Type 1      Type 2      Type 3      Type 4      Type 5      Type 6
Average intercept        9.75        9.93        10.14       10.41       10.68       11.05

Mean, dep. var.          10.58
Adjusted R-squared       0.57
Number of ind.           501,440
Number of obs.           1,683,075

Notes: The dependent variable is the logarithm of annual earnings. Standard errors clustered at the individual level are in parentheses. The regression includes a constant term and type-year fixed effects. I report the time averaged intercept for each type. Stars denote significance levels: ∗ 0.10, ∗∗ 0.05, ∗∗∗ 0.01.

Table 1.11 – Earnings Process: Low Productivity Firms

                         Type 1      Type 2      Type 3      Type 4      Type 5      Type 6
Exper. high              0.087∗∗∗    0.196∗∗∗    0.272∗∗∗    0.222∗∗∗    0.835∗∗∗    0.391∗∗∗
                         (0.003)     (0.002)     (0.003)     (0.003)     (0.010)     (0.009)
Exper. med-high          0.101∗∗∗    0.188∗∗∗    0.271∗∗∗    0.191∗∗∗    0.821∗∗∗    0.367∗∗∗
                         (0.002)     (0.002)     (0.003)     (0.003)     (0.010)     (0.009)
Exper. med-low           0.103∗∗∗    0.186∗∗∗    0.278∗∗∗    0.188∗∗∗    0.825∗∗∗    0.397∗∗∗
                         (0.002)     (0.002)     (0.003)     (0.003)     (0.010)     (0.009)
Exper. low               0.096∗∗∗    0.213∗∗∗    0.261∗∗∗    0.201∗∗∗    0.808∗∗∗    0.354∗∗∗
                         (0.002)     (0.002)     (0.003)     (0.003)     (0.010)     (0.008)
Exper. unincorp.         0.080∗∗∗    0.180∗∗∗    0.209∗∗∗    0.139∗∗∗    0.741∗∗∗    0.275∗∗∗
                         (0.004)     (0.004)     (0.005)     (0.004)     (0.010)     (0.013)
Exper. incorp.           0.101∗∗∗    0.159∗∗∗    0.212∗∗∗    0.138∗∗∗    0.776∗∗∗    0.410∗∗∗
                         (0.008)     (0.007)     (0.009)     (0.006)     (0.012)     (0.020)
Age                     -0.058∗∗∗   -0.138∗∗∗   -0.158∗∗∗   -0.128∗∗∗   -0.710∗∗∗   -0.244∗∗∗
                         (0.002)     (0.002)     (0.003)     (0.003)     (0.010)     (0.008)

                         Type Invariant
Work exper. (sq)        -0.009∗∗∗
                         (0.000)
Exper. unincorp. (sq)   -0.008∗∗∗
                         (0.001)
Exper. incorp. (sq)     -0.008∗∗∗
                         (0.001)
Age (sq)                 0.005∗∗∗
                         (0.000)

                         Type 1      Type 2      Type 3      Type 4      Type 5      Type 6
Average intercept        9.64        9.86        9.91        10.30       10.48       10.87

Mean, dep. var.          10.34
Adjusted R-squared       0.55
Number of ind.           492,704
Number of obs.           1,636,477

Notes: The dependent variable is the logarithm of annual earnings. Standard errors clustered at the individual level are in parentheses. The regression includes a constant term and type-year fixed effects. I report the time averaged intercept for each type. Stars denote significance levels: ∗ 0.10, ∗∗ 0.05, ∗∗∗ 0.01.

Table 1.12 – Earnings Process: Incorporated

                         Type 1      Type 2      Type 3      Type 4      Type 5      Type 6
Exper. high              0.072∗∗∗    0.133∗∗∗    0.054∗∗∗    0.060∗∗∗    0.162∗∗∗    0.155∗∗∗
                         (0.016)     (0.012)     (0.013)     (0.010)     (0.011)     (0.033)
Exper. med-high          0.010       0.122∗∗∗    0.052∗∗∗    0.043∗∗∗    0.141∗∗∗    0.124∗∗∗
                         (0.019)     (0.012)     (0.013)     (0.010)     (0.011)     (0.033)
Exper. med-low          -0.009       0.147∗∗∗    0.040∗∗∗    0.044∗∗∗    0.171∗∗∗    0.158∗∗∗
                         (0.016)     (0.012)     (0.013)     (0.010)     (0.011)     (0.034)
Exper. low              -0.046∗∗∗    0.141∗∗∗   -0.020       0.071∗∗∗    0.141∗∗∗    0.123∗∗∗
                         (0.013)     (0.011)     (0.013)     (0.010)     (0.011)     (0.033)
Exper. unincorp.         0.004       0.158∗∗∗    0.066∗∗∗    0.051∗∗∗    0.139∗∗∗    0.133∗∗∗
                         (0.013)     (0.012)     (0.014)     (0.012)     (0.012)     (0.035)
Exper. incorp.           0.146∗∗∗    0.206∗∗∗    0.095∗∗∗    0.180∗∗∗    0.326∗∗∗    0.418∗∗∗
                         (0.011)     (0.011)     (0.012)     (0.010)     (0.011)     (0.033)
Age                      0.015      -0.154∗∗∗    0.006      -0.095∗∗∗   -0.177∗∗∗   -0.187∗∗∗
                         (0.010)     (0.010)     (0.012)     (0.010)     (0.011)     (0.033)

                         Type Invariant
Work exper. (sq)         0.004∗∗∗
                         (0.000)
Exper. unincorp. (sq)    0.002∗∗
                         (0.001)
Exper. incorp. (sq)     -0.008∗∗∗
                         (0.000)
Age (sq)                 0.001∗∗∗
                         (0.000)

                         Type 1      Type 2      Type 3      Type 4      Type 5      Type 6
Average intercept        10.11       10.19       10.27       10.53       11.00       11.90

Mean, dep. var.          11.12
Adjusted R-squared       0.50
Number of ind.           43,163
Number of obs.           168,021

Notes: The dependent variable is the logarithm of annual earnings. Standard errors clustered at the individual level are in parentheses. The regression includes a constant term and type-year fixed effects. I report the time averaged intercept for each type. Stars denote significance levels: ∗ 0.10, ∗∗ 0.05, ∗∗∗ 0.01.

Table 1.13 – Earnings Process: Unincorporated

                         Type 1      Type 2      Type 3      Type 4      Type 5      Type 6
Exper. high              0.130∗∗∗    0.194∗∗∗    0.210∗∗∗    0.275∗∗∗    0.443∗∗∗    0.346∗∗∗
                         (0.004)     (0.004)     (0.006)     (0.006)     (0.006)     (0.015)
Exper. med-high          0.131∗∗∗    0.185∗∗∗    0.174∗∗∗    0.253∗∗∗    0.432∗∗∗    0.281∗∗∗
                         (0.004)     (0.004)     (0.006)     (0.006)     (0.006)     (0.016)
Exper. med-low           0.129∗∗∗    0.190∗∗∗    0.155∗∗∗    0.255∗∗∗    0.452∗∗∗    0.327∗∗∗
                         (0.004)     (0.004)     (0.005)     (0.006)     (0.006)     (0.016)
Exper. low               0.097∗∗∗    0.183∗∗∗    0.098∗∗∗    0.289∗∗∗    0.382∗∗∗    0.263∗∗∗
                         (0.003)     (0.004)     (0.005)     (0.006)     (0.006)     (0.016)
Exper. unincorp.         0.099∗∗∗    0.208∗∗∗    0.178∗∗∗    0.272∗∗∗    0.453∗∗∗    0.323∗∗∗
                         (0.003)     (0.003)     (0.005)     (0.005)     (0.006)     (0.015)
Exper. incorp.           0.098∗∗∗    0.133∗∗∗    0.082∗∗∗    0.175∗∗∗    0.390∗∗∗    0.248∗∗∗
                         (0.012)     (0.011)     (0.011)     (0.013)     (0.013)     (0.032)
Age                     -0.053∗∗∗   -0.127∗∗∗   -0.028∗∗∗   -0.193∗∗∗   -0.344∗∗∗   -0.176∗∗∗
                         (0.002)     (0.003)     (0.004)     (0.005)     (0.006)     (0.014)

                         Type Invariant
Work exper. (sq)        -0.010∗∗∗
                         (0.000)
Exper. unincorp. (sq)   -0.009∗∗∗
                         (0.000)
Exper. incorp. (sq)     -0.003
                         (0.002)
Age (sq)                 0.003∗∗∗
                         (0.000)

                         Type 1      Type 2      Type 3      Type 4      Type 5      Type 6
Average intercept        9.56        9.73        9.81        9.90        10.33       10.89

Mean, dep. var.          10.15
Adjusted R-squared       0.47
Number of ind.           102,520
Number of obs.           305,870

Notes: The dependent variable is the logarithm of annual earnings. Standard errors clustered at the individual level are in parentheses. The regression includes a constant term and type-year fixed effects. I report the time averaged intercept for each type. Stars denote significance levels: ∗ 0.10, ∗∗ 0.05, ∗∗∗ 0.01.

Table 1.14 – Non-Pecuniary Benefits and Scale Parameters

                      Type 1    Type 2    Type 3    Type 4    Type 5    Type 6
High                   -9.79    -10.50    -10.49    -10.34    -12.23    -12.63
Med-high               -9.59    -10.58    -10.35    -11.49    -12.48    -12.70
Med-low                -9.51    -10.51    -10.32    -12.16    -12.27    -12.78
Low                    -9.56    -10.38    -10.16    -10.87    -13.54    -12.37
Incorp.               -10.32    -10.64     -9.98    -11.33    -12.60    -14.07
Unincorp.              -9.65    -10.37    -10.26    -10.75    -13.15    -12.22
Scale parameter        26.92     17.09     19.72     27.39     37.58     19.21

Notes: This table reports the estimated non-pecuniary benefits associated with each career for each unobservable type. All estimates have been divided by the corresponding type-specific scale parameter reported in the last row of the table.

Table 1.15 – Mobility Costs

                     High       Med-high   Med-low    Low        Unincorp.  Incorp.
Intercept           -0.36      -0.55      -0.51      -0.28      -0.81      -0.89
Exper. high          0.14      -0.05      -0.08       0.10       0.17      -0.28
Exper. med-high     -0.23       0.30      -0.19      -0.06       0.06       0.25
Exper. med-low      -0.18      -0.17       0.19      -0.06       0.06      -0.26
Exper. low          -0.09      -0.16      -0.13       0.06       0.18       0.12
Exper. unincorp.    -0.13      -0.13       0.06      -0.01       0.06       0.00
Exper. incorp.       0.25      -0.07       0.11       0.22       0.22      -0.02
Exper. non-emp.      0.24       0.15       0.12       0.07       0.13       0.12

(a) Type 1

                     High       Med-high   Med-low    Low        Unincorp.  Incorp.
Intercept           -0.62      -1.01      -0.94      -1.40      -0.87      -0.72
Exper. high          0.02       0.04       0.05       0.08       0.03      -0.55
Exper. med-high     -0.06       0.15       0.10       0.21       0.11       0.17
Exper. med-low      -0.07       0.13       0.12       0.25       0.10       0.08
Exper. low           0.02       0.21       0.21       0.16       0.19      -0.30
Exper. unincorp.    -0.32      -0.18       0.09      -0.37       0.30      -0.08
Exper. incorp.      -0.34      -0.09      -0.18      -0.43       0.03       0.47
Exper. non-emp.      0.38       0.34       0.26       0.24       0.11       0.53

(b) Type 2

                     High       Med-high   Med-low    Low        Unincorp.  Incorp.
Intercept           -0.54      -0.71      -0.70       0.11      -0.68      -1.37
Exper. high          0.24      -0.13      -0.25      -0.31       0.19       0.18
Exper. med-high      0.01       0.41      -0.26      -0.40       0.15      -0.09
Exper. med-low      -0.06      -0.28       0.42      -0.55       0.32       0.00
Exper. low          -0.35      -0.56      -0.75       0.68      -0.17       0.15
Exper. unincorp.     0.10      -0.04      -0.11      -0.34      -0.03      -0.02
Exper. incorp.       0.20      -0.07      -0.13      -0.19      -0.13       0.24
Exper. non-emp.      0.22       0.22       0.13      -0.04      -0.01       0.57

(c) Type 3

Notes: This table reports the estimated parameters of the mobility costs function for each unobservable type. All estimates have been divided by the corresponding type-specific scale parameter reported in the last row of Table 1.14.

Table 1.16 – Mobility Costs (continued)

                     High       Med-high   Med-low    Low        Unincorp.  Incorp.
Intercept           -2.45       0.25      -0.26      -0.89      -0.05      -0.09
Exper. high          0.39       0.11       0.17       0.18       0.17       0.38
Exper. med-high      0.44      -0.13       0.01      -0.11      -0.04       0.19
Exper. med-low       0.31      -0.30       0.00       0.01      -0.13       0.08
Exper. low           0.50      -0.13      -0.07       0.06       0.09       0.14
Exper. unincorp.    -0.10      -0.35      -0.33      -0.15       0.09      -0.08
Exper. incorp.       0.14       0.05       0.00       0.30      -0.13      -0.07
Exper. non-emp.     -0.08       0.61       0.61       0.07      -0.02       0.12

(d) Type 4

                     High       Med-high   Med-low    Low        Unincorp.  Incorp.
Intercept           -0.32      -0.35      -1.62      -0.29      -1.15      -0.25
Exper. high          0.18      -0.08      -0.20      -0.48       0.06      -0.06
Exper. med-high     -0.04       0.10      -0.10      -0.08       0.00       0.00
Exper. med-low       0.04       0.04       0.24      -0.01       0.05       0.02
Exper. low           0.00      -0.07       0.37      -0.09       0.16      -0.02
Exper. unincorp.    -0.11       0.16       0.29      -0.06       0.24      -0.02
Exper. incorp.      -0.15      -0.07       0.41       0.02       0.10       0.00
Exper. non-emp.      0.53       0.54       1.18       1.09       0.41       0.14

(e) Type 5

                     High       Med-high   Med-low    Low        Unincorp.  Incorp.
Intercept            0.06      -0.64      -0.34      -0.73      -0.09      -1.08
Exper. high         -0.05      -0.19      -0.03       0.00       0.06       0.17
Exper. med-high     -0.12       0.16      -0.18      -0.39       0.21       0.25
Exper. med-low      -0.14      -0.15      -0.07      -0.11       0.13       0.33
Exper. low           0.01      -0.31      -0.34       0.08       0.07       0.18
Exper. unincorp.    -0.02      -0.02       0.03      -0.09      -0.33       0.32
Exper. incorp.      -0.02      -0.05      -0.19      -0.06       0.25      -0.35
Exper. non-emp.      0.10       0.35       0.18       0.08       0.04       0.77

(f) Type 6

Notes: This table shows the estimated parameters of the mobility costs function for each unobservable type. All estimates have been divided by the corresponding type-specific scale parameter reported in the last row of Table 1.14.

1.11 Appendix D: Model Fit

To assess model fit, I compare the career choices of individuals observed in the data to those predicted by the model. Equation (1.19) implies that predicted career choices at time t can be computed for each individual using a combination of the structural parameters of the model and estimates of future empirical CCPs. For each observation in the sample, I calculate the difference in ex-ante value functions between career j and non-employment for all j ∈ J. I then draw a vector of career-specific idiosyncratic shocks and calculate the individual's optimal career choice.

Table 1.17 shows the career choices of individuals over the life cycle. Panel (a) reports the observed fraction of individuals in each career at each age between 25 and 36. Panel (b) reports the fraction predicted by the model, using all observations in the sample, at each age between 25 and 35. Overall, I find that the model does a good job of explaining the career choices of individuals over the life cycle. The model predicts career choices most accurately near the beginning and near the end of the age range for which I observe individuals in the data. Non-trivial discrepancies are found in the middle of the age range. For example, I predict more than double the share of incorporated business owners at age 27, but this discrepancy vanishes by age 33.

Turning to career transitions, I find that the model has a hard time replicating the high degree of persistence in career choices observed in the data. Table 1.18 describes the career choices of individuals as a function of their career at time t − 1. In Panel (a), I report the fraction of observed transitions from origin (row) to destination (column). In Panel (b), I report the fraction of transitions from origin (row) to destination (column) that is predicted by the model. In the data, I find that 89% of incorporated business owners remain in that career from one year to the next. The model predicts that only 63% of them choose to remain in their career. This lower degree of persistence in career choices is observed in all careers (this can be seen by comparing the numbers on the diagonals).

That said, I find that the model predicts the patterns of transitions observed in the data reasonably well. For example, the model predicts that incorporated business owners are relatively more likely to switch into either high or low productivity firms. This is consistent with what we see in the data. The model also predicts that workers in high productivity firms transition into incorporated entrepreneurship at a higher rate than workers in other classes of firms. Again, this is consistent with what we see in the data.

Finally, I explore the ability of the model to predict the career choices of individuals as a function of their dominant unobservable type in Table 1.19. I find that the model does a good job of predicting the career choices of individuals along that dimension as well. The main discrepancy is that the model tends to predict too many incorporated entrepreneurs for each unobservable type. This is especially true for dominant Type 1 and dominant Type 6 individuals.
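The simulation step used to assess model fit (compute value differences relative to non-employment, draw career-specific shocks, pick the best career) can be illustrated with a minimal sketch. It assumes that the idiosyncratic shocks are i.i.d. Type 1 extreme value and that the value differences are already divided by the scale parameter; both are assumptions made here for illustration. The three-alternative value vector is made up, and the function names are hypothetical rather than taken from the estimation code.

```python
import math
import random


def gumbel(rng):
    """One standard Gumbel (Type 1 extreme value) draw by inverse transform."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_choice(value_diff, rng):
    """Return the index of the alternative with the highest value plus shock.

    value_diff[j] is the (hypothetical) difference in ex-ante value between
    career j and non-employment; non-employment itself has value 0.
    """
    totals = [v + gumbel(rng) for v in value_diff]
    return max(range(len(totals)), key=totals.__getitem__)


def logit_shares(value_diff):
    """Choice probabilities implied by Gumbel shocks (multinomial logit)."""
    top = max(value_diff)
    expv = [math.exp(v - top) for v in value_diff]
    total = sum(expv)
    return [e / total for e in expv]


rng = random.Random(0)
# Made-up value differences: non-employment, one worker career, one entrepreneur career.
value_diff = [0.0, 1.0, -0.5]
n_draws = 100_000

counts = [0] * len(value_diff)
for _ in range(n_draws):
    counts[simulate_choice(value_diff, rng)] += 1

simulated = [c / n_draws for c in counts]   # shares from simulated choices
analytic = logit_shares(value_diff)         # closed-form benchmark
```

Under the Gumbel assumption, the simulated shares should approach the closed-form logit shares as the number of draws grows, which gives a simple check that the simulation step is coded correctly.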

Table 1.17 – Model Fit: Career Choices Over the Life Cycle

Age             25    26    27    28    29    30    31    32    33    34    35    36
High          0.21  0.21  0.21  0.22  0.22  0.22  0.22  0.22  0.22  0.22  0.23  0.23
Med-high      0.23  0.22  0.22  0.22  0.22  0.21  0.21  0.21  0.21  0.20  0.20  0.20
Med-low       0.25  0.23  0.23  0.22  0.22  0.22  0.22  0.21  0.21  0.21  0.21  0.21
Low           0.26  0.23  0.22  0.21  0.21  0.21  0.21  0.20  0.20  0.20  0.20  0.20
Incorp.       0.01  0.01  0.02  0.02  0.02  0.03  0.03  0.04  0.04  0.04  0.05  0.05
Unincorp.     0.04  0.04  0.04  0.04  0.04  0.04  0.05  0.05  0.05  0.05  0.05  0.05
Non-emp.         -  0.06  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07  0.07

(a) Data

Age             25    26    27    28    29    30    31    32    33    34    35    36
High          0.20  0.19  0.19  0.19  0.17  0.18  0.20  0.21  0.21  0.21  0.28     -
Med-high      0.23  0.23  0.21  0.19  0.18  0.19  0.19  0.21  0.23  0.24  0.20     -
Med-low       0.22  0.24  0.22  0.21  0.21  0.21  0.21  0.21  0.23  0.22  0.17     -
Low           0.28  0.25  0.25  0.23  0.23  0.22  0.22  0.23  0.20  0.20  0.19     -
Incorp.       0.02  0.02  0.04  0.05  0.07  0.07  0.06  0.05  0.04  0.04  0.06     -
Unincorp.     0.04  0.05  0.06  0.07  0.08  0.07  0.06  0.04  0.04  0.04  0.05     -
Non-emp.      0.02  0.03  0.05  0.06  0.06  0.06  0.06  0.06  0.07  0.06  0.05     -

(b) Model

Notes: This table describes the observed [Panel (a)] and predicted [Panel (b)] career choices of individuals over the life cycle. I report the fraction of individuals in each career by age.

Table 1.18 – Model Fit: Career Transitions

                      Career at time t
                                        Workers                        Entrepreneurs
Career at time t − 1         High   Med-high    Med-low        Low    Incorp.  Unincorp.   Non-emp.
High                        0.853      0.045      0.035      0.027      0.004      0.006      0.029
Med-high                    0.052      0.811      0.052      0.039      0.003      0.007      0.036
Med-low                     0.040      0.049      0.802      0.054      0.003      0.008      0.043
Low                         0.033      0.040      0.055      0.799      0.004      0.010      0.058
Incorp.                     0.018      0.013      0.014      0.021      0.889      0.014      0.031
Unincorp.                   0.025      0.023      0.028      0.038      0.024      0.764      0.097
Non-emp.                    0.039      0.050      0.071      0.118      0.007      0.055      0.659

(a) Data

                      Career at time t
                                        Workers                        Entrepreneurs
Career at time t − 1         High   Med-high    Med-low        Low    Incorp.  Unincorp.   Non-emp.
High                        0.677      0.062      0.053      0.097      0.045      0.042      0.024
Med-high                    0.061      0.688      0.071      0.082      0.020      0.047      0.031
Med-low                     0.057      0.078      0.701      0.069      0.022      0.036      0.037
Low                         0.046      0.056      0.066      0.737      0.030      0.015      0.049
Incorp.                     0.141      0.047      0.053      0.071      0.630      0.036      0.023
Unincorp.                   0.090      0.055      0.046      0.163      0.051      0.519      0.076
Non-emp.                    0.065      0.078      0.102      0.142      0.025      0.048      0.540

(b) Model

Notes: This table describes the observed [Panel (a)] and predicted [Panel (b)] career choices of individuals as a function of their career at time t − 1. I report the fraction of transitions from origin (row) to destination (column).
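The persistence gap discussed in Appendix D can be read directly off the diagonals of the two panels of Table 1.18. As a small illustrative calculation (not part of the estimation), the sketch below copies the diagonal entries from Panels (a) and (b) and computes the difference between observed and predicted stay rates for each career. The variable names are hypothetical.

```python
careers = ["High", "Med-high", "Med-low", "Low", "Incorp.", "Unincorp.", "Non-emp."]

# Diagonal entries of Table 1.18: share of individuals who stay in the same career.
data_diag = [0.853, 0.811, 0.802, 0.799, 0.889, 0.764, 0.659]   # Panel (a), data
model_diag = [0.677, 0.688, 0.701, 0.737, 0.630, 0.519, 0.540]  # Panel (b), model

# Positive gap: the model under-predicts persistence in that career.
gaps = {
    career: round(observed - predicted, 3)
    for career, observed, predicted in zip(careers, data_diag, model_diag)
}
largest_gap_career = max(gaps, key=gaps.get)

for career in careers:
    print(f"{career:<10} gap = {gaps[career]:.3f}")
```

Every gap is positive, so the model under-predicts persistence in all careers, and the largest gap is for incorporated business owners (0.889 observed against 0.630 predicted).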

Table 1.19 – Model Fit: Career Choices by Type

                        Career
                                          Workers                        Entrepreneurs
Type       Share of pop.       High   Med-high    Med-low        Low    Incorp.  Unincorp.   Non-emp.
1                   0.08       0.08       0.09       0.13       0.23       0.01       0.09       0.36
2                   0.20       0.13       0.18       0.23       0.32       0.01       0.07       0.07
3                   0.12       0.31       0.24       0.22       0.15       0.01       0.03       0.04
4                   0.21       0.13       0.22       0.28       0.32       0.02       0.02       0.02
5                   0.34       0.30       0.26       0.23       0.13       0.04       0.03       0.01
6                   0.06       0.35       0.25       0.15       0.18       0.04       0.03       0.01
Average                        0.22       0.22       0.23       0.22       0.02       0.04       0.06

(a) Data

                        Career
                                          Workers                        Entrepreneurs
Type       Share of pop.       High   Med-high    Med-low        Low    Incorp.  Unincorp.   Non-emp.
1                   0.08       0.09       0.11       0.15       0.23       0.04       0.08       0.30
2                   0.20       0.13       0.18       0.22       0.34       0.03       0.04       0.06
3                   0.12       0.24       0.21       0.22       0.20       0.02       0.08       0.04
4                   0.21       0.16       0.23       0.24       0.31       0.03       0.03       0.01
5                   0.34       0.24       0.23       0.23       0.18       0.05       0.08       0.01
6                   0.06       0.29       0.25       0.15       0.16       0.12       0.03       0.01
Average                        0.19       0.21       0.22       0.24       0.06       0.04       0.05

(b) Model

Notes: This table describes the observed [Panel (a)] and predicted [Panel (b)] career choices of individuals as a function of their dominant type. The dominant type of an individual is the type to which he has the highest posterior probability of belonging. I report the fraction of individual-year observations in each career by dominant type.

Chapter 2

The Local Impact of Containerization

with Leah Brooks and Gisela Rua

2.1 Introduction

Underlying the second wave of globalization following World War II is a vast improvement in the ability to transport goods. New York City’s Herald Square Macy’s now finds it cheaper to source a dress from Malaysia than from the city’s own rapidly disappearing garment district (Levinson, 2008, p. 3). This decline in the importance of physical distance owes much to the development and rise of containerization. Containerization, which took off in the early 1960s, is premised on a simple insight: packaging goods for waterborne trade into a standardized container makes them cheaper to move. Containerization simplifies and speeds packing, transit, pricing, and the transfer from ship to train to truck. It also limits previously routine and lucrative pilferage.

These cost declines have yielded sea changes in trade. From the advent of containerization in 1956 to 1981, containerization caused international trade to grow by more than 1,000 percent (Bernhofen et al., 2016). Containerized cargo now accounts for over half of global non-commodity trade (United Nations Conference on Trade and Development, 2013).

In this paper, we use novel data and a new identification strategy to understand how the drastic decline in trade cost brought by containerization impacts local economic activity. We address the non-random adoption of the new shipping technology by ports with a novel cost-shifter instrument: port depth pre-containerization. This variable isolates exogenous, cost-driven port containerization from adoption due to local demand. Because container ships sit much deeper in the water than their predecessors, they require deeper ports in which to dock. Dredging a harbor to increase depth is possible, but it is extremely costly. We find that the cost advantage conferred by a deep harbor in the pre-containerization era makes a port more likely to containerize. To ensure that the instrument works through the cost of supplying a container port and not through a port’s initial competitive advantage, we limit the instrument to only ports that are “very deep” pre-containerization. Intuitively, these ports had a depth beyond any pre-containerization advantage.

To undertake this analysis, we combine a large variety of data sources for the period 1910 to 2010. We use US counties as our unit of analysis, with county-level demographics and income from the Decennial Census (1910 to 2010) and employment and payroll data from the County Business Patterns (1956 and 1971 to 2011). We supplement these data with information on ports from 1953 and 2014, containerization adoption, and port-level foreign trade in the pre-containerization era. To measure contemporaneous alternative transportation, we use newly digitized highway and rail routes circa 1950.

We assess the economic impact of containerization’s most important feature, the reduction in trade costs, through the lens of a New Economic Geography (NEG) model (Helpman, 1995; Redding and Sturm, 2008; Redding and Rossi-Hansberg, 2017). In these models, agglomeration and dispersion forces account for the spatial distribution of economic activity, and population moves in response to changes in real wages. A NEG model predicts that containerization’s reduction in trade costs causes an increase in population near containerized ports, and that this effect decays as distance from the port increases. Containerization has an ambiguous impact on nominal wages, depending on the balance between the gains from access to a larger market and greater competition from lower-priced distant firms.
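The logic of a cost-shifter instrument can be illustrated on simulated data. The sketch below is not the paper's specification or data: the data-generating process, the coefficient values, and all variable names (very_deep, demand, containerized, growth) are hypothetical. It assumes unobserved local demand raises both the probability of containerizing and subsequent growth, while a "very deep" indicator affects growth only through containerization, and it compares the OLS slope with the simple IV (Wald) estimator.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


rng = random.Random(1)
n = 50_000
true_effect = 0.5  # hypothetical effect of containerization on growth

# Hypothetical data-generating process (illustration only).
very_deep = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]  # instrument
demand = [rng.gauss(0.0, 1.0) for _ in range(n)]                    # unobserved

# Ports containerize more often when they are deep or when local demand is high.
containerized = [
    1.0 if 0.8 * z + 0.8 * u + rng.gauss(0.0, 1.0) > 0.5 else 0.0
    for z, u in zip(very_deep, demand)
]

# Growth depends on containerization and on the same unobserved demand.
growth = [
    true_effect * d + 0.6 * u + rng.gauss(0.0, 1.0)
    for d, u in zip(containerized, demand)
]

# OLS slope: contaminated by unobserved demand.
ols = cov(containerized, growth) / cov(containerized, containerized)

# IV (Wald) estimator: reduced form divided by first stage.
iv = cov(very_deep, growth) / cov(very_deep, containerized)

print(f"true effect = {true_effect:.2f}, OLS = {ols:.2f}, IV = {iv:.2f}")
```

In this simulated setting, the OLS slope should land well above the true effect because demand drives both adoption and growth, while the IV estimate should be close to it, up to sampling noise.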

Our findings are consistent with these theoretical predictions. Our instrumental variable estimates indicate that counties within 100 km of a container port experience population gains of about 70 percent from 1950 to 2010. We find smaller, but still economically meaningful, gains in employment, and no substantive change in nominal wages. These gains in population and employment dissipate with distance from the port, and are indistinguishable from zero beyond roughly 300 km.

We find that measures of initial land values mediate gains to containerization. Containerization requires large expanses of land, as port activity shifts from water-based finger piers to giant cranes and vast marshalling yards. New Economic Geography models also predict that initially smaller locations experience a proportionally larger increase in access to new markets. Consistent with both the NEG intuition and the fact that container ports are cheaper and easier to develop in initially low land value areas, we find that gains from containerization, in percentage terms, are concentrated in places where we expect low pre-containerization land values: counties with initially lower population density and manufacturing employment.

Our paper adds to several literatures. First, our findings contribute to the debate on the impact of globalization on economic activity. Following Romer and Frankel (1999), a large literature has emerged to understand how improved access to international markets affects country level outcomes such as GDP (e.g. Feyrer, 2009a,b; Pascali, 2017).1 Our paper contributes to this literature by looking at how the reduction in trade costs brought by containerization affects the spatial distribution of economic activity within countries. In doing so, our results shed light on the potential uneven impacts of globalization. To the best of our knowledge, only one other paper isolates the causal effects of globalization on local economic activity: Campante and Yanagizawa-Drott (2017). These authors exploit constraints on the capacity of airplanes to fly long distances to obtain a source of exogenous variation in access to international markets at the city level. As do we, they find large positive effects of access to international markets on local economic activity.2

Second, our paper contributes to a growing academic literature investigating the consequences of improvements in transportation infrastructure on local economic activity (Baum-Snow, 2007; Michaels, 2008; Duranton and Turner, 2012; Donaldson and Hornbeck, 2016). These studies examine how investments in highways and railways have shaped the spatial distribution of economic activity within countries. Our paper is the first to study how large investments in maritime transportation infrastructure, specifically new container terminals, affect the economic conditions of target areas. Methodologically, our paper contributes a new instrumental variable strategy to contend with the non-random allocation of transportation infrastructure. Specifically, we introduce a cost-shifter instrument to obtain a source of quasi-random variation in the observed infrastructure. See Redding and Turner (2015) for a recent survey of the literature.

Finally, our work enhances the growing literature on containerization by expanding its focus beyond the shipping and trade industries. In this burgeoning literature, Rua (2014) investigates the global adoption of containerization, and Bernhofen et al. (2016) estimate its impact on world trade.3 Hummels (2007), Bridgman (2014) and Coşar and Demir (2018) all analyze containerization’s impact on shipping costs.

The remainder of this paper is organized as follows. The following section provides background on containerization, Section 2.3 outlines the theoretical motivation, and Section 2.4 discusses the data. We present empirical methods in Section 2.5, and results in Section 2.6. We conclude with Section 2.7.

1 Most papers in this literature find that improved access to international markets has large positive effects on GDP, with the exception of Pascali (2017) who documents mainly negative effects. Pascali (2017) is particularly related to our paper in that he exploits a major improvement in the shipping technology, the advent of the steamship, to examine how a decline in international transportation costs impacts economic activity.

2 Our paper also complements a growing literature in international trade that looks at the impact of trade shocks on local labour markets (e.g. Topalova, 2010; Autor et al., 2013; Kovak, 2013). These papers compare locations within a country that have similar access to international markets but that, because of initial differences in industry composition, are differentially affected by changes in a trading partner’s economic activity (e.g. ). In contrast, we control for initial differences in industry composition and compare locations that experience differential gains in access to international markets.

3 The seminal book on this topic is Levinson (2008).

2.2 Containerization

Before goods went into the box, shipping was expensive and slow. Vessels spent weeks at ports while gangs of dockworkers handled cargo piece by piece. Port costs accounted for a sizeable share of the total cost of the movement of goods. The American Association of Port Authorities estimated that in-port costs, primarily labor, accounted for half the cost of moving a truckload of medicine from Chicago to Nancy, France in 1960 (Levinson, 2008, p. 9). In response to these high costs, producers searched for alternatives. Trucker and en- trepreneur Malcolm McLean is generally credited with being the first to match vision with reality when he moved 58 truck trailers on a ship from Newark to Houston in 1956 on the maiden container voyage. Containerized trade relies on two key innovations. The first is the mechanization of con- tainer movements. Rather than workers with carts, specialized container cranes lift containers in and out of ships, around the port, and onto rail cars and trucks. This mechanization sub- stantially decreased per unit labor costs, cut time in ports and made ever-larger ships viable. Today’s Post-Panamax ship is more than 17 times larger than the first ship to carry container goods in 1956 (see ship sizes in Appendix Figure 2.5). The second key innovation of containerization is the development of common standards for container size, stacking techniques, and grip mechanisms. These standards allow a container to be used across modes of transportation—ships, trucks, rail—and across countries. The U.S. standard for containers was adopted in the early 1960s, and the international standard followed in the late 1960s. To achieve economies of scale, containerization requires physical changes to ports. In breakbulk ports, as cargo ports were known before the rise of containerization, ships pulled into finger piers and workers on- and off-loaded items by hand and cart. 
Ports were centrally located within cities and used a large amount of labor and a moderate amount of land for Chapter 2. The Local Impact of Containerization 84 warehousing and storage. In contrast, containerized ports require substantially less labor per unit of weight and a much larger amount of land. Land is used both for the large cranes that move containers and for the marshalling of containers and trucks (Rua, 2014). Despite containerization’s small-scale start, it diffused extremely rapidly across the United States. The bulk of domestic containerization adoption occurred in the 1960s, as shown in Figure 2.1a, which reports the total number of US containerized ports by year. In the early 1960s, the cost decreases from containerization were perceived as primarily a domestic benefit, or following Benjamin Chinitz, “a trend far more advanced in domestic waterhauls than in foreign trade” (Chinitz, 1960, p. 85). Containerization adoption in the United States continued at a slower pace throughout the 1970s and 1980s and plateaued thereafter. Post-containerization, the distribution of dominant ports has shifted. Of the ten largest ports before containerization (in 1955, measured in terms of international trade), two never containerized: New York (Manhattan), NY and Newport News, VA. In fact, the Port of Manhattan, the largest in the world in 1956, no longer exists as a freight port. Of today’s 25 largest ports, four did not rank in the pre-containerization top 25. Only two of the modern ten largest ports were in the pre-containerization top ten: Norfolk, VA and Los Angeles, CA.4 Adoption of containerization in the rest of the world followed a similar pattern, roughly one decade delayed. Figure 2.1b shows that the majority of containerization outside the US occurred in the 1970s (see also Rua (2014)). The pace of adoption in the US and across the world is consistent with the initial pattern of containerized trade. 
Until at least the mid 1960s, containerized trade was primarily domestic. The first international container service did not begin until 1966, nearly a decade after the first US shipment. Containerized trade is now central to the global economy. Bernhofen et al. (2016) estimate that containerization caused international trade to grow by more than 1,000 percent over the 15 years following 1966. As of 2013, containerized trade accounted for over half of global

4See Kuby and Reid (1992) on port concentration. Chapter 2. The Local Impact of Containerization 85 non-commodity trade (United Nations Conference on Trade and Development, 2013).5 The literature credits containerization with substantially decreasing the cost of water- borne trade. While Bridgman (2014) and Hummels (2007) note only a small decline in shipping rates, traditional measures of shipping costs understate the true cost advantage yielded by containerization. Containerization cuts the time ships spend at port and thus the total time in transit. Hummels and Schaur (2013) estimate that each day in transit is worth between 0.6 to 2.1 percent of the value of the good, showing that the time benefits of containerized shipping are non-negligible. In addition, losses to pilferage plummeted with containerization. Wilson (1982) estimates loses to pilferage at roughly 25 percent in the breakbulk era, and near zero in the container era.6 Finally, containers ease logistics costs by protecting goods from unintentional damage and allowing different kinds of goods, with different destinations, to be shipped together (Holmes and Singer, 2017). Using 2013 export transaction data for Turkey, Coşar and Demir (2018) find that containerization decreases variable shipping costs between 16 to 22 percent.

2.3 Theoretical Motivation

We now turn to the theoretical literature to frame our empirical work and understand containerization’s potential impact. Containerization’s most important feature is the reduction in waterborne transit costs it generates. Because almost all goods transported by water require additional land-based movement, reductions in trade costs due to containerization are largest in percentage terms at the port and decay as distance to the port increases.

We assess the impact of this reduction in trade costs through the lens of a standard New Economic Geography model (e.g. Helpman, 1995; Redding and Sturm, 2008; Redding and Rossi-Hansberg, 2017). In this class of models, agglomeration and dispersion forces explain the uneven distribution of economic activity across space, resulting in particular from people moving in response to changes in real wages. Variation in real wages typically results from changes in (1) nominal wages, (2) the cost of living, and (3) land prices.

Containerization’s reduction in trade costs has three main short run effects. First, when firms produce differentiated products and consumers love variety, locations with lower trade costs become more attractive to consumers. These locations offer a greater variety of goods at lower prices, reducing the cost of living and increasing real wages. Second, if there are increasing returns to scale in production, a reduction in trade costs also increases the profitability of firms because firms can access a larger market for their products and cheaper inputs for production. This “home market effect” yields an increase in nominal wages and, therefore, an increase in real wages. Third, due to increased trade, firms encounter more lower-priced competitors. This heightened competition, known as the “market crowding effect,” acts as a dispersion force and causes both nominal and real wages to decline. If there are gains from trade, as New Economic Geography models assume, the cost of living effect and the home market effect should dominate the market crowding effect. Thus, we expect a short run increase in real wages in locations near container ports.⁷

In the long run, however, higher real wages should attract people to locations near container ports. As population increases, land prices rise, in turn lowering real wages. Migration ceases when real wages equalize across space. Since the containerization-induced reduction in trade costs declines with distance from the port, we anticipate that the impact of containerization on population is greatest in places near container ports and declines as distance to the port increases.

The simplest New Economic Geography framework, outlined above, assumes that places are all ex-ante homogeneous. However, an extension to the basic framework can allow the same shock to impact cities unevenly, as a function of the city’s initial characteristics. In the empirical section, we consider variation in both initial population and land values. Firms in initially less populous cities rely more heavily on the demand from non-local consumers. We therefore expect containerization to have a larger impact, in percentage terms, in initially smaller cities relative to initially larger cities.

We also expect containerization to have an uneven effect based on pre-containerization land prices. Because container ports require large swaths of land for giant cranes and extensive marshalling yards, rather than the water-based finger piers of the breakbulk era, container ports may be more viable in locations with initially low land value. However, as local productivity shocks are ultimately capitalized into the value of land (Moretti, 2011), low land value cities tend also to be small cities, all else equal. In practice, the distinction between initially low population and initially low land value is not empirically visible.

In sum, New Economic Geography models predict that containerization’s reduction in trade costs causes population to increase near container ports. This effect diminishes as distance to the container port increases. Containerization’s net effect on nominal wages remains theoretically ambiguous because the productivity gains associated with access to a larger market may be offset by the intensified competition from distant firms. Finally, for a given distance to a container port we anticipate greater population growth in initially smaller cities. These smaller cities receive a proportionately larger increase in access to new markets and have relatively cheap land, which is key to container port development.

⁵ While containers are appropriate for carrying many goods, as diverse as toys and frozen meat, some goods are not yet containerizable. Both “non-dry cargo” and “dry-bulk commodities” such as oil, fertilizers, ore, and grain cannot be shipped inside “the box.”

⁶ It is therefore no surprise that Scottish whiskey bound for US markets was on the first international container trip (Levinson, 2008, p. 165).

⁷ Note that even if there are gains from trade, the net effect on nominal wages is ambiguous because the home-market effect and the market-crowding effect go in opposite directions.

2.4 Data

To study the impact of containerization on local economic activity, we construct a county-level panel dataset that includes population and employment information, as well as proximity to port and port characteristics. This section gives an overview of the data, and we present full details in the data appendix.

Our sample frame is the Decennial Census, for the years 1910 to 2010.⁸ We assemble a time-invariant panel of counties by aggregating 1950 counties to their 2010 counterparts and by dropping a very few counties with large land area changes. From 1910 to 2010 we observe population, and from 1950 to 2010 income and demographic characteristics. We also observe total employment, total payroll, and employment and payroll by industry from the County Business Patterns for 1956 and then annually from 1971 to 2011.⁹ We omit Alaska from our analysis because its administrative districts in 1950 do not correspond to modern counties. This yields 3,023 counties with complete data.¹⁰

To this sample frame, we add port attribute data. Our universe of ports is all ports that existed in either 1953 or 2015, as defined by the 1953 and 2015 World Port Index. For each port, we observe its location (latitude and longitude), size (in four discrete categories), and depth (in eight discrete categories). We gather the year of first containerization from the Containerisation International Yearbook, volumes 1968 and 1970 to 2010.¹¹ We also observe 1948 and 1955 international trade in dollars by port from the Census Bureau’s Foreign Trade Statistics.

We associate each county with a vector of ports and port characteristics, which include the distance from each county to each port, the number of nearby 1953 ports, the maximal depth of nearby ports in 1953, and the total value of international trade at nearby ports in 1948 and 1955.¹² We also include variables that characterize the state of the transportation network now and at the advent of containerization (c. 1957 for highway and c. 1960 for rail). We measure total rail kilometers, highway kilometers, and waterway kilometers in each county, per square kilometer of each county’s area.

⁸ For the 2010 sample, we use the Decennial Census for population figures and the American Community Survey (years 2008–2012) for other demographic covariates.

⁹ We are very appreciative of digitized 1956 County Business Patterns from Matt Turner and Gilles Duranton. See the data appendix for more information about these data.

¹⁰ Estimations using County Business Patterns data use a slightly smaller sample because the provider suppresses data for counties under certain conditions; see data appendix for complete details.

¹¹ For the purposes of this paper, and consistent with the industry definition, we call a port “containerized” when it has special infrastructure and equipment to handle containers. Specifically, the port has invested in equipment to handle shipping containers that enables their movement into and out of ships and onto a train or a truck.

¹² We calculate all distances as the great-circle distance from the county centroid.

In addition to these detailed US data, we construct a less detailed panel dataset of world cities. The sample frame for world cities is the United Nations’ 2014 Revision of World Urbanization Prospects. This dataset contains all 1,692 urban agglomerations with populations exceeding 300,000 at any time between 1950 and 2014. By construction, this sample overrepresents fast-growing cities that were small in 1950 but grew rapidly in the second half of the twentieth century. To mitigate this sampling issue, we restrict the sample to cities with population over 50,000 in 1950, yielding a world panel of 1,051 cities.
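As an illustration of the county-to-port distance measure described in this section (great-circle distance from the county centroid), the following is a minimal sketch using the haversine formula. The Earth radius constant, the function names, and the coordinates are assumptions made for illustration; they are not taken from the data appendix, and the actual distance computation may differ in its details.

```python
import math

# Mean Earth radius in km. This constant is an assumption for the sketch,
# not a value reported in the text.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_port(centroid, ports):
    """Return (port_name, distance_km) for the port closest to a county centroid.

    centroid: (lat, lon) in degrees; ports: dict mapping a name to (lat, lon).
    """
    name = min(ports, key=lambda p: great_circle_km(*centroid, *ports[p]))
    return name, great_circle_km(*centroid, *ports[name])


# Hypothetical coordinates, for illustration only.
ports = {"port_a": (40.7, -74.0), "port_b": (33.7, -118.3)}
county_centroid = (39.95, -75.16)
print(nearest_port(county_centroid, ports))
```

The same distance function could be applied to every county and port pair to build the vector of port characteristics (for example, the number of 1953 ports within a given radius).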

2.5 Empirical Methods

We now turn to our empirical strategy for estimating the causal effects of containerization on local economic activity. We first present a difference-in-differences framework to analyze the impact of proximity to a containerized port on economic activity and illustrate its strengths. We then discuss remaining concerns with causality, followed by a motivation for and details about our instrumental variable strategy.

2.5.1 Difference-in-Differences

Our goal is to understand how local economic activity responds to the advent of containerization. Specifically, we test the theoretical predictions that population and employment increase in locations close to container ports and that these gains attenuate with distance from the port. We also test whether percentage gains are larger in locations with initially low land values, all else equal. Empirically, we ask whether county proximity to a containerized port is associated with changes in key economic outcomes, conditional on a host of covariates. We estimate

$$\Delta \ln(y_{i,t}) = \beta_0 + \beta_1 \Delta C_{i,t} + \beta_2 X_i + \Delta\varepsilon_{i,t}, \tag{2.1}$$

where $i \in I$ indexes counties and $t \in T$ indexes years. Our primary dependent variable, $y_{i,t}$, is population. We also investigate the impact that containerization has on nominal wages, industrial composition, and income. The operator $\Delta$ denotes long run differences, so that $\Delta \ln(y_{i,t}) = \ln(y_{i,t}) - \ln(y_{i,1950})$.¹³ Capital letters denote vectors.

Our key explanatory variable is an indicator for proximity to a containerized port at time $t$, $\Delta C_{i,t}$, which is equivalent to $C_{i,t}$, as no containerized ports existed in 1950 ($C_{i,1950} = 0 \;\forall i \in I$). We allow for potential non-linear impacts of proximity to a containerized port by using indicator variables for port proximity by distance bin. Figure 2.3a shows this parameterization. Counties in the darkest blue are located within 100 km of a containerized port, counties in mid-blue are between 100 and 200 km from a containerized port, counties in light blue are between 200 and 300 km from a containerized port, and counties in light pink are more than 300 km away from a containerized port. Mathematically, we parameterize proximity to a containerized port as

$$\beta_1 \Delta C_{i,t} \equiv \sum_{d \in D} \beta_{1,d}\, \mathbb{1}\{\text{Closest containerized port is between } d_1 \text{ and } d_2 \text{ km}\}_{i,t}, \tag{2.2}$$

where $d \in D$ indexes the set of distance bins $\{0-100, 100-200, 200-300\}$ kilometers, $d_1$ and $d_2$ are the lower and upper bounds of bin $d$, and $\mathbb{1}\{\cdot\}$ is the indicator function. We interpret $\beta_{1,\{0-100\}}$ as the percentage change in the dependent variable for counties within 100 km of a containerized port relative to counties more than 300 km away from a containerized port, conditional on covariates. Coefficients $\beta_{1,\{100-200\}}$ and $\beta_{1,\{200-300\}}$ refer to the remaining distance bins.

Theory suggests that population increases in counties proximate to container ports ($\beta_1 > 0$). In addition, standard New Economic Geography models predict that containerization’s impact attenuates with distance from the port, so that $\beta_{1,\{0-100\}} > \beta_{1,\{100-200\}} > \beta_{1,\{200-300\}}$.¹⁴ However, theory does not clearly predict where the impact of containerization stops, so this bound of 300 km comes from the data (see a more detailed discussion on this in Section 2.6.2, footnote 24).

¹³ When we use County Business Patterns data, the initial year is 1956.

¹⁴ This framework does not allow us to distinguish between growth and reallocation. See footnote 21 for a discussion of the magnitude of reallocation required for growth to be negligible.

To establish the causal effects of containerization on local economic activity, we must contend with the non-random assignment of containerized ports to counties. The difference-in-differences specification in Equation (2.1) goes some way to this end by netting out any time-invariant county-specific characteristics correlated with the location of containerized ports. Such characteristics include geography, proximity to population centers, climate, and historical antecedents for the location of particular industries. This method also nets out any national changes that impact all counties equally from 1950 to 2010.

In the event that county proximity to a containerized port is also a function of time-varying county attributes, we include a vector of baseline covariates, $X_i$. Including initial covariates in the difference-in-differences model is akin to allowing for differential trends in the dependent variable by the initial covariates. We list these in greater detail in Section 2.6, but $X_i$ includes regional fixed effects, distance to the ocean, measures of geographic proximity to ports in 1953, the extent of the initial transportation network, initial demographic characteristics, initial industry mix, and pre-1950 county population. We cluster standard errors throughout at the 2010 commuting zone to account for spatial dependence in the error. A commuting zone is a grouping of counties that approximates a local labor market. The average commuting zone includes 4.4 counties.¹⁵

This empirical strategy yields a causal estimate of the effect of proximity to a containerized port on local economic activity when proximity to a containerized port is uncorrelated with the error term.
This is equivalent to saying that $\beta_1$ can be interpreted as a causal estimate when proximity to a containerized port is randomly assigned, conditional on time-invariant county-level factors and the included initial covariates. Because we include a host of initial period covariates, these estimates cannot be driven by, for example, regional trends in population growth, or differential population growth related to proximity to the coast.

¹⁵ We have also estimated standard errors with the spatial HAC method, using radii of 100, 200 and 300 km. Because these standard errors are in general smaller than those using commuting zones, and because these spatial standard errors are not (to the best of our knowledge) yet available for the instrumental variable case, we use commuting zone clustering throughout. Even in principle, commuting zone clustering may be preferred, as commuting zone counties are linked by economic activity and therefore likely to be spatially correlated. In contrast, counties within a fixed radius may be less likely to be related in an economically meaningful way.

To test the predictions that gains vary by initial conditions, we introduce an interaction term that allows $\beta_1$ to vary below the median of a given covariate. Call this covariate $h_i$ and let $H_i = 1$ when $h_i < \mathrm{median}(h_i)$ and 0 otherwise.¹⁶ We therefore modify Equation (2.1):

$$\Delta \ln(y_{i,t}) = \gamma_0 + \gamma_1 \Delta C_{i,t} + \gamma_2 \Delta C_{i,t} \ast H_i + \gamma_3 X_i + \gamma_4 H_i + \Delta\varepsilon_{i,t}. \tag{2.3}$$

Now $\gamma_1$ reports the average impact of proximity to a container port on population growth, and $\gamma_2$ reports whether there is any incremental population gain or loss in counties when $h_i$ is below the median. We expect containerization-induced population growth to be larger, in percentage terms, in locations with low initial population and low initial land values. We therefore anticipate $\gamma_2 > 0$ when $h_i$ is a measure of initial land values or population.

While both Equations (2.1) and (2.3) net out county-specific time-invariant factors as well as trends by initial conditions – including distance to the ocean and initial share of employment in manufacturing – it may still be the case that an element in the error $\Delta\varepsilon_{i,t}$ remains correlated with both containerization and the outcome variable of interest. For example, if counties near container ports were more likely to specialize in an agricultural commodity that became tradeable since the 1950s, we could conflate local economic growth due to the increase in the trade of the agricultural commodity with local economic growth related to containerization.
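The distance-bin parameterization in Equation (2.2) and the below-median indicator in Equation (2.3) can be sketched as follows. This is a simplified illustration, not the estimation code behind the reported results: it omits the covariates $X_i$ and the clustered standard errors, and it relies on the fact that, without covariates, OLS coefficients on mutually exclusive bin indicators equal differences in group means relative to the omitted group. Treating bins as half-open intervals is an assumption, and the county records are hypothetical.

```python
import math
from statistics import mean, median

# Distance bins in km, as in Equation (2.2). Half-open intervals [lo, hi)
# are an assumption; the text does not state how boundaries are handled.
BINS = [(0, 100), (100, 200), (200, 300)]


def distance_bin(dist_km):
    """Return the (lo, hi) bin containing dist_km, or None at 300 km or beyond."""
    for lo, hi in BINS:
        if lo <= dist_km < hi:
            return (lo, hi)
    return None  # reference group: no containerized port within 300 km


def long_difference(pop_t, pop_1950):
    """Long difference of log population: ln(y_t) - ln(y_1950)."""
    return math.log(pop_t) - math.log(pop_1950)


def bin_coefficients(counties):
    """Mean log growth in each bin minus mean log growth in the reference group.

    Requires at least one county in the reference group (beyond 300 km).
    """
    growth = {}
    for c in counties:
        key = distance_bin(c["dist_km"])
        growth.setdefault(key, []).append(long_difference(c["pop_t"], c["pop_1950"]))
    base = mean(growth[None])
    return {b: mean(growth[b]) - base for b in BINS if b in growth}


def below_median_flags(values):
    """H_i = 1 when h_i is strictly below the sample median, else 0."""
    m = median(values)
    return [1 if v < m else 0 for v in values]


# Hypothetical counties, for illustration only.
counties = [
    {"dist_km": 40, "pop_1950": 100_000, "pop_t": 250_000},
    {"dist_km": 150, "pop_1950": 50_000, "pop_t": 90_000},
    {"dist_km": 260, "pop_1950": 20_000, "pop_t": 30_000},
    {"dist_km": 500, "pop_1950": 10_000, "pop_t": 12_000},
]
print(bin_coefficients(counties))
print(below_median_flags([c["pop_1950"] for c in counties]))
```

In a full specification, the bin indicators, their interactions with $H_i$, and the covariates would enter a single regression; the group-mean shortcut above holds only in the no-covariate case.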

2.5.2 Instrumental Variables

To address this type of concern – and any other remaining non-randomness in the assignment of containerized ports to counties – we use proximity to a very deep port in 1953, $Z_i$, as an instrument for proximity to a containerized port, $\Delta C_{i,t}$. Specifically, we instrument proximity to a containerized port with proximity to initially very deep ports as

$$\Delta C_{i,t} = \alpha_0 + \alpha_1 Z_i + \alpha_2 X_i + \Delta\eta_{i,t}, \tag{2.4}$$

where $\alpha_1 Z_i$ is

$$\alpha_1 Z_i \equiv \sum_{d \in D} \alpha_{1,d}\, \mathbb{1}\{\text{Closest very deep port in 1953 is between } d_1 \text{ and } d_2 \text{ km}\}_{i}. \tag{2.5}$$

Thus, we have three potentially endogenous variables and three instruments. For the interaction specification in Equation (2.3), we use both proximity to a very deep port, $Z_i$, and that proximity interacted with being below the median of a given covariate, $Z_i \ast H_i$, as instruments, for six instruments overall.

¹⁶ $H_i$ relative to the overall distribution and $H_i$ relative to the treated distribution are both of interest. We consider both empirically; in practice the difference in estimates is quite small.

There are two requirements for the instrument to yield a causal estimate of proximity to a containerized port on local economic activity. The first is a strong relationship between proximity to a containerized port and proximity to a very deep port in 1953. The second requirement is that, conditional on covariates, proximity to a very deep port in 1953 is uncorrelated with unobserved determinants of changes in local economic activity from 1950 to period $t$. In other words, proximity to a very deep port in 1953 impacts changes in local economic activity only through its impact on proximity to a containerized port, conditional on covariates ($\mathrm{Cov}(Z_i, \Delta\varepsilon_{i,t}) = 0$). We discuss each of these requirements in turn.

First, we anticipate that proximity to a containerized port should be strongly related to proximity to a very deep port in 1953 because container ships require deeper ports than their predecessors. As Appendix Figure 2.5 illustrates, container ships are much larger than their predecessors and larger ships sit deeper in the water and thus require greater depth to navigate and dock.

It is possible, but quite expensive, to drill, blast or dredge an initially shallow port sufficiently deep to accept container ships. Given enough money and sufficiently lax environmental regulation, a harbor can arguably be made arbitrarily deep. However, port depth is only malleable at great cost. Therefore, initially deep ports have a competitive advantage when technology changes to favor very deep ports.
This inability of ports to adjust equally is confirmed by Broeze, who notes that while “ship designers [keep] turning out larger and larger vessels,” and “the engineering limits of port construction and channel deepening have by no means been reached[, t]his, however, may not be said of the capacity of all port authorities to carry the cost of such ventures” (Broeze, 2002, pp. 175–177). Thus, initial port depth is a key component of the cost of converting a breakbulk port into a containerized port. Our instrument is therefore analogous to a cost shifter instrument often used in the industrial organization literature. Port depth should affect the supply of ports after the advent of containerization, but have no effect on the demand for ports.

The intuition that port depth is a key driver of containerization is borne out in practice by containerization’s pattern of adoption. Figure 2.2a shows the likelihood that a county becomes proximate to (within 300 km of) a containerized port over time by the maximal depth of ports within 300 km of the county in 1953.¹⁷ Thick lines indicate depths we consider “very deep.” It is immediately clear that proximity to deep ports in 1953 is a strong predictor of proximity to a containerized port at time $t$. Counties within 300 km of a port with depth greater than 40 feet are always within 300 km of a containerized port by the end of the sample period, as are almost all counties within 300 km of a port 35 to 40 feet deep. Roughly 20 percent of counties within 300 km of a port with depth between 25 and 35 feet are not near a containerized port by the end of the sample period. For counties within 300 km of less deep ports, however, containerization is decidedly not a certainty. Indeed, counties near initially shallow ports (those less than 20 feet deep) are never within 300 km of a containerized port.
An alternative way to view the strength of our instrument is to compare Figures 2.3a and 2.3b. The top panel is the map of US counties, where treated counties are blue and deeper blue indicates greater proximity to a containerized port. The bottom panel repeats this map, but re-colors treated counties in green when the instrument predicts treatment. “Predicting treatment” means that a county is both between $d_1$ and $d_2$ km from the nearest containerized port in 2010 and between $d_1$ and $d_2$ km from the nearest very deep port in 1953. This picture demonstrates that while the instrument frequently fails to predict treatment in the Midwest, it predicts treatment quite accurately on the ocean coasts.¹⁸

¹⁷ We use depth of the wharf in 1953 as our measure of pre-containerization port depth. Results are robust to using anchorage and channel depth, which the World Port Index also reports.

Given this evidence of a strong relationship between the endogenous variables and the instruments, we now turn to the second condition for instrument validity: that proximity to a very deep port in 1953 affects local economic activity only through its impact on proximity to a containerized port. A key concern with the instrument is that proximity to a deep port may explain changes in county economic activity even before containerization. This is surely true, as ports have long been engines of growth. For this reason, rather than rely on the full distribution of port depth, we use an indicator variable for a county being proximate to a very deep port pre-containerization. Specifically, we call a port “very deep” when it is 30 feet or more deep in 1953. We choose this depth cut-off because the historical record indicates no perceived advantage to depth greater than 30 feet in the pre-containerization era.

Before containerization, while port depth conveyed some advantage, it was not particularly useful for a port to be very deep given the draft of breakbulk ships. This is clear even from how data on port depth was collected. The 1953 World Port Index’s deepest category is “40 feet and above,” while the deepest category in the 2015 World Port Index is “76 feet and over.” Thus, intuitively, our instrument measures how much more likely a county is to become proximate to a container port if it is proximate to a very deep port in 1953, conditional on initial covariates.
Our specification includes covariates that allow for differential growth trends in the dependent variable by the number of ports in 1953 within 300 km in 100 km bins and the values of international trade at these ports in 1955, also measured in 100 km bins. Therefore, the instrument captures the impact of proximity to an initially very deep port above and beyond proximity to many ports in 1953 and to high value ports in 1955.

Our claim that depths beyond 30 feet were not particularly advantageous to port success is supported by a number of contemporary commentators. A 1938 monograph notes the critical 30-foot cut-off, arguing that “For the ports with which we are dealing, the 30-foot channel at low-water will be taken as the minimum standard in relation to the needs of modern ships” (Sargent, 1938).¹⁹ However, he notes that making a channel deeper is no small endeavor: “It is a question how far the rest of the world, Europe in particular, is prepared, except in special circumstances, to face the very heavy cost of providing for the needs of the ocean mammoth” (Sargent, 1938, p. 21). This author’s focus on the irrelevance of extreme depth is not unique. Even as late as 1952, F. W. Morgan argues in Ports and Harbours that beyond a certain level, depth is not a particularly useful feature of a port:

The importance for a few ports of maintaining a ruling depth sufficient to admit the largest liners [a draft of 40 feet] emphasizes unduly their importance to the port world. A super-liner which comes into a port every few weeks will, it is true, amplify that port’s tonnage figures by half a million tons or so annually. . . . The greater part of world trade by sea and the greater part of the traffic of many ports is concerned with ships of more modest size. It would certainly be possible to devise a classification of ports by the draught of ship which can be berthed in them. Halifax and Wellington would appear in the first class, and their ability to berth the largest ships is a great asset in wartime. It tells, however, only a little about their normal significance as ports. (Morgan, 1952, p. 15)

Thus, pre-containerization, being very deep was not a particularly valuable port attribute.

¹⁸ We address this case where the instrument fails to predict treatment in Section 2.6.2.

¹⁹ He goes on to write that in the U.S., a 35-foot draught is becoming standard (p. 21).

This instrumental variables strategy implies multiple tests for validity. First, if our claims about the role of “very deep” ports are true, we should see no impact of proximity to very deep ports on population growth in the pre-containerization era. In addition, in any sub-sample where our instrument does not predict treatment, the instrument should have no direct impact on population growth. Finally, the instruments should not be correlated with potential confounders that might be in the error term. We turn to these tests in the instrumental variables results section.
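The logic of the instrumental variables strategy in Equations (2.4) and (2.5) can be illustrated with a deliberately simplified sketch: one binary instrument, one binary treatment, and no covariates. In that special case, the two-stage least squares estimate reduces to the Wald ratio of the reduced form to the first stage. The specification in the text uses three distance-bin instruments plus covariates, so this is an illustration of the idea under stated assumptions rather than the estimator behind the reported results. The function name and the data are hypothetical.

```python
from statistics import mean


def wald_iv(outcome, treated, instrument):
    """Just-identified IV estimate with one binary instrument and no covariates.

    Returns (first_stage, reduced_form, iv_estimate), where
      first_stage  = E[C | Z=1] - E[C | Z=0]
      reduced_form = E[dy | Z=1] - E[dy | Z=0]
      iv_estimate  = reduced_form / first_stage
    """
    y1 = [y for y, z in zip(outcome, instrument) if z == 1]
    y0 = [y for y, z in zip(outcome, instrument) if z == 0]
    c1 = [c for c, z in zip(treated, instrument) if z == 1]
    c0 = [c for c, z in zip(treated, instrument) if z == 0]
    first_stage = mean(c1) - mean(c0)
    reduced_form = mean(y1) - mean(y0)
    if first_stage == 0:
        raise ValueError("instrument does not shift treatment")
    return first_stage, reduced_form, reduced_form / first_stage


# Hypothetical data, for illustration only.
# z: 1 if the county is near a very deep port in 1953
# c: 1 if the county is near a containerized port at time t
# dy: long difference in log population
z = [1, 1, 1, 1, 0, 0, 0, 0]
c = [1, 1, 1, 0, 1, 0, 0, 0]
dy = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1, 0.2, 0.1]
print(wald_iv(dy, c, z))
```

The first-stage difference plays the role of the relevance requirement, while the exclusion restriction is what licenses dividing the reduced form by it.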

2.6 Results

With this empirical framework in hand, we now turn to estimation. The first subsection reports summary statistics and the difference-in-differences results. The second subsection presents tests of instrument validity, discusses our main instrumental variable results, and assesses whether the results are robust to alternative specifications. The third subsection tests whether containerization’s impact is larger, in percentage terms, in places with low initial land values.

2.6.1 Difference-in-Differences

We begin with the difference-in-differences specification to test the theoretical prediction that containerization increases local economic activity. The summary statistics in Table 2.1 illustrate the comparison at hand and preview the main results. The three leftmost columns report county means by distance to the nearest containerized port by 100 km bins; the fourth column shows means for all observations within 300 km of a containerized port, and the final column reports means for all other counties, which we call “never containerized.” A county may appear in only one distance bin. The numbers of observations in the “ever” and “never” columns sum to the total sample size (final row). On average, counties near container ports have experienced about forty years of containerization.

The figures on log population in the first rows of this table clearly show that counties near containerized ports were larger pre-containerization and that counties closest to containerized ports were largest. From 1910 to 1950 (the pre-containerization years), log population in counties near future containerized ports is larger and increases at a faster rate than in counties farther from future containerized ports. These differences between counties generate a possible bias in the OLS estimation that we address in the IV section.

The summary statistics also show some additional differences between counties by proximity to a containerized port. Across census regions, counties near containerized ports are overrepresented in the Northeast, underrepresented in the Midwest and West, and about proportionately represented in the South. Counties near containerized ports had a substantially larger share of workers in manufacturing in 1956, on average.
In addition, these summary statistics illustrate our main finding that counties near containerized ports grow at a faster pace after the advent of containerization than the average untreated county. This relative increase is visible not only in the population data, but also in the employment and payroll per employee data from the County Business Patterns.

Moving to a regression framework, Table 2.2 presents difference-in-differences results, testing the prediction that proximity to a containerized port is associated with greater population growth after the advent of containerization. Column 1 presents estimates including only regional fixed effects and shows a 97 percent increase in population growth for counties within 100 km of a containerized port relative to counties more than 300 km away from a containerized port. This coefficient declines to 42 percent for counties between 100 and 200 km from a containerized port and to 26 percent for counties between 200 and 300 km from a containerized port.²⁰

The remaining columns in this table add covariates. To address the concern that counties of different size may grow at different rates, especially since counties near containerized ports are uniformly initially larger, Column 2 controls for log of population in years 1920, 1930 and 1940. We also add controls for the share of population with a college degree and share African American by county, both measured as of 1950. To isolate the impact of containerization from proximity to the coast, initial port intensity and pre-containerization port prominence, Column 3 adds additional controls. These are distance to the ocean, three variables for the number of ports in 1953 within 300 km, measured in bins of 100 km, and three variables for the total value of 1955 international trade at ports within 300 km, again measured in bins of 100 km. Results decline by about one-third to one-quarter, so that the gradient by distance bin is now 58, 29, and 17 percent, respectively.

²⁰ In this and all estimates in this paper, we cluster standard errors by the 2010 commuting zone to account for spatial dependence across counties. See footnote 15 for more details.

Finally, we address the higher rates of 1956 manufacturing activity near future containerized ports, as seen in Table 2.1. The fourth column includes this variable and measures of the extent of pre-existing transportation networks as controls. Measures of the 1950s-era transportation network are the length of highways, navigable waterways, and railways per square kilometer. These controls have little additional impact on the size of the coefficients. We now estimate 57, 28 and 14 percent increases in population with distance to the closest containerized port.

These results are consistent with the theoretical predictions of a standard New Economic Geography model: population increases near containerized ports and gains dissipate with distance.²¹ Population increases are large and decline monotonically, but not linearly, with distance from the containerized port. We defer a detailed discussion of the magnitude of the estimates and the choice of the 300 km border until the presentation of the instrumental variables results.

2.6.2 Instrumental Variables

Although the difference-in-differences specification addresses many confounding factors potentially correlated with both proximity to a containerized port and population growth (such as past population and initial industrial mix), it is possible that some part of the error term remains correlated with the treatment. We now turn to our instrumental variables estimates. We start with the graphical reduced form intuition, proceed to instrument strength and validity, follow with instrumental variable results, and conclude with measures of robustness.
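To make the logic of this strategy concrete, the following is a minimal sketch on synthetic data, not the chapter's estimation. It assumes a single binary instrument, a continuous treatment, and an unobserved confounder, with all coefficients (0.6, 0.5, and a true effect of 0.5) chosen purely for illustration. It shows that the IV estimate equals the reduced form divided by the first stage, and that OLS is contaminated by the confounder in this setup.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=0.5, seed=0):
    """Synthetic data: z is the instrument, u is an unobserved confounder."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        z_i = 1.0 if rng.random() < 0.5 else 0.0             # hypothetical "near a very deep port"
        u_i = rng.gauss(0.0, 1.0)                             # unobserved growth shock
        x_i = 0.6 * z_i + 0.5 * u_i + rng.gauss(0.0, 1.0)     # treatment, shifted by z and u
        y_i = true_effect * x_i + u_i + rng.gauss(0.0, 1.0)   # outcome
        z.append(z_i)
        x.append(x_i)
        y.append(y_i)
    return z, x, y


z, x, y = simulate()
first_stage = covariance(z, x) / covariance(z, z)    # effect of instrument on treatment
reduced_form = covariance(z, y) / covariance(z, z)   # effect of instrument on outcome
iv_estimate = reduced_form / first_stage             # ratio recovers the causal effect
ols_estimate = covariance(x, y) / covariance(x, x)   # biased by the confounder u

print(f"first stage {first_stage:.2f}, reduced form {reduced_form:.2f}")
print(f"IV {iv_estimate:.2f} vs OLS {ols_estimate:.2f}")
```

In this simulation the confounder pushes OLS above the true effect. The direction of bias in any real application depends on the data, so the sketch illustrates only the mechanics, not the sign of the OLS and IV gap reported in this chapter.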

21 Our estimation does not discriminate between growth and reallocation. In the period between 1950 and 2010, the US population roughly doubled, from about 150 to roughly 300 million. Thus, our results seem very unlikely to be driven exclusively by reallocation, as they would require approximately half of the 1950 population to relocate due to containerization.

Reduced Form: Relating Proximity to Very Deep Ports and Population Growth. To give intuition for the instrumental variable analysis, Figure 2.2b presents a graphical illustration of the reduced form regression (a regression of change in the log of population on the instrument). This figure presents the average log of population over time by initial depth category. Thick lines indicate counties within 300 km of ports that we classify as very deep in 1953; thin lines are counties within 300 km of ports less than 30 feet deep in 1953. We also include a line for counties not within 300 km of a container port. In essence, the estimation asks whether the thicker lines trend upward more after 1956 (the vertical line) than do the thin lines. This picture shows that the thick lines of counties near very deep ports do, and that the gains are driven primarily by initially smaller counties (the beige and purple lines).

Instrument is Strong and Unrelated to Pre-containerization Population Growth. We already saw from Figure 2.2a (discussed in Section 2.5) that the instrument is strong. Appendix Table 2.7 validates this intuition, reporting coefficients for the three equations that estimate the full first stage (one equation per distance bin). The table shows the pattern we expect if the instrument is working as we hypothesize: counties that are between d1 and d2 km from the closest very deep port in 1953 are more likely to be between d1 and d2 km from the closest containerized port in 2010. These coefficients on the diagonal are large (in the 0.5 to 0.6 range) and strongly significant. Thus, even conditional on the many covariates we use, proximity to a very deep port in 1953 remains an important predictor of proximity to a containerized port in 2010. The lowest F statistic on the instruments in any of these three equations is 22; the highest is 59. Our two-stage least squares tables always report the Kleibergen-Paap F statistic, which summarizes the overall strength of the first stage, as suggested by Sanderson and Windmeijer (2016). In our main instrumental variable estimates, this F statistic is never smaller than 21.22

22 These first-stage results are also qualitatively robust to defining “very deep” as one category above (greater than 35 feet deep) or one category below (greater than 25 feet deep). The F statistics are larger, and the estimates more precise, when we use the lower depth cut-off.

Given that the instrument is strong, we now turn to three tests for validity. First, we examine whether proximity to a very deep port is related to pre-containerization population changes; given what we have argued, it should not be. Figure 2.4 shows the distribution of population change from 1910 to 1950, conditional on regional fixed effects and distance to the coast. The red line shows the distribution for counties near (within 300 km of) very deep ports, and the blue line shows the distribution for counties far from very deep ports. These distributions are virtually indistinguishable. The 95 percent confidence interval on a dummy from a regression distinguishing between these two types of counties is small relative to the first-stage coefficients and covers zero: [-0.11, 0.04]. Thus, we find little evidence that proximity to a very deep port impacts pre-containerization population growth, adding confidence in the validity of the instrument.

Second, an additional implication of the IV framework is that, in cases where the instrument fails to predict treatment, the instrument should also be uncorrelated with the dependent variable, since the assumption underlying the instrumental variable specification is that the instrument impacts the dependent variable only through the endogenous variable. In our data, proximity to a very deep port fails to predict proximity to containerization in the Great Lakes region. Ports in this area were not very deep in 1953, yet regional ports did adopt containerization. If proximity to deep ports impacts population and other outcomes only through proximity to containerization, then in cases where port depth is unrelated to containerization, it should also be unrelated to population changes (see Angrist et al. (2010), page 798).
Limiting our analysis to the roughly seven hundred counties within 300 km of the Great Lakes, we find a very weak relationship between proximity to a very deep port and proximity to containerization. Further, we see no relationship between proximity to deep ports and population growth. The coefficients on the instrument in the reduced form specification are an order of magnitude smaller than the main estimates (coefficients by distance bin are -0.040, 0.078, and 0.050) and are never different from zero. See Appendix Table 2.8 for complete results.

Our third test of instrument validity evaluates whether the instruments are correlated with county-level characteristics that might plausibly be in the error term. While we cannot do this for all potential confounders, we can observe whether the identifying variation (the residual from a regression of an instrumental variable on the full set of covariates from Table 2.2) is correlated with specific pre-treatment covariates, also conditional on covariates. Recall that our regression specification controls for the log of population in 1920, 1930, and 1940. Were the identifying variation in the instrument to be related to the log of 1910 population (conditional on covariates), this would suggest that the pre-treatment controls were not adequately capturing the historical pattern of population growth. We do not find this to be the case.

We do a similar analysis for international trade at ports. Recall that the regression controls for the 1955 value of international trade flows in each of the three distance-to-containerized-port bins. If this covariate did not sufficiently control for the impact of pre-containerization port strength on population growth, we would expect that the identifying variation would be related to the 1948 value of international trade flows by distance-to-containerized-port bins, conditional on covariates.23

Appendix Figure 2.6 displays the full matrix of scatterplots showing the correlation between 1910 population and 1948 trade and the identifying variation. There are no significant relationships, and the largest t value for any of these relationships is 2 × 10^-8.
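The idea of "identifying variation" can be sketched with a small synthetic example. The sketch assumes a single included control (standing in for the full covariate set), a held-out pre-treatment variable, and an instrument that depends on the held-out variable only through the included control. All variable names and data-generating values are hypothetical. The check is whether the residualized instrument is correlated with the residualized held-out variable.

```python
import random


def mean(values):
    return sum(values) / len(values)


def residualize(y, x):
    """Residuals from an OLS regression of y on a constant and one covariate x."""
    mean_x, mean_y = mean(x), mean(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return [(b - mean_y) - slope * (a - mean_x) for a, b in zip(x, y)]


def correlation(a, b):
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(1)
n = 5000
# Hypothetical data: the included control, a held-out pre-treatment variable,
# and an instrument related to the held-out variable only via the control.
log_pop_1940 = [rng.gauss(10.0, 1.0) for _ in range(n)]
log_pop_1910 = [c + rng.gauss(0.0, 0.5) for c in log_pop_1940]
instrument = [0.3 * c + rng.gauss(0.0, 1.0) for c in log_pop_1940]

identifying_variation = residualize(instrument, log_pop_1940)
held_out_residual = residualize(log_pop_1910, log_pop_1940)

raw_corr = correlation(instrument, log_pop_1910)
conditional_corr = correlation(identifying_variation, held_out_residual)

print(f"raw correlation {raw_corr:.3f}, conditional correlation {conditional_corr:.3f}")
```

In this setup the raw correlation is clearly positive while the conditional correlation is close to zero, which is the pattern one would hope to see if the included controls adequately capture the held-out variable's relationship with the instrument.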

23 An alternative method is to include these controls directly in the regression, and results are robust to doing so. We believe that this test, however, highlights the econometric implication of this lack of importance: that the identifying variation is not correlated with likely confounders.

Instrumental Variable Results Consistent with Difference-in-Differences Findings. Given these tests of validity, we report instrumental variable results in the right half of Table 2.2. The columns repeat the pattern of covariates from the OLS half of the table. The coefficients are generally quite similar, though slightly larger than the OLS in the complete specification (columns 4 and 8). Why might IV results be larger? As discussed in Section 2.3, we expect containerization to have a larger impact on population growth in initially smaller counties. When we use the instrument to correct for endogeneity in the proximity to a containerized port, we are in principle giving more weight to initially smaller counties where depth is the main driver of the containerization decision. As a result, coefficients in the IV regression increase.

The most complete model in column 8 shows a 70 percent increase in population growth over the 60 years from 1950 to 2010 for counties within 100 km of a containerized port relative to counties more than 300 km away from a containerized port. Consistent with the expected relationship between the gains to containerization and distance from the port, this coefficient declines to 33 and 23 percent for counties slightly farther from containerized ports.24

To interpret the magnitude of these results, we turn to Duranton and Turner (2012), who find that a 100 percent increase in a city’s initial stock of highways yields a 13 percent increase in population over a 20 year period. This corresponds to an annualized increase of about 0.6 percent. Our findings are similar. Being within 100 km of a containerized port causes a 70 percent increase in population over a 60 year period (exp(0.53) − 1 = 0.70), implying a comparable annual growth rate. Our containerization effect is thus roughly equivalent to a doubling in the initial stock of highways in a county.25
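The annualized comparison can be reproduced with a few lines of arithmetic. The sketch below assumes constant compound growth over each period, which is one common way to annualize a cumulative change; the function name is illustrative.

```python
def annualized_rate(total_growth, years):
    """Constant annual growth rate that compounds to total_growth over the given years."""
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


# Duranton and Turner (2012): 13 percent population growth over 20 years.
highways_rate = annualized_rate(0.13, 20)

# This chapter: 70 percent over 60 years for counties within 100 km of a containerized port.
container_rate = annualized_rate(0.70, 60)

print(f"Highways: {highways_rate * 100:.1f} percent per year")
print(f"Containerization: {container_rate * 100:.1f} percent per year")
```

Under this compounding assumption the two rates are about 0.6 and 0.9 percent per year, respectively, which is the sense in which the magnitudes are comparable.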

24 Both here and in the OLS estimates, we compare counties within 300 km of a containerized port to all other counties. As theory does not provide guidance on the physical distance over which containerization might have a measurable impact, we turn to the data as a guide. Appendix Figure 2.7 shows regression coefficients from a version of Equation (2.1) where distance to containerized port is measured in 50 km bins. Gray bands are confidence intervals. These results show that the association between proximity to a containerized port and population growth is indistinguishable from zero at 300 km. In our main specification, we use bins of 100 km, rather than the smaller 50 km ones, to increase the power in the estimates. This is particularly important when we examine whether containerization’s impacts differ by initial conditions in subsection 2.6.3.

25 Containerization required substantial investments. In the years of peak outlays from 1968 to 1973, the U.S. spent about $8 billion (in 2015 dollars) of public and private funds on the required port infrastructure (Kendall, 1986). This is about $1.6 billion per year (in 2015 dollars), one fourth of the annualized cost of the Interstate Highway System from 1956 through 1991 (https://www.fhwa.dot.gov/interstate/faq.cfm, accessed on 08/21/2017).

Containerization’s Impact Increases Over Time. To test for changes in the impact of containerization over time, we re-estimate Equation (2.1) using different final years, starting in 1970. We report coefficients from these estimations in Appendix Figure 2.8, which displays results decade-by-decade. Full circles are significant coefficients and hollow circles are insignificant coefficients (at the five percent level). The red line at the top reports the coefficients for counties within 100 km of a containerized port; the orange line, 100 to 200 km; and the yellow line, 200 to 300 km. Apart from a blip in 1980, counties near containerized ports have large population gains that increase over time. For example, in 1970, only 15 years after the advent of containerization, counties closest to containerized ports had grown by almost 49 percent more than counties more than 300 km away from a containerized port. By 2010, this figure was 73 percent. While estimates for counties farther from ports are smaller, they also follow this general pattern of increase. This increasing impact decade-by-decade may reflect the increasing size of the containerized port network, as shown in Figure 2.1.

Results Robust to Additional Considerations. We now turn to threats to identification. Rappaport and Sachs (2003) argue that coastal locations have long been associated with greater economic growth, crediting both increased productivity and, more recently, better amenities. We can interpret containerization as a productivity-enhancing mechanism that generates part of the Rappaport and Sachs result. However, our estimates show that containerization is more than just coastal proximity: our main results are little changed by the inclusion of a Rappaport and Sachs coastal indicator (Table 2.3 column 2).26

26 Rappaport and Sachs measure coast as locations within 80 km of the Great Lakes and ocean coasts.

To further isolate the impact of containerization from proximity to the coast, Table 2.3’s column 3 restricts the sample to counties within 400 km of a port in 1953. The sample size drops from 3,023 to 1,767 observations, but the coefficients decline only slightly (compare estimates to column 1, which repeats the most complete specification from Table 2.2). This suggests that population growth in counties near a containerized port is not driven by a comparison with slower-growing centrally located counties.

Furthermore, we know from the summary statistics in Table 2.1 that counties near containerized ports experience more rapid population growth pre-containerization, and this trend may have continued after 1956 irrespective of containerization. We account for this in the main estimates by including log population in 1920, 1930, and 1940. Table 2.3’s column 4 additionally includes squares of those measures of past population, in the event that previous population impacts population growth non-linearly. Again, the estimates are little changed.

As we discussed, our instrument does not predict containerization in the Great Lakes region, which does have container ports. In addition, this region experiences the slowest population growth over our period of analysis. To allay fears that the results are driven by this potentially anomalous treatment of the Midwest, column 5 omits the Midwest region entirely, leaving 1,975 observations. Results in this column are smaller than the original specification, but the pattern of decline with distance to the closest containerized port remains. Indeed, we should expect smaller coefficients in this estimation because the control group (non-Midwest counties not near containerized ports) now has a higher average population growth. Note the increase in the mean of the dependent variable from 0.373 to 0.508 (final row of the table). Still, we observe a relative population increase of 55 percent near containerized ports, an increase of almost three-quarters of the mean.

Research in urban economics strongly suggests that growth is associated with an area’s education and demographic characteristics (Moretti, 2004). Column 6 includes additional controls for the share of people 25 or older with a high school degree, the share foreign born, the number of government workers per capita, and the share age 65 and older by county. The addition of these covariates decreases the coefficients slightly, with greater impact for the category closest to containerized ports. The coefficients remain sizeable, and retain the pattern of decline with distance to containerized ports.

We conclude this discussion of robustness by considering two additional pre-1956 infrastructure investments plausibly correlated with port depth. The first such infrastructure is naval bases. In the US, large military installations may promote local economic activity.
If growth-yielding federal investments were concentrated near very deep ports, this could bias the coefficient on proximity to containerization upward. When we re-estimate Equation (2.1) using instrumental variables, omitting counties within 300 km of any naval base, coefficients are slightly larger and statistically indistinguishable from the main specification.27

27 As of the 1950s, the US had four domestic naval bases, at least 10 naval stations, and over 250 total facilities, which include hospitals, test stations, air stations, and a large variety of other installations (U.S. Department of the Navy, 1952, 1959). Naval bases were Pearl Harbor, HI; San Diego, CA; Norfolk, VA; and New London, CT. New London was actually taken out of “base” status between 1952 and 1959, but we include it for completeness. Relative to naval bases, naval stations are smaller, serve more limited purposes, and receive less investment (Coletta, 1985). Naval stations are so numerous that 300 km bands around them are indistinguishable from coastal locations; see our control for coastal locations in Table 2.3.

Similarly, if very deep ports were crucial for oil importation, and oil importation caused population growth, our estimate of β1 would be biased upward. A number of factors argue against this interpretation. First, as of 1948, 90 percent of US oil was produced domestically and the US accounted for 62 percent of the world oil market (Mendershausen, 1950, p. 4). It was not until the 1970s, almost two decades after the advent of containerization, that the US was no longer able to fulfill oil demand with domestic oil. Furthermore, port depth is not a key determinant of suitability as an oil port, allaying concerns about the validity of the instrument. During the period of domestic oil hegemony, most oil moved by pipeline, rather than by ship. Even when oil importation grew, port depth was not as crucial, because oil ships connect to offload via a pipeline, which can be quite long. Therefore, ships need not dock directly at the harbor to offload oil. Further, until the Suez Canal was dredged in the mid-1960s, it did not allow vessels with a draft deeper than 37 feet (Horn et al., 2010, p. 43).

Our analysis of robustness concludes by turning to a dataset of world ports and world cities to assess containerization’s global impact. We focus primarily on the United States in this paper because of the rich data available at a relatively small geographic scale. However, containerization is clearly a global phenomenon, and one that may have had an even larger impact on economic activity in countries other than the United States. We use world population and port data to estimate regressions that parallel our main US regressions. We report results in Table 2.4. Column 4 reports OLS results controlling for country fixed effects, the number of ports in 1953 within 300 km of each city (in 100 km distance bins), distance to the ocean, and log population in 1950. We find that cities within 100 km of a containerized port experience a 9 percent increase in population growth between 1950 and 2010 relative to cities more than 300 km away from a containerized port.

Just as in the US sample, we are concerned that the assignment of containerized ports to cities is not random, generating bias. Using the same instrumenting technique as in the US sample, we find that proximity to a very deep port in 1953 is again strongly related to proximity to a containerized port in 2010 (Appendix Table 2.9 presents summary statistics and Appendix Table 2.10 shows a strong first stage). The instrumental variable coefficients have the same signs as the OLS results, but are substantially larger. In the most complete specification in column 8, we find that cities within 200 km of a containerized port grow by an additional 35 percent. For cities between 200 and 300 km of a containerized port, we estimate a statistically insignificant increase in population growth of about 12 percent. These results are smaller in absolute terms than for the US, likely because we consider a sample of international cities that are relatively larger than the majority of US cities.

Containerization’s Impact on Other Economic Outcomes. Having shown that proximity to a containerized port causes population growth, we test whether proximity to a containerized port also affects employment, nominal wages, industrial composition, and income. Using instrumental variables estimation with the full set of covariates from Table 2.2, column 1 in Table 2.5 confirms that, from 1956 to 2011, employment increases more in counties near containerized ports.28 While only the coefficient for counties closest to a container port is statistically significant, the magnitude and pattern of employment increases are strikingly similar to what we find using Decennial Census population data. However, in comparison to the mean, these figures are substantially smaller. The mean change in log employment over the period is 1.13 (see final row), compared to a mean increase in log population of 0.37 (see final row in Table 2.3).

28 Employment data are from County Business Patterns (see details in Section 2.4).

The dependent variable in Column 2 is nominal first quarter payroll per employee. Proximity to a containerized port is virtually unrelated to nominal payroll per employee. As discussed in Section 2.3, the net effect on nominal wages is theoretically ambiguous because the home market effect and the market crowding effect go in opposite directions.

The middle two columns of Table 2.5 assess whether containerization changed the industrial composition of counties near containerized ports. Column 3 reports the share of employment in manufacturing, the industry most likely to produce products that travel in shipping containers. On average, across all counties, the share of employment in manufacturing declined by about 20 percent from 1956 to 2011 (last row). The coefficients reveal very little evidence of a smaller decline in manufacturing among treated counties.

Nonetheless, a narrower focus on transportation does show relative growth. In column 4, the dependent variable is the share of employment in transportation services, which is “services which support transportation,” and which includes “air traffic control services, marine cargo handling, and motor vehicle towing”.29 Relative to the minuscule one-tenth of one percent of employment in this industry on average, counties within 100 km of a container port see a statistically significant gain of three times this mean. Counties more than 100 km away from a container port see no significant change in this sector. Our finding that employment shifts towards transportation services is reminiscent of Michaels (2008) and Duranton et al. (2014), who find that counties connected with highways experience an increase in trade-related activities, such as trucking and retail sales.

Finally, in the last three columns, we look at the impact of containerization on the income distribution. We look at income for the 10th, 50th, and 90th percentiles and find that counties within 100 km of a containerized port experience larger and significant increases in income across the whole distribution. In addition, as with population and overall employment, the pattern of decline with distance to the closest containerized port remains: counties farther away from a containerized port experience smaller additional increases in income relative to counties more than 300 km away from a containerized port.

2.6.3 Where Gains to Containerization Are Largest

In the previous subsections, we show that, on average, proximity to a containerized port causes increases in population and employment. We hypothesize that gains should be greater in initially low land value areas, and this section reports results from testing this claim.

29 For 1956, we use SIC 47 for “services incidental to transportation,” and for 2011 we use NAICS 488 for “support activities for transportation.”

We use three proxies for land values circa 1956. The first is the share of county employment in the manufacturing sector in 1956. Manufacturing was the high tech of the 1950s, and we anticipate that productive places should also be high land value places (Moretti, 2011). The second proxy is county population density as of 1950, and the third is assessed land value from the 1956 Census of Governments. While this last measure is the closest to a direct measure of the variable of interest, assessed values are notoriously different from market values. Particularly in this period, it was not unusual for assessment practices to vary substantially, and systematically, across jurisdictions (Anderson and Pape, 2010).

Table 2.6 reports coefficients on the measure of proximity to a containerized port and coefficients on the interaction of being below the median of variable h_i and near a containerized port. Again, the dependent variable is the change in log population. The first column shows that half of the containerization-induced population growth in counties within 100 km of containerized ports occurs in counties with lower than median share of workers in manufacturing. For counties slightly farther from the port, almost all of the containerization-induced population growth occurs in counties with lower than median share of workers in manufacturing in 1956. While no initial condition explains as much of containerization-induced population growth as an initially small manufacturing sector, containerization-induced population growth is also large in initially less dense places (column 2). We observe no particular pattern in counties with low 1956 assessed land values (column 3).

Overall, these results paint a picture of containerization exerting the greatest influence not in dominant agglomerations (large, wealthy urban areas) but in second-tier agglomerations. These second-tier agglomerations are initially less dense and less concentrated in the vanguard technology of the 1950s (manufacturing). This is consistent with containerization’s demand for large areas of land and suggests that containerization is easier to implement where land values are initially low.

These results are also consistent with a complementary story about the role of market access (e.g. Donaldson and Hornbeck, 2016). This line of argument says that containerization’s impact will be larger, in percentage terms, in areas with initially low market access. This hypothesis is consistent with the results in column 4, showing larger gains in counties at the bottom half of highway intensity (highways per square km). We see no preferential pattern, however, with railroads (column 5).
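As an illustration of how the regressors behind such a heterogeneity analysis might be built, the sketch below constructs distance-bin indicators and their interactions with a below-median indicator for an initial-condition variable h. The county records, field names, and regressor names are hypothetical and are not taken from the chapter's data or code.

```python
from statistics import median

# Hypothetical county records: distance (km) to the closest containerized port
# and an initial-condition proxy h (for example, 1956 manufacturing employment share).
counties = [
    {"name": "A", "distance_km": 40.0, "h": 0.15},
    {"name": "B", "distance_km": 150.0, "h": 0.45},
    {"name": "C", "distance_km": 250.0, "h": 0.30},
    {"name": "D", "distance_km": 500.0, "h": 0.50},
]

DISTANCE_BINS = [(0, 100), (100, 200), (200, 300)]


def build_regressors(records):
    """Return, per county, distance-bin dummies and their below-median interactions."""
    h_median = median(r["h"] for r in records)
    rows = {}
    for r in records:
        below_median = 1 if r["h"] < h_median else 0
        row = {}
        for low, high in DISTANCE_BINS:
            in_bin = 1 if low <= r["distance_km"] < high else 0
            row[f"port_{low}_{high}"] = in_bin
            row[f"port_{low}_{high}_x_below_median"] = in_bin * below_median
        rows[r["name"]] = row
    return rows


regressors = build_regressors(counties)
for name, row in regressors.items():
    print(name, row)
```

With this construction, the coefficient on a distance-bin dummy describes counties above the median of h, and the coefficient on the interaction describes the additional effect for counties below it. Counties beyond 300 km (such as the hypothetical county D) have all indicators equal to zero and form the comparison group.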

2.7 Conclusion

Containerized shipping is a fundamental engine of the global economy. Containerization simplifies and speeds packing, transit, pricing, and every transfer from ship to train to truck. It eliminates previously profitable pilferage and makes shipping more reliable. Since the advent of containerization in 1956, the cost of moving containerizable goods has plummeted.

In this paper, we analyze how local economic activity responds to the dramatic decline in trade costs brought by containerization. We use a novel cost-shifter instrument based on the historical depth of ports to show that, consistent with the predictions of a New Economic Geography model, containerization caused substantial population and employment growth in counties near container ports. These gains follow the pattern of decline with distance predicted by theory: counties closer to a containerized port experience larger increases than counties located farther away. Finally, consistent with containerization’s need for substantial land for large cranes and vast marshalling yards, gains are located predominantly in counties with initially low population density and initially low manufacturing employment.

Whether and how containerization impacts the location of population, employment, and wages has implications for both the agglomerative forces that drive innovation, and for political representation that yields democratic outcomes. For policymakers to mitigate the uneven impacts of globalization, it is useful to first understand its causes.

Figure 2.1 – Adoption of Containerization: 1956–2008

(a) United States

(b) Worldwide. Horizontal axis: Year (1960 to 2010). Vertical axis: Number of container ports worldwide (0 to 600). Annotations: “Int'l standard”; “Int'l diffusion plateau: 90% of countries have >= 1 container port”.

Note: The upper panel shows the diffusion of containerization across US ports; the bottom panel repeats this exercise for world ports. Source: Containerisation International Yearbook, volumes 1968 and 1970–2010.

Figure 2.2 – Graphical Intuition

(a) First Stage: Depth and Likelihood of Containerization

(b) Reduced Form: Depth and Population Changes

Notes: In both figures, thick lines denote depths that we label “very deep” in our estimation. Figure 2.2a shows the likelihood that a county will have a containerized port within 300 km in year t by the depth of the deepest port within 300 km in 1953. On average, deeper ports are more likely to ever containerize, and more likely to containerize early. Figure 2.2b plots the logarithm of population over time by the depth of the deepest port within 300 km in 1953.

Figure 2.3 – Geographic Variation in Treatment and Instrument

(a) Counties Near a Containerized Port in 2010

(b) Counties Near a Containerized Port in 2010 and Near a Very Deep Port in 1953

Notes: Figure 2.3a shows the distance to the nearest containerized port in 2010. Blue polygons are counties d1 to d2 km from the nearest containerized port. Distance bins {d1, d2} are {0 to 100, 100 to 200, 200 to 300}. Figure 2.3b shows the distance to the nearest containerized port in 2010 as well as the distance to the nearest “very deep” port in 1953. Green colors represent counties that are d1 to d2 km from the nearest containerized port and d1 to d2 km from the nearest “very deep” port in 1953.

Figure 2.4 – Port Depth Unrelated to Pre-Containerization Growth

Notes: This picture shows the distribution of county population change 1910 to 1950, conditional on regional fixed effects and distance to the ocean. Counties near very deep ports are in red and those not near very deep ports are in blue. Regression results show no significant difference between these two means.

Table 2.1 – County Characteristics by Distance to Nearest Containerized Port

                                   Distance to Containerized Port, km
                                   0 to 100    100 to 200  200 to 300  Ever Cont.  Never Cont.
                                   (1)         (2)         (3)         (4)         (5)
Log Population
  1910                             10.31       10.03       10.02       10.11       9.47
                                   [1.22]      [0.82]      [0.80]      [0.95]      [0.96]
  1950                             10.81       10.23       10.14       10.36       9.58
                                   [1.47]      [0.97]      [0.97]      [1.16]      [0.96]
  2010                             11.70       10.75       10.52       10.94       9.79
                                   [1.50]      [1.16]      [1.15]      [1.35]      [1.32]
Log Employment
  1956                             9.02        8.19        8.04        8.37        7.18
                                   [1.94]      [1.44]      [1.45]      [1.65]      [1.43]
  2011                             10.37       9.31        9.08        9.53        8.35
                                   [1.83]      [1.45]      [1.47]      [1.66]      [1.55]
Log Payroll Per Employee
  1956                             -0.27       -0.37       -0.40       -0.35       -0.50
                                   [0.33]      [0.29]      [0.31]      [0.31]      [0.32]
  2011                             2.19        2.04        2.02        2.08        1.97
                                   [0.29]      [0.20]      [0.19]      [0.24]      [0.22]
Region
  Northeast                        0.19        0.17        0.12        0.16        0.00
  Midwest                          0.19        0.28        0.38        0.29        0.39
  South                            0.49        0.48        0.45        0.47        0.43
  West                             0.13        0.07        0.05        0.08        0.17
Share Employment, Manufacturing
  1956                             0.42        0.41        0.42        0.42        0.26
                                   [0.19]      [0.19]      [0.20]      [0.19]      [0.22]
  2011                             0.10        0.15        0.14        0.13        0.10
                                   [0.09]      [0.12]      [0.12]      [0.11]      [0.12]

Observations                       370         523         442         1335        1688

Note: This table reports means and standard deviations in brackets. The number of observations at the bottom of the table applies to all variables except the 1910 population and the payroll and employment variables; each has slightly fewer observations.

Table 2.2 – Containerization Associated with Increased Population, Particularly Near the Port

                                     OLS                                      IV
                          (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
Closest container port is
  0 to 100 km           0.684***  0.604***  0.464***  0.453***  0.685***  0.642***  0.410**   0.529***
R-squared               0.186     0.328     0.356     0.372     0.183     0.327     0.355     0.371
Kleibergen-Paap F Stat                                            99.1      95.7      21        21.1

Rows 100 to 200 km and 200 to 300 km (cells interleaved in the source, listed in source order): 0.348*** 0.351*** 0.235*** 0.256*** 0.202*** 0.249*** 0.156** 0.237*** 0.371*** 0.132* 0.219 0.215** 0.267*** 0.285* 0.175 0.204
Standard errors for the three distance rows (source order): (0.064) (0.063) (0.054) (0.095) (0.053) (0.057) (0.094) (0.078) (0.056) (0.082) (0.076) (0.07) (0.087) (0.086) (0.07) (0.187) (0.087) (0.097) (0.18) (0.154) (0.102) (0.147) (0.14) (0.139)
Covariate rows (inclusion marked with x in the original): Regional fixed effects; Demographics; Log of population, 1920-1940; Distance to the ocean; Number of 1953 ports; Total int’l trade at ports, 1955; 1950s-era transportation; Share manufacturing employment, 1956

Notes: Stars denote significance levels: * 0.10, ** 0.05, and *** 0.01. All regressions use 3,023 observations and cluster standard errors at the 2010 commuting zone. The dependent variable is the change in log population, 1950-2010. The mean of the dependent variable is 0.373. Demographics is share of people with a college degree or more and share African American, both measured as of 1950. Number of 1953 ports and total international trade at ports in 1955 are both totals by 100 km bins. 1950s-era transportation is a vector which measures the kilometers of highways c. 1960, kilometers of navigable waterways, and kilometers of railroads c. 1957 in each county, all per square kilometer of land area. See data appendix for complete details on years and sources.

Table 2.3 – Impact of Containerization Robust to Alternative Specifications

Columns: (1) Main spec., Table 2.2, col. 8; (2) With R & S coast control; (3) Within 400 km of a 1953 port; (4) Squares of population; (5) Omit Midwest region; (6) Additional demographic covariates

                                    (1)       (2)       (3)       (4)       (5)       (6)
Closest container port is within
  0 to 100 km                     0.529***  0.423**   0.510**   0.448**   0.443**   0.449**
                                  (0.18)    (0.201)   (0.202)   (0.182)   (0.201)   (0.183)
  100 to 200 km                   0.285*    0.260*    0.237     0.231     0.174     0.236
                                  (0.147)   (0.146)   (0.176)   (0.147)   (0.166)   (0.145)
  200 to 300 km                   0.204     0.205     0.228     0.171     0.109     0.164
                                  (0.139)   (0.139)   (0.173)   (0.139)   (0.146)   (0.136)
R-squared                         0.357     0.363     0.311     0.371     0.293     0.371
Kleibergen-Paap F Stat            21.3      17.7      21.4      21.6      24.6      21.3
Observations                      3023      3023      1767      3023      1975      3023
Mean, dependent variable          0.373     0.373     0.514     0.373     0.508     0.373

Notes: Stars denote significance levels: * 0.10, ** 0.05, and *** 0.01. All specifications are instrumental variable regressions with clustered standard errors at the 2010 commuting zone. Log of population is the dependent variable and all regressions include the most complete covariate list from Table 2.2. Column 1 repeats the most saturated estimation from Table 2.2 Column 8. Column 2 controls for the Rappaport and Sachs (2003) measure of coastal proximity. Column 3 restricts the sample to counties within 400 km of a 1953 port. Column 4 includes squares of 1920, 1930 and 1940 population. Column 5 omits the Midwest census region, which has no very deep ports. Column 6 includes additional demographic covariates measured in 1950: share of people 25 or older with less than a high school degree, share foreign born, government workers per capita, and share age 65 and older.
Table 2.4 – Containerization Impacts Growth in World Cities

                                     OLS                                      IV
                          (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)
Closest container port is
  0 to 100 km           -0.010    0.069     0.007     0.090     0.047     0.134*    0.216*    0.310***
R-squared               0.655     0.684     0.663     0.690     0.648     0.678     0.652     0.680
Kleibergen-Paap F Stat                                            43.7      43.9      42.5      43.6

Rows 100 to 200 km and 200 to 300 km (cells interleaved in the source, listed in source order): -0.040 -0.038 -0.035 -0.027 -0.037 -0.035 -0.015 0.188** -0.012 0.165* 0.040 0.310** 0.027 0.307*** 0.114 0.113
Standard errors for the three distance rows (source order): (0.056) (0.056) (0.060) (0.066) (0.058) (0.064) (0.065) (0.067) (0.060) (0.071) (0.066) (0.067) (0.070) (0.091) (0.064) (0.122) (0.086) (0.112) (0.120) (0.124) (0.105) (0.118) (0.127) (0.120)
Covariate rows (inclusion marked with x in the original): Country fixed effects; Log of population, 1950; Distance to the ocean; Number of 1953 ports

Notes: Stars denote significance levels: * 0.10, ** 0.05, and *** 0.01. All regressions use 1,051 observations, and the unit of observation is a city with at least 50,000 inhabitants in 1950. The dependent variable is the change in log population, 1950 to 2010. The mean of the dependent variable is 1.54.
Table 2.5 – More Employment and Higher Earnings Near Containerized Ports

IV, Dependent Variable is
  All industries: (1) Log employment; (2) Log payroll/employee
  Employment Share: (3) Manufacturing; (4) Transportation Services
  Log p-th percentile income, where p is: (5) 10; (6) 50; (7) 90

                            (1)      (2)      (3)      (4)      (5)      (6)      (7)
R-squared                   0.178    0.155    0.76     0.075    0.298    0.436    0.324
Kleibergen-Paap F Stat      21.1     22       21.1     21.1     21.4     21.4     21.4
Observations                2985     2981     2985     2985     3022     3022     3022
Mean, Dependent Variable    1.135    2.448    -0.215   0.001    3.547    3.147    3.176

Rows “Closest container port is” 0 to 100 km, 100 to 200 km and 200 to 300 km (cells interleaved in the source, listed in source order): 0.347* 0.124 0.049 0.02 0.043 -0.019 0.018 0 0.003** 0.008 0.175* 0.001 0 0.276*** 0.082 0.152** 0.058 0.109* 0.05 0.084* -0.01
Standard errors (source order): (0.201) (0.156) (0.068) (0.147) (0.044) (0.021) (0.039) (0.017) (0.001) (0.017) (0.001) (0.083) (0.001) (0.067) (0.069) (0.067) (0.055) (0.055) (0.056) (0.045) (0.046)

Notes: Stars denote significance levels: * 0.10, ** 0.05, and *** 0.01. All specifications are instrumental variable regressions with County Business Patterns and Census income data, and include the most complete covariate list from Table 2.2. We cluster the standard errors at the 2010 commuting zone. The second pair of columns report fewer observations because some counties are sufficiently small to suppress all payroll information.

Table 2.6 – Greater Containerization-Induced Growth in Initially Lagging Places

Interaction Variable is: (1) Manuf. share of employment; (2) 1950 Population Density; (3) 1956 Assessed Land Value; (4) Highway km / county sq km; (5) Rail km / county sq km

                                    (1)       (2)       (3)       (4)       (5)
Closest container port is within
  0 to 100 km                     0.267*    0.389***  0.038     0.523***  0.166
Median, interaction variable      0.44      17.1      0.01      0         0.07
R-squared                         0.368     0.368     0.355     0.377     0.358
Kleibergen-Paap F Stat            10.9      10.6      7.6       12.7      6.9

Remaining cells, covering the rows “Closest container port is within” 100 to 200 km and 200 to 300 km, the three rows “Container port distance * 1{County ≤ median(column header variable)}”, and “Share of observations ≤ median” (interleaved in the source, listed in source order): 0.293 -0.072 -0.105 0.293 0.16 0.559** 0.056 0.521** 0.094 0.531*** 0.231 -0.01 0.467*** 0.218** 0.375 -0.038 0.282** 0.293 -0.238 0.127 0.004 0.314** 0.267* -0.032 0.48 0.49 0.53 0.104 0.37 0.54 0.57 0.35 0.52 0.6 0.66 0.81 0.81 0.44 0.54 0.50
Standard errors (source order): (0.224) (0.168) (0.192) (0.144) (0.159) (0.255) (0.141) (0.235) (0.218) (0.145) (0.196) (0.186) (0.141) (0.231) (0.138) (0.182) (0.147) (0.193) (0.1) (0.159) (0.167) (0.113) (0.123) (0.145) (0.147) (0.131) (0.12) (0.152) (0.123) (0.136)

Note: Stars denote significance levels: * 0.10, ** 0.05, and *** 0.01. All specifications are instrumental variable estimates of Equation (2.3) with population as the dependent variable. All regressions have 3,023 observations and cluster standard errors at the 2010 commuting zone. The first panel of coefficients reports the average impact of containerization by distance from the port; the second panel of coefficients reports whether there is any additional population growth if the county’s value of the variable in 1950 is below the median among treated observations.

2.8 Appendix A: Data Sources

We use data from a variety of sources. This appendix provides source information.

1. County Business Patterns. These data include total employment, total number of establishments (with some variation in this definition over time), and total payroll.

• 1956: Courtesy of Gilles Duranton and Matthew Turner. See Duranton et al. (2014) for source details. We collected a small number of additional counties that were missing from the Duranton and Turner data.

– In these data, payroll is defined as the “amount of taxable wages paid for covered employment [covered by OASI, or almost all “nonfarm industrial and commercial wage and salary employment” (page VII)30] during the quarter. Under the law in effect in 1956, taxable wages for covered employment were all payments up to the first $4,200 paid to any one employee by any one employer during the year, including the cash value of payments in kind. In general, all payments for covered employment in the first quarter were taxable unless the employee was paid at the rate of more than $16,800 per year. For the first quarter of 1956, it is estimated that 97.0 percent of total non-agricultural wages and salaries in covered employment was taxable. The taxable proportion of total wages becomes smaller in the later quarter of the year. Data are presented for the first quarter because wages for this quarter are least affected by the provisions of the law limiting taxable wages to $4,200 per year.” (page VI, Section III, Definitions in 1956 County Business Patterns report.)

• 1967 to 1985: U.S. National Archives, identifier 313576.

• 1986 to 2011: U.S. Census Bureau. Downloaded from https://www.census.gov/econ/cbp/download/

– For comparability, we also use total first quarter payroll from these data.

30 Data also exclude railroad employment.

2. Decennial Census: Population and demographics data by county

• 1910: ICPSR 02896, Historical, Demographic, Economic and Social Data: The United States, 1790-2002, Dataset 38: 1950 Census I (County and State)

• 1920: ICPSR 02896, Historical, Demographic, Economic and Social Data: The United States, 1790-2002, Dataset 38: 1950 Census I (County and State)

• 1930: ICPSR 02896, Historical, Demographic, Economic and Social Data: The United States, 1790-2002, Dataset 38: 1950 Census I (County and State)

• 1940: ICPSR 02896, Historical, Demographic, Economic and Social Data: The United States, 1790-2002, Dataset 38: 1950 Census I (County and State)

• 1950

– ICPSR 02896, Historical, Demographic, Economic and Social Data: The United States, 1790-2002, Dataset 38: 1950 Census I (County and State)

– Census of Population, 1950 Volume II, Part I, Table 32.

• 1960: ICPSR 02896, Historical, Demographic, Economic and Social Data: The United States, 1790-2002, Dataset 38: 1960 Census I (County and State)

• 1970: ICPSR 8107, Census of Population and Housing, 1970: Summary Statistic File 4C – Population [Fourth Count]

• 1980: ICPSR 8071, Census of Population and Housing, 1980: Summary Tape File 3A

• 1990: ICPSR 9782, Census of Population and Housing, 1990: Summary Tape File 3A

• 2000: ICPSR 13342, Census of Population and Housing, 2000: Summary File 3

• 2010: U.S. Census Bureau, 2010 Decennial Census Summary File 1, downloaded from http://www2.census.gov/census_2010/04-Summary_File_1/

• 2010 (2008-2012): U.S. Census Bureau, American Community Survey, 5-Year Summary File, downloaded from http://www2.census.gov/acs2012_5yr/summaryfile/2008-2012_ACSSF_All_In_2_Giant_Files%28Experienced-Users-Only%29/

3. Port Universe and Depth

• We use these documents to establish the population of ports in any given year.

• 1953: World Port Index, National Geospatial-Intelligence Agency (1953)

• 2015: World Port Index, National Geospatial-Intelligence Agency (2015)

4. Port Containerization Adoption Year

• 1956–2010: Containerisation International Yearbook for 1968 and 1970–2010

5. Port Volume: Total imports and exports by port

• 1948: United States Foreign Trade, January-December 1949: Water-borne Trade by United States Port, 1949, Washington, D.C.: U.S. Department of Commerce, Bureau of the Census. FT 972.

• 1955: United States Waterborne Foreign Trade, 1955, Washington, D.C. : U.S. Dept. of Commerce, Bureau of the Census. FT 985.

• 2008: Containerisation International yearbook 2010, pp. 8–11.

6. Highways

• 2014: 2014 National Transportation Atlas, Office of the Assistant Secretary for Research and Technology, Bureau of Transportation Statistics, United States Department of Transportation. http://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/publications/national_transportation_atlas_database/2014/index.html.

• c. 1960: Office of Planning, Bureau of Public Roads, US Department of Commerce, “The National System of Interstate and Defense Highways.” Library of Congress call number G3701.P21 1960.U5. Map reports improvement status as of December 31, 1960.

7. Railways

• 2014: 2014 National Transportation Atlas, Office of the Assistant Secretary for Research and Technology, Bureau of Transportation Statistics, United States Department of Transportation. http://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/publications/national_transportation_atlas_database/2014/index.html.

• c. 1957: Army Map Service, Corps of Engineers, US Army, “Railroad Map of the United States,” prepared 1935, revised April 1947 by AMS. 8204 Edition 5-AMS. Library of Congress call number G3701.P3 1957.U48.

8. Waterways

• 2014: 2014 National Transportation Atlas, Office of the Assistant Secretary for Research and Technology, Bureau of Transportation Statistics, United States Department of Transportation. http://www.rita.dot.gov/bts/sites/rita.dot.gov.bts/files/publications/national_transportation_atlas_database/2014/index.html.

9. World Population Data: World Urbanization Prospects, 2014 Revision

• Population counts for all urban agglomerations whose populations exceed 300,000 at any time between 1950 and 2010.

• Produced by the United Nations, Department of Economic and Social Affairs, Population Division.

• Downloaded from http://esa.un.org/unpd/wup/CD-ROM/WUP2014_XLS_CD_FILES/WUP2014-F22-Cities_Over_300K_Annual.xls

10. Property value data

• 1956: 1957 Census of Governments: Volume 5, Taxable Property Values in the United States

• 1991: 1992 Census of Governments, Volume 2 Taxable Property Values, Number 1 Assessed Valuations for Local General Property Taxation

• In both 1957 and 1992, the Census reports a single total for New York City, which consists of five separate counties (equivalent to the boroughs). We attribute the total assessed value from the Census of Governments to each county (borough) using each borough’s share of total assessed value. For 1956, we rely upon the Annual Report of the Tax Commission and the Tax Department to the Mayor of the City of New York as of June 30, 1956, page 23, which reports “Assessed Value of All Real Estate in New York City for 1956-1957.” For 1991, we rely upon the Department of Finance Annual Report, 1991-1992, pages 19-24.

• The District of Columbia is missing an assessed value for 1956 in the Census of Government publication listed above. However, the amount is available in Trends in Assessed Valuations and Sales Ratios, 1956-1966, US Department of Commerce, Bureau of the Census, March 1970. We use this value.

• For the 2010 value, we use the sum of the aggregate value of the owner-occupied stock (American Community Survey) and the aggregate value of the rental-occupied stock. As the Census reports only aggregate gross rent, we convert aggregate gross rent to the aggregate value of the rental stock by multiplying aggregate gross rent by 12 to generate an annual figure and then applying the average rent-price ratio for years 2008-2012 (corresponding with the ACS years) from the Lincoln Institute rent-price ratio data.31

31 http://datatoolkits.lincolninst.edu/subcenters/land-values/rent-price-ratio.asp
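The 2010 property value construction can be illustrated with a minimal sketch. It assumes that ACS aggregate gross rent is a monthly figure and that the rent-price ratio is defined as annual rent divided by price, so that rental stock value equals annual rent divided by the ratio; neither definition is spelled out in the text, and the function names and input numbers are hypothetical.

```python
def rental_stock_value(aggregate_monthly_gross_rent, rent_price_ratio):
    """Convert aggregate monthly gross rent to an implied value of the rental stock.

    Assumption (not stated in the text): rent_price_ratio = annual rent / price.
    """
    if rent_price_ratio <= 0:
        raise ValueError("rent-price ratio must be positive")
    annual_rent = 12 * aggregate_monthly_gross_rent  # annualize the monthly figure
    return annual_rent / rent_price_ratio


def total_property_value(owner_occupied_value, aggregate_monthly_gross_rent,
                         rent_price_ratio):
    """Sum of owner-occupied value and the implied rental stock value."""
    return owner_occupied_value + rental_stock_value(
        aggregate_monthly_gross_rent, rent_price_ratio
    )


# Hypothetical county: $2.0bn owner-occupied stock, $1m monthly gross rent,
# and an average rent-price ratio of 0.05.
example_value = total_property_value(2.0e9, 1.0e6, 0.05)
```

Under these assumptions the hypothetical county's rental stock is worth about $240m, for a total of about $2.24bn.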

2.9 Appendix B: Data Choices

1. U.S. County Sample

Our unit of analysis is a consistent-border county from 1950 to 2010. We generate these counties by aggregating 1950 counties. Please see the final Appendix Table for the specific details of aggregation.32

The 1956 County Business Patterns allowed for reporting of only 100 jurisdictions per state, leading to the reporting of aggregate values for agglomerations of counties in states with many counties. See Duranton et al. (2014) for the initial collection of these data, and additional details. To resolve the problem of making these 1956 units consistent with the 1950 census units, we disaggregate the 1956 CBP data in the agglomerated reporting into individual counties, attributing economic activity by population weights.
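The population-weighted disaggregation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the county labels, and the choice of 1950 population as the weight are assumptions.

```python
def disaggregate_by_population(group_total, county_populations):
    """Split a total reported for a group of counties across its members,
    in proportion to each county's population."""
    total_population = sum(county_populations.values())
    if total_population <= 0:
        raise ValueError("group population must be positive")
    return {
        county: group_total * population / total_population
        for county, population in county_populations.items()
    }


# Hypothetical agglomeration of three counties reporting 1,000 employees in total.
populations_1950 = {"county_a": 50_000, "county_b": 30_000, "county_c": 20_000}
employment_1956 = disaggregate_by_population(1_000, populations_1950)
```

By construction the county values sum to the reported group total, so aggregate employment and payroll are preserved.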

Alaska and Hawaii were not states in 1950. We omit Alaska from our sample, because in 1950 it has only judicial districts, which do not correspond to modern counties. We keep Hawaii, where the 1950 borders are relatively equivalent to modern counties. We also keep Washington, DC, in all years.

We also make a few additional deletions:

• Two counties that only appear in the data (1910-1930) before our major period of analysis: Campbell, GA (13/041) and Milton, GA (13/203).

• Two problematic counties. Menominee, WI (55/078) was created in 1959 out of an Indian reservation; it has very few people. Broomfield, CO (08/014) was created in 2001 from parts of four other counties.

32 These groupings relied heavily on the very helpful work of the Applied Population Laboratory group at the University of Wisconsin. See their documentation at http://www.netmigration.wisc.edu/datadictionary.pdf.

• Two counties where land area changes are greater than 40 percent. These are Denver County, CO (08/031) and Teton County, WY (56/039).

2. County Business Patterns data

• For some county/industry groupings, there is a disclosure risk in reporting either the total number of employees or the total payroll. In such cases, we convert the disclosure code (“D” in the years before 1974) to 0.

• “Payroll” is first quarter payroll.

3. Income distribution calculations

• We use binned income data. In 1950, the number in each bin is the total number of families and unrelated individuals. To be consistent, in 2010 we also use the total number of families plus unrelated individuals.

• To calculate percentiles, we assume that income is uniformly distributed within bins, with the exception of the top bin, which has no top code.

• For the top bin, we assume that income follows a Pareto distribution with parameter $\alpha$. We assume that $\alpha = \max(\hat{\alpha}, 1)$. Let $N_B$ be the number of people in the top income bin and $N_{B-1}$ be the number of people in the second highest bin. Similarly, let $L_B$ be the lower bound of the top income bin and $L_{B-1}$ be the lower bound of the second highest income bin. Then

$$\hat{\alpha} = \frac{\log(N_B + N_{B-1}) - \log(N_B)}{\log(L_B) - \log(L_{B-1})}.$$
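The percentile calculation can be sketched as below. This is a minimal illustration under stated assumptions: income is uniform within each closed bin, the top bin follows a Pareto distribution with the α defined above, and the lower bound of the second highest bin is positive. The function names and the example bins are hypothetical.

```python
import math


def pareto_alpha(n_top, n_second, l_top, l_second):
    """Pareto tail parameter from the two highest bins, floored at 1."""
    alpha_hat = (math.log(n_top + n_second) - math.log(n_top)) / (
        math.log(l_top) - math.log(l_second)
    )
    return max(alpha_hat, 1.0)


def income_percentile(p, lower_bounds, counts):
    """Income at percentile p (0 < p < 1) from binned counts.

    lower_bounds[b] is the lower bound of bin b (ascending); the last bin is
    open-ended. counts[b] is the number of units in bin b.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    total = sum(counts)
    target = p * total  # number of units at or below the percentile
    cumulative = 0.0
    last = len(counts) - 1
    for b in range(last):
        n = counts[b]
        if n > 0 and cumulative + n >= target:
            # Uniform interpolation inside a closed bin.
            lo, hi = lower_bounds[b], lower_bounds[b + 1]
            return lo + (hi - lo) * (target - cumulative) / n
        cumulative += n
    # The percentile falls in the open-ended top bin: invert the Pareto tail.
    alpha = pareto_alpha(counts[last], counts[last - 1],
                         lower_bounds[last], lower_bounds[last - 1])
    share_above = (total - target) / counts[last]
    return lower_bounds[last] * share_above ** (-1.0 / alpha)


# Hypothetical bins: [0, 1000), [1000, 2000), [2000, 4000), [4000, inf).
bounds = [0, 1000, 2000, 4000]
counts = [20, 40, 30, 10]
median_income = income_percentile(0.5, bounds, counts)
p95_income = income_percentile(0.95, bounds, counts)
```

In this example the median falls in the second bin and is found by linear interpolation, while the 95th percentile falls in the top bin and is found by inverting the Pareto tail.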

Figure 2.5 – Evolution of Ship Sizes

[Figure: ship silhouettes for three vintages: WWII technology (134x17x9); first container ships, 1956 to 1970s; today, Post-Panamax.]

Source: WWII, authors; remaining ships, Rodrigue (2017).

Figure 2.6 – Instrument Variation vs. Pre-Treatment Covariates: All Instruments

(a) (b)

(c) (d)

(e) (f)

Notes: “Identifying variation” is the residual from a regression of the instrument (county is within d1 to d2 km of a “very deep” port in 1953) on the full set of covariates. Appendix Figures 2.6a, 2.6c, and 2.6e plot the identifying variation versus the residual of a regression of 1910 log population on the full set of covariates from Table 2.2. Appendix Figures 2.6b, 2.6d, and 2.6f plot the identifying variation versus the residual from a regression of total dollars of 1948 international trade at ports between d1 to d2 km of a county, conditional on the full set of covariates.

Figure 2.7 – IV Estimates Indistinguishable From Zero at 300 km

Notes: This picture reports coefficients from the specification in column 8 of Table 2.2, but parameterizes ∆Ci,t as six indicator variables, one for each distance bin of {0 to 50, 50 to 100, 100 to 150, 150 to 200, 200 to 250, 250 to 300} km. Each dot is an estimated coefficient from this regression and gray bands portray the 90% and 95% confidence intervals.
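The six-indicator parameterization can be illustrated with a short sketch. The function name is hypothetical, and treating each bin as closed on the left and open on the right is an assumption, since the notes do not say how boundary distances are assigned.

```python
def distance_bin_indicators(distance_km,
                            edges=(0, 50, 100, 150, 200, 250, 300)):
    """Return one 0/1 indicator per 50 km bin for the distance to the nearest
    container port; counties beyond the last edge get all zeros."""
    return [
        1 if lower <= distance_km < upper else 0
        for lower, upper in zip(edges[:-1], edges[1:])
    ]


# A county 120 km from the nearest container port falls in the 100 to 150 km bin.
indicators = distance_bin_indicators(120)
```

Each county activates at most one indicator, so the six coefficients are read relative to counties more than 300 km from a container port.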

Figure 2.8 – Containerization’s Impact Increases Over Time

Notes: This picture reports coefficients from the specification in column 8 of Table 2.2, but where the dependent variable is the change in log population from 1950 to the year reported on the horizontal axis and the endogenous variable is the change in containerization status from 1950 to the year reported on the horizontal axis. Each dot corresponds to an estimated coefficient by distance bin. Full circles are significant at the 5 percent level; hollow circles are insignificant coefficients.

Figure 2.9 – Depth and Likelihood of Containerization, World Cities

[Figure: likelihood of containerization by year t (vertical axis, 0.00 to 1.00) against year (horizontal axis, 1955 to 2015), with one line per 1953 port depth category: {0-10}, {10-15}, {15-20}, {20-25}, {25-30}, {30-35}, {35-40}, {>40}.]

Notes: Lines in the figure report the likelihood that a city will have a containerized port within 300 km in year t by the depth of the deepest port within 300 km in 1953. We use thick lines for cities near ports that we classify as “very deep,” and thin lines for the remaining cities. The likelihood of being proximate to a container port is greater the closer the city is to a very deep 1953 port.

Table 2.7 – Complete First Stage Specification

                                        1 if Nearest Container Port is d1 to d2 km of county
                                          0 to 100     100 to 200     200 to 300
                                            (1)           (2)            (3)
County is d1 to d2 of a very deep port
  0 to 100 km                             0.542***       0.068         -0.012
                                          (0.067)       (0.066)        (0.046)
  100 to 200 km                           0.015          0.605***      -0.013
                                          (0.034)       (0.049)        (0.042)
  200 to 300 km                          -0.016         -0.017          0.632***
                                          (0.027)       (0.04)         (0.052)

R-squared                                 0.584          0.462          0.416
Joint F test, instruments                 22.4           59             56.8
Mean, dependent variable                  0.122          0.173          0.146

Notes: All estimations use 3,023 observations. Stars denote significance levels: * 0.10, ** 0.05, and *** 0.01. The F test values in this table are from a test of joint significance of the three reported coefficients. Table 2.2 reports the Kleibergen-Paap F statistic, as suggested by Sanderson and Windmeijer (2016).

Table 2.8 – Midwest Counties Have No First Stage and Reduced Form Impacts Are Zero

Columns (1) to (3), First Stage: 1 if Closest Container Port is d1 to d2 km of county, for 0 to 100, 100 to 200, and 200 to 300 km. Column (4), Reduced Form: Change in Log Population, 1950 to 2010.

                                            (1)        (2)        (3)        (4)
County is d1 to d2 of a very deep port
  0 to 100 km                             0.065     -0.332*     0.248*     -0.04
Joint F test, instruments                 0.4        5.3        13.5

Rows 100 to 200 km and 200 to 300 km (cells interleaved in the source, listed in source order): 0.015 -0.031 -0.011 -0.376*** 0.496*** -0.06 0.05 0.078
Standard errors (source order): (0.146) (0.083) (0.073) (0.188) (0.175) (0.137) (0.146) (0.143) (0.106) (0.158) (0.103) (0.081)
Mean, dependent variable, and R-squared (interleaved in the source, listed in source order): 0.16 0.329 0.35 0.397 0.512 0.264 0.269 0.332

Notes: Stars denote significance levels: * 0.10, ** 0.05, and *** 0.01. The sample is restricted to the Midwest Census region, which has no very deep ports. All regressions use 702 observations and cluster standard errors at the 2010 commuting zone.

Table 2.9 – World City Characteristics by Distance to Nearest Containerized Port

                         Distance to Containerized Port, km
                       0 to 100   100 to 200   200 to 300   Ever Cont.   Never Cont.
                         (1)         (2)          (3)          (4)          (5)
Log Population
  1950                  12.52       12.03        12.05        12.32        11.98
                       [1.11]      [0.94]       [0.88]       [1.06]       [0.81]
  2010                  13.97       13.55        13.60        13.81        13.61
                       [1.04]      [0.85]       [0.80]       [0.98]       [0.80]
Continent
  Africa                 0.10        0.09         0.09         0.10         0.05
  Asia                   0.36        0.40         0.40         0.38         0.59
  Australia              0.02        0.00         0.00         0.01         0.00
  Europe                 0.25        0.23         0.23         0.24         0.19
  North America          0.18        0.19         0.20         0.19         0.11
  South America          0.09        0.08         0.09         0.09         0.08

Observations             373         159          102          634          417

Note: The unit of observation in this table is a city with at least 50,000 inhabitants in 1950. We report means and standard deviations in brackets. See data appendix for more details on the world sample.

Table 2.10 – Complete First Stage Estimates for World Sample

                                        1 if Closest Container Port is d1 to d2 km of city
                                          0 to 100     100 to 200     200 to 300
                                            (1)           (2)            (3)
City is d1 to d2 of a very deep port
  0 to 100 km                             0.573***      -0.019         -0.033
                                          (0.045)       (0.033)        (0.028)
  100 to 200 km                           0.020          0.579***      -0.033
                                          (0.048)       (0.050)        (0.032)
  200 to 300 km                           0.006          0.099*         0.511***
                                          (0.045)       (0.047)        (0.055)

R-squared                                 0.653          0.457          0.406
Joint F test, instruments                 75.7           58.4           37.7
Mean, dependent variable                  0.355          0.151          0.097

Notes: All estimations use 1,051 observations. Stars denote significance levels: * 0.10, ** 0.05, and *** 0.01. The F test values in this table are from a test of joint significance of the three reported coefficients. Table 2.4 reports the Kleibergen-Paap F statistic, as suggested by Sanderson and Windmeijer (2016).

Table 2.11 – County Groupings for Consistent Counties

Columns: State | State FIPS | Grouped County FIPS | Initial County Name | Initial County FIPS | Notes

Arizona       | 04 | 027 | La Paz County            | 012 | Used to be part of Yuma County (04/027)
Florida       | 12 | 086 | Miami Dade               | 025 | Name change, from Dade County to Miami-Dade, yielded a numbering change.
Hawaii        | 15 | 010 | Kalawao County           | 005 |
Hawaii        | 15 | 010 | Maui County              | 009 |
Montana       | 30 | 067 | Yellowstone County       | 113 | Yellowstone County is merged into Park County (30/067)
Nevada        | 32 | 510 | Ormsby County            | 025 | Becomes Carson City (32/510)
New Mexico    | 35 | 061 | Cibola County            | 006 | Used to be part of Valencia County (35/061)
South Dakota  | 46 | 041 | Armstrong County         | 001 | Is merged into Dewey County (46/041)
South Dakota  | 46 | 071 | Washabaugh County        | 131 | Is merged into Jackson County (46/071)
Virginia      | 51 | 900 | Albemarle County         | 003 |
Virginia      | 51 | 901 | Alleghany County         | 005 |
Virginia      | 51 | 906 | Arlington County         | 013 |
Virginia      | 51 | 902 | Augusta County           | 015 |
Virginia      | 51 | 903 | Bedford County           | 019 |
Virginia      | 51 | 903 | Campbell County          | 031 |
Virginia      | 51 | 904 | Carroll County           | 035 |
Virginia      | 51 | 905 | Chesterfield County      | 041 |
Virginia      | 51 | 915 | Dinwiddie County         | 053 |
Virginia      | 51 | 924 | Elizabeth City           | 055 |
Virginia      | 51 | 906 | Fairfax County           | 059 |
Virginia      | 51 | 907 | Frederick County         | 069 |
Virginia      | 51 | 904 | Grayson County           | 077 |
Virginia      | 51 | 908 | Greensville County       | 081 |
Virginia      | 51 | 909 | Halifax County           | 083 |
Virginia      | 51 | 905 | Henrico County           | 087 |
Virginia      | 51 | 910 | Henry County             | 089 |
Virginia      | 51 | 911 | James City County        | 095 |
Virginia      | 51 | 912 | Montgomery County        | 121 |
Virginia      | 51 | 800 | Nansemond City           | 123 | Is later folded into Suffolk County (51/800)
Virginia      | 51 | 913 | Norfolk County           | 129 |
Virginia      | 51 | 914 | Pittsylvania County      | 143 |
Virginia      | 51 | 915 | Prince George County     | 149 |
Virginia      | 51 | 913 | Princess Anne            | 151 |
Virginia      | 51 | 916 | Prince William County    | 153 |
Virginia      | 51 | 917 | Roanoke County           | 161 |
Virginia      | 51 | 918 | Rockbridge County        | 163 |
Virginia      | 51 | 919 | Rockingham County        | 165 |
Virginia      | 51 | 920 | Southampton County       | 175 |
Virginia      | 51 | 921 | Spotsylvania County      | 177 |
Virginia      | 51 | 924 | Warwick County           | 189 |
Virginia      | 51 | 922 | Washington County        | 191 |
Virginia      | 51 | 923 | Wise County              | 195 |
Virginia      | 51 | 924 | York County              | 199 |
Virginia      | 51 | 906 | Alexandria City          | 510 |
Virginia      | 51 | 903 | Bedford City             | 515 |
Virginia      | 51 | 922 | Bristol City             | 520 |
Virginia      | 51 | 918 | Buena Vista City         | 530 |
Virginia      | 51 | 900 | Charlottesville City     | 540 |
Virginia      | 51 | 913 | Chesapeake City          | 550 |
Virginia      | 51 | 901 | Clifton Forge City       | 560 |
Virginia      | 51 | 905 | Colonial Heights City    | 570 |
Virginia      | 51 | 901 | Covington City           | 580 |
Virginia      | 51 | 914 | Danville City            | 590 |
Virginia      | 51 | 908 | Emporia City             | 595 |
Virginia      | 51 | 906 | Fairfax City             | 600 |
Virginia      | 51 | 906 | Falls Church City        | 610 |
Virginia      | 51 | 920 | Franklin City            | 620 |
Virginia      | 51 | 921 | Fredericksburg City      | 630 |
Virginia      | 51 | 904 | Galax City               | 640 |
Virginia      | 51 | 924 | Hampton City             | 650 |
Virginia      | 51 | 919 | Harrisonburg City        | 660 |
Virginia      | 51 | 915 | Hopewell City            | 670 |
Virginia      | 51 | 918 | Lexington City           | 678 |
Virginia      | 51 | 903 | Lynchburg City           | 680 |
Virginia      | 51 | 916 | Manassas City            | 683 |
Virginia      | 51 | 916 | Manassas Park City       | 685 |
Virginia      | 51 | 910 | Martinsville City        | 690 |
Virginia      | 51 | 800 | Nansemond County         | 695 |
Virginia      | 51 | 924 | Newport News City        | 700 |
Virginia      | 51 | 913 | Norfolk City             | 710 |
Virginia      | 51 | 913 | Portsmouth City          | 710 |
Virginia      | 51 | 923 | Norton City              | 720 |
Virginia      | 51 | 915 | Petersburg City          | 730 |
Virginia      | 51 | 924 | Poquoson City            | 735 |
Virginia      | 51 | 912 | Radford City             | 750 |
Virginia      | 51 | 905 | Richmond City            | 760 |
Virginia      | 51 | 917 | Roanoke City             | 770 |
Virginia      | 51 | 917 | Salem City               | 775 |
Virginia      | 51 | 909 | South Boston City        | 780 |
Virginia      | 51 | 913 | South Norfolk City       | 785 |
Virginia      | 51 | 902 | Staunton City            | 790 |
Virginia      | 51 | 913 | Virginia Beach City      | 810 |
Virginia      | 51 | 902 | Waynesboro City          | 820 |
Virginia      | 51 | 911 | Williamsburg City        | 830 |
Virginia      | 51 | 907 | Winchester City          | 840 |
Wyoming       | 56 | 039 | Yellowstone Park County  | 047 | Is merged into Teton County (56/039)

Note attached to one Virginia entry between Lynchburg City and South Norfolk City (row not identifiable in the source layout): Appears for a few years in County Business Patterns data as a county.

Chapter 3

Subways and Urban Air Pollution

with Marco Gonzalez-Navarro, Stefano Polloni, and Matthew Turner

3.1 Introduction

We investigate the effect of subway system openings on urban air pollution. We rely on two principal data sources. The first describes the universe of world subway systems. The second is a remotely sensed measure of particulates, Aerosol Optical Depth (AOD), recorded by the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Terra and Aqua earth observing satellites between 2000 and 2014. These data allow us to measure airborne particulates everywhere in the world, monthly, with approximately 3 km spatial resolution. Our strategy for establishing the causal effect of subways on AOD relies on a comparison of changes in particulates within a city around the time of a new subway system opening. The data provide clear evidence of a structural break in an average city’s AOD level around the time that it opens its subway network and do not indicate a trend break at any time in our sampling frame. The magnitude of this break is about 4% and is about constant over 48 post-subway months. In fact, the 4% decrease in AOD is evident over all 96 post-subway months we observe, although estimates over this longer horizon are less precise and less well identified.


Consistent with the fact that subways tend to disproportionately serve central cities, we find that the effect of subway openings declines with distance from the city center. The data indicate that subways do not have the same effect on AOD in all cities. We find no evidence that richer, more populous, more rainy, or cities that open more extensive networks respond differently to subway openings, although we find suggestive evidence that more polluted and Asian cities experience larger AOD decreases following a subway opening. Finally, we find that subway openings have larger effects on AOD and on ridership than do the first expansions, and that the effect of the first expansion is larger than subsequent expansions. Our findings are important for three reasons. Subways are often proposed as a policy response to urban air pollution. For example, Vollmer Associates et al. (2011) list air pollution reduction as an objective for New York City’s 2nd avenue subway expansion. Our analysis provides a basis for assessing their cost effectiveness relative to other remediation policies. Apart from this paper, we are aware of only one study, Chen and Whalley (2012), that measures the effect of subways on air pollution. Like the present investigation, Chen and Whalley (2012) use an event study research design. Unlike the present study, Chen and Whalley (2012) study the opening of a single subway. In contrast, we study all of the 43 subway openings and 104 expansions that occurred anywhere in the world between February 2000 and December 2014. Thus, we dramatically improve on our ability to assess whether subway construction, in fact, reduces urban air pollution. Second, our estimates of the reduction in pollution following subway openings and ex- pansions, together with existing estimates of the health implications of particulates, allow us to calculate the value of averted mortality that follows from subway openings. 
We estimate that, for an average city in our sample, a subway opening prevents 9.4 infant and 221 total deaths per year. Using standard income-adjusted life values, this averted mortality is worth $21m and $594m per year, respectively. These estimates do not include the effects of particulate reduction on morbidity or on productivity and so probably understate actual health benefits. Although available subway capital cost estimates are crude, the estimated external health effects represent a significant fraction of construction costs, particularly for subway systems with costs at the low end of the observed range.

Finally, little is known about transportation behavior in developing countries, and we shed indirect light on this important topic. First, and interestingly, we find no evidence that developing world and developed world cities respond differently to subways. This supports the idea that, at least in this regard, the two classes of cities are similar. Second, a back-of-the-envelope calculation suggests that subways typically account for between 1.5% and 10% of trips within a few years of their opening. Given what is known about the relationship between subway ridership and traffic, and between traffic and pm10, this level of ridership can plausibly account for the observed 4% reduction in particulates that follows a subway opening only if subways divert trips that would otherwise have occurred in particularly dirty vehicles or at particularly congested times. This is consistent with evidence that public transit serves the poor and that subways are much more heavily used at peak times for vehicle traffic.

3.2 Literature

While the effects of subways have been studied extensively, these studies have overwhelmingly focused on within-city variation in the relationship between proximity to a subway and housing prices or population density, e.g., Gibbons and Machin (2005) or Billings (2011). There are few studies which, like ours, exploit cross-city variation in subways. Among the exceptions, Baum-Snow and Kahn (2005) study a small sample of US subway systems to examine the relationship between subways and ridership, and Voith (1997) also examines a cross-section of cities to investigate the relationship between subways and ridership. Finally, Gonzalez-Navarro and Turner (2016) exploit the same underlying panel data on subway stations that we use here to examine the effect of subway systems on long run urban population growth.

To our knowledge, the literature contains only a single paper (Chen and Whalley, 2012) examining the relationship between subways and urban air quality. Chen and Whalley (2012) examine changes in air pollution in central Taipei during the year before and after the opening of the Taipei subway in March of 1996. They use hourly pollution measurements from several measuring stations in central Taipei, together with hourly ridership data over the same period. By examining the change in pollution levels around the time of the system opening, they estimate an approximately 5%-15% reduction in Carbon Monoxide from the subway opening, about the same effect on Nitrous Oxides, but little effect on either Ozone or particulates.

It is useful to contrast this with our findings. We employ essentially the same research design, but consider the universe of subway cities over a considerably longer time horizon. On the other hand, we are restricted to a single measure of pollution, AOD, and to monthly frequencies. Our results are slightly different.
Chen and Whalley (2012) find that the Taipei subway caused a 5-15% reduction in Carbon Monoxide. The confidence interval we estimate for AOD overlaps substantially with this range. However, Chen and Whalley (2012) find no effect on particulates, although our estimated 4% effect lies well within the confidence bounds of their estimated effect of the subway opening on Taipei’s particulates. Together with the fact that our estimates indicate considerable heterogeneity in the effect of subways across cities, this suggests the difference between our finding and Chen and Whalley (2012) may reflect sampling error. Unfortunately, the opening of the Taipei subway predates the availability of the remotely sensed AOD data on which our analysis is based, so we cannot carry out a replication of the Chen and Whalley (2012) result in our sample.

3.3 Data

To investigate the effect of subways on urban air pollution we require data for a panel of cities describing subways, air pollution, and control variables. Our air pollution data are based on remotely sensed measures of suspended particulates. Our subways data are the result of primary data collection. We describe these data and their construction below, before turning to a description of control variables.

3.3.1 Subways

We use the same subways data as Gonzalez-Navarro and Turner (2016) organized into a monthly panel. These data define a ‘subway’ as an electric powered urban rail system that is completely isolated from interactions with automobile traffic and pedestrians. This excludes most streetcars because they interact with vehicle and pedestrian traffic at stoplights and crossings. Underground streetcar segments are counted as subways. The data do not distinguish between surface, underground or aboveground subway lines as long as the exclusive right of way condition is satisfied. To focus on intra-urban subway transportation systems, the data exclude heavy rail commuter lines (which tend not to be electric powered). For the most part, these data describe public transit systems that would ordinarily be described as ‘subways’, e.g., the Paris and the New York City subways, and only such systems. As with any such definition, the inclusion or exclusion of particular marginal cases may be controversial.

On the basis of this definition, the data report the latitude, longitude and date of opening of every subway station in the world. We compiled these data manually between January 2012 and February 2014 using the following process. First, using online sources such as http://www.urbanrail.net/ and links therein, together with links on Wikipedia, we compiled a list of all subway stations worldwide. Next, for each station on our list, we recorded opening date, station name, line name, terminal station indicator, transfer station indicator, city and country. We obtained latitude and longitude for each station from Google Maps. We use the subways data to construct a monthly panel describing the count of operational stations in each subway city between February 2000 and December 2014, the time period for which our air pollution data are available. By connecting stations with the most direct possible routes, we approximate network maps and can calculate route length.

In 2014, 171 cities had subways: 63 in Asia, 62 in Europe and 30 in North America. South America, Australia and Africa together account for the remaining 16. These 171 subway systems consist of 8,889 stations. Subway systems in Europe, North America and Asia are all about the same size, and those in South America are distinctly smaller. On average, the world stock of subway stations increased by about 200 per year and grew by about one third between 2000 and 2014.

Our data on subway systems begin in the 19th century. However, our satellite pollution measures are more recent. Thus, our analysis will rely on subway openings that occurred between 2000 and 2014. Table 3.8 lists all 43 subway system openings in the world between 2000 and 2014, by date, together with basic information about the cities where they are located. Subways opened fairly uniformly throughout the period and the average opening date is February 2007. During 2000-2014, an average system at the end of its first year of operation had a route of length 19.2km and 14.3 stations, usually on one line, but sometimes on two or more.
Our analysis hinges on the ability to observe a subway city for some time before and after an opening. Thus, we face a trade-off between sample size, the length of time we observe cities, and maintaining a constant sample of cities. While we experiment with other study windows, our primary econometric exercise considers the change in AOD in the period extending from 18 months before until 18 months after a subway opening. Since the AOD data cover February 2000 to December 2014, to base this exercise on a constant sample of cities, we must restrict our attention to subways that open between August 2001 and July 2013. In Table 3.8, we see that these are the 39 cities beginning with Rennes and ending with Brescia. To consider the effect of subways over a longer horizon, in some specifications we consider a longer period after the subway opening. In this case, maintaining a constant sample of cities for each month requires that we drop cities that have openings near the end of the study period.

In theory, our research design could confound the effects of system openings with those of expansions that occur soon afterward. In fact, this is rare in our data. Among the 39 cities that make up our main sample, only 4 experienced an expansion of their subway system within 18 months of the system opening. We also experiment with dropping from our sample cities that experience an expansion soon after an opening. Our results are robust to such changes in sample.

For 30 of these 43 cities we are able to gather ridership data describing unlinked trips, mostly from annual reports or statistical agencies.¹ Ridership is reported at the monthly level for 19 of these 30. For the other 11, we interpolate to calculate monthly ridership from quarterly or yearly data. Table 3.8 also reports mean daily ridership for each city where data are available, at the end of the first year of the system’s operation.
For an average city in our sample, about 97,514 people rode the subway on an average day in the twelfth month of the system’s operation. Figure 3.1 shows the evolution of ridership as a function of time from opening for the first five years of system operation. The horizontal axis in this figure is months from the opening date. The vertical axis is mean daily ridership per 1000 of city population. We see that ridership doubles over the three years following system opening and stabilizes at about 60 riders per thousand residents. Finally, we determine the date when construction began for each subway opening in our sample. On average, construction begins 80 months prior to opening. The 25th and 75th percentile of construction duration are, respectively, 46 and 97 months. As we describe below, this variation in construction duration is important because it allows us to check whether the effect of subways on pollution is determined by the opening or the construction of the system.

3.3.2 Aerosol Optical Depth measurements from the Terra and Aqua earth observing satellites

The Moderate Resolution Imaging Spectroradiometers aboard the Terra and Aqua Earth observing satellites provide daily measures of aerosol optical depth of the atmosphere at a 3km spatial resolution everywhere in the world (Levy and Hsu et al., 2015b,a). Remer et al. (2013) provide a description of how the AOD measure is constructed. Loosely, these instruments operate by comparing reflectance intensity in a particular band against a reference value and attributing the discrepancy to particulates in the air column.²

The MODIS data are available for download at ftp://ladsweb.nascom.nasa.gov/allData/6/. Data are available in ‘granules’ which describe five minutes of satellite time. These granules are available more or less continuously from February 24, 2000 until December 31, 2014 for the Terra satellite, and from July 4, 2002 until December 31, 2014 for Aqua. The complete Terra and Aqua data consist of about 1.6m granules. During January of 2016, we downloaded all available granules and subsequently consolidated them into daily rasters describing global AOD. Each of these daily aggregates describes about 86m pixels covering the earth in a regular grid of 3km cells. With 28 satellite years of daily observations, this means that our monthly AOD data result from aggregating about 850b pixel-day measurements of AOD. Appendix 3.10 describes data processing in more detail.

Figure 3.2 shows the resulting images for the entire world for June 1, 2014 and for average AOD over 2000 to 2014, both from the Terra satellite. Red indicates higher AOD readings. Unsurprisingly, the figures show high AOD in India and China. Myhre et al. (2008) attribute high AOD over Central and Western Africa to anthropogenic biomass burning in the region. White areas indicate missing data. Because light surfaces are highly reflective, the algorithm for recovering AOD from reflectance values performs poorly over them, so missing data are common in desert regions and over snow (Levy et al., 2013).

¹ Table 3.7 reports data sources for ridership data.
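As a rough check on the magnitudes quoted above, the following sketch recomputes the number of satellite years and the implied count of pixel-day measurements from the operating dates and raster size given in the text. It assumes one global raster per satellite per calendar day with no gaps, which is a simplification of the actual processing; the constant and function names are illustrative only.

```python
from datetime import date

# Operating windows quoted in the text (both end dates inclusive).
TERRA = (date(2000, 2, 24), date(2014, 12, 31))
AQUA = (date(2002, 7, 4), date(2014, 12, 31))
PIXELS_PER_DAILY_RASTER = 86e6  # "about 86m pixels" per global daily raster


def satellite_days(start, end):
    """Number of calendar days from start to end, inclusive."""
    return (end - start).days + 1


# Assumption: one raster per satellite per day, with no missing days.
total_days = satellite_days(*TERRA) + satellite_days(*AQUA)
satellite_years = total_days / 365.25
pixel_days = total_days * PIXELS_PER_DAILY_RASTER

print(f"{satellite_years:.1f} satellite-years, "
      f"{pixel_days / 1e9:.0f}b pixel-days")
```

Under these assumptions the total comes to a little over 27 satellite years and somewhat more than 850b pixel-days, in line with the rounded figures reported in the text.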

The MODIS instrumentation can only record AOD on cloud free days. In the June 1 image, much of the missing data reflects cloud cover, though some reflects the fact that Terra’s polar orbit brings it over most, but not all, of the earth’s surface each day. Because AOD reporting is sensitive to cloud cover and light surfaces, there is seasonality in the MODIS data. We see more missing data in the Northern Hemisphere in the winter than in the summer. The counter-cyclical Southern Hemisphere phenomenon also occurs but is less dramatic.

² Formally, Aerosol Optical Depth is

\[
\text{AOD} = -\ln\left(\frac{\text{light arriving at ground}}{\text{light arriving at top of atmosphere}}\right).
\]

That is, it is a measure of the fraction of incoming light reflected by the air column before reaching the ground. Since at least zero light is reflected by the atmosphere, AOD must be non-negative and is increasing in the share of reflected light (Jacob, 1999, p. 105). The nominal scale of AOD reported by MODIS is 0 to 5000, although we have rescaled to 0 to 5 for legibility, as is common in the literature.

With daily images in hand, it is straightforward to construct monthly averages. Figure 3.3 panel (a) illustrates the AOD data for Bangalore in June of 2014. To show scale, the large circle in this image is 10km in radius. Bangalore is noteworthy in two regards. First, there are relatively few pixels for which we have no AOD reading over the month. Second, there is a wide range in AOD readings at this scale of observation. The corresponding picture for Delhi is entirely black. Panel (b) provides the corresponding image for Bangalore in December 2014.

We construct the monthly images presented in Figure 3.3 by averaging within each pixel over the course of a month. This means that, if we were to average over an area in the monthly images, a pixel which we observe only once during the month would receive the same weight as one that we observe many times. Therefore, to calculate a city-level monthly AOD measure, we instead average over a whole disk centered on the city for each day, and then average these city-day measures, weighting by the number of pixels observed in each day.
Thus, our measure of AOD within 10km of the center of a city is an average of all pixel-days of AOD readings that fall in this region during the month. We calculate this average for both satellites using disks of radius r ∈ {10km, 25km, 50km, 150km} for all cities in our sample.

Table 3.1 provides worldwide and continental summary statistics. In 2014, the average AOD reading within 10km of a city center from the Aqua satellite was 0.42. It was higher in Asian cities, 0.56, and dramatically lower in European and North American cities. The corresponding reading from Terra is slightly higher. The top panel of Table 3.1 also reports AOD measurements based on disks with radius 25km centered on each city. Unsurprisingly, these larger disks have slightly lower AOD levels than the smaller and more central 10km disks. As for the 10km disks, AOD measures based on Terra are slightly higher than those for Aqua, and Asian cities are more polluted than non-Asian cities. Table 3.8 reports the mean and standard deviation of AOD for each of our 43 new subway cities using the Terra satellite.

In an average month, the AOD reading for an average 10km city disk is based on 109 pixel-days for Aqua and 123 for Terra. Since the pixels are nominally 3km, if all possible pixel-days in a 10km disk were recorded over a month, we would expect about (365/12) × π × (10km/3km)² ≈ 1061 pixel-days. Thus, conditional on observing one or more pixel-days, our city-month AOD values are based on measurements of about 10% of possible pixel-days. About 10% of city-months contain zero pixel-day observations and do not appear in our sample. An average AOD reading in a 25km disk is based on about 9 times as many pixel-days as the smaller 10km disks. Since the area covered by these disks is only about 6 times as large, we record a higher proportion of possible measurements in the large disks. We suspect that this reflects two factors.
First, the satellites may be worse at measuring AOD over highly reflective built environments, and so do worse over more densely built up central cities. Second, pixels are included in a disk on the basis of their centroid location, and this makes it easier for the smaller disks to ‘just miss’ including pixels.

The second panel of Table 3.1 presents AOD averages for 2000 for 10 and 25km disks. Because only the Terra satellite was in operation in 2000, the second panel presents only these measures. Comparing across years, we see that variation across years is small compared to the level and compared to variation across continents. Table 3.1 suggests a slight downward trend in European and North American AOD, a slight upward trend in South America and Asia, and no obvious trend in the 43 cities as a whole.

Figure 3.7 illustrates the extent of satellite coverage for our sample of 43 subway cities. Panel (a) shows the count of cities by month for which we observe AOD in a 10km disk surrounding the city’s center for each of the two MODIS satellites. Panel (b) plots mean AOD for 10km disks centered on the central business districts of our 43 subway cities. Both panels show a strong seasonal pattern in AOD. This reflects seasonal variation in cloud cover and motivates our use of city-by-calendar month indicators in all of our main regressions.

Together the two panels of Figure 3.7 suggest that a relationship exists between the extent of coverage and the level of AOD. In fact, a regression of average AOD in a city-month disk on the count of pixel-days used to calculate that average reveals a slightly positive relationship. We conjecture that this reflects the fact that the air is cleaner in rainy places where cloud cover is more common.
Regardless of the reason, we have experimented broadly with sampling rules that reduce the importance of city-months for which AOD data is sparse and with controlling for the count of pixel-days used to construct each city-month average. In most of our regression results we control for the number of pixel-days used to construct each city-month AOD observation.
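The aggregation rule and the coverage arithmetic described in this subsection can be sketched as follows. This is a minimal illustration rather than the processing code actually used: the function names and the three example days are hypothetical, and only the 3km pixel size, the 10km and 25km radii, and the 109 and 123 pixel-day counts come from the text.

```python
import math


def city_month_aod(daily_means, daily_pixel_counts):
    """Pixel-weighted average of daily disk means.

    Equivalent to pooling every pixel-day in the disk over the month.
    Returns None when no pixel-days were observed.
    """
    total = sum(daily_pixel_counts)
    if total == 0:
        return None
    weighted = sum(m * n for m, n in zip(daily_means, daily_pixel_counts))
    return weighted / total


def expected_pixel_days(radius_km, pixel_km=3.0, days_per_month=365 / 12):
    """Pixel-days in a disk if every pixel were observed every day."""
    return days_per_month * math.pi * (radius_km / pixel_km) ** 2


# Hypothetical month with three days; the third day has no usable pixels.
aod = city_month_aod([0.40, 0.60, 0.50], [10, 30, 0])

possible_10km = expected_pixel_days(10)   # roughly 1061 pixel-days
share_aqua = 109 / possible_10km          # roughly 10% of possible pixel-days
share_terra = 123 / possible_10km
area_ratio = (25 / 10) ** 2               # 25km disk area relative to 10km disk
```

In the hypothetical example the day with 30 pixels receives three times the weight of the day with 10, so the city-month value is pulled toward 0.60. The area ratio of 6.25 is the "about 6 times" quoted above.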

3.3.3 Other control variables

In Section 3.4 we validate the relationship between the MODIS data and ground based measurements in our sample. Consistent with the large related literature, which we describe in Section 3.4, we find that local weather conditions are important determinants of MODIS AOD. Given this, we construct several controls for city-month weather conditions. The CRU gridded dataset from Harris et al. (2014) provides high-resolution monthly climatic data describing cloud cover percentage, frost day frequency, mean temperature, precipitation, and vapour pressure. We use these data to calculate monthly and annual averages of these variables over disks centered on each city. These are our ‘climate controls’.

We also include city population and country GDP per capita to characterize the level of economic activity of each city. Our population data come from the United Nations’ 2014 Revision of the World Urbanization Prospects (DESA Population Division, 2014). These data describe annual population counts for all urban agglomerations with populations exceeding 300,000 at any time between 1950 and 2014 and also provide coordinates for the centers of all of the cities they describe. With a few exceptions that we adjusted by hand on the basis of lights at night data, we use these coordinates for the centers of all of our cities. We use the Penn World Tables to obtain annual measures of country GDP (Feenstra et al., 2015) for all cities in our sample.³ We interpolate these annual GDP and population data to construct monthly measures. Table 3.8 reports population and country level GDP per capita for each city in the 12th month after the subway system opening. Cities that open subways are large, with an average population of 3.7m, and tend to be in middle or high income countries.
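One way to build monthly series from annual population or GDP figures is sketched below. The text does not state which interpolation method is used, so linear interpolation, and the choice to pin each annual value to January of its year, are assumptions made purely for illustration; the function name and the example values are hypothetical.

```python
import numpy as np


def annual_to_monthly(years, values):
    """Linearly interpolate annual observations onto a monthly grid.

    Assumption: each annual value is dated January of its year. The grid
    runs from January of the first year to January of the last year.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    n_months = int(round((years[-1] - years[0]) * 12)) + 1
    grid = years[0] + np.arange(n_months) / 12.0
    return grid, np.interp(grid, years, values)


# Hypothetical city population in millions for three consecutive years.
grid, pop_monthly = annual_to_monthly([2010, 2011, 2012], [3.0, 3.6, 3.9])
```

The same helper applies to any annual series; other conventions, such as mid-year dating or spline interpolation, would give slightly different monthly values.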

3.4 Aerosol Optical Depth versus ground based measurements

A series of papers compare measures of AOD to measures of particulate concentration from surface instruments (e.g. Gupta et al., 2006; Kumar et al., 2007, 2011). In particular, Kumar et al. (2007) examine the ability of AOD to predict particulates in a set of large cities, several of which are subway cities. Broadly, this literature concludes that AOD is a good measure of airborne particulates, with two caveats. First, satellite reports of AOD describe daytime average conditions over a wide area at the particular time the satellite passes overhead, while ground based instruments record conditions at a particular location, often over a period of hours. This causes an obvious divergence between satellite and ground based measures. In addition, ground based instruments report the concentration of dry particulates, while the satellite based measure has trouble distinguishing water vapor from other particles. This suggests that some method of accounting for differences in climate will be important when we examine the relationship between subways and AOD.

³ Two of our subway cities, Algiers and Dubai, lie in countries for which the Penn World Tables do not provide GDP (Algeria and UAE). For these two cities, we use country level GDP information from the World Bank.

As a direct check on our AOD data, we use World Health Organization data (WHO, 2016a) describing average annual pm10 and pm2.5 concentrations (µg/m³) in cities where ground-based pollution readings were available. We successfully match 143 such cities to our subway cities data. Of the 143, 20 report readings during four years, 80 during three years, 28 during two years and 15 in only one year. The readings span the 2003-2014 period, and not all city-years record both pm10 and pm2.5. Averaging monthly AOD values to calculate yearly averages, we obtain 316 comparable city-years for pm10 and 227 comparable city-years for pm2.5. To compare the WHO ground based annual measures of particulates to annual averages of MODIS AOD measurements in subway cities, we estimate the following regressions

\[
\text{PM}^{y}_{it} = \alpha_0 + \alpha_1 \text{AOD}_{it} + \text{controls}_{it} + \epsilon_{it},
\]

where $y \in \{2.5, 10\}$ is particulate size, $i$ refers to cities and $t$ to years for which we can match WHO data to our AOD sample.

Table 3.2 reports results. The first column of the upper panel presents the results of a regression of the WHO measure of pm10 on annual average Terra AOD within 10km of a subway city center. There is a strong positive relationship between the two quantities and the R² of the regression is 0.54. The AOD coefficient of 122.85 in column 1 means that a one unit increase in AOD maps to a 122.85 µg/m³ increase in pm10. From Table 3.1, we see that Terra 10km readings for North America decreased by 0.06 in subway cities between 2000 and 2014. Multiplying by 122.85 gives a decrease of about 7 µg/m³. By contrast, according to US EPA historical data, during this same period US average pm10 declined from 65.6 to 55.0 µg/m³, or about a 10 unit decrease.⁴ Since Table 3.1 reports AOD for just the four cities in North America with new subways, while the EPA reports area weighted measures for the US, this seems reasonably close.

All specifications reported in Table 3.2 assume a linear relationship between AOD and PM. Figure 3.4(a) plots residuals of regressions of pm10 and AOD on all controls used in column 2 of Table 3.2, along with linear and locally weighted regression lines. This graph illustrates both how closely the two variables track each other and how close to linear the relationship between them is.

⁴ https://www.epa.gov/air-trends/particulate-matter-pm10-trends, accessed April 3, 2017.
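The conversion from a change in AOD to an implied change in pm10 used in the North American comparison is a single multiplication by the column 1 slope. A minimal sketch, using only figures quoted in the text (the names are illustrative):

```python
# Slope from column 1 of Table 3.2: µg/m³ of pm10 per unit of Terra AOD.
PM10_PER_AOD = 122.85


def implied_pm_change(delta_aod, slope=PM10_PER_AOD):
    """Change in particulate concentration implied by a change in AOD.

    Only the slope is needed for changes; the regression intercept
    cancels when differencing two predicted levels.
    """
    return slope * delta_aod


# Terra 10km AOD in North American subway cities fell by 0.06 over 2000-2014.
aod_implied_change = implied_pm_change(-0.06)   # about -7.4 µg/m³
# US EPA average pm10 fell from 65.6 to 55.0 µg/m³ over the same period.
epa_change = 55.0 - 65.6                        # about -10.6 µg/m³
```

The same function applies to the other columns of Table 3.2 by passing a different slope, for example the coefficient from a specification with controls.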

In column 2, we conduct the same regression but include linear and quadratic terms in our annual climate variables, counts of AOD pixel-days, and continent indicators. The coefficient on AOD drops from 122.85 to 99.62, and the R² increases to 0.75. In column 3, we add controls for city population and country GDP. This decreases the coefficient on AOD to 75.26 and increases the R² of the regression to 0.81. Columns (4) to (6) replicate (1) to (3) but use pm2.5 as the dependent variable. These regressions have greater predictive power, with qualitatively similar results. The AOD coefficients are about half as large, which is consistent with the pm10 to pm2.5 conversion factors used by the World Health Organisation (WHO, 2016a). Finally, the lower panel of Table 3.2 shows analogous regressions for Aqua. We conduct, but do not present, corresponding regressions where our AOD measure is based on a 25km disk around the city center instead of 10km. All results are broadly similar.⁵

Recall that the ground-based instruments and MODIS, in fact, measure something different. Ground-based instruments measure pollution at a point over an extended period of time. Remote sensing measures particulates across a wide area at an instant. Given this difference, the extent to which the two measures appear to agree seems remarkable.

In addition to validating the use of remotely sensed AOD, Table 3.2 provides a basis for translating our estimates of the relationship between subways and AOD into a relationship between subways and pm2.5, or pm10. To illustrate this process, and to help to describe our data, Figure 3.4(b) provides a histogram of the 12,169 city-months used for our main econometric analysis. The figure provides three different scales for the horizontal axis. The top scale is the raw AOD measure. The other two axes are affine transformations of the AOD scale into pm10 and pm2.5 based on columns 1 and 4 of Table 3.2. For reference, the black line in the figure gives the World Health Organization recommended maximum annual average pm10 exposure level (20 µg/m³).

⁵ We note that the results in Table 3.2 are quite different from those on which the 2013 Global burden of disease estimates are based (Brauer et al., 2015). In particular, they estimate

\[
\ln(\text{pm}_{2.5}) \approx 0.8 + 0.7 \ln(\text{AOD}).
\]

Comparing with Table 3.2, we see that these coefficient estimates are quite different. The difference reflects primarily our use of the level of pm2.5, rather than its logarithm, as the dependent variable. We also use the level of AOD rather than its logarithm as the explanatory variable. Since AOD is typically around 0.5, this turns out not to be important. Finally, our sample describes a different and more urban sample of locations, relies on annual rather than daily data, and measures AOD using just MODIS data rather than an average of MODIS and a measure imputed using a climate model and ground based emissions release information. We prefer the formulation in Table 3.2 to that in Brauer et al. (2015) for three reasons. First, AOD is already a logarithm (see footnote 2), so the Brauer et al. specification uses the logarithm of a logarithm as its main explanatory variable. Second, mortality and morbidity estimates are typically based on levels of pollutants, not on percentage changes, so the dependent variable in our regressions is more immediately useful for evaluating the health implications of changes in AOD. Finally, we control for weather conditions, which appears to be important. In any case, the R² in both studies is of similar magnitude.

3.5 The relationship between subway system openings and AOD

In our primary econometric exercise, we examine changes in AOD around the time that a city opens its subway system. Let $i = 1, \ldots, I$ index subway cities and $t$ index months between February 2000 and December 2014. Thus, $\text{AOD}_{it}$ denotes AOD in city $i$ at time $t$. We are interested in changes to AOD in the months around a system opening. In fact, we usually observe each city twice in each month, once with each of the two satellites, but suppress the satellite subscript for legibility.

If city $i$ opens its subway in month $t'$, then define $\tau_{it} = t - t'$. That is, $\tau_{it}$ is ‘months since the subway opened’, with months before the opening taking negative values. Let $k$ describe the window over which we analyze AOD, i.e., $\tau_{it} \in \{-k, \ldots, 0, \ldots, k\}$. We will most often be interested in the case of $k = 18$, that is, the 37 month period extending from 18 months before until 18 months after a subway opening. There are 39 cities that open a subway for which this entire window falls within the February 2000 to December 2014 range of our AOD data. In order to keep a constant sample of cities for each month, we most often consider this set.

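Now define the following families of indicator variables,

\[
D_{it}(j) =
\begin{cases}
1 & \tau_{it} = j \\
0 & \text{otherwise,}
\end{cases}
\tag{3.1}
\]

\[
D_{it}(j, j') =
\begin{cases}
1 & \tau_{it} \in \{j, \ldots, j'\} \\
0 & \text{otherwise,}
\end{cases}
\tag{3.2}
\]

\[
D_{it}(\tau \lessgtr j') =
\begin{cases}
1 & \tau_{it} \lessgtr j' \\
0 & \text{otherwise.}
\end{cases}
\tag{3.3}
\]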
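The three indicator families in equations (3.1) to (3.3) can be written out directly. The sketch below is illustrative only: it assumes months are coded as consecutive integers (for example, months elapsed since February 2000), and the function names are hypothetical.

```python
def months_since_opening(t, t_open):
    """tau_it = t - t': negative before the opening, zero in the opening month."""
    return t - t_open


def d_single(tau_it, j):
    """Equation (3.1): 1 if the city-month is exactly j months from opening."""
    return int(tau_it == j)


def d_step(tau_it, j, j_end):
    """Equation (3.2): 1 if tau_it lies in {j, ..., j_end}."""
    return int(j <= tau_it <= j_end)


def d_before(tau_it, j):
    """Equation (3.3), 'less than' case: 1 if tau_it < j."""
    return int(tau_it < j)


def d_after(tau_it, j):
    """Equation (3.3), 'greater than' case: 1 if tau_it > j."""
    return int(tau_it > j)


# Hypothetical city opening in month 60; the observed month is 61, so tau = 1.
k = 18
tau_it = months_since_opening(61, 60)
row = (d_single(tau_it, 1), d_step(tau_it, 1, k),
       d_before(tau_it, -k), d_after(tau_it, k))
```

For this city-month the single-month and step indicators equal one, and both out-of-window indicators equal zero, because one month after opening lies inside the 37 month window.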

Equation (3.1) describes indicators for sets of city-months that are the same number of months away from the month when their subway system opens. Equation (3.2) describes indicators for a set of step functions beginning $j$ and ending $j'$ months from the subway opening month. Finally, equation (3.3) describes indicators for city-months that are far enough from the subway opening date that they fall outside our study window.

Sections 3.3 and 3.4 indicate the following patterns in the data. The AOD data are seasonal and the pattern of seasonality varies across the globe. The AOD measurements reported by Terra are systematically larger than those reported by Aqua, but otherwise the two satellites track each other very closely. Remotely sensed AOD may confound water vapor with anthropogenic particles. City-month AOD averages are slightly increasing in the number of pixel-days on which they are based. Finally, there are long run trends in pollution, and these trends are probably different on different continents.

Given this, we include the following controls in our regressions: a satellite indicator, year-by-continent indicators, and city-by-calendar month indicators. We also include the count of AOD pixel-days used to calculate the city-month AOD and linear and quadratic terms of our monthly climate variables. Finally, the control set sometimes includes linear and quadratic terms in country level GDP and city level population. Subject to the change from annual to monthly observations, these control variables closely correspond to those we used in our investigation of the relationship between pm10, pm2.5 and AOD in Table 3.2.

Following Andrews (1993), Hansen (2000), and Andrews (2003), we check for a structural break in the value of AOD around the time of system opening by estimating a series of regressions of AOD on a step function as we allow the timing of the step to traverse the study period. The relevant regressions are,

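$$\begin{aligned} AOD_{it} = {} & \alpha_0 + \alpha_{1j} D_{it}(j, k) \\ & + \alpha_2 D_{it}(\tau < -k) + \alpha_3 D_{it}(\tau > k) + \text{controls}_{it} + \varepsilon_{it} \end{aligned} \tag{3.4}$$

for all $j \in \{-0.75k, \ldots, 0, \ldots, 0.75k\}$.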
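As a reading aid, the indicators in (3.1)-(3.3) and the three event-time regressors of (3.4) can be sketched in a few lines of Python. This is a minimal illustration, not the estimation code used for the chapter: the function names are hypothetical, `tau` stands for event time in months relative to opening, and the controls and the regression itself are omitted.

```python
def d_month(tau, j):
    """Equation (3.1): 1 if the city-month is exactly j months from opening."""
    return 1 if tau == j else 0


def d_step(tau, j, j_prime):
    """Equation (3.2): 1 if tau lies in {j, ..., j_prime}."""
    return 1 if j <= tau <= j_prime else 0


def d_outside(tau, k):
    """Equation (3.3): indicators for tau < -k and tau > k (outside the window)."""
    return (1 if tau < -k else 0, 1 if tau > k else 0)


def step_regressors(tau, j, k):
    """Event-time regressors of equation (3.4) for one city-month.

    Hypothetical helper: returns the step D(j, k) and the two
    out-of-window indicators; controls are not modelled here.
    """
    far_pre, far_post = d_outside(tau, k)
    return {"step": d_step(tau, j, k), "far_pre": far_pre, "far_post": far_post}


# Example: with k = 18 and a candidate break at j = 1, a city-month five
# months after opening is "on the step"; one 20 months before is far-pre.
example_on_step = step_regressors(5, 1, 18)
example_far_pre = step_regressors(-20, 1, 18)
```

Scanning for a break then amounts to rebuilding the `step` column for each candidate $j$ and re-estimating the regression, keeping the out-of-window indicators fixed.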

For our main analysis we set $k = 18$. This window length strikes a balance between maintaining the set of cities from which we identify our coefficient of interest and having a long analysis window. Thus, we use a 37 month study window and estimate equation (3.4) for each month in $j \in \{-14, \ldots, 14\}$ with errors clustered at the city level. We then calculate a Wald test of $\alpha_{1j} = 0$ for each $j$. By including pre- and post-period indicators in these regressions we use all city-months in our sample to estimate city-by-calendar month indicators, continent-by-year indicators and climate variables, while only using AOD variation near the subway opening date to identify the effect of subways on AOD.

Panel (a) of Figure 3.5 plots these test statistics. The figure shows a clear pattern. Wald statistics increase from a low level to a peak right after the subway opening, before quickly falling. That is, this plot suggests that the subway system opening leads to a break in the AOD sequence, and that this break occurs around the time when the system opens.6

6Andrews (2003) gives (asymptotic) critical values for the test statistic values we have just generated, a ‘sup-Wald’ test for $\alpha_{1j} = 0$ for all $j$. For our case, where the break in question affects only one parameter and where we trim 25% from the boundaries of the sample, the 5% critical value of this statistic is 7.87, less than the largest value that we observe for the Wald statistic in the months after a system opening. With this said, our estimation framework differs from the one for which this test statistic is derived in several small ways, and so we regard this test with some caution.

On the basis of Panel (a) of Figure 3.5, we assume a break in AOD levels that coincides with the first full month of subway system operations, i.e., $\tau = 1$. Conditional on such a break, we next check for a change in the trend of AOD associated with subway openings. We proceed much as in our test for a break, but instead look for a change in trend around the time of a subway opening. Formally, this means estimating the following set of regressions,

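$$\begin{aligned} AOD_{it} = {} & \alpha_0 + \alpha_1 \tau_{it} + \alpha_{2j} \tau_{it} D_{it}(j, k) + \alpha_3 D_{it}(1, k) \\ & + \alpha_4 D_{it}(\tau < -k) + \alpha_5 D_{it}(\tau > k) + \text{controls}_{it} + \varepsilon_{it} \end{aligned} \tag{3.5}$$

for all $j \in \{-0.75k, \ldots, 0, \ldots, 0.75k\}$.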

As before, we estimate the regression (3.5) for each month in $j \in \{-14, \ldots, 14\}$ with errors clustered at the city level and calculate the Wald test for $\alpha_{2j} = 0$ for each regression.7 Panel (b) of Figure 3.5 plots these Wald statistic values as $j$ varies.8 Thus, conditional on a step at $\tau = 1$, subway openings do not seem to cause a change in the trend of AOD in a city. Figure 3.5 suggests that subway openings cause a one time shift in a city’s level of AOD, and no other change in the evolution of AOD.

Given this evidence for the existence and timing of an effect of subway openings on the evolution of AOD, we turn to estimating its size.9 To illustrate patterns in the data, we also estimate,

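$$\begin{aligned} AOD_{it} = {} & \alpha_0 + \sum_{j \in \{-k, \ldots, k\} \setminus \{0\}} \alpha_{1j} D_{it}(j) \\ & + \alpha_2 D_{it}(\tau < -k) + \alpha_3 D_{it}(\tau > k) + \text{controls}_{it} + \varepsilon_{it} \end{aligned} \tag{3.6}$$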

7An alternative would be to simultaneously search for locations of the break and trend break. Hansen (2000) argues that sequential searching, as we do, arrives at the same result.

8All values are well below the 10% critical value of 6.35 given in Andrews (2003). Again, our framework differs from the framework under which this test statistic is derived so this test should be regarded with caution.

9Hansen (2000) provides an elegant method for estimating the location and size of the break along with confidence intervals. This method assumes a balanced panel, and so we have not been able to apply it to our data.

Control variables are the same as those used in our test for a structural break. Since the indicator for $\tau_{it} = 0$ is omitted, the $\alpha_{1j}$ measure the mean difference in AOD for city-months with $\tau_{it} = j$ from those with $\tau_{it} = 0$. Figure 3.5 plots as dots the 36 monthly values of $\alpha_{1j}$ that result from conducting regression (3.6) with $k = 18$. The vertical axis of this figure indicates AOD. From Table 3.1, sample mean AOD is about 0.44, so monthly variation in AOD is small relative to the mean level. Inspection of Figure 3.5 suggests a drop in AOD around the time of a system opening.

We are concerned that AOD responds not to subway openings, but to subway construction. This might occur for at least three reasons. First, construction is dusty, and so ending it might reduce pollution independent of the effect of the opening of the subway. Second, construction is expensive and labor intensive. AOD might decrease in response to an overall decrease in economic activity when construction ends. Finally, subways might occupy space otherwise used for cars and thereby affect traffic through a channel unrelated to the subway’s intended function.

To assess the importance and plausibility of these hypotheses, we collect data describing the start date of construction. These data, detailed in Table 3.8, indicate that on average construction began almost seven years prior to opening, with considerable variation around this mean. These data allow us to replicate the event study described in equation (3.6) around the beginning of construction. The bottom right panel of Figure 3.5 presents the results and does not support the idea that the start of subway construction affects city AOD levels. While it is easy to come up with reasons why some event related to subway construction rather than subway openings might affect AOD, the data do not offer immediate support for such hypotheses.

To estimate the size of the post-subway drop in AOD, and to investigate its robustness, we conduct regressions of the following form,

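$$\begin{aligned} AOD_{it} = {} & \alpha_0 + \alpha_1 D_{it}(1, k) \\ & + \alpha_2 D_{it}(0) + \alpha_3 D_{it}(\tau < -k) + \alpha_4 D_{it}(\tau > k) + \text{controls}_{it} + \varepsilon_{it}. \end{aligned} \tag{3.7}$$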
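To make the interpretation of $\alpha_0$ and $\alpha_1$ in (3.7) concrete, the sketch below drops the controls. In that stripped-down case the regressors are mutually exclusive group indicators, so the OLS intercept is the mean AOD in the 18 pre-opening months and $\alpha_1$ is the post-minus-pre difference in means. The data are made up for illustration and the function names are hypothetical; this is not the specification estimated in Table 3.3, which includes the full control set.

```python
from statistics import mean


def window_group(tau, k=18):
    """Assign a city-month to one of the mutually exclusive groups in (3.7)."""
    if tau < -k:
        return "far_pre"
    if tau > k:
        return "far_post"
    if tau == 0:
        return "opening_month"  # partly treated month, absorbed by its own dummy
    return "pre" if tau < 0 else "post"


def alpha_without_controls(observations, k=18):
    """Return (alpha0, alpha1) for equation (3.7) with no controls.

    observations: iterable of (tau, aod) pairs. With only group dummies,
    alpha0 is the pre-window mean and alpha1 the post-minus-pre difference.
    """
    groups = {}
    for tau, aod in observations:
        groups.setdefault(window_group(tau, k), []).append(aod)
    alpha0 = mean(groups["pre"])
    alpha1 = mean(groups["post"]) - alpha0
    return alpha0, alpha1


# Hypothetical toy data: (event time in months, AOD).
toy = [(-3, 0.50), (-1, 0.48), (0, 0.47), (2, 0.47), (5, 0.47), (30, 0.40)]
alpha0, alpha1 = alpha_without_controls(toy)
```

The opening month and the out-of-window observations do not enter either coefficient, which mirrors how the corresponding dummies isolate variation close to the opening date.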

This is a regression of AOD on an indicator for post-opening and controls. As before, we include pre- and post-period indicators in order to isolate AOD variation for city-months close to the time of their subway opening. We include a dummy variable for city-months when $\tau_{it} = 0$. This month is ‘partly treated’ and this specification ignores AOD variation for these partly-treated city-months. The intercept, $\alpha_0$, gives the conditional mean AOD over the city-months where $\tau_{it} \in \{-18, \ldots, -1\}$. The coefficient of interest is $\alpha_1$, the difference in conditional mean AOD between city-months with $\tau_{it} \in \{-18, \ldots, -1\}$ and those with $\tau_{it} \in \{1, \ldots, 18\}$.

Table 3.3 presents results. In column 1 of Table 3.3 we conduct the regression given in equation (3.7) with our basic set of controls.10 We see that the effect of subways on AOD in the 18 months following a subway opening, $\alpha_1$, is about −0.02 and is different from zero at the 5% level. Column 2 adds country level GDP and city population. Column 3 adds city specific trends. Column 4 adds country level GDP and population and city specific trends. Column 5 adds city specific trends and allows for a trend break in each city at time zero. Column 6 adds country level GDP and population to the specification of column 5. The estimated effect on AOD of subway opening is stable across these various specifications. The difference between the smallest and largest coefficients (columns 3 and 6) is 0.002. This is about 25% of the standard error of the most precise estimate in column 5. Panel (c) of Figure 3.5 shows the magnitude of the AOD coefficient from column 2, as well as 95% confidence bounds.

10For reference, we observe 39 cities where subways open, for 12 months per year, for 14 years with two satellites. Multiplying gives 13,104. This slightly exceeds the size of our main regression sample because the Aqua satellite begins operation in 2002 rather than 2000 for Terra, and because of a few city-months for which a satellite reports zero AOD pixels.

All regressions in Table 3.3 include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, calendar month×city indicators, year×continent indicators, and pre- and post-period indicators. Removing climate controls or AOD pixel count does not have an important effect on the subway coefficient or the regression R². We also experimented broadly with sampling rules that reduce the importance of city-months for which AOD data are sparse and find that this does not qualitatively affect our results. However, calendar month×city indicators are important. If we omit these indicators, all else equal, the R² of our regressions drops from about 0.83 to about 0.49, while the coefficient for subway openings about doubles in the first four columns of Table 3.3, but is not distinguishable from zero when we include city specific trends with a break at zero in columns 5 and 6. That the correlation between subway openings and AOD does not depend on weather seems unsurprising. Since the number of AOD pixels primarily reflects cloud cover, the same intuition extends to the pixel count control. The importance of the calendar month controls is also expected. City level seasonality is an important determinant of AOD levels.

Cameron et al. (2008) find that asymptotic standard errors may not approximate exact finite sample standard errors in samples containing 30 or fewer clusters. With 39 cities/clusters our main sample is not much larger.
To confirm that the asymptotic standard errors reported in Table 3.3 approximate exact values, we implement the wild-cluster bootstrap procedure recommended in Cameron et al. (2008). We report wild cluster bootstrap p-values using 300 bootstrap replications in square brackets in Table 3.3. The statistical significance of the estimates is unchanged when we use the wild cluster bootstrap instead of the asymptotic standard errors. Given this, we report only asymptotic standard errors for the rest of our results.

Table 3.3 strongly suggests that opening a subway network decreases a city’s AOD by about 0.02 over the 18 month period following the opening. The mean value of AOD across cities in our sample before subways open is 0.49, so this 0.02 subway effect on AOD represents a 4% reduction on average.

Regression results so far focus on average effects over the 18 months before and after a subway opening. Table 3.9 refines the results of Table 3.3 by decomposing the study window into 6 month bins. These estimates are broadly consistent with inspection of panel (c) of Figure 3.5 and Table 3.3. Subways have a negative effect on AOD during each six month period following a subway opening. These effects are largest during the period from 7-12 months after an opening. Unsurprisingly, our estimates are somewhat imprecise, and only the effect during the 7-12 months post-opening is distinguishable from zero at conventional confidence levels. Estimates of the subway effect on AOD in the two pre-periods are dramatically smaller than in the post-periods, but are estimated with about the same precision. In particular, and reassuringly, they are not distinguishable from zero.

In columns 5 and 6 of Table 3.3 we see that our estimates are robust to the inclusion of city specific trends, before and after the opening.
Together with the results of Table 3.9 this suggests that our results are not driven by confounding trends in AOD around the time of subway openings.

Table 3.3 presents estimates of the average effect of subways on a city. It is natural to suspect that, in fact, subways do not affect all cities in the same way. To investigate this possibility, we estimate a version of equation (3.7) where we allow the effect of subways to vary across cities. Figure 3.6 summarizes these results. The horizontal axis gives the magnitude of the subway effect for each city, and the vertical axis the corresponding standard error. Shaded regions indicate whether the city effect in question is distinguishable from zero at usual confidence levels. This figure suggests considerable heterogeneity in how cities respond to subways.

We conduct a number of exercises in order to discern a pattern in this heterogeneity. Table 3.10 investigates whether subways have different effects on different types of cities. In this table, we replicate the results of column 2 of Table 3.3, but add an interaction between the post-subway indicator and a particular city characteristic, along with the characteristic in question. In order, the interactions considered are: an indicator for cities in the top half of the city size distribution in 2000, an indicator for cities in the bottom half of the country income distribution in 1990, an indicator for above median average rainfall, an indicator for cities whose subway opening included more than the median number of stations, an indicator for cities in Asia, and finally, an indicator for cities in the top half of the AOD distribution in the year prior to their subway opening.
Including the interaction has only a small effect on the main treatment effect, and the interaction coefficient is small and indistinguishable from zero, when we consider heterogeneity of effects by city size, country income level, mean precipitation, and initial subway size (columns 1-4 in the table). Heterogeneous effects are more evident when we differentiate between Asian cities and the rest (column 5), and indicate that Asian cities experience larger pollution reductions with the introduction of their subway systems (−0.025 vs −0.015), although the interaction term is not statistically different from zero. Column 6 distinguishes between high and low pollution cities. As would be expected, the estimates suggest that pollution effects are larger (−0.029 vs −0.012) among initially more polluted cities. In fact, this specification indicates that pollution reduction effects can only be differentiated from zero for initially higher pollution cities.

3.5.1 Longer time horizons

Subways are durable and their effects probably extend over decades. Hence, it is of interest to extend our estimates of the effects of subways to the longest possible horizon that our data permit. Unfortunately, considering a longer treatment period requires that we degrade our research design in one of two ways. As we consider longer treatment periods we must either allow later post-treatment effects to reflect a decreasing set of cities, opening the door to confounding composition with subway effects, or else restrict attention to progressively smaller samples of cities, reducing precision and raising questions of external validity. In spite of this, the importance of obtaining estimates over a time horizon that more nearly approximates the planning horizon suggests that such estimates will be useful, even though we have less confidence in them.

In Table 3.4 we continue to consider a pre-treatment period beginning 18 months before an opening, but consider longer post-treatment periods. In columns 1 and 2 we consider two years after an opening, and allow the subway effect to vary by year using a specification that is otherwise the same as we used in Table 3.3. As in Table 3.3, the two columns differ only in that column 2 includes controls for city population and country level GDP, while column 1 does not. We see that the one year effect is about −0.02, statistically identical to our estimate of the 18 month effect in Table 3.3. Point estimates of the second year effect are slightly smaller and are estimated with about the same precision as the one year effect. We can reject neither the hypothesis that the second year effect is zero nor that it is the same as the one year effect. Panel (b) of Table 3.4 estimates the average subway effect over the two year post-period considered in the first two columns. Unsurprisingly, this average effect is statistically different from zero, but not from −0.02.
In order to extend our sample to two years post-treatment with a constant sample of cities in all months, we lose a city and the first two columns of Table 3.4 reflect 38 subway openings instead of 39.

In columns 3 and 4 of Table 3.4 we extend the post-treatment period to 36 months. The effects in each of the three post-treatment years are negative and estimated precisely enough to be distinguished from zero. Point estimates are slightly increasing in magnitude with time from the system opening, although the change is not large relative to the magnitude or precision of the estimates. To consider the three year post-treatment period with a constant sample of cities in each month, we restrict our sample to 35 cities. Thus, the slight change in the year 1 and 2 treatments from columns 1 and 2 to columns 3 and 4 may reflect the change in sample.

Finally, columns 5 and 6 consider a four year post-treatment period. These columns indicate slightly larger effects than in the other columns, and suggest that the effect of subways increases over time, although the precision of these estimates does not allow us to reject the hypothesis that the effect is constant in all post-treatment years. To preserve a constant sample of cities over this longer horizon, we restrict our sample to only 28 subway openings. Again, the average effect presented in panel (b) is not distinguishable from −0.02.

Table 3.5 extends the analysis horizon by dropping the requirement that the sample of cities is constant throughout the window of analysis and allows us to show estimates of the effect of subways over the five to eight years after a system opening. To begin, columns 1 and 2 of Table 3.5 look at the effect of subways for a five year post-treatment period following the specification of column 2 of Table 3.3, except for the longer treatment period.
For reference, column 1 restricts attention to the set of 26 cities for which we observe the entire post-treatment period.11 In column 1 we see that all yearly effects are negative, and, except for the very large effect in the fifth year, not far from our baseline estimate of about −0.02. Switching to the larger sample of 39 cities in column 2, we see that coefficient magnitudes and standard errors decrease slightly. In columns 3 to 5, we consider progressively longer treatment periods. Panel (b) reports the average effect over the whole relevant post-treatment period. In every case this average effect is quite close to our baseline specification in column 1 of Table 3.3.

Inspection of these results reveals three patterns. First, across specifications, point estimates are within 1.64 standard errors of −0.02 in almost all cases. Second, only the first, third and fifth year effects are significantly different from zero and the fifth year effect seems anomalously large relative to the other years. Finally, there is no obvious trend in coefficient magnitudes with time from opening, while in all columns except 1, standard errors increase monotonically, or very nearly so, with time from opening. That is, unsurprisingly, as the subway opening is more remote, its effects are progressively more difficult to detect, and beyond 5-6 years we are unable to distinguish annual subway effects from zero. The lower panel in the table reports mean effects across post-treatment years and points to surprisingly stable estimates even as the number of post-treatment years is extended (−0.019 in column 2 versus −0.018 in column 5). These results are consistent with a persistent AOD decrease of about 0.02 that is gradually overwhelmed by noise or with a transitory effect that lasts about five years.

11In Table 3.8 these 26 cities range from Rennes to Dubai.

3.5.2 Spatial scale of effect

Subways overwhelmingly serve the areas close to the most central part of a city. Gonzalez-Navarro and Turner (2016) document that about 40% of all subway stations in existing subway systems lie within 5km of the center and another 30% within 10km of the center. Thus, we expect larger effects on AOD nearer to city centers than further away, particularly for this sample of new subway system cities. Table 3.6 documents precisely this phenomenon. For reference, the first two columns of this table reproduce the first two columns of Table 3.3. Columns 3 and 4 are identical to columns 1 and 2, except that the dependent variable is mean monthly AOD between 10 and 25km of the center rather than within 10km. Columns 5 and 6 examine AOD between 25 and 50km, and columns 7 and 8 between 50 and 150km. As expected, the effect of subways on AOD decreases with distance from the city center and eventually goes to zero.
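The distance bands used for the dependent variables above can be illustrated with a short sketch. It assigns AOD pixels to the four rings (within 10km, 10-25km, 25-50km, 50-150km) and averages within each ring. Several details are assumptions made only for illustration: the half-open treatment of band boundaries, the unweighted pixel average, the function names, and the toy pixel values.

```python
# Distance rings around the city center, in km. Treating each ring as
# [lower, upper) is an assumption; the text does not specify boundaries.
BANDS_KM = [(0, 10), (10, 25), (25, 50), (50, 150)]


def band_for(distance_km):
    """Return the (lower, upper) ring containing this distance, or None."""
    for lower, upper in BANDS_KM:
        if lower <= distance_km < upper:
            return (lower, upper)
    return None


def mean_aod_by_band(pixels):
    """Average AOD within each ring.

    pixels: iterable of (distance_km, aod) pairs for one city-month.
    Pixels beyond 150km are ignored; empty rings are omitted from the result.
    """
    sums = {}
    for distance_km, aod in pixels:
        band = band_for(distance_km)
        if band is None:
            continue
        total, count = sums.get(band, (0.0, 0))
        sums[band] = (total + aod, count + 1)
    return {band: total / count for band, (total, count) in sums.items()}


# Hypothetical pixels for one city-month: (distance from center in km, AOD).
toy_pixels = [(2, 0.50), (8, 0.46), (12, 0.44), (60, 0.40), (300, 0.10)]
ring_means = mean_aod_by_band(toy_pixels)
```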

3.5.3 Further results

Placebo test: Table 3.11 presents the results of an important placebo test. To conduct this test, we match each subway city to the nearest city within 1000km that has a population within 20% of the target city. We are able to find such a match for 27 of our 39 subway cities. We replicate the regressions of Table 3.3 using placebo city outcomes in Table 3.11. Simple t-tests fail to reject the hypothesis that the subway effect is zero in the placebo sample, but do reject the hypothesis that the subway effect is the same for placebo and subway cities.
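The matching rule for placebo cities can be sketched as follows. The sketch uses great-circle (haversine) distance and reads "population within 20% of the target city" as an absolute gap of at most 20% of the target's population. Both choices, the function names, and the toy cities are assumptions for illustration and are not taken from the chapter.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def find_placebo(target, candidates, max_km=1000.0, pop_tol=0.20):
    """Nearest candidate within max_km whose population is within pop_tol
    of the target's population; returns None if no candidate qualifies.

    candidates should already exclude subway cities (assumed, not checked).
    """
    best = None
    for city in candidates:
        if abs(city["pop"] - target["pop"]) > pop_tol * target["pop"]:
            continue
        dist = haversine_km(target["lat"], target["lon"], city["lat"], city["lon"])
        if dist <= max_km and (best is None or dist < best[0]):
            best = (dist, city)
    return None if best is None else best[1]


# Hypothetical cities on the equator: A is close but too large,
# B qualifies, C has the right size but is too far away.
target_city = {"name": "T", "lat": 0.0, "lon": 0.0, "pop": 1_000_000}
pool = [
    {"name": "A", "lat": 0.0, "lon": 2.0, "pop": 1_500_000},
    {"name": "B", "lat": 0.0, "lon": 5.0, "pop": 900_000},
    {"name": "C", "lat": 0.0, "lon": 20.0, "pop": 1_000_000},
]
match = find_placebo(target_city, pool)
```

Returning `None` when no candidate qualifies corresponds to the subway cities for which no placebo match is available.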

A larger sample of cities and country-by-month indicators: We cannot include country-by-month indicators in our main specification because many of the countries in our sample experience only one subway opening during our study period. Moreover, our results are somewhat sensitive to the inclusion of city-by-calendar month indicators. Given this, we experiment with different configurations of indicator variables to capture country and season effects to check the robustness of our findings.

First, our results are robust to using hemisphere-month fixed effects instead of year-by-continent and city-by-calendar month dummies. Second, in results presented in Appendix Table 3.12, we extend our sample to include all 650 cities in the United Nations’ World Urbanization Prospects data (DESA Population Division, 2014). This much larger sample of cities permits us to re-estimate and confirm our main results in specifications that use country-by-month indicators. This indicates that our results hold even if we allow for completely flexible pollution trends at the country level.

Excluding observations with low pixel count: To check whether our main results are sensitive to the inclusion of city-months for which AOD coverage is sparse, we exclude observations with a number of pixel-days used to calculate AOD that falls below the tenth percentile of the distribution for a given satellite. The results are presented in Appendix Table 3.13 and, as can be seen, they are very similar to those obtained in Table 3.3.
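The sample restriction described above can be sketched as a per-satellite percentile filter. The nearest-rank definition of the tenth percentile, the field names `satellite` and `pixel_days`, and the toy rows are assumptions for illustration; the chapter does not say which percentile definition is used.

```python
def tenth_percentile(values):
    """Tenth percentile using the nearest-rank definition (an assumption)."""
    ordered = sorted(values)
    rank = -(-len(ordered) // 10)  # ceil(n / 10) in integer arithmetic
    return ordered[max(rank, 1) - 1]


def drop_sparse(rows):
    """Drop city-months whose pixel-day count is below the tenth percentile
    of the pixel-day distribution for the same satellite."""
    by_satellite = {}
    for row in rows:
        by_satellite.setdefault(row["satellite"], []).append(row["pixel_days"])
    cutoff = {sat: tenth_percentile(vals) for sat, vals in by_satellite.items()}
    return [row for row in rows if row["pixel_days"] >= cutoff[row["satellite"]]]


# Hypothetical city-months: 20 per satellite with distinct pixel-day counts.
toy_rows = [{"satellite": "Terra", "pixel_days": n} for n in range(1, 21)]
toy_rows += [{"satellite": "Aqua", "pixel_days": n} for n in range(101, 121)]
kept_rows = drop_sparse(toy_rows)
```

Computing the cutoff separately for each satellite keeps a satellite with systematically different coverage from dominating the exclusions.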

Expansions vs openings: Up until now, our investigations have focussed on the effects of the initial opening of a subway. We now turn to an investigation of subsequent expansions. To consider the effect of a city’s first subway expansion (typically the second subway line in the system), as opposed to the system opening, we first restrict attention to the set of 14 cities for which we observe AOD for both the opening and the expansion. Column 1 of Table 3.14 reports the results of a regression like the one reported in column 2 of Table 3.3 to estimate the effect on AOD of a city’s first subway expansion over an 18 month treatment window. The point estimate of this effect is −0.015, smaller than the opening effect from Table 3.3, but we cannot distinguish this effect from zero.

Because we observe the whole history of each subway system, we can also identify the date of the first expansion for cities that do not open their subways during the period when we observe AOD. This increases the number of available expansions from 14 to 26. Column 3 of Table 3.14 reports the estimate of the expansion effect on this larger set of cities. This estimate is smaller in magnitude and more precisely estimated, and while we cannot distinguish it from zero, we also cannot distinguish it from −0.02. Columns 2 and 4 replicate the estimations of columns 1 and 3, but use any expansion as the treatment variable. These estimates are both close to zero and estimated with sufficient precision to allow us to distinguish them from −0.02, but not from zero. In sum, this table provides suggestive evidence that subway expansions are less important than openings.

Ridership: Table 3.15 examines the effects of subway openings and expansions on subway ridership and confirms the conclusions suggested by Table 3.14. An average subway opening results in ridership of about 62 people per 1000 of population. The first expansion, on average, increases ridership by about 45 people per 1000 of population, and subsequent expansions have still smaller effects. In Table 3.8 we see that a subway system consists of 14.4 stations when it opens and that an average first expansion adds 15.7 stations. That is, an average expansion probably adds just about as much capacity as does an average opening. Together with our finding that first expansions seem to have a smaller effect than openings, this suggests that subway networks may not be subject to increasing returns to scale. Alternatively, it may be that subway expansions are built in places where there is less demand for subway travel than along the initial line.

Implications of longer pre-treatment period: The results presented so far are consistent with subway openings causing a decrease in AOD of about 0.02, and with this effect being about constant over at least the subsequent 5 to 8 years. If we instead consider a pre-period of around 48 months, then there are two noteworthy changes. First, from Table 3.8 the set of cities for which we observe a 48 month pre- and post-period begins with Gwangju, South Korea and ends with Shenyang, China, so we are left with a dramatically smaller sample of 21 cities. Second, in this sample there are three extremely low pollution months about 24 months prior to subway opening. The three months do not appear to indicate a break in either the trend or the level of AOD. However, they are sufficiently different from neighboring months that they shift the 48 month pre-period AOD mean. With this said, in regressions analogous to those of Table 3.3 with the longer pre- and post-treatment period and smaller sample, we estimate a decline in AOD around the time of subway opening which is only slightly smaller than what we report in Table 3.3. Figure 3.9 illustrates this longer study period but is otherwise analogous to panel (c) of Figure 3.5.

3.6 Subways, AOD and urban travel behavior

To reconcile our estimate of the effect of subway openings on AOD with other facts about traffic, subways, and pollution, consider the following back of the envelope calculations. On the basis of travel survey data described in Akbar and Duranton (2017), residents of Bogota take about 2.69 trips per day and of these 19.3%, or 0.52 trips per day, are by private car or taxi. From Table 3.8, twelve months after its subway opening an average city in our sample has a population of about 3.7m. Multiplying per capita trips by this population, if people drive at the same rate as does the population of Bogota then an average city in our sample generates about 2m trips by car or taxi per day. From Figure 3.1, in the second year of its operation an average subway in our data provides about 60 rides per 1000 of population. Applying this rate to a city of 3.7m, an average subway system provides about 200,000 rides per day. If all subway rides replace car or taxi rides, this is about 10% of all car or taxi trips in our hypothetical city.

We can perform a similar calculation on the basis of US national averages using the 2009 National Household Travel Survey.12 The 2009 NHTS records that an average US household took 3.8 trips per day, about 90% by car (Duranton and Turner, 2017). If everyone in a city of 3.7m takes as many trips as an average American, then the city generates about 13m trips by car per day. If the subway system provides the same 200,000 or so rides per day and if all subway rides replace car or taxi rides, then the subway opening should reduce car trips by about 1.5%.

For the sake of illustration, suppose a 1% reduction in traffic results in a 1% reduction in PM10 and that there is no demand response for car travel as drivers shift trips to the subway. In this case, the calculations above mean that a subway reduces PM10 by 10% in a city like Bogota and by 1.5% in the US. These magnitudes bracket our sample average 4% AOD reduction. Thus, if a 1% reduction in traffic results in a 1% reduction in PM10 and there is no demand response for car travel, then our estimated 4% AOD reduction seems reasonable.

With this said, the evidence for a direct relationship between driving and metropolitan PM10 levels is inconclusive. Friedman et al. (2001) examine changes in traffic and changes in PM10 in Atlanta around the 1996 Summer Olympics. During this time, the city imposed restrictions on driving and saw traffic fall by about 2.8% over a 17 day period. During the same period, PM10 fell by 16.1%. That is, each 1% reduction in driving was associated with about a 5% reduction in PM10. In a related exercise, Gibson and Carnovale (2015) examine the effect of a change in Milan’s congestion pricing program on both traffic and PM10. Their estimates allow us to calculate that a 4% reduction in traffic caused about a 1% reduction in PM10.13 Finally, the US EPA attributes about 16% of US PM10 to on road vehicles.14 Of these studies, only Friedman et al. (2001) finds that a one percent decrease in traffic is associated with at least a one percent decrease in PM10.

This has important implications for our calculations. Suppose that, in line with Gibson and Carnovale (2015), we require a 4% reduction in traffic for each 1% decrease in PM10. In this case, in order to achieve a 4% reduction in PM10, a subway opening would need to reduce traffic by 16%, not by 4%. This seems implausible. Alternatively, if subway openings reduce driving by 4% then we would expect them to reduce PM10 by 1%.

12nhts.ornl.gov/2009/pub/stt.pdf, Table 3.

13From Gibson and Carnovale (2015) Table 6, treatment increased log PM10 by 0.0404 and, from Table 2, traffic by 26,725 cars. From Table 2, mean traffic and PM10 are 169,744 and 47.66. Using these values to calculate percentage changes we arrive at a 16% change in traffic and 4% change in PM10.

14U.S. EPA, Report on the Environment, https://cfpub.epa.gov/roe/indicator.cfm?i=19, accessed April 2017.
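The back of the envelope calculation above can be reproduced in a few lines. The sketch uses the figures quoted in the text (3.7m residents, 60 subway rides per 1000 of population, the Bogota and US trip rates) and treats the traffic-to-PM10 relationship as a single elasticity. The function names and the elasticity framing are illustrative assumptions. Because the text rounds intermediate values (200,000 rides, 2m and 13m car trips), the unrounded shares computed here come out slightly higher than the 10% and 1.5% quoted.

```python
POPULATION = 3_700_000        # average city, twelve months after opening
SUBWAY_RIDES_PER_1000 = 60    # daily rides per 1000 of population


def car_trip_share_replaced(trips_per_person, car_share,
                            population=POPULATION,
                            rides_per_1000=SUBWAY_RIDES_PER_1000):
    """Share of daily car trips replaced if every subway ride replaces one."""
    car_trips = trips_per_person * car_share * population
    subway_rides = rides_per_1000 / 1000 * population
    return subway_rides / car_trips


def pm10_reduction(traffic_reduction, elasticity=1.0):
    """PM10 reduction implied by a traffic reduction.

    elasticity is the % change in PM10 per 1% change in traffic:
    1.0 is the illustrative benchmark, 0.25 the value implied by
    Gibson and Carnovale (2015) as summarized in the text.
    """
    return traffic_reduction * elasticity


bogota_share = car_trip_share_replaced(2.69, 0.193)  # roughly 0.12
us_share = car_trip_share_replaced(3.8, 0.90)        # roughly 0.018

# A 4% traffic reduction with an elasticity of 0.25 implies a 1% PM10 drop.
pm10_drop = pm10_reduction(0.04, elasticity=0.25)
```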

In addition, the evidence that drivers respond to increased road capacity seems compelling. Duranton and Turner (2011) find that metropolitan area traffic increases in direct proportion to road capacity, a finding that Hsu and Zhang (2014) confirm for Japan. While these papers study road expansions explicitly, the logic of their finding suggests that a reduction in traffic due to increased subway ridership ought to be met with almost exactly offsetting increases in the demand for automobile and truck travel. That is, subways ought not to reduce traffic in urban areas. Duranton and Turner (2011) corroborate this argument by looking at the relationship between the level of driving in a US metropolitan area and the stock of buses and subway cars in the MSA. Although the resulting estimates are imprecise, they do not contradict the hypothesis of no effect.

However, Duranton and Turner (2011) and Hsu and Zhang (2014) study, respectively, US and Japanese metropolitan areas, both with average populations of about 1m. In contrast, we consider a set of cities, all with subways, with average population of nearly 4m, of which only one is Japanese and two are in the continental US. Thus, it may be that the demand responses estimated in Duranton and Turner (2011) and Hsu and Zhang (2014) do not extrapolate to our sample of larger and more international subway cities. Garcia-Lopez, Pasidis and Viladecans-Marsal (2017) provide suggestive evidence that the demand response to additional lanes is smaller for European cities with subways than without. With this said, the hypothesis of no demand response seems improbable.

This, too, has important implications for our calculations. Suppose that the demand response in our sample is only half what Duranton and Turner (2011) find for the US. In this case, in order to reduce driving by one trip, we would require two subway trips. This would halve the effect of subways on PM10 in our back of the envelope calculation.
To sum up, subway openings reduce metropolitan AOD levels by about 4% and probably carry approximately the same fraction of daily trips. However, a significant traffic demand response likely follows a subway opening, and on average a 1% traffic reduction probably reduces pm10 by less than 1%. It follows that the effect of subways on AOD probably does not operate entirely by diverting average trips to the subway. To cause the observed decrease in pm10, subway openings must divert trips that are particularly polluting. We can suggest two reasons why this might occur. First, marginal drivers who switch to the subway may be poor relative to the pool of all drivers, and if so, they may drive particularly dirty cars (or scooters). Thus, the availability of subways may affect pollution by affecting the composition of the stock of cars on the road. There does not seem to be a basis in the literature for estimating the magnitude of this effect. Second, evidence from hourly ridership indicates subways typically provide a disproportionate share of their trips at peak hours. In this case, subway trips may replace trips that occur at congested times and have high external congestion costs. If so, it may be that each car trip shifted to the subway has a disproportionate effect on pollution because it reduces pollution produced by other commuters. Anderson (2014) provides evidence, over the very short run, to support this idea.

3.7 Value of AOD reductions following subway openings

3.7.1 Value of health benefits from estimates in the economics literature

Arceo et al. (2016) use data describing Mexico City between 1997 and 2006 to estimate a weekly infant death rate of 0.24 per 100,000 births per µg/m³ of pm10. Thus, of 100,000 births, a 1 µg/m³ decrease in ambient pm10 averts about 12.5 infant deaths.15 Knittel et al. (2016) use data from California between 2002 and 2007 to estimate a weekly infant death rate of 0.19 per 100,000 births per µg/m³ of pm10. This estimate implies that of 100,000 births, a 1 µg/m³ decrease in ambient pm10 averts 9.9 infant deaths. Chay and Greenstone (2003) consider data describing infant deaths in about 1000 US counties between 1978 and 1984 and estimate that a one µg/m³ decrease in ambient TSP averts about 5.2 infant deaths per 100,000 births.

15 An infant survives its first year if it survives 52 weeks. Thus a weekly death rate of $0.24 \times 10^{-5}$ gives $[1 - (1 - 0.24 \times 10^{-5})^{52}] \times 10^{5} = 12.47$ infant deaths per 100,000 births.
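The weekly-to-annual conversion in footnote 15 can be checked with a few lines of Python. This is only an illustrative sketch: the function name is ours, and it assumes, as the footnote does, a constant weekly death probability applied independently over 52 weeks.

```python
def annual_deaths_per_100k(weekly_rate_per_100k: float, weeks: int = 52) -> float:
    """Convert a weekly death rate per 100,000 into first-year deaths per 100,000 births.

    Assumes a constant weekly death probability, as in footnote 15.
    """
    weekly_prob = weekly_rate_per_100k * 1e-5
    survival = (1.0 - weekly_prob) ** weeks  # probability of surviving all weeks
    return (1.0 - survival) * 1e5


# Weekly rates per 100,000 births per ug/m3 of pm10 quoted in the text.
arceo = annual_deaths_per_100k(0.24)    # Arceo et al. (2016)
knittel = annual_deaths_per_100k(0.19)  # Knittel et al. (2016)
print(f"{arceo:.2f} {knittel:.2f}")
```

Because the weekly probabilities are tiny, the result is very close to simply multiplying the weekly rate by 52.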

Converting from TSP to pm10 is non-trivial; however, pm10 = 0.55 × TSP is a rule of thumb that is sometimes used (World Bank Group and United Nations Industrial Development Organization, 1999). Rescaling the estimate from Chay and Greenstone (2003) implies that a one µg/m³ decrease in ambient pm10 averts about 9.5 infant deaths per 100,000 births (5.2/0.55 ≈ 9.5). In sum, these studies suggest that a one µg/m³ decline in pm10 averts about 10 infant deaths per 100,000 births.16 That none of these estimates can be distinguished from the others, despite mean pm10 ranging from about 28 µg/m³ for Knittel et al. (2016) to about 67 µg/m³ for Arceo et al. (2016), suggests that the infant mortality response is approximately linear in pm10 (as Arceo et al. (2016) observe). Burnett et al. (2014) confirm the approximately linear dose-response relationship suggested by Arceo et al. (2016). More specifically, Burnett et al. (2014) survey the large public health literature on the health consequences of pm10 and find that responses are approximately linear in the range from 5-100 µg/m³, although they find non-linearities outside of this range. Figure 3.4(b) shows that most of our city-months fall in this 5-100 µg/m³ range. Therefore, in light of the results described above, we can reasonably assume a linear dose-response relationship in our sample.

Using these results, together with our estimates, allows us to estimate annual infant deaths averted by a subway opening for an average city. From Table 3.3, subway openings cause about a 0.02 unit decrease in AOD. Using column 2 of Table 3.2 to convert from AOD to pm10 gives about 2.0 µg/m³ of pm10 (0.02 × 99.62 ≈ 2.0). At 10 infant deaths per 100,000 births per µg/m³ of pm10, the number of averted deaths due to a subway opening in city i is given by

$$2.0 \times (10 \times 10^{-5}) \times \mathrm{Birthrate}_i \times \mathrm{Population}_i.$$

16 Currie and Neidell (2005) use data describing pm10 and infant mortality in California between 1989 and 2000 and conclude that pm10 has no measurable effect on infant mortality. Knittel et al. (2016) and Arceo et al. (2016) both replicate the Currie and Neidell (2005) research design and find much smaller effects than the IV estimates reported above. We note that Jayachandran (2009) and Gutierrez (2010) also estimate the effects of particulates on infant mortality. We do not discuss their estimates because they do not present their results in a way that permits a conversion to mortality rates per µg/m³ of pm10.

An average city in our sample has a population of about 3.7 million in the year before its subway opens. With a 2% birthrate, a subway opening in this city averts about 15 infant deaths per year.17 With country-level birthrate data from the World Health Organization (2016c) and our population data, we can make this calculation somewhat more precisely. Specifically, imputing country-level birth rates to cities, calculating the implied number of averted infant deaths for each city, and averaging over cities, we find that an average subway averts 9.4 infant deaths per year. To monetize this benefit, we use country-adjusted values of a statistical life (VSL) to value averted infant deaths in each city.18 Averaging over all cities, the value of averted infant deaths is $21m per year. Our estimates do not allow us to conclude that subways continue to affect air quality beyond 5-6 years after their opening date. With a 5% discount rate, the present discounted value of this amount over five years is about 95.5m dollars. If the effect is permanent, the corresponding present value is 441m dollars.
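The arithmetic in this paragraph can be reproduced with a short Python sketch. The function names are ours and purely illustrative. One assumption deserves emphasis: the reported present values (95.5m and 441m from a 21m annual benefit at 5%) are matched when the first year's benefit is left undiscounted; that timing convention is inferred from the numbers rather than stated in the text.

```python
def averted_infant_deaths(pm10_drop, deaths_per_100k_per_ug, birth_rate, population):
    """Annual infant deaths averted: pm10 drop x mortality slope x annual births."""
    births = birth_rate * population
    return pm10_drop * deaths_per_100k_per_ug * 1e-5 * births


def present_value(annual_benefit, rate, years=None):
    """Present value of a constant annual benefit, first payment undiscounted.

    years=None treats the benefit as permanent (a perpetuity).
    The payment timing is an assumption inferred from the reported figures.
    """
    if years is None:
        return annual_benefit * (1.0 + rate) / rate
    return sum(annual_benefit / (1.0 + rate) ** t for t in range(years))


# Average city: 2.0 ug/m3 drop, 10 deaths per 100,000 births per ug/m3,
# 2% birthrate, population of 3.7 million.
deaths = averted_infant_deaths(2.0, 10, 0.02, 3.7e6)

# Annual benefit of 21 (millions of dollars) at a 5% discount rate.
pv_five_years = present_value(21.0, 0.05, years=5)
pv_permanent = present_value(21.0, 0.05)
print(round(deaths, 1), round(pv_five_years, 1), round(pv_permanent, 1))
```

Under the same convention, an annual benefit of 594m gives roughly 2.7b over five years and 12.5b if permanent.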

3.7.2 Value of health benefits from the Global Burden of Disease methodology

To extend our mortality calculations over the entire age distribution, we apply the methodology employed by the Global Burden of Disease project (WHO, 2016b). This methodology is complicated and is described in detail in Appendix Section 3.11.

We obtain integrated risk functions from Burnett et al. (2014) for five causes of mortality. These functions summarize the results of several epidemiological cohort studies, and consist of non-linear maps between pm2.5 concentrations and mortality risk ratios. We quantify the contribution of air pollution to age and disease-specific mortality by computing the population attributable fraction. This is the percentage mortality reduction that would occur if pm2.5 concentrations were reduced to a counterfactual exposure level. We proceed in three steps. First, using coefficients in column 5 of Table 3.2 (Terra), we predict city-level pm2.5 concentrations during the 12 months preceding the opening of a subway. Second, applying the integrated risk functions, we calculate the population attributable fractions associated with a 1.102 µg/m³ pm2.5 decrease from the predicted pre-subway exposure level.19 Third, using city population and country-level death rates from the World Health Organisation (WHO, 2016c), we construct city-specific mortality for each disease and age class. The total number of avoided deaths in each city is obtained by applying the population attributable fractions to the mortality rates, summing over every disease. Averaging over all cities, the 0.02 unit decrease in AOD that follows a subway opening saves about 222 lives per year. Valuing these lives, as before, with country-adjusted values of a statistical life, the value of averted deaths is $594m per year. With a 5% discount rate, the present discounted value of this amount over five years is about 2.7b dollars. If the effect is permanent, the corresponding present value is 12.5b dollars.

17 The World Bank reports that the world average crude birth rate is 19.5/1000 (http://data.worldbank.org/indicator/SP.DYN.CBRT.IN, accessed April 2017).
18 Specifically, we take Viscusi and Aldy’s (2003) 0.6 elasticity of VSL with respect to income, and impute a country’s VSL from the U.S. value of $6m.

3.7.3 Discussion

These benefit calculations are obviously crude. They do not account for morbidity, for possible effects on labor productivity, nor for the fact that subways may reduce pollutants other than particulates. While the magnitudes of these effects remain uncertain, they are almost surely positive, and possibly large, e.g., Chang et al. (2016) or Murray (2016). Thus, we might reasonably expect that a complete accounting for the health and productivity related benefits of subway induced improvements in air quality would lead to a much larger value than we describe above. The mean length of track for a newly opened subway system in our study period is 19.2km.

19 From column 5 of Table 3.2, a 0.02 unit decrease in AOD converts to a 0.02 × 55.12 = 1.102 µg/m³ decrease in pm2.5.

Baum-Snow and Kahn (2005) examine 16 US subway systems and estimate construction costs ranging between 25m and 287m dollars per mile. Using this range of cost estimates, the cost of construction for an average subway system in our sample (19.2km, or about 11.9 miles) ranges from 298m to about 3.4b dollars. Our estimates of the present value of avoided infant mortality in an average subway city range between 95.5m and 441m, depending on whether the subway effect on pollution lasts for five years or is permanent. Our estimates of the present value of avoided all-age mortality in an average subway city range between 2.7b and 12.5b, again depending on whether the subway effect on pollution lasts for five years or is permanent. Comparing these magnitudes suggests that the value of subway-induced improvements to air quality may be a substantial fraction of subway construction costs.
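The construction cost range follows from converting the mean track length to miles and applying the per-mile cost bounds. The sketch below is illustrative; the function and variable names are ours, and the benefit figures are simply the present values quoted in the text.

```python
KM_PER_MILE = 1.609344


def construction_cost_range(track_km, low_per_mile, high_per_mile):
    """Return (low, high) construction cost for a system of the given length."""
    miles = track_km / KM_PER_MILE
    return miles * low_per_mile, miles * high_per_mile


# Mean track length of 19.2km; costs of 25m to 287m dollars per mile.
low_cost, high_cost = construction_cost_range(19.2, 25.0, 287.0)  # millions of dollars

# Present values quoted in the text, in millions of dollars.
benefits = {
    "infant, five years": 95.5,
    "infant, permanent": 441.0,
    "all ages, five years": 2700.0,
    "all ages, permanent": 12500.0,
}
for label, value in benefits.items():
    # Benefit as a share of the high and low ends of the cost range.
    print(f"{label}: {value / high_cost:.2f} to {value / low_cost:.2f} of cost")
```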

3.8 Conclusion

Column 2 of Table 3.3 indicates that AOD fell by about 0.021 in the 10km disk surrounding a city center during the 18 months following an average subway opening. This effect is robust to econometric specification and is estimated precisely: the 95% CI is [-0.040, -0.002]. Mean monthly AOD in the regression sample before the introduction of subways is 0.49. Dividing, the mean AOD reduction from a subway opening is about 4% with a 95% confidence interval ranging from 0.5% to 8%. From Table 3.1, AOD readings from Terra in 2000 and 2014 are 0.38 and 0.45. Our 0.02 estimated subway effect is only about 4% of the level of AOD, but it is more than 25% of the 14-year change. Comparing across continents in 2014, we see that the difference between Europe and North America is about 0.07, three and a half subway effects, and between Europe and Asia the difference is 0.38, about 20 subway effects. In all, this suggests that subways may play a moderately important role in determining AOD in a city. To the extent that our data allow us to consider longer post-opening periods, this effect of subways on AOD appears to be approximately constant during the 5-6 years after an opening, and indeterminate beyond this time. As we consider larger areas around the city center, the effects of subway openings attenuate in intuitive ways. The effects of subway expansions seem to be smaller than those of subway openings, an effect that we observe both in changes in AOD and ridership. To the extent that public policy encourages subway construction, this suggests that openings are relatively more important than expansions. We note that the decreasing marginal effect of subway expansion seems to be broadly consistent with the slight decreasing returns in the effects of metropolitan road networks observed in Couture et al. (2016).
Extant estimates of the effects of particulates on mortality suggest that they are sufficiently poisonous that the small nominal reductions from subway openings are economically important. In particular, they appear to be large enough to justify moderate subsidies for subway construction in environments where more direct policy instruments for managing automobile pollution are not available. On the other hand, the channel through which subways affect pollution remains somewhat uncertain. Our calculation suggests that observed levels of ridership account for a significant fraction of metropolitan travel. In order for this level of ridership to cause the observed 4% decline in AOD, it seems likely that trips diverted to the subway are particularly polluting, either because they use particularly polluting vehicles or because they occur at especially congested times.

Figure 3.1 – Daily ridership per capita

[Figure: vertical axis, mean daily ridership per 1,000 people (0 to 100); horizontal axis, months since opening (0 to 60).]

Note: Graph depicts average daily passengers on subway per 1,000 people in metropolitan area, as well as a locally weighted regression of the series.

Figure 3.2 – Two maps showing AOD. Red indicates higher levels of AOD

Aerosol Optical Depth, June 1st, 2014 Terra

Aerosol Optical Depth, Average 2000-2014, Terra

Figure 3.3 – AOD for Bangalore in June and December 2014

(a) Bangalore, June 2014 (b) Bangalore, December 2014

Note: Terra AOD for Bangalore in June and December of 2014. The large circle in each image has a radius of 10km and is centered on the city’s central business district. Subway stations as of December 2014 are shown as black circles. Darker values indicate areas where AOD is larger and white indicates missing values.

Figure 3.4 – AOD versus PM

[Figure: panel (a), residualized PM10 against residualized AOD; panel (b), histogram of city-months (vertical axis: Fraction) by AOD, with additional horizontal axes rescaled to PM10 and PM2.5.]

(a) (b)

Note: (a) Plot showing residualized PM10 and AOD, together with linear trend and locally weighted regression line. (b) Histogram of city-months by AOD, pm10 and pm2.5. pm10 and pm2.5 axes rescaled from AOD using columns 1 and 4 of Table 3.2. Black vertical line indicates WHO threshold level for annual average pm10 exposure (WHO, 2006).

Figure 3.5 – Break-tests and event studies for the 18 months before and after subway openings and the start of construction

[Figure, four panels. (a) Test for discontinuity: Wald statistic against months to opening (-18 to 18). (b) Test for trend break: Wald statistic against months to opening (-18 to 18). (c) Event study: AOD against months to opening (-18 to 18). (d) Test for discontinuity: AOD against months to construction start (-18 to 18).]

Notes: (a) Plot of Wald statistics for tests of a regression intercept discontinuity at time τ. Test statistics calculated in regressions that also control for a satellite indicator, year-by-continent indicator variables, city-by-calendar month indicators, AOD pixel-days and linear and quadratic terms of monthly climate variables, and linear and quadratic terms in country level GDP and city level population. (b) Plot of Wald statistics for tests of a trend break at time τ conditional on a discontinuity in the mean level of AOD at τ = 1. Other details are the same as in Panel (a). (c) Event study during 18 months before and after subway openings. (d) Event study during 18 months before and after start of subway construction.

Figure 3.6 – Heterogeneity of the effect of subway opening on AOD

[Figure: scatter plot of city-specific estimates, each point labeled with its city name. Horizontal axis: Subway Effect (-0.15 to 0.1). Vertical axis: Standard Error (0.01 to 0.02). Legend: 10% significant, 5% significant, 1% significant.]

Notes: Illustration of 39 city-specific subway effects. x axis is the estimated treatment effect, y is the standard error of the estimated treatment effect. Region in white contains estimates that are not significantly different from zero.

Table 3.1 – AOD in 43 new subway cities

                               World    Africa      Asia    Europe  North A.  South A.
Subway openings since 2000        43         1        25         8         4         5

2014
Av. AOD (Aqua), 10km            0.42      0.20      0.56      0.18      0.26      0.31
Av. AOD (Terra), 10km           0.45      0.23      0.59      0.21      0.28      0.3
Av. # pixels (Aqua), 10km     109.04    242.21     98.68    170.03     51.49     75.06
Av. # pixels (Terra), 10km    123.64    255.94    114.41    183.55     56.84     96.26
Av. AOD (Aqua), 25km            0.37      0.17      0.51      0.15      0.19      0.23
Av. AOD (Terra), 25km           0.41      0.20      0.55      0.18      0.20      0.26
Av. # pixels (Aqua), 25km     989.77   2269.21    914.96   1436.29    672.48    637.03
Av. # pixels (Terra), 25km   1080.92   2418.59   1016.73   1522.35    568.57    820.31

2000
Av. AOD (Terra), 10km           0.43      0.27      0.54      0.28      0.34      0.29
Av. AOD (Terra), 25km           0.38      0.23      0.49      0.24      0.22      0.23

Table 3.2 – The relationship between AOD and ground-based particulate measures

                     pm10        pm10        pm10        pm2.5       pm2.5       pm2.5
                     (1)         (2)         (3)         (4)         (5)         (6)

Terra AOD            122.85***   99.62***    75.26***    61.58***    55.12***    42.53***
                     (10.09)     (9.33)      (9.24)      (8.18)      (6.11)      (6.10)
Constant             -1.26       59.37       123.66***   -1.11       24.74       56.59**
                     (2.99)      (40.68)     (38.28)     (2.25)      (25.18)     (23.43)
Mean dep. var.       46.17       46.17       46.54       19.62       19.62       19.66
Mean indep. var.     0.39        0.39        0.39        0.34        0.34        0.34
R-squared            0.54        0.75        0.81        0.51        0.73        0.81
Cities               140         140         138         85          85          84
N                    316         316         311         227         227         225

Aqua AOD             124.49***   101.72***   76.15***    63.47***    56.74***    43.11***
                     (10.52)     (10.13)     (10.08)     (8.39)      (6.05)      (6.43)
Constant             2.06        51.80       119.35***   0.10        16.42       50.45**
                     (2.84)      (43.41)     (39.95)     (2.10)      (27.28)     (25.43)
Mean dep. var.       46.17       46.17       46.54       19.62       19.62       19.66
Mean indep. var.     0.35        0.35        0.36        0.31        0.31        0.31
R-squared            0.52        0.74        0.80        0.48        0.71        0.80
Cities               140         140         138         85          85          84
N                    316         316         311         227         227         225

Note: Aerosol Optical Depth is the mean value in a 10km radius disk around the city center. Columns (1) and (4) have no control variables. Columns (2) and (5) add climate controls and continent dummies. Columns (3) and (6) add controls for city population and country GDP. Robust standard errors in parentheses. Stars denote significance levels: * 0.10 ** 0.05 *** 0.01.

Table 3.3 – Subway opening and AOD for the 18 month period post system opening

                              (1)         (2)         (3)         (4)         (5)         (6)
1-18 months post              -0.0205**   -0.0209**   -0.0201**   -0.0210**   -0.0213**   -0.0223**
                              (0.00931)   (0.00946)   (0.00883)   (0.00895)   (0.00815)   (0.00865)
                              [0.0266]    [0.0266]    [0.0200]    [0.0133]    [0.0133]    [0.0200]
city pop. & gdp               No          Yes         No          Yes         No          Yes
city-level trends             No          No          Yes         Yes         No          No
city-level pre/post trends    No          No          No          No          Yes         Yes

Mean dep. var.                0.457       0.457       0.457       0.457       0.457       0.457
R-squared                     0.833       0.834       0.838       0.838       0.839       0.839
Number of events              39          39          39          39          39          39
N                             12,169      12,169      12,169      12,169      12,169      12,169

Note: Dependent variable is monthly mean AOD in a 10km disk centered around the city center for a constant sample of cities. Standard errors clustered at the city level in parentheses. p-values on the coefficient of interest using the wild cluster bootstrap procedure in Cameron, Gelbach, and Miller (2008) are reported in square brackets. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, calendar month × city indicators, year × continent indicators, pre- and post-period indicators. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

Table 3.4 – Longer term effects

                         (1)         (2)         (3)         (4)         (5)          (6)
Panel a.
1-12 months post         -0.0203*    -0.0208*    -0.0186*    -0.0185*    -0.0222*     -0.0209*
                         (0.0102)    (0.0103)    (0.0108)    (0.0108)    (0.0116)     (0.0109)
13-24 months post        -0.0127     -0.0129     -0.0178**   -0.0174*    -0.0250***   -0.0224**
                         (0.0097)    (0.0010)    (0.0087)    (0.0091)    (0.0090)     (0.00865)
25-36 months post                                -0.0271**   -0.0260**   -0.0366***   -0.0324***
                                                 (0.0103)    (0.0098)    (0.0107)     (0.0095)
37-48 months post                                                        -0.0316*     -0.0255*
                                                                         (0.0158)     (0.0129)
City pop. and gdp        No          Yes         No          Yes         No           Yes

Mean dep. var.           0.462       0.462       0.443       0.443       0.416        0.416
R-squared                0.833       0.834       0.831       0.832       0.829        0.830
Number of events         38          38          35          35          28           28
N                        11,841      11,841      10,881      10,881      8,863        8,863

Panel b.
average post-period      -0.0167*    -0.0171*    -0.0209**   -0.0204**   -0.0273***   -0.0243***
                         (0.0087)    (0.0089)    (0.0081)    (0.0079)    (0.0089)     (0.0072)

Note: Dependent variable is monthly mean AOD in a 10km disk centered around the city center for a constant sample of cities. Standard errors clustered at the city level in parentheses. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, calendar month × city indicators, year × continent indicators, pre- and post-period indicators. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

Table 3.5 – Even longer horizon

                             (1)          (2)          (3)         (4)         (5)
Panel a.
1-12 months post             -0.0172      -0.0200*     -0.0199*    -0.0194*    -0.0195*
                             (0.0109)     (0.0100)     (0.0100)    (0.0099)    (0.0097)
13-24 months post            -0.0218*     -0.0120      -0.0119     -0.0108     -0.0110
                             (0.0107)     (0.0097)     (0.0010)    (0.0099)    (0.0099)
25-36 months post            -0.0290***   -0.0192*     -0.0192*    -0.0177*    -0.0180
                             (0.0101)     (0.00997)    (0.0104)    (0.0104)    (0.0109)
37-48 months post            -0.0243      -0.0177      -0.0176     -0.0158     -0.0162
                             (0.0155)     (0.0137)     (0.0142)    (0.0142)    (0.0146)
49-60 months post            -0.0434**    -0.0369***   -0.0368**   -0.0346**   -0.0351**
                             (0.0165)     (0.0132)     (0.0137)    (0.0138)    (0.0144)
61-72 months post                                      -0.0237*    -0.0215     -0.0220
                                                       (0.0138)    (0.0145)    (0.0155)
73-84 months post                                                  -0.0280     -0.0285
                                                                   (0.0178)    (0.0184)
85-96 months post                                                              -0.0150
                                                                               (0.0210)
Constant sample of cities    Yes          No           No          No          No
City pop. and gdp            Yes          Yes          Yes         Yes         Yes

Mean dep. var.               0.392        0.457        0.457       0.457       0.457
R-squared                    0.833        0.834        0.834       0.834       0.834
Number of events             26           39           39          39          39
N                            8,312        12,169       12,169      12,169      12,169

Panel b.
average post-period          -0.0206**    -0.0190**    -0.0188**   -0.0172**   -0.0181**
                             (0.0086)     (0.0081)     (0.0084)    (0.0081)    (0.00800)

Note: Dependent variable is monthly mean AOD in a 10km disk centered around the city center for a constant sample of cities. Standard errors clustered at the city level in parentheses. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, calendar month × city indicators, year × continent indicators, pre- and post-period indicators. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

Table 3.6 – Spatial decay

                     0-10 km radius          10-25 km radius         25-50 km radius       50-150 km radius
1-18 months post     -0.0205**   -0.0209**   -0.0167**   -0.0173**   -0.0115    -0.0121    -0.0045    -0.0049
                     (0.0093)    (0.0095)    (0.0078)    (0.0081)    (0.0078)   (0.0082)   (0.0069)   (0.0071)
City pop. and gdp    No          Yes         No          Yes         No         Yes        No         Yes

Mean dep. var.       0.457       0.457       0.400       0.400       0.366      0.366      0.295      0.295
R-squared            0.833       0.834       0.865       0.865       0.853      0.853      0.846      0.847
Number of events     39          39          39          39          39         39         39         39
N                    12,169      12,169      12,169      12,169      12,535     12,535     11,897     11,897

Note: Dependent variable is monthly mean AOD at a given distance from the city center for a constant sample of cities. Standard errors clustered at the city level in parentheses. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, calendar month × city indicators, year × continent indicators, pre- and post-period indicators. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

3.9 Appendix A: Ridership data

We gathered subway ridership data (unlinked trips) for 30 of the subway systems in our sample, mostly from annual reports or statistical agencies. In 13 cases, either we were not able to find data on ridership at all, the data were not available from the opening date, or the ridership data were aggregated across cities or other rail systems. Data sources for each of the cities for which we were able to obtain usable data are detailed in Table 3.7. For ten of these cities, ridership was reported at the monthly level. For the other 20, quarterly or yearly data were available and we used linear interpolation to create a monthly ridership dataset.
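The interpolation step can be illustrated with a minimal sketch. The text does not say how quarterly or yearly observations were dated within their period, so the month indices and ridership values below are hypothetical, as is the function name.

```python
def interpolate_monthly(observations):
    """Linearly interpolate between (month_index, value) observations.

    Returns a dict mapping every month between the first and last
    observation to an interpolated value. Assumes distinct month indices.
    """
    points = sorted(observations)
    monthly = {}
    for (m0, v0), (m1, v1) in zip(points, points[1:]):
        for m in range(m0, m1 + 1):
            weight = (m - m0) / (m1 - m0)  # 0 at m0, 1 at m1
            monthly[m] = v0 + weight * (v1 - v0)
    return monthly


# Hypothetical quarterly observations: (months since opening, daily riders per 1,000 people).
quarterly = [(0, 30.0), (3, 36.0), (6, 33.0)]
monthly = interpolate_monthly(quarterly)
for month in sorted(monthly):
    print(month, round(monthly[month], 2))
```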

Table 3.7 – Ridership data sources

City                          Source
Almaty (Kazakhstan)           International Metro Association Reports
Bangalore (India)             Bangalore Metro Operational Performance
Brescia (Italy)               Brescia Mobilitá Reports
Copenhagen (Denmark)          Statistics Denmark
Daejon (South Korea)          Daejon Metropolitan Corporation
Delhi (India)                 Rail Corporation Annual Reports
Dubai (UAE)                   Dubai Road and Transport Auth.: Annual Statistical Reports
Gwangju (South Korea)         Gwangju Subway Reports
Hangzhou (China)              Hangzhou Statistical Yearbook
Istanbul (Turkey)             Metro Istanbul Statistics (Only line M2)
Kazan (Russia)                International Metro Association Reports
Kaohsiung (Taiwan)            Corp. Transport Volume Statistics
Las Vegas (USA)               NTA National Transit Database and Webarchive lvmonorail
Lausanne (Switzerland)        Transports Lausanne Annual Reports
Lima (Peru)                   Ministerio de Transportes y Comunicaciones Perú
Mashhad ()                    Corp.: Planning and Development
Naha (Japan)                  Japan Ministry of Land, Infrastructure, Transport and Tourism
Palma (Spain)                 Instituto Nacional de Estadística España
Porto (Portugal)              Statistics Portugal, Light Rail (Metro) Survey
San Juan Puerto Rico (USA)    Instituto de Estadísticas de Puerto Rico
Santo Domingo (DR)            Oficina para el Reordenamiento del Transporte
Seattle (USA)                 Sound Transit (Only Central Link Line) Performance Reports
Seville (Spain)               Instituto Nacional de Estadística España
Shenzhen (China)              Shenzhen Municipal Transportation Commission
Shenyang (China)              Shenyang Statistical Information Net
Suzhou, Jiangsu (China)       Suzhou Statistical Yearbook
Tehran (Iran)                 Research and Development
Turin (Italy)                 Gruppo Torinese Transporti Reports
Valparaiso (Chile)            Memoria Anual Metro Valparaiso
Xian, Shaanxi (China)         Xian Bureau of Statistics

We were not able to obtain ridership data from the time of opening for the following 13 cities in the sample: Algiers (Algeria), Brasilia (Brazil), Bursa (Turkey), Chengdu (China), Chongqing (China), Dalian (China), Izmir (Turkey), Kunming (China), Maracaibo (Venezuela), Nanjing (China), Rennes (France), Valencia (Venezuela), and Wuhan (China).

3.10 Appendix B: AOD data

The Moderate Resolution Imaging Spectroradiometers (MODIS) aboard the Terra and Aqua Earth observing satellites measure the ambient aerosol optical depth (AOD) of the atmosphere almost globally. We use MODIS Level-2 daily AOD products from Terra for February 2000-December 2014 and Aqua for July 2002-December 2014 to construct monthly average AOD levels in cities. We download all the files from the NASA File Transfer Protocol.20

There are four MODIS Aerosol data product files: MOD04_L2 and MOD04_3K, containing data collected from the Terra platform; and MYD04_L2 and MYD04_3K, containing data collected from the Aqua platform. We use products MOD04_L2 and MYD04_L2 to get AOD measures at a spatial resolution (pixel size) of approximately 10 x 10 kilometers and products MOD04_3K and MYD04_3K to get AOD measures at a spatial resolution of approximately 3 x 3 kilometers. Each product file covers a five-minute time interval based on the start time of each MODIS granule. The product files are stored in Hierarchical Data Format (HDF) and we use the "Optical Depth Land And Ocean" layer, which is stored as a Scientific Data Set (SDS) within the HDF file, as our measure of aerosol optical depth. The "Optical Depth Land And Ocean" dataset contains only the AOD retrievals of high quality.

We convert all HDF formatted granules to GIS compatible formats using the HDF-EOS To GeoTIFF Conversion Tool (HEG) provided by NASA’s Earth Observing System Program.21 We consolidate all GeoTIFF granules into a global raster for each day using ArcGIS. First, we keep only AOD values that contain information. The missing value is -9999 in AOD retrievals. Second, we create a raster catalog with all the granules for a given day and calculate the average AOD value using the Raster Catalog to Raster Dataset tool.

Figure 3.7 provides more information about the coverage of the two satellites and the prevalence of missing data. The black dashed line in panel (a) of Figure 3.7 gives the count of cities in our data for which we calculate an AOD from the Terra satellite reading for each month of our study period. These are cities for which there is at least one pixel within 10km of the center on one day during the relevant month. Since most of the cities in our data are in the Northern hemisphere, we see a strong seasonal pattern in this series. The light gray line in this figure reports the corresponding quantity calculated from the Aqua satellite reading. Since Aqua became operational after Terra, the Aqua series begins later. The Aqua satellite data tracks the Terra data closely, but at a slightly lower level. Panel (b) of Figure 3.7 reports city mean AOD data for all city-months in our sample over the course of our study period. As for the other series, this one too exhibits seasonality, although this will partly reflect a composition effect. As we see in panel (a), not all cities are in the data for all months. As in panel (a), the dark line describes AOD readings from Terra and the light gray, Aqua.

Finally, Figure 3.8 shows the relationship between the ground-based measurements and MODIS AOD. Panel (a) of this figure plots ground-based pm10 against Terra AOD in a 10km disk. That is, the raw data on which column 1 of Table 3.2 is based. We see a strong positive relationship. Panel (b) shows a plot of the residuals of the regression in column 3 of Table 3.2 against the residuals of a regression of 10km Terra AOD on the control variables used in the same regression. Again, we see a strong positive slope. Note that the scales on the two graphs are not the same. The bottom two panels are the same as the top, but are based on ground-based pm2.5 measures. Again we see a clear positive slope in both plots.

20 ftp://ladsweb.nascom.nasa.gov/allData/6/
21 The most recent version of the software, HEG Stand-alone v2.13, can be downloaded at http://newsroom.gsfc.nasa.gov/sdptoolkit/HEG/HEGDownload.html
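The daily consolidation step described earlier in this appendix (drop the -9999 missing code, then average overlapping granules cell by cell) can be sketched in plain Python. This is not the ArcGIS workflow itself: it assumes the granules have already been resampled to a common grid, represents each as a list of rows, and uses a hypothetical function name and toy values.

```python
MISSING = -9999  # missing-value code in the AOD retrievals


def daily_mean_raster(granules):
    """Cell-by-cell mean over granules, ignoring cells flagged as missing.

    Cells with no valid retrieval in any granule are returned as None.
    """
    n_rows, n_cols = len(granules[0]), len(granules[0][0])
    mosaic = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            valid = [g[r][c] for g in granules if g[r][c] != MISSING]
            row.append(sum(valid) / len(valid) if valid else None)
        mosaic.append(row)
    return mosaic


# Two toy 2x2 granules covering the same grid on the same day.
granule_a = [[0.25, MISSING], [0.5, MISSING]]
granule_b = [[0.75, 0.5], [MISSING, MISSING]]
mosaic = daily_mean_raster([granule_a, granule_b])
print(mosaic)
```

Treating the missing code as a number instead of dropping it would pull averages far below zero, which is why the filtering comes first.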

Figure 3.7 – MODIS Terra and Aqua AOD data

[Figure, two panels. (a) Number of cities with AOD, by month, 2000m1 to 2015m1. (b) Mean AOD in 10km disks, by month, 2000m1 to 2015m1.]

(a) (b)

Panel (a) gives the count of subway city-months with 10km AOD measurements by month for Terra (dashed black) and Aqua (gray). Panel (b) shows mean AOD within 10km of the center of subway cities, averaged over cities, by month for Terra (dashed black) and Aqua (gray).

Figure 3.8 – Plots of ground-based pm10 and pm2.5 vs. MODIS AOD

[Figure, four panels. (a) pm10 against Terra AOD, 10km disk. (b) pm10 against residual Terra AOD, 10km disk. (c) pm2.5 against Terra AOD, 10km disk. (d) pm2.5 against residual Terra AOD, 10km disk.]

Note: Panel (a) Plot of ground-based pm10 against Terra MODIS AOD in a 10km disk. Panel (b) Plot of ground-based pm10 residual against Terra MODIS AOD in a 10km disk residual. Panel (c) Plot of ground-based pm2.5 against Terra MODIS AOD in a 10km disk. Panel (d) Plot of ground-based pm2.5 residual against Terra MODIS AOD in a 10km disk residual. NB: Scales not constant across graphs. Chapter 3. Subways and Urban Air Pollution 199

3.11 Appendix C: Global Burden of Disease based mortality estimates

The integrated risk functions in Burnett et al. (2014) express the likelihood of dying from a disease at current pm2.5 exposure, relative to an environment where pm2.5 concentrations are set to a baseline harmless level of exposure. If $D_d$ is the event of dying from disease $d$, the risk ratio (RR) of being exposed to pm2.5 concentration $c$ is given by

$$RR_d(c, \bar{c}) = \frac{P(D_d \mid c)}{P(D_d \mid \bar{c})},$$

where $\bar{c}$ denotes the baseline harmless concentration. Burnett et al. (2014) model $RR_d(c, \bar{c})$ to exhibit diminishing marginal risk:

$$RR(c, \bar{c}) = \begin{cases} 1 + \alpha\left(1 - e^{-\gamma (c - \bar{c})^{\delta}}\right) & \text{if } c > \bar{c}, \\ 1 & \text{otherwise,} \end{cases}$$

with $\bar{c}$ assumed to be uniformly distributed between 5.8 and 8.8 µg/m3. We refer the reader to Burnett et al. (2014) for details regarding the parametrization and estimation of these functions for each disease. As described in the main text, we obtain RR functions for five diseases: ischemic heart disease, cerebrovascular disease (stroke), chronic obstructive pulmonary disease, lung cancer, and lower respiratory infection. For deaths attributable to stroke and ischemic heart disease, the integrated risk functions are age-specific.

To construct population attributable fractions (PAF) for every disease and, when applicable, every age-group, we first predict pre- and post-subway pm2.5 concentrations using the regression specification in column 5 of table 3.2. Specifically, we obtain predicted pm2.5 values from the annual city average of AOD (and all other covariates) during the 12 months preceding the subway opening. The post-subway pm2.5 concentrations are obtained by subtracting 0.02 × 55.12 = 1.102 µg/m3 from the pre-subway concentration, where 0.02 is the subway AOD effect from table 3.3, and 55.12 is the AOD coefficient in column 5 of table 3.2.
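The functional form of the risk ratio and the pm2.5 adjustment described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the parameter values for alpha, gamma and delta, the baseline concentration of 7.3 µg/m3 (the midpoint of the 5.8 to 8.8 range) and the pre-subway concentration of 30 µg/m3 are hypothetical and are not taken from Burnett et al. (2014) or from our data. Only the 0.02 and 55.12 constants come from the text.

```python
import math

# Constants stated in the text: subway AOD effect (table 3.3) and the
# AOD coefficient in column 5 of table 3.2.
SUBWAY_AOD_EFFECT = 0.02
AOD_TO_PM25 = 55.12


def risk_ratio(c, c_bar, alpha, gamma, delta):
    """RR(c, c_bar) = 1 + alpha * (1 - exp(-gamma * (c - c_bar)**delta)) if c > c_bar, else 1."""
    if c <= c_bar:
        return 1.0
    return 1.0 + alpha * (1.0 - math.exp(-gamma * (c - c_bar) ** delta))


def post_subway_pm25(pre_subway_pm25):
    """Subtract the implied subway effect (0.02 * 55.12 µg/m3) from the pre-subway level."""
    return pre_subway_pm25 - SUBWAY_AOD_EFFECT * AOD_TO_PM25


if __name__ == "__main__":
    # Hypothetical parameter values, chosen for illustration only.
    alpha, gamma, delta = 1.0, 0.05, 0.8
    c_bar = 7.3   # assumed midpoint of the 5.8 to 8.8 µg/m3 range
    c0 = 30.0     # hypothetical pre-subway pm2.5 concentration
    c1 = post_subway_pm25(c0)
    print("post-subway pm2.5:", round(c1, 4))
    print("RR at c0:", risk_ratio(c0, c_bar, alpha, gamma, delta))
    print("RR at c1:", risk_ratio(c1, c_bar, alpha, gamma, delta))
```

With these illustrative parameters the risk ratio rises with concentration above the baseline and is bounded above by 1 + alpha, while concentrations at or below the baseline return a ratio of exactly 1.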

Let $c_0$ and $c_1$ respectively denote the pre- and post-subway pm2.5 concentrations in a given city. For the purpose of our burden of disease calculations, the relevant risk ratio is $RR_d(c_1, c_0) = P(D_d \mid c_1)/P(D_d \mid c_0)$. Using the $RR_d(c, \bar{c})$ functions in Burnett et al. (2014), we obtain this number by computing $RR_d(c_1, c_0) = RR_d(c_1, \bar{c})/RR_d(c_0, \bar{c})$. Here, $RR_d(c_1, c_0)$ expresses how much less likely it is that individuals die of disease $d$ when exposed to concentration $c_1$, relative to concentration $c_0$. Assuming that 100% of the city population is exposed to $c_0$ and then $c_1$, the population attributable fraction is then just

$$PAF_d = 1 - RR_d(c_1, c_0) = 1 - \frac{P(D_d \mid c_1)}{P(D_d \mid c_0)}.$$

Interpreting $P(D_d \mid c)$ as the fraction of the total population that died of disease $d$ when exposed to pm2.5 concentration $c$, we find that $PAF_d$ represents the fraction of total deaths from $d$ that occurred because of incremental pollution $c_0 - c_1$.

Finally, for each city $i$, we calculate the number of deaths attributable to disease $d$ in age-group $a$ (denoted $M_{ida}$ below). We use disease-specific country-level death rates from the World Health Organisation (WHO, 2016c) and apply them to city populations. Mortality data from the WHO is only available in 2000, 2005, 2010, and 2015. We use the year closest to a city's subway opening year. The total number of avoided deaths in city $i$ is given by $\sum_d \sum_a PAF_{ida} \cdot M_{ida}$.
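The aggregation in this appendix can be sketched in a few lines of Python. The sketch assumes that the risk ratios relative to the baseline concentration have already been computed for each disease and age-group cell. All numbers below (risk ratios, death counts, cell labels) are hypothetical and serve only to show the arithmetic.

```python
def population_attributable_fraction(rr_c1, rr_c0):
    """PAF = 1 - RR(c1, c0), where RR(c1, c0) = RR(c1, c_bar) / RR(c0, c_bar)."""
    return 1.0 - rr_c1 / rr_c0


def avoided_deaths(paf_by_cell, deaths_by_cell):
    """Sum PAF * M over all (disease, age-group) cells of one city."""
    return sum(paf_by_cell[cell] * deaths_by_cell[cell] for cell in paf_by_cell)


if __name__ == "__main__":
    # Hypothetical risk ratios relative to the baseline, as (post-subway, pre-subway) pairs.
    rr = {
        ("ischemic heart disease", "60-64"): (1.20, 1.25),
        ("lung cancer", "all ages"): (1.47, 1.50),
    }
    # Hypothetical death counts M for the same cells.
    deaths = {
        ("ischemic heart disease", "60-64"): 1000.0,
        ("lung cancer", "all ages"): 500.0,
    }
    paf = {cell: population_attributable_fraction(r1, r0) for cell, (r1, r0) in rr.items()}
    print("avoided deaths:", avoided_deaths(paf, deaths))
```

Keying both dictionaries by a (disease, age-group) pair mirrors the double sum over $d$ and $a$, so a disease whose risk function is not age-specific simply contributes a single cell.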

Figure 3.9 – AOD during the 48 months before and after subway openings .1 .05 0 AOD -.05 -.1 -48 -36 -24 -12 0 12 24 36 48 months to opening (c) Event study Notes: Event study during 48 months before and after subway openings, constant sample of 21 cities. Chapter 3. Subways and Urban Air Pollution 202 Continued on next page – City level descriptive statistics Table 3.8 Plan Construction Opening Track Stations added Daily Mean SD City Country CityTehran (Iran)Izmir (Turkey)Istanbul (Turkey)Brasilia (Brazil) 1970 approvedRennes (France) 1992 Jan. begins 1977 1987Bursa (Turkey) Mar. Feb. 1995 2000 Sep. 10.2Copenhagen Aug. 1992 date (Denmark) 2000 1991 11.1 Sep. 2000 8Porto km (Portugal) 1989 Jan. 6.4 1992 10 opening 1995 1st Mar.Delhi exp. 2001 Jan. (India) 2nd 1997 1997 10 exp. 5 42.7 n.a. ridership Nov. Mar. 1996 AODDalian 2002 (China) AOD Jan. 15 Nov. 10.3 population 1998 2002 GDP n.a. 1 PC 1 20.2Naha Aug. 1996 (Japan) 2002 15 n.a. 19.1 290,740 13Gwangju Mar. (South 1999 Korea) 0.57 15 n.a. 17 n.a. Dec. 1995 0.18 2002Las n.a. Vegas (USA) 0.30 1999 6.2 64,360 n.a. n.a. 7,145 Jan. 0.09 1994 1998Wuhan 0.32 (China) n.a. Sep. Dec. 7 0.13 2000 1996 2002 n.a. 2,248 7,602 Aug. 1996Shenzhen 69,576 12.7 (China) May 2003 0.11 8,963 Apr. Nov. 0.19 2004 1996 52.4 11,264 0.07 4 10 10.4 1993 0.10 0.22 Aug. 11,221 2003 12 3,024 0.11 11.6 1,100 13 Aug. 2001 1999 0.27 3 5 15 283 8,445 Jul. 1992 0.09 2004 35,935 6 n.a. Dec. 2012 4.3 21,951 1,290 Dec. 22 n.a. 33,534 Sep. 1998 2004 n.a. 0.23 n.a. Dec. 28.1 7 2004 10,544 117,566 0.09 n.a. 34,639 41.0 0.88 25 1,266 319,033 0.46 0.35 n.a. 18 0.23 0.30 17,504 22,919 0.15 20 n.a. 0.59 1,400 69 0.27 2,325 304 29,078 n.a. 27,562 3,157 0.51 n.a. 34,085 0.20 204,944 5,473 0.85 1,565 0.27 1.01 50,048 0.31 8,409 7,110 6,400 6,249 Chapter 3. 
Subways and Urban Air Pollution 203 Continued on next page Table 3.8 – continued from previous page Plan Construction Opening Track Stations added Daily Mean SD City Country CitySan Juan Puerto Rico (USA)Kazan (Russia) 1992Nanjing (China) Jul. 1996Valparaiso Apr. (Chile) 2005 17.1 approvedTurin (Italy) 1989 16 beginsDaejon (South 1994 Aug. Korea) 1997 n.a. 1999 Aug.Valencia date (Venezuela) Dec. 2005 2000 9.4 n.a. May Aug. 2002Maracaibo 2005 km (Venezuela) 1996 27.3 Nov. opening 2005 24,000 5 1999 1stKaohsiung exp. 61.4 (Taiwan) Jan. 2nd 0.22 1994 16 exp. 1996 ridership Dec. 0.11 2000 AOD 1998 Mar. 20 n.a.Palma (Spain) 2006 AOD Nov. 1997 population Feb. 10.6 38 2,492 2006 GDP Mar. PC Nov.Lausanne 2004 n.a. 2006 n.a. (Switzerland) 9.5 1994 12 Nov. 3.9 50,799 2006 n.a. 46,388Santo Domingo n.a. 11 1.4 (DR) Jan. 2001 n.a. 0.24 3 2000Seville 0.16 Mar. 28,053 (Spain) 2008 2 n.a. 2004 n.a. 37.3 0.11 Feb. 2004 n.a. 1,121Seattle (USA) 0.03 2005 Aug. n.a. 0.83 Oct. 25,095 36 n.a. 2005 2008 15,304 0.31Dubai n.a. (UAE) 0.42 Jul. 843 5.5 Nov. 23,878 2008 2005 n.a. 0.25 n.a. 5,263 0.31 7.1 Jan. 2009 14 15,561 0.15 1999 1,449 11.3 n.a. 6,883 9 1,713 n.a. Aug. 29,209 16 1996 2005 105,618 0.26 0.58 Apr. 0.12 2009 34,143 0.41 n.a. Nov. 6 0.24 2003 2005 17.6 13 0.17 1,555 Jul. 62,638 1,510 2009 Mar. 21 1,945 2006 n.a. 14.6 0.21 n.a. 13,049 Sep. 36,785 0.07 2009 13,049 n.a. 4,348 40.7 44,946 8 334 0.35 0.19 n.a. 21 0.19 0.07 n.a. 51,771 46,922 2,542 16 381 n.a. 0.21 0.08 10,537 20,068 34,601 n.a. 0.17 695 123,980 0.08 0.52 33,882 0.24 3,052 1,748 49,281 58,319 Chapter 3. Subways and Urban Air Pollution 204 Continued on next page Table 3.8 – continued from previous page Plan Construction Opening Track Stations added Daily Mean SD City Country CityChengdu (China)Shenyang (China)Chongqing (China) 2000Lima (Peru) approved 1999 Dec.Xian, 2005 Shaanxi (China) begins 1998 Sep. Nov. 2010 2005Bangalore (India) 15.6 Sep. Jun. 2010 2007Mashhad date (Iran) 31.2 1994 16 Jul. 
2011 1986Algiers 22 15.2 km Sep. (Algeria) 2006 20 opening Mar. 1st Sep. 2003 13Almaty 2010 exp. 2011 (Kazakhstan) 2nd 18 exp. 17.0 Jul. ridership n.a. 2011 Apr. AODSuzhou, 2007 Jiangsu AOD 10 19.0 (China) 1994 population 16 n.a. Oct. GDP 2011 PC Kunming 1988 (China) 16 1980 Dec. 5.6 279,395 1999 n.a. n.a. 2002 0.55 Oct.Hangzhou Mar. (China) Sep. 2011 1993 n.a. 1988 0.29 1.01 6 20.9 n.a. Nov. Dec. 2011 0.33 Dec. 2007Brescia 2011 (Italy) 5,770 n.a. 8.9 Apr. 193,324 21 8.0 2012 n.a. 6,420 2009 0.83 26.6 0.97 10,017 82,751 10 0.29 n.a. 0.24 7 10,017 n.a. 2005 May 0.76 24 2010 5,452 11,880 0.23 n.a. Jun. Mar. 18,433 n.a. 2012 2007 n.a. 38.9 9,251 0.40 10,636 1 10,759 Nov. 2000 2012 n.a. 75,330 0.17 45.9 n.a. 14 0.53 10,506 Jan. 8,908 n.a. 2004 0.17 31 16,944 Mar. n.a. 2013 126,335 0.32 2,785 4,698 14.3 0.87 n.a. 0.08 0.25 n.a. 0.27 15,876 17 0.12 1,454 n.a. 4,639 2,492 n.a. 305,314 22,010 11,187 0.85 13,193 0.23 n.a. 0.35 0.27 5,821 40,719 3,571 0.32 11,612 0.15 11,308 456 35,912 Chapter 3. Subways and Urban Air Pollution 205 Table 3.8 – continued from previous page Plan Construction Opening Track Stations added Daily Mean SD City Country CityAverageNote: Stations and daily ridership reported 12in months a after 10km opening. radius Mean circle and using SD Terra AOD satellite columns approved monthly report observations mean from and 2000-2014. standard deviation Metropolitan 1995 area values begins population in thousands. Jul. 2000 date Feb. 2007 19.2 km opening 14.4 1st exp. 2nd exp. ridership 15.7 AOD AOD population 10.8 GDP PC 97,514 0.46 0.18 3,719 20,011 Chapter 3. Subways and Urban Air Pollution 206

Table 3.9 – Subway opening and AOD by 6 month period, pre- and post-system opening

                            (1)        (2)        (3)        (4)        (5)        (6)
1-6 months post             -0.0134    -0.0140    -0.0139    -0.0145    -0.0144    -0.0149
                            (0.0134)   (0.0136)   (0.0135)   (0.0136)   (0.0130)   (0.0133)
7-12 months post            -0.0249*   -0.0249*   -0.0250*   -0.0254*   -0.0232*   -0.0241*
                            (0.0137)   (0.0138)   (0.0135)   (0.0133)   (0.0127)   (0.0129)
13-18 months post           -0.0226    -0.0226    -0.0225    -0.0231    -0.0189    -0.0201
                            (0.0157)   (0.0162)   (0.0159)   (0.0159)   (0.0155)   (0.0159)
7-12 months pre             0.00668    0.00652    0.00569    0.00615    0.00892    0.00926
                            (0.0120)   (0.0121)   (0.0122)   (0.0121)   (0.0127)   (0.0127)
13-18 months pre            -0.00711   -0.00654   -0.00793   -0.00707   -0.00140   -0.000921
                            (0.0113)   (0.0112)   (0.0113)   (0.0112)   (0.0118)   (0.0120)
city pop. & gdp             No         Yes        No         Yes        No         Yes
city-level trends           No         No         Yes        Yes        No         No
city-level pre/post trends  No         No         No         No         Yes        Yes
Mean dep. var.              0.457      0.457      0.457      0.457      0.457      0.457
R-squared                   0.833      0.834      0.838      0.838      0.839      0.839
Number of events            39         39         39         39         39         39
N                           12,169     12,169     12,169     12,169     12,169     12,169

Note: Dependent variable is monthly mean AOD in a 10km disk centered around the city center for a constant sample of cities. Standard errors clustered at the city level in parentheses. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, calendar month × city indicators, year × continent indicators, pre- and post-period indicators. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

Table 3.10 – Heterogeneous effects

                            (1)         (2)         (3)         (4)         (5)         (6)
                            Big City    Poor        Rainy       Big Subway  Asia        High AOD
1-18 months post            -0.0231**   -0.0227**   -0.0204*    -0.0204**   -0.0145*    -0.0124
                            (0.0099)    (0.0095)    (0.0102)    (0.0095)    (0.0078)    (0.0077)
Interaction                 0.0046      0.0036      -0.0010     -0.0010     -0.0108     -0.0170
                            (0.0144)    (0.0137)    (0.0138)    (0.0139)    (0.0116)    (0.0112)
Mean dep. var.              0.457       0.457       0.457       0.457       0.457       0.457
R-squared                   0.834       0.834       0.834       0.834       0.834       0.834
Number of events            39          39          39          39          39          39
N                           12,169      12,169      12,169      12,169      12,169      12,169

Note: Dependent variable is monthly mean AOD in a 10km disk centered around the city center for a constant sample of cities. Standard errors clustered at the city level in parentheses. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, calendar month × city indicators, year × continent indicators, pre- and post-period indicators, city population and country GDP. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

Table 3.11 – Placebo city AOD for 18 month period post system opening

                            (1)        (2)        (3)        (4)        (5)        (6)
1-18 months post            0.0048     0.0066     0.0040     0.0058     -0.0039    -0.0014
                            (0.0103)   (0.0112)   (0.0095)   (0.0104)   (0.0088)   (0.0092)
city pop. & gdp             No         Yes        No         Yes        No         Yes
city-level trends           No         No         Yes        Yes        No         No
city-level pre/post trends  No         No         No         No         Yes        Yes
Mean dep. var.              0.436      0.440      0.436      0.440      0.436      0.440
R-squared                   0.817      0.819      0.822      0.822      0.823      0.824
Number of events            27         25         27         25         27         25
N                           8,105      7,528      8,105      7,528      8,105      7,528

Note: Dependent variable is monthly mean AOD in a 10km disk centered around the city center. Standard errors clustered at the city level in parentheses. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, calendar month × city indicators, year × continent indicators, pre- and post-period indicators. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

Table 3.12 – Robustness check using an expanded sample of cities and country by month fixed effects

                            (1)        (2)        (3)        (4)        (5)        (6)
1-18 months post            -0.0183*   -0.0190*   -0.0137    -0.0154*   -0.0185**  -0.0202**
                            (0.00953)  (0.00985)  (0.00866)  (0.00914)  (0.00799)  (0.00856)
city pop. & gdp             No         Yes        No         Yes        No         Yes
city-level trends           No         No         Yes        Yes        No         No
city-level pre/post trends  No         No         No         No         Yes        Yes
Mean dep. var.              0.431      0.433      0.431      0.433      0.431      0.433
R-squared                   0.836      0.835      0.839      0.838      0.839      0.838
Number of events            39         39         39         39         39         39
N                           198,381    193,867    198,381    193,867    198,381    193,867

Note: Dependent variable is monthly mean AOD in a 10km disk centered around the city center for a constant sample of cities. Standard errors clustered at the city level in parentheses. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, country × month indicators, calendar month × city indicators, pre- and post-period indicators. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

Table 3.13 – Robustness check excluding observations with low pixel count

                            (1)        (2)        (3)        (4)        (5)        (6)
1-18 months post            -0.0199**  -0.0205**  -0.0196**  -0.0208**  -0.0173*   -0.0185*
                            (0.00964)  (0.00979)  (0.00922)  (0.00966)  (0.00963)  (0.0102)
city pop. & gdp             No         Yes        No         Yes        No         Yes
city-level trends           No         No         Yes        Yes        No         No
city-level pre/post trends  No         No         No         No         Yes        Yes
Mean dep. var.              0.442      0.442      0.442      0.442      0.442      0.442
R-squared                   0.866      0.867      0.871      0.871      0.872      0.873
Number of events            39         39         39         39         39         39
N                           10,957     10,957     10,957     10,957     10,957     10,957

Note: Dependent variable is monthly mean AOD in a 10km disk centered around the city center for a constant sample of cities. Standard errors clustered at the city level in parentheses. The sample excludes observations with a number of pixel-days used to calculate AOD that falls below the tenth percentile of the distribution for a given satellite. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, calendar month × city indicators, year × continent indicators, pre- and post-period indicators. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

Table 3.14 – Expansions

                                Main Cities             All Cities
                                (1)         (2)         (3)         (4)
Expansions                      2nd         ≥2nd        2nd         ≥2nd
1-18 months post                -0.0147     0.0008      -0.0113     -0.0016
                                (0.0141)    (0.0085)    (0.0086)    (0.0033)
Mean dep. var.                  0.613       0.633       0.516       0.488
Mean expansion size (stations)  17.7        15.3        14.4        11.0
R-squared                       0.778       0.807       0.816       0.816
Number of events                14          21          26          104
N                               4,375       6,678       8,110       31,323

Note: Dependent variable is monthly mean AOD in a 10km disk centered around the city center. Standard errors clustered at the city level in parentheses. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), AOD pixel count, satellite indicator, calendar month × city indicators, year × continent indicators, pre- and post-period indicators, city population and country GDP. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

Table 3.15 – Results on ridership per capita

                            (1)         (2)         (3)
Event                       1st         2nd         ≥2nd
1-18 months post            62.42*      44.62***    29.03*
                            (34.19)     (8.092)     (13.80)
Mean dep. var.              32.63       37.14       37.50
R-squared                   0.488       0.942       0.937
Number of events            28          8           15
N                           4,788       1,282       2,112

Note: Dependent variable is ridership per 1000 of population. Standard errors clustered at the city level in parentheses. All regressions include the following controls: linear and quadratic climate controls (temperature, vapor, cloud cover, precipitation, frost days), calendar month × city indicators, year × continent indicators, pre- and post-period indicators, city population and country GDP. Stars denote significance levels: * 0.10, ** 0.05, *** 0.01.

Bibliography

Abowd, J. M., R. H. Creecy, and F. Kramarz (2002): “Computing Person and Firm Effects Using Linked Longitudinal Employer-Employee Data,” Longitudinal Employer-Household Dynamics Technical Papers 2002-06, Center for Economic Studies, U.S. Census Bureau.

Abowd, J. M., F. Kramarz, and D. N. Margolis (1999): “High Wage Workers and High Wage Firms,” Econometrica, 67, 251–334.

Agarwal, R., R. Echambadi, A. Franco, and M. Sarkar (2004): “Knowledge Transfer Through Inheritance: Spin-Out Generation, Development, and Survival,” Academy of Management Journal, 47, 501–522.

Akbar, P. and G. Duranton (2017): “Measuring the cost of congestion in a highly congested city: Bogota,” Processed, University of Pennsylvania.

Anderson, M. L. (2014): “Subways, strikes, and slowdowns: The impacts of public transit on traffic congestion,” The American Economic Review, 104, 2763–2796.

Anderson, N. B. and A. D. Pape (2010): “A Reassessment of the 1970s Property Tax Revolt,” Tech. Rep. Working Paper, University of Illinois at Chicago.

Andrews, D. W. (1993): “Tests for parameter instability and structural change with unknown change point,” Econometrica: Journal of the Econometric Society, 821–856.


——— (2003): “Tests for parameter instability and structural change with unknown change point: A corrigendum,” Econometrica, 395–397.

Andrews, M. J., L. Gill, T. Schank, and R. Upward (2008): “High wage workers and low wage firms: negative assortative matching or limited mobility bias?” Journal of the Royal Statistical Society: Series A (Statistics in Society), 171, 673–697.

Angrist, J., V. Lavy, and A. Schlosser (2010): “Multiple Experiments for the Causal Link between the Quantity and Quality of Children,” Journal of Labor Economics, 28, 773–824.

Arceo, E., R. Hanna, and P. Oliva (2016): “Does the Effect of Pollution on Infant Mortality Differ Between Developing and Developed Countries? Evidence from Mexico City,” The Economic Journal, 126, 257–280.

Arcidiacono, P. and J. B. Jones (2003): “Finite Mixture Distributions, Sequential Likelihood and the EM Algorithm,” Econometrica, 71, 933–946.

Arcidiacono, P. and R. A. Miller (2011): “Conditional Choice Probability Estimation of Dynamic Discrete Choice Models With Unobserved Heterogeneity,” Econometrica, 79, 1823–1867.

Arrow, K. (1962): “Economic Welfare and the Allocation of Resources for Invention,” in The Rate and Direction of Inventive Activity: Economic and Social Factors, National Bureau of Economic Research, Inc, NBER Chapters, 609–626.

Autor, D. H., D. Dorn, and G. H. Hanson (2013): “The China Syndrome: Local Labor Market Effects of Import Competition in the United States,” American Economic Review, 103, 2121–68.

Baum-Snow, N. (2007): “Did Highways Cause Suburbanization?” Quarterly Journal of Economics, 122, 775–805.

Baum-Snow, N. and M. E. Kahn (2005): “Effects of Urban Rail Transit Expansions: Evidence from Sixteen Cities, 1970-2000,” Brookings-Wharton Papers on Urban Affairs: 2005, 1, 147–197.

Bender, S., N. Bloom, D. Card, J. V. Reenen, and S. Wolter (2016): “Management Practices, Workforce Selection and Productivity,” NBER Working Papers 22101, National Bureau of Economic Research, Inc.

Bernhofen, D., Z. El-Sahli, and R. Kneller (2016): “Estimating the Effects of the Container Revolution on World Trade,” Journal of International Economics, 98, 36–50.

Billings, S. B. (2011): “Estimating the value of a new transit option,” Regional Science and Urban Economics, 41, 525–536.

Bonhomme, S., T. Lamadon, and E. Manresa (2015): “A Distributional Framework for Matched Employer Employee Data,” 2015 Meeting Papers 1399, Society for Economic Dynamics.

Brauer, M., G. Freedman, J. Frostad, A. Van Donkelaar, R. V. Martin, F. Dentener, R. v. Dingenen, K. Estep, H. Amini, and J. S. Apte (2015): “Ambient air pollution exposure estimation for the global burden of disease 2013,” Environmental Science & Technology, 50, 79–88.

Bridgman, B. (2014): “Why Containerization Did Not Reduce Ocean Trade Shipping Costs,” Tech. rep., working Paper.

Broeze, F. (2002): The Globalisation of the Oceans: Containerisation From the 1950s to the Present, vol. 23 of Research in Maritime History, St. John’s, Newfoundland: International Maritime Economic History Association.

Burnett, R. T. et al. (2014): “An Integrated Risk Function for Estimating the Global Burden of Disease Attributable to Ambient Fine Particulate Matter Exposure,” Environmental Health Perspectives, 122, 397.

Cameron, A. C., J. B. Gelbach, and D. L. Miller (2008): “Bootstrap-based improvements for inference with clustered errors,” The Review of Economics and Statistics, 90, 414–427.

Campante, F. and D. Yanagizawa-Drott (2017): “Long-Range Growth: Economic Development in the Global Network of Air Links,” The Quarterly Journal of Economics, qjx050.

Card, D., A. R. Cardoso, J. Heining, and P. Kline (2016a): “Firms and Labor Market Inequality: Evidence and Some Theory,” NBER Working Papers 22850, National Bureau of Economic Research, Inc.

Card, D., A. R. Cardoso, and P. Kline (2016b): “Bargaining, Sorting, and the Gender Wage Gap: Quantifying the Impact of Firms on the Relative Pay of Women,” Quarterly Journal of Economics, 131, 633–686.

Card, D., J. Heining, and P. Kline (2013): “Workplace Heterogeneity and the Rise of West German Wage Inequality,” Quarterly Journal of Economics, 128, 967–1015.

Chang, T., J. G. Zivin, T. Gross, and M. Neidell (2016): “Particulate Pollution and the Productivity of Pear Packers,” American Economic Journal: Economic Policy, 8, 141–69.

Chatterji, A. K. (2009): “Spawned with a silver spoon? Entrepreneurial performance and innovation in the medical device industry,” Strategic Management Journal, 30, 185–206.

Chatterji, A. K., R. J. P. de Figueiredo Jr., and E. Rawley (2016): “Learning on the Job? Employee Mobility in the Asset Management Industry,” Management Science, 62, 2804–2819.

Chay, K. and M. Greenstone (2003): “The impact of air pollution on infant mortality,” Quarterly Journal of Economics, 118, 1121–1167.

Chen, Y. and A. Whalley (2012): “Green Infrastructure: The Effects of Urban Rail Transit on Air Quality,” American Economic Journal: Economic Policy, 4, 58–97.

Chinitz, B. (1960): Freight and the Metropolis: The Impact of America’s Transport Revolutions on the New York Region, Harvard University Press.

Coşar, K. and B. Demir (2018): “Shipping Inside the Box: Containerization and Trade,” Tech. Rep. Discussion Paper No. DP11750, CEPR.

Coletta, P. E. (1985): “Preface,” in United States Navy and Marine Corps Bases, Domestic, ed. by P. E. Coletta and J. K. Bauer, Greenwood Press.

Couture, V., G. Duranton, and M. A. Turner (2016): “Speed,” Processed, Brown University.

Currie, J. and M. Neidell (2005): “Air pollution and infant health: What can we learn from California’s recent experience?” Quarterly Journal of Economics, 120, 1003–1030.

Decker, R., J. Haltiwanger, R. Jarmin, and J. Miranda (2014): “The Role of Entrepreneurship in US Job Creation and Economic Dynamism,” Journal of Economic Perspectives, 28, 3–24.

DESA Population Division, U. (2014): “World Urbanization Prospects: the 2014 Revision, CD-ROM Edition.”

Dillon, E. W. and C. T. Stanton (2017): “Self-Employment Dynamics and the Returns to Entrepreneurship,” Working Paper 23168, National Bureau of Economic Research.

Donaldson, D. and R. Hornbeck (2016): “Railroads and American Economic Growth: A Market Access Approach,” The Quarterly Journal of Economics, 131, 799–858.

Duranton, G., P. M. Morrow, and M. A. Turner (2014): “Roads and Trade: Evidence from the US,” Review of Economic Studies, 81, 681–724.

Duranton, G. and M. Turner (2012): “Urban Growth and Transportation,” Review of Economic Studies, 79, 1407–1440.

Duranton, G. and M. A. Turner (2011): “The fundamental law of road congestion: Evidence from US cities,” American Economic Review, 101, 2616–2652.

——— (2017): “Urban form and driving,” Processed, Brown University.

Evans, D. S. and L. S. Leighton (1989): “Some Empirical Aspects of Entrepreneurship,” American Economic Review, 79, 519–35.

Feenstra, R., R. Inklaar, and M. Timmer (2015): “The Next Generation of the Penn World Table,” American Economic Review, 105, 3150–3182, available for download at www.ggdc.net/pwt Accessed: 2016-09-06.

Feyrer, J. (2009a): “Distance, Trade, and Income - The 1967 to 1975 Closing of the Suez Canal as a Natural Experiment,” NBER Working Papers 15557, National Bureau of Economic Research, Inc.

——— (2009b): “Trade and Income – Exploiting Time Series in Geography,” NBER Working Papers 14910, National Bureau of Economic Research, Inc.

Fort, T. C., J. Haltiwanger, R. S. Jarmin, and J. Miranda (2013): “How Firms Respond to Business Cycles: The Role of Firm Age and Firm Size,” Working Paper 19134, National Bureau of Economic Research.

Franco, A. and D. Filson (2006): “Spin-Outs: Knowledge Diffusion Through Employee Mobility,” RAND Journal of Economics, 37, 841–860.

Friedman, M. S., K. E. Powell, L. Hutwagner, L. M. Graham, and W. G. Teague (2001): “Impact of changes in transportation and commuting behaviors during the 1996 Summer Olympic Games in Atlanta on air quality and childhood asthma,” JAMA, 285, 897–905.

Garcia-Lopez, M.-A., I. Pasidis, and E. Viladecans-Marsal (2017): “Highway congestion and air pollution in Europe’s cities,” Working paper.

Gertler, M. and S. Gilchrist (1994): “Monetary Policy, Business Cycles, and the Behavior of Small Manufacturing Firms,” The Quarterly Journal of Economics, 109, 309–340.

Gibbons, S. and S. Machin (2005): “Valuing Rail Access Using Transport Innovations,” Journal of Urban Economics, 57, 148–169.

Gibson, M. and M. Carnovale (2015): “The effects of road pricing on driver behavior and air pollution,” Journal of Urban Economics, 89, 62–73.

Goetz, C., H. Hyatt, E. McEntarfer, and K. Sandusky (2016): “The Promise and Potential of Linked Employer-Employee Data for Entrepreneurship Research,” in Measuring Entrepreneurial Businesses: Current Knowledge and Challenges, National Bureau of Economic Research, Inc, 433–462.

Gompers, P., A. Kovner, J. Lerner, and D. Scharfstein (2010): “Performance persistence in entrepreneurship,” Journal of Financial Economics, 96, 18–32.

Gompers, P., J. Lerner, and D. Scharfstein (2005): “Entrepreneurial Spawning: Public Corporations and the Genesis of New Ventures, 1986 to 1999,” Journal of Finance, 60, 577–614.

Gonzalez-Navarro, M. and M. A. Turner (2016): “Subways and Urban Growth: Evidence from Earth,” Processed, Brown University.

Gupta, P., S. A. Christopher, J. Wang, R. Gehrig, Y. Lee, and N. Kumar (2006): “Satellite remote sensing of particulate matter and air quality assessment over global cities,” Atmospheric Environment, 40, 5880–5892.

Gutierrez, E. (2010): “Using satellite imagery to measure the relationship between air quality and infant mortality: an empirical study for Mexico,” Population and Environment, 31, 203–222.

Hall, R. E. and S. E. Woodward (2010): “The Burden of the Nondiversifiable Risk of Entrepreneurship,” American Economic Review, 100, 1163–1194.

Haltiwanger, J., R. Jarmin, and J. Miranda (2013): “Who Creates Jobs? Small versus Large versus Young,” Review of Economics and Statistics, 95, 347–361.

Haltiwanger, J., R. S. Jarmin, R. Kulick, and J. Miranda (2016): High Growth Young Firms: Contribution to Job, Output, and Productivity Growth, University of Chicago Press, 11–62.

Hamilton, B. H. (2000): “Does Entrepreneurship Pay? An Empirical Analysis of the Returns to Self-Employment,” Journal of Political Economy, 108, 604–631.

Hansen, B. E. (2000): “Testing for structural change in conditional models,” Journal of Econometrics, 97, 93–115.

Harris, I., P. Jones, T. Osborn, and D. Lister (2014): “Updated high-resolution grids of monthly climatic observations - the CRU TS3.10 Dataset,” International Journal of Climatology, 34(3), 623–642, available for download at https://crudata.uea.ac.uk/cru/data/hrg/ Accessed: 2015-11-20.

Heckman, J. and B. Singer (1984): “A Method for Minimizing the Impact of Distributional Assumptions in Econometric Models for Duration Data,” Econometrica, 52, 271–320.

Helpman, E. (1995): “The Size of Regions,” Papers 14-95, Tel Aviv.

Hincapié, A. (2017): “Where are the Young Entrepreneurs? A Study of Entrepreneurship over the Life Cycle,” Working paper.

Holmes, T. J. and E. Singer (2017): “Indivisibilities in Distribution,” Tech. Rep. Working Paper 739, Federal Reserve Bank of Minneapolis.

Horn, K., E. Wilson, K. Knight, and S. Durden (2010): “National Economic Development (NED) Manual For Deep Draft Navigation,” Tech. rep., U. S. Army Engineer Institute for Water Resources.

Hotz, V. J. and R. A. Miller (1993): “Conditional Choice Probabilities and the Estimation of Dynamic Models,” Review of Economic Studies, 60, 497–529.

Hsu, W.-T. and H. Zhang (2014): “The Fundamental Law of Highway Congestion: Evidence from National Expressways in Japan,” Journal of Urban Economics, 81, 65–76.

Hummels, D. (2007): “Transportation Costs and International Trade in the Second Era of Globalization,” Journal of Economic Perspectives, 21, 131–154.

Hummels, D. and G. Schaur (2013): “Time as a Trade Barrier,” American Economic Review, 103, 2935–2959.

Humphries, J. E. (2017): “The Causes and Consequences of Self-Employment over the Life Cycle,” Working paper.

Hurst, E. and B. W. Pugsley (2011): “What Do Small Businesses Do?” NBER Working Papers 17041, National Bureau of Economic Research, Inc.

Hurst, E. G. and B. W. Pugsley (2015): “Wealth, Tastes, and Entrepreneurial Choice,” in Measuring Entrepreneurial Businesses: Current Knowledge and Challenges, National Bureau of Economic Research, Inc, NBER Chapters.

Iyigun, M. F. and A. L. Owen (1998): “Risk, Entrepreneurship, and Human-Capital Accumulation,” The American Economic Review, 88, 454–457.

Jacob, D. (1999): Introduction to atmospheric chemistry, Princeton University Press.

Jayachandran, S. (2009): “Air quality and early-life mortality evidence from Indonesia’s wildfires,” Journal of Human Resources, 44, 916–954.

Kasahara, H. and K. Shimotsu (2009): “Nonparametric Identification of Finite Mixture Models of Dynamic Discrete Choices,” Econometrica, 77, 135–175.

Keane, M. P. and K. I. Wolpin (1997): “The Career Decisions of Young Men,” Journal of Political Economy, 105, 473–522.

Kendall, L. (1986): The Business of Shipping, Cornell Maritime Press, fifth ed.

Kihlstrom, R. E. and J.-J. Laffont (1979): “A General Equilibrium Entrepreneurial Theory of Firm Formation Based on Risk Aversion,” Journal of Political Economy, 87, 719–48.

Knittel, C. R., D. L. Miller, and N. J. Sanders (2016): “Caution, drivers! Children present: Traffic, pollution, and infant health,” Review of Economics and Statistics, 98, 350–366.

Kovak, B. K. (2013): “Regional Effects of Trade Reform: What Is the Correct Measure of Liberalization?” American Economic Review, 103, 1960–1976.

Kuby, M. and N. Reid (1992): “Technological Change and the Concentration of the U.S. General Cargo Port System: 1970-88,” Economic Geography, 68, 272–289.

Kumar, N., A. Chu, and A. Foster (2007): “An empirical relationship between PM2.5 and aerosol optical depth in Delhi Metropolitan,” Atmospheric Environment, 41, 4492–4503.

Kumar, N., A. D. Chu, A. D. Foster, T. Peters, and R. Willis (2011): “Satellite Remote Sensing for Developing Time and Space Resolved Estimates of Ambient Particulate in Cleveland, OH,” Aerosol Science and Technology, 45, 1090–1108.

Lafontaine, F. and K. Shaw (2016): “Serial Entrepreneurship: Learning by Doing?” Journal of Labor Economics, 34, 217–254.

Lazear, E. P. (2005): “Entrepreneurship,” Journal of Labor Economics, 23, 649–680.

Levine, R. and Y. Rubinstein (forthcoming): “Smart and Illicit: Who Becomes an Entrepreneur and Do They Earn More?” Quarterly Journal of Economics.

Levinson, M. (2008): The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger, Princeton, NJ: Princeton University Press.

Levy, R. and C. Hsu et al. (2015a): “MODIS Aqua L2 Aerosol Product, NASA MODIS Adaptive Processing System, Goddard Space Flight Center, USA,” http://dx.doi.org/10.5067/MODIS/MYD04_L2.006, accessed: 2016-01-15.

——— (2015b): “MODIS Terra L2 Aerosol Product, NASA MODIS Adaptive Processing System, Goddard Space Flight Center, USA,” http://dx.doi.org/10.5067/MODIS/MOD04_L2.006, accessed: 2016-01-15.

Levy, R. C., S. Mattoo, L. A. Munchak, L. A. Remer, A. M. Sayer, F. Patadia, and N. C. Hsu (2013): “The Collection 6 MODIS aerosol products over land and ocean,” Atmospheric Measurement Techniques, 6, 2989–3034.

Magnac, T. and D. Thesmar (2002): “Identifying Dynamic Discrete Decision Processes,” Econometrica, 70, 801–816.

Mendershausen, H. (1950): “Dollar Shortage and Oil Surplus in 1949-1950,” Tech. Rep. No. 11, International Finance Section, Department of Economics and Social Institutions, Princeton University.

Michaels, G. (2008): “The effect of trade on the demand for skill: Evidence from the Interstate Highway System,” Review of Economics and Statistics, 90, 683–701.

Moretti, E. (2004): “Human Capital Externalities in Cities,” in Handbook of Regional and Urban Economics, ed. by V. Henderson and J. Thisse, Elsevier.

——— (2011): “Local Labor Markets,” in Handbook of Labor Economics, ed. by D. Card and O. Ashenfelter, Elsevier, vol. 4, 1237–1313.

Morgan, F. W. (1952): Ports and Harbours, London, UK: Hutchinson’s University Library.

Mortensen, D. T. (2005): Wage Dispersion: Why Are Similar Workers Paid Differently?, vol. 1 of MIT Press Books, The MIT Press.

Murray, C. J. L. (2016): “Global, regional, and national comparative risk assessment of 79 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990-2015,” Lancet, 388, 1659–1724.

National Geospatial-Intelligence Agency (1953): “World Port Index, 1953,” H.O. Pub. No. 950, First Edition.

——— (2015): “World Port Index, 2015,” Twenty-Fourth Edition.

Pascali, L. (2017): “The Wind of Change: Maritime Technology, Trade, and Economic Development,” American Economic Review, 107, 2821–54.

Rappaport, J. and J. D. Sachs (2003): “The United States as a Coastal Nation,” Journal of Economic Growth, 8, 5–46.

Redding, S. and D. M. Sturm (2008): “The Costs of Remoteness: Evidence from German Division and Reunification,” American Economic Review, 98, 1766–1797.

Redding, S. J. and E. Rossi-Hansberg (2017): “Quantitative Spatial Economics,” Annual Review of Economics, 9, 21–58.

Redding, S. J. and M. A. Turner (2015): “Transportation Costs and the Spatial Organization of Economic Activity,” in Handbook of Regional and Urban Economics, ed. by G. Duranton and W. Strange, Elsevier B. V.

Remer, L. A., S. Mattoo, R. C. Levy, and L. A. Munchak (2013): “MODIS 3km aerosol product: algorithm and global perspective,” Atmospheric Measurement Techniques, 6, 1829–1844.

Rodrigue, J.-P. (2017): The Geography of Transport Systems, Hofstra University, Department of Global Studies and Geography.

Romer, D. H. and J. A. Frankel (1999): “Does Trade Cause Growth?” American Economic Review, 89, 379–399.

Rosen, S. (1972): “Learning and Experience in the Labor Market,” The Journal of Human Resources, 7, 326–342.

Roy, A. D. (1951): “Some Thoughts on the Distribution of Earnings,” Oxford Economic Papers, 3, 135–146.

Rua, G. (2014): “Diffusion of Containerization,” Finance and Economics Discussion Series 88, Board of Governors of the Federal Reserve System (U.S.).

Rust, J. (1994): “Chapter 51 Structural estimation of Markov decision processes,” Handbook of Econometrics, 4, 3081–3143.

Sanderson, E. and F. Windmeijer (2016): “A weak instrument F-test in linear IV models with multiple endogenous variables,” Journal of Econometrics, 190, 212–221.

Sargent, A. J. (1938): Seaports and Hinterlands, London, UK: Adam and Charles Black.

Schoar, A. (2010): The Divide between Subsistence and Transformational Entrepreneurship, University of Chicago Press, 57–81.

Scott, P. T. (2013): “Dynamic Discrete Choice Estimation of Agricultural Land Use,” Working paper.

Serafinelli, M. (2015): “Good Firms, Worker Flows and Local Productivity,” Working Papers tecipa-538, University of Toronto, Department of Economics.

Song, J., D. J. Price, F. Guvenen, N. Bloom, and T. von Wachter (2015): “Firming Up Inequality,” NBER Working Papers 21199, National Bureau of Economic Research, Inc.

Syverson, C. (2011): “What Determines Productivity?” Journal of Economic Literature, 49, 326–365.

Topalova, P. (2010): “Factor Immobility and Regional Impacts of Trade Liberalization: Evidence on Poverty from India,” American Economic Journal: Applied Economics, 2, 1–41.

Traiberman, S. (2016): “Occupations and Import Competition: Evidence from Denmark,” Working paper.

United Nations Conference on Trade and Development (2013): Review of Maritime Transport, New York, NY: United Nations.

U.S. Department of the Navy (1952): “Catalog of Naval Shore Activities,” Tech. Rep. OPNAV P213-105.

——— (1959): “Catalog of Naval Shore Activities,” Tech. Rep. OPNAV P09B23-105.

Viscusi, W. K. and J. E. Aldy (2003): “The value of a statistical life: a critical review of market estimates throughout the world,” Journal of Risk and Uncertainty, 27, 5–76.

Voith, R. (1997): “Fares, Service Levels, and Demographics: What Determines Commuter Rail Ridership in the Long Run?” Journal of Urban Economics, 41, 176–197.

Vollmer Associates, LLP, SYSTRA Consulting, Inc., and Allee King Rosen & Fleming, Inc. (2011): “Manhattan East Side Transit Alternatives (MESA)/Second Avenue Subway Summary Report,” MTA New York City Transit.

Wilson, R. P. (1982): “The Containerization of Ports,” UC Santa Cruz Working Paper.

World Bank Group and United Nations Industrial Development Organization (1999): Pollution prevention and abatement handbook, 1998: toward cleaner production, World Bank Publications.

World Health Organization (2006): “WHO Air Quality Guidelines,” Geneva: World Health Organization.

——— (2016a): “Global Urban Ambient Air Pollution Database,” http://www.who.int/phe/healt_topics/outdoorair/databases/cities/en/, accessed: 2017-04-02.

——— (2016b): “Ambient Air Pollution: A Global Assessment of Exposure and Burden of Disease.”

——— (2016c): “Global Health Estimates 2015: Deaths by Cause, Age, Sex, by Country and by Region, 2000-2015.”

Zarutskie, R. and T. Yang (2016): “How Did Young Firms Fare during the Great Recession? Evidence from the Kauffman Firm Survey,” in Measuring Entrepreneurial Businesses: Current Knowledge and Challenges, National Bureau of Economic Research, Inc, NBER Chapters, 253–290.