Essays on Gender Differences in Training, Incentives and Creativity, Survey Response, and Competitive Balance and Sorting in Football

Inaugural dissertation submitted in fulfillment of the requirements for the doctoral degree at the Faculty of Economics and Behavioral Sciences (Wirtschafts- und Verhaltenswissenschaftliche Fakultät) of the Albert-Ludwigs-Universität Freiburg i. Br.

submitted by

Arne Jonas Warnke

born in Kiel

Winter semester 2016/17
Dean: Prof. Dr. Alexander Renkl

Faculty of Economics and Behavioral Sciences

First examiner: Prof. Bernd Fitzenberger, Ph.D.
Second examiner: Prof. Dr. Stephan Lengsfeld

Date of the doctoral degree decision: 10 May 2017

Acknowledgements

I am indebted to many people who have helped and encouraged me in completing this dissertation. First of all, I want to thank Bernd Fitzenberger for his supervision and his continuous support over the last years. I greatly benefited from his constant advice and his helpful ideas. His excellent lectures on labor economics and microeconometrics gave me a profound foundation for this dissertation. I also want to thank my second supervisor, Stephan Lengsfeld, and Lars Feld as third dissertation examiner. In the course of this dissertation, I was very fortunate to work together with Christiane Bradler, Susanne Neckermann, Roman Sittl and, in particular, Susanne Steffes. These great collaborations were always very professional and instructive, yet at the same time enjoyable and pleasant. I also thank Francesco Berlingieri, Martin Kiefel, François Laisney and Thomas Zwick for many discussions about (personnel) economics, econometrics and statistics. I owe thanks to many colleagues at the Centre for European Economic Research (ZEW) for their support and encouragement. Finally, my biggest thanks go to my family. Rebecca always supported me during and well beyond the preparation of this PhD thesis. My parents and my family instilled in me curiosity and a desire to learn.

Contents

1 General Introduction
1.1 Summaries of the Chapters

2 Incentivizing Creativity
2.1 Introduction
2.2 The Experiment
2.3 Main Results
2.4 Supplementary Investigations
2.5 Conclusion
2.6 Appendix

3 Competitive Balance and Assortative Matching
3.1 Introduction and Literature
3.2 Background
3.3 Data and Empirical Framework
3.4 Empirical Results
3.5 Conclusion
3.6 Appendix

4 New Evidence on Firm-Based Training
4.1 Introduction
4.2 Research Questions
4.3 Data
4.4 Methods
4.5 Results
4.6 Conclusions
4.7 Appendix

5 Gender Differences in Wages and Training
5.1 Introduction
5.2 Literature
5.3 Data
5.4 Results for Training
5.5 Results for Wages
5.6 Conclusions
5.7 Appendix

6 Linkage Consent Bias
6.1 Introduction
6.2 Literature
6.3 Research Questions
6.4 Data
6.5 Predictors of Linkage Consent and Establishment Heterogeneity
6.6 Bias in Economic Models
6.7 Conclusions
6.8 Appendix

References

Bibliography of the Chapters

2 1. GENERAL INTRODUCTION 3

1 General Introduction

Interactions between firms and workers play a central role in our economy. The vast majority of workers are employees who receive a salary, and their interests must be aligned with those of their employers. Incentives such as bonuses, which motivate workers and bolster their performance, play an important role here. Firms and workers are rarely bound by mutual affection; more often, they collaborate in pursuit of their individual goals. A worker's primary aim, for instance, might be to secure a regular income without exerting too much effort. In contrast, a firm wants its workers to perform as effectively as possible. Each party behaves strategically, prioritizing its own interests over mutual benefit. This can create principal-agent problems if, for example, the actions of a worker are difficult to monitor. Understanding this relationship and designing effective incentive structures are major concerns in personnel economics, both for theory and for empirical research. The importance of research into the firm-worker relationship has been underlined by the award of the 2016 Nobel Memorial Prize in Economic Sciences to Oliver Hart and Bengt Holmström. Holmström's research on the “informativeness principle”, for example, deals with principal-agent theory and has strongly influenced personnel economics. According to Lazear and Gibbs (2009), the employment relationship is “one of the most complex types of economic transactions in the economy”. Among other things, the complexity of relevant research stems from the conflicting interests in this relationship, from information asymmetries and from the difficulty of writing a contract which covers all features of a given job. Furthermore, whilst the interaction between firms and workers is shaped by many factors, only a few of these can be observed by the researcher.
This illustrates that empirical research into the interaction between firms and workers requires both innovative data and novel statistical and econometric methods. This thesis provides new insights into this interaction. In the following five chapters, we focus on different aspects of the employment relationship and on various methodological issues relevant to research in personnel economics. We address “incentives, matching firms with workers, compensation [and] skill development”, four of the “five aspects of employment relationships” identified by Lazear and Oyer (2014). In addition, we touch on the fifth of these aspects, “the organization of work”. In the first chapter, we compare how different incentive schemes affect performance in a simple and a creative task. The second chapter considers the role of player mobility in competitive balance in a sports market. The third chapter explains the importance of firms' and workers' observable and unobservable characteristics in determining an individual's participation in training, whereas Chapter four considers the reasons for differences in training participation between male and female workers, as well as the impact of such differences on wages. Finally, the fifth chapter explores how linking matched employer-employee survey data to social security records might lead to new forms of non-response. In addressing these aspects, we make several contributions. Firstly, whilst these chapters have a largely empirical focus, they build on fundamental theoretical work, not only from economics but also from disciplines such as psychology, sociology and statistics. Theoretical models help us to understand the relationship between firms and workers. One example is the well-known human capital theory.
This theory indicates that education and training can be viewed like any other form of investment. Such investments can raise productivity and, in turn, wages. Aside from increasing productivity, however, there may be other reasons for an individual to undertake further education and participate in training courses. An employee might, for instance, use human capital investments to signal private information to the firm. This is discussed in detail in Chapter four. Empirical research is essential as it allows us to test theoretical predictions against real-world data. This is a primary objective of this thesis. Secondly, innovative datasets for the empirical analysis of the employment relationship must allow the behavior of the worker and of the firm to be observed simultaneously. In this thesis, we therefore use novel datasets which link the perspectives of both parties. These include primary data collected by the author and his co-authors, as well as secondary datasets which have been made available to the author. The secondary data includes matched employer-employee survey data that allows us to understand and explain the behavior of firms and workers. Rich survey data helps us shed light on the drivers of gender differences in training (see Chapter four). This data is also merged with social security records, allowing us to model long-run employment biographies and to measure the business success of a firm (see Chapters three and four). Chapter five explores potential issues regarding the linkage of survey data with social security records. The primary data comes from a large-scale laboratory experiment with more than 1,000 participants. In Chapter one, random assignment to treatment and control groups allows us to measure causal effects directly, which is much more difficult with non-randomized data. Chapter two uses novel information from sports competitions.
Like laboratory experiments, sports markets allow us to directly observe the productivity of individual workers. We are therefore able to analyze mechanisms which remain unobserved in standard datasets. Thirdly, the evaluation of complex employment relationships requires appropriate statistical techniques which can deal with hierarchical and bilateral interactions. The firm-worker relationship is hierarchical in two senses: firstly, it usually involves multiple workers at different firms, and secondly, it is not static but evolves over time. We contribute to research on the firm-employee relationship in several ways. First, we propose novel methods, including multilevel models and machine learning algorithms, that are well-suited to this purpose but have so far not been widely used in personnel economics. Second, we look at firm, worker and job variables and take the clustering of workers in firms into account, thereby acknowledging the particular complexity of the interaction. Furthermore, the generalizability of novel datasets which merge survey data with social security records may be reduced due to “non-consent bias”. If respondents with certain characteristics tend not to provide linkage consent, this novel form of non-response could make a merged survey sample less representative of the population. We discuss this issue in the final chapter. Fourthly, it is important that existing knowledge is reconsidered in the light of new findings. The technological advances of recent decades, for example, have increased demand for workers who can carry out complex, non-routine tasks (Autor et al., 2003). These technological changes have thus had a profound effect on the labor market. Creative tasks, for example, have become more important.
This class of tasks is often said to be associated with intrinsic motivation, as individuals are genuinely interested in performing them. According to motivational crowding-out theory, monetary incentives may undermine intrinsic motivation. Standard economic theory has often abstracted from the question of whether a task is intrinsically motivating or not. In the first chapter, we therefore consider the extent to which we can justifiably generalize from simple to more complex tasks. Furthermore, complex, non-routine tasks require workers to keep updating their skills and to adapt to changes in skill demand. Lifelong learning in the form of job-related training courses is an important way in which employees can acquire new skills and preserve existing ones. If policy-makers wish to foster lifelong learning, they must know who already receives training investments, and who does not (see European Commission, 2015). Perhaps most importantly, they must know whether the behavior of firms or of workers is the main source of unequal training provision. We discuss this topic in Chapter three and touch on it in Chapter four. Finally, we place particular importance on ensuring the credibility of the results presented in this thesis. Whenever possible, we provide extensive robustness checks. In Chapter one, for example, in which we work with fully randomized data, we start with a simple non-parametric rank test before moving on to linear regression models. In Chapters two and five, we compare the results from standard regression approaches to those from regularized regression models, which largely confirm our previous results. In the case of the lasso approach used in Chapter five, the differences we detect between the two methods provide new, valuable information. In Chapter three, we show that our main results remain unchanged regardless of whether we consider the incidence or the intensity of training.
In Chapter four, where we aim to measure the wage effects of training, we show that the results are the same for the full sample and for a subsample of individuals who participated in training or intended to do so. Finally, we successfully replicate some of the main findings from Chapter three, where we look at the role of workers and firms in determining participation in training, using a new sample which was not previously available. Our findings are, of course, relevant for researchers in personnel economics. They are, however, also relevant for economists working in labor and education economics. Indeed, the relevance of the results is not limited to economics. In Chapter one, for example, where we look at crowding-out effects, we borrow heavily from the psychology literature. Our results may also be of interest to human resource managers who are contemplating a new incentive structure. Sociologists may be interested in the results presented in Chapters three and four, where we look at participation in training and inequality in training attendance. Chapter five is highly relevant for researchers working with interview data, survey methodologists and practitioners from polling institutes.

1.1 Summaries of the Chapters

Incentivizing Creativity: a Large-Scale Experiment with Tournaments and Gifts

In the first chapter, we investigate whether motivating workers' creative performance requires a different approach from motivating their performance in routine tasks. Whilst various studies have analyzed which incentives increase performance in routine tasks, little is yet known about which incentives foster performance in creative tasks. This chapter therefore deals with the first three of the five aspects of employment relationships: incentives, matching firms with workers, and compensation. In an experiment with more than 1,100 participants, randomly assigned to treatment and control groups, we mimic an employer-employee relationship for a simple, routine task and for a creative task. We apply the same incentive structures to both types of task, which enables us to see whether the response to treatment depends on the nature of the task. We look first at a tournament in which the individual pay-off depends on performance relative to the other participants. It is widely accepted that in the case of routine tasks, financial incentives have a positive effect on performance, provided that the marginal pay-off exceeds the marginal cost of the worker's effort. We question whether this applies to creative tasks, which may be prone to crowding-out of intrinsic motivation. We do not restrict our analysis to performance-dependent incentives: we believe this is the first study to look at the performance effect of a wage gift for creative tasks. Behavioral studies have shown that such incentives, which are independent of performance, can trigger reciprocity. It is not known, however, whether reciprocal behavior extends in a similar way to creative tasks. Our results show that the performance-dependent tournament enhances performance for both types of task. The effect is economically large and almost identical for routine and creative tasks.
This indicates that crowding-out does not undermine performance-contingent incentives in this setting. In an additional treatment, we also show that the increase in performance is mostly due to the financial incentive itself. The tournament also gives participants information about their performance relative to others; this feedback, however, explains only one fourth of the total effect. In the case of the performance-independent wage gift, the results for the simple and the creative task differ. While this treatment increases performance in the simple task, we do not find any such effect for the creative task. We show that this asymmetry can be explained by the greater uncertainty, in the creative task, about the link between the worker's effort and the employer's profit. This uncertainty is a key element of creative tasks. When we eliminate it in a further treatment, we find similar results for both types of task. These results are highly relevant for both practitioners and researchers in human resource management. We show that incentive schemes can stimulate performance in creative tasks which are associated with intrinsic motivation. Results from routine tasks cannot, however, be transferred directly to more complex tasks.

Competitive Balance and Assortative Matching in the German Bundesliga

In Chapter two, we consider the sports industry to directly address the aspect of the employment relationship ‘matching firms with workers’. Like laboratory experiments, the sports market allows us to observe the productivity of individual workers. The analysis of such environments is therefore of interest to economists wishing to analyze mechanisms which are usually difficult to observe in other data. These mechanisms include, for example, the effect of worker mobility on the degree of competition in a market. Furthermore, the sports industry is relevant to economic theory because these markets are characterized by particular labor demand externalities. As famously illustrated by the “Louis-Schmeling Paradox”, sports spectators value uncertainty about the outcome of matches or championships: in order to provide an entertaining fight, Louis needs Schmeling, and vice versa. Theory, however, gives ambiguous predictions as to whether competitive balance is maintained in equilibrium. It is therefore an empirical question whether unpredictability declines over time in sports markets. In order to provide new insights into this question, we analyze trends in competitive balance in the top two tiers of the German Bundesliga over the last two decades. We suggest that analyzing the distribution of player talent across clubs over time can reveal changes in competitive balance: whether better players increasingly play for better teams serves as a measure of such changes. It is, of course, notoriously difficult to measure the productivity of an individual player. We therefore take a novel approach, using the partial correlation of each player with the goal margin as a measure of player productivity. Using standard regression methods along with newer techniques from the machine learning literature, we show that the distribution of player talent across clubs has become increasingly unequal over time. Better players increasingly tend to play for better teams. These trends indicate that competitive balance has decreased in the German Bundesliga over the last two decades. We show that this trend can be observed both within the first Bundesliga and between the two top tiers of the Bundesliga. We do not, however, observe changes in competitive balance within the second Bundesliga. We further demonstrate that player transfers between clubs in the Bundesliga result in a positive correlation between player talent and team strength. These domestic transfers do not, however, explain the reduction in competitive balance over time. The transfer of players from abroad, as well as the retention of player talent at better clubs, also plays an important role in driving changes in competitive balance over time.
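The sorting idea, whether better players increasingly play for better teams, can be sketched as a per-season correlation. The code below is a hypothetical illustration under stated assumptions, not the chapter's estimator: it takes each player's talent score as already estimated (in the chapter, from the player's partial correlation with the goal margin), proxies team strength by the mean talent of the player's teammates excluding the player, and the names `pearson` and `sorting_correlation` are invented here. Rising values across seasons would indicate stronger sorting, i.e. declining competitive balance.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def sorting_correlation(roster: list[tuple[str, float]]) -> float:
    """Hypothetical sorting measure for one season.

    roster: (club, talent) pairs for all players in the season. Talent scores
    are assumed to be estimated beforehand; players without teammates are skipped.
    Returns corr(own talent, leave-one-out mean talent of teammates).
    """
    total = defaultdict(float)
    size = defaultdict(int)
    for club, talent in roster:
        total[club] += talent
        size[club] += 1
    own, teammates = [], []
    for club, talent in roster:
        if size[club] < 2:
            continue
        own.append(talent)
        # Leave-one-out mean: team strength without the player in question
        teammates.append((total[club] - talent) / (size[club] - 1))
    return pearson(own, teammates)


# Hypothetical toy seasons with two clubs of two players each
mixed_season = [("A", 3.0), ("A", 1.0), ("B", 3.0), ("B", 1.0)]
sorted_season = [("A", 3.0), ("A", 3.0), ("B", 1.0), ("B", 1.0)]
print(sorting_correlation(mixed_season), sorting_correlation(sorted_season))  # -1.0 1.0
```

Excluding the player avoids a mechanical positive correlation, but in small squads the leave-one-out mean induces a mechanical negative one, as the mixed toy season shows. Comparing the measure across seasons, rather than interpreting its level, is therefore the more informative use.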

New Evidence on the Determinants of Firm-based Training

It is well known from economic theory that both firms and workers invest in job-related training. These two parties might make such investments jointly, or either the worker or the firm may invest independently. Given that firms and workers are heterogeneous, however, one must ask how exactly participation in training is distributed among workers within a firm and across firms. To the best of our knowledge, this question has not yet been analyzed in any depth, because answering it requires detailed matched data from multiple firms covering a sufficient number of workers over time. Answering this question will shed further light on the employment aspect ‘matching firms with workers’. Furthermore, formal training is an important channel for developing new skills, another important research theme in personnel economics identified by Lazear and Oyer (2014). In the third chapter, we investigate participation in job-related training in general and distinguish between training courses financed by the firm and those financed by the worker. We consider not only the incidence but also the intensity of training. We first examine how equally or unequally training participation is distributed across firms and between co-workers of the same firm. We suggest a novel method based on hierarchical or multilevel models to estimate the distribution of participation in training. Among other things, these findings afford us insights into differences in firms' training policies. We show that average training rates do indeed vary between firms. Differences in training attendance between workers in the same firm, however, are much larger. The distribution of participation in training courses is thus to some extent unequal, but this inequality is much more pronounced between individual workers than between firms.
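Splitting training participation into a between-firm and a within-firm component can be illustrated with a simple variance decomposition. The sketch below is a hypothetical illustration, not the chapter's multilevel estimator: it substitutes a one-way random-effects ANOVA (method-of-moments) estimator for balanced data, treats 0/1 training incidence as a continuous outcome, and the function names `variance_components` and `between_firm_share` are invented for this example. The between-firm share it returns is the intraclass correlation; a low value corresponds to the pattern described above, in which co-workers differ much more than firms do.

```python
from statistics import mean


def variance_components(firms: list[list[float]]) -> tuple[float, float]:
    """Return (between-firm, within-firm) variance for balanced groups.

    firms: one list of worker-level outcomes (here 0/1 training incidence)
    per firm; every firm must contain the same number n >= 2 of workers.
    Hypothetical helper: one-way random-effects ANOVA, method of moments.
    """
    k, n = len(firms), len(firms[0])
    if k < 2 or n < 2 or any(len(f) != n for f in firms):
        raise ValueError("need >= 2 firms, each with the same number (>= 2) of workers")
    firm_means = [mean(f) for f in firms]
    grand_mean = mean(firm_means)  # equals the overall mean with balanced groups
    # Mean squares between and within firms
    ms_between = n * sum((m - grand_mean) ** 2 for m in firm_means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for f, m in zip(firms, firm_means) for x in f
    ) / (k * (n - 1))
    # Between-firm component; sampling noise can push the raw estimate below zero
    sigma2_between = max((ms_between - ms_within) / n, 0.0)
    return sigma2_between, ms_within


def between_firm_share(firms: list[list[float]]) -> float:
    """Intraclass correlation: share of total variance lying between firms."""
    between, within = variance_components(firms)
    total = between + within
    return between / total if total > 0 else 0.0


# Hypothetical toy data: each inner list holds the 0/1 training incidence
# of the workers in one firm.
firms = [[1, 1, 1, 0], [0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 0, 1]]
print(round(between_firm_share(firms), 3))  # 0.077 (= 1/13)
```

For unbalanced firm sizes, or to condition on covariates such as education or job tasks, one would fit a proper multilevel (mixed) model instead. With a binary outcome, a multilevel logit reports the between-firm share on a latent scale, so its values are not directly comparable to this linear sketch.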
We go on to provide possible explanations for these differences in participation in training, considering various predictors of training attendance discussed in the training literature. These include the worker's level of formal education and the sector in which a firm operates. We also incorporate new variables which have been highlighted more recently by economic theory. These predictors include the degree of wage compression, which may serve as a proxy for a firm's monopsony power. Including these novel variables is only possible through the linkage of survey data with administrative social security records. We further highlight the role of job content, that is, the tasks performed by an individual worker in the workplace, in determining participation in training. Our findings show that once we take heterogeneity in the workforce or in job composition into account, firms often invest in training to a similar extent. This does not apply to training levels between individual workers in a given firm. Although we are able to explain approximately one third of the differences in training participation within a firm, large differences between co-workers remain even after conditioning on socio-demographic characteristics or job requirements. These differences remain stable for a period of at least four to five years. Finally, we give some insight into the interrelation between firm-financed and worker-financed training. This is then explored in more detail in Chapter four with regard to differences in training participation between male and female workers. Knowledge about the distribution of training between and within firms has important implications for policy-makers and researchers. Policy-makers can use this information when considering which firms and workers to approach in order to increase participation in training. Researchers who wish to establish causal effects of training participation, for example on wages or employment (see Chapter four), gain information about the strength of firms' and workers' selection into training.

Gender Differences in Wages and Training

Whilst we examine the distribution of participation in training between firms and workers in Chapter three, the subsequent chapter focuses on gender differences in training participation. Chapter four is not, however, limited to an analysis of the determinants of participation in training; emphasis is also placed on the question of whether differences in training participation give rise to wage differences between male and female workers. Although a broad literature has investigated gender-specific returns to formal education, little research has as yet considered whether wage returns to training differ between male and female workers. For the purposes of analysis, we distinguish between firm- and worker-sponsored training. We show that male workers participate much more often than their female peers in training which has been initiated by the firm and which takes place entirely during working hours. In contrast, female workers attend training courses which they initiate themselves and which overlap with their leisure time almost twice as often as their male colleagues. In order to explain these differences, we look at the role of firm and occupational segregation, statistical discrimination, preferences and unobserved productivity. Whilst no single factor explains gender differences in training participation, all of these factors seem to play some role. Occupational segregation, for example, helps to explain training differences between male and female workers, as women are more often employed in jobs which require more training. These training courses, however, are mostly initiated by the workers themselves and take place at least partially during their leisure time.
The gender gap in firm-sponsored training is driven by less productive workers, whereas the opposite gap in worker-sponsored training is driven by highly productive workers. This finding gives us some reason to assume that statistical discrimination plays a role here, as less productive women have on average lower labor attachment than less productive men. It also points to a double standard in promotion practices, whereby women must be more productive than men to achieve an equivalent promotion. Furthermore, both findings suggest that participation in training may be used as a signal to the firm. In view of statistical discrimination, for example, female workers with a higher than average degree of labor attachment may want to signal this, and their job engagement, to the firm. We also propose a novel method for measuring the wage returns to participation in training. Our results point to small wage returns to firm-sponsored training and large returns to worker-sponsored training. On the one hand, female workers participate less than male workers in firm-sponsored training, but they may see greater wage returns to attending such courses than their male colleagues. On the other hand, although women participate more often in worker-sponsored training, they may gain less from these courses. These effects offset each other, and we therefore do not find that gender differences in training participation have an overall effect on the gender wage gap. This chapter touches on all five aspects of the employment relationship. Statistical discrimination, for example, may create strong disincentives to invest in training, and firm and job segregation are important elements of ‘the organization of work’. All in all, our results show that considerable differences remain between male and female workers in the labor market, not only in terms of wages but also with regard to participation in training.
We discuss various economic theories which give some insight into the mechanisms behind these gender differences.

An Investigation of Record Linkage Refusal and Its Implications for Empirical Research

Personnel economists and many other researchers increasingly rely on social security data to answer questions regarding the labor market. Survey data, which provide a variety of additional information, for example about individual preferences, are ever more frequently merged with administrative records. We have used such merged datasets in Chapters three and four. Data privacy laws require that individual consent is obtained before survey data can be linked to administrative sources. Merged datasets will therefore not include information about respondents who do not grant linkage consent. In Chapter five, we investigate predictors of linkage consent and its significance for applied research. To this end, we compare two German linked employer-employee datasets which have been merged with social security data. The literature on linkage consent has so far provided inconsistent findings. Existing studies have, for example, found both statistically significant positive and negative associations between linkage consent and characteristics such as age or education. We seek to provide new insight into possible reasons for these inconsistencies. We show that findings based on the two datasets are generally similar. There are, however, contrasting results for non-cognitive skills such as personality traits, which we are unable to explain. This is the first study to show that co-workers in a given firm tend to make the same decision as to whether or not to provide linkage consent (coming back to Lazear and Oyer's ‘matching firms with workers’). This dependency is, however, not very large. We find that individuals in East Germany provide consent more often than workers in West Germany, which explains almost all of the association between co-workers. Finally, this study is the first to replicate economic research for the group of respondents who did not provide linkage consent.
In Chapter three, we used only information about respondents who consented to data linkage. In Chapter five, however, in order to assure the reader of the credibility of our results, we replicate the analysis of Chapter three using all individuals, irrespective of whether or not they consented to data linkage. In addition, we consider whether findings regarding the association between wages, on the one hand, and participation in training, education and other variables, on the other hand, depend on the sample used. Our replications show minor differences between the sample of respondents who did not provide linkage consent and the group of workers who did consent to data linkage. The general findings are, however, confirmed. We therefore reach a positive conclusion: samples which do not include respondents who refused data linkage are nonetheless representative of the population as a whole.

2 Incentivizing Creativity: a Large-Scale Experiment with Tournaments and Gifts

Christiane Bradler (Volkswagen AG) Susanne Neckermann (University of Chicago) Arne Jonas Warnke (Centre for European Economic Research, ZEW)

JEL-Classification: C91, D03, J33, M52 Keywords: Creativity, Incentives, Tournament, Reciprocity, Experiment, Crowding-out

Acknowledgements: We gratefully acknowledge financial support from Germany's Federal Ministry of Education and Research framework program “Economics of Science.” Further, we thank the SEEK research program for its financial support, as well as the many seminar and conference participants whose valuable contributions made this paper much better. Special thanks go to Dirk Engelmann, Michael Kosfeld, François Laisney, Steve Levitt, John Morgan, Nick Zubanov, and Thomas Zwick. We thank Christian Bommer, Michael Dörsam, Radost Holler, Sascha Lehman, Johannes Moser, Vera Schmitz, Jan Schneiderwind, Mattie Thoma, Jeannine van Reeken, and Timo Vogelsang for outstanding research assistance. This paper was previously circulated as “Creativity is different: Comparing rewards across a creative and a routine task” and “Rewards and Performance: A Comparison Across a Creative and a Routine Task”.

12 2. INCENTIVIZING CREATIVITY 13

Abstract: This paper reports the results from a large-scale laboratory experiment investigating the impact of tournament incentives and wage gifts on creativity. We find that tournaments substantially increase creative output, with no evidence for crowding out of intrinsic motivation. By comparison, wage gifts are ineffective. Additional treatments show that it is the uncertain mapping between effort and output that inhibits reciprocity. This uncertainty is prevalent in creative and other complex tasks. Our findings provide a rationale for the frequent use of tournaments when seeking to motivate creative output.

2.1 Introduction

The share of workers performing tasks that require them to engage in non-routine problem solving and creative thinking has increased substantially over the last several decades (Autor et al., 2003; Florida, 2002). This trend requires firms to adapt in many ways. One particular challenge is how to incentivize workers to perform well in these types of jobs. To date, little is known about how creative performance responds to incentives.1 By comparison, there is a long tradition in economics of investigating the impact of work incentives on motivation and productivity in simple and routine tasks. The purpose of this paper is to gain a deeper understanding of the incentive response function of creativity by investigating the extent to which lessons learned from simple tasks can be generalized to creative tasks. Dozens of studies have explored the impact of financial incentives on routine tasks, confirming what standard economic theory predicts: financial incentives have a positive effect on performance because agents increase effort as long as the marginal benefit they derive from an additional unit of output exceeds its marginal effort cost. Positive incentive effects have been demonstrated for different types of performance-dependent rewards such as piece rates, where workers are rewarded according to their absolute output (for instance, Lazear, 2000), or tournaments, where workers are rewarded on the basis of their relative performance (Harbring and Irlenbusch, 2003).2 But as early as 1999, Prendergast noted that these types of routine jobs are not very common. Explicit financial incentives are, of course, not the only way to raise workers’ performance. An established literature on gift exchange suggests that workers reciprocate wage gifts with higher effort (Akerlof, 1982). 
This hypothesis has been tested and confirmed in a myriad of laboratory experiments with both chosen and real effort (see, e.g., Fehr et al., 1997 for an early study on the topic, or Fehr and Gächter, 2000 for an overview). There is, however, mixed evidence on the effectiveness of gift exchange in the field (see, for example, Gneezy and List, 2006). Like the literature on financial incentives, these studies have almost exclusively used simple, routine tasks. Taken together, these findings inform human resource management on how to reward employees optimally in jobs that involve a clearly defined and repetitive workflow. Yet it is critical to understand whether these insights into the effectiveness of different rewards also hold for jobs that rely on creativity.3 To date, economic theory mostly abstracts from the nature of the task and predicts uniform effects of rewards across different tasks. One exception is the literature on motivational crowding out, which suggests that explicit incentives can sometimes be counterproductive. This is especially likely for tasks that are cognitively complex or high in intrinsic motivation, both of which are attributes of creative tasks (Bonner et al., 2000; Camerer and Hogarth, 1999; Amabile, 1996; Shalley et al., 2004). The idea is that the provision of financial incentives can “crowd out” the intrinsic motivation to perform the task (Deci et al., 1999).4 So far, the economic literature on crowding out has primarily focused on intrinsically motivated activities that are not creativity-oriented (e.g., Frey and Jegen, 2001, or Gneezy et al., 2011 for overviews) and still debates whether crowding-out effects exist at all (e.g., Fang and Gerhart, 2012; Fehr and Falk, 2002; Charness and Gneezy, 2009). Gifts might be effective in situations where crowding-out effects can be expected to be strong, for instance for creative work, yet no previous study has explored this hypothesis.

1The nascent literature on incentivizing creativity in economics will be discussed below. 2In the field, researchers observed work performance when, for instance, installing windshields (Lazear, 2000), picking fruit (Bandiera et al., 2005), or planting trees (Shearer, 2004); in the laboratory, subjects have been rewarded for typing letters (Dickinson, 1999), cracking walnuts (Fahr and Irlenbusch, 2000), or filling envelopes (Falk and Ichino, 2006). A main advantage of these tasks is that they offer a precise and easily observable measure of the quantity (and the quality) of workers’ output. 3According to the dominant scholarly definition, creativity is the production of ideas, solutions, or products that are novel (i.e., original) and appropriate (i.e., useful) in a given situation (Amabile, 1997). In contrast, a task can be defined as routine “if it can be accomplished by machines following explicit programmed rules.” (Autor et al., 2003, p. 1283). The addition of “simple” refines this latter definition by restricting the set of tasks to those that are simple to understand and perform, i.e., that do not require much instruction, skill, or prior knowledge. 4According to self-perception theory (Bem, 1972), a reward causes a shift in people’s perception of why they perform a task: own behavior is attributed to the reward and not to the enjoyment of the activity itself. As a result, if a reward is subsequently removed, individuals are less motivated to work on the task (Deci, 1971; Lepper et al., 1973; Amabile, 1988, 1996; Joussemet and Koestner, 1999; Deci et al., 1999, among others). Other reasons for a negative effect of rewards on performance include the exertion of “too much effort” or an increase in self-consciousness beyond an optimal level (e.g., Camerer and Hogarth, 1999 or Ariely et al., 2009).

This paper reports the results from a laboratory experiment with more than 1000 subjects that addresses these open issues. Subjects were randomly assigned to groups of five, with one principal (“employer”) and four agents (“employees”). The four agents worked for three periods for their principal on either a creative or a routine task. Our basic design is 2 x 3: the routine and the creative task with three treatments (control, gift, tournament) each. We also ran a host of supplementary treatments that serve to elucidate the underlying mechanisms. In all treatments, the principal’s payoff depended on the output produced by her four agents, who received an exogenously assigned fixed wage in each period. After agents worked on the task for one period, principals could decide whether or not to provide additional rewards to their agents (at their own expense and without knowledge about agents’ performance in Period 1). In the Tournament treatment, the principal could provide an additional monetary prize to the best-performing 50% of agents in her group. In the Gift treatment, the principal could opt for a monetary gift, half as large as the tournament prize, to all four agents in her group. In the third period, subjects were asked to work for the principal for one more round without additional rewards. The experiment was run as a three-period design to provide us 1) with a baseline measure of performance (capturing agents’ ability, intrinsic motivation to perform the task, and other-regarding preferences towards the principal), 2) with a measure of performance under both types of reward schemes for both tasks,5 and 3) with a measure of whether or not treatments affected intrinsic motivation (which can only be captured in a subsequent period without rewards). Both reward schemes were designed to have identical costs to the principal and identical overall benefits to the group of workers; they differed only in the distribution of benefits among workers. This feature allows conclusions about which of the two schemes is preferable given a fixed budget for rewards. In addition to the two monetary reward treatments, we ran a control group for each task, which allows us to account for learning and fatigue and to standardize performance across the two tasks, rendering the effect sizes comparable. We find a substantial and positive incentive effect of the Tournament in both tasks (routine and creative) in the second period. 
The effect sizes are of similar magnitude in the two tasks, suggesting that performance in both tasks is equally sensitive to competitive incentives. After learning whether they were winners or losers at the end of Period 2, winners in the Tournament outperformed comparable subjects in the control group in Period 3 even though no further tournament prize was at stake. Losers, on the other hand, returned to their baseline level of performance. This is evidence against the notion that financial rewards crowd out intrinsic motivation and thus creative performance. On the contrary, it suggests that performance-dependent rewards in the form of tournament incentives increase creative performance and even have long-lasting effects on tournament winners. Interestingly, the performance response to the wage gift differs between the two tasks. Subjects in the routine task respond to the gift with an economically and statistically significant increase in their performance. The effect size is similar to that typically found in the literature on gift exchange (see, for instance, Fehr and Gächter, 2002). However, agents do not reciprocate the gift in the creative task. We discuss various potential explanations for this asymmetry and show that the absence of reciprocity in the creative task is driven by employees’ inability to fine-tune their back transfer to the principal. Creative tasks are characterized by this inability since the exact value of an idea is typically uncertain and only becomes apparent with time, say, after the idea is implemented. The same was true in our experiment, where the value of an idea to the principal depended on its originality rating. Overall, our results suggest that creative tasks respond well to performance incentives and that responses to rewards are very similar in creative and routine tasks; however, certain features inherent in creative work reduce the effectiveness of gifts as rewards. 
Therefore, wage gifts for employees in creative tasks might not effectively boost creativity in practice.

5We use the term reward for both the tournament reward scheme and the wage gift as a shorthand, even though the wage gift is not a reward as it is commonly understood, i.e., a payment for past performance. Instead, it is independent of both past and future performance. Fehr and Falk (2002), for example, have therefore described wage gifts as implicit rewards.

This study contributes to the small but growing literature in economics that studies the impact of rewards on creativity. Previous studies have explored how creativity is influenced by the size of the reward (Ariely et al., 2009) or the type of creative task (Charness and Grieco, 2012). Laske and Schroeder (2015) focus on the multitasking aspect of creativity by looking at how incentives affect the quantity, quality, and novelty of creative output. Erat and Gneezy (2015) compare the effectiveness of piece-rate incentives and competitive incentives and find evidence for choking under pressure.6 We extend this nascent economic literature on creativity as well as the literature on incentive provision in five distinct ways. First, to our knowledge this is the first study to examine the effectiveness of financial gifts for increasing creative performance. The lack of literature on this subject is surprising given both the attention that gift exchange has received in the context of incomplete contracts (see Fehr and Gächter, 2000 for an overview) and the fact that creative jobs seem to be a prime example of jobs that are complex, hard to monitor, and typically governed by incomplete contracts. Second, as far as we know, this is also the first study to compare the effect sizes of a performance-dependent, competitive incentive (tournament) with that of a performance-independent wage gift in one set-up. By doing so, our design allows a direct comparison of the cost-effectiveness of the two reward schemes.7 Such a comparison is especially relevant for creativity, as theory suggests that these two types of rewards might affect it in fundamentally different ways (e.g., Byron and Khazanchi, 2012). Third, we examine whether the incentive-response function differs between our creative and our simple, routine tasks. 
This comparison provides first insights into whether the lessons learned with simple tasks generalize to creative tasks and, hence, whether a separate literature assessing incentives for creativity is needed. Fourth, we introduce the Unusual Uses task into the experimental literature on creativity. In our view, this task captures central elements of day-to-day creativity in organizations well (Woodman et al., 1993).8 In the task, subjects have to come up with as many original alternative uses as possible for common objects such as a tin can or a sheet of paper. Hence, rather than measuring blue-sky creativity (even though Unusual Uses does allow measuring the originality of ideas as well), the task focuses on whether subjects can place common objects into a different context. This is a central element in business innovation and in corporate idea-suggestion systems. In that sense, the paper complements the existing literature, which mostly focuses either on blue-sky creativity or on tasks that involve very little creativity, such as pattern recognition. An advantage of the task is that it captures creativity along several dimensions: quantity (the number of answers), breadth (the spread of answers across different idea categories), and originality (measured either by the statistical infrequency of answers or by subjective evaluation). The availability of these separate measures allows us to address issues such as quantity-quality trade-offs when assessing the effect of incentives on creative performance. Finally, we provide another data point to the discussion on whether or not it is possible to foster creative performance through financial rewards.

The paper is structured as follows. Section 2.2 describes the experimental set-up, the tasks, and the treatments. Section 2.3 presents our main results. Section 2.4 investigates mechanisms and looks into a number of supplementary issues, for instance, the absence of reciprocity in the creative task, the mechanism via which tournaments increase effort, and post-treatment effects. Section 2.5 concludes.

6Another study in this realm is Eckartz et al. (2012). They implemented a creative task as well as two control tasks for comparison (Raven’s IQ and a number-adding task) in one experimental set-up. They find that neither the tournament incentive nor a piece rate had any effect on performance in any of the three tasks. This makes it hard to draw clear conclusions about whether or not rewards fail to enhance creativity, since their rewards did not affect performance in the control tasks either. A likely explanation is that baseline motivation was very high in all three tasks. 7We used a tournament scheme rather than a piece rate as the performance-dependent reward scheme because tournaments are widely used in practice to reward individuals for creative performance and innovations (Brunt et al., 2012; Kremer and Williams, 2010). For instance, companies increasingly allocate creative tasks to online platforms with creative contests (such as www.innocentive.com or www.jovoto.com) to complement their in-house research and development. These platforms offer tournament-based compensation for various creative tasks such as scientific problem-solving, software development, and graphic art design (Boudreau et al., 2011). 8The task has also been used by Dutcher (2012) in his study on the effects of telecommuting on productivity.

2.2 The Experiment

In the following, we introduce the experimental tasks, the set-up, the treatments, and the experimental procedures.

2.2.1 The Tasks

In order to assess the effectiveness of rewards for routine and creative tasks, we implemented both types of tasks in the experiment. We use the “slider task” as a proxy for simple, routine tasks in the workplace (Gill and Prowse, 2012). The slider task is a real-effort task with a number of desirable attributes. It is easy to explain and to understand, and it does not require prior knowledge. It is identical across repetitions, involves little randomness, and leaves no scope for guessing. The task features a computer screen displaying 48 sliders on scales that range from 0 to 100. Figure 2.1 shows an example of the screen as it was presented in the experiment. Initially, all sliders are positioned at zero. The aim of the task is to position as many sliders as possible at exactly 50 within 3 minutes by using the mouse.9 Each slider can be adjusted and re-adjusted an unlimited number of times. While moving the mouse, subjects cannot be sure whether they have positioned the slider at exactly 50, because the exact position of the slider is displayed to the right of the scale only when the subject stops using the mouse. We measure a subject’s performance as the number of correctly positioned sliders within the allotted time. Gill and Prowse demonstrated that this measure corresponds closely to the effort exerted by a subject. Before the start of Period 1, subjects were given one minute to practice the task.

9Keyboards were disconnected during the task to prevent the use of the arrow keys.

We measure creative performance via the “Unusual Uses Task.” Originally developed as Guilford’s Alternative Uses Task (Guilford, 1967), it was later incorporated into the Torrance Test of Creative Thinking (Torrance, 1968, 1998), the most widely used and validated test to assess creativity (Kim, 2006).10 In the Unusual Uses Task, participants are asked to name as many unique and unusual uses for an ordinary item, such as a tin can, as they can. This captures a central element in applied business innovations: the recombination of existing bits of knowledge in novel ways (Weitzman, 1998; Simonton, 2004). Specifically, the task requires divergent thinking or “thinking outside the box,” which is one of the most important components of the creative process (Runco, 1991). One advantage of the Unusual Uses task is that it provides a clean numerical measure of creative productivity. In the experiment, subjects had to sequentially brainstorm unusual uses for three different items: a sheet of paper, a tin can, and a cord. Subjects were informed that they should not limit themselves to a particular size of the item. Moreover, the unusual uses they came up with could require more than one of the items; for instance, a use could require more than one sheet of paper or several tin cans. The order in which subjects had to work on the items was fixed: (1) paper, (2) tin can, (3) cord.11 In the creative task, just as in the slider task, subjects had a test period of one minute. 
In this test period, subjects were given the item “old tire” to familiarize themselves with the task and the input mask on the screen. We used the three standard measures of the Unusual Uses task to evaluate subjects’ responses: fluency, flexibility, and originality (Guilford, 1959), and we told subjects how their answers would be scored.12 Fluency refers to the number of valid answers. An answer is valid if the stated use is possible to implement and its realization is at least vaguely conceivable. Fantastic or impossible uses are not counted. Examples of valid uses of a tin can are a flower pot, a pen container, and a drum. In contrast, examples of invalid answers are the use of a tin can as a television, a computer, or a window.13 In the experiment, each valid use was given one point. The second evaluation measure, flexibility, reflects the variety of a subject’s responses and is determined by counting the number of different categories into which responses fall. For instance, the answer candleholder falls into the category ‘decoration’, and answers like a rattle or a drum into the category ‘musical instruments.’ Subjects received one point for each category. Common categories for the tin can include ‘non-food containers’ (for instance, a pen container), ‘sporting goods’ (for instance, a football), and ‘communication’ (for instance, a tin can phone). Overall, there were roughly 55 categories for each of the three items that we used. Finally, the originality of responses was measured by the statistical infrequency of answers. In order to get an idea of the frequency of responses, we conducted a pre-test with 127 participants who worked on the three items under a fixed wage scheme. We then tabulated all valid answers for each item according to the frequency with which the answer was given and constructed a rating scale to assess answers in the experiment. This scale allotted one additional point to a valid answer if fewer than 8% of participants gave that answer (“original”) and two additional points if fewer than 1% did (“very original”). In comparison to other measures of creativity that rely on expert ratings, our statistical approach to originality is more reliable and objective.14 Examples of an original use of a tin can are an insect trap or an animal house. Very original answers include using the tin can as a scarecrow, a shower head, a treasure chest, or a grill (by putting coal into it and meat on top). Table 2.1 illustrates further examples of frequent answers and categories as well as original, very original, and invalid responses for all items. Scoring was conducted by research assistants who were carefully trained in the scoring procedures and blind to the treatments. In order to test whether the creative task is more intrinsically motivating than the routine task, we had 100 subjects work on both tasks and rate their interest in the tasks on Likert scales from 1 to 7. According to this assessment, the creative task was rated as statistically significantly more interesting than the routine task (Wilcoxon rank-sum test, p=0.02).

10In order to assess overall creative potential, the Torrance Test of Creative Thinking also includes a number of figural elements that require drawing skills. The more specific Unusual Uses task, however, best captures the type of creativity that we want to study. 11Controlling for order effects is not important in our design as we use the same order in the control group and only look at changes in performance between periods and between treatment and control groups. 12The original Guilford Test uses a fourth criterion for scoring, elaboration, which refers to the degree of detail of the answers. We refrained from using this fourth dimension because it is largely effort-based and would have constrained our capacity to score answers within the time frame of the experiment. 13Usual uses, such as a food container in the case of the tin can, were not scored in the original version of Guilford’s Alternative Uses Task or the Torrance Test of Creative Thinking (TTCT). However, the original instructions of the test (as well as our instructions) do not explicitly exclude usual uses from scoring. We therefore scored usual uses as valid answers.
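The three scoring dimensions and the frequency-based originality scale can be summarized in a short sketch. The Python code below is a minimal illustration, not the scoring procedure actually used in the experiment. It assumes that raters have already coded each response into a canonical answer label, an idea category, and a validity flag. The class `Answer`, the functions `build_rating_scale` and `score`, and all category labels in the example are hypothetical. The code applies the 8% and 1% infrequency thresholds to pre-test shares. Under this reading of the rule, an answer that no pre-test participant gave has a share of zero and therefore counts as very original.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    """One coded response (hypothetical structure): canonical label,
    idea category, and the raters' validity judgment."""
    label: str
    category: str
    valid: bool


def build_rating_scale(pretest: list[set[str]]) -> dict[str, int]:
    """Map each pre-test answer label to its originality points.

    `pretest` holds one set of valid answer labels per pre-test participant.
    Share < 1% -> 2 points ("very original"), share < 8% -> 1 point
    ("original"), otherwise 0 points.
    """
    n_participants = len(pretest)
    counts = Counter(label for answers in pretest for label in answers)
    scale = {}
    for label, count in counts.items():
        share = count / n_participants
        if share < 0.01:
            scale[label] = 2
        elif share < 0.08:
            scale[label] = 1
        else:
            scale[label] = 0
    return scale


def score(answers: list[Answer], scale: dict[str, int]) -> dict[str, int]:
    """Fluency, flexibility, and originality points for one subject and item.

    Assumes each answer label appears at most once in `answers`.
    """
    valid = [a for a in answers if a.valid]
    return {
        # one point per valid use
        "fluency": len(valid),
        # one point per distinct idea category among the valid uses
        "flexibility": len({a.category for a in valid}),
        # labels absent from the scale had a pre-test share of 0% -> 2 points
        "originality": sum(scale.get(a.label, 2) for a in valid),
    }


if __name__ == "__main__":
    # Hypothetical pre-test: 100 participants, 5 of whom named "insect trap".
    pretest = ([{"flower pot", "pen container"} for _ in range(40)]
               + [{"flower pot"} for _ in range(55)]
               + [{"insect trap"} for _ in range(5)])
    scale = build_rating_scale(pretest)
    answers = [
        Answer("flower pot", "non-food containers", True),
        Answer("pen container", "non-food containers", True),
        Answer("insect trap", "traps", True),
        Answer("scarecrow", "garden", True),
        Answer("television", "electronics", False),  # invalid, not scored
    ]
    print(score(answers, scale))  # {'fluency': 4, 'flexibility': 3, 'originality': 3}
```

Keeping the rating scale separate from the scoring step mirrors the text: the scale is built once from pre-test frequencies (and, as the authors note, was later updated with answers from the experiment itself) and is then applied unchanged to every subject.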

2.2.2 Basic Set up

The experiment uses a principal-agent set-up, where subjects are randomly assigned to the role of a principal (“employer”) or an agent (“employee”). This feature is important for reciprocity considerations as it allows 1) for voluntary financial transfers from the principal to the agent, and 2) for agents’ effort to affect the principal’s payoff, which gives agents a clear way of reciprocating if they wish to do so. At the start of the experiment, subjects were assigned to groups of five participants, each consisting of one principal and four agents. The role and group assignment remained fixed throughout the experiment. All sessions were identical in their basic structure: employees were asked to work for the principal for three 3-minute periods on either the routine or the creative task. In each period, employees received a fixed wage that was exogenously set by the experimenter and announced at the start of the respective period. In the two reward treatments, principals could opt for or against a financial gift or a tournament before the start of Period 2 (details below). In the Gift treatment, Period 2 endowments were augmented by the amount of the gift when the principal opted for the gifts. In the Tournament treatment, agents could receive a tournament prize at the end of Period 2 if the principal opted for the tournament scheme and if their performance ranked among the top two of the four agents in their group. In all treatments, Period 3 was identical to Period 1 in that endowments were fixed and there were no additional rewards or tournaments. The three-period design allows us to measure agents’ baseline performance under a fixed wage in Period 1, the performance response to the reward in Period 2, and post-treatment performance under a fixed wage in Period 3.15 Period 3 is important because detrimental effects of rewards on intrinsic motivation cannot necessarily be detected while rewards are present, as the increase in monetary incentives might outweigh the reduction in intrinsic motivation. After the three working periods, agents completed a couple of brief decision tasks; answered questions about their socio-demographic characteristics such as gender, field of study, level of education, high school grade, and leisure activities; and answered questions regarding their personality traits.

Excluding usual uses, however, does not alter any of the results reported below (results available upon request). 14The answers gathered during the experiment itself allowed us to update the rating scale to more accurately reflect overall statistical infrequency. The rating scale used for analyzing our results is based on more than 700 subjects (pre-test subjects as well as subjects in all main treatments including the supplementary Feedback treatment). The results do not depend on whether we use the updated or the original, pre-test rating scale. Finally, our results are robust to using an expert panel for grading originality rather than the statistical approach (details below).
Employers’ payoffs consisted of a fixed pay component and a variable pay component that was determined by the performance of the four agents in their group. All payoffs during the experiment were stated in “Taler,” the experimental currency unit.16 In the routine task, principals received 5 Taler for each slider that was correctly positioned by their four agents. In the creative task, principals received 5 Taler for each validity point (valid answer), 5 Taler for each flexibility point (category mentioned), and 5 Taler for each originality point (5 Taler per original answer and 10 Taler per very original answer) given by their four agents. Agents learned about the scoring procedures and the principals’ payoff function in the instructions. In order to create an environment that carried an opportunity cost of working, we offered agents a time-out button (Mohnen et al., 2008), which was displayed at the bottom of the screen during all working periods. Each time an agent clicked the time-out button, the computer screen was locked for 20 seconds, prohibiting the entry of creative ideas or the movement of sliders, and 5 Taler were added to the agent’s payoff. This procedure has been used in a variety of experiments to ensure that experimental subjects do not merely work on the experimental tasks out of boredom due to the absence of alternative activities (Eckartz et al., 2012; Mohnen et al., 2008).17

15As agents work for a fixed wage in Period 1, their effort does not affect their own payoff. It does, however, affect the payoff of their principal. Therefore, baseline performance in Period 1 provides us with a joint measure of agents’ intrinsic motivation for the task, their altruism towards the principal, and their ability. 16The exchange rate was 100 Taler = 1 Euro.
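The per-period payoff rules can be written down compactly. The Python sketch below illustrates periods without rewards under stated assumptions. The constants come from the text: 5 Taler per point, 5 Taler and a 20-second lock per time-out, 3-minute periods, a 300-Taler fixed wage in Periods 1 and 3, and 100 Taler = 1 Euro. The function names and the slider counts in the example are hypothetical. A "point" is one correctly positioned slider in the routine task, and the sum of validity, flexibility, and originality points in the creative task. The maximum number of time-outs follows from the rule that the button can only be used while at least 20 seconds of the period remain.

```python
# Constants taken from the experimental description.
TALER_PER_POINT = 5      # principal's revenue per point produced by an agent
TALER_PER_TIMEOUT = 5    # added to the agent's payoff per time-out click
TIMEOUT_SECONDS = 20     # screen lock per time-out click
PERIOD_SECONDS = 180     # each working period lasts 3 minutes
TALER_PER_EURO = 100     # exchange rate: 100 Taler = 1 Euro


def principal_payoff(endowment: int, agent_points: list[int]) -> int:
    """Fixed endowment plus 5 Taler per point produced by the principal's agents.

    Routine task: one point per correctly positioned slider.
    Creative task: validity + flexibility + originality points, so an
    original answer adds 5 Taler and a very original answer 10 Taler.
    """
    return endowment + TALER_PER_POINT * sum(agent_points)


def max_timeouts(period_seconds: int = PERIOD_SECONDS) -> int:
    """Most time-out clicks possible in one period.

    A click is allowed only while at least TIMEOUT_SECONDS remain and locks
    the screen for TIMEOUT_SECONDS, so n clicks need n * TIMEOUT_SECONDS.
    """
    return period_seconds // TIMEOUT_SECONDS


def agent_payoff(fixed_wage: int, timeouts: int) -> int:
    """Fixed wage plus 5 Taler per time-out click (period without rewards)."""
    if not 0 <= timeouts <= max_timeouts():
        raise ValueError(f"timeouts must lie between 0 and {max_timeouts()}")
    return fixed_wage + TALER_PER_TIMEOUT * timeouts


def to_euro(taler: int) -> float:
    return taler / TALER_PER_EURO


if __name__ == "__main__":
    # Hypothetical slider counts of four agents; principal endowment of 300 Taler.
    p = principal_payoff(300, [20, 18, 25, 22])
    print(p, to_euro(p))          # 725 7.25
    print(max_timeouts())         # 9
    print(agent_payoff(300, 9))   # 345
```

In a period without rewards, the time-out button is an agent's only way to raise their own payoff. An agent who did nothing but click it could earn at most 45 Taler extra, while every point produced earns the principal 5 Taler.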

2.2.3 Design and Implementation of Treatments

In order to address our research questions, we implemented a 2 x 3 design consisting of a Control group, a Tournament treatment, and a Gift treatment for both the routine and the creative task. In the Control group, agents were paid a fixed wage in each period and principals were not able to implement rewards. In the Tournament and the Gift treatments, principals and agents were informed at the end of period 1 that the principal could invest in a reward scheme for period 2. Regardless of whether the principal decided to implement the reward scheme, agents received information on the type of the reward (tournament or gift, depending on the treatment) and on the associated costs to the principal. Before the start of Period 2, agents learned whether their principal had instituted the reward. Subjects were also told that principals did not receive any information about their agents’ performance until the very end of the experiment. This ensured that agents perceived the wage gift as “kind,” rather than as compensation for good performance in Period 1. Moreover, it avoided an endogenous selection of rewarded agents based on Period 1 performance. In both treatment groups, principals and agents received a fixed wage of 300 Taler at the be- ginning of each period. The same is true for the Control group with the exception of Period 2 in which the fixed payment was doubled to mirror (expected) payoffs in the treatment groups (see below). In the Gift treatment, the principal had to decide whether or not to provide an additional monetary gift of 300 Taler to each of her four agents at a total cost of 200 Taler to herself.18 In the Tournament treatment, the principal could also transfer a total of 1200 Taler to her four agents at a cost of 200 Taler to herself. However, the payment structure was different. Agents’ performance dictated whether or not they received a reward. Specifically, the top 50%

17Subjects could push the time-out button as long as the remaining time in the working period was at least 20 seconds. In order to ensure that subjects were aware of the time-out button and understood its usage, we had a trial period that lasted 60 seconds in which subjects could test the time-out button. While the time-out button prevents any “production” in the simple, routine task, we cannot rule out that subjects in the creative tasks continued thinking about the problem during the time that their screen was locked. This does not mean, however, that subjects in the creative task did not face any opportunity costs of time. The timeout button may still hinder production since it precludes a critical task – entering ideas. Overall, the use of the time-out button was limited, suggesting that research subjects felt their time was better spent completing the assigned task. 18The use of efficiency factors is common practice in the experimental literature on gift exchange (see, for instance, Brandts and Charness, 2004) and is thought of as representing situations in which gifts are more valuable to the recipient than to the donor. The attractiveness of the reward was important in our setting because we were mainly interested in agents’ responses to rewards rather than in whether or not principals opted for the rewards. 2. INCENTIVIZING CREATIVITY 23 of performers (two out of four agents) received a bonus of 600 Taler each in Period 2, whereas the bottom 50% received nothing. When learning about the principal’s reward decision, employees also learned that performance would be evaluated immediately after Period 2 ended, and that the winners and losers of the Tournament would be revealed before Period 3 started. Finally, after Period 2 and after the revelation of winners and losers (presented as private information on a subject’s screen), sub- jects in both treatments were informed that there would be no further rewards. 
In all treatments (including the control group), it was further announced that the payment structure in Period 3 would be identical to that in Period 1. This study focuses on agents’ responses to rewards rather than on principals’ reward decisions. Therefore, the rewards were relatively cheap for the principal, and endowments in the control group mirrored (expected) payments in the two treatment groups when the principal opts for the reward. Specifically, endowments in Periods 1 and 3, periods without rewards, are identical in the control and the two treatment groups: the principal as well as each of her four agents receive 300 Taler.19 In Period 2, principals in the control group receive an endowment of 100 Taler and agents an endowment of 600 Taler each. This mirrors the expected payoffs in the treatment groups when the principal opts for the reward scheme (unconditional gift or tournament, depending on the treatment): the principal keeps 300 − 200 = 100 Taler, and each agent receives 300 Taler plus either a 300 Taler gift or, in expectation, half of a 600 Taler prize.20 This procedure ensures that any performance differences between the treatments and the control group are driven solely by the rewards and not by other factors such as distributional concerns or income effects.21 Table 2.2 in the Appendix provides an overview of the fixed and variable pay components for all periods, treatments, and roles.
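The payoff equivalence between the control group and the treatment groups can be checked with simple arithmetic. The Python sketch below encodes the Period 2 pay rules described above; it is an illustration, not the experiment’s z-Tree implementation. The function name `period2_pay`, the tie-breaking rule in the tournament, and the omission of the principal’s performance-based earnings (footnote 19) are assumptions made for the sketch.

```python
# Period 2 pay rules as described in the text (all amounts in Taler).
# Assumption: the principal's performance-based earnings (footnote 19) are ignored.
FIXED_WAGE = 300    # fixed wage per period for the principal and each agent
REWARD_COST = 200   # principal's cost of either reward scheme
GIFT = 300          # unconditional gift per agent (Gift treatment)
PRIZE = 600         # prize for each of the two best agents (Tournament treatment)
CONTROL_PRINCIPAL, CONTROL_AGENT = 100, 600  # Control group endowments in Period 2


def period2_pay(treatment, reward_chosen=True, performances=None):
    """Return (principal_pay, [pay of each of the four agents])."""
    if treatment == "control":
        return CONTROL_PRINCIPAL, [CONTROL_AGENT] * 4
    if not reward_chosen:
        # Reward declined: everybody keeps the regular fixed wage.
        return FIXED_WAGE, [FIXED_WAGE] * 4
    principal = FIXED_WAGE - REWARD_COST
    if treatment == "gift":
        return principal, [FIXED_WAGE + GIFT] * 4
    if treatment == "tournament":
        # The top 50% (two of four agents) win the prize. sorted() is stable,
        # so ties go to the lower agent index (a hypothetical tie rule).
        ranking = sorted(range(4), key=lambda i: performances[i], reverse=True)
        winners = set(ranking[:2])
        return principal, [FIXED_WAGE + (PRIZE if i in winners else 0) for i in range(4)]
    raise ValueError(f"unknown treatment: {treatment}")


principal, agents = period2_pay("tournament", performances=[20, 25, 18, 22])
print(principal, agents, sum(agents) / len(agents))
```

Running it prints `100 [300, 900, 300, 900] 600.0`. Because exactly two of the four agents win 600 Taler, the average agent payoff in any tournament group equals the 600 Taler that agents receive under the gift and in the control group, while the principal keeps 100 Taler in all three cases.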

2.2.4 Hypotheses

Before we present our results, we briefly summarize the theoretical predictions for our treatments.

Tournament treatment. According to standard economic theory, a tournament incentive should increase average effort and performance (Alchian and Demsetz, 1972; Lazear and Rosen, 1981). Since economic theory tends to abstract from the type of task, this should be true in both the creative and the routine task. Theories on crowding out, on the other hand, suggest that intrinsic motivation might be impaired by the reward (e.g., Deci, 1972; Lepper et al., 1973; Deci et al., 1999). This holds in particular for the creative task, which is more intrinsically motivating. When crowding out is strong, the reduction in intrinsic motivation would outweigh the incentive effect and Period 2 performance would fall. Otherwise, there would be a positive response to the tournament incentive in Period 2, and a reduction in intrinsic motivation – if permanent – can be detected in Period 3, where the tournament reward is no longer at stake (Frey and Jegen, 2001; Bowles and Polania-Reyes, 2012).22

Gift treatment. Established theories on reciprocity predict that agents reciprocate principals’ gifts with increased effort (Akerlof, 1982; Levine, 1998; Dufwenberg and Kirchsteiger, 2004; Falk and Fischbacher, 2006). Thus, we expect a positive performance response to the gift in both tasks. Depending on how sustainable these responses are, this effect might carry over to Period 3.23

19 On top of their endowment, principals earn additional money from the performance of their agents.
20 If the principal decided against the reward, the principal and her agents received 300 Taler each as fixed Period 2 endowments in both the Gift and Tournament treatments (identical to payments in Periods 1 and 3), while they earned 100 and 600 Taler in the control group, respectively. Therefore, we cannot assess responses to negative reward decisions in an experimentally clean way.
21 See, e.g., Fehr and Schmidt (1999) on inequality aversion. Our setup also disentangles pure intention-based reciprocity from other distributional concerns (Charness, 2004) by allocating the same payoffs to subjects in the treatment groups (with positive reward decisions) and the control group in Period 2. The only difference that remains is that control group payoffs were exogenously imposed by the experimenter, while treatment payoffs were chosen by experimental subjects, the principals.

2.2.5 Procedures

The experiment was conducted at the experimental laboratories at the universities of , Mannheim, and Heidelberg, Germany. Participants were recruited via the Online Recruitment System ORSEE (Greiner, 2004). The experiment was computerized using the software z-Tree (Fischbacher, 1999). All interactions within the experiment were anonymous and communication was not allowed. Subjects were seated randomly at a computer workstation upon arrival and were provided with hard copy instructions that detailed the random matching of groups and roles (“employer” or “employee”), the basic structure of the experiment, the task (routine or creative), and the scoring procedures. A translation of the original instructions can be found in the Appendix. A few pieces of information, such as fixed wages in Periods 2 and 3 as well as the availability of rewards and reward decisions, were presented on the computer screen during the experiment. Before the experiment started, subjects had to complete a series of questions about how their actions would determine their own and their principal’s payoffs to ensure that they understood the instructions. At the end of the experiment, subjects’ payoffs in the experimental currency unit “Taler” were converted into Euros at an exchange rate of 100 Taler = 1 Euro. Subjects were paid in private.

22 Unlike other models of crowding out, in Benabou and Tirole’s (2006) model crowding out persists only as long as the incentives are in place. Crowding out that is temporary and tied to the presence of incentives cannot be directly identified in our design. Most discussions of crowding out (e.g., in the education literature) focus on permanent crowding out as a consequence of temporary incentives. It is that phenomenon that we attempt to capture in this paper.
23 Theoretically, agents could also reciprocate the investment in the tournament. We will address this point in the results section.

Sessions (including instructions, the experiment, and questionnaires) lasted about 75 minutes and subjects earned 15 Euros on average.

2.3 Main Results

2.3.1 Descriptive Statistics and Methods

Table 2.3 provides an overview of the descriptive statistics split up by treatment and task. Although we do not discuss the supplementary treatments until later, for completeness they are included in the balance table. Overall, our sample contains 1123 subjects: 224 employers and 899 employees.24 As we are interested in assessing how rewards affect performance, we ran sessions until each treatment included roughly 60 agents with a positive reward decision.25 In total, we observe 116 rewarded employees and 60 control group employees working on the slider task and 116 rewarded employees and 56 control group employees working on the creative task in our main treatments. We also have data for 364 additional agents with positive reward decisions in supplementary treatments that we describe further below. The treatments are largely balanced with regard to the location of the experiment, gender, age, and field of study, although some differences are statistically significant. As we will show in our main analyses, controlling for these characteristics does not alter the results. The last row of Table 2.3 displays means and standard deviations of baseline performance in Period 1 across treatments and tasks. Baseline performance provides a measure of agents’ motivation to work on the task in the absence of rewards and comprises agents’ ability, intrinsic motivation for the task, as well as their desire to benefit the principal. Mean performance varies between 16.6 and 22 in the slider task, where performance is measured by the number of correctly positioned sliders within each three-minute work period. In the creative task, average performance varies between 16 and 18. Performance here represents a subject’s score in the creative task (see Section 2.2 for details on the scoring procedure). Apart from the Tournament treatment in the slider task, there are no statistically significant differences between the treatments and the respective control groups in either task.
In the former, agents perform slightly better in Period 1 than agents in the control group (Wilcoxon rank-sum test, p<0.05). To account for these initial performance differences, we control for baseline performance in the analyses that follow and use the change in performance between Periods 1 and 2 as the outcome variable.

24 We had to exclude 9 employees from the analysis due to insufficient knowledge of the German language.
25 Principals did not receive any information on agents’ performance in Period 1, and agents knew this. Thus, as expected, our data do not reveal a significant relationship between reward implementation and baseline performance. We will report statistics on principals’ decisions regarding reward implementation as well as agents’ behavior in case of reward denial further below.
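Balance checks such as the Period 1 comparison above rely on the Wilcoxon rank-sum test. The following dependency-free Python sketch shows the statistic with the large-sample normal approximation. The helper names are hypothetical, tied values receive average ranks, and the variance is not corrected for ties (statistical packages typically apply such a correction), so p-values from this sketch may differ slightly from those reported.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks of `values`; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                          # rank sum of the first sample
    mu = n1 * (n1 + n2 + 1) / 2                  # expected rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Invented Period 1 scores for a treatment group and a control group.
print(rank_sum_test([18, 21, 24, 25, 27], [15, 16, 19, 20, 22]))
```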

Figure 2.2 displays the change in raw performance from Period 1 to Period 2 by treatment and task. The figure shows that performance in the slider task increases in both treatment groups between these two rounds. The performance increase is particularly strong in the Tournament treatment, but is also clearly detectable in the Gift treatment. The moderate performance increase in the control group is probably a consequence of either learning or a response to the higher fixed wage in Period 2.26 The pattern is somewhat different in the creative task, where only agents in the Tournament treatment improve their performance between Periods 1 and 2. There are no notable changes in mean performance in either the Gift treatment or the Control group. Thus, the raw data suggest that performance increases in the creative as well as in the simple task in response to the tournament. However, only employees in the simple task seem to respond to the gift. Below, we corroborate these findings with more detailed analyses.

2.3.2 Further Analyses

In the analyses that follow, we compare the change in performance of individuals in the treatment groups with that of individuals in the respective control group. We make performance comparable between the different periods (and thus different items on the creative task) and the two different tasks by standardizing performance.27 Further, we address potential mean reversion by controlling for baseline performance. This also allows us to exploit the panel structure of our experiment. In order to assess the effect of the different rewards on performance in Period 2, we fit the following regression model using ordinary least squares (OLS):

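$$
\begin{aligned}
\text{Std. Performance Period 2}_i ={}& \beta_0 + \beta_1\,\text{Std. Performance Period 1}_i + \beta_2\,\text{Std. Performance Period 1}_i \times \text{Slider-Task}_i \\
&+ \beta_3\,\text{Gift}_i + \beta_4\,\text{Gift}_i \times \text{Slider-Task}_i \\
&+ \beta_5\,\text{Tournament}_i + \beta_6\,\text{Tournament}_i \times \text{Slider-Task}_i \\
&+ \gamma X_i + \varepsilon_i
\end{aligned}
\tag{2.1}
$$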

Standardized Period 2 performance of individual i is the dependent variable. It is regressed on i’s baseline performance in Period 1, on the treatment dummies and, in the most comprehensive model, on a set of person-specific control variables (X_i). Treatment dummies as well as baseline performance are interacted with a dummy that indicates the type of task (Slider-Task_i, which equals one for the simple task). This allows the treatment effects as well as the impact of baseline performance to differ between the creative and the simple task. The creative task control group serves as the reference category. Standard errors are adjusted for potential heteroscedasticity in all regressions.28 Column I of Table 2.4 presents the most parsimonious specification, which does not control for baseline performance. Column II shows the results for Equation (2.1). Column III adds further control variables including age, age squared, sex, field of study, time period (semester, exam period, semester break), and location. Adding these controls does not alter the results.29 Similarly, the results are robust to the exclusion of baseline performance as a control (Column I).

26 The higher fixed wage mirrors the wage of agents in groups with a positive reward decision by the principal (at least in expectation). This allows us to disentangle reward effects from endowment effects. This is of particular interest for the Gift treatment, as it disentangles reciprocity from the response to the wage increase per se (Charness, 2004).
27 We use the standard approach of subtracting the mean performance of the control group (in the respective working period and task) and dividing the resulting difference by the standard deviation of the control group in the respective task. Therefore, the standardized performance of the control group has a mean of zero and a standard deviation of one. Treatment dummies (and additional controls) remain non-standardized to ease interpretation.
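Footnote 27 and Equation (2.1) can be illustrated with a short, dependency-free Python sketch: performance is standardized relative to the control group, the regressors of Equation (2.1) are built with the creative-task control group as the reference category, and OLS is fitted with heteroskedasticity-robust standard errors. The helper names, the use of the sample standard deviation, and the HC1 small-sample scaling are assumptions (the text only states that standard errors are adjusted for heteroscedasticity); the person-specific controls X_i are omitted and the data are synthetic.

```python
import math
import statistics


def standardize(values, control_values):
    """Footnote 27: subtract the control-group mean, divide by the control-group SD
    (sample standard deviation assumed)."""
    mu = statistics.mean(control_values)
    sd = statistics.stdev(control_values)
    return [(v - mu) / sd for v in values]


def design_row(base, slider, gift, tournament):
    """Regressors of Equation (2.1), controls X_i omitted. All dummies zero
    corresponds to the creative-task control group (reference category)."""
    return [1.0, base, base * slider, gift, gift * slider,
            tournament, tournament * slider]


def invert(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_hc1(X, y):
    """OLS coefficients with HC1 heteroskedasticity-robust standard errors."""
    n, k = len(X), len(X[0])
    xtx_inv = invert([[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)])
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    # Sandwich estimator: (X'X)^-1 [sum e_i^2 x_i x_i'] (X'X)^-1, scaled by n/(n-k).
    meat = [[sum(e * e * r[i] * r[j] for r, e in zip(X, resid)) for j in range(k)]
            for i in range(k)]
    left = [[sum(xtx_inv[i][t] * meat[t][j] for t in range(k)) for j in range(k)]
            for i in range(k)]
    cov = [[sum(left[i][t] * xtx_inv[t][j] for t in range(k)) * n / (n - k)
            for j in range(k)] for i in range(k)]
    se = [math.sqrt(max(cov[i][i], 0.0)) for i in range(k)]
    return beta, se


# Made-up coefficients (not estimates from this study) used to simulate noise-free data.
true_beta = [0.1, 0.6, 0.2, 0.15, 0.25, 0.5, 0.1]
X, y = [], []
for slider in (0, 1):
    for gift, tournament in ((0, 0), (1, 0), (0, 1)):
        for base in (-1.0, 0.0, 1.0, 2.0):
            row = design_row(base, slider, gift, tournament)
            X.append(row)
            y.append(sum(b * v for b, v in zip(true_beta, row)))
beta, se = ols_hc1(X, y)  # recovers true_beta up to floating-point error
gift_slider = beta[3] + beta[4]
tournament_slider = beta[5] + beta[6]
```

Because the task dummy enters only through interactions, the Gift effect is β3 in the creative task and β3 + β4 in the slider task; the Tournament effects are β5 and β5 + β6, respectively.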
When discussing the results below, we refer to the specification in Column III, which includes all control variables. Column IV reports the results of a supplementary treatment that will be discussed below (Section 2.4.1). As predicted by standard agency theory, the Tournament treatment has a large and statistically significant positive effect on performance in both tasks. This effect is substantially larger than that of the Gift treatment (Wald test, p<0.01 for both tasks). This suggests that creative performance is responsive to incentives and that tournaments have the power to substantially increase creative output. Interestingly, the Tournament effect is very similar in both tasks: agents increase their performance by approximately 0.7 standard deviations as compared to the control group in both the creative and the simple task. This is first evidence against crowding out, which would have implied a smaller performance increase (if not a performance decrease) in the intrinsically motivating creative task. The analysis of performance in Period 3 below will further corroborate this finding.30 While the Tournament worked well in stimulating creative performance, the Gift did not. The simple task shows that this absence of reciprocity is not driven by the gift being too small or too unimportant to have any effect: in the simple task, the Gift induced an economically and statistically significant effect (p<0.05). The effect size of 0.2 standard deviations is small relative to that of the tournament, but well in line with what other studies on reciprocity have found.31

28 The results are robust to using cluster-robust standard errors (by session) that control for potential intra-session correlation.
29 The regressions in Columns III and IV also include observations from a supplementary feedback treatment, discussed below, that was run concurrently with the main treatments. We include observations from this treatment in order to estimate the coefficients of the other control variables, introduced in Column III, with greater precision. We also ran more comprehensive specifications with further control variables, such as the Big Five and incentivized risk and reciprocity measures, but their inclusion neither improves the fit of the model nor affects the results.
30 Also, responses to the tournament do not differ between agents with above- and below-average baseline performance. Results available upon request.

The absence of reciprocity in the creative task is surprising and suggests that the responsiveness to rewards differs between simple and creative tasks, at least with respect to gifts. Below, we look further into this asymmetry in an effort to uncover the mechanism behind these findings.32

Performance in the creative task is multidimensional, and an obvious concern is that the observed performance increase is caused by subjects increasing the quantity of ideas at the expense of their originality – a pattern that might not generally be desirable. The slider task is different in this respect, as it is one-dimensional and an overall performance increase is unambiguously positive for the principal. The Unusual Uses task allows us to look into this issue. In the Unusual Uses task, creativity is scored along three dimensions: 1) the number of valid uses (validity), a measure of quantity, 2) the number of categories into which answers fall (flexibility), and 3) the originality of ideas (statistical infrequency of a response).
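The three scoring dimensions and the “originality rate” reported in Table 2.5 can be illustrated with a small Python sketch. The actual scoring rules are described in Section 2.2; here the infrequency threshold, the one-point-per-original-answer rule, the category coding, and the additive aggregate are hypothetical placeholders, and the example answers for an invented object are made up.

```python
def score_unusual_uses(answers, category_of, times_given, n_subjects, threshold=0.05):
    """Score one subject's valid answers on validity, flexibility and originality.

    answers:     the subject's valid uses (validity checks assumed already done)
    category_of: use -> category label (hypothetical coding scheme)
    times_given: use -> number of subjects in the pool who gave that use
    threshold:   hypothetical infrequency cut-off for an "original" answer
    """
    validity = len(answers)                               # quantity of valid uses
    flexibility = len({category_of[a] for a in answers})  # distinct categories used
    # One point per statistically infrequent answer (assumed scoring rule).
    originality = sum(1 for a in answers if times_given[a] / n_subjects < threshold)
    total = validity + flexibility + originality          # assumed additive aggregate
    return {
        "validity": validity,
        "flexibility": flexibility,
        "originality": originality,
        "total": total,
        "originality_rate": originality / total if total else 0.0,
    }


# Invented answers for an invented object (a tin can), pool of 20 subjects.
times_given = {"pencil holder": 12, "flower pot": 9, "drum": 1,
               "cookie cutter": 3, "wind chime": 1}
category_of = {"pencil holder": "container", "flower pot": "container",
               "drum": "music", "wind chime": "music", "cookie cutter": "kitchen"}
score = score_unusual_uses(["pencil holder", "drum", "wind chime", "cookie cutter"],
                           category_of, times_given, n_subjects=20, threshold=0.1)
print(score)
```

Under these assumptions, the example yields validity 4, flexibility 3, originality 2, and an originality rate of 2/9 ≈ 0.22; the rate rises only if original answers grow proportionally faster than the total score.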
The main analysis above used an aggregate measure of these three dimensions as the dependent variable. Table 2.5 shows the results split up by these different dimensions. Interestingly, the pattern of results that we discussed for the overall score in our main analyses is reflected in all three dimensions of creativity. Most noteworthy, there is no evidence that the strong performance increase in the Tournament treatment comes at the expense of the originality of the answers. On the contrary, the share of originality points out of the total number of points – referred to as “originality rate” in the table – is even slightly higher in the Tournament treatment. That is, the increased originality rate suggests that the number of original uses increases proportionally more than the total number of uses.33

31 We find no evidence for gender differences in either the Tournament or the Gift treatment. We can also look at the use of the time-out button to examine whether the effects in the Tournament and the Gift treatment are driven by a reduction in the number of breaks or by a performance improvement during actual work time (results available upon request). In both tasks, the Tournament effect remains positive and significant (albeit much smaller) after controlling for the number of breaks. In contrast, the performance increase in response to the Gift in the slider task seems to be driven solely by a reduction in breaks and, hence, by an extension of actual working time.
32 Performance effects in response to the tournament are equal between the creative task, in which there is no reciprocity, and the simple task, in which there is reciprocity. This suggests that reciprocity should not play a big role in explaining the performance increase in response to the tournament.
33 One might be concerned that statistical infrequency does not accurately reflect what is commonly understood as an “original” idea. Therefore, we checked the robustness of our results on originality using subjective originality ratings. To that end, we asked five research assistants unfamiliar with the experiment and blind to the treatments to evaluate the originality of each answer. The evaluators were instructed to assign one point to answers that they perceived as original, two points to answers that they perceived as very original, and zero points otherwise. Using the score from this subjective originality assessment as the dependent variable does not alter the results: originality increases under the Tournament incentive and does not increase in response to the Gift treatment. These results are available from the authors upon request.

It is also insightful to look at the principals’ reward decisions. Three quarters of the principals in the Tournament treatment invested in instituting the tournament. This is true in both the simple and the creative task. In the Gift treatment, by comparison, three quarters of the slider task principals opted for the gift, but only approximately half of the creative task principals did so.
This suggests that at least some of the principals might have anticipated that the gift would not work well in the creative task.34 We also asked principals to provide reasons for why they did or did not choose to implement the reward.35 In both tasks, the main driver of the reward decision was profit maximization, i.e., an expectation that subjects would or would not respond favorably to the treatment. Hence, the larger share of principals who opted for the gift in the slider task did so because they expected that this would raise their profits, and the principals who opted against the gift in the creative task did so because they did not think that the gift would pay off. Another piece of evidence documenting the lack of reciprocity in the creative task stems from the behavior of agents when the principal refused to implement the reward. These results have to be interpreted with care because 1) the number of observations is low, and 2) the fixed payments in the control group were designed to be equal to (expected) payments under positive, but not negative, reward decisions. Hence, treatment and control groups are not payoff-equivalent in cases of negative bonus decisions.36 In total, 116 agents were affected by negative reward decisions. Figure 2.3 shows the results. The solid bars represent agents’ performance responses (as compared to the control group) to positive reward decisions. These effect sizes correspond to the main regression results discussed above. Hatched bars represent agents’ performance responses to the announcement that the principal had not invested in additional rewards. Interestingly, the announcement that there would be no tournament has no effect on agents’ performance in either task. Hence, performance responses to tournaments are similar in creative and simple tasks for both positive and negative reward decisions. Again, an asymmetry between the two tasks emerges when looking at the Gift treatment.
While agents substantially reduce their performance in response to a negative Gift decision in the slider task, they remain unaffected in the creative task. Hence, neither positive nor negative reciprocity is at play in the creative task, while both are active in the slider task. In the slider task, the response to a negative reward decision is substantially larger than the response to a positive reward decision. This is in line with other evidence showing that negative reciprocity tends to be stronger than positive reciprocity (Kube et al., 2013). Most noteworthy for our purpose is that the absence of reciprocity in the creative task holds not only for positive but also for negative reward decisions.

Overall, our results suggest that tournaments effectively boost performance in both the simple, routine task and the creative task. There is no evidence for motivational crowding out in the creative task. This is good news for firms, as it suggests that creativity can effectively be raised with financial rewards. In contrast, financial gifts work well only in the routine task. Here, agents reciprocate the gift in an order of magnitude similar to that found in other studies on reciprocity. There is no reciprocal response in the creative task, and this holds for both positive and negative reward decisions. Principals seem to have anticipated the lack of response, as a substantially smaller fraction of employers instituted the gift in the creative task. This finding has important implications for firms, as creative tasks tend to occur in settings where contracts are incomplete – that is, precisely in settings in which organizations might want to exploit reciprocal inclinations in an attempt to elicit additional effort. In the following section, we therefore further explore the absence of reciprocity in the creative task in an effort to uncover the underlying mechanism.

34 There is no evidence for intra-session correlation regarding the reward implementation. Results available upon request.
35 Principals could mark whether they opted in favor of the reward 1) to maximize own profits, 2) to be nice to the agents, 3) to maximize the total payoff of all participants in their group, 4) to reward good performance (in case of the Tournament), or 5) other reasons. If the principal denied the reward, they could indicate whether they 1) thought that the rewards were not profitable for their own payoff, 2) did not want to provide extra earnings to the agents, 3) did not want agents to earn more than themselves, or 4) other reasons.
36 Principals and agents earn a fixed wage of 300 Taler in Period 2 when the principal decides against giving a reward, whereas they are endowed with 100 Taler and 600 Taler, respectively, in the control group.

2.4 Supplementary Investigations

2.4.1 The Mechanism behind the Absence of Reciprocity in the Creative Task

This section discusses potential mechanisms that can explain why agents do not reciprocate the gift in the creative task but do so in the simple, routine task. One possible explanation could be the availability of an “excuse for low performance” in the creative task. Such an excuse could be available if effort were only weakly correlated with performance in the creative task. If this were true, agents who are motivated to reciprocate by image concerns (but who do not have an innate desire to reciprocate) might put in less effort in the creative task, hiding behind a story in which they tried hard to come up with ideas but simply did not succeed. Such an excuse is not available in the simple task, by comparison, where increased effort directly translates into more correctly positioned sliders and, in turn, increased performance. While this explanation is plausible, recent evidence shows that reciprocal behavior is not affected by the availability of these kinds of “excuses” for not performing well (van der Weele et al., 2010). A related explanation relies on the notion that the performance measure in the creative task is less sensitive to agents’ increased effort than the one in the simple task. That is, reciprocity might lead to effort increases in both tasks, but the effort increase might not translate into significantly higher output in the creative task. The results of the Tournament treatment speak against this explanation. In this treatment, the financial stakes were identical across the two tasks, and both tasks see a performance increase of similar magnitude. This suggests that the responsiveness of output to incentives is similar across the two tasks. Finally, agents did not have perfect knowledge about how much surplus (in terms of the value of their ideas) they generated for the principal in the creative task. Although the procedure for evaluating ideas was detailed in the instructions, subjects could not perfectly predict the exact number of points that they were generating for the principal. This is because agents’ scores depended on the validity of answers as well as their originality (statistical infrequency). Hence, even though subjects probably had a rough understanding of whether they provided few or many answers and of whether their answers were off-the-charts creative or relatively unoriginal, there was still some uncertainty with respect to the exact number of points generated for the principal.37 In the simple task, by comparison, the value generated for the principal was fully transparent, as subjects could easily assess how many sliders they positioned correctly. A possible explanation, therefore, is that this difference in agents’ control over how much surplus they generate for the principal drives the differences in reciprocal behavior between the two tasks. Hennig-Schmidt et al. (2010) and Englmaier and Leider (2010) provide initial evidence that this type of uncertainty can reduce agents’ reciprocal behavior. This mechanism should be more relevant in the Gift treatment, where agents presumably want to tailor their back transfer to the perceived kindness of the gift, and less relevant in the Tournament treatment, where agents most likely trade off the cost of effort against the perceived likelihood of winning, which is independent of the profit they generate for the principal. In order to investigate whether this mechanism explains the results, we ran two supplementary treatments: Creative Transfer Control and Creative Transfer Gift. These treatments were identical to the control group and the Gift treatment in the creative task apart from the following aspects: First, at the end of each period, agents were told how many Taler they had generated in the preceding round.
Second, they could then decide how much of this surplus to transfer to the principal. Hence, agents had perfect control over how much money would accrue to the principal as a result of their work. Surplus that was not transferred was “lost” in the sense that it did not benefit the principal (and agents’ payments were independent of their performance). As in the Gift treatment, the principal in the Creative Transfer Gift treatment had the option to provide a monetary gift of 300 Taler to each of her agents before the start of Period 2. The Creative Transfer Control treatment serves as an additional control group that allows us to separate a change in the response to the gift from changes in behavior that are driven solely by either the feedback on the number of points earned or the ability to choose a transfer amount. Comparing behavior in Creative Transfer Gift with that in Creative Transfer Control therefore allows us to assess whether reciprocity emerges when agents have the power to fine-tune their back transfers in the creative task. Table 2.3 shows that baseline performance is very similar in Creative Transfer Control and Creative Transfer Gift and slightly higher than average performance in the other treatments using the creative task.38

37 Just like in our experiment, agents in a business context typically also have a rough sense of the value of their ideas but cannot be sure about their precise implications for the bottom line: ideas typically go through various stages of evaluation, and there is often a relatively large time lag between the creation of an idea and its implementation. Finally, the value of an employee’s idea to the firm is frequently determined by the market, say by the demand for a new product, which is also not perfectly known during the idea generation stage.

Only about two thirds of all subjects transfer the maximum amount in Period 1. About 20% transfer less than half of the generated surplus, while the remaining subjects transfer intermediate amounts. In the following analysis, we focus on transfers, as transfers, and not actual performance, determine principals’ payoffs and thus signal reciprocity. As in the main analysis, we control for baseline transfer – each agent’s transfer in Period 1 – in all regressions. The results are striking. Agents in the Creative Transfer Gift treatment transfer significantly more of their surplus in Period 2 (after the gift) than in Period 1 (Wilcoxon signed-rank test, p<0.01). They also transfer significantly more of their surplus in Period 2 than do agents in the Creative Transfer Control group in the same period (Wilcoxon rank-sum test, p<0.05). Figure 2.4 depicts treatment effects for all Gift treatments. The bars show coefficients from separate OLS regressions analyzing the effect of the Gift treatment on Period 2 performance (or, in the case of the supplementary treatments, the amount transferred), controlling for baseline performance (transfer). The graph shows not only that subjects reciprocate the gift in the Creative Transfer Gift treatment, but also that the order of magnitude of the effect is similar to the increase in output in the Gift treatment in the simple task. Column IV of Table 2.4 reports the results from the associated regressions. Again, one can see that reciprocity emerges in the Creative Transfer Gift treatment and that the effect is similar to that in the simple, routine task.39 Taken together, the findings from these two additional treatments suggest that the lack of reciprocity in the creative task was driven solely by agents’ inability to control and fine-tune their impact on the principal’s bottom line.
When agents have that control, they reciprocate the wage gift in both tasks, and the effect sizes are of similar magnitude. Such control is, however, typically relatively low in creative tasks as well as in other complex problem-solving tasks that are prevalent in many white-collar jobs. In these settings, information on the exact value of effort (and ideas) to the principal is often not available. Thus, in practice, this specific feature inherent in creative tasks may hinder the emergence of reciprocity.

38This may not be surprising, as the transfer option allows agents to decouple their work from what accrues to the principal. Therefore, agents who intrinsically enjoy working on the task but who are also inequality averse can work harder in these treatments without fearing that they create unduly large profits for the principal. 39Performance itself is not affected by the gift, which is not surprising given that many subjects had substantial leeway to increase transfers without increasing performance. Just as in the main Gift treatment, we find no evidence for gender differences in response to the gift in the Creative Transfer Gift treatment. With respect to principals' reward decisions, only 8 participants were affected by a negative reward decision by the principal, making it impossible to estimate this effect reliably. The existing data, however, suggest that negative reciprocity re-emerges in the Creative Transfer Gift treatment, with a mean performance decrease of 0.82 in response to a negative gift decision, similar to what was observed in the slider task.

2.4.2 Analyses of Post-treatment Effects in Period 3

We now briefly turn to performance in Period 3 in order to assess whether the gift or the tournament had any lasting effects on performance. The analysis of Period 3 is particularly interesting for the Tournament treatment, as any crowding out of intrinsic motivation might have been dominated by the tournament's incentive effect in Period 2 (see Bowles and Polania-Reyes, 2012, for a recent overview). Also, Period 3 performance can reveal whether winning or losing a tournament has an independent effect on subsequent performance, as subjects in the Tournament treatment learned at the end of Period 2 whether or not they belonged to the best 50% (winners/losers) in Period 2 and, hence, whether or not they won the tournament prize. Period 3 endowments were identical to those in Period 1, and subjects were aware that there would be no further rewards or incentives. Analogous to the analysis of the main treatment effects in Period 2 (see Equation (2.1)), we assess Period 3 performance by comparing the change in treatment-group performance relative to Period 1 with that of the control group.40 This allows us to account for effects associated with learning and exhaustion. Table 2.6 shows the results. For ease of interpretation, we separate the analyses by task. Columns I and II show the results for the slider task, and columns III and IV depict the results for the creative task. In each case, the first column (I and III) reports overall treatment effects, and the second column (II and IV) splits the tournament effect into winners and losers. Interestingly, our main treatment effects carry over to Period 3 even in the absence of further gifts or tournaments. The reciprocal effect of the gift in the slider task is statistically significant and even slightly larger in Period 3 than in Period 2. Hence, reciprocity in the slider task has a persistent effect. The same pattern holds in the Creative Transfer Gift treatment (see columns III and IV of Table 2.6).
As expected, there is no evidence for reciprocity in Period 3 in the original Gift treatment (columns III and IV). In line with standard economic theory, performance in the Tournament treatment is lower in Period 3 than in Period 2 due to the removal of the tournament incentive. It is noteworthy, however, that Period 3 performance exceeds Period 1 performance. This suggests that the Tournament has a lasting performance-enhancing effect. Columns II and IV reveal that the tournament winners drive this overall performance increase in Period 3. They work roughly 0.5 standard deviations harder than would-be winners in the control group. Interestingly, losing a tournament does not affect subsequent behavior. Losers' performance is either not statistically significantly different from their baseline performance or is even a bit higher. For real-world applications, it is important to understand the mechanism behind this positive winner effect. One possible explanation is learning on the task. By definition, tournament winners have worked more and harder on the task than have other participants. To the extent that the tasks are subject to a steep learning curve, increased Period 2 effort and performance could translate into increased Period 3 performance even if Period 3 effort was the same for tournament winners and tournament losers. It is unlikely, however, that this effect is driven solely by learning, as performance in the control group increases only slightly between Periods 1 and 2 and is relatively similar in Periods 1 and 3 (see Figure 2.5). Alternative explanations are that 1) the positive feedback associated with winning a tournament could heighten self-confidence and intrinsic motivation (Eisenberger and Shanock, 2003; Vansteenkiste and Deci, 2003), or that 2) winning a tournament could put individuals in a positive mood, which would in turn positively affect their performance.41 These two explanations differ with respect to whether the positive effect from winning should raise subsequent performance only for the task at hand (task-specific confidence or intrinsic motivation), or whether it should also spill over to a different, unrelated task (general mood effect). To shed light on this issue, we conducted two supplementary tournaments. In these supplementary tournament treatments, subjects were asked to work on both the routine and the creative task in alternating orders: either slider (Period 1), creative (Period 2), slider (Period 3) (SCS), or creative (Period 1), slider (Period 2), creative (Period 3) (CSC). As in the main Tournament treatments, agents received fixed wages in Periods 1 and 3, and the principal could implement a tournament in Period 2.

40The results are robust to including Period 2 performance, using fixed-effects linear panel models or random-effects models.
In total, 55 subjects participated in the Slider-Creative-Slider (SCS) treatment, and 46 subjects in the Creative-Slider-Creative (CSC) treatment. The right-hand columns of Table 2.3 show summary statistics for these observations. There are no significant differences in baseline performance between these two supplementary tournament treatments and the respective control groups. For this comparison, and for analyzing treatment effects below, we use the creative task control group as a benchmark for baseline performance and Period 3 performance in CSC, and the slider task control group to assess performance in Periods 1 and 3 in SCS. Note that we did not include additional control groups with varying tasks but without the tournament. We therefore cannot control for changes in Period 3 performance that are caused by the change in tasks per se, which renders our findings below suggestive rather than conclusive. Table 2.7 presents the results on Period 3 performance. The results are split up by treatment and task. Columns I and II report results from the slider task (overall and split up by winners and losers). Columns III and IV do the same for the creative task. Analogously, Period 3 treatment effects in the mixed-task treatments SCS and CSC are depicted in columns V-VIII. Each regression controls for baseline performance and compares effects to the respective control group. We find no evidence for positive spillover effects – neither overall nor for winners or losers separately. The treatment effects in the mixed-task treatments SCS and CSC (columns V-VIII) are positive but small and statistically insignificant. Hence, winning a tournament or receiving positive feedback may lead to higher subsequent performance, but we have suggestive evidence that such an increase is limited to the task at hand. Possible mechanisms are increased task-specific self-confidence or intrinsic motivation. Our data do not lend support to more general, task-unspecific effects on motivation, such as mood effects.

41For a review of mood effects on performance, see, for instance, Lane et al. (2005). Related to mood effects, Kräkel (2008) discusses the notion that tournaments may induce a “joy of winning”, which in turn affects subsequent performance. DeJarnette (2015) proposes a theory of effort momentum with similar predictions.
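The Period 3 comparison in this subsection, like its Period 2 counterpart (Equation (2.1)), regresses standardized performance on a treatment indicator while controlling for standardized baseline performance. The Python sketch below illustrates only that core specification on hypothetical data. It assumes that performance is standardized with the control group's mean and population standard deviation in each period, which is an assumption made here for illustration; it also omits the demographic and time controls and the heteroscedasticity-robust standard errors used in the tables. All function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values, reference):
    """z-scores of `values` using the mean and population SD of `reference`."""
    mu, sd = mean(reference), pstdev(reference)
    return [(v - mu) / sd for v in values]


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (Gaussian elimination)."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        # Partial pivoting for numerical stability.
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    beta = [0.0] * k
    for c in reversed(range(k)):
        beta[c] = (b[c] - sum(A[c][j] * beta[j] for j in range(c + 1, k))) / A[c][c]
    return beta


def treatment_effect(baseline, outcome, treated):
    """Coefficient on the treatment dummy in: outcome_z = a + b*treated + c*baseline_z.

    Assumption: both periods are standardized against the control group (treated == 0).
    """
    ctrl_base = [x for x, t in zip(baseline, treated) if not t]
    ctrl_out = [y for y, t in zip(outcome, treated) if not t]
    x_z = standardize(baseline, ctrl_base)
    y_z = standardize(outcome, ctrl_out)
    X = [[1.0, 1.0 if t else 0.0, x] for t, x in zip(treated, x_z)]
    return ols(X, y_z)[1]


# Hypothetical data: four control agents (t = 0) and four treated agents (t = 1).
baseline = [10, 12, 14, 16, 11, 13, 15, 17]
treated = [0, 0, 0, 0, 1, 1, 1, 1]
outcome = [x + 5 * t for x, t in zip(baseline, treated)]
effect = treatment_effect(baseline, outcome, treated)
```

Because the hypothetical treated agents score exactly 5 points above their baseline while the control agents score exactly at it, `effect` equals 5 divided by the control group's standard deviation of the square root of 5, that is, about 2.24 control-group standard deviations. Any change that is common to both groups, such as learning or exhaustion, is absorbed by the comparison with the control group rather than attributed to the treatment.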

2.4.3 Looking into the Effectiveness of Tournaments

Tournaments affect behavior via two different channels: 1) a concern for a good relative standing and 2) the monetary incentive (the tournament prize). For policy, it is important to understand how much of the tournament effect is driven by the (costly) prize and how much by (cheap) relative performance information. To disentangle these two channels, we conducted a Feedback treatment that conveyed the same information about relative rank as the Tournament treatment but without monetary consequences.42 In the Feedback treatment, the principal had to decide before the start of Period 2 whether or not relative rank information would be provided to her agents at the end of Period 2. The provision of relative performance feedback was costless to the principal, and payoffs in this treatment were identical to those in the control group in all three rounds. When the principal opted for feedback provision, agents were informed at the beginning of Period 2 that they would learn at the end of Period 2 whether or not they belonged to the best 50% of their group. Hence, the Feedback treatment mirrored the information structure of the Tournament treatment, but without monetary consequences. We observe 56 agents with a positive feedback decision in the slider task and 68 agents with a positive feedback decision in the creative task in this treatment. Table 2.8 displays the coefficient estimates of the Feedback treatment from our main regressions. We find that Feedback increases performance by about 0.18 standard deviations in both tasks. The coefficient estimates are marginally statistically significant in Period 2. The effect sizes are statistically significantly lower than those in the Tournament treatment in both tasks (Wald tests, p<0.01) and suggest that about one fourth of the increase in the Tournament treatment is driven by a concern for a good relative standing, while the remainder is driven by the desire to win the tournament prize. Effect sizes are of similar magnitude in Period 3 but lose statistical significance. These overall effects hide, however, that individuals who received positive relative performance feedback perform better than comparable others in the control group (significantly so in the creative task), mirroring our findings for tournament winners in the Tournament treatment.43 Recipients of negative performance feedback, however, are not demotivated (in either the Feedback or the Tournament treatment). Their performance is not statistically significantly different from their baseline performance. Taken together, the findings from both the Tournament and Feedback treatments suggest that the chance of receiving positive feedback enhances performance but that the effect is only strong when performing well is also rewarded with a monetary prize. This holds for both tasks as well as for ex ante incentive effects and for overall post-treatment performance.

42There is a growing literature in economics that documents the impact of relative performance feedback on behavior. So far, the evidence is mixed with respect to the direction (positive or negative) of the effect. Azmat and Iriberri (2010) and Blanes i Vidal and Nossol (2011), for example, find that feedback increases performance, whereas Barankay (2011) shows that performance rankings decrease performance.
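The statement that about one fourth of the tournament effect reflects a concern for relative standing can be checked with back-of-the-envelope arithmetic on the reported Period 2 coefficients. The Python snippet below divides the Feedback coefficients from Table 2.8 by the corresponding Tournament effects from column IV of Table 2.4. It assumes that the creative task is the reference category in Table 2.4, so that the slider-task tournament effect is the main effect plus the Tournament x Slider-Task interaction. The ratio is only an illustration; the formal comparison in the text relies on Wald tests.

```python
# Period 2 coefficients as reported in Table 2.4 (column IV) and Table 2.8.
TOURNAMENT = 0.731            # Tournament Treatment (assumed: creative task = reference)
TOURNAMENT_X_SLIDER = -0.064  # Tournament x Slider-Task interaction
FEEDBACK = {"slider": 0.184, "creative": 0.182}

# Tournament effect by task, in standard deviations of performance.
tournament_effect = {
    "slider": TOURNAMENT + TOURNAMENT_X_SLIDER,
    "creative": TOURNAMENT,
}

# Share of the tournament effect that is matched by pure relative-rank feedback.
feedback_share = {task: FEEDBACK[task] / tournament_effect[task] for task in FEEDBACK}

for task, share in feedback_share.items():
    print(f"{task}: {share:.2f}")
```

The shares come out at roughly 0.28 for the slider task and 0.25 for the creative task, which is consistent with the "about one fourth" reported above; the remainder is attributed to the monetary prize.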

2.5 Conclusion

This paper reports the results from a large-scale laboratory experiment that studies the impact of both explicit incentives (tournaments) and implicit rewards (wage gifts) on creativity. To the best of our knowledge, this is the first study to analyze the impact of wage gifts on creativity. We also have not come across another study that compares the effectiveness of wage gifts and tournaments in one set-up, providing insights into the relative effect sizes of these two rewards on both a creative and a simple task. The inclusion of the simple task allows us to benchmark the effectiveness of the different rewards on creativity and to link our results on creativity to the existing literature on tournament incentives and wage gifts that uses simple tasks. We report two sets of interesting results. The first relates to the tournament incentive. Our results suggest that a tournament prize for above-average performance has a substantial positive incentive effect on creativity. The effect size is similar to that on performance in the simple task. This indicates that incentives can influence creativity and that there is no crowding out of intrinsic motivation.44 About one fourth of this effect seems to be driven by a concern for relative rank, as suggested by a supplementary treatment in which subjects work towards receiving the same rank information as in the tournament but without financial consequences. Thus, it is largely the monetary prize that is responsible for the positive incentive effect of the tournament. Interestingly, the tournament increases not only the quantity of ideas submitted, but also their quality in terms of originality. Further, tournament winners continue to outperform their own baseline performance and agents in the control group. A set of supplementary treatments suggests that this is driven by an increase in task-specific motivation that does not spill over to other tasks. Losers of the tournament show no signs of demotivation in either task. The effectiveness of tournaments is in line with the observation that creative tasks are often organized in a tournament framework in the real world. For example, architects on virtually all major projects are chosen via a winner-take-all competition. The same is true on most innovation platforms, such as innocentive.com, that many companies now utilize for creative input.

43In fact, the coefficients on Tournament winner and Positive relative feedback are statistically indistinguishable in both tasks (Wald test creative task, p=0.19; Wald test simple task, p=0.25). 44Erat and Gneezy (2015) find evidence for choking under pressure in their creative task, which is more “blue sky” in nature.

A second set of interesting results relates to the financial gift. We find an asymmetry in its effectiveness between the two tasks. While the gift effectively triggers a reciprocal response in the simple, routine task, there is no evidence for reciprocity in the creative task. This suggests that the incentive response function differs between creative and simple tasks, despite the similarity of responses in the Tournament treatment. Interestingly, this asymmetry holds for both positive and negative reciprocity, and principals seem to have anticipated it, as many more principals opted for the gift in the simple task than in the creative task. We explore a set of explanations for this finding and, through the implementation of additional treatments, can trace the effect back to agents' lack of knowledge about how exactly their effort translates into profit for the principal. While agents perfectly observe how much output they produce in the simple task, there is some uncertainty in the creative task because the value of their ideas to the principal depends, for example, on an originality rating. This is true for our creative task, but it also holds for creative and complex tasks more generally. One implication of our finding, therefore, is that wage gifts might not boost performance in white-collar jobs that involve complex tasks and creativity. This is important to note, as creative and complex tasks are typically governed by incomplete contracts that might have rendered gift exchange a viable way of increasing effort.
These results also speak to the ongoing debate about reciprocity in the lab versus in the field (e.g., Kube et al., 2012, or Kessler, 2013) and suggest that one reason for the observed absence of reciprocity in the field could be agents' imperfect knowledge about how their effort affects their principal's profits. But even if potential benefits from ideas were fully transparent, our study suggests that wage gifts are less effective than tournaments in triggering agents' performance in both tasks. While the tournament was profitable for the principals that opted for it, the gift was not.45 In this study, tournaments clearly emerge as preferable to wage gifts in terms of fostering creativity. Nevertheless, tournaments do have well-understood downsides that should be considered before implementation. They have been shown to increase sabotage among workers (e.g., Harbring and Irlenbusch, 2011), to induce a self-selection of more risk-tolerant agents (e.g., Eriksson et al., 2009; Dohmen and Falk, 2011), and to make low performers more likely to give up early in the contest (e.g., Berger et al., 2013). The present study calls for future work that addresses whether the positive tournament effect that we find also holds for high-stakes contests. Previous studies indicate that high stakes cause choking under pressure and, hence, a non-linear relationship between reward size and effort, and that this is particularly true for cognitively challenging tasks (Ariely et al., 2009; Bracha and Fershtman, 2012). Further, creative tasks come in many different forms. Charness and Grieco (2012) show that responses to incentives might differ between tasks that have a clearly delineated solution and tasks that are open and require pure out-of-the-box thinking. In our opinion, our task serves as a good proxy for everyday idea generation in firms. However, organizations also rely on other types of creative input that our task could not capture, such as idea implementation or breakthrough innovations. Future work needs to test the robustness of our findings for other kinds of complex and creative tasks.

45The payoff consequences of the Gift in comparison to the control group (difference in average effort per work group minus costs of the gift) are -0.27 Euro in the slider task, -1.82 Euro in the creative Gift treatment and -0.29 Euro in the Creative Transfer Gift treatment. In the Tournament, principals' payoff increased by 1.39 Euro in the slider and 1.45 Euro in the creative task.

2.6 Appendix

2.6.1 Figures and Tables

Figure 2.1: Screenshot of the Slider Task

Note: The figure presents a screenshot of the slider task. The screen displays the remaining time and the number of correctly positioned sliders. The time-out button is displayed at the bottom of the screen.

Figure 2.2: Difference in Performance between Periods 2 and 1 by Treatment and Task

[Figure: two panels, Slider Task and Creative Task, each showing bars for Control, Gift, and Tournament; y-axis: Difference between Round 2 and 1; whiskers: 90% Confidence Interval.]

Note: The bars show the difference in mean performance between Period 2 and Period 1. The whiskers depict 90% confidence intervals of paired t-tests.

Figure 2.3: Effect Sizes for Positive and Negative Reward Decisions by the Principal

Note: The dependent variable is standardized performance of agents in Period 2. The bars show the estimated regression coefficients of separate OLS regressions. Performance is measured as the number of correctly positioned sliders in the simple task and as the score achieved in the creative task. The regressions control for baseline performance in Period 1. The respective control group (slider or creative) serves as the reference category. Statistical significance: * p < 0.1, ** p < 0.05, *** p < 0.01.

Figure 2.4: Overview of Effect Sizes for All Gift Treatments

[Figure: bars for Gift Slider, Gift Creative, and Gift Creative Transfer; whiskers: 90% Confidence Interval.]

Note: The bars show the estimated regression coefficients of separate OLS regressions. Standardized performance (transfer) in Period 2 is the dependent variable. Performance is measured as the number of correctly positioned sliders in the simple task, as the score achieved in the creative task, and as the amount transferred in the Creative Transfer treatments. The regressions control for baseline performance in Period 1. The respective control group serves as the reference category.

Figure 2.5: Difference in Performance between Periods 3 and 1 by Treatment and Task

[Figure: two panels, Slider Task and Creative Task, each showing bars for Control, Gift, and Tournament; y-axis: Difference between Round 3 and 1; whiskers: 90% Confidence Interval.]

Note: The bars show the difference in mean performance between Period 3 and Period 1. The whiskers depict 90% confidence intervals of paired t-tests. Note that this figure uses a narrower y-axis range than Figure 2.2 to improve readability.

Table 2.1: Examples of Answers and Categories for the Unusual Uses Task by Item

Item | Paper | Tin can | Cord
Frequent answers | Paper airplane; Paper hat; Toilet paper | Pen container; Tin can phone; Ball | Shoestrings; Dog leash; Fishing line
Frequent categories | Toys; Clothing; Hygiene/Cleaning | Non-food container; Communication; Sport devices | Clothing accessories; Leashes; Fishing
Original answers | Lampshade; Filter; Game of cards | Bedframe; Animal house; Insect trap | Pulley; Rope bridge; Bowstring
Very original answers | Sound amplifier; Pin wheel; Artificial snow (for decoration) | Scarecrow; Shower head; Treasure chest | Trap (to play a trick); Straightening of acreages; To cut a cake
Invalid answers | Pencil; Television; Surfboard | Computer; Window; Shoes | Glasses; Electric conductor; Rope for bungee jumping

Table 2.2: Overview of Payoffs by Treatment, Role, and Period

Payments in Taler, by treatment (Control, Gift, Tournament) and role (Principal, Agent). Rows: Period 1 fixed wage; Period 2 fixed wage, reward, costs (-)/benefits (+), and total (expected) payoff; Period 3 fixed wage; variable payments (principals: per slider, per valid answer, per category, per original answer, per very original answer; agents: per time-out).
Agents receive a fixed wage of 300 Taler per period. In Period 2, agents receive an additional +300 Taler in the Gift treatment and +0/600 Taler in the Tournament treatment (a), for a total (expected) Period 2 payoff of 600 Taler in both treatments (b). Principals bear costs of 200 Taler (-200) in both the Gift and the Tournament treatment.

Note: a Tournament winners (best 50%) receive a bonus of 600 Taler and tournament losers (worst 50%) receive nothing. b Assuming risk neutrality and a 50% chance of winning, a subject's expected earnings in the tournament treatment at the beginning of the second period, as well as average earnings at the end of the second period, are 600 Taler. The experimental currency unit “Taler” was converted into Euros at an exchange rate of 100 Taler = 1 Euro.

Table 2.3: Balance Table

Columns: main treatments in the slider task (C, G, T) and in the creative task (C, G, T); supplementary treatments: Feedback in the slider task (F) and in the creative task (F), Creative Transfer (C, G), Mixed Task (SCS, T), and Mixed Task (CSC, T); Total. Rows: # of Subjects; # Agents; # Rewarded Agents; Baseline Performance (mean, standard deviation in parentheses); Mean Age; Share of Women; Economics Major; Location (Frankfurt, Mannheim, Heidelberg).

Note: Abbreviations: C = Control Treatment; G = Gift Treatment; T = Tournament Treatment; F = Feedback Treatment; SCS = Slider-Creative-Slider Treatment; CSC = Creative-Slider-Creative Treatment. # of Subjects presents the number of subjects per treatment group that we use in the statistical analysis. # Rewarded Agents reports the number of agents with a positive reward decision by the principal in the reward treatment groups; in the Control treatment, it refers to the total number of agents. Hence, all stated means and percentages refer to the number of rewarded agents. Baseline Performance reports the mean (standard deviation in parentheses) of the number of correctly positioned sliders or of the Unusual Uses score in working Period 1, prior to any treatment intervention. Stars indicate the results from an unpaired t-test on the difference between performance in the respective treatment and control group. Significance levels are denoted as follows: * p < 0.1, ** p < 0.05, *** p < 0.01.

Table 2.4: Treatment Effects in Period 2

Column | I | II | III | IV
Gift Treatment | -0.019 (0.143) | 0.026 (0.099) | -0.000 (0.101) | 0.019 (0.100)
Gift x Slider-Task | 0.375** (0.170) | 0.209* (0.109) | 0.195* (0.107) | 0.178* (0.107)
Tournament Treatment | 0.795*** (0.173) | 0.736*** (0.143) | 0.735*** (0.142) | 0.731*** (0.142)
Tournament x Slider-Task | 0.166 (0.178) | -0.058 (0.166) | -0.063 (0.167) | -0.064 (0.167)
Creative Transfer Treatment | | | | 0.087 (0.105)
Creative Transfer x Gift | | | | 0.249** (0.115)
Baseline | | 0.657*** (0.076) | 0.631*** (0.062) | 0.656*** (0.046)
Baseline x Slider-Task | | 0.002 (0.094) | 0.100 (0.079) | 0.071 (0.066)
Intercept | -0.000 (0.093) | -0.000 (0.062) | 0.589 (0.496) | 0.187 (0.451)
Controls | NO | NO | YES | YES
Observations | 348 | 348 | 472 | 611
R2 | 0.146 | 0.546 | 0.564 | 0.554

Note: This table reports the estimated OLS coefficients from Equation (2.1). The dependent variable is standardized performance in Period 2 and refers to the number of sliders moved in the simple task, the creativity score in the creative task, and the amount transferred in the Creative Transfer treatments. Heteroscedasticity-robust standard errors are reported in parentheses. The estimation includes data from agents in groups with a positive reward decision by the principal. Additional control variables are age, age squared, sex, location, and field of study, as well as a set of time fixed effects (semester period, semester break, exam period); the estimation also includes observations from, and controls for, a supplementary feedback treatment that was run concurrently (Section 2.4.3 and Table 2.8 describe the treatment and the results). The latter are included to increase the precision with which we can estimate the control variables. Significance levels are denoted as follows: * p < 0.1, ** p < 0.05, *** p < 0.01.

Table 2.5: Dimensions of Creativity: Treatment Effects on Standardized Performance in Period 2

Creative Task

Column | I: Score | II: Validity | III: Flexibility | IV: Originality | V: Originality Rate
Gift | -0.022 (0.120) | 0.005 (0.126) | 0.002 (0.115) | -0.101 (0.144) | -0.018 (0.023)
Tournament | 0.746*** (0.161) | 0.805*** (0.169) | 0.601*** (0.148) | 0.719*** (0.181) | 0.045** (0.021)
Period 1 | 0.615*** (0.061) | 0.598*** (0.062) | 0.613*** (0.056) | 0.523*** (0.071) | 0.173** (0.071)
Intercept | 1.908 (2.029) | 1.620 (1.917) | 1.233 (1.685) | 3.168 (2.72) | 0.293 (0.395)
Controls | YES | YES | YES | YES | YES
Observations | 240 | 240 | 240 | 240 | 215
R2 | 0.484 | 0.470 | 0.515 | 0.346 | 0.109

Note: This table reports OLS estimates of Equation (2.1), where we regress standardized performance in Period 2 on baseline performance and treatment dummies. Column I reports results on the aggregated creativity score. Columns II to IV report the results on the different sub-dimensions of the creativity score. Column V displays treatment effects on the originality rate. The originality rate equals achieved originality points divided by the total number of points for a subject's answers (subjects with zero points were dropped in this column). The estimation includes data from agents in groups with a positive reward decision by the principal. Additional control variables are age, age squared, sex, location, and field of study, as well as a set of time fixed effects (semester period, semester break, exam period) and observations from a supplementary feedback treatment (with separate treatment dummies, see Section 2.4.3 and Table 2.8). The latter are included to increase the precision with which we can estimate the control variables. Heteroscedasticity-robust standard errors are reported in parentheses. Significance levels are denoted as follows: * p < 0.1, ** p < 0.05, *** p < 0.01.
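The originality rate used in column V is a simple ratio. The minimal Python sketch below assumes that each subject's originality points and total points are already available as a pair; the function name and the example values are hypothetical. It drops subjects with zero total points, as described in the note, because the ratio is undefined for them.

```python
def originality_rates(subjects):
    """Return originality points / total points for each subject with a positive total.

    `subjects` is a list of (originality_points, total_points) pairs; subjects with
    zero total points are dropped because the ratio is undefined for them.
    """
    return [orig / total for orig, total in subjects if total > 0]


# Hypothetical subjects: (originality points, total points)
rates = originality_rates([(3, 12), (0, 0), (2, 8), (0, 5)])
```

In this example the second subject is dropped, and the remaining rates are 0.25, 0.25, and 0.0.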

Table 2.6: Treatment Effects in Period 3

Column | Slider I | Slider II | Creative III | Creative IV
Gift | 0.259** (0.110) | 0.268** (0.110) | 0.033 (0.113) | 0.031 (0.114)
Tournament | 0.223 (0.153) | | 0.365*** (0.126) |
Creative Transfer | | | -0.038 (0.122) | -0.035 (0.122)
Creative Transfer x Gift | | | 0.366*** (0.110) | 0.362*** (0.110)
Tournament Winner | | 0.503*** (0.189) | | 0.493*** (0.159)
Tournament Loser | | -0.018 (0.159) | | 0.242* (0.140)
Controls | YES | YES | YES | YES
Baseline | YES | YES | YES | YES
Intercept | YES | YES | YES | YES
N | 232 | 232 | 379 | 379
R2 | 0.712 | 0.728 | 0.540 | 0.548

Note: This table reports OLS estimates of standardized performance in Period 3 analogous to Equation (2.1). Performance is measured as the number of correctly positioned sliders, as the score achieved in the creative task, and as the amount transferred in the Creative Transfer treatments. Heteroscedasticity-robust standard errors are reported in parentheses. The estimation includes data from agents in groups with a positive reward decision by the principal and observations from a supplementary feedback treatment (see Section 2.4.3). The latter are included to increase the precision with which we can estimate the control variables. Additional control variables are age, age squared, sex, location, and field of study, as well as a set of time fixed effects (semester period, semester break, exam period). Significance levels are denoted as follows: * p < 0.1, ** p < 0.05, *** p < 0.01.

Table 2.7: Post-treatment Effects of the Tournament in Period 3 by Task

Columns: Slider Task (I, II); Creative Task (III, IV); Mixed Tasks, Slider-Creative-Slider (SCS) (V, VI); Mixed Tasks, Creative-Slider-Creative (CSC) (VII, VIII). Rows: Tournament; Tournament Winner; Tournament Loser; Standardized Performance in Period 1; Intercept; Observations; R-squared.

Note: This table reports OLS estimates of standardized performance in Period 3. Performance is measured as the number of correctly positioned sliders in the simple task and as the score in the creative task, respectively. Heteroscedasticity-robust standard errors are reported in parentheses. The estimation includes data from agents in groups with a positive reward decision by the principal. Significance levels are denoted as follows: * p < 0.1, ** p < 0.05, *** p < 0.01.

Table 2.8: Period 2 and Period 3 Effects of the Feedback Treatment

Column | Slider-Task | Creative-Task
Feedback, Period 2 | 0.184* (0.108) | 0.182* (0.102)
Feedback, Period 3 | 0.118 (0.132) | 0.171 (0.129)
Positive Relative Feedback, Period 3 | 0.303 (0.196) | 0.319** (0.148)
Negative Relative Feedback, Period 3 | -0.028 (0.135) | 0.035 (0.144)

Controls YES YES

Note: This table reports the OLS coefficient estimates for the Feedback treatment for Periods 2 and 3. Period 3 effects are split up by whether the person learned that they did or did not belong to the top 50% of performers. Feedback effects for Period 2 are regression coefficients from specification IV in Table 2.4, while post-treatment effects for Period 3 are from Table 2.6. Heteroscedasticity-robust standard errors are reported in parentheses. The estimation additionally includes observations from the Gift treatment, the Tournament treatment, and the Control group. Additionally, we control for age, age squared, sex, location, and field of study, as well as a set of time fixed effects (semester period, semester break, exam period). The Control group serves as the reference category. Significance levels are denoted as follows: * p < 0.1, ** p < 0.05, *** p < 0.01.

2.6.2 Translated Instructions (Original in German)

General Instructions

Please read the instructions carefully. If you have any questions, please raise your hand. Keep in mind that communication among participants is prohibited during the experiment. Please turn off your mobile phone and other electronic devices for the entire duration of the experiment. During the experiment, you will have the opportunity to earn money in the form of Taler. How many Taler you will earn depends on a random draw as well as on your decisions and the decisions of other participants. All Taler that you earn in the experiment will be converted to Euros at the end of the experiment. The exchange rate is

100 Taler = 1 Euro

At the end of the experiment you will receive the amount of money that you have earned during the experiment in cash. Your earnings will be rounded up to full 10-cent amounts. We would like to point out that your name is only required for the settlement of payments at the end of the experiment. Your name will not be connected to any decisions you make during the experiment. Your actions are completely anonymous.
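The payout rule in these instructions (100 Taler = 1 Euro, paid in cash and rounded up to full 10-cent amounts) amounts to a ceiling operation. A minimal Python sketch, with illustrative function names that are not part of the original experimental software:

```python
TALER_PER_EURO = 100  # exchange rate stated in the instructions


def payout_cents(taler: int) -> int:
    """Cash payout in euro cents, rounded up to a full 10-cent amount."""
    cents = taler * 100 // TALER_PER_EURO  # 100 Taler = 1 Euro, i.e. 1 Taler = 1 cent
    return -(-cents // 10) * 10  # ceiling division: round up to the next multiple of 10


def format_euro(cents: int) -> str:
    """Render a cent amount as e.g. '12.40 EUR'."""
    return f"{cents // 100}.{cents % 100:02d} EUR"


if __name__ == "__main__":
    # Hypothetical earnings of 1,234 Taler are paid out as 12.40 EUR.
    print(format_euro(payout_cents(1234)))
```

Working in integer cents avoids floating-point surprises at exact 10-cent boundaries.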

Assignment of roles

At the beginning of the experiment a computer will randomly assign you the role of an “employer” or an “employee”. You will keep this role for the entire experiment. Further, you will be randomly assigned to groups of 5 participants. One employer and four employees form one group. The groups will remain the same for the entire experiment.

Structure of the experiment

The experiment consists of two parts. Part 1 is the actual experiment. Employees only interact with their employer in Part 1. Part 2 consists of a series of further decision tasks. Only employees will be active in Part 2. The instructions for Part 2 of the experiment will be shown on your computer screen once Part 1 is finished. After Parts 1 and 2, both employees and employers will be asked to fill out a questionnaire. Please find the instructions for Part 1 below.

Part 1 Instructions

Fixed payment

Part 1 of the experiment will be carried out in three rounds. In the first round employers and employees will receive a fixed payment of 300 Taler. The amount of the fixed payment for the second and third round will be displayed on the computer screen shortly before the respective round starts.

Variable payment

The employer earns additional money. How much she earns is determined by the work performance of the four employees in her group.46 Towards that end, employees are asked to work on a simple task in each round. The task will be the same in every round. The employer’s additional earnings depend on the overall performance of her four employees in all three rounds. The employer will not receive any information on agents’ performance or her own earnings until the very end of the experiment. Employees are free to decide how much effort they want to exert and thereby how much money they want to earn for their employer. Employees’ own earnings will not be influenced by their work performance. Employees do, however, have the option to press a time-out button while they work on the task. This button will lock the screen for 20 seconds so that they cannot work on the task during that time period. Employees receive 5 Taler for each time that they press the time-out button. The button can no longer be pressed when there are fewer than 20 seconds left in a round. Employers are also allowed to work on the task in all three rounds. Their performance, however, will neither influence their own payment nor their employees’ payments. The employer can also press the time-out button. Pressing the button, however, does not translate into extra income for the employer. The employer cannot earn additional Taler during time-out.
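An employee’s Part 1 earnings described above combine the fixed payments with 5 Taler per time-out press, independent of work performance. The sketch below is illustrative only: the fixed payments for rounds 2 and 3 were announced on screen and are not given here, so the values used are hypothetical, and the reading that a press is still allowed with exactly 20 seconds left is an assumption.

```python
TIMEOUT_PAY = 5      # Taler per press of the time-out button
TIMEOUT_LOCK = 20    # seconds the screen stays locked after a press
ROUND_LENGTH = 180   # seconds per round (3 minutes)


def max_timeouts(round_length: int = ROUND_LENGTH, lock: int = TIMEOUT_LOCK) -> int:
    """Most presses possible in one round when pressing again as soon as the lock ends.

    Assumes a press is allowed whenever at least `lock` seconds remain, so presses
    can occur at t = 0, lock, 2*lock, ... up to t = round_length - lock.
    """
    return round_length // lock


def employee_part1_earnings(fixed_payments, timeouts_per_round):
    """Fixed payments plus 5 Taler per time-out press; work performance does not enter."""
    if len(fixed_payments) != len(timeouts_per_round):
        raise ValueError("need one fixed payment and one time-out count per round")
    for n in timeouts_per_round:
        if not 0 <= n <= max_timeouts():
            raise ValueError(f"impossible number of time-outs in a round: {n}")
    return sum(fixed_payments) + TIMEOUT_PAY * sum(timeouts_per_round)


if __name__ == "__main__":
    # Round 1 pays 300 Taler; 300 Taler for rounds 2 and 3 is a hypothetical value.
    print(employee_part1_earnings([300, 300, 300], [2, 0, 1]))  # 915
```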

[The original instructions had the description of the creative or the simple, routine task at this point.]

The employer will not receive any information related to how many points each of her agents generated. Her total payoff will only be revealed at the end of the experiment. Employees will also not be informed about the total amount of Taler generated by themselves or other group members. All participants will be called individually for payment. Please remain in your seat at the end of the experiment until your seat number has been called.

46For simplicity we use the feminine (she, her) to describe employers, but this does, of course, refer to both sexes and employers will be both male and female.

[Further instructions, depending on the treatment, were displayed on the computer screen.]

Task description: Creative task

As described above, the first part of the experiment consists of three rounds. In each round, participants will work on a task that requires creativity.

Example of the task

Please list as many, as different and as unusual uses for a rubber tire as you can think of. Do not restrict yourself to a specific size of a tire. You can also list uses that require several tires. Do not restrict yourself to uses you are familiar with, but think of as many new uses as possible!

You will receive the same task in each round with varying objects. You will have three minutes per round. There will be a break of a few minutes after every round.

Evaluation of the task

The employer receives 5 Taler per valid answer. Answers are “valid” when they are practicable and when their realization is at least vaguely conceivable. Please describe the possible use in a few words if necessary (Using the example of the rubber tire: “sled” or “flower box” are clear answers, whereas “target” would require further explanation such as “ball game with tire as target”.).

For original (rare) answers, the employer receives 5 extra Taler. For very original (very rare) answers, the employer receives 10 extra Taler. An answer is considered (very) “original” if only (very) few people think of it. To this end, the answers are compared to a catalog of answers that is based on the answers of more than 100 test persons.

Furthermore, answers will be assigned to different answer categories. The employer receives 5 extra Taler for each category answers fall into. Using the example of the rubber tire: “car tire” and “bicycle tire” belong to the category “tires as wheels” and result in 5 extra Taler. The answer “swing seat” is a different category (category “toys”) and yields an additional 5 Taler. At the end of the experiment, we will ask you to fill out a brief questionnaire while we evaluate the answers of all employees. We will calculate the employer’s variable payment from the total score of all four employees over the three rounds.
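The scoring rule for the creative task can be summarized in code. This is a hypothetical sketch: the rarity and category labels would come from the evaluators’ answer catalog, which is not reproduced here, and counting categories separately per employee and round is an assumption the instructions do not state explicitly.

```python
from dataclasses import dataclass

VALID_ANSWER = 5          # Taler per valid answer
ORIGINAL_BONUS = 5        # extra Taler for an original (rare) answer
VERY_ORIGINAL_BONUS = 10  # extra Taler for a very original (very rare) answer
CATEGORY_BONUS = 5        # extra Taler per distinct answer category


@dataclass(frozen=True)
class Answer:
    text: str
    valid: bool    # practicable and at least vaguely conceivable
    rarity: str    # hypothetical labels: "common", "original" or "very_original"
    category: str  # hypothetical category label from the answer catalog


def creative_score(answers):
    """Taler generated for the employer by one set of answers (e.g. one employee, one round)."""
    score = 0
    categories = set()
    for a in answers:
        if not a.valid:
            continue  # invalid answers earn nothing and open no category
        score += VALID_ANSWER
        if a.rarity == "original":
            score += ORIGINAL_BONUS
        elif a.rarity == "very_original":
            score += VERY_ORIGINAL_BONUS
        categories.add(a.category)
    return score + CATEGORY_BONUS * len(categories)


def employer_variable_pay(answer_sets):
    """Sum over the answer sets of all four employees across the three rounds."""
    return sum(creative_score(s) for s in answer_sets)


if __name__ == "__main__":
    tire = [
        Answer("car tire", True, "common", "tires as wheels"),
        Answer("bicycle tire", True, "common", "tires as wheels"),
        Answer("swing seat", True, "common", "toys"),
    ]
    print(creative_score(tire))  # 25
```

With the rubber-tire examples from the instructions, “car tire” and “bicycle tire” share one category and “swing seat” adds a second, so three valid common answers score 3 × 5 + 2 × 5 = 25 Taler.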

Task description: Simple, routine task

In every round you will work on a task that requires you to use your mouse to move “sliders” on the computer screen. You will see 48 sliders on your screen in each round. These sliders have to be moved to position 50. Each slider starts at position 0 and can be moved anywhere up to position 100. The number displayed to the right of each slider indicates the slider’s current position. You can adjust the slider as often as you like. Your working time is set to 3 minutes (180 seconds) per round. If you have moved all 48 sliders to the correct position before the end of a round, you will automatically be directed to a new screen with 48 additional sliders. Thus, you can move a maximum of 96 sliders to the correct position during the 180 seconds of each round. During each round, you will see some information at the top of your screen. You will see the round number, the remaining time for the current round, and the number of sliders that you have moved to position 50 in the current round so far. Before the first round starts, you will be able to test the task for 60 seconds. Your performance in the test round will not affect your own or your employer’s payoff. The employer will receive 5 Taler for each slider that you correctly position in the three rounds. At the end of the experiment, the numbers of correctly positioned sliders of all four employees of a group across all three rounds are summed up to calculate the variable payment of the employer.

3 Competitive Balance and Assortative Matching in the German Bundesliga

Roman Sittl (University of St. Gallen)

Arne Jonas Warnke (Centre for European Economic Research, ZEW)

JEL-Classification: Z2, J44, J63, L51, L83

Keywords: Competitive Balance, Uncertainty of Outcome, Player Mobility, Playing Talent, Soccer, Sports Economics, Bundesliga, UEFA Champions League

Acknowledgements: Special thanks go to Martin Kiefel, Bernd Fitzenberger, Bernd Frick, Jan Höcker, Martin Hud, François Laisney, Michael Lechner, Tim Pawlowski, Andreas Peichl, Vera Schmitz, Thomas Walter and seminar participants at ZEW Mannheim and Universität St. Gallen, 7th ESEA Conference on Sports Economics in Zurich (2015), 20th Conference of the German Association of Sport Economics and Sport Management (2016) and International Conference Sport Economics & Sport Management (SESM, 2016). This paper was previously circulated as “Competitive Balance in the German Bundesliga: A Fixed Effects Approach”.

58 3. COMPETITIVE BALANCE AND ASSORTATIVE MATCHING 59

Abstract: In this paper we consider trends in the distribution of player talent across association football clubs over time. Player talent is the most important prerequisite for team success in professional sports leagues, and changes in how assortatively players are matched to the clubs they play for may be an important driver of changes in competitive balance. We offer a new approach for measuring player talent and its distribution: the partial correlation of each player with the goal margin. We use this measure to analyze the degree of competitive balance over time. This approach enables us to examine how player mobility drives competitive balance over time. Empirical results are based on 19 seasons of the first two divisions of the German Bundesliga as well as domestic cup games. Our results show a decrease in competitive balance over time; better teams tend to attract increasingly better players. We show that this is driven by an increasingly unequal inter-divisional distribution of teams, coaches and players, as well as increasing assortativeness in the 1st Bundesliga. We further demonstrate that player transfers between Bundesliga teams result in assortative matching between players and teams. These domestic transfers do not, however, explain the reduction in competitive balance over time. Furthermore, we show that UEFA Champions League payments may have contributed to the reduction in competitive balance over the last two decades.

3.1 Introduction and Literature

Trends in competitive balance, that is, in the unpredictability of competitions, are widely discussed amongst fans, officials and academics interested in association football (e.g. The Economist, 2016).47 Competitive balance refers to the distribution of playing talent across clubs and the resulting uncertainty about match or championship outcomes. New broadcasting deals, such as that in the English Premier League, further intensify this debate, raising awareness of diverging financial power and inequality across and within leagues in Europe. The ongoing domination of the German Bundesliga by Bayern München, for example, often dampens the enthusiasm of fans of other Bundesliga clubs about the limited success of their teams. Fans of small-market teams often complain about the ‘buying-up policies’ of large-market teams. Such policies inhibit an equal distribution of player talent and consequently diminish competitive balance. To take one example, Robert Lewandowski was transferred from , a contender for winning the league, to the most successful team, Bayern München. Likewise, Borussia Dortmund was able to attract promising player talent from less successful Bundesliga clubs, as in the case of Marco Reus in 2012 (Borussia Mönchengladbach to Borussia Dortmund).

Whilst this pattern is certainly evident on an anecdotal level, this study empirically analyzes whether there are trends in the assortative matching of players to teams.48 By looking at the degree of assortative matching in each season, we examine trends in competitive balance over the last two decades in the German Bundesliga. This study contributes to the existing literature by offering a novel approach to measuring performance in football and a new method of investigating trends in competitive balance, focusing on changes in the distribution of player talent across clubs. Looking at single-match data, we draw on minute-by-minute lineup data as an indicator of the performance of players, teams and coaches. Through estimation of our player, team and coach measures, we are able to decompose success in football (measured by the goal margin achieved) into long-term and medium-term institutional effects (team strength and the ability to attract better coaches) and actual player talent (a more short- and medium-term measure of performance). Exploiting the tendency of players and coaches to switch frequently between teams, we examine whether better players increasingly play for better clubs, or whether long- and short-term success increasingly go hand in hand.

47In the paper we use the term football for association football or soccer.

48The concept of assortative matching is based on marriage market models. This term is used to describe mating between partners and spouses in terms of education, income etc. In the literature on, for example, job mobility, assortative matching refers to the matching of high-wage workers to high-wage firms and of low-wage workers to low-wage firms. In this paper, assortative matching/assortativeness relates to the matching of better players (workers) to better teams (firms) and vice versa.
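The paper does not spell out at this point how assortativeness is quantified per season. One common summary in worker-firm fixed-effects models is the correlation between estimated worker (here: player) effects and firm (here: team) effects across observations. The Python sketch below assumes that measure; the effect values are made-up numbers for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def season_assortativeness(appearances, player_effect, team_effect):
    """Correlation of player and team effects over one season's player-match rows.

    `appearances` holds one (player, team) pair per player-match observation,
    so players who appear more often receive more weight.
    """
    xs = [player_effect[p] for p, _ in appearances]
    ys = [team_effect[t] for _, t in appearances]
    return pearson(xs, ys)


if __name__ == "__main__":
    players = {"A": 1.0, "B": 0.5, "C": -0.5, "D": -1.0}  # hypothetical player effects
    teams = {"X": 0.8, "Y": -0.8}                          # hypothetical team effects
    sorted_season = [("A", "X"), ("B", "X"), ("C", "Y"), ("D", "Y")]
    print(round(season_assortativeness(sorted_season, players, teams), 3))  # 0.949
```

A value near +1 means the better players sit at the better teams; a rising value across seasons would correspond to increasing assortativeness.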

The paper is organized as follows. In Section 3.2 we discuss theoretical and empirical findings on competitive balance and outline features of the German Bundesliga during the last two decades. Section 3.3 introduces our data set and empirical framework. Section 3.4 provides regression results and our subsequent interpretation regarding competitive balance. Section 3.5 concludes. We demonstrate the robustness of the results in the Appendix.

3.2 Background

3.2.1 Literature Review

The empirical literature analyzing competitive balance varies in its measurement of competitive balance as it can be interpreted in various ways. For example, it can refer to the number of different teams winning a title or how points are distributed across clubs at the end of a season. However, in the theoretical literature it is commonly agreed that the distribution of talent defines competitive balance. Therefore, we define competition in a league as balanced if there is an equal and stable distribution of talent across teams. This connection between player talent and competitive balance was first stated and theoretically analyzed by Rottenberg (1956) who pinned down the peculiarities of the sports industry and investigated the effects of the reserve clause (limiting player mobility) in American baseball. As described by Cairns et al. (1986), there are specific demand-side externalities to this industry:

For a single club its own playing success will also be significant. Hence a given team may have incentives to continue increasing its playing strength vis-à-vis its competitors, generating attendances for itself without taking account of any external costs of reduced attendances elsewhere, due to lessened uncertainty of outcome.

This peculiarity of sports economics is also described by Neale (1964) as the Louis-Schmeling Paradox, referring to the two famous boxers. While firms in ‘normal’ markets seek a monopoly in order to maximize profit and diminish competition, there would be no profits at all in sports if there were no surviving competitors. Louis needs Schmeling (and vice versa) in order to create an entertaining competition.

The important question in the literature is whether teams internalize these demand-side externalities and whether ‘rich clubs [...] outbid the poor for talent, taking all the competent players for themselves and leaving only the incompetent for the other teams’ (Rottenberg, 1956). Rottenberg, however, finds that there is no need for a reserve clause or other restriction on player mobility to ensure competitive balance. This result is known as the invariance principle, which is primarily derived from the assumption that clubs are profit maximizers and that they internalize these externalities. Sloane (1971) has questioned whether this assumption can be applied to European football, deriving a model in which teams are utility maximizers subject to a budget constraint. In this setting, there is no equilibrating force towards competitive balance: top players are attracted to big-market teams in order to maximize the probability of winning, so a balanced competition is not necessarily reached endogenously (see Koning (2009) for an overview). This setting justifies imposing restrictive rules on competition, as analyzed, for example, by Szymanski and Késenne (2004) for gate revenue sharing. This short review shows that a reasonably equal distribution of playing talent is a vital element of sports industries. These theoretical models give ambiguous predictions about whether competitive balance is an equilibrium or not. Therefore, we offer a novel approach to look empirically at trends in competitive balance, measured by the changing distribution of playing talent across clubs, for two decades of the German Bundesliga.

Empirical research on trends in competitive balance generally returns ambiguous results concerning changes in Germany. It is not clear whether competitive balance has actually decreased or remained stable throughout the history of the Bundesliga. Goossens (2005), Feddersen (2006), Feddersen and Maennig (2005), Koning (2009) and Haan et al. (2007) detect no significant changes in competitive balance, whereas Pawlowski et al. (2010), Partosch (2014), Groot (2008) and Michie and Oughton (2004) have found evidence for decreasing competitive balance. Similar results are found for other European leagues. Generally, results vary depending on domestic leagues, time periods and the measures of competitive balance applied. Where competitive balance has decreased, changes can be attributed to regulatory developments, as in the case of gate revenue sharing investigated by Robinson and Simmons (2014). Other factors determining competitive balance include the Bosman Ruling, as investigated by Binder and Findlay (2011), or the Champions League, see e.g. Pawlowski et al. (2010). The majority of these studies primarily apply descriptive methods based on final rankings, analyzing competitive balance based on aggregated seasonal outcomes and assessing trends by tracking competitive balance measures over time, such as the Herfindahl-Hirschman Index or the standard deviation of winning percentage. To the best of our knowledge, no study has yet directly analyzed the distribution of playing talent across clubs.
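The two aggregate measures named above can be computed from an end-of-season table. A minimal sketch; counting a draw as half a win and using the population standard deviation are assumptions, and normalized variants (e.g. relative to an idealized standard deviation) are not shown.

```python
from math import sqrt


def hhi_points(points):
    """Herfindahl-Hirschman Index of point shares; 1/N indicates perfect balance."""
    total = sum(points)
    return sum((p / total) ** 2 for p in points)


def sd_winning_percentage(wins, draws, games):
    """Population standard deviation of winning percentages across teams.

    Counting a draw as half a win is an assumption; studies differ on this.
    """
    pct = [(w + 0.5 * d) / games for w, d in zip(wins, draws)]
    mean = sum(pct) / len(pct)
    return sqrt(sum((x - mean) ** 2 for x in pct) / len(pct))


if __name__ == "__main__":
    # Hypothetical four-team double round robin (6 games per team).
    print(hhi_points([13, 10, 7, 4]))
    print(sd_winning_percentage([4, 3, 2, 1], [1, 1, 1, 1], 6))
```

Both measures summarize final outcomes only, which is why they cannot say which players sit at which clubs.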

The direct channel of transfers and its effect on competitive balance has rarely been analyzed in sports. Maxcy et al. (2009) analyze baseball player transfers after the MLB changed its system of revenue sharing to a more egalitarian one. They find that this change in 1997 resulted in increased divesting of talent by low-revenue clubs. For football, Robinson and Simmons (2014) examine player mobility before and after the abolition of gate revenue sharing in England. They find an increased probability that better-quality players move to bigger teams afterwards (both between and within divisions). But rather than looking directly at the distribution of players, these studies investigate changes in player movement after certain league policy changes.

Summing up, it is generally acknowledged that player talent is the most important factor determining the (long-term) success of football teams, as seen, for example, in the high correlation between average team wage bill and performance (compare Hall et al., 2002). Theoretically, Rottenberg (1956) and Késenne (2000) assume the distribution of player talent to be a crucial determinant of competitive balance: ‘Competitive balance in a sports league [...] depends primarily on the distribution of player talent among teams’. Given the importance of the debate and the rich literature on this topic (e.g. Szymanski (2001) or Flores et al. (2010)), it is surprising that only a few studies actually consider disaggregated player data. Most studies simply rely on information about end-of-season league positions to look at trends in competitive balance. We use a new measure of player performance, similar to the Plus-Minus statistic used in ice hockey. This individual performance measure, which focuses on the team’s net scoring (compare, for example, Macdonald et al. (2012)), has been successfully applied to estimate a player’s impact on a match in sports other than football.49 Our performance estimates then enable us to identify changes in the assortativeness of players to teams, and thus to connect these trends with competitive balance.

3.2.2 Bundesliga in 1998-2016

The Bundesliga is one of the "Big Five" leagues in Europe which dominate continental club football competitions. Findings based on the German league are therefore highly relevant for other sports leagues in Europe and further afield. Providing a background for the empirical analysis, this section briefly summarizes the main structural changes in German professional football over the last 19 seasons, giving an overview of the economic and sporting development in the Bundesliga.

49While writing this paper, we became aware of another paper using a similar model to look at player performance in football: Sæbø and Hvattum (2015) concurrently show the usefulness of this measure for football analysis using transfer fees as outcome. However, their study neither looks at the team dimension nor refers to competitive balance.

Figure 3.1: Revenue and Expenditure Streams

Source: DFL (2006–2015) and UEFA (2004–2015), Partosch (2014). No data is available for 1997/98, for 2015/16 and for the 2nd Bundesliga before 2003/04.

Economic Development

At first sight the period between 1998 and 2016 can be economically characterized as a boom period for the German Bundesliga. A more detailed look at the development of different revenue and expenditure streams of all clubs in the Bundesliga, however, using reports on the economic situation of the Bundesliga (published annually by the DFL (2006–2015)) and information gathered by Feddersen (2006) and Partosch (2014), reveals that the period considered in our study can in fact be separated into two to three different periods. Financial data is available for the seasons from 1998/99 to 2014/15.

Figure 3.1 plots the logarithm of the total wage bill for licensed players by division for each season, plus UEFA Champions League revenues of top teams. For the 1st Bundesliga, personnel expenditures develop in line with revenues throughout the period. There was a large increase in playing staff expenditures in the early period (1998-2002), caused by a huge increase in TV-rights revenues. The Kirch media group was the major contributor to this development, investing massively in Bundesliga TV rights at the turn of the millennium.50 The turning point was the Kirch insolvency in 2002, and the following 5-6 seasons are characterized by an adjustment process in its aftermath. From 2008 onwards, all revenue and expenditure categories in the 1st Bundesliga grew steadily. A very similar distinction in periods can be observed for Champions League revenues of German clubs (see Figure 3.1), which are strongly connected to their international sporting success, as discussed below. The development in the 2nd Bundesliga, for which no data is available before 2004, is more influenced by the relegation of certain well-endowed teams; the peak in 2008, for example, is due to the presence of teams such as Bor. Mönchengladbach, 1899 Hoffenheim or FC Köln. The difference in personnel expenditures between the 1st and 2nd Bundesliga has grown considerably over time: it more than doubled from 402 million Euro in 2004 to 826 million Euro in 2015.

50Compare Frick and Prinz (2006), who give an overview of the development of the financial situation in the Bundesliga up to 2003.

Regulatory and Sporting Development

Apart from a change in the format of relegation, no major regulatory changes took place between 1997 and 2016. Prior to 2008/09, three teams were promoted or relegated between the first two divisions of the German Bundesliga at the end of each season. Since 2008, when the relegation play-off was introduced, the team finishing 16th in the first division plays the team placed third in the second division.51 Since 2009, only two play-offs have been won by the lower-division team. In total, only eight teams stayed in the 1st Bundesliga during the entire sample period.

The championship in the 1st Bundesliga is mainly dominated by FC Bayern München, who won the Bundesliga 12 times during the observed 19-year period. The remaining 7 championship titles are distributed as follows: three times Borussia Dortmund and once each Werder Bremen, VfB Stuttgart, VfL Wolfsburg and FC Kaiserslautern. Some teams were able to compete with FC Bayern München for more than one season. These teams, however, such as Werder Bremen since 2010/11 or Borussia Dortmund between 2004/05 and 2009/10, failed to qualify regularly for international competitions. No team other than Bayern München showed a consistently excellent level of performance, with Bayern winning a record four consecutive championship titles between 2012/13 and 2015/16.

This dominance is also reflected in Figure 3.2, which shows a standard measure of competitive balance: the average share of games won by the four best teams according to the end-of-season league table for the first two divisions.52 For the 1st Bundesliga this measure shows a more or less continuous increase in the dominance of the top teams or, equivalently, a decrease in competitive balance over time. In contrast, no trend is observed for the 2nd Bundesliga.
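The measure plotted in Figure 3.2 can be sketched as follows. Ranking by points, then goal difference, then goals scored is a simplified tie-break assumed here, and the input table is hypothetical.

```python
from typing import NamedTuple


class TableRow(NamedTuple):
    team: str
    points: int
    goal_diff: int
    goals_for: int
    wins: int
    games: int


def top_four_win_share(table):
    """Average share of games won by the four best teams in the final table.

    Ranking: points, then goal difference, then goals scored (simplified tie-break).
    """
    ranked = sorted(table, key=lambda r: (r.points, r.goal_diff, r.goals_for), reverse=True)
    top = ranked[:4]
    return sum(r.wins / r.games for r in top) / len(top)


if __name__ == "__main__":
    # Hypothetical five-team table after four games each.
    table = [
        TableRow("E", 0, -7, 1, 0, 4),
        TableRow("B", 7, 2, 6, 2, 4),
        TableRow("A", 10, 5, 8, 3, 4),
        TableRow("D", 4, -1, 4, 1, 4),
        TableRow("C", 7, 1, 5, 2, 4),
    ]
    print(top_four_win_share(table))  # 0.5
```

Computing this value per season and division yields the points through which the curves in Figure 3.2 are smoothed.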

51Earlier in the history of the Bundesliga, a relegation play-off had already been in place between 1982 and 1991. 52This graph looks very similar for other measures, e.g. the standard deviation of points.

Figure 3.2: Share of Games won per Season by Top Four Teams (By End-of-Season League Position)

Note: The lines represent non-parametric kernel density regression curves.

3.3 Data and Empirical Framework

Match day reports taken from the website of the German Kicker Sportmagazin provide the database. It contains detailed match day data for 11,626 matches played in the first two divisions of the German Bundesliga between 1997/1998 and 2015/2016.53 Table 3.1 summarizes the characteristics of this data set. Furthermore, we add all DFB-Pokal (German Cup) games between teams which play in one of the first two divisions in a given season. This allows us to observe in-season matches between teams from the two different divisions and increases the precision of our results. To ensure comparability between Bundesliga and cup matches, we use the results achieved in cup matches after 90 minutes, ignoring extra time or penalty shootouts.54

Table 3.1 shows the sample sizes of the whole sample and of the subset of players and coaches we use in most analyses. In general we include only players who have played in at least 50 games in the two divisions, and consider coaches who have a Bundesliga tenure of at least 18 games. Estimation of player or coach performance is very imprecise for individuals with only few observations. We show in the Appendix that the sample restriction does not change our results.

53There are 5,813 games in each division. Two games (one in each division) are missing because they were decided by the DFB sports court after the game: St. Pauli vs Schalke 04 in the first division in 2010/11 and FC Rot-Weiß Erfurt vs SpVgg Unterhaching in the second division in 2004/05. 54Including German Cup games does not alter our results. The share of draws after 90 minutes is very similar between Bundesliga matches (26.3%) and German Cup matches (26.7%), and the two are statistically indistinguishable (t-test, p-value: 0.84). The higher-division team wins almost 55% of cup matches after 90 minutes between two teams from different divisions, and the underdog wins in only 20% of cases, although 62% of those matches are played at the home stadium of the second-division team (the home distribution between the divisions is skewed because lower-tier teams are allowed to play at home in early rounds of the cup).

The number of matches and teams is not reduced in the restricted sample.55 Further information is available in the Appendix.
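The sample restriction described above can be sketched as a count of distinct matches per person. This hypothetical helper also returns the share of player-match observations retained, the kind of figure reported in footnote 55; how excluded players enter the regression is not stated in this section and is not modeled here.

```python
from collections import Counter

MIN_PLAYER_GAMES = 50  # players: at least 50 games
MIN_COACH_GAMES = 18   # coaches: tenure of at least 18 games (more than 17)


def restrict_sample(player_matches, coach_matches):
    """Return retained players, retained coaches and the share of player-game rows kept.

    Both inputs are iterables of (person_id, match_id) pairs; duplicates are ignored.
    """
    player_games = Counter(p for p, _ in set(player_matches))
    coach_games = Counter(c for c, _ in set(coach_matches))
    players = {p for p, n in player_games.items() if n >= MIN_PLAYER_GAMES}
    coaches = {c for c, n in coach_games.items() if n >= MIN_COACH_GAMES}
    kept_rows = sum(player_games[p] for p in players)
    share = kept_rows / sum(player_games.values())
    return players, coaches, share


if __name__ == "__main__":
    # Hypothetical data: player "a" has 60 games, "b" has 10; coach "c1" 18, "c2" 17.
    pm = [("a", m) for m in range(60)] + [("b", m) for m in range(10)]
    cm = [("c1", m) for m in range(18)] + [("c2", m) for m in range(17)]
    print(restrict_sample(pm, cm))
```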

Table 3.1: Descriptives of the Population and the Main Sample (Players >= 50 Games and Coaches > 17 Games)

Columns: Total | 1st Bundesliga | 2nd Bundesliga | German Cup

# Matches: 12,133 | 5,813 | 5,813 | 507
# Teams: 73 | 36 | 65 | 72
# Players: 5,072 | 2,795 | 3,736 | 3,224
# Players (>=50 Games): 2,140 | 1,645 | 1,601 | 1,469
# Coaches: 364 | 170 | 285 | 222
# Coaches (>17 Games): 215 | 127 | 171 | 181

Note: We include only German Cup games between two teams which both play in either of the top two divisions of the Bundesliga in a given season. The row "# Players (>=50 Games)" should be read as follows (similarly for coaches with more than 17 games): there are 2,140 players who appear in at least 50 matches in the two divisions or the cup, and 1,645 of them play at least one of those games in the 1st Bundesliga.

As already stated, we first introduce a new measure of player, team and coach performance, which we then use to investigate changes in competitive balance over time. This performance measure is the partial correlation of each player (and team and coach) with the goal margin, derived from a simple linear regression of goals scored minus goals conceded (taking into account the exact minutes when goals were scored and how the lineup changed through substitutions or dismissals) on player, team and coach fixed effects, plus a few control variables.

Though rarely used in the empirical literature, goal margin is a particularly suitable measure to reflect inequality in player talent. Firstly, player productivity, in terms of the goal margin outcome, can be directly and frequently observed on an almost weekly basis, without the need to aggregate a huge number of statistics (and it is also applicable to youth football, where detailed statistics are not available). Secondly, it captures both the defensive and the attacking quality of teams. Both components are probably more or less equally important, particularly for competitiveness in the long run. Thirdly, the goal margin is important for achieving success in competitions such as national championships or cup titles, since winning or losing can be directly derived from the goal margin. Finally, besides capturing variation from match to match, we can also include within-game changes in the lineup by looking at the exact minutes of goals scored.

55Although there are many players who play in fewer than 50 games in total, they do not have a large influence on the results since the distribution of total games played is very right-skewed. Players appearing in more than 50 games in total account for 86% of all player-game observations and for 88% of all minutes played.
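The on-pitch goal margin can be illustrated with a small helper. This is a sketch under stated assumptions: minutes are integers from 1 to 90 with stoppage-time goals filed under minute 45 or 90, and a goal scored in the minute of a substitution or dismissal is credited to the player leaving the pitch, not the one entering it.

```python
def on_pitch_goal_margin(goals, team, minute_on, minute_off):
    """Goals scored minus goals conceded by `team` while a player is on the pitch.

    goals: (minute, scoring_team) pairs with integer minutes 1..90.
    The player counts as on the pitch for minutes minute_on < m <= minute_off,
    e.g. (0, 90) for a full match or (0, 70) if replaced or sent off in minute 70.
    """
    margin = 0
    for minute, scorer in goals:
        if minute_on < minute <= minute_off:
            margin += 1 if scorer == team else -1
    return margin


def minutes_played(minute_on, minute_off):
    """Time on the pitch, a natural candidate for regression weights."""
    return minute_off - minute_on


if __name__ == "__main__":
    goals = [(12, "home"), (55, "away"), (80, "home")]  # hypothetical 2-1 home win
    print(on_pitch_goal_margin(goals, "home", 0, 90))   # full match: +1
    print(on_pitch_goal_margin(goals, "home", 0, 70))   # replaced in minute 70: 0
    print(on_pitch_goal_margin(goals, "home", 70, 90))  # came on in minute 70: +1
```

For a player on the pitch for the full match the value equals the final goal difference, as the text notes.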

To measure individual performance separately, we run a simple linear regression, sometimes referred to as a hierarchical fixed effects model. It regresses the goal margin on player, team and coach fixed effects and further controls:

$$\text{Goal\_Margin}_{it} = \gamma_i + \lambda_{J(i,t)} + \phi_{G(i,t)} + \lambda^{O}_{J(i,t)} + x_{it}\beta + \varepsilon_{it} \tag{3.1}$$

Further information about the goal margin is given below. This empirical framework considers all matches from the viewpoint of each player who starts or is substituted in for a team in a given match.56

The goal margin of player i on match day t is assumed to be a linear function of the following components:
- a player fixed effect $\gamma_i$;
- a team fixed effect $\lambda_j$, where $J(i,t) = j$ if player i plays for team j on match day t;
- a coach fixed effect $\phi_k$, where $G(i,t) = k$ if coach k manages player i on match day t;
- opponent fixed effects $\lambda^{O}_{J(i,t)}$;
- time-varying characteristics $x_{it}$;
- a noise term $\varepsilon_{it}$.

$x_{it}$ includes possible league and season effects (such as rule changes) and a dummy indicating whether the player plays for the home team. We interact the season indicators with league information as well as with the home advantage indicator. This allows the home advantage, for example, to vary non-linearly over time. Furthermore, we include the age of player i on match day t (and age squared). We also control for the number of dismissals for each team in a match (no dismissals, one dismissal, or two or more dismissals).57

Our sample consists of 287,685 player-match observations. This means that we observe on average 23.7 players per game, where each player is observed in at least 50 games over the whole period.58

The goal margin is the difference between goals scored and goals conceded by team $J(i,t)$ of player i while he is on the pitch. We capture in-game changes by tracking the minutes of substitutions and dismissals and, of course, the timing of goals. For players who appear in the starting lineup and are not replaced during a match, the goal margin generally equals the final goal difference. Players therefore have a different impact on a game depending on the total number of minutes played.

Equation (3.1) is estimated via a weighted least squares dummy variable regression.59 Observations are weighted by the fraction of minutes played in each match. Accordingly, players replaced in minute x are weighted by x/91, players brought on in minute x are weighted by (91−x)/91, and players starting and finishing a match are weighted by 1.

56This gives us a dyadic (or paired) dataset, similar to datasets on trade flows between countries; see Cameron and Miller (2014).
57Results are robust to excluding age and age squared.
58The maximum is 28, since 22 players are in the starting lineup and up to 3 players in each team may be replaced during a match. Teams tend to use all six possible substitutions (three per team) in the majority of games (in slightly more than two-thirds of all matches).
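The weighted dummy-variable estimation of Equation (3.1) can be illustrated on simulated data. The following is a minimal sketch under strong simplifying assumptions:
- The model includes only player and team effects plus a home dummy, with no coach, opponent, age or dismissal terms.
- Players are assigned to teams at random, which implies far more mobility than in the real data.
- Uniform draws stand in for the minute fractions used as weights.
- One reference category is dropped from each set of dummies.

All effect sizes are hypothetical. The sketch only shows that weighted least squares on dummies recovers the simulated player effects.

```python
import numpy as np

rng = np.random.default_rng(0)
n_players, n_teams, n_obs = 60, 6, 12_000

# Hypothetical "true" effects (illustration only, not estimates from the thesis)
true_player = rng.normal(0.0, 0.3, n_players)
true_team = rng.normal(0.0, 0.5, n_teams)
true_home = 0.5

player = rng.integers(0, n_players, n_obs)
team = rng.integers(0, n_teams, n_obs)   # random assignment = lots of mobility
home = rng.integers(0, 2, n_obs)
weights = rng.uniform(0.1, 1.0, n_obs)   # stand-in for minute fractions
y = (true_player[player] + true_team[team] + true_home * home
     + rng.normal(0.0, 1.0, n_obs))


def dummies(codes: np.ndarray, n_levels: int) -> np.ndarray:
    """One-hot encode `codes`, dropping level 0 as the reference group."""
    d = np.zeros((codes.size, n_levels))
    d[np.arange(codes.size), codes] = 1.0
    return d[:, 1:]


X = np.column_stack([np.ones(n_obs), home,
                     dummies(player, n_players), dummies(team, n_teams)])

# Weighted least squares = OLS on rows scaled by sqrt(weight)
sw = np.sqrt(weights)
beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

home_hat = beta[1]
player_hat = np.concatenate([[0.0], beta[2:2 + n_players - 1]])  # reference player = 0
corr = np.corrcoef(player_hat, true_player)[0, 1]
print(f"home advantage: {home_hat:.3f}, corr(player_hat, true): {corr:.3f}")
```

With real data, the dummy matrix becomes very large. Sparse matrices or iterative fixed-effects solvers are then the usual alternative to a dense `lstsq` call.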

The following example, Bayern München vs. FC Augsburg (32nd match day, 2014/15), illustrates our approach. Bayern München lost 0:1 at home, with Augsburg's Bobadilla scoring in the 71st minute. Table 3.2 shows information on three selected players from this match in our final data set.

Philipp Lahm was chosen by his coach Guardiola to be in the starting lineup, but he was replaced in the 14th minute. Since the score up to this point was 0:0, Philipp Lahm's contribution in this match is a goal margin of 0, and he is weighted by the fraction 0.15.

His team-mate Robert Lewandowski, by contrast, stayed on the pitch until the 74th minute and was therefore on the pitch when Augsburg scored their winning goal. Hence, he is observed with a goal difference of -1 and is weighted by the fraction 0.81. Since Bayern München received a red card in this match, we additionally control for the number of players on the pitch for both teams.

To take one example from FC Augsburg, Abdul Rahman Baba played the full match and is therefore weighted by 1 and attributed a goal difference of 1. Since his team received no red card, FC Augsburg had 11 players on the pitch, while their opponent, Bayern München, had only 10.

Table 3.2: Example: Bayern Munich vs. FC Augsburg (09/05/2015)

| Player | P. Lahm | R. Lewandowski | A. Baba |
|---|---|---|---|
| Goal Difference | 0 | -1 | 1 |
| Team | Bayern München | Bayern München | FC Augsburg |
| Coach | Guardiola | Guardiola | Weinzierl |
| Home Ground | 1 | 1 | 0 |
| Minute Out | 14 | 74 | 91 |
| Minute Fraction | 0.15 | 0.81 | 1 |
| Age | 31 | 26 | 20 |
| Number of Players on Pitch, Team | 10 | 10 | 11 |
| Number of Players on Pitch, Opponent | 11 | 11 | 10 |
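The weighting rule and the goal-margin attribution above can be reproduced for the three players in Table 3.2. This is a minimal sketch. The `Stint` structure and the function names are hypothetical. It treats a goal scored in a player's exit minute as occurring while he was on the pitch, and a goal in his entry minute as not. The text does not state how such ties are handled, so this is an assumption.

```python
from dataclasses import dataclass

MATCH_MINUTES = 91  # denominator used in the weighting rule (x/91, (91-x)/91)


@dataclass
class Stint:
    player: str
    minute_in: int = 0    # 0 for players in the starting lineup
    minute_out: int = 91  # 91 for players who finish the match


def weight(stint: Stint) -> float:
    """Share of the match the player spent on the pitch."""
    return (stint.minute_out - stint.minute_in) / MATCH_MINUTES


def goal_margin(stint: Stint, goals: list[tuple[int, int]]) -> int:
    """Goals scored minus goals conceded while the player was on the pitch.

    `goals` holds (minute, sign) pairs from the player's team's perspective:
    +1 for a goal scored, -1 for a goal conceded. Goals in the exit minute
    count, goals in the entry minute do not (assumption).
    """
    return sum(sign for minute, sign in goals
               if stint.minute_in < minute <= stint.minute_out)


# Bayern München 0:1 FC Augsburg, goal by Augsburg in minute 71
bayern_goals = [(71, -1)]
augsburg_goals = [(71, +1)]

lahm = Stint("P. Lahm", minute_out=14)
lewandowski = Stint("R. Lewandowski", minute_out=74)
baba = Stint("A. Baba")

for stint, goals in [(lahm, bayern_goals), (lewandowski, bayern_goals),
                     (baba, augsburg_goals)]:
    print(stint.player, goal_margin(stint, goals), round(weight(stint), 2))
```

The loop prints a goal margin of 0, -1 and 1 with weights 0.15, 0.81 and 1.0, matching Table 3.2.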

Returning to our empirical framework, the player component $\gamma_i$ is interpreted as the average impact of a player on the goal margin. It reflects a combination of different skills, such as work rate and talent. With regard to competitive balance, $\gamma_i$ represents a short-run factor explaining performance inequality.

In contrast, $\lambda_j$ reflects long-run team heterogeneity. It influences team performance as an average institutional goal premium (or expectation). It captures, for example, the effectiveness of club officials in assembling a competitive team, the quality of training facilities, and other resources such as physical therapists.

Additionally, $\phi_k$ reflects the average effectiveness or ability of each coach, conditional on the player talent and resources available. In line with the interpretation of the other parameters, the coach effect captures performance heterogeneity in the medium run.

Time-varying characteristics include season dummies, opponent fixed effects and a home dummy. Opponent fixed effects $\lambda^{O}_{J(i,t)}$ control for the opponent that team j faces on match day t. The home dummy represents a goal premium due to the home advantage. Possible sources of the home advantage in football include fan support or a greater tendency to play aggressively and offensively on home ground. We do not treat the home advantage as part of competitive balance.

Furthermore, the impact of a single player is not fixed and may change over the life course. We therefore add age and age squared to capture career-related performance changes.

Equation (3.1) does not take into account that the outcome is the same for all players of one team in a given match. We therefore estimate an additional specification at the match level, which accounts, among other things, for possible effects of fellow players. This specification gives similar results and is described in the Appendix.

59Results are very similar if we estimate Equation (3.1) via ridge regression; see Appendix 3.6.3.

Hierarchical fixed effects models have become popular in different fields of economics in recent years. Bertrand and Schoar (2003), for example, look at the effects of managers on firm performance. Chetty et al. (2014) use such an approach to study the long-term effects of teacher quality on students. Card et al. (2013) decompose changes in the variance of individual wages in West Germany into contributions of person and establishment effects.

Hentschel et al. (2014) transfer this model to football by analyzing the impact of coaches on team success between 1993/1994 and 2013/2014 (without considering player talent). Applying a very parsimonious approach, they include team fixed effects, coach effects and half-season effects to explain the average number of points gained by a coach during a half season. Exploiting the fact that coaches move frequently between teams, they are able to disentangle coach and team fixed effects. Their findings are interesting in their own right, since they make it possible to identify under- and over-performing coaches. They also provide evidence that coaches or executives generally play a considerable role in organizational performance.

The available data set is particularly suitable for this indicator of performance. Its most important feature is the high share of players and coaches moving between teams, a prerequisite for the precise estimation of their separate impact on sporting success (Abowd and Kramarz, 1999). The system of relegation and promotion in football also contributes to a high fluctuation of players, coaches and teams between seasons.

Amongst all players in our sample, on average 50% remain in the same team from one season to the next, 15% move to another team in our sample, and 35% drop out of the sample (e.g. retire or move abroad). For our main sample of players who appear in at least 50 Bundesliga games, 59% are retained, 19% move and 22% drop out. Table 3.7 shows that more than 70% of all players with at least 50 games play for at least two different teams. The same is true for 58% of all coaches (with at least 17 games).

Regarding teams, only eight teams managed to stay in the 1st Bundesliga in all 19 seasons, while 28 different teams were relegated to the second division at least once over the 19 seasons. Since relegation entails large losses in revenue, the likelihood of a team being relegated is closely related to changes in the distribution of talent across teams.
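Player and team effects can only be compared within a set of teams that are linked through players who played for more than one of them. This is the mobility requirement referred to above. The following minimal sketch finds such connected sets with a breadth-first search over the bipartite player-team graph. It also computes the share of players observed at two or more teams. The data and function names are hypothetical. In the full model, coaches would add further links between teams, which this sketch ignores.

```python
from collections import defaultdict, deque


def connected_sets(spells):
    """Group teams that are linked through players who played for more than one.

    `spells` is an iterable of (player, team) pairs. Effects are only
    comparable (identified relative to each other) within one such set.
    """
    graph = defaultdict(set)
    for player, team in spells:
        p, t = ("player", player), ("team", team)
        graph[p].add(t)
        graph[t].add(p)

    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, teams = deque([start]), set()
        while queue:
            node = queue.popleft()
            kind, name = node
            if kind == "team":
                teams.add(name)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(teams)
    return components


def share_of_movers(spells):
    """Share of players observed with at least two different teams."""
    teams_by_player = defaultdict(set)
    for player, team in spells:
        teams_by_player[player].add(team)
    return sum(len(t) >= 2 for t in teams_by_player.values()) / len(teams_by_player)


# Hypothetical spells: player A links Team1 and Team2; Team3 is not linked
spells = [("A", "Team1"), ("A", "Team2"), ("B", "Team2"),
          ("C", "Team3"), ("D", "Team3")]
print(connected_sets(spells))   # two sets: {Team1, Team2} and {Team3}
print(share_of_movers(spells))  # 0.25
```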

3.4 Empirical Results

3.4.1 Player and Team Performance

Before considering trends in competitive balance (in Section 3.4.2), we will provide an overview of the results from estimating Equation (3.1). We focus here on the results derived from the sample of players who appear in at least 50 games. Further analyses and robustness checks to other specifications are available in the Appendix.

Results are generally in line with expectations and with the existing literature. Unsurprisingly, home advantage is associated with a large and statistically significant advantage in terms of the goal margin. Across all seasons, it is comparable to having one top player (98th percentile of the player performance distribution) on the pitch rather than one median player.60 The magnitude of the home advantage is comparable between both divisions of the Bundesliga, but it is larger in the German Cup. It decreases strongly over time, falling by more than 40% over the sample period (see Table 3.3).

Age has an inverse U-shaped relationship with player performance. The coefficient on the linear age term is positive but insignificant, while the coefficient on age squared is strongly negative and highly significant. The estimated decline in performance with age is considerable: a player at the age of 30 has on average a performance that is lower by 0.8 standard deviations of the player fixed effects.

Dismissals (yellow-red and red cards) are strongly associated with the goal margin, both for the own team and for the opponent.
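The "more than 40%" decline can be checked directly from the two mean home-advantage estimates reported in Table 3.3, measured in goal-margin units. This is a simple arithmetic sketch using only those two published numbers.

```python
# Mean home advantage estimates from Table 3.3 (goal-margin units)
home_1998_2000 = 0.804
home_2014_2016 = 0.468

relative_decline = (home_1998_2000 - home_2014_2016) / home_1998_2000
print(f"Relative decline in home advantage: {relative_decline:.1%}")  # 41.8%
```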

60For statistical tests, we carry out multi-way clustering by teams and players, as described in Cameron and Miller (2015) for non-nested hierarchical data. In a future version, we will use standard errors that take into account the dyadic error correlation of the data, as described in Cameron and Miller (2014).

Player, team and coach performance measures are all important predictors of the goal margin. Simple F-statistics of joint significance yield p-values of nearly zero for players, teams and coaches, respectively.

A variance decomposition (available upon request) shows that team heterogeneity is the most important performance dimension, followed by coach differences and finally player heterogeneity.61 Given that there are eleven players on the pitch but only one coach for each team, this ordering of the performance measures seems plausible. It highlights the importance of medium- to long-term institutional factors for success in football.

The variance decomposition also shows that the residual variation makes up slightly more than 80% of the total variation. Again, this is not surprising given the difficulty of predicting football matches and the discrete nature of our outcome.
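The multi-way clustering mentioned in footnote 60 can be sketched for the two-way case. The covariance matrix clustered by teams is added to the one clustered by players, and the matrix clustered by their intersection is subtracted. This is a minimal, unweighted OLS sketch on simulated data. The function names are hypothetical. It applies no small-sample correction, and the resulting matrix is not guaranteed to be positive semi-definite in finite samples. The thesis estimates a weighted regression, so this is an illustration of the technique rather than the exact computation.

```python
import numpy as np


def cluster_meat(X, u, groups):
    """Sum over clusters g of (X_g' u_g)(X_g' u_g)'."""
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        score = X[groups == g].T @ u[groups == g]
        meat += np.outer(score, score)
    return meat


def twoway_cluster_vcov(X, y, g1, g2):
    """OLS with two-way clustered covariance V1 + V2 - V12 (no small-sample correction)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    g12 = np.array([f"{a}|{b}" for a, b in zip(g1, g2)])  # intersection clusters
    meat = cluster_meat(X, u, g1) + cluster_meat(X, u, g2) - cluster_meat(X, u, g12)
    return beta, bread @ meat @ bread


# Simulated example: observations clustered by (hypothetical) teams and players
rng = np.random.default_rng(1)
n = 400
teams = rng.integers(0, 12, n)
players = rng.integers(0, 80, n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.2, 0.5]) + rng.normal(size=n)

beta, V = twoway_cluster_vcov(X, y, teams, players)
print(beta, np.sqrt(np.diag(V)))
```

If every observation forms its own "player" cluster, the second and third terms cancel. The formula then reduces to one-way clustering by team, which is a quick sanity check of the implementation.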

Table 3.3: Estimation Results, Sample 50+

| Player, Team and Coach Parameters | |
|---|---|
| Number of player effects | 2140 |
| Number of team effects | 73 |
| Number of coach effects | 214 |
| **Summary of parameter estimates** | |
| Std. dev. of player fixed effects | 0.307 |
| Std. dev. of team fixed effects | 0.522 |
| Std. dev. of opponent fixed effects | 0.510 |
| Std. dev. of coach fixed effects | 0.376 |
| **Other parameters and statistics** | |
| Mean home advantage in 1998-2000 | 0.804 |
| Mean home advantage in 2014-2016 | 0.468 |
| 1 dismissal own team | -0.681 |
| 2+ dismissals own team | -0.917 |
| 1 dismissal opponent | 0.671 |
| 2+ dismissals opponent | 0.920 |
| Age (standardized) | 0.104 |
| Age squared (standardized) | -0.201 |
| RMSE | 1.6893 |
| Adjusted R-squared | 0.198 |
| Std. dev. of goal margin | 1.49 |
| Sample size | 287,685 |

Note: Results from Equation (3.1). The numbers of parameters and their standard deviations include reference groups.

Analyzing the estimated effects in more detail, Table 3.4 presents percentile differences of team, player and coach effects and the corresponding difference in the goal premium. For example, replacing a player at the 5th percentile of the player effects distribution with a player at the 95th percentile yields, on average and ceteris paribus, a goal premium of 1.03. Superstars such as Robben, Gündogan or Thomas Müller represent the top percentiles of the player distribution, so this difference in performance and ability seems reasonable. These players are indeed able to turn a match around, strongly influencing the goal margin through their individual contribution.62

Figure 3.12 shows the distribution of player performance for three groups:
- players who appear only in the 1st Bundesliga (25%);
- players who always play in the 2nd Bundesliga during the sample period (23%);
- players who play for teams in both divisions (52%).

The distributions look reasonable, with a clear hierarchy but considerable overlap between players in different divisions. The density estimation is based on the 2,138 individual players who play in at least 50 games during our sample period. The estimated density is slightly skewed to the left, reflecting the fact that better players play, on average, in more games. We find a similarly convincing hierarchy for the sum of team and coach effects (see Figure 3.4).

61As a proxy for this analysis, one can compare the standard deviation of team effects (sd(λ) ≈ 0.522) with that of the player performance distribution (sd(γ) ≈ 0.307). The former is about 70% larger, with coaches in between (sd(ϕ) ≈ 0.376).

Player performance measures are closely related to professional grades attributed by the Kicker sports magazine. Amongst other things, these grades have been used in Buraimo et al. (2015) to analyze moral hazard aspects with respect to contract length of Bundesliga players. If we run the same Equation (3.1) for grades rather than for the goal margin, we find a correlation of ρ ≈ −0.79 between our performance measure and the coefficients derived from the regression with grades as outcome (in Germany lower grades are better), see Appendix.

Percentile differences for coach effects also show a strong dispersion in coach abilities. The difference between the 95th and 5th percentile of the coach distribution corresponds to a goal margin of approximately 1.26. Again, a comparison with the common perception of coach ability supports the plausibility of our results. For instance, top coaches such as Jürgen Klopp or Thomas Tuchel are found in the top percentiles of our coach effects distribution. They appear alongside coaches such as Martin Schmidt, who currently manages Mainz 05, and Edmund Becker, who led Karlsruher SC to promotion to the 1st Bundesliga.

Our results also confirm the importance and great contribution of managers to organizational success, as illustrated by Hentschel et al. (2014). We find a strong rank correlation between our estimated coach effects and the coach estimates of Hentschel et al. (2014), which do not take players into account.

We observe strong trends in average performance over time, which is a well-known issue for various rating measures (e.g. Elo ratings from chess, which have been adapted to football). We cannot determine to what extent this phenomenon reflects true performance gains over time or statistical artifacts.

In the following, we therefore standardize team, player and coach fixed effects by season. We subtract the respective season-specific mean effect and divide by the respective standard deviation (of players or coaches within a season). Standardizing does not alter the interpretation of our results, because the covariance of standardized variables is simply the correlation.63 Where we illustrate aggregated individual performance measures (such as in Figure 3.4), we simply average the season-specific standardized fixed effects.

62Nonetheless, there are some surprising results for players who have played in relatively few games.

Table 3.4: Percentile Difference of Team/Player/Coach Fixed Effects in the 50+ Sample

Corresponding goal margin impact:

| Percentile Difference | Team FE | Player FE | Coach FE |
|---|---|---|---|
| 95th - 5th | 1.53 | 1.03 | 1.25 |
| 90th - 10th | 1.25 | 0.80 | 0.92 |
| 75th - 25th | 0.70 | 0.42 | 0.44 |
| 75th - 50th | 0.32 | 0.23 | 0.20 |
| 50th - 25th | 0.38 | 0.19 | 0.24 |
| # Parameter Estimates | 73 | 2140 | 214 |

Note: Results from Equation (3.1). Reference groups are included.
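The season-wise standardization and the percentile differences of Table 3.4 can be sketched as follows. The following is a minimal illustration on hypothetical player-season effects with a drifting season mean, which mimics the trend problem described in the text. Using the sample standard deviation (`ddof=1`) within each season is an assumption; the text does not say which estimator is used. By construction, the 75th-25th difference equals the sum of the 75th-50th and 50th-25th differences, as in every column of Table 3.4.

```python
import numpy as np


def standardize_by_season(effects, seasons):
    """Subtract the season mean and divide by the season standard deviation."""
    effects = np.asarray(effects, dtype=float)
    out = np.empty_like(effects)
    for s in np.unique(seasons):
        in_season = seasons == s
        vals = effects[in_season]
        out[in_season] = (vals - vals.mean()) / vals.std(ddof=1)
    return out


def percentile_differences(effects,
                           pairs=((95, 5), (90, 10), (75, 25), (75, 50), (50, 25))):
    """Differences between upper and lower percentiles, as reported in Table 3.4."""
    return {f"{hi}th - {lo}th": np.percentile(effects, hi) - np.percentile(effects, lo)
            for hi, lo in pairs}


# Hypothetical player-season effects whose mean drifts upward over seasons
rng = np.random.default_rng(2)
seasons = np.repeat(np.arange(2000, 2005), 100)
effects = rng.normal(0.0, 0.3, seasons.size) + 0.05 * (seasons - 2000)

z = standardize_by_season(effects, seasons)
diffs = percentile_differences(effects)
print({k: round(float(v), 2) for k, v in diffs.items()})
```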

Regarding estimated player productivity, Figure 3.3 reveals (unsurprisingly) that mean player fixed effects and average points develop in parallel for different teams. A first glance at trends in player performance over time amongst clubs observed over the entire sample period gives an initial indication of declining competitive balance.

Amongst the teams who have made gains in terms of player talent are Bayern München, Bor. Mönchengladbach, FC Schalke 04 and, perhaps surprisingly, . Teams such as 1. FC Kaiserslautern, 1860 München, 1. FC Nürnberg, Hertha BSC and Werder Bremen, for example, have lost out in this area.

Werder Bremen (Figure 3.3(b)) shows a strong connection between both measures of performance. In 2003/04 Werder Bremen won the Bundesliga championship, and in the following seasons the club regularly qualified for the Champions League, resulting in steadily rising mean player effects. However, Werder Bremen has failed to qualify regularly for international competitions since 2008/09. Since then, it has experienced a sharp decline both in final league positions and in average player talent. The drops in mean player performance of Werder Bremen before the 2008/09 season and in 2010/11 are among the largest in the whole sample. In summer 2010, Werder Bremen lost Mesut Özil to Real Madrid. At the same time, Naldo (who won the German Cup with Wolfsburg in 2014/15) was so severely injured that he missed the whole following season. Both Özil and Naldo rank highly in our player performance measure.

A similar development can be seen for Borussia Dortmund (Figure 3.3(a)). In this case, however, the financially challenging seasons between 2004 and 2009 had only a modest influence on average player talent. In contrast, Borussia Dortmund's recent success in the Bundesliga is characterized by a huge increase in player productivity. Robert Lewandowski, Mario Götze and Shinji Kagawa, for example, were amongst the ten best players in the 2011/12 season.

Figures 3.3(c) and 3.3(d) show different trends in player talent for 1860 München and SpVgg Greuther Fürth, two clubs that played in both divisions during the sample period. 1860 München shows a more or less long-term decline in player talent. For SpVgg Greuther Fürth, in contrast, the series has an inverted U-shape: the club was promoted to the Bundesliga for the 2012/13 season but relegated after only one season.

63Results regarding player talent are very similar. The Pearson correlation between players' (non-standardized) fixed effects and the mean of the player fixed effects standardized by season is ρ = 0.86, and the Spearman rank correlation coefficient is ρ = 0.84.

Figure 3.3: Mean Player Effects. Panels: (a) Borussia Dortmund, (b) Werder Bremen, (c) 1860 München, (d) SpVgg Greuther Fürth.

Regarding team effects, Figure 3.4 depicts the relation between the estimated (conditional) team effects added to their corresponding mean coach effects and the (unconditional) average points gained by each team. We find a strong positive relation between the two measures of team performance (ρ ≈ 0.89).

Figure 3.4: Team + Mean Coach FE and Mean Points by Match

Altogether, these results confirm the plausibility of our estimated effects. They also support splitting team productivity into a team (long-term), coach (medium-term) and player (short-term) component. The results also reveal a high dispersion of, for example, player effects, implying high inequality within the Bundesliga. The next section addresses this issue by looking at the correlation of player and team effects over time.

3.4.2 Assortative Matching of Players to Clubs

Returning now to competitive balance, we identify decreasing competitive balance through rising assortativeness of players with respect to the teams they play for. By increased assortative matching, we mean that teams with high long-run productivity increasingly attract players with high individual productivity. Conversely, players with low productivity are increasingly signed by teams with lower long-run productivity. We have already seen in the previous section that teams such as Bayern München or Schalke 04 have managed to attract better player talent. Does this indicate rising assortative matching in general?

To answer this question, we focus on the pattern of player movement by looking at the correlation between our team and player effects for each season. This correlation indicates the degree of imbalance in each season and shows whether better players increasingly migrate to teams which exhibit long-term success. This in turn provides direct insight into the allocation of talent as a measure of competitive balance.

We look at each club’s squad in a season and analyze the correlation between club and player fixed effects. The squad consists of all players who participate in at least one game for a given team during a season (weighted by the number of minutes played to take into account individu- als’ playing time). If players switch clubs in the winter break, they may therefore appear twice in the sample. Club effects are defined as the sum of team and coach fixed effects.64

$$\mathrm{Cor}_T = \mathrm{Cor}\left(\gamma_{j,T},\; \lambda_j + \phi_{j,T}\right) \tag{3.2}$$

The terms in Equation (3.2) are defined as follows:
- $\gamma_{j,T}$ are the standardized (by season) fixed effects of all players of team j in season T;
- $\lambda_j$ are the standardized team fixed effects of team j;
- $\phi_{j,T}$ are the standardized, season-averaged coach fixed effects of team j in season T.65

All parameters are estimated via Equation (3.1). Player fixed effects are weighted by minutes played in Bundesliga matches (1st or 2nd division; German Cup games are not considered here).

64An average coach fixed effect is calculated for each season and each team in order to incorporate possible coach switches during a season. Results are very similar if we do not consider coaches at all and just look at the correlation between players and teams.
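The minute-weighted correlation in Equation (3.2) can be computed season by season as sketched below. This is a minimal sketch on hypothetical player-season rows. Each row holds a standardized player effect, the corresponding club effect (team plus season-averaged coach effect) and minutes played. The simulated "sorting" strength is an illustrative assumption chosen so that the later season is more assortative. The function names are hypothetical.

```python
import numpy as np


def weighted_corr(x, y, w):
    """Correlation of x and y with observation weights w (e.g. minutes played)."""
    w = np.asarray(w, dtype=float) / np.sum(w)
    dx = x - np.sum(w * x)
    dy = y - np.sum(w * y)
    return np.sum(w * dx * dy) / np.sqrt(np.sum(w * dx**2) * np.sum(w * dy**2))


def assortativeness_by_season(season, player_fe, club_fe, minutes):
    """Minute-weighted Cor(player effect, team + coach effect) for each season."""
    result = {}
    for s in np.unique(season):
        m = season == s
        result[s] = weighted_corr(player_fe[m], club_fe[m], minutes[m])
    return result


# Hypothetical player-season rows: standardized effects and minutes played
rng = np.random.default_rng(3)
season = np.repeat([2000, 2015], 300)
club_fe = rng.normal(size=season.size)
sorting = np.where(season == 2015, 0.6, 0.2)  # stronger sorting in the later season
player_fe = sorting * club_fe + rng.normal(size=season.size)
minutes = rng.uniform(90, 3000, season.size)

print(assortativeness_by_season(season, player_fe, club_fe, minutes))
```

With equal weights, `weighted_corr` reduces to the ordinary Pearson correlation. Weighting by minutes makes regular starters count more than players with brief appearances.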

Figure 3.5: Assortative Matching. Panels: (a) Both Leagues, (b) By League.

Figure 3.5(a) plots the correlation coefficient for the main sample, Sample 50+. There is a clear rising linear trend, revealing increased assortative matching throughout the period. Better players increasingly tend to play for superior teams, and the opposite is true for weaker teams.66 This increase was most marked between 2000 and 2003 and was followed by a more or less linear increase until 2012. Assortative matching seems to have stagnated at a high level since 2012.67 This is in line with the results of simple measures such as the share of games won by the top four teams (see Figure 3.2). The result is also confirmed by the other specifications and for different sub-samples, as seen, for example, in Figures 3.9 and 3.10 in the Appendix.

Next, we investigate the reasons for the reduction in aggregate competitive balance over time. We ask whether this development is driven by transfers between or within divisions, using a simple statistical decomposition. We decompose the pooled correlation of players to clubs using the law of total variance, which also applies to covariances (for simplicity, we look here at covariance terms):

$$\mathrm{cov}(\gamma_{j,T}, \lambda_j + \phi_{j,T}) = \underbrace{E\left[\mathrm{cov}(\gamma_{j,T}, \lambda_j + \phi_{j,T} \mid Z)\right]}_{\text{changes within 1st and 2nd Bundesliga}} + \underbrace{\mathrm{cov}\left(E[\gamma_{j,T} \mid Z],\, E[\lambda_j + \phi_{j,T} \mid Z]\right)}_{\text{changes between 1st and 2nd Bundesliga}} \tag{3.3}$$

where Z is a binary variable indicating the division.

65Results are almost identical if we look at $\mathrm{Cov}(\gamma_{j,T}, \lambda_j + \phi_{j,T}) = \frac{1}{N-1}\sum \gamma_{j,T}(\lambda_j + \phi_{j,T})$, as in Card et al. (2013).
66To some limited extent, better players in high-performing teams could also increasingly play more minutes.
67We place little importance on the drop in the 2015/16 season, as there is greater uncertainty in the first and last seasons of our sample. In these seasons, we observe fewer players who play a total of at least 50 games, due to the left- and right-censoring of our sample. For the same reason, we do not emphasize the negative correlation at the beginning of the sample period. This negative correlation is not apparent in the results for the ridge regression, which penalizes large coefficients; see Appendix 3.6.3.
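The decomposition in Equation (3.3) holds exactly when population moments are used. The following minimal sketch splits a covariance into the within-division and between-division terms and checks that they add up. The data are hypothetical, with Z = 1 for the 1st Bundesliga and Z = 0 for the 2nd. Observations are unweighted here. The analysis in the text weights player effects by minutes played, which would enter as weights in every mean below.

```python
import numpy as np


def total_covariance_decomposition(x, y, z):
    """Return cov(x, y), E[cov(x, y | Z)] and cov(E[x | Z], E[y | Z]).

    Population moments (dividing by n) are used, so the identity
    total = within + between holds exactly.
    """
    levels, idx = np.unique(z, return_inverse=True)
    p = np.bincount(idx) / z.size  # share of observations per division
    mean_x = np.array([x[idx == k].mean() for k in range(levels.size)])
    mean_y = np.array([y[idx == k].mean() for k in range(levels.size)])
    cond_cov = np.array([np.mean((x[idx == k] - mean_x[k]) * (y[idx == k] - mean_y[k]))
                         for k in range(levels.size)])
    within = np.sum(p * cond_cov)
    between = np.sum(p * (mean_x - x.mean()) * (mean_y - y.mean()))
    total = np.mean((x - x.mean()) * (y - y.mean()))
    return total, within, between


# Hypothetical data: Z = 1 for the 1st Bundesliga, 0 for the 2nd Bundesliga
rng = np.random.default_rng(4)
z = rng.integers(0, 2, 1000)
club = 0.8 * z + rng.normal(size=z.size)              # team + coach effect
player = 0.5 * z + 0.3 * club + rng.normal(size=z.size)

total, within, between = total_covariance_decomposition(player, club, z)
print(f"total={total:.4f} within={within:.4f} between={between:.4f}")
```

In this simulation, both player and club effects are higher in the top division. The between term is therefore positive, mirroring the growing gap between the two divisions discussed in the text.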

The first term in Equation (3.3) measures distributional changes in playing talent within both divisions. This term allows us to analyze, for example, whether the decrease in competitive balance is mostly driven by increased inequality in playing talent in the 1st Bundesliga. Figure 3.5(b) shows that assortative matching has indeed increased in the 1st Bundesliga but not in the 2nd Bundesliga. In the 1st Bundesliga, we observe a strong increase in the assortativeness of player talent until 2004, followed by slight declines until 2009 and a subsequent strong increase. This pattern is not observed for the 2nd Bundesliga, where the distribution of player talent remained more or less stable over time (except perhaps for a drop in assortative matching in recent years).68

In particular, we see increased assortativeness amongst the 19 most established teams, which we observe in all seasons of the sample. There is no trend, however, for the next 19 teams in the sample, which we observe for between 7 and 18 seasons, or for all other teams.

Bayern München is the record champion and the team with the highest fixed effect (and the highest average player fixed effect) in the sample. It is an important driver of this trend because it is also one of the teams with the largest increase in player talent. The increase in assortative matching is not, however, only due to Bayern München (see Appendix).

The second term in Equation (3.3) looks at assortative matching between the two top divisions. Here we investigate whether the 2nd Bundesliga loses ground to the top division. Financial inequality between the divisions has widened substantially (see e.g. Frick and Prinz, 2006 or Figure 3.1). We therefore expect increasing heterogeneity of player talent between the first and second division over time. Furthermore, financial inequality has also increased within the 1st Bundesliga due to UEFA Champions League payments and advertising deals.

Figure 3.6 shows the trends in player performance and in the sum of team and coach performance by division over time. There is a clear trend in both performance measures. Over time, player talent has become increasingly unequally distributed between the 1st and the 2nd Bundesliga, with better players progressively appearing in the 1st Bundesliga. There are several explanations for this:
- The 1st Bundesliga has progressively managed to retain better players, whilst the opposite is true of the 2nd Bundesliga.
- Weaker players increasingly tend to move down to the second division.
- Over time, better players tend to play more often (or more minutes) in the 1st Bundesliga but not in the 2nd Bundesliga.

We see a similar trend of increasing polarization between both divisions for the additive measure of team and coach performance. Coach performance in particular has become stronger in the 1st Bundesliga but weaker in the 2nd Bundesliga over time. Consider, for example, the only 1st Bundesliga season of SpVgg Greuther Fürth (see Figure 3.3). Had Fürth remained in the 2nd Bundesliga, our model would have predicted a positive goal margin of +14, an absolute difference of 35 compared to the prediction for the 1st Bundesliga. To summarize, the 1st Bundesliga has clearly become stronger over time, whilst the 2nd Bundesliga has lost ground.

68Results for the 2nd Bundesliga are less precise than those for the top division. This is because the second league has more teams that we observe for only a few seasons (compare Table 3.9). Having said this, results appear fairly similar if the analysis is restricted to teams which we observe for at least 5 to 10 seasons.

Figure 3.6: Trends in Performance Measures by League

Note: Results from Equation (3.3).

Next, we link changes in competitive balance (approximated by changes in assortative matching) to financial information. Unfortunately, only limited information is available about the financial situation of German football clubs, since they are not required to publish detailed accounts (see Frick and Prinz, 2006). Therefore, we have to rely on aggregated data published by UEFA and the German Football League (DFL).

Figure 3.1 shows total revenues by league from media contracts, match-day earnings, merchandising, advertising, transfers and other sources. This information is taken from the DFL Bundesliga reports available to us since 2006 (DFL, 2006–2015); unfortunately, the DFL could not provide data for the 2nd Bundesliga before 2004. The graph clearly shows that absolute revenues of the 1st and 2nd Bundesliga have diverged. These differences in revenues are most probably the main driver of the increased differences in playing talent between the two divisions.

Table 3.5: UEFA Champions League Payments to Bundesliga Clubs and Subsequent Competitive Balance

Outcome: Competitive Balance [Cor(γj,T, λj + ϕj,T)]

| | Estimate | Std. Error | t value |
|---|---|---|---|
| Log(Lagged UEFA CL Payments) | 0.097* | 0.060 | 1.61 |
| Linear Time Trend | 0.017** | 0.0062 | 2.80 |
| Intercept | −0.536** | 0.2094 | −2.56 |
| n | 18 | | |
| R² | 0.63 | | |

Note: Heteroscedasticity-robust standard errors; **: p < 0.01; *: 0.01 <= p < 0.05.

Several studies have highlighted how increasing payments to teams participating in the UEFA Champions League have reduced competitive balance in Europe's major leagues (Pawlowski et al., 2010; Binder and Findlay, 2011). Table 3.5 relates our measure of competitive balance (for the 1st Bundesliga only) to inflation-adjusted revenues from the UEFA Champions League (see Figure 3.1) and to a linear time trend. We take the logarithm of the annual payments made to Bundesliga clubs and use its lagged value to mitigate possible reverse causality.

The results indicate that UEFA Champions League payments may play a role in reducing competitive balance. Alongside a general trend towards increasing assortative matching (i.e. a reduction in competitive balance), competitive balance decreases in seasons following large UEFA Champions League payments to Bundesliga clubs.
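The specification behind Table 3.5 can be sketched as follows. The outcome in season t is regressed on the log of Champions League payments in season t−1 and a linear trend, with heteroscedasticity-robust standard errors. This is a minimal sketch on hypothetical inputs, not the thesis data. Using the HC1 variant of the robust covariance is an assumption, since the table note does not say which variant is used. Lagging a 19-season series leaves 18 observations, consistent with n = 18 in Table 3.5.

```python
import numpy as np


def ols_hc1(X, y):
    """OLS coefficients with HC1 heteroscedasticity-robust standard errors."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ beta
    meat = (X * (u ** 2)[:, None]).T @ X  # sum of u_i^2 x_i x_i'
    vcov = xtx_inv @ meat @ xtx_inv * n / (n - k)
    return beta, np.sqrt(np.diag(vcov))


# Hypothetical inputs for 19 seasons (not the thesis data)
rng = np.random.default_rng(5)
cl_payments = np.linspace(100.0, 600.0, 19)     # inflation-adjusted payments
competitive_balance = rng.normal(0.0, 0.1, 19)  # Cor(player FE, team + coach FE)

# Outcome in season t is regressed on log payments from season t-1 -> 18 rows
log_lagged_payments = np.log(cl_payments[:-1])
outcome = competitive_balance[1:]
trend = np.arange(1, outcome.size + 1)
X = np.column_stack([np.ones(outcome.size), log_lagged_payments, trend])

beta, se = ols_hc1(X, outcome)
print(X.shape, beta.round(3), se.round(3))
```

With only 18 observations and a regressor that trends together with the time trend, the two slope coefficients are estimated imprecisely. This is one reason to read the payment effect as suggestive.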

3.4.3 Drivers of Increased Assortativeness

In this section, we look specifically at transfers and other drivers of reduced competitive balance. As outlined in Section 3.3, more than 70% of all players with at least 50 games in total appear for at least two clubs (see Table 3.7), a necessary precondition for our analyses. Transfers often attract considerable media attention, especially those between Bundesliga teams: Mario Götze’s move from Borussia Dortmund to Bayern München in 2013 was covered by news channels as an "Earthquake in the Bundesliga" (Kulish, April 24, 2013). Transfers between teams in our sample have increased slightly over time (today, about 16% to 18% of all players in our sample played for another team in our sample in the previous season), and it is not a priori clear whether transfers are the main driver of decreased competitive balance. Bayern München, for example, won its recent championships with several top players, such as David Alaba, Philipp Lahm, Thomas Müller and Bastian Schweinsteiger, who had already played for the club’s youth team. Before we look at the drivers of decreased competitive balance, we investigate whether our measure of player performance predicts moves to better teams.

Table 3.6: Transfer Prediction

Outcome: Change in Team Fixed Effects (λ_{k,t} − λ_{j,t−1})

Variable | Estimate | Std. Error | t value
Player Effect (Standardized) | 0.0203* | 0.0085 | 2.39
Average Residual in Season before Transfer (Standardized) | 0.0653** | 0.0079 | 8.29
Intercept | −0.0058 | 0.0087 | −0.66

n = 2,544; R² = 0.0299

Note: Cluster-robust standard errors (clustered by player); **: p < 0.01; *: 0.01 ≤ p < 0.05.

Regarding transfers, conventional wisdom says that a single good season can earn a player a move to a (much) better team, but long-term performance should also matter for where a player ends up. We investigate this question using transfers only. To capture career advancement and professional decline, we look at the change in average team effects between the destination team k and the origin team j (λ_{k,t} − λ_{j,t−1}). As before, we use the player fixed effects γ_i as a proxy for long-term performance. To measure short-term success, we use the average residual of player i in season T_i, the season preceding the transfer. Both explanatory variables are standardized by subtracting the mean and dividing by the standard deviation to make effect sizes comparable. The results are shown in Table 3.6.⁶⁹ Both the player fixed effects and the performance deviation from the long-term average in the season before a move predict the difference in team strength resulting from the transfer. This suggests that player fixed effects capture (otherwise unobserved) aspects of performance that are also relevant for scouts. Furthermore, scouts seem to also take into account performance during an often relatively short period when deciding on a transfer.
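A minimal sketch of the Table 3.6 specification, assuming hypothetical transfer records of the form (player id, player effect, average residual in the season before the move, change in team effect). It standardizes both regressors and computes standard errors clustered by player. The finite-sample factor G/(G−1)·(n−1)/(n−k) mirrors a common software default and is an assumption here, as are the helper names and all numbers.

```python
import math


def zscore(values):
    """Standardize: subtract the mean and divide by the sample standard deviation."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols_cluster(y, X, clusters):
    """OLS coefficients with standard errors clustered on the ids in `clusters`."""
    n, k = len(X), len(X[0])
    bread = invert([[sum(r[a] * r[b] for r in X) for b in range(k)] for a in range(k)])
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(k)]
    beta = [sum(bread[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(ra * ba for ra, ba in zip(r, beta)) for r, yi in zip(X, y)]
    # Sum the scores x_i * e_i within each cluster (here: each player).
    scores = {}
    for r, e, g in zip(X, resid, clusters):
        s = scores.setdefault(g, [0.0] * k)
        for a in range(k):
            s[a] += r[a] * e
    meat = [[sum(s[a] * s[b] for s in scores.values()) for b in range(k)] for a in range(k)]
    half = [[sum(bread[a][c] * meat[c][b] for c in range(k)) for b in range(k)] for a in range(k)]
    n_groups = len(scores)
    factor = n_groups / (n_groups - 1) * (n - 1) / (n - k)
    se = []
    for a in range(k):
        var = sum(half[a][c] * bread[c][a] for c in range(k)) * factor
        se.append(math.sqrt(max(var, 0.0)))  # guard against tiny negative rounding
    return beta, se


# Hypothetical transfers: (player id, gamma_i, avg. residual before move, change in team effect)
transfers = [
    ("p1", 0.30, 0.10, 0.05), ("p1", 0.30, -0.20, -0.02),
    ("p2", -0.10, 0.05, 0.00), ("p3", 0.50, 0.40, 0.09),
    ("p3", 0.50, 0.00, 0.03), ("p4", -0.40, -0.30, -0.06),
    ("p5", 0.10, 0.20, 0.04), ("p6", -0.20, 0.10, -0.01),
]
gamma_std = zscore([t[1] for t in transfers])
resid_std = zscore([t[2] for t in transfers])
X = [[1.0, g, r] for g, r in zip(gamma_std, resid_std)]
beta, se = ols_cluster([t[3] for t in transfers], X, [t[0] for t in transfers])
print(beta, se)
```

Clustering by player matters because the same player can appear in several transfers, so his observations are unlikely to be independent.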

In the following, we investigate whether player transfers between Bundesliga teams drive the reduction in competitive balance. We distinguish between three groups of players: those staying with their team, those moving between teams in our sample, and those whose previous club is unknown. The first group are Stayers, defined as players who already played for the relevant club in the previous season. These players constitute almost 50% of our sample. The second group consists of players who move between Bundesliga teams, i.e. players who play two consecutive seasons in the 1st or 2nd Bundesliga but not for the same team (ca. 15%). Movers include transfers such as that of Mesut Özil, who was sold by Schalke 04 to Werder Bremen in 2008, as well as loans such as that of Toni Kroos, who was loaned by Bayern München to Bayer Leverkusen in 2009 and 2010. The third group (35%) consists of players for whom we do not know where they played previously: they may have played for the same club but in a different division (e.g. its youth team), or for a team not included in our sample, such as a team in a lower division or a team based abroad. Mesut Özil, for example, is classified as unknown in 2010, as he moved to Real Madrid during the summer break. We match players with the teams to which they move or where they stay in the subsequent season and calculate Equation (3.1) separately for all three groups.

⁶⁹ Results are robust to including individuals who do not switch teams.
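The three-way split can be written as a small classification rule. This is a hypothetical sketch that follows the definitions above (comparison with the previous season, seasons indexed by their starting year); the data layout, function names and club names are invented for illustration and are not the code used for this chapter.

```python
def classify(appearances, player, season, team):
    """Return 'stayer', 'mover' or 'unknown' for a player-season at `team`.

    appearances maps (player, season) -> set of sample teams the player played for.
    """
    previous = appearances.get((player, season - 1), set())
    if team in previous:
        return "stayer"   # same club in the previous season
    if previous:
        return "mover"    # another sample team in the previous season
    return "unknown"      # youth team, lower division, abroad, ... (not observed)


def group_shares(appearances):
    """Share of player-team-season observations in each of the three groups."""
    counts = {"stayer": 0, "mover": 0, "unknown": 0}
    for (player, season), teams in appearances.items():
        for team in teams:
            counts[classify(appearances, player, season, team)] += 1
    total = sum(counts.values())
    return {group: c / total for group, c in counts.items()}


appearances = {
    ("A", 2008): {"Club X"}, ("A", 2009): {"Club X"},
    ("B", 2008): {"Club Y"}, ("B", 2009): {"Club X"},
    ("C", 2009): {"Club Y"},
}
print(group_shares(appearances))
```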

Figure 3.7: Assortativeness of Player and Club Fixed Effects by Player Groups

Note: Smoothing by cubic polynomial.

Figure 3.7 shows the correlation analysis for these three groups. The mover sample exhibits positive assortative matching to clubs in all seasons, and this pattern is fairly stable over time. This indicates that player transfers between Bundesliga teams reduce competitive balance by allocating better players to better teams. Bayern München, for example, is notorious for buying player talent from temporary competitors such as Bayer Leverkusen in the early 2000s (e.g. Michael Ballack or Lucio), Werder Bremen (e.g. Miroslav Klose or Claudio Pizarro) or, most recently, Borussia Dortmund (e.g. Mario Götze or Robert Lewandowski). However, since assortativeness among movers does not increase over time, transfers alone cannot explain the reduction in competitive balance over time.

In contrast, the other two groups show a strong increase in assortativeness over time, up until approximately 2012. We therefore attribute the reduction in competitive balance to two other mechanisms. First, the increased availability of up-and-coming young players for the top divisions of the Bundesliga has reduced competitive balance. These young players are trained in clubs’ own youth academies, which have been mandatory since 2002/03, or are attracted to clubs at a young age. The transfer of players from abroad may also play a significant role here. Second, competitive balance may be reduced by the retention of certain players, e.g. top stars such as Bernd Leno, Thomas Müller or Marco Reus, who might have left the Bundesliga had German teams still been performing as poorly in international competitions as they did in the mid-2000s.

3.5 Conclusion

We offer a new approach to measuring player performance which enables us to investigate the role of player mobility for competitive balance. This approach allows us to directly test theoretical predictions regarding transfers of player talent as described by Cairns et al. (1986): "Hence a given team may have incentives to continue increasing its playing strength vis-à-vis its competitors, generating attendances for itself without taking account of any external costs of reduced attendances elsewhere, due to lessened uncertainty of outcome." Player performance is measured by the partial correlation of each player with the goal margin, taking into account substitutions, dismissals and other important contributors to team success such as home advantage. Using data covering 19 seasons of the top two divisions of the German Bundesliga plus domestic cup matches, we investigate changing trends in the distribution of player talent across clubs. Player talent is arguably the most important prerequisite for success in sports, and its distribution is therefore a critical measure of the degree of competitive balance. By linking the distribution of player talent to competitive balance, we overcome problems faced by the previous literature, which relied mostly on aggregated end-of-season league positions and often could not detect significant changes over time.

Our results indicate a clear trend towards a more unequal distribution of player talent across teams in the top divisions of the German Bundesliga between 1998 and 2016. We interpret this as a reduction in competitive balance. Drivers of this decline are, on the one hand, rising inequality of teams, coaches and players between the two German top football divisions. On the other hand, we see increasing assortative matching of players to teams within the top division of the German football league system (1st Bundesliga). This does not hold for the 2nd Bundesliga, where financial endowments such as media payments grew much more slowly than in the 1st Bundesliga, which has seen strong increases in revenue over the last two decades.

We show that there is a circular trend, whereby more successful teams tend to attract better players. We point out that UEFA Champions League payments might play a distinct role in allowing certain clubs to attract better player talent. Furthermore, we show that the reduction in competitive balance is not driven by the direct movement of players between Bundesliga clubs. Rather, it is the increased hiring of talented players at a younger age or from abroad that explains the reduction in competitive balance. In addition, the retention of players by certain teams increasingly drives these trends.

We confirm the results of Groot (2008) and Pawlowski et al. (2010), who also provide evidence for decreased competitive balance in the 1st Bundesliga in recent years, and add information about the 2nd Bundesliga, which has been neglected in the previous literature. In line with our results, Pawlowski et al. (2010) also find that increasing Champions League payments are a notable driver of this development, not only in Germany but also in other European leagues. However, our results differ from the findings of earlier studies (see Section 3.2.1), which did not detect any changes in competitive balance for the beginning of our sample period. It is difficult to say why these studies reach inconclusive results, since each study uses a different (aggregated) method with varying sample periods. Using individual-level player data, we are able to provide clear evidence for gradually declining competitive balance. In doing so, we shed light on an important mechanism: how competitive balance is determined by player mobility.

Nevertheless, we are unable to assess whether more competitive balance is desirable. Increased assortative matching may be socially optimal, since more successful teams usually have a larger fan base. Indeed, economic theory is also unable to provide an answer to this question. On the one hand, the 1st Bundesliga has the highest average stadium attendance among football leagues worldwide, and attendance figures have risen considerably within the last two decades. This may indicate that the current degree of competitive balance is not harmful to fan interest (and empirical studies have so far not found a link between measures of competitive balance and stadium attendance). On the other hand, Bayern München has just won a record fourth consecutive championship with an end-of-season points total unseen only a few years ago. Based on our results, one could argue that the football market works efficiently by allowing increasingly better-equipped teams to attract better players. Regulations such as salary caps, gate sharing or restrictions on the transfer of young players could potentially help to increase competitive balance if this is considered beneficial. Further research should investigate more thoroughly the long-term impact of competitive balance on fan interest and provide recommendations where necessary.

3.6 Appendix

3.6.1 Data Preparation

The data were prepared separately in R and Stata. Table 3.7 shows descriptive statistics for three different datasets.

As mentioned before, the hierarchical fixed effects model requires strong separability of team, player and coach effects in order to disentangle the goal margin variation attributable to team-, player- and coach-specific performance. Separate identification of individual and team fixed effects is only possible if all clubs are connected through player mobility (see Abowd and Kramarz, 1999). Team effects and player effects therefore cannot be disentangled if the respective teams and players are only observed jointly. Mover players aid separation along this dimension, while also enabling separation of player and team effects for non-movers. As shown in Table 3.7, approximately two-fifths of "Sample All" are mover players, a much larger share than in other settings such as worker mobility between firms (e.g. Card et al., 2013). Even teams that are new to the 2nd Bundesliga usually have several players with previous Bundesliga experience in their squad, which allows us to disentangle player and team effects as well (see also Table 3.9). In our main sample of players who appear in at least 50 games in total, more than two-thirds of all players play for at least two teams. By excluding players with fewer than 50 matches, we admittedly lose 57.8% of all players but only 13.9% of all player × match observations, since lineup players are mostly quite experienced (or will be by the end of their career). We do not lose any entire match observation.⁷⁰ As separate identification is more straightforward when considering only mover players (observed at two or more different clubs), we introduce the sub-sample "Mover 50+", which we use as a robustness check.
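The connectedness requirement can be checked by treating players and clubs as nodes of a bipartite graph and merging every club a player has appeared for. The sketch below uses a union-find structure for this; it is an illustration under assumptions (hypothetical function name, input as a list of (player, club) appearance pairs), not the data preparation actually carried out in R or Stata.

```python
def connected_club_sets(pairs):
    """Group clubs into sets connected through players who appeared for several clubs.

    pairs: iterable of (player, club) appearances. Team effects are only
    comparable within one connected set (cf. Abowd and Kramarz, 1999).
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for player, club in pairs:
        root_player, root_club = find(("player", player)), find(("club", club))
        if root_player != root_club:
            parent[root_club] = root_player

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(name for kind, name in g if kind == "club") for g in groups.values())


pairs = [("p1", "A"), ("p1", "B"), ("p2", "B"), ("p2", "C"), ("p3", "D")]
print(connected_club_sets(pairs))
```

In this toy input, club D is linked to no other club through any player, so its team effect could not be separated from the effects of its players relative to clubs A to C.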

Furthermore, we impose a similar match condition for coaches: only coaches with at least 18 match observations are considered in all samples. All coaches failing to meet this condition are treated as interim coaches, which serve as the reference category in our estimation. There is a large number of interim coaches: 105 of the 364 coaches in our sample manage fewer than 10 games in total. Two promising coaches manage their respective team in all seasons in which we observe the team in our sample (Dirk Schuster at SV Darmstadt 98 and Frank Schmidt at 1. FC Heidenheim). We therefore cannot separate team and coach fixed effects in these cases and treat them as reference coaches. This does not influence our results, since we add up team and coach fixed effects for the covariance analyses in Section 3.4, but it could explain the high team effects of SV Darmstadt 98 and 1. FC Heidenheim, which to some extent reflect coach ability.

⁷⁰ A few matches from the viewpoint of Jahn Regensburg are missing.

Finals of the German Cup have been held in Berlin since 1985. To account for the fact that there is no true home team in finals, we add a separate home category for these matches and classify neither team as home or away.

Table 3.7: Data Sets

Statistic | Sample All | Sample 50+ | Mover 50+
# Players | 5,072 | 2,140 | 1,514
Mover Share | 39.4% | 70.7% | 100%
Player × Match Obs. | 334,234 | 287,685 | 223,452

# Teams: 73
# Coaches: 364 in total; 215 with more than 17 matches (57.9% of whom appear for at least two teams)

3.6.2 Analysis on the Match-Level

The outcome of a match is the joint result of all players on the pitch (and of course depends on the performance of the opposing team). Our approach so far (Equation (3.1)) does not take fellow players into account. The error terms of different players who appear together in a given match are correlated. This leads to correlation of the error term across team members and, furthermore, to serial correlation of the error term over time, which biases conventional estimates of the covariance matrix of the coefficients.⁷¹ A problem for the point estimates might arise if, for example, the performance of a player depends on his fellow players, e.g. if a left winger is used to playing together with a certain left-back. Moreover, these two players may often play together against stronger opponents. This would lead to omitted-variable bias. To address this, we use a second model at the match level, in which we look simultaneously at all players on the pitch. Here, we allow individual player performance to be correlated with that of fellow players (and opponents).

In this approach, each player of a team in a given match ‘receives’ the same goal margin, irrespective of whether he was in the lineup or substituted in later. In the previous analysis, we related to each player the goal margin achieved during his playing time (often zero for late substitutes). Here we move the analysis to the match level (or, more precisely, to the match-team level).⁷² Each player’s contribution is still weighted by the number of minutes he played during the match, so substitutes still have less influence than players who are on the pitch for the entire match. We illustrate this approach again using the match between Bayern München and FC Augsburg (see Table 3.2) and Bayern München’s two subsequent matches, the last two of the 2014/15 season.

⁷¹ We used multiway clustering at a high level (player and team) to take these correlations into account (Cameron and Miller, 2015) when referring to variance estimates. In a future version of this paper, we will use standard errors that account for the dyadic error correlation of the data, as described in Cameron and Miller (2014).

Table 3.8: Data Sets

Goal Margin | Home × 2014/15 | Team Bayern München | ... | Sent-Offs | ... | Opponent FC Augsburg | Opponent SC Freiburg | Opponent Mainz | Player: Neuer | Lahm | Dante | Xabi Alonso | T. Müller | Lewandowski | ...
-1 | 1 | 1 | ... | 1 | ... | 1 | 0 | 0 | 0.846 | 0.154 | 1 | 0 | 0.813 | 0.813 | ...
-1 | 0 | 1 | ... | 0 | ... | 0 | 1 | 0 | 1 | 0.209 | 0 | 0.703 | 0.297 | 1 | ...
2 | 1 | 1 | ... | 0 | ... | 0 | 0 | 1 | 1 | 1 | 1 | 0.582 | 0.505 | 0.802 | ...
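A hypothetical sketch of how one match could be turned into the two match-team rows illustrated in Table 3.8. The dictionary layout, the column naming and the assumption that a player's weight is his minutes divided by the match duration are ours; sent-offs are collapsed into a single count (the model itself uses four sent-off columns), and cup finals without a home team are not handled.

```python
def match_to_rows(match, duration=90.0):
    """Return two sparse rows (column name -> value), one per team's viewpoint."""
    rows = []
    for side, other in (("home", "away"), ("away", "home")):
        row = {
            "goal_margin": match[side + "_goals"] - match[other + "_goals"],
            "home_x_" + match["season"]: 1.0 if side == "home" else 0.0,
            "team:" + match[side + "_team"]: 1.0,
            "opponent:" + match[other + "_team"]: 1.0,
            "sent_offs": float(match[side + "_sent_offs"]),
        }
        for player, minutes in match[side + "_minutes"].items():
            row["player:" + player] = minutes / duration  # share of the match played
        rows.append(row)
    return rows


example = {
    "season": "2014/15",
    "home_team": "Team H", "away_team": "Team A",
    "home_goals": 0, "away_goals": 1,
    "home_sent_offs": 1, "away_sent_offs": 0,
    "home_minutes": {"player_1": 90, "player_2": 45},
    "away_minutes": {"player_3": 90, "player_4": 72},
}
for row in match_to_rows(example):
    print(row)
```

Storing each row as a sparse mapping keeps the design matrix manageable, since each row has nonzero entries for only about two dozen of the roughly 2,300 columns.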

Using this approach, we have to deal with multicollinearity, because many players often appear together in a given match and are therefore statistically difficult to distinguish. We use a linear ridge regression model, a popular method in machine learning when the number of observations is not very large relative to the number of coefficients to be estimated (Friedman et al., 2001). Ridge regression minimizes a penalized residual sum of squares:⁷³

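$$\hat{\beta} = \operatorname*{argmin}_{\beta} \left[ \sum_{i=1}^{2N} \Big( \mathrm{Goal\_Margin}_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right] \qquad (3.4)$$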

Here, we look at each match from the viewpoint of each participating team, so the number of observations is 2N = 24,262, twice the number N of matches in our sample.⁷⁴ The matrix x has column rank p = 2,307 and consists mostly of player indicators (2,140) but also of columns for each team (73), opponent (73), sent-offs (4) and home advantage (interacted with season to account for the shrinking home advantage over time, 19). For simplicity, we do not include season, league, coaches or further interactions between season or home advantage and league (which were not very relevant in the previous analysis).⁷⁵ The parameter λ controls the amount of shrinkage and is chosen via 20-fold cross-validation.⁷⁶ Again, we restrict the analysis to players with at least 50 observations, because the coefficients of players with fewer observations would be shrunk toward zero anyway and are less informative.

⁷² Different goal margins for players with different playing times in a given match would also be possible by shifting to a minute-based match analysis.
⁷³ For the computation we use the glmnet package for R (Friedman et al., 2010). Ridge regression methods have also been proposed by Kiefel and Warnke (2015) and Sæbø and Hvattum (2015).
⁷⁴ We lose 4 match-team observations due to an insufficient number of players with at least 50 games.
⁷⁵ Including these variables does not alter our results. The only exception is coaches: including them further reduces the statistical power of our analysis and gives very noisy results for the many teams and coaches for which we have few observations.
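To see Equation (3.4) at work on a small scale, the following dependency-free sketch solves the ridge problem in closed form and picks λ by k-fold cross-validation with squared-error loss. It is an illustration under assumptions, not the glmnet routine used here: glmnet standardizes predictors by default and fits a whole λ path by coordinate descent, whereas this sketch only centers the data (so that β_0 is unpenalized) and solves the normal equations (XᵀX + λI)β = Xᵀy directly, which is practical only for small p. Function names and the toy data are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bv] for row, bv in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Minimise sum (y - b0 - X b)^2 + lam * sum b^2; b0 is left unpenalised."""
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    gram = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0) for j in range(p)]
            for i in range(p)]
    rhs = [sum(r[i] * v for r, v in zip(Xc, yc)) for i in range(p)]
    beta = solve(gram, rhs)
    b0 = ybar - sum(bj * xj for bj, xj in zip(beta, xbar))
    return b0, beta


def predict(b0, beta, row):
    return b0 + sum(bj * xj for bj, xj in zip(beta, row))


def cv_lambda(X, y, lambdas, k=20, seed=0):
    """Return the lambda with the lowest k-fold cross-validated mean squared error."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # k (nearly) equal-sized folds
    best_lam, best_mse = None, float("inf")
    for lam in lambdas:
        sse = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            b0, beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
            sse += sum((y[i] - predict(b0, beta, X[i])) ** 2 for i in fold)
        mse = sse / len(y)
        if mse < best_mse:
            best_lam, best_mse = lam, mse
    return best_lam


# Toy data generated from y = 1 + 2 * x1 - x2 (hypothetical).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 5.0]]
y = [3.0, 0.0, 2.0, 4.0, 2.0]
print(ridge_fit(X, y, 0.0), ridge_fit(X, y, 5.0))
```

The penalty λ·Σβ_j² pulls the coefficients of players who mostly appear together toward zero instead of letting them take large offsetting values, which is why ridge regression is a natural choice under the multicollinearity described above.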

Results at the match level mostly confirm the results presented in this paper: coefficients for player strength derived from the ridge regression (Equation (3.4)) are positively correlated with the average player fixed effects standardized by season (ρ ≈ 0.44) and with the unstandardized coefficients (ρ ≈ 0.28) from Equation (3.1). Taking fellow players into account, we again find a strong increase in assortative matching between players and teams (Equation (3.2)) for teams that we observe for at least 10 seasons.⁷⁷ We do not find a trend in assortative matching among all teams, which is not surprising since the coefficients of teams with fewer seasons are shrunk towards zero. Although the number of observations per team is large compared to the number of observations per player, we find a surprisingly low correlation (ρ ≈ 0.26) between the team coefficients from the ridge regression and those estimated in Equation (3.1). In our view, the team strength estimates from Equation (3.4) are not entirely convincing. For example, VfR Aalen, SV Wehen Wiesbaden and Wacker Burghausen are among the top five teams (and Bayern München is only ranked 8th). In contrast, the team coefficients were convincing in the previous analyses (compare Figure 3.4). We interpret this finding as suggesting that teams should probably be penalized differently from players (e.g. via a hierarchical model) rather than with a single shrinkage parameter λ. Interestingly, running Equation (3.4) with or without team indicators does not alter the results for player strength: the player coefficients from the model described in this paragraph and from a parsimonious model without team information and some further interactions (see below) have a correlation coefficient of ρ ≈ 0.978.⁷⁸ Looking deeper into this topic is beyond the scope of this study, but future research should investigate, for example, a hierarchical ridge regression framework in this context.

Apart from the positive trend described in the previous analysis, we offer further evidence to confirm our results. Similar to Sæbø and Hvattum (2015), we drop team information from Equation (3.4), using only player information (plus indicators for the season-specific home advantage and the opponent), and use a very simple measure of team performance: the average points achieved in our sample period.⁷⁹ Correlating these player coefficients with this naive measure of team performance, we find a strong increase in assortative matching, which confirms our previous results (see Figure 3.8). The correlation is larger and less volatile for the 1st Bundesliga, for which we find a more or less steady increase (with the exception of the two seasons 2006/07 and 2007/08). For the 2nd Bundesliga, assortative matching seems to have increased as well, from around 0.2 until 2008/09 to around 0.3 to 0.4 thereafter (except for the 2012/13 season).

⁷⁶ k-fold cross-validation partitions the original sample into k subsamples of equal size. k − 1 subsamples are used as the training set and the remaining subsample is used for validation. This exercise is repeated k times, so that each subsample is used for validation once. As loss function we use the mean squared error. For simplicity, we ignore the hierarchical nature of the data here.
⁷⁷ Surprisingly, the correlation is strongly negative for earlier seasons.
⁷⁸ In contrast, dropping team information in Equation (3.1) alters player coefficients much more.
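The season-by-season correlation between player coefficients and a team-level performance measure can be sketched as a weighted Pearson correlation. Which weights enter Equation (3.2) is not restated in this appendix, so the weights in the example (e.g. a player's number of matches for the team in that season) are a hypothetical choice, as are the record layout and the numbers.

```python
import math
from collections import defaultdict


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


def assortativeness_by_season(records):
    """records: (season, player_coefficient, team_performance, weight) tuples."""
    by_season = defaultdict(list)
    for season, player, team, weight in records:
        by_season[season].append((player, team, weight))
    return {
        season: weighted_corr(*map(list, zip(*rows)))
        for season, rows in sorted(by_season.items())
    }


records = [
    ("2001/02", 0.1, 1.2, 30), ("2001/02", -0.2, 1.5, 10), ("2001/02", 0.3, 1.4, 25),
    ("2014/15", 0.4, 2.1, 34), ("2014/15", -0.1, 1.0, 30), ("2014/15", 0.0, 1.3, 12),
]
print(assortativeness_by_season(records))
```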

(a) Both Leagues (b) By League

Figure 3.8: Assortative Matching (Ridge Regression on the Match-Level)

3.6.3 Ridge Regression Estimate of Equation (3.1)

We have estimated Equation (3.1) using ordinary least squares (OLS) to present a simple and widely used (unbiased) estimator. Several recent publications have investigated the distribution of fixed effects estimated via OLS, e.g. Card et al. (2013) or Chetty et al. (2014). The results presented in this paper are robust to estimating Equation (3.1) using ridge regression, which introduces some bias in the coefficients but lowers their variance (see Appendix 3.6.2 for more details about ridge regression). Figure 3.9 shows the results for assortative matching (Equation (3.2)) when we use ridge regression to estimate Equation (3.1).⁸⁰ As in the previous section, we use 20-fold cross-validation to calibrate the shrinkage parameter. As in the case of OLS, we base this analysis on the sample of players who appear in at least 50 matches and coaches with more than 17 games, because ridge regression becomes computationally intensive for larger samples (and the coefficients of players with few games would be shrunk toward zero anyway). Interestingly, when we use ridge regression, we do not find the negative correlation at the beginning of the sample period seen in Figure 3.5. This suggests that the (small) negative correlation at the beginning is due to greater uncertainty in the first (and last) seasons of our sample, because players in these seasons tend to appear in fewer total (sample) matches. Ridge regression takes this into account by penalizing those coefficients.

⁷⁹ Unsurprisingly, given the high correlation between player coefficients estimated with and without team information (ρ ≈ 0.978, see Appendix 3.6.2), results look quite similar if we use the model including team indicators and further interaction terms for this analysis.
⁸⁰ For the ridge regression approach, we use weights for the correlation (Equation (3.2)) but not for estimating player, team and coach coefficients (Equation (3.1)).

Figure 3.9: Assortative Matching (Ridge Regression for Equation (3.1))

3.6.4 Robustness to Other Samples

The main conclusions are robust to considering a different match condition or only mover players (for whom separate identification of players and teams is more straightforward and precise estimation easier), as shown in Figure 3.10. Sample selection does not drive our results (players with at least 50 observations are, of course, a selected sample of all players appearing in the 1st or 2nd Bundesliga). To show this, we estimate Equation (3.1) and Equation (3.2) separately for the sample of players who appear for at least two teams and for the sample of players with at least 10 matches. Furthermore, we exclude Bayern München in Equation (3.2) for the standard sample to ensure that the results are not driven by a single team. Results are weaker for the robustness checks than for the standard sample, but an increase in assortative matching is evident across the different samples considered.⁸¹

Figure 3.10: Assortative Matching (for Different Samples)

Furthermore, a match might already be decided at a certain point in time if one team has a large lead. In that case, teams might bring on players who are supposed to get some playing experience, e.g. after injuries, without strong incentives to change the actual goal difference. To check whether this changes our results, we ran the analysis on the sample of lineup and full-time players only. This does not alter the interpretation of our results.

3.6.5 Robustness to Other Outcomes

Our analyses show that playing talent is increasingly unequally distributed across divisions and within the 1st Bundesliga, but not within the 2nd Bundesliga. This pattern is also apparent for the last ten seasons if we look at betting odds for German Cup matches. Unfortunately, betting odds are only available from 2005 onward, but since then there has been a clear pattern: looking at the average maximum odds over time for German Cup matches between teams of the same league, we see that the average maximum odds have increased statistically significantly for matches between two 1st Bundesliga teams, whereas no time trend is discernible for matches between two 2nd Bundesliga teams (with 43 matches, the number of matches between two 2nd Bundesliga teams is considerably lower than the 105 matches between two 1st Bundesliga teams, but a time plot shows a more or less flat line in this case). The average maximum odds have also increased significantly for the 152 matches between 1st Bundesliga and 2nd Bundesliga teams. Betting odds therefore also indicate decreasing competitive balance between the top two divisions of the German Bundesliga and within the 1st Bundesliga.

⁸¹ We expect weaker results for several reasons. First, movers are not the main driver of increased assortative matching (see Section 3.4.3), and the sample size is reduced. Second, Bayern München is the team with both the highest team fixed effect and the highest average player performance. Third, although the sample of players with at least 10 matches is considerably larger than the "50+ sample", performance measures are much less precise in this analysis due to the large random component inherent in sports matches (the residual variation makes up slightly more than 80% of the total variation in a variance decomposition of Equation (3.1)).
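The betting-odds check can be sketched as follows. The sketch assumes that the "maximum odds" of a match are the largest of the three decimal odds quoted for home win, draw and away win, so that higher values indicate a more one-sided pairing; the text does not spell this definition out, and the match data below are invented. It averages the maximum odds by season and fits a simple OLS time trend at the match level (a significance test would be added on top).

```python
from collections import defaultdict


def max_odds(home, draw, away):
    """Largest of the three decimal odds (assumed definition of 'maximum odds')."""
    return max(home, draw, away)


def seasonal_average(matches):
    """matches: (season_start_year, home_odds, draw_odds, away_odds) tuples."""
    sums, counts = defaultdict(float), defaultdict(int)
    for year, h, d, a in matches:
        sums[year] += max_odds(h, d, a)
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}


def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical cup matches between two 1st Bundesliga teams.
cup_matches = [
    (2005, 1.5, 3.8, 6.0), (2005, 2.1, 3.3, 3.4),
    (2010, 1.4, 4.5, 7.5), (2015, 1.25, 6.0, 11.0),
]
print(seasonal_average(cup_matches))
trend = ols_slope([m[0] for m in cup_matches], [max_odds(*m[1:]) for m in cup_matches])
print(f"trend in maximum odds per season: {trend:.3f}")
```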

3.6.6 Validity of Our Measure

Does our measure, the partial correlation of a player’s appearance on the pitch with the goal margin, constitute a valid proxy for performance? We give several arguments why this might be the case. First, in a separate study we show that this approach can be used to make good predictions for future football matches based only on player data (and home advantage).⁸² Second, our measure is correlated with expert ratings of players’ performance, not only with current expert ratings but also with future ones (see Kiefel and Warnke, 2015). The Pearson correlation coefficient between players’ performance measured by Equation (3.1) and players’ average grades over the whole period is ρ = −0.26 (in Germany, lower grades are better). If we apply the same approach as in Equation (3.1) to grades (replacing the goal difference as the outcome with grades), we obtain a correlation coefficient of ρ = −0.79. This shows that the partial correlation of each player with the goal margin is closely related to expert ratings used in other studies, such as Buraimo et al. (2015).

Figure 3.11: Association Between Player Performance measured by Grades and by Goal Margin

Figure 3.12 shows the distribution of player performance for players who appear only in the 1st Bundesliga (25%), who play only in the 2nd Bundesliga during the sample period (23%) and those who play for teams in both divisions (52%). The distributions look reasonable, with a clear hierarchy but considerable overlap between players of different divisions. The density estimation is based on 2,138 unique players with at least 50 games in our sample and is slightly skewed to the left, since better players play more matches on average. We find a similarly convincing hierarchy for the sum of team and coach effects (see Figure 3.4).

⁸² This is still work in progress, but a current version is available upon request.

Figure 3.12: Distribution of Player Performance (Player Fixed-Effects for Players with at least 50 games)

Note: Kernel density estimates of individual player performance (for players with at least 50 games) with a fixed bandwidth of 0.05. The densities are slightly skewed to the left because players with high fixed effects play more games on average.
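A kernel density estimate with a fixed bandwidth, as used for Figure 3.12, can be written in a few lines. The note does not state the kernel; the sketch below assumes a Gaussian kernel (Epanechnikov is another common default), and the input values are hypothetical player fixed effects rather than estimates from this chapter.

```python
import math


def gaussian_kde(data, bandwidth=0.05):
    """Return f(x) = 1/(n h) * sum_i phi((x - x_i) / h) with a Gaussian kernel phi."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Hypothetical player fixed effects for one group of players.
effects = [-0.12, -0.05, 0.0, 0.02, 0.04, 0.09, 0.15]
f = gaussian_kde(effects, bandwidth=0.05)
grid = [-0.3 + 0.01 * i for i in range(61)]
curve = [(round(x, 2), f(x)) for x in grid]
print(max(curve, key=lambda point: point[1]))
```

With a fixed bandwidth, the smoothing is identical for all three player groups, so differences between their curves reflect the data rather than group-specific tuning.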

Furthermore, we have shown in Section 3.4.3 that player fixed effects matter for transfers (in addition to short-term performance prior to a transfer, see Table 3.6). Further anecdotal evidence for the validity of our measure comes from the top players of the 50+ Sample who still played in 2015/16 (the last season we observe): this list includes many acclaimed Bayern München players such as Xabi Alonso, Robert Lewandowski, Javi Martinez, Manuel Neuer, Philipp Lahm, Franck Ribéry or Arjen Robben, and Brazilian footballers such as Dante, Luiz Gustavo and Naldo (currently playing for VfL Wolfsburg). It also includes players like Kagawa or Lukasz Piszczek, who won two championships with Borussia Dortmund, or , who will move to Arsenal London in the next season, and Yann Sommer, Switzerland’s first-choice goalkeeper, from Borussia Mönchengladbach. We found only a few surprises, such as Jan Simunek, who currently plays for VfL Bochum and is the only 2nd Bundesliga player among the top 20 players in 2015/16.

Table 3.9: Teams in the Sample

Team                      G   S           P    C
Bayern München          722   19 (19)   107    8
VfB Stuttgart           688   19 (19)   136   17
Werder Bremen           688   19 (19)   111    7
VfL Wolfsburg           687   19 (19)   141   14
Borussia Dortmund       685   19 (19)   107    9
                        684   19 (19)   120   12
FC Schalke 04           684   19 (19)   121   14
Eintracht Frankfurt     677   19 (14)   141   14
1. FC Kaiserslautern    677   19 (11)   160   13
Bor. Mönchengladbach    676   19 (16)   145   15
1860 München            676   19 (7)    132   14
SC Freiburg             675   19 (12)   114    4
VfL Bochum              673   19 (10)   138   11
1. FC Köln              672   19 (11)   137   18
Hamburger SV            671   19 (19)   132   14
Hertha BSC              670   19 (17)   131   12
1. FC Nürnberg          670   19 (12)   141   14
1. FSV Mainz 05         667   19 (10)   135   10
SpVgg Greuther Fürth    667   19 (1)    141    9
                        635   18 (14)   119   14
MSV Duisburg            604   17 (5)    151   13
Energie Cottbus         596   17 (6)    117    8
Karlsruher SC           593   17 (3)    117   14
                        567   16 (8)    126   12
FC St. Pauli            519   15 (2)    104   12
Hansa Rostock           496   14 (9)    109   10
                        461   13 (1)     95   10
FC Augsburg             354   10 (5)     76    5
Rot-Weiß Oberhausen     352   10 (0)     64    8
SC Paderborn 07         348   10 (1)     82    9
Erzgebirge Aue          348   10 (0)     72    8
1. FC Union Berlin      346   10 (0)     79    7
1899 Hoffenheim         327   9 (8)      65    9
SpVgg Unterhaching      314   9 (2)      69    8
Fortuna Düsseldorf      313   9 (1)      75   11
FSV Frankfurt           281   8 (0)      74    4
                        280   8 (1)      55    8
                        279   8 (0)      76   11
FC Ingolstadt 04        245   7 (1)      58    8
                        175   5 (0)      48    5
VfL Osnabrück           175   5 (0)      63    5
Wacker Burghausen       174   5 (0)      36    4
                        143   4 (0)      37    5
                        142   4 (0)      30    7
SV Sandhausen           141   4 (0)      35    3
Waldhof Mannheim        141   4 (0)      38    4
TuS Koblenz             140   4 (0)      40    4
1. FC Saarbrücken       139   4 (0)      38    5
Carl Zeiss Jena         109   3 (0)      31    5
VfR Aalen               107   3 (0)      23    3
SSV Reutlingen 05       106   3 (0)      26    4
SSV Ulm 1846            105   3 (1)      23    4
Eintracht Trier         105   3 (0)      24    1
Fortuna Köln            103   3 (0)      20    3
                         73   2 (0)      23    3
1. FC Heidenheim         72   2 (0)      14    1
SV Wehen Wiesbaden       72   2 (0)      26    4
SV Darmstadt 98          71   2 (1)      24    1
KFC Uerdingen 05         71   2 (0)      18    3
RasenBallsport Leipzig   71   2 (0)      15    3
Rot-Weiss Essen          71   2 (0)      30    4
VfB Lübeck               71   2 (0)      19    1
Chemnitzer FC            69   2 (0)      13    3
FC Gütersloh             69   2 (0)      18    3
SG Wattenscheid 09       69   2 (0)      11    2
Jahn Regensburg          67   2 (0)      21    4
SV Meppen                36   1 (0)      11    2
FSV Zwickau              35   1 (0)       7    1
                         35   1 (0)       8    3
SV Babelsberg 03         35   1 (0)      10    2
VfB Leipzig              35   1 (0)       9    2
1. FC Schweinfurt 05     34   1 (0)       5    1
Rot-Weiß Erfurt          34   1 (0)      12    2

Note: G: total number of games (including domestic cup games); S: total number of seasons observed in either the 1st or the 2nd Bundesliga in the sample period (in parentheses: seasons in the 1st Bundesliga only); P: total number of unique players with at least 50 games in total who appear in at least one match for the respective team; C: total number of unique coaches with at least 18 games in total who managed the respective team.

4 New Evidence on the Determinants of Firm-based Training

Susanne Steffes (Centre for European Economic Research, ZEW)

Arne Jonas Warnke (Centre for European Economic Research, ZEW)

JEL-Classification: I24, J24, M53

Keywords: Human Capital, Training, Linked-Employer-Employee Data (LEE), Decomposition, Unobserved Heterogeneity

Acknowledgements: Special thanks go to Daniel Dietz, Bernd Fitzenberger, Katja Görlitz, François Laisney, Henrik Stryhn, Thomas Zwick and seminar participants at ZEW Mannheim, University of Freiburg, University of Würzburg, EEA-ESEM (2014), Colloquium on Personnel Economics (2014), Workshop on Economics of Education (2014), DFG Conference on the German Labour Market in a Globalized World (2015) and The Economics of Vocational Education and Training (2015).

98 4. NEW EVIDENCE ON FIRM-BASED TRAINING 99

Abstract: In this paper, we analyze observable and unobservable firm- and worker-level heterogeneity in participation in job-related training. We use a novel panel dataset which links firm and worker surveys and contains detailed information on the incidence, duration, initiative and funding of training. Using multilevel methods, we analyze selection processes for participation in training and consider the complementarity of investments in training made by the firm and the worker. Our results point towards significant differences in the determinants of participation between courses financed by the firm and training co-financed by the worker. Another original result of our study concerns the significance of job characteristics in determining individuals’ participation in training: the information on workers’ job characteristics included in the analysis plays a significant role in identifying important determinants. Firm-financed and worker (co-)financed training seem to be neither substitutes nor complements. In addition, we show that whilst unobserved firm heterogeneity is of little significance, worker characteristics remain important determinants of participation in training, even after controlling for observable variables.

4.1 Introduction

In contrast to schooling decisions, investments in on-the-job training (formal or informal) are determined jointly by firms and workers. Given that firms and workers are heterogeneous, training investments most probably depend on the characteristics of both parties. In addition, training investments are driven by the job context: certain occupations and tasks require workers to regularly learn new skills or preserve existing ones. In other words, the heterogeneity of firms and the diversity of jobs and workers within these firms are crucial elements of the training decision.

The bilateral nature of investment decisions makes it difficult to empirically disentangle the determinants of training (Lynch and Black, 1998). Many empirical studies have investigated determinants of work-related training and have shown that both individual and firm characteristics are important (this literature is summarized by Arulampalam et al., 2004; Bassanini et al., 2005; Asplund, 2005). Very few studies, however, address both dimensions simultaneously by looking at multiple workers in several firms at the same time.83 Despite the importance the theoretical literature attaches to the interdependence of training investment decisions, empirical studies rarely take it into account.

In this paper, we ask whether, and to what extent, workers’ participation in training differs across firms (between-firm heterogeneity) or between employees in the same firm (within-firm heterogeneity). Between-firm heterogeneity captures the extent to which participation in training differs between two comparable workers employed by two different firms. Within-firm heterogeneity captures how much training rates differ between co-workers in a given firm. We focus neither on the specific determinants of participation in training nor on its expected returns; our interest lies rather in the general interplay between firms and workers when it comes to job-related training. To our knowledge, we are therefore the first to investigate simultaneously the extent to which individual training rates vary between firms and between workers, and to suggest reasons for this heterogeneity in training, including, for example, job characteristics. We distinguish formal training by the party initiating it and the party financing it, two further aspects emphasized in the theoretical literature.

Several studies in the literature on training, such as Lynch and Black (1998) or, for Germany, Zwick (2005), have used firm-level data with aggregated information about, for example, the proportion of highly skilled workers to show the interrelation between workforce composition and training rates at the firm level. These studies use information about the share of workers who participate in training, but they do not reveal the dynamics of workers’ participation in training over time (given the establishment dynamics). Such data tell us neither how often particular workers participate in training nor anything about the workers’ own investments. Furthermore, a change in a firm’s average training rate may be caused either by an adjustment of firm policy or by changes in the composition of workers (e.g. within the group of highly skilled workers). Aggregated data do not enable us to identify which mechanism has been decisive for individual participation in training.84 Here we use matched employer-employee data in which individual workers report their training history; this combines the advantages of disaggregated data on individual employees and aggregated firm data. We therefore have detailed information not only about the participation in training of an individual worker, but also about the participation of their co-workers in a firm (and about changes in the composition of the workforce).

83 Most studies look either at individuals only, using household data, or at firms, using establishment data (sometimes aggregating individual information), or at the workforce of single firms. Whether the focus is on the firm or the worker dimension usually depends on the aggregation level of the data at hand.
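The point that aggregated data cannot separate a change in firm policy from a change in workforce composition can be made concrete with a standard shift-share decomposition. In the minimal sketch below, a firm's training rate is the share-weighted average of group-specific training rates; the two groups and all numbers are hypothetical. Both scenarios raise the firm-level rate from 0.36 to 0.42, but scenario A does so entirely through composition and scenario B entirely through within-group training rates, a difference that firm-level aggregates alone cannot reveal.

```python
def firm_rate(shares, rates):
    """Firm-level training rate as the share-weighted mean of group rates."""
    return sum(s * r for s, r in zip(shares, rates))


def shift_share(shares0, rates0, shares1, rates1):
    """Split the change in the firm rate into composition and within-group parts.

    composition: change in group shares, valued at the old group rates.
    within:      change in group rates, weighted by the new group shares.
    The two parts add up exactly to firm_rate(period 1) - firm_rate(period 0).
    """
    composition = sum((s1 - s0) * r0 for s0, s1, r0 in zip(shares0, shares1, rates0))
    within = sum(s1 * (r1 - r0) for s1, r0, r1 in zip(shares1, rates0, rates1))
    return composition, within


# Hypothetical groups: (highly skilled workers, other workers).
shares0, rates0 = [0.2, 0.8], [0.6, 0.3]

# Scenario A: more highly skilled workers, unchanged group training rates.
shares_a, rates_a = [0.4, 0.6], [0.6, 0.3]
# Scenario B: unchanged workforce, more training for other workers.
shares_b, rates_b = [0.2, 0.8], [0.6, 0.375]

for label, s1, r1 in [("A", shares_a, rates_a), ("B", shares_b, rates_b)]:
    comp, within = shift_share(shares0, rates0, s1, r1)
    print(label, round(firm_rate(s1, r1), 2), round(comp, 2), round(within, 2))
```

With worker-level data, group shares and group-specific rates are observed directly, so both components can be computed rather than inferred.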

We are aware of only two studies which use linked employer-employee data with information about individual participation in training. Frazis et al. (1999) is an early exception which uses a US matched dataset to analyze firm and worker characteristics correlated with the provision of, and participation in, training. The authors largely confirm the results of previous studies using unmatched data. An interesting finding concerns firm size: the positive relationship between firm size and training in their data is primarily driven by the presence of fringe benefits and innovative work practices. Almeida Santos and Mumford (2004) use Australian linked employer-employee data on almost 1,500 workplaces and 14,000 employees. The authors investigate firm-specific variables such as (voluntary) turnover, the percentage of union members or wage compression. Neither study provides estimates of the importance of firms for workers’ participation in training (or of the heterogeneity between firms), nor does either assess the interdependence of firm-financed and worker co-financed training.

For Germany, we are not aware of any study that simultaneously observes several firms and the workers employed in these firms. Instead, a number of studies for Germany use worker-level data to investigate the determinants of work-related training. Using the German Socio-Economic Panel (SOEP) from the late 1980s, Pischke (2001) distinguishes between formal training in general and training which is (co-)sponsored by the employer.85 In addition to worker-level characteristics, these data provide some basic information about the employer, such as firm size and sector. The results point towards higher participation rates amongst men, more highly educated workers and younger workers. Firm and job characteristics are also important: Pischke finds that workers in larger firms, in the public sector, and in managerial or high-skilled white-collar jobs receive more training. The results are very similar for both forms of training. Using an updated version of the same data, Grund and Martin (2012) investigate formal training patterns of private-sector workers between 1989 and 2008. Training rates increased over this period in Germany. In contrast to Pischke’s findings, this study does not identify any differences in average training rates between men and women. Training profiles are inversely U-shaped with respect to age and U-shaped with respect to tenure. Participation rates are once again higher for more highly educated workers and for those in more complex occupations. Grund and Martin confirm previous results with respect to firm size and show pronounced differences in training between sectors. It remains unclear, however, whether firm-size or sector differences proxy for other unobserved characteristics which lead to additional differences in training between workers. Fitzenberger and Muehler (2014) use personnel records from a single firm to analyze gender differences in the age profile of firm-financed training. The use of single-firm data has the advantage that confounding due to the sorting of men and women into different firms can be excluded. The results indicate differences in age-related training patterns between male and female employees. Women receive less firm-sponsored training than men, particularly in their thirties. Female workers participate relatively more often in training at higher ages, but they do not catch up with male training rates.

84 It is also questionable to what extent a manager knows the exact number of (unique) workers who attended training in a given year when asked about it.
85 Employer-sponsored training is defined in this study as "if the training either took place during work hours or the employer is named as the organizer of the training or the employer bore at least some of the monetary cost of training".

First, we shed new light on the interplay between the firm and worker dimensions. Estimating the explanatory power of different sets of observables makes it easier for empirical work to test appropriate hypotheses. A worker’s characteristics and the market conditions of firms depend upon one another, as certain workers choose to sort into certain firms. This makes it difficult to rely either on models which focus on the firm level only or on models which focus exclusively on worker heterogeneity.

Secondly, this applies not only to the empirical work on training determinants but also to studies analyzing the effects of training. These studies try to measure the benefits of training, such as higher wages at the worker level or productivity growth at the firm level. This literature has to tackle the issue of unobserved heterogeneity, as training investments are typically not allocated on the basis of characteristics which are observable to researchers. Some establishments might have higher training rates due to the introduction of new technologies, while certain workers are more attached to the firm and accordingly receive more training. Knowledge about the relative importance of firm and worker heterogeneity (observable, but especially unobservable) is crucial for finding an appropriate identification strategy to estimate causal returns.

The results of this paper show that in comparison to worker heterogeneity, firm heterogeneity is far less important. This means that differences between co-workers in a given firm have a much greater impact on determining investments in training than aggregated differences between the workforce of two firms do. In addition, we can explain a large share of aggregated training differences between two firms if we control for firm, worker and job characteristics. In contrast, most of the worker-level variance remains unexplained even after adding a large set of control variables. These results are not only found for job-related training in general but also hold if we distinguish between purely firm-financed and worker co-financed training. Unsurprisingly, worker heterogeneity plays a more significant role in determining training investments where courses are co-financed by workers.

The paper is organized as follows. In the next section, we present our research questions (Section 4.2). Sections 4.3 and 4.4 present the linked employer-employee panel dataset and the methods we use to make the most of its structure. Section 4.5 shows the results of our analyses, while Section 4.7.1 discusses further topics and presents robustness checks. In Section 4.6, we conclude by summarizing what we have learnt from the study and outlining implications for policy.

4.2 Research Questions

Our first research question uses detailed linked employer-employee data from multiple firms, along with information about workers’ participation in training, to calculate the overall importance of firms for workers’ attendance of training. We contrast this with the heterogeneity in participation in training between workers within a firm, to see how much training rates differ between co-workers. We first add information on observable characteristics of workers, then of firms, and finally of jobs. This helps us to explain why average training rates differ between firms, or which groups of workers have high training propensities. Furthermore, we provide information about the relative importance of unobserved variables, i.e. the remaining, unexplained heterogeneity between firms and between workers. We compare results for training in general with analyses based on firm-financed and worker co-financed training courses only.

Research Question 1 How important are firms and workers for individuals’ participation in training? Are there differences between firm-financed and worker co-financed training?

It is difficult to derive unambiguous predictions from theory about the association between particular characteristics and training investments. The recent empirical literature has identified a large number of relevant predictors of investments in training. However, the majority of these studies either ignore the firm dimension or do not account (properly) for compositional differences of the workforce between firms. Assume, for instance, that older workers operate in labor market conditions where firms exert more monopsony power (because these workers are less mobile). According to the theory outlined above, the propensity of a firm to provide training should generally be higher in such markets. Accordingly, if we do not take a firm’s monopsony power into account (e.g. via a measure of wage compression), the estimated training rates of older workers will be confounded by the omitted market condition. Thus, our second research question is:

Research Question 2 What are the relevant firm-, worker-, and job-level predictors of participation in training?

Previous literature has often neglected the fact that decisions to participate in different forms of training are not independent of one another. An early exception is Royalty (1996), who uses a multinomial probit model to investigate participation in on-the-job training, off-the-job training and no training. We offer a new approach to this issue.

4.3 Data

We use the German linked employer-employee dataset WeLL: Berufliche Weiterbildung als Bestandteil Lebenslangen Lernens (Further training as a part of lifelong learning, see Huber and Schmucker, 2012), which comprises four waves of a worker survey conducted between 2007 and 2010. The data were collected by the Research Data Centre of the Federal Employment Agency at the Institute for Employment Research (FDZ). The sample of survey participants was selected in two steps. First, a random sample of 149 establishments in the manufacturing and service sectors was drawn from establishments which participated in the IAB Establishment Panel in 2005. This is an annual employer survey of approximately 16,000 businesses (see Kölling, 2000). The sample is restricted to establishments with 100 to 2,000 employees in three West German and two East German states. Second, an employee sample was randomly drawn from all employees who were covered by the social security system and were employed in one of the 149 establishments on 31 December 2005. The first wave consisted of 5,819 interviews, the subsequent waves of 4,560, 4,667 and 3,720 interviews, respectively.86 It was possible to link the survey data to the administrative records of each employee (provided that he or she granted the relevant permission). These records include information on daily wages and the duration of employment. The survey data were also linked to the IAB Establishment Panel (Schmucker et al., 2014).

WeLL is particularly well suited for analyzing training effects, as it provides comprehensive information on workers’ recent participation in training. The questionnaire includes (retrospective) questions about participation in formal training courses, such as their duration, content, initiative and financing, and whether the training overlapped with leisure time. Survey participants are asked about the number of job-related training courses they have attended since a reference date (the last interview for panel participants, covering a maximum of two years). Basic information about participation in training (start and end date) is available for all training courses, while detailed information has been collected for up to the three most recent training courses per wave. In addition, the survey includes questions about socio-demographic characteristics, personality traits and satisfaction with one’s life and work. Moreover, respondents were asked about their health, labor attachment and career aspirations.

The administrative data provides information on individuals’ employment histories in regard to earnings, employment status, unemployment status and respective benefits going back to 1975.87 This information is available, not only for participants in the WeLL panel, but for all workers in a firm. This gives us information about the size of an establishment, its average wage and wage growth rates without much measurement error.

The linked data also provide information on establishment heterogeneity relevant to participation in training, such as sector, legal form, age, and whether a works council is present.

We limit the analyses to individuals still working in one of the 149 establishments from which they were drawn in 2006.88 We also exclude observations with missing information on relevant variables such as training or socio-demographic characteristics. In addition, we disregard all workers who attended more than three training courses, as the information on these courses is not sufficiently detailed.89 Finally, we limit the analyses to workers aged 21-64 and exclude apprentices and workers in partial retirement. We focus on full-time or regular part-time workers and disregard those with a usual working time of less than 15 hours a week. This leaves us with 5,785 workers and 12,560 observations. The mean number of unique WeLL participants per establishment is 39 (median 24), with an interquartile range of 15 to 45.

86 This includes only those interviews in which participants agreed that their survey data can be linked to social security records.
87 Periods of employment which are not subject to social security, such as periods of self-employment or employment in the civil service, are not observed. For further information, see Jacobebbinghaus and Seth (2007).

4.3.1 Variables

The main variable of interest is participation in formal training courses. In the first wave, for instance, individuals were asked "Did you participate from January 1, 2006 up to now in any job-related seminars or training courses?". Those who answered yes were then asked "Who launched the initiative for the participation in training?" and could choose between the answers "my own initiative", "by order of the firm/supervisor", "mostly upon advice of my firm", "mandatory occupational training" and "mostly upon the advice of someone else". We exclude mandatory training, such as obligatory first-aid courses, fire safety training and equal opportunities courses, from the analysis. This reduced the number of training courses in the first wave by 15.1%. We also disregard training courses taken on the advice of a third party (ca. 1%). This gives us an average training participation rate of 44.7% per wave, which is comparable to other sources such as the Adult Education Survey (e.g. Autorengruppe Bildungsberichterstattung, 2012). Training participation rates vary statistically significantly by wave and by establishment (according to an F-test). The distribution of the number of training courses per worker and wave is shown in Table 4.1. For example, 27.9% of workers participate in exactly one training course in a given wave.
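The reported participation rate can be reproduced from the counts in Table 4.1. The sketch below assumes, consistently with those numbers, that the percentages in Table 4.1 refer to all 13,089 worker-wave observations, whereas the 44.7% refers to the 12,560 observations left after excluding workers with more than three courses. It pools observations across waves rather than averaging wave-specific rates.

```python
# Worker-wave counts from Table 4.1 (number of courses -> observations).
counts = {"0": 6942, "1": 3652, "2": 1481, "3": 485, ">3": 529}

total_all = sum(counts.values())
# Percent column of Table 4.1: shares of all worker-wave observations.
percent = {k: round(100 * v / total_all, 1) for k, v in counts.items()}

# Analysis sample: drop observations with more than three courses.
analysis = {k: v for k, v in counts.items() if k != ">3"}
n_analysis = sum(analysis.values())
participants = n_analysis - analysis["0"]
participation_rate = participants / n_analysis

print(total_all, n_analysis, round(100 * participation_rate, 1))  # prints: 13089 12560 44.7
```

The analysis sample size computed this way matches the 12,560 observations reported for the estimation sample.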

Almost all training courses in the survey (more than 85%) are rather general in nature, in the sense that the skills learnt are completely or predominantly transferable to other firms. Therefore, we do not distinguish between general and firm-specific training but consider who bears the costs of the course.90 To this end, we distinguish between firm-financed and worker co-financed training courses by identifying whether a worker participated in training exclusively during working hours, or whether training also took place during the worker’s leisure time. 60% of all training courses took place during working hours only; 20% took place partially or entirely in leisure time.91 If we also take workers who attended no training course into account, 24.4% of workers participate in firm-financed training only, 14.9% participate in training overlapping with leisure time, and 5.3% participate in both forms of training in a given wave (e.g. they attend at least two courses, one entirely in working time and the other overlapping with leisure time).

88 For an analysis of the effects of training on labor mobility using WeLL, see Dietz and Zwick (2016).
89 This concerns ca. 4.0% of all workers. Sensitivity analyses regarding participation in any training course are shown in Table 4.10 and discussed in Section 4.7.1.
90 Training participants were asked whether the skills learnt in the course are transferable to other firms. A training course is here defined as general if the trained skills are completely or predominantly transferable to other firms, while truly firm-specific training implies that the skills are not at all transferable.
91 We do not consider a monetary investment by the worker because only 16.2% of all training courses are not completely paid for by the firm, and workers often contribute only a small sum. This is also in line with conversations with practitioners: firms often pay for training that is not directly relevant to the tasks of the job itself but for which attendance is required at the weekend or in the evening.

Table 4.1: Number of Training Courses Attended per Wave

Number of training courses    Observations   Percent
No course                            6,942     53.0%
One course                           3,652     27.9%
Two courses                          1,481     11.3%
Three courses                          485      3.7%
More than three courses                529      4.0%

Note: Number of job-related training courses (excl. mandatory training) per worker and wave. Workers who attended more than three courses were excluded from the main analyses but are considered in Section 4.7.1.

Summary statistics are presented in Table 4.5.

We look at observable individual and establishment characteristics with possible relevance for participation in training. We distinguish between establishment, worker, and job attributes. Given that the period about which workers are retrospectively asked differs between and within waves (depending on the month of the (last) interview), we add a variable to each specification which captures the length of this period in months.

Our choice of characteristics for the empirical model was guided by the training literature (e.g. Asplund, 2005). Establishment characteristics include the state in which the establishment operates (states differ in their public training subsidy programs), establishment size and age, whether the establishment is a public enterprise, and whether it operates in the service sector. We use information about the presence of a works council to capture possible training differences due to labor institutions.92 From the social security data of all employees of the establishments (survey participants or not), we derive the median wage and a linear growth rate of establishments’ median wages for 2000-2010. Furthermore, we add a measure of the wage compression of an establishment which, according to the model of Acemoglu and Pischke (1999), might induce firms to invest in general training: this measure is the standard deviation of establishments’ log daily wages of full-time workers in 2006.93

92 In a study for Germany, Pfeifer (2015) finds significant positive effects of works councils, but not of unions, on participation in training, so we do not include information about the presence of unions.
93 Median wages are calculated for all employees, while wage compression refers to full-time workers only; we impute censored wages using a tobit model.

As worker characteristics, we consider sex, age, age squared, foreign citizenship and living alone. In addition, we look at the level of education (vocational qualification or tertiary degree), experience of unemployment in the last five years (derived from social security data), and proxies for a worker’s labor attachment and health status. Labor attachment was measured by a question asking workers how likely they consider it that they will still be employed in a year’s time (on a scale from 0 to 10). Health status was self-assessed on a Likert scale ranging from very good to very bad (1-5).

Job characteristics include working hours, a temporary work contract, tenure and an indicator for recent recruitment. The latter is included because we expect higher training rates for new workers who lack firm-specific skills. We are also interested in job requirements, since we assume that workers performing complex tasks have greater training needs. We proxy job requirements by job tasks (see Figure 4.1), occupational information (Figure 4.2) and managerial responsibilities. Job tasks measure activities that workers frequently perform in their job.94

94 The information on job tasks is comparable to that used, for instance, by Spitz-Oener (2006), who investigates technology-driven changes in skill demand in Germany in recent decades. Survey respondents were asked whether they perform a task frequently, seldom or never. We consider here only job tasks which are performed frequently.

4.4 Methods

We aim to analyze establishment and worker determinants of participation in training simultaneously, for which we use multilevel generalized linear models (for a textbook introduction, see Gelman and Hill, 2007).95

These models allow us to consider firm and worker heterogeneity in participation in training simultaneously. We combine the advantages of individual-level data (rich information, not only about the individual worker but also about workers in the same firm) with attributes of the firms. Firms’ average training rates are, for example, not confounded by changes in the composition of their staff. We can estimate the relevance of firms’ and workers’ observable and unobservable heterogeneity with regard to training. Furthermore, we are able to perform all analyses which can be done with either disaggregated household data or aggregated firm-level data, because standard linear least squares estimates can be considered a special case of multilevel models.

95 Other terms are, for example, mixed or two-way random effects models.
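The claim that standard least squares estimates are special cases of multilevel models can be illustrated in the linear random-intercept model discussed by Gelman and Hill (2007). The sketch below computes each group's estimated mean as a precision-weighted average of its own sample mean and the overall mean. Letting the between-group standard deviation go to infinity reproduces separate group means (least squares with group dummies, "no pooling"); setting it to zero reproduces the pooled mean ("complete pooling"). For simplicity, the variances are treated as known and the grand mean serves as a plug-in for the population mean; both are assumptions of this sketch, the model is linear rather than the logit used below, and the toy data are hypothetical.

```python
import math


def pooled_group_means(groups, sigma_y, sigma_alpha):
    """Multilevel (partially pooled) estimates of group means in a linear model.

    groups: dict mapping a group id to its list of outcomes.
    sigma_y: within-group residual standard deviation (treated as known).
    sigma_alpha: between-group standard deviation (treated as known);
        math.inf gives no pooling, 0 gives complete pooling.
    """
    all_obs = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_obs) / len(all_obs)
    estimates = {}
    for g, ys in groups.items():
        n = len(ys)
        group_mean = sum(ys) / n
        if sigma_alpha == 0:
            estimates[g] = grand_mean  # complete pooling: pooled least squares
            continue
        w_data = n / sigma_y ** 2  # precision of the group's own mean
        w_prior = 0.0 if math.isinf(sigma_alpha) else 1.0 / sigma_alpha ** 2
        estimates[g] = (w_data * group_mean + w_prior * grand_mean) / (w_data + w_prior)
    return estimates


# Hypothetical training indicators for workers in two establishments.
groups = {"A": [1, 0, 1, 1], "B": [0, 0, 1]}
print(pooled_group_means(groups, sigma_y=0.5, sigma_alpha=math.inf))  # no pooling
print(pooled_group_means(groups, sigma_y=0.5, sigma_alpha=0))         # complete pooling
print(pooled_group_means(groups, sigma_y=0.5, sigma_alpha=0.1))       # partial pooling
```

Small groups are pulled more strongly towards the overall mean than large ones, which is why the multilevel estimates of firm differences are less sensitive to a handful of sampled workers per firm.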

In Section 4.7.1, we show that results are quite similar if either the incidence or the intensity of training is considered (training yes/no or the number of courses a worker attended). We therefore focus primarily on the incidence of training – or whether a given worker participates in any formal training at a certain period.

The model gives us an estimate of the total variation in participation in training between es- tablishments and between workers within the same establishment. The training differences can be either observable or unobservable for us. We can explain observable variation by adding establishment, worker or job characteristics in the following specification while unobservable variation remains the same (in the form of a positive variance of establishment and/or worker random effects). Since our outcome is binary, we use a random effects logit model.

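$$\Pr\left(\text{Training}_{it}=1 \mid \alpha_{J(i,t)}, \theta_i, T_t\right) = \operatorname{logit}^{-1}\left(\alpha^{(1)}_{J(i,t)} + \theta^{(1)}_i + T_t\tau\right) \tag{4.1}$$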

Here and in the following specifications, we examine whether worker i in establishment J(i, t) participates in training in period t. α and θ denote establishment and worker random effects, respectively, and T denotes year dummies (we do not assume normality regarding the time dimension because there are at most four observations for each worker).96 We approximate the log-likelihood, which has no closed-form solution, by Gauss-Hermite quadrature (see StataCorp, 2013). We use the variance of the establishment and worker random effects to assess the variation attributable to each dimension. We identify establishment random effects by observing multiple workers per establishment (within and across waves), whilst it is the panel structure of the data which enables us to identify worker random effects. We assume that $\alpha \sim N(0, \sigma^2_\alpha)$ and $\theta \sim N(0, \sigma^2_\theta)$ (and mutual independence of the random effects). We must assume that workers do not self-select into certain firms because of favorable unobserved characteristics.97 The independence assumption is necessary because we observe only four workers who appear in two different WeLL establishments (for a formal discussion see Abowd and Kramarz, 1999). In Section 4.7.3 we discuss a further specification using establishment fixed effects.

96 We denote the random effects of Equation 4.1 with the superscript "(1)" to distinguish them, for the variance decomposition, from the random effects of the other equations.
97 Our (necessary) assumption that establishment and worker random effects are mutually independent (and independent of the other covariates) excludes, for example, job mobility related to the idiosyncratic training provision of an establishment.
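The quadrature step can be illustrated with a minimal sketch. It is not the Stata routine used in the chapter and rests on simplifying assumptions: it handles a single normally distributed random intercept for one cluster (e.g., one establishment) rather than the two random effects of Equation 4.1, and it uses plain, non-adaptive Gauss-Hermite nodes from NumPy's `hermgauss`. The function name `cluster_loglik_gh` and its arguments are hypothetical.

```python
import numpy as np


def cluster_loglik_gh(y, eta_fixed, sigma2, n_nodes=20):
    """Approximate marginal log-likelihood of one cluster in a random-intercept logit.

    y         : 0/1 outcomes of all observations in the cluster
    eta_fixed : fixed part of the linear index for the same observations
    sigma2    : variance of the cluster's random intercept u ~ N(0, sigma2)
    """
    y = np.asarray(y)
    eta_fixed = np.asarray(eta_fixed, dtype=float)
    # Nodes x_k and weights w_k for integrals of the form int exp(-x^2) f(x) dx.
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Substituting u = sqrt(2 * sigma2) * x turns E[f(u)] for u ~ N(0, sigma2)
    # into (1 / sqrt(pi)) * sum_k w_k * f(sqrt(2 * sigma2) * x_k).
    u = np.sqrt(2.0 * sigma2) * nodes                 # shape (K,)
    eta = eta_fixed[:, None] + u[None, :]             # shape (n, K)
    p = 1.0 / (1.0 + np.exp(-eta))                    # inverse logit
    # Likelihood of the whole cluster, conditional on each node value of u.
    lik_given_u = np.prod(np.where(y[:, None] == 1, p, 1.0 - p), axis=0)
    return float(np.log(np.sum(weights * lik_given_u) / np.sqrt(np.pi)))
```

Summing such contributions over all clusters gives the approximate marginal log-likelihood that an optimizer would maximize over the fixed coefficients and the variance; more nodes make the approximation more accurate.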

We first estimate a basic model without any control variables other than time effects. We cluster standard errors at the establishment level.98 In a second step, we add establishment characteristics Z to our equation to see which attributes of establishments are associated with higher or lower training rates of the workers. Furthermore, we investigate how well establishment characteristics explain inter-firm variation in training. All workers of the same establishment share the same establishment attributes. Therefore, by construction, establishment characteristics Z can only explain variation between establishments and not between workers within the same firm.

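$$\Pr\left(\text{Training}_{it}=1 \mid \alpha_{J(i,t)}, \theta_i, T_t, Z_{J(i,t),t}\right) = \operatorname{logit}^{-1}\left(\alpha^{(2)}_{J(i,t)} + \theta^{(2)}_i + T_t\tau + Z_{J(i,t),t}\delta\right) \tag{4.2}$$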

We then once again exclude observable establishment information and look at the association between participation in training and workers’ observable attributes. On the one hand, differences in worker characteristics X can explain why workers in the same establishment differ in their participation in training, for example if higher ability correlates with more training. On the other hand, if certain workers, such as high-ability workers, tend to cluster at certain establishments, worker characteristics would also explain why average training rates differ between firms.

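$$\Pr\left(\text{Training}_{it}=1 \mid \alpha_{J(i,t)}, \theta_i, T_t, X_{it}\right) = \operatorname{logit}^{-1}\left(\alpha^{(3)}_{J(i,t)} + \theta^{(3)}_i + T_t\tau + X_{it}\beta\right) \tag{4.3}$$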

Next, we look simultaneously at firms and workers to see whether associations identified between, for example, establishment characteristics and training are driven by differences between the workforces of establishments.

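$$\Pr\left(\text{Training}_{it}=1 \mid \alpha_{J(i,t)}, \theta_i, T_t, X_{it}, Z_{J(i,t),t}\right) = \operatorname{logit}^{-1}\left(\alpha^{(4)}_{J(i,t)} + \theta^{(4)}_i + T_t\tau + X_{it}\beta + Z_{J(i,t),t}\delta\right) \tag{4.4}$$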

98 This should give us a conservative estimate of the standard errors, since there are (almost) no movers between establishments in our sample.

In a final step, we add information about job characteristics W such as tenure or job requirements (our "full model").

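$$\Pr\left(\text{Training}_{it}=1 \mid \alpha_{J(i,t)}, \theta_i, T_t, X_{it}, Z_{J(i,t),t}, W_{J(i,t),t}\right) = \operatorname{logit}^{-1}\left(\alpha^{(5)}_{J(i,t)} + \theta^{(5)}_i + T_t\tau + X_{it}\beta + Z_{J(i,t),t}\delta + W_{J(i,t),t}\gamma\right) \tag{4.5}$$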

Research Question 2 investigates predictors of participation in training. We answer this question by looking at the observable predictors of participation in training, X, Z and W. The relative importance of individual variables can easily be assessed either by the size of their coefficients, since we have standardized all right-hand-side variables including binaries, or by comparing t-statistics, which also take estimation uncertainty into account.
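As an illustration of the standardization, the following minimal sketch z-standardizes every column of a regressor matrix, binaries included; the function name and the example data (a dummy and an age variable) are hypothetical. Because all regressors are then centered at their sample means, the intercept of the linear index corresponds to the log-odds at average covariate values.

```python
import numpy as np


def standardize_columns(X):
    """Return a z-standardized copy of X: each column gets mean 0 and standard deviation 1.

    Binary columns are treated like any other column, so a coefficient on a
    standardized dummy refers to a one-standard-deviation change in the dummy.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)  # sample standard deviation per column
    if np.any(sd == 0):
        raise ValueError("a constant column cannot be standardized")
    return (X - mean) / sd


# Hypothetical data: column 0 is a dummy (e.g. female), column 1 is age in years.
Z = standardize_columns([[1, 30], [0, 45], [1, 50], [0, 25]])
```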

In order to investigate the role of firm and worker heterogeneity (Research Question 1), we look at the variance of the worker and firm random effects before and after adding observable information. The reduction in the variance of the random effects in the full model (i.e., after controlling for establishment, worker and job characteristics), $\operatorname{Var}(\alpha^{(5)}_{J(i,t)})$ and $\operatorname{Var}(\theta^{(5)}_i)$, compared to the basic model, $\operatorname{Var}(\alpha^{(1)}_{J(i,t)})$ and $\operatorname{Var}(\theta^{(1)}_i)$, gives an estimate of the relevance of observable vs. unobservable heterogeneity.99

To investigate the importance of establishments and workers with regard to training, we use a variance decomposition approach. For mixed models without random coefficients, the variance decomposition is easily derived, because the covariance in the latent propensity to participate in training between two workers of the same establishment is simply $\operatorname{Cov}(\text{Train}^*_{i_1 t}, \text{Train}^*_{i_2 t}) = \sigma^2_{\alpha^{(1)}}$. We are therefore able to partition the total variance $\sigma^2 = \sigma^2_\alpha + \sigma^2_\theta + \pi^2/3$ into three terms: the variance of the establishment random effects $\sigma^2_{\alpha^{(1)}}$, the variance of the worker random effects $\sigma^2_{\theta^{(1)}}$, and the variance of the latent error, which is assumed to equal the variance of the logistic distribution, $\pi^2/3$.100 This gives us an estimate of the relevance of inter-establishment training differences compared to the heterogeneity of participation in training between workers of the same establishment over time (this measure is usually called the intra-class correlation). Equation (4.6) shows a measure of the overall or unconditional relevance of establishments for workers’ participation in training, whilst Equation (4.7) shows the relevance of establishments after controlling for establishment, worker and job characteristics. Sections 4.7.3 and 4.7.4 discuss other specifications, including different approaches to the variance decomposition.

99 We abstract here from the complication that observable heterogeneity is time-varying while random effects are by definition constant over time.
100 This is a standard assumption in the literature, see for example Hox (2010). In Section 4.7.4, we show results using a linear approximation as suggested in Goldstein et al. (2002).

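$$\frac{\sigma^2_{k^{(1)}}}{\sigma^2_{\alpha^{(1)}} + \sigma^2_{\theta^{(1)}} + \pi^2/3} \tag{4.6}$$

$$\frac{\sigma^2_{k^{(5)}}}{\sigma^2_{\alpha^{(5)}} + \sigma^2_{\theta^{(5)}} + \pi^2/3}, \qquad k \in \{\alpha, \theta\} \tag{4.7}$$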
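Equations 4.6 and 4.7 can be sketched in a few lines of Python: given the two random-effect variances, the function returns the shares of the establishment, worker and latent-error components on the latent-response scale, with the latent variance fixed at π²/3. The establishment variance 0.65 is the basic-model value discussed in this section; the worker variance 1.50 is a hypothetical, approximate value back-calculated from the shares in Table 4.2, not a number taken from Table 4.7.

```python
import math

# Variance of the standard logistic distribution (the latent error in a logit model).
LATENT_VAR = math.pi ** 2 / 3


def variance_shares(var_establishment, var_worker):
    """Shares of the establishment, worker and latent-error variance in the total
    latent variance of a two-level random-intercept logit (intra-class correlations)."""
    total = var_establishment + var_worker + LATENT_VAR
    return {
        "establishment": var_establishment / total,
        "worker": var_worker / total,
        "latent": LATENT_VAR / total,
    }


# 0.65: basic-model establishment variance; 1.50: hypothetical, back-calculated worker variance.
shares = variance_shares(0.65, 1.50)
```

With these inputs the shares come out at roughly 11.9%, 27.6% and 60.5%, close to the basic-model column of Table 4.2.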

Finally, we investigate the relationship between firm-financed and worker co-financed training. Workers might participate in both firm-financed and worker co-financed training, or indeed in neither of them. We first analyze whether establishments which provide a great deal of training are also those in which many workers participate in training which they themselves co-finance. To this end, we run the specifications from Equation (4.1) and Equation (4.5) separately for participation in training that takes place entirely during working hours and for training which overlaps with workers’ leisure time. In a first step, we then correlate the establishment random effects, $\operatorname{Corr}(\alpha^{(1)}_{\text{firm-financed}}, \alpha^{(1)}_{\text{worker co-financed}})$, in order to shed light on the relationship between aggregated firm-financed and worker co-financed training rates.
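One way to implement this correlation step is sketched below, assuming the predicted establishment random effects of the two models are available as mappings from a (hypothetical) establishment ID to the predicted effect; only establishments present in both models are used. This is an illustrative sketch, not the chapter's code. If the predictions are shrunken towards zero (as empirical Bayes predictions are), their correlation need not coincide exactly with the correlation of the underlying random effects.

```python
import math


def correlate_by_id(effects_a, effects_b):
    """Pearson correlation of two sets of predicted establishment effects,
    matched on establishment ID; IDs missing from either mapping are dropped."""
    ids = sorted(set(effects_a) & set(effects_b))
    if len(ids) < 2:
        raise ValueError("need at least two establishments present in both models")
    a = [effects_a[i] for i in ids]
    b = [effects_b[i] for i in ids]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical predicted effects keyed by establishment ID.
firm_financed = {101: 0.40, 102: -0.10, 103: 0.05, 104: -0.35}
co_financed = {101: 0.20, 102: 0.00, 103: 0.10, 105: 0.30}
r = correlate_by_id(firm_financed, co_financed)
```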

For the individual worker perspective, we differentiate between four outcomes in a given wave: no participation in training, participation in firm-financed training only, participation in worker co-financed training only, and participation in both forms of training. We use a multinomial logit model.101

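$$\Pr\left(\text{Training}_{it}=c \mid T_t, X_{it}, Z_{J(i,t),t}, W_{J(i,t),t}\right) = \frac{\exp\left(T_t\tau_c + X_{it}\beta_c + Z_{J(i,t),t}\delta_c + W_{J(i,t),t}\gamma_c\right)}{\sum_{c'=1}^{4}\exp\left(T_t\tau_{c'} + X_{it}\beta_{c'} + Z_{J(i,t),t}\delta_{c'} + W_{J(i,t),t}\gamma_{c'}\right)} \tag{4.8}$$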
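For a given covariate vector, the probabilities in Equation 4.8 can be computed as in the following sketch. It assumes, as is standard for identification, that the coefficients of the reference category (here row 0, taken to be "no training") are normalized to zero; the ordering of the remaining three categories and all numbers in the usage lines are hypothetical, and the year dummies and X, Z and W are stacked into a single vector for brevity.

```python
import numpy as np


def mnl_probabilities(x, coefs):
    """Choice probabilities of a multinomial logit.

    x     : covariate vector of one worker-wave (length k)
    coefs : (C, k) array of category-specific coefficients; row 0 is the
            reference category, whose coefficients are normalized to zero
    """
    x = np.asarray(x, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    index = coefs @ x              # linear index of each category
    index = index - index.max()    # numerical stabilization; leaves the ratios unchanged
    expo = np.exp(index)
    return expo / expo.sum()


probs = mnl_probabilities(
    x=[1.0, 0.5],
    coefs=[[0.0, 0.0],    # no training (reference)
           [0.3, -0.2],   # firm-financed only
           [0.1, 0.4],    # worker co-financed only
           [-0.5, 0.2]],  # both
)
```

Each ratio of probabilities, e.g. firm-financed only relative to no training, equals the exponentiated linear index of that category, which is what makes the coefficients interpretable as log relative-risk ratios.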

For computational reasons, we do not include establishment or individual random effects here. Standard errors are clustered at the establishment level to account for intra-establishment correlation. For reasons of simplicity, we refrain from using multiway clustering, since there is almost no mobility of workers between establishments and clustering at the highest level gives conservative estimates.

4.5 Results

4.5.1 Research Question 1

In the following, we present results for the research questions introduced above in Section 4.2, namely, how important firm and worker heterogeneity is for individual participation in training, and whether there are differences between firm-financed and worker co-financed training. To this end, we first compare the intra-class correlations for establishments and workers by looking at the relative importance of firms (inter-firm correlation) and workers (correlation of workers within the same establishment) within the overall variance in participation in training. Table 4.2 shows the intra-class correlation for establishments and workers for all job-related training courses (Equations 4.6 and 4.7). Our estimate of the unconditional intra-establishment correlation (basic model) attributes 12% of the total variation in participation in training to differences between establishments (across-firm heterogeneity). The importance of workers is more than twice as large (28%; within-firm heterogeneity), while, according to the latent-response formulation, most of the variance in training remains unexplained (slightly above 60%). According to a likelihood-ratio test, both establishment and worker random effects are highly significant.

101 Unfortunately, we do not have variables available which vary across the alternative training outcomes. This means that we cannot relax the independence of irrelevant alternatives assumption by using a multinomial probit model without relying on the correct specification of the functional form. As discussed in Keane (1992), results for the multinomial probit without exclusion restrictions are "extremely tenuous". A nested logit model has so far been numerically unstable, so we do not present its results here.

Table 4.2: Intra-class Correlation (job-related training in general)

| Level | Basic Model (Eq. 4.1) | Firm Char. (Eq. 4.2) | Worker Char. (Eq. 4.3) | Firm + Worker (Eq. 4.4) | + Job Char. (Eq. 4.5) |
|---|---|---|---|---|---|
| Establishment | 11.9% | 6.4% | 9.5% | 5.6% | 1.4% |
| Worker | 27.6% | 29.4% | 25.7% | 26.9% | 19.3% |
| Latent Error | 60.5% | 65.6% | 65.0% | 69.0% | 79.6% |

Note: This table reports the estimated intra-class correlation (Equations 4.6 and 4.7) for different specifications of a multilevel logit model for participation in job-related training courses (see Table 4.7). Establishment and worker random effects are statistically significant in all specifications according to a likelihood-ratio test.

In the next step, we take into account that establishments and workers differ in observable characteristics. To this end, we compare the intra-class correlations of the basic model with those of the full model and the other specifications. Comparing the first column (basic model) and the last column (full model with establishment, worker, and job characteristics) of the first row in Table 4.2 indicates that almost 90% of the across-establishment correlation can be explained by observables (1 − 1.4/11.9 ≈ 0.88). The comparison of the basic model with the other specifications (columns 2-4) shows that the variation in average training rates between establishments is mostly reduced by establishment and job characteristics, and to a lesser extent by worker characteristics. This indicates that establishment-level variation in workers’ participation in training courses can primarily be explained by firm-level or job-level differences, and only to a lesser extent by the composition of the workforce itself.102

102 In our final specification, including establishment, worker and job characteristics, the intra-establishment correlation drops to a meager 1.4%. However, the variation in establishment random effects remains significant according to a likelihood-ratio test (as do the worker random effects).

Next, we look at training differences between workers within the same establishment. Here, including a large set of observables reduces the worker intra-class correlation by only 30% (1 − 19.3/27.6 ≈ 0.30). This is in stark contrast to the previous results for the establishment level, where little unexplained heterogeneity remained.103 When explaining differences in training incidence between workers, we find that job characteristics are more powerful predictors than worker characteristics (by construction, establishment characteristics cannot explain training heterogeneity of workers within an establishment). Note, however, that the set of observed job characteristics is also larger than the set of observed worker characteristics (36 vs. 10).

The intra-class correlation (and the variation in random effects) is an abstract concept which is not always immediately intuitive. The relevance of the variation in worker random effects can be illustrated with a simple statistical example. For this example, we look at the 1,090 workers who participate in all four waves of the survey (18.8% of all survey participants). Ignoring the establishment dimension, if training were allocated randomly with a probability of 45% in each wave, we could expect slightly more than 9% of these workers never to participate in training and 4% to participate in training in all waves.104 The actual shares are strikingly larger: 26.6% do not participate in training at any point, whilst 9.1% participate in training in all four waves. Here we see a much larger variation than would be expected from random allocation.
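The random-allocation benchmark in this example follows from a binomial distribution with four waves and a participation probability of 0.45 per wave, assuming independence across waves. A short Python sketch (the function name is hypothetical) reproduces the two benchmark shares:

```python
from math import comb


def participation_count_probs(p, waves=4):
    """Distribution of the number of waves with training when training is
    allocated independently with the same probability p in every wave."""
    return [comb(waves, k) * p ** k * (1 - p) ** (waves - k) for k in range(waves + 1)]


probs = participation_count_probs(0.45)
never, always = probs[0], probs[-1]  # about 0.0915 and 0.0410
```

Comparing these benchmarks with the observed shares of 26.6% and 9.1% shows how strongly participation is concentrated within workers over time.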

Table 4.2 reveals three main findings. Firstly, the largest share of the variance in participation in training can be attributed neither to firm differences nor to worker differences. Secondly, whilst most of the establishment differences can be explained by observable characteristics, this does not translate to the worker level. The majority of within-establishment worker heterogeneity remains unexplained (the intra-class correlation only goes down from 27.6% to 19.3%). Unexplained heterogeneity is therefore a major issue. Finally, Table 4.2 indicates that job characteristics are a significant source of training heterogeneity at both the firm and the worker level.

Before we discuss the findings for purely firm-financed and co-financed training, we illustrate the magnitude of differences in training participation by showing how the variance of the establishment random effects is reflected in average training rates between establishments.105 Bear in mind that all variables, including binaries, are standardized, such that the aggregate average training probability can be read directly from the intercept (and, of course, the inverse logit link). The aggregate average training rate of the establishments in the basic model is $\operatorname{logit}^{-1}(-0.38) = 1/(1+\exp(0.38)) \approx 40.6\%$. The variance of the establishment random effects is $\sigma^2_{\alpha^{(1)}} = 0.65$.106 This means that an establishment whose unobserved random effect lies one standard deviation ($\sqrt{0.65} \approx 0.81$) below the average has an unconditional training probability of just $1/(1+\exp(0.38+0.81)) \approx 23.4\%$. However, in the full model (last column of Table 4.7), $\sigma^2_{\alpha^{(5)}} = 0.05$, i.e., a standard deviation of only about 0.22. This means that differences in training rates between establishments can, to a large extent, be explained by observables, a result which we have already seen in the intra-class correlations. The analogous one-standard-deviation band $[\mu - \sigma_\alpha, \mu + \sigma_\alpha]$ spans about 11 percentage points (roughly 36.9% to 47.8%), compared with about 37 percentage points (23.4% to 60.5%) in the basic model.

103 Of course, the number of observations per establishment is much larger than the number per worker.
104 $(1 - 0.45)^4 \approx 0.0915$ and $0.45^4 \approx 0.041$.
105 The random effects variance components are reported in the second-to-last block of Table 4.7 ($\theta^2_{Firm}$ / $\theta^2_{Worker}$).
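The conversion from a random-effect variance to training probabilities can be sketched as follows, using the basic-model intercept of −0.38 and the establishment random-effect variance of 0.65 reported in this section, and holding all standardized covariates at their means (and the worker random effect at zero). The sketch shifts the establishment effect by one standard deviation, i.e., by the square root of the variance; the function names are hypothetical.

```python
import math


def inv_logit(x):
    """Inverse logit link: maps a log-odds index to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def one_sd_band(intercept, re_variance):
    """Training probability for an establishment whose random effect lies one
    standard deviation below the mean, at the mean, and one standard deviation above.
    Covariates sit at their (standardized) means, so they contribute zero."""
    sd = math.sqrt(re_variance)  # shift by the standard deviation, not the variance
    return inv_logit(intercept - sd), inv_logit(intercept), inv_logit(intercept + sd)


low, mid, high = one_sd_band(-0.38, 0.65)  # roughly 0.234, 0.406 and 0.605
```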

We then provide a decomposition of the variance for training which is either fully financed by the firm or at least partially financed by the worker (part two of Research Question 1, see Table 4.3). This analysis delivers results very similar to those of the decomposition for all training.107 Again, we find that the worker dimension is more important than the establishment dimension, and that observables explain a much larger share of the variance at the establishment level than at the worker level. Perhaps most interesting is the finding that co-financed training exhibits a larger share of worker-level variance than training which is purely financed by the firm. The share at the establishment level, meanwhile, is about 11% and thus similar across all types of training.

Table 4.3: Variance Partitioning (firm-financed / worker co-financed training)

| Level | Firm-financed: Null (basic) | Firm-financed: Occ (full) | Worker co-financed: Null (basic) | Worker co-financed: Occ (full) |
|---|---|---|---|---|
| Establishment | 11.2% | 2.3% | 10.8% | 3.7% |
| Worker | 23.4% | 18.0% | 34.8% | 30.4% |

Note: Table 4.3 reports the estimated intra-class correlation (Equations 4.6 and 4.7) for different specifications of a multilevel logit model for participation in job-related training courses either completely in working time or partly in leisure time (see Tables 4.8 and 4.9). Establishment and worker random effects are statistically significant in all specifications according to a likelihood-ratio test.

106 The aggregate average training rate is somewhat lower than the overall average of 45%. This is mostly due to the fact that establishments and workers are not balanced in size. Establishments with high training rates, i.e., with large random effects, are larger on average (although this association is mitigated by other variables, as shown below).
107 Here, we only show the comparison between the basic and the full model. Results of the other specifications are available on request.

4.5.2 Research Question 2

In order to answer Research Question 2, we look at the regression coefficients of the different specifications in Table 4.7. With regard to establishment characteristics, we do not find differences between smaller and larger firms in our sample.108 The same applies to establishment age. We find higher training rates for workers in public enterprises and, in line with the theoretical literature (Acemoglu and Pischke, 1999), in establishments with a compressed wage structure. However, given that these coefficients lose statistical significance once job characteristics are added, the training advantages in these firms seem to be driven by differences in the tasks involved in specific jobs (see Column 5 in Table 4.7). We find consistently higher training rates amongst workers in the service sector, in high-wage establishments and in establishments which experienced a positive linear wage trend. Having said this, the magnitude of these coefficients falls considerably once worker and job characteristics are included. Once again, job characteristics appear more important than worker characteristics.

Various worker characteristics, in particular workers’ level of education, are found to be strong predictors of their likelihood of participating in training (see Table 4.7, Columns III and IV). However, after adding job characteristics, the majority of coefficients fall and, once again, become insignificant. Less well-educated workers and individuals with health issues still exhibit lower rates of participation in training. By contrast, after the addition of job characteristics, highly educated workers and those with a high degree of labor force attachment no longer show an advantage in participation in training. The connection between higher levels of education, particularly tertiary education, and increased on-the-job training has been identified in the literature on a number of occasions (e.g. Carneiro and Heckman, 2003), and we are able to confirm such observations. In the full model, however, the coefficient of tertiary education drops almost to zero and becomes statistically insignificant. This means that a high level of education is correlated with a considerable number of job characteristics, such as working in a professional occupation. The causes of these training advantages remain unclear. Our results do, however, point strongly to the finding that job characteristics play a considerable role in the assignment of training courses within businesses. In addition, we find an interesting result regarding the selection of certain groups of workers: the lower tendency of women and older workers to participate in training only becomes apparent after job characteristics are added. This indicates that these groups perform more training-intensive tasks but receive less training than co-workers in similar jobs. Lower training attendance is particularly pronounced amongst older workers (see Zwick, 2015), as reflected in a large negative coefficient on age squared.

108 Note that our sample consists of establishments ranging from 100 to 200 employees.

As shown by the variance decomposition approach, job characteristics are important predictors of participation in training. We find considerably higher training rates for workers who have managerial responsibilities. In contrast to other studies (Grund and Martin, 2012), we do not find lower training rates for part-time workers. Moreover, participation in training varies depending on job content, that is, the specific occupation and the tasks involved in a job, as shown in Figure 4.1 (see also Görlitz and Tamm, 2016). Here we find the largest difference between manufacturing and knowledge-based jobs. If we translate the tasks into a classification of routine versus non-routine and analytic, complex, interactive, or manual jobs (Spitz-Oener, 2006), we find that it is in particular non-routine, complex activities, such as analytic or interactive tasks, which are associated with higher rates of training. There are, however, no negative associations for routine tasks. Workers who perform manufacturing tasks, for example, very seldom participate in training. If we look at occupations, we find marked differences in participation in training, as shown in Figure 4.2. The reference occupations are other services, which consist mostly of low-skilled white-collar jobs such as security contractors and cleaners. Many occupations show higher training rates than other services (only assembly and food-processing jobs have lower, though statistically insignificant, participation rates). Training is particularly common in certain female-dominated occupations, such as health and education professions, in which roughly two-thirds of all workers are female. In addition, rates of training are high amongst engineers and electricians, as well as in certain white-collar jobs, including merchants, and in a number of professions such as consultancy and accountancy.
Training differences between workers who perform certain tasks or work in certain occupations are significant, as illustrated by the large (standardized) coefficients.

We now turn to the question of whether the findings for Research Question 2 depend on the type of training concerned: training which takes place during working hours only, and training which overlaps with leisure time. The answer is: partly yes. Here, we compare results for total participation in training with results for training taking place during working time (see Table 4.8) and training overlapping with leisure time (see Table 4.9). We find some interesting results with regard to the variables measuring firm characteristics. While both proxies for the economic situation (the median wage and its linear trend) are positively related to training courses taking place in working time, this is not the case for training overlapping with leisure time. Indeed, establishments’ median wage is negatively correlated with workers’ participation in co-financed training in the full model including job characteristics (at the 10% significance level). It is to be expected that training which is fully financed by the firm is more sensitive to the economic situation of the business. Moreover, the negative association between the median wage and worker co-financed training might point to a possible substitution effect: workers make up for lower investments by the firm by contributing more to financing the human capital investment themselves.

A comparison of the worker characteristics also yields very interesting results. First, in the models excluding job characteristics, signs and significance levels of almost all worker characteristics, including workers’ level of education, are similar across all types of training. One exception is female workers, who tend to participate less in training courses occurring in working time and more in training courses which overlap with leisure time. Furthermore, the shapes of the age-training profiles differ between the two forms of training. Whilst the age profile is significantly inverted U-shaped for training co-financed by the worker, there is no such relationship between age and training in working time. Again, we see that most coefficients decrease in magnitude and become insignificant after job characteristics are taken into account. Still, some differences are striking: after the inclusion of job characteristics, only female workers and those with health issues exhibit significantly lower rates of training in working time. This means, on the one hand, that the assignment of purely firm-financed training is strongly determined by the worker’s specific job. On the other hand, it would seem that firms invest less in women and in workers with health problems.109

The picture is quite different if we look at the full model of training courses overlapping with leisure time. Firstly, women participate significantly more often in such training courses. Secondly, differences between groups of workers depending on their level of education remain significant, but the coefficients are not very large. In addition, it is interesting that after including controls for the job cell, age differences become even larger. Finally, other characteristics which might be proxies for, for example, worker productivity, labor force attachment or health status become insignificant once job heterogeneity is taken into account. Thus, we conclude that job and worker attributes are important determinants of participation in training courses for which workers bear some of the costs, in that the training overlaps with their leisure time. The economic rationale of a worker, such as the expected amortization period of training investments, seems to play an important role in this context. This is in line with the theoretical considerations we presented in Section 4.2.

109 Note that we do not argue that firms discriminate against women and sick persons. We only find that both groups participate less in (fully) firm-provided training, even after controlling for various characteristics. The actual reasons for this, e.g. whether it is voluntary or not, are difficult to establish. We investigate possible reasons for lower female job-related training rates in more detail in Chapter 5, where we also show that this is probably not due to higher turnover rates; see also Royalty (1996).

Finally, we look at the findings for job characteristics. In line with theoretical considerations, workers with fixed-term contracts and those working part-time participate less in training courses taking place during working time. This does not apply to training which overlaps with leisure time. Here we even find a positive correlation with part-time work, which might again suggest that substitution effects play a role (among part-time working women). Workers with managerial responsibilities display a greater probability of participating in both types of training. As shown in Figure 4.1, there are some minor differences in how specific job tasks relate to training; for example, conducting negotiations has a strong positive relationship with firm-financed training but virtually none with worker co-financed training. In general, we find very similar results for both types of training.

In contrast, we find large occupational differences between training in working time and training overlapping with leisure time. On the one hand, Figure 4.2 shows that the higher training rates in most occupations (compared to other services) are driven by worker co-financed training. Only four occupational groups, namely engineers, merchants, clerical professionals and teaching professions, receive significantly more training in working time. On the other hand, workers in many other jobs are trained more often in their leisure time. This is particularly true for health-related professions.

In Table 4.6, we arrange the coefficients from the previous specifications in Tables 4.7 to 4.9 according to the absolute value of their t-statistics. This is a simple but common measure of relative importance which takes into account both the magnitude of the association with training and its uncertainty.110 Job characteristics, which constitute somewhat more than half of all variables (36 out of 64), dominate Table 4.6. For our overall training regression, only the wage level of an establishment is among the ten predictors with the highest absolute t-statistics. If we compare firm-financed and worker co-financed training, only three tasks are significant in both specifications: computer usage, organizing, and advising or counselling.
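Ranking by absolute t-statistics is straightforward once coefficients and standard errors are available. The sketch below is illustrative only: the function name, the variable names and the numbers in the usage example are hypothetical and are not estimates from Tables 4.7 to 4.9.

```python
def rank_by_abs_t(estimates, top=10):
    """Rank predictors by the absolute value of their t-statistics.

    estimates: mapping from variable name to (coefficient, standard error).
    Returns the `top` (name, t-statistic) pairs with the largest |t|.
    """
    t_stats = {name: coef / se for name, (coef, se) in estimates.items()}
    return sorted(t_stats.items(), key=lambda item: abs(item[1]), reverse=True)[:top]


# Hypothetical coefficients and standard errors.
ranking = rank_by_abs_t(
    {"median_wage": (0.30, 0.08), "computer_usage": (0.25, 0.05), "part_time": (-0.02, 0.06)},
    top=2,
)
```

Because the sign is discarded, strongly negative predictors rank as high as strongly positive ones, which matches the idea of relative importance rather than direction.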

4.5.3 Interrelation of firm-financed and co-financed training

Substitution between firm-financed and co-financed training might exist if workers who do not have access to purely firm-financed training compensate for this by investing in training (at least partly) themselves. A complementary effect is, however, also likely, as the mechanism which increases the incentive to invest in training might be the same for both parties. The interrelation of firm-financed and co-financed training therefore remains an empirical question. Even if we interpret firm-financed training courses as firm-specific and worker co-financed training as general (or a mixture of firm-specific and general training), we cannot derive clear hypotheses regarding the sign of the interrelation. We would expect establishment characteristics to be important predictors of firm-specific training investments. It is plausible to assume that certain establishments benefit more from such investments than others (e.g. those in which technology evolves rapidly). In imperfect labor markets, this should similarly apply to firms’ investments in general training, and should translate to workers’ investments. Individual attributes such as age, education, and labor force attachment can also be expected to be relevant for the investment made by the business, just as such factors are relevant for the individual worker. However, if expectations about lifetime returns differ between establishment and worker, we would expect substitution between the two types of training.

110 We refrain from using more advanced statistics for computational reasons.

To investigate the possible interrelation between firm-financed and worker co-financed training, we start with an aggregate analysis and run the previous specifications (Equation 4.1 and Equation 4.5) for training completely in working time and for training overlapping with leisure time. To examine whether workers in establishments which provide a lot of training during working hours also often co-finance training, we correlate the 149 establishment random effects of both regressions before and after conditioning on observables. The results indicate a small complementarity between both forms of training at the establishment level before adding establishment, worker and job characteristics. After conditioning on observable heterogeneity, this positive association becomes very small and close to zero, indicating that worker and job heterogeneity confound this picture.

To gain a more detailed view, we also ran a multinomial logit regression, which is presented in Table 4.11. We not only distinguish between workers who participate in training entirely during working time and workers who participate in training that overlaps with leisure time, but also observe workers who participate in both (or neither). The reference group in Table 4.11 consists of workers who do not participate in any form of training. This analysis reveals some further interesting insights about training.
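In a multinomial logit with a reference category, the coefficients of the reference outcome are fixed at zero. Each remaining coefficient is therefore a log-odds effect relative to "no training". The sketch below shows how the four outcome probabilities follow from the linear indices. It is an illustration only: the category labels, the function name and the index values are hypothetical and are not taken from Table 4.11.

```python
import numpy as np

# Outcome categories; "none" is the reference group whose index is fixed at 0.
CATEGORIES = ("none", "firm_financed", "worker_cofinanced", "both")

def mnl_probabilities(linear_indices):
    """Choice probabilities of a multinomial logit with a reference category.

    linear_indices: the linear index x'b_k for each non-reference category,
    in the order of CATEGORIES[1:].
    """
    eta = np.concatenate(([0.0], np.asarray(linear_indices, dtype=float)))
    eta -= eta.max()  # shift for numerical stability; probabilities are unchanged
    expo = np.exp(eta)
    return dict(zip(CATEGORIES, expo / expo.sum()))

# Hypothetical indices for one worker:
p = mnl_probabilities([-1.0, -1.5, -3.0])
for category, prob in p.items():
    print(f"{category:>18}: {prob:.3f}")
```

The log of each probability ratio relative to "none" equals the corresponding linear index. This is why the coefficients in Table 4.11 read as effects relative to non-participation.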

We start with the establishment level. Establishments in the service sector and high-wage establishments provide more training that takes place entirely during working time. Workers in these businesses, however, do not themselves invest more in training than workers in other firms do. Accordingly, we do not see any differences regarding worker co-financed training or participation in both forms of training. For firm-financed training, we find some differences with respect to firm size: firms with 200 to 500 employees (the middle category in our dataset) seem to provide more training than smaller or larger firms. We do not have an explanation for this, and it is the only specification in which we find a correlation between training investments and firm size. Unsurprisingly, workers in rapidly growing businesses, as measured by rising wages, seem to participate more both in firm-financed training and in training that is provided by the firm but simultaneously co-financed by the worker.

With regard to worker heterogeneity, we once again find that women tend to participate in no training rather than in firm-financed training. Royalty (1996) found a similar result for the United States in the 1990s using a multinomial probit approach.111 If we differentiate between women who participate only in worker co-financed training (but not in firm-financed training as well), we find higher training rates for women in worker co-financed training (not depicted here). However, women seem to participate more in both forms of training, which points to greater heterogeneity in training participation among women (here p-value: 0.14; we investigate this further in Chapter 5). Regarding education, there are only a few low-skilled workers in our dataset, but they seem to participate less in all forms of training, particularly in multiple training courses.112 Older workers do not necessarily participate less in firm-financed training, but they participate much less in training during leisure time, as indicated by the significant negative squared term for age and as seen above.

Looking at job characteristics, we once again see that workers with managerial responsibilities participate more in every form of training, whilst workers with a limited contract and part-time workers more often do not participate in training that takes place completely within working hours. This suggests that the part-time disadvantage is not driven by our definition of firm-financed and worker co-financed training.

4.6 Conclusions

Although a large empirical literature addresses the determinants of and returns to training, few studies have looked simultaneously at multiple firms and their individual workers. The reason for this is clear: such an approach requires precise matched employer-employee data including detailed training information at the individual level, which has so far scarcely been available. The lack of such empirical studies is a clear shortcoming in this field, as theory predicts that the training investment decisions of firms and workers are interlinked.

111 Royalty (1996) assumes independent errors for the multinomial probit approach. This has no advantage compared to the multinomial logit approach with respect to the independence of irrelevant alternatives assumption (see also Footnote 101).
112 The extremely large coefficient for low-skilled workers participating in both types of training can be explained by sample size issues.

In this study, we have shed new light on observable and unobservable determinants of job-related training using a new matched dataset of 149 establishments and 7,000 workers. We first analyzed the incidence, and later also the intensity, of workers' participation in training within and across firms. We investigated which observable characteristics, such as firm size, education level or the tasks involved in a given job, can be used to predict participation in job-related training courses. We distinguished between training courses fully financed by firms and those co-financed by workers, and took into account the role of mandatory occupational training. Furthermore, we considered how similar workers within an establishment behave with respect to participation in training. Second, we identified the relevance of unobserved heterogeneity among firms and workers in order to build a more complete picture of the determinants of training investments. Our study thus provides econometricians analyzing returns to training with important indications about the sources of endogeneity. Finally, we examined the interrelation between firm-financed and worker co-financed training.

Our empirical findings have shown that whilst job-related training is unequally distributed across firms, it is even more unequally distributed across workers within firms. Interestingly, most of the variation in training between firms could be explained by observable characteristics such as the sector, economic performance or the degree of wage compression within the firm. After conditioning on these variables, training differences between firms became almost negligible. This may be due in particular to the inclusion of job characteristics, which exhibit the highest degree of observable heterogeneity.

At the worker level, however, results were quite different: differences between workers were about twice as important as differences between establishments in determining an individual's likelihood of participating in training. The majority of the training differences between workers within the same establishment could not be explained by observable attributes of either the worker or the job itself. Interestingly, socio-demographic characteristics had only little predictive power for training. Whilst job attributes do seem to be important, the reasons for the majority of the variation in training participation between workers within the same firm remain unexplained. Unsurprisingly, unobservable heterogeneity between co-workers was a more important determinant of worker co-financed training courses (overlapping with leisure time) than of training courses taking place completely within working hours.

All in all, our findings are important for econometricians wishing to measure causal returns to training. Unobserved heterogeneity seems to be a major source of differences between workers in whether or not they participate in training. This does not hold, however, for differences between establishments (or holds only to a much lesser extent).

A further interesting finding described in this paper is the considerable importance of job characteristics, such as the specific occupation or the tasks involved in a job, in determining workers' participation in training. Many determinants used in the existing literature, at both the firm and the worker level, became insignificant or at least decreased in magnitude after job attributes had been controlled for. This was true, for example, for the training advantage of workers in the public sector and of those who hold a tertiary degree. This finding is important because many previous studies analyzing the determinants of training did not include information on job characteristics. These variables explain a substantial proportion of the unobserved heterogeneity between individuals and firms with similar observable characteristics.

Future research should aim to acquire better matched employer-employee data that allow further examination of the assumption that workers and firms are independent when it comes to job-related training. This requires a sufficient number of workers switching between establishments. Furthermore, a more detailed analysis of the interrelation between firm-financed and worker co-financed training, using plausible instrumental variables, could provide valuable information about whether firms harness workers' willingness to invest in training. The results presented here show that some workers, such as managers, participate more in all forms of training, whilst female workers, for example, participate less in firm-financed training but more in training that they co-finance themselves.

4.7 Appendix

4.7.1 Further Analyses

The results we have presented here are remarkably stable when different sets of variables are added or removed or when different samples are used. The interpretation of our results is also quite similar whether we look at the incidence of training (Sections 4.4 and 4.5) or the intensity of training (Section 4.7.2). Our results are also robust to other specifications, as shown in Sections 4.7.3 and 4.7.4.

4.7.2 Intensity of Training

In this section, we illustrate that the basic results are similar if we take into account the number of training courses a worker participated in. As discussed in Section 4.3, we have so far restricted our analyses to workers who participated in no more than three training courses (see Table 4.1). For overall participation in training, we also include workers who took part in more than three training courses (the maximum per wave is eighteen).113 We use a multilevel Poisson regression for the analysis of training intensity (for an introduction, see Hox, 2010). Again, we standardize all coefficients.

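$$\mu_{it} = E\left[y_{it} \mid \alpha_{J(i)}, \theta_i, T_t, X_{it}, Z_{J(i),t}, W_{J(i),t}\right] = \exp(\alpha_{J(i)}) \cdot \exp(\theta_i) \cdot \exp\left(T_t\tau + X_{it}\beta + Z_{J(i),t}\delta + W_{J(i),t}\gamma\right) \tag{4.9}$$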

For the consistency of this estimator, it is sufficient to specify the correct conditional mean function (the random effects have to be uncorrelated with the other characteristics, but the correct distributional assumption is not required; see Rabe-Hesketh and Skrondal, 2012). The multilevel Poisson model is more flexible than a standard Poisson regression because it allows for certain forms of overdispersion: the variance depends on the variance of the random effects and is not automatically equal to the mean (Rabe-Hesketh and Skrondal, 2012).
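A random intercept generates overdispersion, which a short simulation makes concrete. Suppose that, given a random effect u, the count is Poisson with mean exp(eta + u), and that u is normal with variance sigma2. Then the marginal mean is exp(eta + sigma2/2). The marginal variance is that mean plus mean² · (exp(sigma2) − 1), so it exceeds the mean whenever sigma2 > 0. The sketch below compares this analytic result with a simulation. Normality of the combined effect is our assumption for the illustration only, since consistency does not require it. The values of eta and sigma2 are hypothetical.

```python
import numpy as np

def marginal_moments(eta, sigma2):
    """Mean and variance of y when y | u ~ Poisson(exp(eta + u)) and u ~ N(0, sigma2)."""
    mean = np.exp(eta + sigma2 / 2.0)
    var = mean + mean**2 * (np.exp(sigma2) - 1.0)
    return float(mean), float(var)

def simulate(eta, sigma2, n=200_000, seed=1):
    """Monte Carlo counterpart: draw u, then a Poisson count given u."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, np.sqrt(sigma2), size=n)
    y = rng.poisson(np.exp(eta + u))
    return float(y.mean()), float(y.var())

eta, sigma2 = -0.4, 0.65  # hypothetical values
print("analytic  (mean, var):", marginal_moments(eta, sigma2))
print("simulated (mean, var):", simulate(eta, sigma2))
```

With sigma2 = 0, the formula collapses to the standard Poisson case, in which the variance equals the mean.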

Unfortunately, no latent-response formulation is available for the Poisson model that would enable us to use a simple variance partitioning similar to that of the logit model. We can, however, still consider a simple form of the intra-class correlation as the ratio of the establishment variance to the total variance (establishment plus worker; see Stryhn et al., 2006).

113 The results here are very similar if we look at the intensity for up to three training courses.

We compare our new results to the results from Equation (4.10), excluding the term for the variance of the logistic distribution:

$$\frac{\sigma_\theta^2}{\sigma_\alpha^2 + \sigma_\theta^2} \tag{4.10}$$
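The arithmetic of this simplified partitioning can be sketched as follows. The function returns both the establishment share and the worker share of the summed variance components. An optional level-1 term lets the same function reproduce a logit-style partitioning, where we assume the standard latent-response value π²/3 for the variance of the logistic distribution. The inputs are the point estimates printed in Tables 4.10 and 4.7. The function and variable names are hypothetical, and the resulting shares are illustrative computations rather than figures reported in this chapter.

```python
import math

# Variance of the standard logistic distribution (latent-response formulation).
LOGISTIC_VARIANCE = math.pi ** 2 / 3

def variance_shares(sigma2_est, sigma2_worker, level1_var=0.0):
    """Establishment and worker shares of the total variance.

    With level1_var=0.0 the total is establishment plus worker variance,
    as in the Poisson comparison; LOGISTIC_VARIANCE adds the logit term.
    """
    total = sigma2_est + sigma2_worker + level1_var
    return sigma2_est / total, sigma2_worker / total

# Variance components of the null and full Poisson models (Table 4.10):
poisson_null = variance_shares(0.31, 0.34)
poisson_full = variance_shares(0.04, 0.20)
# Null logit model (Table 4.7), with and without the logistic term:
logit_null_with = variance_shares(0.65, 1.5, LOGISTIC_VARIANCE)
logit_null_without = variance_shares(0.65, 1.5)

for label, shares in [("Poisson null", poisson_null), ("Poisson full", poisson_full),
                      ("logit null, incl. pi^2/3", logit_null_with),
                      ("logit null, excl. pi^2/3", logit_null_without)]:
    print(f"{label:>26}: establishment {shares[0]:.2f}, worker {shares[1]:.2f}")
```

Dropping the logistic term raises both shares because the denominator shrinks. For this reason, only the version that excludes it is comparable with the Poisson ratio.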

4.7.3 Decomposition Based On Establishment Fixed-Effects Estimation

In the following, we show that our basic results hold if we use a logit model including a dummy indicator for each establishment (fixed effects) and (only) worker random effects. We do not include worker fixed effects because too few workers switch from one WeLL establishment to another to allow a separation of firm and worker fixed effects.114 We exclude observable establishment characteristics because the majority of them are time-invariant. We restrict this analysis to workers in establishments with at least 25 interviews, which leaves us with 12,143 workers in 125 establishments.115
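The sample restriction to establishments with at least 25 interviews can be sketched with the standard library. The record layout (establishment identifier, worker identifier, participation indicator) and the function name are hypothetical stand-ins, not the structure of the WeLL files.

```python
from collections import Counter

def restrict_to_large_establishments(records, min_interviews=25):
    """Keep observations from establishments with at least `min_interviews` interviews.

    `records` is a list of (establishment_id, worker_id, participated) tuples;
    this layout is hypothetical.
    """
    records = list(records)
    interviews = Counter(est_id for est_id, _, _ in records)
    return [rec for rec in records if interviews[rec[0]] >= min_interviews]

# Toy example: establishment "A" has 30 interviews, "B" only 10.
toy = [("A", i, i % 2) for i in range(30)] + [("B", i, 0) for i in range(10)]
kept = restrict_to_large_establishments(toy)
print(len(kept), sorted({est for est, _, _ in kept}))
```

The threshold is applied before estimation, so each remaining establishment dummy is estimated from at least 25 observations.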

The results regarding observable worker and job characteristics are similar to those shown in Table 4.7.116 As already seen in Section 4.5, women, low-skilled and older workers, and those with health issues show lower rates of participation in training. The coefficients for workers with recent unemployment experience, for part-timers and for high-tenure workers become significantly negative, while the training advantage of recently hired workers becomes significantly positive. In the decomposition, we find very similar results for the worker random-effect variation ($\sigma_\theta^{(1)} \approx 1.39$ and $\sigma_\theta^{(5)} \approx 0.64$), while the variance of the coefficients of the establishment indicators is considerably larger and remains more important after introducing worker and job characteristics ($\sqrt{\mathrm{Var}(\beta_1^{FE1}, \ldots, \beta_{125}^{FE1})} \approx 0.98$ in the basic model and $\sqrt{\mathrm{Var}(\beta_1^{FE5}, \ldots, \beta_{125}^{FE5})} \approx 0.58$ in the full model). The larger variance of the fixed effects is not surprising: random-effects estimates are weighted averages of the fixed-effects estimate for each establishment and the average over all establishments, with weights depending on the establishment's sample size (see Gelman and Hill, 2007). In our view, the smaller reduction in the variance at the establishment level is primarily a result of problems associated with small sample sizes. If we re-run the same regressions for all 149 establishments, the variance of the establishment indicators doubles (in both specifications). Nonetheless, this could indicate that the role of unobservable establishment heterogeneity is not as small as the multilevel model suggests.
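The weighting mentioned above (Gelman and Hill, 2007) can be illustrated for a linear varying-intercept model with known variance components. In that model, each establishment's pooled estimate is a precision-weighted average of its own mean and the overall mean. The weight on the own mean is n_j/σ²_within, and the weight on the overall mean is 1/σ²_between. This linear-normal case is our illustration only; in the logit model the weighting holds approximately. The function name, variance values and simulated establishment sizes are hypothetical.

```python
import numpy as np

def partial_pooling(group_means, group_sizes, sigma2_within, sigma2_between, grand_mean):
    """Precision-weighted (random-effects) estimates of establishment intercepts."""
    n = np.asarray(group_sizes, dtype=float)
    ybar = np.asarray(group_means, dtype=float)
    w_data = n / sigma2_within        # precision of each establishment's own mean
    w_prior = 1.0 / sigma2_between    # precision of the population distribution
    return (w_data * ybar + w_prior * grand_mean) / (w_data + w_prior)

rng = np.random.default_rng(2)
sizes = rng.integers(25, 200, size=125)                 # hypothetical interviews per establishment
true_effects = rng.normal(0.0, 0.5, size=125)           # between-establishment variance 0.25
no_pooling = true_effects + rng.normal(0.0, 1.0 / np.sqrt(sizes))  # noisy means, within variance 1
pooled = partial_pooling(no_pooling, sizes, 1.0, 0.25, no_pooling.mean())
print(f"variance of no-pooling estimates: {no_pooling.var():.3f}")
print(f"variance of pooled estimates:     {pooled.var():.3f}")
```

Each pooled estimate lies between the establishment's own mean and the grand mean. Establishments with few interviews are pulled furthest towards the grand mean, so the spread of the pooled estimates is smaller than that of the no-pooling (fixed-effects style) estimates.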

114 For the assumptions required to separately identify firm and worker fixed effects, see Abowd and Kramarz (1999).
115 Establishment averages estimated with fixed effects are very noisy for establishments with fewer interviews and artificially inflate the importance of the firm dimension in a variance partitioning.
116 These results are available upon request.

4.7.4 Alternative Decomposition: Linear Approximation

Goldstein et al. (2002) suggest a linear approximation if probabilities are sufficiently far away from 0 and 1. Here, we treat participation in training as if it were normally distributed and use the estimated variance of the residuals for the variance decomposition. We therefore estimate a multilevel model of participation in training on the same variables as in Section 4.3, using an identity link function. This model is in general not consistent (Amemiya, 1977), but its results are quite similar to those of the logit model, so we are confident that we can use it as a cross-check for our logistic model. We obtain the following decomposition for the basic model and for the full model including establishment, worker and job characteristics (Equations 4.1 and 4.5):

Table 4.4: Intra-class Correlation Using a Linear Approximation (job-related training in general)

Level | Null | Occ
Establishment | 7.7% | 0.7%
Worker | 20.2% | 13.1%
Residual | 72.2% | 86.2%

Note: This table reports the estimated intra-class correlation (Equations 4.6 and 4.7) for a multilevel model of participation in training with an identity link function.

Unsurprisingly, the results of the variance decomposition differ somewhat in relative magnitude from those of the logit model because of a different estimate of the residual variance. In general, the analyses allow a similar interpretation: we are able to explain most of the variation between establishments (ca. 90%), but only about one third of the variance between workers within the same establishment.
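The approximate figures of 90% and one third can be checked roughly from the shares in Table 4.4. The check computes the proportional reduction of each level's share between the null model and the full model. It treats the shares as if the total variance were the same in both models, which is only an approximation. The function name is hypothetical.

```python
def share_reduction(null_share, full_share):
    """Proportional reduction of a level's variance share between two models."""
    return (null_share - full_share) / null_share

establishment = share_reduction(7.7, 0.7)   # Table 4.4, establishment level
worker = share_reduction(20.2, 13.1)        # Table 4.4, worker level
print(f"establishment: {establishment:.0%}, worker: {worker:.0%}")
```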

4.7.5 Figures and Tables

Figure 4.1: Task Performance and Training Participation
Coefficients are standardized and taken from the regression in Column 5 of Table 4.7, controlling for establishment, worker and job characteristics.

Figure 4.2: Training Participation Across Occupations
Coefficients are standardized and taken from the regression in Column 5 of Table 4.7, controlling for establishment, worker and job characteristics.

Table 4.5: Descriptive Statistics: Worker and Establishment Characteristics

Variable | Absolute | In Percent | Mean | Std. Dev.
Year 2006 | 5,618 | 28.2% | |
Year 2007 | 5,641 | 28.4% | |
Year 2009 | 4,271 | 21.5% | |
State S-H | 1,443 | 7.3% | |
State NRW | 6,590 | 33.1% | |
State Bavaria | 3,984 | 20.0% | |
State MV | 1,852 | 9.3% | |
State Saxony | 6,027 | 30.3% | |
Sector Industry | 5,137 | 25.8% | |
Sector Manuf. / Constr. | 5,667 | 28.5% | |
Sector Service | 4,681 | 23.5% | |
Sector Health | 4,411 | 22.2% | |
Ownership LLC | 11,315 | 56.9% | |
Ownership Corporat. | 2,997 | 15.1% | |
Ownership Public | 4,150 | 20.9% | |
Ownership Partnership | 1,434 | 7.2% | |
Founded after 1975 | 9,641 | 48.5% | |
Employees 100-200 | 2,752 | 13.8% | |
Employees 200-500 | 5,731 | 28.8% | |
Employees 500-2000 | 11,413 | 57.4% | |
Worker Council | 18,259 | 91.8% | |
Median Wage (€ per day) | | | 97.2 | 24.9
Women | 7,467 | 37.5% | |
Women x Child < 3 | 198 | 1% | |
Women x Child 3-5 | 418 | 2.1% | |
Women x Child 6-9 | 486 | 2.4% | |
Women x Child 10-18 | 1,355 | 6.8% | |
Cohabiting Household | 15,701 | 78.9% | |
Single Household | 3,101 | 15.6% | |
Other Household | 1,094 | 5.5% | |
Tenure (in Days) | | | 4,647.5 | 3,154.0
Tenure Squared | | | 3.15e7 | 3.71e7
Professional Qualification | 12,966 | 65.2% | |
Tertiary Degree | 4,494 | 22.6% | |
Adv. Prof. Qualification | 2,036 | 10.2% | |
Basic Education | 400 | 2.0% | |
Age | | | 45.0 | 9.7
Age Squared | | | 2,120.0 | 838.8
Health Very Good | 4,871 | 24.5% | |
Health Good | 10,720 | 53.9% | |
Health Medium | 908 | 4.6% | |
Health Bad | 151 | 0.8% | |
Health Very Bad | 358 | 1.8% | |
Limited Contract | 1,237 | 6.2% | |
Leader | 6,026 | 30.3% | |
Practice Job Learnt | 13,700 | 68.9% | |
Labor Attachment | | | 8.5 | 2.4
Full-Time Employed | 16,337 | 82.1% | |
Marginally Employed | 682 | 3.4% | |
Part-Time Employed | 2,448 | 12.3% | |
Working Time 43h+ | 429 | 2.2% | |

Table 4.6: Most Relevant Predictors for Training

Rank | Job-Related Training in General | Firm-financed Training | Worker co-financed Training
1 | Task Computer Usage | Task Computer Usage | Health Occupations
2 | Task Organizing / Making Plans | Managerial Resp. | Task Advising / Informing
3 | Task Advising/Counselling | Task Manufacturing | Age Squared
4 | Health Occupations | Dealer Occupation | Task Computer Usage
5 | Task Manufacturing | Establ. Wage Level | Educational or Social Jobs
6 | Educational or Social Jobs | Task Organizing | Technical-Physical Jobs
7 | Managerial Resp. | Task Advising/Counselling | Organisational Jobs
8 | Establ. Wage Level | Establ. Wage Trend | Year 2010
9 | Task Gathering Information | Engineering Occupation | Occupation Electr. Techn.
10 | Organisational Jobs | Task Negotiating | Age

Note: For each regression in Tables 4.7 to 4.9, Table 4.6 shows the ten variables with the largest absolute t-statistic (excluding the intercept). Variables in black have a positive coefficient, while red indicates negative coefficients. All variables are statistically significant at the 1% level.

Table 4.7: Participation in Job-related Training Courses

Dependent Variable: Participation in Job-related Training Courses

Variable | Null Model | +Firm Variables | +Worker Variables | +Firm & Worker Var. | +Firm & Worker + Occ.
Year 2008 | -0.15 (0.09) | -0.14 (0.09) | -0.13 (0.09) | -0.13 (0.09) | -0.13 (0.08)
Year 2009 | -0.01 (0.08) | -0.01 (0.08) | 0.01 (0.08) | 0.01 (0.08) | -0.02 (0.07)
Year 2010 | -0.16** (0.07) | -0.16** (0.07) | -0.13* (0.07) | -0.13* (0.07) | -0.14** (0.06)
State Saxony | | 0.16 (0.12) | | 0.09 (0.11) | 0.11 (0.09)
State Bavaria | | -0.01 (0.07) | | -0.01 (0.07) | 0.03 (0.05)
State NRW | | -0.09 (0.08) | | -0.06 (0.08) | -0.02 (0.06)
State MV | | -0.06 (0.09) | | -0.1 (0.09) | -0.02 (0.06)
Service Sector | | 0.32*** (0.06) | | 0.31*** (0.06) | 0.09* (0.05)
Public Sector | | 0.23*** (0.08) | | 0.19** (0.07) | 0.03 (0.06)
Employees 200-500 | | -0.05 (0.06) | | -0.05 (0.06) | 0.00 (0.04)
Employees 500-2000 | | -0.06 (0.08) | | -0.07 (0.07) | -0.04 (0.05)
Founded before 1975 | | -0.09 (0.08) | | -0.07 (0.08) | 0.00 (0.06)
Founded after 1991 | | 0.02 (0.06) | | 0.02 (0.06) | 0.03 (0.04)
Worker Council | | 0.00 (0.04) | | 0.02 (0.04) | 0.05 (0.04)
Median Wage | | 0.28*** (0.07) | | 0.2*** (0.06) | 0.07* (0.04)
Median Wage Trend | | 0.22*** (0.05) | | 0.19*** (0.05) | 0.14*** (0.03)
Wage Compression | | 0.21*** (0.07) | | 0.17** (0.07) | 0.05 (0.06)
Women | | | 0.00 (0.04) | -0.02 (0.04) | -0.07** (0.03)
Cohabiting | | | 0.04* (0.03) | 0.05* (0.03) | 0.01 (0.02)
Tertiary Educ. | | | 0.32*** (0.04) | 0.31*** (0.04) | 0.03 (0.04)
No Voc. Qualification | | | -0.16*** (0.03) | -0.15*** (0.03) | -0.06** (0.03)
Age | | | -0.02 (0.23) | 0.01 (0.23) | 0.37 (0.24)
Age Sq. | | | -0.27 (0.22) | -0.29 (0.22) | -0.59** (0.23)
Unempl. Exp. | | | -0.09** (0.03) | -0.08** (0.03) | -0.06 (0.04)
Labor Attachment | | | 0.09*** (0.02) | 0.08*** (0.02) | 0.03 (0.02)
Foreign Citizenship | | | -0.05* (0.03) | -0.04 (0.03) | -0.03 (0.02)
Health Status | | | -0.09*** (0.03) | -0.1*** (0.03) | -0.06** (0.02)
Limited Contract | | | | | -0.04 (0.03)
Managerial Resp. | | | | | 0.15*** (0.03)
Part-Time | | | | | -0.04 (0.03)
Tenure | | | | | -0.2 (0.21)
Tenure Squared | | | | | 0.1 (0.18)
Recently Hired | | | | | 0.04 (0.04)
Constant | -0.38*** (0.08) | -0.25*** (0.06) | -0.36*** (0.07) | -0.26*** (0.05) | -0.31*** (0.04)
Occupations | No | No | No | No | Yes
Tasks | No | No | No | No | Yes
σ² (Firm) | 0.65 (0.10) | 0.22 (0.06) | 0.48 (0.08) | 0.18 (0.05) | 0.05 (0.02)
σ² (Worker) | 1.5 (0.16) | 1.5 (0.16) | 1.29 (0.14) | 1.3 (0.14) | 0.79 (0.1)
N | 12560 | 12560 | 12560 | 12560 | 12560
Wald χ² | 207 | 358 | 606 | 743 | 2141
Log Likelihood | -8058 | -8006 | -7898 | -7852 | -7453

Note: Own calculations; multilevel logit estimation; cluster-robust standard errors (clustered at the firm level) in parentheses; mandatory job training courses are excluded. Regressions include a variable capturing the number of months about which a worker has been asked retrospectively. Significance levels: * significant at the 10% level, ** significant at the 5% level, *** significant at the 1% level.

Table 4.8: Participation in Training Courses in Working-Time

Dependent Variable: Participation in Job-related Training Courses in Working Time

Variable | Null Model | +Firm Variables | +Worker Variables | +Firm & Worker Var. | +Firm & Worker + Occ.
Year 2008 | -0.1 (0.08) | -0.1 (0.08) | -0.09 (0.08) | -0.09 (0.08) | -0.1 (0.08)
Year 2009 | 0.04 (0.08) | 0.05 (0.08) | 0.06 (0.07) | 0.06 (0.08) | 0.03 (0.07)
Year 2010 | -0.05 (0.06) | -0.05 (0.06) | -0.03 (0.06) | -0.03 (0.06) | -0.05 (0.06)
State Saxony | | 0.34** (0.13) | | 0.27** (0.13) | 0.23** (0.11)
State Bavaria | | 0.1 (0.09) | | 0.08 (0.09) | 0.13* (0.07)
State NRW | | -0.05 (0.11) | | -0.03 (0.10) | 0.00 (0.08)
State MV | | 0.06 (0.09) | | 0.01 (0.09) | 0.04 (0.07)
Service Sector | | 0.14** (0.07) | | 0.15** (0.06) | 0.07 (0.06)
Public Sector | | 0.11 (0.08) | | 0.1 (0.08) | 0.00 (0.07)
Employees 200-500 | | -0.01 (0.06) | | 0.00 (0.06) | 0.05 (0.05)
Employees 500-2000 | | -0.02 (0.08) | | -0.04 (0.07) | -0.01 (0.05)
Founded before 1975 | | 0.00 (0.09) | | 0.02 (0.08) | 0.04 (0.08)
Founded 1976-1991 | | 0.03 (0.06) | | 0.03 (0.06) | 0.04 (0.05)
Worker Council | | 0.03 (0.04) | | 0.03 (0.04) | 0.05 (0.04)
Median Wage | | 0.41*** (0.07) | | 0.33*** (0.07) | 0.2*** (0.06)
Median Wage Trend | | 0.2*** (0.06) | | 0.18*** (0.06) | 0.14*** (0.05)
Wage Compression | | 0.25*** (0.06) | | 0.22*** (0.06) | 0.13*** (0.05)
Women | | | -0.08** (0.03) | -0.09*** (0.03) | -0.09*** (0.03)
Cohabiting | | | 0.04 (0.03) | 0.04 (0.03) | 0.02 (0.03)
Tertiary Educ. | | | 0.22*** (0.04) | 0.21*** (0.04) | 0.00 (0.04)
No Voc. Qualification | | | -0.14*** (0.04) | -0.13*** (0.04) | -0.06 (0.04)
Age | | | -0.42* (0.22) | -0.38* (0.22) | -0.19 (0.21)
Age Sq. | | | 0.21 (0.21) | 0.18 (0.21) | 0.00 (0.21)
Unempl. Exp. | | | -0.1*** (0.03) | -0.08** (0.03) | -0.03 (0.04)
Labor Attachment | | | 0.08*** (0.02) | 0.07*** (0.02) | 0.02 (0.02)
Foreign Citizenship | | | -0.06* (0.03) | -0.05* (0.03) | -0.04 (0.03)
Health Status | | | -0.09*** (0.03) | -0.1*** (0.03) | -0.06** (0.02)
Limited Contract | | | | | -0.09*** (0.03)
Managerial Resp. | | | | | 0.13*** (0.03)
Part-Time | | | | | -0.12*** (0.04)
Tenure | | | | | -0.22 (0.21)
Tenure Squared | | | | | 0.19 (0.18)
Recently Hired | | | | | 0.02 (0.04)
Constant | -1.26*** (0.08) | -1.11*** (0.06) | -1.24*** (0.08) | -1.12*** (0.05) | -1.14*** (0.05)
Occupations | No | No | No | No | Yes
Tasks | No | No | No | No | Yes
σ² (Firm) | 0.56 (0.10) | 0.22 (0.05) | 0.46 (0.09) | 0.19 (0.05) | 0.09 (0.03)
σ² (Worker) | 1.18 (0.13) | 1.18 (0.13) | 1.06 (0.12) | 1.06 (0.12) | 0.74 (0.09)
N | 12560 | 12560 | 12560 | 12560 | 12560
Wald χ² | 73 | 181 | 264 | 397 | 1642
Log Likelihood | -7262 | -7216 | -7164 | -7123 | -6877

Note: Own calculations; multilevel logit estimation; cluster-robust standard errors (clustered at the firm level) in parentheses; mandatory job training courses are excluded. Regressions include a variable capturing the number of months about which a worker has been asked retrospectively. Significance levels: * significant at the 10% level, ** significant at the 5% level, *** significant at the 1% level.

Table 4.9: Participation in Training Courses Overlapping With Leisure-Time

Dependent Variable: Participation in Job-related Training Courses at Least Partly in Leisure Time

Variable | Null Model | +Firm Variables | +Worker Variables | +Firm & Worker Var. | +Firm & Worker + Occ.
Year 2008 | -0.17* (0.09) | -0.17* (0.09) | -0.16* (0.09) | -0.16 (0.09) | -0.16* (0.09)
Year 2009 | -0.13 (0.08) | -0.13 (0.08) | -0.11 (0.08) | -0.11 (0.08) | -0.14 (0.08)
Year 2010 | -0.26*** (0.07) | -0.26*** (0.07) | -0.22*** (0.07) | -0.22*** (0.07) | -0.22*** (0.07)
State Saxony | | -0.07 (0.16) | | -0.11 (0.15) | -0.05 (0.14)
State Bavaria | | -0.08 (0.10) | | -0.07 (0.10) | -0.09 (0.09)
State NRW | | -0.04 (0.11) | | 0.01 (0.11) | 0.03 (0.11)
State MV | | -0.11 (0.13) | | -0.13 (0.12) | -0.05 (0.11)
Service Sector | | 0.36*** (0.07) | | 0.29*** (0.07) | 0.03 (0.07)
Public Sector | | 0.18* (0.10) | | 0.13 (0.09) | 0.04 (0.07)
Employees 200-500 | | -0.08 (0.07) | | -0.08 (0.06) | -0.08 (0.06)
Employees 500-2000 | | -0.08 (0.09) | | -0.09 (0.09) | -0.06 (0.07)
Founded before 1975 | | -0.18 (0.12) | | -0.16 (0.12) | -0.07 (0.11)
Founded 1976-1991 | | -0.04 (0.08) | | -0.03 (0.07) | -0.01 (0.06)
Worker Council | | 0.00 (0.05) | | 0.02 (0.05) | 0.06 (0.04)
Median Wage | | -0.05 (0.10) | | -0.09 (0.09) | -0.15* (0.08)
Median Wage Trend | | 0.13** (0.07) | | 0.11* (0.06) | 0.07 (0.06)
Wage Compression | | 0.13 (0.08) | | 0.07 (0.08) | -0.02 (0.07)
Women | | | 0.21*** (0.05) | 0.17*** (0.05) | 0.08* (0.04)
Cohabiting | | | 0.02 (0.03) | 0.02 (0.03) | -0.01 (0.04)
Tertiary Educ. | | | 0.28*** (0.04) | 0.28*** (0.04) | 0.07* (0.04)
No Voc. Qualification | | | -0.18*** (0.05) | -0.18*** (0.05) | -0.08* (0.04)
Age | | | 0.58** (0.28) | 0.59** (0.28) | 0.87*** (0.31)
Age Sq. | | | -0.85*** (0.28) | -0.86*** (0.28) | -1.06*** (0.30)
Unempl. Exp. | | | -0.03 (0.04) | -0.03 (0.04) | -0.05 (0.04)
Labor Attachment | | | 0.09*** (0.03) | 0.09** (0.04) | 0.04 (0.03)
Foreign Citizenship | | | -0.02 (0.04) | -0.02 (0.04) | -0.01 (0.03)
Health Status | | | -0.05* (0.03) | -0.05* (0.03) | -0.02 (0.03)
Limited Contract | | | | | 0.05 (0.03)
Managerial Resp. | | | | | 0.08** (0.04)
Part-Time | | | | | 0.09** (0.04)
Tenure | | | | | 0.11 (0.21)
Tenure Squared | | | | | -0.19 (0.20)
Recently Hired | | | | | 0.1** (0.04)
Constant | -2.02*** (0.10) | -1.97*** (0.10) | -2*** (0.09) | -1.99*** (0.09) | -2.02*** (0.08)
Occupations | No | No | No | No | Yes
Tasks | No | No | No | No | Yes
σ² (Firm) | 0.66 (0.11) | 0.4 (0.09) | 0.47 (0.09) | 0.34 (0.08) | 0.18 (0.06)
σ² (Worker) | 2.1 (0.24) | 2.1 (0.24) | 1.92 (0.22) | 1.91 (0.22) | 1.52 (0.19)
N | 12560 | 12560 | 12560 | 12560 | 12560
Wald χ² | 149 | 240 | 394 | 491 | 1185
Log Likelihood | -5849 | -5825 | -5747 | -5731 | -5538

Note: Own calculations; multilevel logit estimation; cluster-robust standard errors (clustered at the firm level) in parentheses; mandatory job training courses are excluded. Regressions include a variable capturing the number of months about which a worker has been asked retrospectively. Significance levels: * significant at the 10% level, ** significant at the 5% level, *** significant at the 1% level.

Table 4.10: Participation in Training: Intensity

Dependent Variable: Number of Training Courses (incl. non-participants)

Variable | Null Model | +Firm Variables | +Worker Variables | +Firm & Worker Var. | +Firm & Worker + Occ.
Year 2008 | -0.19** (0.08) | -0.18** (0.08) | -0.18** (0.08) | -0.18** (0.08) | -0.17** (0.08)
Year 2009 | 0.03 (0.08) | 0.03 (0.08) | 0.05 (0.08) | 0.04 (0.08) | 0.02 (0.08)
Year 2010 | -0.15* (0.08) | -0.15* (0.08) | -0.12 (0.08) | -0.12 (0.08) | -0.13 (0.08)
State Saxony | | 0.3** (0.15) | | 0.24* (0.14) | 0.26** (0.10)
State Bavaria | | 0.1 (0.12) | | 0.09 (0.12) | 0.15* (0.09)
State NRW | | -0.02 (0.12) | | 0.02 (0.11) | 0.09 (0.08)
State MV | | 0.05 (0.19) | | -0.01 (0.19) | 0.11 (0.13)
Service Sector | | 0.48*** (0.08) | | 0.45*** (0.08) | 0.12** (0.06)
Public Sector | | 0.32*** (0.10) | | 0.26*** (0.10) | 0.01 (0.08)
Employees 200-500 | | -0.04 (0.07) | | -0.04 (0.07) | 0.01 (0.05)
Employees 500-2000 | | -0.04 (0.10) | | -0.07 (0.09) | -0.05 (0.07)
Founded before 1975 | | -0.09 (0.10) | | -0.06 (0.10) | 0.01 (0.07)
Founded 1976-1991 | | -0.04 (0.10) | | -0.04 (0.10) | 0.04 (0.07)
Worker Council | | 0.14 (0.14) | | 0.16 (0.13) | 0.17* (0.09)
Median Wage | | 0.01*** (0.00) | | 0.01** (0.00) | 0.00* (0.00)
Median Wage Trend | | 0.11*** (0.02) | | 0.1*** (0.02) | 0.09*** (0.02)
Wage Compression | | 0.84*** (0.27) | | 0.68*** (0.24) | 0.26 (0.19)
Women | | | 0.07* (0.04) | 0.05 (0.04) | 0.05 (0.04)
Cohabiting | | | 0.04 (0.04) | 0.04 (0.04) | 0.01 (0.03)
ISCED High | | | 0.36*** (0.04) | 0.35*** (0.04) | 0.08** (0.04)
ISCED Low | | | -1.04*** (0.17) | -1.01*** (0.17) | -0.6*** (0.16)
Age | | | 0.24** (0.12) | 0.25** (0.12) | 0.37*** (0.13)
Age Sq. | | | -0.38*** (0.11) | -0.39*** (0.11) | -0.48*** (0.12)
Unempl. Exp. | | | -0.03 (0.04) | -0.02 (0.04) | 0.00 (0.05)
Labor Attachment | | | 0.07*** (0.01) | 0.07*** (0.01) | 0.04*** (0.01)
Foreign Citizenship | | | -0.11 (0.10) | -0.09 (0.10) | -0.04 (0.10)
Health Status | | | -0.04*** (0.02) | -0.04*** (0.02) | -0.02* (0.01)
Limited Contract | | | | | -0.05 (0.06)
Leader | | | | | 0.18*** (0.03)
Part-Time | | | | | -0.03 (0.04)
Tenure | | | | | -0.01 (0.02)
Tenure Squared | | | | | 0.00 (0.00)
Recently Hired | | | | | 0.09* (0.05)
Constant | -0.38*** (0.07) | -1.85*** (0.33) | -0.37*** (0.07) | -1.62*** (0.31) | -1.7*** (0.26)
Occupations | No | No | No | No | Yes
Tasks | No | No | No | No | Yes
σ² (Firm) | 0.31 (0.05) | 0.12 (0.03) | 0.25 (0.04) | 0.11 (0.02) | 0.04 (0.01)
σ² (Worker) | 0.34 (0.02) | 0.34 (0.02) | 0.29 (0.02) | 0.3 (0.02) | 0.2 (0.02)
N | 13089 | 13089 | 13089 | 13089 | 13089
Wald χ² | 396 | 680 | 859 | 1200 | 3963
Log Likelihood | -16327 | -16270 | -16132 | -16081 | -15590

Note: Own calculations; multilevel Poisson estimation; cluster-robust standard errors (clustered at the firm level) in parentheses; mandatory job training courses are excluded. Regressions include a variable capturing the number of months about which a worker has been asked retrospectively. Significance levels: * significant at the 10% level, ** significant at the 5% level, *** significant at the 1% level.

Table 4.11: Participation in Training: Multinomial Logit

Dependent Variable: Training Participation in ...

Variable | Firm-financed Training | Worker co-financed Training | Both Forms of Training
Year 2008 | -0.14 (0.17) | -0.24 (0.20) | -0.9*** (0.34)
Year 2009 | 0.15 (0.16) | -0.14 (0.19) | -0.61* (0.31)
Year 2010 | -0.10 (0.16) | -0.45** (0.19) | -0.8*** (0.30)
State Saxony | 0.33* (0.18) | -0.09 (0.24) | 0.31 (0.29)
State Bavaria | 0.26* (0.15) | -0.16 (0.20) | 0.30 (0.25)
State NRW | -0.08 (0.16) | -0.02 (0.19) | 0.20 (0.25)
State MV | -0.10 (0.18) | -0.09 (0.26) | 0.33 (0.30)
Service Sector | 0.27*** (0.10) | 0.03 (0.11) | 0.07 (0.14)
Public Sector | -0.10 (0.16) | 0.01 (0.14) | -0.09 (0.20)
Employees 100-200 | -0.17* (0.09) | 0.10 (0.10) | 0.13 (0.13)
Employees 500-2000 | -0.20** (0.09) | -0.01 (0.10) | 0.08 (0.12)
Founded before 1975 | 0.16 (0.14) | -0.10 (0.16) | -0.07 (0.17)
Founded after 1991 | 0.17 (0.11) | 0.02 (0.14) | -0.04 (0.15)
Worker Council | 0.14 (0.15) | 0.17 (0.13) | 0.32 (0.20)
Median Wage | 0.01** (0.00) | 0.00 (0.00) | 0.00 (0.00)
Median Wage Trend | 0.10*** (0.03) | 0.06 (0.03) | 0.13*** (0.03)
Wage Compression | 0.06 (0.05) | -0.08 (0.06) | 0.06 (0.08)
Women | -0.24*** (0.06) | 0.01 (0.08) | 0.18 (0.12)
Cohabiting | 0.04 (0.06) | 0.02 (0.08) | -0.01 (0.10)
ISCED High | -0.03 (0.10) | 0.09 (0.09) | 0.20* (0.12)
ISCED Low | -0.34 (0.27) | -0.38 (0.30) | -14.03*** (0.29)
Age | 0.04 (0.24) | 0.92*** (0.32) | 0.39 (0.44)
Age Sq. | -0.22 (0.23) | -1.04*** (0.30) | -0.78* (0.42)
Unempl. Exp. | -0.12 (0.10) | -0.25** (0.12) | -0.21 (0.14)
Labor Attachment | 0.03 (0.03) | 0.03 (0.03) | 0.09 (0.06)
Foreign Citizenship | -0.19 (0.21) | 0.00 (0.24) | -0.57 (0.65)
Poor Health Status | -0.06 (0.06) | -0.02 (0.07) | -0.30** (0.12)
Limited Contract | -0.36** (0.14) | 0.12 (0.13) | -0.02 (0.18)
Leader | 0.32*** (0.07) | 0.23*** (0.09) | 0.42*** (0.11)
Part-Time | -0.27*** (0.10) | 0.14 (0.09) | 0.00 (0.14)
Tenure | -0.21 (0.21) | 0.07 (0.21) | -0.27 (0.32)
Tenure Squared | 0.12 (0.19) | -0.21 (0.19) | 0.21 (0.28)
Recently Hired | 0.04 (0.13) | 0.24* (0.13) | 0.25 (0.23)
Constant | -2.89*** (0.38) | -2.58*** (0.50) | -5.70*** (0.67)
Occupations | Yes
Tasks | Yes
N | 12560
Pseudo R² | 0.102
Log Likelihood | -12525

Note: Multinomial logit estimation; cluster-robust standard errors (clustered at the firm level) in parentheses; mandatory job training courses are excluded. The regression includes a variable capturing the number of months about which a worker has been asked retrospectively. The reference group consists of workers who do not participate in training. Significance levels: * significant at the 10% level, ** significant at the 5% level, *** significant at the 1% level.

5 Gender Differences in Wages and Training

Susanne Steffes (Centre for European Economic Research, ZEW)

Arne Jonas Warnke (Centre for European Economic Research, ZEW)

JEL-Classification: I24, J16, J24, M53

Keywords: Human Capital, Training, Gender Wage Gap (GWG), Linked-Employer-Employee Data (LEE), Blinder-Oaxaca Decomposition, Unobserved Heterogeneity

Acknowledgements: Special thanks go to Uschi Backes-Gellner, Sarra Ben Yahmed, Daniel Dietz, Bernd Fitzenberger, Katja Görlitz, François Laisney, Heiko Stüber, Elsbeth Wright, Thomas Zwick and seminar participants at the 18th Colloquium on Personnel Economics, EEA Conference 2015, and at workshops at the Humboldt University, University of Würzburg, ZEW Mannheim and of the DFG SPP1764. We are grateful to Alina Bartscher, Maximilian Huppertz, Thies Wollesen and Steffen Wyngra for their outstanding research assistance.


Abstract: There is an extensive literature on the gender wage gap and the reasons behind it. Some studies have pointed out prevailing gender differences in participation in job-related training. In this study, we shed new light on these gender differences in training and their effect on wages. Our central finding is that there is a substantial training gap between male and female employees which can be explained neither by job segregation, firm or worker characteristics, nor by unobserved productivity. We suggest that this result may be explained by statistical discrimination, signaling or preferences. Nevertheless, gender differences in training do not provide an explanation for the gender wage gap, since there are counterbalancing effects. On the one hand, female workers receive less firm-sponsored training, but these courses do not have large effects on wages in any case. On the other hand, women more often co-sponsor training themselves. Even though these courses lead to higher wages, the returns for women appear to be smaller than those for male workers.

5.1 Introduction

There is a significant gender wage gap (GWG) in the labour market. In 2015, for example, the raw gap in gross hourly wages between men and women in Germany was 21% (Statistisches Bundesamt, 2016). Many hypotheses have been developed and tested with the aim of explaining such differences (for a recent review see Blau and Kahn, 2016). The relevant studies have often focused on differences in human capital, namely in workers’ level of schooling and work experience, and on occupational segregation.117 Such gender differences, however, remain only partially explained. Indeed, the GWG is noticeable even between women and men who have comparable levels of education and who are employed in similar jobs. In Germany, for example, around one third of the GWG cannot be explained by observable differences such as working hours, managerial positions or sector (Statistisches Bundesamt, 2016). At the same time, a large body of theoretical literature has highlighted the importance of investments in on-the-job (post-schooling) training for wages and employment chances. If investments in training and the respective returns differ between the sexes, this can be expected to contribute significantly to the gender differences seen in the labour market. Detailed training information is, however, not available in most datasets. Empirical literature on this topic is therefore scarce, and most existing studies of the GWG do not consider participation in training.

Previous research into on-the-job human capital investments made by male and female workers has indeed indicated that investment behaviours differ between the sexes (Bassanini et al., 2005). Several studies focusing on Germany, for example, have found lower rates of participation in job-related training amongst women (Pischke, 2001; Fitzenberger and Muehler, 2014). Having said this, male and female rates of participation in training do not always differ to a significant extent (Leber and Müller, 2008; Grund and Martin, 2012). Moreover, these differences vary with age (Fitzenberger and Muehler, 2014). It is also crucial to take into account the nature of the tasks which workers actually perform. When considering only firm and worker characteristics, Chapter 4 finds no difference in the training behaviour of male and female workers. Once detailed job content is taken into account, however, lower participation rates in job-related training are observed amongst female workers.

In this study, we analyse whether the gap between the sexes in terms of their investments in job-related training, which we term the gender training gap (GTG), can help to explain the gender wage gap (GWG). We show that there are considerable differences between the male and female workforce in terms of their participation in different forms of training. Theoretical models show that human capital investments can generally be expected to result in higher wages (Becker, 1962). We therefore analyse whether not only participation in training, but also wage returns to training differ between men and women. The aim of this study is to investigate both the drivers of the GTG and the respective gap in wage returns to training. This information will then enable us to analyse how these phenomena are related to the GWG.

117 See, for example, Macpherson and Hirsch (1995), Fitzenberger and Kunze (2005) or Addison et al. (2014). More recent studies have tended to focus on psychological attributes or non-cognitive skills, which “account for a small to moderate portion of the gender pay gap” (Blau and Kahn, 2016).

The decision to invest in work-related training is generally made by either the firm or the worker (or jointly by the two parties). In addition, job content is closely associated with participation in training (see Chapter 4). A comprehensive study on the determinants of such investments should therefore take the heterogeneity of firms, workers and jobs into account. We perform a statistical decomposition of the GTG in order to investigate to what extent firm, worker and job characteristics might explain the difference between the male and female workforces’ participation in training. For this purpose, we use linked employer-employee data which contain detailed (retrospective) information about individuals’ recent training history. In addition, the data include information on firm, worker and job characteristics. This dataset provides us with information on almost all of the possible drivers of training participation and wage differences which have been identified in previous studies. Furthermore, the survey data have been merged with administrative records. This allows us to observe detailed employment biographies for each individual, as well as aggregate information on a firm’s total workforce. Contributing to the literature, we provide an in-depth investigation of the firm’s role in determining the allocation of work-related training to male and female workers. Finally, we suggest a novel approach to the estimation of wage returns to training.
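The keywords of this chapter name the Blinder-Oaxaca decomposition as the statistical decomposition used. As a minimal sketch, and not necessarily the exact specification of Section 5.4, the following Python code performs a two-fold decomposition of a mean gap in training participation. It assumes a linear probability model and uses the male coefficients as the reference structure; both choices, the function names and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def oaxaca_blinder(X_m, y_m, X_f, y_f):
    """Two-fold decomposition of the mean gap y_m.mean() - y_f.mean().

    X_m and X_f are design matrices whose first column is a constant;
    male coefficients are used as the (assumed) reference structure.
    """
    beta_m, *_ = np.linalg.lstsq(X_m, y_m, rcond=None)   # OLS for men
    beta_f, *_ = np.linalg.lstsq(X_f, y_f, rcond=None)   # OLS for women
    xbar_m, xbar_f = X_m.mean(axis=0), X_f.mean(axis=0)
    gap = y_m.mean() - y_f.mean()
    explained = (xbar_m - xbar_f) @ beta_m       # differences in characteristics
    unexplained = xbar_f @ (beta_m - beta_f)     # differences in coefficients
    return gap, explained, unexplained

# Synthetic example: participation depends on a single full-time indicator.
rng = np.random.default_rng(0)

def simulate(n, share_fulltime, intercept):
    fulltime = rng.binomial(1, share_fulltime, n)
    X = np.column_stack([np.ones(n), fulltime])
    y = rng.binomial(1, intercept + 0.10 * fulltime).astype(float)
    return X, y

X_m, y_m = simulate(2000, 0.9, 0.15)
X_f, y_f = simulate(2000, 0.6, 0.12)
gap, explained, unexplained = oaxaca_blinder(X_m, y_m, X_f, y_f)
print(f"gap={gap:.3f} explained={explained:.3f} unexplained={unexplained:.3f}")
```

Because OLS with a constant reproduces each group’s mean outcome, the explained and unexplained components add up exactly to the raw gap. With a nonlinear model such as a probit, a nonlinear variant of the decomposition would be needed instead.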

Recent studies have emphasised the importance of information about the initiative and financing of training investments for the analysis of the GTG (e.g. Simpson and Stroh, 2002 for the US, or Puhani and Sonderhof, 2011 for Germany). Differences in the initiative and sponsorship of training provided to male and female employees can mainly be attributed to a firm’s monopsony power or to discriminatory behaviour. We contribute to the literature by proposing a new measure for participation in job-related training. We differentiate between training courses initiated and fully sponsored by the firm, which take place wholly within working hours, and training courses initiated by the worker, which are conducted at least partially outside of the individual’s usual working hours.

Empirical evidence has shown that the GTG cannot be fully explained by factors in the working environment or by differences in labour market attachment between men and women (e.g. Royalty, 1996). We therefore examine whether heterogeneity which is not captured by observable differences might also play a role. The recent literature on gender differences in the labour market, for example, highlights the role of heterogeneity amongst women. Mulligan and Rubinstein (2008) show that in the US, highly educated and highly productive women are today more attached to the labour market than less educated and less productive women. Empirical evidence on the GWG in Germany shows that women in male-dominated jobs are often positively selected compared to their male peers (Ludsteck, 2014). It has been shown that participation in training correlates positively with initial education levels and other sources of productivity (Bassanini et al., 2005). We therefore expect investments in training to be higher amongst highly productive workers. In Germany at least, female labour attachment correlates closely with a worker’s initial level of education and productivity. This also applies to men, though not to the same extent.118 These gender differences in labour attachment between more and less productive workers can also be expected to result in training differences between men and women within these groups. Because the gender difference in labour attachment is larger amongst less productive workers, the GTG might also be larger within this group (see also the related discussion in Blau and Kahn, 2016). We address this question by investigating the GTG for different groups of workers.

The paper is structured as follows. In Section 5.2 we summarise the most important existing theoretical and empirical work in this field. In Section 5.3, we introduce our data. In Section 5.4 we show and discuss evidence for the drivers of the GTG and in Section 5.5, we make the respective link to wage returns. Our conclusions are detailed in Section 5.6.

5.2 Literature

Economic theories point to a number of possible reasons for the differences in training participation between the male and female workforces (for reviews see Altonji and Blank, 1999; Azmat and Petrongolo, 2014). Many of the mechanisms identified within the relevant theories are similar, if not identical, to those recognised as driving the GWG. Whilst some models focus on differences between individuals or jobs (human capital theory and occupational segregation, for example), other models consider firms’ behaviour (e.g. statistical or taste-based discrimination) and the influence that an individual firm may exert over its workers (e.g. monopsony power).

118 According to the Eurostat "Labor Force Survey" database, for example, the employment rate amongst individuals aged 25-64 holding a tertiary degree in Germany in 2015 was 91.3% for men and 86.4% for women. The employment rates amongst less educated men and women who hold no vocational qualification were 68.0% and 51.5%, respectively.

In human capital models, gender differences in further training are explained by the lower labour market attachment of women. This is also the case for models of job and occupational segregation, phenomena which are closely related to human capital theory. In such models, women either self-select into low-wage positions in which career interruptions are less costly (Polachek, 1981), or firms assign women to jobs which require less investment in on-the-job training and which, at the same time, offer fewer opportunities for career advancement (Lazear and Rosen, 1990; Barron et al., 1993). The reason for this is the lower labour market attachment expected amongst the female workforce due to anticipated career interruptions or women’s decision to work fewer hours. Empirical studies have indeed shown that men and women tend to be employed in different working environments and to perform different tasks. Goldin (2014) argues that a significant part of the GWG can be attributed to the fact that in certain occupations, the returns to working overtime and/or the penalties for interruptions in employment are large. There are also differences in the work content of men and women employed in the same occupations. Black and Spitz-Oener (2010) examine the role that technological change between and within occupations over the last few decades has played in narrowing the GWG. Simpson and Stroh (2002) show that occupational segregation has a direct impact on the GTG. Using US data, they find that one third of the positive GTG (that is, where women participate in training more than men) can be explained by gender segregation across jobs.119

Aside from occupational segregation, labour market attachment also directly promotes investments in on-the-job training (Becker, 1964). This applies not only to current labour attachment, but also to expected or future labour attachment. On average, women are more engaged in home production than men. Women therefore exhibit higher turnover and tend to work fewer hours. A shorter amortization period for training investments is therefore a probable explanation for the GTG. Royalty (1996) finds that 25% of the negative GTG (where women participate less in training than men) seen for formal, in-house training courses can be explained by gender differences in turnover probabilities. This does not, however, give us the full picture: the remaining GTG is still considerable.

Other studies consider family status and motherhood in order to explain the GTG (Huber and Huemer, 2015). Using data for Germany, Fitzenberger and Muehler (2014) find a negative GTG which is related to childbearing but concerns all women in the relevant age group. Drawing on the personnel records of a single firm, they find age-specific gender differences in rates of participation in training. The raw participation gap in formal employer-provided training is greatest when male and female workers are aged 35. Puhani and Sonderhof (2011) analyse the effects of an extension of maternity leave in Germany, which prolonged career interruptions. They find that this regulatory change led to a decline in the number of training courses which employers offer, even for young women without children. These findings are consistent with models of statistical discrimination. According to these models, firms use sex as a proxy for a worker’s unknown individual labour attachment. A firm thereby relies on information concerning the behaviour of previous female employees in order to predict the future labour market attachment of currently employed female workers (Phelps, 1972; Arrow, 1973).

119 Other studies also find that occupational segregation explains around one third of the GTG (Grönlund, 2011).

Assuming symmetric information, expected labour market attachment should have a similar effect in determining the investments in training made by both firms and workers. Both parties must predict the individual worker’s future labour attachment on the basis of information which is currently available. These predictions are noisy because future labour attachment is unknown. In the case of asymmetric information, predictions made by the employer are, in addition, less reliable than those made by the individual (Altonji and Blank, 1999). Let us assume that the true labour attachment of a female worker is greater than the predicted group average. In this case, she is likely to be subject to underinvestment by the firm. We therefore presume that she has an incentive to increase her own investments in order to enhance her productivity and/or to signal her engagement and willingness to the firm (Bassanini et al., 2005).120 This example shows that proxies for low labour attachment can be negatively correlated with firm-sponsored training whilst, at the same time, being positively correlated with worker-sponsored training.
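The prediction problem described above can be illustrated with the textbook normal signal-extraction version of statistical discrimination, in the spirit of Phelps (1972). This is a sketch under stated assumptions rather than a model estimated in this chapter: a worker’s true labour attachment $a$ is assumed to be normally distributed around a group mean $\mu_g$ with variance $\sigma_a^2$, and the firm is assumed to observe a noisy individual signal $s = a + \varepsilon$ with independent noise $\varepsilon \sim N(0, \sigma_\varepsilon^2)$.

```latex
\begin{align*}
  \mathbb{E}[a \mid s, g] &= (1-\gamma)\,\mu_g + \gamma\, s,
  \qquad \gamma = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_\varepsilon^2} \in (0,1), \\
  \mathbb{E}\big[\,\mathbb{E}[a \mid s, g] \,\big|\, a\big]
  &= (1-\gamma)\,\mu_g + \gamma\, a
   = a - (1-\gamma)\,(a - \mu_g).
\end{align*}
```

For a worker whose true attachment exceeds the group mean ($a > \mu_g$), the firm’s prediction falls short of $a$ on average by $(1-\gamma)(a-\mu_g)$, and the shortfall grows as the firm’s signal becomes noisier (larger $\sigma_\varepsilon^2$, smaller $\gamma$). A worker who knows $a$ better than the firm does not share this bias, which is one way to read the incentive for self-financed training described above.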

From a firm perspective, it is important to consider not only labour attachment in general, but also the male and female labour supply to any one firm. The role of part-time work provides a good example of this. Reductions in working hours generally serve as a proxy for lower labour market attachment. Working with data from Switzerland, however, Backes-Gellner et al. (2014) identify no, or only small, differences in participation in employer-provided training between women working full-time and women working part-time. Picchio and Van Ours (2016) find that the same is true for the female workforce in the Netherlands. Backes-Gellner et al. (2014) provide a model of statistical discrimination in which a woman’s choice to work part-time in a given firm is only a noisy predictor of her future attachment to that same firm. This explains why there are only minor differences in training provision between full- and part-time workers.

120 It could, of course, be the case that female workers are credit constrained, or are discouraged from investing in training because they anticipate discrimination by the firm.

We provide here a further explanation for the difference in training investments seen between full- and part-time workers. Under labour market frictions such as search and mobility costs, the current firm exerts monopsony power. This means that the firm can reap a higher share of the expected payoffs and, in addition, poaching is less likely (Acemoglu and Pischke, 1998). Accordingly, a higher level of firm attachment gives firms an incentive to invest more in the human capital of their current workers. It should also be noted that, in general, women show a greater level of firm attachment than men because their labour supply to the firm is less elastic (Hirsch et al., 2010).

We suggest here that part-time work can function as a proxy for a high level of firm attachment (an argument which has also been made by Picchio and Van Ours, 2016). Part-time workers tend to favour greater temporal and geographical flexibility, which in turn limits their employment prospects at other firms. Part-time work can therefore serve both as a proxy for low labour market attachment and as a proxy for a high degree of firm attachment, with opposing predictions for training investments. Although it is not clear whether part-time workers generally participate less in training than full-time workers, we expect to see distinct differences between full- and part-time workers in their participation rates in firm-sponsored and worker-sponsored training. The labour supply to a single firm matters considerably more for participation in firm-sponsored training courses, because firms want to ensure that their training investments pay off.

The role of the firm in perpetuating differences in male and female investments in on-the-job training has in general received little attention in the literature. Studies on wage inequalities highlight the importance of firms in general and, in particular, the significance of sorting workers into firms (Card et al., 2016; Card et al., 2016). In addition, differences in the capacity of firms to exert monopsony power or in taste-based discrimination may lead to differences between firms in terms of the investments they make in on-the-job training (Blau and Kahn, 2016). While it is empirically difficult to disentangle discriminatory behaviour from other mechanisms such as monopsony power, linked employer-employee data can help us to understand the differences across and within firms (Webber, 2016).

The wage returns that workers gain from training investments depend on the allocation of costs as well as benefits. If the firm bears the majority of investment costs, for firm-specific training courses for example, it will also most likely reap the benefits in the form of higher productivity, and the worker’s share of the benefits may be relatively low. This might explain why recent studies have found low wage returns to training attendance. By the same logic, the wage effect should be greater for worker-sponsored training.

The question remains, however, why the returns to training should differ between men and women at all. Gender differences in returns to training have received far less attention than differences in returns to schooling. On the one hand, one might argue that returns are lower for female employees if they are the result of a negotiation process between the firm and its workers, since women have been shown to have less bargaining power than men (compare Card et al., 2016). On the other hand, women’s participation in on-the-job training might be the result of a process of positive selection, whereby participating women are generally more productive than male participants. In this case, women might derive greater benefits from participation in training than men, given the positive correlation between a high level of skill/ability and returns to training.

Low wage returns to participation in training have recently been identified in the empirical literature by, among others, Leuven and Oosterbeek (2008), Goerlitz (2011), Schwerdt et al. (2012) and Görlitz and Tamm (2016).121 These studies do not, however, differentiate between firm- and worker-sponsored training. Pischke (2001) finds higher returns to worker-sponsored training than to firm-funded training, particularly for women.

5.3 Data

We use the German linked employer-employee dataset WeLL, Berufliche Weiterbildung als Bestandteil Lebenslangen Lernens (further training as a part of lifelong learning, see Huber and Schmucker, 2012). This survey focuses on participation in training and comprises four waves, conducted between 2007 and 2010. Data were collected by the Research Data Centre of the Federal Employment Agency at the Institute for Employment Research (FDZ). In the following, we briefly describe the dataset; a more detailed description of WeLL is given in Chapter 4. WeLL covers 149 establishments from both the manufacturing and service sectors, for which information is available from the IAB Establishment Panel (an annual employer survey, see Kölling, 2000). These establishments have a workforce of between 100 and 2,000 employees. They operate in five German states, including two in East Germany. An employee survey was conducted, consisting of more than 18,500 interviews with 7,350 respondents.122

121 Earlier studies have often found very high returns to training using instrumental variable approaches. More recent studies, however, use comparison group approaches (see Section 5.5) or randomised voucher experiments.

WeLL contains comprehensive information on workers’ recent participation in training. The questionnaire includes retrospective questions about individuals’ participation in formal training courses. More specifically, questions concerned the duration and content of courses, as well as the source of initiative for training, the source of finance and whether courses overlapped with leisure time.123 In addition, the survey included questions regarding socio-demographic characteristics, working time and reasons for cancelling training courses. Finally, respondents were asked about their health, labour attachment and career aspirations.

Survey data in WeLL have been linked to the administrative records of each employee (Schmucker et al., 2014). These records contain information on daily wages, the duration of employment and periods of unemployment dating back to the 1970s (or to the early 1990s for individuals from the former GDR).124 This information is available not only for participants in the WeLL panel, but for all workers in a firm. This gives us information about the size of an establishment, its average wage and the establishment-specific gender wage gap or turnover rate. A particular advantage of the WeLL dataset is the availability of information about hours worked, which enables us to compute hourly wages.

In line with previous literature, such as Arulampalam et al. (2004), we restrict the sample to workers aged between 25 and 54. Older workers are excluded because training rates drop sharply as retirement approaches (Zwick, 2015). Our analyses focus on workers who are still employed in the establishment included in the WeLL survey, because we lack information, such as the sector, on establishments which do not participate in the IAB Establishment Panel. In addition, we exclude apprentices and focus our analyses on individuals with German citizenship only. We also exclude individuals for whom information on the relevant variables in Table 1a or Table 1b is missing. In line with previous studies, we also restrict the sample to employees who work at least 15 hours a week (Arulampalam et al., 2004).125 This leaves us with a sample of 149 establishments and 9,905 individuals, including 3,511 women (35.4%).
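The sample restrictions above amount to a conjunction of simple filters. The following is a minimal Python sketch; the record type and its field names are hypothetical (the WeLL variables are named differently), and the missing-covariate check is collapsed into a single flag for brevity.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    # Hypothetical field names for illustration; not the WeLL variable names.
    age: int
    weekly_hours: float
    german_citizen: bool
    apprentice: bool
    still_in_survey_firm: bool
    has_missing_covariates: bool

def in_estimation_sample(w: Worker) -> bool:
    """Apply the sample restrictions described in the text."""
    return (
        25 <= w.age <= 54                 # prime-age workers only
        and w.weekly_hours >= 15          # at least 15 hours a week
        and w.german_citizen              # German citizens only
        and not w.apprentice              # apprentices excluded
        and w.still_in_survey_firm        # still employed in the surveyed establishment
        and not w.has_missing_covariates  # complete information on covariates
    )

workers = [
    Worker(40, 38.5, True, False, True, False),   # kept
    Worker(57, 40.0, True, False, True, False),   # too old
    Worker(30, 10.0, True, False, True, False),   # fewer than 15 hours
]
sample = [w for w in workers if in_estimation_sample(w)]
print(len(sample))  # 1
```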

122 These numbers exclude individuals who did not agree to their survey data being linked to social security records (see Chapter 6).
123 Detailed information is available for up to three training courses per wave.
124 Periods of employment which are not subject to social security contributions, such as periods of self-employment or employment in the civil service, are not observed.
125 In addition, we discuss further analyses based on the sample of full-time workers only.

Information on employees’ daily wages is drawn from social security data. In order to calculate an hourly wage, we merge information about the daily wages of a given employee with information regarding the agreed number of weekly working hours. Daily wages, as given in social security data, are top-coded. We therefore impute wages above the social security threshold using tobit regression, taking into account the uncertainty of the imputation procedure and possible heteroscedasticity over time, as well as between men and women (see Appendix 5.7.4).
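The imputation and conversion steps can be sketched as follows. This is a minimal illustration, not the procedure documented in Appendix 5.7.4, and it does not show the tobit estimation itself: it assumes that a tobit model for log daily wages has already been estimated, so that a predicted mean `xb` and a (possibly sex- and year-specific) standard deviation `sigma` are available for each censored observation. The imputed value is drawn from the normal distribution truncated below at the ceiling; drawing, rather than plugging in the conditional mean, is one common way to preserve imputation uncertainty. The hourly conversion assumes that daily wages refer to calendar days, so that weekly earnings are seven times the daily wage. Function names and the ceiling value are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mu=0, sigma=1

def impute_censored_log_wage(xb: float, sigma: float, log_cap: float,
                             rng: random.Random) -> float:
    """Draw a log daily wage from N(xb, sigma^2) truncated below at log_cap.

    xb and sigma are assumed to come from a previously estimated tobit model.
    """
    lower = STD_NORMAL.cdf((log_cap - xb) / sigma)   # P(Z <= standardised cap)
    u = lower + rng.random() * (1.0 - lower)         # uniform on [lower, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)              # keep inv_cdf argument in (0, 1)
    draw = xb + sigma * STD_NORMAL.inv_cdf(u)
    return max(draw, log_cap)                        # guard against rounding below the cap

def hourly_wage(daily_wage: float, weekly_hours: float) -> float:
    """Convert a calendar-day wage into an hourly wage (assumes 7 days per week)."""
    return daily_wage * 7.0 / weekly_hours

rng = random.Random(42)
log_cap = math.log(185.0)   # hypothetical daily ceiling in euros
log_draws = [impute_censored_log_wage(math.log(170.0), 0.35, log_cap, rng)
             for _ in range(1000)]
print(min(log_draws) >= log_cap)   # True
print(hourly_wage(120.0, 40.0))    # 21.0
```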

Tables 1a and 1b show descriptive statistics for our sample. The mean hourly wage is 16.82 Euro for men and 13.10 Euro for women. Accordingly, the GWG in hourly wages in our dataset is 22.1%, which is comparable to the average GWG for Germany.126
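The 22.1% figure follows from expressing the difference in mean hourly wages as a share of the male mean. A minimal check in Python (the function name is ours):

```python
def raw_gender_wage_gap(mean_wage_men: float, mean_wage_women: float) -> float:
    """Raw gap: difference in mean wages as a share of the male mean wage."""
    return (mean_wage_men - mean_wage_women) / mean_wage_men

gap = raw_gender_wage_gap(16.82, 13.10)
print(f"{gap:.1%}")  # 22.1%
```

Using the log difference of the means instead, ln(16.82/13.10) ≈ 0.25, would give a noticeably larger number, so the convention matters when comparing gaps across sources.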

5.3.1 Training Definition

The main variable of interest is workers’ participation in formal training courses. In the first wave, for instance, individuals were asked whether they had participated in any job-related seminars or training courses since January 1, 2016. On average, 49.2% of all workers in a given wave had participated in job-related training (this is comparable to figures from other data, e.g. the Adult Education Survey).

Individuals who responded positively to the above question were then asked on whose initiative they had participated in training. Respondents were given the option of selecting one of the five following answers: "my own initiative", "by order of the firm/supervisor", "mostly upon advice of my firm", "mandatory occupational training" and "mostly upon the advice of someone else". They were further asked whether they had attended the training course fully or partially during working hours, during their own leisure time or during holidays.127

In order to capture the interactive relationship between workers and firms, we identify which party initiated participation in a training course and which party then bore the cost in terms of time. As in the previous literature (e.g. Pischke, 2001), we do not consider monetary investment a suitable indicator of which party made the training investment, because only 16% of all training courses are at least partially funded by the worker. Instead, we consider whether a worker participated in a training course completely during working hours (60%) or at least partially during their leisure time (40%, including workers who participated in training entirely outside of working hours). Practitioners have told us that firms often pay for training courses, even if they are not directly relevant to the tasks of the job itself. In exchange, firms require workers to attend such courses at the weekend or in the evening. The question of which party initiated training is important, as it enables us to see whether it was primarily the firm or the worker who wanted the worker to engage in training. This distinction in turn allows us to observe whether female workers participate more often in training which they themselves have initiated, and in which they invest their own time.

126 The Federal Statistical Office of Germany gives a raw GWG of 23% for 2006 (24% in West Germany and 6% in East Germany). This figure includes other age groups, part-time workers, apprentices and those in partial retirement, but excludes the self-employed (Finke, 2011).
127 Possible answers to this question were "completely in working time", "only partly in working time", "completely outside working hours" and "not employed at that time".

42.5% of all courses considered in the survey (for which detailed information is available) were undertaken on the initiative of an individual worker. 39.7% of courses, however, were attended following orders or advice from the firm or a worker’s direct supervisor. 16.1% of courses constitute mandatory job training. Only 1.6% are courses undertaken on the advice of a third party, such as an employment agency. We have excluded both mandatory training, such as obligatory first-aid training, and courses taken on the advice of a third party, from the analysis. In addition, we do not consider training courses which workers participated in during periods of unemployment.

We term a training course firm-sponsored if it is initiated by the firm and takes place completely during working hours. The average participation rate in firm-sponsored training is 21.9% for men and 18.6% for women. We term a training course worker-sponsored if it is initiated by the worker and takes place at least partially in the worker’s leisure time. 10.5% of men and 19.8% of women participate in training courses co-sponsored by the workers themselves. In the main analysis, we do not consider training courses which are initiated by the worker and which take place completely during working hours (see Appendix 5.3.1). We also disregard training courses initiated by the firm which overlap with leisure time; such courses are less common.
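The two training definitions can be written as a small classification rule over the survey answers. The sketch below is illustrative: the answer strings are those quoted in the text and footnote 127, while the enum and function names are hypothetical. It treats both "by order of the firm/supervisor" and "mostly upon advice of my firm" as firm-initiated, drops mandatory training, third-party advice and courses attended while not employed, and labels the remaining combinations as neither firm- nor worker-sponsored.

```python
from enum import Enum

class Initiative(Enum):
    OWN = "my own initiative"
    FIRM_ORDER = "by order of the firm/supervisor"
    FIRM_ADVICE = "mostly upon advice of my firm"
    MANDATORY = "mandatory occupational training"
    OTHER = "mostly upon the advice of someone else"

class Timing(Enum):
    FULLY_WORKING_TIME = "completely in working time"
    PARTLY_WORKING_TIME = "only partly in working time"
    OUTSIDE_WORKING_TIME = "completely outside working hours"
    NOT_EMPLOYED = "not employed at that time"

def classify_course(initiative: Initiative, timing: Timing) -> str:
    """Map one course's survey answers to the training categories in the text."""
    if initiative in (Initiative.MANDATORY, Initiative.OTHER) or timing is Timing.NOT_EMPLOYED:
        return "excluded"
    firm_initiated = initiative in (Initiative.FIRM_ORDER, Initiative.FIRM_ADVICE)
    in_working_time = timing is Timing.FULLY_WORKING_TIME
    if firm_initiated and in_working_time:
        return "firm-sponsored"
    if not firm_initiated and not in_working_time:
        return "worker-sponsored"
    # Worker-initiated in working time, or firm-initiated overlapping leisure time.
    return "other"

print(classify_course(Initiative.FIRM_ADVICE, Timing.FULLY_WORKING_TIME))  # firm-sponsored
print(classify_course(Initiative.OWN, Timing.PARTLY_WORKING_TIME))         # worker-sponsored
```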

5.3.2 Control Variables

Our choice of variables was strongly influenced by a close reading of the training and GWG literature.

In all specifications, we control for the year of the survey to capture time effects. Given that the time period about which workers are retrospectively asked differs between and within waves, we add a variable to each specification which captures the length of that period in months. In addition, we include an indicator for the worker’s gender. We subsequently add worker and establishment information, as well as characteristics of the employment relationship and of the job itself. These later help us to provide possible explanations for the GTG. We also include pre-survey wage fixed effects as a proxy for the unobserved productivity of a worker.

Worker characteristics include formal education, age, age squared and actual work experience (since 1992). Establishment information covers the German Bundesland in which a firm is located, the size of the firm (100-250, 250-500 or 500-2,000 employees) and the operating sector (service or manufacturing sector). We add an indicator for public, as opposed to private enterprises, and also indicate whether the firm has an active works council. As in Chapter 4, we also use the social security data of all employees of the given establishment, including those employees who did not participate in the survey, to derive the median wage within an establishment and the linear growth rate of the median wage from 2000 to 2010. We add a measure for the wage-compression of the establishment which, according to the model provided by Acemoglu and Pischke (1999), might motivate firms to invest in general training. This measure is the standard deviation of an establishment’s log daily wages for full-time workers in 2006. For wage regressions, we also add further information regarding the presence of a collective agreement (either sectoral or company agreement).128
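The two wage-based establishment measures can be sketched as follows, assuming that the daily wages of all employees are available per establishment and year. Two details are assumptions rather than statements from the text: the trend is taken as the least-squares slope of the yearly median on the calendar year (in levels; using log medians would express it as a relative growth rate), and the dispersion measure uses the sample standard deviation. The function names are hypothetical.

```python
import math
from statistics import median, stdev

def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def median_wage_trend(wages_by_year):
    """Linear trend of an establishment's median daily wage.

    wages_by_year maps a calendar year (e.g. 2000-2010) to the daily wages
    of all employees of the establishment in that year.
    """
    years = sorted(wages_by_year)
    medians = [median(wages_by_year[y]) for y in years]
    return ols_slope(years, medians)

def wage_compression(fulltime_daily_wages):
    """Standard deviation of log daily wages (e.g. full-time workers in 2006).

    Higher values indicate a more dispersed, i.e. less compressed, wage structure.
    """
    return stdev([math.log(w) for w in fulltime_daily_wages])

wages = {2000: [80, 90, 100], 2001: [82, 92, 102], 2002: [84, 94, 104]}
print(median_wage_trend(wages))  # 2.0
```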

In Chapter 4, we showed that the employment relationship, as well as the content of a job, plays an important role in determining participation in training. The variation between firms in their employees' average rates of participation in training can to a large extent be explained by differences in firms' production technologies (measured by occupations and job tasks). To ensure that differences in male and female participation in training are not driven by divergent working conditions, we add a number of variables to our specifications. We include tenure at the current firm and indicate whether a worker is employed on a temporary contract.129 We control for working time by including an indicator for part-time work, which takes the value one if a worker works between 15 and 34 hours a week.130 In addition, we add information on the job tasks performed by a worker and on his or her occupation (see Appendix 5.7.6).

It is important to note that even a rich set of observable characteristics may not fully capture all the heterogeneity between workers that is relevant for participation in training (see also Chapter 4). Recent studies of the returns to training have therefore used voucher experiments, a quasi-experimental framework (Schwerdt et al., 2012; Görlitz and Tamm, 2016). Other studies have compared the wage growth of employees who participated in training with that of individuals who intended to attend training but did not, due to more or less random events (sometimes termed the 'comparison group' approach; e.g. Leuven and Oosterbeek, 2008; Görlitz, 2011). In the majority of our analyses, we instead add a proxy for unobserved productivity (see next paragraph), because we specifically aim to compare employees who participated in training with all other workers. Restricting the analyses to those workers who intended to participate in training may provide misleading results if, for example, certain workers are discouraged from even trying to attend training because they anticipate that their employer will not provide any. Furthermore, due to the nature of the dataset, we do not know by whom such an intended training course would have been initiated. Similarly, our dataset does not indicate whether the training course which an employee intended to attend, but did not, would have taken place during or outside of working hours. We will, however, use a comparison group approach in combination with a proxy for unobserved productivity in order to analyse possible wage effects of training (see Section 5.5).

128For the training regressions, we only indicate whether there is a works council in a firm. This is due to the limited number of establishments in our sample. Pfeifer (2015) finds that, in Germany, the presence of a works council is a significant determinant of workers' participation in training, whereas the presence of a union is not.
129Tenure is measured from 1992 onwards. Employment at a firm before 1992 cannot be taken into account because no information is available for establishments located in East Germany before the German reunification in 1990.
130Results are similar if we use working time and working time squared.

As a proxy variable for unobserved productivity (see Wooldridge, 2002), we extract a worker's fixed effect from a log daily wage regression for the pre-survey period (2000-2006) based on social security data. In this regression, we control for work experience, work experience squared, tenure and job status (e.g. blue-collar or white-collar worker, master craftsman, part-time worker). Further details are given in Appendix 5.7.4. We acknowledge that this measure of unobserved "productivity" might also reflect any existing discrimination in pre-training wage levels (we discuss this topic in Section 5.4.5).
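To make the construction of the proxy more concrete, the following Python sketch extracts worker fixed effects with the within estimator on a simulated, balanced panel. It is an illustration under simplifying assumptions, not the code used for this chapter: experience and experience squared are the only covariates, and the parameter values (7 years, 2,000 workers, the wage equation coefficients) are hypothetical. The fixed effect is each worker's mean log wage net of the estimated covariate effects.

```python
import numpy as np


def worker_fixed_effects(worker, X, y):
    """Within (fixed-effects) estimator for y_it = alpha_i + X_it @ beta + e_it.

    Returns the sorted worker ids, the slope coefficients beta and one
    estimated intercept alpha_i per worker (in the order of the ids).
    """
    ids, idx = np.unique(worker, return_inverse=True)
    counts = np.bincount(idx)
    # Worker-level means of the outcome and of every regressor
    y_bar = np.bincount(idx, weights=y) / counts
    X_bar = np.column_stack(
        [np.bincount(idx, weights=X[:, k]) / counts for k in range(X.shape[1])]
    )
    # OLS on deviations from worker means identifies beta
    beta, *_ = np.linalg.lstsq(X - X_bar[idx], y - y_bar[idx], rcond=None)
    # The fixed effect is the worker's mean log wage net of the covariates
    alpha_hat = y_bar - X_bar @ beta
    return ids, beta, alpha_hat


# Simulated (hypothetical) pre-survey panel: 2,000 workers observed for 7 years
rng = np.random.default_rng(1)
n_workers, n_years = 2_000, 7
worker = np.repeat(np.arange(n_workers), n_years)
alpha = rng.normal(4.0, 0.3, n_workers)  # true, unobserved worker effects
exper = rng.uniform(0, 20, n_workers)[worker] + np.tile(np.arange(n_years), n_workers)
X = np.column_stack([exper, exper**2])
log_wage = (alpha[worker] + 0.03 * exper - 0.0005 * exper**2
            + rng.normal(0, 0.1, worker.size))

ids, beta, alpha_hat = worker_fixed_effects(worker, X, log_wage)
print("beta:", beta)
print(f"corr(alpha, alpha_hat): {np.corrcoef(alpha, alpha_hat)[0, 1]:.3f}")
```

In an application, `alpha_hat` would then be merged to the survey data as the regressor $\hat{\alpha}_i$; because it is itself an estimate, its sampling uncertainty has to be taken into account in the second stage.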

5.4 Results for Training

5.4.1 Gender Differences in Training

Table 1a shows that female workers generally participate more than male workers in non-mandatory job-related training. In a given wave, 50.8% of women, compared to only 48.3% of men, participate in training. Given the average training rate of 49.2%, the gender difference of 2.7pp is sizeable, although not statistically significant. Dividing training courses into those initiated by the firm and taking place during working hours (firm-sponsored) and those initiated by the worker and overlapping with leisure time (worker-sponsored) reveals an interesting picture. Whilst men are more likely than women to participate in firm-sponsored training (22.1% compared to 18.2% for women), we find a positive and economically large GTG with respect to worker-sponsored training (10.5% participation rate for men compared to 19.9% for women). We limit the analyses here to the incidence of participation in training. Results are, however, very similar if we consider the intensity of participation in training (see Appendix 5.7.5).

We estimate a linear probability model to investigate the impact of potential drivers of participation in training (Equation (5.1)).131 This allows us to account for differences between female and male employees in work experience and in the jobs in which they are employed. The following equation shows our full specification:

$$\text{Training}_{it} = T_{it}\tau + X_{it}\beta + \text{Female}_i\,\delta + \hat{\alpha}_i\gamma + \varepsilon_{it} \qquad (5.1)$$

$T$ includes wave fixed effects and a measure of the duration of the potential training period. $X$ covers worker and establishment information, as well as characteristics of the employment relationship and the job, including working time, tenure and the job tasks performed by the employee (see Tables 1a-1b). $\hat{\alpha}$ is the proxy variable accounting for unobserved productivity differences (see Appendix 5.7.4). $\delta$ is our main parameter of interest: the respective GTG. We label the GTG "positive" if female workers exhibit higher rates of participation in training than male workers, and "negative" otherwise. The outcome in this regression is participation in work-related training (excluding mandatory training), in firm-sponsored training, or in worker-sponsored training. In Figure 5.1, we illustrate the sensitivity of the GTG (the female coefficient $\delta$) to different sets of controls for each training outcome.132 The first bar depicts the raw difference (adjusted only for time effects); the second bar adds firm and worker characteristics to the regression. The third bar illustrates results for a model which also includes job characteristics. Finally, the fourth bar depicts the full model including the proxy for unobserved productivity.133
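The logic behind the sequence of bars can be illustrated with a small simulation. The Python sketch below estimates the female coefficient of a linear probability model by OLS, first without and then with controls. The data-generating process is hypothetical: women are assumed to work part-time more often and to hold leadership positions less often, and both characteristics shift training participation, so the raw gap overstates the conditional gap of -2pp that is built into the simulation. The numbers are illustrative only and are not the estimates reported in Figure 5.1.

```python
import numpy as np


def female_coefficient(y, female, controls=None):
    """OLS (linear probability model) coefficient on the female indicator."""
    cols = [np.ones(y.size), female.astype(float)]
    if controls is not None:
        cols.append(controls)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


rng = np.random.default_rng(0)
n = 200_000
female = rng.random(n) < 0.5
# Hypothetical job characteristics whose prevalence differs by gender
part_time = rng.random(n) < np.where(female, 0.40, 0.05)
leader = rng.random(n) < np.where(female, 0.19, 0.36)
# Assumed conditional gap of -2pp; part-time lowers, leadership raises training
p = 0.22 - 0.03 * part_time + 0.05 * leader - 0.02 * female
training = (rng.random(n) < p).astype(float)

delta_raw = female_coefficient(training, female)
controls = np.column_stack([part_time, leader]).astype(float)
delta_full = female_coefficient(training, female, controls)
print(f"raw GTG: {100 * delta_raw:.1f}pp, conditional GTG: {100 * delta_full:.1f}pp")
```

Adding controls in stages, as in Figure 5.1, simply means calling the function with successively larger control matrices.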

131Results are similar if, for example, we use the logit link function and estimate Equation (5.1) by maximum likelihood. We cluster standard errors at the establishment level. For the specifications in which we add a proxy for unobserved productivity, we bootstrap clustered standard errors to account for the fact that this variable is estimated with uncertainty (see Appendix).
132Full results for Equation (5.1), including all controls, are provided in Table 5.2.
133The minor difference between 2.7pp calculated from Table 1a and 2.6pp in Table 5.2 is a result of the inclusion of time controls.
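The clustered bootstrap mentioned in footnote 131 resamples whole establishments rather than individual workers, so that within-firm correlation is preserved in every replicate. The Python sketch below shows the basic resampling scheme on simulated data. It is a simplified illustration, not the procedure of the Appendix: it re-estimates only the second stage, whereas with a generated regressor such as $\hat{\alpha}_i$ the first-stage fixed-effects regression would also be repeated within each replicate. The number of replications (199) and all data values are assumptions.

```python
import numpy as np


def female_coefficient(y, X):
    """OLS coefficient in column 1 of X (column 0 is the constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def cluster_bootstrap_se(y, X, cluster, n_boot=199, seed=0):
    """Bootstrap standard error that resamples whole clusters with replacement."""
    rng = np.random.default_rng(seed)
    ids = np.unique(cluster)
    rows_of = {c: np.flatnonzero(cluster == c) for c in ids}
    draws = np.empty(n_boot)
    for b in range(n_boot):
        sampled = rng.choice(ids, size=ids.size, replace=True)
        rows = np.concatenate([rows_of[c] for c in sampled])
        # With a generated regressor, its first-stage estimation would be
        # repeated here on the resampled data before this second stage.
        draws[b] = female_coefficient(y[rows], X[rows])
    return draws.std(ddof=1)


# Simulated (hypothetical) data: 150 establishments with 40 workers each
rng = np.random.default_rng(42)
n_firms, per_firm = 150, 40
firm = np.repeat(np.arange(n_firms), per_firm)
firm_effect = rng.uniform(-0.1, 0.1, n_firms)[firm]
female = (rng.random(firm.size) < 0.5).astype(float)
y = (rng.random(firm.size) < 0.2 + firm_effect - 0.02 * female).astype(float)
X = np.column_stack([np.ones(firm.size), female])

se = cluster_bootstrap_se(y, X, firm)
print(f"female coefficient: {female_coefficient(y, X):.4f}, cluster-bootstrap SE: {se:.4f}")
```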

Our results indicate a raw overall difference in job-related training between men and women that is not statistically significant (p-value 0.25). It would seem that women generally participate slightly more in training. There are various potential reasons for this, including the fact that women are often employed in jobs which require a high degree of training. Conditional on firm, worker and job characteristics, however, the GTG becomes significantly negative (p-value 0.025). Once we take selection into certain firms and jobs into account and control for individual characteristics, female participation in training is significantly lower than that of men (-3.2pp). We thereby compare men and women who share the same observed characteristics. The fact that the GTG shrinks to -2.1pp in the full model suggests that it can in part be explained by male workers having, on average, higher unobserved productivity than female workers.

In the second and third segments of Figure 5.1, we distinguish between firm- and worker-sponsored training. For firm-sponsored training, we see that the firms in our sample provide less training to female workers than to male workers. The first bar in the second segment shows a highly significant negative GTG of -3.9pp. The gap is almost halved in the full model, which adds all covariates. This indicates that the GTG can, to some extent, be explained by firm segregation and by the fact that men tend to work more hours than women. We find a very large positive GTG for worker-sponsored training. The difference between male and female participation in worker-sponsored training of 9.4pp, shown in the third segment of Figure 5.1, indicates that female workers participate almost twice as often as male workers in worker-sponsored training. The subsequent bars show that the GTGs in firm- and worker-sponsored training remain significant after conditioning on various firm, worker and job characteristics. Having said this, observable differences do explain the training differences between men and women to some extent (+2.4pp in the third bar of segment three). If we finally add the proxy for unobserved productivity to the model, the GTG widens again, giving a highly significant positive GTG for worker-sponsored training of 3.3pp. This indicates that the positive gap is partially driven by the fact that highly productive female workers are more engaged in worker-sponsored training than their male counterparts.

5.4.2 The role of job characteristics

In order to gain a better understanding of the relative importance of the explanations for the GTG posited above, we run a detailed Oaxaca-Blinder decomposition (for an overview, see Fortin et al., 2011). An Oaxaca-Blinder decomposition partitions the difference in the average training rates of men and women into a first part, which is explained by the predictors, and a second part, which remains unexplained. The explained part stems from differences in the characteristics $X$, for example the fact that women more often work part-time or on average have less work experience than men. The unexplained part stems from differences in the coefficients $\hat{\beta}$ (including the intercept); it would arise, for example, if male part-time workers were provided with more firm-sponsored training than female part-time workers. In addition, the conditional GTG, which remains after controlling for the full set of variables, is also included in the unexplained part. A detailed decomposition gives us an individual contribution for each variable:

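$$\overline{\text{Train}}^{M} - \overline{\text{Train}}^{F} = \underbrace{\left(\bar{X}^{M} - \bar{X}^{F}\right)^{\prime}\hat{\beta}^{*}}_{\text{Explained part}} + \underbrace{\left(\bar{X}^{M}\right)^{\prime}\left(\hat{\beta}^{M} - \hat{\beta}^{*}\right) + \left(\bar{X}^{F}\right)^{\prime}\left(\hat{\beta}^{*} - \hat{\beta}^{F}\right)}_{\text{Unexplained part}} \qquad (5.2)$$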
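The following Python sketch computes the twofold decomposition in Equation (5.2) on simulated data. It is an illustration under stated assumptions rather than the implementation behind Table 5.3: the reference coefficients $\hat{\beta}^*$ are assumed to come from a pooled regression that also contains a female dummy (so that the conditional GTG ends up in the unexplained part), and the covariates and parameter values are hypothetical. The per-variable terms $(\bar{X}^M_k - \bar{X}^F_k)\hat{\beta}^*_k$, divided by the raw gap, give shares of the kind discussed below; by construction, the explained and unexplained parts add up exactly to the raw gap.

```python
import numpy as np


def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def oaxaca_blinder(y, X, female, names):
    """Twofold Oaxaca-Blinder decomposition of mean(y | male) - mean(y | female).

    X holds the covariates without a constant; `female` is a boolean array.
    Assumption: beta_star comes from a pooled regression that also includes
    a female dummy, whose coefficient is then dropped.
    """
    Xc = np.column_stack([np.ones(y.size), X])
    male = ~female
    beta_m = ols(Xc[male], y[male])
    beta_f = ols(Xc[female], y[female])
    beta_star = ols(np.column_stack([Xc, female.astype(float)]), y)[:-1]
    xbar_m, xbar_f = Xc[male].mean(axis=0), Xc[female].mean(axis=0)

    raw_gap = y[male].mean() - y[female].mean()
    detailed = (xbar_m - xbar_f) * beta_star  # explained part, per variable
    explained = detailed.sum()
    unexplained = xbar_m @ (beta_m - beta_star) + xbar_f @ (beta_star - beta_f)
    # Share of the raw gap explained by each covariate (the constant contributes 0)
    shares = dict(zip(names, detailed[1:] / raw_gap))
    return raw_gap, explained, unexplained, shares


# Simulated (hypothetical) firm-sponsored training data
rng = np.random.default_rng(0)
n = 100_000
female = rng.random(n) < 0.5
part_time = rng.random(n) < np.where(female, 0.40, 0.05)
leader = rng.random(n) < np.where(female, 0.19, 0.36)
p = 0.25 - 0.03 * part_time + 0.05 * leader - 0.01 * female
train = (rng.random(n) < p).astype(float)
X = np.column_stack([part_time, leader]).astype(float)

raw, expl, unexpl, shares = oaxaca_blinder(train, X, female, ["part_time", "leader"])
print(f"raw gap {raw:.4f} = explained {expl:.4f} + unexplained {unexpl:.4f}")
print({k: round(float(v), 2) for k, v in shares.items()})
```

Choosing a different reference vector (for example the male coefficients) changes how the gap is split between the two parts but not their sum.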

We now consider the decomposition of firm- and worker-sponsored training. As a reference, we choose a pooled regression in which the regression coefficients are constrained to be equal for both groups. Results for the two kinds of training and for different sets of variables are depicted in Table 5.3. The coefficient on the female indicator in an unconditional regression of participation in training constitutes the "raw gap", which is the same as that seen in Figure 5.1 in column (1) (except for the presence of time controls). For firm-sponsored training, we see that including the various control variables and the proxy for unobserved productivity reduces the GTG by almost half (45%).

Table 5.3 shows that occupational segregation is the most important driver of the GTG for worker-sponsored training (explaining 40% of the total difference). This is not the case, however, for firm-sponsored training (explaining only 3% of the total difference). Contrary to human capital theories, it would seem that firms do not base their training investment decisions on their employees' occupational background. A closer look reveals that female workers are overrepresented in healthcare professions, including doctors and nurses (72.8% are female), and in social and teaching professions (66.2%). As shown in Table 5.2, participation in worker-sponsored training is very common in these occupations; in fact, no other jobs in our sample exhibit higher training rates. Moreover, being employed in a healthcare or social profession is by far the most important predictor of participation in worker-sponsored training.134 In contrast, for firm-sponsored training, healthcare, social and teaching professions, along with organisational occupations (qualified office employees, accountants or entrepreneurs) and other services (mostly low-skilled jobs such as cleaners or janitors), exhibit the lowest participation rates.

134In these jobs, it is common for training to be financed by the worker (or to take place outside working hours).

Closer investigation of the role of job content within given occupations gives a somewhat different picture. Here, the job characteristics typical of female workers are more favourable to both firm- and worker-sponsored training. The inclusion of job content variables explains almost 12% of the total GTG for worker-sponsored training and 36% of the GTG for firm-sponsored training. This means that the GTG in firm-sponsored training would be approximately one third greater if women were to perform the same job tasks as men. Note that the fact that women perform job tasks which require much training contradicts theoretical models such as that of Barron et al. (1993), which predicts that firms will assign women to jobs requiring less training in view of their lower degree of labour attachment. Women have, however, recently benefited from technological change, which has been one driver of the narrowing of the GWG over time (Black and Spitz-Oener, 2010). Indeed, we find that women work with computers more often than men, less often perform routine tasks such as operating and monitoring machines, and are more likely to complete non-routine tasks such as organisational tasks, planning or gathering information (see Table 1a). The comparative advantage of women in performing certain tasks which are in high demand in firms is a significant driver of the narrowing of gender inequalities in the labour market (see Blau and Kahn, 2016).

Another job task which is not yet captured by the set of job content variables is leadership responsibility. Our measure of leadership position covers superiors but also individuals who regularly lead groups or projects. In our sample, 19% of female workers are in such positions while this is the case for almost 36% of men. Since leadership positions are positively correlated with participation in firm-sponsored training, this single variable accounts for 15% of the total GTG for firm-sponsored training but only for 2% of the gap in worker-sponsored training. In contrast to the results seen for job content, these results are in line with the model of Lazear and Rosen (1990). Managerial jobs are often extremely time-consuming, leaving the individual worker with less flexibility in his or her working schedule (Goldin, 2014). This might explain why women are less likely to be employed in such positions.

Part-time work can be considered to be at the opposite end of the spectrum; working part-time often serves as a job amenity providing individual flexibility (Goldin, 2014; Picchio and Van Ours, 2016).135 The part-time workers in our sample are overwhelmingly female (90.5%). These workers participate significantly less often than full-time workers in firm-sponsored training (-2.9pp, p-value 0.06). According to Table 5.3, part-time work explains 36% of the total GTG for firm-sponsored training. Working part-time is therefore more than twice as important as employment in a managerial position in determining participation in firm-sponsored training.136 As discussed in Section 5.2, part-time work can, on the one hand, serve as an indicator of a low level of labour attachment. On the other hand, part-time work might also be viewed as a proxy for (high) firm attachment, because the labour supply of part-time workers to the local firm is less elastic than that of full-time workers. In the latter case, we would observe a positive correlation between part-time work and participation in firm-sponsored training. Given that this is not the case, we conclude that, on balance, part-time work does not signal a high degree of attachment to the firm in which a worker is currently employed.

135Note that we assume here that part-time work is voluntary.

We can thus reasonably conclude that part-time work is viewed as a signal of lower labour attachment or of a greater preference for non-market activities. It is not clear, however, whether there are information asymmetries between the worker and the firm regarding the worker's future labour attachment. In a market with symmetric information, we would expect part-time work to affect firm- and worker-sponsored training in a similar way, because both parties would expect lower returns to training given the currently lower degree of labour attachment. This is not necessarily the case, however, if a worker has superior information about his or her own future labour attachment. Our results show a positive, though not statistically significant, effect of part-time work on worker-sponsored training (+1.8pp, p-value 0.18), although part-time work explains only 6% of the total gap. The diverging findings regarding the role of part-time work for participation in firm- and worker-sponsored training indicate that statistical discrimination by the firm plays a role. But how can we explain the higher participation rates of part-time workers in worker-sponsored training? A female part-time worker with an above-average level of labour market attachment has an incentive to compensate for the low investment made by the firm in her training by investing herself. On the one hand, she may invest in training in order to enhance her productivity, in line with human capital theory. On the other hand, she may consider her participation in training to function as a signal of her commitment to the firm. We are unable to distinguish between these two mechanisms here. In either case, an investment in training made by the worker can induce the firm to reassess its prediction of the individual's future productivity and labour attachment.

It is clear that, overall, job characteristics are an important driver of the GTG, in particular for worker-sponsored training. Job characteristics explain around 10% of the total GTG for firm-sponsored training and almost 60% of the GTG for worker-sponsored training (Table 5.3). For firm-sponsored training, the impact of job tasks on the GTG is more than offset by the lower provision of training to part-time workers and by the under-representation of women in managerial positions. Together, these factors lead to a negative contribution of job characteristics for firm-sponsored training. Job heterogeneity makes a significant contribution to the positive GTG seen for worker-sponsored training. This is primarily due to occupational segregation, although job tasks and, to a lesser extent, part-time work also play a role.

136If we look at full-time workers only (55.9% of females and 97.5% of males), we see that the GTGs for firm- and worker-sponsored training are slightly reduced but of similar magnitude as before. Full-time employed women participate 1.7pp less in firm-sponsored training (full controls; p-value 0.17). For worker-sponsored training, we find a significant positive GTG of +2.5pp (p-value ≈ 0.03) in the full-time sample. The lack of statistical significance for firm-sponsored training is most probably due to the smaller number of female workers left in the sample.

5.4.3 Content of training courses

We have seen that job characteristics are an important driver of the GTG and that women have comparative advantages over men because they tend to perform job tasks which require a high level of training. This raises the question of whether we would find similar gender differences if we were to consider the content of the training courses themselves (Table 5.4). Our data enable us to distinguish between courses primarily focusing on ICT skills, foreign language skills, commercial or quality management, social, managerial, and health/safety content. Table 5.4 shows that, after taking firm, worker and job characteristics into account, there are no firm-sponsored training courses in which female workers participate significantly more often than men.137 Men participate much more often than women in technical training courses sponsored by the firm (+2.2pp with an average training rate of 4.5%; p-value < 0.01). Overall, few workers participate in firm-sponsored managerial training courses, although women tend to be provided with fewer opportunities for leadership training (controlling for personnel responsibilities and all other variables; -0.6pp, p-value 0.16). Men are therefore more often assigned to training courses which can be expected to yield greater returns to worker productivity and, in turn, to increase the individual employee's likelihood of promotion. Table 5.4 shows that even within the same job, men participate more in certain forms of training than women, whereas there are no types of firm-sponsored training in which men participate significantly less often than women. It is therefore interesting to analyse the differences between male and female workers in terms of the returns to training and the role attributed to the content of training (see Section 5.5).

We observe some further differences between male and female workers in terms of the content of the worker-sponsored training courses in which they tend to participate. Women tend to invest significantly more than men in courses focused on foreign language skills, and somewhat more in ICT training (p-value < 0.01). The differences in participation in technical training are rather small (-0.3pp, n.s.). Particularly interesting, however, is the fact that women participate significantly less than men in worker-sponsored training concentrating on leadership skills. The magnitude of this difference is similar to that which we see for firm-sponsored training (-0.8pp, p-value < 0.01; on average, 0.7% of workers participate in self-sponsored leadership courses). This might indicate one of two things. Firstly, women may not seek such time-consuming positions, either because they prefer flexibility or because of the large anticipated penalties for a lack of flexibility in such positions (Goldin, 2014). Alternatively, it might indicate that women anticipate double standards, whereby male workers would be favoured for promotion (Lazear and Rosen, 1990).

137Here, we estimate Equation (5.1) and regress, for example, participation in ICT training on firm, worker and job characteristics, including the proxy for unobserved productivity. Standard errors are bootstrapped.

5.4.4 Sorting into firms

The Oaxaca-Blinder decomposition presented in Table 5.3 shows that firm characteristics are an important driver of the GTG for firm-sponsored training. The gender-specific sorting of workers into firms seems to account for around one third of the gender difference in employers' provision of training to male and female workers. This applies to a much lesser extent to worker-sponsored training (explaining only 6% of the total difference).

According to Table 5.2, public enterprises sponsor less training than private firms (conditional on worker and job characteristics). It is important to note, therefore, that female workers are overrepresented in these firms (compare Table 1b).138 In addition, women are underrepresented in fast-growing businesses (according to the median wage trend), which tend to sponsor more training. Like Lynch and Black (1998), we do not find any differences by firm size once we control for other firm and job characteristics.

In order to gain a more thorough understanding of the impact of gender sorting on the GTG, we investigate whether female workers are overrepresented in firms which generally provide less training. If this were the case, controlling for differences in firms' average rates of participation in training would cause the GTG to vanish or at least to diminish significantly. We therefore run a training regression (Equation (5.1)) with firm fixed effects. Since the establishment characteristics are almost all time-invariant and thus absorbed by the fixed effects, we exclude them. This accounts for firm segregation and for all firm policies which affect both sexes equally. The GTG in firm-sponsored training falls to -1.4pp, a reduction of one third relative to Figure 5.1, column 4. Although the gap is now statistically insignificant, it is still large in magnitude. For worker-sponsored training, the coefficient is reduced by only 16%. Within firms, therefore, female workers still participate 2.6pp more often than men in self-sponsored training. This difference is significant (p-value < 0.01). Whilst this exercise indicates that firm heterogeneity plays a role for the GTG, differences within firms remain.

138This might seem surprising, as the literature has frequently found that public-sector workers participate in training more often than private-sector workers. Indeed, in Chapter 4, we identified higher rates of training in public enterprises. This can be primarily attributed to the occupational structure and the tasks usually performed in these firms (Table 4.10 in Chapter 4). This training often overlaps, however, with leisure time (Tables 4.8 and 4.9 in Chapter 4). Although we did not specifically consider the initiative for training in Chapter 4, we did not find any differences in training during working hours between public- and private-sector workers once we took into account that most public-sector workers are employed in the service sector.
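The effect of adding firm fixed effects can be illustrated with a short simulation. In the Python sketch below, women are assumed to sort into firms with low training propensities, while the within-firm gap is set to -1pp; all parameter values are hypothetical. The fixed-effects estimate uses the within transformation (Frisch-Waugh-Lovell theorem): training and the female indicator are demeaned by firm, and the demeaned outcome is regressed on the demeaned indicator. The sketch omits the other covariates of Equation (5.1) for brevity.

```python
import numpy as np


def demean_by(group, v):
    """Within transformation: subtract group means from v."""
    idx = np.unique(group, return_inverse=True)[1]
    means = np.bincount(idx, weights=v) / np.bincount(idx)
    return v - means[idx]


def female_gap(y, female, firm=None):
    """Female coefficient from OLS, optionally with firm fixed effects."""
    f = female.astype(float)
    if firm is None:
        X = np.column_stack([np.ones(y.size), f])
        return np.linalg.lstsq(X, y, rcond=None)[0][1]
    # Frisch-Waugh-Lovell: regress demeaned y on the demeaned female indicator
    y_w, f_w = demean_by(firm, y), demean_by(firm, f)
    return (f_w @ y_w) / (f_w @ f_w)


rng = np.random.default_rng(7)
n_firms, per_firm = 300, 400
firm = np.repeat(np.arange(n_firms), per_firm)
firm_rate = rng.uniform(0.1, 0.4, n_firms)        # firm-level training propensity
female_share = 0.8 - 1.5 * (firm_rate - 0.1)      # women sorted into low-training firms
female = rng.random(firm.size) < female_share[firm]
y = (rng.random(firm.size) < firm_rate[firm] - 0.01 * female).astype(float)

gap_pooled = female_gap(y, female)
gap_within = female_gap(y, female, firm)
print(f"pooled: {100 * gap_pooled:.1f}pp, with firm fixed effects: {100 * gap_within:.1f}pp")
```

In this simulated setting, the pooled gap mixes the within-firm gap with sorting, whereas the fixed-effects estimate recovers the within-firm gap only.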

5.4.5 The role of worker characteristics

We have so far learned that firms systematically provide more training to men, whilst women invest significantly more in training themselves. Since this phenomenon cannot be fully explained by job differences and firm sorting, we now consider the role played by worker characteristics in shaping the GTG. Human capital theory predicts that individual characteristics such as education, age, work experience and productivity play an important role for participation in job-related training. Our results, presented in Table 5.2, generally confirm these predictions. It is interesting, however, that we do not find an overall training advantage for highly educated workers. Individuals who hold a tertiary degree, for example, participate significantly less in firm-sponsored training than individuals who have only a vocational education (-3pp, p-value 0.02), but they are much more likely to participate in worker-sponsored training (+4.4pp, p-value < 0.01).139 In line with human capital theory, age and work experience are negatively correlated with training, and the proxy for unobserved productivity exhibits positive coefficients. The latter relationship is not significant, however, in the regression for firm-sponsored training.

Female workers in our sample are somewhat more likely than male workers to hold a vocational qualification (76.5% vs. 74.3%) and less likely to hold a tertiary degree (21.6% vs. 24.6%, Table 1a). They have on average 3 years less work experience than men (work experience is capped at 20 years, Table 1b). These characteristics favour female participation in training, explaining 10% of the GTG seen for firm-sponsored training and 14% of the GTG for worker-sponsored training (Table 5.3). The positive impact of these characteristics on female participation in training, however, is offset by the negative impact of the proxy for unobserved productivity. The fact that the average fixed effect of women is half a standard deviation lower than that of men explains around 10% of the GTG for both types of training.

139In Chapter 4, we found that if we control for job characteristics, tertiary-educated workers in fact participate slightly more in training which takes place at least partially during workers' leisure time. There are virtually no differences, however, for training undertaken during working hours.

The last finding is somewhat puzzling. We find more women employed in jobs requiring a high level of training, and female workers exhibit characteristics which are associated with greater participation in training. According to their pre-survey fixed effects, however, women would be expected to be less productive than men. We are convinced that permanent wage components, net of education, work experience, tenure and job status (see Appendix 5.7.4), are among the best available measures of productivity.140 These measures are not, however, without limitations. They are noisy, and the social security data from which we extract the fixed effects contain no information on working hours (only on part-time work). Another explanation for the lower fixed effects of women is that there are additional job differences between men and women which we do not capture with factors such as occupations and job tasks. This job heterogeneity is reflected in a wage gap and therefore also in the differences in fixed effects between men and women (as in Goldin, 2014).

Another, more troubling explanation is the GWG itself. Female wages are systematically lower than those of their male counterparts, even within narrowly defined job cells and within the same firms. If the GWG is in part a result of gender discrimination, this would also be reflected in the differences in fixed effects (for example, through discriminatory wage policies, see Blau and Kahn, 2016). If this is true, it would be expressed in our results both in lower wages and in lower human capital investments. But how then do we explain that the results are similar for firm- and worker-sponsored training? And why should firms behave like this? Under the assumption of perfect labour markets, discriminatory firms forgo profits because they have to pay higher wages (Becker, 2010). We have no convincing answers to these questions.

In order to gain a more detailed picture of the situation described above, we next consider whether the GTG in firm- and worker-sponsored training differs between groups of more and less productive workers. Based on the considerations outlined in Section 5.2, we can expect a smaller GTG among highly productive workers, as differences in labour attachment between men and women should be smaller in this group. According to statistical discrimination models, this should be more relevant for firm-sponsored training. To examine this hypothesis, we use three proxies for productivity: whether a worker holds a tertiary degree,141 whether a worker performs complex job tasks,142 and whether a worker exhibits an above-average fixed effect. We split our sample according to each of the three criteria and run separate regressions by subgroup for both training outcomes on the full set of variables.

140Results are very similar if we use a random coefficient model in which we allow for individual heterogeneity not only in the intercept but also in the linear and quadratic slope coefficients of the wage equation.
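The subgroup analysis amounts to estimating the female coefficient separately in each part of a sample split. The Python sketch below does this for a split by above-average fixed effect. It is a minimal illustration under hypothetical assumptions: a standard normal draw stands in for the productivity proxy $\hat{\alpha}_i$, the proxy is the only control, and the simulated gap of -3.5pp is imposed only in the low-productivity group.

```python
import numpy as np


def gtg_by_subgroup(y, female, controls, high):
    """Female coefficient estimated separately for the high and low subgroups."""
    out = {}
    for label, mask in (("high", high), ("low", ~high)):
        X = np.column_stack(
            [np.ones(mask.sum()), female[mask].astype(float), controls[mask]]
        )
        out[label] = np.linalg.lstsq(X, y[mask], rcond=None)[0][1]
    return out


rng = np.random.default_rng(3)
n = 100_000
female = rng.random(n) < 0.5
alpha_hat = rng.normal(0, 1, n)        # hypothetical stand-in for the productivity proxy
high = alpha_hat > alpha_hat.mean()    # above-average fixed effect
# Hypothetical: women train 3.5pp less, but only in the low-productivity group
p = np.clip(0.2 + 0.02 * alpha_hat - 0.035 * (female & ~high), 0, 1)
y = (rng.random(n) < p).astype(float)

gtg = gtg_by_subgroup(y, female, alpha_hat[:, None], high)
print({k: f"{100 * v:.1f}pp" for k, v in gtg.items()})
```

Running separate regressions lets all coefficients, not only the female coefficient, differ between subgroups; an alternative is a single regression with a female-by-subgroup interaction, which constrains the other coefficients to be equal.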

Overall, we find large differences in the GTGs of the various subgroups. The GTG in firm-sponsored training is driven by differences in training participation amongst less productive men and women. Amongst non-university educated workers, female participation rates are on average 3.5pp lower than those of male workers. We find a similar gap amongst workers who perform routine or manual tasks and amongst those with below-average unobserved productivity. In contrast, there are no significant differences between men and women within the "highly productive" subgroups. Tertiary educated women even tend to participate slightly more in firm-sponsored training than their male counterparts (+0.7pp, albeit not statistically significant).

By contrast, "highly productive" women participate more in worker-sponsored training than "highly productive" men. Tertiary educated women, for instance, participate in worker-sponsored training on average 7.3pp more often than their male counterparts (p-value < 0.01). The gap still amounts to +3.9pp for workers who perform complex job tasks, and for workers with above-average unobserved productivity we find a GTG of +5.9pp. Within the subgroups of less productive workers, we find slightly higher rates of female participation in worker-sponsored training amongst non-tertiary educated workers (+1.8pp, n.s.) and virtually no gender differences within the other two subgroups.
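The split-sample strategy described above can be sketched in Python as follows. This is a minimal illustration on simulated data, not the authors' code: the function name `female_gap_by_subgroup`, the single control variable and the data-generating values (a 3.5pp gap among less productive workers and no gap among highly productive ones) are assumptions chosen for illustration only. Each subgroup gets its own linear probability model, so all coefficients, not only the female one, are allowed to differ across subgroups.

```python
import numpy as np


def female_gap_by_subgroup(y, female, group, controls):
    """Estimate a linear probability model of y on a constant, a female
    dummy and controls separately within each subgroup.

    Returns a dict mapping each subgroup value to the female coefficient,
    i.e. the conditional gender training gap within that subgroup.
    """
    gaps = {}
    for g in np.unique(group):
        m = group == g
        X = np.column_stack([np.ones(m.sum()), female[m], controls[m]])
        beta, *_ = np.linalg.lstsq(X, y[m], rcond=None)
        gaps[int(g)] = float(beta[1])  # column 1 holds the female dummy
    return gaps


# Simulated (hypothetical) data: group 0 = less productive, group 1 = highly productive
rng = np.random.default_rng(0)
n = 40_000
group = rng.integers(0, 2, n)
female = rng.integers(0, 2, n)
x = rng.normal(size=n)                                  # one illustrative control
p = 0.20 + 0.02 * x - 0.035 * female * (group == 0)     # gap only in group 0
y = (rng.uniform(size=n) < p).astype(float)             # training participation

gaps = female_gap_by_subgroup(y, female, group, x[:, None])
print({g: round(v, 3) for g, v in gaps.items()})
```

With these assumed values, the estimated female coefficient should be close to -0.035 in group 0 and close to zero in group 1.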

In Appendix 5.7.1, we show that the GTG is not driven by the presence of children. Mothers of children up to the age of 10 exhibit very low participation rates in training (firm- or worker-sponsored), but we still document a negative GTG of 1.9pp in firm-sponsored training for female workers without children (see Table 5.5; Fitzenberger and Muehler, 2014, find similar results). Nor is the GTG in worker-sponsored training explained by motherhood. Furthermore, the GTG cannot be explained by differences in average (firm-specific) turnover rates between female and male workers (see also Royalty, 1996). Although women show higher turnover probabilities, in particular in their thirties, this does not explain why firms provide less training to them (see Appendix 5.7.2).

141There are only few low-skilled workers in our sample (1.4%). The majority of individuals who did not obtain a tertiary degree hold a vocational qualification. 142Complex tasks are here defined as analytical tasks such as gathering information, investigating or researching (see Black and Spitz-Oener, 2010). We find similar results if we look at complex interactive tasks.

To sum up, firms are more reluctant to invest in training for less productive women than for men with similar skills. In contrast, more productive women invest in training themselves more often than their male peers do. The first finding suggests that discrimination (whether statistical or taste-based remains unclear) does indeed continue to play a role in perpetuating the GTG. How, then, do we explain the results for highly productive workers? Firstly, regardless of their productivity levels, all women might expect to be (statistically) discriminated against, and highly productive women might react by investing in training themselves. Secondly, women with particular career aims might face double standards for promotions. They might therefore deem it necessary to invest more in training themselves, thereby signalling their commitment and qualification level (as suggested in the model of Lazear and Rosen, 1990). Finally, some female workers might later catch up on training investments which they missed earlier due to career interruptions or periods of reduced working hours. If firms do not provide such training courses, more productive women might have an incentive to make these investments themselves in order to close human capital gaps.

5.5 Results for Wages

The previous section has shown that, in comparison to their male counterparts, female workers participate less often in firm-sponsored training and more often in worker-sponsored training. If participation in training is associated with higher wages, we can expect opposing effects on the GWG. Wage returns to training might, however, be close to zero if it is the firm rather than the individual who reaps the majority of the benefits from employees' participation in training. Furthermore, the wage effects of training might differ both between firm- and worker-sponsored training and between male and female workers. To gain a clearer picture of these potential differences, in this section we estimate the wage returns to the two types of training separately for male and female workers. This in turn allows us to analyse the overall effect of the GTG on the GWG.

We estimate a log-linear model via ordinary least squares, in which we regress log hourly wages on participation in training, interacted with an indicator for being female, and on control variables. We again distinguish between firm- and worker-sponsored training.

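$$\log(\text{Hourly Wage})_{it} = T_{it}\tau + X_{it}\beta + \hat{\alpha}_i\gamma + \text{Female}_i\,\delta + \text{Training}_{i,t-1}\,\gamma + \text{Training}_{i,t-1}\times\text{Female}_i\,\eta + \varepsilon_{it} \qquad (5.3)$$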

We consider the hourly wages of individual $i$ at time $t$. As in the previous section, we control for time effects ($T$) and unobserved productivity ($\hat{\alpha}_i$). $X$ includes the same list of firm, worker and job characteristics as in Equation (5.1).143 $\text{Training}_{i,t-1}$ indicates training courses attended by individual $i$ in the 12 to 24 months (depending on the wave and on prior survey participation) prior to the interview at time $t$. To measure wage returns to training, we look at $\gamma$ and $\eta$: $\gamma$ measures the returns to training for male workers, while $\eta$ gives the difference in returns between female and male workers, so that the return for women is $\gamma + \eta$. We limit the analysis to workers who participated in only one type of training and exclude individuals who participated in both types of training during the observation period.
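To make the roles of γ and η in Equation (5.3) concrete, the following Python sketch estimates a stripped-down version of the equation by OLS on simulated data. It is an illustration under stated assumptions, not the authors' estimation code: the function name `training_wage_returns`, the single control standing in for the productivity proxy, and the simulated parameter values (γ = 0.04, η = -0.015) are hypothetical, and the sketch omits the time effects and the full set of controls X.

```python
import numpy as np


def training_wage_returns(log_wage, training, female, controls):
    """OLS of log wages on a constant, controls, a female dummy, training
    and training x female, i.e. a stripped-down Equation (5.3).

    Returns (gamma, eta): the training coefficient for men and the
    female-male difference in that coefficient.
    """
    n = len(log_wage)
    X = np.column_stack([np.ones(n), controls, female, training, training * female])
    beta, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
    return float(beta[-2]), float(beta[-1])


# Simulated (hypothetical) data
rng = np.random.default_rng(1)
n = 100_000
female = rng.integers(0, 2, n)
alpha_hat = rng.normal(size=n)                      # stand-in for unobserved productivity
training = (rng.uniform(size=n) < 0.2).astype(float)
log_wage = (2.5 + 0.3 * alpha_hat - 0.10 * female
            + 0.04 * training - 0.015 * training * female
            + rng.normal(0.0, 0.2, n))

gamma, eta = training_wage_returns(log_wage, training, female, alpha_hat[:, None])
print(f"men: {gamma:.3f}, women: {gamma + eta:.3f} (log points)")
```

The female return is read off as γ + η rather than from a separate regression, which keeps the controls' coefficients common to both groups, as in Equation (5.3).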

In order to establish a credible quasi-causal link between training and wages, we apply a so-called comparison-group approach (Leuven and Oosterbeek, 2008). We compare those who participated in training to those who intended to attend training but did not, due to more or less random events. This approach has been applied to the WeLL data by Goerlitz (2011).144 Goerlitz (2011) analyses wage returns to training without differentiating between firm- and worker-financed training or between men and women. Using this approach, she finds small positive but insignificant wage returns of about 0.5% for participation in one training course. Random events include training courses which were cancelled by the provider and those which the worker had to cancel due to a high workload.145 Results based on the full sample of workers are generally similar.
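The sample restriction behind the comparison-group approach can be sketched as a simple filter. The Python below is a hypothetical illustration: the record fields (`attended`, `intended`, `cancel_reason`) and the reason labels are invented names, not WeLL variable names, and it computes only a raw mean difference in log wages, whereas the analysis in this section estimates Equation (5.3) with controls on the restricted sample.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

# Cancellation reasons treated as (more or less) random; hypothetical labels
RANDOM_REASONS = {"cancelled_by_provider", "high_workload"}


@dataclass
class Worker:
    log_wage: float
    attended: bool                       # took part in a training course
    intended: bool                       # planned to take part in a course
    cancel_reason: Optional[str] = None  # why a planned course did not take place


def comparison_sample(workers):
    """Split workers into participants and a comparison group of workers
    whose intended training fell through for random reasons; everyone
    else is dropped."""
    treated = [w for w in workers if w.attended]
    control = [w for w in workers
               if not w.attended and w.intended and w.cancel_reason in RANDOM_REASONS]
    return treated, control


def raw_wage_difference(workers):
    treated, control = comparison_sample(workers)
    return mean(w.log_wage for w in treated) - mean(w.log_wage for w in control)


workers = [
    Worker(2.60, True, True),
    Worker(2.70, True, True),
    Worker(2.55, False, True, "cancelled_by_provider"),
    Worker(2.65, False, True, "high_workload"),
    Worker(2.40, False, True, "family"),  # dropped: not counted as random here
    Worker(2.30, False, False),           # dropped: never intended to train
]
print(round(raw_wage_difference(workers), 3))  # 0.05
```

Dropping the family-related cancellation mirrors the choice described in footnote 145; the labels themselves are placeholders.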

The results are presented in columns 3 and 4 of Table 5.6. The estimates for the control variables in our wage regression are in line with other studies. We find higher wages for individuals who hold a tertiary degree and for male workers. Wages follow an inverse U-shaped age profile and increase with experience. Establishment characteristics do not appear to be highly significant, most probably due to the small sample size (n=149) and the sample selection criteria (firms with between 100 and 2,000 employees). Whilst some job tasks are moderately related to wages, we find large differences between occupational groups.

143Additionally, we add information about unionization which we did not include in Equation (5.1), see Footnote 128. 144Dietz and Zwick (2016) use this approach to measure the retention effects of training. 145Goerlitz (2011) used the first wave of WeLL and also considered cancellation of attendance due to family or health reasons. We choose not to include these reasons as they are not available in subsequent waves of WeLL.

Table 5.6 further illustrates that wage returns to firm-sponsored training are small and close to zero (1.1% for men, column 3). Female workers seem to benefit more than men from participation in firm-sponsored training, but the difference compared to their male peers is not significant (p-value 0.35). The results for worker-sponsored training are different (column 4). We observe significant and high wage returns to worker-sponsored training on the order of 4.2% for male workers. Wage returns are lower for female workers (2.7%).146 Again, the gender difference in wage returns to worker-sponsored training is not statistically significant (p-value 0.57). We find similar results for the whole sample (see columns 1 and 2).
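Because the dependent variable in Table 5.6 is the log hourly wage, the coefficients are approximate proportional effects; the exact percentage effect of a coefficient b is exp(b) - 1, which is how footnote 146 arrives at 2.7% for women. The short Python check below takes the coefficients 0.042 and -0.015 from footnote 146; the helper name `pct_effect` is ours. At this magnitude, the approximation and the exact conversion differ by less than 0.1 percentage points.

```python
import math


def pct_effect(log_coef: float) -> float:
    """Exact proportional wage effect implied by a log-wage coefficient."""
    return math.exp(log_coef) - 1.0


gamma_male = 0.042   # return to worker-sponsored training, men (footnote 146)
eta = -0.015         # female-male difference (footnote 146)

print(f"men:   {pct_effect(gamma_male):.1%}")        # 4.3%
print(f"women: {pct_effect(gamma_male + eta):.1%}")  # 2.7%
```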

The low wage returns to firm-sponsored training might be explained by the seminal model of human capital theory (Becker, 1964). If training provided by the firm were purely firm-specific, the firm would bear the total costs and reap the full benefits, and we would thus not observe wage returns to training.147 There are, however, two considerations which contradict this argument. Firstly, the majority of training courses are rather general (see Chapter 4). Secondly, workers may leave the firm which provided the training if they do not see an increase in wages.

What, then, might explain the low returns to firm-sponsored training? Monopsony power of firms could limit poaching (Dietz and Zwick, 2016). Such market power might develop if outside firms have less information on the true productivity of a worker than the training firm. In addition, most of the observed training courses are rather short (the median number of hours per firm-sponsored training course is 16). Expecting a direct effect on wages might therefore be somewhat unrealistic, at least in the short run. Having said that, there might be medium- to long-run career effects which we cannot observe within our short observation period. Beyond that, training might have other payoffs. According to Leuven et al. (2005), for instance, firms might use training to elicit reciprocity from workers: where a firm invests in training, this may improve the worker's opinion of the firm, and he or she may in turn reward the firm with lower turnover intentions or more effort. In addition, training might serve as employment insurance for workers whose productivity would otherwise be expected to drop below a certain threshold. In such a case, we would expect an impact on layoffs but not on wages (at least no wage increases).

146exp(0.042 − 0.015) − 1 ≈ 2.7% 147To test this argument, one could look at firms' value added or other productivity measures (e.g. Dearden et al., 2006). Our sample is, however, rather small and such an analysis would go beyond the scope of this study.

The large wage returns to worker-sponsored training of 3%-4% might seem surprising. There are, however, several arguments that support this observation. Firstly, worker-sponsored training courses tend in general to be longer than firm-sponsored courses; the median number of hours is 20 compared to 16. In particular, very intensive training courses, which might have larger effects, are very often worker-sponsored: a quarter of worker-sponsored courses last 50 hours or more, whilst this is true for only 10% of firm-sponsored training courses. We will investigate the impact of training hours on hourly wages in future research. Secondly, we do not claim that the link between participation in worker-sponsored training and wages is a result of higher productivity. Attending a training course on one's own initiative, and even more so one which takes place at least partially in the worker's leisure time, sends the firm a strong signal of individual labour attachment. This signal is only observable if a worker actually attends a training course, but not if the course is cancelled (even if this was due to random events).

If this is the case, we would expect wage returns to worker-sponsored training to be greater for women, since firms are likely to be less certain about the labour attachment of female workers and the signal should therefore be more valuable for them. In fact, returns tend to be smaller for women. To explain this finding, we investigate workers' motivations for participating in training courses. In the survey, workers were asked to state their main objective in attending training courses. 16.4% of male workers and 14.0% of female workers cited higher wages. For 23.1% of men and 20.5% of women, career prospects were the most important motive. Surprisingly, job security is by far the most prominent motive: 37.9% of men and 42.1% of women cited it as their primary motive. We find that job security is a significantly more important motive for women than it is for men.148 There are, however, no gender differences with regard to career prospects or wages.

In addition, workers were asked to make a subjective assessment of the returns to training (e.g. "From today's point of view, how did the participation in courses or seminars affect the following aspects of your work? How did the training course impact your salary?"). In line with our findings for firm-sponsored training, workers make rather pessimistic assessments of the impact of training courses on wages and career prospects. For 93.5% of male workers and 97.2% of female workers, participation in training had "no" or "rather no" effect on their salary. The corresponding numbers for promotion prospects (90.3% and 85.4%, respectively) are similarly negative. Individuals are, however, more satisfied with the impact on their individual job security, with only 42% of male workers and 34% of female workers stating that training participation had (rather) no effect. This gender difference suggests that women indeed use training participation as a signalling device. The primary aim is generally not, however, to gain a wage increase but rather to improve employment prospects.

148Here we use a linear regression of an indicator of whether job security is a very important motive for participation in training on firm, worker and job characteristics and unobserved productivity.

To sum up, we find no significant wage returns to firm-sponsored training but large returns to worker-sponsored training. The majority of workers cite job security as their main motivation for participating in training. Workers are often fairly satisfied with the impact of training on their job security, but not with its impact on wages or career prospects. Gender differences in the benefits of training are rather small, although men might gain more from investing in training themselves. This might be explained by their greater bargaining power (cf. Card et al., 2016). There are some gender differences in the motivations for participating in training; women in particular attend training in order to improve their job security.

The aim of this study was to investigate whether gender differences in training participation can help explain the GWG. The simple answer to this question is that we do not find any such relationship. Figure 5.2 shows an Oaxaca-Blinder decomposition of the GWG, to which we successively add groups of variables. The final bar adds participation in firm- and worker-sponsored training to the decomposition. We see that the explained part of the GWG does not increase at all. The main reason for this finding is that there are two counterbalancing effects. Female workers participate more in worker-sponsored training, but they gain somewhat less from these courses than male workers. In addition, women benefit slightly more than men from firm-sponsored training, but they attend fewer such courses. Future research should analyse the long-run wage and career effects of (repeated) training investments.149
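The logic of the decomposition in Figure 5.2 (and in Table 5.3) can be sketched as a two-fold Oaxaca-Blinder decomposition with pooled coefficients as the reference. This Python version is an illustration under assumptions rather than the authors' implementation. The function name is hypothetical. The pooled regression here omits a female indicator, which is one of several conventions; the text only states that the reference is a pooled regression. The simulated data with a single experience variable are invented. A group of variables adds to the explained part only through (mean difference between men and women) × (pooled coefficient). This is why more worker-sponsored training among women and less firm-sponsored training can offset each other.

```python
import numpy as np


def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def oaxaca_pooled(y, X, female):
    """Two-fold Oaxaca-Blinder decomposition of the male-female gap in mean y.

    X must include a constant column. The reference coefficients come from
    a pooled OLS regression on both groups (without a female indicator).
    Returns (gap, explained, unexplained, per-variable explained parts).
    """
    men, women = female == 0, female == 1
    b_m, b_f, b_pool = ols(X[men], y[men]), ols(X[women], y[women]), ols(X, y)
    xbar_m, xbar_f = X[men].mean(axis=0), X[women].mean(axis=0)

    gap = y[men].mean() - y[women].mean()
    contributions = (xbar_m - xbar_f) * b_pool   # explained, variable by variable
    explained = contributions.sum()
    unexplained = xbar_m @ (b_m - b_pool) + xbar_f @ (b_pool - b_f)
    return gap, explained, unexplained, contributions


# Simulated (hypothetical) data: men have more experience and a wage premium
rng = np.random.default_rng(2)
n = 20_000
female = rng.integers(0, 2, n)
experience = rng.normal(15.0 - 3.0 * female, 3.0)
log_wage = 2.0 + 0.02 * experience - 0.10 * female + rng.normal(0.0, 0.3, n)
X = np.column_stack([np.ones(n), experience])

gap, explained, unexplained, parts = oaxaca_pooled(log_wage, X, female)
print(f"gap {gap:.3f} = explained {explained:.3f} + unexplained {unexplained:.3f}")
```

Because each group regression includes a constant, the explained and unexplained parts add up exactly to the raw gap, whichever reference coefficients are chosen.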

5.6 Conclusions

Human capital is a major causal driver of individual and group differences in wages. Accordingly, many studies have investigated the impact of differences in formal education on the gender wage gap, looking both at the duration of formal education and at individuals' field of study. Theoretical models have highlighted the significance of participation in training for gender differences in wages. Empirical evidence in this field does, however, remain limited. This is surprising insofar as several studies have found that women are provided with less firm-financed training. In this study, we look at the gender training gap (GTG), i.e. the male-female difference in average participation rates in different forms of training. We distinguish between firm-sponsored training, which takes place during working hours, and training courses which are initiated by the worker and which overlap with workers' leisure time.

149In a previous version of this paper, we used imputed long-run training histories (up to 20 years) and analysed their impact on the gender wage gap. Even this exercise revealed only little explanatory power.

We confirm findings from previous studies by showing that there is a negative GTG in firm-sponsored training: on average, female workers are about 18% less likely than male workers to be trained on the initiative of the firm during working hours (18.2% vs. 22.1%). Whilst this gap can in part be explained by differences between firms, between male and female workers, and between the jobs they perform, a significant proportion of the GTG remains unexplained.

Perhaps surprisingly, firms seem to be less selective in their training decisions than workers: firm-sponsored training is more equally distributed across the workforce. Only 41% of the GTG in firm-sponsored training can be explained by a large set of observable characteristics, whereas in the case of worker-sponsored training we can explain 68% of the GTG by such characteristics.150 It is therefore particularly remarkable that we observe a GTG in firm-sponsored training. We show that these gender differences are driven by the group of less productive workers. Women without a tertiary degree and women who perform routine tasks receive much less training than their male colleagues. There is, however, no GTG amongst the group of highly productive individuals. Job segregation, and in particular the underrepresentation of women in leadership roles, as well as the prevalence of part-time work amongst female workers, explain the GTG to some extent. The same holds for unobserved productivity (although we cannot rule out that this reflects pre-existing discrimination).
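The relative figures in the two preceding paragraphs can be traced back to the raw participation rates in Table 1a and to the decomposition in Table 5.3. The short Python check below does this arithmetic; the variable names are ours. It also makes explicit that the 18% figure measures the firm-sponsored gap relative to men's participation rate: expressed relative to women's rate, men's rate is about 21% higher.

```python
# Participation rates from Table 1a
firm_men, firm_women = 0.221, 0.182      # firm-sponsored training
worker_men, worker_women = 0.105, 0.199  # worker-sponsored training

raw_gap_firm = firm_women - firm_men          # -3.9pp, as in Table 5.3
raw_gap_worker = worker_women - worker_men    # +9.4pp, as in Table 5.3

women_below_men = -raw_gap_firm / firm_men    # about 18% lower than men's rate
men_above_women = firm_men / firm_women - 1   # about 21% higher than women's rate

# Explained parts relative to the raw gaps (Table 5.3, in pp)
explained_share_firm = -1.6 / -3.9            # about 41%
explained_share_worker = 6.4 / 9.4            # about 68%

for label, value in [("women below men", women_below_men),
                     ("men above women", men_above_women),
                     ("explained, firm-sponsored", explained_share_firm),
                     ("explained, worker-sponsored", explained_share_worker)]:
    print(f"{label}: {value:.0%}")
```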

It appears that firms do not support female workers who are not in high-wage jobs or who prefer to maintain more individual flexibility by working part-time. This finding can be explained neither by lower labour attachment per se nor by the preferences of these workers. If this were the case, we would observe similar gender differences in firm-sponsored and in worker-sponsored training, which we do not: less productive men and women participate in worker-sponsored training with similar frequency. Such a finding might be explained by a statistical discrimination model in which firms assign workers, depending on their gender, either to a job which requires much training but offers a high wage, or to a job which requires little training but offers only a low wage (Barron et al., 1993). Female workers may invest in training themselves, either to increase their productivity or to signal that their individual labour attachment is higher than that of other less productive women. This finding is also in line with the considerations of Goldin (2014), who attributes a large share of gender differences in the labour market to individuals' willingness to work long hours. Whilst we cannot rule out that firms discriminate (statistically) against women, there might be further job segregation between male and female workers which we have not captured. Such job heterogeneity might arise due to voluntary choices made by the worker. Future research on gender differences in the labour market should explore this further.

150We have described similar findings in Chapter 4. A decomposition approach based on a multilevel logit model revealed that 35% of the individual heterogeneity in participation in training overlapping with leisure time is due to differences between workers (observable or unobservable). The comparable figure for training during working hours is 23%.

Highly productive female workers participate in firm-sponsored training just as often as their male counterparts and, in addition, they invest much more often in training themselves. It might be the case that standards for promotion are higher for women than for men (Lazear and Rosen, 1990) and career-focused women must attend more training than men in order to improve their career prospects. Signalling may also play an important role here. Highly productive women, in particular, might additionally invest in training themselves in order to signal their true labour attachment and job engagement to the firm.

The benefits of participation in training differ between firm-sponsored and worker-sponsored training, but less so between male and female workers. Whilst we do not find large wage returns to firm-sponsored training, wage effects are higher for attendance of worker-sponsored courses. The latter finding seems to be somewhat more pronounced amongst male workers. Workers themselves state that they have not benefited from training in terms of higher wages or better career prospects. They are, however, more satisfied with their job security after having attended a training course. We also find that job security is a more important objective for participation in a training course than wages or promotions, in particular for women. The fact that female workers value job security more than their male peers may explain why wage returns to worker-sponsored training tend to be lower for women. Future research should focus on identifying the long-term effects of repeated investments in training for men and women, and on heterogeneity in wage returns to different forms of training.

5.7 Appendix

Figures

Figure 5.1: The Gender-Training Gap for Different Forms of Training and Different Sets of Controls

Note: Figure 5.1 shows the female coefficient δ in Equation (5.1). The rows show the different training outcomes and the columns the different sets of controls: (1) raw gap, controlling only for time effects; (2) additionally controlling for socio-demographic and establishment characteristics; (3) additionally controlling for characteristics of the job and the employment relationship; (4) additionally controlling for pre-survey fixed effects. Significance levels: ∗ = significant at the 10% level, ∗∗ = significant at the 5% level, ∗∗∗ = significant at the 1% level. We report cluster-robust standard errors (clustered at the establishment level), except for (4), for which we report bootstrapped standard errors (clustered at the firm level) to account for the fact that the fixed effects are estimated.

Figure 5.2: Oaxaca-Blinder Decomposition of the Gender-Wage-Gap

Note: Oaxaca-Blinder decomposition of log hourly wages, similar to Equation (5.2). The reference group is a pooled regression. Results are available upon request.

Tables

Table 1a: Descriptive Statistics Establishment, Worker and Job Characteristics (Binaries)

| | Female (abs.) | Female (%) | Male (abs.) | Male (%) |
|---|---|---|---|---|
| Observations | 3,511 | 100.0 % | 6,394 | 100.0 % |
| Training Participation | 1,785 | 50.8 % | 3,088 | 48.3 % |
| Firm-sponsored Training | 640 | 18.2 % | 1,412 | 22.1 % |
| Worker-sponsored Training | 697 | 19.9 % | 673 | 10.5 % |
| Year 2007 | 1,184 | 33.7 % | 2,205 | 34.5 % |
| Year 2008 | 929 | 26.5 % | 1,699 | 26.6 % |
| Year 2009 | 819 | 23.3 % | 1,425 | 22.3 % |
| Year 2010 | 579 | 16.5 % | 1,065 | 16.7 % |
| Establishment in Bavaria | 674 | 19.2 % | 1,247 | 19.5 % |
| Establishment in MV | 312 | 8.9 % | 658 | 10.3 % |
| Establishment in NRW | 1,088 | 31.0 % | 2,085 | 32.6 % |
| Establishment in S-H | 196 | 5.6 % | 602 | 9.4 % |
| Establishment in Saxony | 1,241 | 35.3 % | 1,802 | 28.2 % |
| Service Sector | 2,034 | 57.9 % | 2,324 | 36.3 % |
| Public Sector | 1,265 | 36.0 % | 631 | 9.9 % |
| Employees 100-200 | 1,027 | 29.3 % | 2,123 | 33.2 % |
| Employees 200-500 | 1,485 | 42.3 % | 2,287 | 35.8 % |
| Employees 500-2000 | 999 | 28.5 % | 1,984 | 31.0 % |
| Agreement Sector | 2,561 | 72.9 % | 4,382 | 68.5 % |
| Agreement Company | 224 | 6.4 % | 1,037 | 16.2 % |
| Worker Council | 3,167 | 90.2 % | 5,984 | 93.6 % |
| Low-Skilled | 66 | 1.9 % | 70 | 1.1 % |
| Vocational Qualification | 2,686 | 76.5 % | 4,749 | 74.3 % |
| Tertiary Degree | 759 | 21.6 % | 1,575 | 24.6 % |
| Limited Contract | 217 | 6.2 % | 302 | 4.7 % |
| Leadership Responsibilities | 676 | 19.3 % | 2,280 | 35.7 % |
| Part Time | 1,548 | 44.1 % | 162 | 2.5 % |
| Computer Work | 3,051 | 86.9 % | 5,246 | 82.0 % |
| Task production | 432 | 12.3 % | 1,977 | 30.9 % |
| Task monitoring | 772 | 22.0 % | 2,679 | 41.9 % |
| Task serving | 2,098 | 59.8 % | 2,670 | 41.8 % |
| Task repairing | 166 | 4.7 % | 1,394 | 21.8 % |
| Task buying | 611 | 17.4 % | 1,070 | 16.7 % |
| Task consulting | 2,314 | 65.9 % | 3,570 | 55.8 % |
| Task measuring | 1,314 | 37.4 % | 3,227 | 50.5 % |
| Task organizing | 2,369 | 67.5 % | 4,076 | 63.7 % |
| Task negotiating | 902 | 25.7 % | 1,661 | 26.0 % |
| Task informing | 2,135 | 60.8 % | 3,723 | 58.2 % |
| Task researching | 212 | 6.0 % | 979 | 15.3 % |
| Task teaching | 824 | 23.5 % | 1,159 | 18.1 % |
| Occ. Other Manufacturing | 135 | 3.8 % | 491 | 7.7 % |
| Occ. Chemicals / Synthetics | 28 | 0.8 % | 207 | 3.2 % |
| Occ. Metalworking | 33 | 0.9 % | 535 | 8.4 % |
| Occ. Machine Building | 28 | 0.8 % | 688 | 10.8 % |
| Occ. Electric | <10 | <0.3 % | ca. 280 | ca. 4.5 % |
| Occ. Assembler | 109 | 3.1 % | 198 | 3.1 % |
| Occ. Food | 57 | 1.6 % | 104 | 1.6 % |
| Occ. Product Test / Dispatcher | 70 | 2.0 % | 129 | 2.0 % |
| Occ. Technical Professionals | 60 | 1.7 % | 743 | 11.6 % |
| Occ. Technician | 145 | 4.1 % | 619 | 9.7 % |
| Occ. Purchasing | 104 | 3.0 % | 160 | 2.5 % |
| Occ. Dealer | 412 | 11.7 % | 226 | 3.5 % |
| Occ. Drivers / Storemen | 89 | 2.5 % | 692 | 10.8 % |
| Occ. Organization | 824 | 23.5 % | 667 | 10.4 % |
| Occ. Health / Nursery | 965 | 27.5 % | 360 | 5.6 % |
| Occ. Education / Social | 280 | 8.0 % | 143 | 2.2 % |
| Occ. Other Service | 162 | 4.6 % | 152 | 2.4 % |

Note: State S-H refers to Schleswig-Holstein, NRW to North Rhine-Westphalia and MV to Mecklenburg-Vorpommern. The exact numbers for the electric occupations are anonymized due to the small number of females.

Table 1b: Descriptive Statistics Establishment, Worker and Job Characteristics (Continuous)

| | Female: Mean | Female: SD | Male: Mean | Male: SD |
|---|---|---|---|---|
| Hourly Wage in Euro | 13.10 | 5.10 | 16.82 | 8.31 |
| Proxy Unobs. Heterogeneity (Std.) | -0.12 | 0.92 | 0.42 | 0.86 |
| Age (Years) | 44.1 | 7.0 | 44.4 | 6.7 |
| Work Experience | 11.7 | 4.5 | 14.9 | 3.3 |
| Tenure | 11.0 | 5.5 | 11.4 | 5.6 |
| Median Wage | 83.3 | 21.1 | 78.9 | 17.8 |
| Median Wage Trend | 1.9 | 1.2 | 2.1 | 1.4 |
| Wage Compression (Std.) | 0.25 | 1.1 | -0.01 | 1.0 |

Note: Establishment characteristics are weighted by sex-specific employee size. Standardization has been done for the total workforce, see Appendix 5.7.4.

Table 5.2: Participation in Job-related Training Courses

Dependent Variable: Participation in Job-related Training Courses

| | Overall | Firm-sponsored | Worker-sponsored |
|---|---|---|---|
| Year 2008 | -0.057* (0.031) | 0.018 (0.032) | -0.043* (0.023) |
| Year 2009 | -0.024 (0.030) | 0.030 (0.030) | -0.035 (0.022) |
| Year 2010 | -0.082*** (0.028) | 0.025 (0.028) | -0.056*** (0.020) |
| Months | 0.036*** (0.013) | 0.030** (0.014) | 0.009 (0.010) |
| State Saxony | 0.043* (0.022) | 0.032* (0.019) | 0.027** (0.013) |
| State Bavaria | 0.018 (0.021) | 0.015 (0.018) | -0.002 (0.013) |
| State NRW | 0.006 (0.021) | -0.001 (0.018) | 0.017 (0.013) |
| State MV | 0.012 (0.024) | 0.001 (0.022) | 0.045*** (0.016) |
| Service Sector | 0.027* (0.015) | 0.009 (0.014) | -0.001 (0.010) |
| Public Sector | -0.028* (0.017) | -0.067*** (0.015) | 0.029* (0.016) |
| Employees 100-200 | 0.009 (0.013) | -0.009 (0.011) | 0.007 (0.009) |
| Employees 500-2000 | -0.006 (0.012) | -0.008 (0.011) | 0.009 (0.008) |
| Median Wage | 0.000 (0.000) | 0.000 (0.000) | 0.000 (0.000) |
| Median Wage Trend | 0.017*** (0.004) | 0.014*** (0.004) | 0.009*** (0.003) |
| Wage Compression | 0.002 (0.007) | 0.014** (0.006) | -0.004 (0.004) |
| Workers’ Council | 0.033 (0.020) | 0.024 (0.017) | 0.011 (0.013) |
| Tertiary Educ. | 0.003 (0.015) | -0.030** (0.013) | 0.044*** (0.012) |
| No Voc. Qualification | -0.057* (0.034) | -0.069*** (0.023) | -0.017 (0.015) |
| Age | -0.169* (0.102) | -0.167* (0.091) | -0.041 (0.078) |
| Age Squared | 0.108 (0.092) | 0.124 (0.081) | 0.022 (0.069) |
| Experience | 0.000 (0.008) | -0.003 (0.007) | -0.017*** (0.006) |
| Limited Contract | 0.002 (0.023) | -0.021 (0.020) | 0.042** (0.018) |
| Leader | 0.060*** (0.013) | 0.038*** (0.011) | 0.009 (0.008) |
| Part Time | -0.031* (0.019) | -0.029* (0.015) | 0.018 (0.013) |
| Computer Work | 0.099*** (0.015) | 0.035*** (0.012) | 0.048*** (0.008) |
| Task production | -0.058*** (0.016) | -0.050*** (0.014) | -0.007 (0.009) |
| Task monitoring | -0.007 (0.012) | -0.009 (0.010) | -0.017** (0.008) |
| Task serving | 0.028** (0.011) | 0.028*** (0.010) | 0.001 (0.008) |
| Task repairing | 0.007 (0.016) | 0.000 (0.014) | -0.010 (0.009) |
| Task buying | 0.006 (0.014) | -0.009 (0.012) | 0.013 (0.011) |
| Task consulting | 0.046*** (0.012) | 0.022** (0.011) | 0.006 (0.009) |
| Task measuring | 0.003 (0.010) | 0.015 (0.010) | -0.012 (0.008) |
| Task organizing | 0.047*** (0.013) | 0.004 (0.011) | 0.021** (0.009) |
| Task negotiating | 0.04*** (0.012) | 0.032*** (0.012) | 0.003 (0.010) |
| Task informing | 0.06*** (0.013) | 0.010 (0.011) | 0.023*** (0.009) |
| Task researching | 0.035** (0.017) | 0.018 (0.017) | 0.021 (0.013) |
| Task teaching | 0.001 (0.013) | -0.001 (0.012) | 0.008 (0.010) |
| Occ. Other Manufacturing | -0.078** (0.038) | -0.085*** (0.031) | 0.052** (0.022) |
| Occ. Chemicals / Synthetics | -0.019 (0.042) | -0.051 (0.040) | 0.082*** (0.029) |
| Occ. Metalworking | -0.100** (0.039) | -0.071** (0.034) | 0.038 (0.023) |
| Occ. Machine Building | -0.059 (0.037) | -0.063** (0.031) | 0.072*** (0.023) |
| Occ. Electrician etc. | 0.018 (0.043) | -0.072* (0.037) | 0.125*** (0.028) |
| Occ. Assembler | -0.177*** (0.041) | -0.124*** (0.034) | 0.036 (0.024) |
| Occ. Food | -0.137*** (0.042) | -0.100*** (0.036) | 0.032 (0.025) |
| Occ. Product Test / Dispatcher | -0.052 (0.044) | -0.061 (0.040) | 0.088*** (0.028) |
| Occ. Technical Professionals | 0.015 (0.033) | -0.036 (0.030) | 0.009 (0.024) |
| Occ. Technician | -0.021 (0.034) | -0.066** (0.029) | 0.093*** (0.023) |
| Occ. Dealer | 0.001 (0.039) | -0.036 (0.037) | 0.067** (0.028) |
| Occ. Drivers / Storemen | -0.085** (0.033) | -0.100*** (0.028) | 0.083*** (0.023) |
| Occ. Organization | 0.028 (0.027) | -0.080*** (0.024) | 0.065*** (0.021) |
| Occ. Health, Nursery | 0.120*** (0.027) | -0.083*** (0.024) | 0.225*** (0.020) |
| Occ. Education / Social | 0.143*** (0.033) | -0.094*** (0.029) | 0.185*** (0.029) |
| Occ. Other Service | -0.094*** (0.036) | -0.106*** (0.029) | 0.041* (0.023) |
| Female | -0.021 (0.014) | -0.021* (0.012) | 0.033*** (0.010) |
| Proxy Unobs. Heterogeneity | 0.077*** (0.019) | 0.029* (0.015) | 0.052*** (0.016) |
| Constant | 0.281*** (0.058) | 0.174*** (0.049) | -0.017 (0.037) |
| N | 9905 | 9905 | 9905 |
| F-Statistic | 64.2 | 17.9 | 27.4 |
| R2 | 0.17 | 0.05 | 0.11 |

Note: Mandatory-job training courses are excluded. The regression includes a variable which takes into account the number of months for which a worker was asked retrospectively. Reference occupation is "other services". We report bootstrapped standard errors (300 replications). Significance levels: ∗ = significant at the 10% level, ∗∗ = significant at the 5% level, ∗∗∗ = significant at the 1% level.

Table 5.3: Decomposition of the Gender Training Gap

| | Firm-sponsored | Worker-sponsored |
|---|---|---|
| Raw Gap | -3.9pp | +9.4pp |
| Explained | -1.6pp | +6.4pp |
| Firms | -1.2pp | +0.6pp |
| Worker | +0.4pp | +1.3pp |
| Productivity | -0.4pp | -1.0pp |
| Jobs | -0.4pp | +5.4pp |
| Occupation | -0.1pp | +3.8pp |
| Tasks | +1.4pp | +1.1pp |
| Leadership | -0.6pp | -0.2pp |
| Part-Time | -1.2pp | +0.6pp |
| Unexplained | -2.2pp | +2.9pp |

Note: Table 5.3 shows the results of the Oaxaca-Blinder decomposition (Equation (5.2)) for firm- and worker-sponsored training. The reference group is a pooled regression.

Table 5.4: Gender Training Gap by Content of Training

| | Firm-sponsored | Worker-sponsored |
|---|---|---|
| IT training | +0.1% | +1.0%*** |
| Foreign language training | -0.1% | +1.6%*** |
| Commercial training | +0.3% | -0.1% |
| Technical training | -2.3%*** | -0.3% |
| Social training | +0.1% | +0.7% |
| Managerial training | -0.6% | -0.8%*** |
| Health/safety training | -0.4% | +0.7% |

Note: Table 5.4 shows the female coefficient δ in Equation (5.1). The rows show the different training outcomes. The set of controls includes time effects, worker and establishment characteristics, job characteristics and occupations (incl. managerial responsibilities and job tasks), and pre-survey fixed effects. We report bootstrapped standard errors (300 replications). Significance levels: ∗ = significant at the 10% level, ∗∗ = significant at the 5% level, ∗∗∗ = significant at the 1% level.

Table 5.5: Gender Training Gap and Presence of Children

                            Firm-sponsored   Worker-sponsored
Child aged 0-3              -0.037           -0.007
Child aged 4-6              +0.016           -0.018
Child aged 7-10             -0.008           +0.004
Child aged 11-18            +0.008           -0.028
Female                      -0.019           +0.029
Female x child aged 0-3     -0.074           -0.073
Female x child aged 4-6     -0.057           +0.004
Female x child aged 7-10    -0.028           -0.055
Female x child aged 11-18   +0.025           +0.033
Note: The set of controls includes time effects, worker and establishment characteristics, job characteristics and occupations (incl. managerial responsibilities and job tasks), and pre-survey fixed effects. We do not report (bootstrapped) standard errors here due to the small number of parents of young children.

Table 5.6: Wage-Returns to Training

Dependent variable: log(hourly wages)
                                    Firm-spons.       Worker-spons.     Firm-spons.       Worker-spons.
Year 2008                           -0.016 (0.02)     0.008 (0.02)      -0.011 (0.02)     0.064** (0.03)
Year 2009                           -0.004 (0.02)     0.012 (0.02)      0.008 (0.02)      0.062** (0.03)
Year 2010                           0.016 (0.02)      0.040* (0.02)     0.010 (0.02)      0.088*** (0.03)
Months                              -0.014 (0.01)     -0.004 (0.01)     -0.004 (0.01)     0.029** (0.01)
State Saxony                        -0.092*** (0.01)  -0.082*** (0.02)  -0.134*** (0.02)  -0.127*** (0.03)
State Bavaria                       -0.032** (0.01)   -0.021 (0.02)     -0.035* (0.02)    0.000 (0.03)
State NRW                           -0.027** (0.01)   -0.024 (0.02)     -0.005 (0.02)     0.009 (0.03)
State MV                            -0.100*** (0.02)  -0.091*** (0.02)  -0.152*** (0.03)  -0.140*** (0.04)
Service Sector                      -0.088*** (0.01)  -0.089*** (0.01)  -0.065*** (0.02)  -0.059*** (0.02)
Public Sector                       0.077*** (0.01)   0.076*** (0.01)   0.027 (0.02)      0.024 (0.02)
Agreement Sector                    0.043*** (0.01)   0.034*** (0.01)   0.044** (0.02)    0.015 (0.02)
Agreement Company                   0.040*** (0.01)   0.039*** (0.01)   0.051** (0.02)    0.052** (0.02)
Employees 100-200                   -0.001 (0.01)     -0.002 (0.01)     -0.007 (0.01)     -0.006 (0.02)
Employees 500-2000                  -0.004 (0.01)     -0.010 (0.01)     -0.003 (0.01)     -0.020 (0.02)
Median Wage                         0.005*** (0.00)   0.006*** (0.00)   0.003*** (0.00)   0.005*** (0.00)
Median Wage Trend                   0.057*** (0.00)   0.061*** (0.00)   0.049*** (0.00)   0.066*** (0.01)
Wage Compression                    0.000 (0.00)      0.001 (0.00)      -0.005 (0.01)     -0.006 (0.01)
Workers’ Council                    0.051*** (0.01)   0.038** (0.01)    0.063*** (0.02)   0.043* (0.03)
Tertiary Educ.                      0.127*** (0.01)   0.143*** (0.01)   0.098*** (0.02)   0.151*** (0.02)
No Voc. Qualification               -0.096*** (0.03)  -0.104*** (0.03)  -0.144** (0.06)   -0.177*** (0.06)
Age                                 0.288*** (0.08)   0.321*** (0.08)   0.245** (0.11)    0.392*** (0.13)
Age Squared                         -0.256*** (0.07)  -0.280*** (0.07)  -0.216** (0.10)   -0.337*** (0.12)
Experience                          0.082*** (0.01)   0.073*** (0.01)   0.082*** (0.01)   0.062*** (0.01)
Limited Contract                    -0.067*** (0.02)  -0.045* (0.02)    -0.106** (0.04)   -0.043 (0.04)
Leader                              0.074*** (0.01)   0.078*** (0.01)   0.061*** (0.01)   0.065*** (0.01)
Part Time                           -0.170*** (0.02)  -0.171*** (0.02)  -0.001 (0.02)     -0.015 (0.03)
Computer Work                       -0.019 (0.01)     -0.030** (0.01)   -0.003 (0.03)     -0.056 (0.04)
Task production                     0.000 (0.01)      0.008 (0.01)      -0.005 (0.02)     0.024 (0.02)
Task monitoring                     -0.004 (0.01)     -0.002 (0.01)     -0.009 (0.01)     -0.001 (0.02)
Task serving                        0.002 (0.01)      0.007 (0.01)      -0.011 (0.01)     -0.001 (0.02)
Task repairing                      0.002 (0.01)      -0.001 (0.01)     0.005 (0.01)      0.003 (0.02)
Task buying                         -0.003 (0.01)     -0.011 (0.01)     0.008 (0.01)      -0.013 (0.02)
Task consulting                     0.010 (0.01)      0.014* (0.01)     -0.006 (0.01)     0.004 (0.02)
Task measuring                      -0.029*** (0.01)  -0.032*** (0.01)  -0.021** (0.01)   -0.026* (0.01)
Task organizing                     0.030*** (0.01)   0.029*** (0.01)   0.034*** (0.01)   0.027 (0.02)
Task negotiating                    0.030*** (0.01)   0.031*** (0.01)   0.009 (0.01)      0.011 (0.01)
Task informing                      0.018** (0.01)    0.023*** (0.01)   0.010 (0.01)      0.024 (0.02)
Task researching                    0.032*** (0.01)   0.028** (0.01)    0.040** (0.02)    0.047** (0.02)
Task teaching                       0.007 (0.01)      0.008 (0.01)      0.009 (0.01)      0.023 (0.02)
Occ. Other Manufacturing            -0.093*** (0.03)  -0.088*** (0.02)  -0.138*** (0.04)  -0.148*** (0.06)
Occ. Chemicals / Synthetics         -0.092*** (0.03)  -0.082** (0.03)   -0.163*** (0.04)  -0.158*** (0.06)
Occ. Metalworking                   -0.056** (0.02)   -0.052** (0.02)   -0.070* (0.04)    -0.080 (0.06)
Occ. Machine Building               -0.065*** (0.02)  -0.060*** (0.02)  -0.083** (0.03)   -0.109** (0.05)
Occ. Electrician etc.               0.001 (0.02)      0.009 (0.03)      -0.048 (0.04)     -0.052 (0.06)
Occ. Assembler                      -0.149*** (0.03)  -0.152*** (0.03)  -0.187*** (0.05)  -0.264*** (0.05)
Occ. Food                           -0.119*** (0.04)  -0.149*** (0.04)  -0.166*** (0.06)  -0.347*** (0.07)
Occ. Product Test / Dispatcher      -0.119*** (0.04)  -0.115*** (0.04)  -0.088** (0.04)   -0.077 (0.05)
Occ. Technical Professionals        0.132*** (0.02)   0.121*** (0.02)   0.106*** (0.04)   0.062 (0.05)
Occ. Technician                     0.063*** (0.02)   0.064*** (0.02)   0.018 (0.03)      0.012 (0.04)
Occ. Dealer                         0.030 (0.03)      0.046 (0.03)      0.003 (0.04)      0.020 (0.05)
Occ. Drivers / Storemen             -0.095*** (0.02)  -0.087*** (0.02)  -0.136*** (0.04)  -0.166*** (0.05)
Occ. Organization                   0.096*** (0.02)   0.080*** (0.02)   0.078** (0.03)    0.019 (0.04)
Occ. Health, Nursery                0.063*** (0.02)   0.073*** (0.02)   0.027 (0.03)      0.031 (0.03)
Occ. Education / Social             0.150*** (0.03)   0.149*** (0.02)   0.078** (0.04)    0.060 (0.04)
Occ. Other Service                  -0.232*** (0.03)  -0.250*** (0.03)  -0.243*** (0.04)  -0.353*** (0.06)
Female                              -0.071*** (0.01)  -0.071*** (0.01)  -0.062*** (0.02)  -0.060** (0.02)
Proxy Unobs. Heterogeneity          0.280*** (0.02)   0.276*** (0.02)   0.387*** (0.03)   0.349*** (0.03)
Firm-sponsored Training             0.007 (0.01)                        0.011 (0.01)
Firm-sponsored Training x Female    0.018 (0.02)                        0.023 (0.02)
Worker-sponsored Training                             0.034** (0.01)                      0.042** (0.02)
Worker-sponsored Training x Female                    -0.007 (0.02)                       -0.015 (0.03)
Constant                            2.020*** (0.04)   1.970*** (0.04)   2.130*** (0.07)   2.000*** (0.09)
N                                   8535              7853              2815              2133
F-Statistic                         84.2              84.3              112.5             72
R2                                  0.74              0.73              0.77              0.74
Comparison Group Approach           No                No                Yes               Yes
Note: Workers who participate in both firm- and worker-sponsored training are excluded. The regressions include a variable for the number of months about which a worker was asked retrospectively. The reference occupation is "other services". We report bootstrapped standard errors (300 replications). Significance levels: * = significant at the 10% level, ** = significant at the 5% level, *** = significant at the 1% level.

Further Analyses

5.7.1 Presence of Children

Here, we consider the role of career interruptions and investigate how the GTG develops after a woman has had a child. For this purpose, we add to Equation (5.1) controls for having children aged 0-2, 3-5, 6-9 and 10-18, which we interact with being female (see Table 5.5). Our results indicate that female workers receive less firm-sponsored training even if they do not have children in the household (-1.9pp, n.s.). In addition, we find that mothers with children below the age of 10 participate less often in firm-sponsored training than fathers with children in this age category. Finally, the results show that the GTG for firm-sponsored training is particularly pronounced amongst parents with toddlers (an additional -7.4pp, although the sample size is quite small). This finding is notable because even male workers with small children up to the age of three participate less in firm-sponsored training (-4.2pp, n.s.). For worker-sponsored training, we again find a significant (positive) GTG of +2.9pp amongst workers without children. We do not find differences between male workers without children and those with children aged below 10.151 It would seem that being a mother of small children also lowers a woman's incentive to invest in training herself.

Our results also suggest that, as her children grow older, a mother may catch up on training that she missed when they were younger.152 The coefficients become positive for women with children aged 11 to 18. While the effect is smaller for firm-sponsored training (+2.4pp), mothers of teenagers have participation rates for worker-sponsored training that are an additional 3.5pp higher than those of fathers with children of a similar age.
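The interaction logic behind Table 5.5 can be checked with simple arithmetic. In a linear probability model with female x child-age interactions, the implied gap between mothers and fathers whose child falls into a given age bracket is the female main effect plus the matching interaction. The short Python sketch below uses only the point estimates reported in Table 5.5 (the table reports no standard errors) and assumes a parent has children in a single age bracket; with children in several brackets, the relevant interactions would add up.

```python
# Point estimates from Table 5.5 (linear probability model; standard errors not reported).
FEMALE = {"firm": -0.019, "worker": 0.029}
FEMALE_X_CHILD = {
    "0-3": {"firm": -0.074, "worker": -0.073},
    "4-6": {"firm": -0.057, "worker": 0.004},
    "7-10": {"firm": -0.028, "worker": -0.055},
    "11-18": {"firm": 0.025, "worker": 0.033},
}


def implied_gap(age_group: str, training: str) -> float:
    """Gender gap between mothers and fathers whose children are all in `age_group`:
    the female main effect plus the matching female x child interaction."""
    return FEMALE[training] + FEMALE_X_CHILD[age_group][training]


for group in FEMALE_X_CHILD:
    print(f"child {group:>5}: firm {implied_gap(group, 'firm'):+.3f}, "
          f"worker {implied_gap(group, 'worker'):+.3f}")
```

For parents of toddlers, for example, the implied firm-sponsored gap is -0.019 + (-0.074) = -0.093.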

5.7.2 Turnover Probabilities

Gender differences in training could arise from differences in turnover patterns, because firms may fear that investments in training are lost if a worker leaves after being trained (Royalty, 1996). To test whether different turnover rates between men and women drive our results, we looked at turnover patterns in WeLL establishments between 2006 and 2010. We ran a logit regression of whether a worker leaves a WeLL establishment on a cubic in age (interacted with being female), tenure and tenure squared, occupational status and working hours (skilled or unskilled blue-collar worker, white-collar worker, part-time with or without top-up benefits, freelancer), schooling and being female. This gives us firm- and sex-specific turnover probabilities which may vary with age.

151 It is interesting that fathers of children aged 10-18 seem to participate somewhat less in training than fathers of younger children.
152 Fitzenberger and Muehler (2014) do not find evidence for such catch-up effects after parental leave. They use four years of administrative personnel records from a multinational firm.

Next, we predict the turnover probability for each individual. Male workers have a lower turnover probability than female workers (the median predicted turnover probability is 4.7% for men and 6.7% for women). Age differences between men and women are pronounced (conditional on tenure and job status): we find somewhat higher turnover probabilities for women in their early thirties, while men leave WeLL establishments with a considerably higher probability between the ages of 40 and 55.
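A minimal sketch of the turnover model described above, assuming simulated data: a logit of leaving the establishment on a cubic in age interacted with being female, plus tenure and tenure squared, fitted by Newton-Raphson in NumPy, followed by predicted probabilities and their medians by sex. Occupational status, working hours and schooling are omitted for brevity, and all variable names and parameter values are hypothetical; in practice a packaged logit routine would normally be used.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Maximum-likelihood logit via Newton-Raphson; X must contain a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # negative Hessian
        beta += np.linalg.solve(hess, grad)
    return beta


# Simulated worker-level data (hypothetical names and parameters).
rng = np.random.default_rng(1)
n = 5000
female = (rng.random(n) < 0.45).astype(float)
age = rng.uniform(25, 55, n)
tenure = rng.uniform(0, 20, n)
a = (age - 40.0) / 10.0   # centred, rescaled age keeps the cubic well conditioned
X = np.column_stack([
    np.ones(n), female,
    a, a**2, a**3,                               # cubic in age ...
    a * female, a**2 * female, a**3 * female,    # ... interacted with being female
    tenure, tenure**2 / 100.0,
])
index = -3.0 + 0.3 * female - 0.05 * tenure + 0.2 * a**2
left = (rng.random(n) < 1.0 / (1.0 + np.exp(-index))).astype(float)

beta = fit_logit(X, left)
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))         # predicted turnover probability per worker
print("median predicted turnover, men:  ", round(float(np.median(p_hat[female == 0])), 3))
print("median predicted turnover, women:", round(float(np.median(p_hat[female == 1])), 3))
```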

When we include this measure of firm- and sex-specific turnover probabilities, our main results remain robust. This indicates that differences in turnover rates are not the main driver of training differences.

Further Information

5.7.3 Population of WeLL-Establishments

The link to social security data allows us to observe the employment biographies of all employees working at WeLL establishments between 2006 and 2010 (whether or not they are survey participants). Employment biographies date back to the 1970s for Western Germany and to the early 1990s for Eastern Germany. They cover all employment spells subject to social security (excluding civil servants and the self-employed). The data contain information about daily wages, job status (part-time employment, white-collar, unskilled or skilled blue-collar work), education and qualification, and firm switches. This makes it possible to calculate exact tenure and work experience for each worker.

We use this information to extract pre-survey wage fixed effects, to measure the amount of wage compression among full-time workers in an establishment and to calculate sex- and establishment-specific turnover rates. For the analyses based on the population of the WeLL establishments, we use information from 2000 to 2010 (depending on the analysis). The extraction of pre-survey wage fixed effects is based on all employment spells; for the other measures we only refer to those workers actually working in one of the 149 WeLL establishments.

We drop individuals aged below 20 or above 60, employees who work from home and employees who have a daily wage of less than 1 Euro. This leaves us with a population of ca. 65,900 (unique) workers in 2006, of whom 54,300 are employed in one of the 149 WeLL establishments.153 The number of workers per establishment ranges from ca. 100 to 2,000.

Wages in the social security data are top-coded at the social security threshold. As in Dustmann et al. (2009), we estimate separate tobit regressions for men and women and for three different education groups, independently for each year between 2000 and 2010, for full-time employees, controlling for the state of the employer, age and age squared (interacted with working in Eastern Germany), job status (e.g. part-time, white-collar/blue-collar, master craftsmen), highest schooling degree and occupation. We correct the educational information in the social security data following Fitzenberger et al. (2006).
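The tobit step yields, for each cell, coefficients and a residual standard deviation. One common way to complete the imputation, in the spirit of Dustmann et al. (2009), is to replace each wage at the threshold by a random draw from the fitted normal distribution truncated from below at the threshold. The Python sketch below (standard library only) shows only this imputation step; it assumes the tobit linear prediction and residual standard deviation have already been estimated, and the threshold, predictions and standard deviation are hypothetical numbers.

```python
import random
from statistics import NormalDist


def impute_censored(mu, sigma, threshold, rng):
    """Draw a log wage from N(mu, sigma^2) truncated below at `threshold`
    (inverse-CDF method), for an observation censored at the contribution ceiling."""
    std = NormalDist()
    lower = std.cdf((threshold - mu) / sigma)      # probability mass below the threshold
    u = lower + (1.0 - lower) * rng.random()       # uniform draw on [lower, 1)
    u = min(max(u, 1e-12), 1.0 - 1e-12)            # keep inv_cdf's argument inside (0, 1)
    return mu + sigma * std.inv_cdf(u)


rng = random.Random(42)
threshold = 5.0   # hypothetical log daily wage at the social security threshold
sigma = 0.35      # hypothetical residual s.d. from a tobit fit for one sex x education x year cell
# (observed log wage, tobit linear prediction x'b), hypothetical values;
# wages at the threshold are censored and get replaced by a draw.
obs = [(4.20, 4.10), (5.00, 4.80), (4.65, 4.70), (5.00, 5.10)]
imputed = [impute_censored(xb, sigma, threshold, rng) if w >= threshold else w
           for w, xb in obs]
print([round(w, 3) for w in imputed])
```

In the setup described above, this step would be repeated separately for each sex x education x year cell with that cell's tobit estimates.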

5.7.4 Proxy for Unobserved Heterogeneity (Pre-Survey Wage Fixed-Effects)

To assess unobserved productivity differences between workers, we look at the wage biographies of workers prior to the survey period (see also our discussion in Section 5.4.5 for another interpretation). We run a fixed-effects regression of log daily wages on experience, experience squared, tenure and job status (e.g. blue-collar or white-collar, master craftsmen, part-time worker) for the period 2000-2006.154 To increase the efficiency of our estimates, we include in this analysis not only survey participants in WeLL but also the biographies of all other employees who work in one of the WeLL establishments during the survey period. This leaves us with 437,793 observations.

We standardize the fixed effects (for the total population) by subtracting the mean and dividing by the standard deviation. For the sample of survey participants used here, the mean standardized pre-survey wage fixed effect is 0.42 for men and -0.12 for women (the median is almost equal to the mean). At 0.92, the standard deviation of the fixed effects is larger for women than for men (0.86).155
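A sketch of how pre-survey wage fixed effects of this kind can be obtained and standardized, assuming a simulated panel: the within estimator demeans log wages and regressors by individual, and each worker's effect is then recovered as the individual mean of log wages minus the individual mean of the regressors times the estimated slopes. Regressors are reduced to experience, experience squared and a part-time dummy; tenure is left out because, in this simulated panel without job changes, it would rise one-for-one with experience within each worker and be collinear after demeaning. Names and parameter values are hypothetical.

```python
import numpy as np


def within_fixed_effects(ids, X, y):
    """Within (fixed-effects) estimator: demean y and X by individual, estimate beta
    by OLS, then recover each individual's effect as mean(y_i) - mean(X_i) @ beta."""
    uniq, inv = np.unique(ids, return_inverse=True)
    counts = np.bincount(inv)

    def group_means(a):
        sums = np.zeros((len(uniq),) + a.shape[1:])
        np.add.at(sums, inv, a)
        return sums / counts.reshape((-1,) + (1,) * (a.ndim - 1))

    y_bar, X_bar = group_means(y), group_means(X)
    beta, *_ = np.linalg.lstsq(X - X_bar[inv], y - y_bar[inv], rcond=None)
    alpha = y_bar - X_bar @ beta
    return uniq, alpha, beta


rng = np.random.default_rng(2)
n_workers, n_years = 500, 7                       # e.g. the pre-survey years 2000-2006
ids = np.repeat(np.arange(n_workers), n_years)
t = np.tile(np.arange(n_years), n_workers)
exper = rng.uniform(0, 25, n_workers)[ids] + t    # potential experience in years
part_time = (rng.random(ids.size) < 0.15).astype(float)
alpha_true = rng.normal(0.0, 0.3, n_workers)      # unobserved, time-invariant productivity
log_wage = (4.0 + alpha_true[ids] + 0.04 * exper - 0.0007 * exper**2
            - 0.3 * part_time + rng.normal(0.0, 0.05, ids.size))

# No constant column: it is absorbed into the individual effects.
X = np.column_stack([exper, exper**2, part_time])
_, alpha_hat, beta_hat = within_fixed_effects(ids, X, log_wage)
z = (alpha_hat - alpha_hat.mean()) / alpha_hat.std()   # standardized fixed effects
```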

153 This figure is comparable to the population of WeLL employees from which survey participants for the first wave are drawn (Huber and Schmucker, 2012).
154 We do not include further variables such as education, occupation or firm characteristics, since those are mostly time-invariant.
155 The standard deviation of the pre-survey wage fixed effects for the sample of survey participants considered in this analysis is smaller than 1. This is because, for the analyses, we drop certain individuals who probably have quite distinct (very low or very high) fixed effects: those with low wages (less than one Euro per day), low working hours (less than 15 hours a week), low or high age (20-60 to calculate the fixed effects but 25-54 in the final analyses), foreigners etc.

According to Wooldridge (2002), a proxy variable has to fulfil two properties. The first is ignorability (also called redundancy): the proxy variable $z$ must be irrelevant for the outcome $y$ once the unobserved productivity $q$ is available, $E(y \mid X, q, z) = E(y \mid X, q)$. This requirement is comparable to that for an instrumental variable, which should also be redundant if the unobservables were added to the model. The second property requires the correlation between the unobserved variable $q$ and each observable $x_j$ to be zero after conditioning on $z$. Wooldridge (2002) describes this requirement via a linear projection, $L(q \mid 1, x_1, \ldots, x_k, z) = L(q \mid 1, z)$. The latter property requires a high correlation between the proxy and the unobservable. This is different from the IV approach, where the correlation between the instrument and the unobservable must be zero. We apply a two-stage bootstrap procedure to account for the fact that the pre-survey fixed effects are estimated from the data.
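Because the proxy is itself estimated, second-stage standard errors should reflect the first-stage estimation error. One way to implement a two-stage bootstrap, sketched below under simplifying assumptions, is to resample workers with replacement, re-estimate the fixed effects in every replication (here with a within estimator on a simulated balanced panel with one time-varying regressor) and then re-run the second-stage regression; the standard deviation of the coefficients across the 300 replications serves as the standard error. In the application described in this chapter, the first stage uses a much larger population than the survey sample, so this single-sample setup and all names and values are hypothetical.

```python
import numpy as np

# Simulated data (hypothetical): balanced pre-survey wage panel plus a survey cross-section.
rng = np.random.default_rng(3)
n, T = 400, 6
ability = rng.normal(0.0, 1.0, n)                              # unobserved productivity q
exper = rng.uniform(0, 20, n)[:, None] + np.arange(T)          # experience, shape (n, T)
pre_wages = ability[:, None] + 0.03 * exper + rng.normal(0.0, 0.5, (n, T))
female = (rng.random(n) < 0.5).astype(float)
training = 0.3 - 0.04 * female + 0.05 * ability + rng.normal(0.0, 0.3, n)


def two_stage(idx):
    """Stage 1: within estimator on the (resampled) wage panel, one fixed effect per
    worker, standardized. Stage 2: OLS of training on a constant, female and the proxy."""
    w, x = pre_wages[idx], exper[idx]
    wd = w - w.mean(axis=1, keepdims=True)
    xd = x - x.mean(axis=1, keepdims=True)
    slope = (xd * wd).sum() / (xd**2).sum()
    fe = w.mean(axis=1) - slope * x.mean(axis=1)
    proxy = (fe - fe.mean()) / fe.std()
    Z = np.column_stack([np.ones(len(idx)), female[idx], proxy])
    coef, *_ = np.linalg.lstsq(Z, training[idx], rcond=None)
    return coef


point = two_stage(np.arange(n))
draws = np.array([two_stage(rng.integers(0, n, n)) for _ in range(300)])  # 300 replications
se = draws.std(axis=0, ddof=1)
print("female:", point[1].round(3), "bootstrap s.e.:", se[1].round(3))
```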

5.7.5 Training Definition

Basic information about training participation (start and end date) is available for all training courses, while detailed information has been collected for up to the three most recent training courses per wave. Only 4% of workers in our sample participate in more than three training courses, and we exclude those workers from our analyses because we lack information about the financing of the fourth and any further training courses.156

We have limited the analyses here to the incidence of participation in training. Results are, however, very similar if we consider the number of training courses attended by a worker (up to three). We also find a large GTG in terms of the duration (in hours) of firm-sponsored training courses attended by male and female workers: male workers participate on average for 8.3 hours in firm-sponsored training and female workers for only 5.7 hours. For worker-sponsored training the picture is reversed: men attend these courses on average for 10.4 hours and women for 16.0 hours.

As noted above, we do not consider here training courses which are initiated by the worker and which take place completely in working hours ("shared investments"). The GTG in shared-investment courses generally lies somewhere between those for firm- and worker-sponsored training, and we do not observe significant gender differences.

156 We keep individuals who participate in more than three courses if they stated that they participated in both firm-sponsored and worker co-financed training courses (among the three courses with detailed information), because here we do not face this uncertainty. Our previous research using the WeLL data showed that we get similar results about training patterns whether those workers are included or not (see Chapter 4).

5.7.6 Occupations and Job Tasks

To account for possible differences in job content between men and women, we add information about occupations and the tasks performed at work to our specifications. We define 17 occupational groups and 13 job tasks (see Table 1a).157 Our job-task information is very similar to that used by Black and Spitz-Oener (2010). Workers self-report job tasks such as computer use, teaching, controlling machines or serving / accommodating. Respondents were asked whether they perform a task "frequently", "seldom" or "never". We consider here only job tasks which are performed frequently.158 Table 1a shows that women more often perform interactive tasks such as consulting, teaching or serving, while men are clearly more frequently involved in researching, production, monitoring, measuring or repairing. There are few differences in negotiating or informing tasks, among others.

157 Occupations are aggregated from the German two-digit classification of occupations (KldB-1988) available in the social security data. Not all occupations appear in adequate numbers in our sample; e.g. agricultural or wood processing occupations are underrepresented, so we treat these occupations as "other".
158 Rohrbach-Schmidt and Tiemann (2013) give an excellent overview of task measurement in different datasets.

6 An Investigation of Record Linkage Refusal and Its Implications for Empirical Research

Arne Jonas Warnke (Centre for European Economic Research, ZEW)

JEL-Classification: C18, C83

Keywords: Linkage Consent, Consent Bias, Administrative Records, Record Linkage, Linked Employer-Employee Data, Survey Data, Selection Bias

Acknowledgements: Special thanks go to Bernd Fitzenberger, Martin Kiefel, François Laisney, André Nolte and seminar participants at Humboldt University, ZEW Mannheim and the workshop "Nonresponse Bias: Qualitätssicherung sozialwissenschaftlicher Umfragen" at DIW Berlin. I am grateful to Antonia Entorf, Sven Giegerich, Cecilia Großmann, Timo Haller and Elsbeth Wright for their outstanding research assistance.

180 6. LINKAGE CONSENT BIAS 181

Abstract: Linking survey data to administrative records provides access to large quantities of information such as full employment biographies. Although this practice is becoming increasingly common, only a small number of studies in the social sciences have thus far investigated the variables associated with linkage consent. These studies have produced diverging results with regard to the relevance of certain characteristics for the provision or non-provision of linkage consent. In this study, we analyze two comparable German datasets, thereby shedding new light on the possible reasons for previously inconsistent results. This is also the first study in which possible linkage consent bias is investigated in applied models, via the replication of an existing study for the sample of respondents who did not consent to data linkage. Whilst the associations between standard socio-demographic variables and linkage consent are similar in both datasets, there are considerable inconsistencies between the datasets in terms of variables such as individual personality traits and work satisfaction. Overall, however, the results are promising: results do not differ much when respondents who did not provide linkage consent are considered.

6.1 Introduction

The linkage of survey data to administrative sources ("record linkage") facilitates access to reliable data relating to near-complete employment biographies or extensive medical records. This in turn enables survey institutes to reduce the length of interviews and to gain access to large quantities of information which are generally relatively free of measurement error. For these reasons, record linkage has become an increasingly popular tool in social science and medical research (e.g. Künn, 2015).

Data privacy laws in many countries require that polling institutes obtain prior consent to link survey data to administrative sources or to medical records. Administrative records of this kind span almost entire employment biographies, providing detailed information about periods of unemployment. Similar considerations apply to medical records, which often contain sensitive personal information. Whilst individuals who prefer to maintain their privacy may be willing to participate in a survey interview, they may be reluctant to share such sensitive information with other parties. This reluctance is often non-negligible; consent rates as low as 40% have been reported (Bates, 2005). Final datasets in which survey data is linked to official records will not include information about respondents who do not consent to record linkage. Non-consent can therefore be viewed as a new form of non-response.

Several studies documented in the medical literature, as well as a number of studies carried out in the field of social sciences, have investigated linkage consent. These studies have so far indicated little consistency in terms of predictors of linkage consent. Whilst a number of standard socio-demographic characteristics such as respondents’ age or sex have been shown to have a statistically significant positive correlation with linkage consent in one study, the same predictors have been shown to have a statistically significant negative association in other studies (see Section 6.2). Furthermore, there is growing evidence that the transferability of results from the medical literature to social sciences, and vice versa, is limited (Jenkins et al., 2006).

This study seeks to shed light on the possible reasons for inconsistent findings on predictors of linkage consent, as documented in the literature. To this end, we compare two very similarly structured datasets from the same country. In the two datasets, both of which were collected in surveys conducted by the same polling institute, workers in different establishments were asked questions about work-related aspects relevant for social science research. We first use the same set of controls for both datasets, thereby confirming that the two studies are broadly comparable.

Secondly, we add further variables to the datasets to see whether varying the set of controls gives rise to inconsistent results. These additional variables are not necessarily available in both datasets and include psychological attributes as well as job and firm characteristics. In a subsequent step, we make use of the matched employer-employee structure of the available data to investigate whether the individual consent decision is driven by the work environment.

Finally, we hope to answer the question of whether similar results would have been obtained if the analysis had not been restricted to the sample of respondents who provided linkage consent. Suppose our target population consists of all survey participants regardless of their linkage consent decision, while our sample consists of respondents from whom linkage consent was obtained. Applied researchers are mostly concerned with whether the sample at hand can be used to derive consistent estimators for statistics of the target population without strong assumptions (see, for example, Solon et al., 2015). This is the case if the association between the linkage consent decision and the outcome of interest depends only on observable characteristics. In this case, consistent population statistics can be derived from the sample by adding these observable characteristics to the regression or by weighting. In contrast, if unobserved heterogeneity is correlated with linkage consent (and the outcome), stronger assumptions are necessary.
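The weighting argument can be illustrated with a small simulation. The sketch below assumes that consent depends only on two observables (East/West location and white-collar status), estimates consent probabilities as cell shares and weights consenting respondents by the inverse of these shares. Under this assumption, the weighted mean among consenters recovers the mean for all survey participants, while the unweighted mean does not. Data and parameter values are hypothetical; the signs of the simulated consent effects follow the East/West and white-collar patterns discussed in this chapter.

```python
import numpy as np

# Simulated survey (hypothetical): consent depends only on observables (region, collar).
rng = np.random.default_rng(4)
n = 20000
east = rng.random(n) < 0.3
white_collar = rng.random(n) < 0.5
log_wage = 3.0 + 0.3 * white_collar - 0.2 * east + rng.normal(0.0, 0.4, n)
p_consent = 0.8 + 0.1 * east - 0.2 * white_collar
consent = rng.random(n) < p_consent

# Estimate consent probabilities as cell shares and weight consenters by their inverse.
cell = 2 * east + white_collar                  # four region x collar cells, coded 0..3
p_hat = np.array([consent[cell == c].mean() for c in range(4)])[cell]
weights = 1.0 / p_hat[consent]

target = log_wage.mean()                         # all survey participants
naive = log_wage[consent].mean()                 # consent sample only
ipw = np.average(log_wage[consent], weights=weights)
print(f"target {target:.3f}, consenters {naive:.3f}, weighted consenters {ipw:.3f}")
```

If consent also depended on the unobserved part of wages, neither weighting nor adding controls would remove the difference; this is the case the replication exercise is meant to detect.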

We investigate this question by testing whether results for two economic models would have been different if data on individuals who refused to provide linkage consent had in fact been available. The first model is a replication of our own earlier research on participation in job-related training. In the original study, we excluded information on survey participants who did not provide linkage consent. In this study we replicate our original results using the survey data only. We investigate whether we would have drawn different conclusions if we had included survey participants who did not provide linkage consent (the non-consent sample). In the second model, we estimate an augmented Mincer-regression, a "cornerstone of empirical economics" (Heckman et al., 2003), to see whether different samples give different findings on wage returns to human capital investments.
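The logic of the second application can be sketched as follows: estimate the same Mincer regression (log wages on schooling, experience and experience squared) once for all respondents and once for the consent sample only, and compare the coefficients. The sketch uses simulated data in which consent depends on schooling and experience but not on the wage residual, so the two estimates of the return to schooling should be close. It is a plain rather than an augmented Mincer equation, and all names and values are hypothetical.

```python
import numpy as np


def mincer(school, exper, log_wage):
    """OLS of log wages on a constant, schooling, experience and experience squared."""
    X = np.column_stack([np.ones_like(school), school, exper, exper**2])
    beta, *_ = np.linalg.lstsq(X, log_wage, rcond=None)
    return beta


rng = np.random.default_rng(5)
n = 10000
school = rng.integers(9, 19, n).astype(float)   # years of schooling
exper = rng.uniform(0, 35, n)                   # years of experience
log_wage = (1.5 + 0.08 * school + 0.04 * exper - 0.0007 * exper**2
            + rng.normal(0.0, 0.3, n))
# Hypothetical consent rule: more educated and less experienced respondents refuse more
# often, but consent is unrelated to the wage residual (ignorable non-consent).
p_consent = 0.9 - 0.02 * (school - 9) - 0.005 * (35 - exper)
consent = rng.random(n) < p_consent

b_all = mincer(school, exper, log_wage)
b_consent = mincer(school[consent], exper[consent], log_wage[consent])
print("return to schooling, all respondents:", round(float(b_all[1]), 4))
print("return to schooling, consent sample: ", round(float(b_consent[1]), 4))
```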

In general, our findings regarding the role of (denied) linkage consent for applied research are rather promising. Linkage consent is clearly not independent of many individual characteristics. In particular, younger and white-collar workers (in Germany) are more reluctant to share their administrative data. The decision to provide linkage consent is closely related to respondents' willingness to participate in future interviews and to their tendency to give no answer when asked to report their income. This does not, however, seem to translate into a large bias in economic models in our two applications: looking at the non-consent sample yields broadly similar results. Furthermore, while there is some establishment heterogeneity in employees' tendency to consent to data linkage, this can mostly be explained by observable differences between workforces. The location of a firm, for example, seems to be a determining factor for linkage consent: the workforces of establishments located in East Germany exhibit much higher consent rates than the workforces of similar establishments in West Germany. Workforces with a larger share of white-collar workers, however, tend to have lower consent rates.

The paper is structured as follows. In the next section, we present the literature concerning linkage consent. In Section 6.3, we introduce the research questions to be answered in this study. Section 6.4 describes the two datasets used here. Results are presented in Sections 6.5 and 6.6 before we conclude.

6.2 Literature

Antoni (2011), Sakshaug et al. (2012) and Sakshaug and Kreuter (2012) review the literature on linkage consent bias and show that relevant predictors vary considerably between studies. Their reviews include studies which found significant positive, as well as significant negative associations between linkage consent and characteristics such as age, level of education, sex and income. We give a non-exhaustive summary of recent studies in Table 6.1.

The exact reasons for these inconsistent results regarding predictors of linkage consent remain largely unknown. Inconsistencies may be due to differences in the record domain, the design of surveys, the way in which consent questions are worded, differences between the populations surveyed or variations in the analyses of linkage consent patterns. Jenkins et al. (2006), for example, have looked at multiple consent questions (administrative as well as employer's records) using a special follow-up wave of the British Household Panel Survey (BHPS).159 They have thereby shown that predictors of linkage consent differ widely between different consent requests (or record domains). The authors find that only one variable, the household context, is associated with linkage consent in both record domains. Due to the (additional) issues arising from comparing linkage consent across record domains, this study focuses on the literature in the field of social sciences. For a review of the literature on consent decisions in the medical field, we refer to Kho et al. (2009).160

159 This is one of the studies reviewed by Sakshaug et al. (2012).
160 Kho et al. (2009) review 17 unique studies from different countries and find inconsistent results for linkage consent patterns with regard to age, sex, race, education, income, or health status.

A number of studies have investigated the total effects of non-consent and non-response on the representativeness of final survey data. Sakshaug and Kreuter (2012) analyze the IAB PASS data and show that imbalances between the survey and the population seem to be small both in absolute terms and relative to classical non-response and measurement error. Measurement error in survey data is an issue which can often be mitigated by imputing data such as income from official records. The authors therefore consider the process of merging survey data with administrative records to be more reliable than directly asking respondents to provide similar information themselves.

Warnke (2015) provides an overview of non-response and non-consent patterns in the IAB WeLL data (see Section 6.4). Using social security information, he assesses the total bias due to non-response and non-consent by comparing final survey participants willing to share their data with the general workforce at 150 establishments. The results show that over a period of ten years survey participants have moderately higher wage growth than the average of the workforce of those establishments. However, this does not bias the estimation of 'returns to education' (the wage premiums that more highly educated individuals receive). Sakshaug and Huber (2015) have independently compared non-response and linkage consent bias across different waves of the WeLL data. Bias is here defined as differences between survey participants and the population as a whole (also taking into account the consent decision) with respect to sex, whether participants are over 55, non-German citizenship, a low level of education, employment status and whether participants are working for a low wage. The authors find a modest non-response bias for two variables, a low wage and a low level of education. Of greater concern is the upward trend in non-response bias over time. When comparing different forms of non-response bias, the authors find that bias due to linkage consent is small compared to classic non-response and measurement error bias. They also find that linkage consent bias decreases over time.

This is the first study to compare associations with linkage consent across two samples. We investigate whether systematic differences remain when two similar datasets, which are as comparable as possible, are analyzed. We thereby try to shed new light on the reasons for the inconsistencies regarding predictors of linkage consent.

Finally, we go beyond comparing a sample of respondents who did provide linkage consent with a non-consent sample (or the population) with regard to observable characteristics such as age or level of education. Applied researchers are generally more concerned with non-ignorable non-consent. This is because regression models (or matching approaches) give representative results if a sample differs from the population only with respect to variables which are observable to the researcher.161 To examine the role of non-ignorable non-consent, we replicate an existing study for the sample of individuals who did not provide linkage consent.

6.3 Research Questions

We begin by investigating predictors of linkage consent and compare our results to those in the literature. We contrast predictors of linkage consent across two comparable surveys (see Section 6.4) to determine whether, and to what extent, results diverge between two comparable German datasets. The interviews for both datasets were carried out by the same polling institute. Both datasets concern similar populations and have a comparable structure. Participants in both surveys were asked, in an almost identical manner, to consent to the linkage of their survey data to the same administrative records. Furthermore, to make the analyses of consent patterns as comparable as possible, we start by using the same set of controls. We therefore expect similar results for both datasets.

Research Question 3 Which control variables predict linkage consent? Are there any differ- ences between the two surveys?

As shown in Section 6.2, the literature has found little consistency with respect to predictors of linkage consent (see also Sakshaug et al., 2012). Linkage consent patterns differ between studies for several important socio-demographic variables, such as age, level of education, income or sex. In Research Question 4, we investigate possible reasons for the inconsistent results found so far.

We are particularly interested in whether the choice of control variables matters. Standard, comparable socio-demographic variables are available for both datasets. However, further predictors of linkage consent discussed in the literature are often not available for both datasets. Among other characteristics, these variables include personality traits or job attributes of the respondents. In this research question, we therefore investigate whether including an extended set of further controls leads to different conclusions regarding the standard variables. This may be one reason for the inconsistencies which have been found and discussed in the literature (see Section 6.2). Of particular interest are what economists call non-cognitive skills, such as risk aversion, or personality traits which differ between groups of different age and education level. The inclusion or omission of these variables could therefore shed new light on predictors of linkage consent.

161To give an example, let us assume that individual motivation is a driver of linkage consent decisions. If motivation is also related both to the right-hand-side variables and to the outcome in an empirical model, there is a classic omitted variable bias. In this case, coefficients in applied models are therefore not consistently estimated.

Research Question 4 What reasons are there for the diverging results regarding predictors of linkage consent found in the existing literature? Does the choice of control variables matter?

In a third question, we investigate whether response rates depend on the work environment (the firm a worker is employed by). The social sciences increasingly use linked employer-employee data, for instance in the context of inequality, globalization or innovation (see Hamermesh, 1999). Little is known so far about whether co-workers within a firm who are interviewed at home behave similarly when it comes to linkage consent. Analyses based on merged firm-worker surveys could be biased if there is a substantial intra-firm correlation in response behavior. This could be the case, for example, if individual survey response is related both to firm effects and to the outcome. Al Baghal et al. (2014) show that household members give similar consent decisions but, to the best of our knowledge, no information is available with respect to possible workplace heterogeneity.

Research Question 5 Do workers of the same establishment respond similarly to the linkage consent question when asked in a telephone interview at home?

In the survey methodology literature, bias primarily refers to imbalances in the distribution of observables such as age, education level or sex between the available sample and the general population. If missingness depends only on such observed information, the data are what Rubin (1976) terms missing-at-random (MAR). In the context of linkage consent, this can be formalized as follows. Let $X$ be the set of observed control variables or predictors, which are available for all survey participants independent of the consent decision. Let $1_{\text{refuse}}$ be an indicator variable which takes the value one if an individual did not consent to data linkage. Let $Y$ be additional data from social security records. $Y$ includes the outcome of interest and is observed only for the individuals from whom linkage consent was obtained. We therefore partition $Y$ as $Y = (Y^{\text{consent}}, Y^{\text{refuse}})$. Missing data is MAR if the following equation holds:162

$$P\left(1_{\text{refuse}} \mid Y^{\text{consent}}, Y^{\text{refuse}}, X\right) = P\left(1_{\text{refuse}} \mid Y^{\text{consent}}, X\right) \tag{6.1}$$

162This simple formula can of course be extended to more complicated patterns of missing data such as censoring (if $X$ is always observed but $Y$ only up to a threshold) or truncation (if $(X, Y)$ is only observed for certain ranges of $Y$).

Equation (6.1) states that the probability of missingness depends only on $X$ and $Y^{\text{consent}}$. Equation (6.1) is violated if missingness depends on information which is not available ($Y^{\text{refuse}}$).163 MAR is closely linked to the concept of ignorability (see Gelman et al., 2014). MAR is often of little concern to applied researchers, because $X$ is usually controlled for in multivariate analyses. Applied researchers are therefore more concerned with data that are missing-not-at-random (MNAR), that is, with missingness which depends on unobserved heterogeneity and thereby violates Equation (6.1). To give an example for which MAR does not hold, let us suppose that we are interested in measuring the wage effects of attending a job-related training course (see Section 6.6.2). Let us assume that individuals with higher earnings are less likely to provide linkage consent but more willing to participate in training. If training is associated with earnings, the data are no longer MAR. In this case, we would obtain biased estimates of the wage returns to training if we restricted our analyses to the individuals who provided linkage consent ("non-consent bias"). Statistical techniques for dealing with MNAR data include, among others, the Heckman selection approach (Heckman, 1979) and pattern-mixture models (Little, 1993).
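The training example can be illustrated with a small simulation. The sketch below is purely hypothetical and is not based on WeLL or LPP: training is assigned at random, it raises the log wage by an assumed 0.2, and a worker consents only if the log wage plus independent noise falls below an assumed threshold. Because consent depends on the wage itself, which is unobserved for refusers, Equation (6.1) fails. The function names `simulate` and `training_gap` are illustrative only.

```python
import random
from statistics import fmean


def simulate(n=200_000, effect=0.2, threshold=1.0, seed=1):
    """Hypothetical data-generating process (an assumption, not survey data):
    random training, log wage = effect * training + N(0, 1) noise,
    consent iff log wage + independent N(0, 1) noise < threshold."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        trained = rng.random() < 0.5
        log_wage = effect * trained + rng.gauss(0.0, 1.0)
        # Consent depends on the wage itself, which refusers never reveal,
        # so missingness is MNAR and Equation (6.1) is violated.
        consent = log_wage + rng.gauss(0.0, 1.0) < threshold
        rows.append((trained, log_wage, consent))
    return rows


def training_gap(rows):
    """Difference in mean log wages between trained and untrained workers."""
    treated = [w for t, w, _ in rows if t]
    control = [w for t, w, _ in rows if not t]
    return fmean(treated) - fmean(control)


rows = simulate()
full_gap = training_gap(rows)                          # close to the true 0.2
consent_gap = training_gap([r for r in rows if r[2]])  # attenuated by selection
print(f"full sample: {full_gap:.3f}, consent sample: {consent_gap:.3f}")
```

Under these assumptions the full-sample estimate recovers the assumed effect, while the estimate from the consent sample alone is noticeably smaller: trained workers earn more, so they drop out of the consent sample disproportionately from the upper part of the wage distribution.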

We investigate whether the MAR assumption is justified in two economic applications in Section 6.6. To check the MAR assumption, we compare estimated regression results from two samples: individuals who provided and individuals who denied linkage consent. This is, however, not a formal proof. Under misspecification, regression coefficients could be similar in both samples even in the presence of MNAR. Nonetheless, similar results in both samples indicate that MNAR is at least not a severe issue for these applications.
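A comparison of one coefficient across the consent and non-consent samples can be sketched as a two-sided Z-test of equal coefficients. This is a minimal sketch assuming the two estimates come from independent samples, so that their covariance is zero; the function name and the numbers in the usage line are hypothetical, not results from this chapter.

```python
from math import sqrt
from statistics import NormalDist


def coefficient_difference_test(b1, se1, b2, se2):
    """Two-sided Z-test of H0: b1 == b2 for estimates from two independent samples.

    z = (b1 - b2) / sqrt(se1^2 + se2^2); the covariance between the two
    estimates is assumed to be zero."""
    z = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical coefficients of one predictor in the consent and non-consent samples
z, p = coefficient_difference_test(0.5, 0.1, 0.2, 0.1)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 2.12, p = 0.034
```

If the samples are not independent (for example because workers are clustered within the same establishments), the covariance term would have to be estimated rather than set to zero.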

Research Question 6 Do we find non-consent bias in economic models? Is missing information due to linkage consent "missing-at-random"?

Several studies carried out in different contexts have analyzed whether non-response and/or attrition in panel contexts is ignorable. For example, Fitzgerald et al. (1998) and MaCurdy et al. (1998) have considered earnings regressions, and van den Berg et al. (2006) unemployment durations. These studies generally find that non-response bias is rather small in magnitude. However, not all researchers agree with this view. In a recent working paper, for example, Heffetz and Reeves (2016) found that official government statistics in the US, including the unemployment rate and labor force participation, depend on the ease or difficulty of contacting a respondent. To the best of our knowledge, no such evidence is available with respect to linkage consent.

163An even stronger assumption regarding missing data is missing-completely-at-random, where $P(1_{\text{refuse}} \mid Y^{\text{consent}}, Y^{\text{refuse}}, X)$ is assumed to be constant, i.e. independent of observable and unobservable variables.

6.4 Data

We compare two longitudinal, linked employer-employee datasets which have a similar structure. The first dataset is called WeLL (Berufliche Weiterbildung als Bestandteil Lebenslangen Lernens, which might be translated as "further training as a part of lifelong learning", see Huber and Schmucker, 2012). The second dataset is the Linked Personnel Panel (LPP, see Bellmann et al., 2015). The data contained in both of these datasets has been collected by the Research Data Centre of the Federal Employment Agency at the Institute for Employment Research, in cooperation with the polling institute infas. Table 6.2 shows basic information for both datasets.

WeLL is a four-wave panel which was conducted annually between 2007 and 2010. LPP is (currently) a two-wave panel, which was run in 2012 and 2014. Both surveys began with a firm survey (the relevant business units were establishments).164 For this study, establishments were drawn from the IAB Establishment Panel (an annual employer survey, see Kölling, 2000, for more information) in 2005 (WeLL) and 2011 (LPP). Only establishments with at least 50 employees subject to social security contributions were eligible. For WeLL, the effective minimum number of employees is 100 and there is an upper limit of 2,000 employees. In a second step, outlined in detail in Sections 6.8.3 and 6.8.4, employees who work in these establishments were invited to participate in the employee survey.

In both datasets, survey data was merged with German social security records. These records contain employers’ reports about all employees who are subject to social security contributions. These reports are relevant for health insurance, the statutory pension scheme and unemployment benefits (Dorner et al., 2010). The reports include information about individuals’ daily wages, periods of employment, citizenship, educational attainment and occupation. This information is combined with data from the Federal Employment Agency concerning social security benefits and periods of unemployment. At the end of the interview, individuals were asked whether they would agree to their data being linked to "other data [...] available at the Institute for Employment Research in Nuremberg", as outlined below. Only information on individuals who agreed to their survey data being linked with social security records was available to researchers using both survey data and administrative records (what we call the consent sample).165

164We do not consider the establishment survey, as data privacy laws mean that we are not entitled to link the survey information for the non-consent sample to the establishment sample. Moreover, the establishment survey has not been made public in the case of WeLL.

In the following we will describe both employee surveys. These surveys are publicly available to the scientific community. Establishment information is available for those respondents who consented to their data being linked to the IAB Establishment Panel (and through the employer survey in the case of LPP). For WeLL we have further information relevant for the stratification of all respondents (see Section 6.8.3). Since other establishment information is not available for the non-consent sample, we limit the following description to the employee survey.

The WeLL dataset focuses on job-related training and consists of data relating to 149 establishments and approximately 7,900 individual survey respondents. The LPP dataset focuses on human resources and management practices, job quality and corporate culture. In terms of its hierarchical structure, this dataset is very similar to WeLL, but it differs with regard to the ratio of workers to establishments. The number of establishments is more than six times that in WeLL (980 establishments), while the number of employees is only approximately 50% higher than in WeLL (7,508 in the first wave plus 3,987 new employees in the second wave). The number of survey participants per establishment is therefore considerably lower in LPP than in WeLL (the median number of first-time survey participants per establishment in the final dataset is 25 in WeLL and 8 in LPP).

Both datasets sample full-time, part-time and marginally employed workers, excluding apprentices and workers in partial retirement. We exclude individuals for whom information on key variables such as education level, working hours or nationality is missing. We also restrict the sample to individuals who are still employed by the establishment (at least until the most recent interview). This leaves us with 11,385 first-time respondents in LPP and 5,753 first-time respondents in WeLL. We focus on first-time survey participants to investigate the initial linkage decision (with the exception of Section 6.6.1). We do this for two reasons. Firstly, considering multiple interviews per survey participant would mix results for linkage consent and panel non-response. Secondly, some individuals revised their consent decision, but only in one direction. Once individuals had agreed to record linkage, they were not asked about linkage consent again in subsequent waves of the survey. Only participants who had not given initial consent were asked again at a later interview. As a consequence, there are no individuals who withdrew their consent. There are, however, a number of survey participants who initially declined to give consent but later changed their minds (almost half of respondents who initially refused to provide linkage consent are included in our final WeLL sample).

165In the case of WeLL, the merged data is called WeLL-ADIAB (Schmucker et al., 2014).

By restricting the (main) analyses to first-time participants we take this asymmetry resulting from the interview design into account. If we do not exclude panel participants, the results are generally similar. In Section 6.6.1 for example, we consider panel respondents in a replication of a study using WeLL.

Age is available in only four categories in WeLL. In contrast, LPP contains this information by year. In order to make the analyses between both datasets comparable we define similar age categories in LPP.

Table 6.3 shows descriptive statistics for all binary variables in LPP and WeLL.166 Average consent rates amongst the first-time survey participants in our sample are 82% (LPP) and 94% (WeLL). Consent rates above 80% are relatively high compared to those of other datasets (linkage consent rates vary between 24% and 89% in the non-exhaustive review in Sakshaug and Kreuter, 2012). Besides higher average consent rates, we also find a greater willingness to participate in future waves and lower item non-response (for the question about net income) in WeLL. Compared to WeLL, LPP respondents are on average older, and there are fewer female respondents. While 80% of the participants in the WeLL survey state that they are in good or very good health, this is true for only 60% of the LPP respondents (the age structure is probably an important reason for this difference). In WeLL, we find both more low-educated and more highly educated workers (without a vocational qualification and with a tertiary degree, respectively) but fewer workers with a vocational qualification.

Detailed information on WeLL and LPP is given in Appendices 6.8.3 and 6.8.4.

6.5 Predictors of Linkage Consent and Establishment Heterogeneity

6.5.1 Predictors of Linkage Consent

In Research Question 3 we look at possible predictors of linkage consent in LPP and WeLL. We will begin by comparing a selected set of variables which are available in both datasets. The variables used were chosen on the basis of information provided in the existing literature on non-response and linkage consent.167 The selected variables include socio-demographic and job characteristics as well as personality traits, self-reported health and work satisfaction.168

166We standardize all non-binary variables by subtracting the mean and dividing by the standard deviation. 167In the Appendix, we document a lasso variable selection approach (mainly for WeLL) in order to test the robustness and the relevance of the variable selection. 168Other studies such as Jenkins et al. (2006) or Sakshaug et al. (2012) have also included income as an additional predictor. We will refrain from using income as a predictor because item non-response is high, in particular among

The results are provided in Columns 3 and 4 of Table 6.4. To ease interpretation, non-binary variables, such as a proxy for labor attachment and the item regarding work satisfaction, are standardized by subtracting the mean and dividing by the standard deviation.
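Standardization turns each coefficient on a non-binary variable into the change in the linear predictor per one-standard-deviation change. A minimal sketch, assuming the sample standard deviation (n − 1 denominator) is used, since the text does not state which variant was applied; the input scores are hypothetical.

```python
from statistics import fmean, stdev


def standardize(values):
    """Return z-scores: (value - mean) / standard deviation.

    Assumption: the sample standard deviation (n - 1 denominator) is used."""
    mu = fmean(values)
    sd = stdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical work-satisfaction scores on a 0-10 scale
z = standardize([7.0, 8.0, 5.0, 9.0, 6.0, 10.0])
```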

We estimate a random effects logit model including establishment random effects. This model allows us to capture possible intra-firm correlation in the linkage consent decision. Previous studies have considered the role of interviewers (Sakshaug et al., 2012) and/or households (Al Baghal et al., 2014). There is no information available, however, indicating whether employees who work for the same establishment tend to reach a similar decision with regard to linkage consent in a telephone interview in which they participate from home.

$$P\left(\text{Linkage Consent}_i \mid \alpha_i, X_i\right) = \operatorname{logit}^{-1}\left(\alpha_i + X_i\beta\right) \tag{6.2}$$

We model the linkage consent decision of individual $i$ in the respondent’s first interview. $X$ includes varying sets of predictors, such as wave-specific consent rates and socio-demographic or job characteristics, as listed in Table 6.4. $\alpha$ are random effects for the establishment in which a worker is employed (we assume $\alpha \sim N(0, \sigma_\alpha^2)$). We start by discussing the role of certain observable predictors ($X$). At the end of the section, we also briefly describe the role of establishment heterogeneity.
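Equation (6.2) can be sketched directly. The snippet below evaluates the consent probability and reports a common summary of establishment heterogeneity in random-intercept logit models, the latent-scale intraclass correlation $\sigma_\alpha^2 / (\sigma_\alpha^2 + \pi^2/3)$, where $\pi^2/3$ is the variance of the standard logistic error. Whether this summary is the one used in this chapter is an assumption. The simulation function, its parameters and the single standardized predictor are hypothetical illustrations of what a shared $\alpha$ means, not the estimation routine.

```python
import math
import random


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def consent_probability(alpha_i, x_i, beta):
    # Equation (6.2): P(consent_i | alpha_i, X_i) = logit^{-1}(alpha_i + X_i beta)
    return inv_logit(alpha_i + sum(x * b for x, b in zip(x_i, beta)))


def latent_icc(sigma_alpha):
    # Share of latent-scale variance due to establishments;
    # the standard logistic error has variance pi^2 / 3.
    var_alpha = sigma_alpha ** 2
    return var_alpha / (var_alpha + math.pi ** 2 / 3)


def simulate_consent(n_estab, workers_per_estab, beta, sigma_alpha, seed=0):
    # Hypothetical data-generating process: all workers in one
    # establishment share the same draw alpha ~ N(0, sigma_alpha^2).
    rng = random.Random(seed)
    decisions = []
    for _ in range(n_estab):
        alpha = rng.gauss(0.0, sigma_alpha)
        group = []
        for _ in range(workers_per_estab):
            x_i = [1.0, rng.gauss(0.0, 1.0)]  # constant and one standardized predictor
            p = consent_probability(alpha, x_i, beta)
            group.append(1 if rng.random() < p else 0)
        decisions.append(group)
    return decisions
```

With $\sigma_\alpha = 0$ the latent correlation is zero and co-workers decide independently given $X$; larger $\sigma_\alpha$ means more similar decisions within an establishment.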

Our findings demonstrate that linkage consent is strongly related to age. Older employees are more likely to consent to data linkage than their younger counterparts. This tendency is particularly noticeable in the LPP data. Amongst the variables included in Columns 3 and 4 of Table 6.4, the age variable, or more specifically being over 55, is by far the most important predictor of linkage consent (according to the size of the coefficient). The size of this coefficient is comparable across both surveys. The results for the two surveys, however, diverge in terms of whether respondents aged between 35 and 45 were more or less likely to give consent than younger adults.169 See below for a discussion of possible reasons for these findings regarding age and possible cohort effects.

Next, we come to respondents’ level of formal education. Existing studies show inconsistent results regarding the role of (higher) education when it comes to linkage consent (see Table 6.1). Comparing the two surveys considered in this study reveals lower consent rates amongst individuals who have completed tertiary education in the LPP data. This is not the case in WeLL (however, according to a Z-test, this difference between the surveys is not statistically significant, p=0.18). In previous research which we carried out for WeLL (Warnke, 2015), more highly educated survey participants exhibited significantly lower linkage consent rates than less highly educated respondents. The discrepancy with our previous results might be explained by the fact that the respective studies used different sample selection criteria and a different set of variables. In this study, we focus exclusively on employed individuals in their first interview and also include an indicator for white-collar workers.170

168 (continued) respondents who do not provide linkage consent. Furthermore, in WeLL only respondents who did not give their consent to data linkage have been asked to state their gross income. For the sample of respondents who did provide consent, gross income was taken from social security records. For LPP we find that gross wage is not significantly related to linkage consent and it does not alter the results for the other variables. 169The difference in consent rates for individuals aged 35-45 is significant according to a Z-test (p=0.043).

We now turn our attention to the jobs which are typically performed by more highly educated workers. Interestingly, we find much lower consent rates in both surveys for workers employed in white-collar jobs. Such positions generally presuppose that workers hold a tertiary degree: 95% of respondents holding a tertiary degree are employed as white-collar workers. In contrast, ‘merely’ 57% of those who do not hold a tertiary degree are employed in such a position. This may indicate a tendency towards increased reluctance to share data amongst the more highly educated workforce. This in turn may be explained by the greater concerns which more highly educated individuals generally have with regard to data privacy (Sheehan, 2002).

Next, we look at further socio-demographic characteristics considered in this study. We see virtually no differences between the consent rates of men and women throughout the entire sample. In addition, few household characteristics have been found to be significant predictors. Neither marital status nor whether the respondent has children is a significant predictor of linkage consent behavior, so we do not include them in the main specifications presented here.171 Respondents living alone represent an exception to this rule; these individuals are considerably less likely to consent to data linkage.

With regard to job characteristics, we identify higher consent rates for individuals performing shift work in both surveys (this tendency is significant in LPP only). In addition, there are significant differences in the consent decision made by workers with and without personnel responsibilities in WeLL. Whilst approximately 29% of workers in both surveys have managerial responsibilities, such individuals in LPP do not show higher consent rates (the difference between the surveys is significant with p=0.06 according to a Z-test). Having said that, this finding might be a statistical artefact arising from multicollinearity. The lasso approach described in the Appendix suggests a much lower coefficient for WeLL.

170If we adapt the sample selection criteria by including unemployed or self-employed individuals and use a similar set of variables as in Warnke (2015), we are able to confirm our earlier results. 171Further results are available upon request.

We now consider the so-called Big Five personality traits and the more subjective items concerning job insecurity and work satisfaction. Sala et al. (2012) identified no link between the Big Five personality traits and linkage consent for the United Kingdom. Looking at the LPP data, we come to a similar conclusion for Germany. According to an F-test (p=0.83), all Big Five personality traits are jointly insignificant in this sample. In the WeLL data, coefficients are in general of a larger magnitude. There is a significantly positive correlation between conscientiousness and linkage consent; the opposite is true of extraversion. The coefficients for conscientiousness differ significantly between both surveys (p=0.02, Columns 3 and 4). Both surveys use the same Big Five items, and we accordingly find very similar relationships between personality traits and, for example, tertiary education.172 Nonetheless, different conclusions can clearly be drawn with respect to personality traits when considering WeLL as opposed to LPP.173

If we add further controls which are not available in both surveys, findings generally remain quite similar. Such variables include risk and trust for LPP (Column 5) and establishment characteristics, voluntary work and a proxy for labor attachment for WeLL (Column 6). Few of the predictors discussed above change to any meaningful extent when further controls are added. We also added indicators of individuals’ willingness to participate in future waves of the survey and an indicator of whether information about net income was available. We see that the provision of linkage consent is closely associated with item non-response regarding net income and with individuals’ readiness to take part in a future survey, with similar magnitudes in both datasets.174 In LPP, 29.1% of the individuals who did not consent to data linkage also did not provide information about household net income, whereas this information is missing for only 9.3% of the individuals in the consent sample. In terms of rates of linkage consent provision, there are no large differences between individuals who refuse to state their net income and those who do not know it. We therefore consider such individuals as a single group.175

172Workers who hold a tertiary degree tend to have a conscientiousness score which is 2.2 (1.8) standard deviations higher than those with no vocational education in LPP (WeLL). 173Again, results are similar if we use only the Big Five personality traits without including any further controls as predictors. 174Jenkins et al. (2006) have found that item non-response is related to consent when the consent request concerns permission to contact the respondent’s current employer. No relation is found, however, if the request concerns permission to request the respondent’s national insurance number or permission to access administrative tax and benefits records.

After controlling for risk aversion and trust, the picture regarding the role of personality traits and other psychological attributes for linkage consent becomes even less clear. Both variables are significantly related to linkage consent, with higher consent rates amongst more risk-tolerant and more trusting individuals. In LPP, we now find that conscientiousness has a significantly negative correlation with linkage consent. After including the further controls mentioned above, individuals who score more highly with regard to openness to new experiences also appear more likely to consent to data linkage. These changes can be explained by the association of personality traits with risk aversion and trust.176 The coefficients of both personality traits are significantly different from our findings in WeLL (where the inclusion of further variables leads to very similar conclusions as before). In addition, the association between work satisfaction and the provision of linkage consent remains elusive. While more satisfied individuals are significantly more likely to provide consent for data linkage in LPP, we find a negative, albeit insignificant, correlation in WeLL.177

6.5.2 Explanations for Inconsistencies Regarding Predictors of Linkage Consent

With regard to the reasons for the diverging results regarding predictors of linkage consent (Research Question 4), we have so far seen that results for socio-demographic variables are quite consistent and remain largely unaffected by the inclusion of a different set of controls. This is also true if we add further variables such as job tasks or household information. Psychological attributes and more subjective items such as work satisfaction are also relevant for linkage consent. However, the estimated role of these predictors depends on which other controls are included, and here we find different results for the two surveys.

In the WeLL data (last column in Table 6.4), we find that individuals working in East Germany agree to their data being linked to social security information much more often than individuals working in West Germany. The magnitude of this relationship is large and is comparable to the age gradient in linkage consent seen for individuals aged above 55 or below 35. A similar association has been found by Antoni (2011) and Korbmacher et al. (2013). If we assume that many of the workers concerned grew up in East Germany, this finding seems at first to be somewhat counterintuitive.178 In the former German Democratic Republic (GDR), everyday life was subject to active surveillance, which we might justifiably presume would lead to heightened privacy concerns amongst the East German population. Furthermore, the East German population is known to exhibit generally lower levels of trust (Rainer and Siedler, 2009), and trust is positively related to linkage consent, as seen in the LPP data. We do not have a trust measure in WeLL, but given the omission of this variable we might expect to find on average lower consent rates in East than in West Germany. However, opposing arguments can also be made. In the 1980s, for example, there was an intense debate about privacy in West Germany, which led to fierce opposition to the census carried out in 1987. Such experiences may have changed attitudes towards privacy in West Germany. Future research may examine the relationship between trust or cohort experiences on the one hand and linkage consent on the other.

175Information regarding the reasons for item non-response is only available for LPP. 176In LPP, individuals who are more open to risk are on average less conscientious and, surprisingly, less curious. More trusting individuals, meanwhile, exhibit greater levels of conscientiousness and are more open to new experiences. 177According to a Z-test, the difference in coefficients for work satisfaction is significant at the 10% level (p=0.07).

If attitudes have changed over time due to experiences such as growing up in the GDR, our age gradient could in fact reflect cohort effects. Because of the short time span covered by the two surveys, we are unable to distinguish between age and cohort effects (and possible additional period trends). There are, however, no differences in the age-consent pattern between workers employed in establishments in East and West Germany in WeLL (according to an F-test, p=0.24; see also Table 6.5). This could indicate that, in Germany at least, reluctance to consent to the linkage of data to social security records declines with age, as outlined at the beginning of this section. Considering the findings of earlier social science studies, this pattern is somewhat surprising.179

If we now turn our attention to differences in linkage consent rates between the first and subsequent waves of the survey, we find that individuals in the first wave consent to data linkage more often. The tendency toward lower rates of consent in later waves is particularly noticeable in WeLL, where the survey wave is in fact the most important predictor of linkage consent (with the exception of item non-response and willingness to participate in future waves of the survey). One possible explanation for this finding is that respondents who participate for the first time in subsequent waves include more individuals who had initially been reluctant to take part. These individuals are presumably also less willing to give their consent for data linkage (see also Heffetz and Reeves, 2016). This pattern should be more pronounced in WeLL than in LPP. In WeLL, there are on average fewer workers per establishment who had not been contacted, or who had not responded, in the first wave of the survey. It might also be the case that the wave-specific consent rate in WeLL has been overestimated due to multicollinearity and small sample sizes. In the Appendix, we provide a regularized lasso approach which is better suited to multicollinearity (Tibshirani, 1996). This method suggests much weaker differences between the first and subsequent waves (but similar magnitudes for other predictors such as age and item non-response).

178Korbmacher et al. (2013) show that higher consent rates in East Germany are mostly driven by individuals who have lived in the former German Democratic Republic. 179In the medical literature, however, some studies have found higher consent rates amongst older individuals, whilst other studies have found no differences with respect to age, see Kho et al. (2009).
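To illustrate how an L1 penalty differs from unpenalized estimation, the following pure-Python coordinate-descent solver minimizes the linear lasso objective (1/(2n))·‖y − Xb‖² + λ·‖b‖₁ (Tibshirani, 1996). This is a minimal sketch under stated assumptions, not the implementation in the Appendix. The chapter's consent models are logit models, and the function names `soft_threshold` and `lasso_cd` are hypothetical. It also assumes that y is centered and that there is no intercept.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes y is centered and no intercept is needed (illustrative sketch)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = list(y)  # residuals y - X beta for beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of x_j with the partial residual (x_j's own fit added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Orthogonal toy design with true coefficients (2, 1)
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [3.0, 1.0, -1.0, -3.0]
print(lasso_cd(X, y, lam=0.5))  # [1.5, 0.5]: each coefficient shrunk by lam
```

With two nearly identical predictors, the penalty tends to keep one coefficient and shrink the other toward zero. Unpenalized estimates, by contrast, can become large and unstable. This is the sense in which the lasso is better suited to multicollinearity.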

Our analyses indicate that both psychological attributes and job characteristics play an important role in an individual's decision whether or not to grant permission for data linkage. Psychological attributes include personality traits, the degree of risk aversion, trust and subjective measures such as work satisfaction or expectations regarding future labor market activity (labor attachment). The job characteristics considered in this study include white-collar occupations, managerial responsibilities, shift work and certain job tasks. Our study suggests that the role of environmental indicators, such as our indicator for East or West Germany, deserves more research attention, as such indicators are a potential source of the inconsistencies seen in the literature. In WeLL, for example, we find that female respondents employed in East Germany are significantly more likely than their male colleagues to grant consent for data linkage. No such gender difference is found in West Germany.180 Similar results are found for shift work, which correlates strongly with the provision of linkage consent in establishments in East Germany only. These results reflect what we have found in the LPP data.

6.5.3 Information on Establishment Heterogeneity

We next consider possible establishment heterogeneity (Research Question 5). Specifically, we investigate whether respondents employed by a particular firm reach similar decisions regarding data linkage consent when asked in a telephone interview. If the work environment plays a role in the individual linkage consent decision in a telephone interview at home, this could matter for analyses of linked employer-employee data. Suppose, for example, that workers who fear losing their jobs do not provide linkage consent when employed in private firms but are more inclined to do so when working in the public sector (or in other firms with stronger job protection). This could bias studies analyzing public sector motivation, for example. Furthermore, it could also help to explain the inconsistencies regarding predictors of linkage consent found in the literature.

180 Results are based on regressions including the same set of controls as in Columns 3 and 4 of Table 6.4, estimated separately for establishments located in East and West Germany. Results are available upon request.

Establishment heterogeneity can be assessed via the variance of the establishment random effects (Table 6.4, row σ²_α). As Columns 1 and 2 show, aggregated consent rates vary much more between establishments than we would expect from pure random variation (taking into account that sample sizes differ). The variance of the establishment random effects is comparable between WeLL (100·σ²_α = 2.74) and LPP (100·σ²_α = 4.32). According to a (conservative) likelihood-ratio test, this heterogeneity is significant for both datasets.181
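The text describes the likelihood-ratio test as conservative. The sketch below illustrates one standard reason for this, assuming the test compares the logit model with and without establishment random effects. The null hypothesis σ²_α = 0 lies on the boundary of the parameter space, so the usual χ²(1) reference distribution overstates the p-value. The asymptotic null distribution is then a 50:50 mixture of χ²(0) and χ²(1), which halves the p-value. The log-likelihood values in the example are hypothetical, since Table 6.4 does not report the restricted model.

```python
import math

def lr_test_variance_component(ll_restricted, ll_full):
    """LR test of H0: sigma^2 = 0 for one variance component.

    Returns the LR statistic, the naive chi2(1) p-value (conservative) and
    the p-value under the 50:50 mixture of chi2(0) and chi2(1).
    """
    lr = max(0.0, 2.0 * (ll_full - ll_restricted))
    p_naive = math.erfc(math.sqrt(lr / 2.0))  # P(chi2(1) > lr)
    return lr, p_naive, 0.5 * p_naive

# Hypothetical log-likelihoods without / with establishment random effects
lr, p_naive, p_mix = lr_test_variance_component(-1490.0, -1487.5)
print(f"LR = {lr:.2f}, naive p = {p_naive:.3f}, boundary-corrected p = {p_mix:.3f}")
```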

Establishment heterogeneity can also be evaluated via a simple Monte-Carlo simulation (see Appendix 6.8.2). Here, we model the distribution of (aggregated) mean consent rates by establishment. This distribution is clearly non-normal for two reasons. First, the number of respondents per establishment varies widely (in particular in the LPP data). Second, average consent rates are by construction limited to [0, 1]. In the Monte-Carlo simulation, we assume that, in the absence of intra-establishment correlation in linkage consent, each establishment's average consent rate is the mean of a series of independent Bernoulli trials. In Figure 6.1, we compare the empirically observed distribution to 10,000 simulated draws.
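The Monte-Carlo comparison can be sketched as follows. This sketch simplifies the full-distribution comparison in Figure 6.1 by summarizing the distribution of establishment means through its variance; that is an assumption of the sketch, not necessarily the procedure in the Appendix. The observed value is then compared with simulated values under the null of independent Bernoulli decisions at the pooled consent rate. Function names and data are hypothetical.

```python
import random
from statistics import pvariance

def establishment_means(consents_by_firm):
    """Mean consent rate per establishment (each entry is a list of 0/1 decisions)."""
    return [sum(c) / len(c) for c in consents_by_firm]

def simulate_null(sizes, p, n_sims=10_000, seed=1):
    """Between-establishment variance of mean consent rates under the null of
    independent Bernoulli(p) decisions (no intra-establishment correlation)."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        means = [sum(rng.random() < p for _ in range(n)) / n for n in sizes]
        sims.append(pvariance(means))
    return sims

def mc_p_value(consents_by_firm, n_sims=10_000, seed=1):
    """Observed variance of establishment means and its Monte-Carlo p-value."""
    sizes = [len(c) for c in consents_by_firm]
    p = sum(map(sum, consents_by_firm)) / sum(sizes)  # pooled consent rate
    observed = pvariance(establishment_means(consents_by_firm))
    sims = simulate_null(sizes, p, n_sims, seed)
    exceed = sum(v >= observed for v in sims)
    return observed, (exceed + 1) / (n_sims + 1)

# Hypothetical toy data: four establishments whose workers decide alike
clustered = [[1] * 10, [0] * 10, [1] * 10, [0] * 10]
print(mc_p_value(clustered, n_sims=2000))
```

Because the simulation keeps each establishment's actual number of respondents, it respects both features noted above: unequal sample sizes and bounded consent rates.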

If we include individual-level controls (Column 3/4 in Table 6.4), we see that the variance of the establishment random effects is reduced by half in LPP and is close to zero in WeLL. This indicates that there is some amount of establishment heterogeneity which can be primarily explained by differences in observable characteristics between the respondents employed by different firms.

For WeLL, we find that an indicator for East Germany alone can explain most of the establishment heterogeneity.182 We therefore run two separate logit regressions for East and West Germany (without random effects). This allows us to investigate whether predictors of linkage consent are consistent between East and West Germany. The results are shown in Table 6.5. Interestingly, we often find quite different results for workers employed in East German establishments compared to respondents who work in West Germany. The coefficient for female respondents differs significantly (at the 10% level) between establishments located in East and in West Germany (p-value 0.08).183 Only female workers in establishments in East Germany exhibit higher consent rates than their male colleagues. We also find that the positive association between shift work and linkage consent is driven entirely by the consent behavior of respondents employed in establishments located in East Germany.184 Furthermore, there are other notable differences, for example regarding part-time work (which is negatively associated with linkage consent only in East Germany) or less-educated workers (who tend to consent more often in East Germany but somewhat less often in West Germany). These differences are, however, not significant at conventional levels (p-value 0.17 for part-time work and 0.18 for less-educated workers).

181 The 95% confidence intervals are [0.002, 0.40] in WeLL and [0.12, 0.16] in LPP.
182 Due to data anonymization, we do not have information on region in LPP and therefore cannot say whether this is also the case in this sample.
183 The p-value is calculated by running a joint regression in which we interact all variables with an indicator for working in an East German establishment.
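Footnote 183 reports that these p-values come from a joint regression in which all variables are interacted with an East Germany indicator. As a rough cross-check only, a two-sided Wald test of equal coefficients can also be computed directly from the separate estimates in Table 6.5. This assumes the East and West estimates are independent (separate samples, no covariance term). It is an approximation, not the procedure used in the text.

```python
import math

def diff_test(b_east, se_east, b_west, se_west):
    """Two-sided Wald test of b_east == b_west from two separate regressions.

    Assumes the two estimates are independent (no covariance term), which is
    an approximation to the joint interacted regression used in the text.
    """
    z = (b_east - b_west) / math.sqrt(se_east ** 2 + se_west ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p

# Coefficients and standard errors from Table 6.5: East, then West
rows = {
    "Female": (0.42, 0.23, -0.08, 0.17),
    "Shift Work": (0.44, 0.18, -0.01, 0.16),
    "Part-time": (-0.37, 0.25, 0.06, 0.19),
    "No Vocat. Qualif.": (0.38, 0.30, -0.10, 0.20),
}
for name, args in rows.items():
    z, p = diff_test(*args)
    print(f"{name:<18} z = {z:5.2f}  p = {p:.3f}")
```

With the Table 6.5 values, this approximation gives p-values of roughly 0.08 (female), 0.06 (shift work), 0.17 (part-time) and 0.18 (no vocational qualification), close to those reported in the text.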

6.6 Bias in Economic Models

In this final section, we shed light on possible unobservable heterogeneity between survey participants who consent to data linkage and those who do not (Research Question 6). To assess whether non-consent can be viewed as missing-at-random (MAR), we look at two economic models. The first application concerns participation in job-related training, while the second looks at the wage effects of human capital investments. We test the MAR assumption by comparing regression results for the sample of respondents who gave linkage consent with those for respondents who did not. If both groups differ in unobserved variables that are correlated with both the other predictors and the outcome of interest, this will generally give inconsistent regression coefficients. This would be a strong indication that missingness due to linkage consent is not-missing-at-random (MNAR). If, by contrast, regression coefficients are stable across groups, this is reassuring evidence that MNAR plays only a limited role in these applications. It should be noted, however, that misspecification or other issues could yield stable coefficients even if linkage consent is MNAR.185
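The following toy simulation illustrates the logic of this test. All parameters are hypothetical, and the model is a simple linear regression rather than the training and wage models used below. The slope of an outcome on an observed predictor x differs by an unobserved type h. When consent depends only on x (MAR), both consent groups recover the same slope. When consent depends on h (MNAR), the slopes differ between consenters and refusers.

```python
import random

def ols_slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def slopes_by_consent(mechanism, n=20_000, seed=42):
    """Return (slope among consenters, slope among refusers) in a toy model."""
    rng = random.Random(seed)
    groups = {True: ([], []), False: ([], [])}
    for _ in range(n):
        h = rng.random() < 0.5                   # unobserved type
        x = rng.gauss(0.0, 1.0)                  # observed predictor
        y = 1.0 + (0.3 + 0.4 * h) * x + rng.gauss(0.0, 1.0)
        if mechanism == "MAR":
            consent = x + rng.gauss(0.0, 1.0) > 0         # depends on observed x only
        else:
            consent = rng.random() < (0.6 if h else 0.9)  # depends on unobserved h
        xs, ys = groups[consent]
        xs.append(x)
        ys.append(y)
    return ols_slope(*groups[True]), ols_slope(*groups[False])

print("MAR: ", slopes_by_consent("MAR"))
print("MNAR:", slopes_by_consent("MNAR"))
```

Under MNAR, refusers are disproportionately of the high-slope type, so their slope is larger. This is the kind of coefficient instability the comparisons below look for.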

6.6.1 Replication of Chapter 4

The first analysis is a replication of our own previous work in Chapter 4. In that chapter, we analyzed workers' participation in training using a matched employer-employee dataset. We explained to what extent training rates differ between workers within the same firm and between workers employed in different firms. For this purpose, we made use of the WeLL-ADIAB data (Schmucker et al., 2014), which link the WeLL survey data to social security data.186

184 The interaction term for shift work and East Germany is significant (p-value 0.06).
185 We also carefully examine the variance explained for different groups, which Oster (forthcoming) shows to be important when checking coefficient stability.
186 Social security records are available for all employees of the establishments participating in WeLL.

This dataset includes, for example, full employment biographies (for employment subject to social insurance contributions). It contains information on wages, periods of unemployment and levels of education. We used the social security data, in particular, to measure firms' wage compression, an important variable in the theoretical training literature.

In WeLL-ADIAB, survey data is available only for those respondents who consented to data linkage. For the following replication, we use the WeLL survey data and re-run our original analyses based on WeLL-ADIAB for this sample. Further details are available in Appendix 6.8.6. As in our earlier study, we do not restrict the analysis to first-time interview respondents but also include panel participants. To analyze variation in training rates between workers and firms in Chapter 4, we estimated a two-way random-effects logit model by maximum likelihood (Equation (6.3)). We thereby used the panel dimension of the WeLL data to separate firm and worker heterogeneity. The original estimation equation reads as follows:

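$$\Pr\left(\text{Training}_{it}=1 \mid \alpha_{j(i)}, \theta_i, T_t, X_{it}\right) = \operatorname{logit}^{-1}\left(T_t\tau + \alpha_{j(i)} + \theta_i + X_{it}\beta\right) \tag{6.3}$$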

Here, T_t captures time effects, θ_i are random effects for worker i, and α_j(i) are random effects for the establishment j in which worker i is employed at time t.187 This allowed us to analyze the (relative) importance of firms and workers in determining an individual's participation in training. By gradually adding worker, firm and job characteristics as predictors (X_it), we then explained these differences in participation in training.

187 α_j(i) ∼ N(0, σ²_α) and θ_i ∼ N(0, σ²_θ) are assumed to be mutually independent, and independent of T_t and X_it.

For the purpose of replication, we pool survey information on respondents who consented to data linkage with the sample of respondents who did not. We control for age (four categories), sex, level of education and citizenship of the given worker, as well as for the respondent's relationship status and whether he or she has recently experienced a period of unemployment. We also control for the respondent's subjective health status and for his or her self-assessed probability of being active in the labor force in one year's time (a proxy for labor attachment). We estimate the following equation:

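$$\Pr\left[\text{Training}_{it}=1 \,\middle|\, \alpha^{1}_{j(i)}, \alpha^{2}_{j(i)}, \theta^{1}_{i}, \theta^{2}_{i}, \gamma, \mathbb{1}(\text{Linkage Refusal}_i), T_t, X_{it}\right] = \operatorname{logit}^{-1}\Big(T_t\tau + \alpha^{1}_{j(i)} + \theta^{1}_{i} + X_{it}\beta + \mathbb{1}(\text{Linkage Refusal}_i)\big[\alpha^{2}_{j(i)} + \theta^{2}_{i} + \gamma + X_{it}\beta^{R}\big]\Big) \tag{6.4}$$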

Equation (6.4) extends Equation (6.3) by adding the indicator 1(Linkage Refusal_i), which takes the value one if an individual has not provided consent. γ is a (fixed) constant representing the intercept of the group of respondents who do not give consent relative to the consent sample. θ²_i and α²_j(i) are random coefficients for the non-consent sample. We are interested in γ, θ²_i, α²_j(i) and the interaction terms of linkage refusal with X_it (β^R). These parameters show us differences in participation in training between the sample of individuals who provided linkage consent and those who did not.188 For computational reasons, we assume that the covariances between the random effects and random coefficients are all zero.189
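The following minimal sketch shows how the linear index in Equation (6.4) is assembled for a given set of effects. The function and argument names are hypothetical. It returns a probability conditional on the random effects. In estimation these effects are integrated out, so this value is not the average training rate of either group.

```python
import math

def inv_logit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def training_prob(t_tau, x_beta, alpha1, theta1, refusal,
                  alpha2=0.0, theta2=0.0, gamma=0.0, x_beta_r=0.0):
    """Conditional Pr(Training = 1) with the linear index of Equation (6.4).

    t_tau = T_t*tau, x_beta = X_it*beta, x_beta_r = X_it*beta^R (scalars here).
    For consenters (refusal = 0) the index reduces to that of Equation (6.3);
    for refusers, alpha2 + theta2 + gamma + X_it*beta^R is added.
    """
    eta = t_tau + alpha1 + theta1 + x_beta
    if refusal:
        eta += alpha2 + theta2 + gamma + x_beta_r
    return inv_logit(eta)

# Hypothetical inputs: the same worker profile, once as consenter, once as refuser
print(training_prob(0.1, 0.2, 0.3, -0.1, refusal=0))
print(training_prob(0.1, 0.2, 0.3, -0.1, refusal=1, alpha2=0.1, gamma=-0.2))
```

Setting refusal to zero switches off every refusal-specific term, which is why Equation (6.3) is nested in Equation (6.4).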

The results are presented in Table 6.6. We start with the findings for the respondents who provided linkage consent. As in Chapter 4, we find a strong association between training on the one hand and age or education on the other.190 Individuals with higher labor attachment and those with better health participate more in training, and the opposite is true for workers with a migration background or who have experienced unemployment. As shown in Chapter 4, many of these associations disappear after controlling for job tasks performed at work.

In the following, we look at training patterns for the non-consent sample. Column 1 in Table 6.6 shows that individuals who did not provide linkage consent participate less in training on average. The (unconditional) average training rate is 46.2% among respondents who provided linkage consent and 45.3% among individuals who did not. This training gap of 0.9 percentage points is rather small and becomes insignificant after including further variables (Column 2). For the other variables, results are in general close to the previous results, with two exceptions. Age above 55 and health status show significant negative interaction terms.191 Older and less healthy workers participate less in training, in particular among respondents who did not provide linkage consent. This indicates that, by focusing on the consent sample, we might have underestimated the (already negative) association of poor health and older age with participation in training. The association with predictors such as sex or level of education remains unchanged between the previous and the current study.

188 Results are similar if we additionally interact T_t and 1(Linkage Refusal_i).
189 We further assume that random effects and random coefficients are independent of X_it.
190 In Chapter 4, we used age and age squared. This showed a large negative but insignificant squared age term. Age is only available in four categories in the survey data.
191 The health status is standardized, with lower values meaning better health.

Next, we look at the random effects and random coefficients. We use a likelihood-ratio test to assess whether including random coefficients for individuals who did not give linkage consent significantly improves our model. The likelihood-ratio statistic is marginally significant for the model including time effects only (Column 1, p-value 0.12) and significant at conventional levels for the model that additionally includes worker characteristics (Column 2, p-value 0.04). How large are these differences? To ease interpretation, Table 6.7 presents variance components from separate regressions: (1) the original results from Chapter 4, (2) results for the respondents who gave consent and (3) results for the non-consent sample.192 We see that the variance components are indeed slightly lower among respondents who did not give consent. Two points should be noted. First, linkage refusal is highly correlated with panel attrition (Table 6.4). Second, almost one-third (29.3%) of individuals who initially declined to provide linkage consent later reconsidered their decision and subsequently provided consent (and therefore appear in the consent sample).193 Consequently, respondents without linkage consent are less often observed in multiple periods: almost two-thirds of the individuals in the non-consent sample are observed for only one period, compared to less than one-third of the consent sample. As a consequence, the random effects are estimated less precisely (similar to attenuation bias in the presence of classical measurement error).

One of the contributions of Chapter 4 to the training literature is a detailed variance decomposition based on the random effects. In it, we show that firm heterogeneity plays only a minor role in workers' participation in training after taking into account differences in firm, worker and job characteristics. This result, among others, seems to be unaffected by the omission of respondents who never gave linkage consent. Even if we partition the variance components based on the non-consent sample only, our interpretation of training differences between and within firms does not change. This result is reassuring and indicates that unobserved heterogeneity associated with linkage consent is not (very) relevant for participation in job-related training. There is little evidence of missingness-not-at-random (see Research Question 6) in this context.
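One common way to express such a decomposition for a random-effects logit is on the latent scale, where the standard logistic error has variance π²/3. The sketch below applies this convention to the Column 1 estimates for the consent sample in Table 6.6 (σ²_Firm = 0.6, σ²_Worker = 1.4). Whether Chapter 4 uses exactly this normalization is an assumption here.

```python
import math

def latent_variance_shares(var_firm, var_worker):
    """Variance shares on the latent scale of a two-way random-effects logit.

    The level-1 error of a logit model is standard logistic, with variance pi^2/3.
    """
    var_e = math.pi ** 2 / 3.0
    total = var_firm + var_worker + var_e
    return {
        "firm": var_firm / total,
        "worker": var_worker / total,
        "residual": var_e / total,
    }

# Column 1 of Table 6.6 (consent sample): sigma^2_Firm = 0.6, sigma^2_Worker = 1.4
shares = latent_variance_shares(0.6, 1.4)
print({k: round(v, 3) for k, v in shares.items()})
```

Under this convention, establishments account for roughly 11% and workers for roughly 26% of the latent variance in Column 1.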

192 In Appendix 6.8.6, we discuss why the analysis based on the survey data only yields slightly different results compared to the original study.
193 Results are similar if we restrict the analyses to individuals who never gave consent.

6.6.2 Earnings Regression

Various (augmented) forms of the Mincer earnings function have been estimated in microeconomics with the aim of estimating returns to schooling (Mincer, 1958). This model regresses the logarithm of earnings on measures of educational attainment, work experience, experience squared and other variables, which often serve as controls for observable heterogeneity between educational groups. In the following, we estimate a standard Mincer earnings function using LPP, for which gross earnings are available for both the linkage consent sample and hold-outs.194

As in Section 6.5, we again focus on first-time participants only. We thereby avoid confusing inconsistencies due to linkage consent with possible panel attrition bias. We estimate a Mincer earnings function in which we interact all variables with an indicator for refusing data linkage consent. This amounts to separate estimation on the two subsamples defined by the linkage consent indicator. The predictors include age and age squared (as a proxy for potential experience) as well as indicators for individuals without a vocational qualification and with a tertiary degree, individuals without German citizenship, those reporting subjectively good health and those withholding linkage consent. We add a further variable measuring participation in training in the roughly 12 months prior to the interview, a common specification for estimating returns to training (e.g. Bassanini et al., 2005). We consider this measure to be of particular interest because, as detailed in Section 6.6.1, respondents who never consent to data linkage tend to participate less in training. We are therefore particularly interested in whether this affects the association between training and wages.

$$y_i = \beta_0 + X_i\beta + \mathbb{1}(\text{Linkage Refusal}_i)\,\beta_0^{R} + X_i\,\mathbb{1}(\text{Linkage Refusal}_i)\,\beta^{R} + \varepsilon_i \tag{6.5}$$

y_i is the logarithm of the hourly gross wage, and X_i includes the list of variables described above. As in the previous section, 1(Linkage Refusal_i) is an indicator function for linkage consent refusal. We cluster standard errors at the individual level.
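The claim that interacting all variables with the refusal indicator amounts to separate estimation can be checked numerically. The minimal sketch below assumes a single regressor, plain OLS and hypothetical toy data (not LPP). The point estimates of the fully interacted model reproduce the two subsample regressions exactly: the base coefficients equal the consent-sample fit, and base plus interaction equals the refusal-sample fit. Standard errors can differ, for instance if the pooled model imposes a common error variance, so this concerns point estimates only.

```python
def ols(X, y):
    """OLS coefficients from the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical toy data: one regressor x, refusal indicator d, outcome y (log wage)
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
d = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [1.0, 1.9, 3.2, 3.9, 5.1, 2.0, 3.1, 3.9, 5.2, 6.0]

# Fully interacted model in the spirit of Equation (6.5): columns [1, x, d, d*x]
b0, b1, b0_r, b1_r = ols([[1.0, xi, di, di * xi] for xi, di in zip(x, d)], y)

# Separate regressions on each subsample
cons = ols([[1.0, xi] for xi, di in zip(x, d) if di == 0],
           [yi for yi, di in zip(y, d) if di == 0])
refu = ols([[1.0, xi] for xi, di in zip(x, d) if di == 1],
           [yi for yi, di in zip(y, d) if di == 1])
print(b0, b1, cons)                # consent-sample coefficients coincide
print(b0 + b0_r, b1 + b1_r, refu)  # base + interaction = refusal-sample coefficients
```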

Table 6.8 shows the results for the wage regression. The results are very much in line with the literature and indicate that education is rewarded in the labor market, that wages increase with age (with a negative squared term) and that women tend to earn less than men on an hourly basis. There is a slight wage penalty for individuals holding non-German citizenship, whilst individuals in good health earn more than their counterparts who report having health issues. Individuals who participate in training earn considerably more than individuals who do not (a difference of about two-thirds of the gender wage gap). This should not be interpreted as a causal effect: individuals who participate in training might earn more even without training because of favorable unobserved attributes.195

194 In WeLL, only respondents who did not consent to data linkage were asked to state their gross income. For the sample of respondents who did consent, gross income was taken from social security records.

We next compare the results for respondents who gave linkage consent to those for respondents who did not. In Table 6.8, this is expressed by the interaction effects. The results are generally reassuring for the MAR assumption (Equation (6.1)). Only one of the ten interaction terms is statistically significant, and the majority of terms are small in magnitude. We find that individuals who refused consent earn somewhat lower wages, but the difference is not significant. Similarly, the wage penalty for less-educated individuals is less marked in the non-consent sample than among respondents who did consent. The only significant difference concerns participation in training: the wage difference between those who participate in training and those who do not is larger among individuals who refuse to give linkage consent. The difference is 16.2% in the sample of respondents who gave consent, but 20.9% in the sample of respondents who did not.196
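Footnote 196 converts a log-point coefficient into a percentage difference. The small sketch below performs this conversion in both directions. The 16.2% for the consent sample corresponds to a training coefficient of 0.15. The 20.9% for non-consenters implies a combined coefficient (main effect plus interaction, β + β^R) of about 0.19 log points. This implied value is back-calculated from the reported percentage, not read off Table 6.8.

```python
import math

def log_points_to_percent(b):
    """Percentage difference implied by a coefficient b in a log-wage regression."""
    return 100.0 * (math.exp(b) - 1.0)

def percent_to_log_points(pct):
    """Inverse: the log-point coefficient implied by a percentage difference."""
    return math.log(1.0 + pct / 100.0)

print(round(log_points_to_percent(0.15), 1))  # 16.2, as in the consent sample
print(round(percent_to_log_points(20.9), 2))  # 0.19, implied for the refusal sample
```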

6.7 Conclusions

Survey data is increasingly being merged with administrative records. Due to data privacy laws, polling institutes must obtain explicit consent from individuals in order to link such data. However, not all individuals agree to their survey data being linked to administrative records, thereby giving rise to a new form of non-response. A growing body of literature has investigated predictors of linkage consent in order to ascertain whether surveys linked with administrative records can nonetheless be considered representative of the relevant population. Previous studies in this field have thus far provided inconsistent results, even with regard to standard socio-demographic characteristics such as respondents' age or sex.

In this study, we have looked at two comparable German surveys whose data have been linked to social security data. Using these datasets, we have provided new insights into the characteristics of individuals who tend to decide against allowing their data to be linked to social security data. Furthermore, we discuss the implications of these findings for survey practitioners and researchers.

195 In the empirical literature, researchers sometimes compare participants in a training course to a control group of individuals who planned to participate but did not do so due to more or less random events, such as a cancellation by the provider.
196 exp(0.15) ≈ 1.162

We first shed new light on the relevance of possible reasons for the inconsistencies found in the existing literature. We have compared linkage consent patterns in a multivariate regression in both datasets using the same set of control variables. We have thereby shown that common predictors such as age, sex or non-German citizenship have comparable associations with linkage consent across the two datasets. The existing literature has suggested that linkage consent is closely related to panel attrition and item non-response, a pattern which we can confirm for both datasets. In addition, we have illustrated that an individual's decision to consent (or not consent) to data linkage is associated with other psychological items such as trust or risk aversion.

There are, however, some diverging results. These concern formal education and, in particular, the Big-Five personality traits and levels of work satisfaction. Conscientiousness, for example, shows either no correlation or a negative correlation with linkage consent in LPP, while there is a statistically significant positive correlation between the provision of linkage consent and conscientiousness in WeLL. In contrast, work satisfaction correlates positively with the provision of linkage consent in LPP, whilst it shows a negative correlation with consent in WeLL. We have also found that including further predictors does not help to explain the diverging findings regarding respondents' level of education, personality traits or work satisfaction. Taking into account further variables that capture other psychological attributes or firm characteristics (and which are not necessarily available in both datasets) does not alter our findings: the vast majority of correlations remain unchanged when these variables are included.

We have shown that the work environment plays only a minor role in the individual decision to provide linkage consent. Yet, there are large (average) differences in consent rates between respondents working in East and West Germany, with higher consent rates amongst those employed in firms located in East Germany. This association cannot be explained by cohort differences and indicates that shared experience may play an important role. We have then compared the linkage consent patterns identified for respondents in establishments in East and West Germany. Whilst female respondents employed in firms in East Germany are significantly more likely to consent to data linkage than their male colleagues, no such difference is seen between male and female respondents employed in West German firms. We do find, however, that other results are generally comparable between East and West Germany. These findings indicate that differences between the populations surveyed may have contributed to inconsistencies with respect to predictors of linkage consent in the literature.

Our study is the first to analyze and compare the impact of linkage consent in two empirical models. First, we have replicated one of our own previous studies, in which we used the WeLL survey data linked with social security data and therefore could not consider individuals who did not give linkage consent. At least with respect to job-related training, there are few differences between individuals who consent to their survey data being linked to social security data and those who do not. We were therefore able to confirm our previous findings for the sample of respondents who did not consent to data linkage. Second, we have considered the results from a well-known empirical model which measures the wage returns to human capital investments (schooling as well as job-related training). We find that the wage difference between individuals who participate in training and those who do not is larger among respondents who do not provide linkage consent than among respondents from whom linkage consent was obtained. All other results differ very little by consent status.

We therefore conclude with an encouraging view of linkage non-consent. The role of unobserved heterogeneity between respondents who gave linkage consent and respondents who did not seems to be rather small in the applications we have analyzed. Future research should address in more detail the role of psychological attributes in an individual's decision for or against linkage consent.

6.8 Appendix

6.8.1 Tables

Table 6.1: Overview of Consent Patterns for Selected (Recent) Studies from Social Sciences

Linkage Consent (1) (2) (3) (4) (5) (6) Benefits Health Benefits Benefits Health Benefits Benefits Consent Rate 91.6% 41% 39% 67.8% 77.6% 66.9% 77.9% 93.9% Age (higher) + 0 0 0 - - + Foreign-born 0 - - 0 0 0 0 Female 0 0 - 0 0 - 0 0 Highly Educated 0 0 + + 0 - 0 - Partnership etc. 0 0 0 0 + 0 0 0 Children 0 0 0 0 - 0 Health Problems 0 + 0 0 0 + Employed + 0 0 0 0 + Country DE UK US DE UK DE Interview 1st-Interv. F.-Up F.-Up F.-Up F.-Up F.-Up Method Bivar. Probit R.-E. Logit R.-E. Logit Logit Sample aged 50+ aged 50+ Employed Controls Interviewer Yes Yes Yes Yes Yes No Item Non-Response Yes No No Yes No No Note: See also Antoni (2011) for an excellent related overview. Here, "+" refers to statistically significant positive associations, "0" insignificant and "-" to significant negative associations. We do not illustrate age effects for Korbmacher et al. (2013) because the SHARE data covers only individuals aged 50 and older. (1): Antoni (2011), Table 5 (Columns 1), ALWA dataset (2): Sala et al. (2012), Table 2 (Columns 5 & 6), BHPS dataset (3): Sakshaug et al. (2012), Table 4, HRS dataset (4): Korbmacher et al. (2013), Table 2 (Column 5), SHARE dataset (5): Al Baghal et al. (2014), Table 4 (Columns 1 & 2), Understanding Society dataset (6): Warnke (2015), Table 4, WeLL dataset

Table 6.2: Two IAB Matched-Employer-Employee Datasets

Survey | LPP | WeLL
Full title | Matched-Employer-Employee Panel | Further Training as a Part of Lifelong Learning
Focus | HRM | Further Training
Waves | 2 | 4
Establishments | 869 | 149
Individuals (wave 1) | 7,508 | 6,404
Method (contacted at home) | Phone (CATI) | Phone (CATI)
Polling institute | infas | infas
Eligible employees | subject to social security or minor employment (excl. apprentices) | subject to social security or minor employment (excl. apprentices)
Response rate (of gross sample) | 24.5% | 31.7%
Response rate (contacted individuals) | 34.1% | 38.7%
Avg. consent rate (first-time participants) | 81.9% | 94.2%

Note: The average linkage consent rate refers to the sample of first-time survey participants without missing information on key variables such as education, working hours or nationality (see Section 6.4).

Table 6.3: Descriptive Statistics (Binary Variables)

Variable | LPP Absolute | LPP Percent | WeLL Absolute | WeLL Percent
Total (First-Interview) | 11,385 | 100% | 5,753 | 100%
Linkage Consent | 9,325 | 81.9% | 5,321 | 92.5%
Wave 1 | 7,417 | 65.2% | 4,483 | 77.9%
Wave 2 | 3,968 | 34.9% | 536 | 9.3%
Wave 3 | - | - | 734 | 12.8%
Female | 3,240 | 28.5% | 2,236 | 38.9%
No Vocat. Qualif. | 262 | 2.3% | 583 | 10.1%
Vocational Qualif. | 9,017 | 79.2% | 3,786 | 65.8%
Tertiary Degree | 2,106 | 18.5% | 1,384 | 24.1%
Age below 35 | 2,257 | 19.8% | 1,331 | 23.1%
Age ca. 35-45 | 2,361 | 20.7% | 1,821 | 31.7%
Age ca. 45-55 | 4,325 | 37.8% | 1,932 | 33.6%
Age over 55 | 2,442 | 21.5% | 669 | 11.6%
Foreign-born | 1,077 | 9.5% | 343 | 6.0%
Part-time | 1,424 | 12.5% | 904 | 15.7%
White Collar | 6,962 | 61.2% | 3,833 | 66.6%
Child Under 14 | 2,983 | 26.2% | 1,803 | 31.3%
Living Alone | 758 | 6.7% | 954 | 16.6%
Good Health | 6,871 | 60.4% | 4,593 | 79.8%
Managerial Resp. | 3,396 | 29.8% | 1,682 | 29.2%
Limited Contract | 678 | 6.0% | 766 | 13.3%
Shift Work | 3,636 | 31.9% | 2,452 | 42.6%
Panel | 10,677 | 93.8% | 5,666 | 98.5%
Net Wage Missing | 1,463 | 12.9% | 180 | 3.1%
Voluntary Work | - | - | 1,591 | 27.7%
Firm Size 100-200 | - | - | 861 | 15.0%
Firm Size 200-500 | - | - | 1,397 | 24.3%
Firm Size 500-2000 | - | - | 3,495 | 60.8%
East Germany | - | - | 2,267 | 39.4%
Service Sector | - | - | 2,925 | 50.8%
Training Firm | - | - | 4,990 | 86.7%
Investment Firm | - | - | 2,604 | 45.3%

Note: First-interview sample. Excluded are respondents for whom information on key variables such as education level is missing. Non-binary variables are standardized and not presented here.

Table 6.4: Random-Effects Logit Regression Estimates (on Linkage Consent)

Outcome: Linkage Consent LPP WeLL LPP WeLL LPP WeLL Variable Coef (SE) Coef (SE) Coef (SE) Coef (SE) Coef (SE) Coef (SE) Wave 2 -0.15*** (0.05) -1.00*** (0.14) -0.15*** (0.05) -0.85*** (0.15) -0.17*** (0.06) -0.61*** (0.16) Wave 3 -1.14*** (0.11) -0.95*** (0.13) -0.76*** (0.14) Female 0.00 (0.06) 0.15 (0.14) 0.02 (0.07) 0.06 (0.14) No Vocat. Qualif. -0.03 (0.18) 0.00 (0.17) 0.01 (0.2) -0.05 (0.18) Tertiary Degree -0.19*** (0.06) -0.01 (0.13) -0.25*** (0.07) -0.02 (0.13) Aged ca. 35-45 0.21*** (0.07) -0.07 (0.12) 0.28*** (0.08) -0.06 (0.13) Aged ca. 45-55 0.28*** (0.07) 0.16 (0.18) 0.35*** (0.08) 0.15 (0.19) Aged over 55 0.51*** (0.09) 0.64*** (0.23) 0.68*** (0.09) 0.57** (0.24) Foreign-born -0.20** (0.09) -0.15 (0.21) 0.03 (0.10) 0.10 (0.25) Part-time -0.09 (0.08) -0.19 (0.14) -0.16* (0.09) -0.12 (0.15) White-Collar -0.28*** (0.06) -0.30** (0.14) -0.27*** (0.07) -0.30* (0.15) Child Under 14 -0.09 (0.06) 0.01 (0.12) -0.09 (0.07) 0.02 (0.13) Living Alone -0.24*** (0.09) -0.13 (0.17) -0.30*** (0.10) -0.12 (0.17) Good Health 0.10** (0.05) 0.08 (0.13) 0.09 (0.06) 0.01 (0.13) Managerial Resp. 0.02 (0.06) 0.27** (0.12) -0.02 (0.06) 0.24** (0.12) Limited Contract 0.09 (0.11) -0.02 (0.13) 0.17 (0.11) 0.00 (0.15) Shift Work 0.20*** (0.06) 0.20 (0.12) 0.20*** (0.06) 0.08 (0.13) Conscientiousness -0.02 (0.04) 0.16** (0.07) -0.09** (0.04) 0.11* (0.06) Extraversion -0.04 (0.04) -0.1* (0.06) -0.01 (0.04) -0.10 (0.06) Neuroticism 0.02 (0.04) 0.11 (0.07) 0.00 (0.04) 0.09 (0.07) Agreeableness -0.01 (0.04) -0.11 (0.09) -0.01 (0.04) -0.13 (0.09) Openness to new Exp. 
0.02 (0.04) -0.07 (0.08) 0.10** (0.05) -0.08 (0.08) Job Insecurity 0.03 (0.03) -0.05 (0.06) 0.04 (0.03) -0.03 (0.06) Work Satisfaction 0.04 (0.02) -0.05 (0.06) 0.05* (0.03) -0.07 (0.06) Panel 2.46*** (0.09) 2.02*** (0.21) Net Income Missing -1.26*** (0.07) -1.48*** (0.23) Openness to Risk 0.11*** (0.03) Trust 0.08*** (0.03) Justice -0.01 (0.03) Voluntary Work 0.20 (0.12) Labour Attachment 0.15** (0.06) Firm Size 100-200 0.17 (0.19) Firm Size 500-2000 0.15 (0.13) East Germany 0.45*** (0.12) Service Sector 0.10 (0.11) Training Firm -0.08 (0.15) Investment Firm -0.12 (0.1) nWorker 11,385 5,753 11,385 5,753 11,385 5,753 nFirms 1,591 149 1,591 149 1,591 149 Intercept 1.59*** (0.04) 2.86*** (0.08) 1.49*** (0.09) 2.72*** (0.25) -0.58*** (0.12) 0.67 (0.34) σ²_α 0.043 (0.028) 0.027 (0.037) 0.021 (0.023) 0.000 (0.000) 0.023 (0.025) 0.000 (0.000) Log-Likelihood -5377.0 -1482.6 -5298.2 -1461.5 -4635.0 -1385.1

Note: First-time respondents only. All observations with missing values for any of the predictors are excluded (results for Columns 1 and 2 are very similar if we use the full sample of first-time respondents). Non-binary variables have been standardized by subtracting the mean and dividing by the standard deviation (before applying sample restrictions).
6. LINKAGE CONSENT BIAS 211

Table 6.5: Separate Estimates for Linkage Consent for East and West Germany (WeLL)

Outcome: Linkage Consent
Variable | East Germany Coef (SE) | West Germany Coef (SE)
Wave 2 | -0.82*** (0.25) | -0.87*** (0.18)
Wave 3 | -1.12*** (0.24) | -0.9*** (0.15)
Female | 0.42* (0.23) | -0.08 (0.17)
No Vocat. Qualif. | 0.38 (0.3) | -0.1 (0.2)
Tertiary Degree | 0.1 (0.29) | -0.14 (0.13)
Age ca. 35-45 | -0.22 (0.22) | 0 (0.15)
Age ca. 45-55 | 0.47* (0.27) | 0.05 (0.23)
Age over 55 | 0.93** (0.42) | 0.46 (0.28)
Foreign-born | 0.00 (0.00)
Part-time | -0.37 (0.25) | 0.06 (0.19)
White Collar | -0.35 (0.28) | -0.25 (0.16)
Child Under 14 | 0.25 (0.19) | -0.08 (0.16)
Living Alone | 0.19 (0.24) | -0.21 (0.22)
Good Health | 0 (0.29) | 0.08 (0.16)
Managerial Resp. | 0.36 (0.31) | 0.28** (0.12)
Limited Contract | -0.12 (0.24) | 0.03 (0.16)
Shift Work | 0.44** (0.18) | -0.01 (0.16)
Conscientiousness | 0.16 (0.11) | 0.15* (0.08)
Extraversion | -0.02 (0.11) | -0.15** (0.07)
Neuroticism | 0.1 (0.12) | 0.12 (0.09)
Agreeableness | -0.11 (0.13) | -0.1 (0.12)
Openness to Exp. | -0.17* (0.1) | -0.04 (0.11)
Job Insecurity | -0.16 (0.11) | -0.02 (0.07)
Work Satisfaction | -0.11 (0.13) | -0.03 (0.07)
Intercept | 2.69*** (0.33) | 2.76*** (0.33)
nWorker | 2231 | 3486
nFirms | 61 | 88
Log Pseudo-likelihood | -453.03 | -988.86
Note: First-time respondents only. All foreign-born respondents in East Germany provided consent and have therefore been disregarded in Column 2.

Table 6.6: Replication of Chapter 4 (WeLL)

Outcome: Participation in Training
Variable | Coef (SE) | Coef (SE)
Intercept | 0.17** (0.08) | -0.07 (0.13)
Intercept x Refusal | -0.24** (0.11) | -0.44 (0.44)
Female | -0.04 (0.06)
Female x Refusal | 0.01 (0.21)
Cohabitating | 0.06 (0.06)
Cohabitating x Refusal | 0.15 (0.24)
No Voc. Qualification | -0.07 (0.08)
No Voc. Qualif. x Refusal | 0.00 (0.33)
Tertiary Education | 0.88*** (0.06)
Tertiary Educ. x Refusal | 0.26 (0.23)
Age ca. 35-45 | -0.22*** (0.07)
Age ca. 35-45 x Refusal | -0.10 (0.25)
Age ca. 45-55 | -0.39*** (0.07)
Age ca. 45-55 x Refusal | -0.04 (0.26)
Age above 55 | -0.69*** (0.09)
Age above 55 x Refusal | -0.76* (0.44)
Unempl. Exp. | -0.12 (0.18)
Unempl. Exp. x Refusal | -0.64 (0.63)
Labor Attachment | 0.05*** (0.01)
Labor Attachm. x Refusal | 0.01 (0.04)
Foreign Born | -0.58*** (0.11)
Foreign born x Refusal | 0.49 (0.37)
Health Status | -0.12*** (0.02)
Health Status x Refusal | -0.16* (0.09)
σ²_Firm | 0.6 (0.09) | 0.48 (0.08)
σ²_Firm × Refusal | 0.23 (0.16) | 0.32 (0.19)
σ²_Worker | 1.4 (0.10) | 1.24 (0.10)
σ²_Worker × Refusal | 0.32 (0.53) | 0.02 (0.28)
nWorker | 17269 | 17269
nFirms | 149 | 149
Wald χ² | 358.57 | 749
Log Likelihood | -11082.43 | -10840.85

Table 6.7: Replication of Variance Components in Chapter 4 (WeLL)

Variable | Model | (1) Coef (SE) | (2) Coef (SE) | (3) Coef (SE)
σ²_Firm | Time Effects | 0.65 (0.10) | 0.61 (0.09) | 0.50 (0.22)
σ²_Worker | Time Effects | 1.5 (0.16) | 1.42 (0.10) | 1.26 (0.51)
σ²_Firm | +Worker Characteristics | 0.48 (0.08) | 0.49 (0.08) | 0.43 (0.20)
σ²_Worker | +Worker Characteristics | 1.29 (0.14) | 1.20 (0.09) | 0.81 (0.41)
nFirm | | 149 | 149 | 132
nWorker | | 12,560 | 16,263 | 666
Note: (1) Original results in Chapter 4, (2) Re-analyses on survey data only, (3) Replication on non-Consent Sample. nWorker refers to the number of observations (interviews).

Table 6.8: Wage Regression (LPP-data)

Outcome: Log Gross Hourly Wage; Data: LPP
Variable | Coef (SE)
Intercept | 1.93*** (0.07)
Intercept x Refusal | -0.16 (0.16)
2nd Wave | 0.04** (0.02)
2nd Wave x Refusal | 0.00 (0.02)
Female | -0.24*** (0.01)
Female x Refusal | 0.03 (0.02)
Poorly Educated | -0.16*** (0.03)
Poorly Educated x Refusal | 0.06 (0.07)
Highly Educated | 0.34*** (0.01)
Highly Educated x Refusal | 0.01 (0.02)
Age | 0.04*** (0.00)
Age x Refusal | 0.01 (0.01)
Age Squared | 0.00*** (0.00)
Age Squared x Refusal | 0.00 (0.00)
Foreign-born | -0.03** (0.02)
Foreign-born x Refusal | -0.04 (0.03)
Good Health | 0.08*** (0.01)
Good Health x Refusal | -0.01 (0.02)
Training | 0.15*** (0.01)
Training x Refusal | 0.04* (0.02)
n | 8964
R² | 0.2553
Note: First-time respondents only. Analyses restricted to individuals reporting working hours between 15 and 60 hours per week. The lowest and highest wage percentiles were trimmed.
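Table 6.8 (like Table 6.6) interacts every regressor, including the intercept, with a refusal indicator. The following is a minimal sketch, assuming ordinary least squares and simulated data (all variable names, sample sizes and coefficient values are hypothetical, not the LPP data), of how such a fully interacted design can be built and what the "x Refusal" coefficients measure: the main effects equal the coefficients from a regression on consenters only, and each interaction equals the difference between the refusers' and the consenters' coefficients.

```python
import numpy as np

def interacted_design(X, refusal):
    """Design matrix [1, X, R, R*X]: main effects plus their 'x Refusal' interactions."""
    base = np.column_stack([np.ones(X.shape[0]), X])
    return np.column_stack([base, base * refusal[:, None]])

def ols(y, Z):
    """Least-squares coefficients of y on the columns of Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 2000
female = rng.integers(0, 2, n).astype(float)    # hypothetical binary regressor
age = rng.normal(40.0, 10.0, n)                 # hypothetical continuous regressor
X = np.column_stack([female, age])
refusal = (rng.random(n) < 0.2).astype(float)   # hypothetical: 1 = declined linkage consent
log_wage = 1.9 - 0.24 * female + 0.01 * age - 0.1 * refusal + rng.normal(0.0, 0.4, n)

coef = ols(log_wage, interacted_design(X, refusal))
k = X.shape[1] + 1
main, x_refusal = coef[:k], coef[k:]   # consenters' coefficients; refusers minus consenters

# The same point estimates from two separate regressions.
base = np.column_stack([np.ones(n), X])
b_consent = ols(log_wage[refusal == 0], base[refusal == 0])
b_refusal = ols(log_wage[refusal == 1], base[refusal == 1])
```

Only the point estimates coincide: the pooled model imposes a common error variance on both groups, so its conventional standard errors generally differ from those of the two separate regressions.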

Figure 6.1: Consent Rates on the Establishment Level

(a) LPP (b) WeLL

Note: Simulation based on 10,000 repetitions. First-time respondents only. Gaussian kernel estimation with bandwidth fixed at 0.025 (LPP) / 0.02 (WeLL).

6.8.2 Figures

In Chapter 6.5 we have shown that consent rates are partly driven by establishment heterogeneity (if no further controls are added). Another way to illustrate how consent rates differ across establishments is a simple Monte-Carlo experiment. Here we assume that establishment-wide average consent rates result from a series of Bernoulli trials. For each of the 149 establishments in WeLL ($j = 1, \dots, 149$), we model the distribution of the expected average consent rate

$$\bar{p}_j = \frac{1}{n_j}\sum_{k=1}^{n_j} c_{kj}, \qquad c_{kj} \overset{\text{iid}}{\sim} \text{Bernoulli}(p),$$

where $c_{kj}$ equals one if first-time respondent $k$ in establishment $j$ consents, $p = 89.7\%$ corresponds to the grand mean of consent rates for first-time respondents, and $n_j$ equals the number of first-time respondents in establishment $j$. The results of 10,000 draws of the simulation are depicted in Figure 6.1, which shows a kernel density plot of the average (aggregated) consent rate by establishment. The graph suggests that actual consent rates indeed differ from what we would expect under this assumption, as is apparent from the lower number of establishments with a consent rate around the grand average. The very different worker-to-establishment ratio in the LPP data leads to a differently shaped simulated profile, but here, too, there are notable deviations from what random noise alone would produce for consent rates close to zero and one.
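A minimal sketch of this Monte-Carlo experiment, assuming numpy and using function names of our own. Because the actual establishment sizes $n_j$ are not reported in this section, the sketch draws hypothetical sizes; it also uses the fact that the sum of $n_j$ independent Bernoulli($p$) trials is Binomial($n_j$, $p$), and evaluates a Gaussian kernel density with a fixed bandwidth as in the note to Figure 6.1.

```python
import numpy as np

def simulate_consent_rates(n_per_estab, p, reps, seed=0):
    """Establishment consent rates if every first-time respondent consented
    independently with the same probability p (no establishment heterogeneity).
    Returns an array of shape (reps, number of establishments)."""
    rng = np.random.default_rng(seed)
    n = np.asarray(n_per_estab)
    # The sum of n_j Bernoulli(p) draws is Binomial(n_j, p).
    consents = rng.binomial(n, p, size=(reps, n.size))
    return consents / n

def gaussian_kde(samples, grid, bandwidth):
    """Gaussian kernel density with a fixed bandwidth, evaluated point by point on grid."""
    samples = np.ravel(samples)
    norm = samples.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.array([np.exp(-0.5 * ((g - samples) / bandwidth) ** 2).sum() / norm
                     for g in grid])

# Hypothetical establishment sizes: the real n_j are not reported here.
n_j = np.random.default_rng(42).integers(10, 80, size=149)
rates = simulate_consent_rates(n_j, p=0.897, reps=1_000)   # the simulation in the text uses 10,000
grid = np.linspace(-0.2, 1.2, 701)
density = gaussian_kde(rates, grid, bandwidth=0.02)
```

Comparing `density` with a kernel density of the observed establishment rates, computed with the same bandwidth, gives the kind of comparison shown in Figure 6.1.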

6.8.3 Detailed Description of Sampling in WeLL

The sampling procedure in WeLL has been described in Bender et al. (2008) and Knerr et al. (2009). The population consists of 149 establishments located in five different German states (three states in West Germany and two states in East Germany) which were drawn by stratified sampling from the IAB Establishment Panel.197 These establishments were active in either the service sector or the manufacturing sector and had between 100 and 2,000 employees. The strata were defined on the basis of size (three categories, with 200-500 employees being the middle category), sector (manufacturing or service sector) and location (East or West Germany). In addition, establishments were sampled according to their willingness to make investments and whether they indicated training provision. A survey of the establishments has been conducted, but it has not been made available to researchers.

We next describe the sampling of survey participants from the 149 WeLL establishments, starting with the procedure used in the first wave. All employees who were subject to social security contributions and who were employed at a WeLL establishment on December 31, 2006, were eligible. This excludes apprentices and workers in partial retirement and comprises approximately 56,000 employees. Within this group, 20,190 individuals were sent a letter inviting them to participate in a telephone interview, along with information explaining the purpose of the survey. Of these, 16,552 individuals were finally contacted at home and 6,404 interviews were conducted, giving a response rate of 38.7%.198 Computer-assisted telephone interviews (CATI) were conducted by infas, Bonn, between October 2007 and January 2008.
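The response rate is defined relative to the individuals who were actually contacted. For comparison, the same counts imply the following shares (our own arithmetic from the figures reported above, not numbers given in the cited sources):

$$
\frac{6{,}404}{16{,}552} \approx 0.387, \qquad \frac{16{,}552}{20{,}190} \approx 0.820, \qquad \frac{6{,}404}{20{,}190} \approx 0.317,
$$

that is, a 38.7% response rate among those contacted, an 82.0% contact rate among those invited, and completed interviews with 31.7% of all invited employees.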

The telephone interviews lasted for an average of 32 minutes (Knerr et al., 2009). At the end of the interview, individuals were asked whether they were willing to participate in future waves of the survey, and whether they would provide consent for the interview data to be linked to their social security records. The question regarding data linkage read as follows: "We have now talked a lot about topics such as your job or your education. To shorten the interview, we would like to include for the analysis excerpts of data available at the Institute for Employment Research in Nuremberg. This includes information about previous periods of employment and unemployment. The Data Protection Act requires your consent for the purpose of linking such information to the interview data, to what I would like to ask you cordially. It is absolutely certain that all data protection regulations are strictly adhered to. Your consent is of course voluntary. You can also withdraw your consent at any time. Do you agree to this additional information potentially being merged with your details in the interview?" In total, 91% of respondents consented to data linkage. This consent rate is extremely high compared with those reported in the literature (see Section 6.2). Furthermore, 92% of respondents agreed to participate in future waves of the survey.199

197 Originally, WeLL had sampled 167 establishments, but 18 of these had to be excluded for reasons of anonymity because fewer than 50 of their employees were eligible to participate in WeLL. See further details about the sampling of respondents from the establishments in the next paragraph.

198 The difference between the 20,190 individuals invited and the final sample is due to missing addresses or telephone numbers, insufficient language skills or other similar reasons.

While the second and third waves of the survey included both panel participants and new respondents, the fourth wave was limited to panel respondents only. Individuals who joined a WeLL establishment in 2008 (2009) were eligible as new respondents in the second (third) wave. This also includes apprentices who had become regular employees in the respective year. The second, third and fourth waves were conducted in autumn 2008, 2009 and 2010, respectively.

In WeLL, basic establishment information which is relevant for the stratification has been made public for all respondents. This includes firm size (100-199 employees, 200-499 and 500-1,999), sector (manufacturing or service), location (East or West Germany), whether the establishment provides further training (yes/no) and the establishment's willingness to make investments (yes/no). Further establishment variables are available for the consent sample through the link to the IAB Establishment Panel. The employer survey has not been made public.

6.8.4 Detailed Description of Sampling in LPP

LPP consists of an establishment and an individual survey. Establishments in the LPP were drawn from the IAB Establishment Panel 2011 (in the first wave). Letters of invitation to participate in the LPP establishment survey were sent to all 2,222 non-agricultural establishments with more than 50 employees. Establishments primarily owned by the state and those which operate as non-profit establishments were excluded. In total, 1,219 interviews were conducted between July and October 2012 by TNS Infratest Sozialforschung (which also organized the interviews for the IAB Establishment Panel in 2011). Further information is available in Gensicke and Tschersich (2015).

The individual survey was conducted by infas, which also carried out the surveys for the WeLL dataset (Schütz et al., 2014). Individuals were drawn from 869 LPP establishments which had expressed their willingness to participate in future waves of the survey and which employed a sufficient number of eligible individuals. The survey was conducted between December 2012 and April 2013 in the form of a CATI. The average duration of the interview was 30 minutes.

199 The average consent rate is probably very high because the question is unspecific regarding the nature of the data with which survey information is to be merged. Furthermore, consent rates are much higher in East Germany, which is over-represented in WeLL. This could explain the difference from LPP.

At the end of the interview, individuals were asked whether they would consent to data linkage. The question was similar to that asked in the surveys conducted for WeLL. It read as follows: "To shorten the next interview by not asking about your full employment biographies, we would like to include excerpts of other data for the analysis. This data is available at the Institute for Employment Research in Nuremberg. This includes information about previous periods of employment. However, the inclusion of this data requires your consent. The Data Protection Act requires your consent for the purpose of linking such information to the interview data to what I would like to ask you cordially. It is absolutely certain that all data protection regulations are strictly adhered to. Your consent is of course voluntary. You can also withdraw your consent at any time. Do you agree to this additional information potentially being merged with your details from the interview?".200

LPP does not include establishment information on individuals who did not consent to their data being linked to administrative records. For the sample who did provide consent (consent sample), establishment information is available both through the link to the IAB Establishment Panel and through the employer survey.

6.8.5 Lasso Estimation of Linkage Consent

We conduct a robustness test to illustrate that the identified predictors of linkage consent are indeed important and should therefore be taken into account in future research. In the main specification we left out a number of variables which may in fact be important. These include predictors used in previous studies, such as marital status (available in WeLL only), as well as variables commonly used in the social sciences, such as job tasks, which in this study are also available for WeLL only.201 Furthermore, we have already included a large number of variables in Table 6.4. Multicollinearity can be an issue when unregularized methods are used for variable selection. Here, we want to ensure that our findings are robust to using another approach which works well in the case of moderate multicollinearity.

200 Individuals who agreed to participate in future waves of the survey were asked this question. The question addressed to respondents who declined to participate in future interviews was very similar.

201 Many variables regarding family status and household context are highly correlated. Married individuals, for example, are likely to live with another person, whilst widowed individuals are often fairly old. It is for this reason that we have so far excluded these variables; here, we use a regularized approach in order to account for multicollinearity.

Lasso regularization (least absolute shrinkage and selection operator; Tibshirani, 1996) is a commonly used method in the machine learning literature. Lasso penalizes the sum of the absolute values of the coefficient estimates. It thereby sets many coefficients to exactly zero and is therefore well suited to variable selection (Friedman et al., 2001). Many researchers prefer lasso to standard approaches such as stepwise selection, in particular in the presence of highly correlated variables (e.g. Yuan and Lin, 2006).

$$
\min_{(\beta_0,\beta)\in\mathbb{R}^{p+1}} \; -\frac{1}{N}\sum_{i=1}^{N}\Big[\text{Linkage Consent}_i\cdot\left(\beta_0 + x_i^{T}\beta\right) - \log\left(1 + e^{\beta_0 + x_i^{T}\beta}\right)\Big] + \lambda\lVert\beta\rVert_1 \tag{6.6}
$$

Here $x_i$ includes the 32 variables listed in Table 6.4 for WeLL and 20 other variables, including marital status and household composition, 12 job tasks and future expectations regarding wage growth. We do not include any further variables for LPP, but we do check whether lasso confirms the results from the random effects logistic regression in Chapter 6.5. The penalty parameter λ controls the amount of shrinkage and is chosen by 50-fold cross-validation.202
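Problem (6.6) can be solved in several ways, and this section does not say which software was used. The following is a minimal numpy sketch that uses proximal gradient descent (ISTA), a standard solver for this L1-penalized logistic objective, together with k-fold cross-validation over a grid of λ values and the logistic loss. All function names are our own, the data are simulated with hypothetical coefficients, and 5 folds are used instead of 50 to keep the example fast.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t*||.||_1: shrinks each entry towards zero by t and
    sets entries with |z| <= t to exactly zero (this is what makes lasso select)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_logit(X, y, lam, n_iter=1000):
    """ISTA for problem (6.6): minimise
    -(1/N) * sum_i [y_i * eta_i - log(1 + exp(eta_i))] + lam * ||beta||_1,
    with eta_i = b0 + x_i'beta; the intercept b0 is not penalised."""
    n, p = X.shape
    X1 = np.column_stack([np.ones(n), X])
    # Step 1/L, where L = ||X1||_2^2 / (4N) bounds the Lipschitz constant of the gradient.
    step = 4.0 * n / np.linalg.norm(X1, 2) ** 2
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        eta = b0 + X @ beta
        prob = np.exp(-np.logaddexp(0.0, -eta))   # numerically stable logistic function
        resid = prob - y                           # N times the gradient w.r.t. eta_i
        b0 -= step * resid.mean()                  # plain gradient step for the intercept
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
    return b0, beta

def mean_log_loss(y, eta):
    """Average negative Bernoulli log-likelihood (logistic loss) on held-out data."""
    return np.mean(np.logaddexp(0.0, eta) - y * eta)

def cv_lambda(X, y, lambdas, k=5, seed=0):
    """k-fold cross-validation: each fold is held out once while the model is
    fitted on the other k - 1 folds; returns the lambda with the lowest mean loss."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for lam in lambdas:
        err = 0.0
        for hold in folds:
            train = np.setdiff1d(np.arange(len(y)), hold)
            b0, beta = lasso_logit(X[train], y[train], lam)
            err += mean_log_loss(y[hold], b0 + X[hold] @ beta)
        errors.append(err / k)
    return lambdas[int(np.argmin(errors))], np.array(errors)

# Simulated data: hypothetical predictors, only the first three matter.
rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardise the predictors
beta_true = np.array([1.0, -0.8, 0.5] + [0.0] * (p - 3))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.3 + X @ beta_true)))).astype(float)

# Smallest lambda for which all slope coefficients are zero.
lam_max = np.max(np.abs(X.T @ (y - y.mean()))) / n
lambdas = lam_max * np.logspace(0, -2, 10)
lam_cv, cv_err = cv_lambda(X, y, lambdas)          # 5 folds here; the text uses 50
b0, beta = lasso_logit(X, y, lam_cv)
```

ISTA is chosen only because it fits in a few lines; established implementations such as R's glmnet solve the same penalized logistic objective by coordinate descent and are the more likely choice for an actual analysis.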

The results are overwhelmingly in line with previous findings, with only a few exceptions. In WeLL, the coefficients on the main predictors, such as willingness to participate in future interviews, item non-response, age and employment in a firm in East Germany, are almost identical in size. Results differ for individuals with managerial responsibilities, a predictor which exhibits only a small positive correlation with linkage consent according to lasso. The negative coefficient for white-collar workers is also only half as large as in Table 6.4. The time effects are also much smaller, and the coefficient for the third wave is even set to zero in the lasso approach. Results for LPP are fully in line with our previous findings.

For WeLL, lasso regularization selects alternative household situations, such as living in a single household without a partner or being divorced, rather than living alone. The negative coefficient for being divorced is, however, comparable to that found for those living alone (indeed, we find similar associations for other household situations). In addition, lasso regularization suggests including several job tasks: "measuring, testing, quality control" (positively associated with linkage consent), "teaching, training, educating" (positive association), "taking care, healing" (positive association), "operating, controlling machines" (positive association), "manufacturing of goods, planting" (positive association) and "repairing, renovating, restoring" (negative association). The absolute size of the lasso coefficients ranges from 0.05 for "repairing" to 0.16 for "measuring". If we add these variables to the specification in Table 6.4, column 6, we obtain similar results. However, only one additional predictor, "measuring, testing, quality control", is significant at the 10% level.

202 k-fold cross-validation partitions the original data into k subsamples of equal size. k − 1 subsamples are then used as the training set and the remaining subsample is used for validation. This is repeated k times, so that each subsample serves as the validation set once. As in Table 6.4, we use a logistic loss function.

6.8.6 Further Details of the Replication of Chapter 4

Here, we replicate our original results presented in Chapter 4. We restrict the replication to specifications which use only worker characteristics available in the survey data. As in our earlier paper, we exclude mandatory job-training courses and restrict the analyses to workers who are employed in a full- or part-time position and work at least 15 hours a week. This provides us with estimates comparable to those in Table 7, Columns 1 and 3, in Chapter 4. We exclude all individuals with missing information on key variables such as level of education or training attendance. Whereas the earlier study used age and age squared, four age categories are included here, as only these are available in the survey data.

These sample selection criteria give us a sample of 16,263 interviews with 6,731 unique respondents over four waves from whom linkage consent was obtained. Compared with the original study, there are approximately 29% more interviews and 16% more workers in this sample. There are a number of (related) reasons for this, which also lead to slightly different results. Firstly, not all variables which we used for the original data preparation are available in the survey. We cannot tell from the survey data alone, for example, whether a worker still works for a given WeLL establishment. This information is, however, directly accessible from social security data, and in Chapter 4 we excluded individuals who left a given firm. Secondly, variables such as age are anonymised in the survey data (four categories), whilst the social security data provide exact information on the year in which respondents were born. We therefore cannot exclude individuals aged below 21 or above 64, nor use age continuously in the regression analyses as we did in our earlier paper. Furthermore, in Chapter 4 we excluded individuals whose social security entries were missing or who received very low wages.
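The figure of approximately 29% more interviews can be checked against the interview counts reported in Table 6.7, column (2) relative to column (1) (our own arithmetic):

$$
\frac{16{,}263}{12{,}560} \approx 1.2948,
$$

that is, roughly 29% more interviews than in the original study.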

As expected, the replication based on the survey data gives us results similar to those in Chapter 4. There are some notable differences in the intercept and the time effects, which are most probably due to the inclusion of workers who leave an establishment. Individuals tend to participate more in training when they have recently begun a new job, in order to acquire the new skills required.
5. REFERENCES 221

5 References


Abowd, J. M., Kramarz, F., 1999. Econometric Analyses of Linked Employer-Employee Data. Labour Economics 6 (1), 53–74.

Abowd, J. M., Kramarz, F., 1999. The Analysis of Labor Markets Using Matched Employer-Employee Data. Handbook of Labor Economics 3, 2629–2710.

Acemoglu, D., Pischke, J.-S., 1998. Why Do Firms Train? Theory and Evidence. The Quarterly Journal of Economics 113 (1), 79–119.

Acemoglu, D., Pischke, J.-S., 1999. Beyond Becker: Training in Imperfect Labour Markets. The Economic Journal 109 (453), 112–142.

Acemoglu, D., Pischke, J.-S., 1999. The Structure of Wages and Investment in General Training. Journal of Political Economy 107 (3), 539–572.

Addison, J. T., Ozturk, O. D., Wang, S., 2014. The Role of Gender in Promotion and Pay over a Career. Journal of Human Capital 8 (3), 280–317.

Akerlof, G. A., 1982. Labor Contracts as Partial Gift Exchange. Quarterly Journal of Economics 97 (4), 543–569.

Al Baghal, T., Knies, G., Burton, J., et al., 2014. Linking Administrative Records to Surveys: Differences in the Correlates to Consent Decisions. Tech. rep., Understanding Society at the Institute for Social and Economic Research.

Alchian, A. A., Demsetz, H., 1972. Production, Information Costs, and Economic Organization. American Economic Review 62 (5), 777–795.

Almeida Santos, F., Mumford, K. A., 2004. Employee Training in Australia: Evidence From AWIRS. Economic Record 80 (s1), 53–64.

Altonji, J. G., Blank, R. M., 1999. Race and Gender in the Labor Market. Handbook of Labor Economics 3, 3143–3259.

Amabile, T., 1988. Children’s Artistic Creativity: Detrimental Effects of Competition in a Field Setting. Personality and Social Psychology Bulletin 8, 573–578.

Amabile, T., 1996. Creativity in Context: Update to "The Social Psychology of Creativity". Westview Press, New York.

Amabile, T. M., 1997. Motivating Creativity in Organizations: On Doing What You Love and Loving What You Do. California Management Review 40 (1), 39–58.

Amemiya, T., 1977. Some Theorems in the Linear Probability Model. International Economic Review, 645–650.

Antoni, M., 2011. Linking Survey Data With Administrative Employment Data: The Case of the German ALWA Survey. FDZ Methodenreport 12.

Ariely, D., Gneezy, U., Loewenstein, G., Mazar, N., 2009. Large Stakes and Big Mistakes. Review of Economic Studies 76 (2), 451–469.

Arrow, K., 1973. The Theory of Discrimination. Princeton, NJ: Princeton University Press, pp. 3–33.

Arulampalam, W., Booth, A. L., Bryan, M. L., 2004. Training in Europe. Journal of the European Economic Association 2 (2-3), 346–360.

Asplund, R., 2005. The Provision and Effects of Company Training: A Brief Review of the Literature. Nordic Journal of Political Economy 31, 47–73.

Autor, D. H., Levy, F., Murnane, R., 2003. The Skill Content of Recent Technological Change: An Empirical Exploration. The Quarterly Journal of Economics 118 (4), 1279–1333.

Autorengruppe Bildungsberichterstattung, 2012. Bildung in Deutschland 2012: Ein indikatorengestützter Bericht mit einer Analyse zur kulturellen Bildung im Lebenslauf. W. Bertelsmann Verlag, Bielefeld.

Azmat, G., Iriberri, N., 2010. The Importance of Relative Performance Feedback Information: Evidence From a Natural Experiment Using High School Students. Journal of Public Economics 94 (7-8), 435–452.

Azmat, G., Petrongolo, B., 2014. Gender and the Labor Market: What Have We Learned From Field and Lab Experiments? Labour Economics 30, 32–40.

Backes-Gellner, U., Oswald, Y., Tuor Sartore, S., 2014. Part-Time Employment - Boon to Women But Bane to Men? New Insights on Employer-Provided Training. Kyklos 67 (4), 463–481.

Bandiera, O., Barankay, I., Rasul, I., 2005. Social Preferences and the Response to Incentives: Evidence from Personnel Data. The Quarterly Journal of Economics 120 (3), 917–962.

Barankay, I., 2011. Gender Differences in Productivity Responses to Performance Rankings: Evidence From a Randomized Workplace Experiment. Working Paper.

Barron, J. M., Black, D. A., Loe, 1993. Gender Differences in Training, Capital, and Wages. The Journal of Human Resources 28 (2), 343–364.

Bassanini, A., Booth, A. L., Brunello, G., De Paola, M., Leuven, E., 2005. Workplace Training in Europe. IZA Discussion Papers (1640).

Bates, N., 2005. Development and Testing of Informed Consent Questions to Link Survey Data With Administrative Records. In: American Statistical Association (Ed.), Proceedings of the Survey Research Methods Section. pp. 3786–3793.

Becker, G. S., 1962. Investment in Human Capital: A Theoretical Analysis. The Journal of Political Economy 70 (5), 9–49.

Becker, G. S., 1964. Human Capital. Columbia University Press.

Becker, G. S., 2010. The Economics of Discrimination, 2nd Edition. University of Chicago Press.

Bellmann, L., Bender, S., Bossler, M., Broszeit, S., Dickmann, C., Gensicke, M., Gilberg, R., Grunau, P., Kampkötter, P., Laske, K., et al., 2015. LPP-Linked Personnel Panel* Quality of Work and Economic Success: Longitudinal Study in German Establishments (Data Collection on the First Wave), FDZ-Methodenreport, 05/2015 (en).

Bem, D. J., 1972. Self-Perception Theory. In: Berkowitz, L. (Ed.), Advances in Experimental Social Psychology. Vol. 6. Academic Press, New York, pp. 1–62.

Bénabou, R., Tirole, J., 2006. Incentives and Prosocial Behavior. American Economic Review 96 (5), 1652–1678.

Bender, S., Fertig, M., Görlitz, K., Huber, M., Hummelsheim, S., Knerr, P., Schmucker, A., Schröder, H., et al., 2008. WeLL-Berufliche Weiterbildung als Bestandteil Lebenslangen Lernens. No. 5.

Berger, L., Klassen, K. J., Libby, T., Webb, A., 2013. Complacency and Giving Up Across Repeated Tournaments: Evidence From the Field. Journal of Management Accounting Research 25 (1), 143–167.

Bertrand, M., Schoar, A., 2003. Managing with Style: The Effect of Managers on Firm Policies. The Quarterly Journal of Economics 118 (4), 1169–1208.

Binder, J. J., Findlay, M., 2011. The Effects of the Bosman Ruling on National and Club Teams in Europe. Journal of Sports Economics 13 (2), 107–129.

Black, S. E., Spitz-Oener, A., 2010. Explaining Women's Success: Technological Change and the Skill Content of Women's Work. The Review of Economics and Statistics 92 (1), 187–194.

Blanes i Vidal, J., Nossol, M., 2011. Tournaments Without Prizes: Evidence From Personnel Records. Management Science 57 (10), 1721–1736.

Blau, F. D., Kahn, L. M., 2016. The Gender Wage Gap: Extent, Trends, and Explanations. Working Paper 21913, National Bureau of Economic Research.

Bonner, S. E., Hastie, R., Sprinkle, G. B., Young, S. M., 2000. A Review of the Effects of Financial Incentives on Performance in Laboratory Tasks: Implications for Management Accounting. Journal of Management Accounting Research 12 (1), 19–64.

Boudreau, K. J., Lacetera, N., Lakhani, K. R., 2011. Incentives and Problem Uncertainty in Innovation Contests: An Empirical Analysis. Management Science 57 (5), 843–863.

Bowles, S., Polania-Reyes, S., 2012. Economic Incentives and Social Preferences: Substitutes or Complements? Journal of Economic Literature 50 (2), 368–425.

Bracha, A., Fershtman, C., 2012. Competitive Incentives: Working Harder or Working Smarter? Management Science, 1–11.

Brandts, J., Charness, G., 2004. Do Labour Market Conditions Affect Gift Exchange? Some Experimental Evidence. The Economic Journal 114 (497), 684–708.

Brunt, L., Lerner, J., Nicholas, T., 2012. Inducement Prizes and Innovation. The Journal of Industrial Economics 60 (4), 657–696.

Buraimo, B., Frick, B., Hickfang, M., Simmons, R., 2015. The Economics of Long-term Contracts in the Footballers' Labour Market. Scottish Journal of Political Economy 62 (1), 8–24.

Byron, K., Khazanchi, S., 2012. Rewards and Creative Performance: A Meta-Analytic Test of Theoretically Derived Hypotheses. Psychological Bulletin 138 (4), 809–830.

Cairns, J., Jennett, N., Sloane, P. J., 1986. The Economics of Professional Team Sports: A Survey of Theory and Evidence. Journal of Economic Studies 13 (1), 3–80.

Camerer, C., Hogarth, R., 1999. The Effects of Financial Incentives in Experiments: A Review and Capital-Labor-Production Framework. Journal of Risk and Uncertainty 19 (1), 7–42.

Cameron, A. C., Miller, D. L., 2014. Robust Inference for Dyadic Data. Unpublished manuscript, University of California-Davis.

Cameron, A. C., Miller, D. L., 2015. A Practitioner's Guide to Cluster-Robust Inference. Journal of Human Resources 50 (2), 317–372.

Card, D., Cardoso, A. R., Heining, J., Kline, P., 2016. Firms and Labor Market Inequality: Evidence and Some Theory. Working Paper.

Card, D., Cardoso, A. R., Kline, P., 2016. Bargaining, Sorting, and the Gender Wage Gap: Quantifying the Impact of Firms on the Relative Pay of Women. The Quarterly Journal of Economics 131 (2), 633–686.

Card, D., Heining, J., Kline, P., 2013. Workplace Heterogeneity and the Rise of West German Wage Inequality. The Quarterly Journal of Economics 128 (3), 967–1015.

Carneiro, P., Heckman, J. J., 2003. Human Capital Policy. In: Heckman, J. J., Krueger, A. (Eds.), Inequality in America: What Role for Human Capital Policies. MIT Press.

Charness, G., 2004. Attribution and Reciprocity in an Experimental Labor Market. Journal of Labor Economics 22 (3), 665–688.

Charness, G., Gneezy, U., 2009. Incentives to Exercise. Econometrica 77 (3), 909–931.

Charness, G., Grieco, D., 2012. Individual Creativity, Ex-ante Goals and Financial Incentives, mimeo.

Chetty, R., Friedman, J. N., Rockoff, J. E., 2014. Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates. American Economic Review 104 (9), 2593–2632.

Dearden, L., Reed, H., Van Reenen, J., 2006. The Impact of Training on Productivity and Wages: Evidence from British Panel Data. Oxford Bulletin of Economics and Statistics 68 (4), 397–421.

Deci, E. L., 1971. Effects of Externally Mediated Rewards on Intrinsic Motivation. Journal of Personality and Social Psychology 18 (1), 105–115.

Deci, E. L., 1972. Intrinsic Motivation, Extrinsic Reinforcement and Inequity. Journal of Personality and Social Psychology 22 (1), 113–120.

Deci, E. L., Koestner, R., Ryan, R. M., 1999. A Meta-Analytic Review of Experiments Examining the Effects of Extrinsic Rewards on Intrinsic Motivation. Psychological Bulletin 125 (6), 627–668.

DeJarnette, P., 2015. Effort Momentum. Mimeo.

DFL, 2006–2015. Die wirtschaftliche Situation im Lizenzfussball. Bundesliga Report.

Dickinson, 1999. An Experimental Examination of Labor Supply and Work Intensities. Journal of Labor Economics 17 (4), 638–670.

Dietz, D., Zwick, T., 2016. The Retention Effect of Training: Portability, Visibility, and Credibility. ZEW Discussion Papers (16-011).

Dohmen, T., Falk, A., 2011. Performance Pay and Multidimensional Sorting: Productivity, Preferences, and Gender. The American Economic Review 101 (2), 556–590.

Dorner, M., Heining, J., Jacobebbinghaus, P., Seth, S., 2010. The Sample of Integrated Labour Market Biographies. Schmollers Jahrbuch 130 (4), 599–608.

Dufwenberg, M., Kirchsteiger, G., 2004. A Theory of Sequential Reciprocity. Games and Economic Behavior 47, 268–298.

Dustmann, C., Ludsteck, J., Schönberg, U., 2009. Revisiting the German Wage Structure. The Quarterly Journal of Economics 124 (2), 843–881.

Dutcher, G. E., 2012. The Effects of Telecommuting on Productivity: An Experimental Examination. The Role of Dull and Creative Tasks. Journal of Economic Behavior & Organization 84, 355–363.

Eckartz, K., Kirchkamp, O., Schunk, D., 2012. How Do Incentives Affect Creativity? Working Paper.

Eisenberger, R., Shanock, L., 2003. Rewards, Intrinsic Motivation, and Creativity: A Case Study of Conceptual and Methodological Isolation. Creativity Research Journal 15 (2-3), 121–130.

Englmaier, F., Leider, S., 2010. Gift Exchange in the Lab – It is Not (Only) How Much You Give ..., CESifo Working Paper No. 2944.

Erat, S., Gneezy, U., 2015. Incentives for Creativity. Experimental Economics 18, 1–12.

Eriksson, T., Teyssier, S., Villeval, M.-C., 2009. Self-Selection and the Efficiency of Tournaments. Economic Inquiry 47 (3), 530–548.

European Commission, 2015. 2015 Joint Report of the Council and the Commission on the implementation of the strategic framework for European cooperation in education and training (ET 2020). Official Journal of the European Union C 417, 25–35.

Fahr, R., Irlenbusch, B., 2000. Fairness as a Constraint on Trust in Reciprocity: Earned Property Rights in a Reciprocal Exchange Experiment. Economics Letters 66 (3), 275–282.

Falk, A., Fischbacher, U., 2006. A Theory of Reciprocity. Games and Economic Behavior 54 (2), 293–314.

Falk, A., Ichino, A., 2006. Clean Evidence on Peer Effects. Journal of Labor Economics 24 (1), 39–58.

Fang, M., Gerhart, B., 2012. Does Pay for Performance Diminish Intrinsic Interest? The International Journal of Human Resource Management 23 (6), 1176–1196.

Feddersen, A., 2006. Economic Consequences of the UEFA Champions League for National Championships: The Case of Germany. No. 01/2006. Hamburg Contemporary Economic Discussions.

Feddersen, A., Maennig, W., 2005. Trends in Competitive Balance: Is There Evidence for Growing Imbalance in Professional Sport Leagues? No. 01/2005. Hamburg Contemporary Economic Discussions.

Fehr, E., Falk, A., 2002. Psychological Foundations of Incentives. European Economic Review 46 (4-5), 687–724.

Fehr, E., Gächter, S., 2000. Fairness and Retaliation: The Economics of Reciprocity. Journal of Economic Perspectives 14 (3), 159–181.

Fehr, E., Gächter, S., 2002. Do Incentive Contracts Undermine Voluntary Cooperation? IEW Working Paper No. 34.

Fehr, E., Gächter, S., Kirchsteiger, G., 1997. Reciprocity as a Contract Enforcement Device: Experimental Evidence. Econometrica: Journal of the Econometric Society 65 (4), 833–860.

Fehr, E., Schmidt, K., 1999. A Theory of Fairness, Competition, and Cooperation. The Quarterly Journal of Economics 114 (3), 817–868.

Finke, C., 2011. Verdienstunterschiede zwischen Männern und Frauen eine Ursachenanalyse auf Grundlage der Verdienststrukturerhebung 2006. Wirtschaft und Statistik, H 1, 36–50.

Fischbacher, U., 1999. Z-Tree - Experimenter’s Manual. Working Paper No. 21.

Fitzenberger, B., Kunze, A., 2005. Vocational Training and Gender: Wages and Occupational Mobility Among Young Workers. Oxford Review of Economic Policy 21 (3), 392–415.

Fitzenberger, B., Muehler, G., 2014. Dips and Floors in Workplace Training: Using Personnel Records to Estimate Gender Differences. Scottish Journal of Political Economy 62 (4), 325–429.

Fitzenberger, B., Osikominu, A., Völter, R., 2006. Imputation Rules to Improve the Education Variable in the IAB Employment Subsample. Schmollers Jahrbuch (Journal for Economics and Social Sciences) 126 (3), 405–436.

Fitzgerald, J., Gottschalk, P., Moffitt, R., 1998. An Analysis of Sample Attrition in Panel Data: The Michigan Panel Study of Income Dynamics. The Journal of Human Resources 33 (2), 251–299.

Flores, R., Forrest, D., Tena, J. d. D., 2010. Impact on Competitive Balance from Allowing Foreign Players in a Sports League: Evidence from European Soccer. Kyklos 63 (4), 546–557.

Florida, R., 2002. The Rise of the Creative Class. Basic Books New York.

Fortin, N., Lemieux, T., Firpo, S., 2011. Decomposition Methods in Economics. Handbook of Labor Economics 4, 1–102.

Frazis, H., Gittleman, M., Joyce, M., April 1999. Correlates of Training: An Analysis Using Both Employer and Employee Characteristics. Industrial & Labor Relations Review 53 (3), 443–462.

Frey, B. S., Jegen, R., 2001. Motivation Crowding Theory. Journal of Economic Surveys 15 (5), 589–611.

Frick, B., Prinz, J., 2006. Crisis? What Crisis? Journal of Sports Economics 7 (1), 60–75.

Friedman, J., Hastie, T., Tibshirani, R., 2001. The Elements of Statistical Learning. Vol. 1. Springer Series in Statistics Springer, Berlin.

Friedman, J., Hastie, T., Tibshirani, R., 2010. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software 33 (1), 1–22.

Gelman, A., Carlin, J. B., Stern, H. S., Rubin, D. B., 2014. Bayesian Data Analysis. Vol. 2. Chapman & Hall/CRC Boca Raton.

Gelman, A., Hill, J., 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gensicke, M., Tschersich, N., 2015. Vertiefende Betriebsbefragung "Arbeitsqualität und wirtschaftlicher Erfolg" 2012. Tech. rep., Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].

Gill, D., Prowse, V., 2012. A Structural Analysis of Disappointment Aversion in a Real Effort Competition. The American Economic Review 102 (1), 469–503.

Gneezy, U., List, J., 2006. Putting Behavioral Economics to Work: Field Evidence on Gift Exchange. Econometrica 74 (5), 1365–1384.

Gneezy, U., Meier, S., Rey-Biel, P., 2011. When and Why Incentives (Don’t) Work to Modify Behavior. The Journal of Economic Perspectives, 191–209.

Goerlitz, K., 2011. Continuous Training and Wages. An Empirical Analysis Using a Comparison-group Approach. Economics of Education Review 30 (4), 691–701.

Goldin, C., 2014. A Grand Gender Convergence: Its Last Chapter. The American Economic Review 104 (4), 1091–1119.

Goldstein, H., Browne, W., Rasbash, J., 2002. Partitioning Variation in Multilevel Models. Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences 1 (4), 223–231.

Goossens, K., 2005. Competitive Balance in European Football: Comparison by Adapting Measures: National Measure of Seasonal Imbalance and Top 3. Tech. rep., University of Antwerp, Faculty of Applied Economics.

Görlitz, K., Tamm, M., 2016. Revisiting the Complementarity between Education and Training – The Role of Job Tasks and Firm Effects. Education Economics 24 (3), 261–279.

Greiner, B., 2004. An Online Recruitment System for Economic Experiments. In: Forschung und Wissenschaftliches Rechnen 2003. Vol. 63. GWDG Bericht, pp. 79–93.

Grönlund, A., 2011. On-The-Job Training - A Mechanism for Segregation? Examining the Relationship Between Gender, Occupation, and On-The-Job Training Investments. European Sociological Review.

Groot, L. F. M., 2008. Economics, Uncertainty and European Football: Trends in Competitive Balance. Edward Elgar Publishing.

Grund, C., Martin, J., 2012. Determinants of Further Training – Evidence for Germany. The International Journal of Human Resource Management 23 (17), 3536–3558.

Guilford, J., 1967. The Nature of Human Intelligence. New York: McGraw-Hill.

Guilford, J. P., 1959. Personality. McGraw-Hill, New York.

Haan, M., Koning, R. H., Van Witteloostuijn, A., 2007. Competitive Balance in National European Soccer Competitions. Statistical Thinking in Sports, 63–76.

Hall, S., Szymanski, S., Zimbalist, A. S., 2002. Testing Causality Between Team Performance and Payroll: The Cases of Major League Baseball and English Soccer. Journal of Sports Economics 3 (2), 149–168.

Hamermesh, D. S., 1999. LEEping Into the Future of Labor Economics: The Research Potential of Linking Employer and Employee Data. Labour Economics 6 (1), 25–41.

Harbring, C., Irlenbusch, B., 2003. An Experimental Study on Tournament Design. Labour Economics 10 (4), 443–464.

Harbring, C., Irlenbusch, B., 2011. Sabotage in Tournaments: Evidence from a Laboratory Experiment. Management Science 57 (4), 611–627.

Heckman, J. J., 1979. Sample Selection Bias as a Specification Error. Econometrica: Journal of the Econometric Society, 153–161.

Heckman, J. J., Lochner, L. J., Todd, P. E., 2003. Fifty Years of Mincer Earnings Regressions. Tech. rep., National Bureau of Economic Research.

Heffetz, O., Reeves, D. B., 2016. Difficulty to Reach Respondents and Nonresponse Bias: Evidence from Large Government Surveys. Working Paper.

Hennig-Schmidt, H., Rockenbach, B., Sadrieh, A., 2010. In Search of Workers' Real Effort Reciprocity – A Field and a Laboratory Experiment. Journal of the European Economic Association 8 (4), 817–837.

Hentschel, S., Muehlheusser, G., Sliwka, D., 2014. The Contribution of Managers to Organizational Success: Evidence from German Soccer. Working Paper.

Hirsch, B., Schank, T., Schnabel, C., 2010. Differences in Labor Supply to Monopsonistic Firms and the Gender Pay Gap: An Empirical Analysis Using Linked Employer-Employee Data from Germany. Journal of Labor Economics 28 (2), 291–330.

Hox, J., 2010. Multilevel Analysis: Techniques and Applications, 2nd Edition. Routledge Academic.

Huber, M., Schmucker, A., 2012. Panel WeLL: Arbeitnehmerbefragung für das Projekt Berufliche Weiterbildung als Bestandteil Lebenslangen Lernens Dokumentation für die Originaldaten Wellen 1-4. FDZ Datenreport. Documentation on Labour Market Data (03/2012 DE).

Huber, P., Huemer, U., 2015. Gender Differences in Lifelong Learning: An Empirical Analysis of the Impact of Marriage and Children. Labour 29 (1), 32–51.

Jacobebbinghaus, P., Seth, S., 2007. The German Integrated Employment Biographies Sample IEBS. Schmollers Jahrbuch 127 (2), 335–342.

Jenkins, S. P., Cappellari, L., Lynn, P., Jäckle, A., Sala, E., 2006. Patterns of Consent: Evidence from a General Household Survey. Journal of the Royal Statistical Society: Series A (Statistics in Society) 169 (4), 701–722.

Joussemet, M., Koestner, R., 1999. Effect of Expected Rewards on Children's Creativity. Creativity Research Journal 12 (4), 231–239.

Keane, M. P., 1992. A Note on Identification in the Multinomial Probit Model. Journal of Business & Economic Statistics 10 (2), 193–200.

Késenne, S., 2000. Revenue Sharing and Competitive Balance in Professional Team Sports. Journal of Sports Economics 1 (1), 56–65.

Kessler, J. B., 2013. When Will There Be Gift Exchange? Addressing the Lab-Field Debate With a Laboratory Gift Exchange Experiment. Mimeo.

Kho, M. E., Duffett, M., Willison, D. J., Cook, D. J., Brouwers, M. C., 2009. Written Informed Consent and Selection Bias in Observational Studies Using Medical Records: Systematic Review. BMJ 338.

Kiefel, M., Warnke, A. J., 2015. A Comparison of Bayesian and Frequentist Association Football Forecasting Models Using Individual-Level Player Data. Mimeo.

Kim, K., 2006. Can We Trust Creativity Tests? A Review of the Torrance Tests of Creative Thinking (TTCT). Creativity Research Journal 18 (1), 3–14.

Knerr, P., Schröder, H., Aust, F., Gilberg, R., et al., 2009. Berufliche Weiterbildung als Bestandteil Lebenslangen Lernens (WeLL): WeLL-Erhebung 2007 – Methodenbericht. FDZ Methodenreport (6).

Kölling, A., 2000. The IAB-Establishment Panel. Schmollers Jahrbuch 120 (2), 291–300.

Koning, R. H., 2009. Sport and Measurement of Competition. De Economist 157 (2), 229–249.

Korbmacher, J. M., Schroeder, M., et al., 2013. Consent When Linking Survey Data With Administrative Records: The Role of the Interviewer. In: Survey Research Methods. Vol. 7. Citeseer, pp. 115–131.

Kräkel, M., 2008. Emotions in Tournaments. Journal of Economic Behavior & Organization 67 (1), 204–214.

Kremer, M., Williams, H., 2010. Incentivizing Innovation: Adding to the Tool Kit. In: Innovation Policy and the Economy, Volume 10. University of Chicago Press, pp. 1–17.

Kube, S., Marechal, M. A., Puppe, C., 2012. The Currency of Reciprocity: Gift Exchange in the Workplace. American Economic Review 102 (4), 1644–62.

Kube, S., Marechal, M. A., Puppe, C., 2013. Do Wage Cuts Damage Work Morale? Evidence from a Natural Field Experiment. Journal of the European Economic Association 11 (4), 853–870.

Kulish, N., April 24, 2013. Bayern Munich Eyes Midfielder Götze. New York Times.

Künn, S., 2015. The Challenges of Linking Survey and Administrative Data. IZA World of Labor (214).

Lane, A. M., Whyte, G. P., Terry, P. C., Nevill, A. M., 2005. Mood, Self-set Goals and Examination Performance: The Moderating Effect of Depressed Mood. Personality and Individual Differences 39 (1), 143–153.

Laske, K., Schroeder, M., 2015. Quantity, Quality, and Novelty: Direct and Indirect Effects of Incentives on Creativity. Mimeo.

Lazear, E. P., 2000. Performance Pay and Productivity. American Economic Review 90 (5), 1346–1361.

Lazear, E. P., Gibbs, M., 2009. Personnel Economics in Practice, 2nd Edition. John Wiley & Sons.

Lazear, E. P., Oyer, P., 2014. Personnel Economics. Handbook of Organizational Economics 3.

Lazear, E. P., Rosen, S., 1981. Rank-Order Tournaments as Optimum Labor Contracts. Journal of Political Economy 89 (5), 841–864.

Lazear, E. P., Rosen, S., 1990. Male-Female Wage Differentials in Job Ladders. Journal of Labor Economics, 106–123.

Leber, U., Müller, I., 2008. Weiterbildungsbeteiligung ausgewählter Personengruppen. Schmollers Jahrbuch : Journal of Applied Social Science Studies / Zeitschrift für Wirtschafts- und Sozialwissenschaften 128 (3), 405–429.

Lepper, M. R., Greene, D., Nisbett, R. E., 1973. Undermining Children’s Intrinsic Interest With Extrinsic Rewards: A Test of The Over-Justification Hypothesis. Journal of Personality and Social Psychology 28 (1), 129–137.

Leuven, E., Oosterbeek, H., 2008. An Alternative Approach to Estimate the Wage Returns to Private-Sector Training. Journal of Applied Econometrics 23 (4), 423–434.

Leuven, E., Oosterbeek, H., Sloof, R., Van Klaveren, C., 2005. Worker Reciprocity and Employer Investment in Training. Economica 72 (285), 137–149.

Levine, D. K., 1998. Modeling Altruism and Spitefulness in Experiments. Review of Economic Dynamics 1 (3), 593–622.

Little, R. J., 1993. Pattern-Mixture Models for Multivariate Incomplete Data. Journal of the American Statistical Association 88 (421), 125–134.

Ludsteck, J., 2014. The Impact of Segregation and Sorting on the Gender Wage Gap: Evidence from German Linked Longitudinal Employer-Employee Data. Industrial & Labor Relations Review 67 (2), 362–394.

Lynch, L. M., Black, S. E., 1998. Beyond the Incidence of Training: Evidence From a National Employers’ Survey. Industrial and Labor Relations Review 52, 64–81.

Macdonald, B., et al., 2012. Adjusted Plus-Minus for NHL Players Using Ridge Regression with Goals, Shots, Fenwick, and Corsi. Journal of Quantitative Analysis in Sports 8 (3), 1–24.

Macpherson, D. A., Hirsch, B. T., 1995. Wages and Gender Composition: Why Do Women’s Jobs Pay Less? Journal of Labor Economics, 426–471.

MaCurdy, T., Mroz, T., Gritz, R. M., 1998. An Evaluation of the National Longitudinal Survey on Youth. The Journal of Human Resources 33 (2), 345–436.

Maxcy, J. G., et al., 2009. Progressive Revenue Sharing in MLB: The Effect on Player Transfers and Talent Distribution. Review of Industrial Organization 35 (3), 275–297.

Michie, J., Oughton, C., 2004. Competitive Balance in Football: Trends and Effects. The Sportsnexus.

Mincer, J., 1958. Investment in Human Capital and Personal Income Distribution. The Journal of Political Economy, 281–302.

Mohnen, A., Pokorny, K., Sliwka, D., 2008. Transparency, Inequity Aversion, and the Dynamics of Peer Pressure in Teams: Theory and Evidence. Journal of Labor Economics 26 (4), 693–720.

Mulligan, C. B., Rubinstein, Y., 2008. Selection, Investment, and Women’s Relative Wages over Time. The Quarterly Journal of Economics, 1061–1110.

Neale, W. C., 1964. The Peculiar Economics of Professional Sports: A Contribution to the Theory of the Firm in Sporting Competition and in Market Competition. The Quarterly Journal of Economics, 1–14.

Oster, E., forthcoming. Unobservable Selection and Coefficient Stability: Theory and Evidence. Journal of Business & Economic Statistics.

Partosch, C., 2014. Der Einfluss der Champions League auf die Wettbewerbsposition einzelner Vereine und die Competitive Balance der Bundesliga. Tech. rep., Diskussionspapier des Instituts für Organisationsökonomik.

Pawlowski, T., Breuer, C., Hovemann, A., 2010. Top Clubs Performance and the Competitive Situation in European Domestic Football Competitions. Journal of Sports Economics 11 (2), 186–202.

Pfeifer, C., 2015. Intra-Firm Wage Compression and Coverage of Training Costs: Evidence From Linked Employer-Employee Data. ILR Review.

Phelps, E. S., 1972. The Statistical Theory of Racism and Sexism. The American Economic Review 62 (4), 659–661.

Picchio, M., Van Ours, J. C., 2016. Gender and the Effect of Working Hours on Firm-Sponsored Training. Journal of Economic Behavior & Organization 125, 192–211.

Pischke, J.-S., 2001. Continuous Training in Germany. Journal of Population Economics 14 (3), 523–548.

Polachek, S. W., 1981. Occupational Self-Selection: A Human Capital Approach to Sex Differences in Occupational Structure. The Review of Economics and Statistics, 60–69.

Prendergast, C., 1999. The Provision of Incentives in Firms. Journal of Economic Literature 37 (1), 7–63.

Puhani, P. A., Sonderhof, K., 2011. The Effects of Parental Leave Extension on Training for Young Women. Journal of Population Economics 24 (2), 731–760.

Rabe-Hesketh, S., Skrondal, A., 2012. Multilevel and Longitudinal Modelling Using Stata, 3rd Edition. STATA Press.

Rainer, H., Siedler, T., 2009. Does Democracy Foster Trust? Journal of Comparative Economics 37 (2), 251–269.

Robinson, T., Simmons, R., 2014. Gate-sharing and Talent Distribution in the English Football League. International Journal of the Economics of Business 21 (3), 413–429.

Rohrbach-Schmidt, D., Tiemann, M., 2013. Changes in Workplace Tasks in Germany – Evaluating Skill and Task Measures. Journal for Labour Market Research 46 (3), 215–237.

Rottenberg, S., 1956. The Baseball Players’ Labor Market. The Journal of Political Economy 64 (3), 242–258.

Royalty, A. B., 1996. The Effects of Job Turnover on the Training of Men and Women. Industrial & Labor Relations Review 49 (3), 506–521.

Rubin, D. B., 1976. Inference and Missing Data. Biometrika 63 (3), 581–592.

Runco, M., 1991. Divergent Thinking. Ablex Publishing, Norwood, NJ.

Sæbø, O. D., Hvattum, L. M., 2015. Evaluating the Efficiency of the Association Football Transfer Market Using Regression Based Player Ratings. Norsk Informatikkonferanse (NIK).

Sakshaug, J. W., Couper, M. P., Ofstedal, M. B., Weir, D. R., 2012. Linking Survey and Administrative Records: Mechanisms of Consent. Sociological Methods & Research 41 (4), 535–569.

Sakshaug, J. W., Huber, M., 2015. An Evaluation of Panel Nonresponse and Linkage Consent Bias in a Survey of Employees in Germany. Journal of Survey Statistics and Methodology.

Sakshaug, J. W., Kreuter, F., 2012. Assessing the Magnitude of Non-consent Biases in Linked Survey and Administrative Data. Survey Research Methods 6 (2), 113–122.

Sala, E., Burton, J., Knies, G., 2012. Correlates of Obtaining Informed Consent to Data Linkage: Respondent, Interview, and Interviewer Characteristics. Sociological Methods & Research 41 (3), 414–439.

Schmucker, A., Seth, S., Eberle, J., 2014. WeLL-Befragungsdaten verknüpft mit administrativen Daten des IAB (WeLL-ADIAB) 1975-2011. Tech. rep., Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].

Schütz, H., Gilberg, R., Dickmann, C., Schröder, H., et al., 2014. IAB-Beschäftigtenbefragung: Projekt "Arbeitsqualität und wirtschaftlicher Erfolg: Panelstudie zu Entwicklungsverläufen in deutschen Betrieben-Personenbefragung". Tech. rep., Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].

Schwerdt, G., Messer, D., Woessmann, L., Wolter, S. C., 2012. The Impact of an Adult Education Voucher Program: Evidence from a Randomized Field Experiment. Journal of Public Economics 96 (7), 569–583.

Shalley, C., Zhou, J., Oldham, G., 2004. The Effects of Personal and Contextual Characteristics on Creativity: Where Should We Go from Here? Journal of Management 30 (6), 933–958.

Shearer, B., 2004. Piece Rates, Fixed Wages and Incentives: Evidence From a Field Experiment. The Review of Economic Studies 71 (2), 513–534.

Sheehan, K. B., 2002. Toward a Typology of Internet Users and Online Privacy Concerns. The Information Society 18 (1), 21–32.

Simonton, D. K., 2004. Creativity in Science: Chance, Logic, Genius, and Zeitgeist. Cambridge University Press.

Simpson, P. A., Stroh, L. K., 2002. Revisiting Gender Variation in Training. Feminist Economics 8 (3), 21–53.

Sloane, P. J., 1971. The Economics of Professional Football: The Football Club as a Utility Maximizer. Scottish Journal of Political Economy 18 (2), 121–146.

Solon, G., Haider, S. J., Wooldridge, J. M., 2015. What Are We Weighting for? Journal of Human Resources 50 (2), 301–316.

Spitz-Oener, A., 2006. Technical Change, Job Tasks, and Rising Educational Demands: Looking Outside the Wage Structure. Journal of Labor Economics 24 (2), 235–270.

StataCorp, 2013. Stata Multilevel Mixed-Effects Reference Manual. College Station, TX: StataCorp LP.

Statistisches Bundesamt, 2016. Verdienstunterschied zwischen Frauen und Männern in Deutschland bei 21%. Pressemitteilung 097/16.

Stryhn, H., Sanchez, J., Morley, P., Booker, C., Dohoo, I., 2006. Interpretation of Variance Parameters in Multilevel Poisson Regression Models. In: Proceedings of the 11th International Symposium on Veterinary Epidemiology and Economics.

Szymanski, S., 2001. Income Inequality, Competitive Balance and the Attractiveness of Team Sports: Some Evidence and a Natural Experiment from English Soccer. Economic Journal, 69–84.

Szymanski, S., Késenne, S., 2004. Competitive Balance and Gate Revenue Sharing in Team Sports. The Journal of Industrial Economics 52 (1), 165–177.

The Economist, 2016. How to Reform the Champions League (April 27th).

Tibshirani, R., 1996. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society. Series B (Methodological), 267–288.

Torrance, E., 1968. Torrance Tests of Creative Thinking. Personnel Press, Incorporated.

Torrance, E. P., 1998. Torrance Tests of Creative Thinking: Norms-Technical Manual: Figural (Streamlined) Forms A & B. Scholastic Testing Service.

UEFA, 2004–2015. Financial Report. UEFA Financial Report.

van den Berg, G. J., Lindeboom, M., Dolton, P. J., 2006. Survey Non-response and the Duration of Unemployment. Journal of the Royal Statistical Society: Series A (Statistics in Society) 169 (3), 585–604.

van der Weele, J., Kulisa, J., Kosfeld, M., Friebel, G., 2010. Resisting Moral Wiggle Room: How Robust is Reciprocity? IZA Discussion Paper No. 5374.

Vansteenkiste, M., Deci, E. L., 2003. Competitively Contingent Rewards and Intrinsic Motivation: Can Losers Remain Motivated? Motivation and Emotion 27 (4), 273–299.

Warnke, A. J., 2015. Verzerrung durch selektive Stichproben. In: Nonresponse Bias. Springer, pp. 305–323.

Webber, D. A., 2016. Firm-Level Monopsony and the Gender Pay Gap. Industrial Relations: A Journal of Economy and Society 55 (2), 323–345.

Weitzman, M. L., 1998. Recombinant Growth. The Quarterly Journal of Economics 113 (2), 331–360.

Woodman, R. W., Sawyer, J. E., Griffin, R. W., 1993. Toward a Theory of Organizational Creativity. The Academy of Management Review 18 (2), 293–321.

Wooldridge, J. M., 2002. Econometric Analysis of Cross-Section and Panel Data. The MIT Press.

Yuan, M., Lin, Y., 2006. Model Selection and Estimation in Regression with Grouped Variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 68 (1), 49–67.

Zwick, T., 2005. Continuing Vocational Training Forms and Establishment Productivity in Germany. German Economic Review 6 (2), 155–184.

Zwick, T., 2015. Training Older Employees: What is Effective? International Journal of Manpower 36 (2), 136–150.

Bibliography of the Chapters

Chapter 1:

Bradler, C., Neckermann, S., and Warnke, A. J., 2016. Incentivizing Creativity: a Large-Scale Experiment with Tournaments and Gifts. ZEW Discussion Paper (16-040), Mannheim.

Chapter 2:

Sittl, R., Warnke, A. J., 2016. Competitive Balance and Assortative Matching in the German Bundesliga. ZEW Discussion Paper (16-058), Mannheim.

Chapter 3:

Steffes, S., Warnke, A. J., 2016. New Evidence on the Determinants of Firm-based Training. Unpublished manuscript.

Chapter 4:

Steffes, S., Warnke, A. J., 2016. Gender Differences in Wages and Training. Unpublished manuscript.

Chapter 5:

Warnke, A. J., 2016. An Investigation of Record Linkage Refusal and Its Implications for Empirical Research. Unpublished manuscript.
